Learning to read aloud in Chinese

Reading aloud in Chinese is really hard, much harder than reading aloud in most other languages. Ever since exploring this topic in a recent article, I’ve been thinking about reading aloud in Chinese and what factors are involved. In particular, I want to know why I’m not good at it and what to do about it. The goal of this article isn’t merely to discuss reading aloud, though; I also want to show how I tackle weaknesses in my language ability, and that even though reading aloud is hard, it’s not impossible.

A brief summary: Why reading aloud in Chinese is hard

In the article referred to above, I said that what makes reading aloud in Chinese extra difficult is that Chinese characters might have clues about pronunciation, but they’re not phonetic. When reading silently, we just need to be able to recognise the characters and associate them with meaning, but when reading aloud, we also need to retrieve the correct pronunciation from memory and we need to do it quickly.

To add insult to injury, there is no such thing as word spacing in Chinese, which further increases the cognitive load. Finally, since Chinese is an analytic language, a significant share of the communicative burden lies on the reader, who needs to decipher ambiguities and use context to determine the meanings of words and clauses.

I also summarised the skill components needed to read aloud in Chinese as follows:

  1. Map characters to meaning (character recognition)
  2. Group characters into meaningful words (vocabulary)
  3. Group words into meaningful sentences (grammar)
  4. Understand the meaning of sentences in context (pragmatics)
  5. Understand the writer’s intent (reading between the lines)
  6. Map characters to pronunciation (pronunciation recall)
  7. Understand how the pronunciation of one syllable influences other syllables
  8. Understand how meaning influences pronunciation (intonation, sentence stress)

What am I lacking? Why am I not better at X?

This is the basic process I go through whenever I encounter a language learning problem:

  1. Identify and describe the symptoms
  2. Find out what I need to be able to do to achieve X
  3. Find out which part is the weakest link
  4. Practise that component and see if it helps
  5. If it doesn’t, there’s probably a mistake in steps 1-3

So, let’s do this for reading aloud. I have dealt with steps one and two already, so let’s move on. Looking at the above list of skills, I can exclude a number of factors immediately. The first five can’t be the reason why I’m bad at reading aloud, because I can read silently about twice as fast as I can read aloud (~250 characters/minute for relaxed material). This means that getting the meaning of a text is not the problem. The three remaining factors are more interesting, though.

To test this, I decided to dramatically increase the amount of Chinese I read aloud, to try to figure out not only which factor is the limiting one, but also if there might be more to it than this. I had the nagging feeling that reading aloud is a skill in itself, so even if I had all the components above, I might still not be good at it.

The experiment: Reading a novel aloud

To test this hypothesis, I decided to read an entire novel aloud in Chinese and time each chapter. Rate of speech is not a very good measurement of one’s ability to read aloud, because quality is definitely more important than raw speed, but since I was the only test subject, I could be quite sure that most other variables were kept constant during the experiment. For instance, my comprehension of the novel stayed roughly the same throughout the entire text and I’m quite sure pronunciation quality didn’t deteriorate; if anything, it might have improved somewhat.

I selected a novel I thought would be relatively easy without being childish (I didn’t want to have too many cases where my reading was actually limited by poor character recognition, for instance). For reasons not connected to the experiment, I chose the Chinese translation of Suzanne Collins’ The Hunger Games (飢餓遊戲). If you haven’t read the Chinese version, I can tell you that it’s slightly easier than other translated works of fantasy that I have read, but not by much. It still contains a fair amount of flowery language and idioms, but it’s not a difficult read in any case.

Here are some statistics for characters used in the novel, which of course don’t give a complete picture, but should still give you an idea (the numbers are calculated for comparable lengths of each book):

  • Unique characters in 飢餓遊戲: 2800
  • Unique characters in Twilight: 2600
  • Unique characters in 醜陋的中國人: 3000
  • Unique characters in the Bible: 3000

In case you want to count unique characters in electronic texts, I suggest using DimSum.
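If you would rather not install a separate tool, a rough count is also possible with a few lines of Python. This is only a minimal sketch and not how DimSum works: the function name count_characters and the sample sentence are my own, and I assume that only characters in the main CJK Unified Ideographs block (U+4E00 to U+9FFF) should be counted, so very rare characters outside that range are ignored.

```python
import re
from collections import Counter

# Assumption: only the main CJK Unified Ideographs block is counted.
# Rare characters in the extension blocks are skipped.
HAN = re.compile(r"[\u4e00-\u9fff]")


def count_characters(text):
    """Return a Counter mapping each Chinese character in text to its frequency."""
    return Counter(HAN.findall(text))


# Made-up sample sentence; in practice, read the text of a book from a file.
sample = "飢餓遊戲是一本小說，小說不難。"
counts = count_characters(sample)

print("Unique characters:", len(counts))
print("Total characters:", sum(counts.values()))
print("Most common:", counts.most_common(3))
```

Punctuation, Latin letters and digits are ignored by the regular expression, so the numbers only reflect Chinese characters. To compare books fairly, you would also need to cut them to comparable lengths first, as I did for the statistics above.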


The question: Which factors influenced my performance?

The question I asked myself was what factors were slowing me down. Based on the list above, I identified four potential problem areas. As we shall see, most of them turned out to be irrelevant or of marginal importance.

  • Character recognition (which is the first half of factor #6 above) can be ignored because the characters I didn’t recognise in the novel were fairly evenly spread out and few and far between (no more than 100 in the entire book). This factor was clearly not interfering significantly with fluency, and even though I learnt a few new characters, that certainly didn’t improve reading fluency in any measurable way. At most, it might have saved a few seconds here and there.
  • Character pronunciation recall time (which is the other half of factor #6 above) was more important, because even though I knew almost all characters, I sometimes found it hard to recall their pronunciation quickly enough. This was typically not a question of whether or not I could recall the pronunciation, but rather how long it took to do it. A delay of a second or more has a noticeable effect on the rate of speech. Even though the recall time should have dropped a bit throughout the novel (familiar objects, locations and so on), there are so many characters that I doubt that my overall recall speed increased at all during the experiment.
  • Understand how the pronunciation of one syllable influences other syllables (#7 in the list above) would be a factor if someone never read anything aloud. When you read silently, you don’t have to parse several consecutive third tones, you don’t have to worry about tone changes for 不, 一 and so on. However, I have read a fair bit aloud in Chinese before and since I know I get this right most of the time in speaking, I don’t think it’s a major factor in this case. Still, my ability to do this quickly might have increased throughout the experiment.
  • Understand how meaning influences pronunciation (#8 in the list above) was largely ignored in this experiment. I simply can’t parse the text quickly enough to figure out where the sentence stress should fall or how the type of sentence should influence pronunciation. I simply focused on reading the sentence correctly. This is a factor which is beyond being able to read aloud in a fluent manner and is closer to being able to read aloud well compared with native speakers.

The result: Reading speed might be a separate skill

I spent 1316 minutes (about 22 hours) reading 163380 characters. This is what reading speed looked like across the length of the novel, measured in characters per minute:


As we can see, there is an expected increase in reading speed between the first two parts, which is easily explained by the fact that I got used to the characters in the novel, some setting-specific words and the author’s style. As we can also see, the speed didn’t increase much from page 100 to page 300.

But then something interesting happened. I started feeling that I could look ahead much more than before and that speed crept up quite a bit. Thus, the increase towards the end was definitely noticeable subjectively and actually didn’t flatten out towards the end of the book (the final chapter had an average speed of 145).

Even though the skill factors listed above did improve during the experiment, I don’t think they are enough to explain the ~12% increase in reading speed. This is further supported by the subjective feeling that reading pages 300-400 was qualitatively different from reading pages 200-300.
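For reference, the overall average speed follows directly from the totals reported above (1316 minutes and 163380 characters). The short Python sketch below only redoes that arithmetic; the helper name speed is my own, and the per-chapter timings behind the chart are not given in this article, so they are not reproduced here.

```python
# Totals reported in the article.
TOTAL_MINUTES = 1316
TOTAL_CHARACTERS = 163380


def speed(characters, minutes):
    """Reading speed in characters per minute."""
    return characters / minutes


average_speed = speed(TOTAL_CHARACTERS, TOTAL_MINUTES)
total_hours = TOTAL_MINUTES / 60

print(f"Average speed: {average_speed:.1f} characters per minute")
print(f"Total time: {total_hours:.1f} hours")
```

The same function can be applied to each chapter if you time your own reading, which makes it easy to see whether your speed changes over the course of a book.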


Reading aloud in Chinese is hard, but apart from reinforcing the separate components, reading aloud itself is also a skill. By reading aloud, you improve your ability to pronounce words while looking ahead at the next few words, as well as your ability to anticipate tone sandhi problems.

Your ability to read aloud in a fluent manner depends largely on how much processing capacity you have left over to focus on something other than what you’re currently saying, and I believe that this split attention can and should be practised, at least if you want to be able to read aloud in Chinese.

However, that being said, I think most people have more fundamental problems than that. Based on personal experience, I think most students are unable to read aloud fluently simply because they either don’t know enough characters/words or can’t remember the pronunciation quickly enough. Still, if you’re reasonably good at reading silently, but are struggling with reading aloud, remember that reading aloud is in itself a skill you need to practise!

Reading aloud in Chinese is really hard

Reading unfamiliar text aloud in any language is a complicated process and is generally much harder than people who haven’t tried it realise. This is especially true for reading aloud in Chinese. Because of this, if someone can do it well, you can be quite sure that person is (really) good at the language. However, the opposite isn’t true, meaning that you can be bad at reading aloud while still being very proficient in almost all other areas, including reading (silently) and speaking.

Image credit:  Public Record Office Victoria

Reading aloud is a very complex process that requires a whole set of skills, rather than just one called “reading aloud” or whatever. Thus, it’s a good evaluation of all these skills combined, but it says almost nothing about the component parts.

What skills are involved in reading aloud in Chinese?

The cognitive processes involved in reading have been thoroughly researched, but this is a simplified summary. You need to be able to:

  • Map characters to meaning (character recognition)
  • Group characters into meaningful words (vocabulary)
  • Group words into meaningful sentences (grammar)
  • Understand the meaning of sentences in context (pragmatics)
  • Map characters to pronunciation (pronunciation recall)
  • Understand how the pronunciation of one syllable influences other syllables
  • Understand how meaning influences pronunciation (intonation and stress)
  • Understand the writer’s intent (reading between the lines)

Naturally, you don’t need to do all these steps all the time. For instance, experienced readers seldom read individual characters, but rather read words in their entirety (this is why it’s possible to read Chinese which is printed with a font size so small that individual strokes can’t be discerned). This is true for strokes of individual characters as well, just as in English, where you don’t read the individual letters of every word. Similarly, we tend to remember the pronunciation of words (if they are common) rather than the individual characters they consist of.

Why reading in Chinese is significantly harder than reading in, say, French

Reading aloud is tricky in any language, but now I’m going to explain why it’s significantly harder in Chinese than in most other languages (and when I say most, I refer to languages likely studied by readers of Hacking Chinese). The key difference is that Chinese is a different kind of language altogether from, say, French.

The most obvious reason is of course that there is no systematic mapping between characters and pronunciation. Sure, if you’re well-versed in semantic-phonetic characters (see relevant article here, part 1 and part 2), you might find some clues in the characters, but the fact remains that reading aloud in Chinese is much, much harder than in any language with a phonetic writing system. This should be fairly obvious to anyone who studies Chinese. In fact, it was so obvious that I forgot to write this paragraph in the first version of this article, so thanks to David Moser, who highlighted this shortcoming in the comments.

The less obvious reason why Chinese is hard to read aloud

French is a synthetic language, meaning that it has a high morpheme-per-word ratio, which in normal English means that a single word carries much information. For instance, verbs in French contain much more information than in English. Not only can you see when the action took place (tense), you can also see who did it, because the verb changes according to the subject of the sentence (person). This is true for some English verbs as well, such as “to be”. This means that there is a lot of redundancy in the system, because you don’t actually have to understand both the subject and the verb of a sentence. In English, if you know the verb is “am”, the subject has to be “I”, for instance.

Chinese is at the other end of the spectrum. Languages that have a low morpheme-per-word ratio are called isolating languages and Chinese is a very good example of this. In French, we could see who did it and when, in English only when (and sometimes who), but in Chinese we can’t even see if it’s a verb or not! Most of the time, the inflections of words that allow us to see that a word is a verb, noun or adjective simply aren’t there. Boundaries between word classes are not distinct. But how is meaning conveyed in such a language? Through context, mostly, and this is the first key to understanding why reading aloud in Chinese is so hard.

The information is there even if you can’t see it directly

The fact that we can’t know if 冰 is a noun, verb or adjective simply by looking at the character (compare ice, to ice and icy in English) doesn’t mean that it doesn’t matter which one it is. In a specific sentence, it’s usually only one of these, not all three. In order to make sense of a sentence, you still need to figure out what function it has in that context. You don’t need to know what linguists call it, but you need to be able to do it in practice. This is roughly how my linguistics teacher put it:

In synthetic languages (such as French), the burden rests mostly with the writer (or speaker). He needs to write clearly and use the right tense, number and gender and so on. In Chinese (an isolating language), the burden lies mainly on the reader (or listener), who needs to figure out all these things based on context. The information is there, it’s just not encoded on the word level.

To add insult to injury, we also need to figure out which characters belong to which words. This is very easy if you only encounter words you’ve seen hundreds of times, but it’s not easy when you approach the limits of your reading ability. This adds significantly to the difficulty of reading texts aloud, because not being able to find word boundaries is more or less guaranteed to make the reading very awkward and will most likely result in restarting the sentence once you’ve figured out where the words actually are.

Reading aloud in Chinese is really hard

This actually explains why reading in Chinese is hard in general, so it follows that reading aloud is even harder, because not only do you need to remember how all the characters are read, but you also need to sort out all of the above while you read. You need to do it quickly enough so that you can read and understand a sentence during the time it takes you to read the previous sentence, otherwise there’s simply no way that you can understand how the next sentence is supposed to be read. You might not need to finish the entire sentence before you start it, but you need a good enough grasp of Chinese to be able to make educated guesses on the fly.

In addition, just reading at a reasonable pace (125-250 characters per minute) is not easy, even if you don’t do it aloud! If this is your main problem, please check this article: Reading speed: Learning how to read ten lines at a glance. To put this into context, you can pass some quite advanced tests in Chinese without reading quicker than, say, 150 characters per minute, and most people who fail reading tests still fail because of a lack of speed.

Of course, you can do what most foreigners do and simply ignore anything above the character level and just pronounce the characters one by one. This is fairly easy, but you will have zero intonation and you will also get characters that have multiple readings wrong (为/為 being a prime example of this; you have to understand what you’re reading to get it right).

The next step would be to read word by word, which requires a much higher level of proficiency, but which is still doable for most people after studying Chinese for some time. However, it still sounds very unnatural and lacks intonation. Being able to read an unfamiliar text aloud and include information on the sentence level (intonation and stress) is really, really hard. I’ve studied Chinese full-time for five years and still can’t do it well. I doubt that I will be able to do it well five years from now either.

Why all this matters

So what’s the big deal? Why publish an article like this?

First, I think many learners of Chinese have noticed that reading in Chinese is hard without understanding why or perhaps thinking that they are a bit dense because they can’t read even simple stories aloud. Don’t worry, it’s not your fault, it’s normal. You just need more patience and more practice. Reading fluency is definitely possible, but the effort needed to get there shouldn’t be underestimated. Additionally, being able to read aloud is probably not part of your main motivation for learning Chinese, even if it’s sometimes used to evaluate your ability.

Second, I want to highlight the fact that reading aloud is a very complex task and that people shouldn’t use it as the sole method to evaluate one of the component skills. For instance, if you want to evaluate someone’s pronunciation, don’t ask them to read an unfamiliar text aloud. You have no way of knowing if their errors are due to lack of character knowledge, too slow parsing speed or actual pronunciation problems. If you want to test pronunciation, many of the above hurdles can be overcome simply by previewing the text, looking up words you don’t know and practising a few times. Then you can read the text aloud.

Many native speakers find it strange that students (such as myself) can write adequate reports and papers in Chinese, listen to lectures targeted at native speakers, read novels and newspapers without using a dictionary and engage in social conversations and academic discussions without much effort, but still think it’s challenging to read texts aloud. In fact, this is not strange at all, because reading aloud in Chinese really is very hard.

Follow-up article: Since writing this article, I have experimented with improving my ability to read aloud in Chinese. You can read about both the process and the results here: Learning to read aloud in Chinese.

Phonetic components, part 2: Hacking Chinese characters

Last week, we looked at how understanding phonetic components can help us learn to read and write Chinese characters. That’s usually something learners pick up more or less automatically, provided that the phonetic component is also a common character in itself. It’s kind of hard not to notice that most characters containing 青 are pronounced qing, albeit with different tones. This week, we’re going to look at some less obvious applications of phonetic components and how they can help us solve a truly tricky problem.

Some Chinese characters are confusingly similar

In the beginning, you can easily create mnemonics for each individual character and since you have so few visually similar characters, it’s not that hard to keep them separate. As the number of characters increases, though, you will soon run into a very tricky problem: series of characters that look almost the same and only differ in one or two strokes.

If you try to learn these simply by writing them a lot, you will probably fail, or at least waste a lot of time. Instead of doing that, there is a trick you can use to solve many of these problems. Often, the reason you keep confusing characters is that it’s hard to remember meaningless things (the absence of a dot, the addition of a stroke). It’s much easier to remember pronunciation and/or concrete objects.

Confusing characters can be easily hacked by paying attention to the phonetic component

Naturally, not all confusing characters can be solved this way, but I’m going to show you some that are very easy to deal with so that you can keep your eyes peeled for these in the future. In short, the characters are really easy to confuse, but you can deduce which one is which based only on the phonetic component.

Let me give you a basic example first (adapted from this article). 良 (liang) and 艮 (gen) – When you write characters with these two components, it’s extremely hard to remember if there should be a dot or not. Considering that I know at least 25 characters with these components, it can become very confusing indeed. That is, until you notice that all characters containing 良 (with the dot) end with -ang (including -iang), whereas characters with 艮 (without the dot) never do; most of them end with -en or -in. Like this:

With dot (view all here): 娘, 浪, 狼, 莨, 阆, 琅, 稂, 锒, 粮, 蜋, 酿, 踉
Without dot (view all here): 艰, 限, 垦, 很, 恨, 狠, 退, 垠, 哏, 恳, 根, 痕, 眼, 银, 裉, 跟

This means that you can know if there should be a dot or not simply by knowing the pronunciation of the character! You never need to worry about remembering this, you just need to know the pronunciation of the phonetic components. Conversely, you can sometimes guess the pronunciation of a new character if you know the phonetic component. Any character containing 良 (liang) is likely to be pronounced lang, liang or niang, and characters with 艮 (gen) tend to be pronounced hen or gen.
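To make the rule of thumb concrete, here is a tiny Python sketch of it. It assumes you already know that a character contains one of the two components and that you have its pinyin without tone marks. It only checks whether the syllable ends with -ang (which also covers -iang), based on the example lists above. The function name guess_component is made up for this illustration.

```python
def guess_component(pinyin):
    """Guess whether a character is written with 良 (dot) or 艮 (no dot).

    pinyin is the toneless reading of a character that is already known
    to contain one of the two components.
    """
    return "良" if pinyin.endswith("ang") else "艮"


# A few characters from the lists above with their toneless pinyin.
examples = {
    "娘": "niang",
    "狼": "lang",
    "粮": "liang",
    "很": "hen",
    "银": "yin",
    "跟": "gen",
}

for character, pinyin in examples.items():
    print(character, pinyin, "->", guess_component(pinyin))
```

The point is not that you need a program for this, but that the decision really is mechanical: once you know how the character is pronounced, the dot takes care of itself.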

More examples (please add your own in the comments)

To show you how powerful this is, here are a few more examples of characters that might be trolling you. Some of these are not relevant for simplified characters, but rather than caring too much about that, focus more on the principles. Even though simplified characters sometimes avoid the problem, more and trickier problems are created by merging character components. That’s beyond the scope of this article, though.

延 (yan) and 廷 (ting)

Characters based on 延 (yan) are always pronounced -an…

  • 诞 dàn
  • 蜒 yán
  • 涎 xián
  • 筵 yán
  • 埏 yán shān
  • 綖 yán
  • 蜑 dàn
  • 莚 yán
  • 駳 dàn
  • 鋋 yán
  • 硟 chàn

…and those with 廷 (ting) are pronounced ting:

  • 庭 tíng
  • 艇 tǐng
  • 挺 tǐng
  • 霆 tíng
  • 蜓 tíng
  • 铤 tǐng
  • 梃 tǐng
  • 閮 tíng
  • 莛 tíng
  • 綎 tīng
  • 鼮 tíng

易 (yi) and 昜 (yang)

Characters based on 易 (yi) are always pronounced -i…

  • 锡 xí
  • 赐 cì
  • 踢 tī
  • 惕 tì
  • 剔 tī
  • 蜴 yì
  • 裼 xí
  • 埸 yì
  • 逷 tì

…and those with 昜 (yang) end with -ang:

  • 諹 yáng
  • 逿 dàng táng
  • 輰 yáng
  • 颺 yáng
  • 鍚 yáng

令 (ling) and 今 (jin)

Characters based on 令 (ling) all start with l-:

  • 领 lǐng
  • 冷 lěng
  • 零 líng
  • 龄 líng
  • 怜 lián
  • 邻 lín
  • 玲 líng
  • 铃 líng
  • 岭 lǐng
  • 伶 líng
  • 拎 līng
  • 翎 líng
  • 聆 líng
  • 羚 líng

…and those with 今 (jin) don’t start with l-:

  • 念 niàn
  • 含 hán
  • 琴 qín
  • 贪 tān
  • 吟 yín
  • 岑 cén
  • 矜 jīn
  • 黔 qián
  • 芩 qín

I think this is enough to show you what I mean. If you have more examples of your own, please leave a comment! And if you want to check out more like this, I suggest you head over to the list of phonetic sets at HanziCraft. I also recommend using Zhongwen.com. Of course, not all sets are easy to confuse, but I hope that this article and the previous one will make you pay more attention to the phonetic components of Chinese characters.

Phonetic components, part 1: The key to 80% of all Chinese characters

When introducing characters for the first time, most teachers explain that there are six different kinds of ways that characters are composed in Chinese (六书/六書 in Chinese, read more here if you don’t know what I’m talking about). The first category brought up is usually pictographs, which are (or at least were) pictures of objects in the real world.

Sometimes, teachers spend a lot of time explaining how these work, showing how a picture of the sun turned into 日, how the moon turned into 月 and a tree into 木. Then, to show that Chinese characters aren’t that scary, some teachers demonstrate that characters can be combined to form new characters, so that if 木 means tree, 林 means forest and 森 means luxuriant growth.

Regardless of what the teacher does next, this is what sticks in students’ minds. There might be explanations of the other ways of character formation, but since they are less direct and require you to already understand a bit about characters before you fully understand what it’s all about, they are either glossed over or not remembered by the students.

This is serious, because while pictographs are pretty and easy to explain, they only make up around 5% of all characters. Phonetic-semantic compounds, on the other hand, make up almost 80% of all characters. Funny that most material online and in textbooks tends to focus on the former and not the latter. Indeed, most textbooks I’ve seen don’t do more than give a few lines defining what phonetic-semantic compounds are.

A typical phonetic-semantic compound is shown in the picture to the right. It consists of one semantic part that relates to the meaning of the character (white, water in this case) and one phonetic part that indicates the pronunciation of the character (red, sheep in this case).

A huge majority of characters belong to one category: phonetic-semantic compounds

After the introduction course, teachers will assume that the students already learnt about phonetic-semantic compounds the first week, so no-one will really make up for it later. This means that there are myriads of intermediate and even advanced learners who haven’t actually understood why phonetic components are crucial.

This article is the first of two about phonetic components of Chinese characters. Apart from introducing phonetic components, this article will show you how knowing about them can help you tremendously with your character learning and your ability to read out loud (and even guess the pronunciation of characters even if you’ve never seen them). This is something native speakers do all the time and most second language learners pick up sooner or later. I’d like to make it sooner rather than later for you.

The second article deals with something much less widely discussed: How you use phonetic components to hack Chinese characters. This is a variant of horizontal character learning, where you focus on a common phonetic component in order to distinguish between visually similar characters that would otherwise be very hard to learn and would keep on trolling you for years. This is avoidable if you understand phonetic components.

What does a phonetic-semantic character look like?

In order to understand these characters, it helps to be aware of how they were created. Spoken language of course predates written language, so when people in ancient China started to write, they already had a developed spoken language they wanted to express using characters. The most obvious pictographs probably weren’t that hard since they are just slightly stylised versions of real-world objects, but it should be obvious to everyone that you can’t have a picture of every single word you want to say. How do you draw an ocean? What about love? Yesterday? An hour?

Of course, these concepts already existed in the spoken language, so what people started doing was combining one character that represented meaning (the semantic component) and one that represented the sound in the spoken language (the phonetic component). Thus, such a character consists of two completely different parts that have no relationship to each other, but which still make up a new character.

To show what I mean, let’s look at an example. 洋 (ocean) – this character consists of water 氵 and sheep 羊. Now, it should be obvious that this is not simply a combination of two related characters to form a third related character (such as 木, 林 and 森). Instead, the semantic component 氵 tells us that the character is related to water and 羊 tells us that the character is pronounced the same way as sheep is, i.e. yáng.

The power of phonetic components

As mentioned above, this kind of construction makes up around 80% of all characters in Chinese. That’s a considerable majority and if you want to learn many characters, you need to understand how they work. Most importantly, knowing about phonetic-semantic compounds gives you clues about the pronunciation of characters. Thus, it’s not true that written and spoken Chinese are completely separate, because in most cases, there is a phonetic component to the character. Still, it might have mutated, sometimes beyond recognition, through the ages, but most are still clearly discernible.

Here are some examples of phonetic components and characters they appear in, along with their pronunciations in Mandarin, as well as their meanings (which are usually unrelated to the pronunciation, of course). I have only included common sample characters; there are many more.

Phonetic component: 羊, yáng (sheep)

  • 洋, yáng (ocean)
  • 樣, yàng (manner, appearance)
  • 養, yǎng (to support, to raise)
  • 氧, yǎng (oxygen)

Phonetic component: 青, qīng (green/blue)

  • 請, qǐng (please, to ask)
  • 清, qīng (clear)
  • 情, qíng (emotion)
  • 晴, qíng (clear, fine)

As you can see, sometimes the pronunciation isn’t identical. For instance, the characters might have different tones (氧/洋, yǎng/yáng), initial (湯/傷, tāng/shāng) or final (踉/浪, liàng/làng) or any combination of these, but these are still incredibly valuable clues. Some phonetic components are extremely regular. Have a look at these characters: 碟, 諜, 喋, 牒, 堞, 蝶, 蹀, 鰈. They are all pronounced dié!
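If you want to check how regular a phonetic component is, you can compare readings mechanically. The Python sketch below is just an illustration under a few assumptions: the readings are typed in as pinyin with tone marks, the function names strip_tone and compare are my own, and syllables containing ü are not handled (the dots would be stripped along with the tone mark). It uses the 羊 series from above as sample data.

```python
import unicodedata


def strip_tone(pinyin):
    """Remove tone marks from a pinyin syllable, e.g. 'yǎng' -> 'yang'.

    Note: this also strips the dots from ü, which is not handled here.
    """
    decomposed = unicodedata.normalize("NFD", pinyin)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def compare(component, character):
    """Describe how a character's reading relates to that of its phonetic component."""
    component = unicodedata.normalize("NFC", component)
    character = unicodedata.normalize("NFC", character)
    if component == character:
        return "identical"
    if strip_tone(component) == strip_tone(character):
        return "same syllable, different tone"
    return "different syllable"


# The 羊 (yáng) series from the examples above.
series = {"洋": "yáng", "樣": "yàng", "養": "yǎng", "氧": "yǎng"}

for character, reading in series.items():
    print(character, reading, "->", compare("yáng", reading))
```

A "different syllable" result doesn’t mean the phonetic component is useless; as with 湯/傷, often only the initial or the final differs, and that is still a valuable clue.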

Towards a better understanding of Chinese characters

This is just the beginning. When you understand what phonetic components are, you will see them all over the place. Chinese characters look very confusing at first, but phonetic components make up the most important piece in the puzzle. Read the second article about how to hack Chinese characters with our newly-acquired knowledge of phonetic components!

Update: Regarding phonetic components, I just thought of another pair: 唐 and 庸. This is actually a perfect case, since all common characters with 唐 are pronounced exactly like 唐 (táng): 糖塘搪瑭醣溏螗磄, and all common characters with 庸 are pronounced like 庸 (yōng): 慵镛墉鳙鄘. So, in essence, you just need to create one mnemonic for 唐 and one for 庸 and you’ll never confuse these characters again! I don’t know if this particular pair has caused you any problems, but since this seems to be a perfect case, I thought I’d share it with you.