The building blocks of Chinese, part 5: Making sense of Chinese words

How many Chinese characters do you need to know to be able to read?

How many characters are there in total?

These are common questions that are often based on a misunderstanding of how the Chinese writing system works. While it’s possible to count how many unique characters a text in Chinese contains, let’s say 3,000 in an ordinary novel, knowing all those characters doesn’t mean that you can understand the novel.

Even if you learnt all the characters that are in common use, maybe 6,000 or so depending on how “common use” is defined, this still only provides a foundation for reading ability, not reading ability itself.

This means that the previous articles in this series are important, but not enough. You do need to know characters, but you also need to know how they fit together into words. If you haven’t read the earlier articles about the building blocks of Chinese yet, they are worth going back to.

Of course, understanding written Chinese involves more than words as well, most importantly grammar and the ability to interpret words correctly in context, but in this article and the next, we’re going to focus on word formation.

Chinese has a strong preference for two-character words

The reason characters only provide a foundation is that meaning in modern Chinese is conveyed through words, not individual, stand-alone characters. This includes single-character words, two-character words and even longer ones. The difference between a character and a single-character word might not be obvious, but we’ll get to that later.

In general, everyday language is more likely to contain single-syllable words. If you think about it, the most common verbs work well on their own, for example:

  • 看 (kàn) “to see”
  • 吃 (chī) “to eat”
  • 喝 (hē) “to drink”
  • 给 (gěi) “to give”
  • 走 (zǒu) “to walk”

These are considered words as they can be used independently. However, if you browse through the word lists in your textbook or a word dictionary, you’ll see that most words are made up of two characters, not one.

Since native speakers know many tens of thousands of words, and there are only 6,000 characters in use, it should be obvious that most words can’t consist of only a single character.
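To make the arithmetic concrete, here is a minimal Python sketch. The 6,000 characters come from the text above, while the vocabulary size of 40,000 is only an assumption standing in for “many tens of thousands”; pick another number and the conclusion stays the same.

```python
characters_in_use = 6_000   # figure quoted in the text
vocabulary_size = 40_000    # assumption: one possible value for "many tens of thousands"

# Even if every single character were also a word on its own,
# all remaining words would have to be longer than one character.
min_multi_character_words = vocabulary_size - characters_in_use
min_share = min_multi_character_words / vocabulary_size

print(min_multi_character_words)  # 34000
print(f"{min_share:.0%}")         # 85%
```

This is a lower bound only: in reality, far from all characters can be used as independent words, so the share of longer words is higher still.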

In fact, Chinese has a very strong preference for two-character words, or two-syllable words in the spoken language. The key to understanding these words is to realise that they consist of meaningful parts, just like we have seen that characters consist of meaningful and functional components in earlier articles in this series.

In some cases, you can deduce what a word means just by knowing the characters, but not always:

  • Easy: 足球 (zúqiú) = foot + ball = football (or soccer, if you prefer)
  • Medium: 跳舞 (tiàowǔ) = jump + dance = to dance
  • Hard: 东西 (dōngxi) = east + west = thing; stuff (if pronounced with two first tones, it could also mean east-west literally, but this is much rarer than the word we’re interested in here)

Whether or not the composition makes intuitive sense by looking at the individual characters, it’s extremely important to realise that compound words can (almost) always be broken down into individual characters that are meaningful and will either help you understand the word, make it easier to remember, or both.

What is a word in Chinese anyway?

If you don’t know what the characters in a word mean, you’re essentially forced to memorise meaningless combinations of complex jumbles of strokes, and as anyone with the slightest insight into learning and memory knows, memorising meaningless things is very hard.

Let’s look at the words from the previous section again:

  • 足球 (zúqiú) = foot + ball = football (or soccer, if you prefer)
  • 跳舞 (tiàowǔ) = jump + dance = to dance
  • 东西 (dōngxi) = east + west = thing; stuff

You can see that they are quite different in structure. The first word, 足球 (zúqiú), is just a compound of what looks like two nouns, which also happen to be the same in English. If you know or learn what the components mean, the meaning of the word is very easy to remember.

In the second case, 跳舞 (tiàowǔ), the first character is a verb and the second is a noun, so it looks like a verb-object phrase, where in English, we only use a single verb to express this. To “jump-dance” is not very intuitive the first time you see it, but I think you’d agree that it could make sense from a certain angle.

The third case, 东西 (dōngxi), makes no sense at all on a superficial level, as “east” and “west” have no apparent relationship to the meaning “thing”. Still, since these are basic characters that have their own meanings, remembering the word 东西 (dōngxi) is much easier if you know the individual characters. This is unlike English, where breaking down “thing” into “thi” and “ng” doesn’t make any sense.

What is a word anyway? Is 你好 a word? What about 吃饭?

At this point, you might ask yourself what a word actually is in Chinese. In English, this is less of a problem, because in writing, we use spaces around words and we also use lots of inflections and other modifications that help us identify word boundaries in the spoken language.

Chinese doesn’t use spacing around words and it doesn’t use inflections in the way English does either. Traditionally, Chinese dictionaries listed only single characters, which is why they are called 字典 (zìdiǎn), “character dictionaries”. Word dictionaries, 词典 (cídiǎn), only appeared in the twentieth century. That means that the concept of a “word” (wordhood) is rather hazy and hard to pin down.

Let’s have a quick look at what a “word” is in Chinese! First, do you think that 你好 (nǐhǎo) is a word? If you translate it to “hello” in English, it certainly looks like a word! What about 您好 (nínhǎo), then, is that a word too? What about 你们好 (nǐmenhǎo) or 大家好 (dàjiāhǎo)? Or 老师好 (lǎoshīhǎo)? Or maybe 早上好 (zǎoshanghǎo)? Are these words or phrases?

In fact, all of these are usually treated as phrases in Chinese, not words.

Let’s look at a few more examples to see what is necessary for something to qualify as a word:

  • 大衣 (dàyī) “big + clothes”
  • 大家 (dàjiā) “big + home”
  • 大树 (dàshù) “big + tree”
  • 大桥 (dàqiáo) “big + bridge”

Which of these are words? The first two should be treated as words, because they have a meaning beyond what you get when combining the characters: 大衣 doesn’t just mean “big clothes”, it means “overcoat” specifically, and 大家 normally doesn’t mean “big home”, but “everybody”.

The other two are not words, because they are pure compositions, meaning “big tree” and “big bridge” respectively. In such cases, you can try to insert a 的 between the adjective and the noun part of the word: if it still means the same thing, it’s a phrase, but if it doesn’t, it’s a word. So 大的衣 doesn’t work, but 大的树 works. Hence, 大衣 is a word, but 大树 is not.

As you can see, this is rather complicated, and I have only scratched the surface here. If you’re really interested in learning more, I suggest you check out chapter 5 in Duanmu San’s The Phonology of Standard Chinese (2007), which has an in-depth look at wordhood in Chinese and presents various theories and hypotheses. Packard (2000) also has an accessible introduction in chapter 2.

Review: The Phonology of Standard Chinese

Can’t I just leave the problem of what a word is to linguists?

Yes, you can, because there is a shortcut! Anything that appears in a word dictionary is a word, and anything that doesn’t is probably a phrase. If you look up all the above examples in 现代汉语词典 (Xiàndài hànyǔ cídiǎn), the most authoritative word dictionary in China, you will see that it exactly follows what I have said when it comes to these examples (I just checked).
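If you like to think in code, the shortcut can be sketched as a simple lookup. Note that `TOY_WORD_DICTIONARY` and `classify` are names made up for this illustration, and the set only contains the examples treated as words earlier in this article; it is a toy stand-in, not the actual contents of 现代汉语词典.

```python
# Toy stand-in for a word dictionary (hypothetical data, not the real 现代汉语词典).
# It only holds the examples that this article treats as words.
TOY_WORD_DICTIONARY = {"大衣", "大家"}


def classify(expression: str) -> str:
    """Apply the shortcut: a dictionary entry is a word, anything else is probably a phrase."""
    if expression in TOY_WORD_DICTIONARY:
        return "word"
    return "probably a phrase"


for expression in ["你好", "大衣", "大家", "大树", "大桥"]:
    print(expression, "->", classify(expression))
```

With this toy data, 大衣 and 大家 come out as words, while 你好, 大树 and 大桥 come out as probable phrases, matching the discussion above.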

But it’s not that easy. What about words like 吃饭? Isn’t that a phrase? It certainly looks like a verb-object phrase: “eat rice”!

Still, 吃饭 is a word, and it is listed in the dictionary. Not all words are indivisible units, because we also have a category of words called 离合词 (líhécí), “separable words” (or “verbs” as these tend to be).

In dictionaries, these are often marked with // between the syllables. Below, you can see the entry for 吃饭 in 现代汉语词典. If you know of a good online dictionary that provides this kind of information, please let me know in the comments!

These are considered words, but can still be manipulated as phrases. This means that you can’t just treat all the words you learn as their English counterparts and be done with it, or at least you can’t do that for long. Knowing the component characters is essential!

Here are a few examples:

  • 吃饭 (chīfàn) “to eat”
  • 跳舞 (tiàowǔ) “to dance”
  • 睡觉 (shuìjiào) “to sleep”

Most beginners think of these as being words just like any other, which is okay if you only just learnt them. But then you’ll get confused when people say the phrases listed below, or when you get corrected when you treat these words as inseparable units:

  • 吃过饭 (吃饭过 is wrong)
  • 睡不了觉 (睡觉不了 is wrong)
  • 跳起舞来 (跳舞起来 is wrong)

So, it’s clear that you have to learn the meaning of the individual characters that make up these words in order to be able to use them properly, and you also need to understand their function. A bit like how learning components helps you understand characters, right?

吃饭 doesn’t just mean “to eat”, it actually means “to eat-rice”, 睡觉 doesn’t mean “to sleep”, but “to sleep-sleep”, and 跳舞 doesn’t mean “to dance”, it means “to jump-dance”. If these were actual words in English, you could manipulate them like you do in Chinese, and you’d maybe say that you “ate-rice”, not “eat-riced”. Even if both examples are strange, it should be clear that it’s the verb that needs to be inflected, not the noun. It’s the same in Chinese!

So no, you can’t treat words in Chinese like words in English, which are (almost) never split like this and don’t behave as phrases in some contexts. The fact that Chinese is almost always constructed out of smaller building blocks matters for how the language is used!

Of course, I don’t mean to say that all words are like this. In contrast with the words mentioned above, verbs like 学习 (xuéxí) “to study”, 运动 (yùndòng) “to exercise” and 休息 (xiūxi) “to rest” shouldn’t be split. These don’t have a verb-object structure (and aren’t listed as separable verbs either), and are treated as units, so it’s 学习过, 运动起来 and 休息了. As a beginner, you can’t know that just by looking at the characters, but we’ll look more at the internal structure of compounds in the next article.
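The difference between the two kinds of verbs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SEPARABLE` table and the `add_guo` function are hypothetical names, the table is filled in by hand from the examples in this article, and a simple yes/no flag does not capture how irregular real usage can be.

```python
# Hypothetical mini-lexicon based on the examples in this article.
# True = separable verb-object word (离合词); False = treated as an indivisible unit.
SEPARABLE = {
    "吃饭": True,
    "跳舞": True,
    "睡觉": True,
    "学习": False,
    "运动": False,
    "休息": False,
}


def add_guo(word: str) -> str:
    """Attach 过 to a two-character verb from the mini-lexicon."""
    if word not in SEPARABLE:
        raise ValueError(f"not in the mini-lexicon: {word}")
    if SEPARABLE[word]:
        # Separable: 过 goes after the verb part, before the object part.
        return word[0] + "过" + word[1]
    # Not separable: 过 goes after the whole word.
    return word + "过"


print(add_guo("吃饭"))  # 吃过饭
print(add_guo("学习"))  # 学习过
```

The point of the sketch is that the position of 过 depends on information that has to be stored for each word; it can’t be computed from the characters alone.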

Tricky cases and the Cthulhu bubble

As is often the case with languages in general, how words behave can be irregular and unpredictable. For example, 关心 (guānxīn) is also a verb-object compound (close + heart) meaning “concern”, but you can’t say 关了心, and have to say 关心了 (guānxīn le) instead. But, as if to mock us second language learners, you can split it to say 关什么心 (guān shénme xīn), which strongly hints at the verb-object structure of this word. In 现代汉语词典, 关心 is not listed as a separable verb. As we have seen before, you can’t know this by looking at the word.

A word of warning, though: don’t get stuck on exceptions and tricky cases, for in that direction madness lies. It will take a lot of time and will offer you little in return, even if you do occasionally find a good answer as to why something is said in a certain way and not another. Staying inside the bubble of safety where language is fairly regular is fine; don’t poke the monsters that dwell outside. I wrote more about this here:

The Cthulhu bubble and studying Chinese

Word length in Chinese is flexible

It’s worth pointing out that word length is often flexible in Chinese, and that Chinese can even be said to have a “dual vocabulary”, meaning that many things can be expressed with two different words that mean the same thing, one with a single syllable and the other with two. Here are a few examples:

  • 商店 (shāngdiàn) “shop” means the same thing as just 店
  • 大蒜 (dàsuàn) “garlic” means the same thing as just 蒜
  • 种植 (zhòngzhí) “to plant” means the same thing as just 种

It is a major headache for second language learners to figure out which one to use, as this is often dictated by things like rhythm, which can be hard to get an intuitive grasp of. For an overview, see chapter 7 in Duanmu (2007).

This is not something you can learn by studying theory, though, even if doing so can be interesting. Instead, the solution is extensive reading and listening; by exposing yourself enough to the way people write and speak, you’ll automatically learn how words are used. Understanding how things work can help, which is why I wrote this article.

Knowing where a word comes from can make it easier to understand

Words can be very different depending on how they entered the mainstream language. Just like words in any language, Chinese words come from a number of different sources. Let’s briefly look at where words come from (examples are from 邵敬敏 (2007)):

  1. 传承词, “inherited words” – These are simply words that have been in the Chinese language for a very long time. Most basic parts of the language belong to this category, including 人 (rén) “person”, 山 (shān) “mountain”, 天空 (tiānkōng) “sky” and 土地 (tǔdì) “land”.
  2. 古语词, “ancient words” – These are words related to a specific historical period and aren’t really used in modern Chinese unless referring to something of that period. Thus, these words are not beginner-friendly, but here are a couple of examples: 陛下 (bìxià) “your majesty”, 东宫 (dōnggōng) “eastern palace (crown prince residence)”.
  3. 方言词, “topolect words” – These words come from other topolects (or dialects, if you prefer) of Chinese. This relationship can be very complex, but it’s enough for now to be aware of the fact that many words enter Mandarin from other dialects. For example, 打工 (dǎgōng) “to work for others; to work part-time” and 炒鱿鱼 (chǎo yóuyú) “to get fired” (literally “to fry squid”) both come from Cantonese.
  4. 社区词, “community words” – These are words that have their origin in specific communities. For example, 北漂 (běipiāo) refers to people not from Beijing who have moved there for work, and is today not only used in Beijing, but has spread to the language in general. Here in Sweden, Chinese people call Stockholm 斯京 (sījīng), which is not an official word, as it’s still limited to this specific community. The official name is 斯德哥尔摩 (sīdégē’ěrmó). 斯京 could become a “real” word if more people used it!
  5. 专业词, “professional words” – These words were originally limited to a professional context, but then spread to the general language. An example is 套牢 (tàoláo), which originally meant to be stuck with a bad stock investment, but is now used in the mainstream language to mean “trapped” or “stuck”, such as in a marriage (although I would never use this word to describe my own marriage, of course).
  6. 外来词, “foreign words” – Chinese naturally has borrowed words from other languages, too. This is sometimes done by mimicking the sound, e.g. 咖啡 (kāfēi) “coffee”; with a mix of sound and meaning, e.g. 新西兰 (xīnxīlán) “New Zealand”; with sound plus a Chinese affix, e.g. 啤酒 (píjiǔ) “beer”; with both sound and meaning at the same time, e.g. 基因 (jīyīn) “gene”; or by using foreign words exactly as they are, e.g. “OK”, “DVD” and “GDP”.
  7. 新造词语, “newly created words” – New concepts, ideas and technologies have led to the creation of completely new words, for example in economics: 国企 (guóqǐ) “state-owned enterprise”, in technology: 电脑 (diànnǎo) “computer”, or in daily life: 方便面 (fāngbiànmiàn) “instant noodles”.

The point here is not that you need to know all these categories, but that you should be aware of the fact that sometimes a word won’t make sense unless you know where it’s from. This is particularly true for foreign words, because if you look at a word like 啤酒 (píjiǔ) without taking English “beer” or maybe German “Bier” into consideration, it won’t make much sense.

This can be particularly tricky with names. For example, Iceland is 冰岛 (bīngdǎo), using meaning only, but Greenland is 格陵兰 (gélínglán), using sound only. While English is of course an important source of both names and words in general, you sometimes need to know other languages to fully make sense of a Chinese name.

For example, the television network Al Jazeera is called 半岛电视台 (bàndǎo diànshìtái), “peninsula television station”, in Chinese. This reflects the name in Arabic, which means “the island” and refers to the Arabian Peninsula, hence 半岛 “peninsula”. There are also many names from Japanese that are written with the same characters as in Japanese even though they are pronounced completely differently; 东京 (dōngjīng) is Tokyo, for example. We talked more about transliterations in this article:

Lost in transcription: Saylaw, Ice Island and Aristotle

There are of course many, many loan words from Japanese that aren’t names. Since these are pronounced in Chinese and use Chinese characters, they often don’t “feel” like loan words, and many people are indeed not aware that they are. Examples include 杂志 (zázhì) “magazine”, 电话 (diànhuà) “telephone” and 革命 (gémìng) “revolution”. For an in-depth discussion, see this article (in Chinese) or in English, Zhao (2006).

Conclusion: Understanding characters is necessary to understand and remember words

What I hope this discussion has made clear is that you can’t just treat words in Chinese like words in English. You don’t need to understand the intricacies of the debate over wordhood in Chinese, of course, but you do need to know that it certainly doesn’t work the way it does in English.

The most important takeaway is that in order to make sense of Chinese words, you need to know what the individual characters mean. In some cases, this is not strictly necessary, although it can still be helpful for memorisation. In other cases, it’s crucial.

Some things you think of as indivisible units, such as 跳舞 (tiàowǔ) “to dance”, actually aren’t. If you understand the individual characters and their relationship (in this case a verb-object compound: to jump-dance), you stand a much better chance of understanding and also remembering the word and how it’s used!

In the next article, we’ll continue looking at compound words in Chinese, focusing on different types of compounds.

Have you ever been confused about why “tiger” is 老虎 (lǎohǔ), even though not all tigers are old?

Or wondered why 子 (zi) appears at the end of so many words, without seeming to add any meaning to the word?

Or mixed up the order in a compound word and accidentally said 蜜蜂 (mìfēng) “bee” instead of 蜂蜜 (fēngmì) “honey”?

Do you want to know how you can learn Chinese words quickly and effectively?

Then the next article is for you:

The building blocks of Chinese, part 6: Learning and remembering compound words

References and further reading

Duanmu, S. (2007). The phonology of standard Chinese. Oxford University Press.

Packard, J. L. (2000). The morphology of Chinese: A linguistic and cognitive approach. Cambridge University Press.

Sun, C. (2006). Chinese: A linguistic introduction. Cambridge University Press.

Zhao, J. (2006). Japanese loanwords in modern Chinese. Journal of Chinese Linguistics, 34(2), 306-327.

邵敬敏 (Ed.). (2007). 现代汉语通论. 上海教育出版社.

Editor’s note: This article, originally published in 2010, was rewritten from scratch and massively updated in December, 2021.

