
I’ve heard many stories about people in East Asia who try to learn English by memorising dictionaries. Even if it’s true that some people actually do that, I think this somewhat puzzling technique isn’t common in the West. Hearing such stories, it’s easy to shake one’s head and wonder how someone could be so stupid as to think that memorising dictionaries is the same as learning a language.

It might then come as a surprise that a couple of years ago, I spent roughly one hundred hours spread out over six weeks learning all the characters in the Far East 3000 Chinese Characters Dictionary. Of course, I knew a lot of them from before, but I learnt a considerable number of new words as well. This article is not about that particular project or about this dictionary. Any dictionary (or website) based on frequently used characters and/or words will be fine. If you don’t have a book already, I suggest using this online list.

I’ll try to explain why I think that going through such lists is an excellent idea if you do it right and at the right time, and I will also share some thoughts on how to do this without running into some of the problems I did. I never expected this, but the day has come when I actually recommend that other people memorise a dictionary!

Please note that you should only do something like this if you already know the majority of words in the list you want to study. If you use the online list I mentioned above, you can choose your own number. If you’ve only studied for a year, choose the 1000 most commonly used characters. If you’ve studied for years, choose all 3000. It’s up to you, but I would rather aim slightly too low than too high.

Learning a dictionary isn’t necessarily stupid

First things first, why would memorising a dictionary be a good idea? I’ve argued before that Chinese is a language consisting of many building blocks (see my articles about building a toolkit) and that rather than learning a character as a whole, it’s fruitful to learn its composition instead. The same goes for words in Chinese (words consisting of more than one character). By making sure that you know the 3000 most common characters, you gain access to a huge number of new words. By access I mean:

  • You can guess the meaning of a compound word because you know the characters in it
  • You can learn new words more easily, because you know the component characters

I have argued elsewhere that vocabulary is not only king, but god emperor as well. If you don’t feel convinced that vocabulary is extremely important, you should check my article about the importance of knowing many words.

Let’s look closer at the above-mentioned benefits. The first one might be either useless or invaluable depending on the word. Chinese contains lots of synonym compounds, i.e. words that consist of two characters which mean the same thing, such as 快捷 or 馈赠 (饋贈). If you know both characters, you can be pretty sure about what the word means, whereas if you only know one, the meaning might be anything. This is an example where your toolkit allows you to learn words for free, so to speak.

Moreover, there are numerous cases where there is more than one similar way of saying something. For instance, compare 时限 (時限), 期限 and 年限, which are really easy to distinguish if you know what the individual characters mean, but might cause trouble if you don’t. There are of course more examples, but I think this is enough to illustrate the point.

Now, let’s look at a graph I think some of you have seen before:

The picture is from Patrick Zein’s excellent introduction to Chinese (in Swedish, sorry). The X-axis shows the number of characters one knows and the Y-axis shows the expected ability to understand written Chinese, assuming that grammar and character combinations are not a problem (which they of course are, but that’s not the issue here).

What does this graph tell us? Basically, it shows that if we know 3000 characters, we will very rarely come upon characters we don’t know when we read normal Chinese text, provided that we know the correct 3000 characters. If you’ve spent lots of time learning characters that aren’t among the 3000 most common, the graph doesn’t apply to you in the same way.
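For those who want to see where a curve like this comes from, here is a minimal sketch in Python. It assumes you have one occurrence count per character from some corpus; the function name and the toy numbers are my own and purely illustrative, not real frequency data and not the data behind the graph.

```python
from itertools import accumulate

def coverage_curve(counts):
    """Return the share of running text covered by the top-n characters.

    counts: occurrence counts, one per character, in any order.
    The result has one entry per n = 1, 2, ..., len(counts).
    """
    ranked = sorted(counts, reverse=True)  # most frequent character first
    total = sum(ranked)
    return [running / total for running in accumulate(ranked)]

# Hypothetical toy counts for six characters (made up for illustration).
toy_counts = [50, 20, 15, 10, 4, 1]
curve = coverage_curve(toy_counts)

for n, share in enumerate(curve, start=1):
    print(f"top {n} characters cover {share:.0%} of the text")
```

The curve rises steeply at first and then flattens out, which is why the most common characters matter so much and why each additional rare character adds very little coverage.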

Using frequency lists plugs holes and makes your foundation more solid

Going through lists of words based on frequency allows you to learn characters you should know (because they are common) but have missed because your textbook or teacher hasn’t presented them yet. This means that you broaden your base, including more words that lie outside your textbook and your course. This provides you with a more solid foundation which you can later use to learn more words and understand spoken and written Chinese with more ease.

Suggestions and tips

Still, after having said all this, I’d like to say that memorising dictionaries is quite stupid. Of course, you shouldn’t just try to commit everything to memory by rote learning; you should use all the clever hacks I talk about in other articles. You use dictionaries to find commonly used words and to gain information about these words. However, this is not enough. Here is some more advice for you:

  • Be careful! Sometimes you just think you know what a character means because it’s so common, but in fact it means something completely different when it’s on its own. Check all characters carefully once. This will either allow you to find flaws in your knowledge or, if no such flaws are found, increase your confidence.
  • Learn at least one example word in which a given character appears. Also make a note of this word in connection with the single character, so that when you review it, you can easily see at least one example. Learning words in complete isolation is bad for more than one reason.
  • Don’t feel forced to use the example words in the book. Some dictionaries provide examples so rare that some native speakers have never heard of them. Dictionaries tend to focus on accuracy rather than on how common a word is, which isn’t necessarily what a learner needs. I suggest using an online corpus of examples, such as the one over at nciku.com.
  • Don’t learn the words in alphabetical order, starting from page one and going through the book, because it will be extremely hard to distinguish between one hundred different “shi”. A better way would be to first learn the first character on every page, then the next time learn the second character on every page.
  • Spread it out! Even if you’ve studied for a while, 3000 characters will take a while to go through (100 hours in my case). I managed this by portioning it out, going through a dozen characters at a time whenever I had some time to spare.
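The last two tips can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the pages, the function names and the session size are hypothetical, and the pages simply stand for an alphabetically sorted dictionary where similar-sounding characters sit next to each other.

```python
from itertools import zip_longest

def interleave_pages(pages):
    """Take the first entry of every page, then the second of every page, and so on."""
    order = []
    for row in zip_longest(*pages):
        # Shorter pages run out early; skip the None padding zip_longest adds.
        order.extend(ch for ch in row if ch is not None)
    return order

def batches(items, size=12):
    """Split the study order into small sessions (a dozen at a time by default)."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Hypothetical pages from an alphabetically sorted dictionary.
pages = [
    ["是", "时", "事"],  # a page of "shi"
    ["他", "她"],        # a page of "ta"
    ["我", "握", "卧"],  # a page of "wo"
]

order = interleave_pages(pages)
sessions = batches(order, size=4)
print(order)
print(sessions)
```

Studying in this order means that each session mixes characters with different readings, so you never face a long run of near-identical “shi” in one sitting.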

Some final words

In conclusion, memorising dictionaries is not a very good idea in general, but I think there is some merit in studying frequency lists, thus making sure you know the characters and/or words you really should know. When I did this, I felt that the 3000 characters resulted in a quantum leap in reading comprehension. This will not take care of reading speed, complex grammar or other problems associated with reading ability, but it will enable you to understand many texts you would otherwise have been completely unable to decipher. More importantly, it will make it a lot easier for you to learn more later, given that you now have more building blocks and tools to understand and analyse the language you are learning!

14 Responses to Memorising dictionaries to boost reading ability

  1. gweipo says:

    Actually your post makes absolute sense. When you’re a beginner it is useful to use something like Tuttle’s Learning Chinese characters, which has the first 800 in frequency order with easy memory tricks. I used to cross reference their number to my text book, and then at the end of the first year I just went and learnt the rest that hadn’t been covered.

    I’m now paging through Chinese Character Fast finder (also Tuttle / Matthews) which is shape / radical based, with 3200 characters, and more useful, like you mention, than something that is alphabetical – and yes I do kind of sit down and learn a page from time to time.

  2. [...] a high number of individual characters (parts of words would be applicable for other languages). I went through the 3000 most common Chinese characters before starting. This turned out to be incredibly useful since I usually only need to combine [...]

  3. Paul says:

    Linked (https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AqJaLYri_ZnYdF9YWG0zR3F5UDN6Nll5Q2d5MUM4UWc&single=true&gid=1&output=html) is a visualisation of character frequency, making the same point as the chart on this page.

    (I can’t work out how to make the chart bigger on google docs, and for some reason it doesn’t like Explorer, but you can drill down a bit, and also hover your mouse if you want to see which character is which.)

  4. Olle Linge says:

    @Paul: Thanks for sharing! What I like most is that this gives me a feeling for what character frequency means. Just viewing the words in a list is one thing and you can see where a certain character is, but this is a lot clearer. Is the size of each cell representative of the character’s frequency? If so, where did you get the frequency data?

  5. Paul says:

    Yes, the size of the cell is proportional to the usage – frequency data is from the Modern Chinese Character Frequency List compiled here: http://lingua.mtsu.edu/chinese-computing/statistics/

    What I would really like to do is to match this up with radicals as well – if anyone knows where to find a text version of characters sorted by radical (i.e. the front index section of a dictionary), I can do this. That way, it can tell you not only which characters you might focus on, but the radicals that are most important to know as well.

  6. Olle Linge says:

    @Paul: This is really cool and it would be even more awesome for radicals. However, is the front page of a dictionary enough? Don’t you want all the characters in the dictionary listed by radical? There should of course be such lists, but I haven’t found anything yet. I’m going to Berlin later tonight and won’t be able to look more until next week when I get back. How about asking for this on http://www.chinese-forums.com? Someone should be able to help you!

  7. Paul says:

    By the front page, what I actually mean is what my paper dictionary calls the 检字表 – the second stage of the character lookup when all the characters are listed by radical then stroke order before it tells you which page to look on. I’ll see how I go hunting..

  8. Olle Linge says:

    Yeah, I know what you mean. I’ll do my best to help, but I probably won’t have time until next week. Let me know if you find anything!

  9. [...] going through frequency lists thoroughly and make sure you know all the individual characters (doing so boosted my reading ability a lot). Still, at any levels, learning words that occur naturally in your environment is a good way [...]

  10. Trystan says:

    Olle – how did you go about entering / uploading the dictionary to Anki?

    The reason I ask is that I would like to do something very similar for a different language. There are no online versions of the dictionary so I can only think of entering the vocabulary manually, which is clearly going to be time consuming.

    Your advice would be massively appreciated!

    • Olle Linge says:

      Hi Trystan,

      The boring but true answer is that I just typed in all the characters, definitions and example sentences. Did it take a lot of time? Yes, of course. Do I think that time was wasted? No, definitely not. Retyping example sentences and making sure they are correct is a way of studying, and selecting the right character from a list to match the character in a book is a way of reviewing. And so on. I don’t think creating one’s own word lists by manually inputting words is a bad idea.

  11. Tyson says:

    I downloaded the SUBTLEX chinese frequency list of words found in subtitles of TV shows, matched it up with HSK lists and have been going downwards through those.

    Even in the first 300 i found some gaps which have really helped me… read subtitles! For example 家伙 is like “guy” and used a lot casually and in action movies, but is considered HSK6. But in subtitles it’s as common as 穿 or 写.

    What I do is go through them, and mark how well i know the word on a scale of 1-5. 1 is a blank. 2 is seen before but pretty unfamiliar. 3 is I know the word but not well. 4 is in my SRS but not perfect yet. 5 is I can write this character from memory reliably.

    Words that are 1-3 go into my SRS via an example sentence. I also have a report of all the words rated 1-4 and occasionally check through them to see if they can be promoted. SRS does this too, so it’s almost repetition, but SRS doesn’t know which ones are highest frequency words.

    And for fun, I also calculate the total % so I have a rough idea of the % of words I understand. Right now it’s 70% at 5 although actually there’s quite a few words I haven’t rated yet so perhaps more like 75%-80% are 3-5 when I mark further down.

    • Olle Linge says:

      Sounds interesting! Can you offer some more information about that list? Also, I’m a bit surprised that 傢伙 is HSK6, it’s really quite common!

      • Tyson says:

        You can download the data here: http://expsy.ugent.be/subtlex-ch/

        There is a paper (easy to find online) that explains the data better, after reading the abstract I was happy enough about their segmentation strategy – I’m no linguistics expert, but I was convinced enough to use it.

        So I just load up a big old Excel file and I search away. I’ve sorted it all on frequency and just work my way down the list.

        It’s also fun to look for the most common 3 character and 4 character words (不好意思 is the most common).
