
Image credit: Max Braun

Learning a new language sometimes feels like parachuting from high altitude and landing in a foreign and sometimes also hostile land. Slowly, you establish a base camp; you start exploring your surroundings. You start building up confidence, you start knowing your way around. As you reach the intermediate level, you know the area close to your camp pretty well and you start exploring the high mountains and deep valleys beyond, gaining ever more experience. However, the larger the territory you explore becomes, the longer becomes the perimeter. As the area increases, it also becomes more and more difficult to make sure that you haven’t overlooked something important.

Just because you can read an advanced text doesn’t mean that you know everything on the intermediate level

It’s easy to fool yourself into believing that being able to read a text at a certain level means you have mastered everything below it, which is never the case, not even for native speakers. Your map might reach far and wide, but there will always be hidden caves and valleys that you haven’t visited, sometimes really close to home. In other words, dropping the metaphors for a while, you might know how to say “claustrophobia”, “recession” and “promulgate” in Chinese, but if you haven’t been exposed to enormous amounts of Chinese, it’s very likely that there are fairly easy words that you don’t know. The problem is that you don’t know that you don’t know them.

Finding and mapping unknown terrain

There are basically two ways to improve your map:

  • Expose yourself to huge volumes of Chinese
  • Use frequency dictionaries and compiled word lists

The first one is what native speakers do and what we should do as much as we can. However, saying that we should simply listen and read more Chinese is not very helpful (and it has already been said many times on Hacking Chinese), so in this case we’re going to focus on the second method. I have touched upon this subject before when talking about using more than one textbook to diversify vocabulary, but using frequency lists is taking the same thinking one step further. I have also written about memorising dictionaries, which also touches on the same topic.

It’s fairly straightforward: take a frequency list (you can use official lists for HSK preparation or similar) and go through it to see which words you don’t know. If you use proper flashcard software, you can simply import the list and see what you lack, which only takes a few minutes. This lets you find words you ought to know (or at least that someone thinks you ought to know), but don’t. If you haven’t got all your words in a program already, it might still be worth the effort to go through the list manually.
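If you would rather check a list outside your flashcard program, the comparison itself is just a set difference. The following is a minimal Python sketch, not a description of how Anki or any other program does it. It assumes that your deck has been exported as tab-separated text with the word in the first field; the function names and the sample data are made up for illustration.

```python
def load_words(lines):
    """Return the first tab-separated field of every non-empty line."""
    words = []
    for line in lines:
        field = line.split("\t")[0].strip()
        if field:
            words.append(field)
    return words


def missing_words(frequency_list, known_words):
    """Return the words from the list that are not in the deck, in list order."""
    known = set(known_words)
    return [word for word in frequency_list if word not in known]


# Hypothetical sample data: three exported cards and a tiny beginner list.
deck_export = [
    "你好\tnǐ hǎo\thello",
    "謝謝\txièxie\tthanks",
    "幽閉恐懼症\tyōubì kǒngjù zhèng\tclaustrophobia",
]
beginner_list = ["你好", "謝謝", "筷子"]

gaps = missing_words(beginner_list, load_words(deck_export))
print(gaps)  # ['筷子']
```

In this made-up example the deck contains an advanced word but lacks a beginner one, which is exactly the kind of gap the method is meant to reveal.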

Learn words below your expected level

Note that I’m talking about learning words within the limits of the map you already have. I do not suggest that you use only word lists to expand your vocabulary in general. In other words, if you think that you are on an intermediate level, use this method to learn beginner-level words. If you’re advanced, don’t use this method to learn advanced words, but anything below that is cool.

Some practical aspects and an example

I use Anki and some time ago I had around 15000 cards in my Chinese deck. This means that I should know a significant number of words, but as we shall see, there were many I didn’t know. I tried adding the TOCFL lists (the Taiwanese HSK equivalent), and I tried it with the HSK lists as well, with similar results. I imported these lists into Anki, which of course rejected words already in my deck. This is what I did and how it turned out.

  1. Adding the beginner words (800) gave me two words I didn’t know
  2. Adding the basic words (+1600) gave me roughly a dozen new words
  3. Adding the intermediate words (+3400) gave me a couple of hundred words
  4. Adding the advanced words (+2800) gave me over one thousand new words

Naturally, you should stop at a decent level. Adding a thousand new words is quite a daunting task, so I would advise against doing that. The point here is not to cram in more words (even though you can do that if you want to, of course), but rather to note that there were words on the easier levels I didn’t know. Not all of these were words I truly didn’t know; some of them just weren’t in my deck, but a significant number were words I actually didn’t know.

Special note for traditional characters in Anki: There is a plugin called Traditional Hanzi Statistics which will compare your deck with a list of characters based on frequency and see what words and/or characters are lacking. This plugin is extremely useful.

This is a good example of a map that is very spread out and has lots of blank areas. Considering that my deck contained more than twice as many words as the complete vocabulary list for the test, it goes without saying that I know a lot more words than required for the advanced level. However, there were more than one thousand words I didn’t know!

What does that mean? It means that there were one thousand words that whoever prepared the word lists thought important, but that I hadn’t learnt yet. Whether I’m preparing for that test or just thirsty for knowledge is irrelevant; adding these words is truly useful, provided that you use good word lists. Regarding the words I didn’t know in the beginner, basic and intermediate lists, let me just say that there were some words I was amazed that I actually didn’t know how to say in Chinese!

Advanced level context

When learning words that are very common, I don’t think that spending time to find good example sentences is necessary, especially when we’re talking about nouns and verbs that are fairly straightforward to use. You will pick up how to use common words simply by listening and reading if you do it enough. Combine this with speaking and writing practice and you’ll be fine.

This is not true on an advanced level, though, because you’ll be much less likely to encounter the words you learn again. Thus, I strongly suggest that you learn any advanced words in context and with at least one clear example of how to use them. If you doubt the validity or correctness of your sentence, ask a friend or use Lang-8 (you can post several sentences at once and ask people questions about them).


Whether for test preparation or simply to enhance your vocabulary, using frequency lists is really useful. Without them, it might be incredibly hard to find the words you’re missing, and not knowing them might be embarrassing, bad or even catastrophic. How serious the outcome is depends on why you want to learn Chinese in the first place, but I think that we can all agree that learning words that are actually below our average level is desirable regardless of how far we’ve come in our studies.


21 Responses to Mapping the terra incognita of vocabulary

  1. Alan says:


    You’re totally on point as usual.

    I think everyone has very unique strengths and weaknesses in their vocabulary. This is a result of their background and own preferences. So someone might be awesome at chengyu but really bad at casual talk.

    That’s why I really think personalized language learning is going to be the wave of the future.

  2. Olle Linge says:

    @Alan: I think it’s a balancing act. Yes, it’s absolutely essential that we are motivated and learn what we want to learn (to each his own), so in this way language learning has to be personalised. On the other hand, focusing only on our strengths while overlooking weaknesses isn’t good either, especially if we’re serious about learning Chinese (or anything else). I think people have a tendency to practise what they’re already good at while failing to practise what they actually need to practise the most.

  3. Chris says:

    Hey Olle,

    You are referring to the TOP word lists, is that right? Is there a shared Anki deck of these, or how did you manage to import them?

    Also, do you add example sentences to each flashcard?

    Site is looking great!


    • Olle Linge says:

      I converted the online lists myself, because as far as I know, there are no Anki decks (or there weren’t at that time anyway). It’s doable with some more advanced copy-paste-replace-fu. :) If you want, I can check if I still have the converted lists somewhere on my hard drive, but I’m not sure.

  4. Olle Linge says:

    I just checked more carefully and there is a list of 8000 words in Anki, which should be everything. I haven’t checked the quality of the entries, but it will be a lot better than starting from the text files which is what I have. I integrated these words into my normal deck and so I only have the words I was previously lacking. Here’s info about the deck:

    標題: Test of Proficiency (TOP)
    標籤: chinese china taiwan traditional characters hsk

  5. Sara K. says:

    Yes, I have the TOP Anki deck (I think it is based on an older version of the TOP … but I don’t think the vocabulary changes very much from version to version, and if you just want to round out your vocabulary, it doesn’t matter which version it’s based on).

    Currently, I am focusing on the vocabulary I need in order to understand 1) wuxia and 2) soap operas. There is quite a bit of crossover, considering that both include lots of scenes where people are describing their passionate feelings. I am focusing on wuxia and soap operas because I am focusing on quantity right now, and wuxia and soap operas are things I can take in large quantities because 1) there are lots of cliffhangers and 2) the stories are long (reading/watching a long work with the same characters is easier than reading/watching many short works – if I had to constantly figure out new contexts and adjust to different artistic styles, it would be a lot more frustrating). I am focusing on quantity because I want to increase my reading speed and my comprehension skills *other* than vocabulary.

    However, I am also very aware that I am missing out on a lot of other useful vocabulary – even though much of that vocabulary is “easier” than the vocabulary I am picking up now. I actually plan to use the TOP Anki deck to round out my vocabulary when I transfer my focus from quantity to variety (being able to read/listen to Chinese about many different topics).

  6. Billy Waters says:

    Do you use Mindmanager? I find it useful for mapping out areas that I do or don’t know.

  7. Matt says:

    Hey Olle,

    This is kind of off-topic but you said “of course” Anki filtered out any words already in your list. My deck doesn’t seem to be doing that when I import tab separated text files from pleco. What options do I need to select to ensure there are no duplicates? Thanks!

    • Olle Linge says:

      Hi Matt,

      You have to tell Anki to prevent duplicates in certain fields. If you go to Card Layout, you can select the various fields and check a box called “prevent duplicates”. I only have this checked for 漢字 because it doesn’t make sense for any other field. Thus, if I either try to import or manually enter a character (or word) that already exists, Anki won’t allow it. When I import lists, it won’t import duplicate expressions. Note that it only blocks exact duplicates, so if you have a blank space after a character, it won’t show up as a duplicate. Also, variants of the same character such as 為 and 爲 are treated as two different characters.

      Hope this helps!

  8. Matt says:

    Thanks, Olle, I’ve got those settings checked, and it does prevent me from manually creating duplicates, so there must be some differences in the expressions that I didn’t notice. I’ll keep playing around with it. Cheers!

    • Olle Linge says:

      Yes, everything counts, including things that aren’t visible. So if you’ve coloured something black (as opposed to just having the default colour), it will still appear to be identical to another entry, but in fact, they are different.
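To make the "exact duplicates only" point above concrete, here is a minimal Python sketch of how entries could be cleaned up before comparing them. It is not Anki's own logic. It assumes that formatting such as colour is stored as HTML-style markup inside the field, and the variant table is a hypothetical one-entry example that you would have to extend by hand.

```python
import re

# Matches anything that looks like an HTML tag, e.g. <font color="#000000">.
TAG_RE = re.compile(r"<[^>]+>")

# Hypothetical variant table: maps a variant form to the form used in the deck.
VARIANTS = {"爲": "為"}


def normalise(entry):
    """Strip markup and whitespace and unify known character variants."""
    text = TAG_RE.sub("", entry)
    # Removes all whitespace, which is assumed to be safe for Chinese words.
    text = "".join(text.split())
    return "".join(VARIANTS.get(ch, ch) for ch in text)


# Three entries that look the same on screen but are different strings.
entries = ["為", "為 ", '<font color="#000000">為</font>', "爲"]
print(len(set(entries)))                      # 4 distinct raw strings
print(len({normalise(e) for e in entries}))   # 1 after normalising
```

Comparing the normalised forms rather than the raw fields catches trailing spaces, invisible formatting and the listed variants, at the cost of maintaining the variant table yourself.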

  9. Steven says:

    Hi Olle,

    I just decided I wanted to learn Chinese. I’m really glad I found out about your website, it gives me a great framework to start with. Thanks for the good work!


    • Olle Linge says:

      Steven: Glad to hear you like the site! Feel free to ask questions if there is anything in particular you’d like to know. Otherwise, I’ll just wish you good luck! Enjoy!

  10. Matt says:

    Ok I’ve come across another problem now; about 75% of my deck is in simplified. I want to import the TOP vocab lists but of course Anki won’t catch any duplicates that exist only in simplified form. Anyone have an idea of how I could go about converting my whole deck? I suck at Anki!

    • Olle Linge says:

      Matt: I’m not sure how to do this, but it would probably be easiest to export the deck, convert the characters (there are lots of tools that do that, even though it would be harder to go from simplified to traditional; I don’t know how reliable this kind of automatic conversion is) and then import the deck again. It should be possible to do it while retaining statistics.

      See this and this.
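One reason simplified-to-traditional conversion is harder is that some simplified characters correspond to more than one traditional character. The Python sketch below does not convert anything; it only flags words that contain such characters so they can be checked by hand after an automatic conversion. The table is a small hand-picked illustration, not a complete mapping, and the function names are made up.

```python
# Illustrative, incomplete table of simplified characters that have
# more than one traditional counterpart.
ONE_TO_MANY = {
    "发": ["發", "髮"],
    "后": ["後", "后"],
    "干": ["乾", "幹", "干"],
}


def ambiguous_characters(word):
    """Return the ambiguous characters in a word with their candidates."""
    return {ch: ONE_TO_MANY[ch] for ch in word if ch in ONE_TO_MANY}


def needs_review(words):
    """Return the words containing at least one ambiguous character."""
    return [word for word in words if ambiguous_characters(word)]


# Hypothetical sample of simplified words from a deck.
deck_words = ["头发", "发现", "学习", "以后"]
print(needs_review(deck_words))  # ['头发', '发现', '以后']
```

A dedicated conversion tool will usually pick the right form from context, but a list like this shows where the result is worth a second look.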

  11. Matt says:

    Yes, my concern was keeping the stats, that’s the hard part. I had a look at those threads but neither seems to address the issue of the stats. I feel like there must be a way to do it using the pinyin toolkit. Anyway, if I figure it out I’ll come back here and post about it. Thanks for your help Olle, you are truly the Don Corleone of Chinese SLA blogs =)

    • Olle Linge says:

      It’s possible to export review statistics as well, so it’s at least theoretically possible to achieve what you want. I’ve never actually tried to do it, though. How’s it going?

  12. […] However, you mustn’t fool yourself into thinking that being able to walk a distance in one direction means that you can do the same in another. A new language is a huge landscape and even the easier parts of the terrain are vast. Approaching more difficult challenges doesn’t mean that the easier ones are all mastered. This is where additional textbooks come in, but before I explain how, I shall illustrate with an example (if you want to read more about maps, terrain and vocabulary, check Mapping the terra incognita of vocabulary). […]

  13. […] One more word of warning. Don’t make the mistake of thinking that just because you master something at a certain level, you master everything below that level. This is a dangerous misconception; I’ve often found that beginner and intermediate textbooks and courses have things to teach me, it’s just that I don’t want them to be my main source of learning. Even if you strive for the stars, be sure to spend a decent amount of time making sure to solidify your foundation. […]

  14. Hey guys.

    I’ve turned the character frequency list here: http://www.zein.se/patrick/3000en.html

    Into an Anki deck.

    You can download it here: https://ankiweb.net/shared/info/1418386664