Hacking Chinese

A better way of learning Mandarin

Can ChatGPT pass the HSK?

ChatGPT is a chat bot developed by OpenAI and was launched in late 2022. It’s been all over the press as the fastest growing web platform of all time, so most of you have probably heard of it. Reports of it being able to pass university exams in a range of subjects have been all over both traditional and social media, which makes me wonder: Can ChatGPT pass the HSK?

I will write more about using ChatGPT for learning and teaching Chinese in future articles, so if you have ideas for how it can or cannot be used, please leave a comment below or send me an email! In this article, we will focus on the HSK specifically.

Tune in to the Hacking Chinese Podcast to listen to the related episode:

Available on Apple Podcasts, Google Podcasts, Overcast, Spotify, YouTube and many other platforms!

Robot reading. Created with Stable Diffusion and some GIMP editing for the book cover.

I considered letting ChatGPT write the article for me, but letting it write articles about itself has been done by so many others that it’s getting boring. Also, ChatGPT does much better with shorter texts and clear prompts. Let’s kill two birds with one stone and use ChatGPT to introduce itself, the HSK and the task at hand!

Note: I have introduced ChatGPT to many people recently, mostly in a professional context, and I have noticed that simply telling people about it doesn’t work, because they often dismiss it as just another mediocre chat bot. It is not. While I can show you my interactions with ChatGPT, I strongly encourage you to try it out yourself if you haven’t!

Here’s what we will cover in this article:

  1. Asking ChatGPT
  2. Can ChatGPT even take the HSK?
  3. Listening, speaking, reading and writing
  4. Using mock HSK exams to test ChatGPT’s reading ability
  5. ChatGPT vs. HSK 3
  6. ChatGPT vs. HSK 4-5
  7. ChatGPT vs. HSK 6
  8. So can ChatGPT pass the HSK?
  9. What does this mean?
  10. How to use ChatGPT to learn and teach Chinese

ChatGPT: What is ChatGPT?

Screenshot of a conversation between Olle and ChatGPT: Olle: What is ChatGPT? ChatGPT: ChatGPT is a conversational language model developed by OpenAI. It's based on the GPT-3 (Generative Pretrained Transformer 3) architecture, which is a state-of-the-art language model that has been trained on a massive corpus of text data from the internet. The goal of ChatGPT is to generate human-like text responses given a prompt, allowing it to participate in conversations and answer questions on a variety of topics. The model has been fine-tuned for various conversational tasks, including response generation, question answering, and dialogue generation. Olle: Write that in one sentence in plain language. ChatGPT: ChatGPT is a computer program created by OpenAI that can have written conversations with people by generating text based on the information it learned from the internet.

ChatGPT: What is the HSK?

Screenshot of ChatGPT describing the HSK: Olle: Tell me about the HSK. ChatGPT: The HSK (Hànyǔ Shuǐpíng Kǎoshì) is a standardized test that assesses the Chinese language proficiency of non-native speakers. It was created by the Chinese government and is used for various purposes, including for admission to Chinese universities, for visa and immigration purposes, and for job applications. The HSK test covers a wide range of language skills, including vocabulary, grammar, reading comprehension, and listening comprehension. The test is divided into six levels, with Level 1 being the most basic and Level 6 being the most advanced. Each level becomes increasingly difficult, testing more advanced grammar structures, vocabulary, and comprehension abilities. The HSK test is offered in various locations around the world and is taken by individuals who are learning Chinese as a second language, as well as by people who are living and working in Chinese-speaking countries. It is widely recognized as a measure of Chinese language proficiency and is considered to be an important tool for people who want to improve their Chinese language skills and demonstrate their proficiency to others.

ChatGPT: Do you think you could pass the HSK?

Screenshot of a conversation between Olle and ChatGPT: Olle: Let's try and see how well you do! ChatGPT: Certainly! I'll do my best to answer any questions you have in Chinese. Please keep in mind that my knowledge is limited to what I've been trained on and my responses might not always be perfect. However, I'll do my best to help with any questions or tasks you have in mind!

Can ChatGPT even take the HSK?

We will see some more of ChatGPT’s capabilities later when we get to the actual exam, but let’s discuss some practical issues first, some of which were mentioned above by ChatGPT itself.

Obviously, ChatGPT can’t take the HSK. You have to be a real person to sign up and you need to be smarter than ChatGPT to understand how the exam works. However, with some handholding, it’s still possible to feed the exam questions to the bot to see how it would perform on various levels of the HSK.

Unfortunately, the two lowest levels on the exam rely on images, which makes them impossible to use (ChatGPT has no image recognition capabilities). Starting from HSK 3, all reading comprehension questions are pure text, however, and any images do nothing more than provide hints to the general topic of the text. Hence, we’ll focus on HSK 3-6. This will all be HSK 2.0 as HSK 3.0 has only started rolling out, and only with the most advanced levels.

The new HSK 3.0: What you need to know

Listening, speaking, reading and writing Chinese

To pass the exam, you also need to be able to listen, speak, read and write. For ChatGPT, listening would be the same as reading, plus an added speech-to-text layer. If it can pass reading questions, it should be able to deal with listening as well, provided that the transcribing goes well. The goal here is not to evaluate transcription software, so we’ll focus on reading.

Speaking and writing are hard to test, because I’m not a trained HSK examiner and don’t know how the speaking and writing parts are graded. The listening and reading parts contain only multiple-choice questions, which makes it easy to check the right answer. If there’s enough interest, I could return to the topic of writing (and maybe speaking) in a follow-up article and evaluate it anyway.

Using mock HSK exams to test ChatGPT’s reading ability

Thus, we are left with reading comprehension. This doesn’t sound like much, but it is a substantial part of the exam, and don’t forget that we expect it to be able to do roughly as well on the listening part, given that speech-to-text is quite good these days.

These combined make up 70 out of the 80 questions on HSK 3, for example, so if ChatGPT does well enough, it might be able to pass even with 0 points on the writing part!

To test if ChatGPT can pass the HSK, I used the official mock exams from HSK Mock. If you want to check them yourself or see exactly which questions ChatGPT got wrong, just download the appropriate mock exam from the website. Here are direct links to the PDF documents:

  1. HSK 3
  2. HSK 4
  3. HSK 5
  4. HSK 6

Spending an hour helping ChatGPT finish HSK 3 in one minute

In this article, I’m not going to go through every single question of all the four HSK levels, as this would just be tedious. Most of the process is just me feeding texts and questions to ChatGPT, and it spitting out answers in a few seconds. To show you the process, I will go through HSK 3 in some detail.

ChatGPT vs. HSK 3 Reading: Part 1

Interestingly, there are no instructions on the exam paper for what to do in part one, but the example shown makes it clear that there are six responses (A-F) that each are supposed to be matched with one of six utterances. The example shown is:



Here, I needed to prompt ChatGPT so it had a reasonable chance of giving the correct answer. For example, I told it explicitly that each option could only be used once. In general, I kept tinkering with prompts until ChatGPT gave an answer according to the instructions. If it did something else, such as translating the questions, I tried again until it produced an answer. Obviously, I did not check if the answer it produced was correct until afterwards!

Here’s question 1 (note that I included the example as a real question to make it a bit harder):

For this question, there are six responses (A-F) that are supposed to be matched with six utterances (40-45). For each utterance, please list the most appropriate response. Each response can only be used once.

A 爷爷教了我很多次。
B 做作业遇到不会的词语时,你用铅笔在旁边记一下。
C 我还以为你忘了呢,你真好!
D 决定了没?去还是不去?
E 当然。我们先坐公共汽车,然后换地铁。
F 你先吃个苹果吧,我去给你做面条儿。

40. 你知道怎么去那儿吗? ( )
41. 生日快乐!这是我给你买的礼物,喜欢不喜欢? ( )
42. 天阴了,我怕一会儿要下雨,明天吧? ( )
43. 现在去看字典吧,字典里有。 ( )
44. 你的游泳水平最近提高很快啊。 ( )
45. 我饿了,冰箱里有什么吃的吗? ( )

ChatGPT:

E 当然。我们先坐公共汽车,然后换地铁。
C 我还以为你忘了呢,你真好!
D 决定了没?去还是不去?
B 做作业遇到不会的词语时,你用铅笔在旁边记一下。
A 爷爷教了我很多次。
F 你先吃个苹果吧,我去给你做面条儿。

Checking the answer key, we can indeed see that C, D, B, A, F are the correct answers (E is the example, so ignore that). Moving on to questions 46-50:

For this question, there are five responses (A-E) that are supposed to be matched with five utterances (46-50). For each utterance, please list the most appropriate response. Each response can only be used once.

A 吃完饭以后,你来洗碗。
B 他在黑板上写的字太小!我们都看不清楚!
C 她已经结婚了,孩子都两岁了。
D 他们很认真、很努力地练习了一个夏天。
E 他对自己画的画儿总是不满意。

46. 安静,你需要换一个眼镜了。
47. 其实,主要是他对自己的要求太高了。
48. 我准备给你妹妹介绍个男朋友。
49. 为什么总是我啊?
50. 今天的节目看了吗?那些学生的表演好极了。

ChatGPT:

46. 安静,你需要换一个眼镜了。 (E)
47. 其实,主要是他对自己的要求太高了。 (E)
48. 我准备给你妹妹介绍个男朋友。 (C)
49. 为什么总是我啊? (A)
50. 今天的节目看了吗?那些学生的表演好极了。 (D)

This is a good example of the hand-holding I mentioned earlier. ChatGPT ignored my instruction to only use each response once and used E twice and didn’t use B. Thus, I prompted it again:

Please use each option only once.

ChatGPT:

46. 安静,你需要换一个眼镜了。 (A)
47. 其实,主要是他对自己的要求太高了。 (E)
48. 我准备给你妹妹介绍个男朋友。 (C)
49. 为什么总是我啊? (D)
50. 今天的节目看了吗?那些学生的表演好极了。 (B)

Checking the answer key, we can see that the right answer is actually B, E, C, A, D. This means that my hand-holding here actually lost the bot points. This is a good opportunity to talk about one of the limitations of ChatGPT, namely that it doesn’t actually understand anything and it’s hard to know when it outputs nonsense and when it actually knows what it’s doing.

We know that it’s not just guessing, because it gets most questions right (although not in this case), but if I allowed it to output anything and just registered that as its answer, it would in general lose an awful lot of points (I have tried). This is probably because the question format confuses it, not because it doesn’t have the data to answer the questions correctly. Thus, I decided to always force it to answer according to the instruction; no parlay bets allowed.

So far, ChatGPT has scored 7/10, so much better than mere guessing.
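The bookkeeping for part 1 can be sketched in a few lines of Python. This is only an illustration of how the tally works, not the way the scores in this article were actually recorded; the function and variable names are made up for the example, while the answers and keys are the ones quoted above.

```python
from collections import Counter

def repeated_options(answers):
    """Return the options that appear more than once in a list of answers."""
    counts = Counter(answers)
    return sorted(option for option, n in counts.items() if n > 1)

def score(answers, key):
    """Count the positions where the given answer matches the answer key."""
    return sum(given == correct for given, correct in zip(answers, key))

# Questions 41-45 (question 40 is the example on the exam paper, so it is left out).
key_41_45 = ["C", "D", "B", "A", "F"]
answers_41_45 = ["C", "D", "B", "A", "F"]

# Questions 46-50: answer key, ChatGPT's first attempt, and its answer after re-prompting.
key_46_50 = ["B", "E", "C", "A", "D"]
first_attempt = ["E", "E", "C", "A", "D"]
second_attempt = ["A", "E", "C", "D", "B"]

print(repeated_options(first_attempt))   # options used more than once: ['E']
print(score(first_attempt, key_46_50))   # 4, had the first attempt been counted
total = score(answers_41_45, key_41_45) + score(second_attempt, key_46_50)
print(total)                             # 7, the score recorded for part 1
```

The first attempt breaks the "use each response once" rule but would have scored 4 out of 5, while the rule-abiding second attempt scores 2 out of 5, which is where the 7/10 comes from.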

ChatGPT vs. HSK 3 Reading: Part 2

Moving on to the next part, where you are to insert the right word into the blanks. In this case, I omitted the example, which is why there is no option E:

For the next question, you are supposed to use the words A, B, C, D or F to fill in the blanks marked with () in the sentences 51-55. Each word can only be used once.

A 刻 B 双 C 音乐 D 其他 F 干净

51. 除了数学,弟弟的( )成绩都不错。
52. 我特别想买这( )鞋,问题是我的脚太大。
53. 这条裤子昨天洗过了,是( )的。
54. 现在是三点一( ),别担心,我们不会迟到。
55. 姐姐从小就喜欢唱歌,现在她是一位( )老师。

ChatGPT:

51. D 其他
52. B 双
53. F 干净
54. A 刻
55. C 音乐

Checking the key, we can see that this is spot on! D, B, F, A, C. As we shall see, ChatGPT almost never gets this type of cloze deletion wrong, which is not surprising considering that it can check vast amounts of data for what words are most likely to appear together in sentences like these. It gets the next five right too:

For the next question, you are supposed to use the words A, B, C, E or F to fill in the blanks marked with () in the dialogues 56-60. Each word can only be used once.

A 明白 B 角 C 被 D 爱好 E 办法 F 甜

56. A:这个问题你们准备什么时候解决? B:我们正在想( ),今天应该可以解决。
57. A:怎么样,听( )了吗? B:对不起,您读得太快,我没听懂。
58. A:盘子里的肉( )那只黄猫吃了。 B:那你该给它准备点儿水了。
59. A:菜有点儿( ),但很好吃。 B:我妈妈是上海人,她做菜喜欢放糖。
60. A:你好,香蕉怎么卖? B:这种三块五一斤,那种便宜两( )五分。

ChatGPT:

E A C F B

ChatGPT vs. HSK 3 Reading: Part 3

The final part of the reading section consists of short texts followed by reading comprehension questions. We can expect these to be harder, because it’s no longer just a matter of seeing which option is statistically the most likely based on a huge dataset of written Chinese. Here’s the example given on the mock exam:


★ 会议最可能几点开始?

A 8 点
B 8 点半
C 9 点

So we might expect ChatGPT to do worse here. But no, it passes with flying colours:

For this question, please answer the question marked with ★ based on the text, using the most appropriate alternative.

61. 中国人经常说:早饭要吃好,午饭要吃饱,晚饭要吃少。
★ 根据这句话,可以知道:
A 早饭要少吃 B 午饭要多吃 C 不要吃晚饭

62. 公司来了一位新同事,长得跟我有点儿像,而且我们都姓王,很多人总是笑着问我:“新来的同事是不是你哥哥?”
★ 新来的同事:
A 是我哥 B 长得像我 C 长得像我弟弟

63. 北京的秋天比较短,但这是北京最好的季节。秋天的北京不冷也不热,下雨的时候不多,这时候去北京旅游最好。北京的春天也很漂亮,只是有时候会刮大风。
★ 北京的秋天:
A 很舒服 B 非常热 C 经常刮风

64. 张先生,您到饭店以后,先休息一下。中午我们一起吃午饭,下午 4 点以前没有什么事,如果您愿意,我可以带您到附近看看。5 点,我们经理和您见面,晚上经理请您吃晚饭。
★ 张先生 5 点做什么?
A 和经理见面 B 到外面去看看 C 和经理一起吃饭

ChatGPT:

B 午饭要多吃
B 长得像我
A 很舒服
A 和经理见面

For this question, please answer the question marked with ★ based on the text, using the most appropriate alternative.

65. 小李,洗手间在一楼东边,如果有客人要去,你带他们去。还有,注意洗手间的灯不要关。
★ 根据这段话,可以知道:
A 小李是客人 B 小李表示同意 C 别关洗手间的灯

66. 一年级二班的考试已经结束了,我听大家说,虽然题很多,但不难,很简单。
★ 这次考试:
A 很难 B 题很多 C 时间不长

67. 我对外面的世界很有兴趣,我希望有机会到别的国家看一看,了解它们不同的文化。
★ 我希望:
A 学习普通话 B 有很多节日 C 去国外看看

68. 自行车?没问题,你别客气。就在楼下,就是那辆蓝色的。我这几天不骑,你不用急着还。
★ 那辆自行车:
A 坏了 B 是蓝色的 C 是别人的

69. 两年不见,没想到他变化这么大,比以前瘦了很多,但和以前一样的是,他还是那么热情、快乐、关心别人。
★ 他以前怎么样?
A 又矮又瘦 B 容易生气 C 热情、快乐

70. 下了飞机,坐出租车的时候,我把行李箱放在了车箱里。下车时,我忘了拿行李箱,里面有很多重要的东西。但是,我有出租车票,我相信能找到我的行李箱。
★ 我的行李箱现在最可能在:
A 机场 B 飞机上 C 出租车上

ChatGPT:

C 别关洗手间的灯
B 题很多
C 去国外看看
B 是蓝色的
C 热情、快乐
C 出租车上

B, B, A, A, C, B, C, B, C, C. All correct.

But wait, isn’t it just copying the answers?

When seeing ChatGPT in action the first time, many people think that it must be copying existing text from the internet. This is not the case, but it is a valid question, because the mock exams I’m using have been available online for years, so doesn’t ChatGPT know the answers based on that?

No. To begin with, the answers are only given as number-letter combinations at the end of the mock exam document, so the answers aren’t written out. Okay, but wait, there probably are people who have explained the correct answers to the mock exam questions online. Still no; it’s not copying anything.

You can easily check this yourself by changing the text in a way that changes which answer is the correct one. If you do, ChatGPT will still give you the correct answer based on the newly created and therefore unique text.

Obviously, I didn’t try modifying every single text in the exam, but I tried it enough to satisfy my curiosity and to be able to write this paragraph in good conscience. Also, if the correct answers were available, it’s hard to explain why it would get some questions wrong.

I have also tried using ChatGPT to answer similar reading questions on exams that I or my colleagues at university have created and that are extremely unlikely to be part of the training data. ChatGPT performed on par with what it does here, so I have no reason to suspect that the results are due to the answers being part of the training data.

ChatGPT got 27/30 on HSK 3

Here’s the scorecard for HSK 3:

ChatGPT got 27 out of 30 on HSK 3 reading.

ChatGPT vs. HSK 4 and HSK 5 reading

Like I said earlier, I’m not going to go through the process of taking screenshots of my interactions with ChatGPT for all the questions on every level. It took many hours just to collect the data here, and posting screenshots would have taken hours more without adding any new insights. Instead, I will give you the results and discuss them. Remember, you can always download the mock exams to check which questions it got wrong if you’re curious! Here’s a link to my spreadsheet.

ChatGPT got 35 out of 40 on HSK 4 reading, and 38 out of 45 on HSK 5 reading.

In general, ChatGPT excels at matching questions and simple cloze deletion, but it sometimes struggles with more complex questions. For example, it didn’t do very well on part 1 of HSK 5 reading, where it’s supposed to pick sets of three characters that fit into three gaps in a text. Here’s an example:

周围的人都觉得很 46 。没想到老人把另一只鞋也从窗口扔了出去。他的行为让
周围的人感到很吃惊。这时候,老人笑着 47 说:“剩下的那只鞋无论多么好,
就有人可能 48 到一双鞋子,说不定他还可以穿呢。”

46.A 浪费 B 伤心 C 可惜 D 痛苦
47.A 解释 B 理解 C 建议 D 思考
48.A 捡 B 选 C 买 D 换

It’s hard to prompt the bot to do the right thing here. One way of doing it would have been to present three versions of the text with each set inserted into the gaps, but this would have been very time-consuming to do for every question, so I didn’t do that. Even so, it only got two of these wrong.
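Generating filled-in versions of a text does not have to be done by hand. As a rough sketch of the idea, the Python snippet below fills one numbered gap with each of its options. It was not used for the results in this article, the function name is made up, and it assumes that gaps are marked by their question number as in the excerpt above.

```python
import re

def candidates_for_gap(text, gap, options):
    """Return one version of the text per option, with the numbered gap filled in."""
    # Match the gap number (not part of a longer number) plus any spaces around it.
    pattern = re.compile(r"\s*(?<!\d)" + str(gap) + r"(?!\d)\s*")
    return {label: pattern.sub(lambda _m, w=word: w, text)
            for label, word in options.items()}

# First sentence of the excerpt above, with the options listed for gap 46.
sentence = "周围的人都觉得很 46 。"
options_46 = {"A": "浪费", "B": "伤心", "C": "可惜", "D": "痛苦"}

for label, version in candidates_for_gap(sentence, 46, options_46).items():
    print(label, version)
```

Each printed line is a complete candidate sentence that could be pasted into a prompt, so the bot only has to judge which version reads best instead of juggling gap numbers and option letters.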

Another thing worth repeating is that ChatGPT doesn’t think and can’t reason logically. This means that some complex reading questions are hard to answer, something that becomes more relevant the more advanced the reading questions become.

As you all know, reading on a basic level is just about extracting information that is clearly presented in the text, but more advanced reading is about inferring an answer from what’s said in the text, even if the answer isn’t explicitly presented in the text. This becomes very relevant on HSK 6, so let’s move on!

ChatGPT vs. HSK 6

As I mentioned, ChatGPT doesn’t actually think, even though it certainly feels like it does at times (for a related philosophical question, see Chinese Room on Wikipedia, or read Blindsight by Peter Watts). This limitation becomes very obvious on many HSK 6 questions, especially those where the test is constructed to trick you into thinking that something is the answer unless you really understand what it means.

Consequently, ChatGPT did not do as well on HSK 6, and it did particularly badly on parts 1 and 4. When you look at the data here, remember that most questions have a ¼ chance of being correct by pure chance.

ChatGPT got 30 out of 50 on HSK 6 reading.
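To put 30 out of 50 next to that guessing baseline, here is a small Python sketch. It makes the simplifying assumption that every one of the 50 questions has exactly four options (the text above only says "most"), and the function name is just for illustration.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of at least k successes in n independent trials with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_questions = 50   # HSK 6 reading questions, from the score above
p_guess = 0.25     # assumption: every question has four options
observed = 30      # ChatGPT's score

expected_by_guessing = n_questions * p_guess
print(expected_by_guessing)                              # 12.5
print(prob_at_least(observed, n_questions, p_guess))     # chance of 30+ by guessing alone
```

Under this assumption, pure guessing would average 12.5 correct answers, and the chance of reaching 30 or more that way is far below one in a million. A weak HSK 6 result is therefore still very different from a random one.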

While this type of question isn’t common on the HSK, I tried to run several more complex reading comprehension questions through ChatGPT, and it did very badly in some cases (worse than guessing). One question we used on an exam featured an explanation of the Monty Hall problem, and then asked some basic questions about it, including a “choose an appropriate title for this text”. ChatGPT got every single question wrong. It’s clear that question type matters quite a lot, but it’s beyond the scope of this article to probe this further.

So can ChatGPT pass the HSK?

Well, if we mean whether it can give the correct answers to multiple-choice reading questions, the answer is a resounding “yes”, at least if the questions are straight up, honest reading questions not deliberately designed to trick you.

This is quite amazing! Going into this project, I expected it to do very well on matching and fill-in-the-blanks types of questions, which it did. I also expected it to struggle with trickier questions, which it also did. However, I expected it to struggle a lot more than it did!

I was also blown away by its accuracy on most of the reading comprehension questions. Sure, it’s one thing to be able to fill in the blanks or match words, but the content-based reading questions also went surprisingly well!

I should have predicted that, though, because ChatGPT also does pretty well on most take-home exam style questions, as shown in multiple YouTube videos (search for your area of expertise and ChatGPT and you’ll probably find something).

It should be pointed out that it would be premature to say that ChatGPT could pass the HSK, even ignoring the fact that it needs human guidance to format the questions properly, because you either pass or fail the HSK as a whole exam, not just the reading part. However, if we assume that the percentage of correct answers required for the whole exam were required for each part, ChatGPT would breeze through HSK 3-5. It would also pass HSK 6, although not with as safe a margin.
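The margins can be made concrete with a little arithmetic. The sketch below uses the reading scores reported in this article and assumes a pass mark of 60% (180 out of 300 points on the whole exam), applied to the share of correct reading answers. Real HSK results are reported as scaled scores, so treat this as a rough comparison rather than an official calculation; the names in the code are made up for the example.

```python
# Reading scores reported in this article: (correct answers, number of questions).
reading_scores = {
    "HSK 3": (27, 30),
    "HSK 4": (35, 40),
    "HSK 5": (38, 45),
    "HSK 6": (30, 50),
}

# Assumption: a 60% pass mark (180 of 300 points), applied to the reading part alone.
PASS_MARK = 0.60

def reading_summary(scores, pass_mark):
    """Return {level: (fraction correct, margin above the pass mark)}."""
    summary = {}
    for level, (correct, total) in scores.items():
        fraction = correct / total
        summary[level] = (fraction, fraction - pass_mark)
    return summary

for level, (fraction, margin) in reading_summary(reading_scores, PASS_MARK).items():
    print(f"{level}: {fraction:.1%} correct, margin {margin:+.1%}")
```

With these numbers, HSK 3-5 come out at roughly 84-90% correct, well clear of the assumed pass mark, while HSK 6 lands at 60%, right on the line.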

What does this mean?

First and foremost, it means that you shouldn’t ignore ChatGPT or its competitors. If you haven’t taken recent AI development seriously in spite of (or maybe because of) the hype in the news, it’s time that you do. If you haven’t tried ChatGPT yourself, you should.

It’s already quite useful for many tasks, and AI will only get better with time. The Rubicon is crossed, the die is cast, and the machine rebellion is drawing nigh. Or something.

It also means that it’s become even easier to cheat on exams if you have access to the internet. To language teachers like me, this is not really a big thing, because it’s pretty much the same problem as we’ve had with Google Translate for more than a decade, just a bit wider in scope so it now affects all other subjects too.

Additionally, I’m mostly interested in how we can leverage tools like ChatGPT and Google Translate to our advantage as language learners and language teachers. If you want to cheat on an exam, that’s your problem, and whatever you’re doing, it doesn’t count as language learning and it’s outside the scope of Hacking Chinese.

If you want to get good grades and learn the language at the same time, you can check this article:

How to get good grades when studying Chinese


How to use ChatGPT to learn and teach Chinese

There are many ways of using ChatGPT to learn or teach Chinese. I have spent a lot of time experimenting with it in the past few months, but rather than posting something quick and dirty, I want to collect good ideas and leave some time for reflection before I write something.

This means that while this article ends here, I will write more about how you can use ChatGPT to learn and teach Chinese in the near future. To be able to do that well, I need your help! If you have used ChatGPT for learning or teaching Chinese, please leave a comment below or send me an email:

  • What did you try?
  • What worked well?
  • What didn’t work so well?

To match the theme of AI, the images used in this article were created using Stable Diffusion with various prompts related to robots and reading. The “HSK” on the book cover on one of the images was added manually afterwards, as SD is very bad at text.

Further reading about HSK on Hacking Chinese

Using the HSK as a roadmap to learning Chinese



  1. F says:

    One could use ChatGPT to generate mnemonics/sentences for learning Hanzi. Where the generated mnemonic would contain the components and radicals of the Hanzi and one or several meanings. It would be easy to build a mental garden/castle this way so that all mnemonics relate to the same theme. E.g. a walk in a magic forest or any area of interest of the student. Together with a DALL-E generated image and anki this could be a powerful and customizable tool.

  2. Pierre says:

    Came across this Reddit post: you can ask Chatgpt to write graded reader. The few tests I did gave me articles similar to what TCB are proposing (at least for HSK4):

    1. Olle Linge says:

      Yeah, I saw that one too! But thanks for recommending it, I spend less and less time on Reddit these days. I’ve tried similar things, always with pretty bad results. This is on my list of things to evaluate more properly and write about, for sure!

  3. Julia says:

    I’ve been using ChatGPT to create grammar exercises (for example, I might use the grammar points in a textbook chapter that we’re studying). I’ll prompt it to write me a summary lesson and then drill me with sentences to translate, critique my answers, and suggest any improvements. In each lesson I’ll give it a list of 6 or so grammar points to cover at a time. I’m using GPT4 and the Prompt Perfect add in.

    The results have definitely helped me to practice – I especially like getting the immediate critique of my translations and the suggestions for improvement and I can ask for clarification. So it’s a lot more useful than just doing exercises from textbooks.

    I do find that if I give it too much content to cover it tends to omit stuff from the exercises so 6 topics at a time seems about right. I create a new chat for each new batch of topics and go back and practice the old ones from my history from time to time.

  4. Julia says:

    Oh – one other shortcoming is that I haven’t found a way to restrict the vocabulary it uses in the exercises (I tried specifying the HSK level but to no avail). But I don’t find that too much of an issue.

  5. Julia says:

    Recently I’ve been playing twenty questions with ChatGPT – you can restrict the mystery item to practice more specific vocab (eg “must be an inanimate object you would find inside a house”).

    I’ve also used it to generate mnemonics for hanzi (although it’s annoyingly puritanical).
