Beyond tīng bu dǒng, part 3: Using what you already know to aid listening comprehension in Chinese

At first glance, listening comprehension seems to be about extracting information from spoken language. That is indeed part of it, but speech never contains all the information necessary to make sense of it, so in order to understand, we need to rely on prior knowledge.

Obviously, to understand spoken Mandarin, you need familiarity with the sounds and tones, how they fit together to form syllables, which words these syllables are related to, what they mean and how they can be combined to form meaning.

Knowledge about the language is only one type of knowledge that’s needed for successful listening, however. Listening is a very active process where we constantly seek to make sense of what’s being said in light of what has been said before, what we know about the situation and what we know about the world. Listening comprehension is about reconciling information gleaned from the spoken language with what we already know.

Have you ever wondered why words pronounced the same way but with different meanings (homophones) aren’t confused more often? In English, “there”, “their” and “they’re” share the same pronunciation, yet this hardly ever causes the slightest problem. This is because spoken language isn’t only processed from the bottom up, but also from the top down. Which word meaning to activate is determined not just by the sounds you hear, but also by what fits into your mental grammar. If you hear someone say “I want to go. […]”, you know it’s “there”, because neither “their” nor “they’re” fits.

Another example from Mandarin is the fact that native speakers can (usually) still understand what you mean even if your tones are a bit off, at least if they have enough context to rely on. If you’re at the zoo and say that the xiōngmáo (胸毛) are very cute, people will understand that you mean the pandas, not someone’s chest hair. Thus, their prior knowledge and conceptual framework of what you might say makes “panda” a much more likely word, even though you should have said xióngmāo (熊猫). I wrote more about this here: The importance of tones is inversely proportional to the predictability of what you say.

In this article, we’re going to explore the top-down processes involved in listening comprehension. If bottom-up processing is like using bricks to build a house, synthesising comprehension from the smallest units, top-down processing is like using a blueprint when building, relying on what you already comprehend to make sense of what you hear.

This article is part of a series about listening comprehension in Mandarin. Here’s a list of the articles I’ve planned for this series so far. Article titles without links have not been published yet!

A guide to Chinese listening comprehension
From sound to meaning in Mandarin
Using what you already know to aid listening comprehension in Chinese
Learning to process spoken Mandarin quickly and effortlessly
Becoming a better listener as a student of Chinese
Why is listening in Chinese so hard?
How to master different kinds of listening in Chinese
Building an arsenal of Chinese listening strategies for every situation
The best listening exercises to improve your Chinese

Beyond tīng bu dǒng: Using what you already know to aid listening comprehension in Chinese

In the previous article about bottom-up processing, we looked at three steps: perception, parsing and utilisation. However, none of these would actually work without top-down processing as well. Let’s review quickly what these steps are and how prior knowledge and top-down processing can help with each:

Perception is about identifying cues in the spoken language which are relevant for Mandarin. For example, we can use tone height and how it changes over time to identify tones, or other acoustic features to identify initials and finals. For this to work, we need to rely on prior knowledge about Mandarin in particular.
Parsing is about connecting the speech sounds identified in the perception step to the meaning of words stored in long-term memory. The meaning of identified words is activated and stored temporarily in our working memory. This clearly requires prior knowledge of words and expressions.
Utilisation is where understanding of the spoken language occurs. Words, expressions and structures identified in the parsing step are combined into a meaningful whole, which also involves matching what we hear with what we know, expect and think about the situation, enabling us to interpret and understand what’s being said on a higher level. For this to work properly, we need extensive knowledge not just about how the language is used, but also about the situation, human interaction and the world around us in general.

Since we’re talking about top-down processing, let’s start with utilisation this time. As was the case last time, this article is largely based on a model by Vandergrift (2011), which is in turn based on one from Anderson (1995). Most of the information about the cognitive aspects of listening comprehension in this article comes from Vandergrift & Goh (2012) and Rost (2011), although I’ve tried to adapt it as much as possible to make the discussion relevant for learners of Mandarin.

Here’s the model we looked at last time, but don’t worry if you think it looks scary. By the end of this article, you’ll know what it means! This is my own rendering of the model in Vandergrift (2011).

Utilisation: How we make sense of and interpret what we hear

The reason some people feel perplexed when they hear that prior knowledge is essential for listening comprehension is probably that they take this knowledge for granted and don’t realise how big a role it plays.

To begin with, in order to be able to interpret not just the literal meaning, but also the intended message, we need to take context into account. This is why language teachers and reasonable native speakers often want you to give more context before they feel that they can answer what something means. This is not to annoy students, even though it can sometimes feel like that.

Some of that context is provided in the spoken language itself and this is often enough. If you’re listening to a longer explanation of something or following a conversation on TV, you’ll often have to rely on what was said earlier to understand what is being said now. This is not surprising, as it can be difficult to understand even our native language if we’re thrown into an unfamiliar conversation.

Context is not limited to what’s already been said

Context is not limited to the spoken language itself, however, but can also include things such as where you are, what’s happening around you, who the speaker is, what their facial expression is like, and so on.

A good example of this is watching a movie vs. listening to an audio drama. For most learners, the former is considerably easier than the latter, and that’s because in a movie, much of what we need in terms of context is shown on screen; we don’t need Chinese to access that information. In a radio play, however, there’s nothing to look at and any contextual information necessary to make sense of what’s being said has to be conveyed through spoken Chinese only.

The speaker has to guess what the listener already knows

Even though providing enough context is important, naturally spoken language is also efficient. This means that we tend to say only as much as necessary to make the other person understand, but not so little that what we say is unclear. The problem is that how much is necessary varies between listeners, and that non-native speakers often don’t know things native speakers are assumed to know.

This kind of efficiency is captured by one of Paul Grice’s maxims, principles that we intuitively follow to facilitate conversation. The maxim of quantity basically states that we try to be informative, but not more than necessary (see e.g. Wardhaugh, 2006). Consider the case of asking the way to the nearest metro station in Chinese. We could imagine a few different answers:

Just keep walking.
Keep walking, turn right at the next intersection, then when you reach Dà’ān sēnlín gōngyuán turn left, and you’ll see the station ahead of you.
Keep walking on the pavement instead of in the middle of the road, because there are cars there, which are wheeled vehicles that move quickly, so if you’re hit by one, you might be injured or even killed. When you get to the next intersection, which has three lanes in each direction, remember to use the zebra crossing, which is an area with white stripes painted on the road to indicate that you should cross there. Turn right and walk 283 metres until you reach Dà’ān sēnlín gōngyuán, which is a 259,293 square metre area with trees and a lake. Then turn left and you’ll see the station ahead of you.

The first answer contains too little information. Technically, it’s true that just walking can get you there, but since it doesn’t include information about where to turn, this is not very helpful. It violates the maxim of quantity by not including enough information.

The second answer is closer to what would be considered a helpful and appropriate answer, but note that it assumes that you know some things. If you’re new in town, you might not know what Dà’ān sēnlín gōngyuán (大安森林公園) is and might not be able to recognise it if you’re unable to decode the name in Chinese. If you don’t, the name might refer to anything, so while the instructions would be perfectly clear to most natives, they might be impossible to understand for a foreign student.

The third answer is clearly too verbose, including lots of information that is either irrelevant or that the listener can be assumed to know already. Anybody asking this question will know what a zebra crossing is and be aware of the dangers of being hit by a car, and the extra information about the park might be interesting, but is not relevant.

Listening comprehension can require prior knowledge you don’t have

The point is that the speaker’s idea of what the listener knows might be incorrect. It is, after all, hard to know what a foreigner knows and doesn’t know, and even harder to examine what prior information is required to make sense of what one says. If you’re listening to someone who isn’t speaking directly to you, such as if you watch a movie or listen to a podcast, the speaker doesn’t even know who you are, and so can’t adjust the amount of information to include even if they wanted to.

In some cases, we might have the knowledge necessary to make sense of something, but it’s hidden below a layer of superficial differences, such as an accent or different ways of naming the same thing. A good example of this is foreign names in Chinese: if someone refers to the famous Saylaw, you might not understand, whereas if they had said Cristiano Ronaldo, you would. I explored this particular challenge in an article called Lost in transcription: Saylaw, Ice Island and Aristotle.

Three types of prior knowledge

In summary, we have three kinds of prior knowledge that operate in the utilisation step:

Pragmatic knowledge, which is about how words are used in real-world contexts, including circumlocution, euphemism, figurative language, politeness and much more. In Chinese, this includes knowing that nǐ chī le ma? (你吃了吗) can be used as a greeting, that somebody saying “no” might actually mean “yes”, that “yes” might sometimes mean “no” because the speaker is too polite to say so directly, or that zǒu le (走了) in some contexts can mean “die”, not just “leave”.
Discursive knowledge, which is about the structure of conversations in given contexts. For example, phone calls follow a certain pattern, as do conversations you have with the staff immediately after entering a restaurant. These also differ between cultures, and assuming the wrong structure will make it harder to understand what’s being said. This is why trying out your Chinese in a completely new setting can be unexpectedly hard, even if you know the words and grammar needed.
World knowledge, which is information about the world we live in. A good example is knowing about Dà’ān sēnlín gōngyuán, as is knowing what a pop culture reference in a TV show is about, or being able to link Saylaw to Cristiano Ronaldo (and knowing who he is). While not strictly part of the listening process, this information is vital for comprehension, and once you reach a more advanced level, most failures in comprehension come from not having enough background knowledge.

How prior knowledge helps us identify sounds and words

We have now looked at some cases where prior knowledge is required to make sense of spoken language, but this is not the only way top-down processing is important. Our evolving understanding of a situation can also be directly used to guide and support both perception and parsing. One way of looking at it is to say that since we have an outline already, it becomes much easier to identify the right building blocks and figure out where to put them.

For example, if a word meaning that you have activated in the parsing step does not fit with your conceptual framework of the situation, you might surmise that something went wrong in the perception step. Maybe you thought someone was selling their house, using mài (卖), but when they start complaining about the high price, you can repair your perceptual error and realise that they are probably buying a house, mǎi (买). You misheard the tone and activated the wrong word, but your understanding of the situation helped you fix the problem. The example I mentioned at the beginning with xiōngmáo (胸毛) and xióngmāo (熊猫) at the zoo is a similar case. I discussed this more here: Tone errors in Mandarin that actually can cause misunderstandings.

Corrections like these can be a conscious process as I described above, a form of problem solving, but they’re usually subconscious and happen all the time. The brain tries to figure out what words might be the right ones and does so based on all the information available, eliminating candidates whose meaning seems irrelevant or that don’t fit the available data.

This is how we are able to tell the difference between words that are pronounced the same way without even noticing that we do so. When you hear “they’re selling their house over there”, you don’t have to stop to think which word is which, even though “they’re”, “their” and “there” are pronounced the same, but mean completely different things.
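For readers who like to think in code, here is a toy sketch of the idea that word candidates are weighed using both what was heard and what the situation makes likely. It is only an illustration, not part of Vandergrift’s model or a claim about how the brain actually computes anything: the function name pick_word, the simple multiplication of the two scores and all the numbers are my own assumptions.

```python
def pick_word(acoustic, context):
    """Combine bottom-up and top-down evidence for each candidate word.

    acoustic: how well each candidate matches what was heard (0 to 1)
    context:  how plausible each candidate is in the situation (0 to 1)
    All numbers are invented for illustration (hypothetical values).
    """
    # Multiply the two kinds of evidence; a candidate missing from the
    # context dictionary is treated as implausible (score 0).
    scores = {word: acoustic[word] * context.get(word, 0.0) for word in acoustic}
    total = sum(scores.values())
    if total == 0:
        return None, {}
    # Normalise so the scores sum to 1 and can be read as relative confidence.
    posterior = {word: score / total for word, score in scores.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# The listener heard something closer to mài (sell) than mǎi (buy) ...
heard = {"mài 卖 (sell)": 0.7, "mǎi 买 (buy)": 0.3}
# ... but the speaker is complaining about high prices, so "buy" fits far better.
situation = {"mài 卖 (sell)": 0.1, "mǎi 买 (buy)": 0.9}

best, posterior = pick_word(heard, situation)
print(best)
```

With these made-up numbers, the context outweighs the misheard tone and “buy” wins. The same logic covers true homophones: if the acoustic scores are identical, only the context decides.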

How to test your top-down processing with a voice call

In our native language, we’re very good at top-down processing, which means that we don’t need to rely as much on bottom-up processing. If you call a friend, and the audio quality is bad, you can often still have a conversation, even though you certainly can’t identify all the sounds and might even miss entire words.

You know enough about the topic of the conversation, what has already been said, who your friend is and so on, to be able to fill in the gaps. I recently had a conversation with my brother over the phone in Swedish, and even though he was on the metro and the connection was so bad I only heard every other word, I could still understand what he wanted to say. This would have been completely impossible without heavy use of top-down processing, which happened completely naturally and without me thinking about it at the time.

This is not the case when listening to low-quality audio in a foreign language, however, where even the slightest distortion can throw you off completely. I once sat a TOCFL exam in a classroom with so much echo that I hardly understood anything, even though the language was well within my capacity to understand. Announcements in busy train stations are the worst, though!

Linguistic knowledge about Mandarin

Apart from the higher-level forms of knowledge we have mostly focused on so far, we also need to rely on knowledge about the language itself to be able to process it. We already discussed this in the previous article, but to summarise briefly, we need phonological knowledge to be able to identify phonemes, tones, intonation, stress and how these vary in context.

We also need semantic knowledge, which is about the meaning of words. This can be tricky because Chinese has so few words in common with English. When learning closely related languages, we can rely on similar words we already know in other languages, greatly facilitating and speeding up parsing. People who haven’t tried to learn a truly foreign language seldom appreciate how much of a difference this makes. Understanding spoken French is not the same thing as understanding spoken Mandarin if you’re a native speaker of English.

Finally, we also need syntactic knowledge, or knowledge about the grammar and structure of Mandarin. Considering that there are many similar-sounding words in Mandarin, it’s especially important to be able to discard word candidates because of their function in a sentence (as we saw with the “they’re”, “there” and “their” example above). If only one word fits grammatically in the sentence you hear, you can ignore any other similar-sounding words.

Summarising bottom-up and top-down processing

Before we wrap up, here’s Vandergrift’s model again for those of you who want a visual representation of how everything we’ve talked about in this and the previous article relates to each other:

Automated processing is the key to listening comprehension

It’s fascinating that the things I’ve discussed in this series so far happen fast enough to keep up with the pace of naturally spoken language. The actual process is more involved than I have described here, too, so it really is amazing that we understand anything at all!

The reason this works is mostly that processing is highly automated, meaning that we’ve done it so many times that it takes no conscious effort or attention. This doesn’t mean that attention is unimportant, though, and you can aid processing somewhat by directing your attention.

This will be covered in more detail later in this series, but just to mention one example, activating prior knowledge can be done consciously and can speed up processing significantly. In other words, simply thinking about what people might say in a given situation will activate the parts of the brain where this knowledge is stored, and when you later hear someone talk about this, processing is sped up.

Similarly, by directing your attention to contextual clues, you can aid top-down processing, which will make it easier to form a prototype of what the speaker wants to convey, which in turn will support your ability to discern sounds and identify words. You will not always be correct and you’ll sometimes need to backtrack and revise your model, but top-down processing like this is crucial for good listening comprehension!

In the next article in this series, we will talk more about automated and controlled processing, where the former can deal with many things at once and happens subconsciously, and the latter can only deal with one thing at a time, but is controlled by the listener. Stay tuned!

References and further reading

Field, J. (2009). Listening in the Language Classroom. Cambridge University Press.

Rost, M. (2011). Teaching and researching: Listening (2nd ed.). Routledge.

Vandergrift, L. (2011). Second language listening: Presage, process, product, and pedagogy. In Handbook of research in second language teaching and learning (pp. 455-471). Routledge.

Vandergrift, L., & Goh, C. (2012). Teaching and learning second language listening: Metacognition in action. Routledge.

Wardhaugh, R. (2006). An Introduction to Sociolinguistics. Blackwell.
