Why inner translation slows down your German — and what works instead
Inner translation is the main reason you do not reach fluency. We show what the direct method is and why learning through images works faster.
8 min read
Almost every adult who starts learning German from zero or from A2 runs into the same dead end. In class you understand the grammar; in an app you collect 2,000 words; and yet, as soon as you have to say something live, the brain freezes: you search for the word in your native language, translate it in your head, and only then speak. The result is halting speech, the sense that you “don’t know anything”, and creeping disillusionment.
That pause is not just your personal problem. It is the direct consequence of how you learnt. If every German word arrived as a pair with a translation, the brain stored exactly that route: hear the German word → recall the native word → understand the meaning. The same in reverse: want to say a particular word → think of the native one → look up the translation → speak. Two transitions per word.
With 200 words you don’t notice. With 2,000 it slows you down. From 5,000 onwards it is the main brake on fluency.
The direct method: an idea well over a century old
In 1878 Maximilian Berlitz opened a language school in Providence, Rhode Island. When he fell ill, a young Frenchman, Nicolas Joly, took over his classes, although he spoke no English at all. Berlitz asked him to point at objects, name them in French and do without translation. Within a few weeks the students were speaking. Over the following decades the school grew to some 200 branches worldwide.
This is the direct method (Berlitz Direct Method): learning without the native language as a mediator. The word is bound to an image, an object, a scene — bypassing translation. The brain learns to think in German from day one, instead of translating in its head.
Later the same principle underpinned Rosetta Stone. It also fits what cognitive scientists describe as embodied cognition: the view that concepts are stored more firmly and retrieved faster when they are encoded through sensory experience, not through a text label in the native language.
What happens in the brain when you translate
You have two competing pathways:
- Label–label. German word → native-language label → concept. Two transitions, two sources of delay.
- Word–image. German word → concept directly. One transition.
The first pathway does not vanish on its own. If you started with translation pairs, the brain has stored that connection as the main route. To reach fluency you must either spend a long, painful time retraining that habit, or learn a different way from the start.
There is another effect. According to dual coding theory, words and images are handled by separate but linked memory systems. If you learn a word through image and sound as well as text, several redundant retrieval pathways are built: if one fails, another takes over. The more sensory channels take part in encoding, the more reliable the recall.

Why vocabulary apps fail at intermediate level
Duolingo, Babbel, Memrise and Anki are, in their standard configuration, built on translations. It is a rational choice: images are expensive, translations are cheap, and for the first 500 words the method works.
The trouble starts after that. Try to learn the word Verantwortung through translation. The literal equivalent looks clear, but in German this word carries a heavy legal undertone — it is not interchangeable with Haftung or Pflicht. The translation swallows those differences. You remember the native-language word and believe you have learnt the German one.
Or take Gemütlichkeit. There is no single equivalent. If you learn through translations, you end up with a blend of several semantically close words and no clear sense of when the word is actually used. If you learn through an image (a warm kitchen, lamplight, a table with coffee, a conversation without hurry), you understand the word the way Germans understand it.
What “learning through images” really means
It does not mean just sticking a picture next to a translation. It is a different card architecture:
- The German word — large.
- The image — an illustration that carries the concept without words.
- The pronunciation — so the word enters the auditory memory and you hear a native speaker’s stress and phonetics.
- An example in context — a short sentence in which the word lives in real language.
- The article (for nouns) — built in visually, not stored as a separate fact to memorise.
And, above all, no translation on the card. When you see a word for the first time, approach it through image, example and sound. If after three seconds you still do not get it — fine, then reveal a hint. But the primary retrieval route must work without the detour through the native language.
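The card layout described above can be sketched as a small data structure. This is only an illustration under stated assumptions, not how any particular app stores its cards: the class name `Card`, its field names, the file paths and the `hint_after` helper are all hypothetical. The one value taken from the text is the three-second wait before a hint is revealed.

```python
from dataclasses import dataclass
from typing import Optional

# From the text: reveal a hint only after about three seconds of trying.
HINT_DELAY_SECONDS = 3.0


@dataclass(frozen=True)
class Card:
    """One vocabulary card. All field names are illustrative assumptions."""
    word: str                       # the German word, shown large
    image_path: str                 # illustration that carries the concept without words
    audio_path: str                 # native-speaker pronunciation
    example: str                    # short German sentence using the word
    article: Optional[str] = None   # "der", "die" or "das" for nouns, otherwise None
    hint: Optional[str] = None      # fallback hint, never shown on the front

    def front(self) -> dict:
        """Return what the learner sees first: no translation and no hint."""
        label = f"{self.article} {self.word}" if self.article else self.word
        return {
            "label": label,
            "image": self.image_path,
            "audio": self.audio_path,
            "example": self.example,
        }

    def hint_after(self, seconds_elapsed: float) -> Optional[str]:
        """Reveal the hint only once the learner has tried for long enough."""
        if seconds_elapsed >= HINT_DELAY_SECONDS:
            return self.hint
        return None


card = Card(
    word="Entscheidung",
    image_path="images/entscheidung.png",  # hypothetical path
    audio_path="audio/entscheidung.mp3",   # hypothetical path
    example="Das war eine schwere Entscheidung.",
    article="die",
    hint="decision",
)
```

The point of the design is that the front of the card has no translation field at all, so the primary retrieval route cannot fall back on the native language. The hint exists, but it sits behind a delay.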
Abstract words and the limits of the method
You may rightly ask: “Apfel is easy to draw, but Freiheit?” That is exactly the point where poorly made picture apps fall apart.
The solution: abstract words are encoded through metaphor and scene, not by trying to “draw freedom”. Freiheit: an open door, a horizon, a person on a hilltop with arms spread wide. Verantwortung: a hand holding a heavy weight. Entscheidung: a fork in the road. The brain encodes abstract concepts well through concrete images, provided the image is chosen methodically rather than at random.
Finding a fitting image for each of 10,000 words, all drawn in one consistent style, is an expensive task. That is exactly what we do.

What this looks like in practice
For learners who switch from the translation approach to the visual one, three things tend to change:
- The pause before speaking disappears. The word arrives in consciousness without the detour through the native language — the answer is faster.
- A sense of context emerges on its own. After 500–1,000 visually learnt words you intuitively feel which of the synonyms fits here — like a native speaker, not a translator.
- Words stop “getting forgotten”. The brain remembers images better than text, a well-documented finding known as the picture superiority effect.
That does not mean translations are useless. For the first hundred basic words they are the best tool — don’t overcomplicate. But from A2 onwards the visual approach gives a noticeable edge.
What to do next
If you are hitting the “intermediate ceiling”, consciously change the way you learn new words. Pick a word, find a vivid image or a short video for it, look at it for a few seconds, listen to the pronunciation, say the word out loud, and do not say the native-language equivalent even once. It feels strange at first. After a week you will notice the difference in how the word comes to you when you speak.
Artikle does the same thing — but at scale: 10,000+ cards in a single style, with correct pronunciation from German native speakers, and a repetition algorithm that brings a word back exactly when you start to forget it. We are launching the app shortly — join the waitlist to get early access.