What’s a Language Model? (In 3 Minutes)

You’re hearing “language model” everywhere. GPT-4, Claude, Llama, Gemini — they’re all language models. But what exactly is that?

In 3 minutes, you’ll understand. Promise.

A supercharged word predictor

A language model is a computer program that does one thing: predict the next word.

You give it “The Montreal Canadiens won the game” and it predicts what the next word might be: “last,” “against,” “thanks.” It picks one of the most likely and continues. Word by word, it builds sentences, paragraphs, entire pages.

It’s the autocomplete on your phone, but on steroids. Your phone predicts one word. A language model predicts entire texts.
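For the curious, here’s a toy sketch of that loop in Python. The phrases and probabilities are invented for illustration; a real model computes them with a neural network trained on billions of words, not a hand-written table.

```python
import random

# Invented probabilities for which word follows a given phrase.
# A real model computes these with a neural network, not a table.
NEXT_WORD = {
    "The Montreal Canadiens won the game": {"last": 0.4, "against": 0.35, "thanks": 0.25},
    "The Montreal Canadiens won the game last": {"night": 0.9, "week": 0.1},
}

text = "The Montreal Canadiens won the game"
while text in NEXT_WORD:
    options = NEXT_WORD[text]
    # Pick one of the likely words (weighted by probability) and continue.
    word = random.choices(list(options), weights=list(options.values()))[0]
    text = text + " " + word

print(text)
```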

How it knows what to predict

It was trained on a staggering amount of text. Books, articles, forums, websites, documents. Imagine your neighborhood library, multiplied by a million.

By reading all of that, the model learned the unwritten rules of language. Not just grammar — style, tone, how ideas connect, facts, opinions. It’s seen so many texts that it can generate new ones that look like something a human would write.

The word “model” is key. It’s a model of human language. A statistical representation of how humans use words. Not a copy of the brain. Not understanding. A model.
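To make “statistical representation” concrete, here’s the crudest possible version of the idea in Python: count which word follows which in some text, then predict from the counts. Real models are incomparably more sophisticated, but the spirit is the same.

```python
from collections import Counter, defaultdict

text = "the cat sat on the mat and the cat slept".split()

# Count which word follows which: a tiny "bigram" model of this corpus.
follows = defaultdict(Counter)
for word, next_word in zip(text, text[1:]):
    follows[word][next_word] += 1

# After "the", the counts say "cat" is the most likely next word.
print(follows["the"].most_common(1))  # [('cat', 2)]
```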

Why it’s impressive

What’s wild is that just by predicting the next word, the model develops abilities nobody taught it. It can summarize, translate, explain, code, reason (a bit), crack jokes, write poetry.

Nobody told it how to write a summary. But it’s read so many summaries that it knows what one looks like. And it can generate a new one that fits whatever text you give it.

It’s like a musician who’s listened to so much music they can improvise in any style. They didn’t consciously learn the “rules” of jazz — they absorbed them through exposure.

The limits you need to know

A language model doesn’t know what’s true. It knows what’s probable. That’s a massive difference. If the most likely text after “The capital of Australia is” is “Sydney” — because Sydney shows up far more often in text about Australia — it’ll say Sydney. Even though the right answer is Canberra.
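Here’s the same trap sketched with made-up numbers. The scores reflect how often words appear in text, not whether they’re true:

```python
# Invented scores: how "likely" each word might look to a model
# after "The capital of Australia is". Frequency in training text,
# not truth, is what shapes these numbers.
likely_next = {"Sydney": 0.55, "Canberra": 0.30, "Melbourne": 0.15}

best_guess = max(likely_next, key=likely_next.get)
print(best_guess)  # "Sydney" -- probable, but wrong
```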

It doesn’t have real-time Internet access (unless specifically connected). Its knowledge stops at its training cutoff. It doesn’t know what happened yesterday.

And it makes things up. With confidence. Because its job is to produce probable text, not true text. That’s why you should always verify important facts.

There, you know the essentials

A language model predicts words. It learned by reading billions of texts. It’s impressive but not 100% reliable. And understanding that is the foundation for using all these tools well — ChatGPT, Claude, Gemini, or Sherpa, our free AI guide.

For a deeper technical understanding, Laeka Research publishes accessible research on how these models work and how to improve them.
