Module 1: Understanding Large Language Models

Large Language Models (LLMs) are sophisticated AI systems, trained on vast amounts of text data, that can understand, generate, and manipulate human language. These powerful tools form the foundation of modern AI applications like chatbots, content generators, and virtual assistants.

Hands-On Lab:
Try the LLM Foundations Lab in Jupyter!
Launch the companion lab notebook to experiment with context windows, tokenization, embeddings, and more using real examples.

What You'll Learn

In this module, you'll master the following key areas:

  • Context Window: How LLMs process and limit information
  • Tokenization: How text is broken down for model processing
  • Embeddings: How LLMs represent meaning and relationships
  • Logits & Temperature: How LLMs make predictions and control creativity
  • Response Format: How to structure and interpret model outputs
  • Model Evolution: Advances in LLM architectures and capabilities

1. Context Window: The Model's Working Memory

Concept: The context window is the model's "working memory"—the total number of tokens (chunks of text) it can consider at once. This includes both your input and the model's output.
Modern LLMs are built on the transformer architecture (introduced by Google researchers in 2017), whose attention mechanism lets the model attend to every token in this window at once, and to nothing outside it.

Everyday Example: Imagine a whiteboard with limited space. You write your question (input tokens) and leave room for the model's answer (output tokens). If your question fills the board, there's less space for the answer. If you run out of space, the model stops writing—even mid-sentence.

Diagram: Your Prompt (e.g., 3,000 tokens) + Model's Response (up to 5,000 tokens) = Context Window of 8,000 tokens total (input + output).
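
To make the budget arithmetic concrete, here is a minimal sketch that counts a prompt's tokens and computes how much room is left for the answer. It assumes the tiktoken library (OpenAI's open-source tokenizer); the 8,000-token window and the prompt text are illustrative, not tied to any specific model.

    import tiktoken  # OpenAI's open-source tokenizer (pip install tiktoken)

    CONTEXT_WINDOW = 8_000  # hypothetical total budget: input + output
    prompt = "Summarize the causes of the French Revolution in three bullet points."

    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4-era models
    input_tokens = len(enc.encode(prompt))      # tokens consumed by the prompt
    room_for_answer = CONTEXT_WINDOW - input_tokens

    print(f"Prompt uses {input_tokens} tokens; {room_for_answer} remain for the response.")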

Why it matters:

  • Hard limit: input tokens + output tokens ≤ max context window
  • If your input is large, you have less room for the model's answer.
  • If you hit the limit, the model will stop—sometimes in the middle of a sentence.
  • This applies to all transformer-based models, including OpenAI's GPT-4o and o1, Anthropic's Claude 3.7, and Amazon's Nova Premier.
  • Cost control: Although every foundation model has a maximum context window, most APIs let you set a smaller output limit (using parameters like max_tokens) to control costs or keep responses shorter, as shown in the sketch after this list.
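
Here is a hedged sketch of that output cap using the OpenAI Python SDK; the model name, prompt, and 150-token cap are illustrative choices, not recommendations. When the cap is hit, the API reports a finish_reason of "length" and the reply may end mid-sentence.

    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain context windows in one paragraph."}],
        max_tokens=150,  # hard cap on output tokens, independent of the model's maximum window
    )

    print(response.choices[0].message.content)
    # "length" here means the cap was hit and the answer was cut off mid-stream
    print("finish_reason:", response.choices[0].finish_reason)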

Note on Reasoning Models & Token Budgets: Newer models (like OpenAI's o1, Anthropic's Claude 3.7, and Amazon's Nova Premier) support very large context windows, in some cases up to 1 million tokens. These models can "think" for longer and perform multi-step reasoning, but every intermediate step also consumes tokens. Many APIs let you control this with a budget_tokens or reasoning-budget parameter, so you can balance depth of reasoning against cost and performance; see the sketch below.
💡Tip: Keep your prompts concise and leave enough space for the model's answer—especially for complex tasks that need extended reasoning.
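
As one concrete instance, Anthropic's Messages API exposes a budget_tokens parameter for Claude's extended thinking. The sketch below assumes the anthropic Python SDK; the model alias and exact limits are illustrative, so check the current documentation before relying on them. Note that the thinking budget counts against max_tokens, so the total output cap must exceed it.

    import anthropic  # pip install anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-3-7-sonnet-latest",  # illustrative alias; pin a dated model in production
        max_tokens=2_000,                  # total output cap, including thinking tokens
        thinking={"type": "enabled", "budget_tokens": 1_024},  # reasoning budget
        messages=[{"role": "user", "content": "Is 1,001 prime? Think it through."}],
    )

    # The response interleaves "thinking" blocks with the final "text" block.
    for block in response.content:
        print(block.type)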

Resources