OmniGPT Context Window Limits

Written by Bao Nguyen

In the world of Large Language Models (LLMs), the context window acts like a spotlight on a conversation. It defines the amount of text the LLM can consider when generating a response.

What is a context window?

  • The context window is essentially a limited history that the LLM uses to understand the current prompt or question.

  • Imagine you're reading a book but with a small window that only shows a few sentences at a time. As you move forward, previous sentences disappear, replaced by new ones. That's similar to how context windows function in LLMs.
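The sliding-window analogy above can be sketched in a few lines of code. This is a toy illustration, not how production systems tokenize: it counts whitespace-separated words as "tokens" (real models use subword tokenizers), and drops the oldest messages first until the remainder fits the budget.

```python
# Minimal sketch of a sliding context window, assuming a toy
# whitespace tokenizer (real models use subword tokenizers).

def sliding_window(messages, budget):
    """Keep only the most recent messages whose total token count
    fits within `budget`, dropping the oldest first."""
    kept = []
    used = 0
    for msg in reversed(messages):          # walk newest -> oldest
        tokens = len(msg.split())           # toy token count
        if used + tokens > budget:
            break                           # older messages fall out of view
        kept.append(msg)
        used += tokens
    return list(reversed(kept))             # restore chronological order

history = [
    "Hello there",                          # oldest
    "Tell me about context windows",
    "Why do older messages disappear",      # newest
]
print(sliding_window(history, budget=10))
# → ['Tell me about context windows', 'Why do older messages disappear']
```

Just as in the book analogy, the earliest message "disappears" once the newer ones have used up the window.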

Impact of Context Window Size

  • The size of the context window is crucial for an LLM's performance in several ways:

    • Coherence and Relevance: Larger windows allow the LLM to consider more context, leading to more relevant and coherent responses.

    • Memory and Processing: Bigger windows demand more processing power and memory.

    • Task Performance: The optimal size depends on the specific task. For instance, summarizing a long document might require a larger window than completing a short sentence.

Understanding Context vs. Limits: Don't Get Confused by Tokens

Many people confuse a model's context window with its output token limit. Here's the key difference: the context window is the model's working memory, the total amount of text it can consider at once for coherence (e.g., 128K tokens for GPT-4 Turbo). The output token limit, by contrast, caps the length of a single response (e.g., 4,096 tokens for GPT-4 Turbo). So a large context window lets the model follow a long, complex discussion, but each response it generates is still subject to a separate size limit.
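The distinction can be made concrete with a small validation sketch. The figures below are the GPT-4 Turbo numbers quoted in this article (128K context window, 4,096-token response cap), and `count_tokens` is a toy stand-in for a real tokenizer:

```python
# Sketch of checking the two separate limits before a request,
# assuming the GPT-4 Turbo figures quoted in the article.

CONTEXT_WINDOW = 128_000   # total tokens the model can "see" at once
MAX_OUTPUT = 4_096         # cap on a single response

def count_tokens(text):
    # Toy stand-in: real tokenizers count subword units, not words.
    return len(text.split())

def check_request(prompt, requested_output):
    """Prompt plus requested output must fit the context window,
    and the output alone must respect the response cap."""
    if requested_output > MAX_OUTPUT:
        return False, "requested output exceeds the response limit"
    if count_tokens(prompt) + requested_output > CONTEXT_WINDOW:
        return False, "prompt + output exceed the context window"
    return True, "ok"

print(check_request("Summarize our discussion so far", 4_096))
# → (True, 'ok')
print(check_request("Write a novel", 10_000))
# fails: 10,000 requested tokens exceed the 4,096 response cap
```

Both checks matter: a prompt can fit comfortably in the window while the requested response still violates the output cap, and vice versa.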



OmniGPT's Context Window Sizes and Token Limits by Model

| Model | Context Window Size | Token Limit |
| --- | --- | --- |
| GPT 3.5 Turbo | 16K tokens | 4K tokens |
| GPT 3.5 Turbo 16k | 16K tokens | 4K tokens |
| GPT 4 | 8K tokens | 4K tokens |
| GPT 4 Turbo | 128K tokens | 4K tokens |
| Claude 2.0 | 100K tokens | 4K tokens |
| Claude 2.1 | 200K tokens | 4K tokens |
| Claude 3 Haiku | 200K tokens | 4K tokens |
| Claude 3 Sonnet | 200K tokens | 4K tokens |
| Claude 3 Opus | 200K tokens | 4K tokens |
| Perplexity 7B Online | 4K tokens | 4K tokens |
| Perplexity 70B Online | 4K tokens | 4K tokens |
| Llama 2 13B | 4K tokens | 4K tokens |
| Llama 2 70B | 4K tokens | 4K tokens |
| Llama 3 8B Instruct | 16K tokens | 16K tokens |
| Llama 3 70B Instruct | 8.2K tokens | 8.2K tokens |
| Gemini Pro 1.5 | 2.8M tokens | 22.9K tokens |
| Mixtral 8x7B (instruct) | 32.8K tokens | 32.8K tokens |
| Mixtral 8x22B (base) | 66K tokens | 66K tokens |
| Mixtral 8x22B (instruct) | 66K tokens | 66K tokens |
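An application can use the table above as data to pick a model whose limits fit a given job. The sketch below encodes a subset of the article's figures (rounded as listed; actual limits may change) and filters models by a prompt size and requested output:

```python
# Subset of the article's table as a lookup, with a helper that
# filters models by prompt size and requested output. Figures are
# as quoted in this article and may change over time.

MODEL_LIMITS = {               # name: (context window, response token limit)
    "GPT 3.5 Turbo":  (16_000, 4_000),
    "GPT 4":          (8_000, 4_000),
    "GPT 4 Turbo":    (128_000, 4_000),
    "Claude 2.1":     (200_000, 4_000),
    "Claude 3 Opus":  (200_000, 4_000),
    "Gemini Pro 1.5": (2_800_000, 22_900),
}

def models_that_fit(prompt_tokens, output_tokens):
    """Models whose context window holds prompt + output and whose
    response cap covers the requested output, sorted by name."""
    return sorted(
        name
        for name, (ctx, cap) in MODEL_LIMITS.items()
        if prompt_tokens + output_tokens <= ctx and output_tokens <= cap
    )

print(models_that_fit(150_000, 4_000))
# → ['Claude 2.1', 'Claude 3 Opus', 'Gemini Pro 1.5']
```

For a 150K-token prompt, only the 200K-plus-window models qualify; GPT-4 Turbo's 128K window is too small even though its response cap would suffice.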

What can you do?

Even without knowing the exact size, you can still influence how LLMs leverage context:

  • Prompt Engineering: By carefully crafting prompts that provide relevant information and summarize key points, you can guide the LLM's attention within its context window.
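One common way to apply this tactic is to summarize older conversation turns and keep only recent ones verbatim, so the material that matters fits inside the window. A minimal sketch (the section labels here are illustrative, not a required format):

```python
# Sketch of the prompt-engineering tactic described above: summarize
# older turns, keep recent turns verbatim, then append the question.
# Section labels are illustrative assumptions, not a required format.

def build_prompt(summary, recent_turns, question):
    """Assemble a compact prompt from a summary of earlier context,
    recent messages kept verbatim, and the current question."""
    lines = ["Summary of earlier conversation:", summary, ""]
    lines += ["Recent messages:"] + recent_turns + [""]
    lines += ["Question:", question]
    return "\n".join(lines)

prompt = build_prompt(
    summary="User is comparing context window sizes across models.",
    recent_turns=["User: Does GPT-4 Turbo fit a 100K-token document?"],
    question="Which models would work?",
)
print(prompt)
```

The summary compresses old context into a few tokens, leaving more of the window for the details the model actually needs to answer well.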

We hope this explanation clarifies the concept of context windows in LLMs!
