OmniGPT Context Window Limits

Written by Bao Nguyen

In the world of Large Language Models (LLMs), the context window acts like a spotlight on a conversation. It defines the amount of text the LLM can consider when generating a response.

What is a context window?

  • The context window is essentially a limited history that the LLM uses to understand the current prompt or question.

  • Imagine you're reading a book but with a small window that only shows a few sentences at a time. As you move forward, previous sentences disappear, replaced by new ones. That's similar to how context windows function in LLMs.
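The sliding-window analogy above can be sketched in a few lines of code. This is a toy illustration, not how production systems tokenize: it counts whitespace-separated words as "tokens" (real models use subword tokenizers), and drops the oldest messages first until the remainder fits the budget.

```python
# Minimal sketch of a sliding context window, assuming a toy
# whitespace tokenizer (real models use subword tokenizers).

def sliding_window(messages, budget):
    """Keep only the most recent messages whose total token count
    fits within `budget`, dropping the oldest first."""
    kept = []
    used = 0
    for msg in reversed(messages):          # walk newest -> oldest
        tokens = len(msg.split())           # toy token count
        if used + tokens > budget:
            break                           # older messages fall out of view
        kept.append(msg)
        used += tokens
    return list(reversed(kept))             # restore chronological order

history = [
    "Hello there",                          # oldest
    "Tell me about context windows",
    "Why do older messages disappear",      # newest
]
print(sliding_window(history, budget=10))
# → ['Tell me about context windows', 'Why do older messages disappear']
```

Just as in the book analogy, the earliest message "disappears" once the newer ones have used up the window.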

Impact of Context Window Size

  • The size of the context window is crucial for an LLM's performance in several ways:

    • Coherence and Relevance: Larger windows allow the LLM to consider more context, leading to more relevant and coherent responses.

    • Memory and Processing: Bigger windows demand more processing power and memory.

    • Task Performance: The optimal size depends on the specific task. For instance, summarizing a long document might require a larger window than completing a short sentence.

Understanding Context vs. Limits: Don't Get Confused by Tokens

Many people confuse a model's context window with its output token limit. Here's the key difference: the context window is the model's working memory, the total amount of text it can consider at once for coherence (e.g., 128K tokens for GPT-4 Turbo). The output token limit, by contrast, caps the length of a single response (e.g., 4,096 tokens for GPT-4 Turbo). So a large context window lets the model follow a long, complex discussion, but each response it generates is still subject to a separate size limit.
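The distinction can be made concrete with a small validation sketch. The figures below are the GPT-4 Turbo numbers quoted in this article (128K context window, 4,096-token response cap), and `count_tokens` is a toy stand-in for a real tokenizer:

```python
# Sketch of checking the two separate limits before a request,
# assuming the GPT-4 Turbo figures quoted in the article.

CONTEXT_WINDOW = 128_000   # total tokens the model can "see" at once
MAX_OUTPUT = 4_096         # cap on a single response

def count_tokens(text):
    # Toy stand-in: real tokenizers count subword units, not words.
    return len(text.split())

def check_request(prompt, requested_output):
    """Prompt plus requested output must fit the context window,
    and the output alone must respect the response cap."""
    if requested_output > MAX_OUTPUT:
        return False, "requested output exceeds the response limit"
    if count_tokens(prompt) + requested_output > CONTEXT_WINDOW:
        return False, "prompt + output exceed the context window"
    return True, "ok"

print(check_request("Summarize our discussion so far", 4_096))
# → (True, 'ok')
print(check_request("Write a novel", 10_000))
# fails: 10,000 requested tokens exceed the 4,096 response cap
```

Both checks matter: a prompt can fit comfortably in the window while the requested response still violates the output cap, and vice versa.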



OmniGPT's Context Window Sizes and Token Limits by Model

| Model | Context Window Size | Token Limit |
| --- | --- | --- |
| GPT 3.5 Turbo | 16K tokens | 4K tokens |
| GPT 3.5 Turbo 16k | 16K tokens | 4K tokens |
| GPT 4 | 8K tokens | 4K tokens |
| GPT 4 Turbo | 128K tokens | 4K tokens |
| Claude 2.0 | 100K tokens | 4K tokens |
| Claude 2.1 | 200K tokens | 4K tokens |
| Claude 3 Haiku | 200K tokens | 4K tokens |
| Claude 3 Sonnet | 200K tokens | 4K tokens |
| Claude 3 Opus | 200K tokens | 4K tokens |
| Perplexity 7B Online | 4K tokens | 4K tokens |
| Perplexity 70B Online | 4K tokens | 4K tokens |
| Llama 2 13B | 4K tokens | 4K tokens |
| Llama 2 70B | 4K tokens | 4K tokens |
| Llama 3 8B Instruct | 16K tokens | 16K tokens |
| Llama 3 70B Instruct | 8.2K tokens | 8.2K tokens |
| Gemini Pro 1.5 | 2.8M tokens | 22.9K tokens |
| Mixtral 8x7B (instruct) | 32.8K tokens | 32.8K tokens |
| Mixtral 8x22B (base) | 66K tokens | 66K tokens |
| Mixtral 8x22B (instruct) | 66K tokens | 66K tokens |
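An application can use the table above as data to pick a model whose limits fit a given job. The sketch below encodes a subset of the article's figures (rounded as listed; actual limits may change) and filters models by a prompt size and requested output:

```python
# Subset of the article's table as a lookup, with a helper that
# filters models by prompt size and requested output. Figures are
# as quoted in this article and may change over time.

MODEL_LIMITS = {               # name: (context window, response token limit)
    "GPT 3.5 Turbo":  (16_000, 4_000),
    "GPT 4":          (8_000, 4_000),
    "GPT 4 Turbo":    (128_000, 4_000),
    "Claude 2.1":     (200_000, 4_000),
    "Claude 3 Opus":  (200_000, 4_000),
    "Gemini Pro 1.5": (2_800_000, 22_900),
}

def models_that_fit(prompt_tokens, output_tokens):
    """Models whose context window holds prompt + output and whose
    response cap covers the requested output, sorted by name."""
    return sorted(
        name
        for name, (ctx, cap) in MODEL_LIMITS.items()
        if prompt_tokens + output_tokens <= ctx and output_tokens <= cap
    )

print(models_that_fit(150_000, 4_000))
# → ['Claude 2.1', 'Claude 3 Opus', 'Gemini Pro 1.5']
```

For a 150K-token prompt, only the 200K-plus-window models qualify; GPT-4 Turbo's 128K window is too small even though its response cap would suffice.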

What can you do?

Even without knowing the exact size, you can still influence how LLMs leverage context:

  • Prompt Engineering: By carefully crafting prompts that provide relevant information and summarize key points, you can guide the LLM's attention within its context window.
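One common way to apply this tactic is to summarize older conversation turns and keep only recent ones verbatim, so the material that matters fits inside the window. A minimal sketch (the section labels here are illustrative, not a required format):

```python
# Sketch of the prompt-engineering tactic described above: summarize
# older turns, keep recent turns verbatim, then append the question.
# Section labels are illustrative assumptions, not a required format.

def build_prompt(summary, recent_turns, question):
    """Assemble a compact prompt from a summary of earlier context,
    recent messages kept verbatim, and the current question."""
    lines = ["Summary of earlier conversation:", summary, ""]
    lines += ["Recent messages:"] + recent_turns + [""]
    lines += ["Question:", question]
    return "\n".join(lines)

prompt = build_prompt(
    summary="User is comparing context window sizes across models.",
    recent_turns=["User: Does GPT-4 Turbo fit a 100K-token document?"],
    question="Which models would work?",
)
print(prompt)
```

The summary compresses old context into a few tokens, leaving more of the window for the details the model actually needs to answer well.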

We hope this explanation clarifies the concept of context windows in LLMs!
