TL;DR
| Model | Context Window | Max Output Tokens | Training Data |
|---|---|---|---|
| GPT-5.5 GPT-5.5-Pro | 1,050,000 | 128,000 | Dec 01, 2025 |
| GPT-5.4-Mini GPT-5.4-Nano | 400,000 | 128,000 | Aug 31, 2025 |
| GPT-5 | 400,000 | 128,000 | Sep 30, 2024 |
| GPT-5-Mini GPT-5-Nano | 400,000 | 128,000 | May 31, 2024 |
| GPT-4.1 | 1,047,576 | 32,768 | Jun 01, 2024 |
| GPT-5.3-Codex | 400,000 | 128,000 | Aug 31, 2025 |
ChatGPT token limits depend on where you use the model. The OpenAI API has model-level context windows and max output limits. ChatGPT has plan-level limits, message caps, tool limits, and file limits.
For API users, the most important numbers are the model context window and max output tokens. For ChatGPT users, the most important limits are the plan context window, the number of messages available in a time window, and whether the selected mode uses extra reasoning tokens.
This guide separates those limits so you can quickly see what applies to ChatGPT Free, Plus, Pro, and the OpenAI API.
At a Glance
As of May 2026, GPT-5.5 and GPT-5.4-class API models have context windows up to 1,050,000 tokens and max output limits up to 128,000 tokens. GPT-4.1 models have a 1,047,576-token context window and a 32,768-token max output limit.
ChatGPT plan limits are different from API limits. In ChatGPT, GPT-5.5 Instant uses smaller context windows by plan, while GPT-5.5 Thinking can use larger context windows on paid tiers. ChatGPT message limits also depend on plan, model, mode, rollout status, and usage guardrails.
| User goal | Most useful limit |
|---|---|
| Use ChatGPT Free, Plus, or Pro | ChatGPT plan context and message limits |
| Build with the OpenAI API | API model context window and max output tokens |
| Upload long files | ChatGPT file limits and available context |
| Generate long answers | Max output tokens |
| Fix “too many requests” errors | Rate limits, not token limits |
Quick Comparison
| Limit type | What it controls |
|---|---|
| Context window | Total input, conversation history, instructions, tool context, reasoning tokens, and output budget |
| Max output tokens | Longest possible answer from one API response |
| Message limit | How many ChatGPT messages a plan can send in a time window |
| Rate limit | API throughput across requests, usually requests per minute, tokens per minute, or daily quota |
| File limit | Upload size, number of files, and extracted text inside ChatGPT tools |
Do not mix these limits together. A model can have a large API context window and still have a smaller max output limit. A ChatGPT Plus account can have higher usage than Free, but it does not automatically match every API limit.
ChatGPT Free vs Plus vs Pro Token Limits
ChatGPT limits depend on the plan, the selected model, and the selected mode. The same model family can have different practical limits in ChatGPT and in the OpenAI API.
| ChatGPT mode | Free | Plus / Business | Pro / Enterprise |
|---|---|---|---|
| GPT-5.5 Instant context window | 16K | 32K | 128K |
| GPT-5.5 Thinking context window | Limited availability | 256K paid-tier limit | 400K Pro-tier limit |
| GPT-5.5 Thinking max output | Plan-dependent availability | Up to 128K output within the paid-tier budget | Up to 128K output within the Pro budget |
| GPT-5.5 message allowance | Up to 10 messages every 5 hours | Up to 160 messages every 3 hours | Unlimited access subject to abuse guardrails |
ChatGPT usage limits can change by plan, region, rollout status, and abuse guardrails. Check the model picker and usage notices inside your ChatGPT account before relying on a limit for important work.
OpenAI API Token Limits by Model
OpenAI API token limits apply per request. The context window is the total budget for input and output. The max output token limit is only the response budget.
| Model family | Context window | Max output tokens | Best use |
|---|---|---|---|
| GPT-5.5 / GPT-5.5 Pro | 1,050,000 | 128,000 | Largest current API context and high-end reasoning or generation |
| GPT-5.4 / GPT-5.4 Pro | 1,050,000 | 128,000 | Large-context production apps |
| GPT-5.4 Mini / GPT-5.4 Nano | 400,000 | 128,000 | Lower-cost long-context tasks |
| GPT-5.3 Codex | 400,000 | 128,000 | Coding-agent workflows and large repository context |
| GPT-5.2 / GPT-5.2 Pro | 400,000 | 128,000 | Current GPT-5-class workloads |
| GPT-5 / GPT-5.1 | 400,000 | 128,000 | General GPT-5-class API use |
| GPT-4.1 / GPT-4.1 Mini / GPT-4.1 Nano | 1,047,576 | 32,768 | Large input with shorter output |
| o3 / o3-pro / o4-mini class models | 200,000 | 100,000 | Reasoning-heavy tasks |
| o1 | 200,000 | 100,000 | Older reasoning workloads |
| o1-mini | 128,000 | 65,536 | Smaller reasoning workloads |
| GPT-4o class legacy models | 128,000 | Usually 4,096 to 16,384 | Legacy multimodal and chat workloads |
Do not use the largest number in the table as the answer for every case. A 1,050,000-token context window does not mean the model can return a 1,050,000-token answer. The response is still capped by the model’s max output tokens and by the output budget you set in the API request.
Context Window vs Max Output Tokens
The context window is the total amount of text the model can consider in one request. It includes system instructions, developer instructions, user input, conversation history, retrieved context, tool results, hidden reasoning tokens, and the final answer.
Max output tokens control how long the model’s answer can be. This is only one part of the full context window.
For example, if a model has a 400,000-token context window and a 128,000-token max output limit, you cannot use all 400,000 tokens for the prompt and still expect a 128,000-token answer. You must leave space for the response and any hidden reasoning tokens.
In practical terms:
- Use the context window to estimate how much input you can send.
- Use max output tokens to estimate how long the answer can be.
- Leave extra room for reasoning models.
- Reduce file size or retrieved context if the model stops early.
Token Limits vs Rate Limits
Token limits and rate limits solve different problems.
Token limits control how much content can fit in one request or one ChatGPT working context. If your prompt, files, and expected answer exceed the available window, the model must truncate, refuse, or stop early.
Rate limits control how much you can use the API over time. They can include requests per minute, tokens per minute, requests per day, or other usage caps. A higher API usage tier can raise rate limits, but it does not change a model’s per-request context window.
For example, your account could have a high tokens-per-minute allowance and still fail if a single request exceeds the selected model’s context window.
Learn more: OpenAI API rate limits
What Counts as a Token
Tokens are the units language models process. A token can be a short word, part of a long word, a number, punctuation, or whitespace. English text often averages around four characters per token, but the real number depends on the text.
Short English words often count as one token. Long words, code, numbers, URLs, and non-English text can use tokens differently. Code and JSON can become token-heavy because punctuation, indentation, keys, and repeated strings all add to the count.
This matters because a “short” file can still use many tokens if it contains dense code, logs, tables, or structured data.
Why ChatGPT May Stop Early
ChatGPT can stop before an answer is complete for several reasons:
- The answer reached the available output budget.
- The conversation used too much of the available context.
- Uploaded files consumed too much working space.
- A reasoning mode used hidden reasoning tokens.
- A tool call or browser action hit a product limit.
- The current plan or model mode has a usage cap.
If ChatGPT stops early, ask it to continue from the last heading or reduce the task size. For long documents, split the file into sections and ask for one output at a time.
Reasoning Tokens and Hidden Token Use
Reasoning models can use hidden reasoning tokens before they produce the visible answer. These tokens count against the request budget even though the user does not see them.
This matters for coding, math, planning, research synthesis, and multi-step analysis tasks. A short visible answer can still consume a large internal token budget if the model spends many tokens reasoning through the problem.
If a reasoning model stops earlier than expected, reduce the input size, lower the requested output length, or reserve more of the context window for reasoning and completion.
Which Limit Matters for Your Use Case
| Use case | Limit to check first | Why it matters |
|---|---|---|
| Long ChatGPT conversation | ChatGPT context window | Older messages may fall out of working context |
| Long answer generation | Max output tokens | The model may stop before the full answer is complete |
| PDF or document analysis | File limits and context window | Extracted text may exceed the available working space |
| API app development | Model context and output limits | Each request must fit inside the selected model’s limits |
| Coding with large repositories | Context window and retrieval strategy | The model needs enough relevant files, but not the whole repo |
| High-volume API app | Rate limits | Throughput depends on RPM, TPM, and account tier |
Choose by limit type, not only by model name. Use a large-context API model when you need to process long documents, large code files, or many examples in one request. Use a high max-output model when you need long structured responses. Use ChatGPT Plus or Pro when you need higher ChatGPT usage, larger product context windows, and access to advanced modes inside the ChatGPT interface.
How to Avoid Hitting Token Limits
The best way to avoid token-limit problems is to control input size before you send the request.
Use shorter prompts. Remove repeated instructions, long examples, unused formatting rules, and unrelated context.
Split long documents into sections. Process each section separately, then combine the summaries or findings in a final pass.
Use retrieval instead of pasting everything. Send only the most relevant passages, files, or rows for the current task.
Summarize older conversation history. Keep the decision history, constraints, and current goal. Drop old turns that no longer affect the task.
Reserve output space. If you need a long answer, avoid filling the entire context window with input.
Use the right model. A small-context model may be cheaper for simple tasks, but it can fail on long documents. A larger-context model is better when the task depends on many details.
Legacy and Related Model Limits
Older models still appear in existing apps, tutorials, and legacy integrations. Keep these limits in mind when maintaining older projects.
| Model family | Context window | Max output tokens |
|---|---|---|
| GPT-4 Turbo | 128,000 | 4,096 |
| GPT-4o | 128,000 | 4,096 to 16,384 depending on model version |
| GPT-4o mini | 128,000 | 16,384 |
| GPT-4 | 8,192 | 8,192 |
| GPT-3.5 Turbo | 4,096 to 16,385 | 4,096 |
| davinci-002 / babbage-002 | 16,384 | Model-dependent |
Embedding models and moderation models use different limits and output formats. Embedding models return vectors, not normal text answers. Moderation models classify content, so their limits should not be compared directly with chat or reasoning models.
| Model | Description | Output Dimension |
|---|---|---|
| text-embedding-3-large | Most capable embedding model for both english and non-english tasks. | 3,072 |
| text-embedding-3-small | Increased performance over 2nd generation ada embedding model | 1,536 |
| text-embedding-ada-002 | Most capable 2nd generation embedding model, replacing 16 first generation models. | 1,536 |
Real-world Examples
Long PDF
If you upload a long PDF into ChatGPT, the file size is not the only limit that matters. ChatGPT must extract text from the file and fit the relevant content into the available working context. A 200-page document may need chunking even if the upload itself succeeds.
Best approach: ask for a section-by-section summary, then ask for a final synthesis after the main sections are processed.
Long API Response
If your API app needs a 30,000-word report, check the model’s max output tokens before you design the workflow. A large context window helps the model read more input, but the output cap still controls the response length.
Best approach: generate the report in sections, then run a final editing pass for consistency.
Coding Agent
Large repositories can exceed the context window quickly. Sending every file usually wastes tokens and reduces answer quality.
Best approach: send the task, relevant files, error output, dependency files, and a concise project map. Add more files only when the model needs them.
For more related tools, see Best AI Coding Agents.
FAQs
Q: What is the ChatGPT token limit?
A: ChatGPT token limits depend on your plan and selected mode. GPT-5.5 Instant uses 16K context on Free, 32K on Plus and Business, and 128K on Pro and Enterprise. GPT-5.5 Thinking can use larger paid-tier context windows.
Q: What is the OpenAI API token limit?
A: The OpenAI API token limit depends on the model. GPT-5.5 and GPT-5.4-class API models support context windows up to 1,050,000 tokens and max output up to 128,000 tokens. GPT-4.1 supports a 1,047,576-token context window and a 32,768-token max output limit.
Q: Is max output tokens the same as context window?
A: No. The context window is the total token budget for input and output. Max output tokens only controls how long the model’s answer can be.
Q: Why does ChatGPT stop before the answer is complete?
A: ChatGPT can stop early when it reaches an output limit, a tool limit, a plan limit, or a safety limit. Long prompts, long files, and reasoning-heavy tasks can reduce the space left for the visible answer.
Q: Can ChatGPT Plus handle more tokens than Free?
A: Yes. Plus has higher usage limits and larger context access than Free for supported models. It still does not match every OpenAI API context window.
Q: Can I increase the OpenAI API context window by paying more?
A: No. The context window is set by the model. A higher usage tier can raise rate limits, but it does not change the model’s per-request context window.
Q: Do reasoning tokens count toward my limit?
A: Yes. Reasoning tokens count against the request budget even though they are not shown in the final answer.
Q: What is the difference between token limits and rate limits?
A: Token limits control how much content fits in one request or conversation window. Rate limits control how many requests or tokens you can process over time.
Q: Which OpenAI model has the largest context window?
A: Among the current API models covered here, GPT-5.5 and GPT-5.4-class models have context windows up to 1,050,000 tokens.
Q: Does a 1M-token context window mean a 1M-token answer?
A: No. The context window includes input and output together. The model’s max output token limit still controls the longest response it can return in one request.
Final Thoughts
Use ChatGPT plan limits when you work inside the ChatGPT app. Use OpenAI API model limits when you build software with the API.
The most important distinction is context window vs max output tokens. Context controls how much the model can consider. Max output controls how long the answer can be. Rate limits control usage over time.
Once you separate those three limits, most ChatGPT and OpenAI token-limit problems become easier to diagnose.
References and Resources
Changelog:
05/11/2026
- Revised.
04/24/2026
- Updated for GPT-5.5
03/18/2026
- Cleanup
03/17/2026
- Updated for GPT-5.4-Mini & GPT-5.4-Nano
03/05/2026
- Updated for GPT-5.4
12/11/2025
- Updated for GPT-5.2
10/18/2025
- Updated token limits
08/07/2025
- Updated for GPT-5
04/16/2025
- Added o4-mini and o3
04/15/2025
- Added GPT-4.1 family
12/18/2024
- o1-preview => o1
10/04/2024
- Fixed ‘Max Output Tokens’ in ‘Token Limit in GPT-4’
- Added gpt-4o-realtime-preview
08/07/2024
- Clean up
- Update for gpt-4o-2024-08-06
05/13/2024
- Updated for GPT-4o











In this article, the “Token Limit in GPT-4” heading, and the “Max tokens” in the table are incorrect. The numbers do not indicate the Max tokens. They are Context Window.
Fixed. I forget to add the ‘Max Output Tokens’ column. Thanks for your feedback.