ChatGPT Token Limit: Free, Plus, Pro, and OpenAI API Limits (2026)

TL;DR

Model	Context Window	Max Output Tokens	Training Data
GPT-5.5 GPT-5.5-Pro	1,050,000	128,000	Dec 01, 2025
GPT-5.4-Mini GPT-5.4-Nano	400,000	128,000	Aug 31, 2025
GPT-5	400,000	128,000	Sep 30, 2024
GPT-5-Mini GPT-5-Nano	400,000	128,000	May 31, 2024
GPT-4.1	1,047,576	32,768	Jun 01, 2024
GPT-5.3-Codex	400,000	128,000	Aug 31, 2025

Last updated: May 11, 2026

ChatGPT token limits depend on where you use the model. The OpenAI API has model-level context windows and max output limits. ChatGPT has plan-level limits, message caps, tool limits, and file limits.

For API users, the most important numbers are the model context window and max output tokens. For ChatGPT users, the most important limits are the plan context window, the number of messages available in a time window, and whether the selected mode uses extra reasoning tokens.

This guide separates those limits so you can quickly see what applies to ChatGPT Free, Plus, Pro, and the OpenAI API.

At a Glance

As of May 2026, GPT-5.5 and GPT-5.4-class API models have context windows up to 1,050,000 tokens and max output limits up to 128,000 tokens. GPT-4.1 models have a 1,047,576-token context window and a 32,768-token max output limit.

ChatGPT plan limits are different from API limits. In ChatGPT, GPT-5.5 Instant uses smaller context windows by plan, while GPT-5.5 Thinking can use larger context windows on paid tiers. ChatGPT message limits also depend on plan, model, mode, rollout status, and usage guardrails.

User goal	Most useful limit
Use ChatGPT Free, Plus, or Pro	ChatGPT plan context and message limits
Build with the OpenAI API	API model context window and max output tokens
Upload long files	ChatGPT file limits and available context
Generate long answers	Max output tokens
Fix “too many requests” errors	Rate limits, not token limits

Quick Comparison

Limit type	What it controls
Context window	Total input, conversation history, instructions, tool context, reasoning tokens, and output budget
Max output tokens	Longest possible answer from one API response
Message limit	How many ChatGPT messages a plan can send in a time window
Rate limit	API throughput across requests, usually requests per minute, tokens per minute, or daily quota
File limit	Upload size, number of files, and extracted text inside ChatGPT tools

Do not mix these limits together. A model can have a large API context window and still have a smaller max output limit. A ChatGPT Plus account can have higher usage than Free, but it does not automatically match every API limit.

ChatGPT Free vs Plus vs Pro Token Limits

ChatGPT limits depend on the plan, the selected model, and the selected mode. The same model family can have different practical limits in ChatGPT and in the OpenAI API.

ChatGPT mode	Free	Plus / Business	Pro / Enterprise
GPT-5.5 Instant context window	16K	32K	128K
GPT-5.5 Thinking context window	Limited availability	256K paid-tier limit	400K Pro-tier limit
GPT-5.5 Thinking max output	Plan-dependent availability	Up to 128K output within the paid-tier budget	Up to 128K output within the Pro budget
GPT-5.5 message allowance	Up to 10 messages every 5 hours	Up to 160 messages every 3 hours	Unlimited access subject to abuse guardrails

ChatGPT usage limits can change by plan, region, rollout status, and abuse guardrails. Check the model picker and usage notices inside your ChatGPT account before relying on a limit for important work.

OpenAI API Token Limits by Model

OpenAI API token limits apply per request. The context window is the total budget for input and output. The max output token limit is only the response budget.

Model family	Context window	Max output tokens	Best use
GPT-5.5 / GPT-5.5 Pro	1,050,000	128,000	Largest current API context and high-end reasoning or generation
GPT-5.4 / GPT-5.4 Pro	1,050,000	128,000	Large-context production apps
GPT-5.4 Mini / GPT-5.4 Nano	400,000	128,000	Lower-cost long-context tasks
GPT-5.3 Codex	400,000	128,000	Coding-agent workflows and large repository context
GPT-5.2 / GPT-5.2 Pro	400,000	128,000	Current GPT-5-class workloads
GPT-5 / GPT-5.1	400,000	128,000	General GPT-5-class API use
GPT-4.1 / GPT-4.1 Mini / GPT-4.1 Nano	1,047,576	32,768	Large input with shorter output
o3 / o3-pro / o4-mini class models	200,000	100,000	Reasoning-heavy tasks
o1	200,000	100,000	Older reasoning workloads
o1-mini	128,000	65,536	Smaller reasoning workloads
GPT-4o class legacy models	128,000	Usually 4,096 to 16,384	Legacy multimodal and chat workloads

Do not use the largest number in the table as the answer for every case. A 1,050,000-token context window does not mean the model can return a 1,050,000-token answer. The response is still capped by the model’s max output tokens and by the output budget you set in the API request.

Context Window vs Max Output Tokens

The context window is the total amount of text the model can consider in one request. It includes system instructions, developer instructions, user input, conversation history, retrieved context, tool results, hidden reasoning tokens, and the final answer.

Max output tokens control how long the model’s answer can be. This is only one part of the full context window.

For example, if a model has a 400,000-token context window and a 128,000-token max output limit, you cannot use all 400,000 tokens for the prompt and still expect a 128,000-token answer. You must leave space for the response and any hidden reasoning tokens.

In practical terms:

Use the context window to estimate how much input you can send.
Use max output tokens to estimate how long the answer can be.
Leave extra room for reasoning models.
Reduce file size or retrieved context if the model stops early.

Token Limits vs Rate Limits

Token limits and rate limits solve different problems.

Token limits control how much content can fit in one request or one ChatGPT working context. If your prompt, files, and expected answer exceed the available window, the model must truncate, refuse, or stop early.

Rate limits control how much you can use the API over time. They can include requests per minute, tokens per minute, requests per day, or other usage caps. A higher API usage tier can raise rate limits, but it does not change a model’s per-request context window.

For example, your account could have a high tokens-per-minute allowance and still fail if a single request exceeds the selected model’s context window.

Learn more: OpenAI API rate limits

What Counts as a Token

Tokens are the units language models process. A token can be a short word, part of a long word, a number, punctuation, or whitespace. English text often averages around four characters per token, but the real number depends on the text.

Short English words often count as one token. Long words, code, numbers, URLs, and non-English text can use tokens differently. Code and JSON can become token-heavy because punctuation, indentation, keys, and repeated strings all add to the count.

This matters because a “short” file can still use many tokens if it contains dense code, logs, tables, or structured data.

Why ChatGPT May Stop Early

ChatGPT can stop before an answer is complete for several reasons:

The answer reached the available output budget.
The conversation used too much of the available context.
Uploaded files consumed too much working space.
A reasoning mode used hidden reasoning tokens.
A tool call or browser action hit a product limit.
The current plan or model mode has a usage cap.

If ChatGPT stops early, ask it to continue from the last heading or reduce the task size. For long documents, split the file into sections and ask for one output at a time.

Reasoning Tokens and Hidden Token Use

Reasoning models can use hidden reasoning tokens before they produce the visible answer. These tokens count against the request budget even though the user does not see them.

This matters for coding, math, planning, research synthesis, and multi-step analysis tasks. A short visible answer can still consume a large internal token budget if the model spends many tokens reasoning through the problem.

If a reasoning model stops earlier than expected, reduce the input size, lower the requested output length, or reserve more of the context window for reasoning and completion.

Which Limit Matters for Your Use Case

Use case	Limit to check first	Why it matters
Long ChatGPT conversation	ChatGPT context window	Older messages may fall out of working context
Long answer generation	Max output tokens	The model may stop before the full answer is complete
PDF or document analysis	File limits and context window	Extracted text may exceed the available working space
API app development	Model context and output limits	Each request must fit inside the selected model’s limits
Coding with large repositories	Context window and retrieval strategy	The model needs enough relevant files, but not the whole repo
High-volume API app	Rate limits	Throughput depends on RPM, TPM, and account tier

Choose by limit type, not only by model name. Use a large-context API model when you need to process long documents, large code files, or many examples in one request. Use a high max-output model when you need long structured responses. Use ChatGPT Plus or Pro when you need higher ChatGPT usage, larger product context windows, and access to advanced modes inside the ChatGPT interface.

How to Avoid Hitting Token Limits

The best way to avoid token-limit problems is to control input size before you send the request.

Use shorter prompts. Remove repeated instructions, long examples, unused formatting rules, and unrelated context.

Split long documents into sections. Process each section separately, then combine the summaries or findings in a final pass.

Use retrieval instead of pasting everything. Send only the most relevant passages, files, or rows for the current task.

Summarize older conversation history. Keep the decision history, constraints, and current goal. Drop old turns that no longer affect the task.

Reserve output space. If you need a long answer, avoid filling the entire context window with input.

Use the right model. A small-context model may be cheaper for simple tasks, but it can fail on long documents. A larger-context model is better when the task depends on many details.

Legacy and Related Model Limits

Older models still appear in existing apps, tutorials, and legacy integrations. Keep these limits in mind when maintaining older projects.

Model family	Context window	Max output tokens
GPT-4 Turbo	128,000	4,096
GPT-4o	128,000	4,096 to 16,384 depending on model version
GPT-4o mini	128,000	16,384
GPT-4	8,192	8,192
GPT-3.5 Turbo	4,096 to 16,385	4,096
davinci-002 / babbage-002	16,384	Model-dependent

Embedding models and moderation models use different limits and output formats. Embedding models return vectors, not normal text answers. Moderation models classify content, so their limits should not be compared directly with chat or reasoning models.

Model	Description	Output Dimension
text-embedding-3-large	Most capable embedding model for both english and non-english tasks.	3,072
text-embedding-3-small	Increased performance over 2nd generation ada embedding model	1,536
text-embedding-ada-002	Most capable 2nd generation embedding model, replacing 16 first generation models.	1,536

Real-world Examples

Long PDF

If you upload a long PDF into ChatGPT, the file size is not the only limit that matters. ChatGPT must extract text from the file and fit the relevant content into the available working context. A 200-page document may need chunking even if the upload itself succeeds.

Best approach: ask for a section-by-section summary, then ask for a final synthesis after the main sections are processed.

Long API Response

If your API app needs a 30,000-word report, check the model’s max output tokens before you design the workflow. A large context window helps the model read more input, but the output cap still controls the response length.

Best approach: generate the report in sections, then run a final editing pass for consistency.

Coding Agent

Large repositories can exceed the context window quickly. Sending every file usually wastes tokens and reduces answer quality.

Best approach: send the task, relevant files, error output, dependency files, and a concise project map. Add more files only when the model needs them.

For more related tools, see Best AI Coding Agents.

FAQs

Q: What is the ChatGPT token limit?
A: ChatGPT token limits depend on your plan and selected mode. GPT-5.5 Instant uses 16K context on Free, 32K on Plus and Business, and 128K on Pro and Enterprise. GPT-5.5 Thinking can use larger paid-tier context windows.

Q: What is the OpenAI API token limit?
A: The OpenAI API token limit depends on the model. GPT-5.5 and GPT-5.4-class API models support context windows up to 1,050,000 tokens and max output up to 128,000 tokens. GPT-4.1 supports a 1,047,576-token context window and a 32,768-token max output limit.

Q: Is max output tokens the same as context window?
A: No. The context window is the total token budget for input and output. Max output tokens only controls how long the model’s answer can be.

Q: Why does ChatGPT stop before the answer is complete?
A: ChatGPT can stop early when it reaches an output limit, a tool limit, a plan limit, or a safety limit. Long prompts, long files, and reasoning-heavy tasks can reduce the space left for the visible answer.

Q: Can ChatGPT Plus handle more tokens than Free?
A: Yes. Plus has higher usage limits and larger context access than Free for supported models. It still does not match every OpenAI API context window.

Q: Can I increase the OpenAI API context window by paying more?
A: No. The context window is set by the model. A higher usage tier can raise rate limits, but it does not change the model’s per-request context window.

Q: Do reasoning tokens count toward my limit?
A: Yes. Reasoning tokens count against the request budget even though they are not shown in the final answer.

Q: What is the difference between token limits and rate limits?
A: Token limits control how much content fits in one request or conversation window. Rate limits control how many requests or tokens you can process over time.

Q: Which OpenAI model has the largest context window?
A: Among the current API models covered here, GPT-5.5 and GPT-5.4-class models have context windows up to 1,050,000 tokens.

Q: Does a 1M-token context window mean a 1M-token answer?
A: No. The context window includes input and output together. The model’s max output token limit still controls the longest response it can return in one request.

Final Thoughts

Use ChatGPT plan limits when you work inside the ChatGPT app. Use OpenAI API model limits when you build software with the API.

The most important distinction is context window vs max output tokens. Context controls how much the model can consider. Max output controls how long the answer can be. Rate limits control usage over time.

Once you separate those three limits, most ChatGPT and OpenAI token-limit problems become easier to diagnose.

References and Resources

Changelog:

05/11/2026

Revised.

04/24/2026

Updated for GPT-5.5

03/18/2026

Cleanup

03/17/2026

Updated for GPT-5.4-Mini & GPT-5.4-Nano

03/05/2026

Updated for GPT-5.4

12/11/2025

Updated for GPT-5.2

10/18/2025

Updated token limits

08/07/2025

Updated for GPT-5

04/16/2025

Added o4-mini and o3

04/15/2025

Added GPT-4.1 family

12/18/2024

o1-preview => o1

10/04/2024

Fixed ‘Max Output Tokens’ in ‘Token Limit in GPT-4’
Added gpt-4o-realtime-preview

08/07/2024

Clean up
Update for gpt-4o-2024-08-06

05/13/2024

Updated for GPT-4o

2 Comments

roozbeh
September 26, 2024 / 4:03 pm Reply
In this article, the “Token Limit in GPT-4” heading, and the “Max tokens” in the table are incorrect. The numbers do not indicate the Max tokens. They are Context Window.
- ScriptByAI
  October 4, 2024 / 5:38 am Reply
  Fixed. I forget to add the ‘Max Output Tokens’ column. Thanks for your feedback.