ChatGPT Token Limit: Free, Plus, Pro, and OpenAI API Limits (2026)

Compare ChatGPT Free, Plus, Pro, and OpenAI API token limits, including context windows, max output tokens, and message limits.

TL;DR

Model | Context Window | Max Output Tokens | Training Data Cutoff
GPT-5.5, GPT-5.5-Pro | 1,050,000 | 128,000 | Dec 01, 2025
GPT-5.4-Mini, GPT-5.4-Nano | 400,000 | 128,000 | Aug 31, 2025
GPT-5 | 400,000 | 128,000 | Sep 30, 2024
GPT-5-Mini, GPT-5-Nano | 400,000 | 128,000 | May 31, 2024
GPT-4.1 | 1,047,576 | 32,768 | Jun 01, 2024
GPT-5.3-Codex | 400,000 | 128,000 | Aug 31, 2025
Last updated: May 11, 2026

ChatGPT token limits depend on where you use the model. The OpenAI API has model-level context windows and max output limits. ChatGPT has plan-level limits, message caps, tool limits, and file limits.

For API users, the most important numbers are the model context window and max output tokens. For ChatGPT users, the most important limits are the plan context window, the number of messages available in a time window, and whether the selected mode uses extra reasoning tokens.

This guide separates those limits so you can quickly see what applies to ChatGPT Free, Plus, Pro, and the OpenAI API.

At a Glance

As of May 2026, GPT-5.5 and GPT-5.4-class API models have context windows up to 1,050,000 tokens and max output limits up to 128,000 tokens. GPT-4.1 models have a 1,047,576-token context window and a 32,768-token max output limit.

ChatGPT plan limits are different from API limits. In ChatGPT, GPT-5.5 Instant uses smaller context windows by plan, while GPT-5.5 Thinking can use larger context windows on paid tiers. ChatGPT message limits also depend on plan, model, mode, rollout status, and usage guardrails.

User goal | Most useful limit
Use ChatGPT Free, Plus, or Pro | ChatGPT plan context and message limits
Build with the OpenAI API | API model context window and max output tokens
Upload long files | ChatGPT file limits and available context
Generate long answers | Max output tokens
Fix “too many requests” errors | Rate limits, not token limits

Quick Comparison

Limit type | What it controls
Context window | Total input, conversation history, instructions, tool context, reasoning tokens, and output budget
Max output tokens | Longest possible answer from one API response
Message limit | How many ChatGPT messages a plan can send in a time window
Rate limit | API throughput across requests, usually requests per minute, tokens per minute, or daily quota
File limit | Upload size, number of files, and extracted text inside ChatGPT tools

Do not mix these limits together. A model can have a large API context window and still have a smaller max output limit. A ChatGPT Plus account can have higher usage than Free, but it does not automatically match every API limit.

ChatGPT Free vs Plus vs Pro Token Limits

ChatGPT limits depend on the plan, the selected model, and the selected mode. The same model family can have different practical limits in ChatGPT and in the OpenAI API.

ChatGPT mode | Free | Plus / Business | Pro / Enterprise
GPT-5.5 Instant context window | 16K | 32K | 128K
GPT-5.5 Thinking context window | Limited availability | 256K paid-tier limit | 400K Pro-tier limit
GPT-5.5 Thinking max output | Plan-dependent availability | Up to 128K output within the paid-tier budget | Up to 128K output within the Pro budget
GPT-5.5 message allowance | Up to 10 messages every 5 hours | Up to 160 messages every 3 hours | Unlimited access subject to abuse guardrails

ChatGPT usage limits can change by plan, region, rollout status, and abuse guardrails. Check the model picker and usage notices inside your ChatGPT account before relying on a limit for important work.

OpenAI API Token Limits by Model

OpenAI API token limits apply per request. The context window is the total budget for input and output. The max output token limit is only the response budget.

Model family | Context window | Max output tokens | Best use
GPT-5.5 / GPT-5.5 Pro | 1,050,000 | 128,000 | Largest current API context and high-end reasoning or generation
GPT-5.4 / GPT-5.4 Pro | 1,050,000 | 128,000 | Large-context production apps
GPT-5.4 Mini / GPT-5.4 Nano | 400,000 | 128,000 | Lower-cost long-context tasks
GPT-5.3 Codex | 400,000 | 128,000 | Coding-agent workflows and large repository context
GPT-5.2 / GPT-5.2 Pro | 400,000 | 128,000 | Current GPT-5-class workloads
GPT-5 / GPT-5.1 | 400,000 | 128,000 | General GPT-5-class API use
GPT-4.1 / GPT-4.1 Mini / GPT-4.1 Nano | 1,047,576 | 32,768 | Large input with shorter output
o3 / o3-pro / o4-mini class models | 200,000 | 100,000 | Reasoning-heavy tasks
o1 | 200,000 | 100,000 | Older reasoning workloads
o1-mini | 128,000 | 65,536 | Smaller reasoning workloads
GPT-4o class legacy models | 128,000 | Usually 4,096 to 16,384 | Legacy multimodal and chat workloads

Do not use the largest number in the table as the answer for every case. A 1,050,000-token context window does not mean the model can return a 1,050,000-token answer. The response is still capped by the model’s max output tokens and by the output budget you set in the API request.
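A minimal sketch of that check, assuming you already know the token counts for a request. The limits are copied from the table above; the dictionary keys and the `check_request` helper are illustrative names, not official API model identifiers or an OpenAI library function.

```python
# Limits copied from the table above. The keys are illustrative labels,
# not guaranteed API model identifiers.
MODEL_LIMITS = {
    "gpt-5.5": {"context": 1_050_000, "max_output": 128_000},
    "gpt-5.4-mini": {"context": 400_000, "max_output": 128_000},
    "gpt-4.1": {"context": 1_047_576, "max_output": 32_768},
}


def check_request(model: str, input_tokens: int, requested_output: int) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    limits = MODEL_LIMITS[model]
    problems = []
    # The output cap applies regardless of how large the context window is.
    if requested_output > limits["max_output"]:
        problems.append(
            f"requested output {requested_output} exceeds max output {limits['max_output']}"
        )
    # Input and output share the same context window.
    if input_tokens + requested_output > limits["context"]:
        problems.append(
            f"input plus output {input_tokens + requested_output} exceeds context {limits['context']}"
        )
    return problems


# A large input fits GPT-4.1's window, but a 100,000-token answer does not fit its output cap.
print(check_request("gpt-4.1", 900_000, 100_000))
```

The two checks are kept separate on purpose: a request can pass one and fail the other.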

Context Window vs Max Output Tokens

The context window is the total amount of text the model can consider in one request. It includes system instructions, developer instructions, user input, conversation history, retrieved context, tool results, hidden reasoning tokens, and the final answer.

Max output tokens control how long the model’s answer can be. This is only one part of the full context window.

For example, if a model has a 400,000-token context window and a 128,000-token max output limit, you cannot use all 400,000 tokens for the prompt and still expect a 128,000-token answer. You must leave space for the response and any hidden reasoning tokens.

In practical terms:

  • Use the context window to estimate how much input you can send.
  • Use max output tokens to estimate how long the answer can be.
  • Leave extra room for reasoning models.
  • Reduce file size or retrieved context if the model stops early.
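The arithmetic behind these rules can be sketched in a few lines. The function name and the reasoning reserve value below are assumptions for illustration; how much room a reasoning model actually needs varies by task.

```python
def max_input_tokens(context_window: int, output_reserve: int, reasoning_reserve: int = 0) -> int:
    """Tokens left for the prompt after reserving room for the answer and reasoning."""
    remaining = context_window - output_reserve - reasoning_reserve
    if remaining < 0:
        raise ValueError("reserves exceed the context window")
    return remaining


# 400,000-token window with the full 128,000-token output reserved.
print(max_input_tokens(400_000, 128_000))          # 272000
# Same window, with a hypothetical 20,000-token reserve for hidden reasoning.
print(max_input_tokens(400_000, 128_000, 20_000))  # 252000
```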

Token Limits vs Rate Limits

Token limits and rate limits solve different problems.

Token limits control how much content can fit in one request or one ChatGPT working context. If your prompt, files, and expected answer exceed the available window, the request may be rejected, the input truncated, or the answer cut off early.

Rate limits control how much you can use the API over time. They can include requests per minute, tokens per minute, requests per day, or other usage caps. A higher API usage tier can raise rate limits, but it does not change a model’s per-request context window.

For example, your account could have a high tokens-per-minute allowance and still fail if a single request exceeds the selected model’s context window.
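The difference can be sketched as two independent checks. This is a simplified illustration: the function name, parameters, and the fixed one-minute accounting are assumptions, and real rate limiters may measure usage differently.

```python
def diagnose_request(
    request_tokens: int,
    context_window: int,
    tokens_used_this_minute: int,
    tokens_per_minute: int,
) -> str:
    """Classify which kind of limit, if any, a request would hit."""
    # Per-request check: a higher usage tier cannot fix this one.
    if request_tokens > context_window:
        return "token limit"
    # Over-time check: waiting or a higher tier can fix this one.
    if tokens_used_this_minute + request_tokens > tokens_per_minute:
        return "rate limit"
    return "ok"


# High tokens-per-minute allowance, but the single request is too large.
print(diagnose_request(450_000, 400_000, 0, 2_000_000))          # token limit
# The request fits the model, but the minute's allowance is nearly used up.
print(diagnose_request(100_000, 400_000, 1_950_000, 2_000_000))  # rate limit
```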

Learn more: OpenAI API rate limits

What Counts as a Token

Tokens are the units language models process. A token can be a short word, part of a long word, a number, punctuation, or whitespace. English text often averages around four characters per token, but the real number depends on the text.

Short English words often count as one token. Long words, code, numbers, URLs, and non-English text can use tokens differently. Code and JSON can become token-heavy because punctuation, indentation, keys, and repeated strings all add to the count.

This matters because a “short” file can still use many tokens if it contains dense code, logs, tables, or structured data.
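For a quick estimate before sending text, the four-characters-per-token average can be turned into a rough heuristic. This is only an approximation for English prose, not a real tokenizer; code, JSON, and non-English text can deviate significantly, so use the model's actual tokenizer when accuracy matters. The function name is illustrative.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


prompt = "Summarize the attached report in five bullet points."
print(estimate_tokens(prompt))
```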

Why ChatGPT May Stop Early

ChatGPT can stop before an answer is complete for several reasons:

  • The answer reached the available output budget.
  • The conversation used too much of the available context.
  • Uploaded files consumed too much working space.
  • A reasoning mode used hidden reasoning tokens.
  • A tool call or browser action hit a product limit.
  • The current plan or model mode has a usage cap.

If ChatGPT stops early, ask it to continue from the last heading or reduce the task size. For long documents, split the file into sections and ask for one output at a time.

Reasoning Tokens and Hidden Token Use

Reasoning models can use hidden reasoning tokens before they produce the visible answer. These tokens count against the request budget even though the user does not see them.

This matters for coding, math, planning, research synthesis, and multi-step analysis tasks. A short visible answer can still consume a large internal token budget if the model spends many tokens reasoning through the problem.

If a reasoning model stops earlier than expected, reduce the input size, lower the requested output length, or reserve more of the context window for reasoning and completion.
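As a simplified illustration, assume hidden reasoning and the visible answer draw from the same output budget. The function name and the numbers below are made up for the example; actual reasoning usage is not known in advance.

```python
def visible_answer_budget(output_budget: int, reasoning_tokens: int) -> int:
    """Tokens left for the visible answer after hidden reasoning is counted."""
    return max(output_budget - reasoning_tokens, 0)


# An 8,000-token output budget where reasoning used 6,500 tokens.
print(visible_answer_budget(8_000, 6_500))  # 1500
# Reasoning alone exhausted the budget, so nothing visible can be produced.
print(visible_answer_budget(8_000, 9_000))  # 0
```

This is why a larger output budget can help even when the visible answer you want is short.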

Which Limit Matters for Your Use Case

Use case | Limit to check first | Why it matters
Long ChatGPT conversation | ChatGPT context window | Older messages may fall out of working context
Long answer generation | Max output tokens | The model may stop before the full answer is complete
PDF or document analysis | File limits and context window | Extracted text may exceed the available working space
API app development | Model context and output limits | Each request must fit inside the selected model’s limits
Coding with large repositories | Context window and retrieval strategy | The model needs enough relevant files, but not the whole repo
High-volume API app | Rate limits | Throughput depends on RPM, TPM, and account tier

Choose by limit type, not only by model name. Use a large-context API model when you need to process long documents, large code files, or many examples in one request. Use a high max-output model when you need long structured responses. Use ChatGPT Plus or Pro when you need higher ChatGPT usage, larger product context windows, and access to advanced modes inside the ChatGPT interface.

How to Avoid Hitting Token Limits

The best way to avoid token-limit problems is to control input size before you send the request.

Use shorter prompts. Remove repeated instructions, long examples, unused formatting rules, and unrelated context.

Split long documents into sections. Process each section separately, then combine the summaries or findings in a final pass.
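One way to split a document is to pack whole paragraphs into chunks under a token budget. The sketch below assumes paragraphs are separated by blank lines and uses a rough four-characters-per-token estimate; the function name is illustrative.

```python
def split_into_chunks(text: str, max_tokens: int, chars_per_token: int = 4) -> list[str]:
    """Greedily pack blank-line-separated paragraphs into chunks under a budget."""
    max_chars = max_tokens * chars_per_token
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # Note: a single paragraph larger than the budget becomes its own
            # oversized chunk; split it further if that matters for your case.
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


document = "First section.\n\nSecond section.\n\nThird section."
for index, chunk in enumerate(split_into_chunks(document, max_tokens=8), start=1):
    print(index, repr(chunk))
```

Splitting on paragraph boundaries keeps each chunk readable on its own, which tends to matter more than filling every chunk to the limit.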

Use retrieval instead of pasting everything. Send only the most relevant passages, files, or rows for the current task.

Summarize older conversation history. Keep the decision history, constraints, and current goal. Drop old turns that no longer affect the task.
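A minimal sketch of this idea: always keep a short summary of decisions and constraints, then add the most recent turns that still fit. The function name, the plain-string turn format, and the four-characters-per-token estimate are assumptions; the summary itself would be written separately.

```python
def build_context(summary: str, turns: list[str], max_tokens: int, chars_per_token: int = 4) -> list[str]:
    """Keep the summary, then as many of the newest turns as fit the budget."""

    def cost(text: str) -> int:
        # Ceiling division: rough token estimate from character count.
        return -(-len(text) // chars_per_token)

    used = cost(summary)
    kept: list[str] = []
    # Walk from newest to oldest and stop at the first turn that does not fit.
    for turn in reversed(turns):
        if used + cost(turn) > max_tokens:
            break
        kept.append(turn)
        used += cost(turn)
    kept.reverse()  # restore chronological order
    return [summary] + kept


history = ["old turn " * 5, "recent turn " * 3, "latest turn " * 3]
print(build_context("Goal: ship v2. Constraint: no new dependencies.", history, max_tokens=30))
```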

Reserve output space. If you need a long answer, avoid filling the entire context window with input.

Use the right model. A small-context model may be cheaper for simple tasks, but it can fail on long documents. A larger-context model is better when the task depends on many details.

Legacy and Related Model Limits

Older models still appear in existing apps, tutorials, and legacy integrations. Keep these limits in mind when maintaining older projects.

Model family | Context window | Max output tokens
GPT-4 Turbo | 128,000 | 4,096
GPT-4o | 128,000 | 4,096 to 16,384 depending on model version
GPT-4o mini | 128,000 | 16,384
GPT-4 | 8,192 | 8,192
GPT-3.5 Turbo | 4,096 to 16,385 | 4,096
davinci-002 / babbage-002 | 16,384 | Model-dependent

Embedding models and moderation models use different limits and output formats. Embedding models return vectors, not normal text answers. Moderation models classify content, so their limits should not be compared directly with chat or reasoning models.

Model | Description | Output Dimension
text-embedding-3-large | Most capable embedding model for both English and non-English tasks. | 3,072
text-embedding-3-small | Increased performance over the 2nd generation ada embedding model. | 1,536
text-embedding-ada-002 | Most capable 2nd generation embedding model, replacing 16 first generation models. | 1,536

Real-world Examples

Long PDF

If you upload a long PDF into ChatGPT, the file size is not the only limit that matters. ChatGPT must extract text from the file and fit the relevant content into the available working context. A 200-page document may need chunking even if the upload itself succeeds.

Best approach: ask for a section-by-section summary, then ask for a final synthesis after the main sections are processed.

Long API Response

If your API app needs a 30,000-word report, check the model’s max output tokens before you design the workflow. A large context window helps the model read more input, but the output cap still controls the response length.

Best approach: generate the report in sections, then run a final editing pass for consistency.
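To size such a workflow, you can estimate how many sections a report needs. The sketch below assumes roughly 0.75 English words per token (a common rule of thumb, not an exact figure) and a 20% safety margin under the output cap; both values and the function name are assumptions.

```python
import math

WORDS_PER_TOKEN = 0.75  # assumption: rough rule of thumb for English prose


def plan_sections(target_words: int, max_output_tokens: int, safety_margin: float = 0.8) -> tuple[int, int]:
    """Return (estimated total tokens, number of responses needed)."""
    estimated_tokens = math.ceil(target_words / WORDS_PER_TOKEN)
    # Stay below the hard cap so a section is not cut off mid-sentence.
    usable_per_response = int(max_output_tokens * safety_margin)
    return estimated_tokens, math.ceil(estimated_tokens / usable_per_response)


# A 30,000-word report against a 32,768-token output cap.
print(plan_sections(30_000, 32_768))   # (40000, 2)
# The same report against a 128,000-token output cap.
print(plan_sections(30_000, 128_000))  # (40000, 1)
```

Even when the estimate says one response is enough, generating in sections can still make the final editing pass easier.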

Coding Agent

Large repositories can exceed the context window quickly. Sending every file usually wastes tokens and reduces answer quality.

Best approach: send the task, relevant files, error output, dependency files, and a concise project map. Add more files only when the model needs them.

For more related tools, see Best AI Coding Agents.

FAQs

Q: What is the ChatGPT token limit?
A: ChatGPT token limits depend on your plan and selected mode. GPT-5.5 Instant uses 16K context on Free, 32K on Plus and Business, and 128K on Pro and Enterprise. GPT-5.5 Thinking can use larger paid-tier context windows.

Q: What is the OpenAI API token limit?
A: The OpenAI API token limit depends on the model. GPT-5.5 and GPT-5.4-class API models support context windows up to 1,050,000 tokens and max output up to 128,000 tokens. GPT-4.1 supports a 1,047,576-token context window and a 32,768-token max output limit.

Q: Is max output tokens the same as context window?
A: No. The context window is the total token budget for input and output. Max output tokens only controls how long the model’s answer can be.

Q: Why does ChatGPT stop before the answer is complete?
A: ChatGPT can stop early when it reaches an output limit, a tool limit, a plan limit, or a safety limit. Long prompts, long files, and reasoning-heavy tasks can reduce the space left for the visible answer.

Q: Can ChatGPT Plus handle more tokens than Free?
A: Yes. Plus has higher usage limits and larger context access than Free for supported models. It still does not match every OpenAI API context window.

Q: Can I increase the OpenAI API context window by paying more?
A: No. The context window is set by the model. A higher usage tier can raise rate limits, but it does not change the model’s per-request context window.

Q: Do reasoning tokens count toward my limit?
A: Yes. Reasoning tokens count against the request budget even though they are not shown in the final answer.

Q: What is the difference between token limits and rate limits?
A: Token limits control how much content fits in one request or conversation window. Rate limits control how many requests or tokens you can process over time.

Q: Which OpenAI model has the largest context window?
A: Among the current API models covered here, GPT-5.5 and GPT-5.4-class models have context windows up to 1,050,000 tokens.

Q: Does a 1M-token context window mean a 1M-token answer?
A: No. The context window includes input and output together. The model’s max output token limit still controls the longest response it can return in one request.

Final Thoughts

Use ChatGPT plan limits when you work inside the ChatGPT app. Use OpenAI API model limits when you build software with the API.

The most important distinction is between context window and max output tokens: context controls how much the model can consider, and max output controls how long the answer can be. Rate limits are a third, separate limit that controls usage over time.

Once you separate those three limits, most ChatGPT and OpenAI token-limit problems become easier to diagnose.

Changelog:

05/11/2026

  • Revised.

04/24/2026

  • Updated for GPT-5.5

03/18/2026

  • Cleanup

03/17/2026

  • Updated for GPT-5.4-Mini & GPT-5.4-Nano

03/05/2026

  • Updated for GPT-5.4

12/11/2025

  • Updated for GPT-5.2

10/18/2025

  • Updated token limits

08/07/2025

  • Updated for GPT-5

04/16/2025

  • Added o4-mini and o3

04/15/2025

  • Added GPT-4.1 family

12/18/2024

  • o1-preview => o1

10/04/2024

  • Fixed ‘Max Output Tokens’ in ‘Token Limit in GPT-4’
  • Added gpt-4o-realtime-preview

08/07/2024

  • Clean up
  • Update for gpt-4o-2024-08-06

05/13/2024

  • Updated for GPT-4o
