OpenAI API Rate Limits: GPT-5.5, GPT Image 2 & Free Tier (2026)

Full 2026 OpenAI rate limits by model and tier. Compare RPM, TPM, and batch limits for GPT-5.5, GPT-5.5 Pro, GPT Image 2, and free trial users with clear tables.

OpenAI API rate limits control how many requests, tokens, images, and batch jobs your organization can send over a fixed period.

They are not the same as token limits, model context windows, or monthly billing limits.

The exact limits depend on your organization, project, usage tier, model, endpoint, and sometimes the request type.

This guide explains the main OpenAI rate limit types, how usage tiers work, where to check your current limits, and how to fix 429 “Too Many Requests” errors.

This page lists all current OpenAI API rate limits (2026), including free trial and Tier 1-5 users, with quick-reference charts for GPT-5.5, GPT-5.5 Pro, GPT-5.4 Pro/Mini/Nano, and GPT Image 2 models.

Quick Summary Table (2026)

ModelFree TierTier 1Tier 3Tier 5
GPT-5.5500 RPM / 500k TPM5k RPM / 2M TPM15k RPM / 40M TPM
GPT-5.5 Pro500 RPM / 30k TPM5k RPM / 800k TPM10k RPM / 30M TPM
GPT-5.4 Mini500 RPM / 500k TPM5k RPM / 4M TPM30k RPM / 180M TPM
GPT-5.4 Nano500 RPM / 200k TPM5k RPM / 4M TPM30k RPM / 180M TPM
GPT Image 25 img/min50 img/min250 img/min

Understanding the OpenAI API Rate Limits

OpenAI API rate limits are measured mainly through RPM, RPD, TPM, TPD, IPM, and audio-minute limits for some streaming audio models:

LimitMeaningWhat it controls
RPMRequests per minuteHow many API calls you can send per minute
RPDRequests per dayHow many API calls you can send per day
TPMTokens per minuteHow many input and output tokens you can process per minute
TPDTokens per dayHow many tokens you can process per day
IPMImages per minuteHow many image generations or image edits you can run per minute
Audio minutes per minuteAudio throughput per minuteHow much streaming audio some audio models can process per minute
Batch queue limitEnqueued prompt tokens per modelHow many input tokens can sit in the Batch API queue

You can hit any one of these limits first. A request can fail even if you are under your TPM limit, because you may have hit RPM. A request can also fail even if you are under RPM, because the prompt, response budget, or queued batch tokens may exceed another limit.

OpenAI Rate Limits vs Token Limits

Rate limits and token limits are different.

Limit typeWhat it meansExample problem
Rate limitHow much API usage your organization can send over time429 error after too many requests in one minute
Context windowHow much input and output can fit in one model requestLong document is too large for the selected model
Max output tokensLongest answer the model can return in one responseModel stops before a long report is complete
Usage limitMonthly spend cap for an organization or projectAPI stops because the monthly budget is reached

If your request is too large for a model’s context window, raising your rate limit will not fix it. If your app gets 429 errors during traffic spikes, switching to a larger context model will not fix it.

For model context windows and max output limits, see ChatGPT and OpenAI token limits.

Current Rate Limits by Usage Tier

OpenAI automatically assigns you to usage tiers based on your payment history and API usage patterns. Higher tiers get better rate limits and access to newer models.

TIERQUALIFICATIONMAX CREDITS
FreeUser must be in an allowed geography$100
Tier 1$5 paid$100
Tier 2$50 paid and 7+ days since first successful payment$500
Tier 3$100 paid and 7+ days since first successful payment$1,000
Tier 4$250 paid and 14+ days since first successful payment$5,000
Tier 5$1,000 paid and 30+ days since first successful payment$200,000

Rate Limits For Pay-as-you-go Users (Tier 1 – Tier 5)

ModelRPMRPDTPMBatch Queue Limit
gpt-5.5500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
15,000 (T5)
500,000 (T1)
1,000,000 (T2)
2,000,000 (T3)
4,000,000 (T4)
40,000,000 (T5)
150,000 (T1)
3,000,000 (T2)
100,000,000 (T3)
200,000,000 (T4)
15,000,000,000 (T5)
gpt-5.5-pro500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
10,000 (T5)
500,000 (T1)
1,000,000 (T2)
2,000,000 (T3)
4,000,000 (T4)
40,000,000 (T5)
150,000 (T1)
3,000,000 (T2)
100,000,000 (T3)
200,000,000 (T4)
15,000,000,000 (T5)
gpt-5.4
500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
15,000 (T5)
500,000 (T1)
1,000,000 (T2)
2,000,000 (T3)
4,000,000 (T4)
40,000,000 (T5)
150,000 (T1)
3,000,000 (T2)
100,000,000 (T3)
200,000,000 (T4)
15,000,000,000 (T5)
gpt-5.4-pro500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
10,000 (T5)
30,000 (T1)
450,000 (T2)
800,000 (T3)
2,000,000 (T4)
30,000,000 (T5)
90,000 (T1)
1,350,000 (T2)
50,000,000 (T3)
200,000,000 (T4)
5,000,000,000 (T5)
gpt-5.4-mini500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
30,000 (T5)
500,000 (T1)
2,000,000 (T2)
4,000,000 (T3)
10,000,000 (T4)
180,000,000 (T5
5,000,000 (T1)
20,000,000 (T2)
40,000,000 (T3)
1,000,000,000 (T4)
15,000,000,000 (T5)
gpt-5.4-nano500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
30,000 (T5)
200,000 (T1)
2,000,000 (T2)
4,000,000 (T3)
10,000,000 (T4)
180,000,000 (T5)
2,000,000 (T1)
20,000,000 (T2)
40,000,000 (T3)
1,000,000,000 (T4)
15,000,000,000 (T5)
gpt-5.3-Codex500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
15,000 (T5)
500,000 (T1)
1,000,000 (T2)
2,000,000 (T3)
4,000,000 (T4)
40,000,000 (T5)
150,000 (T1)
3,000,000 (T2)
100,000,000 (T3)
200,000,000 (T4)
15,000,000,000 (T5)
gpt-5.2500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
15,000 (T5)
500,000 (T1)
1,000,000 (T2)
2,000,000 (T3)
4,000,000 (T4)
40,000,000 (T5)
150,000 (T1)
3,000,000 (T2)
100,000,000 (T3)
200,000,000 (T4)
15,000,000,000 (T5)
gpt-5.2-pro500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
10,000 (T5)
30,000 (T1)
450,000 (T2)
800,000 (T3)
2,000,000 (T4)
30,000,000 (T5)
90,000 (T1)
1,350,000 (T2)
50,000,000 (T3)
200,000,000 (T4)
5,000,000,000 (T5)
gpt-5500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
15,000 (T5)
500,000 (T1)
1,000,000 (T2)
2,000,000 (T3)
4,000,000 (T4)
40,000,000 (T5)
150,000 (T1)
3,000,000 (T2)
100,000,000 (T3)
200,000,000 (T4)
15,000,000,000 (T5)
gpt-5-mini500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
30,000 (T5)
500,000 (T1)
2,000,000 (T2)
4,000,000 (T3)
10,000,000 (T4)
180,000,000 (T5
5,000,000 (T1)
20,000,000 (T2)
40,000,000 (T3)
1,000,000,000 (T4)
15,000,000,000 (T5)
gpt-5-nano500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
30,000 (T5)
200,000 (T1)
2,000,000 (T2)
4,000,000 (T3)
10,000,000 (T4)
180,000,000 (T5)
2,000,000 (T1)
20,000,000 (T2)
40,000,000 (T3)
1,000,000,000 (T4)
15,000,000,000 (T5)
gpt-5-pro500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
10,000 (T5)
30,000 (T1)
450,000 (T2)
800,000 (T3)
2,000,000 (T4)
30,000,000 (T5)
90,000 (T1)
1,350,000 (T2)
50,000,000 (T3)
200,000,000 (T4)
5,000,000,000 (T5)
gpt-4.1500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
10,000 (T5)
30,000 (T1)
450,000 (T2)
800,000 (T3)
2,000,000 (T4)
30,000,000 (T5)
90,000 (T1)
1,350,000 (T2)
50,000,000 (T3)
200,000,000 (T4)
5,000,000,000 (T5)
gpt-4.1-mini500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
30,000 (T5)
10,000 (T1)200,000 (T1)
2,000,000 (T2)
4,000,000 (T3)
10,000,000 (T4)
150,000,000 (T5)
2,000,000 (T1)
20,000,000 (T2)
40,000,000 (T3)
1,000,000,000 (T4)
15,000,000,000 (T5)
gpt-4.1-nano500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
30,000 (T5)
10,000 (T1)200,000 (T1)
2,000,000 (T2)
4,000,000 (T3)
10,000,000 (T4)
150,000,000 (T5)
2,000,000 (T1)
20,000,000 (T2)
40,000,000 (T3)
1,000,000,000 (T4)
15,000,000,000 (T5)
o4-mini1,000 (T1)
2,000 (T2)
5,000 (T3)
10,000 (T4)
30,000 (T5)
100,000 (T1)
200,000 (T2)
4,000,000 (T3)
10,000,000 (T4)
150,000,000 (T5)
1,000,000 (T1)
2,000,000 (T2)
40,000,000 (T3)
1,000,000,000 (T4)
15,000,000,000 (T5)
o4-mini-deep-research
1,000 (T1)
2,000 (T2)
5,000 (T3)
10,000 (T4)
30,000 (T5)
200,000 (T1)
2,000,000 (T2)
4,000,000 (T3)
10,000,000 (T4)
150,000,000 (T5)
200,000 (T1)
300,000 (T2)
500,000 (T3)
2,000,000 (T4)
10,000,000 (T5)
o3-pro500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
10,000 (T5)
30,000 (T1)
450,000 (T2)
800,000 (T3)
2,000,000 (T4)
30,000,000 (T5)
90,000 (T1)
1,350,000 (T2)
50,000,000 (T3)
200,000,000 (T4)
5,000,000,000 (T5)
o3500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
10,000 (T5)

30,000 (T1)
450,000 (T2)
800,000 (T3)
2,000,000 (T4)
30,000,000 (T5)
90,000 (T1)
1,350,000 (T2)
50,000,000 (T3)
200,000,000 (T4)
5,000,000,000 (T5)
o3-mini1,000 (T1)
2,000 (T2)
5,000 (T3)
10,000 (T4)
30,000 (T5)
100,000 (T1)
200,000 (T2)
4,000,000 (T3)
10,000,000 (T4)
150,000,000 (T5)
1,000,000 (T1)
2,000,000 (T2)
40,000,000 (T3)
1,000,000,000 (T4)
15,000,000,000 (T5)
o3-deep-research500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
10,000 (T5)
20,000 (T1)
450,000 (T2)
800,000 (T3)
2,000,000 (T4)
30,000,000 (T5)
200,000 (T1)
300,000 (T2)
500,000 (T3)
2,000,000 (T4)
10,000,000 (T5)
o1-pro500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
10,000 (T5)
30,000 (T1)
450,000 (T2)
800,000 (T3)
2,000,000 (T4)
30,000,000 (T5)
90,000 (T1)
1,350,000 (T2)
50,000,000 (T3)
200,000,000 (T4)
5,000,000,000 (T5)
o1500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
10,000 (T5)
30,000 (T1)
450,000 (T2)
800,000 (T3)
2,000,000 (T4)
30,000,000 (T5)
90,000 (T1)
1,350,000 (T2)
50,000,000 (T3)
200,000,000 (T4)
5,000,000,000 (T5)
o1-mini1,000 (T1)
2,000 (T2)
5,000 (T3)
10,000 (T4)
30,000 (T5)
100,000 (T1)
200,000 (T2)
4,000,000 (T3)
10,000,000 (T4)
150,000,000 (T5)
1,000,000 (T1)
2,000,000 (T2)
40,000,000 (T3)
1,000,000,000 (T4)
15,000,000,000 (T5)
Sora 2 ProDeprecated
SoraDeprecated
gpt-4o500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
10,000 (T5)
30,000 (T1)
450,000 (T2)
800,000 (T3)
2,000,000 (T4)
30,000,000 (T5)
90,000 (T1)
1,350,000 (T2)
50,000,000 (T3)
200,000,000 (T4)
5,000,000,000 (T5)
gpt-4o-mini500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
30,000 (T5)
10,000 (T1)200,000 (T1)
2,000,000 (T2)
4,000,000 (T3)
10,000,000 (T4)
150,000,000 (T5)
2,000,000 (T1)
20,000,000 (T2)
40,000,000 (T3)
1,000,000,000 (T4)
15,000,000,000 (T5)
gpt-4o-audio500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
30,000 (T5)
30,000 (T1)
450,000 (T2)
800,000 (T3)
2,000,000 (T4)
30,000,000 (T5)
90,000 (T1)
1,350,000 (T2)
50,000,000 (T3)
200,000,000 (T4)
5,000,000,000 (T5)
gpt-4o-mini-audio500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
30,000 (T5)
10,000 (T1)200,000 (T1)
2,000,000 (T2)
4,000,000 (T3)
10,000,000 (T4)
150,000,000 (T5)
2,000,000 (T1)
20,000,000 (T2)
40,000,000 (T3)
1,000,000,000 (T4)
15,000,000,000 (T5)
GPT-4o Realtime200 (T1)
400 (T2)
5,000 (T3)
10,000 (T4)
20,000 (T5)
10,000 (T1)40,000 (T1)
200,000 (T2)
800,000 (T3)
4,000,000 (T4)
15,000,000 (T5)
GPT-4o Mini Realtime200 (T1)
400 (T2)
5,000 (T3)
10,000 (T4)
20,000 (T5)
10,000 (T1)40,000 (T1)
200,000 (T2)
800,000 (T3)
4,000,000 (T4)
15,000,000 (T5)
gpt-image-25 img/min (T1)
20 img/min (T2)
50 img/min (T3)
100 img/min (T4)
250 img/min (T5)
100,000 (T1)
250,000 (T2)
800,000 (T3)
3,000,000 (T4)
8,000,000 (T5)
gpt-image-1.55 img/min (T1)
20 img/min (T2)
50 img/min (T3)
100 img/min (T4)
250 img/min (T5)
100,000 (T1)
250,000 (T2)
800,000 (T3)
3,000,000 (T4)
8,000,000 (T5)
gpt-image-1-mini5 img/min (T1)
20 img/min (T2)
50 img/min (T3)
150 img/min (T4)
250 img/min (T5)
100,000 (T1)
250,000 (T2)
800,000 (T3)
3,000,000 (T4)
8,000,000 (T5)
dall-e-3 (Deprecated)500 img/min (T1)
2500 img/min (T2)
5000 img/min (T3)
7500 img/min (T4)
10,000 img/min (T5)
dall-e-2 (Deprecated)500 img/min (T1)
2500 img/min (T2)
5000 img/min (T3)
7500 img/min (T4)
10,000 img/min (T5)
gpt-audio-1.5500 (T1)
5,000 (T2)
5,000 (T3)
10,000 (T4)
10,000 (T5)
30,000 (T1)
450,000 (T2)
800,000 (T3)
2,000,000 (T4)
30,000,000 (T5)
90,000 (T1)
1, 350,000 (T2)
50,000,000 (T3)
200,000,000 (T4)
5,000,000,000 (T5)
gpt-realtime-1.5200 (T1)
400 (T2)
5,000 (T3)
10,000 (T4)
20,000 (T5)
1,000 (T1)40,000 (T1)
200,000 (T2)
800,000 (T3)
4,000,000 (T4)
15,000,000 (T5)
GPT-4o mini TTS500 (T1)
2,000(T2)
5,000 (T3)
10,000 (T4)
10,000 (T5)
50,000 (T1)
150,000 (T2)
600,000 (T3)
2,000,000 (T4)
8,000,000 (T5)
tts-1500 (T1)
2,000(T2)
5,000 (T3)
10,000 (T4)
10,000 (T5)
TTS-1 HD500 (T1)
2,000(T2)
5,000 (T3)
10,000 (T4)
10,000 (T5)

Free Tier Rate Limits (2026)

Note: Free / trial API accounts may not have access to all models (e.g. GPT-5, o4-mini) or may be subject to stricter limits. The numbers here assume model availability when permitted by OpenAI.

ModelTPMRPMRPDTPD
Chat
gpt-5.5 (Not supported)10,0003200900,000
gpt-5.5-pro (Not supported)150,0003200
gpt-5.4 (Not supported)10,0003200900,000
gpt-5.4-pro (Not supported)150,0003200
gpt-5.3-codex10,0003200900,000
gpt-5.3-chat-latest10,0003200900,000
gpt-5.2 (Not supported)10,0003200900,000
gpt-5.2-pro (Not supported)10,0003200900,000
gpt-5-mini (Not supported)60,0003200200,000
gpt-5-nano (Not supported)60,0003200200,000
gpt-5-pro (Not supported)30,000320090,000
gpt-5-search-api (Not supported)3,0003200
gpt-4.110,0003200900,000
gpt-4.1 (long context)60,0003200200,000
gpt-4.1-mini60,0003200200,000
gpt-4.1-mini (long context)120,0003200400,000
gpt-4.1-nano60,0003200200,000
gpt-4.1-nano (long context)120,0003200400,000
gpt-4o10,0003200900,000
gpt-4o-audio-preview150,0003200
gpt-4o-search-preview3,0003200
gpt-4o-transcribe200
gpt-4o-mini60,0003200200,000
gpt-4o-mini-search-preview3,0003200
gpt-4o-mini-transcribe200
gpt-3.5-turbo40,0003200200,000
gpt-3.5-turbo-012540,0003200200,000
gpt-3.5-turbo-110640,0003200200,000
gpt-3.5-turbo-16k40,0003200540,000
gpt-3.5-turbo-instruct90,0003200200,000
gpt-3.5-turbo-instruct-091490,0003200200,000
Text
o1150,0003200 90,000
o1-mini150,0003200 –
o3100,0003200 90,000
o3-mini1,000,000150200,000
o4-mini100,0003200 90,000
babbage-002150,0003200 –
davinci-002150,0003200 –
text-embedding-3-large40,0001002,000
text-embedding-3-small40,0001002,000
text-embedding-ada-00240,0001002,000
Audio
gpt-4o-mini-tts200 
tts-1150,0003200 
tts-1-1106150,0003200 
tts-1-hd150,0003200 
tts-1-hd-1106150,0003200 
whisper-1150,0003200 
Moderation
omni-moderation-2024-09-265,00025010,000 
omni-moderation-latest5,00025010,000 
text-moderation-stable150,0003200 
text-moderation-latest150,0003200 
text-moderation-stable150,0003200 
Fine-tuning Inference
babbage-002150,0003
davinci-002150,0003
gpt-3.5-turbo-012540,0003
gpt-3.5-turbo-061340,0003
gpt-3.5-turbo-110640,0003
gpt-4-061340,0003
gpt-4o-2024-05-1310,0003
gpt-4o-mini-2024-07-1860,0003
Fine-tuning TrainingACTIVE / QUEUED JOBSJOBS PER DAY
babbage-002348
davinci-002348
gpt-3.5-turbo-0613348
Image
DALL·E 2150,000 TPM, 3 RPM, 200 RPD, 5 images per minute
DALL·E 3150,000 TPM, 3 RPM, 200 RPD
gpt-image-1 (Not supported)3 RPM, 200 RPD
gpt-image-1-mini (Not supported)3 RPM, 200 RPD
gpt-image-2 (Not supported)3 RPM, 200 RPD
Video
sora-2 (Not supported)Deprecated
sora-2-pro (Not supported)Deprecated
Other
Default limits for all other models150,0003200 

How OpenAI API Rate Limits Work

OpenAI rate limits apply at the organization and project level, not only at the individual API key level. They also vary by model. Some model families share rate limits, so calls to several related models can count against the same shared pool.

Long-context requests can have separate limits. This matters for models such as GPT-4.1 and other large-context models because a short request and a very long request may not draw from the same practical capacity.

Batch API limits are separate from normal synchronous API limits. Batch jobs use their own queue-based limits, so they are useful when you need to process a large amount of work without real-time responses.

OpenAI Usage Tiers

OpenAI assigns API accounts to usage tiers based on payment history and account status. Higher tiers usually receive higher usage limits and better rate limits across many models.

TierQualificationUsage limit
FreeUser must be in an allowed geography$100 per month
Tier 1$5 paid$100 per month
Tier 2$50 paid and 7+ days since first successful payment$500 per month
Tier 3$100 paid and 7+ days since first successful payment$1,000 per month
Tier 4$250 paid and 14+ days since first successful payment$5,000 per month
Tier 5$1,000 paid and 30+ days since first successful payment$200,000 per month

These tiers do not mean every model has the same RPM or TPM. A Tier 3 organization can have different limits for GPT models, reasoning models, embeddings, image models, realtime models, and fine-tuned models.

Where to Check Your Current OpenAI Rate Limits

Check your exact API limits in the OpenAI dashboard:

  1. Open the OpenAI Platform dashboard.
  2. Go to account settings.
  3. Open the Limits section.
  4. Select the relevant organization and project.
  5. Review limits by model, endpoint, and shared limit group.y public article because OpenAI can change model access, shared limits, and usage tiers over time.

You can also inspect rate-limit headers from API responses. These headers show the current request and token budget for the request you just made.

HeaderWhat it shows
x-ratelimit-limit-requestsMaximum request count before the request limit is exhausted
x-ratelimit-limit-tokensMaximum token count before the token limit is exhausted
x-ratelimit-remaining-requestsRemaining requests in the current rate-limit window
x-ratelimit-remaining-tokensRemaining tokens in the current rate-limit window
x-ratelimit-reset-requestsTime until the request limit resets
x-ratelimit-reset-tokensTime until the token limit resets

If you run production traffic, log these headers. They are often the fastest way to identify whether your app is limited by RPM, TPM, or short bursts.

Rate Limits by API Type

Different OpenAI APIs use different rate-limit patterns.

API typeMain limits to watchPractical issue
Responses APIRPM, TPM, TPD, shared model limitsChat, agents, tools, and multimodal workloads can hit token limits quickly
Chat Completions APIRPM, TPM, TPDLegacy chat apps often hit RPM before TPM
Embeddings APIRPM, TPM, request payload sizeLarge indexing jobs can hit token throughput limits
Images APIIPM and image-specific limitsImage generation may be limited by images per minute rather than tokens
Realtime APISession, audio, token, and model limitsVoice apps need careful concurrency control
Batch APIPer-batch size, batch creation rate, enqueued prompt tokensLarge offline jobs need queue planning
Fine-tuningTraining jobs, queued jobs, model-specific limitsTraining and inference limits are separate

For most apps, RPM and TPM are the first limits to monitor. For image apps, IPM matters more. For offline data jobs, Batch API queue limits matter more than standard synchronous limits.

Batch API Rate Limits

The Batch API is designed for asynchronous jobs that do not need an immediate response. It uses a separate rate-limit pool from standard synchronous requests.

Batch limits include:

Batch limitCurrent behavior
Requests per batchUp to 50,000 requests in one batch
Batch input file sizeUp to 200 MB
Batch creation rateUp to 2,000 batches per hour
Enqueued prompt tokensModel-specific queue limit shown in the Platform settings
Completion windowBatch jobs are designed around a 24-hour processing window

Use the Batch API for evaluations, classification jobs, embedding jobs, backfills, data cleanup, and other work that can wait. Do not use Batch API for chatbots, live assistants, realtime tools, or user-facing requests that need immediate answers.

Why You Can Get 429 Errors Below the Published Limit

Some rate limits are enforced over shorter windows than one full minute. This means a limit such as 60,000 requests per minute may behave like a smaller per-second allowance during bursts.

You can also trigger 429 errors when your max_completion_tokens is too high. OpenAI can estimate token usage from your prompt plus your requested completion budget. If you ask for a much larger completion than you usually need, your request may reserve more token capacity than the final answer uses.

Common reasons for surprise 429 errors include:

  • Too many requests sent at the same time.
  • A traffic spike exceeded a short internal window.
  • max_completion_tokens was set far above the needed response size.
  • Several models shared the same rate-limit pool.
  • Batch jobs filled the model’s queued-token limit.
  • The API key used a different organization than expected.
  • The project had a lower limit than the organization.

How to Fix OpenAI 429 Errors

The fastest fix is to slow down and retry with exponential backoff. Do not resend failed requests in a tight loop. Failed requests can still count against rate limits, so aggressive retries can make the problem worse.

Use this retry pattern:

  1. Catch 429 errors.
  2. Wait briefly before retrying.
  3. Increase the wait time after each failed retry.
  4. Add random jitter so many workers do not retry at the same time.
  5. Stop after a maximum number of retries.
  6. Log the response headers and error message.

Example Python pattern:

import random
import time
from openai import OpenAI
client = OpenAI()

def call_with_backoff(messages, max_retries=6):
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return client.responses.create(
                model="gpt-5.4-mini",
                input=messages,
                max_output_tokens=800,
            )
        except Exception as error:
            message = str(error).lower()
            if "rate limit" not in message and "429" not in message:
                raise
            if attempt == max_retries - 1:
                raise
            sleep_for = delay + random.uniform(0, 0.5)
            time.sleep(sleep_for)
            delay *= 2

Keep max_output_tokens close to the response length you actually need. If your app usually returns 500 tokens, do not reserve 8,000 tokens by default.

How to Prevent Rate Limit Problems

Control Concurrency

Limit the number of simultaneous requests from workers, queues, and background jobs. A single server may stay under the limit, but several workers can exceed the shared organization or project pool together.

Use a Request Queue

Put API work into a queue when traffic is uneven. A queue lets you smooth spikes, retry failed jobs, and protect user-facing routes from sudden API backpressure.

Reduce Token Waste

Shorter prompts use less TPM. Remove repeated instructions, old conversation history, unused examples, large JSON blobs, and irrelevant retrieved text.

Set Realistic Output Budgets

Large max_output_tokens values can increase the estimated token budget for a request. Use different defaults for short answers, summaries, reports, and long-form generation.

Batch Offline Work

Use the Batch API for jobs that do not need live responses. This is often better for evaluations, embeddings, classification, extraction, and nightly processing.

Split Traffic by Workload

Separate live user requests from background jobs. Live traffic should not compete with batch enrichment, analytics, or testing jobs in the same uncontrolled queue.

Log Rate Limit Headers

Save remaining request and token headers in your logs. This helps you see whether the app is hitting RPM, TPM, or a shared model pool.

How to Increase OpenAI API Rate Limits

The normal way to increase OpenAI API rate limits is to move up usage tiers through paid API usage and payment history. Many accounts graduate automatically as spend and account age increase.

If your app already uses backoff, queueing, realistic token budgets, and the right model, but still needs more throughput, check the Limits section of the OpenAI Platform dashboard. Eligible accounts can request higher limits or reach the next tier through additional API usage.

Enterprise and high-volume customers may also use dedicated arrangements such as Scale Tier or other capacity options. Those options are separate from normal pay-as-you-go rate limits.

Real-world Examples

Chatbot With Too Many Small Requests

A support chatbot may hit RPM before TPM if users send many short messages. The fix is not a larger-context model. The fix is request throttling, queueing, caching repeated answers, and batching internal background calls.

Long Report Generator

A report generator may hit TPM before RPM because each request uses a large prompt and a large output budget. The fix is shorter retrieved context, section-by-section generation, and tighter output limits.

Embedding a Large Website

An embedding job may hit token throughput limits while indexing thousands of pages. The fix is a queue, backoff, deduplication, and Batch API use when the job does not need real-time responses.

Image Generation App

An image app may hit IPM rather than token limits. The fix is a per-user queue, clear wait states, and a cap on simultaneous image jobs.

Quick Troubleshooting Checklist

SymptomLikely causeFix
429 after many short callsRPM limitAdd queueing and concurrency control
429 after long promptsTPM limitReduce prompt size and output budget
429 during traffic spikesShort-window burst limitAdd backoff and jitter
429 only in one projectProject-level limitCheck project settings
429 after switching modelsShared model pool or lower model limitCheck the model’s limit group
Batch job will not startBatch queue limitReduce queued tokens or wait for jobs to finish
API stops after spend capUsage limitRaise monthly usage limit if appropriate

Relevant Resources

FAQs

Q: What are OpenAI API rate limits?
A: OpenAI API rate limits control how many requests, tokens, images, and batch jobs an organization or project can send within a time window. They protect platform stability and control throughput across users.

Q: What is RPM in the OpenAI API?
A: RPM means requests per minute. It controls how many API calls your organization or project can send in one minute.

Q: What is TPM in the OpenAI API?
A: TPM means tokens per minute. It controls how many input and output tokens your organization or project can process in one minute.

Q: What is TPD in the OpenAI API?
A: TPD means tokens per day. It controls total daily token throughput for a model, project, or organization.

Q: What is IPM in the OpenAI API?
A: IPM means images per minute. It applies to image generation and image editing models.

Q: Are OpenAI rate limits per API key?
A: Rate limits are mainly tied to organization and project settings. A single API key can be affected by limits shared across the organization or project.

Q: Are rate limits the same for every OpenAI model?
A: No. Rate limits vary by model, endpoint, usage tier, project, and shared limit group. Some long-context requests also have separate limits.

Q: How do I check my OpenAI rate limit?
A: Open the OpenAI Platform dashboard, select the correct organization and project, then open the Limits section. You can also inspect rate-limit headers in API responses.

Q: Why am I getting 429 errors if I am under the per-minute limit?
A: OpenAI can enforce limits over shorter windows than a full minute. You can also hit a different limit, such as TPM, shared model limits, project limits, or batch queue limits.

Q: How do I fix 429 Too Many Requests errors?
A: Use exponential backoff, add jitter, reduce concurrency, lower max_output_tokens, shorten prompts, and queue requests. Do not retry failed requests in a tight loop.

Q: Does increasing my usage tier increase rate limits?
A: Usually, yes. Higher usage tiers often receive higher rate limits across many models. Exact limits still vary by model and endpoint.

Q: Does a higher rate limit increase the model context window?
A: No. Rate limits control throughput over time. Context windows control how much content fits in one request.

Q: Does the Batch API use the same rate limits?
A: No. Batch API rate limits are separate from normal synchronous model rate limits. Batch jobs also have per-batch size, batch creation, and enqueued-token limits.

Final Thoughts

OpenAI API rate limits are not one fixed number. They depend on your organization, project, model, endpoint, usage tier, and workload type.

Use the OpenAI dashboard to check your exact limits. Use response headers to monitor live usage. Use queues, exponential backoff, realistic output budgets, and the Batch API to keep production apps stable under load.

See Also:

Changelog:

05/11/2026

  • Revised

04/24/2026

  • Updated for GPT‑5.5 and GPT Image 2

03/17/2026

  • Updated for GPT‑5.4 mini and nano

03/05/2026

  • Updated for GPT-5.4

12/11/2025

  • Updated for GPT-5.2

10/18/2025

  • Update rate limits

09/12/2025

  • gpt-5 and gpt-5-mini API rate limits are now more than doubled for T1-T4 tiers

08/07/2025

  • Updated for GPT-5

04/25/2025

  • Updated for gpt-4.1, o4, o3.
  • Removed old models like gpt-3.5

12/18/2024

  • Updated for o1

10/07/2024

  • Added gpt-4o-realtime-preview

10/02/2024

  • Update o1-preview & o1-mini

09/13/2024

  • Added o1-preview & o1-mini

08/07/2024

  • Updated for GPT-4o-mini
  • Clean up

05/14/2024

  • Updated for GPT-4o

One comment

Leave a Reply

Your email address will not be published. Required fields are marked *

Get the latest & top AI tools sent directly to your email.

Subscribe now to explore the latest & top AI tools and resources, all in one convenient newsletter. No spam, we promise!