OpenAI API Rate Limits: GPT-5.5, GPT Image 2 & Free Tier (2026)

OpenAI API rate limits control how many requests, tokens, images, and batch jobs your organization can send over a fixed period.

They are not the same as token limits, model context windows, or monthly billing limits.

The exact limits depend on your organization, project, usage tier, model, endpoint, and sometimes the request type.

This guide explains the main OpenAI rate limit types, how usage tiers work, where to check your current limits, and how to fix 429 “Too Many Requests” errors.

This page lists all current OpenAI API rate limits (2026), including free trial and Tier 1-5 users, with quick-reference charts for GPT-5.5, GPT-5.5 Pro, GPT-5.4 Pro/Mini/Nano, and GPT Image 2 models.

Table Of Contents

Quick Summary Table (2026)
Understanding the OpenAI API Rate Limits
OpenAI Rate Limits vs Token Limits
Current Rate Limits by Usage Tier
Rate Limits For Pay-as-you-go Users (Tier 1 – Tier 5)
Free Tier Rate Limits (2026)
How OpenAI API Rate Limits Work
OpenAI Usage Tiers
Where to Check Your Current OpenAI Rate Limits
Rate Limits by API Type
Batch API Rate Limits
Why You Can Get 429 Errors Below the Published Limit
How to Fix OpenAI 429 Errors
How to Prevent Rate Limit Problems
How to Increase OpenAI API Rate Limits
Real-world Examples
Quick Troubleshooting Checklist

Quick Summary Table (2026)

Model	Free Tier	Tier 1	Tier 3	Tier 5
GPT-5.5	–	500 RPM / 500k TPM	5k RPM / 2M TPM	15k RPM / 40M TPM
GPT-5.5 Pro	–	500 RPM / 30k TPM	5k RPM / 800k TPM	10k RPM / 30M TPM
GPT-5.4 Mini	–	500 RPM / 500k TPM	5k RPM / 4M TPM	30k RPM / 180M TPM
GPT-5.4 Nano	–	500 RPM / 200k TPM	5k RPM / 4M TPM	30k RPM / 180M TPM
GPT Image 2	–	5 img/min	50 img/min	250 img/min

Understanding the OpenAI API Rate Limits

OpenAI API rate limits are measured mainly through RPM, RPD, TPM, TPD, IPM, and audio-minute limits for some streaming audio models:

Limit	Meaning	What it controls
RPM	Requests per minute	How many API calls you can send per minute
RPD	Requests per day	How many API calls you can send per day
TPM	Tokens per minute	How many input and output tokens you can process per minute
TPD	Tokens per day	How many tokens you can process per day
IPM	Images per minute	How many image generations or image edits you can run per minute
Audio minutes per minute	Audio throughput per minute	How much streaming audio some audio models can process per minute
Batch queue limit	Enqueued prompt tokens per model	How many input tokens can sit in the Batch API queue

You can hit any one of these limits first. A request can fail even if you are under your TPM limit, because you may have hit RPM. A request can also fail even if you are under RPM, because the prompt, response budget, or queued batch tokens may exceed another limit.

OpenAI Rate Limits vs Token Limits

Rate limits and token limits are different.

Limit type	What it means	Example problem
Rate limit	How much API usage your organization can send over time	429 error after too many requests in one minute
Context window	How much input and output can fit in one model request	Long document is too large for the selected model
Max output tokens	Longest answer the model can return in one response	Model stops before a long report is complete
Usage limit	Monthly spend cap for an organization or project	API stops because the monthly budget is reached

If your request is too large for a model’s context window, raising your rate limit will not fix it. If your app gets 429 errors during traffic spikes, switching to a larger context model will not fix it.

For model context windows and max output limits, see ChatGPT and OpenAI token limits.

Current Rate Limits by Usage Tier

OpenAI automatically assigns you to usage tiers based on your payment history and API usage patterns. Higher tiers get better rate limits and access to newer models.

TIER	QUALIFICATION	MAX CREDITS
Free	User must be in an allowed geography	$100
Tier 1	$5 paid	$100
Tier 2	$50 paid and 7+ days since first successful payment	$500
Tier 3	$100 paid and 7+ days since first successful payment	$1,000
Tier 4	$250 paid and 14+ days since first successful payment	$5,000
Tier 5	$1,000 paid and 30+ days since first successful payment	$200,000

Rate Limits For Pay-as-you-go Users (Tier 1 – Tier 5)

Model	RPM	RPD	TPM	Batch Queue Limit
gpt-5.5	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 15,000 (T5)	–	500,000 (T1) 1,000,000 (T2) 2,000,000 (T3) 4,000,000 (T4) 40,000,000 (T5)	150,000 (T1) 3,000,000 (T2) 100,000,000 (T3) 200,000,000 (T4) 15,000,000,000 (T5)
gpt-5.5-pro	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5)		500,000 (T1) 1,000,000 (T2) 2,000,000 (T3) 4,000,000 (T4) 40,000,000 (T5)	150,000 (T1) 3,000,000 (T2) 100,000,000 (T3) 200,000,000 (T4) 15,000,000,000 (T5)
gpt-5.4	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 15,000 (T5)	–	500,000 (T1) 1,000,000 (T2) 2,000,000 (T3) 4,000,000 (T4) 40,000,000 (T5)	150,000 (T1) 3,000,000 (T2) 100,000,000 (T3) 200,000,000 (T4) 15,000,000,000 (T5)
gpt-5.4-pro	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5)	–	30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5)	90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5)
gpt-5.4-mini	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5)	–	500,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 180,000,000 (T5	5,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5)
gpt-5.4-nano	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5)	–	200,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 180,000,000 (T5)	2,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5)
gpt-5.3-Codex	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 15,000 (T5)		500,000 (T1) 1,000,000 (T2) 2,000,000 (T3) 4,000,000 (T4) 40,000,000 (T5)	150,000 (T1) 3,000,000 (T2) 100,000,000 (T3) 200,000,000 (T4) 15,000,000,000 (T5)
gpt-5.2	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 15,000 (T5)	–	500,000 (T1) 1,000,000 (T2) 2,000,000 (T3) 4,000,000 (T4) 40,000,000 (T5)	150,000 (T1) 3,000,000 (T2) 100,000,000 (T3) 200,000,000 (T4) 15,000,000,000 (T5)
gpt-5.2-pro	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5)	–	30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5)	90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5)
gpt-5	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 15,000 (T5)		500,000 (T1) 1,000,000 (T2) 2,000,000 (T3) 4,000,000 (T4) 40,000,000 (T5)	150,000 (T1) 3,000,000 (T2) 100,000,000 (T3) 200,000,000 (T4) 15,000,000,000 (T5)
gpt-5-mini	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5)	–	500,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 180,000,000 (T5	5,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5)
gpt-5-nano	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5)	–	200,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 180,000,000 (T5)	2,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5)
gpt-5-pro	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5)	–	30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5)	90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5)
gpt-4.1	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5)	–	30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5)	90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5)
gpt-4.1-mini	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5)	10,000 (T1)	200,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5)	2,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5)
gpt-4.1-nano	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5)	10,000 (T1)	200,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5)	2,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5)
o4-mini	1,000 (T1) 2,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5)	–	100,000 (T1) 200,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5)	1,000,000 (T1) 2,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5)
o4-mini-deep-research	1,000 (T1) 2,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5)	–	200,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5)	200,000 (T1) 300,000 (T2) 500,000 (T3) 2,000,000 (T4) 10,000,000 (T5)
o3-pro	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5)	–	30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5)	90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5)
o3	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5)	–	30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5)	90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5)
o3-mini	1,000 (T1) 2,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5)	–	100,000 (T1) 200,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5)	1,000,000 (T1) 2,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5)
o3-deep-research	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5)	–	20,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5)	200,000 (T1) 300,000 (T2) 500,000 (T3) 2,000,000 (T4) 10,000,000 (T5)
o1-pro	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5)		30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5)	90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5)
o1	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5)	–	30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5)	90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5)
o1-mini	1,000 (T1) 2,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5)	–	100,000 (T1) 200,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5)	1,000,000 (T1) 2,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5)
Sora 2 Pro	Deprecated	–	–	–
Sora	Deprecated	–	–	–
gpt-4o	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5)	–	30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5)	90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5)
gpt-4o-mini	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5)	10,000 (T1)	200,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5)	2,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5)
gpt-4o-audio	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5)	–	30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5)	90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5)
gpt-4o-mini-audio	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5)	10,000 (T1)	200,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5)	2,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5)
GPT-4o Realtime	200 (T1) 400 (T2) 5,000 (T3) 10,000 (T4) 20,000 (T5)	10,000 (T1)	40,000 (T1) 200,000 (T2) 800,000 (T3) 4,000,000 (T4) 15,000,000 (T5)	–
GPT-4o Mini Realtime	200 (T1) 400 (T2) 5,000 (T3) 10,000 (T4) 20,000 (T5)	10,000 (T1)	40,000 (T1) 200,000 (T2) 800,000 (T3) 4,000,000 (T4) 15,000,000 (T5)	–
gpt-image-2	5 img/min (T1) 20 img/min (T2) 50 img/min (T3) 100 img/min (T4) 250 img/min (T5)		100,000 (T1) 250,000 (T2) 800,000 (T3) 3,000,000 (T4) 8,000,000 (T5)
gpt-image-1.5	5 img/min (T1) 20 img/min (T2) 50 img/min (T3) 100 img/min (T4) 250 img/min (T5)	–	100,000 (T1) 250,000 (T2) 800,000 (T3) 3,000,000 (T4) 8,000,000 (T5)	–
gpt-image-1-mini	5 img/min (T1) 20 img/min (T2) 50 img/min (T3) 150 img/min (T4) 250 img/min (T5)	–	100,000 (T1) 250,000 (T2) 800,000 (T3) 3,000,000 (T4) 8,000,000 (T5)	–
dall-e-3 (Deprecated)	500 img/min (T1) 2500 img/min (T2) 5000 img/min (T3) 7500 img/min (T4) 10,000 img/min (T5)	–	–	–
dall-e-2 (Deprecated)	500 img/min (T1) 2500 img/min (T2) 5000 img/min (T3) 7500 img/min (T4) 10,000 img/min (T5)	–	–	–
gpt-audio-1.5	500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5)	30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5)	90,000 (T1) 1, 350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5)	–
gpt-realtime-1.5	200 (T1) 400 (T2) 5,000 (T3) 10,000 (T4) 20,000 (T5)	1,000 (T1)	40,000 (T1) 200,000 (T2) 800,000 (T3) 4,000,000 (T4) 15,000,000 (T5)	–
GPT-4o mini TTS	500 (T1) 2,000(T2) 5,000 (T3) 10,000 (T4) 10,000 (T5)	–	50,000 (T1) 150,000 (T2) 600,000 (T3) 2,000,000 (T4) 8,000,000 (T5)	–
tts-1	500 (T1) 2,000(T2) 5,000 (T3) 10,000 (T4) 10,000 (T5)	–	–	–
TTS-1 HD	500 (T1) 2,000(T2) 5,000 (T3) 10,000 (T4) 10,000 (T5)	–	–	–

Free Tier Rate Limits (2026)

Note: Free / trial API accounts may not have access to all models (e.g. GPT-5, o4-mini) or may be subject to stricter limits. The numbers here assume model availability when permitted by OpenAI.

Model	TPM	RPM	RPD	TPD
Chat
gpt-5.5 (Not supported)	10,000	3	200	900,000
gpt-5.5-pro (Not supported)	150,000	3	200	–
gpt-5.4 (Not supported)	10,000	3	200	900,000
gpt-5.4-pro (Not supported)	150,000	3	200	–
gpt-5.3-codex	10,000	3	200	900,000
gpt-5.3-chat-latest	10,000	3	200	900,000
gpt-5.2 (Not supported)	10,000	3	200	900,000
gpt-5.2-pro (Not supported)	10,000	3	200	900,000
gpt-5-mini (Not supported)	60,000	3	200	200,000
gpt-5-nano (Not supported)	60,000	3	200	200,000
gpt-5-pro (Not supported)	30,000	3	200	90,000
gpt-5-search-api (Not supported)	3,000	3	200	–
gpt-4.1	10,000	3	200	900,000
gpt-4.1 (long context)	60,000	3	200	200,000
gpt-4.1-mini	60,000	3	200	200,000
gpt-4.1-mini (long context)	120,000	3	200	400,000
gpt-4.1-nano	60,000	3	200	200,000
gpt-4.1-nano (long context)	120,000	3	200	400,000
gpt-4o	10,000	3	200	900,000
gpt-4o-audio-preview	150,000	3	200	–
gpt-4o-search-preview	3,000	3	200	–
gpt-4o-transcribe	–	–	200	–
gpt-4o-mini	60,000	3	200	200,000
gpt-4o-mini-search-preview	3,000	3	200	–
gpt-4o-mini-transcribe	–	–	200	–
gpt-3.5-turbo	40,000	3	200	200,000
gpt-3.5-turbo-0125	40,000	3	200	200,000
gpt-3.5-turbo-1106	40,000	3	200	200,000
gpt-3.5-turbo-16k	40,000	3	200	540,000
gpt-3.5-turbo-instruct	90,000	3	200	200,000
gpt-3.5-turbo-instruct-0914	90,000	3	200	200,000
Text
o1	150,000	3	200	90,000
o1-mini	150,000	3	200	–
o3	100,000	3	200	90,000
o3-mini	1,000,000	–	150	200,000
o4-mini	100,000	3	200	90,000
babbage-002	150,000	3	200	–
davinci-002	150,000	3	200	–
text-embedding-3-large	40,000	100	2,000	–
text-embedding-3-small	40,000	100	2,000	–
text-embedding-ada-002	40,000	100	2,000	–
Audio
gpt-4o-mini-tts	–	–	200
tts-1	150,000	3	200
tts-1-1106	150,000	3	200
tts-1-hd	150,000	3	200
tts-1-hd-1106	150,000	3	200
whisper-1	150,000	3	200
Moderation
omni-moderation-2024-09-26	5,000	250	10,000
omni-moderation-latest	5,000	250	10,000
text-moderation-stable	150,000	3	200
text-moderation-latest	150,000	3	200
text-moderation-stable	150,000	3	200
Fine-tuning Inference
babbage-002	150,000	3
davinci-002	150,000	3
gpt-3.5-turbo-0125	40,000	3
gpt-3.5-turbo-0613	40,000	3
gpt-3.5-turbo-1106	40,000	3
gpt-4-0613	40,000	3
gpt-4o-2024-05-13	10,000	3
gpt-4o-mini-2024-07-18	60,000	3
Fine-tuning Training	ACTIVE / QUEUED JOBS	JOBS PER DAY
babbage-002	3	48
davinci-002	3	48
gpt-3.5-turbo-0613	3	48
Image
DALL·E 2	150,000 TPM, 3 RPM, 200 RPD, 5 images per minute
DALL·E 3	150,000 TPM, 3 RPM, 200 RPD
gpt-image-1 (Not supported)	3 RPM, 200 RPD
gpt-image-1-mini (Not supported)	3 RPM, 200 RPD
gpt-image-2 (Not supported)	3 RPM, 200 RPD
Video
sora-2 (Not supported)	Deprecated
sora-2-pro (Not supported)	Deprecated
Other
Default limits for all other models	150,000	3	200

How OpenAI API Rate Limits Work

OpenAI rate limits apply at the organization and project level, not only at the individual API key level. They also vary by model. Some model families share rate limits, so calls to several related models can count against the same shared pool.

Long-context requests can have separate limits. This matters for models such as GPT-4.1 and other large-context models because a short request and a very long request may not draw from the same practical capacity.

Batch API limits are separate from normal synchronous API limits. Batch jobs use their own queue-based limits, so they are useful when you need to process a large amount of work without real-time responses.

OpenAI Usage Tiers

OpenAI assigns API accounts to usage tiers based on payment history and account status. Higher tiers usually receive higher usage limits and better rate limits across many models.

Tier	Qualification	Usage limit
Free	User must be in an allowed geography	$100 per month
Tier 1	$5 paid	$100 per month
Tier 2	$50 paid and 7+ days since first successful payment	$500 per month
Tier 3	$100 paid and 7+ days since first successful payment	$1,000 per month
Tier 4	$250 paid and 14+ days since first successful payment	$5,000 per month
Tier 5	$1,000 paid and 30+ days since first successful payment	$200,000 per month

These tiers do not mean every model has the same RPM or TPM. A Tier 3 organization can have different limits for GPT models, reasoning models, embeddings, image models, realtime models, and fine-tuned models.

Where to Check Your Current OpenAI Rate Limits

Check your exact API limits in the OpenAI dashboard:

Open the OpenAI Platform dashboard.
Go to account settings.
Open the Limits section.
Select the relevant organization and project.
Review limits by model, endpoint, and shared limit group.y public article because OpenAI can change model access, shared limits, and usage tiers over time.

You can also inspect rate-limit headers from API responses. These headers show the current request and token budget for the request you just made.

Header	What it shows
x-ratelimit-limit-requests	Maximum request count before the request limit is exhausted
x-ratelimit-limit-tokens	Maximum token count before the token limit is exhausted
x-ratelimit-remaining-requests	Remaining requests in the current rate-limit window
x-ratelimit-remaining-tokens	Remaining tokens in the current rate-limit window
x-ratelimit-reset-requests	Time until the request limit resets
x-ratelimit-reset-tokens	Time until the token limit resets

If you run production traffic, log these headers. They are often the fastest way to identify whether your app is limited by RPM, TPM, or short bursts.

Rate Limits by API Type

Different OpenAI APIs use different rate-limit patterns.

API type	Main limits to watch	Practical issue
Responses API	RPM, TPM, TPD, shared model limits	Chat, agents, tools, and multimodal workloads can hit token limits quickly
Chat Completions API	RPM, TPM, TPD	Legacy chat apps often hit RPM before TPM
Embeddings API	RPM, TPM, request payload size	Large indexing jobs can hit token throughput limits
Images API	IPM and image-specific limits	Image generation may be limited by images per minute rather than tokens
Realtime API	Session, audio, token, and model limits	Voice apps need careful concurrency control
Batch API	Per-batch size, batch creation rate, enqueued prompt tokens	Large offline jobs need queue planning
Fine-tuning	Training jobs, queued jobs, model-specific limits	Training and inference limits are separate

For most apps, RPM and TPM are the first limits to monitor. For image apps, IPM matters more. For offline data jobs, Batch API queue limits matter more than standard synchronous limits.

Batch API Rate Limits

The Batch API is designed for asynchronous jobs that do not need an immediate response. It uses a separate rate-limit pool from standard synchronous requests.

Batch limits include:

Batch limit	Current behavior
Requests per batch	Up to 50,000 requests in one batch
Batch input file size	Up to 200 MB
Batch creation rate	Up to 2,000 batches per hour
Enqueued prompt tokens	Model-specific queue limit shown in the Platform settings
Completion window	Batch jobs are designed around a 24-hour processing window

Use the Batch API for evaluations, classification jobs, embedding jobs, backfills, data cleanup, and other work that can wait. Do not use Batch API for chatbots, live assistants, realtime tools, or user-facing requests that need immediate answers.

Why You Can Get 429 Errors Below the Published Limit

Some rate limits are enforced over shorter windows than one full minute. This means a limit such as 60,000 requests per minute may behave like a smaller per-second allowance during bursts.

You can also trigger 429 errors when your max_completion_tokens is too high. OpenAI can estimate token usage from your prompt plus your requested completion budget. If you ask for a much larger completion than you usually need, your request may reserve more token capacity than the final answer uses.

Common reasons for surprise 429 errors include:

Too many requests sent at the same time.
A traffic spike exceeded a short internal window.
max_completion_tokens was set far above the needed response size.
Several models shared the same rate-limit pool.
Batch jobs filled the model’s queued-token limit.
The API key used a different organization than expected.
The project had a lower limit than the organization.

How to Fix OpenAI 429 Errors

The fastest fix is to slow down and retry with exponential backoff. Do not resend failed requests in a tight loop. Failed requests can still count against rate limits, so aggressive retries can make the problem worse.

Use this retry pattern:

Catch 429 errors.
Wait briefly before retrying.
Increase the wait time after each failed retry.
Add random jitter so many workers do not retry at the same time.
Stop after a maximum number of retries.
Log the response headers and error message.

Example Python pattern:

import random
import time
from openai import OpenAI
client = OpenAI()

def call_with_backoff(messages, max_retries=6):
    delay = 1.0
    for attempt in range(max_retries):
        try:
            return client.responses.create(
                model="gpt-5.4-mini",
                input=messages,
                max_output_tokens=800,
            )
        except Exception as error:
            message = str(error).lower()
            if "rate limit" not in message and "429" not in message:
                raise
            if attempt == max_retries - 1:
                raise
            sleep_for = delay + random.uniform(0, 0.5)
            time.sleep(sleep_for)
            delay *= 2

Keep max_output_tokens close to the response length you actually need. If your app usually returns 500 tokens, do not reserve 8,000 tokens by default.

How to Prevent Rate Limit Problems

Control Concurrency

Limit the number of simultaneous requests from workers, queues, and background jobs. A single server may stay under the limit, but several workers can exceed the shared organization or project pool together.

Use a Request Queue

Put API work into a queue when traffic is uneven. A queue lets you smooth spikes, retry failed jobs, and protect user-facing routes from sudden API backpressure.

Reduce Token Waste

Shorter prompts use less TPM. Remove repeated instructions, old conversation history, unused examples, large JSON blobs, and irrelevant retrieved text.

Set Realistic Output Budgets

Large max_output_tokens values can increase the estimated token budget for a request. Use different defaults for short answers, summaries, reports, and long-form generation.

Batch Offline Work

Use the Batch API for jobs that do not need live responses. This is often better for evaluations, embeddings, classification, extraction, and nightly processing.

Split Traffic by Workload

Separate live user requests from background jobs. Live traffic should not compete with batch enrichment, analytics, or testing jobs in the same uncontrolled queue.

Log Rate Limit Headers

Save remaining request and token headers in your logs. This helps you see whether the app is hitting RPM, TPM, or a shared model pool.

How to Increase OpenAI API Rate Limits

The normal way to increase OpenAI API rate limits is to move up usage tiers through paid API usage and payment history. Many accounts graduate automatically as spend and account age increase.

If your app already uses backoff, queueing, realistic token budgets, and the right model, but still needs more throughput, check the Limits section of the OpenAI Platform dashboard. Eligible accounts can request higher limits or reach the next tier through additional API usage.

Enterprise and high-volume customers may also use dedicated arrangements such as Scale Tier or other capacity options. Those options are separate from normal pay-as-you-go rate limits.

Real-world Examples

Chatbot With Too Many Small Requests

A support chatbot may hit RPM before TPM if users send many short messages. The fix is not a larger-context model. The fix is request throttling, queueing, caching repeated answers, and batching internal background calls.

Long Report Generator

A report generator may hit TPM before RPM because each request uses a large prompt and a large output budget. The fix is shorter retrieved context, section-by-section generation, and tighter output limits.

Embedding a Large Website

An embedding job may hit token throughput limits while indexing thousands of pages. The fix is a queue, backoff, deduplication, and Batch API use when the job does not need real-time responses.

Image Generation App

An image app may hit IPM rather than token limits. The fix is a per-user queue, clear wait states, and a cap on simultaneous image jobs.

Quick Troubleshooting Checklist

Symptom	Likely cause	Fix
429 after many short calls	RPM limit	Add queueing and concurrency control
429 after long prompts	TPM limit	Reduce prompt size and output budget
429 during traffic spikes	Short-window burst limit	Add backoff and jitter
429 only in one project	Project-level limit	Check project settings
429 after switching models	Shared model pool or lower model limit	Check the model’s limit group
Batch job will not start	Batch queue limit	Reduce queued tokens or wait for jobs to finish
API stops after spend cap	Usage limit	Raise monthly usage limit if appropriate

Relevant Resources

Official OpenAI Documentation on Rate Limits: platform.openai.com/docs/guides/rate-limits
Your Account’s Rate Limit Dashboard: platform.openai.com/settings/organization/limits
OpenAI Cookbook on Handling Rate Limits: cookbook.openai.com/examples/how_to_handle_rate_limits

FAQs

Q: What are OpenAI API rate limits?
A: OpenAI API rate limits control how many requests, tokens, images, and batch jobs an organization or project can send within a time window. They protect platform stability and control throughput across users.

Q: What is RPM in the OpenAI API?
A: RPM means requests per minute. It controls how many API calls your organization or project can send in one minute.

Q: What is TPM in the OpenAI API?
A: TPM means tokens per minute. It controls how many input and output tokens your organization or project can process in one minute.

Q: What is TPD in the OpenAI API?
A: TPD means tokens per day. It controls total daily token throughput for a model, project, or organization.

Q: What is IPM in the OpenAI API?
A: IPM means images per minute. It applies to image generation and image editing models.

Q: Are OpenAI rate limits per API key?
A: Rate limits are mainly tied to organization and project settings. A single API key can be affected by limits shared across the organization or project.

Q: Are rate limits the same for every OpenAI model?
A: No. Rate limits vary by model, endpoint, usage tier, project, and shared limit group. Some long-context requests also have separate limits.

Q: How do I check my OpenAI rate limit?
A: Open the OpenAI Platform dashboard, select the correct organization and project, then open the Limits section. You can also inspect rate-limit headers in API responses.

Q: Why am I getting 429 errors if I am under the per-minute limit?
A: OpenAI can enforce limits over shorter windows than a full minute. You can also hit a different limit, such as TPM, shared model limits, project limits, or batch queue limits.

Q: How do I fix 429 Too Many Requests errors?
A: Use exponential backoff, add jitter, reduce concurrency, lower max_output_tokens, shorten prompts, and queue requests. Do not retry failed requests in a tight loop.

Q: Does increasing my usage tier increase rate limits?
A: Usually, yes. Higher usage tiers often receive higher rate limits across many models. Exact limits still vary by model and endpoint.

Q: Does a higher rate limit increase the model context window?
A: No. Rate limits control throughput over time. Context windows control how much content fits in one request.

Q: Does the Batch API use the same rate limits?
A: No. Batch API rate limits are separate from normal synchronous model rate limits. Batch jobs also have per-batch size, batch creation, and enqueued-token limits.

Final Thoughts

OpenAI API rate limits are not one fixed number. They depend on your organization, project, model, endpoint, usage tier, and workload type.

Use the OpenAI dashboard to check your exact limits. Use response headers to monitor live usage. Use queues, exponential backoff, realistic output budgets, and the Batch API to keep production apps stable under load.

Changelog:

05/11/2026

Revised

04/24/2026

Updated for GPT‑5.5 and GPT Image 2

03/17/2026

Updated for GPT‑5.4 mini and nano

03/05/2026

Updated for GPT-5.4

12/11/2025

Updated for GPT-5.2

10/18/2025

Update rate limits

09/12/2025

gpt-5 and gpt-5-mini API rate limits are now more than doubled for T1-T4 tiers

08/07/2025

Updated for GPT-5

04/25/2025

Updated for gpt-4.1, o4, o3.
Removed old models like gpt-3.5

12/18/2024

Updated for o1

10/07/2024

Added gpt-4o-realtime-preview

10/02/2024

Update o1-preview & o1-mini

09/13/2024

Added o1-preview & o1-mini

08/07/2024

Updated for GPT-4o-mini
Clean up

05/14/2024

Updated for GPT-4o