OpenAI API rate limits control how many requests, tokens, images, and batch jobs your organization can send over a fixed period.
They are not the same as token limits, model context windows, or monthly billing limits.
The exact limits depend on your organization, project, usage tier, model, endpoint, and sometimes the request type.
This guide explains the main OpenAI rate limit types, how usage tiers work, where to check your current limits, and how to fix 429 “Too Many Requests” errors.
This page lists all current OpenAI API rate limits (2026), including free trial and Tier 1-5 users, with quick-reference charts for GPT-5.5, GPT-5.5 Pro, GPT-5.4 Pro/Mini/Nano, and GPT Image 2 models.
Table Of Contents
- Quick Summary Table (2026)
- Understanding the OpenAI API Rate Limits
- OpenAI Rate Limits vs Token Limits
- Current Rate Limits by Usage Tier
- Rate Limits For Pay-as-you-go Users (Tier 1 – Tier 5)
- Free Tier Rate Limits (2026)
- How OpenAI API Rate Limits Work
- OpenAI Usage Tiers
- Where to Check Your Current OpenAI Rate Limits
- Rate Limits by API Type
- Batch API Rate Limits
- Why You Can Get 429 Errors Below the Published Limit
- How to Fix OpenAI 429 Errors
- How to Prevent Rate Limit Problems
- How to Increase OpenAI API Rate Limits
- Real-world Examples
- Quick Troubleshooting Checklist
Quick Summary Table (2026)
| Model | Free Tier | Tier 1 | Tier 3 | Tier 5 |
|---|---|---|---|---|
| GPT-5.5 | – | 500 RPM / 500k TPM | 5k RPM / 2M TPM | 15k RPM / 40M TPM |
| GPT-5.5 Pro | – | 500 RPM / 30k TPM | 5k RPM / 800k TPM | 10k RPM / 30M TPM |
| GPT-5.4 Mini | – | 500 RPM / 500k TPM | 5k RPM / 4M TPM | 30k RPM / 180M TPM |
| GPT-5.4 Nano | – | 500 RPM / 200k TPM | 5k RPM / 4M TPM | 30k RPM / 180M TPM |
| GPT Image 2 | – | 5 img/min | 50 img/min | 250 img/min |
Understanding the OpenAI API Rate Limits
OpenAI API rate limits are measured mainly through RPM, RPD, TPM, TPD, IPM, and audio-minute limits for some streaming audio models:
| Limit | Meaning | What it controls |
|---|---|---|
| RPM | Requests per minute | How many API calls you can send per minute |
| RPD | Requests per day | How many API calls you can send per day |
| TPM | Tokens per minute | How many input and output tokens you can process per minute |
| TPD | Tokens per day | How many tokens you can process per day |
| IPM | Images per minute | How many image generations or image edits you can run per minute |
| Audio minutes per minute | Audio throughput per minute | How much streaming audio some audio models can process per minute |
| Batch queue limit | Enqueued prompt tokens per model | How many input tokens can sit in the Batch API queue |
You can hit any one of these limits first. A request can fail even if you are under your TPM limit, because you may have hit RPM. A request can also fail even if you are under RPM, because the prompt, response budget, or queued batch tokens may exceed another limit.
OpenAI Rate Limits vs Token Limits
Rate limits and token limits are different.
| Limit type | What it means | Example problem |
|---|---|---|
| Rate limit | How much API usage your organization can send over time | 429 error after too many requests in one minute |
| Context window | How much input and output can fit in one model request | Long document is too large for the selected model |
| Max output tokens | Longest answer the model can return in one response | Model stops before a long report is complete |
| Usage limit | Monthly spend cap for an organization or project | API stops because the monthly budget is reached |
If your request is too large for a model’s context window, raising your rate limit will not fix it. If your app gets 429 errors during traffic spikes, switching to a larger context model will not fix it.
For model context windows and max output limits, see ChatGPT and OpenAI token limits.
Current Rate Limits by Usage Tier
OpenAI automatically assigns you to usage tiers based on your payment history and API usage patterns. Higher tiers get better rate limits and access to newer models.
| TIER | QUALIFICATION | MAX CREDITS |
|---|---|---|
| Free | User must be in an allowed geography | $100 |
| Tier 1 | $5 paid | $100 |
| Tier 2 | $50 paid and 7+ days since first successful payment | $500 |
| Tier 3 | $100 paid and 7+ days since first successful payment | $1,000 |
| Tier 4 | $250 paid and 14+ days since first successful payment | $5,000 |
| Tier 5 | $1,000 paid and 30+ days since first successful payment | $200,000 |
Rate Limits For Pay-as-you-go Users (Tier 1 – Tier 5)
| Model | RPM | RPD | TPM | Batch Queue Limit |
|---|---|---|---|---|
| gpt-5.5 | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 15,000 (T5) | – | 500,000 (T1) 1,000,000 (T2) 2,000,000 (T3) 4,000,000 (T4) 40,000,000 (T5) | 150,000 (T1) 3,000,000 (T2) 100,000,000 (T3) 200,000,000 (T4) 15,000,000,000 (T5) |
| gpt-5.5-pro | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5) | 500,000 (T1) 1,000,000 (T2) 2,000,000 (T3) 4,000,000 (T4) 40,000,000 (T5) | 150,000 (T1) 3,000,000 (T2) 100,000,000 (T3) 200,000,000 (T4) 15,000,000,000 (T5) | |
| gpt-5.4 | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 15,000 (T5) | – | 500,000 (T1) 1,000,000 (T2) 2,000,000 (T3) 4,000,000 (T4) 40,000,000 (T5) | 150,000 (T1) 3,000,000 (T2) 100,000,000 (T3) 200,000,000 (T4) 15,000,000,000 (T5) |
| gpt-5.4-pro | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5) | – | 30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5) | 90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5) |
| gpt-5.4-mini | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5) | – | 500,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 180,000,000 (T5 | 5,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5) |
| gpt-5.4-nano | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5) | – | 200,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 180,000,000 (T5) | 2,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5) |
| gpt-5.3-Codex | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 15,000 (T5) | 500,000 (T1) 1,000,000 (T2) 2,000,000 (T3) 4,000,000 (T4) 40,000,000 (T5) | 150,000 (T1) 3,000,000 (T2) 100,000,000 (T3) 200,000,000 (T4) 15,000,000,000 (T5) | |
| gpt-5.2 | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 15,000 (T5) | – | 500,000 (T1) 1,000,000 (T2) 2,000,000 (T3) 4,000,000 (T4) 40,000,000 (T5) | 150,000 (T1) 3,000,000 (T2) 100,000,000 (T3) 200,000,000 (T4) 15,000,000,000 (T5) |
| gpt-5.2-pro | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5) | – | 30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5) | 90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5) |
| gpt-5 | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 15,000 (T5) | 500,000 (T1) 1,000,000 (T2) 2,000,000 (T3) 4,000,000 (T4) 40,000,000 (T5) | 150,000 (T1) 3,000,000 (T2) 100,000,000 (T3) 200,000,000 (T4) 15,000,000,000 (T5) | |
| gpt-5-mini | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5) | – | 500,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 180,000,000 (T5 | 5,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5) |
| gpt-5-nano | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5) | – | 200,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 180,000,000 (T5) | 2,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5) |
| gpt-5-pro | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5) | – | 30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5) | 90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5) |
| gpt-4.1 | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5) | – | 30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5) | 90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5) |
| gpt-4.1-mini | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5) | 10,000 (T1) | 200,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5) | 2,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5) |
| gpt-4.1-nano | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5) | 10,000 (T1) | 200,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5) | 2,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5) |
| o4-mini | 1,000 (T1) 2,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5) | – | 100,000 (T1) 200,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5) | 1,000,000 (T1) 2,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5) |
| o4-mini-deep-research | 1,000 (T1) 2,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5) | – | 200,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5) | 200,000 (T1) 300,000 (T2) 500,000 (T3) 2,000,000 (T4) 10,000,000 (T5) |
| o3-pro | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5) | – | 30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5) | 90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5) |
| o3 | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5) | – | 30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5) | 90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5) |
| o3-mini | 1,000 (T1) 2,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5) | – | 100,000 (T1) 200,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5) | 1,000,000 (T1) 2,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5) |
| o3-deep-research | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5) | – | 20,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5) | 200,000 (T1) 300,000 (T2) 500,000 (T3) 2,000,000 (T4) 10,000,000 (T5) |
| o1-pro | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5) | 30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5) | 90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5) | |
| o1 | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5) | – | 30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5) | 90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5) |
| o1-mini | 1,000 (T1) 2,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5) | – | 100,000 (T1) 200,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5) | 1,000,000 (T1) 2,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5) |
| Sora 2 Pro | Deprecated | – | – | – |
| Sora | Deprecated | – | – | – |
| gpt-4o | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5) | – | 30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5) | 90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5) |
| gpt-4o-mini | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5) | 10,000 (T1) | 200,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5) | 2,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5) |
| gpt-4o-audio | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5) | – | 30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5) | 90,000 (T1) 1,350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5) |
| gpt-4o-mini-audio | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 30,000 (T5) | 10,000 (T1) | 200,000 (T1) 2,000,000 (T2) 4,000,000 (T3) 10,000,000 (T4) 150,000,000 (T5) | 2,000,000 (T1) 20,000,000 (T2) 40,000,000 (T3) 1,000,000,000 (T4) 15,000,000,000 (T5) |
| GPT-4o Realtime | 200 (T1) 400 (T2) 5,000 (T3) 10,000 (T4) 20,000 (T5) | 10,000 (T1) | 40,000 (T1) 200,000 (T2) 800,000 (T3) 4,000,000 (T4) 15,000,000 (T5) | – |
| GPT-4o Mini Realtime | 200 (T1) 400 (T2) 5,000 (T3) 10,000 (T4) 20,000 (T5) | 10,000 (T1) | 40,000 (T1) 200,000 (T2) 800,000 (T3) 4,000,000 (T4) 15,000,000 (T5) | – |
| gpt-image-2 | 5 img/min (T1) 20 img/min (T2) 50 img/min (T3) 100 img/min (T4) 250 img/min (T5) | 100,000 (T1) 250,000 (T2) 800,000 (T3) 3,000,000 (T4) 8,000,000 (T5) | ||
| gpt-image-1.5 | 5 img/min (T1) 20 img/min (T2) 50 img/min (T3) 100 img/min (T4) 250 img/min (T5) | – | 100,000 (T1) 250,000 (T2) 800,000 (T3) 3,000,000 (T4) 8,000,000 (T5) | – |
| gpt-image-1-mini | 5 img/min (T1) 20 img/min (T2) 50 img/min (T3) 150 img/min (T4) 250 img/min (T5) | – | 100,000 (T1) 250,000 (T2) 800,000 (T3) 3,000,000 (T4) 8,000,000 (T5) | – |
| dall-e-3 (Deprecated) | 500 img/min (T1) 2500 img/min (T2) 5000 img/min (T3) 7500 img/min (T4) 10,000 img/min (T5) | – | – | – |
| dall-e-2 (Deprecated) | 500 img/min (T1) 2500 img/min (T2) 5000 img/min (T3) 7500 img/min (T4) 10,000 img/min (T5) | – | – | – |
| gpt-audio-1.5 | 500 (T1) 5,000 (T2) 5,000 (T3) 10,000 (T4) 10,000 (T5) | 30,000 (T1) 450,000 (T2) 800,000 (T3) 2,000,000 (T4) 30,000,000 (T5) | 90,000 (T1) 1, 350,000 (T2) 50,000,000 (T3) 200,000,000 (T4) 5,000,000,000 (T5) | – |
| gpt-realtime-1.5 | 200 (T1) 400 (T2) 5,000 (T3) 10,000 (T4) 20,000 (T5) | 1,000 (T1) | 40,000 (T1) 200,000 (T2) 800,000 (T3) 4,000,000 (T4) 15,000,000 (T5) | – |
| GPT-4o mini TTS | 500 (T1) 2,000(T2) 5,000 (T3) 10,000 (T4) 10,000 (T5) | – | 50,000 (T1) 150,000 (T2) 600,000 (T3) 2,000,000 (T4) 8,000,000 (T5) | – |
| tts-1 | 500 (T1) 2,000(T2) 5,000 (T3) 10,000 (T4) 10,000 (T5) | – | – | – |
| TTS-1 HD | 500 (T1) 2,000(T2) 5,000 (T3) 10,000 (T4) 10,000 (T5) | – | – | – |
Free Tier Rate Limits (2026)
Note: Free / trial API accounts may not have access to all models (e.g. GPT-5, o4-mini) or may be subject to stricter limits. The numbers here assume model availability when permitted by OpenAI.
| Model | TPM | RPM | RPD | TPD |
|---|---|---|---|---|
| Chat | ||||
| gpt-5.5 (Not supported) | 10,000 | 3 | 200 | 900,000 |
| gpt-5.5-pro (Not supported) | 150,000 | 3 | 200 | – |
| gpt-5.4 (Not supported) | 10,000 | 3 | 200 | 900,000 |
| gpt-5.4-pro (Not supported) | 150,000 | 3 | 200 | – |
| gpt-5.3-codex | 10,000 | 3 | 200 | 900,000 |
| gpt-5.3-chat-latest | 10,000 | 3 | 200 | 900,000 |
| gpt-5.2 (Not supported) | 10,000 | 3 | 200 | 900,000 |
| gpt-5.2-pro (Not supported) | 10,000 | 3 | 200 | 900,000 |
| gpt-5-mini (Not supported) | 60,000 | 3 | 200 | 200,000 |
| gpt-5-nano (Not supported) | 60,000 | 3 | 200 | 200,000 |
| gpt-5-pro (Not supported) | 30,000 | 3 | 200 | 90,000 |
| gpt-5-search-api (Not supported) | 3,000 | 3 | 200 | – |
| gpt-4.1 | 10,000 | 3 | 200 | 900,000 |
| gpt-4.1 (long context) | 60,000 | 3 | 200 | 200,000 |
| gpt-4.1-mini | 60,000 | 3 | 200 | 200,000 |
| gpt-4.1-mini (long context) | 120,000 | 3 | 200 | 400,000 |
| gpt-4.1-nano | 60,000 | 3 | 200 | 200,000 |
| gpt-4.1-nano (long context) | 120,000 | 3 | 200 | 400,000 |
| gpt-4o | 10,000 | 3 | 200 | 900,000 |
| gpt-4o-audio-preview | 150,000 | 3 | 200 | – |
| gpt-4o-search-preview | 3,000 | 3 | 200 | – |
| gpt-4o-transcribe | – | – | 200 | – |
| gpt-4o-mini | 60,000 | 3 | 200 | 200,000 |
| gpt-4o-mini-search-preview | 3,000 | 3 | 200 | – |
| gpt-4o-mini-transcribe | – | – | 200 | – |
| gpt-3.5-turbo | 40,000 | 3 | 200 | 200,000 |
| gpt-3.5-turbo-0125 | 40,000 | 3 | 200 | 200,000 |
| gpt-3.5-turbo-1106 | 40,000 | 3 | 200 | 200,000 |
| gpt-3.5-turbo-16k | 40,000 | 3 | 200 | 540,000 |
| gpt-3.5-turbo-instruct | 90,000 | 3 | 200 | 200,000 |
| gpt-3.5-turbo-instruct-0914 | 90,000 | 3 | 200 | 200,000 |
| Text | ||||
| o1 | 150,000 | 3 | 200 | 90,000 |
| o1-mini | 150,000 | 3 | 200 | – |
| o3 | 100,000 | 3 | 200 | 90,000 |
| o3-mini | 1,000,000 | – | 150 | 200,000 |
| o4-mini | 100,000 | 3 | 200 | 90,000 |
| babbage-002 | 150,000 | 3 | 200 | – |
| davinci-002 | 150,000 | 3 | 200 | – |
| text-embedding-3-large | 40,000 | 100 | 2,000 | – |
| text-embedding-3-small | 40,000 | 100 | 2,000 | – |
| text-embedding-ada-002 | 40,000 | 100 | 2,000 | – |
| Audio | ||||
| gpt-4o-mini-tts | – | – | 200 | |
| tts-1 | 150,000 | 3 | 200 | |
| tts-1-1106 | 150,000 | 3 | 200 | |
| tts-1-hd | 150,000 | 3 | 200 | |
| tts-1-hd-1106 | 150,000 | 3 | 200 | |
| whisper-1 | 150,000 | 3 | 200 | |
| Moderation | ||||
| omni-moderation-2024-09-26 | 5,000 | 250 | 10,000 | |
| omni-moderation-latest | 5,000 | 250 | 10,000 | |
| text-moderation-stable | 150,000 | 3 | 200 | |
| text-moderation-latest | 150,000 | 3 | 200 | |
| text-moderation-stable | 150,000 | 3 | 200 | |
| Fine-tuning Inference | ||||
| babbage-002 | 150,000 | 3 | ||
| davinci-002 | 150,000 | 3 | ||
| gpt-3.5-turbo-0125 | 40,000 | 3 | ||
| gpt-3.5-turbo-0613 | 40,000 | 3 | ||
| gpt-3.5-turbo-1106 | 40,000 | 3 | ||
| gpt-4-0613 | 40,000 | 3 | ||
| gpt-4o-2024-05-13 | 10,000 | 3 | ||
| gpt-4o-mini-2024-07-18 | 60,000 | 3 | ||
| Fine-tuning Training | ACTIVE / QUEUED JOBS | JOBS PER DAY | ||
| babbage-002 | 3 | 48 | ||
| davinci-002 | 3 | 48 | ||
| gpt-3.5-turbo-0613 | 3 | 48 | ||
| Image | ||||
| DALL·E 2 | 150,000 TPM, 3 RPM, 200 RPD, 5 images per minute | |||
| DALL·E 3 | 150,000 TPM, 3 RPM, 200 RPD | |||
| gpt-image-1 (Not supported) | 3 RPM, 200 RPD | |||
| gpt-image-1-mini (Not supported) | 3 RPM, 200 RPD | |||
| gpt-image-2 (Not supported) | 3 RPM, 200 RPD | |||
| Video | ||||
| sora-2 (Not supported) | Deprecated | |||
| sora-2-pro (Not supported) | Deprecated | |||
| Other | ||||
| Default limits for all other models | 150,000 | 3 | 200 | |
How OpenAI API Rate Limits Work
OpenAI rate limits apply at the organization and project level, not only at the individual API key level. They also vary by model. Some model families share rate limits, so calls to several related models can count against the same shared pool.
Long-context requests can have separate limits. This matters for models such as GPT-4.1 and other large-context models because a short request and a very long request may not draw from the same practical capacity.
Batch API limits are separate from normal synchronous API limits. Batch jobs use their own queue-based limits, so they are useful when you need to process a large amount of work without real-time responses.
OpenAI Usage Tiers
OpenAI assigns API accounts to usage tiers based on payment history and account status. Higher tiers usually receive higher usage limits and better rate limits across many models.
| Tier | Qualification | Usage limit |
|---|---|---|
| Free | User must be in an allowed geography | $100 per month |
| Tier 1 | $5 paid | $100 per month |
| Tier 2 | $50 paid and 7+ days since first successful payment | $500 per month |
| Tier 3 | $100 paid and 7+ days since first successful payment | $1,000 per month |
| Tier 4 | $250 paid and 14+ days since first successful payment | $5,000 per month |
| Tier 5 | $1,000 paid and 30+ days since first successful payment | $200,000 per month |
These tiers do not mean every model has the same RPM or TPM. A Tier 3 organization can have different limits for GPT models, reasoning models, embeddings, image models, realtime models, and fine-tuned models.
Where to Check Your Current OpenAI Rate Limits
Check your exact API limits in the OpenAI dashboard:
- Open the OpenAI Platform dashboard.
- Go to account settings.
- Open the Limits section.
- Select the relevant organization and project.
- Review limits by model, endpoint, and shared limit group.y public article because OpenAI can change model access, shared limits, and usage tiers over time.
You can also inspect rate-limit headers from API responses. These headers show the current request and token budget for the request you just made.
| Header | What it shows |
|---|---|
| x-ratelimit-limit-requests | Maximum request count before the request limit is exhausted |
| x-ratelimit-limit-tokens | Maximum token count before the token limit is exhausted |
| x-ratelimit-remaining-requests | Remaining requests in the current rate-limit window |
| x-ratelimit-remaining-tokens | Remaining tokens in the current rate-limit window |
| x-ratelimit-reset-requests | Time until the request limit resets |
| x-ratelimit-reset-tokens | Time until the token limit resets |
If you run production traffic, log these headers. They are often the fastest way to identify whether your app is limited by RPM, TPM, or short bursts.
Rate Limits by API Type
Different OpenAI APIs use different rate-limit patterns.
| API type | Main limits to watch | Practical issue |
|---|---|---|
| Responses API | RPM, TPM, TPD, shared model limits | Chat, agents, tools, and multimodal workloads can hit token limits quickly |
| Chat Completions API | RPM, TPM, TPD | Legacy chat apps often hit RPM before TPM |
| Embeddings API | RPM, TPM, request payload size | Large indexing jobs can hit token throughput limits |
| Images API | IPM and image-specific limits | Image generation may be limited by images per minute rather than tokens |
| Realtime API | Session, audio, token, and model limits | Voice apps need careful concurrency control |
| Batch API | Per-batch size, batch creation rate, enqueued prompt tokens | Large offline jobs need queue planning |
| Fine-tuning | Training jobs, queued jobs, model-specific limits | Training and inference limits are separate |
For most apps, RPM and TPM are the first limits to monitor. For image apps, IPM matters more. For offline data jobs, Batch API queue limits matter more than standard synchronous limits.
Batch API Rate Limits
The Batch API is designed for asynchronous jobs that do not need an immediate response. It uses a separate rate-limit pool from standard synchronous requests.
Batch limits include:
| Batch limit | Current behavior |
|---|---|
| Requests per batch | Up to 50,000 requests in one batch |
| Batch input file size | Up to 200 MB |
| Batch creation rate | Up to 2,000 batches per hour |
| Enqueued prompt tokens | Model-specific queue limit shown in the Platform settings |
| Completion window | Batch jobs are designed around a 24-hour processing window |
Use the Batch API for evaluations, classification jobs, embedding jobs, backfills, data cleanup, and other work that can wait. Do not use Batch API for chatbots, live assistants, realtime tools, or user-facing requests that need immediate answers.
Why You Can Get 429 Errors Below the Published Limit
Some rate limits are enforced over shorter windows than one full minute. This means a limit such as 60,000 requests per minute may behave like a smaller per-second allowance during bursts.
You can also trigger 429 errors when your max_completion_tokens is too high. OpenAI can estimate token usage from your prompt plus your requested completion budget. If you ask for a much larger completion than you usually need, your request may reserve more token capacity than the final answer uses.
Common reasons for surprise 429 errors include:
- Too many requests sent at the same time.
- A traffic spike exceeded a short internal window.
max_completion_tokenswas set far above the needed response size.- Several models shared the same rate-limit pool.
- Batch jobs filled the model’s queued-token limit.
- The API key used a different organization than expected.
- The project had a lower limit than the organization.
How to Fix OpenAI 429 Errors
The fastest fix is to slow down and retry with exponential backoff. Do not resend failed requests in a tight loop. Failed requests can still count against rate limits, so aggressive retries can make the problem worse.
Use this retry pattern:
- Catch 429 errors.
- Wait briefly before retrying.
- Increase the wait time after each failed retry.
- Add random jitter so many workers do not retry at the same time.
- Stop after a maximum number of retries.
- Log the response headers and error message.
Example Python pattern:
import random
import time
from openai import OpenAI
client = OpenAI()
def call_with_backoff(messages, max_retries=6):
delay = 1.0
for attempt in range(max_retries):
try:
return client.responses.create(
model="gpt-5.4-mini",
input=messages,
max_output_tokens=800,
)
except Exception as error:
message = str(error).lower()
if "rate limit" not in message and "429" not in message:
raise
if attempt == max_retries - 1:
raise
sleep_for = delay + random.uniform(0, 0.5)
time.sleep(sleep_for)
delay *= 2Keep max_output_tokens close to the response length you actually need. If your app usually returns 500 tokens, do not reserve 8,000 tokens by default.
How to Prevent Rate Limit Problems
Control Concurrency
Limit the number of simultaneous requests from workers, queues, and background jobs. A single server may stay under the limit, but several workers can exceed the shared organization or project pool together.
Use a Request Queue
Put API work into a queue when traffic is uneven. A queue lets you smooth spikes, retry failed jobs, and protect user-facing routes from sudden API backpressure.
Reduce Token Waste
Shorter prompts use less TPM. Remove repeated instructions, old conversation history, unused examples, large JSON blobs, and irrelevant retrieved text.
Set Realistic Output Budgets
Large max_output_tokens values can increase the estimated token budget for a request. Use different defaults for short answers, summaries, reports, and long-form generation.
Batch Offline Work
Use the Batch API for jobs that do not need live responses. This is often better for evaluations, embeddings, classification, extraction, and nightly processing.
Split Traffic by Workload
Separate live user requests from background jobs. Live traffic should not compete with batch enrichment, analytics, or testing jobs in the same uncontrolled queue.
Log Rate Limit Headers
Save remaining request and token headers in your logs. This helps you see whether the app is hitting RPM, TPM, or a shared model pool.
How to Increase OpenAI API Rate Limits
The normal way to increase OpenAI API rate limits is to move up usage tiers through paid API usage and payment history. Many accounts graduate automatically as spend and account age increase.
If your app already uses backoff, queueing, realistic token budgets, and the right model, but still needs more throughput, check the Limits section of the OpenAI Platform dashboard. Eligible accounts can request higher limits or reach the next tier through additional API usage.
Enterprise and high-volume customers may also use dedicated arrangements such as Scale Tier or other capacity options. Those options are separate from normal pay-as-you-go rate limits.
Real-world Examples
Chatbot With Too Many Small Requests
A support chatbot may hit RPM before TPM if users send many short messages. The fix is not a larger-context model. The fix is request throttling, queueing, caching repeated answers, and batching internal background calls.
Long Report Generator
A report generator may hit TPM before RPM because each request uses a large prompt and a large output budget. The fix is shorter retrieved context, section-by-section generation, and tighter output limits.
Embedding a Large Website
An embedding job may hit token throughput limits while indexing thousands of pages. The fix is a queue, backoff, deduplication, and Batch API use when the job does not need real-time responses.
Image Generation App
An image app may hit IPM rather than token limits. The fix is a per-user queue, clear wait states, and a cap on simultaneous image jobs.
Quick Troubleshooting Checklist
| Symptom | Likely cause | Fix |
|---|---|---|
| 429 after many short calls | RPM limit | Add queueing and concurrency control |
| 429 after long prompts | TPM limit | Reduce prompt size and output budget |
| 429 during traffic spikes | Short-window burst limit | Add backoff and jitter |
| 429 only in one project | Project-level limit | Check project settings |
| 429 after switching models | Shared model pool or lower model limit | Check the model’s limit group |
| Batch job will not start | Batch queue limit | Reduce queued tokens or wait for jobs to finish |
| API stops after spend cap | Usage limit | Raise monthly usage limit if appropriate |
Relevant Resources
- Official OpenAI Documentation on Rate Limits: platform.openai.com/docs/guides/rate-limits
- Your Account’s Rate Limit Dashboard: platform.openai.com/settings/organization/limits
- OpenAI Cookbook on Handling Rate Limits: cookbook.openai.com/examples/how_to_handle_rate_limits
FAQs
Q: What are OpenAI API rate limits?
A: OpenAI API rate limits control how many requests, tokens, images, and batch jobs an organization or project can send within a time window. They protect platform stability and control throughput across users.
Q: What is RPM in the OpenAI API?
A: RPM means requests per minute. It controls how many API calls your organization or project can send in one minute.
Q: What is TPM in the OpenAI API?
A: TPM means tokens per minute. It controls how many input and output tokens your organization or project can process in one minute.
Q: What is TPD in the OpenAI API?
A: TPD means tokens per day. It controls total daily token throughput for a model, project, or organization.
Q: What is IPM in the OpenAI API?
A: IPM means images per minute. It applies to image generation and image editing models.
Q: Are OpenAI rate limits per API key?
A: Rate limits are mainly tied to organization and project settings. A single API key can be affected by limits shared across the organization or project.
Q: Are rate limits the same for every OpenAI model?
A: No. Rate limits vary by model, endpoint, usage tier, project, and shared limit group. Some long-context requests also have separate limits.
Q: How do I check my OpenAI rate limit?
A: Open the OpenAI Platform dashboard, select the correct organization and project, then open the Limits section. You can also inspect rate-limit headers in API responses.
Q: Why am I getting 429 errors if I am under the per-minute limit?
A: OpenAI can enforce limits over shorter windows than a full minute. You can also hit a different limit, such as TPM, shared model limits, project limits, or batch queue limits.
Q: How do I fix 429 Too Many Requests errors?
A: Use exponential backoff, add jitter, reduce concurrency, lower max_output_tokens, shorten prompts, and queue requests. Do not retry failed requests in a tight loop.
Q: Does increasing my usage tier increase rate limits?
A: Usually, yes. Higher usage tiers often receive higher rate limits across many models. Exact limits still vary by model and endpoint.
Q: Does a higher rate limit increase the model context window?
A: No. Rate limits control throughput over time. Context windows control how much content fits in one request.
Q: Does the Batch API use the same rate limits?
A: No. Batch API rate limits are separate from normal synchronous model rate limits. Batch jobs also have per-batch size, batch creation, and enqueued-token limits.
Final Thoughts
OpenAI API rate limits are not one fixed number. They depend on your organization, project, model, endpoint, usage tier, and workload type.
Use the OpenAI dashboard to check your exact limits. Use response headers to monitor live usage. Use queues, exponential backoff, realistic output budgets, and the Batch API to keep production apps stable under load.
See Also:
Changelog:
05/11/2026
- Revised
04/24/2026
- Updated for GPT‑5.5 and GPT Image 2
03/17/2026
- Updated for GPT‑5.4 mini and nano
03/05/2026
- Updated for GPT-5.4
12/11/2025
- Updated for GPT-5.2
10/18/2025
- Update rate limits
09/12/2025
- gpt-5 and gpt-5-mini API rate limits are now more than doubled for T1-T4 tiers
08/07/2025
- Updated for GPT-5
04/25/2025
- Updated for gpt-4.1, o4, o3.
- Removed old models like gpt-3.5
12/18/2024
- Updated for o1
10/07/2024
- Added gpt-4o-realtime-preview
10/02/2024
- Update o1-preview & o1-mini
09/13/2024
- Added o1-preview & o1-mini
08/07/2024
- Updated for GPT-4o-mini
- Clean up
05/14/2024
- Updated for GPT-4o











This is a very good article.
Thank you all for providing great help to our users.