APIs like OpenAI’s allow developers to integrate powerful AI into their applications easily. However, to prevent abuse, APIs enforce rate limits on requests. Let’s look at how OpenAI’s rate limits work.
The OpenAI API enforces limits at the organization level based on the endpoint and account type. There are three key metrics:
- RPM (requests per minute) – The maximum requests allowed per minute
- RPD (requests per day) – The maximum requests allowed per day
- TPM (tokens per minute) – The maximum tokens allowed to be sent per minute
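You can track where you stand against these limits without waiting for an error: OpenAI reports the current counters in HTTP response headers. As a sketch, the helper below pulls those counters out of a response's headers dict; the `x-ratelimit-*` names follow OpenAI's documented headers, and the sample values are made up for illustration.

```python
def parse_rate_limit_headers(headers: dict) -> dict:
    """Pull the rate-limit counters out of an API response's headers."""
    keys = {
        "limit_requests": "x-ratelimit-limit-requests",
        "remaining_requests": "x-ratelimit-remaining-requests",
        "limit_tokens": "x-ratelimit-limit-tokens",
        "remaining_tokens": "x-ratelimit-remaining-tokens",
    }
    # Keep only the headers actually present, converted to ints.
    return {name: int(headers[h]) for name, h in keys.items() if h in headers}

# Example headers as they might appear on a chat completion response
# (illustrative values, not real API output).
sample = {
    "x-ratelimit-limit-requests": "3500",
    "x-ratelimit-remaining-requests": "3499",
    "x-ratelimit-limit-tokens": "90000",
    "x-ratelimit-remaining-tokens": "89480",
}
print(parse_rate_limit_headers(sample))
```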
Note that your rate limit and spending limit (quota) are automatically adjusted based on a number of factors. As your usage of the OpenAI API goes up and you successfully pay the bill, OpenAI automatically increases your usage tier:
Quick Overview
TIER | QUALIFICATION | MAX CREDITS | REQUEST LIMITS | TOKEN LIMITS |
---|---|---|---|---|
Free | User must be in an allowed geography | $100 | 3 RPM / 200 RPD | 10K TPM |
Tier 1 | $5 paid | $100 | 500 RPM / 10K RPD | 20K TPM |
Tier 2 | $50 paid and 7+ days since first successful payment | $250 | 5000 RPM | 40K TPM |
Tier 3 | $100 paid and 7+ days since first successful payment | $500 | 5000 RPM | 80K TPM |
Tier 4 | $250 paid and 14+ days since first successful payment | $1000 | 10K RPM | 300K TPM |
Tier 5 | $1000 paid and 30+ days since first successful payment | $1000 | 10K RPM | 300K TPM |
Rate Limits For Free Trial Users
Model | TPM | RPM | RPD |
---|---|---|---|
Chat | |||
gpt-3.5-turbo | 40,000 | 3 | 200 |
gpt-3.5-turbo-0301 | 40,000 | 3 | 200 |
gpt-3.5-turbo-0613 | 40,000 | 3 | 200 |
gpt-3.5-turbo-1106 | 40,000 | 3 | 200 |
gpt-3.5-turbo-16k | 40,000 | 3 | 200 |
gpt-3.5-turbo-16k-0613 | 40,000 | 3 | 200 |
gpt-3.5-turbo-instruct | 150,000 | 3 | 200 |
gpt-3.5-turbo-instruct-0914 | 150,000 | 3 | 200 |
Text | |||
ada | 150,000 | 3 | 200 |
ada-code-search-code | 150,000 | 3 | 200 |
ada-code-search-text | 150,000 | 3 | 200 |
ada-search-document | 150,000 | 3 | 200 |
ada-search-query | 150,000 | 3 | 200 |
ada-similarity | 150,000 | 3 | 200 |
babbage | 150,000 | 3 | 200 |
babbage-002 | 150,000 | 3 | 200 |
babbage-code-search-code | 150,000 | 3 | 200 |
babbage-code-search-text | 150,000 | 3 | 200 |
babbage-search-document | 150,000 | 3 | 200 |
babbage-search-query | 150,000 | 3 | 200 |
babbage-similarity | 150,000 | 3 | 200 |
code-davinci-edit-001 | 150,000 | 3 | 200 |
code-search-ada-code-001 | 150,000 | 3 | 200 |
code-search-ada-text-001 | 150,000 | 3 | 200 |
code-search-babbage-code-001 | 150,000 | 3 | 200 |
code-search-babbage-text-001 | 150,000 | 3 | 200 |
curie | 150,000 | 3 | 200 |
curie-instruct-beta | 150,000 | 3 | 200 |
curie-search-document | 150,000 | 3 | 200 |
curie-search-query | 150,000 | 3 | 200 |
curie-similarity | 150,000 | 3 | 200 |
davinci | 150,000 | 3 | 200 |
davinci-instruct-beta | 150,000 | 3 | 200 |
davinci-search-document | 150,000 | 3 | 200 |
davinci-search-query | 150,000 | 3 | 200 |
davinci-similarity | 150,000 | 3 | 200 |
text-ada-001 | 150,000 | 3 | 200 |
text-babbage-001 | 150,000 | 3 | 200 |
text-curie-001 | 150,000 | 3 | 200 |
text-davinci-001 | 150,000 | 3 | 200 |
text-davinci-002 | 150,000 | 3 | 200 |
text-davinci-003 | 150,000 | 3 | 200 |
text-davinci-edit-001 | 150,000 | 3 | 200 |
text-embedding-ada-002 | 150,000 | 3 | 200 |
text-search-ada-doc-001 | 150,000 | 3 | 200 |
text-search-ada-query-001 | 150,000 | 3 | 200 |
text-search-babbage-doc-001 | 150,000 | 3 | 200 |
text-search-babbage-query-001 | 150,000 | 3 | 200 |
text-search-curie-doc-001 | 150,000 | 3 | 200 |
text-search-curie-query-001 | 150,000 | 3 | 200 |
text-search-davinci-doc-001 | 150,000 | 3 | 200 |
text-search-davinci-query-001 | 150,000 | 3 | 200 |
text-similarity-ada-001 | 150,000 | 3 | 200 |
text-similarity-babbage-001 | 150,000 | 3 | 200 |
text-similarity-curie-001 | 150,000 | 3 | 200 |
text-similarity-davinci-001 | 150,000 | 3 | 200 |
tts-1 | 150,000 | 3 | 200 |
tts-1-1106 | 150,000 | 3 | 200 |
tts-1-hd | 150,000 | 3 | 200 |
tts-1-hd-1106 | 150,000 | 3 | 200 |
Moderation | |||
text-moderation-latest | 150,000 | 3 | – |
text-moderation-stable | 150,000 | 3 | – |
Fine-tuning Inference | |||
babbage-002 | 150,000 | 3 | |
davinci-002 | 150,000 | 3 | |
gpt-3.5-turbo-0613 | 40,000 | 3 | |
Fine-tuning Training | ACTIVE / QUEUED JOBS | JOBS PER DAY | |
babbage-002 | 3 | 48 | |
davinci-002 | 3 | 48 | |
gpt-3.5-turbo-0613 | 3 | 48 | |
Image | |||
DALL·E 2 | 3 RPM, 200 RPD, 5 images per minute | ||
DALL·E 3 | 3 RPM, 200 RPD, 1 image per minute ||
Audio | |||
whisper-1 | 3 | 200 | |
Other | |||
Default limits for all other models | 150,000 | 3 | 200 |
Rate Limits For Pay-as-you-go Users (Tier 1)
Model | TPM | RPM |
---|---|---|
Chat | ||
gpt-3.5-turbo | 90,000 | 3,500 |
gpt-3.5-turbo-0301 | 90,000 | 3,500 |
gpt-3.5-turbo-0613 | 90,000 | 3,500 |
gpt-3.5-turbo-1106 | 180,000 | 3,500 |
gpt-3.5-turbo-16k | 180,000 | 3,500 |
gpt-3.5-turbo-16k-0613 | 180,000 | 3,500 |
gpt-3.5-turbo-instruct | 250,000 | 3,000 |
gpt-3.5-turbo-instruct-0914 | 250,000 | 3,000 |
gpt-4 | 10,000 | 500 |
gpt-4-0314 | 10,000 | 500 |
gpt-4-0613 | 10,000 | 500 |
gpt-4-1106-preview | 10,000 | 20 RPM, 100 RPD
gpt-4-vision-preview | 10,000 | 20 RPM, 100 RPD
Text | ||
ada | 250,000 | 3,000 |
ada-code-search-code | 250,000 | 3,000 |
ada-code-search-text | 250,000 | 3,000 |
ada-search-document | 250,000 | 3,000 |
ada-search-query | 250,000 | 3,000 |
ada-similarity | 250,000 | 3,000 |
babbage | 250,000 | 3,000 |
babbage-002 | 250,000 | 3,000 |
babbage-code-search-code | 250,000 | 3,000 |
babbage-code-search-text | 250,000 | 3,000 |
babbage-search-document | 250,000 | 3,000 |
babbage-search-query | 250,000 | 3,000 |
babbage-similarity | 250,000 | 3,000 |
code-davinci-edit-001 | 150,000 | 20 |
code-search-ada-code-001 | 250,000 | 3,000 |
code-search-ada-text-001 | 250,000 | 3,000 |
code-search-babbage-code-001 | 250,000 | 3,000 |
code-search-babbage-text-001 | 250,000 | 3,000 |
curie | 250,000 | 3,000 |
curie-instruct-beta | 250,000 | 3,000 |
curie-search-document | 250,000 | 3,000 |
curie-search-query | 250,000 | 3,000 |
curie-similarity | 250,000 | 3,000 |
davinci | 250,000 | 3,000 |
davinci-002 | 250,000 | 3,000 |
davinci-instruct-beta | 250,000 | 3,000 |
davinci-search-document | 250,000 | 3,000 |
davinci-search-query | 250,000 | 3,000 |
davinci-similarity | 250,000 | 3,000 |
text-ada-001 | 250,000 | 3,000 |
text-babbage-001 | 250,000 | 3,000 |
text-curie-001 | 250,000 | 3,000 |
text-davinci-001 | 250,000 | 3,000 |
text-davinci-002 | 250,000 | 3,000 |
text-davinci-003 | 250,000 | 3,000 |
text-davinci-edit-001 | 150,000 | 20 |
text-embedding-ada-002 | 1,000,000 | 3,000 |
text-search-ada-doc-001 | 250,000 | 3,000 |
text-search-ada-query-001 | 250,000 | 3,000 |
text-search-babbage-doc-001 | 250,000 | 3,000 |
text-search-babbage-query-001 | 250,000 | 3,000 |
text-search-curie-doc-001 | 250,000 | 3,000 |
text-search-curie-query-001 | 250,000 | 3,000 |
text-search-davinci-doc-001 | 250,000 | 3,000 |
text-search-davinci-query-001 | 250,000 | 3,000 |
text-similarity-ada-001 | 250,000 | 3,000 |
text-similarity-babbage-001 | 250,000 | 3,000 |
text-similarity-curie-001 | 250,000 | 3,000 |
text-similarity-davinci-001 | 250,000 | 3,000 |
tts-1 | – | 50 |
tts-1-1106 | 250,000 | 3,000 |
tts-1-hd | – | 50 |
tts-1-hd-1106 | 250,000 | 3,000 |
Moderation | ||
text-moderation-latest | 150,000 | 1,000 |
text-moderation-stable | 150,000 | 1,000 |
Fine-tuning Inference | ||
babbage-002 | 250,000 | 3,000 |
davinci-002 | 250,000 | 3,000 |
gpt-3.5-turbo-0613 | 90,000 | 3,500 |
Fine-tuning Training | ACTIVE / QUEUED JOBS | JOBS PER DAY |
babbage-002 | 3 | 48 |
davinci-002 | 3 | 48 |
gpt-3.5-turbo-0613 | 3 | 48 |
Image | img / min | |
DALL·E 2 | 5 | – |
DALL·E 3 | 5 | – |
Audio | ||
whisper-1 | – | 50 |
Other | ||
Default limits for all other models | 250,000 | 3,000 |
What Is The Difference Between Rate Limits And Token Limits
Rate limits cap how many API requests you can make over time. Token limits cap how many tokens a model can accept in a single request. For example, gpt-4-32k-0613 accepts at most 32,768 tokens per request. You can't raise a model's token limit; you can only reduce the number of tokens you send in each request.
What Is TPM
TPM, or Tokens Per Minute, is the number of tokens your organization can send to the OpenAI API within a minute. Tokens are chunks of text the model processes; in English, one token averages roughly four characters, or about three-quarters of a word. The TPM limit ensures that the servers can handle the volume of data being processed without being overwhelmed.
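For exact counts, OpenAI's tiktoken library tokenizes text the same way the models do. As a lighter-weight sketch, the snippet below uses the rough four-characters-per-token heuristic mentioned above; treat it as an approximation, not an official formula.

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count using the ~4 chars/token rule of thumb for English."""
    return max(1, len(text) // 4)

def fits_in_tpm_budget(texts: list[str], tpm_limit: int) -> bool:
    """Check whether a batch of prompts stays under a per-minute token budget."""
    return sum(estimate_tokens(t) for t in texts) <= tpm_limit

# A small batch easily fits inside the free tier's 10K TPM limit.
prompts = ["Summarize this article.", "Translate 'hello' to French."]
print(fits_in_tpm_budget(prompts, tpm_limit=10_000))  # True
```

If your application batches many prompts per minute, running a check like this before sending can save you a round trip that would only end in a rate limit error.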
What Is RPM
RPM, or Requests Per Minute, measures how many requests your organization can make to the OpenAI API within a minute. This limit is set to prevent overloading the server and to ensure fair usage among all users. The exact number varies depending on the endpoint used and the type of account you have.
What Is RPD
RPD, or Requests Per Day, determines the total number of requests your organization can make to the API within a 24-hour period. Most models have no RPD limit for pay-as-you-go users, though some preview models (such as gpt-4-1106-preview, at 100 RPD) still do.
What Happens If The Rate Limit Is Reached
If your organization reaches its rate limit, the OpenAI API rejects further requests (typically with an HTTP 429 error) until the current time window resets. This prevents server overload and maintains service quality. A rate limit error looks like this:
    Rate limit reached for default-text-davinci-002 in organization org-{id} on requests per min. Limit: 20.000000 / min. Current: 24.000000 / min.
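The standard way to handle these errors is to retry with exponential backoff: wait, double the delay after each failure, and add a little random jitter so many clients don't retry in lockstep. A minimal sketch follows; `flaky_api_call` is a hypothetical stand-in for a real API request that happens to fail twice before succeeding.

```python
import random
import time

class RateLimitError(Exception):
    """Stands in for the 429 error a real client library would raise."""

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn on RateLimitError, doubling the delay (plus jitter) each time."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: let the caller see the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)

# Hypothetical request that hits the rate limit twice, then succeeds.
calls = {"n": 0}
def flaky_api_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429: rate limit reached")
    return "ok"

print(with_backoff(flaky_api_call, base_delay=0.01))  # ok
```

In production you would wrap your actual API call in `with_backoff` and catch the client library's own rate limit exception instead of the placeholder class above.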