What Are The Rate Limits For OpenAI API?

Rate limits for GPT-4 models range from 500 to 10K RPM (requests per minute) and from 10K to 1.5M TPM (tokens per minute), depending on your usage tier.

APIs like OpenAI’s allow developers to integrate powerful AI into their applications easily. However, to prevent abuse, APIs enforce rate limits on requests. Let’s look at how OpenAI’s rate limits work.

The OpenAI API enforces limits at the organization level based on the endpoint and account type. There are three key metrics:

  • RPM (requests per minute) – The maximum requests allowed per minute
  • RPD (requests per day) – The maximum requests allowed per day
  • TPM (tokens per minute) – The maximum tokens allowed to be sent per minute
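These three numbers are also surfaced at runtime: the API reports your current limits and remaining budget in `x-ratelimit-*` response headers. A minimal sketch of reading them — the header names follow OpenAI's documentation, but the sample values below are fabricated and no API call is made:

```python
def parse_rate_limit_headers(headers: dict) -> dict:
    """Extract request/token limit info from a response's headers."""
    keys = (
        "x-ratelimit-limit-requests",      # request ceiling for this model
        "x-ratelimit-remaining-requests",  # requests left in the current window
        "x-ratelimit-limit-tokens",        # TPM ceiling
        "x-ratelimit-remaining-tokens",    # tokens left in the current window
    )
    return {k: int(headers[k]) for k in keys if k in headers}

# Fabricated example values for illustration:
sample = {
    "x-ratelimit-limit-requests": "500",
    "x-ratelimit-remaining-requests": "499",
    "x-ratelimit-limit-tokens": "10000",
    "x-ratelimit-remaining-tokens": "9850",
}
info = parse_rate_limit_headers(sample)
```

Watching the `remaining` values lets a client slow down before it hits the limit instead of reacting to errors after the fact.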

What Is The Rate Limit For GPT-4

Note that your rate limit and spending limit (quota) are automatically adjusted based on a number of factors. As your usage of the OpenAI API goes up and you successfully pay the bill, OpenAI automatically increases your usage tier:

Quick Overview

| TIER | QUALIFICATION | MAX CREDITS | REQUEST LIMITS | TOKEN LIMITS |
| --- | --- | --- | --- | --- |
| Free | User must be in an allowed geography | $100 | — | — |
| Tier 1 | $5 paid | $100 | 500 RPM, 10K RPD (gpt-4) | 300K (gpt-4-turbo); 10K (gpt-4) |
| Tier 2 | $50 paid and 7+ days since first successful payment | $500 | 5,000 RPM | 450K (gpt-4-turbo); 40K (gpt-4) |
| Tier 3 | $100 paid and 7+ days since first successful payment | $1,000 | 5,000 RPM | 600K (gpt-4-turbo); 80K (gpt-4) |
| Tier 4 | $250 paid and 14+ days since first successful payment | $5,000 | 10K RPM | 800K (gpt-4-turbo); 300K (gpt-4) |
| Tier 5 | $1,000 paid and 30+ days since first successful payment | $10,000 | 10K RPM | 1.5M (gpt-4-turbo); 300K (gpt-4) |

Rate Limits For Free Trial Users

| Model | TPM | RPM | RPD |
| --- | --- | --- | --- |
| **Chat** | | | |
| gpt-3.5-turbo | 40,000 | 3 | 200 |
| gpt-3.5-turbo-0125 | 40,000 | 3 | 200 |
| gpt-3.5-turbo-0301 | 40,000 | 3 | 200 |
| gpt-3.5-turbo-0613 | 40,000 | 3 | 200 |
| gpt-3.5-turbo-1106 | 40,000 | 3 | 200 |
| gpt-3.5-turbo-16k | 40,000 | 3 | 200 |
| gpt-3.5-turbo-16k-0613 | 40,000 | 3 | 200 |
| gpt-3.5-turbo-instruct | 150,000 | 3 | 200 |
| gpt-3.5-turbo-instruct-0914 | 150,000 | 3 | 200 |
| **Text** | | | |
| babbage-002 | 150,000 | 3 | 200 |
| davinci-002 | 150,000 | 3 | 200 |
| text-embedding-3-large | 150,000 | 3 | 200 |
| text-embedding-3-small | 150,000 | 3 | 200 |
| text-embedding-ada-002 | 150,000 | 3 | 200 |
| tts-1 | 150,000 | 3 | 200 |
| tts-1-1106 | 150,000 | 3 | 200 |
| tts-1-hd | 150,000 | 3 | 200 |
| tts-1-hd-1106 | 150,000 | 3 | 200 |
| **Moderation** | | | |
| text-moderation-latest | 150,000 | 3 | — |
| text-moderation-stable | 150,000 | 3 | — |
| **Fine-tuning inference** | | | |
| babbage-002 | 150,000 | 3 | — |
| davinci-002 | 150,000 | 3 | — |
| gpt-3.5-turbo-0125 | 40,000 | 3 | — |
| gpt-3.5-turbo-0613 | 40,000 | 3 | — |
| gpt-3.5-turbo-1106 | 40,000 | 3 | — |
| gpt-4-0613 | 40,000 | 3 | — |

Fine-tuning training jobs have their own limits:

| Model (fine-tuning training) | ACTIVE / QUEUED JOBS | JOBS PER DAY |
| --- | --- | --- |
| babbage-002 | 3 | 48 |
| davinci-002 | 3 | 48 |
| gpt-3.5-turbo-0613 | 3 | 48 |

| Model | TPM | RPM | RPD |
| --- | --- | --- | --- |
| **Image** | | | |
| DALL·E 2 | — | 3 (5 images per minute) | 200 |
| DALL·E 3 | — | 3 | 200 |
| **Audio** | | | |
| whisper-1 | — | 3 | 200 |
| **Other** | | | |
| Default for all other models | 150,000 | 3 | 200 |

Rate Limits For Pay-as-you-go Users (Tier 1 – Tier 5)

| Model | TPM | RPM / RPD |
| --- | --- | --- |
| **Chat** | | |
| gpt-3.5-turbo | 60,000 (T1); 80,000 (T2); 160,000 (T3); 1,000,000 (T4); 2,000,000 (T5) | 3,500 RPM, 10K RPD (T1); 3,500 RPM (T2, T3); 10,000 RPM (T4, T5) |
| gpt-3.5-turbo-0125 | 60,000 (T1) | 500 RPM, 10K RPD (T1) |
| gpt-3.5-turbo-0301 | 60,000 (T1) | 500 RPM, 10K RPD (T1) |
| gpt-3.5-turbo-0613 | 60,000 (T1) | 500 RPM, 10K RPD (T1) |
| gpt-3.5-turbo-1106 | 60,000 (T1) | 500 RPM, 10K RPD (T1) |
| gpt-3.5-turbo-16k | 60,000 (T1) | 500 RPM, 10K RPD (T1) |
| gpt-3.5-turbo-16k-0613 | 60,000 (T1) | 500 RPM, 10K RPD (T1) |
| gpt-3.5-turbo-instruct | 250,000 (T1) | 3,000 (T1) |
| gpt-3.5-turbo-instruct-0914 | 250,000 (T1) | 3,000 (T1) |
| gpt-4 | 10,000 (T1); 30,000 (T2); 80,000 (T3); 300,000 (T4, T5) | 500 RPM, 10K RPD (T1); 5,000 RPM (T2, T3); 10,000 RPM (T4, T5) |
| gpt-4-0613 | 10,000 (T1) | 500 RPM, 10K RPD (T1) |
| gpt-4-turbo-preview | 300,000 (T1); 450,000 (T2); 600,000 (T3); 800,000 (T4); 1,500,000 (T5) | 500 RPM (T1); 5,000 RPM (T2, T3); 10,000 RPM (T4, T5) |
| gpt-4-vision-preview | 10,000 (T1); 20,000 (T2); 40,000 (T3); 150,000 (T4); 300,000 (T5) | 80 RPM, 500 RPD (T1); 100 RPM, 1K RPD (T2); 120 RPM, 1.5K RPD (T3); 300 RPM, 2K RPD (T4); 3,000 RPM (T5) |
| **Text** | | |
| babbage-002 | 250,000 (T1) | 3,000 (T1) |
| davinci-002 | 250,000 (T1) | 3,000 (T1) |
| text-embedding-3-large | 1,000,000 (T1, T2); 5,000,000 (T3, T4); 10,000,000 (T5) | 500 RPM, 10K RPD (T1); 500 RPM (T2); 5,000 RPM (T3); 10,000 RPM (T4, T5) |
| text-embedding-3-small | 1,000,000 (T1) | 3,000 (T1) |
| text-embedding-ada-002 | 1,000,000 (T1) | 3,000 (T1) |
| tts-1 | — | 50 (T1, T2); 100 (T3, T4); 500 (T5) |
| tts-1-1106 | — | 50 (T1) |
| tts-1-hd | — | 3 (T1); 5 (T2); 7 (T3); 10 (T4); 20 (T5) |
| tts-1-hd-1106 | — | 3 (T1) |
| **Moderation** | | |
| text-moderation-latest | 150,000 (T1) | 1,000 (T1) |
| text-moderation-stable | 150,000 (T1) | 1,000 (T1) |
| **Fine-tuning inference** | | |
| babbage-002 | 250,000 (T1) | 3,000 (T1) |
| davinci-002 | 250,000 (T1) | 3,000 (T1) |
| gpt-3.5-turbo-0613 | 60,000 (T1) | 500 (T1) |
| gpt-3.5-turbo-1106 | 60,000 (T1) | 500 (T1) |
| gpt-4-0613 | 10,000 (T1) | 500 (T1) |

Fine-tuning training jobs have their own limits:

| Model (fine-tuning training) | ACTIVE / QUEUED JOBS | JOBS PER DAY |
| --- | --- | --- |
| babbage-002 | 3 (T1) | 48 (T1) |
| davinci-002 | 3 (T1) | 48 (T1) |
| gpt-3.5-turbo-0613 | 3 (T1) | 48 (T1) |

Image models are limited by images per minute:

| Model | Images per minute |
| --- | --- |
| DALL·E 2 | 5 (T1); 50 (T2); 100 (T3, T4); 500 (T5) |
| DALL·E 3 | 5 (T1); 7 (T2, T3); 15 (T3, T4); 50 (T5) |

| Model | TPM | RPM |
| --- | --- | --- |
| **Audio** | | |
| whisper-1 | — | 50 (T1, T2); 100 (T3, T4); 500 (T5) |
| **Other** | | |
| Default for all other models | 250,000 (T1) | 3,000 (T1) |

What Are The Differences Between Rate Limits And Token Limits

Rate limits restrict how many API requests you can make in a given time window. Token limits restrict how many tokens (chunks of text, roughly three-quarters of an English word each) a model can process in a single request. For example, gpt-4-32k-0613 accepts at most 32,768 tokens per request. You can't raise a model's token limit; you can only reduce the number of tokens you send per request.
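Staying under a per-request token cap usually means measuring the prompt before sending it. Here is a rough sketch using the common ~4-characters-per-token rule of thumb for English text — an approximation only; a real tokenizer (e.g. tiktoken) gives exact counts:

```python
CHARS_PER_TOKEN = 4  # heuristic for English text, not an exact tokenizer

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def trim_to_budget(text: str, max_tokens: int) -> str:
    """Truncate text so its estimated token count fits the budget."""
    if estimate_tokens(text) <= max_tokens:
        return text
    return text[: max_tokens * CHARS_PER_TOKEN]

prompt = "word " * 100              # 500 characters -> ~125 estimated tokens
trimmed = trim_to_budget(prompt, 50)  # cut down to ~50 estimated tokens
```

In practice you would also reserve headroom for the model's completion, since prompt and response share the same per-request window.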

See Also: What Is The Max Token Limit In OpenAI ChatGPT

What Is TPM

TPM, or Tokens Per Minute, refers to the number of tokens your organization can send to the OpenAI API within a minute. Tokens are chunks of data, such as words or characters, that the model processes. The TPM limit ensures that the server can handle the volume of data being processed without being overwhelmed.

What Is RPM

RPM, or Requests Per Minute, measures how many requests your organization can make to the OpenAI API within a minute. This limit is set to prevent overloading the server and to ensure fair usage among all users. The exact number varies depending on the endpoint used and the type of account you have.

What Is RPD

RPD, or Requests Per Day, caps the total number of requests your organization can make to the API within a 24-hour period. As the tables above show, daily caps mainly apply to free trial accounts and Tier 1; at higher tiers, most models have no RPD limit.

What Happens If The Rate Limit Is Reached

If your organization reaches its rate limit, the OpenAI API will stop fulfilling further requests until enough time has passed. This is to prevent server overload and maintain service quality. If you encounter a rate limit error, it means you’ve exceeded your limit and need to wait before making more requests:

Rate limit reached for default-text-davinci-002 in organization org-{id} on requests per min. Limit: 20.000000 / min. Current: 24.000000 / min.
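The standard remedy is to retry with exponential backoff and jitter. A minimal sketch — here `RuntimeError` is a stand-in for the client library's actual rate-limit exception, and `call` is any function making the API request:

```python
import random
import time

def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0):
    """Yield sleep durations: base * 2^attempt, jittered, capped."""
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        yield delay * random.uniform(0.5, 1.0)  # jitter spreads out retries

def call_with_backoff(call, retries: int = 5, base: float = 1.0):
    """Retry `call` on rate-limit errors, sleeping between attempts."""
    for delay in backoff_delays(retries, base=base):
        try:
            return call()
        except RuntimeError:  # stand-in for the client's rate-limit error
            time.sleep(delay)
    return call()  # final attempt; let any error propagate
```

Jitter matters: if many clients back off on the same schedule, they all retry at once and hit the limit again together.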

Changelog:

02/21/2024

  • Updated rate limits
