GPT Base, GPT-3.5 Turbo & GPT-4: What's the difference?
A breakdown of OpenAI models, including their strengths, weaknesses, and cost. We also cover lesser-known AI models like Whisper and Embeddings.
Aug 31, 2023 • 10 Minute Read
Are you confused by the differences between all of OpenAI’s models? Completely understandable! There’s a lot of them on offer, and the distinctions are murky unless you’re knee-deep in working with AI. But learning to tell them apart can save you money and help you use the right AI model for the job at hand.
In this article, we’ll break down the differences between OpenAI’s large language models, including the cost of using each one, the amount of content you can get out of it, and what they excel at.
Before we start: ChatGPT is not the same as a GPT model
A common misconception is that ChatGPT is synonymous with GPT. It’s an easy mistake to make: they share a similar name, they’re made by the same company, and they both deal with AI. The difference is ChatGPT is an application powered by GPT AI models, but is not an AI model itself.
Think of it this way: ChatGPT is the car, and the GPT model is the engine. You can take out an engine and replace it with another, more powerful one, while leaving the rest of the car intact. You can also take that engine and use it in a completely different car (e.g. via the API in your own product).
And now that’s out of the way, let’s get into the differences between the models!
An overview of OpenAI’s GPT models
All of these models understand and generate code and text, but the accuracy, speed, and cost at which they do it are different.
Model | Description |
--- | --- |
GPT-4 | The most capable GPT model series to date. Able to do complex tasks, but slower at giving answers. Currently used by ChatGPT Plus. |
GPT-3.5 | Faster than GPT-4 and more flexible than GPT Base. The “good enough” model series for most tasks, whether chat or general. |
GPT-3.5 Turbo | The best model in the GPT-3.5 series. Currently used by the free version of ChatGPT. Cost effective and flexible. |
GPT Base | Not trained with instruction following. Best used when fine-tuned for specific tasks; otherwise use GPT-3.5 or GPT-4. Used in legacy cases as a replacement for the original GPT-3. |
GPT-3 | The predecessor to GPT-3.5, now deprecated. |
GPT-4
This is currently the most advanced GPT model series OpenAI has on offer (and that’s why it’s currently powering their paid product, ChatGPT Plus). It can handle significantly more tokens than GPT-3.5, which means it’s able to solve more difficult problems with greater accuracy.
GPT-4 can analyze and comment on images and graphics, unlike GPT-3.5, which can only analyze text. You can also instruct it to adopt a specific tone of voice or role (e.g. “Always speak like Yoda”). One example of this is ChatGPT Plus’s Custom Instructions feature, but the same technique is useful any time you need to control your output, particularly for chatbot applications.
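To illustrate, here's a minimal sketch of how a persistent instruction is expressed as a "system" message in a chat request. The `build_chat_request` helper is our own illustration, not an official OpenAI function; actually sending the request requires the `openai` library and an API key, so the call itself is shown commented out.

```python
# Build the request payload for the Chat Completions API. The system
# message plays the same role as ChatGPT Plus's Custom Instructions:
# it sets tone and behavior for the whole conversation.
def build_chat_request(system_prompt: str, user_prompt: str,
                       model: str = "gpt-4") -> dict:
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }

request = build_chat_request("Always speak like Yoda.",
                             "Explain what a token is.")
# To actually send it (requires the openai library, an API key, and incurs cost):
# response = openai.ChatCompletion.create(**request)
# print(response["choices"][0]["message"]["content"])
```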
That said, GPT-4 does have some downsides. Because it’s the latest model, it’s more costly than GPT-3.5, and its extra power also makes it slower to respond. GPT-4 is best when you’re more concerned with accuracy than speed.
Model | Input | Output | Max tokens | Training data |
--- | --- | --- | --- | --- |
gpt-4 | $0.03 / 1K tokens | $0.06 / 1K tokens | 8,192 tokens | Up to Sep 2021 |
gpt-4-32k | $0.06 / 1K tokens | $0.12 / 1K tokens | 32,768 tokens | Up to Sep 2021 |
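As a rough worked example of how that pricing adds up (a sketch with our own helper names; the rates are copied from the table, and input and output tokens are billed separately):

```python
# Estimate the cost of a single GPT-4 request from the pricing table above.
# Prices are in dollars per 1K tokens.
PRICING = {
    "gpt-4": {"input": 0.03, "output": 0.06},
    "gpt-4-32k": {"input": 0.06, "output": 0.12},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    rates = PRICING[model]
    return (input_tokens / 1000) * rates["input"] \
         + (output_tokens / 1000) * rates["output"]

# A 1,000-token prompt with a 500-token answer on gpt-4:
print(round(request_cost("gpt-4", 1000, 500), 4))  # 0.06
```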
To learn more about GPT-4, read our article: “GPT-4: All about the latest update, and how it changes ChatGPT.”
GPT-3.5 and GPT-3.5 Turbo
GPT-3.5 models understand and generate natural language or code. If you’ve ever used the free version of ChatGPT, it’s currently powered by one of these models. It’s good at completing both general tasks and chat-specific ones, and is considered the “good enough” model for most needs.
Even though GPT-4 has been out for some time, GPT-3.5 is still very popular because of its lower price point and faster speeds. The current model, GPT-3.5 Turbo, is considered the most capable model of the GPT-3.5 family.
Model | Input | Output | Max tokens | Training data |
--- | --- | --- | --- | --- |
gpt-3.5-turbo | $0.0015 / 1K tokens | $0.002 / 1K tokens | 4,096 tokens | Up to Sep 2021 |
gpt-3.5-turbo-16k | $0.003 / 1K tokens | $0.004 / 1K tokens | 16,384 tokens | Up to Sep 2021 |
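Since the main difference between the two variants is the context window (and price), the choice can be automated. A small sketch, with our own hypothetical helper; note the max-token limit covers the prompt *and* the completion combined, so you need to leave room for the answer you expect back:

```python
# Pick the cheapest GPT-3.5 variant whose context window fits the request.
CONTEXT_LIMITS = {"gpt-3.5-turbo": 4096, "gpt-3.5-turbo-16k": 16384}

def pick_gpt35_model(prompt_tokens: int, max_response_tokens: int) -> str:
    needed = prompt_tokens + max_response_tokens  # prompt + completion share the window
    if needed <= CONTEXT_LIMITS["gpt-3.5-turbo"]:
        return "gpt-3.5-turbo"        # cheaper, fits in the 4K window
    if needed <= CONTEXT_LIMITS["gpt-3.5-turbo-16k"]:
        return "gpt-3.5-turbo-16k"    # roughly double the price for 4x the window
    raise ValueError("Request too large even for the 16K context window")

print(pick_gpt35_model(3000, 500))   # gpt-3.5-turbo
print(pick_gpt35_model(6000, 1000))  # gpt-3.5-turbo-16k
```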
GPT Base
These are the budget GPT models, and as you’d expect are the least useful out of the box. GPT base models can understand and generate text and code, but they’re not great at following instructions, so you’ll often get more generalized or random responses instead.
For example, if you asked it to “Write a Python function that calculates the square root of a number”, it might give you an explanation of what a square root is without generating the code snippet you were expecting. This can be mitigated somewhat by fine-tuning the model to perform a narrow task (though fine-tuning costs money).
The biggest advantage of GPT Base is that it’s cheap as dirt, assuming you don’t spend more on fine-tuning it. It is also a replacement model for the original GPT-3 base models and uses the legacy Completions API. Babbage-002 is a replacement for the GPT-3 ada and babbage models, while Davinci-002 is a replacement for the GPT-3 curie and davinci models.
Model | Usage | Max tokens | Training data |
--- | --- | --- | --- |
babbage-002 | $0.0004 / 1K tokens | 16,384 tokens | Up to Sep 2021 |
davinci-002 | $0.0020 / 1K tokens | 16,384 tokens | Up to Sep 2021 |
Fine-tuning: Making a base model better
Just because a model isn’t fit for purpose out of the box doesn’t mean you can’t improve it with training. You can create your own custom models by fine-tuning a base OpenAI model with your own training data. Once fine-tuned, requests to that model are billed at the rates listed below.
Fine-tuning is not currently available for GPT-4. OpenAI recommends using GPT-3.5 Turbo as the base for fine-tuning.
Model | Training | Input usage | Output usage |
--- | --- | --- | --- |
babbage-002 | $0.0004 / 1K tokens | $0.0016 / 1K tokens | $0.0016 / 1K tokens |
davinci-002 | $0.0060 / 1K tokens | $0.0120 / 1K tokens | $0.0120 / 1K tokens |
gpt-3.5-turbo | $0.0080 / 1K tokens | $0.0120 / 1K tokens | $0.0160 / 1K tokens |
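Training cost scales with the size of your training file and the number of passes (epochs) over it. A rough estimator based on the table above; the helper and its three-epoch default are our own assumptions for illustration, not OpenAI figures:

```python
# Estimate the one-off training cost of a fine-tuning job from the table above.
TRAINING_RATES = {  # dollars per 1K tokens of training data
    "babbage-002": 0.0004,
    "davinci-002": 0.0060,
    "gpt-3.5-turbo": 0.0080,
}

def training_cost(model: str, file_tokens: int, epochs: int = 3) -> float:
    # Billed per token in the training file, once per epoch.
    return (file_tokens / 1000) * TRAINING_RATES[model] * epochs

# A 100K-token training file on gpt-3.5-turbo for 3 epochs:
print(round(training_cost("gpt-3.5-turbo", 100_000), 2))  # 2.4
```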
Other OpenAI models
We’ve talked a lot about the GPT models, but there are actually other OpenAI models that are worth learning about that may be more of a fit for what you’re trying to do. All of these are available through the OpenAI API.
Model | Description |
--- | --- |
DALL-E 2 | Creates images and art from a text description. |
Whisper | Transcribes speech into text and translates many languages into English. |
Embeddings | A numerical representation of text that can be used to measure the relatedness between two bits of text. Good for search, clustering, recommendations, anomaly detection, and classification tasks. |
Moderation | A tool you can use to check whether content complies with OpenAI’s usage policies and take action, such as filtering it. |
DALL-E 2
You’ve probably heard of DALL-E before now. You can build it into your apps to create and edit images and art from a text description. You can also currently test it out via OpenAI’s Labs interface without building it into your app. Pricing is scaled by the resolution of the images you’re working with.
Resolution | Price |
--- | --- |
1024×1024 | $0.020 / image |
512×512 | $0.018 / image |
256×256 | $0.016 / image |
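Budgeting a batch of generations is simple multiplication. A quick sketch using the prices above (the lookup table and helper are our own names, not part of the OpenAI library):

```python
# Per-image prices from the DALL-E pricing table above, keyed by resolution.
DALLE_PRICES = {"1024x1024": 0.020, "512x512": 0.018, "256x256": 0.016}

def batch_cost(resolution: str, n_images: int) -> float:
    return DALLE_PRICES[resolution] * n_images

# Generating 50 images at 512x512:
print(round(batch_cost("512x512", 50), 2))  # 0.9
```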
Whisper
If you’re trying to turn speech into text, or translate something into English, Whisper is your model of choice. It can also be used for language identification. There’s an open source version of Whisper and one you can access through OpenAI.
Since one costs you money and the other doesn’t, it seems like a no-brainer to pick the free open source version, but the API version runs faster. There’s a paper on the technical details you can read if you’re feeling studious. The pricing is incredibly straightforward: the model 'whisper-1' is charged at $0.006 / minute (rounded to the nearest second).
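That per-minute rate is easy to turn into a budget estimate. A small sketch (our own helper, applying the per-second rounding rule described above):

```python
# Whisper API billing: $0.006 per minute, rounded to the nearest second.
def whisper_cost(duration_seconds: float) -> float:
    billable_seconds = round(duration_seconds)  # billed to the nearest second
    return billable_seconds * (0.006 / 60)      # per-second rate

# A 10-minute (600 second) recording:
print(round(whisper_cost(600), 3))  # 0.06
```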
Embeddings
Embeddings is an interesting model offering that turns text strings into lists of numbers (vectors) whose closeness reflects how related the strings are. For example, the words “Taco” and “Food” would be strongly related, whereas the words “Food” and “Computer” would not be. This allows machines to understand the relationship between words.
They help computers do things like figure out if a sentence is positive or negative, translate languages, and even write like a human. It's a bit like teaching computers to speak our language using a special code.
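Under the hood, "relatedness" is usually measured as the cosine similarity between two embedding vectors: 1.0 means the same direction (strongly related text), values near 0 mean unrelated. A toy sketch with made-up three-dimensional vectors; real ada-002 embeddings are 1,536-dimensional and come back from the API, not hand-written:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|): 1.0 = identical direction, ~0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors standing in for real embeddings:
taco = [0.9, 0.1, 0.0]
food = [0.8, 0.2, 0.1]
computer = [0.0, 0.1, 0.9]

# "Taco" should be far more similar to "Food" than "Food" is to "Computer":
print(cosine_similarity(taco, food) > cosine_similarity(food, computer))  # True
```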
Common applications include:
Search (where results are ranked by relevance to a query string)
Clustering (where text strings are grouped by similarity)
Recommendations (where items with related text strings are recommended)
Anomaly detection (where outliers with little relatedness are identified)
Diversity measurement (where similarity distributions are analyzed)
Classification (where text strings are classified by their most similar label)
In pretty much every case, OpenAI recommends you use Ada v2 for Embeddings, since it is cheaper, better, and faster. However, you can pay more for less (if that floats your boat), but we won’t go into those options here.
Model | Rough pages per dollar | Usage |
--- | --- | --- |
text-embedding-ada-002 | 3000 | $0.0001 / 1K tokens |
Moderation
People can’t be trusted with anything, and if you plug an AI model into your app, there’s the chance they could use it to generate hate speech, sexual content, or other inappropriate prose. That’s why OpenAI provides moderation models that can be used to check that content complies with their usage policies.
The good news is, this endpoint is completely free. You can only use it to monitor the inputs and outputs of OpenAI APIs, though.
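In practice, you send your content to the moderation endpoint and act on the `flagged` field in its response. The sketch below works on a hypothetical sample dict shaped like the endpoint's JSON (the values are made up, since a live call needs an API key):

```python
# A hypothetical sample mimicking the shape of a Moderation API response;
# these are illustrative values, not real API output.
sample_response = {
    "results": [
        {
            "flagged": True,
            "categories": {"hate": True, "violence": False},
            "category_scores": {"hate": 0.97, "violence": 0.01},
        }
    ]
}

def should_block(moderation_response: dict) -> bool:
    # Block the content if any result was flagged by the model.
    return any(result["flagged"] for result in moderation_response["results"])

print(should_block(sample_response))  # True
```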
Conclusion: Choose the right model for the task
Hopefully you’ve now got a better understanding of the differences between OpenAI’s AI models. Being informed means you can make better choices, like not just using GPT-4 because it’s the latest offering, or choosing GPT Base because it’s the cheapest.
Want to learn more about Generative AI?
If you’re looking to be more informed about AI and brush up on your skills, Pluralsight offers a range of beginner, intermediate, and expert AI and ML courses that can help you learn the ins and outs. You can sign up for a 10-day free trial with no commitments, so why not check our courses out (and get over a week of professional upskilling at the very cheap cost of zero dollars)?
Here are some of the courses in our library you might want to check out:
If you like reading articles like this one (and I’m guessing you do, or you wouldn’t have gotten this far), here are some more articles that will help keep you informed about AI:
Organizations, don’t ban generative AI: Write a usage policy
Security reviews and AI models: How to decide what to greenlight
Critical thinking: A must-have soft skill in the age of GenAI
If you’re using ChatGPT a lot, here are some additional reads: