The GPT models can get quite expensive once usage ramps up across your organization. Especially GPT-4, which, at the time I write this article, costs $10 per 1 million output tokens.
If you use these models in batch jobs, where you don't need the response right away, there is a way to cut your bill by 50%.
Here’s how:
Introduction to the OpenAI Batch API in Python
The Batch API is designed for bulk processing of tasks. It's particularly useful when you have a large dataset that you need to process cost-effectively and don't need the responses at query time. With the Batch API, you send a batch of prompts in one request and come back later to fetch your results.
Setting Up
To start using the OpenAI Batch API, you need to set up your Python environment:
- Install the OpenAI Python package: if it's not already installed, you can install it using pip:
- Ensure you have your API key from OpenAI; it is required to authenticate your requests.
- Build the list of tasks that you want to process. Each task follows this template:
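A sketch of one task as a Python dict (the `custom_id` and prompt content are placeholders; swap in your own model and messages):

```python
task = {
    # Your own identifier, echoed back in the results so you can
    # match outputs to inputs (results are not returned in order).
    "custom_id": "task-0",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        # Same parameters as a regular Chat Completions request.
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "user", "content": "Summarize this document: ..."},
        ],
    },
}
```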
Customize the body of this template following the Chat Completions API parameters.
- Put all of these tasks into a JSON Lines (`.jsonl`) file.
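Writing the file is plain JSON Lines, one task per line. The file name and the dummy tasks below are placeholders; in practice you would serialize the task list built in the previous step.

```python
import json

# Stand-in for the real task list from the previous step.
tasks = [
    {
        "custom_id": f"task-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "user", "content": "Say hello."}],
        },
    }
    for i in range(3)
]

# One JSON object per line, no trailing commas: the JSON Lines format.
with open("batch_tasks.jsonl", "w") as f:
    for task in tasks:
        f.write(json.dumps(task) + "\n")
```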
- Upload the file to OpenAI.
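A sketch of the upload, assuming the `batch_tasks.jsonl` file from the previous step and a valid API key in the environment. The `purpose="batch"` argument tells OpenAI the file feeds the Batch API. (Not runnable without live credentials.)

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

batch_file = client.files.create(
    file=open("batch_tasks.jsonl", "rb"),
    purpose="batch",
)
print(batch_file.id)  # keep this ID for the next step
```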
- Create the batch job.
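Creating the job is one call, assuming `client` and `batch_file` from the steps above. (Not runnable without live credentials.)

```python
batch_job = client.batches.create(
    input_file_id=batch_file.id,      # ID returned by the upload step
    endpoint="/v1/chat/completions",  # must match the url in your tasks
    completion_window="24h",
)
print(batch_job.id, batch_job.status)
```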
- Wait up to 24 hours. In practice, batches are usually processed faster than that, so you can poll the API from time to time.
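A simple polling loop sketch, assuming `client` and `batch_job` from the previous steps; the one-minute interval is an arbitrary choice. (Not runnable without live credentials.)

```python
import time

# Re-fetch the job until it reaches a terminal status.
while True:
    batch_job = client.batches.retrieve(batch_job.id)
    if batch_job.status in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(60)  # poll every minute; adjust to taste

print(batch_job.status)
```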
- Finally, retrieve the results:
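A sketch of fetching and parsing the output, assuming the job above has completed. The output is itself a JSON Lines file; each line carries the `custom_id` you set on the corresponding task. (Not runnable without live credentials.)

```python
import json

# output_file_id is set once the batch job has completed.
content = client.files.content(batch_job.output_file_id).text
results = [json.loads(line) for line in content.splitlines() if line.strip()]

# Results are unordered; key them by custom_id to match your inputs.
by_id = {r["custom_id"]: r for r in results}
```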
Final note: results are unordered and will not necessarily match the order of your input batch; use each task's `custom_id` to match results back to inputs.
Tips for Further Cost Savings
- Efficient prompt design can reduce the number of tokens processed, thereby lowering costs.
- Keep an eye on your API usage to understand your usage patterns; this can reveal further optimization opportunities.
- Use models that are best suited for your task but also cost-effective. Sometimes a simpler model may suffice: for comparison, GPT-4 is about 20x as expensive as GPT-3.5.
For more details, you can always refer to the official OpenAI API documentation: OpenAI Batch API Reference