The Best of Both Worlds: Saving Costs and Time When Using OpenAI’s API

Header image from Roundicons Premium on FlatIcons

When working with OpenAI’s API for large language models (LLMs) in research projects, there are two techniques that can help save both time and costs.

The first is the Batch API, which allows you to submit multiple prompt requests in batches rather than one request at a time. This means you can process all (or a subset of) your requests asynchronously, rather than waiting for each request to complete sequentially. Once processing is complete, the API returns a file containing all the responses. The Batch API offers clear advantages, including cost efficiency (a 50% discount compared to synchronous API calls), higher rate limits, and predictable turnaround for large-scale requests (each batch is guaranteed to complete within 24 hours, and often finishes much sooner).

The second technique is using structured outputs, which ensures that the model generates responses in a predefined schema (JSON format). This eliminates concerns about invalid or inconsistent outputs, such as incorrect formatting and missing information. Structured outputs are particularly useful for researchers using LLMs for dataset annotation. By enforcing a structured format, the model can return annotations and explanations as separate fields, reducing the need for manual extraction of annotations and post-processing. This not only improves efficiency but also minimises the costs associated with manual effort.
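For example, instead of a free-text answer that you would need to parse yourself, a model constrained to a schema with separate annotation and explanation fields (hypothetical field names, for illustration) would return something like:

{"annotation": "positive", "explanation": "The review praises both the lectures and the exam preparation."}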

You can find tutorials for each of these two techniques via their respective links above. However, a key challenge remains: how to effectively combine these two techniques. OpenAI’s website does not provide a tutorial for this, and online discussions on the topic tend to be technical and/or overwhelming, making them difficult for applied researchers to follow.

At the SoDa team, we encountered this problem firsthand while working with Gabrielle Martins van Jaarsveld, a participant in our SoDa Fellowship Programme. In her project, she used OpenAI’s API to annotate educational chatbot conversations for markers of self-regulated learning. Given the complexity of her prompt requests — including long and detailed instructions, thousands of prompt calls, and a large annotation space — we opted to leverage both the Batch API and structured outputs to optimize cost and efficiency.

After multiple iterations, we developed a solution that, to the best of our knowledge, is not yet readily available online. To help others facing similar challenges, we are using this blog post to share our approach.

Step 1: Prepare Your Data

The first step is to prepare your input data in the required format. The OpenAI Batch API requires your input to be a JSON Lines (.jsonl) file with a very specific structure. In this first step, we will create a csv file containing all the requests we want to make, in the correct structure. Then, in the next step, we will convert it into a .jsonl file.

The csv file needs to include (at least) the following columns, which specify the details of a prompt request:

  • custom_id: An identifier unique to each request, defined by the user.
  • method: The HTTP method to be used for the request. Currently only POST is supported.
  • url: The OpenAI API relative URL to be used for the request. Currently /v1/chat/completions, /v1/embeddings, and /v1/completions are supported.
  • model: The name of the model requested. See OpenAI's documentation for the list of available models.
  • messages: The system prompt and the user prompt.
  • max_tokens: The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API.

Here’s a csv file with two examples:

custom_id method url model messages max_tokens
request_1 POST /v1/chat/completions gpt-4o [{"role": "system", "content": "You are a researcher who is examining the sentiment of reviews left by students after completing a university course. Score reviews on a scale of 1-10, with 1 being highly negative, and 10 being highly positive."}, {"role": "user", "content": "This course was really interesting, and I enjoyed it a lot. The exam was a little difficult, but the lectures prepared us really well for it."}] 1000
request_2 POST /v1/chat/completions gpt-4o [{"role": "system", "content": "You are a researcher who is examining the sentiment of reviews left by students after completing a university course. Score reviews on a scale of 1-10, with 1 being highly negative, and 10 being highly positive."}, {"role": "user", "content": "This course was a nightmare. The professor was late, the course was boring, and the exam was so difficult I barely got through half of it in the time."}] 1000

In this csv file, each row is a new API request, containing a unique ID, the method and location of the request, the model to be used, the actual content of the request, and the maximum number of tokens of the response. The messages column contains both a system prompt, which instructs the model how to respond to the input, and the user input itself.
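If you are assembling this csv programmatically rather than by hand, the following is a minimal sketch of one way to do it (the file name, variable names, and example texts are our own choices):

import json
import pandas as pd

# The system prompt used for every request
system_prompt = (
    "You are a researcher who is examining the sentiment of reviews left by "
    "students after completing a university course. Score reviews on a scale "
    "of 1-10, with 1 being highly negative, and 10 being highly positive."
)

# Example input texts; replace these with your own data
reviews = [
    "This course was really interesting, and I enjoyed it a lot.",
    "This course was a nightmare.",
]

rows = []
for i, review in enumerate(reviews, start=1):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": review},
    ]
    rows.append({
        "custom_id": f"request_{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "model": "gpt-4o",
        # Store messages as a JSON string so they survive the csv round trip
        "messages": json.dumps(messages),
        "max_tokens": 1000,
    })

pd.DataFrame(rows).to_csv("batchinput.csv", index=False)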

TIP:

  1. Make sure the column names are exactly the same as shown above.
  2. All the calls in a single input file must use the same model.
  3. Make sure your text data is cleaned before inputting it into the csv file. Characters such as " ; \ or / can cause errors when converting the file to jsonl or when being processed by OpenAI’s API (one possible cleaning approach is sketched below).
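As a starting point, here is a minimal, hypothetical cleaning function; which characters you need to handle depends entirely on your own data, so treat this as a sketch rather than a complete solution:

def clean_text(text: str) -> str:
    # Remove or replace characters that commonly break the csv-to-jsonl conversion
    replacements = {
        '"': "'",   # double quotes conflict with JSON string delimiters
        "\\": " ",  # backslashes start JSON escape sequences
        ";": ",",   # semicolons can clash with some csv dialects
        "/": " ",
    }
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

# e.g., applied to the reviews list from the sketch above
reviews = [clean_text(r) for r in reviews]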

Step 2: Create Your Input File

Once you have a csv file with each line specifying a new request, you can generate the properly formatted JSON Lines input file. First, make sure you have Python installed on your machine, and then install the following packages:

pip install pandas jsonschema pydantic openai

The packages pandas, jsonschema, and pydantic will all be used to prepare and generate the input file. The openai package will be used in the following steps to create the batch request. Once all the relevant packages are installed, the following pieces of code will implement the desired format of your structured output, and then generate a .jsonl file which can be uploaded to OpenAI.

First, import all the relevant packages, and set your OpenAI API key. If you don’t already have one, you can generate an API key from your OpenAI account.

import pandas as pd
import json
from pydantic import BaseModel
from openai.lib._pydantic import to_strict_json_schema  # private helper; see the note below
from openai import OpenAI
from pprint import pprint

api_key = "YOUR-API-KEY-HERE"  # CHANGE ME

Then, decide on the desired structure of your output, and define it with the code below. Each field is declared as Name: Data Type. Make sure the names are intuitive and describe what the model should output for that field. If a name consists of more than one word, you can concatenate the words with an underscore (_). A variety of data types are supported, including String, Number, Boolean, Integer, Object, Array, Enum, and anyOf. Make sure the structure follows logically from the instructions given to the LLM in the system prompt section of your input file.

class Structured_Response(BaseModel):
    Score: int
    Reason: str

# Convert the pydantic model into a strict JSON schema (a plain dictionary)
structured_response_schema = to_strict_json_schema(Structured_Response)
pprint(structured_response_schema, sort_dicts=False)

The to_strict_json_schema function converts the pydantic model into the strict JSON schema format that the API expects (credits to karthik.shivaram, who suggested this approach). Without it, OpenAI’s API will have trouble processing your structured output schema.
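The same pattern extends to more complex annotation schemes. As an illustration only (these field names are hypothetical and not part of our project), a schema combining a constrained label, an array, and a numeric field could be defined like this:

from enum import Enum
from typing import List
from pydantic import BaseModel
from openai.lib._pydantic import to_strict_json_schema

class Sentiment(str, Enum):
    negative = "negative"
    neutral = "neutral"
    positive = "positive"

class Rich_Response(BaseModel):
    Sentiment_Label: Sentiment  # restricted to the three enum values
    Key_Phrases: List[str]      # an array of strings
    Confidence: float           # a number; describe its range in the system prompt

rich_response_schema = to_strict_json_schema(Rich_Response)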

Once your structured output format is set, the following code combines this information with your csv input file to create a single .jsonl file which is ready for batch processing. This is the stage of the process where you’re most likely to run into errors, often due to unexpected characters in your csv file. If you get errors at this stage, double-check your text data for any characters which may be causing issues during the conversion process (e.g., ; " \ /).


df = pd.read_csv("batchinput.csv", encoding="ISO-8859-1")

jsonl_lines = []
for _, row in df.iterrows():
    json_obj = {
        "custom_id": row["custom_id"],
        "method": row["method"],
        "url": row["url"],
        "body": {
            "model": row["model"],
            # The messages column holds a JSON string; parse it back into a list
            "messages": json.loads(row["messages"], strict=False),
            # Cast to int: pandas may return a numpy integer, which json.dumps rejects
            "max_tokens": int(row["max_tokens"]),
            "response_format": {
                "type": "json_schema",
                "json_schema": {
                    "name": "structured_response",
                    "schema": structured_response_schema,
                    "strict": True
                }
            }
        }
    }
    jsonl_lines.append(json.dumps(json_obj))

with open("batchinput.jsonl", "w", encoding="utf-8") as f:
    f.write("\n".join(jsonl_lines))

print("JSONL file created: batchinput.jsonl")

You should now have a formatted file called batchinput.jsonl, ready to be submitted as a batch job via the OpenAI API.
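For reference, each line of batchinput.jsonl should now look roughly like this (wrapped and abridged here for readability; in the actual file, each request sits on a single line):

{"custom_id": "request_1", "method": "POST", "url": "/v1/chat/completions",
 "body": {"model": "gpt-4o",
          "messages": [{"role": "system", "content": "You are a researcher ..."},
                       {"role": "user", "content": "This course was really interesting ..."}],
          "max_tokens": 1000,
          "response_format": {"type": "json_schema",
                              "json_schema": {"name": "structured_response",
                                              "schema": {...},
                                              "strict": true}}}}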

Step 3: Creating and Submitting a Batch Job

Now that you have a properly formatted .jsonl file, you can upload this file via the OpenAI API using the code below.

client = OpenAI(api_key = api_key)

# Upload the input file; purpose="batch" marks it for use with the Batch API
batch_input_file = client.files.create(
    file=open("batchinput.jsonl", "rb"),  # the file must be opened in binary mode
    purpose="batch"
)

print(batch_input_file)

Then, use the following code to create and submit a batch job. It also prints the submitted batch ID, which you should save, as you will need it to check the status of this job and download the output file later.

submitted_job = client.batches.create(
    input_file_id=batch_input_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
        "description": "BATCH JOB DESCRIPTION HERE"  # CHANGE ME
    }
)
print(submitted_job.id)  # save this ID for the next step

NOTE: Currently the only completion window available with the Batch API is 24 hours. However, depending on the size of your job, it will likely complete in significantly less time.

Step 4: Checking and Downloading the Output

The time needed to complete all requests in your batch will differ, but you can check the status at any time. This can be done from the OpenAI web portal, where you can also directly download the output file, or via the API. The code below can be used to check the completion status of your batch job using the batch job ID you previously saved. If the job is completed, this will return the output file ID, which can be used to download the output file.

from openai import OpenAI
client = OpenAI(api_key = "API-KEY-HERE") # CHANGE ME

batch = client.batches.retrieve("BATCH-ID-HERE") # CHANGE ME
if batch.status == 'completed':
    print("Batch completed, output file ID:", batch.output_file_id)
else:
    print("Batch in progress", batch)

Once the job is completed, you can download the output, which will be another .jsonl file. Each line in this file corresponds to a separate request, marked by the unique custom_id you used in your original data file.

from openai import OpenAI
client = OpenAI(api_key = "API-KEY-HERE") # CHANGE ME

file_response = client.files.content("FILE-ID-HERE") # CHANGE ME

file_path = "batchoutput.jsonl"

with open(file_path, "wb") as f:
    f.write(file_response.read())  

print(f"File saved as: {file_path}")

Bonus Step: Converting Back to .csv

This final step is for those who are unsure how to work with the .jsonl file output by the API and would like to convert it back into a csv file to work with the results. The following code converts your output file back into a readable .csv:

import json
import csv

def jsonl_to_csv(input_jsonl_path, output_csv_path):
    records = []
    all_keys = set()

    # Read every JSON object and collect the union of all top-level keys
    with open(input_jsonl_path, mode="r", encoding="utf-8") as fin:
        for line in fin:
            line = line.strip()
            if not line:
                continue  # skip empty lines
            obj = json.loads(line)
            records.append(obj)
            all_keys.update(obj.keys())

    fieldnames = sorted(all_keys)  # sort for a stable column order

    # Write one csv row per request; nested objects (e.g. response) stay as text
    with open(output_csv_path, mode="w", encoding="utf-8", newline="") as fout:
        writer = csv.DictWriter(fout, fieldnames=fieldnames)
        writer.writeheader()
        for record in records:
            writer.writerow(record)

jsonl_to_csv("batchoutput.jsonl", "batchoutput.csv")
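Note that the response column in the resulting csv still contains the full, nested API response, with the structured answer buried inside it as a JSON string. If you want one clean column per field of your schema, the following sketch shows one way to extract the Score and Reason fields from Step 2 (it assumes every request succeeded; failed requests carry an error field instead):

import json
import pandas as pd

rows = []
with open("batchoutput.jsonl", mode="r", encoding="utf-8") as fin:
    for line in fin:
        obj = json.loads(line)
        # The model's answer is a JSON string inside the chat completion body
        content = obj["response"]["body"]["choices"][0]["message"]["content"]
        answer = json.loads(content)  # matches the Structured_Response schema
        rows.append({
            "custom_id": obj["custom_id"],
            "Score": answer["Score"],
            "Reason": answer["Reason"],
        })

pd.DataFrame(rows).to_csv("batchoutput_annotations.csv", index=False)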