Using ChatGPT API in Your Projects

# Using ChatGPT API in Your Projects

The ChatGPT web interface is fine for experimenting, but when you need AI embedded in your actual application—automating responses, processing data, or building features that depend on LLM output—you’re using the API. This article covers the practical parts: making requests, handling responses, understanding costs, and the gotchas that will bite you in production.

By the end, you’ll have a working client you can drop into a real project.

## Setting Up Your API Key

Before anything else, you need an API key. Go to platform.openai.com, sign up, and navigate to the API keys section. Create a new secret key and copy it immediately—you won’t see it again.

Store this key in your environment, never in your code:

“`bash
export OPENAI_API_KEY=”sk-proj-…”
“`

You can also use a `.env` file with `python-dotenv` or your framework’s preferred secrets management. If you commit this key to a repository, you’ll have a bad time—revoke it immediately and create a new one.

The API key is tied to your organization and billing. Speaking of which: the API is not free. You’ll need to add a payment method, though you get some free credits on signup. Check your usage dashboard regularly.

## Making Your First Request

The ChatGPT API uses a simple HTTP structure. You’re sending a POST request to `https://api.openai.com/v1/chat/completions` with a JSON body. Here’s what a request looks like:

“`python
import requests
import os

API_URL = “https://api.openai.com/v1/chat/completions”
API_KEY = os.environ.get(“OPENAI_API_KEY”)

def chat(messages):
headers = {
“Authorization”: f”Bearer {API_KEY}”,
“Content-Type”: “application/json”
}

payload = {
“model”: “gpt-4o”,
“messages”: messages,
“temperature”: 0.7
}

response = requests.post(API_URL, headers=headers, json=payload)
response.raise_for_status()

return response.json()[“choices”][0][“message”][“content”]

# Usage
result = chat([
{“role”: “system”, “content”: “You are a helpful assistant.”},
{“role”: “user”, “content”: “Explain async/await in Python in one sentence.”}
])
print(result)
“`

A few things to note:

– **`model`**: `gpt-4o` is the current flagship, but `gpt-4o-mini` is cheaper and faster for simple tasks. There’s also `o1` and `o1-mini` for reasoning-heavy tasks—these use a different API structure and don’t support `temperature` or most streaming options.
– **`messages`**: Array of message objects with `role` (system, user, assistant) and `content`. The API has no memory between requests—you must pass the full conversation history every time.
– **`temperature`**: Controls randomness. `0` is deterministic, `0.7` is balanced, `1.5` is chaotic. For most production tasks, stick between `0` and `0.3`.

## Building a Production-Ready Client

The basic request above works for testing, but production code needs more structure. Here’s a client with error handling, retries, and streaming support:

“`python
import requests
import os
import time
from typing import Generator

class ChatGPTClient:
def __init__(self, api_key: str = None, model: str = “gpt-4o”):
self.api_key = api_key or os.environ.get(“OPENAI_API_KEY”)
self.model = model
self.api_url = “https://api.openai.com/v1/chat/completions”

def _headers(self) -> dict:
return {
“Authorization”: f”Bearer {self.api_key}”,
“Content-Type”: “application/json”
}

def chat(self, messages: list, temperature: float = 0.7) -> str:
“””Send a non-streaming request with retry logic.”””
payload = {
“model”: self.model,
“messages”: messages,
“temperature”: temperature
}

for attempt in range(3):
try:
response = requests.post(
self.api_url,
headers=self._headers(),
json=payload,
timeout=30
)
response.raise_for_status()
return response.json()[“choices”][0][“message”][“content”]

except requests.exceptions.Timeout:
if attempt == 2:
raise Exception(“Request timed out after 3 attempts”)
time.sleep(2 ** attempt)

except requests.exceptions.HTTPError as e:
# Handle rate limits specifically
if e.response.status_code == 429:
retry_after = int(e.response.headers.get(“Retry-After”, 5))
time.sleep(retry_after)
else:
raise

raise Exception(“Max retries exceeded”)

def stream(self, messages: list, temperature: float = 0.7) -> Generator[str, None, None]:
“””Stream responses token-by-token.”””
payload = {
“model”: self.model,
“messages”: messages,
“temperature”: temperature,
“stream”: True
}

response = requests.post(
self.api_url,
headers=self._headers(),
json=payload,
stream=True,
timeout=60
)
response.raise_for_status()

for line in response.iter_lines():
if not line:
continue

# OpenAI streams Server-Sent Events
decoded = line.decode(“utf-8”)
if decoded.startswith(“data: “):
data = decoded[6:]
if data == “[DONE]”:
break

chunk = json.loads(data)
delta = chunk.get(“choices”, [{}])[0].get(“delta”, {})
if “content” in delta:
yield delta[“content”]
“`

This client handles the basics: timeouts, HTTP errors, rate limit responses (429 errors tell you when to retry), and streaming for responses where you want to show output as it’s generated.

## Understanding Pricing and Rate Limits

The API charges by token—input tokens and output tokens are priced differently. As of 2026, `gpt-4o` runs about $2.50/1M input tokens and $10.00/1M output tokens. `gpt-4o-mini` is roughly 10x cheaper. `o1` is more expensive and metered differently.

This sounds small until you run your code in a loop. A single conversation with a few messages might use 1,000-2,000 tokens. Run that 1,000 times a day and you’re looking at real money. Track your usage through the API:

“`python
import requests

def get_usage():
response = requests.get(
“https://api.openai.com/v1/usage”,
headers={“Authorization”: f”Bearer {os.environ.get(‘OPENAI_API_KEY’)}”}
)
data = response.json()
print(f”Today’s usage: ${data[‘daily_costs’][0][‘cost’]:.4f}”)
“`

Rate limits depend on your tier. Free tier is 3 RPM (requests per minute) and 200 TPM (tokens per minute). Paid tiers go higher. If you hit a limit, the API returns 429 and tells you when to retry via the `Retry-After` header. Build your client to respect this—don’t just hammer the API.

For high-volume work, consider batching or using a different model for simpler tasks.

## Handling Responses and Errors

The response structure is straightforward:

“`json
{
“id”: “chatcmpl-xxx”,
“object”: “chat.completion”,
“created”: 1700000000,
“model”: “gpt-4o”,
“choices”: [
{
“index”: 0,
“message”: {
“role”: “assistant”,
“content”: “Your response text here”
},
“finish_reason”: “stop”
}
],
“usage”: {
“prompt_tokens”: 50,
“completion_tokens”: 20,
“total_tokens”: 70
}
}
“`

`finish_reason` tells you why the response ended: `stop` means normal completion, `length` means hit token limit, `content_filter` means something was flagged.

Errors you’ll encounter:

– **401**: Invalid or missing API key
– **429**: Rate limit hit—back off
– **500**: OpenAI server error—retry with exponential backoff
– **400**: Bad request—usually malformed messages or invalid parameters

Always log the full error response in development so you can see what the API is actually complaining about.

## Common Pitfalls

A few things that will ruin your day:

**No conversation history**: The API is stateless. If you send `{“role”: “user”, “content”: “Continue”}`, it has no idea what you’re continuing. You must include the full message history in every request.

**Token limits**: Each model has a maximum context window—roughly 128K tokens for gpt-4o. If your conversation exceeds this, you get an error. For long conversations, you need to truncate or summarize earlier messages.

**Prompt injection**: If you pass user input directly into the messages array without sanitization, users can manipulate your prompts. Validate and sanitize any user input before adding it to messages