# AI Email Summarization: Build It Yourself

Most “AI email” tools are just wrappers around OpenAI’s API with a shiny UI. They’re expensive, slow, and you have no control over the output. I’ll show you how to build your own email summarization system that runs locally, costs almost nothing, and gives you exactly the output format you need.

This isn’t a tutorial for a product. It’s a practical guide for developers who want to understand the mechanics and implement this themselves.

## The Problem With Off-the-Shelf Solutions

Every email summarization tool I’ve tested has the same issues:

- **Latency**: Round-trip to their servers adds 2-5 seconds
- **Cost**: Most charge $10-30/month for something you can do yourself
- **No customization**: You can’t tweak the prompt, output format, or processing logic
- **Privacy**: Your emails go through someone else’s infrastructure

If you’re handling hundreds of emails daily, these problems compound. A local solution solves all of them.

## Architecture Overview

Here’s what we’re building:

```
Email Source → Fetch → Preprocess → LLM → Post-process → Output
```

The “LLM” step is where the magic happens, but everything around it matters. Let me break down each component.

## Step 1: Fetching Emails

I’ll use Gmail’s API for this example, but the pattern works with any email provider. You need to enable the Gmail API and get credentials from Google Cloud Console.

```python
import base64
from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

def fetch_recent_emails(service, max_results=10):
    """Fetch the most recent unread emails."""
    results = service.users().messages().list(
        userId='me',
        maxResults=max_results,
        q='is:unread'
    ).execute()

    messages = results.get('messages', [])
    emails = []

    for msg in messages:
        msg_data = service.users().messages().get(
            userId='me',
            id=msg['id'],
            format='full'
        ).execute()

        payload = msg_data['payload']
        headers = {h['name']: h['value'] for h in payload['headers']}

        # Decode the plain-text body. Gmail nests the base64url data under
        # each part's 'body' key; single-part messages keep it on the payload.
        body = ""
        for part in payload.get('parts', [payload]):
            if part.get('mimeType') == 'text/plain':
                data = part.get('body', {}).get('data')
                if data:
                    body = base64.urlsafe_b64decode(data).decode(
                        'utf-8', errors='replace')
                    break

        emails.append({
            'id': msg['id'],
            'subject': headers.get('Subject', ''),
            'from': headers.get('From', ''),
            'date': headers.get('Date', ''),
            'body': body[:5000]  # Truncate for LLM context limits
        })

    return emails
```

Key detail: I’m truncating the body to 5000 characters. Most LLMs have context limits, and emails can get massive. Truncating at a reasonable length (while keeping the beginning where the main content usually is) is a practical tradeoff.
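The "Preprocess" stage from the architecture diagram can stretch that 5000-character budget further. The sketch below is one possible approach, not part of the original pipeline: `preprocess_body` is a hypothetical helper, and the two heuristics it uses (lines starting with `>` are quoted text, and an `On ... wrote:` line starts a reply chain) are assumptions that will not hold for every mail client.

```python
import re

MAX_BODY_CHARS = 5000  # same budget as the truncation in fetch_recent_emails

def preprocess_body(raw_body, max_chars=MAX_BODY_CHARS):
    """Drop quoted replies and extra blank lines, then truncate (hypothetical helper)."""
    kept = []
    for line in raw_body.splitlines():
        stripped = line.strip()
        # Assumption: lines starting with ">" are quoted text from earlier messages.
        if stripped.startswith(">"):
            continue
        # Assumption: "On ... wrote:" marks the start of a quoted reply chain.
        if re.match(r"^On .+ wrote:$", stripped):
            break
        kept.append(stripped)
    text = "\n".join(kept)
    # Collapse runs of blank lines so they do not eat into the budget.
    text = re.sub(r"\n{3,}", "\n\n", text).strip()
    return text[:max_chars]
```

Calling this on the decoded body before truncating means the character budget is spent on new content rather than on the quoted thread.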

## Step 2: Building the Summarization Prompt

The prompt is where most people fail. A generic “summarize this email” produces generic results. You need to be specific about what information matters to you.

```python
def build_summarization_prompt(email_data):
    """Build a prompt that extracts exactly what you need."""

    prompt = f"""You are an email analysis assistant. Analyze the following email
and provide a structured summary.

EMAIL DETAILS:
- From: {email_data['from']}
- Subject: {email_data['subject']}
- Date: {email_data['date']}

BODY:
{email_data['body']}

Provide your summary in this exact JSON format:
{{
"action_required": "yes|no|maybe",
"priority": "high|medium|low",
"category": "one word category",
"summary": "2-3 sentence summary of the key points",
"entities": ["list of important names, dates, or numbers"]
}}

Only output valid JSON. No additional text."""

    return prompt
```

This prompt forces the LLM to output structured data you can parse programmatically. Instead of a human-readable summary you have to parse manually, you get JSON you can feed directly into other systems.
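Before feeding that JSON into anything else, it helps to check that the reply actually matches the requested format. Here is a minimal sketch of that check, under a few assumptions: `parse_summary` and `ALLOWED` are hypothetical names, the allowed values are copied from the prompt above, and the "keep everything between the first `{` and the last `}`" step is a heuristic for models that wrap the JSON in extra text.

```python
import json

# Enumerated values copied from the prompt's JSON format.
ALLOWED = {
    "action_required": {"yes", "no", "maybe"},
    "priority": {"high", "medium", "low"},
}

def parse_summary(raw_reply):
    """Parse and validate the model's reply; return None if unusable (hypothetical helper)."""
    # Assumption: any extra text sits outside the outermost braces.
    start, end = raw_reply.find("{"), raw_reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw_reply[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Reject values outside the enumerations requested in the prompt.
    for key, allowed in ALLOWED.items():
        if data.get(key) not in allowed:
            return None
    if not isinstance(data.get("summary"), str):
        return None
    # Normalize optional fields so downstream code can rely on them.
    if not isinstance(data.get("entities"), list):
        data["entities"] = []
    data.setdefault("category", "uncategorized")
    return data
```

Returning `None` rather than raising keeps the caller's loop simple: a failed email can be logged or retried without stopping the batch.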

## Step 3: Running the Inference

I’ll show two options here: local with Ollama, and API-based with OpenAI. Use local if you have GPU hardware or want zero ongoing costs. Use the API if speed matters more.

### Option A: Local with Ollama

```python
import ollama

def summarize_local(prompt):
    """Use Ollama for local inference."""
    response = ollama.chat(
        model='llama3.2',  # or mistral, phi3
        messages=[{
            'role': 'user',
            'content': prompt
        }],
        options={
            'temperature': 0.3,  # Lower = more deterministic
            'num_ctx': 4096
        }
    )
    return response['message']['content']
```

Ollama runs entirely on your machine. With an M-series Mac or a decent GPU, you can process emails in under 2 seconds. The model stays loaded in memory, so subsequent calls are even faster.

### Option B: API with OpenAI

```python
from openai import OpenAI

client = OpenAI(api_key="your-key-here")

def summarize_api(prompt):
    """Use OpenAI API for faster inference."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # Cheap and fast
        messages=[{
            'role': 'user',
            'content': prompt
        }],
        temperature=0.3,
        response_format={"type": "json_object"}
    )
    return response.choices[0].message.content
```

The API approach costs roughly $0.001-0.005 per email depending on model choice. For personal use, this is essentially free. GPT-4o-mini is surprisingly good at following the JSON format instructions.
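To see what that per-email range means at your own volume, a quick back-of-the-envelope calculation helps. This sketch only multiplies out the figures quoted above; `monthly_cost_range` is a hypothetical helper, and the 30-day month and the sample volumes are assumptions.

```python
def monthly_cost_range(emails_per_day, low_per_email=0.001,
                       high_per_email=0.005, days=30):
    """Return (low, high) monthly API cost in dollars for a daily email volume."""
    emails = emails_per_day * days
    return round(emails * low_per_email, 2), round(emails * high_per_email, 2)

# Sample volumes are illustrative; plug in your own inbox numbers.
for volume in (20, 100, 300):
    low, high = monthly_cost_range(volume)
    print(f"{volume} emails/day: ${low:.2f} to ${high:.2f} per month")
```

The cost scales linearly with volume, so it is worth running this with your real daily count before choosing between the local and API options.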

## Step 4: Processing and Output

Now we tie it together:

```python
import json
import time

def process_inbox(credentials):
    """Main processing loop."""
    # Initialize services (OAuth setup code omitted; pass in your credentials)
    service = build('gmail', 'v1', credentials=credentials)

    emails = fetch_recent_emails(service, max_results=20)
    results = []

    for email in emails:
        prompt = build_summarization_prompt(email)

        start = time.time()
        # Use either local or API version
        summary = summarize_local(prompt)
        # summary = summarize_api(prompt)
        elapsed = time.time() - start

        try:
            parsed = json.loads(summary)
            parsed['processing_time'] = round(elapsed, 2)
            parsed['email_id'] = email['id']
            results.append(parsed)
        except json.JSONDecodeError:
            # Handle malformed output, keeping the ID so the email can be retried
            results.append({
                'action_required': 'unknown',
                'priority': 'low',
                'category': 'parse_error',
                'summary': 'Failed to parse LLM output',
                'entities': [],
                'email_id': email['id']
            })

        # Rate limiting for API calls
        if elapsed < 0.5:
            time.sleep(0.5)

    return results

# Example output
# [
#   {
#     "action_required": "yes",
#     "priority": "high",
#     "category": "meeting",
#     "summary": "Team meeting rescheduled to Thursday 2pm. Action: confirm attendance.",
#     "entities": ["Thursday 2pm", "team meeting"],
#     "processing_time": 1.84
#   },
#   ...
# ]
```

This gives you a structured list of analyzed emails with processing times. You can now filter, sort, or display these results however you want.

## Limitations I've Encountered

Here's what doesn't work well:

- **Context truncation loses information**: If the important details are at the end of a long email, you'll miss them. This is a hard limit of the approach.
- **JSON output isn't perfect**: LLMs sometimes add trailing commas or slightly wrong keys. The error handling in the code above catches this, but you'll lose some data.
- **Local models are slower**: On CPU-only machines, expect 10-30 seconds per email with Ollama. Not usable for real-time workflows.
- **Prompt drift**: As emails change format or you get new types of messages, the prompt may need tweaking. This is maintenance overhead.

## Key Takeaways

- Building your own email summarization gives you full control over output format, latency, and cost
- Use structured prompts that output JSON for machine-readable results
- Ollama works locally with decent speed on Apple Silicon or GPU hardware
- API-based solutions (GPT-4o-mini) cost under a cent per email and are faster
- The real value isn't the summarization; it's the structured metadata (priority, action required, category) you can use for automation

## Next Steps

1. **Get the Gmail API credentials**: Set up a project in Google Cloud Console and enable the Gmail API
2. **Try Ollama first**: If you have an M-series Mac or GPU, install it and test with `ollama run llama3.2`
3. **Run the code above**: Supply your own credentials in place of the omitted setup code, process 10 emails, and inspect the output
4. **Customize the prompt**: Change the JSON structure to match what you actually need: add fields, remove ones you don't use
5. **Build the UI**: Pipe these results to a simple dashboard, keyboard shortcuts, or email filters that sort by priority

The infrastructure is the easy part. Figuring out what metadata matters for your workflow is where the real optimization happens.
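As a starting point for sorting by priority, here is a minimal sketch that orders the result list so actionable, high-priority emails come first. `triage` and `PRIORITY_ORDER` are hypothetical names, and the ordering rule (action required first, then priority) is an assumption about what a typical workflow wants.

```python
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

def triage(results):
    """Sort summaries so actionable, high-priority emails come first (hypothetical helper)."""
    def sort_key(item):
        needs_action = 0 if item.get("action_required") == "yes" else 1
        # Unknown or missing priority values sort last.
        priority = PRIORITY_ORDER.get(item.get("priority"), len(PRIORITY_ORDER))
        return (needs_action, priority)
    return sorted(results, key=sort_key)
```

Because `sorted` is stable, emails with the same action and priority keep the order in which they were fetched.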