Karl — PDF OCR Automation

What It Does

This is an automated document processing system that works like a smart assistant for PDFs. You drop a PDF into a Google Drive folder, and the system takes it from there — it reads every page, pulls out all the text, saves any images it finds, writes a short summary of what each page is about, and neatly organizes everything into a spreadsheet. No manual copying and pasting.

It's designed for anyone dealing with large PDF documents — whether it's reports, scanned contracts, manuals, or research papers. The workflow handles up to 200 pages per document and processes everything in the background.

How It Works

1. PDF Lands in Drive

The system watches a designated Google Drive folder. As soon as a PDF is uploaded, the workflow starts automatically.

2. Mistral OCR Reads Every Page

The PDF is sent to Mistral's OCR API, which scans each page and extracts both text and embedded images with high accuracy.

3. Images Are Saved Separately

Every image found in the document is extracted, given a clear filename (page number + image number), and uploaded to a separate Google Drive folder. The image links are tracked in the spreadsheet.

4. AI Writes a Summary

Each page's text is passed to GPT-4.1-mini, which generates a short, one-line summary. This makes it easy to understand what a document contains at a glance.

5. Everything Goes to a Spreadsheet

All extracted data lands in a Google Sheet — page number, full extracted text, AI summary, image filenames, and image links. It's a clean, searchable record of your entire document.

Key Features

Hands-Free Workflow

There's nothing to click once the PDF is uploaded. The system watches, processes, and organizes everything from start to finish automatically.

Smart Image Handling

Images aren't just dumped into a folder. Each one is named based on which document and page it came from, so you can always trace an image back to its source.

Page-Level Summaries

AI writes a short summary for every page. This is especially useful for long documents — you can browse the summaries to find the page you actually need without scrolling through hundreds of pages.

Structured Output

Everything lives in a Google Sheet with separate sections for text content and image data. It's easy to search, filter, export, or share with others.

Built With

n8n Mistral OCR API OpenAI GPT-4.1-mini Google Drive Google Sheets

Karl Miler — PDF OCR Automation