Indexatron: Teaching Local LLMs to See Family Photos

Tags: ai, python, ollama, llm, privacy, experiment

Status: ✅ SUCCESS
Hypothesis: Local LLMs can analyse family photos with useful metadata extraction

I’ve been building the-mcculloughs.org - a family photo sharing app. The Rails side handles uploads, galleries, and all the usual stuff. But I wanted semantic search - not just “photos from 2015” but “photos at the beach” or “pictures with grandma.”

The cloud APIs exist. But uploading decades of family photos to someone else’s servers? Hard pass.

Time for a science experiment - two apps working together.

The Experiment

I called it Indexatron 🤖

The goal: prove that Ollama running locally with LLaVA:7b and nomic-embed-text can:

  1. Analyse photos - Extract descriptions, detect people/objects, estimate era
  2. Generate embeddings - Create 768-dimensional vectors for similarity search
  3. Process batches - Handle multiple images with progress tracking
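
Neither codebase is reproduced in this post, but with the ollama Python client the two model calls look roughly like the sketch below. The prompt wording, function names, and response handling are my own illustration, not Indexatron's actual code.

```python
# Minimal sketch: describe one photo with llava:7b, then embed the
# description with nomic-embed-text. Illustrative only.
import ollama

def describe_photo(path: str) -> str:
    """Ask the local LLaVA model for a short description of one image."""
    response = ollama.chat(
        model="llava:7b",
        messages=[{
            "role": "user",
            "content": "Describe this family photo: people, setting, rough era.",
            "images": [path],  # the client accepts file paths or raw bytes
        }],
    )
    return response["message"]["content"]

def embed_text(text: str) -> list[float]:
    """Turn a description into a 768-dimensional vector for similarity search."""
    result = ollama.embeddings(model="nomic-embed-text", prompt=text)
    return result["embedding"]

if __name__ == "__main__":
    description = describe_photo("family_photo_01.jpg")
    vector = embed_text(description)
    print(description)
    print(f"embedding dimensions: {len(vector)}")  # 768 for nomic-embed-text
```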

Test Results

| Metric | Value |
| --- | --- |
| Images Processed | 3/3 |
| Failed | 0 |
| Total Time | 40.82s |
| Avg Time/Image | ~13.6s |

Sample Outputs

🐕 family_photo_03.jpg

🍺 family_photo_02.jpg

👔 family_photo_01.jpg

What Worked Well

LLaVA Vision Analysis

Embedding Generation

Batch Processing
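
The batch runner itself isn't shown in this post; a rough sketch of its shape, with rich handling the progress display, might look like this (analyse_image is a stand-in for the per-image work sketched earlier):

```python
# Sketch of a batch loop with rich progress tracking. Illustrative only.
from pathlib import Path
from rich.progress import Progress

def analyse_image(path: Path) -> dict:
    # Stand-in for the real per-image work: LLaVA analysis + embedding.
    return {"file": path.name}

def process_batch(folder: Path) -> list[dict]:
    images = sorted(folder.glob("*.jpg"))
    results = []
    with Progress() as progress:
        task = progress.add_task("Analysing photos...", total=len(images))
        for path in images:
            try:
                results.append(analyse_image(path))
            except Exception as exc:
                progress.console.print(f"[red]Failed:[/red] {path.name}: {exc}")
            progress.advance(task)
    return results
```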

Quirks & Learnings

JSON Parsing Required Repair

LLaVA doesn’t always output clean JSON, so the analyser needed a repair step before parsing.
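
Indexatron's actual repair code isn't reproduced here, but the usual fix for this problem is to strip any markdown fences and pull the first JSON object out of the response before calling json.loads. A minimal sketch:

```python
# Defensive JSON extraction from a chatty model response. Illustrative,
# not the project's actual parser.
import json
import re

def extract_json(raw: str) -> dict:
    """Pull the first JSON object out of text that may include markdown
    fences, prose, or trailing commentary."""
    cleaned = re.sub(r"`{3}(?:json)?", "", raw)       # drop markdown code fences
    match = re.search(r"\{.*\}", cleaned, re.DOTALL)  # first '{' to last '}'
    if not match:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))

# Example: valid JSON surrounded by conversational filler.
messy = "Sure! Here is the analysis:\n{\"people\": 3, \"setting\": \"beach\"}\nHope that helps."
print(extract_json(messy))  # {'people': 3, 'setting': 'beach'}
```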

Model Hallucinations

The model produced some amusing hallucinations along the way.

These quirks don’t break the system - robust parsing handles them.

Processing Time

~13.6 seconds per image is acceptable for batch processing; real-time analysis would need the pipeline to be considerably faster.

Development Approach

This was parallel development across two codebases - with very different approaches for each.

The Boring Bits: AI Agents for CRUD

The Rails API work? It’s not exciting. Setting up API endpoints, adding pgvector, writing migrations, CRUD operations - I’ve done this hundreds of times. It’s necessary scaffolding, but it’s not where I want to spend my brain cycles.

So I let AI agents handle it. Claude Code with custom agents for code review, test writing, and documentation. The agents handled the boilerplate while I reviewed and approved. This is exactly what AI assistance is good for - augmenting the repetitive work so you can focus on what matters.

The Interesting Bits: README-Driven Development

Indexatron was different. This was an experiment - I needed to understand every piece, make deliberate choices, and document as I went. For this, I used README-driven development:

  1. Write the README first - Document what the code should do before writing it
  2. One branch per milestone - Each branch proves one thing works
  3. Merge only when it works - No moving on until the milestone is complete
  4. AI for documentation - Let agents help write up the results

README-driven development forces you to think through the design before coding. It’s slower, but you end up with working code and documentation. Perfect for experiments where you need to prove something works.

Development Progress

Indexatron (Python) - The Experiment

README-driven development with one branch per milestone:

| PR | Milestone | What It Proved |
| --- | --- | --- |
| #5 | Project Setup | Foundation ready |
| #1 | Ollama Connection | Local LLM runtime accessible |
| #2 | Image Analysis | LLaVA extracts useful metadata |
| #3 | Embeddings | 768-dim vectors for similarity |
| #4 | Batch Processing | Scalable to many images |

Each branch had to work before moving on. Prove it, merge it, move on.

Rails App - The Integration (Agent-Assisted)

While I focused on Indexatron, AI agents handled the Rails infrastructure:

| PR | Feature |
| --- | --- |
| #60 | AI Photo Analysis API with pgvector |

Standard API endpoint, database migration, pgvector setup - all the CRUD that’s been done a thousand times before. The agents wrote the code, I reviewed it, tests passed, merged. That’s the right division of labour: agents handle the predictable, humans handle the novel.

Technical Stack

Ollama (local runtime)
├── llava:7b (~4.7GB) - Vision analysis
└── nomic-embed-text (~274MB) - Embeddings

Python 3.11+
├── ollama - API client
├── pydantic - Data validation
├── pillow - Image handling
└── rich - Console output

Next Steps

This proves the concept works. Future integration:

  1. Rails API - Add endpoint for on-demand analysis
  2. Database Storage - Save embeddings in PostgreSQL (pgvector)
  3. Similarity Search - Find “photos like this one”
  4. Face Recognition - Cluster photos by person (future model)

Conclusion

🤖 The robots can see our photos.

Local LLMs provide a privacy-preserving alternative to cloud APIs for photo analysis. The quality is good enough for family photo organisation, and the 768-dimensional embeddings enable future similarity search features.
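
Similarity search itself is just nearest-neighbour lookup over those vectors; in the Rails app that will happen inside PostgreSQL via pgvector, but a tiny pure-Python sketch (with hypothetical stored embeddings) shows the idea:

```python
# Toy similarity search over stored embeddings. In production this runs
# in PostgreSQL via pgvector; shown in plain Python purely to illustrate.
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query: list[float], library: dict[str, list[float]]) -> str:
    """Return the filename whose embedding is closest to the query photo's."""
    return max(library, key=lambda name: cosine_similarity(query, library[name]))
```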

Code

| Repository | Description |
| --- | --- |
| swmcc/indexatron | Python service for local LLM photo analysis |
| swmcc/the-mcculloughs.org | Rails family photo sharing app |

Full experiment results: RESULTS.md


Built with Ollama, LLaVA, and a healthy scepticism of cloud APIs.
