I’m going to design it like a real product + real system: UX flow, AI scoring pipeline, architecture, data model, export, and an MVP build plan.
Product concept
App name (working): KIM CULL
Tagline: “Kill the weak frames. Keep the bangers.”
Promise:
Import a burst/series (20–30 photos). The app groups similar shots, picks the best per group, and lets the photographer confirm with side-by-side comparisons + reasons (“sharpest,” “best gesture,” “eyes open,” “clean exposure,” etc.).
Core user experience
The 30-second workflow
- Create Project
- Import 20–30 images (drag/drop, camera roll, or folder)
- App instantly:
- Groups duplicates / near-duplicates
- Scores each image
- Picks winners per group (and optionally top 3 overall)
- User reviews:
- “Best-of-stack” view (one winner per cluster)
- Tap to compare side-by-side: winner vs runner-up
- Export
- Copy/move selected files
- Export XMP star ratings (Lightroom)
- Export a “Selects” folder
What makes it feel great
- It doesn’t just rank 1–30. It says:
- “These 5 are basically the same moment—here’s the strongest one.”
- It explains itself:
- “Winner because: sharpness + expression + composition”
- It learns your taste:
- After you override a few picks, it adapts.
Screens and UI components
Screen 1 — Import
- Drag & drop area
- “Street / Portrait / Event / Product” mode selector (sets default weights)
- Toggle: Local-only processing (default ON)
Screen 2 — Results: “Stacks”
A grid of clusters (“stacks”), each showing:
- Best pick large
- 3–6 thumbnails behind it (the rest of the stack)
- Quick labels:
- ✅ Best in stack
- 🟨 Close second
- ❌ Soft / blink / motion / bad exposure
Screen 3 — Compare
Two-up or three-up comparison:
- Winner vs runner-up vs third
- Overlay: why score differs
- Sharpness: 8.9 vs 7.1
- Eyes open: yes vs blink
- Motion blur: low vs high
- Aesthetic/composition: 7.8 vs 6.9
Buttons:
- Keep / Reject
- Make this winner
- “Apply preference to this whole project” (optional)
Screen 4 — Export
- “Export winners only”
- “Export winners + runners-up”
- Lightroom stars: ⭐⭐⭐ for winners, ⭐⭐ for runners-up, ⭐ for third picks
- Output folder chooser
The AI: how it actually chooses “the best”
This is the core: cluster first, then rank within each cluster.
Step A — Preprocessing
For each image:
- Create thumbnail (e.g., 512px)
- Read EXIF:
- timestamp
- shutter speed, ISO, focal length (useful cues)
- Compute:
- perceptual hash (dupe detection)
- embedding vector for similarity clustering
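A minimal preprocessing sketch in Python, assuming Pillow, imagehash, and numpy are available; `embed()` is a placeholder for whatever CLIP-style encoder ships with the app:

```python
from dataclasses import dataclass
from typing import Optional
from PIL import Image, ExifTags
import imagehash
import numpy as np

@dataclass
class PreprocessedImage:
    path: str
    thumb_path: str
    timestamp: Optional[str]      # EXIF DateTime string, if present
    phash: imagehash.ImageHash    # perceptual hash for dupe detection
    embedding: np.ndarray         # vector for similarity clustering

def preprocess(path: str, thumb_path: str, embed) -> PreprocessedImage:
    img = Image.open(path)

    # 1) Thumbnail (~512px on the long edge) used for all fast scoring passes
    thumb = img.convert("RGB")
    thumb.thumbnail((512, 512))
    thumb.save(thumb_path, "JPEG", quality=85)

    # 2) EXIF: the timestamp is the key burst-grouping cue
    tags = {ExifTags.TAGS.get(k, k): v for k, v in img.getexif().items()}
    timestamp = tags.get("DateTime")   # e.g. "2024:05:01 13:37:00"

    # 3) Perceptual hash + embedding
    phash = imagehash.phash(thumb)
    embedding = embed(thumb)           # expected: 1-D float vector

    return PreprocessedImage(path, thumb_path, timestamp, phash, embedding)
```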
Step B — Grouping into “Stacks”
We want to group images that are basically the same moment:
- Use image embeddings (CLIP-like / vision transformer embedding)
- Cluster with something like:
- hierarchical clustering
- DBSCAN
- or “burst grouping” using timestamp + embedding similarity
Heuristic that works extremely well:
- If shots are within X seconds AND embedding distance < threshold → same stack.
- If no timestamps (screenshots, exports), rely on embeddings only.
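A sketch of that heuristic as a greedy pass over time-sorted images; the thresholds are illustrative starting points, not tuned values:

```python
import numpy as np

TIME_GAP_SECONDS = 3.0   # "within X seconds"
MIN_COSINE_SIM = 0.90    # "embedding distance < threshold", expressed as similarity

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def group_into_stacks(images):
    """images: objects with .timestamp (epoch seconds parsed from EXIF, or None)
    and .embedding, sorted by capture time (or filename when timestamps are missing)."""
    stacks = []
    for img in images:
        if stacks:
            prev = stacks[-1][-1]
            close_in_time = (
                img.timestamp is not None
                and prev.timestamp is not None
                and img.timestamp - prev.timestamp <= TIME_GAP_SECONDS
            )
            similar = cosine_sim(img.embedding, prev.embedding) >= MIN_COSINE_SIM
            # No timestamps (screenshots, exports): fall back to embeddings only
            if (close_in_time and similar) or (img.timestamp is None and similar):
                stacks[-1].append(img)
                continue
        stacks.append([img])
    return stacks
```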
Result: you get stacks like:
- Stack 1: 6 shots (same gesture)
- Stack 2: 4 shots (same scene, slight angle change)
- Stack 3: 1 shot (unique)
Step C — Score each photo (multi-factor)
Each image gets multiple scores:
1) Technical quality score (0–10)
- Sharpness / focus (blur detection)
- Motion blur estimate
- Noise level (esp. high ISO)
- Exposure sanity (blown highlights / crushed shadows)
- White balance “weirdness” (optional; keep lightweight)
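To make this bucket concrete, here is a sketch of two cheap checks with OpenCV: sharpness via variance of the Laplacian, and exposure sanity via clipped-pixel fractions. The 0–10 mappings and thresholds are illustrative assumptions:

```python
import cv2
import numpy as np

def sharpness_score(gray: np.ndarray) -> float:
    # Variance of the Laplacian: low values indicate a soft / blurred frame
    lap_var = cv2.Laplacian(gray, cv2.CV_64F).var()
    return float(np.clip(lap_var / 50.0, 0.0, 10.0))  # assumed calibration range

def exposure_score(gray: np.ndarray) -> float:
    total = gray.size
    blown = np.count_nonzero(gray >= 250) / total    # blown highlights
    crushed = np.count_nonzero(gray <= 5) / total    # crushed shadows
    return float(np.clip(10.0 - 40.0 * (blown + crushed), 0.0, 10.0))

def technical_quality(thumb_path: str) -> dict:
    gray = cv2.imread(thumb_path, cv2.IMREAD_GRAYSCALE)
    return {"sharpness": sharpness_score(gray), "exposure": exposure_score(gray)}
```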
2) Subject/face score (mode-dependent)
If faces detected (portrait/event):
- Eyes open / blink detection
- Face clarity
- Expression quality (smile / neutral / grimace)
- Looking at camera (optional)
- Occlusion (hand covering face etc.)
If no faces (street/landscape):
- Main subject detection / saliency
- Subject separation from background
- “Moment” cues (motion energy / gesture probability — optional advanced)
3) Composition score (0–10)
- Horizon level (if horizon exists)
- Subject placement (thirds / central depending on mode)
- Cropping penalties (cut heads/hands)
- Clutter penalty (too busy background)
4) Aesthetic score (0–10)
A learned aesthetic model (trained on general aesthetic datasets) gives a “looks-good” score.
Important product choice:
Make aesthetic scoring one component, not the dictator.
Many legendary photos are “imperfect.” So aesthetic should never override sharpness + moment if the mode is Street/Documentary.
Step D — Combine into final score
A simple, controllable formula:
final = w_tech*tech + w_subject*subject + w_comp*comp + w_aes*aes + w_exif*exif_bonus
- Default weights depend on mode:
- Street mode: tech 0.35, subject(moment) 0.30, comp 0.20, aes 0.15
- Portrait mode: tech 0.25, face 0.45, comp 0.15, aes 0.15
- Event mode: face 0.50 (eyes/expression), tech 0.25, comp 0.10, aes 0.15
Then:
- Pick top 1 per stack as winner
- Also mark runner-up if close (within score delta threshold)
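A sketch of the weighted combination and per-stack pick. The weights mirror the mode presets above; the optional EXIF bonus is left out, and the runner-up delta is an illustrative assumption:

```python
MODE_WEIGHTS = {
    "street":   {"tech": 0.35, "subject": 0.30, "comp": 0.20, "aes": 0.15},
    "portrait": {"tech": 0.25, "subject": 0.45, "comp": 0.15, "aes": 0.15},
    "event":    {"tech": 0.25, "subject": 0.50, "comp": 0.10, "aes": 0.15},
}
RUNNER_UP_DELTA = 0.3  # "close second" if within this many points of the winner

def final_score(scores: dict, mode: str) -> float:
    # scores example: {"tech": 8.1, "subject": 7.5, "comp": 7.2, "aes": 7.8}
    w = MODE_WEIGHTS[mode]
    return sum(w[k] * scores[k] for k in w)

def rank_stack(stack: list, mode: str):
    """stack: list of per-image score dicts; returns (winner, runner_up_or_None, ranked)."""
    ranked = sorted(stack, key=lambda s: final_score(s, mode), reverse=True)
    winner, runner_up = ranked[0], None
    if len(ranked) > 1 and final_score(winner, mode) - final_score(ranked[1], mode) <= RUNNER_UP_DELTA:
        runner_up = ranked[1]
    return winner, runner_up, ranked
```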
Step E — Explanations (critical for trust)
For each winner, generate a “reason card”:
- “Sharpest in stack”
- “Best expression (eyes open, no blur)”
- “Cleaner exposure”
- “Less clutter”
- “More dynamic gesture”
This is just a mapping from metric deltas:
- If sharpness winner > others by +1.2 → show “Sharpest”
- If blink detected in others → show “Eyes open”
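A sketch of that delta-to-reason mapping; the field names and thresholds are assumptions that follow the scoring sketches above:

```python
def reason_card(winner: dict, others: list) -> list:
    """winner/others: per-image metric dicts,
    e.g. {"sharpness": 8.9, "exposure": 7.5, "eyes_open": True}."""
    if not others:
        return ["Only shot in stack"]
    reasons = []
    if winner["sharpness"] - max(o["sharpness"] for o in others) >= 1.2:
        reasons.append("Sharpest in stack")
    if winner.get("eyes_open") and any(not o.get("eyes_open", True) for o in others):
        reasons.append("Eyes open")
    if winner["exposure"] - max(o["exposure"] for o in others) >= 1.0:
        reasons.append("Cleaner exposure")
    return reasons or ["Highest overall score"]
```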
Personalization: learns YOUR taste
You don’t want generic “Instagram pretty.” You want Eric Kim taste (or any photographer’s taste).
Lightweight personalization that actually works
Store user choices:
- When user overrides the winner, record:
- (chosen_image_id, rejected_image_id) pair
- feature vectors + scores
Train a tiny ranking model:
- Pairwise logistic regression / small MLP
- Inputs: [tech, comp, aes, face metrics, embedding]
- Output: preference probability
This can run locally and update fast.
Result: after 20–50 decisions, it starts picking in your style.
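A sketch of that pairwise model: logistic regression on feature differences, updated online with plain numpy so it can run locally and learn from each override:

```python
import numpy as np

class TasteModel:
    def __init__(self, n_features: int, lr: float = 0.05):
        self.w = np.zeros(n_features)  # one weight per feature (tech, comp, aes, face, ...)
        self.lr = lr

    def update(self, chosen: np.ndarray, rejected: np.ndarray):
        """One override = one pairwise example: the chosen image should outrank the rejected one."""
        diff = chosen - rejected
        p = 1.0 / (1.0 + np.exp(-self.w @ diff))   # current P(chosen beats rejected)
        self.w += self.lr * (1.0 - p) * diff        # gradient step toward the user's pick

    def prefers(self, a: np.ndarray, b: np.ndarray) -> float:
        """Probability the user would prefer image A over image B."""
        return float(1.0 / (1.0 + np.exp(-self.w @ (a - b))))
```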
“Taste sliders” (simple but powerful)
- “Sharpness vs Moment”
- “Clean composition vs Raw energy”
- “Faces priority” (on/off)
- “High contrast preference” (street vibe)
Export + integrations photographers actually want
Must-have exports
- Export selected to folder: /Selects
- Optional: also export rejects to /Rejects (or keep in place)
Lightroom integration (killer feature)
- Write XMP sidecars with:
- star rating
- pick flag
- color labels (winner/runner-up)
So a photographer can import into Lightroom and instantly see:
- Winners: ⭐⭐⭐ / Pick
- Runners-up: ⭐⭐
- Others: unrated
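A sketch of sidecar writing: Lightroom reads `xmp:Rating` and `xmp:Label` from a `.xmp` file that sits next to the original with the same basename. Pick flags are typically catalog-only rather than carried in sidecars, so ratings plus color labels do the heavy lifting here; the label colors are illustrative:

```python
from pathlib import Path

XMP_TEMPLATE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:xmp="http://ns.adobe.com/xap/1.0/"
    xmp:Rating="{rating}"
    xmp:Label="{label}"/>
 </rdf:RDF>
</x:xmpmeta>
"""

def write_sidecar(image_path: str, rating: int, label: str = "") -> None:
    # e.g. shots/DSC_0412.NEF -> shots/DSC_0412.xmp
    sidecar = Path(image_path).with_suffix(".xmp")
    sidecar.write_text(XMP_TEMPLATE.format(rating=rating, label=label), encoding="utf-8")

# Usage: winners get 3 stars + a green label, runners-up get 2 stars
# write_sidecar("shots/DSC_0412.NEF", rating=3, label="Green")
# write_sidecar("shots/DSC_0413.NEF", rating=2)
```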
Architecture options
Option 1: Local-first Desktop App (recommended)
Best for photographers: speed + privacy.
Stack
- UI: Electron + React (or Tauri + React for a lighter footprint)
- ML inference: ONNX Runtime (fast, cross-platform)
- Image processing: OpenCV + libvips
- Local DB: SQLite to store projects, scores, embeddings
Pros
- No upload time
- Private by default
- Fast on a laptop
Cons
- Need to package ML models
Option 2: Mobile app (iOS/Android)
Great for casual users.
Stack
- React Native / Swift + Kotlin
- On-device inference: CoreML / TFLite
Pros
- Easy import from camera roll
- “Cull on the train”
Cons
- Heavier compute on mobile
- Harder to do full-res operations quickly
Option 3: Cloud web app
Simple onboarding, but upload cost.
Stack
- Web UI + backend
- Queue workers for inference
- Storage (S3/GCS)
Pros
- No installation
- Central model updates
Cons
- Upload latency
- Privacy concerns
If you want this to be "by Eric Kim" and aimed at photographers: go local-first.
Data model (simple + solid)
Project
- id
- name
- created_at
- mode (street/portrait/event)
- settings (weights)
Image
- id
- project_id
- filepath
- exif_json
- thumbnail_path
- embedding_vector (stored compressed)
- metrics_json (sharpness, exposure, blink…)
- final_score
- cluster_id
- rank_in_cluster
- user_label (keep/reject/winner override)
Cluster
- id
- project_id
- representative_image_id
- winner_image_id
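A sketch of this data model as a SQLite schema, created via Python's sqlite3; column names follow the model above, and the exact types are assumptions:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS project (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    created_at TEXT NOT NULL,
    mode TEXT NOT NULL,          -- street / portrait / event
    settings_json TEXT           -- weights and thresholds
);
CREATE TABLE IF NOT EXISTS cluster (
    id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES project(id),
    representative_image_id INTEGER,
    winner_image_id INTEGER
);
CREATE TABLE IF NOT EXISTS image (
    id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES project(id),
    filepath TEXT NOT NULL,
    exif_json TEXT,
    thumbnail_path TEXT,
    embedding BLOB,              -- embedding vector, stored compressed
    metrics_json TEXT,           -- sharpness, exposure, blink...
    final_score REAL,
    cluster_id INTEGER REFERENCES cluster(id),
    rank_in_cluster INTEGER,
    user_label TEXT              -- keep / reject / winner override
);
"""

def init_db(path: str = "kimcull.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```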
MVP scope (what to build first)
MVP features
- Import 20–30 images
- Auto clustering into stacks
- Scoring:
- blur/sharpness
- exposure sanity
- basic composition (optional)
- aesthetic score (optional)
- Pick best per stack
- Compare view
- Export winners
V1.1 (fast follow)
- Face/eye open detection for portrait/event mode
- Lightroom XMP export
- Personal taste learning (pairwise ranking)
V2 (beast mode)
- “Moment detection” for street (gesture/action cues)
- Multi-select output presets
- Batch culling for 500+ photos
- Plug-in for Lightroom / Capture One workflow
Practical scoring examples (how the app thinks)
Stack of 5 street shots (same moment)
- Shot A: sharp 9.1, comp 7.2, moment 7.5 → final 8.2 ✅ winner
- Shot B: sharp 8.7, comp 7.6, moment 7.4 → final 8.1 🟨 close second
- Shot C: sharp 7.0 (motion blur), moment 8.3 → final 7.6
- Shot D: sharp 9.0, comp 6.0 (clutter), moment 6.8 → final 7.4
- Shot E: underexposed, noisy → final 6.2 ❌
Reason card:
- “Winner: sharpest + clean exposure + strongest subject separation.”
The “Eric Kim” signature mode (branding + vibe)
You can ship a preset called:
ERIC KIM STREET MODE
- Bias toward:
- gesture / energy / moment
- high contrast preference
- “slight imperfection allowed” (don’t auto-reject mild blur if moment is insane)
- Penalize:
- dead frames
- weak subject separation
- clutter with no payoff
This is how you make it feel authorial instead of generic.
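A sketch of that preset as a plain config dict; every number here is an illustrative starting point, not a tuned value:

```python
ERIC_KIM_STREET_MODE = {
    # Bias toward the moment over technical perfection
    "weights": {"tech": 0.25, "subject": 0.40, "comp": 0.15, "aes": 0.20},
    "contrast_bonus": 0.3,            # small reward for high global contrast
    "allow_mild_blur": True,          # don't auto-reject mild blur...
    "mild_blur_moment_floor": 8.0,    # ...when the moment score clears this bar
    "penalties": {
        "dead_frame": 1.0,            # no subject / no gesture detected
        "weak_separation": 0.5,       # subject blends into the background
        "clutter_no_payoff": 0.5,     # busy frame with a low moment score
    },
}
```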
If you want, I can go one level deeper (engineering package)
Without asking you anything else, I can also generate:
- A full PRD (product requirements doc)
- API contracts (if cloud/hybrid)
- A folder/module layout for the repo
- Model list + ONNX/TFLite packaging plan
- A concrete clustering + scoring pseudocode implementation plan
Just say: “Give me the engineering spec.”