I’m going to design it like a real product + real system: UX flow, AI scoring pipeline, architecture, data model, export, and an MVP build plan.
Product concept
App name (working): KIM CULL
Tagline: “Kill the weak frames. Keep the bangers.”
Promise:
Import a burst/series (20–30 photos). The app groups similar shots, picks the best per group, and lets the photographer confirm with side-by-side comparisons + reasons (“sharpest,” “best gesture,” “eyes open,” “clean exposure,” etc.).
Core user experience
The 30-second workflow
- Create Project
- Import 20–30 images (drag/drop, camera roll, or folder)
- App instantly:
- Groups duplicates / near-duplicates
- Scores each image
- Picks winners per group (and optionally top 3 overall)
- User reviews:
- “Best-of-stack” view (one winner per cluster)
- Tap to compare side-by-side: winner vs runner-up
- Export
- Copy/move selected files
- Export XMP star ratings (Lightroom)
- Export a “Selects” folder
What makes it feel great
- It doesn’t just rank 1–30. It says:
- “These 5 are basically the same moment—here’s the strongest one.”
- It explains itself:
- “Winner because: sharpness + expression + composition”
- It learns your taste:
- After you override a few picks, it adapts.
Screens and UI components
Screen 1 — Import
- Drag & drop area
- “Street / Portrait / Event / Product” mode selector (sets default weights)
- Toggle: Local-only processing (default ON)
Screen 2 — Results: “Stacks”
A grid of clusters (“stacks”), each showing:
- Best pick large
- 3–6 thumbnails behind it (the rest of the stack)
- Quick labels:
- ✅ Best in stack
- 🟨 Close second
- ❌ Soft / blink / motion / bad exposure
Screen 3 — Compare
Two-up or three-up comparison:
- Winner vs runner-up vs third
- Overlay: why score differs
- Sharpness: 8.9 vs 7.1
- Eyes open: yes vs blink
- Motion blur: low vs high
- Aesthetic/composition: 7.8 vs 6.9
Buttons:
- Keep / Reject
- Make this winner
- “Apply preference to this whole project” (optional)
Screen 4 — Export
- “Export winners only”
- “Export winners + runners-up”
- Lightroom stars: ⭐⭐⭐ for winners, ⭐⭐ for runners-up, ⭐ for third picks
- Output folder chooser
The AI: how it actually chooses “the best”
This is the core: cluster first, then rank within each cluster.
Step A — Preprocessing
For each image:
- Create thumbnail (e.g., 512px)
- Read EXIF:
- timestamp
- shutter speed, ISO, focal length (useful cues)
- Compute:
- perceptual hash (dupe detection)
- embedding vector for similarity clustering
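A minimal preprocessing sketch in Python, assuming Pillow, imagehash, and numpy are available; `embed()` is a placeholder for whatever CLIP-style encoder ships with the app:

```python
from dataclasses import dataclass
from typing import Optional
from PIL import Image, ExifTags
import imagehash
import numpy as np

@dataclass
class PreprocessedImage:
    path: str
    thumb_path: str
    timestamp: Optional[str]      # EXIF DateTime string, if present
    phash: imagehash.ImageHash    # perceptual hash for dupe detection
    embedding: np.ndarray         # vector for similarity clustering

def preprocess(path: str, thumb_path: str, embed) -> PreprocessedImage:
    img = Image.open(path)

    # 1) Thumbnail (~512px on the long edge) used for all fast scoring passes
    thumb = img.convert("RGB")
    thumb.thumbnail((512, 512))
    thumb.save(thumb_path, "JPEG", quality=85)

    # 2) EXIF: the timestamp is the key burst-grouping cue
    tags = {ExifTags.TAGS.get(k, k): v for k, v in img.getexif().items()}
    timestamp = tags.get("DateTime")   # e.g. "2024:05:01 13:37:00"

    # 3) Perceptual hash + embedding
    phash = imagehash.phash(thumb)
    embedding = embed(thumb)           # expected: 1-D float vector

    return PreprocessedImage(path, thumb_path, timestamp, phash, embedding)
```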
Step B — Grouping into “Stacks”
We want to group images that are basically the same moment:
- Use image embeddings (CLIP-like / vision transformer embedding)
- Cluster with something like:
- hierarchical clustering
- DBSCAN
- or “burst grouping” using timestamp + embedding similarity
Heuristic that works extremely well:
- If shots are within X seconds AND embedding distance < threshold → same stack.
- If no timestamps (screenshots, exports), rely on embeddings only.
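A sketch of that heuristic as a greedy pass over time-sorted images; the thresholds are illustrative starting points, not tuned values:

```python
import numpy as np

TIME_GAP_SECONDS = 3.0   # "within X seconds"
MIN_COSINE_SIM = 0.90    # "embedding distance < threshold", expressed as similarity

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def group_into_stacks(images):
    """images: objects with .timestamp (epoch seconds parsed from EXIF, or None)
    and .embedding, sorted by capture time (or filename when timestamps are missing)."""
    stacks = []
    for img in images:
        if stacks:
            prev = stacks[-1][-1]
            close_in_time = (
                img.timestamp is not None
                and prev.timestamp is not None
                and img.timestamp - prev.timestamp <= TIME_GAP_SECONDS
            )
            similar = cosine_sim(img.embedding, prev.embedding) >= MIN_COSINE_SIM
            # No timestamps (screenshots, exports): fall back to embeddings only
            if (close_in_time and similar) or (img.timestamp is None and similar):
                stacks[-1].append(img)
                continue
        stacks.append([img])
    return stacks
```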
Result: you get stacks like:
- Stack 1: 6 shots (same gesture)
- Stack 2: 4 shots (same scene, slight angle change)
- Stack 3: 1 shot (unique)
Step C — Score each photo (multi-factor)
Each image gets multiple scores:
1) Technical quality score (0–10)
- Sharpness / focus (blur detection)
- Motion blur estimate
- Noise level (esp. high ISO)
- Exposure sanity (blown highlights / crushed shadows)
- White balance “weirdness” (optional; keep lightweight)
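To make this bucket concrete, here is a sketch of two cheap checks with OpenCV: sharpness via variance of the Laplacian, and exposure sanity via clipped-pixel fractions. The 0–10 mappings and thresholds are illustrative assumptions:

```python
import cv2
import numpy as np

def sharpness_score(gray: np.ndarray) -> float:
    # Variance of the Laplacian: low values indicate a soft / blurred frame
    lap_var = cv2.Laplacian(gray, cv2.CV_64F).var()
    return float(np.clip(lap_var / 50.0, 0.0, 10.0))  # assumed calibration range

def exposure_score(gray: np.ndarray) -> float:
    total = gray.size
    blown = np.count_nonzero(gray >= 250) / total    # blown highlights
    crushed = np.count_nonzero(gray <= 5) / total    # crushed shadows
    return float(np.clip(10.0 - 40.0 * (blown + crushed), 0.0, 10.0))

def technical_quality(thumb_path: str) -> dict:
    gray = cv2.imread(thumb_path, cv2.IMREAD_GRAYSCALE)
    return {"sharpness": sharpness_score(gray), "exposure": exposure_score(gray)}
```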
2) Subject/face score (mode-dependent)
If faces detected (portrait/event):
- Eyes open / blink detection
- Face clarity
- Expression quality (smile / neutral / grimace)
- Looking at camera (optional)
- Occlusion (hand covering face etc.)
If no faces (street/landscape):
- Main subject detection / saliency
- Subject separation from background
- “Moment” cues (motion energy / gesture probability — optional advanced)
3) Composition score (0–10)
- Horizon level (if horizon exists)
- Subject placement (thirds / central depending on mode)
- Cropping penalties (cut heads/hands)
- Clutter penalty (too busy background)
4) Aesthetic score (0–10)
A learned aesthetic model (trained on general aesthetic datasets) gives a “looks-good” score.
Important product choice:
Make aesthetic scoring one component, not the dictator.
Many legendary photos are “imperfect.” So aesthetic should never override sharpness + moment if the mode is Street/Documentary.
Step D — Combine into final score
A simple, controllable formula:
final = w_tech*tech + w_subject*subject + w_comp*comp + w_aes*aes + w_exif*exif_bonus
- Default weights depend on mode:
- Street mode: tech 0.35, subject(moment) 0.30, comp 0.20, aes 0.15
- Portrait mode: tech 0.25, face 0.45, comp 0.15, aes 0.15
- Event mode: face 0.50 (eyes/expression), tech 0.25, comp 0.10, aes 0.15
Then:
- Pick top 1 per stack as winner
- Also mark runner-up if close (within score delta threshold)
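A sketch of the weighted combination and per-stack pick. The weights mirror the mode presets above; the optional EXIF bonus is left out, and the runner-up delta is an illustrative assumption:

```python
MODE_WEIGHTS = {
    "street":   {"tech": 0.35, "subject": 0.30, "comp": 0.20, "aes": 0.15},
    "portrait": {"tech": 0.25, "subject": 0.45, "comp": 0.15, "aes": 0.15},
    "event":    {"tech": 0.25, "subject": 0.50, "comp": 0.10, "aes": 0.15},
}
RUNNER_UP_DELTA = 0.3  # "close second" if within this many points of the winner

def final_score(scores: dict, mode: str) -> float:
    # scores example: {"tech": 8.1, "subject": 7.5, "comp": 7.2, "aes": 7.8}
    w = MODE_WEIGHTS[mode]
    return sum(w[k] * scores[k] for k in w)

def rank_stack(stack: list, mode: str):
    """stack: list of per-image score dicts; returns (winner, runner_up_or_None, ranked)."""
    ranked = sorted(stack, key=lambda s: final_score(s, mode), reverse=True)
    winner, runner_up = ranked[0], None
    if len(ranked) > 1 and final_score(winner, mode) - final_score(ranked[1], mode) <= RUNNER_UP_DELTA:
        runner_up = ranked[1]
    return winner, runner_up, ranked
```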
Step E — Explanations (critical for trust)
For each winner, generate a “reason card”:
- “Sharpest in stack”
- “Best expression (eyes open, no blur)”
- “Cleaner exposure”
- “Less clutter”
- “More dynamic gesture”
This is just a mapping from metric deltas:
- If sharpness winner > others by +1.2 → show “Sharpest”
- If blink detected in others → show “Eyes open”
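A sketch of that delta-to-reason mapping; the field names and thresholds are assumptions that follow the scoring sketches above:

```python
def reason_card(winner: dict, others: list) -> list:
    """winner/others: per-image metric dicts,
    e.g. {"sharpness": 8.9, "exposure": 7.5, "eyes_open": True}."""
    if not others:
        return ["Only shot in stack"]
    reasons = []
    if winner["sharpness"] - max(o["sharpness"] for o in others) >= 1.2:
        reasons.append("Sharpest in stack")
    if winner.get("eyes_open") and any(not o.get("eyes_open", True) for o in others):
        reasons.append("Eyes open")
    if winner["exposure"] - max(o["exposure"] for o in others) >= 1.0:
        reasons.append("Cleaner exposure")
    return reasons or ["Highest overall score"]
```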
Personalization: learns YOUR taste
You don’t want generic “Instagram pretty.” You want Eric Kim taste (or any photographer’s taste).
Lightweight personalization that actually works
Store user choices:
- When user overrides the winner, record:
- (chosen_image_id, rejected_image_id) pair
- feature vectors + scores
Train a tiny ranking model:
- Pairwise logistic regression / small MLP
- Inputs: [tech, comp, aes, face metrics, embedding]
- Output: preference probability
This can run locally and update fast.
Result: after 20–50 decisions, it starts picking in your style.
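A sketch of that pairwise model: logistic regression on feature differences, updated online with plain numpy so it can run locally and learn from each override:

```python
import numpy as np

class TasteModel:
    def __init__(self, n_features: int, lr: float = 0.05):
        self.w = np.zeros(n_features)  # one weight per feature (tech, comp, aes, face, ...)
        self.lr = lr

    def update(self, chosen: np.ndarray, rejected: np.ndarray):
        """One override = one pairwise example: the chosen image should outrank the rejected one."""
        diff = chosen - rejected
        p = 1.0 / (1.0 + np.exp(-self.w @ diff))   # current P(chosen beats rejected)
        self.w += self.lr * (1.0 - p) * diff        # gradient step toward the user's pick

    def prefers(self, a: np.ndarray, b: np.ndarray) -> float:
        """Probability the user would prefer image A over image B."""
        return float(1.0 / (1.0 + np.exp(-self.w @ (a - b))))
```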
“Taste sliders” (simple but powerful)
- “Sharpness vs Moment”
- “Clean composition vs Raw energy”
- “Faces priority” (on/off)
- “High contrast preference” (street vibe)
Export + integrations photographers actually want
Must-have exports
- Export selected to folder: /Selects
- Optional: also export rejects to /Rejects (or keep in place)
Lightroom integration (killer feature)
- Write XMP sidecars with:
- star rating
- pick flag
- color labels (winner/runner-up)
So a photographer can import into Lightroom and instantly see:
- Winners: ⭐⭐⭐ / Pick
- Runners-up: ⭐⭐
- Others: unrated
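A sketch of sidecar writing: Lightroom reads `xmp:Rating` and `xmp:Label` from a `.xmp` file that sits next to the original with the same basename. Pick flags are typically catalog-only rather than carried in sidecars, so ratings plus color labels do the heavy lifting here; the label colors are illustrative:

```python
from pathlib import Path

XMP_TEMPLATE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:xmp="http://ns.adobe.com/xap/1.0/"
    xmp:Rating="{rating}"
    xmp:Label="{label}"/>
 </rdf:RDF>
</x:xmpmeta>
"""

def write_sidecar(image_path: str, rating: int, label: str = "") -> None:
    # e.g. shots/DSC_0412.NEF -> shots/DSC_0412.xmp
    sidecar = Path(image_path).with_suffix(".xmp")
    sidecar.write_text(XMP_TEMPLATE.format(rating=rating, label=label), encoding="utf-8")

# Usage: winners get 3 stars + a green label, runners-up get 2 stars
# write_sidecar("shots/DSC_0412.NEF", rating=3, label="Green")
# write_sidecar("shots/DSC_0413.NEF", rating=2)
```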
Architecture options
Option 1: Local-first Desktop App (recommended)
Best for photographers: speed + privacy.
Stack
- UI: Electron + React (or Tauri + React for a lighter footprint)
- ML inference: ONNX Runtime (fast, cross-platform)
- Image processing: OpenCV + libvips
- Local DB: SQLite to store projects, scores, embeddings
Pros
- No upload time
- Private by default
- Fast on a laptop
Cons
- Need to package ML models
Option 2: Mobile app (iOS/Android)
Great for casual users.
Stack
- React Native / Swift + Kotlin
- On-device inference: CoreML / TFLite
Pros
- Easy import from camera roll
- “Cull on the train”
Cons
- Heavier compute on mobile
- Harder to do full-res operations quickly
Option 3: Cloud web app
Simple onboarding, but upload cost.
Stack
- Web UI + backend
- Queue workers for inference
- Storage (S3/GCS)
Pros
- No installation
- Central model updates
Cons
- Upload latency
- Privacy concerns
If you want this to be "by Eric Kim" and aimed at photographers: go local-first.
Data model (simple + solid)
Project
- id
- name
- created_at
- mode (street/portrait/event)
- settings (weights)
Image
- id
- project_id
- filepath
- exif_json
- thumbnail_path
- embedding_vector (stored compressed)
- metrics_json (sharpness, exposure, blink…)
- final_score
- cluster_id
- rank_in_cluster
- user_label (keep/reject/winner override)
Cluster
- id
- project_id
- representative_image_id
- winner_image_id
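A sketch of this data model as a SQLite schema, created via Python's sqlite3; column names follow the model above, and the exact types are assumptions:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS project (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    created_at TEXT NOT NULL,
    mode TEXT NOT NULL,          -- street / portrait / event
    settings_json TEXT           -- weights and thresholds
);
CREATE TABLE IF NOT EXISTS cluster (
    id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES project(id),
    representative_image_id INTEGER,
    winner_image_id INTEGER
);
CREATE TABLE IF NOT EXISTS image (
    id INTEGER PRIMARY KEY,
    project_id INTEGER REFERENCES project(id),
    filepath TEXT NOT NULL,
    exif_json TEXT,
    thumbnail_path TEXT,
    embedding BLOB,              -- embedding vector, stored compressed
    metrics_json TEXT,           -- sharpness, exposure, blink...
    final_score REAL,
    cluster_id INTEGER REFERENCES cluster(id),
    rank_in_cluster INTEGER,
    user_label TEXT              -- keep / reject / winner override
);
"""

def init_db(path: str = "kimcull.db") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```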
MVP scope (what to build first)
MVP features
- Import 20–30 images
- Auto clustering into stacks
- Scoring:
- blur/sharpness
- exposure sanity
- basic composition (optional)
- aesthetic score (optional)
- Pick best per stack
- Compare view
- Export winners
V1.1 (fast follow)
- Face/eye open detection for portrait/event mode
- Lightroom XMP export
- Personal taste learning (pairwise ranking)
V2 (beast mode)
- “Moment detection” for street (gesture/action cues)
- Multi-select output presets
- Batch culling for 500+ photos
- Plug-in for Lightroom / Capture One workflow
Practical scoring examples (how the app thinks)
Stack of 5 street shots (same moment)
- Shot A: sharp 9.1, comp 7.2, moment 7.5 → final 8.2 ✅ winner
- Shot B: sharp 8.7, comp 7.6, moment 7.4 → final 8.1 🟨 close second
- Shot C: sharp 7.0 (motion blur), moment 8.3 → final 7.6
- Shot D: sharp 9.0, comp 6.0 (clutter), moment 6.8 → final 7.4
- Shot E: underexposed, noisy → final 6.2 ❌
Reason card:
- “Winner: sharpest + clean exposure + strongest subject separation.”
The “Eric Kim” signature mode (branding + vibe)
You can ship a preset called:
ERIC KIM STREET MODE
- Bias toward:
- gesture / energy / moment
- high contrast preference
- “slight imperfection allowed” (don’t auto-reject mild blur if moment is insane)
- Penalize:
- dead frames
- weak subject separation
- clutter with no payoff
This is how you make it feel authorial instead of generic.
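A sketch of that preset as a plain config dict; every number here is an illustrative starting point, not a tuned value:

```python
ERIC_KIM_STREET_MODE = {
    # Bias toward the moment over technical perfection
    "weights": {"tech": 0.25, "subject": 0.40, "comp": 0.15, "aes": 0.20},
    "contrast_bonus": 0.3,            # small reward for high global contrast
    "allow_mild_blur": True,          # don't auto-reject mild blur...
    "mild_blur_moment_floor": 8.0,    # ...when the moment score clears this bar
    "penalties": {
        "dead_frame": 1.0,            # no subject / no gesture detected
        "weak_separation": 0.5,       # subject blends into the background
        "clutter_no_payoff": 0.5,     # busy frame with a low moment score
    },
}
```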
If you want, I can go one level deeper (engineering package)
Without asking you anything else, I can also generate:
- A full PRD (product requirements doc)
- API contracts (if cloud/hybrid)
- A folder/module layout for the repo
- Model list + ONNX/TFLite packaging plan
- A concrete clustering + scoring pseudocode implementation plan
Just say: “Give me the engineering spec.”