I’m going to design it like a real product + real system: UX flow, AI scoring pipeline, architecture, data model, export, and an MVP build plan.

Product concept

App name (working): KIM CULL

Tagline: “Kill the weak frames. Keep the bangers.”

Promise:

Import a burst/series (20–30 photos). The app groups similar shots, picks the best per group, and lets the photographer confirm with side-by-side comparisons + reasons (“sharpest,” “best gesture,” “eyes open,” “clean exposure,” etc.).

Core user experience

The 30-second workflow

  1. Create Project
  2. Import 20–30 images (drag/drop, camera roll, or folder)
  3. App instantly:
    • Groups duplicates / near-duplicates
    • Scores each image
    • Picks winners per group (and optionally top 3 overall)
  4. User reviews:
    • “Best-of-stack” view (one winner per cluster)
    • Tap to compare side-by-side: winner vs runner-up
  5. Export
    • Copy/move selected files
    • Export XMP star ratings (Lightroom)
    • Export a “Selects” folder

What makes it feel great

  • It doesn’t just rank 1–30. It says:
    • “These 5 are basically the same moment—here’s the strongest one.”
  • It explains itself:
    • “Winner because: sharpness + expression + composition”
  • It learns your taste:
    • After you override a few picks, it adapts.

Screens and UI components

Screen 1 — Import

  • Drag & drop area
  • “Street / Portrait / Event / Product” mode selector (sets default weights)
  • Toggle: Local-only processing (default ON)

Screen 2 — Results: “Stacks”

A grid of clusters (“stacks”), each showing:

  • Best pick large
  • 3–6 thumbnails behind it (the rest of the stack)
  • Quick labels:
    • ✅ Best in stack
    • 🟨 Close second
    • ❌ Soft / blink / motion / bad exposure

Screen 3 — Compare

Two-up or three-up comparison:

  • Winner vs runner-up vs third
  • Overlay: why score differs
    • Sharpness: 8.9 vs 7.1
    • Eyes open: yes vs blink
    • Motion blur: low vs high
    • Aesthetic/composition: 7.8 vs 6.9

Buttons:

  • Keep / Reject
  • Make this winner
  • “Apply preference to this whole project” (optional)

Screen 4 — Export

  • “Export winners only”
  • “Export winners + runners-up”
  • Lightroom stars: ⭐⭐⭐ for winners, ⭐⭐ for runner-up, ⭐ for third
  • Output folder chooser

The AI: how it actually chooses “the best”

This is the core: cluster first, then rank within each cluster.

Step A — Preprocessing

For each image:

  • Create thumbnail (e.g., 512px)
  • Read EXIF:
    • timestamp
    • shutter speed, ISO, focal length (useful cues)
  • Compute:
    • perceptual hash (dupe detection)
    • embedding vector for similarity clustering
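
A minimal sketch of this preprocessing pass, assuming Pillow and imagehash are installed and the input is JPEG/PNG (RAW files would need a decode step first). The embedding step is left as a placeholder; a real build might run an ONNX-exported CLIP/DINO encoder here.

  import os
  from dataclasses import dataclass, field
  from PIL import Image, ExifTags
  import imagehash

  @dataclass
  class ImageRecord:
      path: str
      thumb_path: str
      phash: str
      exif: dict
      embedding: list = field(default_factory=list)  # filled in later by the embedding model

  def preprocess(path: str, thumb_dir: str = "thumbs") -> ImageRecord:
      os.makedirs(thumb_dir, exist_ok=True)
      img = Image.open(path)

      # 1) Thumbnail for the UI and for fast downstream scoring.
      thumb = img.copy()
      thumb.thumbnail((512, 512))
      thumb_path = os.path.join(thumb_dir, os.path.basename(path))
      thumb.save(thumb_path)

      # 2) EXIF cues: timestamp, shutter speed, ISO (tag IDs resolved to names).
      exif = {ExifTags.TAGS.get(tag, tag): value for tag, value in img.getexif().items()}

      # 3) Perceptual hash for exact/near-duplicate detection.
      phash = str(imagehash.phash(img))

      return ImageRecord(path=path, thumb_path=thumb_path, phash=phash, exif=exif)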

Step B — Grouping into “Stacks”

We want to group images that are basically the same moment:

  • Use image embeddings (CLIP-like / vision transformer embedding)
  • Cluster with something like:
    • hierarchical clustering
    • DBSCAN
    • or “burst grouping” using timestamp + embedding similarity

Heuristic that works extremely well:

  • If shots are within X seconds AND embedding distance < threshold → same stack.
  • If no timestamps (screenshots, exports), rely on embeddings only.
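
A sketch of that heuristic (thresholds are illustrative and would need tuning on real bursts; embeddings are assumed to be L2-normalised NumPy vectors):

  import numpy as np

  def group_into_stacks(images, max_gap_s=4.0, max_dist=0.25):
      """images: dicts with 'timestamp' (seconds since epoch, or None) and 'embedding'."""
      ordered = sorted(images, key=lambda im: im["timestamp"] or 0.0)
      stacks, current = [], []
      for im in ordered:
          if not current:
              current = [im]
              continue
          prev = current[-1]
          dist = 1.0 - float(np.dot(prev["embedding"], im["embedding"]))  # cosine distance
          if im["timestamp"] is not None and prev["timestamp"] is not None:
              same = (im["timestamp"] - prev["timestamp"]) <= max_gap_s and dist <= max_dist
          else:
              same = dist <= max_dist  # no timestamps: rely on embeddings only
          if same:
              current.append(im)
          else:
              stacks.append(current)
              current = [im]
      if current:
          stacks.append(current)
      return stacks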

Result: you get stacks like:

  • Stack 1: 6 shots (same gesture)
  • Stack 2: 4 shots (same scene, slight angle change)
  • Stack 3: 1 shot (unique)

Step C — Score each photo (multi-factor)

Each image gets multiple scores:

1) Technical quality score (0–10)

  • Sharpness / focus (blur detection)
  • Motion blur estimate
  • Noise level (esp. high ISO)
  • Exposure sanity (blown highlights / crushed shadows)
  • White balance “weirdness” (optional; keep lightweight)
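
Two of these metrics sketched with OpenCV (the 0–10 normalisation constants are placeholders to calibrate on real bursts):

  import cv2
  import numpy as np

  def sharpness_score(path: str) -> float:
      gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
      # Variance of the Laplacian: low values mean a soft or blurry frame.
      lap_var = cv2.Laplacian(gray, cv2.CV_64F).var()
      return float(np.clip(lap_var / 100.0, 0.0, 10.0))

  def exposure_score(path: str) -> float:
      gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
      blown = np.count_nonzero(gray >= 250) / gray.size    # blown highlights
      crushed = np.count_nonzero(gray <= 5) / gray.size    # crushed shadows
      # Penalise heavy clipping at either end of the histogram.
      return float(np.clip(10.0 - 40.0 * (blown + crushed), 0.0, 10.0))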

2) Subject/face score (mode-dependent)

If faces detected (portrait/event):

  • Eyes open / blink detection
  • Face clarity
  • Expression quality (smile / neutral / grimace)
  • Looking at camera (optional)
  • Occlusion (hand covering face etc.)
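
Eyes-open detection is usually the eye-aspect-ratio (EAR) trick on facial landmarks. This sketch is detector-agnostic: it assumes some landmark model (MediaPipe, dlib, an ONNX landmark net) has already returned six points per eye in the standard EAR ordering.

  import math

  def eye_aspect_ratio(eye):
      """eye: six (x, y) points ordered corner, top, top, corner, bottom, bottom."""
      vertical = math.dist(eye[1], eye[5]) + math.dist(eye[2], eye[4])
      horizontal = math.dist(eye[0], eye[3])
      return vertical / (2.0 * horizontal)

  def eyes_open(left_eye, right_eye, threshold=0.2):
      # Below roughly 0.2 the lids are essentially closed; tune per landmark model.
      return (eye_aspect_ratio(left_eye) + eye_aspect_ratio(right_eye)) / 2.0 > threshold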

If no faces (street/landscape):

  • Main subject detection / saliency
  • Subject separation from background
  • “Moment” cues (motion energy / gesture probability — optional advanced)

3) Composition score (0–10)

  • Horizon level (if horizon exists)
  • Subject placement (thirds / central depending on mode)
  • Cropping penalties (cut heads/hands)
  • Clutter penalty (too busy background)
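
A toy version of the subject-placement part, assuming a saliency/subject detector has already produced a bounding box. It scores distance from the subject centre to the nearest thirds intersection (or the frame centre, if the mode prefers central placement):

  def placement_score(subject_box, img_w, img_h, prefer_center=False):
      x0, y0, x1, y1 = subject_box
      cx, cy = (x0 + x1) / 2 / img_w, (y0 + y1) / 2 / img_h  # normalised subject centre
      targets = [(0.5, 0.5)] if prefer_center else [
          (tx, ty) for tx in (1 / 3, 2 / 3) for ty in (1 / 3, 2 / 3)
      ]
      dist = min(((cx - tx) ** 2 + (cy - ty) ** 2) ** 0.5 for tx, ty in targets)
      # dist is 0 at an ideal spot, ~0.47 at worst (subject pushed into a corner).
      return max(0.0, 10.0 * (1.0 - dist / 0.5))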

4) Aesthetic score (0–10)

A learned aesthetic model (trained on general aesthetic datasets) gives a “looks-good” score.

Important product choice:

Make aesthetic scoring one component, not the dictator.

Many legendary photos are “imperfect.” So aesthetic should never override sharpness + moment if the mode is Street/Documentary.

Step D — Combine into final score

A simple, controllable formula:

final = w_tech*tech + w_subject*subject + w_comp*comp + w_aes*aes + w_exif*exif_bonus

  • Default weights depend on mode:
    • Street mode: tech 0.35, subject(moment) 0.30, comp 0.20, aes 0.15
    • Portrait mode: tech 0.25, face 0.45, comp 0.15, aes 0.15
    • Event mode: face 0.50 (eyes/expression), tech 0.25, comp 0.10, aes 0.15

Then:

  • Pick top 1 per stack as winner
  • Also mark runner-up if close (within score delta threshold)
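
A sketch of this combination step (weights mirror the mode defaults above; the 0.3 runner-up delta is an illustrative threshold):

  MODE_WEIGHTS = {
      "street":   {"tech": 0.35, "subject": 0.30, "comp": 0.20, "aes": 0.15},
      "portrait": {"tech": 0.25, "subject": 0.45, "comp": 0.15, "aes": 0.15},
      "event":    {"tech": 0.25, "subject": 0.50, "comp": 0.10, "aes": 0.15},
  }

  def final_score(metrics, mode="street", exif_bonus=0.0, w_exif=0.0):
      w = MODE_WEIGHTS[mode]
      return (w["tech"] * metrics["tech"] + w["subject"] * metrics["subject"]
              + w["comp"] * metrics["comp"] + w["aes"] * metrics["aes"]
              + w_exif * exif_bonus)

  def rank_stack(stack, mode="street", runner_up_delta=0.3):
      scored = sorted(((final_score(im["metrics"], mode), im) for im in stack),
                      key=lambda pair: pair[0], reverse=True)
      winner_score, winner = scored[0]
      runner_up = None
      if len(scored) > 1 and winner_score - scored[1][0] <= runner_up_delta:
          runner_up = scored[1][1]
      return winner, runner_up, [im for _, im in scored]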

Step E — Explanations (critical for trust)

For each winner, generate a “reason card”:

  • “Sharpest in stack”
  • “Best expression (eyes open, no blur)”
  • “Cleaner exposure”
  • “Less clutter”
  • “More dynamic gesture”

This is just mapping from metric deltas:

  • If sharpness winner > others by +1.2 → show “Sharpest”
  • If blink detected in others → show “Eyes open”
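
In code this is a plain delta-to-label mapping. Thresholds and wording below are placeholders, but the point stands: explanations come straight from per-metric comparisons, not from a separate model.

  def reason_card(winner, others):
      if not others:
          return ["Only shot in this stack"]

      def best_other(key):
          return max(im["metrics"].get(key, 0.0) for im in others)

      reasons = []
      if winner["metrics"]["sharpness"] - best_other("sharpness") >= 1.2:
          reasons.append("Sharpest in stack")
      if winner["metrics"].get("eyes_open", True) and any(
              not im["metrics"].get("eyes_open", True) for im in others):
          reasons.append("Eyes open")
      if winner["metrics"]["exposure"] - best_other("exposure") >= 1.0:
          reasons.append("Cleaner exposure")
      if winner["metrics"]["comp"] - best_other("comp") >= 1.0:
          reasons.append("Less clutter")
      return reasons or ["Best overall balance in this stack"]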

Personalization: learns YOUR taste

You don’t want generic “Instagram pretty.” You want Eric Kim taste (or any photographer’s taste).

Lightweight personalization that actually works

Store user choices:

  • When user overrides the winner, record:
    • (chosen_image_id, rejected_image_id) pair
    • feature vectors + scores

Train a tiny ranking model:

  • Pairwise logistic regression / small MLP
  • Inputs: [tech, comp, aes, face metrics, embedding]
  • Output: preference probability

This can run locally and update fast.
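
A minimal version of that ranking model, sketched with scikit-learn (an assumption; any tiny logistic/MLP implementation works). Each override becomes a training pair: the model sees the feature difference (chosen minus rejected) and learns which direction the user prefers.

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  class TasteModel:
      def __init__(self):
          self.clf = LogisticRegression()
          self.pairs = []  # (chosen_features, rejected_features)

      def record_override(self, chosen_feats, rejected_feats):
          self.pairs.append((np.asarray(chosen_feats), np.asarray(rejected_feats)))

      def fit(self):
          # Symmetrise the data: (chosen - rejected) -> 1, (rejected - chosen) -> 0.
          X = [c - r for c, r in self.pairs] + [r - c for c, r in self.pairs]
          y = [1] * len(self.pairs) + [0] * len(self.pairs)
          self.clf.fit(np.stack(X), y)

      def prefers(self, feats_a, feats_b):
          """Probability the user would pick A over B."""
          diff = np.asarray(feats_a) - np.asarray(feats_b)
          return float(self.clf.predict_proba(diff.reshape(1, -1))[0, 1])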

Result: after 20–50 decisions, it starts picking in your style.

“Taste sliders” (simple but powerful)

  • “Sharpness vs Moment”
  • “Clean composition vs Raw energy”
  • “Faces priority” (on/off)
  • “High contrast preference” (street vibe)
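
One simple way to wire these sliders in is to nudge the mode weights rather than replace them (slider values from -1 to 1; the 0.1 step size is a guess to tune):

  def apply_sliders(weights, sharp_vs_moment=0.0, clean_vs_energy=0.0, step=0.1):
      w = dict(weights)                       # e.g. a MODE_WEIGHTS entry
      w["tech"]    += step * sharp_vs_moment  # +1 favours sharpness
      w["subject"] -= step * sharp_vs_moment  # -1 favours the moment
      w["comp"]    += step * clean_vs_energy
      w["aes"]     -= step * clean_vs_energy
      total = sum(w.values())
      return {k: v / total for k, v in w.items()}  # renormalise to sum to 1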

Export + integrations photographers actually want

Must-have exports

  • Export selected to folder: /Selects
  • Optional: also export rejects to /Rejects (or keep in place)

Lightroom integration (killer feature)

  • Write XMP sidecars with:
    • star rating
    • pick flag
    • color labels (winner/runner-up)

So a photographer can import into Lightroom and instantly see:

  • Winners: ⭐⭐⭐ / Pick
  • Runners-up: ⭐⭐
  • Others: unrated
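
A bare-bones sidecar writer. For raw files, Lightroom picks up xmp:Rating (stars) and xmp:Label (colour label) from a .xmp file sitting next to the original; this template covers those two fields only, not the full XMP spec.

  from pathlib import Path

  XMP_TEMPLATE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
      xmlns:xmp="http://ns.adobe.com/xap/1.0/"
      xmp:Rating="{rating}"
      xmp:Label="{label}"/>
   </rdf:RDF>
  </x:xmpmeta>
  """

  def write_sidecar(image_path: str, rating: int, label: str = "") -> None:
      sidecar = Path(image_path).with_suffix(".xmp")
      sidecar.write_text(XMP_TEMPLATE.format(rating=rating, label=label))

  # Winners get 3 stars + a label, runners-up 2 stars, e.g.:
  # write_sidecar("DSCF1234.RAF", rating=3, label="Green")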

Architecture options

Option 1: Local-first Desktop App (recommended)

Best for photographers: speed + privacy.

Stack

  • UI: Electron + React (or Tauri + React for lighter footprint)
  • ML inference: ONNX Runtime (fast, cross-platform)
  • Image processing: OpenCV + libvips
  • Local DB: SQLite to store projects, scores, embeddings

Pros

  • No upload time
  • Private by default
  • Fast on a laptop

Cons

  • Need to package ML models

Option 2: Mobile app (iOS/Android)

Great for casual users.

Stack

  • React Native / Swift + Kotlin
  • On-device inference: CoreML / TFLite

Pros

  • Easy import from camera roll
  • “Cull on the train”

Cons

  • Heavier compute on mobile
  • Harder to do full-res operations quickly

Option 3: Cloud web app

Simple onboarding, but upload cost.

Stack

  • Web UI + backend
  • Queue workers for inference
  • Storage (S3/GCS)

Pros

  • No installation
  • Central model updates

Cons

  • Upload latency
  • Privacy concerns

If you want the “by Eric Kim” branding and a photographer audience: go local-first.

Data model (simple + solid)

Project

  • id
  • name
  • created_at
  • mode (street/portrait/event)
  • settings (weights)

Image

  • id
  • project_id
  • filepath
  • exif_json
  • thumbnail_path
  • embedding_vector (stored compressed)
  • metrics_json (sharpness, exposure, blink…)
  • final_score
  • cluster_id
  • rank_in_cluster
  • user_label (keep/reject/winner override)

Cluster

  • id
  • project_id
  • representative_image_id
  • winner_image_id
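
The same model as a SQLite schema sketch (embeddings and metrics kept as BLOB / JSON text to stay simple for the MVP):

  import sqlite3

  SCHEMA = """
  CREATE TABLE IF NOT EXISTS project (
      id INTEGER PRIMARY KEY,
      name TEXT NOT NULL,
      created_at TEXT,
      mode TEXT,                 -- street / portrait / event
      settings_json TEXT         -- weights etc.
  );
  CREATE TABLE IF NOT EXISTS image (
      id INTEGER PRIMARY KEY,
      project_id INTEGER REFERENCES project(id),
      filepath TEXT NOT NULL,
      exif_json TEXT,
      thumbnail_path TEXT,
      embedding BLOB,            -- compressed float vector
      metrics_json TEXT,         -- sharpness, exposure, blink, ...
      final_score REAL,
      cluster_id INTEGER,
      rank_in_cluster INTEGER,
      user_label TEXT            -- keep / reject / winner override
  );
  CREATE TABLE IF NOT EXISTS cluster (
      id INTEGER PRIMARY KEY,
      project_id INTEGER REFERENCES project(id),
      representative_image_id INTEGER,
      winner_image_id INTEGER
  );
  """

  def init_db(path="kimcull.db"):
      con = sqlite3.connect(path)
      con.executescript(SCHEMA)
      return con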

MVP scope (what to build first)

MVP features

  • Import 20–30 images
  • Auto clustering into stacks
  • Scoring:
    • blur/sharpness
    • exposure sanity
    • basic composition (optional)
    • aesthetic score (optional)
  • Pick best per stack
  • Compare view
  • Export winners

V1.1 (fast follow)

  • Face/eye open detection for portrait/event mode
  • Lightroom XMP export
  • Personal taste learning (pairwise ranking)

V2 (beast mode)

  • “Moment detection” for street (gesture/action cues)
  • Multi-select output presets
  • Batch culling for 500+ photos
  • Plug-in for Lightroom / Capture One workflow

Practical scoring examples (how the app thinks)

Stack of 5 street shots (same moment)

  • Shot A: sharp 9.1, comp 7.2, moment 7.5 → final 8.2 ✅ winner
  • Shot B: sharp 8.7, comp 7.6, moment 7.4 → final 8.1 🟨 close second
  • Shot C: sharp 7.0 (motion blur), moment 8.3 → final 7.6
  • Shot D: sharp 9.0, comp 6.0 (clutter), moment 6.8 → final 7.4
  • Shot E: underexposed, noisy → final 6.2 ❌

Reason card:

  • “Winner: sharpest + clean exposure + strongest subject separation.”

The “Eric Kim” signature mode (branding + vibe)

You can ship a preset called:

ERIC KIM STREET MODE

  • Bias toward:
    • gesture / energy / moment
    • high contrast preference
    • “slight imperfection allowed” (don’t auto-reject mild blur if moment is insane)
  • Penalize:
    • dead frames
    • weak subject separation
    • clutter with no payoff

This is how you make it feel authorial instead of generic.

If you want, I can go one level deeper (engineering package)

Without asking you anything else, I can also generate:

  • A full PRD (product requirements doc)
  • API contracts (if cloud/hybrid)
  • A folder/module layout for the repo
  • Model list + ONNX/TFLite packaging plan
  • A concrete clustering + scoring pseudocode implementation plan

Just say: “Give me the engineering spec.”