RememberMe / architecture.md
Evan Li

HCP Face Analysis: Architecture

Pipeline

A single photo runs through ten numbered analyzer steps (step 3 is split into 3a and 3b). Their outputs are merged into one dictionary, so a later analyzer can overwrite keys written by an earlier one. Overwrites are intentional in only a couple of places; _run_pipeline in app.py is the single source of truth for the order.

Photo (RGB ndarray)
  │
  ├─► [1] InsightFaceAnalyzer  (insightface buffalo_l, ONNX)
  │       → face_bbox, face_confidence, face_embedding (512-d ArcFace),
  │         age_estimate (piecewise-calibrated). Gender comes from
  │         FairFace (step 3a) for a real softmax confidence.
  │
  ├─► Build face crop from face_bbox + padding. Downstream analyzers
  │   that benefit from a tighter input read the crop; MediaPipe gets
  │   the full image because it has its own detector.
  │
  ├─► [2] LandmarkAnalyzer  (MediaPipe Face Landmarker)
  │       478 landmarks + 52 blendshapes → all geometric features,
  │       smiling, mouth_open (via blendshapes.jawOpen), eyes_open,
  │       facial_asymmetry_score, smile_asymmetry, possible_dimples,
  │       possible_unibrow.
  │
  ├─► [3a] GenderAnalyzer  (dima806/fairface_gender ViT)
  │       → gender, gender_confidence, gender_distribution
  │       (cropped input). Replaces the InsightFace gender head so
  │       we get a real softmax confidence.
  │
  ├─► [3b] EthnicityAnalyzer  (cledoux42/Ethnicity_Test_v003 ViT)
  │       → ethnicity, ethnicity_confidence, ethnicity_distribution
  │       (cropped input).
  │
  ├─► [4] ParsingAnalyzer  (SegFormer-B5 human parsing)
  │       → _skin_mask, _hair_mask, hat_detected, hair_length,
  │         hair_present, wrinkle_level, skin_texture_score,
  │         skin_uniformity, freckles_or_moles
  │       (cropped input, for cleaner masks).
  │
  ├─► [5] EmotionAnalyzer  (HSEmotion EfficientNet-B0)
  │       → primary/secondary emotion, emotion_scores, valence,
  │         arousal, mood (cropped input).
  │
  ├─► [6] ColorAnalyzer  (no ML; OpenCV LAB/HSV)
  │       Reads SegFormer masks + MediaPipe lip/iris landmarks.
  │       → skin_tone (Fitzpatrick + L*/a*/b* + hex), skin_undertone,
  │         eye_color, hair_color (name + hex), hair_texture
  │         (coarse, fallback), lip_color (shade + hex)
  │
  ├─► [7] ObstructionAnalyzer  (dima806/face_obstruction ViT-B/16)
  │       → wearing_glasses, wearing_sunglasses, wearing_mask,
  │         obstruction_scores (cropped input).
  │
  ├─► [8] HairTypeAnalyzer  (dima806/hair_type ViT-B/16)
  │       → hair_type (curly/dreadlocks/kinky/straight/wavy),
  │         hair_type_confidence (cropped input).
  │
  ├─► [9] BeautyAnalyzer  (ResNet-50 trained on SCUT-FBP5500)
  │       Optional. Loads local weights or HF Hub; if absent, output
  │       is None and AestheticAnalyzer falls back to rules.
  │       → beauty_score (1.0–5.0), beauty_score_norm (0–100),
  │         beauty_model_source.
  │
  └─► [10] AestheticAnalyzer  (no model)
          Reads the merged dict from steps 1–9 and produces the
          final chopped_score (0–100) plus chopped_breakdown
          showing each factor's signed contribution.
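
The "build face crop from face_bbox + padding" step in the diagram can be sketched as below. This is a minimal illustration, not the service's code: the function name padded_crop_box, the (x1, y1, x2, y2) bbox layout, and the 0.25 padding fraction are all assumptions.

```python
def padded_crop_box(bbox, image_shape, padding=0.25):
    """Expand a face box by a fraction of its size, clamped to the image.

    bbox is assumed to be (x1, y1, x2, y2) in pixels and image_shape is
    (height, width, ...), as for an RGB ndarray. The default padding is a
    placeholder value, not the one the service uses.
    """
    x1, y1, x2, y2 = bbox
    height, width = image_shape[:2]
    pad_x = (x2 - x1) * padding
    pad_y = (y2 - y1) * padding
    # Clamp so the padded box never leaves the image.
    left = max(0, int(round(x1 - pad_x)))
    top = max(0, int(round(y1 - pad_y)))
    right = min(width, int(round(x2 + pad_x)))
    bottom = min(height, int(round(y2 + pad_y)))
    return left, top, right, bottom
```

With an ndarray image, the crop itself would then be image[top:bottom, left:right], while MediaPipe keeps receiving the uncropped image.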

Internal/scratch keys use a leading underscore (_skin_mask, _hair_mask, _raw_landmarks, _insight_landmarks_2d). app.py strips them before returning JSON.
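
The merge-then-strip behaviour described above can be sketched in a few lines. The helper names merge_outputs and strip_internal are hypothetical, and skipping a None output is a defensive assumption; the real logic lives in _run_pipeline in app.py.

```python
def merge_outputs(outputs):
    """Merge analyzer outputs in pipeline order; later keys win on collision."""
    results = {}
    for output in outputs:
        if output:  # assumption: tolerate an analyzer that returns None
            results.update(output)
    return results


def strip_internal(results):
    """Drop scratch keys (leading underscore) before serialising to JSON."""
    return {key: value for key, value in results.items()
            if not key.startswith("_")}


merged = merge_outputs([
    {"gender": "from_step_1", "_skin_mask": "mask-bytes"},
    {"gender": "from_step_3a"},  # overwrites the earlier value
])
public = strip_internal(merged)  # only "gender" survives
```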

Attribute β†’ source map

| Section | Field(s) | Source |
| --- | --- | --- |
| Demographics | face_bbox, face_confidence, face_embedding (512-d), age_estimate (piecewise-calibrated), age_range | InsightFace buffalo_l |
| Demographics | gender, gender_confidence, gender_distribution | FairFace ViT |
| Demographics | ethnicity, ethnicity_confidence, ethnicity_distribution | cledoux42 ViT |
| Emotion | primary/secondary emotion, emotion_scores, valence, arousal, mood | HSEmotion EffNet-B0 |
| Face Structure | face_shape (+ 4 ratios), jawline_type/angle, chin_type, cheekbone_prominence, cheek_fullness, forehead_width, facial_asymmetry_score | MediaPipe Face Landmarker |
| Hair | hair_length, hair_present | SegFormer-B5 |
| Hair | hair_type (+ confidence) | HairTypeViT (dima806) |
| Hair | hair_color, hair hex | ColorAnalyzer |
| Eyes | eye_shape, eye_depth, eye_spacing, eye_size, eyes_open | MediaPipe |
| Eyes | eye_color | ColorAnalyzer |
| Eyebrows | eyebrow_shape, eyebrow_arch_height, eyebrow_thickness, possible_unibrow | MediaPipe |
| Nose | nose_shape, nose_bridge, nose_tip_shape, nostril_width | MediaPipe |
| Lips & Mouth | lip_fullness, lip_balance, mouth_width, cupids_bow, smile_asymmetry, possible_dimples, smiling, mouth_open | MediaPipe (last two via blendshapes) |
| Lips & Mouth | lip_color (shade + hex) | ColorAnalyzer (mask from MediaPipe) |
| Skin | skin_tone (Fitzpatrick, L*/a*/b*, hex), skin_undertone | ColorAnalyzer |
| Skin | wrinkle_level, skin_texture_score, skin_uniformity | SegFormer mask + OpenCV stats (freckles_or_moles is still computed server-side but no longer displayed because the detector was too noisy) |
| Accessories | wearing_glasses, wearing_sunglasses, wearing_mask | ObstructionViT (dima806) |
| Accessories | wearing_hat | SegFormer (hat class coverage) |
| Aesthetics | beauty_score (1–5), beauty_score_norm (0–100) | BeautyAnalyzer (SCUT-FBP5500 ResNet-50) |
| Aesthetics | chopped_score (0–100), chopped_breakdown | AestheticAnalyzer (rule + learned blend) |

Face matching

InsightFace's ArcFace head emits a 512-d L2-normalised recognition embedding. We store it alongside each contact in people.face_embedding (pgvector). When a new photo is saved, the client queries Supabase for any contact whose embedding has cosine similarity ≥ 0.55 to the new one and, before creating a new contact, prompts the user: "this looks like {name}, add to that profile?"

LFW accuracy is 99.83%; IJB-B at FAR=1e-4 is 96.21%. For grouping photos in a personal collection (similar lighting, same camera) this is excellent. Identical twins and close family members can still exceed the 0.55 threshold, which is why a match only triggers a confirmation prompt and never an automatic merge.
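
As an in-memory illustration of the threshold check (the real lookup runs as a pgvector query in Supabase), the sketch below finds the closest stored contact at or above 0.55. The function names and the (name, embedding) tuple layout are assumptions. Note that pgvector's cosine operator returns a distance, so similarity ≥ 0.55 corresponds to cosine distance ≤ 0.45.

```python
import math

MATCH_THRESHOLD = 0.55  # value taken from the text above


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(new_embedding, contacts, threshold=MATCH_THRESHOLD):
    """Return (name, score) for the closest contact at or above threshold.

    contacts is an iterable of (name, embedding) pairs; returns None when
    nobody clears the threshold, in which case a new contact is created.
    """
    best_name, best_score = None, threshold
    for name, embedding in contacts:
        score = cosine_similarity(new_embedding, embedding)
        if score >= best_score:
            best_name, best_score = name, score
    return (best_name, best_score) if best_name is not None else None
```

A non-None result would drive the confirmation prompt; the user still decides whether the photo is added to the existing profile.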

Training the beauty regressor

The training source lives in training/beauty/. The script fine-tunes a timm ResNet-50 on SCUT-FBP5500. After training, drop the resulting beauty_regressor.pt into face-service/models/ (or push it to HF Hub and set BEAUTY_HF_REPO_ID). BeautyAnalyzer picks it up automatically on the next process boot.

Until weights exist, beauty_score returns None and the AestheticAnalyzer gracefully falls back to a pure rule-based chopped score.
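
One way the weight lookup could work is sketched below. The function name, the "local" and "hf_hub" labels, and the local-first order are assumptions made for illustration; only the file name, directory, and environment variable come from the text above.

```python
import os


def resolve_beauty_source(models_dir="face-service/models", env=None):
    """Decide where beauty weights would come from, or None if unavailable.

    Assumed order: a local beauty_regressor.pt wins, then BEAUTY_HF_REPO_ID.
    Returns a (source_label, location) pair; the labels are hypothetical.
    """
    env = os.environ if env is None else env
    local_path = os.path.join(models_dir, "beauty_regressor.pt")
    if os.path.isfile(local_path):
        return ("local", local_path)
    repo_id = env.get("BEAUTY_HF_REPO_ID")
    if repo_id:
        return ("hf_hub", repo_id)
    # No weights anywhere: beauty_score stays None and the aggregator
    # falls back to its rule-based score.
    return None
```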

Deployment

The service builds as a Docker image targeting Hugging Face Spaces free tier (2 GB RAM, shared CPU). MediaPipe .task and the InsightFace buffalo_l bundle are pulled at build time; all other Hugging Face models lazy-download on first inference and cache under /root/.cache/huggingface.

The Node/Express server forwards /analyze-face requests to FACE_SERVICE_URL/analyze-base64. The React Native client never talks to this service directly.
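
For testing the service without the Node layer, a request to the same endpoint could be built as in the sketch below. The JSON field name "image" is hypothetical, since the request schema is not documented here, and the production forwarder is the Node/Express server rather than this Python helper.

```python
import base64
import json
import urllib.request


def build_analyze_request(face_service_url, image_bytes):
    """Build (but do not send) a POST to <FACE_SERVICE_URL>/analyze-base64."""
    # Hypothetical payload shape: the real field name may differ.
    payload = {"image": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        url=face_service_url.rstrip("/") + "/analyze-base64",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; the response is expected to be the merged result dictionary with underscore-prefixed keys already stripped.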

Adding a new analyzer

  1. Drop a new module under analyzers/ with a class exposing __init__() and analyze(...) -> dict.
  2. Import + add a lazy-load block in app.py's get_analyzers().
  3. Add a results.update(...) call inside _run_pipeline at the right pipeline position.
  4. Surface the new keys in client/src/screens/EditProfileScreen.js and add a legend row.

Order matters: later analyzers overwrite earlier keys on collision. The aesthetic aggregator runs last so it can see everything.