RememberMe / architecture.md

Evan Li


HCP Face Analysis — Architecture

Pipeline

A single photo runs through ten numbered steps (eleven analyzers, since step 3 is split into 3a and 3b). Their outputs are merged into one dictionary, and a later analyzer overwrites any key an earlier one already set. Such overwrites are intentional in only a couple of places; _run_pipeline in app.py is the single source of truth for the merge order.

Photo (RGB ndarray)
  │
  ├─► [1] InsightFaceAnalyzer  (insightface buffalo_l, ONNX)
  │       → face_bbox, face_confidence, face_embedding (512-d ArcFace),
  │         age_estimate (piecewise-calibrated). Gender comes from
  │         FairFace (step 3a) for a real softmax confidence.
  │
  ├─► Build face crop from face_bbox + padding. Downstream analyzers
  │   that benefit from a tighter input read the crop; MediaPipe gets
  │   the full image because it has its own detector.
  │
  ├─► [2] LandmarkAnalyzer  (MediaPipe Face Landmarker)
  │       478 landmarks + 52 blendshapes → all geometric features,
  │       smiling, mouth_open (via blendshapes.jawOpen), eyes_open,
  │       facial_asymmetry_score, smile_asymmetry, possible_dimples,
  │       possible_unibrow.
  │
  ├─► [3a] GenderAnalyzer  (dima806/fairface_gender ViT)
  │       → gender, gender_confidence, gender_distribution
  │       (cropped input). Replaces the InsightFace gender head so
  │       we get a real softmax confidence.
  │
  ├─► [3b] EthnicityAnalyzer  (cledoux42/Ethnicity_Test_v003 ViT)
  │       → ethnicity, ethnicity_confidence, ethnicity_distribution
  │       (cropped input).
  │
  ├─► [4] ParsingAnalyzer  (SegFormer-B5 human parsing)
  │       → _skin_mask, _hair_mask, hat_detected, hair_length,
  │         hair_present, wrinkle_level, skin_texture_score,
  │         skin_uniformity, freckles_or_moles
  │       (cropped input — cleaner masks).
  │
  ├─► [5] EmotionAnalyzer  (HSEmotion EfficientNet-B0)
  │       → primary/secondary emotion, emotion_scores, valence,
  │         arousal, mood (cropped input).
  │
  ├─► [6] ColorAnalyzer  (no ML — OpenCV LAB/HSV)
  │       Reads SegFormer masks + MediaPipe lip/iris landmarks.
  │       → skin_tone (Fitzpatrick + L*/a*/b* + hex), skin_undertone,
  │         eye_color, hair_color (name + hex), hair_texture
  │         (coarse, fallback), lip_color (shade + hex)
  │
  ├─► [7] ObstructionAnalyzer  (dima806/face_obstruction ViT-B/16)
  │       → wearing_glasses, wearing_sunglasses, wearing_mask,
  │         obstruction_scores (cropped input).
  │
  ├─► [8] HairTypeAnalyzer  (dima806/hair_type ViT-B/16)
  │       → hair_type (curly/dreadlocks/kinky/straight/wavy),
  │         hair_type_confidence (cropped input).
  │
  ├─► [9] BeautyAnalyzer  (ResNet-50 trained on SCUT-FBP5500)
  │       Optional. Loads local weights or HF Hub; if absent, output
  │       is None and AestheticAnalyzer falls back to rules.
  │       → beauty_score (1.0–5.0), beauty_score_norm (0–100),
  │         beauty_model_source.
  │
  └─► [10] AestheticAnalyzer  (no model)
          Reads the merged dict from steps 1–9 and produces the
          final chopped_score (0–100) plus chopped_breakdown
          showing each factor's signed contribution.
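
The face-crop step in the diagram can be sketched as below. This is a minimal illustration, not the code in app.py: the function name padded_crop_box, the (x1, y1, x2, y2) box layout, and the 20% padding ratio are all assumptions made for the example.

```python
from typing import Sequence, Tuple


def padded_crop_box(
    bbox: Sequence[float],
    img_w: int,
    img_h: int,
    pad_ratio: float = 0.2,  # assumed value; the real padding is not stated in this document
) -> Tuple[int, int, int, int]:
    """Expand an (x1, y1, x2, y2) box by pad_ratio on each side, clamped to the image."""
    x1, y1, x2, y2 = bbox
    pad_x = (x2 - x1) * pad_ratio
    pad_y = (y2 - y1) * pad_ratio
    left = max(0, int(round(x1 - pad_x)))
    top = max(0, int(round(y1 - pad_y)))
    right = min(img_w, int(round(x2 + pad_x)))
    bottom = min(img_h, int(round(y2 + pad_y)))
    return left, top, right, bottom


# With an RGB ndarray `photo` of shape (H, W, 3), the crop would then be:
#   l, t, r, b = padded_crop_box(face_bbox, photo.shape[1], photo.shape[0])
#   face_crop = photo[t:b, l:r]
```

Clamping to the image bounds keeps the slice valid when the face sits at the edge of the frame.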

Internal/scratch keys use a leading underscore (_skin_mask, _hair_mask, _raw_landmarks, _insight_landmarks_2d). app.py strips them before returning JSON.
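
The merge and strip behaviour described above can be sketched as follows. The function names and the analyze(photo, results) call shape are hypothetical; the real _run_pipeline passes different inputs to different analyzers (crop versus full image), so treat this only as an illustration of the dictionary semantics.

```python
from typing import Any, Callable, Dict, Iterable

Analyzer = Callable[[Any, Dict[str, Any]], Dict[str, Any]]


def run_pipeline_sketch(photo: Any, analyzers: Iterable[Analyzer]) -> Dict[str, Any]:
    """Merge analyzer outputs in order; a later analyzer wins on key collisions."""
    results: Dict[str, Any] = {}
    for analyze in analyzers:
        # Each analyzer sees everything merged so far, then adds or overwrites keys.
        results.update(analyze(photo, results))
    return results


def strip_internal_keys(results: Dict[str, Any]) -> Dict[str, Any]:
    """Drop scratch keys (leading underscore) before the dict is serialised."""
    return {k: v for k, v in results.items() if not k.startswith("_")}


def first(photo: Any, results: Dict[str, Any]) -> Dict[str, Any]:
    return {"hair_present": True, "_hair_mask": [[1, 0]]}


def second(photo: Any, results: Dict[str, Any]) -> Dict[str, Any]:
    # Collides with `first` on hair_present; this later value is the one kept.
    return {"hair_present": False, "hair_color": "brown"}


merged = run_pipeline_sketch(None, [first, second])
public = strip_internal_keys(merged)
# public == {"hair_present": False, "hair_color": "brown"}
```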

Attribute → source map

Section	Field(s)	Source
Demographics	face_bbox, face_confidence, face_embedding (512-d), age_estimate (piecewise-calibrated), age_range	InsightFace buffalo_l
Demographics	gender, gender_confidence, gender_distribution	FairFace ViT
Demographics	ethnicity, ethnicity_confidence, ethnicity_distribution	cledoux42 ViT
Emotion	primary/secondary emotion, emotion_scores, valence, arousal, mood	HSEmotion EffNet-B0
Face Structure	face_shape (+ 4 ratios), jawline_type/angle, chin_type, cheekbone_prominence, cheek_fullness, forehead_width, facial_asymmetry_score	MediaPipe Face Landmarker
Hair	hair_length, hair_present	SegFormer-B5
Hair	hair_type (+ confidence)	HairTypeViT (dima806)
Hair	hair_color, hair hex	ColorAnalyzer
Eyes	eye_shape, eye_depth, eye_spacing, eye_size, eyes_open	MediaPipe
Eyes	eye_color	ColorAnalyzer
Eyebrows	eyebrow_shape, eyebrow_arch_height, eyebrow_thickness, possible_unibrow	MediaPipe
Nose	nose_shape, nose_bridge, nose_tip_shape, nostril_width	MediaPipe
Lips & Mouth	lip_fullness, lip_balance, mouth_width, cupids_bow, smile_asymmetry, possible_dimples, smiling, mouth_open	MediaPipe (last two via blendshapes)
Lips & Mouth	lip_color (shade + hex)	ColorAnalyzer (mask from MediaPipe)
Skin	skin_tone (Fitzpatrick, L*/a*/b*, hex), skin_undertone	ColorAnalyzer
Skin	wrinkle_level, skin_texture_score, skin_uniformity	SegFormer mask + OpenCV stats (freckles_or_moles is still computed server-side but no longer displayed because the detector was too noisy)
Accessories	wearing_glasses, wearing_sunglasses, wearing_mask	ObstructionViT (dima806)
Accessories	wearing_hat	SegFormer (hat class coverage)
Aesthetics	beauty_score (1–5), beauty_score_norm (0–100)	BeautyAnalyzer (SCUT-FBP5500 ResNet-50)
Aesthetics	chopped_score (0–100), chopped_breakdown	AestheticAnalyzer (rule + learned blend)

Face matching

InsightFace's ArcFace head emits a 512-d L2-normalised recognition embedding. We store it alongside each contact in people.face_embedding (pgvector). On a new photo save, the client queries Supabase for any contact with cosine similarity ≥ 0.55 to the new embedding and prompts the user "this looks like {name}, add to that profile?" before creating a new contact.

LFW accuracy is 99.83%; IJB-B at FAR=1e-4 is 96.21%. For grouping photos in a personal collection (similar lighting, same camera) this is excellent. Identical twins and close family members can still score above 0.55, which is why a match only triggers a confirmation prompt and never an automatic merge.
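
The matching rule can be sketched in memory as below. In the real system the lookup runs as a Supabase/pgvector query, so this is only an illustration of the threshold logic; the function names and the toy 2-d vectors are assumptions (real embeddings are 512-d).

```python
import math
from typing import Dict, Optional, Sequence, Tuple

MATCH_THRESHOLD = 0.55  # similarity threshold quoted in the text


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; for L2-normalised vectors this equals the dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(
    new_embedding: Sequence[float],
    contacts: Dict[str, Sequence[float]],
    threshold: float = MATCH_THRESHOLD,
) -> Optional[Tuple[str, float]]:
    """Return (name, similarity) of the closest contact at or above threshold, else None."""
    best: Optional[Tuple[str, float]] = None
    for name, embedding in contacts.items():
        score = cosine_similarity(new_embedding, embedding)
        if score >= threshold and (best is None or score > best[1]):
            best = (name, score)
    return best


contacts = {"Ada": [1.0, 0.0], "Bo": [0.0, 1.0]}
match = best_match([0.8, 0.6], contacts)        # closest is "Ada" (similarity about 0.8)
no_match = best_match([0.5, -0.866], contacts)  # nothing reaches 0.55, so None
```

A non-None result would drive the "this looks like {name}" prompt; None means a new contact is created. Note that pgvector's `<=>` operator returns cosine distance, so a similarity of at least 0.55 corresponds to a distance of at most 0.45.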

Training the beauty regressor

The training source lives in training/beauty/. The script fine-tunes a timm ResNet-50 on SCUT-FBP5500. After training, drop the resulting beauty_regressor.pt into face-service/models/ (or push it to HF Hub and set BEAUTY_HF_REPO_ID). BeautyAnalyzer picks it up automatically on the next process boot.

Until weights exist, beauty_score returns None and the AestheticAnalyzer gracefully falls back to a pure rule-based chopped score.
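
The lookup order described above (local file, then HF Hub, then no model) might look roughly like this. It is a sketch under assumptions: resolve_beauty_weights and the injected download callable are hypothetical, and swallowing a failed download is a design choice of this example rather than documented behaviour. In practice the callable could be a thin wrapper around huggingface_hub.hf_hub_download.

```python
import os
from pathlib import Path
from typing import Callable, Optional

# Path taken from the text; assumed to be relative to the repository root.
LOCAL_WEIGHTS = Path("face-service/models/beauty_regressor.pt")


def resolve_beauty_weights(
    local_path: Path = LOCAL_WEIGHTS,
    download: Optional[Callable[[str], Path]] = None,  # hypothetical Hub downloader
) -> Optional[Path]:
    """Return a path to the weights, or None if none are available."""
    if local_path.is_file():
        return local_path
    repo_id = os.environ.get("BEAUTY_HF_REPO_ID")
    if repo_id and download is not None:
        try:
            return download(repo_id)
        except Exception:
            # Assumption: a failed download degrades to the rule-based score
            # instead of crashing the service.
            return None
    return None
```

When this returns None, the analyzer would report beauty_score as None and leave the chopped score to the rule-based path.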

Deployment

The service builds as a Docker image targeting Hugging Face Spaces free tier (2 GB RAM, shared CPU). The MediaPipe .task model file and the InsightFace buffalo_l bundle are pulled at build time; all other Hugging Face models are downloaded lazily on first inference and cached under /root/.cache/huggingface.

The Node/Express server forwards /analyze-face requests to FACE_SERVICE_URL/analyze-base64. The React Native client never talks to this service directly.
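
For smoke-testing the service without the Node server, a request to the same endpoint could be built as below. The production caller is the Node/Express server, not Python, and the JSON field name "image" is a guess made for illustration; check the handler in app.py for the actual payload shape.

```python
import base64
import json
import urllib.request


def build_analyze_request(face_service_url: str, image_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST to <FACE_SERVICE_URL>/analyze-base64."""
    payload = {
        # Hypothetical field name; the real key expected by the service may differ.
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url=face_service_url.rstrip("/") + "/analyze-base64",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_analyze_request("http://localhost:7860/", b"abc")
# To send it: urllib.request.urlopen(req, timeout=60)
```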

Adding a new analyzer

1. Drop a new module under analyzers/ with a class exposing __init__() and analyze(...) -> dict.
2. Import it and add a lazy-load block in app.py's get_analyzers().
3. Add a results.update(...) call inside _run_pipeline at the right pipeline position.
4. Surface the new keys in client/src/screens/EditProfileScreen.js and add a legend row.

Order matters: later analyzers overwrite earlier keys on collision. The aesthetic aggregator runs last so it can see everything.
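
The server-side steps above can be sketched as follows. BrightnessAnalyzer, its output keys, and the single-analyzer get_analyzer helper are all hypothetical; the real get_analyzers() in app.py may be structured differently.

```python
from typing import Any, Callable, Dict, List


class BrightnessAnalyzer:
    """Hypothetical analyzer: mean pixel brightness, no model required."""

    def __init__(self) -> None:
        # A real analyzer would load its model weights here.
        self.dark_threshold = 64

    def analyze(self, pixels: List[List[int]]) -> Dict[str, Any]:
        flat = [value for row in pixels for value in row]
        mean = sum(flat) / len(flat) if flat else 0.0
        return {"brightness": mean, "is_dark": mean < self.dark_threshold}


_ANALYZERS: Dict[str, Any] = {}


def get_analyzer(name: str, factory: Callable[[], Any]) -> Any:
    """Build an analyzer on first use and reuse the instance afterwards (lazy load)."""
    if name not in _ANALYZERS:
        _ANALYZERS[name] = factory()
    return _ANALYZERS[name]


# Inside the pipeline, the new analyzer's keys are merged at its position:
results: Dict[str, Any] = {"face_confidence": 0.99}
analyzer = get_analyzer("brightness", BrightnessAnalyzer)
results.update(analyzer.analyze([[10, 20], [30, 40]]))
# results == {"face_confidence": 0.99, "brightness": 25.0, "is_dark": True}
```

Choosing key names that no earlier analyzer emits avoids accidental overwrites; reuse an existing key only when the override is deliberate.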