Tracking pipeline

Every frame in PoseFlow goes through the same five-stage path. This page tracks one frame from camera bytes to rep tick.

Stage 1. Camera

Native (iOS / Android / macOS): PoseCameraView wraps package:camera and emits CameraImage frames. The default resolution preset is medium (~720 × 480); the pose engine downsamples to ≤ 448 px internally, so higher capture resolutions only add plane-copy and downscale cost. package:camera caps capture at ~30 fps on Android; a Camera2 native config is required to unlock 60 fps for fast-motion movements.

Web: a hidden <video> element + a Web Worker. The worker runs the pose pipeline (WASM); the main thread keeps the UI responsive.

Both surfaces emit Pose at ~30 fps on a modern phone, ~25 fps on mid-tier laptops in the browser. Apple Silicon iPad runs at 60 fps.

Stage 2. Pose detection

NativePoseSource runs the pure-C pose engine via the blaze_flow package. Output is a Pose containing:

  • 2D landmarks: 33 × (x, y, visibility) in display-space [0, 1] coordinates (orientation-rotated, selfie-mirrored on the front camera). Every surface shares this one canonical coordinate space; there is no separate “raw” pose.
  • 3D world landmarks (when hasWorldLandmarks): 33 × (x, y, z) in metric body-frame coordinates with origin at the hip midpoint. The web pipeline currently doesn’t populate these; native does.
  • indexByName: landmark name → index map (pose landmark layout: nose=0, left_shoulder=11, …).

See Pose for the full accessor surface.

Stage 3. Tracking

MovementTracker.processFrame(pose) runs six sub-systems per frame. Branching is gated on the loaded Movement shape; not every sub-system runs for every movement.

3a. Tracking-point evaluation

Every TrackingPoint defined on the loaded movement computes a scalar value for this frame:

| Type | Compute |
| --- | --- |
| angle | 3-landmark joint angle via AngleExtractor |
| distance | Euclidean distance between two landmarks (display-space [0, 1]) |
| ratio | distance₁ / distance₂ (proportion of two distances) |
| proximity | smoothed inverse distance (used for wrist_to_shoulder etc.) |
| position | raw x or y of one landmark in display-space [0, 1] |
| velocity | rolling slope of a base channel over a 200 ms window |
| stability | rolling max − min of a base channel over a 1 s window |

Per-frame results land in TrackingResult.trackingValues, a Map<String, double> keyed by tracking-point id.
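To make the channel types concrete, here is a minimal, language-agnostic sketch in Python of how an angle, a distance, a velocity, and a stability value could be computed. It is not the PoseFlow implementation: the function and class names (joint_angle, distance, RollingWindow) are hypothetical, and the least-squares slope is one plausible reading of "rolling slope".

```python
import math
from collections import deque


def joint_angle(a, b, c):
    """Angle in degrees at vertex b for the landmark triple a-b-c; each is (x, y)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0  # degenerate case: two landmarks coincide
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def distance(a, b):
    """Euclidean distance between two (x, y) landmarks."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


class RollingWindow:
    """Holds (t_seconds, value) samples no older than window_s (hypothetical helper)."""

    def __init__(self, window_s):
        self.window_s = window_s
        self.samples = deque()

    def push(self, t, value):
        self.samples.append((t, value))
        # Evict samples that have fallen out of the time window.
        while self.samples and t - self.samples[0][0] > self.window_s:
            self.samples.popleft()

    def slope(self):
        """Least-squares slope in units per second; 0.0 with fewer than 2 samples."""
        n = len(self.samples)
        if n < 2:
            return 0.0
        mean_t = sum(t for t, _ in self.samples) / n
        mean_v = sum(v for _, v in self.samples) / n
        den = sum((t - mean_t) ** 2 for t, _ in self.samples)
        if den == 0:
            return 0.0
        num = sum((t - mean_t) * (v - mean_v) for t, v in self.samples)
        return num / den

    def spread(self):
        """Max minus min over the window; 0.0 when empty."""
        if not self.samples:
            return 0.0
        values = [v for _, v in self.samples]
        return max(values) - min(values)
```

A velocity channel would use RollingWindow(0.2).slope() and a stability channel RollingWindow(1.0).spread(), each fed with the base channel's value on every frame.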

3b. Phase state machine

The PhaseStateMachine advances the loaded movement’s phase graph every frame. It runs in one of two modes depending on the loaded Movement:

  • Rule-based mode (phaseConfigs populated, positions empty): the default, and the only mode emitted by PoseFlow Studio. Each phase has authored condition gates (membership-style: “current tracking-point values must fall inside these bands”); the machine transitions to the next phase when that gate is satisfied. A rep completes when the machine has cycled through the authored phase sequence.

  • Vector-matching mode (Movement.positions.isNotEmpty): used by legacy movements built from a recorded marker pass. A small PCA index classifies the current pose against named reference positions; the state machine ticks when the user traverses the authored sequence. The result lands in result.phaseId, together with a confidence score.

The mode is picked at load() time and held for the session. See Rep counting for the full decision tree.
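The rule-based mode can be illustrated with a small sketch. This is a simplified model in Python, not the PhaseStateMachine source: the class name RulePhaseMachine, the band representation, and the choice to count a rep when the sequence wraps back to the first phase are all assumptions made for illustration.

```python
class RulePhaseMachine:
    """Cycles through authored phases; each phase is gated by value bands (hypothetical)."""

    def __init__(self, phases):
        # phases: ordered list of (phase_id, {tracking_point_id: (low, high)})
        self.phases = phases
        self.index = 0  # assumption: the session starts in the first authored phase
        self.rep_count = 0

    @property
    def phase_id(self):
        return self.phases[self.index][0]

    @staticmethod
    def _gate_open(bands, values):
        # Membership-style gate: every banded channel must be present and in range.
        return all(
            tp in values and low <= values[tp] <= high
            for tp, (low, high) in bands.items()
        )

    def step(self, values):
        """Advance at most one phase per frame; return True on the rep-completion frame."""
        nxt = (self.index + 1) % len(self.phases)
        _, bands = self.phases[nxt]
        if not self._gate_open(bands, values):
            return False
        self.index = nxt
        if nxt == 0:
            # Wrapped around the authored sequence: one full rep.
            self.rep_count += 1
            return True
        return False
```

For a two-phase squat with an "up" band of 160–180° and a "down" band of 60–100° on a knee-angle channel, frames at 170°, 140°, 90°, 120°, and 165° would leave the machine back in "up" with one rep counted on the last frame.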

3c. Form service

FormServiceV2 evaluates every FormRule against the current phase’s tracking values. Triggered rules emit FormFeedbackEvents (cue + severity + trigger mode) on the tracker’s onFeedback stream.

The current form score is the rolling weighted average of per-rule compliance.
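One way to read "rolling weighted average" is sketched below in Python. The window length, the [0, 1] compliance scale, the per-rule weights, and the RollingFormScore name are assumptions, not values taken from FormServiceV2.

```python
from collections import deque


class RollingFormScore:
    """Rolling mean of per-frame weighted rule compliance, scaled to 0-100 (hypothetical)."""

    def __init__(self, window_frames=30):
        # Assumption: a fixed-length frame window; the real window is not documented here.
        self.frames = deque(maxlen=window_frames)

    def push(self, compliance, weights):
        """compliance: {rule_id: value in [0, 1]}; weights: {rule_id: weight > 0}."""
        total_w = sum(weights[rule] for rule in compliance)
        if total_w <= 0:
            return  # no weighted rules were evaluated on this frame
        frame_score = sum(weights[rule] * c for rule, c in compliance.items()) / total_w
        self.frames.append(frame_score)

    def score(self):
        """Current score in 0-100; assumption: 100.0 before any frame has been scored."""
        if not self.frames:
            return 100.0
        return 100.0 * sum(self.frames) / len(self.frames)
```

With weights of 2 for a knee rule and 1 for a back rule, a frame where only the knee rule passes scores 2/3; averaged with a fully compliant frame, the rolling score is about 83.3.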

3d. Camera-angle bucket detection

CameraAngleDetector.detectBucket(pose) runs every frame, voting across 11 paired landmarks to classify the camera bucket (front_hip, 45left_hip, …, front_overhead). When the movement has authored angleBands for view-dependent measurements, the detected bucket gates which range applies. See camera buckets.

The StableCameraAngleDetector (wraps the stateless detector with a 10-frame rolling window + hysteresis) is what the runtime tracker actually uses, so the chosen bucket is stable across frame-to-frame jitter.
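The stabilising wrapper can be pictured as a majority vote over recent frames with a switching threshold. The Python sketch below is illustrative only: the StableBucket name and the 7-of-10 switch threshold are assumptions, and the real hysteresis rule may differ.

```python
from collections import Counter, deque


class StableBucket:
    """Smooths per-frame bucket labels with a rolling vote plus hysteresis (hypothetical)."""

    def __init__(self, window=10, switch_votes=7):
        self.history = deque(maxlen=window)  # last `window` raw detections
        self.switch_votes = switch_votes     # assumption: votes needed to change bucket
        self.current = None

    def update(self, raw_bucket):
        """Feed one raw per-frame detection; return the stabilised bucket."""
        self.history.append(raw_bucket)
        candidate, votes = Counter(self.history).most_common(1)[0]
        if self.current is None:
            # First frame: adopt whatever was detected.
            self.current = candidate
        elif candidate != self.current and votes >= self.switch_votes:
            # Hysteresis: only switch on a clear majority, not a bare plurality.
            self.current = candidate
        return self.current
```

With these settings, a few stray frames of a neighbouring bucket do not change the output; the bucket only switches once the new label fills at least 7 of the last 10 frames.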

3e. Frame validator

FrameValidator runs visibility + distance gates (“is the user actually in the frame?”). When the gates fail, the rep counter ignores the frame entirely, which acts as an anti-cheat measure against partial-view reps.
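A minimal sketch of the two gates, written in Python under stated assumptions: landmarks are looked up by name, the visibility threshold and the size band are invented for illustration, and apparent body size is approximated by the distance between two named landmarks. None of these specifics come from FrameValidator itself.

```python
import math


def frame_is_valid(
    landmarks,
    required,
    min_visibility=0.5,                       # assumed threshold
    size_pair=("left_shoulder", "left_hip"),  # assumed proxy for body size
    min_size=0.10,                            # assumed: user too far away below this
    max_size=0.60,                            # assumed: user too close above this
):
    """landmarks: {name: (x, y, visibility)} in display-space [0, 1]."""
    # Visibility gate: every required landmark must be present and confident.
    for name in required:
        lm = landmarks.get(name)
        if lm is None or lm[2] < min_visibility:
            return False
    # Distance gate: apparent body size must fall inside the allowed band.
    a = landmarks.get(size_pair[0])
    b = landmarks.get(size_pair[1])
    if a is None or b is None:
        return False
    size = math.hypot(a[0] - b[0], a[1] - b[1])
    return min_size <= size <= max_size
```

A caller would skip rep counting for any frame where this returns False, so a user who steps half out of view cannot accumulate reps.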

3f. Rep-scoring gate

Once a rep completes, the optional RepScoringConfig evaluates the rep against a quality threshold. Reps that fail the gate fire RepCompletedEvent but are flagged so the consumer can visibly show “doesn’t count.” Trainers can configure the gate live via the ShowcaseTracker without restarting the session.

Stage 4. Result + events

Every processFrame returns a TrackingResult with:

  • repCount: current count.
  • formScore: 0–100, live across the recent window.
  • phaseId: current phase from the state machine.
  • trackingValues: per-frame channel values.
  • pipeline: RepPipelineSnapshot exposing scoring-gate state for trainer-facing UIs.
  • pose: the original Pose (so consumers can read landmarks without re-running detection).
  • feedback: list of active feedback messages.
  • lastRepQuality: populated only on the frame where a rep completes, with the per-dimension breakdown (form, ROM, tempo, stability).
  • repJustCompleted: true on the single completion frame.

On a rep boundary, onRepCompleted fires once with a RepCompletedEvent (rep number, quality, duration). Form feedback events stream continuously via onFeedback.

Stage 5. Consumer state

Your app subscribes to those streams, reads the per-frame result, fans both out to UI state (BLoC, ChangeNotifier, etc.), and renders.

The shared widget TrackedMovementView bundles stages 1–3 and surfaces stages 4–5 as callbacks; you can also wire PoseCameraView + MovementTracker manually for more control.

Frame timing budget

On a 2023-class mobile (iPhone 14, Pixel 7):

| Stage | Cost |
| --- | --- |
| Pose detection (the pose engine, C-side) | 12–20 ms |
| Tracking-point eval (per frame) | < 1 ms |
| Phase machine + form service | < 1 ms |
| Bucket detector | < 0.5 ms |
| Total per-frame Dart-side work | ~3 ms |

The camera + pose engine dominate; everything downstream is amortised. On web, the worker adds an extra 5–10 ms for the bitmap transfer.
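As a rough check on these numbers, the per-frame headroom can be worked out from the frame interval. The Python sketch below assumes the stages run back to back for a single frame (no pipelining) and uses the figures quoted on this page; the helper names are hypothetical.

```python
def frame_budget_ms(fps):
    """Time available per frame at a given frame rate."""
    return 1000.0 / fps


def headroom_ms(fps, pose_ms, dart_ms=3.0, transfer_ms=0.0):
    """Budget left after pose detection, Dart-side work, and any web bitmap transfer.

    Assumption: the stages are serial for one frame; a pipelined engine would differ.
    """
    return frame_budget_ms(fps) - (pose_ms + dart_ms + transfer_ms)


# 30 fps gives a 33.3 ms budget; pose detection at 12-20 ms plus ~3 ms of
# Dart-side work leaves roughly 10-18 ms of headroom on native.
native_worst = headroom_ms(30, pose_ms=20.0)
native_best = headroom_ms(30, pose_ms=12.0)

# On web, the extra 5-10 ms bitmap transfer eats into that margin.
web_worst = headroom_ms(30, pose_ms=20.0, transfer_ms=10.0)
```

Under the same serial assumption, a 60 fps target (16.7 ms per frame) only fits when pose detection sits near the low end of its range.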

PoseCameraView.onPipelineTiming exposes per-stage PipelineTimingReport data so trainer-facing diagnostics can render the full waterfall.