WristPP: Wrist-Worn Camera System for Estimating 3D Hand Pose and Pressure in Real Time

Researchers from Tsinghua University have published a preprint introducing WristPP, a wrist-worn camera system that estimates 3D hand pose and per-vertex contact pressure simultaneously, in real time, from a single wide field-of-view RGB frame. The paper, submitted to CHI 2026 and posted to arXiv on February 28, 2026, describes a Vision Transformer backbone with joint-aligned tokens that predicts Hand-VQVAE codebook indices for mesh recovery, while a separate extrinsics-conditioned branch estimates the pressure distribution across the hand surface. Evaluated on a self-collected dataset of 133,000 frames across 20 subjects performing 76 gesture types, WristPP achieved a Mean Per-Joint Position Error of 2.9 mm, a Contact IoU of 0.712, and a foreground pressure mean absolute error of 10.4 grams, strong results for an uninstrumented, mobile setup. Lead author Ziheng Xi and colleagues also conducted three user studies showing that the system reached touchpad-level efficiency in mid-air pointing and outperformed head-mounted camera baselines on both success rate and arm fatigue in a large-display task.
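For readers unfamiliar with the three reported metrics, the sketch below shows how such numbers are commonly computed. It is an illustration only: the function names, the contact threshold, and the choice to average the pressure error over ground-truth contact vertices are assumptions, and the paper's exact definitions may differ.

```python
import math


def mpjpe(pred_joints, gt_joints):
    """Mean Per-Joint Position Error: average Euclidean distance between
    predicted and ground-truth 3D joints, in the units of the inputs."""
    dists = [math.dist(p, g) for p, g in zip(pred_joints, gt_joints)]
    return sum(dists) / len(dists)


def contact_iou(pred_pressure, gt_pressure, threshold=0.0):
    """Intersection-over-union of the vertex sets considered 'in contact'.
    A vertex counts as in contact when its pressure exceeds `threshold`
    (an assumed convention, not taken from the paper)."""
    pred = {i for i, p in enumerate(pred_pressure) if p > threshold}
    gt = {i for i, g in enumerate(gt_pressure) if g > threshold}
    union = pred | gt
    if not union:
        return 1.0  # no contact predicted or present: treated as agreement
    return len(pred & gt) / len(union)


def foreground_mae(pred_pressure, gt_pressure, threshold=0.0):
    """Mean absolute pressure error over 'foreground' vertices, assumed here
    to mean vertices that are in contact in the ground truth."""
    errors = [abs(p - g) for p, g in zip(pred_pressure, gt_pressure)
              if g > threshold]
    return sum(errors) / len(errors) if errors else 0.0


# Toy data: two joints (mm) and four mesh vertices (grams).
pred_joints = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
gt_joints = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
pred_pressure = [0.0, 10.0, 20.0, 5.0]
gt_pressure = [0.0, 12.0, 0.0, 9.0]

print(mpjpe(pred_joints, gt_joints))               # 2.5
print(contact_iou(pred_pressure, gt_pressure))     # 2 of 3 vertices overlap
print(foreground_mae(pred_pressure, gt_pressure))  # 3.0
```

In the toy data, the prediction marks one vertex as touching that the ground truth does not, which lowers the Contact IoU but leaves the foreground error unaffected under the assumed definition.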

The core technical move is combining two sensing problems that prior work has kept separate. Most wrist-based input research focuses on either pose estimation or pressure detection, typically requiring different hardware for each. WristPP's single-camera architecture collapses both into one inference pass, enabling interaction paradigms that neither pose-only nor pressure-only systems can support, such as typing on a bare desk or pressure-sensitive gesturing on arbitrary surfaces. The approach does carry trade-offs: occlusion from the wrist vantage point, lighting sensitivity, and a heavier compute pipeline than electromyography-based (EMG) alternatives.

That EMG comparison matters because WristPP enters a competitive landscape already shaped by heavy corporate bets. Meta's reported September 2025 launch of the Neural Band, built on its 2019 acquisition of surface-EMG (sEMG) startup CTRL-labs for an estimated $500 million to $1 billion, established a commercial baseline for wrist-based neural input, bundled with Meta's Ray-Ban Display glasses at $799. Apple, meanwhile, published its own EMBridge zero-shot EMG gesture recognition framework in March 2026, the same week as the WristPP preprint, while holding patents on multi-row EMG electrode arrays for Apple Watch bands. Both corporate approaches use sEMG, which detects muscle activation signals before motion occurs but cannot determine contact geometry or pressure distribution, which is precisely the gap WristPP addresses. Merging camera-based pressure sensing with EMG intent detection is an obvious next step that no one has yet demonstrated at scale.

Coming from Tsinghua University rather than a corporate lab, WristPP sits outside the closed ecosystems of Meta's Neural Band SDK and Apple's proprietary sensor stack. The authors' 133,000-frame dataset spanning 48 on-plane and 28 mid-air gesture types could, if released publicly, provide a rare open benchmark for wrist-camera pose-and-pressure estimation, a category that currently lacks standardized evaluation data. Whether the research finds its way into a commercial product or remains an academic contribution, it shows that accurate, uninstrumented wrist-based pressure sensing is now within reach of a single RGB camera, lowering the hardware requirements for surface-agnostic AR and wearable computing.