Date: May 11, 2026
Status: ✅ FULLY OPERATIONAL
Test Video: kpop_test.mp4 (2.0M, K-Pop dance)
Both pipelines run from your Mac but use the 3090 for heavy compute.
**Option A (Blender pipeline).** What it does: extracts a skeleton from the video, builds a rigged 3D character with a weighted mesh, applies the motion-capture poses, and renders a studio-quality MP4.
Output:
- `.blend` file (for manual editing)
- `_rendered.mp4` (1080x1920, 30fps, H.264)
When to use it: when you need clean motion reference for Kling 3.0, or want to change camera angles or lighting, or export the rig to Unreal.
Pipeline steps:
- Mac: Extract frames from video (`ffmpeg`)
- Mac→3090: Sync frames via `rsync`
- 3090: SAM3D inference (DINOv3 + MHR pose extraction)
- 3090: Extract skeleton, mesh, poses (Python + PyTorch)
- 3090→Mac: Sync MHR data back
- Mac: Blender builds rigged scene (armature + mesh + poses)
- Mac: Blender renders MP4 (EEVEE engine)
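The Mac-side steps above can be sketched as simple command builders. The helper names, flags, and frame layout below are illustrative assumptions, not the actual contents of `run_option_a_mocap.sh`:

```python
import shlex

# Hypothetical helpers; flags and paths are assumptions, not the real script.
def ffmpeg_extract_cmd(video, frames_dir, fps=30):
    """Build an ffmpeg command that dumps the video as numbered PNG frames."""
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", f"{frames_dir}/frame_%05d.png"]

def rsync_cmd(src_dir, remote):
    """Build an rsync command that mirrors a local frame directory to the 3090."""
    return ["rsync", "-az", "--delete", f"{src_dir}/", f"{remote}/"]

print(shlex.join(ffmpeg_extract_cmd("kpop_test.mp4", "frames")))
print(shlex.join(rsync_cmd("frames", "straughter@3090:/tmp/frames")))
```

Each command list would then be executed with `subprocess.run(cmd, check=True)` so a failed step aborts the pipeline.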
Key files:
- `run_option_a_mocap.sh` - Main orchestrator (runs on Mac)
- `build_mhr_scene.py` - Blender scene builder (Mac Blender)
- `remote_wrapper.py` - Runs on 3090, handles SAM3D + MHR extraction
Render settings:
- Resolution: 1080x1920 (portrait)
- Engine: EEVEE (fast real-time render)
- Camera: Auto-positioned based on mesh bounding box
- Lighting: Sun + Fill lights (3-point setup)
- Material: Silver mannequin (metallic 0.8, roughness 0.3)
- Background: Dark gray (0.05, 0.05, 0.05)
Performance:
- kpop_test.mp4 (302 frames): ~8-10 minutes total
- Frame extraction: ~10 seconds
- SAM3D inference: ~3-5 minutes
- MHR extraction: ~30 seconds
- Blender build: ~1 minute
- Blender render: ~2-3 minutes
**Option B (ComfyUI pipeline).** What it does: runs the video through ComfyUI on the 3090, using AI to paint a grey mannequin directly over the original pixels frame-by-frame.
Output: Single .mp4 with isolated mesh on black background.
When to use it: When you need quick motion reference immediately and don't need camera control.
Pipeline steps:
- Mac: Upload video to 3090
- Mac: Send API request to ComfyUI (port 8188)
- 3090: ComfyUI runs SAM3D with `render_mode=mesh_only`
- 3090: Renders isolated mesh video
- 3090→Mac: Download MP4
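The API request in step 2 can be sketched as a minimal payload builder. ComfyUI's `/prompt` endpoint accepts a JSON body of the form `{"prompt": <node graph>, "client_id": ...}`; the node graph below (`LoadVideo`, `SAM3DBody`, their ids and inputs) is a made-up illustration, not the actual workflow inside `sam3d_comfy_api.py`:

```python
import uuid

def build_sam3d_prompt(video_path, render_mode="mesh_only"):
    """Assemble a ComfyUI /prompt payload; the node graph here is illustrative."""
    workflow = {
        "1": {"class_type": "LoadVideo", "inputs": {"video": video_path}},
        "2": {"class_type": "SAM3DBody",
              "inputs": {"frames": ["1", 0], "render_mode": render_mode}},
    }
    return {"prompt": workflow, "client_id": uuid.uuid4().hex}

payload = build_sam3d_prompt("kpop_test.mp4")
print(payload["prompt"]["2"]["inputs"]["render_mode"])
```

The payload would be POSTed to `http://<3090-host>:8188/prompt` (e.g. `requests.post(url, json=payload)`) and completion polled via ComfyUI's `/history` endpoint.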
Key files:
- `sam3d_comfy_api.py` - ComfyUI API client
- `run_kling_mocap.sh` - Orchestrator script
Render modes available:
- `mesh_only` - Isolated grey mannequin (default, recommended)
- `side_by_side` - 3-way split (original | mask | overlay)
- `mask_only` - Just the silhouette mask
- `overlay` - Mannequin overlaid on the original video
Performance:
- kpop_test.mp4: ~3-5 minutes total
- ComfyUI: Port 8188 ✅
- SAM3D Model: `/home/straughter/ComfyUI/models/sam3dbody/model.ckpt` (2.0G) ✅
- MHR Model: `/home/straughter/ComfyUI/models/sam3dbody/assets/mhr_model.pt` ✅
- VRAM: ~25GB free ✅
- Blender: 4.3.2 ✅
- ffmpeg: Frame extraction ✅
- Python 3.14: SAM3D API client ✅
- rsync: File transfer ✅
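The checklist above lends itself to a small preflight script. `preflight` is a hypothetical helper (not part of the existing scripts); it uses the model paths listed in this doc and would run on the 3090:

```python
import shutil
from pathlib import Path

# Hypothetical preflight check; paths are the 3090-side ones listed above.
REQUIRED_FILES = [
    "/home/straughter/ComfyUI/models/sam3dbody/model.ckpt",
    "/home/straughter/ComfyUI/models/sam3dbody/assets/mhr_model.pt",
]
REQUIRED_TOOLS = ["ffmpeg", "rsync"]

def preflight(files=REQUIRED_FILES, tools=REQUIRED_TOOLS):
    """Return a list of problems; an empty list means the pipeline can run."""
    problems = [f"missing file: {f}" for f in files if not Path(f).exists()]
    problems += [f"missing tool: {t}" for t in tools if shutil.which(t) is None]
    return problems

for problem in preflight():
    print("⚠", problem)
```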
`./run_option_a_mocap.sh your_video.mp4`

Output:
- `Option_A_Mocap.blend` - Blender scene file
- `Option_A_Mocap_rendered.mp4` - Rendered video

`./run_kling_mocap.sh your_video.mp4`

Output:
- `sam3d_kling_ref_XXXXX.mp4` - Mesh overlay video
The camera is automatically positioned based on the mesh's bounding box:
- Calculate mesh bounding box
- Find center point
- Set camera distance: `max(height, width) * 2.5`
- Position camera at chest height: `(center_x, center_y - dist, center_z)`
- Rotate camera: `(90°, 0, 0)` to face the mesh
This ensures the character is always properly framed regardless of video content.
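The framing rule above reduces to a small pure function. `frame_camera` is a sketch of the math only (bounding box given as min/max corners, Z up, assumed names); in `build_mhr_scene.py` the results would feed the Blender camera's `location` and `rotation_euler`:

```python
import math

def frame_camera(bbox_min, bbox_max):
    """Compute camera location and rotation (radians) from a mesh bounding box."""
    cx, cy, cz = ((lo + hi) / 2 for lo, hi in zip(bbox_min, bbox_max))
    width = bbox_max[0] - bbox_min[0]
    height = bbox_max[2] - bbox_min[2]
    dist = max(height, width) * 2.5           # distance rule from the doc
    location = (cx, cy - dist, cz)            # pulled back along -Y at center height
    rotation = (math.radians(90), 0.0, 0.0)   # 90° X tilt to face the mesh
    return location, rotation

loc, rot = frame_camera((-0.5, -0.3, 0.0), (0.5, 0.3, 1.8))
print(loc, rot)
```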
The original pose extraction method works perfectly:
```python
pose_tensor = torch.from_numpy(data[0]['mhr_model_params']).unsqueeze(0)
with torch.no_grad():
    _, skel = model(identity, pose_tensor, extra)
```

No need to use `pred_joint_coords` directly; the MHR model handles it correctly.
- Sun Light: Main key light (energy: 5.0)
- Fill Light: Area light that softens shadows (energy: 100.0)
- Material: Silver mannequin with 80% metallic, 30% roughness
Troubleshooting:
- Mesh not framed in the render: the camera is not aiming at the mesh. Use the bounding box calculation method (included in v2).
- Black or empty render: ensure the World background is set and lights are added. The v2 script handles this automatically.
- No motion between frames: verify SAM3D completed successfully and generated .pkl files, and check that mhr_model_params vary between frames.
Fixed:
- ✅ Camera positioning now uses bounding box calculation (auto-framing)
- ✅ Lighting setup included (Sun + Fill)
- ✅ Material properly applied (silver mannequin)
- ✅ World background set (dark gray)
- ✅ Auto-render to MP4 on scene build
Removed:
- ❌ EDM cyberpunk lighting (overkill for motion reference)
- ❌ Ground plane + grid (unnecessary clutter)
- ❌ Manual camera positioning (prone to framing issues)
Result: Clean, reliable motion capture output that just works.
| | Option A (Blender) | Option B (ComfyUI) |
|---|---|---|
| Speed | ~8-10 min | ~3-5 min |
| Output quality | Studio-lit 3D render | AI pixel paint |
| Camera control | ✅ Full 3D (auto-positioned) | ❌ Fixed |
| Exportable rig | ✅ .blend file | ❌ No |
| Compute location | Mac (Blender) + 3090 (SAM3D) | 3090 only |
| Best for | Final production ref | Quick iteration |
Now that motion capture is working, you can:
- Generate EDM audio using your Audio Factory workflow
- Run motion capture on dance/performance videos
- Composite mocap video with audio
- Feed to Kling 3.0 for final video generation
The motion capture output is clean reference material that Kling can use to generate consistent character motion.