Doable in real time with a compact pipeline and a small nonlinear solver per frame. I'll give a complete, practical approach you can implement right away:
Assumptions I'll make (don't worry if some don't hold; I include fallbacks):
- camera intrinsics (K) are known (focal length (f), principal point). If not, I show a fallback.
- each bead has known physical radius (R) and the known center-to-center spacing sequence (s_i).
- each bead has a unique color and the colors appear in the same sequence left→right in the string (you said sequence is known).
- you have a measured ground plane (plane equation in camera coords) ( \mathbf{n}^\top \mathbf{X} + h = 0) or at least the height (z_{plane}) in camera coordinates.
- beads are nearly spherical (we’ll model them as spheroids/ellipsoids when needed).
Outline: detect bead 2D centers → assign by unique color/sequence → build 3D position solver that fits depths along camera rays subject to spacing constraints and plane-contact constraints (≥30% touching) → fit oriented ellipsoids for visualization using local tangent + radius → per-frame warm-start + few LM iterations for real-time.
- Convert the image to an appropriate color space (HSV or CIELAB) and threshold per known bead color to get masks.
- For each color mask:
  - morphological opening/closing to clean up noise.
  - find contours; compute a sub-pixel centroid (image moments or an elliptical fit) and the image ellipse (cv2.fitEllipse) for ellipse axes & orientation.
- You now have image points (u_i = (x_i,y_i)) and a 2D ellipse per bead (if you want shape cues).
- Order beads according to the known color sequence (so you have correspondences between image points and physical bead indices).
Implementation tip: use a tiny per-color lookup table (HSV ranges), and filter by expected area to reject false detections. Warm start color thresholds adaptively if lighting shifts.
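A minimal detection sketch under these assumptions; the HSV ranges, area bounds, and the helper name detect_beads are placeholders you would tune and rename for your setup:

```python
import cv2
import numpy as np

# Hypothetical per-color HSV ranges (tune to your beads and lighting)
HSV_RANGES = {
    "red":   ((0, 120, 80),  (10, 255, 255)),
    "green": ((45, 80, 80),  (75, 255, 255)),
    "blue":  ((100, 80, 80), (130, 255, 255)),
}
MIN_AREA, MAX_AREA = 30, 5000   # expected bead area in px^2, to reject noise

def detect_beads(frame_bgr):
    """Return {color: (centroid_xy, ellipse)} for each detected bead."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    out = {}
    for color, (lo, hi) in HSV_RANGES.items():
        mask = cv2.inRange(hsv, np.array(lo), np.array(hi))
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
        cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        cnts = [c for c in cnts if MIN_AREA < cv2.contourArea(c) < MAX_AREA]
        if not cnts:
            continue
        c = max(cnts, key=cv2.contourArea)       # largest plausible blob per color
        M = cv2.moments(c)
        centroid = (M["m10"] / M["m00"], M["m01"] / M["m00"])
        ellipse = cv2.fitEllipse(c) if len(c) >= 5 else None  # fitEllipse needs >= 5 pts
        out[color] = (centroid, ellipse)
    return out
```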
For a bead image point (u_i = (x_i,y_i)) and intrinsics (K), compute normalized camera ray [ \mathbf{r}_i = K^{-1} [x_i, y_i, 1]^\top ] and unit ray direction (\hat{\mathbf r}_i = \mathbf{r}_i/|\mathbf{r}_i|).
Unknown 3D bead center (P_i) lies on the ray: (P_i = d_i , \hat{\mathbf r}_i) where (d_i>0) is the depth along ray.
If intrinsics are unknown:
- use the known physical bead radius (R) and its apparent radius in pixels (r^{px}) via the relation (d \approx f \cdot R / r^{px}). If (f) itself is unknown, bootstrap with a reasonable guess (e.g. from sensor width / field of view) or run a short calibration with a calibration pattern. This is less accurate but usable.
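A small sketch of the backprojection and the radius-based depth fallback (the helper names are illustrative):

```python
import numpy as np

def backproject(K, pts_2d):
    """Unit camera rays for Nx2 pixel points (x, y)."""
    pts_h = np.column_stack([pts_2d, np.ones(len(pts_2d))])   # Nx3 homogeneous
    rays = (np.linalg.inv(K) @ pts_h.T).T                     # Nx3, un-normalized
    return rays / np.linalg.norm(rays, axis=1, keepdims=True)

def depth_from_radius(f_px, R_m, r_px):
    """Rough depth from apparent bead radius: d ~ f * R / r_px."""
    return f_px * R_m / r_px

# Example with assumed values: 600 px focal length, 5 mm bead seen at 8 px radius
# d0 = depth_from_radius(600.0, 0.005, 8.0)   # ~0.375 m, a usable initial guess
```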
We want 3D positions (P_i(d_i)) that
- project near the observed (u_i) (data term),
- satisfy center-to-center spacing (||P_{i+1}-P_i|| = s_i) (geometric constraint),
- enforce that at least (m = \lceil 0.3 N\rceil) beads are contacting the ground plane: for those beads, enforce the plane-contact constraint ( \mathbf{n}^\top P_j + h = -R) (touching plane at distance R along normal). (Note: plane equation sign depends on convention; adapt to your measured plane.)
Because selecting which beads touch the plane is combinatorial, use this practical approach:
- heuristic selection: pick the (m) beads with largest image y (lowest on image) OR smallest depth from an initial solve; these are likely to be on the plane. Force those to satisfy the contact constraint (hard constraint).
- optionally run two attempts: pick the top-m by image y and try solver; if residuals high, try top-m by depth; pick the lower residual result.
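A minimal sketch of that two-attempt strategy, assuming a hypothetical solve_with_contacts(contact_idx) helper that wraps the least-squares solve shown later and returns (d_est, cost):

```python
import numpy as np

def pick_contact_set(u, d0, m, solve_with_contacts):
    """Try two candidate contact sets and keep the lower-residual solve.

    u: Nx2 image points; d0: initial depths from a first pass; m: number of
    beads forced onto the plane. solve_with_contacts(idx) -> (d_est, cost)
    is a hypothetical wrapper around the solver sketched later.
    """
    by_image_y = np.argsort(-u[:, 1])[:m]   # lowest in the image (origin top-left)
    by_depth = np.argsort(d0)[:m]           # smallest depth from the initial solve
    attempts = [solve_with_contacts(idx) for idx in (by_image_y, by_depth)]
    return min(attempts, key=lambda a: a[1])  # keep the (d_est, cost) with lower cost
```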
Mathematical optimization (unknown vector (d = [d_1..d_N])):
Minimize
[ \mathcal{E}(d)= \sum_i w_i \,\big\| \pi(d_i \hat r_i) - u_i \big\|^2 + \lambda_s \sum_{i} \big(\|d_{i+1}\hat r_{i+1} - d_i \hat r_i\| - s_i\big)^2 + \lambda_p \sum_{j\in\mathcal{P}} \big(\mathbf{n}^\top (d_j \hat r_j) + h + R \big)^2 ]
where (\pi(\cdot)) is projection with intrinsics (K), (\mathcal{P}) is the set of indices forced to touch the plane, (w_i) are weights from ellipse confidence, and (\lambda_s,\lambda_p) are penalty weights.
Use scipy.optimize.least_squares (Levenberg–Marquardt or trust-region-reflective) to solve for (d). Per-frame: warm-start with previous (d).
This optimization is small (N variables, where N is the number of beads, usually tens), so it runs quickly in C++/Ceres or Python+NumPy with a few LM iterations. In optimized C++ with Ceres you'll get >30 Hz easily for N ≈ 20; in Python expect 10–30 Hz depending on the machine if you limit iterations.
Once bead centers (P_i) are known, align an ellipsoid (center (P_i), rotation (R_i), axes (a,b,c)):
Heuristic practical model:
- Principal axis of the ellipsoid aligned with the local tangent of the string: compute the tangent (T_i) as normalized (P_{i+1}-P_{i-1}) (endpoints use the nearest neighbor).
- Choose one principal axis along (T_i) and the other two perpendicular to it; set the two transverse axes equal to the physical bead radius (R) (so (a=R, b=R)).
- The axis along the tangent, (c), can be set to (R) for almost-spherical beads, or adjusted to better match the observed 2D ellipse: project the 3D ellipsoid into the image and solve for the (c) that best matches the observed ellipse's major/minor axis ratio. There is a closed-form mapping from a 3D ellipsoid (symmetric positive-definite matrix (Q)) to its projected 2D conic, so this reduces to a low-dimensional scalar least-squares solve.
If beads are nearly spherical, using sphere (all axes = R) is simpler and robust.
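A minimal sketch of the tangent-aligned frame construction (sphere-like axes by default; ellipsoid_frames is an illustrative name, and it assumes N >= 2 so tangents are defined):

```python
import numpy as np

def ellipsoid_frames(P, R_bead):
    """Per-bead rotation matrices with the first axis along the local tangent.

    P: Nx3 bead centers. Returns a list of (center, R_i, axes) tuples, with
    the transverse axes set to the bead radius (sphere-like default).
    """
    N = len(P)
    frames = []
    for i in range(N):
        j0, j1 = max(i - 1, 0), min(i + 1, N - 1)   # endpoints use nearest neighbor
        T = P[j1] - P[j0]
        T = T / np.linalg.norm(T)
        # Gram-Schmidt: pick a helper vector not parallel to T, orthonormalize
        a = np.array([0.0, 0.0, 1.0]) if abs(T[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
        U = a - (a @ T) * T
        U = U / np.linalg.norm(U)
        V = np.cross(T, U)
        R_i = np.column_stack([T, U, V])            # columns = principal axes
        frames.append((P[i], R_i, (R_bead, R_bead, R_bead)))
    return frames
```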
To render an ellipsoid so its projection matches the observed fitted 2D ellipse more closely, use the standard result that the image of a quadric is a conic (see Hartley & Zisserman, Multiple View Geometry). Practically, for each bead:
- compute the image ellipse matrix (C_i) from cv2.fitEllipse
- set up the relation between the 3D quadric (Q_i) and the image dual conic, (C_i^* \propto P \, Q_i^{-1} \, P^\top) with (P = K\,[R_i \mid t_i]), and solve for the unknown scale in the ellipsoid axes. That is heavier algebraically but yields better visual alignment.
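A sketch of that projection using the dual-quadric relation (with (Q^* \propto Q^{-1}) for a nondegenerate ellipsoid); the bead center is assumed to already be in camera coordinates, so the projection matrix is (K\,[I \mid 0]), and the function names are illustrative:

```python
import numpy as np

def projected_axis_ratio(K, center, R_i, axes):
    """Major/minor axis ratio of the image conic of a 3D ellipsoid given
    in camera coordinates (projection matrix P = K [I | 0])."""
    a, b, c = axes
    A = R_i @ np.diag([1.0/a**2, 1.0/b**2, 1.0/c**2]) @ R_i.T
    p = np.asarray(center, dtype=float)
    Ap = A @ p
    Q = np.block([[A, -Ap[:, None]],
                  [-Ap[None, :], np.array([[p @ Ap - 1.0]])]])   # 4x4 quadric
    P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    C_star = P @ np.linalg.inv(Q) @ P.T     # dual conic, up to scale
    C = np.linalg.inv(C_star)               # point conic of the image ellipse
    lam = np.abs(np.linalg.eigvalsh(C[:2, :2]))   # semi-axes scale as 1/sqrt(|lam|)
    return np.sqrt(lam.max() / lam.min())   # major/minor ratio, >= 1

def fit_tangent_axis(K, center, R_i, R_bead, observed_ratio):
    """1-D scan for the tangent-axis length c that matches the image ellipse."""
    cs = np.linspace(0.5 * R_bead, 2.0 * R_bead, 50)
    ratios = [projected_axis_ratio(K, center, R_i, (R_bead, R_bead, c)) for c in cs]
    return cs[np.argmin(np.abs(np.array(ratios) - observed_ratio))]
```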
- Warm start with previous frame's (d).
- Solve only when beads moved or every k-th frame; use optical flow for tiny motions and skip heavy optimization.
- Use analytic Jacobians for the LM solver for speed (a sketch is given after this list).
- C++ (ceres-solver) implementation recommended for production; Python scipy.optimize.least_squares is fine for prototyping.
- If you have a GPU & CUDA, move image processing to the GPU (OpenCV CUDA) and run the solver on the CPU.
- If occlusion or missed beads: solve with only visible beads; use spacing penalty to keep chain consistent and infer occluded bead positions.
- If beads move relative to string (rotation), rely on color sequence — that gives strong correspondence.
- If lighting changes, adapt HSV thresholds with small per-frame feedback (tune via running mean of bead colors).
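For reference, a sketch of the analytic Jacobian matching the residual ordering in the solver sketch below (2N reprojection rows, then N-1 spacing rows, then m plane-contact rows). It assumes a standard zero-skew K, and contact_idx must match the selection used in residuals(); bind the constants in a lambda and pass it via jac=:

```python
import numpy as np

def jacobian(d, rays, K, m, contact_idx, plane_n,
             lambda_s=50.0, lambda_p=100.0):
    """Analytic Jacobian of the residuals() sketched below, same row order."""
    N = len(d)
    fx, fy = K[0, 0], K[1, 1]
    J = np.zeros((2 * N + (N - 1) + m, N))
    P = d[:, None] * rays                       # current 3D centers
    # reprojection p = (fx*X/Z + cx, fy*Y/Z + cy); dP/dd_i = r_i
    for i in range(N):
        X, Y, Z = P[i]
        dp_dP = np.array([[fx / Z, 0.0, -fx * X / Z**2],
                          [0.0, fy / Z, -fy * Y / Z**2]])
        J[2 * i:2 * i + 2, i] = dp_dP @ rays[i]
    # spacing rows: lambda_s * (||P_{i+1} - P_i|| - s_i)
    for i in range(N - 1):
        v = P[i + 1] - P[i]
        nv = np.linalg.norm(v)
        J[2 * N + i, i] = -lambda_s * (v @ rays[i]) / nv
        J[2 * N + i, i + 1] = lambda_s * (v @ rays[i + 1]) / nv
    # plane rows: lambda_p * (n . (d_j r_j) + h + R) -> d/dd_j = lambda_p * (n . r_j)
    for k, j in enumerate(contact_idx):
        J[2 * N + (N - 1) + k, j] = lambda_p * (plane_n @ rays[j])
    return J
```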
This is a compact, runnable sketch for the depth solver. You’ll need OpenCV and scipy.
```python
import numpy as np
import cv2  # used by the detection step; the solver itself only needs NumPy/SciPy
from scipy.optimize import least_squares

# inputs (per frame)
K = ...        # 3x3 intrinsics
rays = ...     # Nx3 unit rays in camera coords (backprojected image points)
u = ...        # Nx2 image points (x, y)
s = ...        # (N-1) center-to-center spacings in meters
R_bead = 0.005                    # bead radius in meters (example)
plane_n = np.array([0, 0, 1.0])   # plane normal in camera coords (example)
plane_h = -0.0                    # plane constant: n^T X + h = 0 (adapt to your plane)

N = len(rays)
m = int(np.ceil(0.3 * N))         # how many beads must touch the plane

# helper: project a 3D point to the image
def project_point(X, K):
    x = K @ X
    return x[:2] / x[2]

# residual function (least_squares squares these entries, so a weight
# lambda applied here acts as lambda^2 in the energy above)
def residuals(d):
    # d: N depths along the rays
    res = []
    # reprojection residuals
    for i in range(N):
        P = d[i] * rays[i]
        p = project_point(P, K)
        res.append(p[0] - u[i, 0])
        res.append(p[1] - u[i, 1])
    # spacing residuals
    lambda_s = 50.0
    for i in range(N - 1):
        Pi = d[i] * rays[i]
        Pj = d[i + 1] * rays[i + 1]
        dist = np.linalg.norm(Pj - Pi)
        res.append(lambda_s * (dist - s[i]))
    # plane-contact residuals: pick indices with largest image y (lowest in image)
    # (assumes image origin at top-left)
    idx_sorted = np.argsort(-u[:, 1])
    for j in idx_sorted[:m]:
        Pj = d[j] * rays[j]
        res.append(100.0 * (plane_n.dot(Pj) + plane_h + R_bead))
    return np.array(res)

# initial guess: simple proportional depth from pixel size, or a constant
d0 = np.ones(N) * 1.0   # 1 m initial depth for every bead
res = least_squares(residuals, d0, method='lm', max_nfev=60, ftol=1e-6)
d_est = res.x

# compute 3D centers
P_est = d_est[:, None] * rays
```

This is the minimal core; expand with covariance, weights, bounds (d > 0), and good initial guesses.
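One caveat on those bounds: SciPy's 'lm' method does not accept the bounds argument, so to keep depths positive switch to the trust-region method:

```python
# method='lm' raises if bounds are passed; 'trf' supports them
res = least_squares(residuals, d0, method='trf',
                    bounds=(1e-3, np.inf), max_nfev=60, ftol=1e-6)
```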
- per-frame: color-segment → centroid + ellipse fit → rays (r_i).
- order by the known color sequence → produce correspondences (u_i \leftrightarrow) bead index.
- build rays, spacing vector (s_i), plane parameters.
- choose (m=\lceil0.3N\rceil). Select candidate contact beads (lowest image y or smallest predicted depth).
- run least_squares solve for depths (d) with spacing + plane penalties.
- compute 3D centers (P_i).
- compute local tangents and build ellipsoid orientation; render/supply ellipsoid geometry to your renderer.
- warm-start next frame from current solution.
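A sketch of that per-frame loop, tying the earlier sketches together; detect_beads, backproject, and ellipsoid_frames are the illustrative helpers from above, COLOR_SEQUENCE is an assumed example, and solve_depths(rays, u, d0) is a hypothetical wrapper around the least-squares solve:

```python
import numpy as np

COLOR_SEQUENCE = ["red", "green", "blue"]   # known left-to-right bead colors (example)

def track_frame(frame, K, R_bead, solve_depths, d_prev=None):
    """One pass of the per-frame loop sketched above."""
    dets = detect_beads(frame)                           # color -> (centroid, ellipse)
    u = np.array([dets[c][0] for c in COLOR_SEQUENCE])   # ordered correspondences
    rays = backproject(K, u)
    d0 = d_prev if d_prev is not None else np.ones(len(u))   # warm start
    d = solve_depths(rays, u, d0)                        # least-squares depth solve
    P = d[:, None] * rays                                # 3D bead centers
    return P, ellipsoid_frames(P, R_bead), d             # d warm-starts the next frame
```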
- If the chain has large curvature and the spacing constraints are tight, the optimization becomes strongly nonlinear and can get stuck in local minima; use a stronger (\lambda_s), or parameterize positions by a spline with along-string arc-length variables to reduce the DOF.
- If beads sit on the plane in multiple poses, try combinatorial selection of which beads touch the plane over a handful of top candidates (cheap if N is small; see the sketch after this list).
- For best speed & robustness: implement solver in C++ + ceres and use analytic Jacobians. Offload color segmentation to GPU.
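A brute-force version of that combinatorial selection, using the same hypothetical solve_with_contacts(contact_idx) -> (d_est, cost) wrapper as earlier; it runs comb(m + k_extra, m) solves, which stays cheap for small N:

```python
from itertools import combinations

import numpy as np

def best_contact_set(u, m, k_extra, solve_with_contacts):
    """Exhaustively try size-m contact sets drawn from the (m + k_extra)
    beads lowest in the image; return the lowest-cost (d_est, cost)."""
    candidates = np.argsort(-u[:, 1])[:m + k_extra]
    attempts = [solve_with_contacts(np.array(c))
                for c in combinations(candidates, m)]
    return min(attempts, key=lambda a: a[1])
```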
If you want, I can:
- provide a fuller Python reference implementation (color segmentation → centroid → solver → ellipsoid fit) that you can run on example frames,
- give the analytic Jacobian for the residuals (for speed),
- or port the solver to a ceres-based C++ snippet for real-time 60Hz+.