Doable in real time with a compact pipeline and a small nonlinear solver per frame. I'll give a complete, practical approach you can implement right away:
Assumptions I'll make (don't worry if some don't hold; I include fallbacks):
- camera intrinsics (K) are known (focal length (f), principal point). If not, I show a fallback.
- each bead has known physical radius (R) and the known center-to-center spacing sequence (s_i).
- each bead has a unique color and the colors appear in the same sequence left→right in the string (you said sequence is known).
- you have a measured ground plane (plane equation in camera coords) ( \mathbf{n}^\top \mathbf{X} + h = 0) or at least the height (z_{plane}) in camera coordinates.
- beads are nearly spherical (we’ll model them as spheroids/ellipsoids when needed).
Outline: detect bead 2D centers → assign by unique color/sequence → build 3D position solver that fits depths along camera rays subject to spacing constraints and plane-contact constraints (≥30% touching) → fit oriented ellipsoids for visualization using local tangent + radius → per-frame warm-start + few LM iterations for real-time.
- Convert the image to an appropriate color space (HSV or CIELAB) and threshold per known bead color to get masks.
- For each color mask:
  - morphological opening/closing to clean up noise.
  - find contours; compute a sub-pixel centroid (image moments or an elliptical fit) and the image ellipse (cv2.fitEllipse) for ellipse axes & orientation.
- You now have image points (u_i = (x_i,y_i)) and a 2D ellipse per bead (if you want shape cues).
- Order beads according to the known color sequence (so you have correspondences between image points and physical bead indices).
Implementation tip: use a tiny per-color lookup table (HSV ranges), and filter by expected area to reject false detections. Warm start color thresholds adaptively if lighting shifts.
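A minimal detection sketch under these assumptions; the HSV ranges, area bounds, and the helper name detect_beads are placeholders you would tune and rename for your setup:

```python
import cv2
import numpy as np

# Hypothetical per-color HSV ranges (tune to your beads and lighting)
HSV_RANGES = {
    "red":   ((0, 120, 80),  (10, 255, 255)),
    "green": ((45, 80, 80),  (75, 255, 255)),
    "blue":  ((100, 80, 80), (130, 255, 255)),
}
MIN_AREA, MAX_AREA = 30, 5000   # expected bead area in px^2, to reject noise

def detect_beads(frame_bgr):
    """Return {color: (centroid_xy, ellipse)} for each detected bead."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    out = {}
    for color, (lo, hi) in HSV_RANGES.items():
        mask = cv2.inRange(hsv, np.array(lo), np.array(hi))
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
        mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
        cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        cnts = [c for c in cnts if MIN_AREA < cv2.contourArea(c) < MAX_AREA]
        if not cnts:
            continue
        c = max(cnts, key=cv2.contourArea)       # largest plausible blob per color
        M = cv2.moments(c)
        centroid = (M["m10"] / M["m00"], M["m01"] / M["m00"])
        ellipse = cv2.fitEllipse(c) if len(c) >= 5 else None  # fitEllipse needs >= 5 pts
        out[color] = (centroid, ellipse)
    return out
```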
For a bead image point (u_i = (x_i,y_i)) and intrinsics (K), compute normalized camera ray [ \mathbf{r}_i = K^{-1} [x_i, y_i, 1]^\top ] and unit ray direction (\hat{\mathbf r}_i = \mathbf{r}_i/|\mathbf{r}_i|).
Unknown 3D bead center (P_i) lies on the ray: (P_i = d_i , \hat{\mathbf r}_i) where (d_i>0) is the depth along ray.
If intrinsics are unknown:
- use the known physical bead radius (R) and its apparent radius in pixels (r^{px}) via the relation (d \approx f \cdot R / r^{px}). If (f) itself is unknown, bootstrap with a reasonable guess (e.g. from sensor width / field of view) or run a short calibration with a calibration pattern. This is less accurate but usable.
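A small sketch of the backprojection and the radius-based depth fallback (the helper names are illustrative):

```python
import numpy as np

def backproject(K, pts_2d):
    """Unit camera rays for Nx2 pixel points (x, y)."""
    pts_h = np.column_stack([pts_2d, np.ones(len(pts_2d))])   # Nx3 homogeneous
    rays = (np.linalg.inv(K) @ pts_h.T).T                     # Nx3, un-normalized
    return rays / np.linalg.norm(rays, axis=1, keepdims=True)

def depth_from_radius(f_px, R_m, r_px):
    """Rough depth from apparent bead radius: d ~ f * R / r_px."""
    return f_px * R_m / r_px

# Example with assumed values: 600 px focal length, 5 mm bead seen at 8 px radius
# d0 = depth_from_radius(600.0, 0.005, 8.0)   # ~0.375 m, a usable initial guess
```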
We want 3D positions (P_i(d_i)) that
- project near the observed (u_i) (data term),
- satisfy center-to-center spacing (||P_{i+1}-P_i|| = s_i) (geometric constraint),
- enforce that at least (m = \lceil 0.3 N\rceil) beads are contacting the ground plane: for those beads, enforce the plane-contact constraint ( \mathbf{n}^\top P_j + h = -R) (touching plane at distance R along normal). (Note: plane equation sign depends on convention; adapt to your measured plane.)
Because selecting which beads touch the plane is combinatorial, use this practical approach:
- heuristic selection: pick the (m) beads with largest image y (lowest on image) OR smallest depth from an initial solve; these are likely to be on the plane. Force those to satisfy the contact constraint (hard constraint).
- optionally run two attempts: pick the top-m by image y and try solver; if residuals high, try top-m by depth; pick the lower residual result.
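A minimal sketch of that two-attempt strategy, assuming a hypothetical solve_with_contacts(contact_idx) helper that wraps the least-squares solve shown later and returns (d_est, cost):

```python
import numpy as np

def pick_contact_set(u, d0, m, solve_with_contacts):
    """Try two candidate contact sets and keep the lower-residual solve.

    u: Nx2 image points; d0: initial depths from a first pass; m: number of
    beads forced onto the plane. solve_with_contacts(idx) -> (d_est, cost)
    is a hypothetical wrapper around the solver sketched later.
    """
    by_image_y = np.argsort(-u[:, 1])[:m]   # lowest in the image (origin top-left)
    by_depth = np.argsort(d0)[:m]           # smallest depth from the initial solve
    attempts = [solve_with_contacts(idx) for idx in (by_image_y, by_depth)]
    return min(attempts, key=lambda a: a[1])  # keep the (d_est, cost) with lower cost
```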
Mathematical optimization (unknown vector (d = [d_1..d_N])):
Minimize
[ \mathcal{E}(d)= \sum_i w_i \,\big\| \pi(d_i \hat r_i) - u_i \big\|^2 + \lambda_s \sum_{i} \big(\|d_{i+1}\hat r_{i+1} - d_i \hat r_i\| - s_i\big)^2 + \lambda_p \sum_{j\in\mathcal{P}} \big(\mathbf{n}^\top (d_j \hat r_j) + h + R \big)^2 ]
where (\pi(\cdot)) is projection with intrinsics (K), (\mathcal{P}) is the set of indices forced to touch the plane, (w_i) are weights from ellipse confidence, and (\lambda_s,\lambda_p) are penalty weights.
Use scipy.optimize.least_squares (Levenberg–Marquardt or trust-region-reflective) to solve for (d). Per-frame: warm-start with previous (d).
This optimization is small (N variables, where N is the number of beads, usually tens), so it runs quickly in C++/Ceres or Python+NumPy with a few LM iterations. In optimized C++ with Ceres you'll get >30 Hz easily for N ≈ 20; in Python expect 10–30 Hz depending on the machine if you limit iterations.
Once bead centers (P_i) are known, align an ellipsoid (center (P_i), rotation (R_i), axes (a,b,c)):
Heuristic practical model:
- Principal axis of the ellipsoid aligned with the local tangent of the string: compute the tangent (T_i) as normalized (P_{i+1}-P_{i-1}) (endpoints use the nearest neighbor).
- Choose one principal axis along (T_i) and the other two perpendicular to it; set the two transverse axes equal to the physical bead radius (R) (so (a=R, b=R)).
- The axis along the tangent, (c), can be set to (R) for almost-spherical beads, or adjusted to better match the observed 2D ellipse: project the 3D ellipsoid into the image and solve for the (c) that best matches the observed ellipse's major/minor axis ratio. There is a closed-form mapping from a 3D ellipsoid (symmetric positive-definite matrix (Q)) to its projected 2D conic, so this reduces to a low-dimensional scalar least-squares solve.
If beads are nearly spherical, using sphere (all axes = R) is simpler and robust.
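A minimal sketch of the tangent-aligned frame construction (sphere-like axes by default; ellipsoid_frames is an illustrative name, and it assumes N >= 2 so tangents are defined):

```python
import numpy as np

def ellipsoid_frames(P, R_bead):
    """Per-bead rotation matrices with the first axis along the local tangent.

    P: Nx3 bead centers. Returns a list of (center, R_i, axes) tuples, with
    the transverse axes set to the bead radius (sphere-like default).
    """
    N = len(P)
    frames = []
    for i in range(N):
        j0, j1 = max(i - 1, 0), min(i + 1, N - 1)   # endpoints use nearest neighbor
        T = P[j1] - P[j0]
        T = T / np.linalg.norm(T)
        # Gram-Schmidt: pick a helper vector not parallel to T, orthonormalize
        a = np.array([0.0, 0.0, 1.0]) if abs(T[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
        U = a - (a @ T) * T
        U = U / np.linalg.norm(U)
        V = np.cross(T, U)
        R_i = np.column_stack([T, U, V])            # columns = principal axes
        frames.append((P[i], R_i, (R_bead, R_bead, R_bead)))
    return frames
```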
To render an ellipsoid so its projection matches the observed fitted 2D ellipse more closely, use the standard result that the image of a quadric is a conic (see Hartley & Zisserman, Multiple View Geometry). Practically, for each bead:
- compute the image ellipse matrix (C_i) from cv2.fitEllipse
- set up the relation between the 3D quadric (Q_i) and the image dual conic, (C_i^* \propto P \, Q_i^{-1} \, P^\top) with (P = K\,[R_i \mid t_i]), and solve for the unknown scale in the ellipsoid axes. That is heavier algebraically but yields better visual alignment.
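A sketch of that projection using the dual-quadric relation (with (Q^* \propto Q^{-1}) for a nondegenerate ellipsoid); the bead center is assumed to already be in camera coordinates, so the projection matrix is (K\,[I \mid 0]), and the function names are illustrative:

```python
import numpy as np

def projected_axis_ratio(K, center, R_i, axes):
    """Major/minor axis ratio of the image conic of a 3D ellipsoid given
    in camera coordinates (projection matrix P = K [I | 0])."""
    a, b, c = axes
    A = R_i @ np.diag([1.0/a**2, 1.0/b**2, 1.0/c**2]) @ R_i.T
    p = np.asarray(center, dtype=float)
    Ap = A @ p
    Q = np.block([[A, -Ap[:, None]],
                  [-Ap[None, :], np.array([[p @ Ap - 1.0]])]])   # 4x4 quadric
    P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    C_star = P @ np.linalg.inv(Q) @ P.T     # dual conic, up to scale
    C = np.linalg.inv(C_star)               # point conic of the image ellipse
    lam = np.abs(np.linalg.eigvalsh(C[:2, :2]))   # semi-axes scale as 1/sqrt(|lam|)
    return np.sqrt(lam.max() / lam.min())   # major/minor ratio, >= 1

def fit_tangent_axis(K, center, R_i, R_bead, observed_ratio):
    """1-D scan for the tangent-axis length c that matches the image ellipse."""
    cs = np.linspace(0.5 * R_bead, 2.0 * R_bead, 50)
    ratios = [projected_axis_ratio(K, center, R_i, (R_bead, R_bead, c)) for c in cs]
    return cs[np.argmin(np.abs(np.array(ratios) - observed_ratio))]
```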
- Warm start with previous frame's (d).
- Solve only when beads moved or every k-th frame; use optical flow for tiny motions and skip heavy optimization.
- Use analytic Jacobians for the LM solver for speed (a sketch is given after this list).
- C++ (ceres-solver) implementation recommended for production; Python scipy.optimize.least_squares is fine for prototyping.
- If you have a GPU & CUDA, move image processing to the GPU (OpenCV CUDA) and run the solver on the CPU.
- If occlusion or missed beads: solve with only visible beads; use spacing penalty to keep chain consistent and infer occluded bead positions.
- If beads move relative to string (rotation), rely on color sequence — that gives strong correspondence.
- If lighting changes, adapt HSV thresholds with small per-frame feedback (tune via running mean of bead colors).
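For reference, a sketch of the analytic Jacobian matching the residual ordering in the solver sketch below (2N reprojection rows, then N-1 spacing rows, then m plane-contact rows). It assumes a standard zero-skew K, and contact_idx must match the selection used in residuals(); bind the constants in a lambda and pass it via jac=:

```python
import numpy as np

def jacobian(d, rays, K, m, contact_idx, plane_n,
             lambda_s=50.0, lambda_p=100.0):
    """Analytic Jacobian of the residuals() sketched below, same row order."""
    N = len(d)
    fx, fy = K[0, 0], K[1, 1]
    J = np.zeros((2 * N + (N - 1) + m, N))
    P = d[:, None] * rays                       # current 3D centers
    # reprojection p = (fx*X/Z + cx, fy*Y/Z + cy); dP/dd_i = r_i
    for i in range(N):
        X, Y, Z = P[i]
        dp_dP = np.array([[fx / Z, 0.0, -fx * X / Z**2],
                          [0.0, fy / Z, -fy * Y / Z**2]])
        J[2 * i:2 * i + 2, i] = dp_dP @ rays[i]
    # spacing rows: lambda_s * (||P_{i+1} - P_i|| - s_i)
    for i in range(N - 1):
        v = P[i + 1] - P[i]
        nv = np.linalg.norm(v)
        J[2 * N + i, i] = -lambda_s * (v @ rays[i]) / nv
        J[2 * N + i, i + 1] = lambda_s * (v @ rays[i + 1]) / nv
    # plane rows: lambda_p * (n . (d_j r_j) + h + R) -> d/dd_j = lambda_p * (n . r_j)
    for k, j in enumerate(contact_idx):
        J[2 * N + (N - 1) + k, j] = lambda_p * (plane_n @ rays[j])
    return J
```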
This is a compact, runnable sketch for the depth solver. You’ll need OpenCV and scipy.
```python
import numpy as np
import cv2  # used by the detection step; the solver itself only needs NumPy/SciPy
from scipy.optimize import least_squares

# inputs (per frame)
K = ...        # 3x3 intrinsics
rays = ...     # Nx3 unit rays in camera coords (backprojected image points)
u = ...        # Nx2 image points (x, y)
s = ...        # (N-1) center-to-center spacings in meters
R_bead = 0.005                    # bead radius in meters (example)
plane_n = np.array([0, 0, 1.0])   # plane normal in camera coords (example)
plane_h = -0.0                    # plane constant: n^T X + h = 0 (adapt to your plane)

N = len(rays)
m = int(np.ceil(0.3 * N))         # how many beads must touch the plane

# helper: project a 3D point to the image
def project_point(X, K):
    x = K @ X
    return x[:2] / x[2]

# residual function (least_squares squares these entries, so a weight
# lambda applied here acts as lambda^2 in the energy above)
def residuals(d):
    # d: N depths along the rays
    res = []
    # reprojection residuals
    for i in range(N):
        P = d[i] * rays[i]
        p = project_point(P, K)
        res.append(p[0] - u[i, 0])
        res.append(p[1] - u[i, 1])
    # spacing residuals
    lambda_s = 50.0
    for i in range(N - 1):
        Pi = d[i] * rays[i]
        Pj = d[i + 1] * rays[i + 1]
        dist = np.linalg.norm(Pj - Pi)
        res.append(lambda_s * (dist - s[i]))
    # plane-contact residuals: pick indices with largest image y (lowest in image)
    # (assumes image origin at top-left)
    idx_sorted = np.argsort(-u[:, 1])
    for j in idx_sorted[:m]:
        Pj = d[j] * rays[j]
        res.append(100.0 * (plane_n.dot(Pj) + plane_h + R_bead))
    return np.array(res)

# initial guess: simple proportional depth from pixel size, or a constant
d0 = np.ones(N) * 1.0   # 1 m initial depth for every bead
res = least_squares(residuals, d0, method='lm', max_nfev=60, ftol=1e-6)
d_est = res.x

# compute 3D centers
P_est = d_est[:, None] * rays
```

This is the minimal core; expand with covariance, weights, bounds (d > 0), and good initial guesses.
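One caveat on those bounds: SciPy's 'lm' method does not accept the bounds argument, so to keep depths positive switch to the trust-region method:

```python
# method='lm' raises if bounds are passed; 'trf' supports them
res = least_squares(residuals, d0, method='trf',
                    bounds=(1e-3, np.inf), max_nfev=60, ftol=1e-6)
```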
- per-frame: color-segment → centroid + ellipse fit → rays (r_i).
- order by the known color sequence → produce correspondences (u_i \leftrightarrow) bead index.
- build rays, spacing vector (s_i), plane parameters.
- choose (m=\lceil0.3N\rceil). Select candidate contact beads (lowest image y or smallest predicted depth).
- run least_squares solve for depths (d) with spacing + plane penalties.
- compute 3D centers (P_i).
- compute local tangents and build ellipsoid orientation; render/supply ellipsoid geometry to your renderer.
- warm-start next frame from current solution.
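A sketch of that per-frame loop, tying the earlier sketches together; detect_beads, backproject, and ellipsoid_frames are the illustrative helpers from above, COLOR_SEQUENCE is an assumed example, and solve_depths(rays, u, d0) is a hypothetical wrapper around the least-squares solve:

```python
import numpy as np

COLOR_SEQUENCE = ["red", "green", "blue"]   # known left-to-right bead colors (example)

def track_frame(frame, K, R_bead, solve_depths, d_prev=None):
    """One pass of the per-frame loop sketched above."""
    dets = detect_beads(frame)                           # color -> (centroid, ellipse)
    u = np.array([dets[c][0] for c in COLOR_SEQUENCE])   # ordered correspondences
    rays = backproject(K, u)
    d0 = d_prev if d_prev is not None else np.ones(len(u))   # warm start
    d = solve_depths(rays, u, d0)                        # least-squares depth solve
    P = d[:, None] * rays                                # 3D bead centers
    return P, ellipsoid_frames(P, R_bead), d             # d warm-starts the next frame
```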
- If the chain has large curvature and the spacing constraints are tight, the optimization becomes strongly nonlinear and can get stuck in local minima; use a stronger (\lambda_s), or parameterize positions by a spline with along-string arc-length variables to reduce the DOF.
- If beads sit on the plane in multiple poses, try combinatorial selection of which beads touch the plane over a handful of top candidates (cheap if N is small; see the sketch after this list).
- For best speed & robustness: implement solver in C++ + ceres and use analytic Jacobians. Offload color segmentation to GPU.
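A brute-force version of that combinatorial selection, using the same hypothetical solve_with_contacts(contact_idx) -> (d_est, cost) wrapper as earlier; it runs comb(m + k_extra, m) solves, which stays cheap for small N:

```python
from itertools import combinations

import numpy as np

def best_contact_set(u, m, k_extra, solve_with_contacts):
    """Exhaustively try size-m contact sets drawn from the (m + k_extra)
    beads lowest in the image; return the lowest-cost (d_est, cost)."""
    candidates = np.argsort(-u[:, 1])[:m + k_extra]
    attempts = [solve_with_contacts(np.array(c))
                for c in combinations(candidates, m)]
    return min(attempts, key=lambda a: a[1])
```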
If you want, I can:
- provide a fuller Python reference implementation (color segmentation → centroid → solver → ellipsoid fit) that you can run on example frames,
- give the analytic Jacobian for the residuals (for speed),
- or port the solver to a ceres-based C++ snippet for real-time 60Hz+.