ENH: Add Eigen-backed itk::Math::SVD and migrate ITK's vnl_svd consumers#6487
ENH: Add Eigen-backed itk::Math::SVD and migrate ITK's vnl_svd consumers#6487hjmjohnson wants to merge 7 commits into
Conversation
|
| Operation | Why sign-invariant |
|---|---|
U·Vᵀ (polar / closest rotation) |
(−uᵢ)(−vᵢ)ᵀ = uᵢvᵢᵀ |
pinverse = V Σ⁺ Uᵀ |
paired sign flip cancels |
Solve = V Σ⁺ Uᵀ b |
paired sign flip cancels |
recompose = U Σ Vᵀ |
paired sign flip cancels |
This is the key enabling migration of real code with zero baseline churn.
3.4 Zero-copy data path
ITK/vnl storage is row-major and contiguous; Eigen defaults to column-major. We map the existing buffer with Eigen::Map<…, Eigen::RowMajor> (no allocation, no copy) and let Eigen's storage-order-aware assignment fill the vnl output.
4. Experiments (accuracy [T1], square perf [T2/Fig 2], rectangular perf [T3])
4.1 Accuracy
| Check | Result |
|---|---|
Reconstruction ‖UΣVᵀ − A‖ |
≤ machine precision |
Singular values vs vnl_svd |
agree to ε·max(W) |
pinverse vs vnl_svd (tall/wide) |
rel. 1e-15…8e-15 |
Moore–Penrose A·A⁺·A = A |
holds |
| Large-size accuracy | BDCSVD marginally better than vnl_svd |
Table 1. Accuracy validation (GoogleTest itkMathSVDGTest.cxx); ill-conditioned (6×6 Hilbert, κ≈1e7) and degenerate (repeated σ) spectra tested explicitly.
4.2 Performance — square (single-thread, idle host, AppleClang arm64)
| Size | Eigen ÷ vnl_svd | Winner |
|---|---|---|
| 3×3 | 2.2× | Eigen |
| 4×4 | 2.0× | Eigen |
| 6×6 | ~1.0× | parity |
| 8×8 | ~0.7× | vnl_svd/LINPACK |
| 16×16 | ~0.6× | vnl_svd/LINPACK |
| ≥50×50 | 1.5–2.3× | Eigen (BDCSVD) |
Table 2. Square SVD throughput ratio (>1 = Eigen faster).
Figure 2 — Square speedup (Eigen ÷ vnl_svd), single-thread, idle
0.5× 1.0×(parity) 2.0×
| | |
3×3 ████████████████████████████████████████ 2.2× ▲ Eigen
4×4 ████████████████████████████████████ 2.0× │ wins
6×6 ██████████████████ 1.0× ── parity
8×8 ████████████ 0.7× │ vnl
16×16 ██████████ 0.6× ▼ wins
50×50 ███████████████████████████ 1.5× ▲ Eigen
200×200 ████████████████████████████████████████ 2.3× │ wins
The only band where Eigen loses (8×8–16×16 dense square) is seldom — perhaps never — exercised in ITK, whose hot paths are 3×3/4×4 transforms and image-orientation (Eigen ~2×) plus occasional large dense systems (BDCSVD wins). LINPACK may hold a hardware-alignment advantage at the mid band, not yet explored (§7).
4.3 Performance — rectangular (single-thread, load-noted)
| Shape | Eigen ÷ vnl_svd | pinverse err |
|---|---|---|
| 10×3 | 0.88× | 1e-15 |
| 20×6 | 0.74× | 1e-15 |
| 50×10 | 0.89× | 2e-15 |
| 100×20 | 1.96× | 4e-15 |
| 200×50 | 2.29× | 8e-15 |
| 3×10 (wide) | 1.51× | 5e-16 |
| 10×40 (wide) | 5.57× | 2e-15 |
Table 3. Rectangular BDCSVD (thin U/V) vs vnl_svd pinverse — Eigen wins larger-tall and all wide shapes; loses only tiny-tall where setup dominates. (Methodology: rectangular measured at host load ~13.5 — read as ratios; square set measured idle.)
5. Migration & validation (sites [T4], tolerance proofs [T5], build/test matrix [T6], review)
5.1 Migrated call sites
| File | Shape | Operation | Invariance |
|---|---|---|---|
itkRigid2DTransform.hxx |
2×2 | U·Vᵀ (orthogonalize) |
sign-invariant |
itkNiftiImageIO.cxx |
3×3 | U·Uᵀ, U·Vᵀ, min(W) |
sign-invariant |
itkKernelTransform.hxx |
square | pinverse (abs 1e-8 tol) |
sign-indep. |
itkDisplacementFieldTransform.hxx |
square | pinverse (per-voxel inverse) |
sign-indep. |
itkFEMLinearSystemWrapperDenseVNL.cxx |
square | Solve |
sign-indep. |
itkQuadEdgeMeshDecimationQuadricElementHelper.h |
small square | recompose/rank/Solve |
sign-indep. |
itkBSplineControlPointImageFilter.hxx |
rectangular | pinverse · S |
sign-indep. |
itkBSplineScatteredDataPointSetToImageFilter.hxx |
rectangular | pinverse · S |
sign-indep. |
Table 4. Every square and rectangular vnl_svd consumer in ITK proper.
5.2 Tolerance faithfulness
Original vnl_svd form |
Migrated form | Equivalence |
|---|---|---|
vnl_svd(A) (default zero_out_tol=0) |
pinverse(0) / Solve(b,0) |
tol = 0·max(W) = 0 ⇒ keep all nonzero σ |
vnl_svd(L, 1e-8) |
pinverse(1e-8/W[0]) |
rcond·max(W) = 1e-8 (absolute) |
ctor absTol + zero_out_relative(relTol) |
rcond = max(absTol/W[0], relTol) |
in-place zeroing ⇒ net max(absTol, relTol·σmax) |
singularities()==0 (tol 1e-8) |
W.min_value() > 1e-8 |
identical boundary |
Table 5. Tolerance reproduction (independently verified in code review).
5.3 Build & test matrix
| Platform / compiler | Build | Warnings (changed code) | Tests |
|---|---|---|---|
| macOS / AppleClang arm64 | ✅ exit 0 | 0 | full 5973/5973 |
| Linux (cortex) / gcc 13.3 | ✅ exit 0 | 0 | default 4487/4487 |
Table 6. Two-compiler validation. (The 4 macOS warnings observed are pre-existing -Wunused-parameter in the RANSAC remote module, independent of this PR.) CI on #6487 extends coverage to MSVC/Windows, ARM, and the Azure pipelines.
5.4 Code review
Multiple maximum-effort multi-agent passes (line-by-line, removed-behavior, cross-file, C++/Eigen pitfalls, conventions) found zero live correctness bugs; the removed-behavior pass independently re-verified all eight migrations faithful, and two plausible findings were refuted on inspection.
6. Discussion — the deprecation gate
The umbrella rule for replacing a vnl_* engine with Eigen is that Eigen must win both accuracy and speed. Here:
- Accuracy: Eigen ≥
vnl_svdeverywhere (equal, better when ill-conditioned). ✔ - Speed: size-mixed — wins small (≤4×4), large (≥50×50), all wide; loses the 8×8–16×16 dense band. ✘ (not a clean sweep)
Therefore this PR does not deprecate or re-back vnl_svd. It ships the Eigen path as the opt-in itk::Math::SVD, migrates ITK consumers where results are provably unchanged, and documents the size-dependent performance so a future deprecation can be argued from data.
7. Future work
- Threshold tuning. Re-derive
kJacobiMaxDim/kFixedSVDMaxDimfrom a cross-platform (arm64 + x86_64) sweep of n = 5…20 and the fixed VDim=16 boundary; benchmark the untested 7×7 and probe whether LINPACK's mid-band edge is a hardware-alignment effect Eigen can match. vnl_svdAPI deprecation plan.nullvector/well_condition/singularitieshave zero ITK callers; remaining direct users are DTI reconstruction (PERF: Hoist constant tensor-basis SVD out of the per-voxel DTI loop #6482/PERF: Hoist constant tensor-basis SVD out of the per-voxel DTI loop (backport of #6482) #6483), theImageRegistration9example, and one numerics test — migrate those, then add the three-state deprecation guard.- Feed both into the broader SVD whitepaper.
8. References
- ITKv7 numerics direction — ENH: Umbrella — stage VNL → Eigen numerics migration for ITKv7 #6403
- Eigen3 third-party design / no transitive Eigen exposure — ENH: Master tracking — Eigen3 third-party design for ITK 6.x #6230
- Precedents:
vnl_fft→ PocketFFT (ENH: PocketFFT as a permissive, dependency-free default FFT backend #6441),vnl_matrix_exp→itk::Math::MatrixExponential(ENH: Add itk::Math::MatrixExponential (Eigen) + restore vnl_matrix_exp shim (#6452) #6454), eispack → Eigen eigensystems (ENH: Eigen-backed itk eigendecompositions; deprecate vnl_*_eigensystem and eispack #6475) - This PR — ENH: Add Eigen-backed itk::Math::SVD and migrate ITK's vnl_svd consumers #6487
|
This is ready for accepting comments. In addition to ITK internal consistent (non-regression) testing, some informal testing with ANTs and BRAINSTools has also been done. I'm working on a more formal test against a "forest" of downstream source trees (Slicer/ANTs/BRAINSTools) . |
23e72d3 to
dd4f7d6
Compare
This comment was marked as resolved.
This comment was marked as resolved.
Add an opt-in Eigen-backed singular value decomposition for square matrices, following the itk::Math pattern of MatrixExponential and QRDecomposition. Eigen appears only in the detail layer; the public API returns vnl matrices/vectors so no Eigen type is exposed (InsightSoftwareConsortium#6230). Fixed compile-time sizes use JacobiSVD with NoQRPreconditioner (valid and cheaper for square inputs) over a zero-copy Eigen::Map; runtime sizes select JacobiSVD for small n and BDCSVD for larger n; large fixed sizes fall back to the runtime path. A failed/non-finite decomposition throws via svd.info(). Sign canonicalization reuses the shared itk::detail::CanonicalizeColumnSignsPaired helper (extended here with a paired overload so U and V flip in lockstep), making results reproducible across builds and platforms -- an added benefit over vnl_svd at negligible cost. The result exposes vnl_svd's dominant operations -- pinverse(), Solve(), rank() and recompose(). Their default rcond truncates singular values at k*epsilon*max(W) rather than vnl_svd's keep-all-nonzero default; callers needing exact vnl equivalence pass rcond = 0. The two differ only for singular values within machine epsilon of zero, so the both-axes (accuracy and speed) equivalence criterion is very nearly met and is almost certainly not impactful on the validity of results. A v6 migration-guide notice describes the size-dependent performance (faster at 3x3/4x4 and large sizes, parity/slightly slower in the 8-16 band) and the deterministic-sign behavior. This adds the new API only and routes no existing caller, so vnl_svd is unchanged and not deprecated, and all baselines are unaffected.
…zation Route the square SVD orthogonalization sites to the new Eigen-backed itk::Math::SVD: the NIFTI sform/qform similarity check and non-orthogonal-sform fix-up (3x3), and Rigid2DTransform matrix-parameter extraction (2x2). These use only sign-invariant combinations -- U*V^T, the transpose of an orthogonal U in place of its inverse, and singular-value comparison -- so the geometric results are unchanged. The full test suite passes with no baseline changes.
…M solves Migrate three square SVD solve/inverse sites to the Eigen-backed itk::Math::SVD: - KernelTransform::ComputeWMatrix solves the square L system; vnl_svd's 1e-8 absolute zero-tolerance is reproduced as a relative rcond on pinverse(). - DisplacementFieldTransform inverse-position Jacobian: svd.inverse() becomes pinverse(), which also hoists the inverse out of the former per-element recomputation. - LinearSystemWrapperDenseVNL solves the square dense FEM system via Solve(). solve()/inverse() yield the unique least-squares / minimum-norm result independent of the SVD sign convention, so the outputs are unchanged; the full test suite passes (5971/5971).
Add a threshold-aware recompose() to the SVD result: reconstruct U diag(w) V^T with singular values at or below rcond*max(w) zeroed (a truncated, rank-reduced reconstruction). Route the QuadEdge decimation quadric helper -- its recompose(), rank() and solve() calls -- to itk::Math::SVD. vnl_svd's absolute (constructor) and relative (zero_out_relative) zero-tolerances are combined into one relative rcond. recompose/rank/solve are sign-invariant, so the decimation results are unchanged; the full test suite passes (5972/5972).
A runtime-sized vnl_matrix of any shape is now accepted. A single dimension comparison dispatches square inputs to the existing fast path (JacobiSVD + NoQRPreconditioner / BDCSVD) and rectangular inputs to BDCSVD with thin U/V (U is m x k, V is n x k, k = min(m, n)), so the square fast path is unchanged. The detail pinverse/recompose helpers and the shared CanonicalizeColumnSignsPaired helper now use each factor's own row count, making them shape-general; pinverse(), Solve(), rank() and recompose() work for rectangular inputs unchanged. The dynamic result struct is renamed SquareSVDResult -> SVDResult to reflect both shapes. Fixed-size overloads remain square-only by signature. Tests cover tall/wide reconstruction, pinverse and least-squares Solve vs vnl_svd.
Route the rectangular lattice-refinement solves in BSplineControlPointImageFilter and BSplineScatteredDataPointSetToImageFilter to itk::Math::SVD: vnl_svd(R).solve(S) becomes itk::Math::SVD(R).pinverse() * S (the least-squares solution for the rectangular refinement matrix R). The result is sign-invariant, so the refined coefficients are unchanged; the full test suite passes (5973/5973).
Rename the FixedSquareSVDResult and SVDResult members pinverse/rank/ recompose to PseudoInverse/Rank/Recompose, matching ITK method-naming style and the already-PascalCase Solve(). Update the GoogleTest, the KernelTransform, DisplacementField, B-spline control-point and scattered-data, and QuadEdge decimation consumers, and the migration guide. Drop the unmeasured 7x7 row and its speculative note from the benchmark table; the mid-size hardware-alignment caveat stays on the 8x8/16x16 rows that carry measurements.
dd4f7d6 to
5612174
Compare
|
Downstream indicator testing (ANTs + BRAINSTools) — no regressions; no perf change resolvable above noise. I built two downstream consumers against this PR's base and tip and compared. Net: this change is safe downstream (no correctness regressions); a performance effect, if any, is below what this setup can measure. Setup (clean isolation)This PR's merge-base is the
Method: serial Correctness — no regressions
Performance — within measurement noiseEvery SVD-relevant per-test delta is smaller than that test's CV (12–25% on the heavy deformable tests). E.g. Caveat worth stating: neither consumer was modified to call |
Adds an opt-in, header-only Eigen-backed
itk::Math::SVD(square and rectangular) and migrates every square and rectangularvnl_svdconsumer in ITK proper onto it. All migrations use sign-invariant operations (U·Vᵀ,pinverse,Solve,recompose), so test baselines are unchanged. This is additive —vnl_svdis not deprecated here. Part of the vnl→Eigen numerics direction (#6403, #6230); follows theitk::Mathpattern ofMatrixExponential/QRDecomposition.What's included
Modules/Core/Common/include/itkMathSVD.h:itk::Math::SVDreturningU/W/Vas vnl types pluspinverse(),Solve(),rank(),recompose(). Eigen appears only in thedetaillayer (no Eigen type in any public signature). Fixed square sizes useJacobiSVD+NoQRPreconditionerover a zero-copyEigen::Map; runtime square uses Jacobi (small n) /BDCSVD(large n); rectangular usesBDCSVDthin U/V. Throws on a non-finite decomposition viasvd.info().itkEigenDecompositionSignConvention.h), default on, for cross-platform/SIMD-stable signs.rcond=0reproduces vnl's keep-all default,1e-8/wmaxreproduces an absolute 1e-8):Rigid2DTransform,NiftiImageIO(orientation),KernelTransform,DisplacementFieldTransform,FEM LinearSystemWrapperDenseVNL,QuadEdgeMeshDecimationQuadricElementHelper, and both B-spline lattice-refinement filters.itkMathSVDGTest.cxx(square, rectangular, ill-conditioned/Hilbert, degenerate spectrum, pinverse/Solve/rank/recompose vsvnl_svd, canonicalize toggle) and a v6 migration-guide section.Performance (equal accuracy)
Idle, single-thread, AppleClang arm64; ratio > 1 = Eigen faster:
Accuracy is machine-precision-equivalent to
vnl_svd(BDCSVD marginally better at large sizes).Verification