Skip to content

ENH: Add Eigen-backed itk::Math::SVD and migrate ITK's vnl_svd consumers#6487

Open
hjmjohnson wants to merge 7 commits into
InsightSoftwareConsortium:mainfrom
hjmjohnson:add-itk-math-svd
Open

ENH: Add Eigen-backed itk::Math::SVD and migrate ITK's vnl_svd consumers#6487
hjmjohnson wants to merge 7 commits into
InsightSoftwareConsortium:mainfrom
hjmjohnson:add-itk-math-svd

Conversation

@hjmjohnson

Copy link
Copy Markdown
Member

Adds an opt-in, header-only Eigen-backed itk::Math::SVD (square and rectangular) and migrates every square and rectangular vnl_svd consumer in ITK proper onto it. All migrations use sign-invariant operations (U·Vᵀ, pinverse, Solve, recompose), so test baselines are unchanged. This is additivevnl_svd is not deprecated here. Part of the vnl→Eigen numerics direction (#6403, #6230); follows the itk::Math pattern of MatrixExponential/QRDecomposition.

What's included
  • New API Modules/Core/Common/include/itkMathSVD.h: itk::Math::SVD returning U/W/V as vnl types plus pinverse(), Solve(), rank(), recompose(). Eigen appears only in the detail layer (no Eigen type in any public signature). Fixed square sizes use JacobiSVD+NoQRPreconditioner over a zero-copy Eigen::Map; runtime square uses Jacobi (small n) / BDCSVD (large n); rectangular uses BDCSVD thin U/V. Throws on a non-finite decomposition via svd.info().
  • Sign canonicalization shared with the eigendecomposition convention (itkEigenDecompositionSignConvention.h), default on, for cross-platform/SIMD-stable signs.
  • Migrated consumers (all sign-invariant; bit-faithful tolerance reproduction — rcond=0 reproduces vnl's keep-all default, 1e-8/wmax reproduces an absolute 1e-8): Rigid2DTransform, NiftiImageIO (orientation), KernelTransform, DisplacementFieldTransform, FEM LinearSystemWrapperDenseVNL, QuadEdgeMeshDecimationQuadricElementHelper, and both B-spline lattice-refinement filters.
  • GoogleTest itkMathSVDGTest.cxx (square, rectangular, ill-conditioned/Hilbert, degenerate spectrum, pinverse/Solve/rank/recompose vs vnl_svd, canonicalize toggle) and a v6 migration-guide section.
Performance (equal accuracy)

Idle, single-thread, AppleClang arm64; ratio > 1 = Eigen faster:

size Eigen vs vnl_svd
3×3 / 4×4 ~2.2× / ~2.0× faster
6×6 ~parity
8×8, 16×16 ~0.5–0.8× (vnl_svd/LINPACK wins; seldom used in ITK)
≥50×50 ~1.5–2.3× faster (BDCSVD)
rectangular (wide / large-tall) ~1.5–5.6× / ~2× faster

Accuracy is machine-precision-equivalent to vnl_svd (BDCSVD marginally better at large sizes).

Verification
  • Local (macOS / AppleClang) and cortex (Linux / gcc 13.3): both build warning-clean for the changed code.
  • Linux full default-module suite: 4487/4487 passed, 0 failed. Local full suite (incl. remote modules + Python): 5973/5973.
  • Multiple max-effort code-review passes (correctness, removed-behavior, cross-file, C++/Eigen pitfalls); migrations independently verified faithful.

@github-actions github-actions Bot added type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Enhancement Improvement of existing methods or implementation type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct area:Core Issues affecting the Core module area:Filtering Issues affecting the Filtering module area:IO Issues affecting the IO module area:Numerics Issues affecting the Numerics module area:Documentation Issues affecting the Documentation module labels Jun 22, 2026
@hjmjohnson

hjmjohnson commented Jun 22, 2026

Copy link
Copy Markdown
Member Author

itk::Math::SVD — an Eigen-backed SVD for ITK · PR #6487 whitepaper

PR-scoped whitepaper; same global goals as the umbrella VNL/netlib → Eigen effort (#6403, #6230).

Headline. Adds an opt-in, header-only Eigen-backed SVD (square and rectangular) and migrates all 8 ITK vnl_svd consumers onto it via sign-invariant operations, so no test baseline changes. Accuracy is machine-precision-equivalent to vnl_svd; performance favours ITK's common paths (3×3/4×4 ≈ 2×, ≥50×50 and wide shapes 1.5–5.6×) and loses only the rarely-used 8×8–16×16 dense band. Validated 0 warnings / 0 regressions on AppleClang (5973/5973) and gcc 13.3 (4487/4487). Additive — vnl_svd is unchanged.

Details below are collapsed to keep the PR page light — expand any section.

1. Goals

Global goal (shared with the umbrella effort). Move ITK's numeric base layer from vendored VNL/VXL + netlib onto Eigen, exposed through an itk:: API that never leaks Eigen types into public signatures (#6230), for a faster, actively-maintained backend.

This PR's scope.

  1. Provide an Eigen-backed SVD with the dominant vnl_svd operations (pinverse, Solve, rank, recompose) behind a vnl-typed interface.
  2. Migrate ITK's own vnl_svd consumers where it can be done without changing numerical results (§3).
  3. Quantify accuracy and performance to inform the eventual vnl_svd deprecation decision (§6), without deprecating anything yet.
2. Background & motivation

vnl_svd wraps a vendored netlib/LINPACK SVD. It is correct but: (a) its singular-vector signs depend on solver internals, so results can differ across platforms/SIMD width; (b) the netlib implementation is unmaintained; (c) it carries a large transitive VXL surface. Eigen offers two modern engines — JacobiSVD (accurate for small matrices) and BDCSVD (divide-and-conquer, faster/more accurate at scale, handles rectangular) — already vendored in ITK.

The central difficulty is the sign ambiguity: for A = U Σ Vᵀ, the pair (uᵢ, vᵢ) may be replaced by (−uᵢ, −vᵢ) without changing A. A naïve engine swap perturbs every consumer that reads U/V columns, churning baselines. §3.3 avoids that entirely.

3. Design (API, engine dispatch [Figure 1], sign-invariance, zero-copy)

3.1 API surface

auto svd = itk::Math::SVD(A);          // A: itk::Matrix, vnl_matrix_fixed, or vnl_matrix
svd.U;  svd.W;  svd.V;                  // vnl types — no Eigen in the signature
svd.pinverse(rcond);                    // Moore–Penrose A⁺
svd.Solve(b, rcond);                    // least-squares / minimum-norm x
svd.rank(rcond);                        // numerical rank
svd.recompose(rcond);                   // U·diag(W)·Vᵀ with small σ truncated

rcond < 0 ⇒ auto threshold k·ε·max(W); rcond = 0 ⇒ keep every nonzero σ (reproduces vnl_svd's default, §5.2).

3.2 Engine selection (Figure 1)

flowchart TD
  A["itk::Math::SVD(A)"] --> B{compile-time fixed?}
  B -->|"fixed, VDim ≤ 16"| C["JacobiSVD + NoQRPreconditioner<br/>stack temporaries, zero-copy Map"]
  B -->|"fixed, VDim &gt; 16"| D[runtime path]
  B -->|runtime vnl_matrix| D
  D --> E{shape}
  E -->|"square, n ≤ 6"| F["JacobiSVD + NoQRPreconditioner (full U/V)"]
  E -->|"square, n &gt; 6"| G["BDCSVD (full U/V)"]
  E -->|rectangular| H["BDCSVD (thin U/V): U is m×k, V is n×k, k=min(m,n)"]
  C --> Z["canonicalize signs (default) → vnl U/W/V"]
  F --> Z
  G --> Z
  H --> Z
Loading

Figure 1. A single integer comparison routes each input to the cheapest valid engine; the square fast path is unperturbed by rectangular support. Thresholds kJacobiMaxDim = 6 and kFixedSVDMaxDim = 16 are flagged for further study (§7).

3.3 Sign canonicalization and the four invariant operations

Signs are canonicalized by default (largest-magnitude entry of each U column made positive, flip mirrored onto V) using the shared eigendecomposition convention → build/SIMD-stable output. Every operation used by the migrated consumers is sign-invariant:

Operation Why sign-invariant
U·Vᵀ (polar / closest rotation) (−uᵢ)(−vᵢ)ᵀ = uᵢvᵢᵀ
pinverse = V Σ⁺ Uᵀ paired sign flip cancels
Solve = V Σ⁺ Uᵀ b paired sign flip cancels
recompose = U Σ Vᵀ paired sign flip cancels

This is the key enabling migration of real code with zero baseline churn.

3.4 Zero-copy data path

ITK/vnl storage is row-major and contiguous; Eigen defaults to column-major. We map the existing buffer with Eigen::Map<…, Eigen::RowMajor> (no allocation, no copy) and let Eigen's storage-order-aware assignment fill the vnl output.

4. Experiments (accuracy [T1], square perf [T2/Fig 2], rectangular perf [T3])

4.1 Accuracy

Check Result
Reconstruction ‖UΣVᵀ − A‖ ≤ machine precision
Singular values vs vnl_svd agree to ε·max(W)
pinverse vs vnl_svd (tall/wide) rel. 1e-15…8e-15
Moore–Penrose A·A⁺·A = A holds
Large-size accuracy BDCSVD marginally better than vnl_svd

Table 1. Accuracy validation (GoogleTest itkMathSVDGTest.cxx); ill-conditioned (6×6 Hilbert, κ≈1e7) and degenerate (repeated σ) spectra tested explicitly.

4.2 Performance — square (single-thread, idle host, AppleClang arm64)

Size Eigen ÷ vnl_svd Winner
3×3 2.2× Eigen
4×4 2.0× Eigen
6×6 ~1.0× parity
8×8 ~0.7× vnl_svd/LINPACK
16×16 ~0.6× vnl_svd/LINPACK
≥50×50 1.5–2.3× Eigen (BDCSVD)

Table 2. Square SVD throughput ratio (>1 = Eigen faster).

Figure 2 — Square speedup (Eigen ÷ vnl_svd), single-thread, idle
            0.5×        1.0×(parity)        2.0×
             |             |                 |
   3×3   ████████████████████████████████████████  2.2×   ▲ Eigen
   4×4   ████████████████████████████████████       2.0×   │ wins
   6×6   ██████████████████                          1.0×  ── parity
   8×8   ████████████                                0.7×   │ vnl
  16×16  ██████████                                  0.6×   ▼ wins
  50×50  ███████████████████████████                 1.5×   ▲ Eigen
 200×200 ████████████████████████████████████████    2.3×   │ wins

The only band where Eigen loses (8×8–16×16 dense square) is seldom — perhaps never — exercised in ITK, whose hot paths are 3×3/4×4 transforms and image-orientation (Eigen ~2×) plus occasional large dense systems (BDCSVD wins). LINPACK may hold a hardware-alignment advantage at the mid band, not yet explored (§7).

4.3 Performance — rectangular (single-thread, load-noted)

Shape Eigen ÷ vnl_svd pinverse err
10×3 0.88× 1e-15
20×6 0.74× 1e-15
50×10 0.89× 2e-15
100×20 1.96× 4e-15
200×50 2.29× 8e-15
3×10 (wide) 1.51× 5e-16
10×40 (wide) 5.57× 2e-15

Table 3. Rectangular BDCSVD (thin U/V) vs vnl_svd pinverse — Eigen wins larger-tall and all wide shapes; loses only tiny-tall where setup dominates. (Methodology: rectangular measured at host load ~13.5 — read as ratios; square set measured idle.)

5. Migration & validation (sites [T4], tolerance proofs [T5], build/test matrix [T6], review)

5.1 Migrated call sites

File Shape Operation Invariance
itkRigid2DTransform.hxx 2×2 U·Vᵀ (orthogonalize) sign-invariant
itkNiftiImageIO.cxx 3×3 U·Uᵀ, U·Vᵀ, min(W) sign-invariant
itkKernelTransform.hxx square pinverse (abs 1e-8 tol) sign-indep.
itkDisplacementFieldTransform.hxx square pinverse (per-voxel inverse) sign-indep.
itkFEMLinearSystemWrapperDenseVNL.cxx square Solve sign-indep.
itkQuadEdgeMeshDecimationQuadricElementHelper.h small square recompose/rank/Solve sign-indep.
itkBSplineControlPointImageFilter.hxx rectangular pinverse · S sign-indep.
itkBSplineScatteredDataPointSetToImageFilter.hxx rectangular pinverse · S sign-indep.

Table 4. Every square and rectangular vnl_svd consumer in ITK proper.

5.2 Tolerance faithfulness

Original vnl_svd form Migrated form Equivalence
vnl_svd(A) (default zero_out_tol=0) pinverse(0) / Solve(b,0) tol = 0·max(W) = 0 ⇒ keep all nonzero σ
vnl_svd(L, 1e-8) pinverse(1e-8/W[0]) rcond·max(W) = 1e-8 (absolute)
ctor absTol + zero_out_relative(relTol) rcond = max(absTol/W[0], relTol) in-place zeroing ⇒ net max(absTol, relTol·σmax)
singularities()==0 (tol 1e-8) W.min_value() > 1e-8 identical boundary

Table 5. Tolerance reproduction (independently verified in code review).

5.3 Build & test matrix

Platform / compiler Build Warnings (changed code) Tests
macOS / AppleClang arm64 ✅ exit 0 0 full 5973/5973
Linux (cortex) / gcc 13.3 ✅ exit 0 0 default 4487/4487

Table 6. Two-compiler validation. (The 4 macOS warnings observed are pre-existing -Wunused-parameter in the RANSAC remote module, independent of this PR.) CI on #6487 extends coverage to MSVC/Windows, ARM, and the Azure pipelines.

5.4 Code review

Multiple maximum-effort multi-agent passes (line-by-line, removed-behavior, cross-file, C++/Eigen pitfalls, conventions) found zero live correctness bugs; the removed-behavior pass independently re-verified all eight migrations faithful, and two plausible findings were refuted on inspection.

6. Discussion — the deprecation gate

The umbrella rule for replacing a vnl_* engine with Eigen is that Eigen must win both accuracy and speed. Here:

  • Accuracy: Eigen ≥ vnl_svd everywhere (equal, better when ill-conditioned). ✔
  • Speed: size-mixed — wins small (≤4×4), large (≥50×50), all wide; loses the 8×8–16×16 dense band. ✘ (not a clean sweep)

Therefore this PR does not deprecate or re-back vnl_svd. It ships the Eigen path as the opt-in itk::Math::SVD, migrates ITK consumers where results are provably unchanged, and documents the size-dependent performance so a future deprecation can be argued from data.

7. Future work
8. References

@hjmjohnson

Copy link
Copy Markdown
Member Author

This is ready for accepting comments. In addition to ITK internal consistent (non-regression) testing, some informal testing with ANTs and BRAINSTools has also been done. I'm working on a more formal test against a "forest" of downstream source trees (Slicer/ANTs/BRAINSTools) .

@dzenanz dzenanz left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks OK.

Comment thread Modules/Core/Common/include/itkMathSVD.h Outdated
Comment thread Documentation/docs/migration_guides/itk_6_migration_guide.md Outdated
@hjmjohnson hjmjohnson marked this pull request as ready for review June 23, 2026 01:54
@greptile-apps

This comment was marked as resolved.

Comment thread Modules/Core/Common/include/itkEigenDecompositionSignConvention.h Outdated
Comment thread Modules/Core/Common/include/itkMathSVD.h
Comment thread Modules/Core/Common/include/itkMathSVD.h Outdated
Add an opt-in Eigen-backed singular value decomposition for square matrices,
following the itk::Math pattern of MatrixExponential and QRDecomposition. Eigen
appears only in the detail layer; the public API returns vnl matrices/vectors so
no Eigen type is exposed (InsightSoftwareConsortium#6230).

Fixed compile-time sizes use JacobiSVD with NoQRPreconditioner (valid and cheaper
for square inputs) over a zero-copy Eigen::Map; runtime sizes select JacobiSVD for
small n and BDCSVD for larger n; large fixed sizes fall back to the runtime path.
A failed/non-finite decomposition throws via svd.info(). Sign canonicalization
reuses the shared itk::detail::CanonicalizeColumnSignsPaired helper (extended here
with a paired overload so U and V flip in lockstep), making results reproducible
across builds and platforms -- an added benefit over vnl_svd at negligible cost.

The result exposes vnl_svd's dominant operations -- pinverse(), Solve(), rank()
and recompose(). Their default rcond truncates singular values at k*epsilon*max(W)
rather than vnl_svd's keep-all-nonzero default; callers needing exact vnl
equivalence pass rcond = 0. The two differ only for singular values within machine
epsilon of zero, so the both-axes (accuracy and speed) equivalence criterion is
very nearly met and is almost certainly not impactful on the validity of results.
A v6 migration-guide notice describes the size-dependent performance (faster at
3x3/4x4 and large sizes, parity/slightly slower in the 8-16 band) and the
deterministic-sign behavior.

This adds the new API only and routes no existing caller, so vnl_svd is unchanged
and not deprecated, and all baselines are unaffected.
…zation

Route the square SVD orthogonalization sites to the new Eigen-backed
itk::Math::SVD: the NIFTI sform/qform similarity check and non-orthogonal-sform
fix-up (3x3), and Rigid2DTransform matrix-parameter extraction (2x2). These use
only sign-invariant combinations -- U*V^T, the transpose of an orthogonal U in
place of its inverse, and singular-value comparison -- so the geometric results
are unchanged. The full test suite passes with no baseline changes.
…M solves

Migrate three square SVD solve/inverse sites to the Eigen-backed itk::Math::SVD:

- KernelTransform::ComputeWMatrix solves the square L system; vnl_svd's 1e-8
  absolute zero-tolerance is reproduced as a relative rcond on pinverse().
- DisplacementFieldTransform inverse-position Jacobian: svd.inverse() becomes
  pinverse(), which also hoists the inverse out of the former per-element
  recomputation.
- LinearSystemWrapperDenseVNL solves the square dense FEM system via Solve().

solve()/inverse() yield the unique least-squares / minimum-norm result
independent of the SVD sign convention, so the outputs are unchanged; the full
test suite passes (5971/5971).
Add a threshold-aware recompose() to the SVD result: reconstruct U diag(w) V^T
with singular values at or below rcond*max(w) zeroed (a truncated, rank-reduced
reconstruction). Route the QuadEdge decimation quadric helper -- its recompose(),
rank() and solve() calls -- to itk::Math::SVD. vnl_svd's absolute (constructor)
and relative (zero_out_relative) zero-tolerances are combined into one relative
rcond. recompose/rank/solve are sign-invariant, so the decimation results are
unchanged; the full test suite passes (5972/5972).
A runtime-sized vnl_matrix of any shape is now accepted. A single dimension
comparison dispatches square inputs to the existing fast path (JacobiSVD +
NoQRPreconditioner / BDCSVD) and rectangular inputs to BDCSVD with thin U/V (U is
m x k, V is n x k, k = min(m, n)), so the square fast path is unchanged. The
detail pinverse/recompose helpers and the shared CanonicalizeColumnSignsPaired
helper now use each factor's own row count, making them shape-general; pinverse(),
Solve(), rank() and recompose() work for rectangular inputs unchanged. The dynamic
result struct is renamed SquareSVDResult -> SVDResult to reflect both shapes.
Fixed-size overloads remain square-only by signature. Tests cover tall/wide
reconstruction, pinverse and least-squares Solve vs vnl_svd.
Route the rectangular lattice-refinement solves in BSplineControlPointImageFilter
and BSplineScatteredDataPointSetToImageFilter to itk::Math::SVD: vnl_svd(R).solve(S)
becomes itk::Math::SVD(R).pinverse() * S (the least-squares solution for the
rectangular refinement matrix R). The result is sign-invariant, so the refined
coefficients are unchanged; the full test suite passes (5973/5973).
Rename the FixedSquareSVDResult and SVDResult members pinverse/rank/
recompose to PseudoInverse/Rank/Recompose, matching ITK method-naming
style and the already-PascalCase Solve(). Update the GoogleTest, the
KernelTransform, DisplacementField, B-spline control-point and
scattered-data, and QuadEdge decimation consumers, and the migration
guide.

Drop the unmeasured 7x7 row and its speculative note from the benchmark
table; the mid-size hardware-alignment caveat stays on the 8x8/16x16
rows that carry measurements.
@hjmjohnson

Copy link
Copy Markdown
Member Author

Downstream indicator testing (ANTs + BRAINSTools) — no regressions; no perf change resolvable above noise.

I built two downstream consumers against this PR's base and tip and compared. Net: this change is safe downstream (no correctness regressions); a performance effect, if any, is below what this setup can measure. ⚠️ These are indicator results, not a rigorous benchmark — shared host, test processes at nice=5, small N — so treat the timings as directional only.

Setup (clean isolation)

This PR's merge-base is the main tip used as baseline (fb2e7fc907), so the only delta is this PR's commits. Consumer source byte-identical on both sides; ITK configured identically (incl. ITKVtkGlue, ITKIODCMTK).

consumer version SVD-relevant path exercised
ANTs d2fbf8bd antsRegistration SyN/elastic → DisplacementFieldTransform
BRAINSTools ff4c121a BRAINSFit BSpline/SyN → DisplacementFieldTransform; BCD/GTRACT

Method: serial ctest -j1, 3–8 repeats, median + run-to-run CV; a delta counts only if it exceeds that test's CV.

Correctness — no regressions
  • BRAINSFit: 37/37 pass on both base and PR.
  • BRAINSTools BCD+GTRACT: identical pass/fail (same 2 pre-existing DICOM-fixture failures).
  • ANTs registration: only pre-existing/flaky failures, present on stock main too (ANTS_SYN_WITH_TIME, near-threshold *RotationMasks); no new failure attributable to this PR.
Performance — within measurement noise

Every SVD-relevant per-test delta is smaller than that test's CV (12–25% on the heavy deformable tests). E.g. ANTS_ROT_GSYN pooled samples are near-identical (both means ≈13.7 s); BRAINSFitTest_BSpline* trend faster (median −6% to −14%) but do not clear the noise floor. Cool/linear paths are flat (≤2%).

Caveat worth stating: neither consumer was modified to call itk::Math::SVD directly — they benefit only transitively through ITK-internal call sites. Any larger downstream win would require porting their hot loops to the new interface (not done here).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:Core Issues affecting the Core module area:Documentation Issues affecting the Documentation module area:Filtering Issues affecting the Filtering module area:IO Issues affecting the IO module area:Numerics Issues affecting the Numerics module type:Enhancement Improvement of existing methods or implementation type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants