PERF: Hoist constant tensor-basis SVD out of the per-voxel DTI loop#6482
Conversation
93f7ddb to
a502cac
Compare
a502cac to
36b2284
Compare
|
| Filename | Overview |
|---|---|
| Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.h | Adds m_TensorBasisInverse member with a 2-line comment that exceeds the prose budget; the Doxygen \warning about single-threading was not removed, leaving user-visible documentation contradicting the fix. |
| Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.hxx | Hoists vnl_svd::pinverse() from the per-voxel loop into ComputeTensorBasis(); both code paths updated symmetrically and math is correct. New 3-line comment is prose-budget-over-limit but otherwise the logic is sound. |
| Modules/Filtering/DiffusionTensorImage/test/itkDiffusionTensor3DReconstructionSingleImageTest.cxx | New test exercising the SetGradientImage (single-VectorImage) path and cross-checking it against AddGradientImage output; sanity checks are appropriate and the 1e-8 tolerance is correct for double precision. |
| Modules/Filtering/DiffusionTensorImage/test/CMakeLists.txt | Correctly registers the new test with itk_add_test; no issues. |
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant U as User code
participant F as Filter::Update()
participant B as BeforeThreadedGenerateData
participant C as ComputeTensorBasis
participant T as DynamicThreadedGenerateData (N threads)
U->>F: Update()
F->>B: BeforeThreadedGenerateData()
B->>C: ComputeTensorBasis()
Note over C: Builds m_BMatrix, m_TensorBasis
C->>C: "m_TensorBasisInverse = vnl_svd(m_TensorBasis).pinverse()"
Note over C: Single SVD computed ONCE (was per-voxel)
C-->>B: return
B-->>F: return
F->>T: spawn N threads
loop per voxel
T->>T: "D = m_TensorBasisInverse * B (matvec only, thread-safe)"
end
T-->>F: results merged
F-->>U: output tensor image
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
sequenceDiagram
participant U as User code
participant F as Filter::Update()
participant B as BeforeThreadedGenerateData
participant C as ComputeTensorBasis
participant T as DynamicThreadedGenerateData (N threads)
U->>F: Update()
F->>B: BeforeThreadedGenerateData()
B->>C: ComputeTensorBasis()
Note over C: Builds m_BMatrix, m_TensorBasis
C->>C: "m_TensorBasisInverse = vnl_svd(m_TensorBasis).pinverse()"
Note over C: Single SVD computed ONCE (was per-voxel)
C-->>B: return
B-->>F: return
F->>T: spawn N threads
loop per voxel
T->>T: "D = m_TensorBasisInverse * B (matvec only, thread-safe)"
end
T-->>F: results merged
F-->>U: output tensor image
Comments Outside Diff (1)
-
Modules/Filtering/DiffusionTensorImage/include/itkDiffusionTensor3DReconstructionImageFilter.h, line 109-116 (link)Stale
\warningblock contradicts the PR's goalThe
\warningtelling users to callSetNumberOfWorkUnits(1)because of thenetlib/dsvdcthread-safety issue is now wrong. This PR fixes the root cause by moving the SVD intoBeforeThreadedGenerateData(), so any user who reads this warning and acts on it will run single-threaded and lose the entire performance benefit. The.hxxinline comment was removed, but the user-visible Doxygen block (the one that renders in IDE tooltips and generated docs) was not. It should be removed.
Reviews (1): Last reviewed commit: "ENH: Add single-image-path test for DTI ..." | Re-trigger Greptile
thewtex
left a comment
There was a problem hiding this comment.
greptile has style suggestions
…t-svd-r54 PERF: Hoist constant tensor-basis SVD out of the per-voxel DTI loop (backport of #6482)
DiffusionTensor3DReconstructionImageFilter constructed a vnl_svd of the constant m_TensorBasis once per voxel (~16.7M times for a 256^3 DWI) to solve the Stejskal-Tanner equations. The SVD is the same for every voxel, so compute it once in BeforeThreadedGenerateData and reuse it via solve() in the per-voxel loop. solve() is const, so concurrent per-voxel calls are safe. The reconstructed tensors are bit-identical to the previous per-voxel construction. Removing the per-voxel SVD also frees the loop of the thread-unsafe netlib dsvdc call, so the stale single-thread warning is dropped.
Replace the per-voxel vnl_svd::solve() with a single matrix-vector product against the precomputed pseudo-inverse (the dual tensor basis). This is faster per voxel: one matrix-vector product instead of solve()'s three. The result is no longer bit-identical to the previous engine: pinverse()*b forms the inverse once, whereas solve() applies V*W^-1*U^T*b sequentially, so the floating-point operation order differs. The difference is ~1e-15 relative, far below any meaningful threshold, and all reconstruction tests pass unchanged.
The existing DiffusionTensor3DReconstructionImageFilter test only exercises the multi-image gradient path (AddGradientImage). Add a test that reconstructs the same data through the single-VectorImage path (SetGradientImage) and verifies it produces identical tensors, covering the previously-untested code path.
afa06ad to
c68809c
Compare
7b023f3
into
InsightSoftwareConsortium:main
DiffusionTensor3DReconstructionImageFilterconstructed avnl_svdof the constantm_TensorBasisonce per voxel (~16.7M times for a 256³ DWI) to solve the Stejskal–Tanner equations. The pseudo-inverse (the dual tensor basis) is identical for every voxel, so this precomputes it once inBeforeThreadedGenerateDataand applies it per voxel as a matrix-vector product. Sincevnl_svd::solve(X)ispinverse()·X, the reconstructed tensors are unchanged. Removing the per-voxel SVD also frees the loop of the thread-unsafe netlibdsvdccall, so the stale single-thread warning is dropped.Verification
itkDiffusionTensor3DReconstructionImageFilterTestpasses (it reconstructs and checks the output tensor at voxel (3,3,3) against expected values at 1e-4). A mutation check (scaling the precomputed pseudo-inverse by 1.5) makes that test fail, confirming it exercises the changed path.Note: the unit test exercises the
AddGradientImage(many-images) path and the shared precompute; theSetGradientImage(single-image) path uses the byte-identical loop edit but has no direct unit-test coverage today.