
WIP: CI TESTING ENH: Two-level CastXML/igenerator build cache + Python CI workflow #6486

Open
hjmjohnson wants to merge 9 commits into InsightSoftwareConsortium:main from hjmjohnson:ci/linux-azure-disk-management


@hjmjohnson

Member

WIP CI testing for two-level CastXML/igenerator build cache and new Python CI workflow.

What this branch contains

Six commits:

  1. COMP: Free preinstalled software in Linux Azure CI — pre-existing disk management.
  2. ENH: Two-level content-addressed CastXML/igenerator build cache — L1 (per-build, no subprocess) → L2 (content-addressed, cross-directory sharing). Key algorithm v3, gzip storage (~253 MB for 807-module build).
  3. ENH: Multi-path cascade + gzip L2 store. ITK_WRAP_CACHE accepts a colon-separated list of roots; gzip is the default storage format; LRU eviction runs via a background fork.
  4. STYLE: Rename _CACHE_FMT_KEY_VERSION — clarifies that the constant is the key-algorithm version salt, not a storage-format flag.
  5. ENH: Simplify CastXML cache restore; default ITK_WRAP_CASTXML_CACHE to ON — remove hardlink path (copy-only); default the CMake option to ON so new build directories benefit automatically.
  6. ENH: Add Python CI workflow with persistent CastXML and ccache stores — new .github/workflows/python.yml (ITK.Pixi.Python on ubuntu-24.04 / windows-2022 / macos-15) with actions/cache restore/save for both ccache (ccache-v4) and castxml (castxml-v1). New pixi tasks: configure-python-ci, build-python-ci, test-python-ci.

Benchmark results (gyrus, 72-core, SSD only)

| Build | Description | Time |
|-------|-------------|------|
| B1 | no-cache baseline | 7m19s |
| B2 | cold, seeds gzip cache | 9m30s (+29%) |
| B3 | warm (L1→L2→decompress) | 7m25s (~0%) |
| Cache | 806 entries | 253M gzip |

On a 72-core machine castxml is not on the critical path (SWIG/linking dominate), so warm speedup is near zero locally. On 4-core CI runners castxml is ~6.7 min wall vs ~22 s on 72 cores, so the cache is expected to provide substantial speedup there.
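
The percentages quoted in the benchmark table can be re-derived from the wall times. The snippet below is only an arithmetic check; the helper names `to_seconds` and `overhead_percent` are made up for illustration and are not part of the branch.

```python
import re


def to_seconds(stamp: str) -> int:
    """Convert a wall-time string such as '7m19s' to whole seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time format: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + int(seconds)


def overhead_percent(baseline: str, measured: str) -> float:
    """Relative change of `measured` versus `baseline`, in percent."""
    base = to_seconds(baseline)
    return 100.0 * (to_seconds(measured) - base) / base


# Wall times copied from the benchmark table above.
TIMES = {"B1": "7m19s", "B2": "9m30s", "B3": "7m25s"}

for name in ("B2", "B3"):
    print(name, f"{overhead_percent(TIMES['B1'], TIMES[name]):+.1f}%")
```

With the table's numbers this gives roughly +29.8% for the cold build and +1.4% for the warm build, in line with the "+29%" and "~0%" entries.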

Ubuntu-22.04 and ubuntu-24.04 hosted agents ship Android SDK (~9 GB),
Haskell/GHCup (~5 GB), .NET (~2-3 GB), Swift (~1.5 GB), CodeQL (~2 GB),
and Boost headers (~1.2 GB). ITK's Linux builds use none of these;
removing them at job start recovers ~20 GB before checkout, ccache
restore, and the build itself consume disk.
@hjmjohnson hjmjohnson marked this pull request as ready for review June 21, 2026 23:49
@github-actions github-actions Bot added type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots area:Python wrapping Python bindings for a class type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct labels Jun 21, 2026
@greptile-apps

This comment was marked as resolved.

Comment thread Wrapping/Generators/CastXML/itk-castxml-cache.py
Comment thread .github/workflows/python.yml
Comment thread .github/workflows/python.yml
Comment thread pyproject.toml
@github-actions github-actions Bot added the area:IO Issues affecting the IO module label Jun 22, 2026
Add ITK_WRAP_CASTXML_CACHE option (default OFF).  Wraps castxml
with a two-level cache:

  L1 (no subprocess): sha256 of binary content-hash + inc + cxx
  L2 (content-only):  sha256 of castxml -E output, markers stripped

L1 hit restores gzip-compressed XML with no castxml process.
L2 keys are path-independent; worktrees share the same store.
Binary fingerprinted by content hash so ninja -t clean reuses L1.
LRU eviction via background fork; 2 GiB cap (ITK_WRAP_CACHE_MAX_SIZE).

igenerator.py gains matching LRU eviction and bypass flag.
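
A minimal sketch of how the two keys described above could be computed. It is not the code in itk-castxml-cache.py: the function names, the choice to hash the bytes of the .inc and .cxx inputs (rather than their paths), and the regular expression used to strip preprocessor line markers are all assumptions made for illustration.

```python
import hashlib
import re
from pathlib import Path

# Assumption: the "markers" are preprocessor line markers such as
#   # 1 "/abs/path/file.cxx"    or    #line 12 "file.h"
# which embed absolute paths and would make the key path-dependent.
_LINE_MARKER = re.compile(rb"^#(?:line)?[ \t]+\d+.*$\n?", re.MULTILINE)


def sha256_hex(*parts: bytes) -> str:
    """Hash several byte strings, length-prefixed so boundaries stay unambiguous."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


def l1_key(binary_hash: str, inc_file: Path, cxx_file: Path) -> str:
    """L1: computable without starting any castxml process."""
    return sha256_hex(
        binary_hash.encode(),   # content hash of the castxml binary
        inc_file.read_bytes(),  # assumed: bytes of the .inc input
        cxx_file.read_bytes(),  # assumed: bytes of the .cxx input
    )


def l2_key(preprocessed: bytes) -> str:
    """L2: hash of `castxml -E` output with line markers removed."""
    return sha256_hex(_LINE_MARKER.sub(b"", preprocessed))
```

Because the L2 key depends only on the marker-free preprocessed text, two worktrees that preprocess to the same content map to the same entry even when their absolute paths differ.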
…cache

Extend ITK_WRAP_CACHE to a colon-separated list of roots (like PATH).
Reads search each root in order; writes go to the first that accepts an
atomic rename.  A read-only shared NFS cache can follow a writable SSD:
  export ITK_WRAP_CACHE=/local/ssd/cache:/nfs/lab/shared-cache
Students get L2 hits from the shared cache while storing L1 maps locally.
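
The read and write cascade described above might look roughly like the following. This is a sketch under stated assumptions, not the branch's implementation: `find_entry` and `store_entry` are hypothetical names, and entries are assumed to be flat files named by their key.

```python
import contextlib
import os
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def find_entry(roots: Sequence[Path], key: str) -> Optional[Path]:
    """Reads: search each root in order and return the first hit."""
    for root in roots:
        candidate = root / key
        if candidate.is_file():
            return candidate
    return None


def store_entry(roots: Sequence[Path], key: str, payload: bytes) -> Optional[Path]:
    """Writes: use the first root that accepts a temp file plus atomic rename."""
    for root in roots:
        try:
            root.mkdir(parents=True, exist_ok=True)
            fd, tmp_name = tempfile.mkstemp(dir=root, prefix=".tmp-")
        except OSError:
            continue  # read-only or unusable root: try the next one
        try:
            with os.fdopen(fd, "wb") as handle:
                handle.write(payload)
            final = root / key
            os.replace(tmp_name, final)  # atomic within one filesystem
            return final
        except OSError:
            with contextlib.suppress(OSError):
                os.unlink(tmp_name)
    return None
```

Writing to a temporary file in the destination directory and then renaming means a concurrent reader sees either no entry or a complete one, never a partial file.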

Add ITK_WRAP_CACHE_FORMAT=uncompressed: stores plain XML and restores
via os.link() when cache and build share a filesystem, so A/B/C/D test
builds each cost one L2 inode rather than N copies.  Falls back to
shutil.copy2() on cross-device links.  gzip remains the default.

Unlink output_xml before a full castxml run to sever any prior hardlink
to the L2 store so castxml cannot corrupt a shared inode.
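
For illustration, the hardlink-with-fallback restore and the unlink-before-write safeguard from this commit message could be sketched as below. The function names are hypothetical, and catching every `OSError` (not only the cross-device case) is a simplification chosen for the sketch.

```python
import contextlib
import os
import shutil
from pathlib import Path


def restore_uncompressed(cached: Path, output_xml: Path) -> str:
    """Restore a plain-XML cache entry; returns which strategy was used."""
    # os.link() fails if the destination already exists, so clear it first.
    with contextlib.suppress(FileNotFoundError):
        output_xml.unlink()
    try:
        os.link(cached, output_xml)  # same filesystem: share one inode
        return "hardlink"
    except OSError:
        # Cross-device link or a filesystem without hardlink support.
        shutil.copy2(cached, output_xml)
        return "copy"


def prepare_for_castxml(output_xml: Path) -> None:
    """Detach output_xml from any cached inode before castxml rewrites it."""
    with contextlib.suppress(FileNotFoundError):
        output_xml.unlink()
```

Without the unlink step, a tool that truncates and rewrites `output_xml` in place would also overwrite the cached entry, since both names refer to the same inode.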
The constant is a key-algorithm version salt, not a storage format
descriptor.  Renaming clarifies that it belongs to the hash key
computation and should not change when the storage format changes.
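
The distinction can be illustrated with a small sketch in which the version salt feeds the hash while the storage format only affects the file suffix. The function `entry_path`, the two-character directory fan-out, the suffixes, and the version string "v4" are all assumptions for illustration, not taken from the branch.

```python
import hashlib
from pathlib import Path


def entry_path(root: Path, key_version: str, content: bytes,
               storage_format: str = "gzip") -> Path:
    """Map content to a cache path; only key_version influences the hash."""
    key = hashlib.sha256(key_version.encode() + b"\0" + content).hexdigest()
    suffix = ".xml.gz" if storage_format == "gzip" else ".xml"
    return root / key[:2] / (key[2:] + suffix)


root = Path("/tmp/itk-wrap-example")  # hypothetical location
same_content = b"preprocessed translation unit"

# Changing the salt changes the key, so old entries are simply never found again.
print(entry_path(root, "v3", same_content))
print(entry_path(root, "v4", same_content))

# Changing the storage format keeps the key and changes only the suffix.
print(entry_path(root, "v3", same_content, storage_format="uncompressed"))
```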
…E to ON

Remove the hardlink restore path from _restore_xml() — shutil.copy2()
is sufficient; disk space is not constrained enough to justify the
POSIX-only os.link() complexity and cross-device fallback.  gzip
remains the default storage format (~253 MB for a full 807-module
build vs 2.2 G uncompressed).

Default ITK_WRAP_CASTXML_CACHE to ON so new build directories benefit
from cross-dir L2 sharing without manual configuration.  The cache
location defaults to ~/.cache/itk-wrap; CI overrides via ITK_WRAP_CACHE.
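
A sketch of how the cache roots might be resolved from the environment, combining the default location above with the colon-separated ITK_WRAP_CACHE list described in the earlier commit. Whether the real resolution happens in Python or in CMake is not stated in this PR; `cache_roots` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import List, Mapping


def cache_roots(environ: Mapping[str, str] = os.environ) -> List[Path]:
    """Return the cache roots in search order."""
    raw = environ.get("ITK_WRAP_CACHE", "")
    # Colon-separated like PATH; empty components are ignored.
    roots = [Path(part).expanduser() for part in raw.split(":") if part]
    # Fall back to the documented default when nothing is configured.
    return roots or [Path.home() / ".cache" / "itk-wrap"]


print(cache_roots({"ITK_WRAP_CACHE": "/local/ssd/cache:/nfs/lab/shared-cache"}))
print(cache_roots({}))
```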
Add .github/workflows/python.yml (ITK.Pixi.Python) to run the Python
wrapping build on ubuntu-24.04, windows-2022, and macos-15.  Mirror
the ccache persistence pattern from Pixi-Cxx: restore before configure,
save (if !cancelled) after build.

Add a second castxml-v1 cache restore/save pair pointing at
${{ runner.temp }}/itk-castxml-cache, passed to the build via
ITK_WRAP_CACHE.  On a cold run the cache is seeded; on a warm run
castxml is skipped for all 807 wrapped types — measured 6m37s vs 9m30s
on a 72-core machine, larger speedup expected on 4-core CI runners
where castxml is on the critical path.

Add configure-python-ci, build-python-ci, and test-python-ci pixi
tasks that mirror their non-CI counterparts but pass
-DITK_WRAP_CASTXML_CACHE:BOOL=ON explicitly.
Add ITK_WRAP_CACHE pipeline variable and a Cache@2 restore task
(castxml-v1 key) to ITK.Linux.Python, ITK.macOS.Python, and
ITK.Windows.Python.  The Cache@2 task mirrors the existing ccache
pattern: restore before the build step, Azure DevOps automatically
saves on post-job when the path is non-empty.

ITK_WRAP_CASTXML_CACHE defaults to ON (set in itkWrapCastXMLCacheSupport.cmake),
so the cache is active without any dashboard.cmake change.
@hjmjohnson hjmjohnson force-pushed the ci/linux-azure-disk-management branch from 3400645 to 20b1c6e Compare June 22, 2026 03:08
@github-actions github-actions Bot removed the area:IO Issues affecting the IO module label Jun 22, 2026
@hjmjohnson

Member Author

/azp run ITK.macOS.Python

@github-actions github-actions Bot added the area:IO Issues affecting the IO module label Jun 22, 2026
hjmjohnson and others added 2 commits June 22, 2026 07:53
Wrapping/CMakeLists.txt: include(itkWrapCastXMLCacheSupport) so
ITK_WRAP_CASTXML_CACHE_SCRIPT is set for the condition guard in
itk_auto_load_submodules.cmake; guarded by ITK_WRAP_PYTHON.

python.yml: exclude windows-2022; itk_end_wrap_module.cmake
produces an igenerator command exceeding cmd.exe's 8191-char
batch-file line limit for large modules such as
ITKImageIntensity (59 submodules). Pre-existing issue, unrelated
to the castxml cache changes.

Assisted-by: Claude Code — root-cause: missing include and Windows batch-file limit
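
To make the Windows limit concrete, the check below estimates whether a command would exceed the 8191-character line limit quoted above. It is only a diagnostic sketch: the argument names are invented, and `subprocess.list2cmdline` is used as an approximation of how the arguments would be quoted on a Windows command line.

```python
import subprocess
from typing import Sequence

CMD_EXE_LINE_LIMIT = 8191  # limit quoted in the commit message above


def exceeds_cmd_limit(argv: Sequence[str], limit: int = CMD_EXE_LINE_LIMIT) -> bool:
    """True if the quoted command line is longer than `limit` characters."""
    return len(subprocess.list2cmdline(list(argv))) > limit


# Hypothetical command: 59 submodules, each contributing a long path argument.
argv = ["python", "igenerator.py"]
for index in range(59):
    argv += ["--submodule", "C:/build/Wrapping/castxml_inputs/" + "x" * 120 + f"_{index}.xml"]

print(len(subprocess.list2cmdline(argv)), exceeds_cmd_limit(argv))
```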
Invalidates all existing v3 L2 entries (different hash prefix → different
path → orphaned, pruned by LRU eviction) so the next build seeds fresh
timing data for the 5-build overnight benchmark protocol.

Co-Authored-By: Hans Johnson <hans.j.johnson@gmail.com>
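
The LRU pruning that eventually removes orphaned entries could be sketched as follows. This is an assumption-laden illustration rather than the branch's code: recency is taken from the file modification time (assuming the cache touches an entry on each hit), `evict_lru` is a hypothetical name, and the background fork mentioned in the commits is omitted.

```python
from pathlib import Path
from typing import List

DEFAULT_MAX_BYTES = 2 * 1024**3  # 2 GiB cap, as in ITK_WRAP_CACHE_MAX_SIZE


def evict_lru(root: Path, max_bytes: int = DEFAULT_MAX_BYTES) -> List[Path]:
    """Delete least-recently-used files until the store fits under max_bytes."""
    entries = []
    total = 0
    for path in root.rglob("*"):
        if path.is_file():
            info = path.stat()
            entries.append((info.st_mtime, info.st_size, path))
            total += info.st_size

    removed = []
    # Oldest first; stop as soon as the store is within the cap.
    for _, size, path in sorted(entries, key=lambda entry: entry[0]):
        if total <= max_bytes:
            break
        try:
            path.unlink()
        except OSError:
            continue  # entry vanished or is not removable; skip it
        total -= size
        removed.append(path)
    return removed
```

Entries written under an old key version are never read again, so their timestamps stop advancing and they are the first candidates once the cap is exceeded.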
@hjmjohnson hjmjohnson force-pushed the ci/linux-azure-disk-management branch from fba93ba to 47a29b7 Compare June 22, 2026 18:49
@github-actions github-actions Bot removed the area:IO Issues affecting the IO module label Jun 22, 2026

Labels

area:Python wrapping Python bindings for a class type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots type:Testing Ensure that the purpose of a class is met/the results on a wide set of test cases are correct
