[SPARK-57142][INFRA] Share SBT precompile artifact with tpcds-1g CI job #56200
zhengruifeng wants to merge 1 commit
Wire the tpcds-1g job to consume the shared precompile artifact, extending the pattern already used by docker-integration-tests and k8s-integration-tests (SPARK-57069).

The tpcds-1g job drives SBT directly via 'build/sbt "sql/testOnly ..."', so the first SBT invocation otherwise compiles sql/core (main + test) from scratch. The precompile job already runs 'Test/package', which compiles the sql/core test classes (TPCDSQueryTestSuite, TPCDSCollationQueryTestSuite, GenTPCDSData, TPCDSSchema). Extracting the precompiled target/ lets SBT skip that compile and run the test phase directly, the same way the k8s job reuses the artifact (no SKIP_SCALA_BUILD needed since the job does not go through dev/run-tests.py).

- precompile 'if:' gate fires on tpcds-1g == 'true'.
- tpcds-1g: 'needs: precondition' -> 'needs: [precondition, precompile]', plus '(!cancelled()) &&' so it still runs if precompile is cancelled.
- Download/Extract steps after Java install, with graceful fallback (continue-on-error). If the artifact is missing, SBT compiles from scratch as before.

Generated-by: Claude Code (Opus 4.7)
CI performance: before vs after
Samples: BEFORE = 3
Where the savings come from
Before this PR, the [...]
Test phase -- unaffected (as expected)
All within CI noise -- the precompile artifact has no effect on SQL query execution speed.
Bottom line
~7% wall-clock reduction per [...]
What changes were proposed in this pull request?
This PR wires the `tpcds-1g` job in `.github/workflows/build_and_test.yml` to consume the shared `precompile` artifact, extending the pattern already applied to `docker-integration-tests` and `k8s-integration-tests` (SPARK-57069; parent SPARK-56830).

Concretely:

- `precompile` job's `if:` gate is extended to also fire when `tpcds-1g == 'true'` in the precondition output, so the artifact is available whenever the job runs.
- `tpcds-1g`:
  - `needs: precondition` -> `needs: [precondition, precompile]`
  - `if:` extended with `(!cancelled()) &&` so the job still runs if precompile is cancelled.
  - Download/Extract steps added after the Java install, with graceful fallback (`continue-on-error: true`).

The `tpcds-1g` job drives SBT directly via `build/sbt "sql/testOnly ..."` (and `build/sbt "sql/Test/runMain org.apache.spark.sql.GenTPCDSData ..."` on a TPC-DS data cache miss), so it does not go through `dev/run-tests.py` and needs no `SKIP_SCALA_BUILD` flag -- the same situation as `k8s-integration-tests`. Without the artifact, the first SBT invocation compiles `sql/core` (main + test) from scratch. The `precompile` job already runs `Test/package`, which compiles the `sql/core` test classes this job depends on (`TPCDSQueryTestSuite`, `TPCDSCollationQueryTestSuite`, `GenTPCDSData`, `TPCDSSchema`). Extracting the precompiled `target/` lets SBT skip that compile and run the test phase directly.

Optional: graceful fallback if precompile fails
Same pattern as the prior consumers:

- `precompile` keeps `continue-on-error: true`.
- The Download step is gated on `needs.precompile.result == 'success'` and has `continue-on-error: true`.
- The Extract step has `continue-on-error: true`.

The worst case degrades to the pre-PR behavior (a local SBT compile), not a workflow failure.
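The wiring and fallback described above could look roughly like the following workflow fragment. This is a sketch under stated assumptions, not the actual diff: the artifact name `spark-precompiled`, the tarball name `precompiled.tar.gz`, the step names, and the `required` output on `precondition` are illustrative, and unrelated steps (checkout, caches, Java install) are omitted.

```yaml
# Sketch only. Names marked "hypothetical" are not taken from the PR.
jobs:
  precompile:
    needs: precondition
    # The gate now also fires when tpcds-1g is required.
    # Conditions for the other consumers are omitted from this sketch.
    if: fromJson(needs.precondition.outputs.required).tpcds-1g == 'true'
    continue-on-error: true        # a failed precompile must not fail the workflow
    runs-on: ubuntu-latest
    steps:
      - run: echo "compile with SBT and upload the target/ directories"

  tpcds-1g:
    needs: [precondition, precompile]
    # Without a status function, an implicit success() check would skip this
    # job whenever a needed job did not succeed.
    if: (!cancelled()) && fromJson(needs.precondition.outputs.required).tpcds-1g == 'true'
    runs-on: ubuntu-latest
    steps:
      - name: Download precompiled artifact      # hypothetical step name
        if: needs.precompile.result == 'success'
        continue-on-error: true
        uses: actions/download-artifact@v4
        with:
          name: spark-precompiled                # hypothetical artifact name
      - name: Extract precompiled artifact       # hypothetical step name
        continue-on-error: true                  # a missing file only fails this step
        run: |
          ls -lh precompiled.tar.gz              # hypothetical file name; logs the size
          tar -xzf precompiled.tar.gz            # restores the target/ directories
      - name: Run TPC-DS queries
        run: build/sbt "sql/testOnly ..."
```

If the Download step is skipped or fails, the Extract step fails on the missing file, `continue-on-error` lets the job proceed, and SBT compiles `sql/core` locally as it did before this change.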
Note: the existing `# Any TPC-DS related updates on this job need to be applied to tpcds-1g-gen job of benchmark.yml as well` comment refers to TPC-DS data-generation parameters (scale factor, `tpcds-kit` ref, `GenTPCDSData` args). This PR changes none of those -- it only adds build-artifact reuse, and `benchmark.yml` is a standalone workflow with no shared `precompile` job -- so no corresponding change is needed there.

Why are the changes needed?
Today every run of `build_and_test.yml` that requires `tpcds-1g` re-runs the same `sql/core` SBT compile that the `precompile` job already produced for `pyspark` / `sparkr` / `build` / docker / k8s. Wiring `tpcds-1g` to the existing artifact removes that duplicate compile for free (precompile is already running).

Does this PR introduce any user-facing change?
No. CI infrastructure change only.
How was this patch tested?
The change is exercised by the CI run of this PR itself. The Download/Extract steps log the artifact size; if the precompile job is forced to fail (or its artifact is missing), the job falls back to the original local SBT build.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)