chore(ci): replace maximize-build-space with free-disk-space in e2e#4336
wolfboys merged 3 commits into apache:dev
Pull request overview
This PR updates the E2E GitHub Actions workflow to address consistent “No space left on device” failures by switching from an LVM-based disk maximization approach to a package-removal based disk cleanup action.
Changes:
- Replaced `easimon/maximize-build-space` with `jlumbroso/free-disk-space@v1.3.1` in the E2E workflow.
- Configured the new action to remove several large preinstalled components (dotnet/android/haskell/codeql/docker images/large packages) to free runner disk space.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This workflow run failed: https://github.com/apache/streampark/actions/runs/23380797925. With the code in this PR, we won't see the "no space left on device" errors anymore, so a re-run can pass if it is lucky enough to avoid other failures. Also, here is a screenshot of a fully successful E2E run I had a few days ago:
.github/workflows/e2e.yml
Outdated
    - name: Maximize runner space
      uses: easimon/maximize-build-space@fc881a613ad2a34aca9c9624518214ebc21dfc0c
    - name: Free Disk Space
      uses: jlumbroso/free-disk-space@21bdaa2c9e347d9d7fdb1bd6124d61c0f335a419 # v1.3.1
Can you add a comment here?
Refer to: https://github.com/jlumbroso/free-disk-space
Done, and used the correct SHA.




What changes were proposed in this pull request
This is a new issue that surfaced following my previous fix for another E2E failure.
Currently, the E2E tests are consistently failing due to "No space left on device" errors. This issue is primarily caused by the `easimon/maximize-build-space` action currently used in the workflow. To improve disk space utilization, this action consolidates multiple disk spaces into a single LVM mount, but leaves only about 10GB of available space on the root directory after mounting. Ideally, all operations should be executed within the LVM volume. However, some hard-to-track, unexpected disk writes in our workflow do not go to the LVM volume. As a result, the 10GB of root directory space is quickly exhausted, leading to E2E failures.
I tried directly adjusting the reservation parameters of `easimon/maximize-build-space` (for example, setting `root-reserve-mb: 30720` in hopes of preserving 30GB for the root directory), but no matter how it was adjusted, the root directory still only had 10GB left after the LVM mount was completed. This might be a bug in the action itself, or perhaps the parameters failed to take effect as expected in this specific environment.
Looking at the overall environment, the total physical disk space of the Runner is actually sufficient, and we don't need to introduce a complex LVM mounting mechanism to consolidate space. Therefore, this PR replaces the space-clearing solution with `jlumbroso/free-disk-space`. By directly removing large, pre-installed packages that are not actually needed in the environment (such as `dotnet`, `android`, `haskell`, etc.), we can free up ample disk space in a much simpler and more direct way, thereby resolving the disk space shortage in the E2E tests.
Brief change log
- Replaced `easimon/maximize-build-space` with `jlumbroso/free-disk-space@v1.3.1` in the GitHub Actions workflow.
- Configured the new action to remove several large preinstalled components (`dotnet`, `android`, `haskell`, `codeql`, `docker-images`, `large-packages`) to directly free up Runner disk space instead of using LVM.
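For readers who have not used the action, the replacement described in the change log could look roughly like the step below. This is a minimal sketch, not the PR's actual configuration: the `true`/`false` values are assumptions, the tag reference stands in for the pinned SHA used in the real workflow, and `codeql` is left out because this sketch does not assume how the PR passes it to the action.

```yaml
# Sketch only: input values are assumptions, not copied from the PR.
steps:
  - name: Free Disk Space
    uses: jlumbroso/free-disk-space@v1.3.1
    with:
      # true = remove this preinstalled component from the runner
      android: true
      dotnet: true
      haskell: true
      large-packages: true
      docker-images: true
      # Assumed here: leave the tool cache and swap file untouched
      tool-cache: false
      swap-storage: false
```

Because this approach only deletes files and does not remount anything, the space it frees stays on the root filesystem, which is where the untracked writes described above were landing.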
Verifying this change
This change is primarily verified by the E2E tests in the CI workflow.
Please note: The previous "No space left on device" error was deterministic and occurred on every run, and this fix resolves it. However, during verification, I found that the E2E tests may occasionally fail due to other unrelated, flaky issues. Therefore, when verifying this PR, if you encounter a failure that is not related to disk space, re-running the jobs will very likely pass.
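One way to tell a disk-space failure from a flaky one when re-running is to log the free space on the root filesystem right after the cleanup step. The step below is a hypothetical diagnostic that is not part of this PR; its name and placement are assumptions.

```yaml
# Hypothetical diagnostic step, not part of this PR.
steps:
  - name: Report root disk usage
    # Prints size, used, and available space for the root filesystem
    run: df -h /
```

If the reported available space is still large when a job fails, the failure is unlikely to be the disk-space problem this PR addresses.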
Does this pull request potentially affect one of the following parts
- Dependencies (does it add or upgrade a dependency): no