HDDS-15404. Enable short-circuit unit test in CI #10377

Draft
ChenSammi wants to merge 5 commits into apache:HDDS-10685 from ChenSammi:HDDS-15404

@ChenSammi
Contributor

What changes were proposed in this pull request?

  1. enable unit tests
  2. refactor native library download
  3. fix one TODO

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-15404

How was this patch tested?

Tested locally with the following commands; all passed:

mvn clean package -DskipTests -Dmaven.javadoc.skip=true -Pdist

mvn test -Dtest=TestDomainSocketFactory,TestXceiverServerDomainSocket,TestXceiverClientManagerSC,TestShortCircuitChunkInputStream

@adoroszlai
Contributor

Please wait for clean CI run in fork before opening PR.

@ChenSammi ChenSammi marked this pull request as draft May 28, 2026 11:04
@ChenSammi
Contributor Author

ChenSammi commented May 29, 2026

> Please wait for clean CI run in fork before opening PR.

@adoroszlai, just for discussion: it looks like a personal GitHub account has far more limited resources than the Apache account. With this patch, for example, many jobs of a single CI run have been queued for a considerable time (more than 30 minutes) before executing, from yesterday through today. That makes it inefficient to debug issues that occur only in the CI flow, since I have to wait for each run. At the same time, using Apache CI resources is not recommended. Would you mind explaining the reason in a bit more detail?

@adoroszlai
Contributor

> it looks like a personal GitHub account has far more limited resources than the Apache account

A personal account is limited to 20 concurrent jobs. For full Ozone CI, that's about 1 commit per hour. The Apache organization may have higher limits, but they are shared between many repositories and even more developers. ASF policy for GitHub Actions imposes limits on resource use. Also, each Apache repo still has limited concurrency, which can be observed when CI for several PRs is in flight or multiple PRs are merged around the same time.

Note that both forks and shared repos may be affected by occasional GitHub Actions outages, which was the case this week.

To iterate faster in your fork when debugging, temporarily disable unrelated jobs and splits on your dev branch (example: adoroszlai@9a51d89). You can also run specific integration tests via flaky-test-check.

Also, I think you can disable close-stale-prs and scheduled-label-pull-requests in your fork.

@ChenSammi
Contributor Author

ChenSammi commented May 29, 2026

There are 4 integration tests. All belong to the "ozone-integration-test" module, but they fall into two integration categories. The tests in the client category passed, and those in the hdds category failed because libhadoop was not found.

TestDomainSocketFactory         passed   integration(client)
TestLocalChunkInputStream       passed   integration(client)
TestXceiverServerDomainSocket   failed   integration(hdds)
TestXceiverClientManagerSC      failed   integration(hdds)

I added the profile "test-short-circuit" in the last commit, and pass "-Ptest-short-circuit" when integration.sh is called.
integration(client) passed while integration(hdds) failed. Do the integration category runs share the same Ozone home directory and files, or does each category have its own fresh environment?
@adoroszlai, do you have any suggestions for how I can make the libhadoop native library available to integration(hdds) too?

Latest CI: https://github.com/ChenSammi/ozone/actions/runs/26620467488/job/78446176255

One workaround is to move all these test classes into one package, so that they are included in only one integration category. But I'd like to know why integration(hdds) doesn't have libhadoop.

@adoroszlai
Contributor

> Do the integration category runs share the same Ozone home directory and files, or does each category have its own fresh environment?

Each uses a fresh environment, since they are executed on separate runner instances.

> how I can make the libhadoop native library available to integration(hdds) too?

The failed split (hdds) also had Hadoop native libraries:

Found Hadoop native libraries. Copying to distribution...
$ mkdir -p ./lib/native
$ cp -rP /home/runner/work/ozone/ozone/target/native-lib/libhadoop.dylib /home/runner/work/ozone/ozone/target/native-lib/libhadoop.so /home/runner/work/ozone/ozone/target/native-lib/libhadoop_linux_x86_64.so /home/runner/work/ozone/ozone/target/native-lib/libhadoop_osx_aarch_64.dylib ./lib/native

> I added the profile "test-short-circuit" in the last commit, and pass "-Ptest-short-circuit" when integration.sh is called

This results in a new split, since splits are derived from profiles named test-.... The new split tries to run all integration tests (and times out after 1.5 hours), because it has no include/exclude rules.

> One workaround is to move all these test classes into one package, so that they are included in only one integration category.

You can keep them in their current packages by adding <exclude> rules for Surefire in the existing test profiles and an <include> rule in the new, accidental profile.

ozone/pom.xml

Lines 2783 to 2804 in bf68f20

<profile>
  <id>test-hdds</id>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <includes>
            <include>org.apache.hadoop.hdds.**</include>
          </includes>
          <excludes>
            <exclude>org.apache.hadoop.hdds.scm.container.**</exclude>
            <!-- default excludes -->
            <exclude>**/*$*</exclude>
          </excludes>
          <excludedGroups>${unstable-test-groups}</excludedGroups>
        </configuration>
      </plugin>
    </plugins>
  </build>
</profile>

Otherwise please rename the new profile to something like with-hadoop-native-lib.
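
For illustration, the include/exclude approach described above might look roughly like the following pom fragment. This is a sketch under assumptions, not the change made in the PR: the profile id test-short-circuit and the four test class names come from this thread, while the file-name patterns (**/TestX.java) are a guess at how the classes could be matched without knowing their packages. The client split's profile is not shown in the thread, so its matching <exclude> rules are left out here.

```xml
<profiles>
  <!-- Existing profile from the snippet above, with two assumed extra excludes
       so that the hdds split skips the short-circuit tests. -->
  <profile>
    <id>test-hdds</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <configuration>
            <includes>
              <include>org.apache.hadoop.hdds.**</include>
            </includes>
            <excludes>
              <exclude>org.apache.hadoop.hdds.scm.container.**</exclude>
              <!-- assumption: short-circuit tests run only in their own split -->
              <exclude>**/TestXceiverServerDomainSocket.java</exclude>
              <exclude>**/TestXceiverClientManagerSC.java</exclude>
              <!-- default excludes -->
              <exclude>**/*$*</exclude>
            </excludes>
            <excludedGroups>${unstable-test-groups}</excludedGroups>
          </configuration>
        </plugin>
      </plugins>
    </build>
  </profile>

  <!-- New profile: limited to the four short-circuit tests, so the derived
       split no longer tries to run every integration test. -->
  <profile>
    <id>test-short-circuit</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <configuration>
            <includes>
              <include>**/TestDomainSocketFactory.java</include>
              <include>**/TestLocalChunkInputStream.java</include>
              <include>**/TestXceiverServerDomainSocket.java</include>
              <include>**/TestXceiverClientManagerSC.java</include>
            </includes>
            <excludedGroups>${unstable-test-groups}</excludedGroups>
          </configuration>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```

With rules like these, the two tests that currently pass in the client split would also need to be excluded from that split's profile; otherwise they would run twice.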

@ChenSammi
Contributor Author

ChenSammi commented May 29, 2026

> Found Hadoop native libraries. Copying to distribution...
> $ mkdir -p ./lib/native
> $ cp -rP /home/runner/work/ozone/ozone/target/native-lib/libhadoop.dylib /home/runner/work/ozone/ozone/target/native-lib/libhadoop.so /home/runner/work/ozone/ozone/target/native-lib/libhadoop_linux_x86_64.so /home/runner/work/ozone/ozone/target/native-lib/libhadoop_osx_aarch_64.dylib ./lib/native

is output by dist-layout-stitching. I assumed dist-layout-stitching is only called when -Pdist is enabled. Does each integration run command, such as "COMMAND: hadoop-ozone/dev-support/checks/integration.sh -Ptest-ozone -Drocks_tools_native", also call dist-layout-stitching?

@adoroszlai
Contributor

> I assumed dist-layout-stitching is only called when -Pdist is enabled.

No, -Pdist enables build of the binary tarball.

> Does each integration run command ... also call dist-layout-stitching?

Yes, otherwise it would not appear in the integration check's log.
