DAOS-18487 object: control EC rebuild resource consumption by gnailzenh · Pull Request #17439 · daos-stack/daos

gnailzenh · 2026-01-24T01:33:09Z

A degraded EC read will allocate and register an extra buffer to recover data, which may cause ENOMEM in some cases.

Although this workaround does not prevent dynamic buffer allocation and registration, it does provide relatively precise control over the resources consumed by degraded EC reads.

Steps for the author:

Commit message follows the guidelines.
Appropriate Features or Test-tag pragmas were used.
Appropriate Functional Test Stages were run.
At least two positive code reviews including at least one code owner from each category referenced in the PR.
Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

Gatekeeper requested (daos-gatekeeper added as a reviewer).

A degraded EC read will allocate and register an extra buffer to recover data, which may cause ENOMEM in some cases. Although this workaround does not prevent dynamic buffer allocation and registration, it does provide relatively precise control over the resources consumed by degraded EC reads. Signed-off-by: Liang Zhen <gnailzenh@gmail.com>

github-actions · 2026-01-24T01:33:25Z

Errors are Unable to load ticket data
https://daosio.atlassian.net/browse/DAOS-18487

liuxuezhao · 2026-01-24T02:40:13Z

src/object/srv_obj_migrate.c

+		 * registration, it does provide relatively precise control over the
+		 * resources consumed by degraded EC reads.
+		 */
+		data_size *= MIN(8, obj_ec_data_tgt_nr(&mrone->mo_oca));


See L2052 below: data_size is also passed to migrate_dkey(tls, mrone, data_size).
Could the amplified size go into a new variable that is passed only to migrate_res_hold()/release(), so that migrate_dkey() is not affected?

Also, some fetch cases do not need the data recovery process and so will not allocate extra buffers. Maybe there is no need to add this much size, since it may affect RB performance?
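To make the suggestion concrete, here is a minimal sketch of keeping the two sizes apart. It is not the code in this PR: `migrate_sizes`, `migrate_sizes_compute` and `DEGRADED_AMPLIFY_MAX` are hypothetical names, and `data_tgt_nr` stands in for the value returned by `obj_ec_data_tgt_nr(&mrone->mo_oca)`. Only the `MIN(8, ...)` amplification is taken from the diff above.

```c
#include <assert.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical name for the literal cap of 8 used in the diff. */
#define DEGRADED_AMPLIFY_MAX 8u

/* Hypothetical pair of sizes: one for migration, one for resource control. */
struct migrate_sizes {
	uint64_t data_size; /* real payload size, what migrate_dkey() would see */
	uint64_t res_size;  /* amplified size, only for resource hold/release */
};

/*
 * Compute both sizes. data_tgt_nr is assumed to be the number of EC data
 * targets of the object class, so it must be at least 1.
 */
static struct migrate_sizes
migrate_sizes_compute(uint64_t data_size, unsigned int data_tgt_nr)
{
	struct migrate_sizes sz;

	assert(data_tgt_nr > 0);
	sz.data_size = data_size;
	sz.res_size  = data_size * MIN(DEGRADED_AMPLIFY_MAX, data_tgt_nr);
	return sz;
}
```

With this split, an object with 4 data targets would be charged 4 times its payload against the rebuild budget, one with 16 data targets would be capped at 8 times, and the size used for the migration itself stays unchanged in both cases.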

daosbuild3 · 2026-01-24T03:54:08Z

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17439/1/testReport/

daosbuild3 · 2026-01-25T01:30:02Z

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17439/1/execution/node/1282/log

daosbuild3 · 2026-01-25T01:50:05Z

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17439/1/execution/node/1323/log

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>

daosbuild3 · 2026-01-27T01:59:53Z

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17439/2/execution/node/1352/log

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>

daosbuild3 · 2026-01-28T03:34:03Z

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17439/3/testReport/

For data migration, after being woken up, the ULT should try to wake up another ULT if there are still resources available. Signed-off-by: Liang Zhen <gnailzenh@gmail.com>

daosbuild3 · 2026-01-30T02:49:14Z

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17439/4/execution/node/1392/log

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>

daosbuild3 · 2026-02-03T16:01:16Z

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17439/5/testReport/

daosbuild3 · 2026-02-03T19:03:49Z

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17439/5/execution/node/1306/log

daosbuild3 · 2026-02-03T19:44:10Z

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17439/5/execution/node/1365/log

daosbuild3 · 2026-02-03T20:45:15Z

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17439/5/execution/node/1445/log

gnailzenh requested review from a team as code owners January 24, 2026 01:33

gnailzenh requested review from liuxuezhao and wangshilong January 24, 2026 01:33

liuxuezhao reviewed Jan 24, 2026

gnailzenh added 2 commits January 26, 2026 20:52

DAOS-18487 object: degraded buffer size only impact resource control

fc7efdc

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>

Merge branch 'master' into b_ec_res

eecf6d3

wangshilong previously approved these changes Jan 26, 2026

liuxuezhao previously approved these changes Jan 27, 2026

DAOS-18487 object: amplify credits also for data from parity shard

cf0d064

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>

gnailzenh dismissed stale reviews from liuxuezhao and wangshilong via cf0d064 January 28, 2026 02:41

DAOS-18487 object: try to wake up more ULTs

b000ff0

For data migration, after being woken up, the ULT should try to wake up another ULT if there are still resources available. Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
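The wake-up chaining described in this commit message can be illustrated with a small single-threaded model. This is only a sketch under stated assumptions, not the DAOS implementation: `res_pool`, `waiter`, `pool_wait`, `pool_wake_one`, `waiter_on_wakeup` and `pool_release` are all hypothetical names, and real ULT scheduling is replaced by a plain FIFO array.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_WAITERS 16

/* Hypothetical stand-in for a migration ULT blocked on resources. */
struct waiter {
	uint64_t need;    /* units this waiter must hold before running */
	int      running; /* set to 1 once the waiter has been woken */
};

/* Hypothetical resource pool with a simple FIFO of waiters. */
struct res_pool {
	uint64_t       avail;
	struct waiter *queue[MAX_WAITERS];
	size_t         head;
	size_t         tail;
};

static void
pool_init(struct res_pool *p, uint64_t avail)
{
	p->avail = avail;
	p->head  = 0;
	p->tail  = 0;
}

/* Queue a waiter; returns 0 on success, -1 if the queue is full. */
static int
pool_wait(struct res_pool *p, struct waiter *w)
{
	if (p->tail == MAX_WAITERS)
		return -1;
	p->queue[p->tail++] = w;
	return 0;
}

/* Wake the head waiter if its request fits; returns it, or NULL. */
static struct waiter *
pool_wake_one(struct res_pool *p)
{
	struct waiter *w;

	if (p->head == p->tail)
		return NULL;
	w = p->queue[p->head];
	if (w->need > p->avail)
		return NULL;
	p->head++;
	p->avail -= w->need;
	w->running = 1;
	return w;
}

/* Models a just-woken waiter: pass the wake-up on while resources remain. */
static void
waiter_on_wakeup(struct res_pool *p)
{
	if (p->avail > 0 && pool_wake_one(p) != NULL)
		waiter_on_wakeup(p);
}

/* Return units to the pool and wake a single waiter, which chains further. */
static void
pool_release(struct res_pool *p, uint64_t units)
{
	p->avail += units;
	if (pool_wake_one(p) != NULL)
		waiter_on_wakeup(p);
}
```

In this model the releaser wakes only one waiter. Without the chaining step in `waiter_on_wakeup`, a single large release could leave resources idle while other waiters that would fit stay blocked until the next release.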

gnailzenh requested a review from NiuYawei January 29, 2026 08:26

NiuYawei previously approved these changes Jan 29, 2026

liuxuezhao previously approved these changes Jan 29, 2026

DAOS-18487 object: decrease upper limit of rebuild resource

9086fc3

Signed-off-by: Liang Zhen <gnailzenh@gmail.com>

gnailzenh dismissed stale reviews from liuxuezhao and NiuYawei via 9086fc3 February 3, 2026 13:12

gnailzenh wants to merge 6 commits into master from liang/b_ec_res

5 participants
