DAOS-18487 object: control EC rebuild resource consumption#17439
A degraded EC read allocates and registers an extra buffer to recover data, which may cause ENOMEM in some cases. Although this workaround does not prevent dynamic buffer allocation and registration, it does provide relatively precise control over the resources consumed by degraded EC reads. Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
src/object/srv_obj_migrate.c
 * registration, it does provide relatively precise control over the
 * resources consumed by degraded EC reads.
 */
data_size *= MIN(8, obj_ec_data_tgt_nr(&mrone->mo_oca));
See L2052 below: data_size is passed to migrate_dkey(tls, mrone, data_size).
Could the enlarged size go into a new variable that is passed only to migrate_res_hold()/release(), so that it does not affect migrate_dkey()?
Also, some fetch cases do not need the data recovery process and so will not allocate extra buffers. Maybe we do not need to add this much size, since it may affect RB performance.
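The suggestion above can be sketched as follows. This is a minimal illustration, not the code in the patch: the helper name `ec_hold_size()`, the `need_recovery` flag, and the `EC_HOLD_MAX_FACTOR` macro are hypothetical. Only the `MIN(8, ...)` cap is taken from the diff. The idea is to compute the enlarged reservation in its own value and leave `data_size` untouched for `migrate_dkey()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the MIN(8, ...) cap in the patch; the macro name is hypothetical. */
#define EC_HOLD_MAX_FACTOR 8

/*
 * Hypothetical helper: size to reserve for one migration.
 * data_size     - bytes actually migrated (left unchanged for the caller)
 * data_tgt_nr   - number of EC data targets (assumed > 0)
 * need_recovery - whether the fetch is expected to reconstruct data
 */
static uint64_t
ec_hold_size(uint64_t data_size, unsigned int data_tgt_nr, bool need_recovery)
{
	unsigned int factor;

	assert(data_tgt_nr > 0);

	/* A fetch without recovery allocates no extra buffers. */
	if (!need_recovery)
		return data_size;

	factor = data_tgt_nr < EC_HOLD_MAX_FACTOR ? data_tgt_nr : EC_HOLD_MAX_FACTOR;
	return data_size * factor;
}
```

Under these assumptions, a caller would pass the returned value to the hold/release pair and keep passing the original `data_size` to `migrate_dkey()`.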
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17439/1/testReport/
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17439/1/execution/node/1282/log
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17439/1/execution/node/1323/log
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17439/2/execution/node/1352/log
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
cf0d064
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17439/3/testReport/
For data migration, after being woken up, a ULT should try to wake up another waiting ULT if resources are still available. Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
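The wake-up chaining described in this commit message can be modelled in a few lines. This is a single-threaded sketch under stated assumptions, not the DAOS implementation: `struct res_pool`, `res_pool_park()`, `res_pool_wake_one()` and `res_pool_release()` are hypothetical names, and parked ULTs are represented only by the sizes they requested. A real implementation would block and resume ULTs instead of updating counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WAITERS 16

/* Hypothetical model of a resource pool with a FIFO of parked waiters. */
struct res_pool {
	uint64_t avail;                /* units currently free */
	uint64_t waiting[MAX_WAITERS]; /* sizes requested by parked waiters */
	size_t   head;                 /* index of the oldest waiter */
	size_t   nr_waiting;           /* number of parked waiters */
	unsigned nr_woken;             /* waiters that obtained their resource */
};

/* Park a waiter that could not get `size` units; false if the queue is full. */
static bool
res_pool_park(struct res_pool *p, uint64_t size)
{
	assert(p != NULL);
	if (p->nr_waiting == MAX_WAITERS)
		return false;
	p->waiting[(p->head + p->nr_waiting) % MAX_WAITERS] = size;
	p->nr_waiting++;
	return true;
}

/*
 * Wake the oldest waiter if its request fits. The woken waiter takes its
 * share and then passes the wake-up along, so one release can unblock
 * several waiters while resources remain.
 */
static void
res_pool_wake_one(struct res_pool *p)
{
	uint64_t size;

	assert(p != NULL);
	if (p->nr_waiting == 0)
		return;
	size = p->waiting[p->head];
	if (size > p->avail)
		return; /* oldest waiter does not fit yet; keep FIFO order */

	p->head = (p->head + 1) % MAX_WAITERS;
	p->nr_waiting--;
	p->avail -= size;
	p->nr_woken++;

	/* Chaining step: the woken waiter wakes the next one. */
	res_pool_wake_one(p);
}

/* Return `size` units to the pool and start the wake-up chain. */
static void
res_pool_release(struct res_pool *p, uint64_t size)
{
	assert(p != NULL);
	p->avail += size;
	res_pool_wake_one(p);
}
```

Without the chaining call, a release of 8 units would wake only the first of two waiters that each need 4 units, and the remaining 4 units would sit idle until the next release.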
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17439/4/execution/node/1392/log
Signed-off-by: Liang Zhen <gnailzenh@gmail.com>
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17439/5/testReport/
Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17439/5/execution/node/1306/log
Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17439/5/execution/node/1365/log
Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17439/5/execution/node/1445/log