Skip to content

set cpu pinning for envs#1289

Merged
dnsi0 merged 1 commit intomainfrom
cpu-affinity
Mar 26, 2026
Merged

set cpu pinning for envs#1289
dnsi0 merged 1 commit intomainfrom
cpu-affinity

Conversation

@dnsi0
Copy link
Contributor

@dnsi0 dnsi0 commented Mar 24, 2026

Fixes #1279 .

Changes proposed in this PR:

  • Pin each Docker compute container to dedicated physical CPU cores using CpusetCpus, preventing jobs from competing for the same cores
  • Each compute environment gets its own range of cores via a cpuOffset, so multiple environments don't overlap
  • On node restart, rebuild the CPU allocation map by inspecting running containers (rebuildCpuAllocations)
  • Release pinned cores back to the pool when a job finishes (cleanupJob → releaseCpus)
  • When not enough free cores are available, the job proceeds without pinning

@dnsi0 dnsi0 changed the title set cpu pinning for envs, release cpu once the job is done, handle th… set cpu pinning for envs Mar 25, 2026
@dnsi0 dnsi0 marked this pull request as ready for review March 25, 2026 08:47
@alexcos20
Copy link
Member

/run-security-scan

Copy link
Member

@alexcos20 alexcos20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI automated code review (Gemini 3).

Overall risk: medium

Summary:
This pull request introduces CPU affinity capabilities to the Docker-based C2D compute engine. It allows pinning compute jobs to specific physical CPU cores, manages CPU core allocation and deallocation, and supports multi-cluster configurations by offsetting CPU core indices. The changes include adding cpuOffset to the engine constructor, calculating available environment CPU cores, integrating CpusetCpus into Docker container creation, and implementing mechanisms to rebuild CPU allocations on node restart and release cores upon job cleanup. This is a significant enhancement for resource management and potentially performance for CPU-bound tasks.

Comments:
• [WARNING][style] The // eslint-disable-next-line require-await comment on this line seems misplaced, as parseCpusetString is a synchronous function and does not need this directive. Please remove it.
• [INFO][other] The TODO comment /* TODO - get namedresources & discreete one is still present. Consider addressing it in this PR or creating a follow-up task if it's not critical for immediate functionality.
• [INFO][other] The addition of await this.rebuildCpuAllocations() during initialization is a crucial improvement for robustness. It correctly handles node restarts by restoring CPU allocations for already running containers, which is excellent for maintaining consistent resource management.
• [WARNING][performance] In allocateCpus, if freeCores.length < count, the method logs a warning and returns null, meaning the container will proceed without CPU affinity (i.e., Docker will schedule it on any available CPU cycles). Is this the desired fallback behavior? For applications where strict CPU isolation is critical or resources are highly contended, it might be preferable to throw an error, fail the job, or implement a queuing/retry mechanism. Please confirm if this graceful degradation is acceptable or if a stricter approach is needed based on business requirements.
• [WARNING][other] The cpuOffset logic in C2DEngines (specifically cpuOffset += cpuRes.total) relies heavily on cluster.connection.resources.find((r: any) => r.id === 'cpu') and cpuRes?.total to accurately sum up the total CPUs for each cluster. This implies that these configurations must be perfectly accurate and that physical CPU indices can be assumed to be contiguous across multiple Docker engines/clusters when cpuOffset is applied. Please ensure documentation or configuration guidelines clearly explain this dependency for multi-cluster setups to prevent misconfigurations leading to CPU allocation overlaps or inefficiencies.

@alexcos20 alexcos20 self-requested a review March 26, 2026 05:58
Copy link
Member

@alexcos20 alexcos20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@dnsi0 dnsi0 merged commit 7cedcef into main Mar 26, 2026
18 of 19 checks passed
@dnsi0 dnsi0 deleted the cpu-affinity branch March 26, 2026 06:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cpu afinty on docker containers

2 participants