Add minimum changes to support containers #8
base: main
Conversation
Hi, just my observation and a few comments. For a container runtime, the current JobSpec is minimal (an image and volume mounts), and there may be a need to express more options for the container runtime. Rather than pushing these flags directly into the JobSpec, the API could introduce a dedicated container-specific object. This also raises additional questions for facilities and the IRI Interface implementation (how this would work in practice across all facilities), as each facility might use a different container runtime (Docker, Apptainer, Podman…), and not everyone allows fully privileged containers (just my guess). How are these capabilities exposed (container runtime, supported flags), and who does the "heavy lifting" of translating container parameters to each facility's container runtime? Is it the IRI Interface, or is it left to the end user to identify each facility's capabilities and make the changes required to run jobs?
Separating the configuration into a separate container-specific object is a good idea. However, I think we need to be careful to avoid exposing too much, as we need to allow sites to implement the interface, so it really needs to be the lowest common denominator that the container runtimes used across the different sites can support. For example, I didn't expose the network configuration, as I was thinking that we should just default to the host. For MPI and GPU configuration, I would say that these options could be enabled if the job spec dictated that they were necessary, to avoid duplicating configuration.
Yes, as I said above, we need to expose a very minimal subset of container functionality, so it can be implemented successfully across sites. I see this interface as a subset of container functionality rather than a superset of all container runtime options. We could also provide a site-specific "extra container options" property, as an escape hatch that would allow sites to support more advanced options, but these would not necessarily be supported across all sites.
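To make the discussion concrete, here is a minimal sketch of what such a container-specific object might look like, with image, mounts, and a site-specific escape hatch. All names here (`ContainerSpec`, `VolumeMount`, `extra_options`) are illustrative assumptions, not the actual API proposed in this PR.

```python
from dataclasses import dataclass, field

@dataclass
class VolumeMount:
    source: str            # path on the host
    target: str            # path inside the container
    read_only: bool = True

@dataclass
class ContainerSpec:
    """Hypothetical lowest-common-denominator container configuration."""
    image: str                                           # container image reference
    mounts: list = field(default_factory=list)           # list of VolumeMount
    # Site-specific escape hatch: opaque options a given facility's runtime
    # may honour; not guaranteed to be portable across sites.
    extra_options: dict = field(default_factory=dict)

# Example usage with an illustrative image name:
spec = ContainerSpec(
    image="example.org/science/app:latest",
    mounts=[VolumeMount("/scratch/data", "/data")],
    extra_options={"apptainer": ["--nv"]},
)
```

The escape hatch keeps the required surface small while still letting a facility accept runtime-specific flags it chooses to support.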
Force-pushed 205b127 to b5bdf4a
It would be great to be able to get my job executable or script to run inside a specified container image. That would remove my need to login to the system manually and compile my code before submitting a job. However, HPC systems need special mounts/definitions in order to use GPUs / accelerators and the system MPI libraries, etc. Some references to compare when doing this are:
I think the common subset is the image name and mount list. However, we should also try to add something like a
@frobnitzem I agree that we need to be able to support GPUs and MPI libraries. I was proposing that, given this configuration is already covered in the job spec, each implementation would use that to add the appropriate options.
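The idea above, that each implementation derives runtime flags from the existing JobSpec rather than duplicating GPU/MPI settings in the container object, could be sketched as follows. The function name, parameters, and runtime dispatch are purely illustrative assumptions; only the flags themselves (`--nv` for Apptainer, `--gpus` for Docker) are real runtime options.

```python
def derive_runtime_flags(gpus_per_task: int, runtime: str) -> list:
    """Hypothetical translation of JobSpec-level resource requests into
    container-runtime-specific flags, so the user never writes them."""
    flags = []
    if gpus_per_task > 0:
        if runtime == "apptainer":
            flags.append("--nv")        # Apptainer's NVIDIA GPU passthrough flag
        elif runtime == "docker":
            flags.append("--gpus=all")  # Docker's GPU access option
    return flags
```

A facility's implementation would apply this kind of mapping internally, which is what keeps the portable JobSpec free of runtime-specific flags.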
This PR adds some properties to the JobSpec to allow containerized jobs to be run.