Autodiscover paths in the no-engine case#3139
Autodiscover paths in the no-engine case#3139Andrew-S-Rosen merged 22 commits intoQuantum-Accelerators:mainfrom
Conversation
|
Thanks, @vineetbansal! One other ToDo here: can you also update the docs with info about your new feature? https://quantum-accelerators.github.io/quacc/user/settings/file_management.html |
Corrected punctuation and formatting for clarity.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3139 +/- ##
==========================================
- Coverage 97.68% 97.59% -0.10%
==========================================
Files 97 98 +1
Lines 4190 4282 +92
==========================================
+ Hits 4093 4179 +86
- Misses 97 103 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@Andrew-S-Rosen - I think that should do it to fix the tests. I'm not sure why Jenkins is stuck - perhaps a restart of server infrastructure that happened last Tuesday? |
…b results, if AUTODISCOVER_DIR is true; docs updated
|
@vineetbansal thanks!! Is this ready for review? After review, I will probably ask someone else in the group to try it out and make sure the hierarchy makes sense to them. |
|
@Andrew-S-Rosen - yes this is ready to be reviewed now. Two things that we can either do as part of this PR or later:
|
|
Thanks! I will review it this weekend. 👍
|
There was a problem hiding this comment.
Hi @vineetbansal! I have some comments worth addressing below. As for your "Two things that we can either do as part of this PR or later", please see my comment above.
Directory in Output
When I run the following example, the output schema (JSON) has a dir_name field that includes the directory. When I run with AUTODISCOVER_DIR set to False, the full path is shown (this is the desired behavior). When I run with AUTODISCOVER_DIR set to True, the relative path is shown. Can you ensure that when it gets stored in the output, it is resolved so that the user can find the files afterwards?
Note that this is also a relevant concern with the logging, where it does things like
INFO:quacc.runners.prep:Moving C:\Users\asros\Desktop\tmp-bulk_to_slabs_flow-2026-02-22-01-32-35-651667-90541\bulk_to_slabs_subflow-2026-02-22-01-32-35-652280-77098\relax_job-2026-02-22-01-32-35-825258-48557 contents to bulk_to_slabs_flow-2026-02-22-01-32-35-651667-90541\bulk_to_slabs_subflow-2026-02-22-01-32-35-652280-77098\relax_job-2026-02-22-01-32-35-825258-48557
from quacc.recipes.emt.core import relax_job
from ase.build import bulk
atoms = bulk("Cu")
output = relax_job(atoms)
print(output["dir_name"])Tmp Path Naming on Failed Jobs
There is now a consideration that we did not need to think about before. Namely, if someone runs a flow that launches 10 independent and concurrent jobs, they might (or might not) be fine with 9/10 finishing depending on whether the data for the 9/10 might still be useful.
As it stands right now, the parent flow will stay in the scratch directory with the `tmp- name if 1/10 jobs fails. Admittedly, I am not sure really what to do about this and am open to suggestions. I will need to think on this some more too.
from quacc.recipes.emt.slabs import bulk_to_slabs_flow
from ase.build import bulk
atoms = bulk("Xe")
bulk_to_slabs_flow(atoms)|
|
||
| At job runtime, the file structure looks like: | ||
|
|
||
| If `AUTODISCOVER_DIR` is `false`: |
There was a problem hiding this comment.
Can we make this False instead?
|
|
||
| Once the job successfully completes, the file structure looks like: | ||
|
|
||
| If `AUTODISCOVER_DIR` is `false`: |
| # Move files from tmpdir to job_results_dir. | ||
| LOGGER.info(f"Moving {tmpdir} contents to {job_results_dir}") | ||
| job_results_dir.mkdir(parents=True, exist_ok=True) | ||
| for file_name in os.listdir(tmpdir): | ||
| move(tmpdir / file_name, job_results_dir / file_name) | ||
| rmtree(tmpdir) |
There was a problem hiding this comment.
Looking at the diff, I'm a bit confused where the if settings.CREATE_UNIQUE_DIR business went. Can you explain?
There was a problem hiding this comment.
On main, CREATE_UNIQUE_DIR is checked in both functions:
def calc_setup:
job_results_dir = settings.RESULTS_DIR.resolve()
if settings.CREATE_UNIQUE_DIR:
job_results_dir /= f"{tmpdir.name.split('tmp-')[-1]}"
def calc_cleanup (main):
if settings.CREATE_UNIQUE_DIR:
move(tmpdir, job_results_dir)
else:
for file_name in os.listdir(tmpdir):
move(tmpdir / file_name, job_results_dir / file_name)
rmtree(tmpdir)
When CREATE_UNIQUE_DIR=True, job_results_dir doesn't exist yet, so move(tmpdir, job_results_dir) works as a rename.
On this branch, CREATE_UNIQUE_DIR is only checked in calc_setup. In calc_cleanup, we do:
def calc_cleanup:
job_results_dir.mkdir(parents=True, exist_ok=True)
for file_name in os.listdir(tmpdir):
move(tmpdir / file_name, job_results_dir / file_name)
Once mkdir pre-creates job_results_dir (the mkdir call is needed for the AUTODISCOVER_DIR case where job_results_dir is a deep nested path), move(tmpdir, job_results_dir) would move tmpdir inside job_results_dir rather than as it. So we don't do that, and move its contents to job_results_dir instead.
So this branch collapses both cases into a single strategy as far as calc_cleanup is concerned, and CREATE_UNIQUE_DIR only needs to live in calc_setup. Effectively we're still moving everything inside tmpdir to job_results_dir.
Some newly introduced tests in tests/core/wflow/test_autodiscover_paths.py::test_results_dir_safe illustrate this.
|
|
||
| if is_top_level(): | ||
| # This is the outermost tracked call: create a unique root | ||
| # directory (e.g. ``quacc-abc123/``) and initialize both the |
There was a problem hiding this comment.
I think this comment might need updating.
|
@Andrew-S-Rosen. For the two comments you made:
Fair point - this is fixed by returned the
With With In either case, stuff stays in scratch when a job fails. The two minor differences are:
|
… vb/path_noengine


Summary of Changes
Adds support for inspecting the current running context using:
get_directory_context()<- The top-level directory specified by the@flow.get_context_path()<- The position of the running code relative to the top-level@flowas/-separated string elements.Together, these allow the code to create a folder using -
settings.RESULTS_DIR / Path(get_directory_context()) / get_context_path().calc_setupuses this to create timestamped folders for output.Still a WIP. I need to add support for a few things we discussed, and add tests for these:
@jobs might be running without an encompassing@flowand need to be able to set their own top-level directory.@flowmight have several@subflows with the same name. Thus the subflow level folders need to have a timestamp appended to them, just like the folders at the@joblevel.test_autodiscover_pathsto cover the case ofQUACC_AUTODISCOVER_DIRtoTrue/Falseto demonstrate the directory structure in both modes. Add tree to their docstrings so it's clear what to expect in each case.