@jeremyestein commented Jan 8, 2026

Implements #29 and fixes #26.

Snakemake is run according to a cron specification (default is once a day in the early hours of the morning).

Each run converts the CSVs output by the waveform-controller to both kinds of parquet, then uploads the de-id parquet to the DSH. It leaves behind marker files with upload stats, so that both snakemake and the humans know which uploads are done.
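To illustrate the marker-file idea, here is a minimal sketch in plain Python. It is not the code in this PR: the function names, the `.uploaded.json` suffix, and the fields recorded are all hypothetical, and the upload itself is left out.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

MARKER_SUFFIX = ".uploaded.json"  # hypothetical naming convention


def marker_path_for(parquet_path: Path, marker_dir: Path) -> Path:
    """Return where the marker for a given parquet file would live."""
    return marker_dir / (parquet_path.name + MARKER_SUFFIX)


def needs_upload(parquet_path: Path, marker_dir: Path) -> bool:
    """A file still needs uploading if no marker exists for it yet."""
    return not marker_path_for(parquet_path, marker_dir).exists()


def write_upload_marker(parquet_path: Path, marker_dir: Path) -> Path:
    """Record basic upload stats once an upload has succeeded."""
    stats = {
        "file": parquet_path.name,
        "size_bytes": parquet_path.stat().st_size,
        "uploaded_at": datetime.now(timezone.utc).isoformat(),
    }
    marker_dir.mkdir(parents=True, exist_ok=True)
    marker = marker_path_for(parquet_path, marker_dir)
    marker.write_text(json.dumps(stats, indent=2))
    return marker
```

In a workflow tool such as snakemake, a marker like this would typically be declared as the output of the upload step, so a later run skips files whose marker already exists, and a human can open the JSON to see what was uploaded and when.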

The toy hasher needed a fix because Python's built-in hash() is not stable from run to run. (The switch to the real hasher is in #35.)
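For context: Python salts the built-in `hash()` of `str` and `bytes` values per process unless `PYTHONHASHSEED` is fixed, so the same input can give different values on different runs. The sketch below shows one way to get a run-to-run stable toy hash using `hashlib`. It is an assumption about the general approach, not necessarily the fix made in this PR, and `stable_toy_hash` is a hypothetical name.

```python
import hashlib


def stable_toy_hash(value: str, length: int = 8) -> str:
    """Return a short hex string that is identical on every run.

    Unlike the built-in hash(), a hashlib digest depends only on the
    input bytes, not on a per-process random salt.
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return digest[:length]


# Same input, same output, in this process and in any later one.
print(stable_toy_hash("abc"))  # ba7816bf
```

A truncated digest like this is only suitable as a placeholder; it is not a substitute for a real de-identification hasher.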

Also adds a pipeline debugging guide.

@jeremyestein linked an issue Jan 8, 2026 that may be closed by this pull request
1 task
@jeremyestein mentioned this pull request Jan 8, 2026
6 tasks
@jeremyestein linked an issue Jan 8, 2026 that may be closed by this pull request
6 tasks
@jeremyestein marked this pull request as ready for review January 23, 2026 12:49
Fix docformatter errors in place, and stop ruff and docformatter from fighting with each other.
Development

Successfully merging this pull request may close these issues.

Pipeline logic
CSN is still present in de-id file names
