
Reduce size of data files in Endom #23

Closed
fingolfin wants to merge 1 commit into master from mh/compress

@fingolfin
Member

  • use gzip -9 to get maximal compression
  • remove superfluous whitespace inside the data files

This reduces the size of the directory from 50M to 26M for me.
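To isolate how much the compression level alone contributes, one can recompress an existing file in a pipeline and count the bytes without modifying anything on disk. A minimal sketch; `recompressed_size` is a hypothetical helper, not part of this PR:

```shell
# recompressed_size LEVEL FILE.gz
# Print the number of bytes FILE.gz would occupy if its contents were
# recompressed at gzip LEVEL (1 = fastest, 9 = best). FILE.gz is not modified.
recompressed_size() {
  gunzip -c "$2" | gzip "-$1" -n | wc -c | tr -d ' '
}
```

For example, comparing `recompressed_size 6 FILE.gz` (gzip's default level) with `recompressed_size 9 FILE.gz` shows what `-9` buys for that particular file.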

@codecov

codecov Bot commented Apr 29, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.85%. Comparing base (58a9b9f) to head (b740da0).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master      #23   +/-   ##
=======================================
  Coverage   99.85%   99.85%           
=======================================
  Files           5        5           
  Lines         709      709           
=======================================
  Hits          708      708           
  Misses          1        1           

@olexandr-konovalov (Member) left a comment


Cool! I didn't inspect the changes, but perhaps @raemarina and @IrynaRaievska can pick a random file to compare? I presume you used some shell script to process all the data?
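One way to do such a spot check: decompress the old and the new version of the same file, delete all whitespace from both, and compare them byte for byte. Since this PR is described as only removing whitespace and recompressing, the results should be identical. A sketch under that assumption; `same_modulo_whitespace` is a hypothetical helper, and the pre-PR file could be extracted with something like `git show master:<path> > old.gz` (the path is left as a placeholder).

```shell
# same_modulo_whitespace OLD.gz NEW.gz
# Succeed (exit status 0) iff both files decompress to the same bytes once
# all spaces, tabs, carriage returns and newlines are deleted.
same_modulo_whitespace() {
  old=$(mktemp) && new=$(mktemp) || return 1
  gunzip -c "$1" | tr -d ' \t\r\n' > "$old"
  gunzip -c "$2" | tr -d ' \t\r\n' > "$new"
  cmp -s "$old" "$new"
  status=$?
  rm -f "$old" "$new"   # clean up the temporary files
  return "$status"
}
```

Then `same_modulo_whitespace old.gz new.gz && echo same` prints `same` only if the two versions agree up to whitespace.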

@ChrisJefferson
Member

I have been looking at another possible way to squish this, which particularly helps with the larger files while still keeping them fairly easy to understand; I will report back soon.

@fingolfin
Member Author

@olexandr-konovalov yes, a very simple script: basically `gunzip Endom/*/*.gz`, followed by invoking perl/sed/whatever to remove all spaces (except spaces that have a letter on both the left and the right, to avoid mucking up the `return` statement at the end of each file), and then finally `gzip -9 -n Endom/*/*.txt`.

But of course it seems @ChrisJefferson has done something even better, nice. (It makes me wonder whether the same technique might also apply to the additional data files on Zenodo.)

@fingolfin
Member Author

Closing in favor of PR #38

@fingolfin closed this May 1, 2026
@fingolfin deleted the mh/compress branch May 1, 2026 19:36

3 participants