[QDP] Numpy Input Speed & Memory Improvements#877
400Ping wants to merge 2 commits into apache:main
Conversation
Signed-off-by: 400Ping <fourhundredping@gmail.com>
Pull request overview
This PR implements streaming and memory-mapped alternatives for NumPy .npy file ingestion to avoid memory spikes from the original approach that loaded the entire file through Array2 before flattening. The changes add direct header parsing, chunked readers with support for both row-major (C) and Fortran-order arrays, new IO helper functions, the memmap2 dependency, and comprehensive unit tests.
Changes:
- Added `NumpyStreamingReader` for chunk-by-chunk reading without loading entire files into memory
- Added `NumpyMmapReader` for memory-mapped IO with OS-managed paging
- Implemented custom header parsing to avoid intermediate Array2 allocations
- Added helper functions for safe f64 reading from bytes and files
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 20 comments.
Show a summary per file
| File | Description |
|---|---|
| qdp/qdp-core/src/readers/numpy.rs | Core implementation of streaming and mmap readers with custom header parsing, data reading utilities, and unit tests |
| qdp/qdp-core/src/readers/mod.rs | Exported new reader types for public API |
| qdp/qdp-core/src/io.rs | Added convenience functions for streaming and mmap batch reading |
| qdp/qdp-core/Cargo.toml | Added memmap2 dependency for memory-mapped IO |
| qdp/Cargo.toml | Specified memmap2 version in workspace |
| qdp/Cargo.lock | Locked memmap2 dependency |
    let start = data_base + self.row_cursor * sample_size * std::mem::size_of::<f64>();
    let end = start + elem_count * std::mem::size_of::<f64>();
    let bytes = &self.mmap[start..end];
    copy_f64s_from_bytes(bytes, &mut buffer[..elem_count])?;
Since copy_f64s_from_bytes is used in MmapReader, which requires high throughput, and the current implementation does a bytes -> f64 transformation plus copy, maybe we could take a follow-up to improve it into a pure memcpy (zero-copy) method. WDYT?
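For context on the copy-vs-zero-copy question above, here is a hedged sketch of what the bytes -> f64 conversion path might look like. The helper name comes from the diff, but the body is an assumption; a true zero-copy path would instead reinterpret the mmap'd bytes in place, which requires verifying 8-byte alignment and native little-endianness first.

```rust
// Hedged sketch of a bytes -> f64 conversion-and-copy helper (name from
// the diff, body assumed). Each 8-byte chunk is decoded as little-endian.
fn copy_f64s_from_bytes(bytes: &[u8], out: &mut [f64]) -> Result<(), String> {
    if bytes.len() != out.len() * std::mem::size_of::<f64>() {
        return Err("length mismatch between byte slice and output".into());
    }
    for (chunk, dst) in bytes.chunks_exact(8).zip(out.iter_mut()) {
        *dst = f64::from_le_bytes(chunk.try_into().unwrap());
    }
    Ok(())
}

fn main() {
    let src = [1.5f64, -2.0, 3.25];
    let bytes: Vec<u8> = src.iter().flat_map(|v| v.to_le_bytes()).collect();
    let mut out = [0.0f64; 3];
    copy_f64s_from_bytes(&bytes, &mut out).unwrap();
    assert_eq!(out, src);
}
```

On little-endian targets the per-element decode is usually optimized to a plain copy anyway, but an explicit aligned reinterpretation would avoid the intermediate buffer entirely, which is the zero-copy follow-up being discussed.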
I think there is actually an open issue for this; I will probably do it later tonight.
Nice try diddy. Thanks for the contribution!
Wait, I double-checked and found that rich didn't put zero-copy into the issue list #718. Weird.
If you want, you can open a follow-up for this.
Could you provide before-and-after benchmark results, if any? That way reviewers don't have to run it themselves. Thanks!
Before: After: |
The improvement isn't a lot better; should I close this?
You might also want to monitor the memory spike, since that is your primary goal, and consider whether it really matters here: this is on CPU, and RAM can be offloaded to SSD. (I'm not sure if that's possible, though; I'm not familiar with offloading strategies. Offloading would decrease speed because of the extra IO bound.)
Just checked the memory spike; it isn't better. Closing this PR.
Purpose of PR
Implemented streaming and mmap alternatives for .npy ingestion to avoid the read_npy -> Array2 -> flattened Vec memory spike. Added direct header parsing, chunked readers (with row-major and Fortran-order handling), and new IO helpers, plus the memmap2 dependency and tests for the streaming/mmap paths.
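As a hedged illustration of the chunked-reading idea described above (not the PR's actual reader), here is a std-only sketch that pulls little-endian f64 values out of any `Read` source in fixed-size chunks, so no more than one chunk is resident at a time. The type name and chunking scheme are assumptions, and a real `.npy` reader would first skip past the parsed header.

```rust
// Hedged sketch of chunk-by-chunk f64 reading from any Read source.
// Name and API are hypothetical; data is raw little-endian f64s.
use std::io::{Cursor, Read};

struct F64ChunkReader<R: Read> {
    inner: R,
    chunk_elems: usize, // max elements returned per call
}

impl<R: Read> F64ChunkReader<R> {
    /// Read up to `chunk_elems` f64 values; an empty Vec signals EOF.
    fn next_chunk(&mut self) -> std::io::Result<Vec<f64>> {
        let mut buf = vec![0u8; self.chunk_elems * 8];
        let mut filled = 0;
        while filled < buf.len() {
            let n = self.inner.read(&mut buf[filled..])?;
            if n == 0 {
                break; // EOF
            }
            filled += n;
        }
        // Drop any trailing partial element, decode the rest.
        Ok(buf[..filled - filled % 8]
            .chunks_exact(8)
            .map(|c| f64::from_le_bytes(c.try_into().unwrap()))
            .collect())
    }
}

fn main() {
    let data: Vec<u8> = (0..5).flat_map(|i| (i as f64).to_le_bytes()).collect();
    let mut r = F64ChunkReader { inner: Cursor::new(data), chunk_elems: 2 };
    assert_eq!(r.next_chunk().unwrap(), vec![0.0, 1.0]);
    assert_eq!(r.next_chunk().unwrap(), vec![2.0, 3.0]);
    assert_eq!(r.next_chunk().unwrap(), vec![4.0]);
    assert_eq!(r.next_chunk().unwrap(), Vec::<f64>::new());
}
```

The mmap path trades this explicit chunking for OS-managed paging: the file is mapped once and the kernel faults pages in on demand, which is why memory-spike behavior depends heavily on the access pattern.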
Related Issues or PRs
Closes #789
Changes Made
Breaking Changes
Checklist