refactor parquet -> icebug-disk#476
Conversation
|
dataset PR: LadybugDB/dataset#1 |
5a77447 to
e63c9b7
Compare
|
Ran the tests added in the PR locally in CLI too. Also, |
Could you add a commit to reset the submodule pointer? Tests should pass then. Also see my comment about making things work when schema.cypher is not in the same dir. |
This doesn't leave room for saying How about: The format could be guessed/defaulted based on the file extension in the URL. This is consistent with how DuckDB does I haven't reviewed the rest of the PR. But think it's directionally correct. |
The
all in for separate dir path and format options
storage is a |
|
LGTM except for this item:
Looks like you haven't implemented it yet? A test case for: and we should be good to go. Also |
I was waiting for your signal 🙃 |
this should fail right? |
|
It's the main use case for icebug disk. I expect the |
Missed this comment. I like the separate |
separate format option |
|
new dataset PR: LadybugDB/dataset#2 |
|
@adsharma could you PTAL? |
There was a problem hiding this comment.
Two minor issues - suggest making a small PR post merge.
Two major pre-existing issues (out of scope for this PR), but noting here for future reference:
- High: src/storage/table/ice_disk_node_table.cpp:306 assigns nodeID.offset = rowIndex, where rowIndex is only the row’s position within the currently assigned Parquet row group. With multiple row groups, every row group starts offsets at 0, producing duplicate node IDs and breaking joins/relationship traversals.
This needs a row-group base offset from Parquet metadata added to the local row index. - High: src/storage/table/ice_disk_rel_table.cpp:311 drops relationships for later bound nodes in the same row group. When the scan sees a second boundOffset, it breaks, then scanInternalByRowGroups increments currentRowGroup, so the rest of that row group is never revisited. A query with a batch containing multiple source nodes can return only the first node’s relationships.
Guess I have to resurrect #429 😓 |
Apart from renames
storage =formatCloses #469