Skip to content

fix: support total_size=0 in byte_range_to_row_range#7115

Merged
AdamGS merged 1 commit intovortex-data:developfrom
aalexandrov:fix_division_by_zero_error
Mar 26, 2026
Merged

fix: support total_size=0 in byte_range_to_row_range#7115
AdamGS merged 1 commit intovortex-data:developfrom
aalexandrov:fix_division_by_zero_error

Conversation

@aalexandrov
Copy link
Contributor

@aalexandrov aalexandrov commented Mar 22, 2026

Summary

In certain cases DataFusion might decide to create a ranges for empty files that have zero rows. In order to avoid hitting a division by zero error in the

let average_row = total_size / row_count;

line, add an early exit check to VortexOpener::open to immediately return an empty stream if the file contains zero rows:

if vxf.row_count() == 0 {
    let empty_stream = stream::iter(vec![]).boxed();
    return Ok(empty_stream);
}

The enforced row_count > 0 invariant for the byte_range_to_row_range call is now also called out with a debug_assert! macro.

Testing

Add a new test_open_empty_file test case.

@AdamGS
Copy link
Contributor

AdamGS commented Mar 23, 2026

If the file has 0 rows, shouldn't VortexOpener::open just return an empty stream?

@aalexandrov aalexandrov force-pushed the fix_division_by_zero_error branch from 22bd055 to 8f0f23a Compare March 25, 2026 14:10
@aalexandrov
Copy link
Contributor Author

If the file has 0 rows, shouldn't VortexOpener::open just return an empty stream?

Fair point. I updated the PR and added a test_open_empty_file to cover the refined VortexOpener::open behavior.

Nit: The changes are in a fixup commit, so they need to be squashed before the PR is merged.

Comment on lines +204 to +205
let empty_stream = stream::iter(vec![]).boxed();
return Ok(empty_stream);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a nit - I think using the buildin utility is nicer

Suggested change
let empty_stream = stream::iter(vec![]).boxed();
return Ok(empty_stream);
return Ok(futures::stream::empty().boxed());

@AdamGS
Copy link
Contributor

AdamGS commented Mar 25, 2026

overall LGTM, just running CI to see everything is good

@AdamGS AdamGS added the changelog/fix A bug fix label Mar 25, 2026
@AdamGS
Copy link
Contributor

AdamGS commented Mar 25, 2026

btw - you need to sign-off on your commits for us to merge it, we'll also squash it before merging.

@aalexandrov aalexandrov force-pushed the fix_division_by_zero_error branch from 8f0f23a to 4ce2594 Compare March 26, 2026 12:10
@aalexandrov
Copy link
Contributor Author

btw - you need to sign-off on your commits for us to merge it, we'll also squash it before merging.

Done. I used the occasion to also squashed the commits and update the commit message to align with the PR description.

In certain cases DataFusion might decide to create a ranges for empty
files that have zero rows. In order to avoid hitting a division by zero
error in the

```rust
let average_row = total_size / row_count;
```

line, add an early exit check to `VortexOpener::open` to immediately
return an empty stream if the file contains  zero rows:

```rust
if vxf.row_count() == 0 {
    let empty_stream = stream::iter(vec![]).boxed();
    return Ok(empty_stream);
}
```

The enforced `row_count > 0` invariant for the `byte_range_to_row_range`
call is now also called out with a `debug_assert!` macro.

Add a new `test_open_empty_file` test case.

Signed-off-by: Alexander Alexandrov <alexander.s.alexandrov@gmail.com>
@aalexandrov aalexandrov force-pushed the fix_division_by_zero_error branch from 4ce2594 to 18d8814 Compare March 26, 2026 12:14
@AdamGS AdamGS added the ext/datafusion Relates to the DataFusion integration label Mar 26, 2026
@AdamGS AdamGS self-assigned this Mar 26, 2026
@AdamGS
Copy link
Contributor

AdamGS commented Mar 26, 2026

thank you for contributing this fix!

@AdamGS AdamGS merged commit c256ccd into vortex-data:develop Mar 26, 2026
67 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/fix A bug fix ext/datafusion Relates to the DataFusion integration

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants