
Notes on optimizing memory usage when downloading data from databases to arrow. #21

@surister


A technique to consider for reducing memory usage when loading data from databases into Arrow is memory pre-allocation.

Memory pre-allocation, in this context, means allocating the memory that the entire dataset will need before downloading any data. The advantage is that it minimizes the amount of volatile memory used. To accomplish this, some metadata is needed, such as the total row count and the data type of each column. Obtaining this metadata adds overhead, which can often be mitigated by optimizing the queries used, for example by constructing appropriate indexes.

Still, this only minimizes memory consumption, getting close to a theoretical minimum but almost never reaching it. This is due to the dynamic nature of some column types, such as strings or dynamic arrays (often called lists), where the real length of each row is not known in advance.
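The idea can be sketched in plain Rust, with a `Vec<u32>` standing in for an Arrow buffer. Here `load_column` and `fetch_row` are hypothetical names, and the row count is assumed to come from a metadata query such as `SELECT count(*)`:

```rust
// Minimal sketch of pre-allocation, using Vec<u32> as a stand-in for an
// Arrow buffer. In a real pipeline, `row_count` would come from a metadata
// query (e.g. SELECT count(*)) and the element width from the table schema.

/// Allocate the whole column buffer up front, then fill it.
/// `fetch_row` stands in for reading one value from the database cursor.
fn load_column(row_count: usize, mut fetch_row: impl FnMut(usize) -> u32) -> Vec<u32> {
    let mut buffer: Vec<u32> = Vec::with_capacity(row_count);
    let initial_ptr = buffer.as_ptr();
    for i in 0..row_count {
        buffer.push(fetch_row(i)); // stays within capacity: no reallocation
    }
    // The buffer never moved, so no reallocation (and no copy) happened.
    assert_eq!(buffer.as_ptr(), initial_ptr);
    buffer
}

fn main() {
    let column = load_column(1_000_000, |i| i as u32);
    assert_eq!(column.len(), 1_000_000);
    assert!(column.capacity() >= 1_000_000);
}
```

A single allocation sized from the metadata replaces the whole sequence of growth reallocations a dynamic builder would otherwise perform.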

Why does pre-allocating save memory?

In the Rust implementation, Arrow builders are dynamic: you can keep appending values. Every time a new value is added, `buffer.reserve` is called:


    #[inline(always)]
    pub fn reserve(&mut self, additional: usize) {
        let required_cap = self.len + additional;
        if required_cap > self.layout.size() {
            let new_capacity = bit_util::round_upto_multiple_of_64(required_cap);
            let new_capacity = std::cmp::max(new_capacity, self.layout.size() * 2);
            self.reallocate(new_capacity)
        }
    }

`additional` is the number of bytes that the new value(s) will use, calculated as `elements * size_of_type_bytes`.

As values are added, `required_cap` eventually no longer fits in the currently allocated size, and the buffer is resized to whichever is bigger: the required capacity rounded up to the next multiple of 64, or double the current size. Doubling the allocated memory is a common technique to avoid many small allocations, whose overhead adds up. The allocation size is always a multiple of 64 for better cache and SIMD performance.

Pre-allocating memory avoids the repeated reallocations, and the copying each one entails, that would otherwise happen as values are appended one by one.
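As a rough illustration, here is a plain-Rust simulation of the growth rule above (not the arrow-rs code itself), counting how many reallocations the doubling policy performs when values arrive one at a time versus when the buffer is sized up front:

```rust
// Plain-Rust simulation of the doubling growth policy described above.
fn round_upto_multiple_of_64(n: usize) -> usize {
    (n + 63) & !63
}

/// Count how many reallocations the growth policy performs while appending
/// `n` values of `value_size` bytes one at a time, starting from
/// `initial_cap` bytes of capacity.
fn count_reallocations(n: usize, value_size: usize, initial_cap: usize) -> usize {
    let mut len = 0usize;
    let mut cap = initial_cap;
    let mut reallocs = 0usize;
    for _ in 0..n {
        let required = len + value_size;
        if required > cap {
            cap = std::cmp::max(round_upto_multiple_of_64(required), cap * 2);
            reallocs += 1;
        }
        len += value_size;
    }
    reallocs
}

fn main() {
    // 10 million u32 values appended one by one, starting from 64 bytes:
    // only a handful of reallocations thanks to doubling, but each one
    // copies the whole buffer so far.
    let one_by_one = count_reallocations(10_000_000, 4, 64);
    // Pre-allocating the exact size up front: zero reallocations.
    let preallocated = count_reallocations(10_000_000, 4, 10_000_000 * 4);
    assert!(one_by_one > 0);
    assert_eq!(preallocated, 0);
    println!("one by one: {one_by_one} reallocations, pre-allocated: {preallocated}");
}
```

Doubling keeps the reallocation count logarithmic in the final size, but pre-allocation eliminates it entirely, along with the transient extra copy held during each reallocation.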

Let's have a look at one example:

Imagine a u32 builder whose buffer currently takes 100MB (1e8 bytes) and is full. If we append one item, reserve will be called as reserve(1 * 4), since a u32 takes 4 bytes. The new allocation will be max(100_000_064, 200_000_000) = 2e8 bytes, doubling the currently allocated memory.
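Checking that arithmetic with a plain-Rust transcription of the growth rule from `reserve` above:

```rust
// Transcription of the growth rule in `reserve`, applied to the example:
// a full 100 MB (1e8 byte) buffer receiving one more u32.
fn round_upto_multiple_of_64(n: usize) -> usize {
    (n + 63) & !63
}

fn main() {
    let current = 100_000_000usize; // 100 MB buffer, completely full
    let additional = 1 * 4; // one u32 value
    let required_cap = current + additional;
    let new_capacity = std::cmp::max(round_upto_multiple_of_64(required_cap), current * 2);
    assert_eq!(round_upto_multiple_of_64(required_cap), 100_000_064);
    assert_eq!(new_capacity, 200_000_000); // the allocation doubles
}
```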

Should we pre-allocate?

Memory pre-allocation only makes sense if the memory saved amortizes the time spent fetching the metadata required to do it.

To put things into perspective:

TPC-H lineitem 10x (60M rows)

| library | Time (s) | Memory | Has Index | Pre-Allocated |
| --- | --- | --- | --- | --- |
| conecta | 89.80 | 8320.34 | True | False |
| conecta | 90.80 | 7804.08 | True | True |
| conecta | 105.35 | 8320.34 | False | False |
| conecta | 170.43 | 7804.08 | False | True |
| connectorx | 156.31 | 7695.11 | False | False |
| connectorx | 103.02 | 7695.11 | True | False |

TPC-H lineitem 1x (6M rows)

| library | Time (s) | Memory | Has Index | Pre-Allocated |
| --- | --- | --- | --- | --- |
| conecta | 1.88 | 147.35 | False | True |
| conecta | 1.83 | 212.40 | False | False |
| conecta | 1.82 | 214.44 | True | False |
| conecta | 1.87 | 147.65 | True | True |
| connectorx | 1.95 | 161.47 | False | False |
