FFCV doesn't work for large dataset #389

I am trying to load a 600GB dataset.

It froze for one hour on np.fromfile in ffcv/ffcv/reader.py, line 70, before I gave up and cancelled it.

I tried to fix this by using np.memmap.

```python
alloc_table = np.memmap(self._fname, dtype=ALLOC_TABLE_TYPE,
                        offset=offset, shape=file_size, mode="r+")
# alloc_table = np.fromfile(self._fname, dtype=ALLOC_TABLE_TYPE,
#                           offset=offset)
```

The first time I ran this, the subsequent code grew my 262GB beton file to 6.2TB, and I don't know why.

I need to recreate the beton now and retry with memmap in read-only mode (mode="r") to see if I can get this working. Otherwise, any tips?
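A possible explanation for the size change, offered as a guess rather than a confirmed diagnosis: np.memmap interprets shape as a number of dtype elements, not bytes. If file_size was the file's size in bytes and ALLOC_TABLE_TYPE is 24 bytes wide (an assumption about FFCV's layout), the requested mapping would span about 262GB × 24 ≈ 6.3TB, which is close to the reported 6.2TB. As far as I can tell from NumPy's memmap implementation, mode="r+" extends a file that is shorter than the requested mapping, whereas mode="r" raises an error instead. The sketch below sizes the mapping in entries and opens it read-only. The dtype fields, the helper name open_alloc_table, and the demo file are all hypothetical stand-ins, not FFCV's actual code.

```python
import os
import tempfile

import numpy as np

# Assumed layout: three unsigned 64-bit fields per entry (24 bytes).
# This mirrors what FFCV's ALLOC_TABLE_TYPE is believed to look like.
ALLOC_TABLE_TYPE = np.dtype(
    [("sample_id", "<u8"), ("ptr", "<u8"), ("size", "<u8")]
)


def open_alloc_table(fname, offset):
    """Map the table read-only, with shape given in entries, not bytes."""
    table_bytes = os.path.getsize(fname) - offset
    n_entries = table_bytes // ALLOC_TABLE_TYPE.itemsize
    # mode="r" cannot modify or extend the file on disk.
    return np.memmap(fname, dtype=ALLOC_TABLE_TYPE, mode="r",
                     offset=offset, shape=(n_entries,))


# Demo on a tiny stand-in file: a 64-byte header followed by 5 entries.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.beton")
    offset = 64
    entries = np.zeros(5, dtype=ALLOC_TABLE_TYPE)
    entries["sample_id"] = np.arange(5)
    with open(path, "wb") as f:
        f.write(b"\0" * offset)
        f.write(entries.tobytes())

    size_before = os.path.getsize(path)
    table = open_alloc_table(path, offset)
    ids = table["sample_id"].tolist()
    n = len(table)
    del table  # release the mapping before the directory is removed
    size_after = os.path.getsize(path)
```

Because the mapping is lazy, only the pages that are actually touched get read from disk, so this should avoid loading the whole table up front. Whether it resolves the original hang depends on why np.fromfile was slow in the first place, which this sketch does not establish.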
