
FFCV doesn't work for large dataset #389

@richardrl


I am trying to load a 600GB dataset.

It froze for one hour on np.fromfile (ffcv -> ffcv -> reader.py, line 70) before I gave up and cancelled it.

I tried to fix this by using np.memmap.

        alloc_table = np.memmap(self._fname, dtype=ALLOC_TABLE_TYPE,
                                  offset=offset, shape=file_size, mode="r+")
        # alloc_table = np.fromfile(self._fname, dtype=ALLOC_TABLE_TYPE,
        #                           offset=offset)

The first time I did this, for some reason the subsequent code grew my 262GB beton file to 6.2TB.

I now need to recreate the beton so I can retry memmap in read-only mode (mode="r") and see if I can get this working. Otherwise, any tips?
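For reference, here is a rough sketch of the read-only variant I have in mind, run against a small toy file rather than the real beton. The `ALLOC_TABLE_TYPE` layout below is a stand-in I am assuming (three uint64 fields, 24 bytes per record), and `read_alloc_table` is a hypothetical helper, not part of ffcv. As far as I can tell, np.memmap in "r+" mode extends the file when the requested shape needs more bytes than the file holds, so if `file_size` was a byte count, 262GB x 24 bytes per record would land close to the 6.2TB I saw. That is a guess, not something I have confirmed.

```python
import os
import tempfile

import numpy as np

# Stand-in for ffcv's ALLOC_TABLE_TYPE (assumed layout: three uint64 fields).
ALLOC_TABLE_TYPE = np.dtype([("sample_id", "<u8"), ("ptr", "<u8"), ("size", "<u8")])


def read_alloc_table(fname, offset):
    """Hypothetical helper: map the table read-only, sized in records, not bytes."""
    file_size = os.path.getsize(fname)
    n_entries = (file_size - offset) // ALLOC_TABLE_TYPE.itemsize
    # mode="r" never writes to or resizes the underlying file.
    return np.memmap(fname, dtype=ALLOC_TABLE_TYPE, mode="r",
                     offset=offset, shape=(n_entries,))


# Toy file: a 16-byte header followed by 5 table records.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 16)
    table = np.zeros(5, dtype=ALLOC_TABLE_TYPE)
    table["sample_id"] = np.arange(5)
    f.write(table.tobytes())
    demo_path = f.name

size_before = os.path.getsize(demo_path)
alloc_table = read_alloc_table(demo_path, offset=16)
size_after = os.path.getsize(demo_path)
```

With this, the file size should be identical before and after mapping, and only the pages that are actually touched get read from disk, instead of the whole table being loaded up front.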
