PQ-FastScan is a Rust implementation of fast vector quantization and search using Product Quantization (PQ). It supports 8-bit PQ and uses AVX-512 SIMD instructions to accelerate the computation.
This repository is still under development and is not yet ready for benchmarking.
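For readers new to PQ, the idea can be sketched in plain scalar Rust. This is only an illustration and not this repository's implementation: the function names (`sq_dist`, `encode`, `lookup_table`, `adc`) and the nested-`Vec` codebook layout are assumptions made for the example, and it uses no SIMD. Each vector is split into equal sections, each section is replaced by the index of its nearest centroid (one byte for 8-bit PQ), and a query is compared against codes by summing precomputed per-section distances.

```rust
/// Squared Euclidean distance between two equal-length slices.
fn sq_dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Encode `v` as one centroid index per section.
/// `codebooks[s]` holds the centroids for section `s`.
/// Assumes `v.len()` is a non-zero multiple of `codebooks.len()` and that
/// each section has at most 256 centroids, so an index fits in a `u8`.
fn encode(v: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<u8> {
    let sub_dim = v.len() / codebooks.len();
    v.chunks(sub_dim)
        .zip(codebooks)
        .map(|(sub, book)| {
            let mut best = 0usize;
            let mut best_d = f32::INFINITY;
            for (i, c) in book.iter().enumerate() {
                let d = sq_dist(sub, c);
                if d < best_d {
                    best = i;
                    best_d = d;
                }
            }
            best as u8
        })
        .collect()
}

/// Per-query lookup table: `table[s][c]` is the squared distance from
/// section `s` of the query to centroid `c` of that section.
fn lookup_table(q: &[f32], codebooks: &[Vec<Vec<f32>>]) -> Vec<Vec<f32>> {
    let sub_dim = q.len() / codebooks.len();
    q.chunks(sub_dim)
        .zip(codebooks)
        .map(|(sub, book)| book.iter().map(|c| sq_dist(sub, c)).collect())
        .collect()
}

/// Approximate query-to-vector distance: one table lookup per section.
fn adc(code: &[u8], table: &[Vec<f32>]) -> f32 {
    code.iter().zip(table).map(|(&c, row)| row[c as usize]).sum()
}
```

The table is built once per query, so scoring each indexed vector costs one lookup and one addition per section. That inner loop is the part SIMD implementations typically target.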
Ensure you have the following installed on your system:
- Rust (latest stable version)
- Cargo (comes with Rust)
- HDF5 library

- Install Rust and Cargo:
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
- Install the HDF5 library (on Linux):
    sudo apt-get update
    sudo apt-get install -y build-essential libhdf5-dev openssl pkg-config libssl-dev
- Install the HDF5 library (on macOS):
    brew install hdf5@1.10 openssl pkg-config
  Note: After installation, follow the instructions Homebrew prints to link the HDF5 library correctly.
- Clone the repository:
    git clone https://github.com/arkrishn94/pq-fastscan.git
    cd pq-fastscan
- Build the project:
    cargo build --release
- Ensure your data is in an HDF5 (.hdf5) file with the following keys:
    - train: data to be indexed and used for training
    - test: queries to be searched
  Make sure the data is in float32 format.
- Run the code:
    cargo run --release -- <file_name> <num_PQ_sections> <top_k>
  Replace <file_name> with the path to your data file, <num_PQ_sections> with the number of PQ sections, and <top_k> with the number of top results to retrieve.
  This executes the code in main.rs, which performs the vector quantization and search.
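As a rough illustration of how three positional arguments like these could be validated, here is a minimal sketch using only the standard library. It is not taken from main.rs: the `Args` struct, the `parse_args` function, and the rejection of zero values are all assumptions made for the example.

```rust
/// Hypothetical container for the three positional arguments.
#[derive(Debug, PartialEq)]
struct Args {
    file_name: String,
    num_pq_sections: usize,
    top_k: usize,
}

/// Parse `<file_name> <num_PQ_sections> <top_k>` from an iterator of
/// arguments with the program name already skipped.
fn parse_args<I: Iterator<Item = String>>(mut args: I) -> Result<Args, String> {
    let usage = "usage: <file_name> <num_PQ_sections> <top_k>";
    let file_name = args.next().ok_or(usage)?;
    let num_pq_sections = args
        .next()
        .ok_or(usage)?
        .parse::<usize>()
        .map_err(|e| format!("num_PQ_sections: {e}"))?;
    let top_k = args
        .next()
        .ok_or(usage)?
        .parse::<usize>()
        .map_err(|e| format!("top_k: {e}"))?;
    // Assumption for this sketch: zero sections or zero results is an error.
    if num_pq_sections == 0 || top_k == 0 {
        return Err("num_PQ_sections and top_k must be positive".to_string());
    }
    Ok(Args { file_name, num_pq_sections, top_k })
}
```

In a binary, this would be called as `parse_args(std::env::args().skip(1))`, so that a missing or non-numeric argument produces a usage message instead of a panic.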
To try PQ-FastScan, you can use an example HDF5 file from the ann-benchmarks repository:
- Download an example HDF5 file:
    wget https://github.com/erikbern/ann-benchmarks/raw/master/data/glove-100-angular.hdf5 -P data/
- Run PQ-FastScan with the downloaded file:
    cargo run --release -- data/glove-100-angular.hdf5 25 10
  In this example:
    - data/glove-100-angular.hdf5 is the path to the HDF5 file.
    - 25 is the number of PQ sections.
    - 10 is the number of top results to retrieve.
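The glove-100-angular vectors have 100 dimensions, so 25 PQ sections correspond to 4 dimensions per section. The helper below is a hypothetical sketch of that arithmetic. It assumes equal-width sections that divide the dimension evenly, and whether PQ-FastScan enforces this is not stated here.

```rust
/// Width of each PQ section when `dim` is split into `sections` equal parts.
/// Returns `None` if `sections` is zero or does not divide `dim` evenly.
fn section_width(dim: usize, sections: usize) -> Option<usize> {
    if sections == 0 || dim % sections != 0 {
        None
    } else {
        Some(dim / sections)
    }
}
```

For example, `section_width(100, 25)` returns `Some(4)`, while `section_width(100, 30)` returns `None`. With one byte per section, an 8-bit PQ code for such a vector would occupy 25 bytes, compared with 400 bytes for the original 100 float32 values.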
For more example datasets, you can visit the ann-benchmarks repository.