
feat: add hnsw-rabitq support#69

Open
egolearner wants to merge 58 commits into main from feat/rabitq

Conversation


@egolearner egolearner commented Feb 4, 2026

resolve #42

Greptile Summary

This PR adds full HNSW-RaBitQ support — an approximate nearest-neighbor algorithm that combines HNSW graph traversal with RaBitQ binary quantization — across the core algorithm layer, streaming/searching/building interfaces, Python bindings, DB integration, and test coverage. It is a large feature addition (~14,500 lines) that mirrors the structure of the existing HNSW implementation.

Previous review rounds have addressed the most critical issues (VLA portability, null-pointer dereferences in cleanup, silent search_bf_by_p_keys_impl no-op, cluster->mount() return value ignored, and memory-leak TODOs). The remaining concerns are:

  • POSIX-only <strings.h> in rabitq_converter.cc: strncasecmp is not available on MSVC, the same portability class as the VLA issues that were already fixed.
  • Missing topk guard for the ex_bits_ == 0 path in HnswRabitqQueryAlgorithm::search_neighbors: when total_bits=1 is configured, every non-visited neighbor is unconditionally pushed into the candidates heap without the topk-worst pruning that guards the ex_bits_ > 0 path, causing unnecessary work and degraded latency for this quantization setting.

Confidence Score: 3/5

  • Safe to merge after addressing the portability issue and the ex_bits_==0 search performance regression.
  • Most of the critical bugs from earlier rounds have been fixed. Two remaining issues — a POSIX header portability problem and a missing candidate-pruning guard in the 1-bit quantization search path — are not runtime-crash-level bugs but should be resolved before merge given the stated goal of cross-platform support and correct performance characteristics for all quantization configurations.
  • src/core/algorithm/hnsw_rabitq/rabitq_converter.cc (POSIX header) and src/core/algorithm/hnsw_rabitq/hnsw_rabitq_query_algorithm.cc (missing candidate guard for ex_bits_==0)

Important Files Changed

Filename Overview
src/core/algorithm/hnsw_rabitq/hnsw_rabitq_query_algorithm.cc Core HNSW-RaBitQ query traversal; when ex_bits_==0 all neighbor nodes are unconditionally added to the candidates heap without topk-based pruning, causing performance degradation in the 1-bit quantization case.
src/core/algorithm/hnsw_rabitq/rabitq_converter.cc POSIX-specific <strings.h> header used for strncasecmp is not available on MSVC; same portability class as the VLA issues already fixed in this PR.
src/core/algorithm/hnsw_rabitq/hnsw_rabitq_builder.cc Implements the offline batch builder for HNSW-RaBitQ; previous issues (null pointer deref in cleanup, dead null-check, rand() thread-safety) have been addressed; logic looks correct.
src/core/algorithm/hnsw_rabitq/hnsw_rabitq_searcher.cc search_bf_by_p_keys_impl has been properly implemented (was previously fully commented out returning 0); all three search paths (ANN, brute-force, p-keys brute-force) look correct.
src/core/algorithm/hnsw_rabitq/rabitq_reformer.cc Pimpl-based wrapper that loads RaBitQ centroids & rotator from storage, handles both L2 and IP metrics correctly; transform_to_entity properly sizes q_to_centroids for both metric types.
src/core/algorithm/hnsw_rabitq/hnsw_rabitq_streamer.cc Online streaming index with proper shared-mutex for concurrent read/write; reformer load/dump logic on open() correctly handles both fresh and existing storage; add paths convert vectors via reformer before insertion.
src/core/algorithm/hnsw_rabitq/hnsw_rabitq_algorithm.cc Graph construction (add_node, select_entry_point, search_neighbors, prune/reverse-update) for the builder context; VLA issues previously flagged have been fixed and code now uses std::vector.
src/core/algorithm/hnsw_rabitq/rabitq_utils.cc Utility function dump_rabitq_centroids writes header, rotated centroids, original centroids, rotator data, and padding to a dumper; CRC computed and appended correctly.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant HnswRabitqBuilder
    participant RabitqConverter
    participant RabitqReformer
    participant HnswRabitqEntity
    participant HnswRabitqAlgorithm

    Caller->>HnswRabitqBuilder: init(meta, params)
    HnswRabitqBuilder->>RabitqConverter: init(meta, params)
    HnswRabitqBuilder->>HnswRabitqEntity: init()
    HnswRabitqBuilder->>HnswRabitqAlgorithm: init()

    Caller->>HnswRabitqBuilder: train(holder)
    HnswRabitqBuilder->>RabitqConverter: train(holder)
    Note over RabitqConverter: KMeans clustering → centroids
    HnswRabitqBuilder->>RabitqConverter: dump(MemoryDumper)
    HnswRabitqBuilder->>RabitqReformer: init() + load(MemoryReadStorage)
    Note over RabitqReformer: Loads centroids + rotator

    Caller->>HnswRabitqBuilder: build(threads, holder)
    loop For each vector
        HnswRabitqBuilder->>RabitqReformer: convert(vec) → quantized_vec
        HnswRabitqBuilder->>HnswRabitqEntity: add_vector(level, key, quantized_vec)
    end
    par Parallel build threads
        HnswRabitqBuilder->>HnswRabitqAlgorithm: add_node(id, level, ctx)
        Note over HnswRabitqAlgorithm: HNSW graph construction
    end

    Caller->>HnswRabitqBuilder: dump(dumper)
    HnswRabitqBuilder->>RabitqConverter: dump(dumper)
    HnswRabitqBuilder->>HnswRabitqEntity: dump(dumper)

    Caller->>HnswRabitqSearcher: load(storage, metric)
    HnswRabitqSearcher->>RabitqReformer: load(storage)
    HnswRabitqSearcher->>HnswRabitqEntity: load(storage)
    HnswRabitqSearcher->>HnswRabitqQueryAlgorithm: create(entity, num_clusters, metric_type)

    Caller->>HnswRabitqSearcher: search_impl(query, qmeta, count, ctx)
    HnswRabitqSearcher->>RabitqReformer: transform_to_entity(query) → QueryEntity
    Note over RabitqReformer: rotate query, quantize, compute q_to_centroids
    HnswRabitqSearcher->>HnswRabitqQueryAlgorithm: search(entity, ctx)
    Note over HnswRabitqQueryAlgorithm: HNSW beam search with RaBitQ distance estimates

Last reviewed commit: 3fb3261

Greptile also left 2 inline comments on this PR.

@egolearner
Collaborator Author

Depends on VectorDB-NTU/RaBitQ-Library#36

@egolearner egolearner marked this pull request as ready for review February 10, 2026 13:31

void HnswRabitqQueryAlgorithm::expand_neighbors_by_group(
    TopkHeap &topk, HnswRabitqContext *ctx) const {
  // if (!ctx->group_by().is_valid()) {
Collaborator

@richyreachy richyreachy Mar 11, 2026

Should this return with some message when group-by is set up?

Collaborator Author

Implemented group-by search; will test it after the framework supports group-by.

continue;
}
} else {
// Candidate cand{ResultRecord(candest.est_dist, candest.low_dist),
Collaborator

No need to check ex_bits here? Can it be checked earlier?

Collaborator Author

It's necessary to check ex_bits here.

  1. First, get_bin_est uses a single bit to estimate the score.
  2. Then check ex_bits > 0:
    1. Yes: check whether a full estimate with the extra bits is needed.
    2. No: nothing more to be done, since we already used all the information.


// TODO: check loop type

// if ((!topk.full()) || cur_dist < topk[0].second) {
Collaborator

Legacy code here and elsewhere can be cleaned up.

Collaborator Author

done

query_wrapper.set_g_add(q_to_centroids[cluster_id]);
float est_dist = rabitqlib::split_distance_boosting(
    ex_data, ip_func_, query_wrapper, padded_dim_, ex_bits_, res.ip_x0_qr);
float low_dist = est_dist - (res.est_dist - res.low_dist) / (1 << ex_bits_);
Collaborator

A parameter can be predefined to convert the division into a multiplication:
double inv_divisor = 1.0 / (1 << ex_bits_);
auto result = (res.est_dist - res.low_dist) * inv_divisor;

Collaborator Author

get_ex_est was not used; deleted.

@egolearner
Collaborator Author

@greptile

// limitations under the License.

#include "rabitq_converter.h"
#include <strings.h>

P2 POSIX-only <strings.h> header

<strings.h> is a POSIX-specific header (providing strncasecmp) that is not available on MSVC. The VLA issues flagged in earlier review rounds were fixed to ensure MSVC compatibility — this is the same class of problem. On Windows you would need _strnicmp instead.

Consider using a portable helper or a conditional compilation guard:

Suggested change:

-#include <strings.h>
+#include <cstring>

Then replace the strncasecmp calls (lines 86, 88) with a portable wrapper, e.g.:

#ifdef _MSC_VER
#  define strncasecmp _strnicmp
#endif

or simply convert to lowercase first with std::transform.

Comment on lines +163 to +192

for (uint32_t i = 0; i < size; ++i) {
  node_id_t node = neighbor_ids[i];
  EstimateRecord candest;
  auto *cand_vector = entity_.get_vector(node);
  ailego_prefetch(cand_vector);
  get_bin_est(cand_vector, candest, *query_entity);

  if (ex_bits_ > 0) {
    // Check preliminary score against current worst full estimate.
    bool flag_update_KNNs =
        (!topk.full()) || candest.low_dist < topk[0].second.est_dist;

    if (flag_update_KNNs) {
      // Compute the full estimate if promising.
      get_full_est(cand_vector, candest, *query_entity);
    } else {
      continue;
    }
  }
  candidates.emplace(node, ResultRecord(candest));
  // update entry_point for next level scan
  if (candest < *dist) {
    *entry_point = node;
    *dist = candest;
  }
  if (!filter(node)) {
    topk.emplace(node, ResultRecord(candest));
  }
} // end for

P2 Missing topk guard for ex_bits_ == 0 case

When ex_bits_ > 0 the code correctly gates adding a node to candidates behind a preliminary lower-bound check:

bool flag_update_KNNs =
    (!topk.full()) || candest.low_dist < topk[0].second.est_dist;
if (!flag_update_KNNs) continue;

However, when ex_bits_ == 0 (i.e. total_bits = 1 quantization) that entire block is skipped and every non-visited neighbor is unconditionally inserted into candidates. With ex_bits_ == 0, the binary estimate from get_bin_est is already the best available estimate (low_dist ≈ est_dist), so the same guard can be applied without any change in correctness. Without it, the search degrades toward exhaustive BFS for the 1-bit case, defeating the beam-search pruning of HNSW.

Suggested fix:

for (uint32_t i = 0; i < size; ++i) {
  node_id_t node = neighbor_ids[i];
  EstimateRecord candest;
  auto *cand_vector = entity_.get_vector(node);
  ailego_prefetch(cand_vector);
  get_bin_est(cand_vector, candest, *query_entity);

  if (ex_bits_ > 0) {
    bool flag_update_KNNs =
        (!topk.full()) || candest.low_dist < topk[0].second.est_dist;
    if (flag_update_KNNs) {
      get_full_est(cand_vector, candest, *query_entity);
    } else {
      continue;
    }
  } else {
    // ex_bits_ == 0: est_dist is already the best estimate
    if (topk.full() && candest.est_dist >= topk[0].second.est_dist) {
      continue;
    }
  }

  candidates.emplace(node, ResultRecord(candest));
  ...



Development

Successfully merging this pull request may close these issues.

[Feature]: Integrate RaBitQ Quantization into HNSW Index

6 participants