-
Notifications
You must be signed in to change notification settings - Fork 541
fix(segment): initialize memory components when loading collection from disk #232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -4224,6 +4224,64 @@ TEST_F(CollectionTest, CornerCase_CreateAndOpen) { | |||||||||||
| } | ||||||||||||
| } | ||||||||||||
|
|
||||||||||||
| TEST_F(CollectionTest, Bug227_VectorSearchAfterReopen) { | ||||||||||||
| // Regression test for GitHub issue #227: | ||||||||||||
| // "load from file, using vector to search failed" | ||||||||||||
| // | ||||||||||||
| // This test verifies that vector search works correctly after closing | ||||||||||||
| // and reopening a collection. The bug was that memory vector indexers | ||||||||||||
| // were not initialized when loading from disk, causing search to return | ||||||||||||
| // empty results. | ||||||||||||
|
|
||||||||||||
| FileHelper::RemoveDirectory(col_path); | ||||||||||||
|
|
||||||||||||
| // Create schema with vector fields | ||||||||||||
| auto schema = TestHelper::CreateNormalSchema(); | ||||||||||||
| auto options = CollectionOptions{false, true, 64 * 1024 * 1024}; | ||||||||||||
|
|
||||||||||||
| // Create collection and insert documents | ||||||||||||
| auto collection = TestHelper::CreateCollectionWithDoc( | ||||||||||||
| col_path, *schema, options, 0, 10, false); | ||||||||||||
| ASSERT_NE(collection, nullptr); | ||||||||||||
|
|
||||||||||||
| // Verify documents are searchable before closing | ||||||||||||
| VectorQuery query; | ||||||||||||
| query.field_name_ = "dense_fp32"; | ||||||||||||
| query.query_vector_ = std::vector<float>{0.1f, 0.2f, 0.3f, 0.4f, 0.5f, | ||||||||||||
| 0.6f, 0.7f, 0.8f, 0.9f, 1.0f, | ||||||||||||
| 1.1f, 1.2f, 1.3f, 1.4f, 1.5f, | ||||||||||||
| 1.6f, 1.7f, 1.8f, 1.9f, 2.0f}; | ||||||||||||
|
Comment on lines
+4250
to
+4253
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The test constructs a query vector with 20 elements, but the Depending on whether dimension validation is enforced in the FLAT indexer's search path:
The query vector should be 128 floats:
Suggested change
(or any representative 128-element pattern — the exact values are not important for this regression test). |
||||||||||||
| query.topk_ = 5; | ||||||||||||
|
|
||||||||||||
| auto result_before = collection->Query(query); | ||||||||||||
| ASSERT_TRUE(result_before.has_value()) | ||||||||||||
| << "Query failed before close: " << result_before.error().message(); | ||||||||||||
| ASSERT_EQ(result_before.value().size(), 5) | ||||||||||||
| << "Expected 5 results before close"; | ||||||||||||
|
|
||||||||||||
| // Close collection | ||||||||||||
| collection.reset(); | ||||||||||||
|
|
||||||||||||
| // Reopen collection from disk | ||||||||||||
| auto reopen_result = Collection::Open(col_path, options); | ||||||||||||
| ASSERT_TRUE(reopen_result.has_value()) | ||||||||||||
| << "Failed to reopen collection: " << reopen_result.error().message(); | ||||||||||||
| collection = std::move(reopen_result.value()); | ||||||||||||
|
|
||||||||||||
| // Verify documents are still searchable after reopening | ||||||||||||
| // This was failing before the fix - it would return 0 results | ||||||||||||
| auto result_after = collection->Query(query); | ||||||||||||
| ASSERT_TRUE(result_after.has_value()) | ||||||||||||
| << "Query failed after reopen: " << result_after.error().message(); | ||||||||||||
| ASSERT_EQ(result_after.value().size(), 5) | ||||||||||||
| << "Expected 5 results after reopen, got " << result_after.value().size() | ||||||||||||
| << ". This indicates the vector index was not properly loaded."; | ||||||||||||
|
|
||||||||||||
| // Verify stats are consistent | ||||||||||||
| auto stats = collection->Stats().value(); | ||||||||||||
| ASSERT_EQ(stats.doc_count, 10) << "Document count mismatch after reopen"; | ||||||||||||
| } | ||||||||||||
|
|
||||||||||||
| TEST_F(CollectionTest, CornerCase_CreateIndex) { | ||||||||||||
| auto schema = TestHelper::CreateNormalSchema(); | ||||||||||||
| auto options = CollectionOptions{false, true, 64 * 1024 * 1024}; | ||||||||||||
|
|
||||||||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
init_memory_components()deletes an existing forward-block file duringOpen()When called from
Open()(the new code path),init_memory_components()unconditionally removes any pre-existing file atmem_path(lines 3977–3982), logging it as a "possible crash residue". This is safe for the post-flush normal reopen scenario (the new writing block ID has never been written to disk), but could silently destroy data if:set_writing_forward_blockbut beforememory_store_->Open()returns, and the partially-created file already exists.Open()completes.A comment here clarifying that this deletion is intentional and that the file, if it exists at all, can only be incomplete crash residue (because the block ID is freshly allocated) would make the intent explicit and help future readers avoid introducing a bug by calling
init_memory_components()in a context where the file might legitimately exist.