Adding GraphParams to be able to save graph parameters of index to SavedParams #786
… vector type to SavedParams
Pull request overview
This PR adds a GraphParams struct to persist graph configuration parameters (l_build, alpha, backedge_ratio, vector_dtype) alongside the BfTreeProvider index. The parameters are saved and loaded as part of the index's SavedParams, so the DiskANNIndex configuration can be reconstructed on load.
Changes:
- Introduced `GraphParams` struct with fields for l_build, alpha, backedge_ratio, and vector_dtype
- Added optional `graph_params` field to `BfTreeProvider`, `BfTreeProviderParameters`, and `SavedParams`
- Updated save/load implementations to persist and restore `graph_params`
- Updated all tests and documentation examples to set `graph_params: None`
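A rough sketch of the shape these changes describe, using only the standard library. The `GraphParams` field names and types follow the reviewed diff in this PR. The `dim` field on `SavedParams` and the `l_build_or` helper are hypothetical and exist only for illustration; the real structs carry more fields and serde derives.

```rust
/// Graph configuration persisted alongside the index (fields as in the reviewed diff).
#[derive(Clone, Debug, PartialEq)]
pub struct GraphParams {
    pub l_build: usize,
    pub alpha: f32,
    pub backedge_ratio: f32,
    pub vector_dtype: String,
}

/// Simplified stand-in for the real `SavedParams`; `dim` is a hypothetical field.
#[derive(Clone, Debug, PartialEq)]
pub struct SavedParams {
    pub dim: usize,
    /// Optional, so callers that do not track graph settings can pass `None`.
    pub graph_params: Option<GraphParams>,
}

impl SavedParams {
    /// Hypothetical helper: the saved `l_build`, or `fallback` when no
    /// graph parameters were stored.
    pub fn l_build_or(&self, fallback: usize) -> usize {
        self.graph_params.as_ref().map_or(fallback, |g| g.l_build)
    }
}
```

Making the field an `Option` is what lets existing tests and doc examples keep working by setting `graph_params: None`.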
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| diskann-providers/src/model/graph/provider/async_/bf_tree/provider.rs | Added GraphParams struct, integrated graph_params field into BfTreeProvider/BfTreeProviderParameters/SavedParams, updated save/load logic, updated all doc examples and tests |
| diskann-providers/src/model/graph/provider/async_/bf_tree/mod.rs | Exported GraphParams in public API |
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #786 +/- ##
==========================================
- Coverage 89.00% 89.00% -0.01%
==========================================
Files 428 431 +3
Lines 78417 78455 +38
==========================================
+ Hits 69795 69828 +33
- Misses 8622 8627 +5
pub l_build: usize,
pub alpha: f32,
pub backedge_ratio: f32,
pub vector_dtype: String,
Is this the data type as in f32/f16 etc.? It's probably better for it to be an enum instead of a raw string.
Good point. I changed the line to pub vector_dtype: VectorDtype, and added the following code:
/// The element type of the full-precision vectors stored in the index.
#[derive(Serialize, Deserialize, Clone, Debug, PartialEq, Eq)]
#[serde(rename_all = "lowercase")]
pub enum VectorDtype {
    F32,
    F16,
    U8,
    I8,
}

impl VectorDtype {
    /// Derive the `VectorDtype` from a concrete [`VectorRepr`] type parameter.
    #[allow(clippy::panic)]
    pub fn from_type<T: VectorRepr>() -> Self {
        let name = std::any::type_name::<T>();
        match name {
            "f32" => Self::F32,
            "half::f16" | "f16" => Self::F16,
            "i8" => Self::I8,
            "u8" => Self::U8,
            _ => panic!("unsupported VectorRepr type: {}", name),
        }
    }
}
Hmm, this isn't the best approach. First, std::any::type_name is not a stable representation and can change at any time. More importantly, using std::any::type_name or std::any::TypeId in patterns like this is almost always a bad idea. Nothing prevents passing something like MinMax to this function, and you only discover that it doesn't work at runtime.
Instead, I'd suggest something like
trait AsVectorDtype {
    const DATA_TYPE: VectorDtype;
}
and use that trait. Misuses will be caught at compile time, it's guaranteed not to break, and it's much friendlier for the compiler to optimize.
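To make the runtime-versus-compile-time point concrete, here is a minimal standard-library sketch. `dtype_from_name` is a hypothetical, non-panicking variant of the name-based lookup, `dtype_of` is a hypothetical generic helper, and `half::f16` is left out so the example has no external dependencies.

```rust
use std::any::type_name;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum VectorDtype {
    F32,
    U8,
    I8,
}

/// Name-based lookup (hypothetical variant that returns `None` instead of
/// panicking). It compiles for *any* `T`, so an unsupported type is only
/// detected when this code actually runs.
pub fn dtype_from_name<T>() -> Option<VectorDtype> {
    match type_name::<T>() {
        "f32" => Some(VectorDtype::F32),
        "u8" => Some(VectorDtype::U8),
        "i8" => Some(VectorDtype::I8),
        _ => None,
    }
}

/// Trait-based lookup: only types with an explicit impl are accepted.
pub trait AsVectorDtype {
    const DATA_TYPE: VectorDtype;
}

impl AsVectorDtype for f32 {
    const DATA_TYPE: VectorDtype = VectorDtype::F32;
}

/// Hypothetical helper that reads the associated constant.
pub fn dtype_of<T: AsVectorDtype>() -> VectorDtype {
    T::DATA_TYPE
}

// `dtype_of::<String>()` would be rejected by the compiler because `String`
// has no `AsVectorDtype` impl, whereas `dtype_from_name::<String>()` compiles
// and the mismatch only surfaces at runtime.
```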
Thank you for the suggestion and for describing the alternative! I added the following code:
pub trait AsVectorDtype {
    const DATA_TYPE: VectorDtype;
}

impl AsVectorDtype for f32 {
    const DATA_TYPE: VectorDtype = VectorDtype::F32;
}

impl AsVectorDtype for half::f16 {
    const DATA_TYPE: VectorDtype = VectorDtype::F16;
}

impl AsVectorDtype for i8 {
    const DATA_TYPE: VectorDtype = VectorDtype::I8;
}

impl AsVectorDtype for u8 {
    const DATA_TYPE: VectorDtype = VectorDtype::U8;
}
Is this what you meant by your comment?
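One way generic code might consume the associated constant is sketched below. This is an assumption about usage, not code from the PR: the `for_element` constructor is hypothetical, the serde derives are dropped, and `half::f16` is omitted so the snippet builds with the standard library alone.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum VectorDtype {
    F32,
    U8,
    I8,
}

pub trait AsVectorDtype {
    const DATA_TYPE: VectorDtype;
}

impl AsVectorDtype for f32 {
    const DATA_TYPE: VectorDtype = VectorDtype::F32;
}

impl AsVectorDtype for u8 {
    const DATA_TYPE: VectorDtype = VectorDtype::U8;
}

impl AsVectorDtype for i8 {
    const DATA_TYPE: VectorDtype = VectorDtype::I8;
}

#[derive(Clone, Debug, PartialEq)]
pub struct GraphParams {
    pub l_build: usize,
    pub alpha: f32,
    pub backedge_ratio: f32,
    pub vector_dtype: VectorDtype,
}

impl GraphParams {
    /// Hypothetical constructor: the element type `T` picks the dtype at
    /// compile time, so no string matching or runtime panic is involved.
    pub fn for_element<T: AsVectorDtype>(l_build: usize, alpha: f32, backedge_ratio: f32) -> Self {
        Self {
            l_build,
            alpha,
            backedge_ratio,
            vector_dtype: T::DATA_TYPE,
        }
    }
}
```

With this shape, `GraphParams::for_element::<u8>(100, 1.2, 1.0)` records `VectorDtype::U8` without the caller naming the dtype separately.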
This PR addresses the following issue:
We want to save alpha, l_build, backedge_ratio and vector_dtype somewhere and the best place to do it (in my opinion) is SavedParams.
For that we need to save GraphParams in BfTreeProviderParameters and in BfTreeProvider. This is what this PR does.