docs: Add MongoDB data-source and offline-store reference documentation#6351
Signed-off-by: jvincent-mongodb <jeffrey.vincent@mongodb.com>
…tion Signed-off-by: jvincent-mongodb <jeffrey.vincent@mongodb.com>
a2d2ca9 to
3eab42a
@ntkathole - This is ready for your review/merge. TIA!
| Use `retrieve_online_documents_v2()` to perform similarity search: | ||
| ```python | ||
| results = FeatureStore.store.retrieve_online_documents_v2( |
Fine as an example, but a user who copies this block as-is may run into an error. Create a store instance first:
store = FeatureStore(repo_path=".")
results = store.retrieve_online_documents_v2(...)
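To illustrate the point of this comment: `retrieve_online_documents_v2` is called on a `FeatureStore` instance, so a class-level access such as `FeatureStore.store` fails before any query runs. A minimal sketch using a stand-in class (`StubFeatureStore` is hypothetical and not part of Feast; its method only echoes its arguments):

```python
class StubFeatureStore:
    """Hypothetical stand-in for feast.FeatureStore, for illustration only."""

    def __init__(self, repo_path: str) -> None:
        self.repo_path = repo_path

    def retrieve_online_documents_v2(self, **kwargs):
        # The real method queries the online store; this stub echoes its inputs.
        return {"repo_path": self.repo_path, **kwargs}


# Class-level access, as in the quoted snippet: the class has no `store` attribute.
try:
    StubFeatureStore.store.retrieve_online_documents_v2(top_k=5)
    class_level_error = None
except AttributeError as exc:
    class_level_error = exc

# Instance-level access, as suggested in the review comment.
store = StubFeatureStore(repo_path=".")
results = store.retrieve_online_documents_v2(top_k=5)
```

The same shape applies to the documented snippet: construct the store once, then call the method on that instance.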
| ## Key Optimizations | ||
| * **K-collapse**: Multiple FeatureViews that share the same join keys are queried in a single aggregation using `feature_view: {$in: [...]}`, reducing round trips. |
The actual get_historical_features implementation loops over each proj_name in fv_to_features and issues separate MongoDB aggregation pipelines per feature view.
Could K-collapse be listed as a future enhancement instead?
There was a problem hiding this comment.
You're right. K-collapse was removed after the complication around projection keys was discovered. Two projections, same feature view. Thank you for pointing this out. I may have to change docstrings too.
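For readers following this thread, the difference between the two query shapes can be sketched as plain pipeline dictionaries. This is an illustration only, not the code in the PR: the `feature_view` field comes from the quoted docs, while the `entity_key` field name and both helper functions are assumptions made for the example.

```python
from typing import Any, Dict, List

Pipeline = List[Dict[str, Any]]


def per_view_pipelines(feature_views: List[str], entity_keys: List[str]) -> Dict[str, Pipeline]:
    """One aggregation pipeline per feature view (the behaviour described in the review)."""
    return {
        fv: [{"$match": {"feature_view": fv, "entity_key": {"$in": entity_keys}}}]
        for fv in feature_views
    }


def collapsed_pipeline(feature_views: List[str], entity_keys: List[str]) -> Pipeline:
    """A single pipeline covering all views via $in (the removed K-collapse idea)."""
    return [
        {
            "$match": {
                "feature_view": {"$in": feature_views},
                "entity_key": {"$in": entity_keys},
            }
        }
    ]


views = ["driver_stats", "driver_ratings"]  # hypothetical feature view names
keys = ["k1", "k2"]                         # hypothetical serialized entity keys
separate = per_view_pipelines(views, keys)
single = collapsed_pipeline(views, keys)
```

With per-view pipelines the number of round trips grows with the number of feature views; the collapsed form issues one query but has to split the results back out by `feature_view` afterwards, which is where two projections of the same feature view become hard to tell apart.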
@jvincent-mongodb Also want to include Data Source and offline store in |
Hi @ntkathole! Could you please take another look at this?
@jvincent-mongodb Let's update |
| ## 📦 Functionality and Roadmap | ||
| The list below contains the functionality that contributors are planning to develop for Feast. |
@ntkathole This is a strange line. Isn't the list below functionality that contributors HAVE developed for Feast?
| The MongoDB online store supports [Atlas Vector Search](https://www.mongodb.com/docs/atlas/atlas-vector-search/), enabling similarity search over feature embeddings stored in MongoDB Atlas. This is powered by the `$vectorSearch` aggregation stage and requires MongoDB Atlas (or the `mongodb/mongodb-atlas-local` Docker image for local development). | ||
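As a rough picture of what a `$vectorSearch` stage looks like, the sketch below builds one as a plain Python dictionary. It is not the Feast implementation: the helper name, the index name `vector_index`, the field path `embedding`, and the `numCandidates` heuristic are all assumptions for illustration. Only the stage's option names (`index`, `path`, `queryVector`, `numCandidates`, `limit`) are taken from MongoDB's aggregation syntax.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    *,
    index_name: str,
    path: str,
    top_k: int,
    num_candidates: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline whose first stage is $vectorSearch."""
    if num_candidates is None:
        # Assumed heuristic: consider more candidates than results returned.
        num_candidates = top_k * 10
    return [
        {
            "$vectorSearch": {
                "index": index_name,          # name of the Atlas Vector Search index
                "path": path,                 # document field holding the embedding
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": top_k,               # number of documents to return
            }
        }
    ]


pipeline = build_vector_search_pipeline(
    [0.1, 0.2, 0.3],
    index_name="vector_index",  # hypothetical index name
    path="embedding",           # hypothetical field name
    top_k=5,
)
```

A pipeline like this would be passed to a collection's `aggregate` call on an Atlas cluster or the local Atlas image mentioned above.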
| See [PR #6344](https://github.com/feast-dev/feast/pull/6344) for full implementation details. |
Should this line be included? It links back to code.
| provider: local | ||
| online_store: | ||
| type: mongodb | ||
| connection_string: mongodb+srv://<user>:<pass>@cluster.mongodb.net |
You'll see in CI, but this may need # pragma: allowlist secret too.
| top_k=5, | ||
| ) | ||
| # Each result is a (event_timestamp, entity_key_proto, feature_dict) tuple. |
The result of FeatureStore's version is OnlineResponse. I'd just remove these lines.
Given the change, I suggest reviewing the comments below. They are about the implementation in MongoDB, which IS what they want to hear about (how it works), just not specific to the function above.
| The MongoDB offline store provides support for reading [MongoDBSource](../data-sources/mongodb.md). | ||
| * Uses a single shared collection with a compound index for all FeatureViews, distinguished by a `feature_view` discriminator field. | ||
| * Entity dataframes can be provided as a Pandas dataframe. The offline store converts entity identifiers into serialized entity keys for efficient lookup against the collection. | ||
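The data model in the two bullets above can be pictured with a small sketch. Everything beyond the `feature_view` discriminator is an assumption for illustration: the `entity_key` and `event_timestamp` field names, the order of the compound index, and the JSON-based key serialization are stand-ins, not Feast's actual format.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List

# Assumed compound index, written in the (field, direction) form PyMongo accepts.
COLLECTION_INDEX = [("feature_view", 1), ("entity_key", 1), ("event_timestamp", -1)]


def serialize_entity_key(entity: Dict[str, Any]) -> str:
    """Stand-in serialization: deterministic JSON with sorted join keys."""
    return json.dumps(entity, sort_keys=True, separators=(",", ":"))


def to_document(
    feature_view: str,
    entity: Dict[str, Any],
    features: Dict[str, Any],
    event_timestamp: datetime,
) -> Dict[str, Any]:
    """Shape of one row in the shared collection, tagged with its feature view."""
    return {
        "feature_view": feature_view,
        "entity_key": serialize_entity_key(entity),
        "event_timestamp": event_timestamp,
        "features": features,
    }


def lookup_filter(feature_view: str, entities: List[Dict[str, Any]], as_of: datetime) -> Dict[str, Any]:
    """Filter matching rows for the given entities at or before a timestamp."""
    return {
        "feature_view": feature_view,
        "entity_key": {"$in": [serialize_entity_key(e) for e in entities]},
        "event_timestamp": {"$lte": as_of},
    }


ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
doc = to_document("driver_stats", {"driver_id": 1001}, {"trips": 3}, ts)
query = lookup_filter("driver_stats", [{"driver_id": 1001}], ts)
```

Because every lookup filters on the discriminator and the serialized key first, a single compound index can serve all feature views stored in the collection.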
Skip the description.
What this PR does / why we need it:
Adds reference documentation for the MongoDB contrib data source and offline store integration. This fills the documentation gap for the MongoDB offline store that was added in #6138.
Changes:
* `docs/reference/data-sources/mongodb.md`: documents `MongoDBSource`, configuration, supported types
* `docs/reference/offline-stores/mongodb.md`: documents `MongoDBOfflineStore`, data model, retrieval semantics, materialization, and known limitations
* `docs/SUMMARY.md` and `docs/reference/data-sources/README.md`: updated to include MongoDB in the navigation

Which issue(s) this PR fixes:
Follows up on #6138 (feat: MongoDB offline store)
Checks
git commit -s)
Testing Strategy
This is a docs-only change (no code changes).
Misc
Documentation covers:
`feature_history` collection design)