[FLINK-AGENTS-524] Add Amazon OpenSearch and S3 Vectors vector store integrations#533
[FLINK-AGENTS-524] Add Amazon OpenSearch and S3 Vectors vector store integrations#533avichaym wants to merge 2 commits intoapache:mainfrom
Conversation
56b9836 to
9f4f768
Compare
|
@avichaym Please add the following content to your PR description and select a checkbox: |
c492b13 to
dd77d50
Compare
| * Batch-embeds all documents in a single call, then delegates to addEmbedding. | ||
| * | ||
| * <p>TODO: This batch embedding logic is duplicated in S3VectorsVectorStore. Consider | ||
| * extracting to BaseVectorStore in a follow-up (would also benefit ElasticsearchVectorStore). |
There was a problem hiding this comment.
+1 for implementing this batch embedding logic in BaseVectorStore directly.
There was a problem hiding this comment.
Agreed. Will submit as a follow-up PR.
|
|
||
| this.index = descriptor.getArgument("index"); | ||
| if (this.index == null || this.index.isBlank()) { | ||
| throw new IllegalArgumentException("index is required for OpenSearchVectorStore"); |
There was a problem hiding this comment.
Could index be null but indicate index in each operation?
There was a problem hiding this comment.
Good point. Made index optional — when null, operations use the collection parameter passed at call time. This supports use cases where the index is determined per-operation (e.g., Long-Term Memory
collection management)
| @Nullable List<String> ids, @Nullable String collection, Map<String, Object> extraArgs) | ||
| throws IOException { | ||
| if (ids == null || ids.isEmpty()) { | ||
| return; |
There was a problem hiding this comment.
In current design, if ids is not provided, vector store should get/delete all the documents in the collection. The behavior of s3vectors is inconsistent, we need throw exception when ids is not provided for s3vectors or emphasize this point in the documentation.
There was a problem hiding this comment.
Fixed. Changed get() and delete() to throw UnsupportedOperationException when ids is null. S3 Vectors does have a ListVectors API that could support this, but it's paginated (max 1000/page) and a full list+delete-all could be expensive on large indexes.
I think throwing is the safer default, but can also implement in this or a follow-up .
| body.put("size", ids.size()); | ||
| return parseHits(executeRequest("POST", "/" + idx + "/_search", body.toString())); | ||
| } | ||
| int limit = 10000; |
There was a problem hiding this comment.
Maybe a static variable is better?
There was a problem hiding this comment.
Fixed. Extracted to private static final int DEFAULT_GET_LIMIT = 10000
| this.client = | ||
| S3VectorsClient.builder() | ||
| .region(Region.of(regionStr != null ? regionStr : "us-east-1")) | ||
| .credentialsProvider(DefaultCredentialsProvider.create()) |
There was a problem hiding this comment.
According to https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/auth/credentials/DefaultCredentialsProvider.html#create(), create is deprecated, builder().build() is better.
There was a problem hiding this comment.
Fixed in both OpenSearchVectorStore and S3VectorsVectorStore.
9c6d61e to
900d687
Compare
wenjin272
left a comment
There was a problem hiding this comment.
Thanks for addressing my comments. Overall looks good to me, just some comments about the retry parameters. Besides, there are some code-style questions.
|
|
||
| this.retryExecutor = | ||
| RetryExecutor.builder() | ||
| .maxRetries(5) |
There was a problem hiding this comment.
Should this be configuable?
|
|
||
| this.retryExecutor = | ||
| RetryExecutor.builder() | ||
| .maxRetries(5) |
900d687 to
8242655
Compare
…integrations - BedrockChatModelConnection: Converse API with native tool calling, SigV4 auth - BedrockChatModelSetup: model, temperature, max_tokens configuration - BedrockEmbeddingModelConnection: Titan Text Embeddings V2 with parallel batch embedding - BedrockEmbeddingModelSetup: model, dimensions configuration - Retry via unified RetryExecutor with Bedrock-specific retryable predicate - stripMarkdownFences for non-tool-call responses (lossless fence stripping only)
…integrations - OpenSearchVectorStore: supports Serverless (AOSS) and Service domains, IAM and basic auth, SigV4 signing, bulk indexing, collection management for long-term memory - S3VectorsVectorStore: PutVectors/QueryVectors/GetVectors/DeleteVectors, batched puts (500/request limit) - Retry via unified RetryExecutor with service-specific retryable predicates
8242655 to
6e10664
Compare
Linked issue: #524
Depends on #534 — please merge that first.
Purpose of change
Add Amazon OpenSearch and S3 Vectors as vector store providers.
OpenSearchVectorStore— Supports Serverless (AOSS) and Service domains, IAM/basic auth, implementsCollectionManageableVectorStorefor Long-Term Memory, KNN search with filter support, chunked bulk writesS3VectorsVectorStore— S3 Vectors SDK, PutVectors chunked at 500 (API limit)Both override
add()for batch embedding optimization.New modules:
integrations/vector-stores/opensearch/,integrations/vector-stores/s3vectors/Tests
OPENSEARCH_ENDPOINT,S3V_BUCKET): collection CRUD, document CRUD, filtered queryAPI
No public API changes. New integration modules only.
Documentation
doc-neededdoc-not-neededdoc-included