Skip to content

[FLINK-AGENTS-524] Add Amazon OpenSearch and S3 Vectors vector store integrations#533

Open
avichaym wants to merge 2 commits intoapache:mainfrom
avichaym:feature/aws-vector-stores
Open

[FLINK-AGENTS-524] Add Amazon OpenSearch and S3 Vectors vector store integrations#533
avichaym wants to merge 2 commits intoapache:mainfrom
avichaym:feature/aws-vector-stores

Conversation

@avichaym
Copy link

@avichaym avichaym commented Feb 10, 2026

Linked issue: #524

Depends on #534 — please merge that first.

Purpose of change

Add Amazon OpenSearch and S3 Vectors as vector store providers.

  • OpenSearchVectorStore — Supports Serverless (AOSS) and Service domains, IAM/basic auth, implements CollectionManageableVectorStore for Long-Term Memory, KNN search with filter support, chunked bulk writes
  • S3VectorsVectorStore — S3 Vectors SDK, PutVectors chunked at 500 (API limit)

Both override add() for batch embedding optimization.

New modules: integrations/vector-stores/opensearch/, integrations/vector-stores/s3vectors/

Tests

  • Unit tests: OpenSearch (4), S3 Vectors (2)
  • Integration tests gated by env vars (OPENSEARCH_ENDPOINT, S3V_BUCKET): collection CRUD, document CRUD, filtered query
  • End-to-end validated with RAG and Long-Term Memory demos against real OpenSearch domain and S3 Vectors bucket

API

No public API changes. New integration modules only.

Documentation

  • doc-needed
  • doc-not-needed
  • doc-included

@github-actions github-actions bot added priority/major Default priority of the PR or issue. fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. labels Feb 10, 2026
@avichaym avichaym changed the title [FLINK-AGENTS-523] Add Amazon Bedrock chat model and embedding model integrations [FLINK-AGENTS-524] Add Amazon OpenSearch and S3 Vectors vector store integrations Feb 10, 2026
@avichaym avichaym force-pushed the feature/aws-vector-stores branch from 56b9836 to 9f4f768 Compare February 10, 2026 16:37
@github-actions github-actions bot added the doc-label-missing The Bot applies this label either because none or multiple labels were provided. label Feb 10, 2026
@github-actions
Copy link

@avichaym Please add the following content to your PR description and select a checkbox:

- [ ] `doc-needed` 
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-included` 

@github-actions github-actions bot added doc-not-needed Your PR changes do not impact docs and removed doc-label-missing The Bot applies this label either because none or multiple labels were provided. labels Feb 10, 2026
@wenjin272 wenjin272 added doc-needed Your PR changes impact docs. and removed doc-not-needed Your PR changes do not impact docs labels Feb 12, 2026
@avichaym avichaym force-pushed the feature/aws-vector-stores branch 2 times, most recently from c492b13 to dd77d50 Compare February 19, 2026 09:08
Copy link
Collaborator

@wenjin272 wenjin272 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @avichaym, thanks for your work. LGTM, just a few minor comments.

* Batch-embeds all documents in a single call, then delegates to addEmbedding.
*
* <p>TODO: This batch embedding logic is duplicated in S3VectorsVectorStore. Consider
* extracting to BaseVectorStore in a follow-up (would also benefit ElasticsearchVectorStore).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for implementing this batch embedding logic in BaseVectorStore directly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Will submit as a follow-up PR.


this.index = descriptor.getArgument("index");
if (this.index == null || this.index.isBlank()) {
throw new IllegalArgumentException("index is required for OpenSearchVectorStore");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could index be null but indicate index in each operation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Made index optional — when null, operations use the collection parameter passed at call time. This supports use cases where the index is determined per-operation (e.g., Long-Term Memory
collection management)

@Nullable List<String> ids, @Nullable String collection, Map<String, Object> extraArgs)
throws IOException {
if (ids == null || ids.isEmpty()) {
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In current design, if ids is not provided, vector store should get/delete all the documents in the collection. The behavior of s3vectors is inconsistent, we need throw exception when ids is not provided for s3vectors or emphasize this point in the documentation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Changed get() and delete() to throw UnsupportedOperationException when ids is null. S3 Vectors does have a ListVectors API that could support this, but it's paginated (max 1000/page) and a full list+delete-all could be expensive on large indexes.
I think throwing is the safer default, but can also implement in this or a follow-up .

body.put("size", ids.size());
return parseHits(executeRequest("POST", "/" + idx + "/_search", body.toString()));
}
int limit = 10000;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a static variable is better?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Extracted to private static final int DEFAULT_GET_LIMIT = 10000

this.client =
S3VectorsClient.builder()
.region(Region.of(regionStr != null ? regionStr : "us-east-1"))
.credentialsProvider(DefaultCredentialsProvider.create())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in both OpenSearchVectorStore and S3VectorsVectorStore.

@avichaym avichaym force-pushed the feature/aws-vector-stores branch from 9c6d61e to 900d687 Compare March 2, 2026 12:30
Copy link
Collaborator

@wenjin272 wenjin272 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing my comments. Overall looks good to me, just some comments about the retry parameters. Besides, there are some code-style questions.


this.retryExecutor =
RetryExecutor.builder()
.maxRetries(5)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be configuable?


this.retryExecutor =
RetryExecutor.builder()
.maxRetries(5)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@avichaym avichaym force-pushed the feature/aws-vector-stores branch from 900d687 to 8242655 Compare March 3, 2026 07:37
Avichay Marciano added 2 commits March 3, 2026 09:54
…integrations

- BedrockChatModelConnection: Converse API with native tool calling, SigV4 auth
- BedrockChatModelSetup: model, temperature, max_tokens configuration
- BedrockEmbeddingModelConnection: Titan Text Embeddings V2 with parallel batch embedding
- BedrockEmbeddingModelSetup: model, dimensions configuration
- Retry via unified RetryExecutor with Bedrock-specific retryable predicate
- stripMarkdownFences for non-tool-call responses (lossless fence stripping only)
…integrations

- OpenSearchVectorStore: supports Serverless (AOSS) and Service domains, IAM and basic auth, SigV4 signing, bulk indexing, collection management for long-term memory
- S3VectorsVectorStore: PutVectors/QueryVectors/GetVectors/DeleteVectors, batched puts (500/request limit)
- Retry via unified RetryExecutor with service-specific retryable predicates
@avichaym avichaym force-pushed the feature/aws-vector-stores branch from 8242655 to 6e10664 Compare March 3, 2026 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc-needed Your PR changes impact docs. fixVersion/0.3.0 The feature or bug should be implemented/fixed in the 0.3.0 version. priority/major Default priority of the PR or issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants