feat: Added a batch completed callback to the data client mutations batcher#1308
feat: Added a batch completed callback to the data client mutations batcher#1308gkevinzheng wants to merge 1 commit intov3_stagingfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the Bigtable data client's mutations batcher by providing a mechanism for users to register a callback function. This function will be executed upon the completion of a mutation batch, receiving a detailed list of Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable feature by adding a batch completion callback to the MutationsBatcher. While the implementation is generally clean and the refactoring of exception-to-status conversion logic into _populate_statuses_from_mutations_exception_group improves modularity, a critical security vulnerability has been identified. The _populate_statuses_from_mutations_exception_group function in google/cloud/bigtable/data/_helpers.py performs unvalidated list index access using error.index, which can lead to application crashes (Denial of Service) if the index is out of bounds or None. It is recommended to add bounds and type checking for error.index before using it to access the statuses list. Additionally, a minor issue was noted in the new unit tests where assertions are written as ineffective statements.
| for error in exc_group.exceptions: | ||
| cause = error.__cause__ | ||
| if isinstance(cause, RetryExceptionGroup): | ||
| statuses[error.index] = _get_status(cause.exceptions[-1]) | ||
| else: | ||
| statuses[error.index] = _get_status(cause) |
There was a problem hiding this comment.
The helper function _populate_statuses_from_mutations_exception_group uses error.index to access the statuses list without validating that the index is an integer or within the bounds of the list. Since error.index is derived from server responses (via FailedMutationEntryError), a malicious or malfunctioning server could provide an out-of-bounds index, causing an IndexError. Additionally, some parts of the library (e.g., in MutationsBatcher) explicitly set error.index to None, which would cause a TypeError if the helper is called on such an exception. This can lead to a Denial of Service (crash) of the client application.
| for error in exc_group.exceptions: | |
| cause = error.__cause__ | |
| if isinstance(cause, RetryExceptionGroup): | |
| statuses[error.index] = _get_status(cause.exceptions[-1]) | |
| else: | |
| statuses[error.index] = _get_status(cause) | |
| for error in exc_group.exceptions: | |
| if isinstance(error.index, int) and 0 <= error.index < len(statuses): | |
| cause = error.__cause__ | |
| if isinstance(cause, RetryExceptionGroup): | |
| statuses[error.index] = _get_status(cause.exceptions[-1]) | |
| else: | |
| statuses[error.index] = _get_status(cause) |
| kwargs["operation_timeout"] == 17 | ||
| kwargs["attempt_timeout"] == 13 |
There was a problem hiding this comment.
These lines are intended to be assertions, but they are currently ineffective statements. They should be updated to use assert to correctly validate the timeout values.
| kwargs["operation_timeout"] == 17 | |
| kwargs["attempt_timeout"] == 13 | |
| assert kwargs["operation_timeout"] == 17 | |
| assert kwargs["attempt_timeout"] == 13 | |
| kwargs["operation_timeout"] == 17 | ||
| kwargs["attempt_timeout"] == 13 |
There was a problem hiding this comment.
These lines appear to be intended as assertions but are currently written as ineffective statements. To ensure the timeout values are correctly validated in the test, they should be converted to assert statements.
| kwargs["operation_timeout"] == 17 | |
| kwargs["attempt_timeout"] == 13 | |
| assert kwargs["operation_timeout"] == 17 | |
| assert kwargs["attempt_timeout"] == 13 | |
Changes made:
Table.mutate_rowsfrom producing a list ofStatusprotos from aMutationsExceptionGroup