Skip to content

Handle failed identity resolution gracefully in PolarisEventMetadataFactory#4459

Open
HonahX wants to merge 2 commits into
apache:mainfrom
HonahX:honahx-fix-rate-limiter-error
Open

Handle failed identity resolution gracefully in PolarisEventMetadataFactory#4459
HonahX wants to merge 2 commits into
apache:mainfrom
HonahX:honahx-fix-rate-limiter-error

Conversation

@HonahX
Copy link
Copy Markdown
Contributor

@HonahX HonahX commented May 15, 2026

PolarisEventMetadataFactory.create() calls getDeferredIdentity().subscribeAsCompletionStage().getNow(null) to resolve the current user for event metadata. When called before authentication has completed
(e.g., in pre-authentication filters), this can trigger the auth pipeline, which may fail with CompletionException wrapping an AuthenticationFailedException.

Previously, this exception would propagate to the caller. For example, when the rate limiter filter emits an event before rejecting a request, the leaked
Completion exception gets mapped to 500 instead of the intended 429.

In general, since user itself is an optional field, proceed the event dispatching with a null user instead of failing directly could increase the stability and while we can still see/notice the actual error through debug log.

                                                                                                                                                                                                              ## Changes                                                      
  • PolarisEventMetadataFactory.java: Refactor getUser() to use new getSecurityIdentity() helper with CompletionException handling and debug logging
  • PolarisEventMetadataFactoryTest.java: New unit test covering resolved identity, anonymous identity, not-yet-resolved identity, and failed deferred resolution

Test plan

  • New unit test PolarisEventMetadataFactoryTest passes (4 cases)

Checklist

  • 🛡️ Don't disclose security issues! (contact security@apache.org)
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

…actory

When PolarisEventMetadataFactory.create() is called before authentication
has completed (e.g., in pre-authentication filters), triggering deferred
identity resolution can cause the auth pipeline to fail with a
CompletionException. Previously this exception would propagate to the caller.

For example, when the rate limiter filter emits an event before rejecting a
request to an unauthenticated endpoint like /oauth/tokens, the leaked auth
exception gets mapped to 401 instead of the intended 429.

Fix: extract getSecurityIdentity() helper that catches CompletionException
from deferred identity resolution, returning Optional.empty() instead of
propagating the auth failure. A debug log is emitted for observability.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-project-automation github-project-automation Bot moved this to PRs In Progress in Basic Kanban Board May 15, 2026
@HonahX HonahX marked this pull request as ready for review May 15, 2026 22:39
@HonahX HonahX requested review from adnanhemani and adutra May 15, 2026 22:39
adnanhemani
adnanhemani previously approved these changes May 18, 2026
Copy link
Copy Markdown
Contributor

@adnanhemani adnanhemani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM - agreed with us needing this change. Thanks @HonahX!


@Test
void createReturnsEmptyUserWhenIdentityNotYetResolved() {
// Simulates getNow(null) returning null because the future hasn't completed
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think this is true, but would be nice if someone can confirm that this behavior hasn't changed as part of this code change.

@github-project-automation github-project-automation Bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board May 18, 2026
snazy
snazy previously approved these changes May 18, 2026
Copy link
Copy Markdown
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-read the updated factory and the rate-limiter call path. Catching CompletionException looks a bit odd at first, but given this code intentionally peeks via CompletionStage#getNow(null), that is the wrapper we get when deferred identity resolution has already completed exceptionally. Treating that as an unavailable optional user is reasonable for event metadata, especially from pre-auth filters where metadata creation must not change the response status.

The focused test passes locally, and the PR CI is green.

Copy link
Copy Markdown
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One follow-up after thinking more about the CompletionException boundary: I think this catch should probably unwrap and only suppress the expected auth-unavailable/auth-failed case, not every CompletionException.

CompletionException is just the CompletionStage wrapper. It could also wrap a technical failure from identity resolution. For example, AuthenticatingAugmentor deliberately lets ServiceFailureException bubble so it is not converted into a 401. With the current code, that kind of failure would be logged at debug and treated as an absent optional user.

Could we narrow this to something like AuthenticationFailedException (and any other explicitly expected unauthenticated case), and rethrow the CompletionException otherwise? A test that a non-auth RuntimeException still propagates would make the boundary clearer.

@snazy snazy dismissed their stale review May 18, 2026 09:21

mistake

void createReturnsEmptyUserWhenIdentityNotYetResolved() {
// Simulates getNow(null) returning null because the future hasn't completed
when(currentIdentityAssociation.getDeferredIdentity())
.thenReturn(Uni.createFrom().item(() -> null));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not accurately testing the scenario of a future that isn't completed. You should return instead a Uni that never completes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Uni.createFrom().nothing()

should do the trick. Updated the test to use that!

- Only suppress AuthenticationFailedException in getSecurityIdentity(),
  rethrow CompletionException for non-auth failures (e.g. ServiceFailureException)
- Fix "not-yet-resolved" test to use Uni.createFrom().nothing() (truly incomplete)
- Add realmId/timestamp assertions to anonymous and not-yet-resolved tests
- Add test verifying non-auth exceptions propagate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@HonahX
Copy link
Copy Markdown
Contributor Author

HonahX commented May 20, 2026

@snazy Thanks for the review! I've updated to suppress AuthenticationFailedException only and re-throw CompletionException otherwise, and also added a test. Please take a look : )

@HonahX HonahX requested review from adutra and snazy May 20, 2026 20:55
.subscribeAsCompletionStage()
.getNow(null));
} catch (CompletionException e) {
if (e.getCause() instanceof AuthenticationFailedException) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a holder approach?

We could start with PolarisPrincipalHolder but alter it to always receive the PolarisPrincipal after authentication completes... that is in AuthenticatingAugmentor.

The normal producer method would assert that the principal is set.

The events factory could deal with PolarisPrincipalHolder and get an optional PolarisPrincipal value via another getter.

This way most of the processing logic does not have to deal with async aspects.

WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants