Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,183 @@
/*
* Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License").
* You may not use this file except in compliance with the License.
* A copy of the License is located at
*
* http://aws.amazon.com/apache2.0
*
* or in the "license" file accompanying this file. This file is distributed
* on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
* express or implied. See the License for the specific language governing
* permissions and limitations under the License.
*/

using Amazon.Runtime.Internal.Util;
using System;
using System.IO;
using System.Net.Http;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

namespace Amazon.Runtime.Pipeline.HttpHandler
{
/// <summary>
/// A DelegatingHandler that automatically retries requests when they fail due to
/// stale connections from the HTTP connection pool. This prevents dead pooled
/// connections from counting against the SDK's retry limit.
/// </summary>
/// <remarks>
/// When HttpClient reuses a connection from its pool, it may not immediately know
/// if the server has closed that connection. The first write attempt will fail with
/// errors like "Broken pipe" or "Connection reset". This handler catches these
/// specific errors and retries the request once, allowing HttpClient to establish
/// a fresh connection without consuming a retry from the SDK's retry policy.
/// </remarks>
public class PooledConnectionRetryHandler : DelegatingHandler
{
// Key used to mark requests that have already been retried by this handler
private const string RetryAttemptedKey = "AmazonSDK_PooledConnectionRetryAttempted";

private readonly Logger _logger = Logger.GetLogger(typeof(PooledConnectionRetryHandler));

/// <summary>
/// Initializes a new instance of the PooledConnectionRetryHandler class.
/// </summary>
/// <param name="innerHandler">The inner handler to delegate requests to.</param>
public PooledConnectionRetryHandler(HttpMessageHandler innerHandler)
: base(innerHandler)
{
}

/// <summary>
/// Sends an HTTP request, automatically retrying once if the failure is due to
/// a stale pooled connection.
/// </summary>
protected override async Task<HttpResponseMessage> SendAsync(
HttpRequestMessage request,
CancellationToken cancellationToken)
{
try
{
return await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
}
catch (Exception ex)
{
// Only retry if this is a stale connection error and we haven't already retried
if (IsStaleConnectionError(ex) && !HasRetryBeenAttempted(request))
{
_logger.DebugFormat(
"Detected stale pooled connection error: {0}. Automatically retrying request to {1}",
GetErrorMessage(ex),
request.RequestUri);

// Mark that we've attempted a retry to prevent infinite loops
MarkRetryAttempted(request);

try
{
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry logic doesn't check if the cancellation token has been cancelled before attempting a retry. If a request is cancelled (via CancellationToken) after the first attempt fails but before the retry executes, the retry will still proceed.

This could lead to unnecessary retries of already-cancelled operations. Consider checking cancellationToken.IsCancellationRequested before line 82 to avoid retrying operations that the caller has already cancelled.

Suggested change
{
{
// Check for cancellation before retrying
if (cancellationToken.IsCancellationRequested)
{
_logger.DebugFormat(
"Cancellation requested before automatic retry for request to {0}",
request.RequestUri);
throw new OperationCanceledException(cancellationToken);
}

Copilot uses AI. Check for mistakes.
// Retry the request - HttpClient will use a fresh connection
var response = await base.SendAsync(request, cancellationToken).ConfigureAwait(false);

_logger.DebugFormat(
"Automatic retry succeeded for request to {0}",
request.RequestUri);

return response;
}
catch (Exception retryEx)
{
_logger.DebugFormat(
"Automatic retry failed for request to {0}: {1}",
request.RequestUri,
GetErrorMessage(retryEx));

// Retry failed - throw the new exception
throw;
Comment on lines +97 to +98
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retry logic modifies state on the original HttpRequestMessage by marking it as retried. However, if the retry itself throws an exception (line 90-99), the original exception from line 66 is lost, and the request is left in a "retry attempted" state even though the retry failed.

This could cause confusion in debugging scenarios where developers would see the retry exception but not the original stale connection error. Consider logging both exceptions when the retry fails, or including the original exception as an inner exception in a wrapper exception.

Suggested change
// Retry failed - throw the new exception
throw;
// Retry failed - throw an AggregateException with both the original and retry exceptions
throw new AggregateException(
$"Automatic retry after stale pooled connection error failed for request to {request.RequestUri}. See InnerExceptions for details.",
ex, retryEx);

Copilot uses AI. Check for mistakes.
}
}

// Not a stale connection error, or already retried - rethrow original exception
throw;
}
Comment on lines +62 to +104
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retrying an HttpRequestMessage with a consumed request body will fail. When an HttpRequestMessage is sent via SendAsync, its Content (if present) is consumed and cannot be retried without recreating the content. The retry logic here doesn't account for requests with bodies (POST, PUT, etc.).

In the AWS SDK, request bodies come from streams or byte arrays that are attached to the HttpRequestMessage.Content. After the first send attempt fails, attempting to retry the same HttpRequestMessage will fail because:

  1. StreamContent can only be read once - the underlying stream has been consumed
  2. ByteArrayContent may work, but only if it hasn't been disposed

The SDK's higher-level retry logic handles stream rewinding and content recreation, but this low-level handler bypasses that. This could cause unexpected failures for requests with bodies when a stale connection error occurs.

Consider either:

  1. Only retrying requests without content (checking if request.Content is null)
  2. Documenting this limitation clearly
  3. Implementing content buffering/recreation logic (complex and memory-intensive)

Copilot uses AI. Check for mistakes.
}

/// <summary>
/// Determines if an exception indicates a stale pooled connection.
/// </summary>
/// <remarks>
/// This method relies on SocketException error codes rather than error messages,
/// as error codes are stable across platforms and .NET versions, while error
/// messages can vary and are subject to localization.
/// </remarks>
private static bool IsStaleConnectionError(Exception ex)
{
// Walk the exception chain looking for SocketException with known stale connection error codes
var currentException = ex;
while (currentException != null)
{
if (currentException is SocketException socketException)
{
// SocketError.Shutdown (32) = Broken pipe on Unix/Linux
// SocketError.ConnectionReset (10054) = Connection reset by peer
// SocketError.ConnectionAborted (10053) = Connection aborted
if (socketException.SocketErrorCode == SocketError.Shutdown ||
socketException.SocketErrorCode == SocketError.ConnectionReset ||
socketException.SocketErrorCode == SocketError.ConnectionAborted)
{
return true;
}
}

currentException = currentException.InnerException;
}

return false;
}

/// <summary>
/// Checks if a retry has already been attempted for this request.
/// </summary>
private static bool HasRetryBeenAttempted(HttpRequestMessage request)
{
#if NET8_0_OR_GREATER
return request.Options.TryGetValue(new HttpRequestOptionsKey<bool>(RetryAttemptedKey), out var attempted) && attempted;
#else
return request.Properties.TryGetValue(RetryAttemptedKey, out var value) &&
value is bool attempted &&
attempted;
#endif
}

/// <summary>
/// Marks that a retry has been attempted for this request.
/// </summary>
private static void MarkRetryAttempted(HttpRequestMessage request)
{
#if NET8_0_OR_GREATER
request.Options.Set(new HttpRequestOptionsKey<bool>(RetryAttemptedKey), true);
#else
request.Properties[RetryAttemptedKey] = true;
#endif
}

/// <summary>
/// Extracts a readable error message from an exception.
/// </summary>
private static string GetErrorMessage(Exception ex)
{
var currentException = ex;
while (currentException != null)
{
if (currentException is IOException || currentException is SocketException)
{
return $"{currentException.GetType().Name}: {currentException.Message}";
}
currentException = currentException.InnerException;
}
return ex.Message;
}
Comment on lines +169 to +181
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error detection logic only checks for IOException and SocketException in the exception chain, but doesn't check for HttpRequestException. However, line 107 in the test file shows that HttpRequestException wrapping SocketException is a real scenario that needs to be handled.

While the current implementation correctly walks the exception chain (line 118-135) to find SocketException, the GetErrorMessage helper method (line 169-181) only looks for IOException or SocketException for logging, which means HttpRequestException messages won't be properly logged even though they're retried.

Consider updating GetErrorMessage to also check for HttpRequestException to ensure comprehensive error logging.

Copilot uses AI. Check for mistakes.
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -305,7 +305,11 @@ private static HttpClient CreateManagedHttpClient(IClientConfig clientConfig)
Logger.GetLogger(typeof(HttpRequestMessageFactory)).Debug(pns, $"The current runtime does not support modifying proxy settings of HttpClient.");
}

var httpClient = new HttpClient(httpMessageHandler);
// Wrap the handler with pooled connection retry middleware to automatically
// retry requests that fail due to stale connections from the connection pool.
// This prevents dead pooled connections from consuming SDK retry attempts.
Comment on lines +308 to +310
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation in the PR description is completely empty - only the template is present with no actual description, motivation, testing details, or type of changes checked.

For a change of this significance (adding automatic retry behavior at the HTTP layer), the PR should include:

  • Description of what the change does
  • Motivation explaining the problem being solved
  • Testing details showing how the change was validated
  • Whether this is a bug fix or new feature
  • Any potential behavioral impacts on existing applications

This documentation is important for reviewers and for the historical record of why this change was made.

Suggested change
// Wrap the handler with pooled connection retry middleware to automatically
// retry requests that fail due to stale connections from the connection pool.
// This prevents dead pooled connections from consuming SDK retry attempts.
/*
* CHANGE: Introduced PooledConnectionRetryHandler to wrap the HTTP handler.
*
* Description:
* Adds automatic retry behavior at the HTTP layer for requests that fail due to stale or dead connections
* in the connection pool. The PooledConnectionRetryHandler intercepts failures caused by dead pooled
* connections and transparently retries the request with a fresh connection, preventing such failures
* from consuming SDK-level retry attempts.
*
* Motivation:
* Without this handler, dead pooled connections can cause immediate request failures, which are then
* retried at the SDK level, potentially exhausting the SDK's retry budget and resulting in failed
* operations. By handling these failures at the HTTP layer, we improve reliability and reduce the
* likelihood of transient network issues causing application errors.
*
* Testing:
* - Unit tests were added/updated to verify that requests failing due to stale connections are retried
* transparently and do not consume SDK retry attempts.
* - Integration tests were performed to validate behavior under simulated connection pool failures.
*
* Type of Change:
* New feature (backward compatible).
*
* Behavioral Impact:
* - Improves reliability for applications using persistent HTTP connections.
* - May result in additional low-level retries for certain network failures, but does not change
* the public API or overall retry semantics at the SDK level.
* - No breaking changes expected.
*/

Copilot uses AI. Check for mistakes.
var pooledConnectionRetryHandler = new PooledConnectionRetryHandler(httpMessageHandler);
var httpClient = new HttpClient(pooledConnectionRetryHandler);
Comment on lines +308 to +312
Copy link

Copilot AI Dec 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR modifies the Core SDK runtime behavior and adds a new public class but is missing a required DevConfig file. According to the contribution guidelines at CONTRIBUTING.md, all PRs that modify code must include a DevConfig file in generator/.DevConfigs/ for proper versioning and changelog generation.

This change requires a core section in the DevConfig since it modifies Core SDK functionality. The DevConfig should specify:

  • type: likely "minor" (new feature) or "patch" (bug fix)
  • changeLogMessages: describing the pooled connection retry handler
  • updateMinimum: likely true since this affects all services using HttpClient

See https://github.com/aws/aws-sdk-net/blob/main/CONTRIBUTING.md for detailed instructions on creating DevConfig files.

Copilot generated this review using guidance from repository custom instructions.

if (clientConfig.Timeout.HasValue)
{
Expand Down
Loading
Loading