NE-2418: Add haproxy_max_connections metric #728
openshift-merge-bot[bot] merged 1 commit into openshift:master
Conversation
@alebedev87: This pull request references NE-2418 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
176bedf to 8693a02 (force-push)
Add a new haproxy_max_connections gauge metric that exposes the process-wide maximum connections configured for HAProxy. The metric is extracted from the public frontend's "slim" field (field 6) in HAProxy's "show stat" CSV output. Since the router configures both global and defaults sections with the same ROUTER_MAX_CONNECTIONS value, the public frontend's session limit reflects the process-wide maxconn setting. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
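To illustrate the extraction described above, here is a minimal, hypothetical sketch (not the router's actual code) of pulling the session limit from `show stat` output. It assumes HAProxy's documented CSV layout, where `slim` is column 7 (index 6, after `pxname`, `svname`, `qcur`, `qmax`, `scur`, `smax`); the function and sample data are illustrative:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFrontendSlim extracts the "slim" value (session limit, CSV field
// index 6) for the named frontend from HAProxy "show stat" CSV output.
// Frontend rows are identified by svname == "FRONTEND".
func parseFrontendSlim(csv, frontend string) (int, error) {
	for _, line := range strings.Split(csv, "\n") {
		fields := strings.Split(line, ",")
		if len(fields) > 6 && fields[0] == frontend && fields[1] == "FRONTEND" {
			return strconv.Atoi(fields[6])
		}
	}
	return 0, fmt.Errorf("frontend %q not found", frontend)
}

func main() {
	// Truncated sample in the documented column order:
	// pxname,svname,qcur,qmax,scur,smax,slim,...
	stats := "# pxname,svname,qcur,qmax,scur,smax,slim\n" +
		"public,FRONTEND,,,12,40,20000\n"
	slim, err := parseFrontendSlim(stats, "public")
	if err != nil {
		panic(err)
	}
	fmt.Println(slim) // prints 20000
}
```

Since the router sets the same `ROUTER_MAX_CONNECTIONS` value in both the `global` and `defaults` sections, this per-frontend limit equals the process-wide `maxconn`.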
8693a02 to a0ed627 (force-push)
The new e2e test from CIO is passing.
/retitle NE-2418: Add haproxy_max_connections metric
/assign
// The router configures both global and defaults sections with the same ROUTER_MAX_CONNECTIONS value,
// so the public frontend's limit (field 6/slim) reflects the process-wide maxconn setting.
// NOTE: If the defaults maxconn is ever configured differently from global maxconn,
// this approach will no longer accurately represent the process-wide limit.
This metric is already available via haproxy_frontend_current_sessions; it's just hidden behind this configuration, which is missing field 6:
router/pkg/router/metrics/haproxy/haproxy.go
Lines 109 to 111 in d8ed355
Also, as you pointed this is a frontend metric. The global one is available via show info, and reading global metrics from there is preferable because it not only provides the correct one, but also provides the current global connections, which is the metric to be tracked along with maxconn to alert users about the availability of their connection limits.
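The `show info` approach the reviewer suggests can be sketched as follows. This is a hypothetical illustration, not code from this PR: `show info` returns `Key: value` lines over the admin socket, and its real output includes `Maxconn` and `CurrConns` fields; the parser below is an assumption-laden sketch operating on a canned sample rather than a live socket:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseShowInfo extracts integer-valued fields (e.g. "Maxconn", "CurrConns")
// from HAProxy "show info" output, which is formatted as "Key: value" lines.
// Non-integer values (like "Name: HAProxy") are skipped.
func parseShowInfo(info string) map[string]int {
	out := map[string]int{}
	for _, line := range strings.Split(info, "\n") {
		k, v, ok := strings.Cut(line, ": ")
		if !ok {
			continue
		}
		if n, err := strconv.Atoi(strings.TrimSpace(v)); err == nil {
			out[k] = n
		}
	}
	return out
}

func main() {
	// Canned sample standing in for output read from the admin socket.
	sample := "Name: HAProxy\nMaxconn: 20000\nCurrConns: 37\n"
	info := parseShowInfo(sample)
	fmt.Println(info["Maxconn"], info["CurrConns"]) // prints 20000 37
}
```

The appeal of this source, per the review comment, is that it yields both the process-wide limit and the current global connection count from one place, rather than inferring the limit from a frontend row.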
This metric is already available via haproxy_frontend_current_sessions; it's just hidden behind this configuration, which is missing field 6:
Yes, we skipped it because it's fairly static. Maybe it was completely static at the time the decision was made; the IngressController's max connections tuning option might have been added later.
Also, as you pointed this is a frontend metric. The global one is available via show info, and reading global metrics from there is preferable because it not only provides the correct one
Right. I was thinking of this approach too. What was puzzling me is the scraping behavior, which can go one of two ways: the HTTP endpoint or the admin socket. Since show info always uses the unix socket, it would depart from this behavior by hard-wiring the max connections metric to the "unix socket scraping". That's why I decided to go the easiest path, which is getting the maxconn from the scraped data we already have (any frontend's maxconn would do the job, since we don't have any configuration knob to set maxconn on frontends). I didn't look up how to get the global maxconn from the stats webpage; maybe it's possible too. However, your remark made me think about whether we are obliged to keep this as a requirement (being able to scrape from the HTTP endpoint). CIO hardcodes the metrics type to haproxy, which disables the stats webpage. I think I need to get more history on this; maybe @Miciah has a stronger opinion about whether we can scrape the max connections from the admin socket without implementing it in the HTTP stats.
also provides the current global connections, which is the metric to be tracked along with maxconn to alert users about the availability of their connection limits.
Yes. That's another point I was thinking of. A haproxy_current_connections metric can be convenient, I agree. However, the same data can be retrieved from the haproxy_frontend_current_sessions metric; all frontends have to be summed, though.
Yes, we skipped it because it's fairly static.
Indeed. It reports the current configuration only. This is the same data that this PR is providing if I'm not mistaken.
maybe @Miciah has a stronger opinion about whether we can scrape the max connections from the admin socket without implementing it in the HTTP stats.
... and
However the same data can be retrieved from the haproxy_frontend_current_sessions metric. All frontends have to be used, though.
Just my 2c on it.
If I understood it correctly, this effort comes from an issue at a client, where an unmonitored maxconn was reached and caused an outage. My proposal is to expose the real data from the best source, and not only the max but the current value as well. Anything we calculate or infer ourselves might be wrong, maybe today, maybe in the future when we change some approach and start to expose inaccurate data.
After a discussion with @Miciah over Slack, he expressed his preference for using CSV data (what's returned by show stat) whenever the needed value is present there. Since with the current architecture frontend maxconn == global maxconn, the CSV data can be used for the global maxconn metric.
I think I'll stick with this implementation then, as it's the simplest. The point about using the best source (show info) is fair, and we may need to come back to it in the future. Also, I think that this will have to be paired with the decommissioning (or redesign) of the HTTP endpoint.
/lgtm
#728 (comment) is an unaddressed point though; we're choosing the simplest approach for now.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jcmoraisjr. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/assign @ShudiLi
For verification.
@alebedev87 LGTM overall, but after I updated tuningOptions/maxConnections to 2000, the attached metrics picture showed the stats of the deleted router pods; maybe we could remove those lines for the deleted router pods.
@ShudiLi: yes, metrics with
Example (a contrived one) of an alerting rule which fires when the cluster's ingress (all routers) is approaching the limit of connections:
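A hypothetical sketch of such a rule follows; the group and alert names, the 80% threshold, and the `for` duration are illustrative assumptions, not anything from this PR. It sums haproxy_frontend_current_sessions across all frontends (as discussed earlier in the thread) and compares that against the new haproxy_max_connections metric:

```yaml
# Illustrative Prometheus alerting rule sketch; names and threshold are assumptions.
groups:
- name: ingress-connections
  rules:
  - alert: IngressApproachingMaxConnections
    expr: |
      sum(haproxy_frontend_current_sessions)
        / sum(haproxy_max_connections) > 0.8
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Ingress routers are above 80% of their configured maxconn."
```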
Btw we can see some of the existing metrics giving |
@alebedev87 Thanks for the explanation. As we discussed in Slack, similar to haproxy_up, the None value is standard Prometheus retention behavior, and it won't be taken into account in the alert rule.
/verified by @ShudiLi
@ShudiLi: This PR has been marked as verified.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@alebedev87: all tests passed! Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.





Add a new haproxy_max_connections gauge metric that exposes the process-wide maximum connections configured for HAProxy.
The metric is extracted from the public frontend's "slim" field (field 6) in HAProxy's "show stat" CSV output. Since the router configures both global and defaults sections with the same ROUTER_MAX_CONNECTIONS value, the public frontend's session limit reflects the process-wide maxconn setting.
E2E test: openshift/cluster-ingress-operator#1361.