Add PostgreSQL observability telemetry exposure via ServiceMonitors #1808

Draft
DmytroPI-dev wants to merge 5 commits into feature/database-controllers from postgres-operator-monitoring
Conversation

@DmytroPI-dev commented Apr 1, 2026

Description

Adds PostgreSQL observability telemetry exposure for PostgresCluster with operator-managed metrics Services and Prometheus ServiceMonitors for PostgreSQL and PgBouncer.

Key Changes

api/v4/postgresclusterclass_types.go
Added class-level observability configuration for PostgreSQL and PgBouncer metrics.

api/v4/postgrescluster_types.go
Added cluster-level disable-only observability overrides.

pkg/postgresql/cluster/core/cluster.go
Wired PostgreSQL and PgBouncer metrics Service and ServiceMonitor reconciliation into the PostgresCluster flow.
Made ServiceMonitor presence required by failing reconciliation when the CRD is unavailable.

pkg/postgresql/cluster/core/monitoring.go
Added feature resolution helpers.
Added builders and reconcilers for PostgreSQL/PgBouncer metrics Services.
Added builders and reconcilers for PostgreSQL/PgBouncer ServiceMonitors.

internal/controller/postgrescluster_controller.go
Added RBAC for monitoring.coreos.com/servicemonitors.

cmd/main.go
Registered Prometheus Operator monitoring/v1 types in the manager scheme.

internal/controller/suite_test.go
Registered Prometheus Operator monitoring/v1 types in the test scheme.

pkg/postgresql/cluster/core/monitoring_unit_test.go
Added unit tests for observability flag resolution and monitoring resource builders.
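
The disable-only override described under Key Changes can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `FeatureDisableOverride` is the type name quoted later in the review, but its `Disabled` field and the `metricsEnabled` helper are assumptions made for the example.

```go
package main

import "fmt"

// FeatureDisableOverride borrows the type name quoted in the review thread.
// The Disabled field is an assumption made for this illustration.
type FeatureDisableOverride struct {
	// Disabled switches the feature off for a single cluster when true.
	Disabled *bool
}

// metricsEnabled is a hypothetical helper: the class decides whether the
// feature is on, and the cluster may only switch it off.
func metricsEnabled(classEnabled bool, override *FeatureDisableOverride) bool {
	if !classEnabled {
		return false // a cluster cannot enable what the class disabled
	}
	if override == nil || override.Disabled == nil {
		return true // no override: inherit the class setting
	}
	return !*override.Disabled
}

func main() {
	yes, no := true, false
	fmt.Println(metricsEnabled(true, nil))
	fmt.Println(metricsEnabled(true, &FeatureDisableOverride{Disabled: &yes}))
	fmt.Println(metricsEnabled(true, &FeatureDisableOverride{Disabled: &no}))
	fmt.Println(metricsEnabled(false, &FeatureDisableOverride{Disabled: &no}))
}
```

The point of the "disable-only" shape is that the last call stays false: an override can narrow what the class allows but never widen it.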

Testing and Verification

Added unit tests in pkg/postgresql/cluster/core/monitoring_unit_test.go for:

  • class/cluster observability enablement logic
  • PostgreSQL and PgBouncer metrics Service builders
  • PostgreSQL and PgBouncer ServiceMonitor builders

Related Issues

CPI-1853 - related JIRA ticket.

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

@DmytroPI-dev force-pushed the postgres-operator-monitoring branch from a1b796f to 976ecd1 April 2, 2026 14:08
@DmytroPI-dev changed the title from "Create ServiceMonitor and basic Grafana dashboard for metrics" to "Add PostgreSQL observability telemetry exposure via ServiceMonitors" Apr 2, 2026
); err != nil {
return ctrl.Result{}, err
}

This is another block for our reconciliation metric. Maybe it's worth emitting an event on success or on failure?

Collaborator

Also, what about extending our status with information on whether this failed or succeeded, i.e. adding a new condition?

@github-actions
Contributor

CLA Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contribution License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment with the exact sentence copied from below.


I have read the CLA Document and I hereby sign the CLA


1 out of 2 committers have signed the CLA.
@DmytroPI-dev
@limak9182
You can retrigger this bot by commenting recheck in this Pull Request

}

// PostgresObservabilityOverride overrides observability configuration options for PostgresClusterClass.
type PostgresObservabilityOverride struct {
Collaborator

@mploski Apr 10, 2026

For PostgresObservabilityOverride, we should follow the same pattern we have for ConnectionPoolerEnabled.
So maybe ConnectionPoolerMetricsEnabled and PostgreSQLMetricsEnabled?

PostgreSQL *FeatureDisableOverride `json:"postgresql,omitempty"`

// +optional
PgBouncer *FeatureDisableOverride `json:"pgbouncer,omitempty"`
Collaborator

@mploski Apr 10, 2026

Other providers might not have PgBouncer (AWS, for example), so let's name this generically (connectionPooler). We should also probably add CEL logic that doesn't allow connection pooler metrics to be enabled if the connection pooler itself is disabled.

// Can be overridden in PostgresCluster CR.
// +kubebuilder:default={}
// +optional
Observability *PostgresObservabilityClassConfig `json:"observability,omitempty"`
Collaborator

Similar to previous comment :-)

}

func isConnectionPoolerMetricsEnabled(cluster *enterprisev4.PostgresCluster, class *enterprisev4.PostgresClusterClass) bool {
if !isConnectionPoolerEnabled(cluster, class) {
Collaborator

I believe this check shouldn't be part of this function.

return override == nil || !*override
}

func isConnectionPoolerEnabled(cluster *enterprisev4.PostgresCluster, class *enterprisev4.PostgresClusterClass) bool {
Collaborator

Should this function be part of the connection pooler code rather than monitoring?

return override == nil || !*override
}

func buildPostgreSQLMetricsService(scheme *runtime.Scheme, cluster *enterprisev4.PostgresCluster) (*corev1.Service, error) {
Collaborator

@mploski Apr 11, 2026

Out of curiosity, why do we need to create a k8s Service to expose this information? A Service is effectively a load balancer that uses round robin. If we have many Postgres instances, every call to that endpoint can fetch metrics from a different instance, and those metrics can differ depending on how users are connected. Is my understanding correct?

return fmt.Errorf("building PostgreSQL metrics Service: %w", err)
}

live := &corev1.Service{
Collaborator

Why do we need this? Can't we use desired directly?
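
For context on the question above, one common reason for keeping a separate live object (whether it is the author's reason is not stated in this thread) is the create-or-update pattern: the controller reads the current object, then copies only the fields it owns onto it, so server-assigned values such as `spec.clusterIP` and the resource version survive the update. controller-runtime's `controllerutil.CreateOrUpdate` works this way with a mutate function. The sketch below imitates the idea with a simplified stand-in for `corev1.Service`; `Service`, `applyDesired`, and the sample values are all hypothetical.

```go
package main

import "fmt"

// Service is a simplified stand-in for corev1.Service.
type Service struct {
	Name      string
	Labels    map[string]string
	Port      int
	ClusterIP string // assigned by the API server, not by the operator
}

// applyDesired plays the role of a mutate function: it copies only the
// operator-owned fields onto the live object and leaves the rest untouched.
func applyDesired(live *Service, desired Service) {
	live.Labels = desired.Labels
	live.Port = desired.Port
}

func main() {
	// State as read back from the cluster.
	live := &Service{Name: "pg-metrics", Port: 9100, ClusterIP: "10.0.0.12"}
	// State as produced by a builder; it has no ClusterIP.
	desired := Service{
		Name:   "pg-metrics",
		Labels: map[string]string{"app": "postgres"},
		Port:   9187,
	}

	applyDesired(live, desired)
	fmt.Printf("%+v\n", *live)
}
```

Sending `desired` as-is in an update would drop the server-assigned fields, which is the situation the separate object avoids.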

3 participants