Feature Request: PostSync Monitoring on all Config Sync managed resources #1886

### Checklist

- [x] I did not find a related open enhancement request.
- [x] I understand that enhancement requests filed in the GitHub repository are by default low priority.
- [x] If this request is time-sensitive, I have submitted a corresponding issue with [GCP support](https://cloud.google.com/support-hub).

### Describe the feature

The current [experimental post sync feature](https://github.com/GoogleContainerTools/config-sync/tree/main/examples/post-sync#quick-start) only monitors `RootSyncs` and `RepoSyncs`. However, most issues that could affect those resources have been shifted left, and our CI pipelines catch them earlier using `kubectl apply --dry-run=server`. What would be far more valuable is `PostSync` monitoring of the resources that our `RootSyncs` manage.

We consume Config Sync as part of Config Controller, so our `RootSyncs` manage KCC/GCP resources. Those resources are much harder to dry-run, because whether a change succeeds depends heavily on whether the GCP APIs accept it. We often have resources ending up in `UpdateFailed` or `DependencyNotFound` states. Currently, the finest granularity at which we can monitor and alert on this is the `RootSync` level, using the [`pipeline_error_observed` metric](https://cloud.google.com/kubernetes-engine/enterprise/config-sync/docs/how-to/monitoring-config-sync). This is a major drawback: a single `RootSync` can manage more than 1,000 resources, so we have to alert a centralized team rather than the owner of the breaking change. Ideally, we would measure, monitor, and alert on these failures per resource.

The request is for PostSync to also emit a structured log entry for each of these resource failures. We could then follow the PostSync flow of logs -> Pub/Sub -> workload (which files a bug for the relevant assignee) per resource rather than per `RootSync`. One intricacy to work around: a resource might sit in the `DependencyNotFound` state for a few minutes and then resolve once the dependency is created, so a transient state like this should not be reported as a failure right away.

Here is an example of these failures on our KCC resources:

```
> kc get LoggingLink -n REDACTED
NAME                            AGE   READY   STATUS         STATUS AGE
fooservicefranc-stg-stg1-link   9d    False   UpdateFailed   9d
insightsprocess-stg-stg1-link   9d    False   UpdateFailed   9d
```

These failures are also visible through Config Sync via `nomos status`:

```
> nomos status --name REDACTED | grep logginglink
     REDACTED   logginglink.logging.cnrm.cloud.google.com/fooservicefranc-stg-stg1-link   InProgress   ac89d2a
     REDACTED   logginglink.logging.cnrm.cloud.google.com/insightsprocess-stg-stg1-link   InProgress   ac89d2a
```

### Importance

This is a blocker for our adoption of `PostSync`. Monitoring only our `RootSync`s doesn't provide us much value. If we can't get per-resource monitoring from Config Sync, we are reaching the scale where we would need to implement this feature ourselves.
