Skip to content

feat: Add StatVar Series Aggregations#571

Draft
SandeepTuniki wants to merge 2 commits into
masterfrom
add-statvar-series-aggrs
Draft

feat: Add StatVar Series Aggregations#571
SandeepTuniki wants to merge 2 commits into
masterfrom
add-statvar-series-aggrs

Conversation

@SandeepTuniki

Copy link
Copy Markdown
Contributor

WIP. Don't review yet.

@codacy-production

codacy-production Bot commented Jun 22, 2026

Copy link
Copy Markdown

Not up to standards ⛔

🔴 Issues 23 medium · 25 minor

Alerts:
⚠ 48 issues (≤ 0 issues of at least minor severity)

Results:
48 new issues

Category Results
Documentation 11 minor
Security 16 medium
CodeStyle 14 minor
Complexity 7 medium

View in Codacy

🟢 Metrics 76 complexity

Metric Results
Complexity 76

View in Codacy

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the StatVarSeriesAggregator class to orchestrate multi-round statistical variable series aggregations (such as anomalies and ensembles) using BigQuery Federation over Spanner, along with corresponding integration tests. The review feedback identifies critical issues in the generated SQL queries: first, the TimeSeries table does not contain a direct entity1 column and instead requires extracting it from the entities JSON column; second, the join between Observation and TimeSeries lacks extra_entities_id, which can lead to incorrect many-to-many mappings.

Comment on lines +425 to +428
FROM EXTERNAL_QUERY("{connection_id}",
'''SELECT variable_measured, entity1, extra_entities_id, facet
FROM TimeSeries
WHERE provenance IN ({input_provenance_str}) {ts_exclude_filter} ''')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The TimeSeries table in Spanner does not have a direct entity1 column; instead, it stores entity information in the entities JSON column (e.g., {"entity1": "..."}). Querying entity1 directly in the EXTERNAL_QUERY will result in a Spanner SQL error.

Use JSON_VALUE(entities, '$.entity1') AS entity1 to correctly extract the entity ID.

Suggested change
FROM EXTERNAL_QUERY("{connection_id}",
'''SELECT variable_measured, entity1, extra_entities_id, facet
FROM TimeSeries
WHERE provenance IN ({input_provenance_str}) {ts_exclude_filter} ''')
FROM EXTERNAL_QUERY("{connection_id}",
'''SELECT variable_measured, JSON_VALUE(entities, '$.entity1') AS entity1, extra_entities_id, facet
FROM TimeSeries
WHERE provenance IN ({input_provenance_str}) {ts_exclude_filter} ''')

Comment on lines +493 to +497
JOIN EXTERNAL_QUERY("{connection_id}",
'''SELECT variable_measured, entity1, extra_entities_id, facet_id, facet
FROM TimeSeries
WHERE provenance IN ({input_provenance_str})''') ts
ON o.variable_measured = ts.variable_measured AND o.entity1 = ts.entity1 AND o.facet_id = ts.facet_id

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There are two issues in this join block:

  1. The TimeSeries table does not have an entity1 column. It must be extracted from the entities JSON column using JSON_VALUE(entities, '$.entity1').
  2. The join condition does not include extra_entities_id. For statistical variables that represent relationships or have multiple entities (where extra_entities_id is populated), omitting this column from the join can cause incorrect many-to-many mappings and duplicate observation rows.

Update the query to extract entity1 and include a NULL-safe comparison for extra_entities_id in the join condition.

Suggested change
JOIN EXTERNAL_QUERY("{connection_id}",
'''SELECT variable_measured, entity1, extra_entities_id, facet_id, facet
FROM TimeSeries
WHERE provenance IN ({input_provenance_str})''') ts
ON o.variable_measured = ts.variable_measured AND o.entity1 = ts.entity1 AND o.facet_id = ts.facet_id
JOIN EXTERNAL_QUERY("{connection_id}",
'''SELECT variable_measured, JSON_VALUE(entities, '$.entity1') AS entity1, extra_entities_id, facet_id, facet
FROM TimeSeries
WHERE provenance IN ({input_provenance_str})''') ts
ON o.variable_measured = ts.variable_measured AND o.entity1 = ts.entity1 AND o.facet_id = ts.facet_id AND COALESCE(o.extra_entities_id, '') = COALESCE(ts.extra_entities_id, '')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant