
Add data_freshness_sla and volume_threshold tests #932

Open
joostboon wants to merge 28 commits into master from feature/volume-threshold-test

Conversation

@joostboon
Contributor

@joostboon joostboon commented Feb 18, 2026

Summary

  • data_freshness_sla: New test that verifies data was updated before a configurable SLA deadline time, with timezone support and day-of-week/month scheduling.
  • volume_threshold: New test that monitors row count changes with configurable warn/error percentage thresholds. Uses Elementary's metric caching infrastructure (data_monitoring_metrics) to avoid recalculating row counts for buckets that have already been computed.
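The warn/error threshold behavior described above can be sketched in plain Python. This is an illustration of the intended semantics only; the parameter names and defaults are assumptions, not the macro's actual arguments.

```python
def severity_level(previous_count, current_count,
                   warn_pct=5.0, error_pct=10.0, direction="both"):
    """Return 0 (pass), 1 (warn), or 2 (error) for a row-count change."""
    if not previous_count:
        # No baseline row count -> nothing to compare against.
        return 0
    pct = (current_count - previous_count) / previous_count * 100.0
    # Direction filtering: a "spike"-only test ignores drops, and vice versa.
    if direction == "spike" and pct < 0:
        return 0
    if direction == "drop" and pct > 0:
        return 0
    magnitude = abs(pct)
    if magnitude >= error_pct:
        return 2
    if magnitude >= warn_pct:
        return 1
    return 0
```

With the default thresholds, a 7% change warns and a 20% change errors, while a drop is ignored when direction is "spike".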

Code Changes

  • macros/edr/tests/test_data_freshness_sla.sql — New SLA-based freshness test with timezone-aware deadline checking
  • macros/edr/tests/test_volume_threshold.sql — New volume monitoring test with multi-severity thresholds (warn at 5%, error at 10% by default), leveraging incremental metric collection via get_metric_buckets_min_and_max and store_metrics_table_in_cache

Breaking Downstream Impacts

None — both files are net-new additions with no changes to existing models or columns.

Merge Status

Safe to merge. No breaking changes.

Test plan

  • Verify data_freshness_sla compiles correctly against a model with a timestamp column
  • Verify volume_threshold compiles and stores metrics to data_monitoring_metrics
  • Confirm incremental behavior: second run should skip already-cached buckets
  • Test warn vs error severity levels with different threshold configurations
  • Test direction parameter: both, spike, drop

Made with Cursor

Summary by CodeRabbit

  • New Features

    • Data freshness SLA test: timezone-aware SLA times, scheduling by day-of-week/day-of-month, optional row filtering, skip-when-off-schedule, and clear NO_DATA / DATA_FRESH / DATA_STALE outcomes with descriptive messages.
    • Volume threshold test: detects row-count spikes/drops with multi-level severity, direction control, time buckets, baseline minimums, metric reuse, and customizable thresholds.
  • Tests

    • New integration suites covering freshness and volume scenarios, scheduling, filtering, thresholds, directionality, and timezone handling.
  • Bug Fixes

    • Improved timezone compatibility for environments lacking the standard timezone library.
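A pytz-free timezone conversion like the one described can be sketched with the standard library. This is an illustrative sketch only; the zone names in the manual fallback and the function shape are assumptions, not the code in sla_utils.sql.

```python
from datetime import datetime, timedelta, timezone

def to_utc(naive_local, tz_name):
    """Interpret a naive datetime in tz_name and convert it to UTC."""
    try:
        from zoneinfo import ZoneInfo  # stdlib since Python 3.9, no pytz needed
        return naive_local.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)
    except ImportError:
        # Manual fallback with fixed offsets for a few known zones
        # (hypothetical; ignores DST transitions).
        fixed_offsets = {"UTC": 0, "America/New_York": -5}
        offset = fixed_offsets.get(tz_name, 0)
        return (naive_local - timedelta(hours=offset)).replace(tzinfo=timezone.utc)
```

A 09:00 SLA time in America/New_York in February (EST) resolves to 14:00 UTC.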

@github-actions
Contributor

👋 @joostboon
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@coderabbitai

coderabbitai bot commented Feb 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds two EDR tests (data_freshness_sla, volume_threshold) with public query helpers, integration tests for both, and updates SLA timezone utilities to support non-pytz/fusion environments and new scheduling fields (day_of_week/day_of_month).

Changes

Cohort / File(s) Summary
Data Freshness SLA
macros/edr/tests/test_data_freshness_sla.sql
Adds data_freshness_sla test and public get_data_freshness_sla_query. Validates params, resolves relation, guards against ephemeral/non-timestamp columns, supports scheduling (day_of_week/day_of_month), optional where_expression, computes UTC SLA deadlines, skip logic, and emits NO_DATA/DATA_FRESH/DATA_STALE with dialect-safe escaping and descriptive messages.
Volume Threshold
macros/edr/tests/test_volume_threshold.sql
Adds volume_threshold test and public get_volume_threshold_comparison_query. Builds metric props, collects fresh and cached metrics, deduplicates/merges, identifies current/previous buckets, computes absolute/percent changes, applies direction-based severity (both/spike/drop) with warn/error thresholds, honors min_row_count, supports where_expression, and returns result rows filtered by severity.
SLA Utils (timezone / fusion)
macros/edr/tests/test_utils/sla_utils.sql
Introduces a fusion guard and alternate timezone handling path avoiding pytz (uses stdlib zoneinfo/manual offset logic when needed), removes deadline_passed from payload and emits scheduling fields (day_of_week/day_of_month), and updates calculate_sla_deadline_utc behavior accordingly.
Integration Tests — Data Freshness SLA
integration_tests/tests/test_data_freshness_sla.py
Adds integration tests covering fresh vs stale data, filtered/no-data cases, deadline-not-passed behavior, and timezone-aware scenarios; asserts expected pass/fail outcomes for the SLA test.
Integration Tests — Volume Threshold
integration_tests/tests/test_volume_threshold.py
Adds integration tests for stable volumes, spike/drop failures, directionality behavior, min_row_count skipping, custom thresholds/backfill, and where_expression filtering; includes deterministic _generate_stable_data helper and test configuration constants.

Sequence Diagram(s)

sequenceDiagram
    participant Tester as Tester
    participant Macro as Test Macro
    participant QueryBuilder as Query Builder
    participant Warehouse as Warehouse
    participant Metrics as Metrics Cache

    Tester->>Macro: invoke test (model, args)
    Macro->>QueryBuilder: validate params, resolve relation, build SQL (SLA or bucket query, where_expression)
    alt volume_threshold uses cache
        QueryBuilder->>Metrics: read cached metrics
        Metrics-->>QueryBuilder: cached metrics
    end
    QueryBuilder->>Warehouse: execute SQL
    Warehouse-->>QueryBuilder: return rows (timestamps / bucket metrics)
    QueryBuilder->>QueryBuilder: dedupe/merge, compute deadlines/percent change, determine severity
    QueryBuilder-->>Macro: result rows (status, severity, description)
    Macro-->>Tester: return outcome (pass/warn/fail/skip)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I nibble timestamps beneath the moonlit sky,
I count the buckets as the hours fly by,
I hop through timezones, stitch UTC with care,
I sniff out spikes and tidy stale air,
A joyful hop — your tests purr in my lair.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): The pull request title accurately and concisely summarizes the main changes: adding two new test macros (data_freshness_sla and volume_threshold) to the repository.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@joostboon joostboon force-pushed the feature/volume-threshold-test branch from c133ff1 to c31987f Compare February 18, 2026 15:19

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (3)
macros/edr/tests/test_volume_threshold.sql (2)

228-234: result_description will be NULL when percent_change is NULL.

When previous_row_count is 0 or NULL, percent_change is NULL (line 193). The || concatenation on lines 232–234 will then produce a NULL result_description in most SQL engines (Snowflake, Postgres, Redshift). This row still gets emitted if severity_level > 0, but that can't happen here because line 206 returns 0 when percent_change is null. So this is safe in practice — just noting the coupling: if the severity logic ever changes to flag null percent_change, the description silently disappears.

♻️ Optional: add a COALESCE for defensive description
-        'Row count changed by ' || cast(percent_change as {{ elementary.edr_type_string() }}) || 
-        '% (from ' || cast({{ elementary.edr_cast_as_int('previous_row_count') }} as {{ elementary.edr_type_string() }}) || 
-        ' to ' || cast({{ elementary.edr_cast_as_int('current_row_count') }} as {{ elementary.edr_type_string() }}) || ')' as result_description
+        case 
+            when percent_change is null then 'Row count: ' || cast({{ elementary.edr_cast_as_int('current_row_count') }} as {{ elementary.edr_type_string() }}) || ' (no prior baseline)'
+            else 'Row count changed by ' || cast(percent_change as {{ elementary.edr_type_string() }}) || 
+                 '% (from ' || cast({{ elementary.edr_cast_as_int('previous_row_count') }} as {{ elementary.edr_type_string() }}) || 
+                 ' to ' || cast({{ elementary.edr_cast_as_int('current_row_count') }} as {{ elementary.edr_type_string() }}) || ')'
+        end as result_description
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_volume_threshold.sql` around lines 228 - 234, The
concatenation for result_description can become NULL when percent_change is
NULL; update the expression that builds result_description (referencing
result_description, percent_change, previous_row_count, current_row_count and
the helper macros elementary.edr_cast_as_int and elementary.edr_type_string) to
defensively coalesce nulls before concatenation (e.g., COALESCE(percent_change,
'N/A') and COALESCE(cast(...), '0' or 'N/A') or wrap the whole concatenation
with COALESCE(..., '<no change info>')) so the description never yields NULL
even if percent_change or the counts are NULL.

63-77: Hardcoded days_back=14 limits the historical baseline for comparison.

The get_metric_buckets_min_and_max call uses days_back=14, which means only 14 days of history are considered. For weekly or monthly granularity buckets this may be too narrow to have a meaningful previous bucket. Consider either making this configurable or deriving it from time_bucket.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_volume_threshold.sql` around lines 63 - 77, The call to
elementary.get_metric_buckets_min_and_max currently hardcodes days_back=14 which
is too narrow for weekly/monthly buckets; update the macro to accept a
configurable days_back parameter (or compute it from the provided time_bucket in
metric_properties) and pass that variable into get_metric_buckets_min_and_max
instead of 14; locate the call site using get_metric_buckets_min_and_max, the
surrounding metric_properties and time_bucket variables, and either add a new
macro argument (e.g., days_back) or derive days_back from
time_bucket.period/count (e.g., multiply count by a suitable multiplier) so
longer bucket granularities use a larger history window.
macros/edr/tests/test_data_freshness_sla.sql (1)

198-207: Confusing dead-code pattern for is_failure logic — use a clearer Jinja branch instead.

The current approach renders SQL WHEN clauses that are logically dead (when not TRUE then false never matches) or always-true (when not FALSE then false always matches). While functionally correct, this is very hard to reason about for future maintainers.

♻️ Proposed simplification
             case
                 when freshness_status = 'DATA_FRESH' then false
-                {# If deadline hasn't passed, don't fail yet #}
-                {% if deadline_passed %}
-                when not TRUE then false
-                {% else %}
-                when not FALSE then false
-                {% endif %}
+                {% if not deadline_passed %}
+                {# Deadline hasn't passed yet - don't fail regardless of status #}
+                when true then false
+                {% endif %}
                 else true
             end as is_failure,

When deadline_passed is false, this renders when true then false which catches everything after the DATA_FRESH check and returns pass. When deadline_passed is true, no extra clause is emitted, so non-fresh statuses fall through to else true (fail).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_data_freshness_sla.sql` around lines 198 - 207, The
is_failure CASE for freshness_status uses confusing dead WHEN clauses; update
the Jinja around the CASE in macros/edr/tests/test_data_freshness_sla.sql so
that you emit a clear branch depending on the deadline_passed flag: keep the
initial "when freshness_status = 'DATA_FRESH' then false" check, and if
deadline_passed is false emit an explicit "when true then false" clause (so
non-fresh rows do not fail), but if deadline_passed is true emit no additional
WHEN clause so non-fresh statuses fall through to "else true" (fail). Locate the
CASE building around is_failure to implement this simpler Jinja branching using
the deadline_passed variable.
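Stripped of the Jinja rendering details, both the original and the simplified templates implement the same predicate, which can be stated directly in Python (an illustration of the semantics, not project code):

```python
def is_failure(freshness_status, deadline_passed):
    """Fail only when data is not fresh AND the SLA deadline has passed."""
    if freshness_status == "DATA_FRESH":
        return False
    # Before the deadline, non-fresh or missing data is not yet a failure.
    return deadline_passed
```

Reading the template against this two-line predicate makes it easy to check that any refactor preserves the behavior.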
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@macros/edr/tests/test_data_freshness_sla.sql`:
- Around line 176-184: The freshness_status logic classifies future-dated rows
as DATA_STALE because mdt.max_timestamp_utc > sd.target_date_end_utc falls
through; update the CASE in the macro to handle future timestamps by adding a
condition that treats mdt.max_timestamp_utc > sd.target_date_end_utc as
'DATA_FRESH' (placed before the existing DATA_FRESH when that checks the
window), or alternatively widen the DATA_FRESH check to only require
mdt.max_timestamp_utc >= sd.target_date_start_utc (removing the upper bound);
reference mdt.max_timestamp_utc, sd.target_date_start_utc,
sd.target_date_end_utc and the freshness_status CASE when applying the change.

In `@macros/edr/tests/test_volume_threshold.sql`:
- Around line 153-166: The deduplication using row_number() in
deduplicated_metrics is nondeterministic because it partitions and orders by the
same bucket_end; update the dedup logic to add a deterministic tiebreaker so
freshly computed metrics win: add a precedence column to the source rows (e.g.,
0 for fresh, 1 for cached) or include created_at (or both) and change
row_number() to partition by bucket_end and order by precedence asc, created_at
desc (or equivalent) so deduplicated_metrics and the metrics CTE
deterministically keep the desired row.

---

Nitpick comments:
In `@macros/edr/tests/test_data_freshness_sla.sql`:
- Around line 198-207: The is_failure CASE for freshness_status uses confusing
dead WHEN clauses; update the Jinja around the CASE in
macros/edr/tests/test_data_freshness_sla.sql so that you emit a clear branch
depending on the deadline_passed flag: keep the initial "when freshness_status =
'DATA_FRESH' then false" check, and if deadline_passed is false emit an explicit
"when true then false" clause (so non-fresh rows do not fail), but if
deadline_passed is true emit no additional WHEN clause so non-fresh statuses
fall through to "else true" (fail). Locate the CASE building around is_failure
to implement this simpler Jinja branching using the deadline_passed variable.

In `@macros/edr/tests/test_volume_threshold.sql`:
- Around line 228-234: The concatenation for result_description can become NULL
when percent_change is NULL; update the expression that builds
result_description (referencing result_description, percent_change,
previous_row_count, current_row_count and the helper macros
elementary.edr_cast_as_int and elementary.edr_type_string) to defensively
coalesce nulls before concatenation (e.g., COALESCE(percent_change, 'N/A') and
COALESCE(cast(...), '0' or 'N/A') or wrap the whole concatenation with
COALESCE(..., '<no change info>')) so the description never yields NULL even if
percent_change or the counts are NULL.
- Around line 63-77: The call to elementary.get_metric_buckets_min_and_max
currently hardcodes days_back=14 which is too narrow for weekly/monthly buckets;
update the macro to accept a configurable days_back parameter (or compute it
from the provided time_bucket in metric_properties) and pass that variable into
get_metric_buckets_min_and_max instead of 14; locate the call site using
get_metric_buckets_min_and_max, the surrounding metric_properties and
time_bucket variables, and either add a new macro argument (e.g., days_back) or
derive days_back from time_bucket.period/count (e.g., multiply count by a
suitable multiplier) so longer bucket granularities use a larger history window.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
macros/edr/tests/test_data_freshness_sla.sql (1)

198-207: The is_failure logic is correct but needlessly obfuscated.

When deadline_passed is True, the rendered SQL includes when not TRUE then false — a branch that can never match (dead code). When False, it renders when not FALSE then false — a catch-all that always matches. Both work, but anyone reading the generated SQL will be confused by these tautological / contradictory conditions.

A clearer alternative:

♻️ Suggested refactor
             case
                 when freshness_status = 'DATA_FRESH' then false
-                {# If deadline hasn't passed, don't fail yet #}
-                {% if deadline_passed %}
-                when not TRUE then false
-                {% else %}
-                when not FALSE then false
-                {% endif %}
-                else true
+                {% if not deadline_passed %}
+                {# Deadline hasn't passed yet — don't fail regardless of status #}
+                else false
+                {% else %}
+                {# Deadline has passed — fail if data is not fresh #}
+                else true
+                {% endif %}
             end as is_failure,

This produces the same behavior but the generated SQL is immediately understandable: either else false (always pass) or else true (fail on non-fresh).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_data_freshness_sla.sql` around lines 198 - 207, The
is_failure CASE is needlessly obfuscated by injecting tautological WHENs via the
deadline_passed Jinja branch; simplify by making the macro render a
straightforward CASE for is_failure that does not emit impossible or always-true
WHENs: change the template around the CASE producing is_failure so that when
deadline_passed is true the CASE returns "when freshness_status = 'DATA_FRESH'
then false else true" (fail on non-fresh) and when deadline_passed is false it
returns "when freshness_status = 'DATA_FRESH' then false else false" (always
pass), replacing the current conditional WHEN not TRUE/WHEN not FALSE logic in
the is_failure generation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@macros/edr/tests/test_data_freshness_sla.sql`:
- Around line 208-214: The NO_DATA branch of result_description concatenates
where_expression directly into a single-quoted SQL string, which breaks when
where_expression contains single quotes; update the Jinja template in the
NO_DATA branch to escape single quotes before concatenation (e.g., use the Jinja
replace filter on where_expression to double single-quotes) so
formatted_sla_time, timezone and model_relation.identifier are preserved and the
generated SQL remains valid.

---

Duplicate comments:
In `@macros/edr/tests/test_data_freshness_sla.sql`:
- Around line 160-168: The alias max_timestamp_utc in the max_data_timestamp CTE
is misleading because elementary.edr_cast_as_timestamp(timestamp_column) only
casts and does not convert to UTC; update the CTE to either (a) perform an
explicit timezone-normalization to UTC after casting (using your project's
standard timezone conversion helper) so the value truly is UTC when compared to
target_date_start_utc / target_date_end_utc, or (b) if you do not want to
convert here, rename the alias to something accurate (e.g., max_timestamp) and
adjust downstream references that expect a UTC value (involving
max_data_timestamp, max_timestamp_utc, and comparisons to
target_date_start_utc/target_date_end_utc) to either convert to UTC at
comparison time or use the normalized value. Ensure you update all usages of
max_timestamp_utc to the new name or to use the converted UTC timestamp
consistently.
- Around line 176-184: The current CASE in freshness_status misses future-dated
rows because mdt.max_timestamp_utc > sd.target_date_end_utc falls through to the
final else and is marked 'DATA_STALE'; update the CASE to add an explicit branch
for future timestamps (e.g., add "when mdt.max_timestamp_utc >
sd.target_date_end_utc then 'DATA_FRESH'" or, if you need to track them
separately, "then 'DATA_FUTURE'") before the NO_DATA and else branches so
mdt.max_timestamp_utc values after sd.target_date_end_utc are handled correctly;
locate the CASE that produces freshness_status and modify the conditions around
mdt.max_timestamp_utc, sd.target_date_start_utc, and sd.target_date_end_utc
accordingly.

---

Nitpick comments:
In `@macros/edr/tests/test_data_freshness_sla.sql`:
- Around line 198-207: The is_failure CASE is needlessly obfuscated by injecting
tautological WHENs via the deadline_passed Jinja branch; simplify by making the
macro render a straightforward CASE for is_failure that does not emit impossible
or always-true WHENs: change the template around the CASE producing is_failure
so that when deadline_passed is true the CASE returns "when freshness_status =
'DATA_FRESH' then false else true" (fail on non-fresh) and when deadline_passed
is false it returns "when freshness_status = 'DATA_FRESH' then false else false"
(always pass), replacing the current conditional WHEN not TRUE/WHEN not FALSE
logic in the is_failure generation.

@joostboon joostboon force-pushed the feature/volume-threshold-test branch 2 times, most recently from 4600b5c to 11fe645 Compare February 18, 2026 15:28
Add two new Elementary tests:
- data_freshness_sla: checks if data was updated before a specified SLA deadline
- volume_threshold: monitors row count changes with configurable warn/error
  thresholds, using Elementary's metric caching to avoid redundant computation

Fixes applied:
- volume_threshold: union historical metrics with new metrics for comparison
- volume_threshold: deterministic dedup with source_priority tiebreaker
- volume_threshold: let get_time_bucket handle defaults
- data_freshness_sla: treat future-dated data as fresh (remove upper bound)
- data_freshness_sla: escape single quotes in where_expression
- data_freshness_sla: simplify deadline_passed logic
- data_freshness_sla: rename max_timestamp_utc to max_timestamp (no UTC conversion)
- data_freshness_sla: fix macro comment to match actual behavior
- data_freshness_sla: document UTC assumption, add ephemeral model check

Co-authored-by: Cursor <cursoragent@cursor.com>
@joostboon joostboon force-pushed the feature/volume-threshold-test branch from 11fe645 to a51ff11 Compare February 18, 2026 15:33
@joostboon
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Feb 18, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@joostboon
Contributor Author

All review comments have been addressed in commit a51ff11. Summary of fixes:

1. max_timestamp_utc misleading alias (data_freshness_sla)

  • Renamed to `max_timestamp` since no UTC conversion happens
  • Updated macro comment to describe actual behavior
  • UTC assumption documented in test docstring (lines 58-61)

2. Future-dated data classified as DATA_STALE (data_freshness_sla)

  • Widened DATA_FRESH check to `>= target_date_start_utc` (removed upper bound)
  • Pre-loaded/future-timestamped data now correctly counts as fresh

3. Non-deterministic deduplication (volume_threshold)

  • Added `source_priority` column (0 = freshly computed, 1 = cached)
  • `order by source_priority asc` so new metrics deterministically win

4. SQL syntax error with single quotes in where_expression (data_freshness_sla)

  • Added `| replace("'", "''")` to escape single quotes before concatenation
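The source_priority tiebreaker in fix 3 can be sketched as a simple keep-the-highest-priority dedup. Field names here are illustrative; the actual implementation uses row_number() in SQL.

```python
def dedupe_metrics(rows):
    """Keep one row per bucket; fresh metrics (priority 0) beat cached (1)."""
    best = {}
    for row in rows:
        key = row["bucket_end"]
        if key not in best or row["source_priority"] < best[key]["source_priority"]:
            best[key] = row
    return list(best.values())
```

Given a cached row and a freshly computed row for the same bucket, the fresh row deterministically wins regardless of input order.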

joostboon added a commit to elementary-data/elementary that referenced this pull request Feb 18, 2026
Add docs for two new Elementary tests:
- data_freshness_sla: verifies data was updated before an SLA deadline
- volume_threshold: monitors row count changes with warn/error thresholds

Refs: elementary-data/dbt-data-reliability#912, elementary-data/dbt-data-reliability#932

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
joostboon and others added 2 commits February 18, 2026 21:11
…gQuery test_seed_run_results)

Co-authored-by: Cursor <cursoragent@cursor.com>
…alse (BigQuery test_seed_run_results)"

This reverts commit 9fc552e.

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
macros/edr/tests/test_volume_threshold.sql (1)

179-193: result_description can produce NULL concatenation when previous_row_count is NULL.

Although the where severity_level > 0 filter (line 193) should exclude rows with a NULL previous_row_count (severity = 0 per line 162), the result_description on lines 189–191 unconditionally concatenates percent_change, previous_row_count, and current_row_count. If a future code change relaxes the severity filter, the concatenation with NULLs would produce NULL on most SQL engines (in standard SQL, 'text' || NULL yields NULL). Low risk given the current filter, but worth noting for maintainability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_volume_threshold.sql` around lines 179 - 193, The
result_description concatenation can yield NULL if any of percent_change,
previous_row_count, or current_row_count are NULL; update the expression in the
SELECT for result_description to guard each interpolated value with COALESCE (or
an equivalent null-safe conversion) before casting/concatenating so it always
produces a string (e.g., COALESCE(cast(previous_row_count as {{
elementary.edr_type_string() }}), 'NULL') etc. Modify the result_description
construction in the query (referencing result_description, percent_change,
previous_row_count, current_row_count and the casts using
elementary.edr_type_string()) so it is null-safe even if the severity_level
filter is relaxed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@macros/edr/tests/test_data_freshness_sla.sql`:
- Around line 180-193: The freshness check in the freshness_result CTE was
widened to treat future-dated rows as fresh by using mdt.max_timestamp >=
sd.target_date_start_utc (removing any upper bound), which is correct; no code
change required — leave the condition in freshness_result as-is, ensure the case
branches (NO_DATA / DATA_FRESH / DATA_STALE) remain unchanged and that
downstream references to freshness_result continue to expect this behavior
(e.g., any assertions in tests that reference freshness_status,
mdt.max_timestamp, sla_deadline/sd.target_date_start_utc).
- Around line 212-218: The existing single-quote escaping for where_expression
is correct — keep the replacement expression {{ where_expression | replace("'",
"''") }} inside the NO_DATA message construction (the CASE branch that builds
the 'No data found in "{{ model_relation.identifier }}"' string) as-is; no code
changes needed here, ensure the template continues to use replace("'", "''")
when interpolating where_expression into the SQL string literal.

In `@macros/edr/tests/test_volume_threshold.sql`:
- Around line 104-128: The nondeterminism is resolved by adding source_priority
and deterministic ordering: ensure in all_metrics that source_priority is set to
1 for the select from data_monitoring_metrics_table and 0 for the select from
test_metrics (as shown), that ranked_metrics partitions by bucket_start,
bucket_end and orders by source_priority asc (so fresh metrics win), and that
metrics selects rn = 1; also confirm the metric_properties comparison uses the
same normalized JSON produced by dict_to_quoted_json and that
full_table_name/metric_name casing normalization (upper/lower) is consistent
across the queries (references: all_metrics, ranked_metrics, metrics,
source_priority, dict_to_quoted_json, full_table_name, metric_name).

---

Nitpick comments:
In `@macros/edr/tests/test_volume_threshold.sql`:
- Around line 179-193: The result_description concatenation can yield NULL if
any of percent_change, previous_row_count, or current_row_count are NULL; update
the expression in the SELECT for result_description to guard each interpolated
value with COALESCE (or an equivalent null-safe conversion) before
casting/concatenating so it always produces a string (e.g.,
COALESCE(cast(previous_row_count as {{ elementary.edr_type_string() }}), 'NULL')
etc. Modify the result_description construction in the query (referencing
result_description, percent_change, previous_row_count, current_row_count and
the casts using elementary.edr_type_string()) so it is null-safe even if the
severity_level filter is relaxed.

haritamar and others added 2 commits March 12, 2026 15:30
* Add integration tests for data_freshness_sla and volume_threshold

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

* Fix sla_time YAML sexagesimal issue - use AM/PM format

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Itamar Hartstein <haritamar@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
integration_tests/tests/test_volume_threshold.py (1)

167-172: This rerun does not lock in the cache-precedence fix.

The second invocation backfills the same buckets from the same seeded data, so cached and freshly computed metrics are identical. A regression in the fresh-vs-cached ordering would still pass. To cover the source_priority fix, change one bucket between runs and rerun with the same metric props plus force_metrics_backfill=True.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/tests/test_volume_threshold.py` around lines 167 - 172, The
second test invocation currently reuses the same seeded data so cached vs fresh
computations are identical; to exercise the source_priority fix, mutate one
bucket's seeded data between runs before calling dbt_project.test again (keep
DBT_TEST_NAME, test_args_high and pass test_vars={"force_metrics_backfill":
True}) so the fresh computation differs from the cached result; locate where
test_result is assigned and the first dbt_project.test call is made and insert a
small change to one bucket (e.g., alter its value or timestamp) then rerun
dbt_project.test with the same parameters to validate the cache-precedence
behavior.
integration_tests/tests/test_data_freshness_sla.py (1)

111-123: This timezone test never hits a timezone-sensitive boundary.

Seeding updated_at with utc_now makes the row fresh in both UTC and America/New_York, so a broken UTC→local-day conversion would still pass here. Use a timestamp near local midnight and assert different outcomes for different timezones to actually cover the conversion logic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/tests/test_data_freshness_sla.py` around lines 111 - 123,
The test_with_timezone uses utc_now so it never exercises the UTC→local-day
boundary; change the seeded timestamp (TIMESTAMP_COLUMN using DATE_FORMAT) to a
specific UTC instant that falls just before/after midnight in the target
timezone (e.g., choose a UTC time that is 23:30 UTC so in America/New_York it
crosses the day boundary) so the conversion is timezone-sensitive, then run the
test with that timestamp and assert the expected pass/fail outcome for the given
"timezone" in test_args (or parametrize the test to assert different outcomes
for "UTC" vs "America/New_York"); update the assertion on test_result["status"]
to match the expected result for the chosen timestamp/timezone.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration_tests/tests/test_data_freshness_sla.py`:
- Around line 35-38: The "deadline passed" cases in test_data_freshness_sla.py
use sla_time/timezone values (set in test_args) that can be momentarily
non-stale around midnight UTC; update the failing scenarios by either (A)
choosing an sla_time/timezone pair that is always in the past in CI (e.g., set
sla_time to a time on the previous day or use a timezone guaranteed behind UTC)
and replace the test_args occurrences where "sla_time" and "timezone" are set,
or (B) wrap the specific test blocks that assert "deadline passed" with a
deterministic time freeze (e.g., freezegun.freeze_time) so current time is
fixed; apply the same change to the other test_args occurrences mentioned in the
comment.

In `@integration_tests/tests/test_volume_threshold.py`:
- Around line 50-51: The assertion using test_result["status"] != "pass" is too
permissive and allows warn to slip through; update the assertions that check
dbt_project.test(...) (referenced by test_result, dbt_project.test,
DBT_TEST_NAME, DBT_TEST_ARGS) to assert the exact failing status (e.g., assert
test_result["status"] == "fail") for the error-threshold path, and make the same
exact-status change for the other two occurrences mentioned (the similar
assertions at the other test locations).

---

Nitpick comments:
In `@integration_tests/tests/test_data_freshness_sla.py`:
- Around line 111-123: The test_with_timezone uses utc_now so it never exercises
the UTC→local-day boundary; change the seeded timestamp (TIMESTAMP_COLUMN using
DATE_FORMAT) to a specific UTC instant that falls just before/after midnight in
the target timezone (e.g., choose a UTC time that is 23:30 UTC so in
America/New_York it crosses the day boundary) so the conversion is
timezone-sensitive, then run the test with that timestamp and assert the
expected pass/fail outcome for the given "timezone" in test_args (or parametrize
the test to assert different outcomes for "UTC" vs "America/New_York"); update
the assertion on test_result["status"] to match the expected result for the
chosen timestamp/timezone.

In `@integration_tests/tests/test_volume_threshold.py`:
- Around line 167-172: The second test invocation currently reuses the same
seeded data so cached vs fresh computations are identical; to exercise the
source_priority fix, mutate one bucket's seeded data between runs before calling
dbt_project.test again (keep DBT_TEST_NAME, test_args_high and pass
test_vars={"force_metrics_backfill": True}) so the fresh computation differs
from the cached result; locate where test_result is assigned and the first
dbt_project.test call is made and insert a small change to one bucket (e.g.,
alter its value or timestamp) then rerun dbt_project.test with the same
parameters to validate the cache-precedence behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cc26f27f-6d00-4664-afe2-981864bbc466

📥 Commits

Reviewing files that changed from the base of the PR and between 9bfc0b7 and be7de95.

📒 Files selected for processing (2)
  • integration_tests/tests/test_data_freshness_sla.py
  • integration_tests/tests/test_volume_threshold.py

- Reformat both test_data_freshness_sla.sql and test_volume_threshold.sql to pass sqlfmt
- Fix Postgres round() bug: cast expression to numeric for round(numeric, int) compatibility
- Restructure Jinja conditionals in data_freshness_sla to be sqlfmt-compatible
- Extract where_suffix Jinja set to avoid parentheses inside SQL CASE expressions
- Use edr_boolean_literal for is_failure CASE and WHERE clause
- Remove 'where severity_level > 0' filter to prevent NULL fail_calc validation error

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
macros/edr/tests/test_data_freshness_sla.sql (2)

243-244: Unused column target_date_end_utc can be removed.

After widening the DATA_FRESH check to >= target_date_start_utc (removing the upper bound), target_date_end_utc is no longer referenced anywhere in the query. Consider removing it for clarity.

♻️ Proposed cleanup
         sla_deadline as (
             select
                 {{ elementary.edr_cast_as_timestamp("'" ~ sla_deadline_utc ~ "'") }}
                 as deadline_utc,
                 {{
                     elementary.edr_cast_as_timestamp(
                         "'" ~ target_date_start_utc ~ "'"
                     )
                 }} as target_date_start_utc,
-                {{ elementary.edr_cast_as_timestamp("'" ~ target_date_end_utc ~ "'") }}
-                as target_date_end_utc,
                 '{{ target_date }}' as target_date
         ),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_data_freshness_sla.sql` around lines 243 - 244, Remove
the now-unused column by deleting the select expression that produces
target_date_end_utc (the call to elementary.edr_cast_as_timestamp("'" ~
target_date_end_utc ~ "'") and its alias target_date_end_utc) from the query;
search for the alias target_date_end_utc and the function call
elementary.edr_cast_as_timestamp in the select list and remove that expression
so the query no longer projects the unused column.

289-296: Deadline-not-passed logic is correct but could be clearer.

The when 1 = 1 at line 293 serves as a catch-all to prevent failures when the deadline hasn't passed yet. This is functionally correct, but the pattern may be confusing at first glance.

💡 Alternative for readability
                 case
                     when freshness_status = 'DATA_FRESH'
                     then {{ elementary.edr_boolean_literal(false) }}
                     {% if not deadline_passed %}
-                        when 1 = 1 then {{ elementary.edr_boolean_literal(false) }}
+                        {# Deadline hasn't passed yet - don't fail even if data appears stale #}
+                        when freshness_status in ('NO_DATA', 'DATA_STALE') then {{ elementary.edr_boolean_literal(false) }}
                     {% endif %}
                     else {{ elementary.edr_boolean_literal(true) }}
                 end as is_failure,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_data_freshness_sla.sql` around lines 289 - 296, The SQL
uses a confusing "when 1 = 1" as a catch-all inside the {% if not
deadline_passed %} block; replace that literal with an explicit boolean literal
for readability by emitting when {{ elementary.edr_boolean_literal(true) }} then
{{ elementary.edr_boolean_literal(false) }} (keeping the surrounding {% if not
deadline_passed %} and the case branches untouched) so the intent is clear for
the is_failure calculation that uses freshness_status and deadline_passed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@macros/edr/tests/test_volume_threshold.sql`:
- Around line 139-155: The SQL embeds relation-derived literals (full_table_name
and model_relation.identifier) directly into single-quoted strings, which breaks
when identifiers contain apostrophes; update the template to escape single
quotes before embedding (e.g., replace "'" with "''" or call a provided
safe-quoting helper) for the uses in the all_metrics WHERE clause (the upper('{{
full_table_name }}') occurrence) and any other locations that drop
model_relation.identifier into single-quoted literals; ensure you reuse the
escaping approach consistently alongside existing helpers like
elementary.relation_to_full_name and elementary.dict_to_quoted_json so the
generated SQL remains valid.
- Around line 49-62: Add compile-time validation to reject negative values for
warn_threshold_percent, error_threshold_percent, and min_row_count similarly to
the existing comparisons: inside the macro where warn_threshold_percent and
error_threshold_percent are validated, add checks that raise
exceptions.raise_compiler_error when warn_threshold_percent < 0,
error_threshold_percent < 0, or min_row_count < 0; keep messages clear (e.g.,
"warn_threshold_percent must be non-negative", "error_threshold_percent must be
non-negative", "min_row_count must be non-negative") and ensure these checks run
before using those variables in logic that assumes non-negative values.
- Around line 183-195: Replace the ANSI string concatenation (the raw || usage)
inside the current_bucket/previous_bucket expressions with the existing macro
elementary.edr_concat([...]) (pass the column expressions as an array) so
concatenation is adapter-safe; also remove raw LIMIT 1 usage in the
previous_bucket/current_bucket queries and implement adapter dispatch for
single-row limits—either create a small helper macro (e.g., edr_limit_one())
that emits "LIMIT 1" for standard adapters and "TOP (1)" for fabric, or add
adapter-specific branches where current_bucket and previous_bucket select only
one row using TOP (1) on the fabric path; update references to current_bucket
and previous_bucket accordingly.

---

Nitpick comments:
In `@macros/edr/tests/test_data_freshness_sla.sql`:
- Around line 243-244: Remove the now-unused column by deleting the select
expression that produces target_date_end_utc (the call to
elementary.edr_cast_as_timestamp("'" ~ target_date_end_utc ~ "'") and its alias
target_date_end_utc) from the query; search for the alias target_date_end_utc
and the function call elementary.edr_cast_as_timestamp in the select list and
remove that expression so the query no longer projects the unused column.
- Around line 289-296: The SQL uses a confusing "when 1 = 1" as a catch-all
inside the {% if not deadline_passed %} block; replace that literal with an
explicit boolean literal for readability by emitting when {{
elementary.edr_boolean_literal(true) }} then {{
elementary.edr_boolean_literal(false) }} (keeping the surrounding {% if not
deadline_passed %} and the case branches untouched) so the intent is clear for
the is_failure calculation that uses freshness_status and deadline_passed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dd26795e-9e29-4076-9ed3-6e32f1ca6573

📥 Commits

Reviewing files that changed from the base of the PR and between be7de95 and 4bb34e6.

📒 Files selected for processing (2)
  • macros/edr/tests/test_data_freshness_sla.sql
  • macros/edr/tests/test_volume_threshold.sql

Comment on lines +139 to +155
{% set full_table_name = elementary.relation_to_full_name(model_relation) %}

with

{# Union persisted history with newly computed metrics from this run #}
all_metrics as (
select
bucket_start,
bucket_end,
metric_value as row_count,
1 as source_priority
from {{ data_monitoring_metrics_table }}
where
upper(full_table_name) = upper('{{ full_table_name }}')
and lower(metric_name) = 'row_count'
and metric_properties
= {{ elementary.dict_to_quoted_json(metric_props) }}

@coderabbitai coderabbitai bot Mar 12, 2026


⚠️ Potential issue | 🟡 Minor

Escape relation-derived literals before embedding them in SQL.

full_table_name and model_relation.identifier are still dropped into single-quoted SQL literals without escaping. An apostrophe in a quoted identifier will break this query the same way where_expression did in the other new test.

🛡️ Possible fix
-    {% set full_table_name = elementary.relation_to_full_name(model_relation) %}
+    {% set full_table_name = elementary.relation_to_full_name(model_relation)
+        | replace("'", "''") %}
+    {% set model_identifier = model_relation.identifier | replace("'", "''") %}
@@
-        '{{ model_relation.identifier }}' as model_name,
+        '{{ model_identifier }}' as model_name,

Also applies to: 254

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_volume_threshold.sql` around lines 139 - 155, The SQL
embeds relation-derived literals (full_table_name and model_relation.identifier)
directly into single-quoted strings, which breaks when identifiers contain
apostrophes; update the template to escape single quotes before embedding (e.g.,
replace "'" with "''" or call a provided safe-quoting helper) for the uses in
the all_metrics WHERE clause (the upper('{{ full_table_name }}') occurrence) and
any other locations that drop model_relation.identifier into single-quoted
literals; ensure you reuse the escaping approach consistently alongside existing
helpers like elementary.relation_to_full_name and elementary.dict_to_quoted_json
so the generated SQL remains valid.
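The quote-doubling fix the bot suggests is standard SQL literal escaping, and its round-trip behavior is easy to verify (SQLite used as a stand-in warehouse; the table name is a hypothetical example, not one from the repo):

```python
import sqlite3

table_name = "analytics.o'reilly_db.orders"  # hypothetical identifier with an apostrophe
escaped = table_name.replace("'", "''")      # standard SQL: double each single quote

con = sqlite3.connect(":memory:")
# The escaped literal parses cleanly and round-trips through the database
# unchanged, where the raw value would be a syntax error.
round_tripped = con.execute(f"select upper('{escaped}')").fetchone()[0]
print(round_tripped)
```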

haritamar (Collaborator)


This applies to all Elementary tests that embed full_table_name and model_relation.identifier in SQL literals — not specific to this PR. Table/model names with apostrophes are an extreme edge case. Deferring to a separate hardening effort.


@haritamar, understood — this is a cross-cutting concern across all Elementary tests rather than something introduced by this PR, and the apostrophe-in-identifier edge case is indeed extreme. Happy to defer the hardening to a dedicated effort.


✏️ Learnings added
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 932
File: macros/edr/tests/test_volume_threshold.sql:149-165
Timestamp: 2026-03-13T22:34:26.847Z
Learning: In the Elementary dbt-data-reliability codebase, embedding `full_table_name` (from `elementary.relation_to_full_name(model_relation)`) and `model_relation.identifier` directly into single-quoted SQL literals without apostrophe-escaping is a known pattern across all Elementary tests. The team considers apostrophes in table/model names an extreme edge case and defers SQL-literal hardening (e.g., replacing `'` with `''`) to a separate cross-cutting effort rather than fixing it on a per-PR basis.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: haritamar
Repo: elementary-data/dbt-data-reliability PR: 918
File: macros/utils/cross_db_utils/incremental_strategy.sql:21-31
Timestamp: 2026-01-27T21:01:16.833Z
Learning: In `models/edr/run_results/test_result_rows.sql`, the model query is always empty and the table is updated via post-hook/on-run-end hook. The `incremental_strategy` configuration using `get_append_only_incremental_strategy()` is only needed to prevent Athena from failing with its default `insert_overwrite` strategy. For other adapters, returning `none` (to use adapter defaults) is acceptable because the empty query doesn't trigger incremental logic.

Learnt from: GuyEshdat
Repo: elementary-data/dbt-data-reliability PR: 838
File: macros/utils/table_operations/delete_and_insert.sql:138-146
Timestamp: 2025-08-10T11:29:06.982Z
Learning: In the Elementary dbt-data-reliability codebase, the team prefers to add the `escape_reserved_keywords` macro only in places where they've actually encountered errors with reserved keywords, rather than preemptively adding it everywhere. They rely on E2E tests to catch future issues if reserved keywords cause problems in untested code paths. This pragmatic approach avoids unnecessary defensive programming and keeps the codebase simpler.

Learnt from: GuyEshdat
Repo: elementary-data/dbt-data-reliability PR: 838
File: macros/utils/table_operations/insert_rows.sql:156-0
Timestamp: 2025-08-10T11:28:43.632Z
Learning: In the Elementary dbt-data-reliability codebase, the `dremio__escape_special_chars` macro in `macros/utils/table_operations/insert_rows.sql` receives strings where single quotes are already escaped as `\'` from upstream processing. The macro correctly converts these to Dremio's SQL escaping format by replacing `\'` with `''`. This is different from other database implementations that handle bare single quotes directly.

Learnt from: GuyEshdat
Repo: elementary-data/dbt-data-reliability PR: 838
File: macros/utils/sql_utils/escape_reserved_keywords.sql:29-31
Timestamp: 2025-08-10T11:29:44.525Z
Learning: In the Elementary dbt-data-reliability codebase, the `escape_keywords` macro in `macros/utils/sql_utils/escape_reserved_keywords.sql` is specifically designed to escape SQL reserved keywords for different database adapters. Reserved keywords are predefined language tokens (like 'filter', 'sql', 'timestamp', 'value', 'one', 'min', 'max', 'sum', 'count' for Dremio) that don't contain special characters like double quotes. Therefore, the `dremio__escape_keywords` macro correctly wraps these keywords in double quotes without needing to handle embedded quotes, as reserved keywords by definition are simple identifiers.

Learnt from: GuyEshdat
Repo: elementary-data/dbt-data-reliability PR: 838
File: macros/utils/sql_utils/escape_reserved_keywords.sql:29-31
Timestamp: 2025-08-10T11:29:44.525Z
Learning: In the Elementary dbt-data-reliability codebase, the `escape_keywords` macro in `macros/utils/sql_utils/escape_reserved_keywords.sql` is specifically designed to escape SQL reserved keywords for different database adapters. Reserved keywords are predefined language tokens (like 'filter', 'sql', 'timestamp', 'value', 'one', 'min', 'max', 'sum', 'count' for Dremio) that don't contain special characters like double quotes. Therefore, the `dremio__escape_keywords` macro correctly wraps these keywords in double quotes without needing to handle embedded quotes, as reserved keywords by definition are simple identifiers.

Learnt from: arbiv
Repo: elementary-data/dbt-data-reliability PR: 861
File: macros/edr/tests/test_utils/clean_elementary_test_tables.sql:45-50
Timestamp: 2025-09-15T06:06:05.601Z
Learning: In the Elementary dbt-data-reliability codebase, the team prefers simpler, direct approaches over creating helper macros for centralized quoting logic, even when it might provide better maintainability across multiple use cases. They value code simplicity and readability over abstraction.

devin-ai-integration bot and others added 4 commits March 12, 2026 14:39
Elementary only processes complete time buckets (the latest full bucket
before run_started_at). With daily buckets, today's bucket is incomplete
and gets excluded. Tests were putting anomalous data in today's bucket,
so the macro never saw the spike/drop.

Fix: shift all test data one day back so the anomalous data lands in
yesterday's bucket (the latest complete bucket) and baseline data in
the day before.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
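The complete-bucket rule this commit relies on can be sketched as follows (a minimal illustration of the rule, not Elementary's actual bucketing code; names are assumptions):

```python
from datetime import datetime, timedelta

def latest_complete_bucket(run_started_at, bucket=timedelta(days=1)):
    """Return (start, end) of the latest fully elapsed bucket before
    run_started_at. The bucket containing run_started_at is still open,
    so it is excluded -- which is why test data placed "today" was never
    seen by the macro."""
    epoch = datetime(1970, 1, 1)
    boundary = epoch + ((run_started_at - epoch) // bucket) * bucket
    return boundary - bucket, boundary

# A run mid-day on March 12 only sees the March 11 bucket as complete.
start, end = latest_complete_bucket(datetime(2026, 3, 12, 14, 39))
print(start, end)
```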
- Replace 'cast(... as numeric)' with edr_type_numeric() for adapter-aware type
- Replace 'limit 1' with row_number() pattern (SQL Server/Fabric compatible)
- Replace '||' string concat with edr_concat() (SQL Server/Fabric compatible)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
macros/edr/tests/test_volume_threshold.sql (2)

49-62: ⚠️ Potential issue | 🟡 Minor

Reject negative thresholds and min_row_count.

Negative warn_threshold_percent / error_threshold_percent make the severity branches effectively always-on, and a negative min_row_count disables the baseline guard. Add non-negative checks alongside the existing compiler validation.

Possible fix
         {% if warn_threshold_percent > error_threshold_percent %}
             {{
                 exceptions.raise_compiler_error(
                     "warn_threshold_percent cannot exceed error_threshold_percent"
                 )
             }}
         {% endif %}
+        {% if warn_threshold_percent < 0 %}
+            {{
+                exceptions.raise_compiler_error(
+                    "warn_threshold_percent must be non-negative"
+                )
+            }}
+        {% endif %}
+        {% if error_threshold_percent < 0 %}
+            {{
+                exceptions.raise_compiler_error(
+                    "error_threshold_percent must be non-negative"
+                )
+            }}
+        {% endif %}
+        {% if min_row_count < 0 %}
+            {{
+                exceptions.raise_compiler_error(
+                    "min_row_count must be non-negative"
+                )
+            }}
+        {% endif %}
         {% if direction not in ["both", "spike", "drop"] %}
             {{
                 exceptions.raise_compiler_error(
                     "direction must be 'both', 'spike', or 'drop'"
                 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_volume_threshold.sql` around lines 49 - 62, Add
non-negative validation for warn_threshold_percent, error_threshold_percent, and
min_row_count: detect if any of these values is less than 0 and call
exceptions.raise_compiler_error with a clear message (e.g., "X must be
non-negative") similar to the existing checks; update the macro where
warn_threshold_percent/error_threshold_percent and direction are validated so
the new checks run (use the same style as the current
exceptions.raise_compiler_error calls and reference the existing variables
warn_threshold_percent, error_threshold_percent, and min_row_count).
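Why negative thresholds matter can be seen in a Python mirror of the severity semantics (a hypothetical sketch -- function name, defaults, and tie-breaking are assumptions, not the macro's actual code):

```python
def classify(previous, current, warn_pct=5.0, error_pct=10.0,
             min_row_count=0, direction="both"):
    # Hypothetical mirror of volume_threshold's severity logic.
    # The guards below are the non-negative checks the review asks for;
    # without them a negative warn_pct makes every run at least a warn.
    assert warn_pct >= 0 and error_pct >= 0 and min_row_count >= 0
    if previous is None or previous < max(min_row_count, 1):
        return "pass"  # no usable baseline to compare against
    change_pct = (current - previous) / previous * 100.0
    if direction == "spike" and change_pct <= 0:
        return "pass"
    if direction == "drop" and change_pct >= 0:
        return "pass"
    if abs(change_pct) >= error_pct:
        return "fail"
    if abs(change_pct) >= warn_pct:
        return "warn"
    return "pass"

# +7% warns, -12% fails, and a -12% drop passes when only spikes are watched.
print(classify(1000, 1070), classify(1000, 880), classify(1000, 880, direction="spike"))
```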

139-140: ⚠️ Potential issue | 🟡 Minor

Escape relation-derived literals before interpolating them.

full_table_name and model_relation.identifier are still dropped into single-quoted SQL literals verbatim. An apostrophe in a quoted identifier will break the generated SQL here the same way it did for where_expression in the sibling test.

Possible fix
-    {% set full_table_name = elementary.relation_to_full_name(model_relation) %}
+    {% set full_table_name = elementary.relation_to_full_name(model_relation)
+        | replace("'", "''") %}
+    {% set model_identifier = model_relation.identifier | replace("'", "''") %}
@@
-        '{{ model_relation.identifier }}' as model_name,
+        '{{ model_identifier }}' as model_name,

Also applies to: 267-268

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_volume_threshold.sql` around lines 139 - 140, The test
interpolates relation-derived values directly into single-quoted SQL literals
(full_table_name and model_relation.identifier), which will break if the
identifier contains an apostrophe; fix by escaping those literals before
interpolation (e.g., replace a single quote with two single quotes or use a
Jinja/dbt SQL-escaping filter) wherever full_table_name and
model_relation.identifier are injected into single-quoted strings in this test,
and apply the same escape to the other occurrences referenced (around the second
spot at the same test, lines ~267-268).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@macros/edr/tests/test_volume_threshold.sql`:
- Around line 49-62: Add non-negative validation for warn_threshold_percent,
error_threshold_percent, and min_row_count: detect if any of these values is
less than 0 and call exceptions.raise_compiler_error with a clear message (e.g.,
"X must be non-negative") similar to the existing checks; update the macro where
warn_threshold_percent/error_threshold_percent and direction are validated so
the new checks run (use the same style as the current
exceptions.raise_compiler_error calls and reference the existing variables
warn_threshold_percent, error_threshold_percent, and min_row_count).
- Around line 139-140: The test interpolates relation-derived values directly
into single-quoted SQL literals (full_table_name and model_relation.identifier),
which will break if the identifier contains an apostrophe; fix by escaping those
literals before interpolation (e.g., replace a single quote with two single
quotes or use a Jinja/dbt SQL-escaping filter) wherever full_table_name and
model_relation.identifier are injected into single-quoted strings in this test,
and apply the same escape to the other occurrences referenced (around the second
spot at the same test, lines ~267-268).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aedf92e9-ac2b-4d0f-9af9-e8f1f5a6bd67

📥 Commits

Reviewing files that changed from the base of the PR and between 6e75bc8 and 0184a34.

📒 Files selected for processing (1)
  • macros/edr/tests/test_volume_threshold.sql

devin-ai-integration bot and others added 2 commits March 12, 2026 16:08
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
- Skip pytz.all_timezones validation in dbt-fusion (known discrepancy, dbt-labs/dbt-fusion#143)
- Replace || concatenation with edr_concat() in test_data_freshness_sla.sql for SQL Server/Fabric

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
devin-ai-integration bot and others added 5 commits March 12, 2026 16:58
…la_deadline_utc

In dbt-fusion, pytz.localize() produces incorrect timezone-aware datetimes,
causing deadline_passed to be False when it should be True. Use
datetime.timezone.utc and replace(tzinfo=) in the fusion path instead.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
In dbt-fusion, datetime.timezone.utc doesn't exist and timezone-aware
datetime comparison produces incorrect results. Use datetime.utcnow()
with manual offset calculation so all comparisons use naive datetimes.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…a_freshness_sla

- Fabric: rename CTE current_bucket -> curr_bucket (SQL Server rejects CURRENT_ prefix)
- BigQuery: use \' instead of '' to escape single quotes in where_suffix string literals
- fusion: replace pytz.localize() with stdlib datetime.timezone.utc in calculate_sla_deadline_utc to fix deadline_passed always being False

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SQL Server/Fabric cannot resolve a CTE name inside a nested subquery
within another CTE. Replace (select bucket_start from curr_bucket)
with an INNER JOIN to curr_bucket.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@macros/edr/tests/test_utils/sla_utils.sql`:
- Around line 73-99: The bug is reusing a single tz_offset (tz_offset) captured
from now_local for all boundary times, which breaks on DST transition days;
instead compute the offset separately for each combined local datetime
(day_start_utc, day_end_utc, sla_deadline_utc) by creating an aware local
datetime for that specific local time (using now_local.tzinfo or equivalent) and
calling utcoffset() on that datetime before subtracting it; update the code that
sets day_start_utc, day_end_utc and sla_deadline_utc to use per-datetime offsets
(referencing now_local.tzinfo, target_date_local, sla_hour, sla_minute, and the
day_* and sla_deadline_utc variables) so each boundary uses the correct
DST-aware offset.
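The DST hazard described in this prompt is easy to reproduce with `zoneinfo` (assuming Python 3.9+ with tz data installed; the dates are illustrative):

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # stdlib since Python 3.9; needs tz data installed

tz = ZoneInfo("America/New_York")
# US spring-forward day in 2026: the local day starts at UTC-5,
# but an afternoon SLA time on the same day is already at UTC-4.
day_start = datetime(2026, 3, 8, 0, 0, tzinfo=tz)
sla_local = datetime(2026, 3, 8, 15, 0, tzinfo=tz)

start_offset = day_start.utcoffset()
sla_offset = sla_local.utcoffset()
print(start_offset, sla_offset)

# Reusing start_offset for the deadline would place it an hour late;
# subtract each datetime's own utcoffset() when converting to naive UTC.
sla_deadline_utc = sla_local.replace(tzinfo=None) - sla_offset
```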

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fd17dc74-d15e-46aa-8897-bd165b00b96c

📥 Commits

Reviewing files that changed from the base of the PR and between 6e75bc8 and 380a9fa.

📒 Files selected for processing (3)
  • macros/edr/tests/test_data_freshness_sla.sql
  • macros/edr/tests/test_utils/sla_utils.sql
  • macros/edr/tests/test_volume_threshold.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • macros/edr/tests/test_volume_threshold.sql

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
macros/edr/tests/test_volume_threshold.sql (2)

139-140: ⚠️ Potential issue | 🟡 Minor

Escape relation-derived literals before embedding in single-quoted SQL.

full_table_name and model_relation.identifier are interpolated into quoted literals without escaping. Apostrophes in identifiers will break generated SQL. This also appears unresolved from earlier feedback.

🛠️ Suggested patch
-    {% set full_table_name = elementary.relation_to_full_name(model_relation) %}
+    {% set full_table_name = elementary.relation_to_full_name(model_relation) | replace("'", "''") %}
+    {% set model_identifier = model_relation.identifier | replace("'", "''") %}
@@
-        '{{ model_relation.identifier }}' as model_name,
+        '{{ model_identifier }}' as model_name,
#!/bin/bash
# Verify unescaped single-quoted literal interpolations
rg -n "upper\\('\\{\\{ full_table_name \\}\\}'\\)|'\\{\\{ model_relation\\.identifier \\}\\}'" macros/edr/tests/test_volume_threshold.sql

Also applies to: 152-153, 272-272

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_volume_threshold.sql` around lines 139 - 140, The SQL
embeds relation-derived literals raw into single-quoted strings (full_table_name
and model_relation.identifier), which breaks if identifiers contain apostrophes;
update the macro to escape single quotes before embedding by applying a Jinja
replace filter (e.g., full_table_name|replace("'", "''") and
model_relation.identifier|replace("'", "''")) wherever they are interpolated
into single-quoted SQL, ensuring all occurrences (including the other spots
noted) are fixed.

49-56: ⚠️ Potential issue | 🟡 Minor

Add non-negative validation for thresholds and min_row_count.

warn_threshold_percent, error_threshold_percent, and min_row_count can still be negative, which can invalidate severity semantics. This appears unresolved from earlier feedback.

🛠️ Suggested patch
         {% if warn_threshold_percent > error_threshold_percent %}
             {{
                 exceptions.raise_compiler_error(
                     "warn_threshold_percent cannot exceed error_threshold_percent"
                 )
             }}
         {% endif %}
+        {% if warn_threshold_percent < 0 %}
+            {{
+                exceptions.raise_compiler_error(
+                    "warn_threshold_percent must be non-negative"
+                )
+            }}
+        {% endif %}
+        {% if error_threshold_percent < 0 %}
+            {{
+                exceptions.raise_compiler_error(
+                    "error_threshold_percent must be non-negative"
+                )
+            }}
+        {% endif %}
+        {% if min_row_count < 0 %}
+            {{
+                exceptions.raise_compiler_error(
+                    "min_row_count must be non-negative"
+                )
+            }}
+        {% endif %}
         {% if direction not in ["both", "spike", "drop"] %}
             {{
                 exceptions.raise_compiler_error(
                     "direction must be 'both', 'spike', or 'drop'"
                 )
#!/bin/bash
# Verify missing non-negative validations in this macro
rg -n "warn_threshold_percent < 0|error_threshold_percent < 0|min_row_count < 0" macros/edr/tests/test_volume_threshold.sql

Also applies to: 240-267

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_volume_threshold.sql` around lines 49 - 56, Add
explicit non-negative validation checks for warn_threshold_percent,
error_threshold_percent, and min_row_count in the macro (similar to the existing
warn > error check): if any of warn_threshold_percent < 0,
error_threshold_percent < 0, or min_row_count < 0, call
exceptions.raise_compiler_error with a clear message (e.g., "<param> cannot be
negative"). Locate the validations near the existing comparisons around the
warn_threshold_percent/error_threshold_percent block and use the same
exceptions.raise_compiler_error mechanism to enforce these constraints.
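In plain Python, the requested compile-time guards reduce to a handful of range checks. A sketch with `ValueError` standing in for `exceptions.raise_compiler_error` (the function name is ours; the parameter names follow the macro):

```python
def validate_volume_threshold_args(warn_threshold_percent: float,
                                   error_threshold_percent: float,
                                   min_row_count: int,
                                   direction: str) -> None:
    # Non-negativity guards the review asks to add.
    if warn_threshold_percent < 0:
        raise ValueError("warn_threshold_percent must be non-negative")
    if error_threshold_percent < 0:
        raise ValueError("error_threshold_percent must be non-negative")
    if min_row_count < 0:
        raise ValueError("min_row_count must be non-negative")
    # Existing checks from the macro, kept alongside the new ones.
    if warn_threshold_percent > error_threshold_percent:
        raise ValueError("warn_threshold_percent cannot exceed error_threshold_percent")
    if direction not in ("both", "spike", "drop"):
        raise ValueError("direction must be 'both', 'spike', or 'drop'")

validate_volume_threshold_args(5.0, 10.0, 0, "both")  # passes silently
```

Running validation at compile time (rather than letting a negative threshold flip the warn/error predicates at query time) surfaces misconfiguration immediately with a clear message.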
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@macros/edr/tests/test_volume_threshold.sql`:
- Around line 139-140: The SQL embeds relation-derived literals raw into
single-quoted strings (full_table_name and model_relation.identifier), which
breaks if identifiers contain apostrophes; update the macro to escape single
quotes before embedding by applying a Jinja replace filter (e.g.,
full_table_name|replace("'", "''") and model_relation.identifier|replace("'",
"''")) wherever they are interpolated into single-quoted SQL, ensuring all
occurrences (including the other spots noted) are fixed.
- Around line 49-56: Add explicit non-negative validation checks for
warn_threshold_percent, error_threshold_percent, and min_row_count in the macro
(similar to the existing warn > error check): if any of warn_threshold_percent <
0, error_threshold_percent < 0, or min_row_count < 0, call
exceptions.raise_compiler_error with a clear message (e.g., "<param> cannot be
negative"). Locate the validations near the existing comparisons around the
warn_threshold_percent/error_threshold_percent block and use the same
exceptions.raise_compiler_error mechanism to enforce these constraints.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c2d49a66-92fb-43e6-99e1-c5a71fd6a241

📥 Commits

Reviewing files that changed from the base of the PR and between 380a9fa and 5c289d1.

📒 Files selected for processing (1)
  • macros/edr/tests/test_volume_threshold.sql

…ility

Move SLA deadline_passed check from compile-time Python (broken in
dbt-fusion due to pytz issues) to SQL-level using
edr_condition_as_boolean with edr_current_timestamp_in_utc.

Replace cross-CTE subquery in volume_threshold with LAG window function
to fix Fabric/SQL Server "Invalid object name" error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
macros/edr/tests/test_utils/sla_utils.sql (1)

139-150: ⚠️ Potential issue | 🔴 Critical

Update test_execution_sla.sql to compute deadline_passed from SQL instead of expecting it from sla_info.

The removal of deadline_passed from sla_utils.sql's return payload (per the PR objective to move this check to SQL) was not completed in test_execution_sla.sql. Line 156 still attempts to access sla_info.deadline_passed, which will be None, causing the conditional gate at line 296 ({% if not deadline_passed %}) to behave incorrectly and always bypass SLA breach detection.

Compute deadline_passed within the SQL query by comparing CURRENT_TIMESTAMP() (in UTC) against sla_deadline_utc, or pass it as a parameter computed from the available deadline fields before the macro is called.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_utils/sla_utils.sql` around lines 139 - 150,
test_execution_sla.sql still expects sla_info.deadline_passed but sla_utils.sql
no longer returns it; update the test to compute deadline_passed from the SLA
deadline instead of reading sla_info.deadline_passed — either (A) add a SQL
expression in the test's query that sets deadline_passed = (CURRENT_TIMESTAMP()
> sla_deadline_utc) (ensuring UTC comparison) where sla_deadline_utc is
available, or (B) compute deadline_passed in the test harness before calling the
macro and pass it into the template; update all references to deadline_passed
(e.g., the conditional {% if not deadline_passed %}) to use this newly computed
value.
♻️ Duplicate comments (1)
macros/edr/tests/test_volume_threshold.sql (1)

139-155: ⚠️ Potential issue | 🟡 Minor

Escape apostrophes in relation-derived literals before embedding in SQL.

full_table_name is interpolated directly into a single-quoted SQL literal (line 152). If the relation name contains an apostrophe, this will produce a syntax error. The same applies to model_relation.identifier at line 250.

Proposed fix
-    {% set full_table_name = elementary.relation_to_full_name(model_relation) %}
+    {% set full_table_name = elementary.relation_to_full_name(model_relation) | replace("'", "''") %}
+    {% set model_identifier = model_relation.identifier | replace("'", "''") %}

Then at line 250:

-        '{{ model_relation.identifier }}' as model_name,
+        '{{ model_identifier }}' as model_name,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_volume_threshold.sql` around lines 139 - 155, The SQL
literal injection happens because full_table_name (from
elementary.relation_to_full_name(model_relation)) and model_relation.identifier
are interpolated into single-quoted SQL without escaping; fix by sanitizing
those values before embedding: replace any single-quote characters with two
single-quotes (standard SQL escaping) where full_table_name is set and where
model_relation.identifier is used, so the embedded literals are safe in the
WHERE clause and later uses; update the variable assignments that derive
full_table_name and the use site for model_relation.identifier to use the
escaped versions.
🧹 Nitpick comments (1)
macros/edr/tests/test_volume_threshold.sql (1)

49-62: Also validate non-negative thresholds and min_row_count.

Negative thresholds would make warn/error conditions always true or behave unexpectedly, and a negative min_row_count defeats the baseline guard. Add compile-time validation alongside the existing checks.

Proposed fix
         {% if warn_threshold_percent > error_threshold_percent %}
             {{
                 exceptions.raise_compiler_error(
                     "warn_threshold_percent cannot exceed error_threshold_percent"
                 )
             }}
         {% endif %}
+        {% if warn_threshold_percent < 0 or error_threshold_percent < 0 %}
+            {{
+                exceptions.raise_compiler_error(
+                    "warn_threshold_percent and error_threshold_percent must be non-negative"
+                )
+            }}
+        {% endif %}
+        {% if min_row_count < 0 %}
+            {{
+                exceptions.raise_compiler_error(
+                    "min_row_count must be non-negative"
+                )
+            }}
+        {% endif %}
         {% if direction not in ["both", "spike", "drop"] %}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_volume_threshold.sql` around lines 49 - 62, Add
compile-time validation to ensure warn_threshold_percent,
error_threshold_percent, and min_row_count are non-negative and keep the
existing warn<=error check; specifically, in the same macro where direction and
warn/error comparisons are validated, add checks that if warn_threshold_percent
< 0 or error_threshold_percent < 0 or min_row_count < 0 you call
exceptions.raise_compiler_error with a clear message (use the same
exceptions.raise_compiler_error function used for the other validations), and
keep the existing direction validation for values not in ["both", "spike",
"drop"] and the warn_threshold_percent > error_threshold_percent check as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@macros/edr/tests/test_volume_threshold.sql`:
- Around line 136-138: The template uses {{ test_metrics_table }} (set via
elementary.get_elementary_test_table) directly in a FROM clause which can be
None during non-execute/cached-miss paths; update the macro to guard against
that by checking if test_metrics_table is truthy before emitting the FROM ... {{
test_metrics_table }} fragment (e.g., wrap the UNION branch or the query block
that references test_metrics_table in an {% if test_metrics_table %} ... {% else
%} block) and either skip that branch or raise a clear error message so you
never output "FROM None".

---

Outside diff comments:
In `@macros/edr/tests/test_utils/sla_utils.sql`:
- Around line 139-150: test_execution_sla.sql still expects
sla_info.deadline_passed but sla_utils.sql no longer returns it; update the test
to compute deadline_passed from the SLA deadline instead of reading
sla_info.deadline_passed — either (A) add a SQL expression in the test's query
that sets deadline_passed = (CURRENT_TIMESTAMP() > sla_deadline_utc) (ensuring
UTC comparison) where sla_deadline_utc is available, or (B) compute
deadline_passed in the test harness before calling the macro and pass it into
the template; update all references to deadline_passed (e.g., the conditional {%
if not deadline_passed %}) to use this newly computed value.

---

Duplicate comments:
In `@macros/edr/tests/test_volume_threshold.sql`:
- Around line 139-155: The SQL literal injection happens because full_table_name
(from elementary.relation_to_full_name(model_relation)) and
model_relation.identifier are interpolated into single-quoted SQL without
escaping; fix by sanitizing those values before embedding: replace any
single-quote characters with two single-quotes (standard SQL escaping) where
full_table_name is set and where model_relation.identifier is used, so the
embedded literals are safe in the WHERE clause and later uses; update the
variable assignments that derive full_table_name and the use site for
model_relation.identifier to use the escaped versions.

---

Nitpick comments:
In `@macros/edr/tests/test_volume_threshold.sql`:
- Around line 49-62: Add compile-time validation to ensure
warn_threshold_percent, error_threshold_percent, and min_row_count are
non-negative and keep the existing warn<=error check; specifically, in the same
macro where direction and warn/error comparisons are validated, add checks that
if warn_threshold_percent < 0 or error_threshold_percent < 0 or min_row_count <
0 you call exceptions.raise_compiler_error with a clear message (use the same
exceptions.raise_compiler_error function used for the other validations), and
keep the existing direction validation for values not in ["both", "spike",
"drop"] and the warn_threshold_percent > error_threshold_percent check as-is.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4763d245-67d0-471a-833e-8ce1a3ff655b

📥 Commits

Reviewing files that changed from the base of the PR and between 5c289d1 and 75a0779.

📒 Files selected for processing (3)
  • macros/edr/tests/test_data_freshness_sla.sql
  • macros/edr/tests/test_utils/sla_utils.sql
  • macros/edr/tests/test_volume_threshold.sql
🚧 Files skipped from review as they are similar to previous changes (1)
  • macros/edr/tests/test_data_freshness_sla.sql

Comment on lines +136 to +138
{% set test_metrics_table = elementary.get_elementary_test_table(
elementary.get_elementary_test_table_name(), "metrics"
) %}

@coderabbitai coderabbitai bot Mar 13, 2026


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if collect_table_metrics always creates a metrics table, or if there's a guard
rg -n -A10 "def collect_table_metrics|macro collect_table_metrics" macros/

# Check how other tests handle potentially missing test_metrics_table
rg -n -B2 -A5 "get_elementary_test_table.*metrics" macros/edr/tests/

Repository: elementary-data/dbt-data-reliability

Length of output: 1536


🏁 Script executed:

# Read the collect_table_metrics macro to understand its behavior
head -100 macros/edr/tests/test_utils/collect_table_metrics.sql

# Find the get_elementary_test_table implementation
rg -n "def get_elementary_test_table|macro get_elementary_test_table" macros/ --max-count=5

Repository: elementary-data/dbt-data-reliability

Length of output: 3369


🏁 Script executed:

# Read the test_volume_threshold.sql file around lines 136-138 and 157-165
sed -n '130,170p' macros/edr/tests/test_volume_threshold.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 1441


🏁 Script executed:

# Check how test_metrics_table is used after line 136
sed -n '136,200p' macros/edr/tests/test_volume_threshold.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 2384


🏁 Script executed:

# Read get_elementary_test_table implementation
cat -n macros/edr/tests/test_utils/get_elementary_test_table.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 466


🏁 Script executed:

# Find and read create_elementary_test_table implementation
rg -n "macro create_elementary_test_table" macros/ --max-count=1

Repository: elementary-data/dbt-data-reliability

Length of output: 181


🏁 Script executed:

# Check if there are any early returns or conditions in collect_table_metrics that might prevent table creation
cat -n macros/edr/tests/test_utils/collect_table_metrics.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 3796


🏁 Script executed:

# Check how test_volume_threshold calls collect_table_metrics
rg -n "collect_table_metrics|test_volume_threshold" macros/edr/tests/test_volume_threshold.sql -B2 -A2

Repository: elementary-data/dbt-data-reliability

Length of output: 680


🏁 Script executed:

# Read create_elementary_test_table implementation
sed -n '1,80p' macros/edr/tests/test_utils/create_elementary_test_table.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 1521


🏁 Script executed:

# Check store_metrics_table_in_cache implementation
rg -n "macro store_metrics_table_in_cache" macros/ -A20

Repository: elementary-data/dbt-data-reliability

Length of output: 1309


🏁 Script executed:

# Check the full context of lines 157-165 in test_volume_threshold.sql
sed -n '150,170p' macros/edr/tests/test_volume_threshold.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 799


🏁 Script executed:

# Check table_monitoring_query to see if it could be empty
rg -n "macro table_monitoring_query" macros/ -A30 | head -50

Repository: elementary-data/dbt-data-reliability

Length of output: 3054


🏁 Script executed:

# Check if there are any conditions that might prevent table creation
rg -n "create_or_replace" macros/edr/tests/test_utils/ -B2 -A5

Repository: elementary-data/dbt-data-reliability

Length of output: 919


🏁 Script executed:

# See how dbt/Jinja handles None in templates - check if there are other examples
rg -n "{% if.*test.*table.*%}" macros/edr/tests/ -A2

Repository: elementary-data/dbt-data-reliability

Length of output: 177


🏁 Script executed:

# Check if there are similar patterns in codebase with guards against None tables
rg -n "{% if.*_table" macros/edr/tests/ -A1 | head -30

Repository: elementary-data/dbt-data-reliability

Length of output: 168


🏁 Script executed:

# Verify table_monitoring_query always generates valid output
sed -n '1,100p' macros/edr/data_monitoring/monitors_query/table_monitoring_query.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 2697


🏁 Script executed:

# Check if test_volume_threshold is used in a dbt test context and when it executes
rg -n "test_volume_threshold" macros/edr/tests/ -B5 -A2 | head -40

Repository: elementary-data/dbt-data-reliability

Length of output: 62


🏁 Script executed:

# Check how dbt handles None in Jinja templates by looking for similar patterns
rg -n "from \{\{.*\}\}" macros/edr/tests/ -A1 -B1 | head -50

Repository: elementary-data/dbt-data-reliability

Length of output: 3082


🏁 Script executed:

# Verify if there are any guards we missed in the broader context
sed -n '130,170p' macros/edr/tests/test_volume_threshold.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 1441


🏁 Script executed:

# Check what happens when Jinja renders None - look for test cases or documentation
rg -n "return(none)" macros/ -A2 -B2 | head -40

Repository: elementary-data/dbt-data-reliability

Length of output: 62


🏁 Script executed:

# See if there are any error handling paths in collect_table_metrics
rg -n "try|catch|exception|error" macros/edr/tests/test_utils/collect_table_metrics.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 187


🏁 Script executed:

# Check if test_volume_threshold always expects collect_table_metrics to be called
sed -n '70,140p' macros/edr/tests/test_volume_threshold.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 2729


🏁 Script executed:

# Check if the SQL generation is conditional on test_metrics_table existing
sed -n '110,180p' macros/edr/tests/test_volume_threshold.sql | grep -A50 "get_volume_threshold_comparison_query"

Repository: elementary-data/dbt-data-reliability

Length of output: 1777


🏁 Script executed:

# Check if there's a guard around the macro invocation or any lazy evaluation
sed -n '105,120p' macros/edr/tests/test_volume_threshold.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 661


🏁 Script executed:

# Understand how dbt phases work - check if test execution is parse-only or execute-phase
rg -n "execute" macros/edr/tests/test_volume_threshold.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 176


🏁 Script executed:

# Check what no_results_query returns
rg -n "macro no_results_query" macros/ -A5

Repository: elementary-data/dbt-data-reliability

Length of output: 361


🏁 Script executed:

# Verify the full conditional context - does the guard protect the query generation?
sed -n '45,125p' macros/edr/tests/test_volume_threshold.sql

Repository: elementary-data/dbt-data-reliability

Length of output: 3152


Add null guard before using test_metrics_table in the FROM clause.

get_elementary_test_table returns None if the table is not cached or during non-execute phases. If collect_table_metrics fails to create or cache the metrics table (e.g., invalid monitoring query, missing permissions), using {{ test_metrics_table }} in the FROM clause generates from None, causing a SQL syntax error. Add a null check to handle this gracefully—either skip the union branch or raise a clear error.

This applies to both occurrences: line 164 in the main volume threshold query and the same pattern if repeated elsewhere.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_volume_threshold.sql` around lines 136 - 138, The
template uses {{ test_metrics_table }} (set via
elementary.get_elementary_test_table) directly in a FROM clause which can be
None during non-execute/cached-miss paths; update the macro to guard against
that by checking if test_metrics_table is truthy before emitting the FROM ... {{
test_metrics_table }} fragment (e.g., wrap the UNION branch or the query block
that references test_metrics_table in an {% if test_metrics_table %} ... {% else
%} block) and either skip that branch or raise a clear error message so you
never output "FROM None".

Collaborator


In practice, collect_table_metrics creates the test metrics table — if it fails, the entire test errors out before reaching the query. The data_monitoring_metrics table is an Elementary model that always exists in a properly configured project. A null guard here would mask underlying setup issues rather than help. Deferring — this is general test infrastructure, not specific to this PR.


Rate Limit Exceeded

@haritamar has exceeded the limit for the number of chat messages per hour. Please wait 2 minutes and 5 seconds before sending another message.

haritamar and others added 2 commits March 13, 2026 18:05
…ezone.utc

dbt-fusion does not support datetime.timezone.utc, causing
"undefined value" render errors. Revert to the pytz probe approach.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix clock-flaky SLA tests: use Etc/GMT-14 for deadline-passed cases
  so the deadline is always in the past regardless of CI runner time
- Use exact status assertions (== "error") instead of != "pass" in
  volume threshold tests to catch severity regressions
- Add negative threshold and min_row_count validation in volume_threshold
- Document DST limitation in fusion sla_utils path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
macros/edr/tests/test_utils/sla_utils.sql (1)

143-152: ⚠️ Potential issue | 🔴 Critical

Keep deadline_passed in the shared SLA payload until execution_sla is migrated.

macros/edr/tests/test_execution_sla.sql still passes sla_info.deadline_passed into get_execution_sla_query() at Lines 150-160, and the conditional at Lines 292-300 treats a falsy value as “deadline not passed”. With this key removed here, execution_sla will compile the `when 1 = 1 then false` branch on every run and stop failing after the deadline.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_utils/sla_utils.sql` around lines 143 - 152, The
returned SLA payload currently omits the deadline_passed key which breaks
get_execution_sla_query() (called from test_execution_sla) because that caller
expects sla_info.deadline_passed; restore a boolean "deadline_passed" entry in
the returned dict (using the existing deadline computation/variable in scope) so
sla_info.deadline_passed is present for the conditional logic in
get_execution_sla_query()/execution_sla until execution_sla is migrated.
♻️ Duplicate comments (2)
macros/edr/tests/test_volume_threshold.sql (2)

149-162: ⚠️ Potential issue | 🟡 Minor

Escape relation-derived literals before embedding them in SQL.

full_table_name and model_relation.identifier are still interpolated into single-quoted literals. A quoted identifier like O'Brien will break both the cache lookup predicate and the emitted model_name.

Possible fix
-    {% set full_table_name = elementary.relation_to_full_name(model_relation) %}
+    {% set full_table_name = elementary.relation_to_full_name(model_relation)
+        | replace("'", "''") %}
+    {% set model_identifier = model_relation.identifier | replace("'", "''") %}
@@
-        '{{ model_relation.identifier }}' as model_name,
+        '{{ model_identifier }}' as model_name,

Also applies to: 259-260

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_volume_threshold.sql` around lines 149 - 162, The SQL
interpolations of relation-derived strings (full_table_name and
model_relation.identifier) are not escaped and will break for names containing
single quotes; update places that embed '{{ full_table_name }}' and '{{
model_relation.identifier }}' (used in the query against
data_monitoring_metrics_table and the emitted model_name) to escape single
quotes before embedding (for example use Jinja replace to turn ' into ''), and
apply the same fix to the other occurrences noted around lines 259-260 so all
literal SQL comparisons and emitted names use the escaped values.

146-174: ⚠️ Potential issue | 🔴 Critical

Guard the fresh-metrics branch when the temp metrics table is absent.

On cache-only runs, collect_table_metrics can skip materializing a test metrics table. This query still renders from {{ test_metrics_table }}, so the cache-hit path turns into FROM None instead of falling back to data_monitoring_metrics.

Possible fix
     {% set test_metrics_table = elementary.get_elementary_test_table(
         elementary.get_elementary_test_table_name(), "metrics"
     ) %}
@@
         all_metrics as (
             select
                 bucket_start,
                 bucket_end,
                 metric_value as row_count,
                 1 as source_priority
             from {{ data_monitoring_metrics_table }}
             where
                 upper(full_table_name) = upper('{{ full_table_name }}')
                 and lower(metric_name) = 'row_count'
                 and metric_properties
                 = {{ elementary.dict_to_quoted_json(metric_props) }}

+            {% if test_metrics_table %}
             union all

             select
                 bucket_start,
                 bucket_end,
                 metric_value as row_count,
                 0 as source_priority
             from {{ test_metrics_table }}
+            {% endif %}
         ),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_volume_threshold.sql` around lines 146 - 174, The union
branch that selects from the ephemeral test_metrics_table can render "FROM None"
during cache-only runs; update the macro so the second SELECT (the fresh-metrics
branch that uses test_metrics_table) is only rendered when test_metrics_table is
present/defined (or otherwise truthy). In practice, guard the union-all block
that references {{ test_metrics_table }} (used inside the all_metrics CTE) with
a conditional around that SELECT (or swap it for an empty/no-op select) so when
collect_table_metrics doesn't materialize the temp table the query falls back to
only using {{ data_monitoring_metrics_table }}.
🧹 Nitpick comments (1)
integration_tests/tests/test_data_freshness_sla.py (1)

113-125: Make the timezone test depend on the timezone.

This case stays pass even if the macro ignores timezone completely, because the row is fresh in both UTC and America/New_York. Please use a boundary-sensitive timestamp/deadline combination so the expected result changes when timezone conversion is wrong.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/tests/test_data_freshness_sla.py` around lines 113 - 125,
The test test_with_timezone currently uses a timestamp that is fresh in both UTC
and America/New_York; change the test data to use a UTC timestamp that straddles
the SLA boundary so the result differs by timezone (e.g., a UTC time that is
before the SLA in UTC but after 11:59pm in America/New_York), update the
test_args (timestamp_column, sla_time, timezone remain) and the assertion on
test_result to expect the opposite outcome (e.g., "fail") when timezone
conversion is honored; reference TIMESTAMP_COLUMN, DATE_FORMAT,
test_with_timezone, test_args and test_result so you can locate and replace the
generated utc_now value and final assert accordingly.
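One way to pick a boundary-sensitive instant of the kind the reviewer asks for: choose a UTC timestamp whose America/New_York local time falls on the previous calendar day, past the SLA hour. The values below are illustrative, not taken from the test file:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# 2026-03-20 03:30 UTC is still 2026-03-19 23:30 in America/New_York
# (EDT, UTC-4, since US DST begins on March 8, 2026).
utc_instant = datetime(2026, 3, 20, 3, 30, tzinfo=timezone.utc)
local = utc_instant.astimezone(ZoneInfo("America/New_York"))

# Against a 23:00 local SLA, the row is late in New York, but in UTC the
# same instant is early morning and would trivially look fresh -- so a
# macro that ignores the timezone parameter gives the opposite verdict.
sla_hour = 23
late_in_local_tz = (local.hour, local.minute) >= (sla_hour, 0)
print(local.isoformat(), late_in_local_tz)
```

With data like this, the test's expected result flips depending on whether timezone conversion is honored, which is exactly the sensitivity the current test lacks.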
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@macros/edr/tests/test_utils/sla_utils.sql`:
- Around line 143-152: The returned SLA payload currently omits the
deadline_passed key which breaks get_execution_sla_query() (called from
test_execution_sla) because that caller expects sla_info.deadline_passed;
restore a boolean "deadline_passed" entry in the returned dict (using the
existing deadline computation/variable in scope) so sla_info.deadline_passed is
present for the conditional logic in get_execution_sla_query()/execution_sla
until execution_sla is migrated.

---

Duplicate comments:
In `@macros/edr/tests/test_volume_threshold.sql`:
- Around line 149-162: The SQL interpolations of relation-derived strings
(full_table_name and model_relation.identifier) are not escaped and will break
for names containing single quotes; update places that embed '{{ full_table_name
}}' and '{{ model_relation.identifier }}' (used in the query against
data_monitoring_metrics_table and the emitted model_name) to escape single
quotes before embedding (for example use Jinja replace to turn ' into ''), and
apply the same fix to the other occurrences noted around lines 259-260 so all
literal SQL comparisons and emitted names use the escaped values.
- Around line 146-174: The union branch that selects from the ephemeral
test_metrics_table can render "FROM None" during cache-only runs; update the
macro so the second SELECT (the fresh-metrics branch that uses
test_metrics_table) is only rendered when test_metrics_table is present/defined
(or otherwise truthy). In practice, guard the union-all block that references {{
test_metrics_table }} (used inside the all_metrics CTE) with a conditional
around that SELECT (or swap it for an empty/no-op select) so when
collect_table_metrics doesn't materialize the temp table the query falls back to
only using {{ data_monitoring_metrics_table }}.

---

Nitpick comments:
In `@integration_tests/tests/test_data_freshness_sla.py`:
- Around line 113-125: `test_with_timezone` currently uses a timestamp that is fresh in both UTC and America/New_York, so it never actually exercises the timezone conversion. Change the test data to a UTC timestamp that straddles the SLA boundary so the result differs by timezone (e.g., a UTC time that is before the SLA in UTC but after 11:59pm in America/New_York), keep the same `test_args` (`timestamp_column`, `sla_time`, `timezone`), and flip the assertion on `test_result` to expect the opposite outcome (e.g., "fail") when timezone conversion is honored. `TIMESTAMP_COLUMN`, `DATE_FORMAT`, `test_with_timezone`, `test_args`, and `test_result` locate the generated `utc_now` value and the final assert.
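The boundary-straddling instant can be sanity-checked in plain Python. This is a standalone sketch: the specific date and the 23:59 SLA are illustrative values, not taken from the test file:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Pick a UTC instant that falls on the target day in UTC but on the
# previous day in America/New_York (03:00 UTC is 22:00 or 23:00 the
# evening before in New York, depending on DST).
utc_ts = datetime(2026, 3, 14, 3, 0, tzinfo=timezone.utc)
ny_ts = utc_ts.astimezone(ZoneInfo("America/New_York"))

# In UTC the data looks fresh (well before a 23:59 SLA on 2026-03-14);
# in New York it still belongs to 2026-03-13, so the SLA check should fail.
print(utc_ts.date(), ny_ts.date())
```

A timestamp like this makes the pass/fail outcome depend entirely on whether the macro honors the `timezone` argument.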

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2cac5efc-924d-4957-b729-fa5ecc699acc

📥 Commits

Reviewing files that changed from the base of the PR and between 75a0779 and bba307d.

📒 Files selected for processing (4)
  • integration_tests/tests/test_data_freshness_sla.py
  • integration_tests/tests/test_volume_threshold.py
  • macros/edr/tests/test_utils/sla_utils.sql
  • macros/edr/tests/test_volume_threshold.sql

haritamar and others added 2 commits March 13, 2026 23:22
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ClickHouse: reorder columns in freshness SLA final_result CTE so
is_failure is computed before sla_deadline_utc is cast to string
(ClickHouse resolves column refs to already-aliased columns in the
same SELECT, causing DateTime vs String type mismatch).

DuckDB: add explicit casts to LAG window function results in
volume_threshold to work around DuckDB internal binder bug that
confuses TIMESTAMP and FLOAT types across multiple LAG calls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
macros/edr/tests/test_volume_threshold.sql (1)

74-104: Validate timestamp_column before collecting metrics.

This macro sends timestamp_column straight into collect_table_metrics; unlike data_freshness_sla, a typo or non-timestamp column won’t be rejected until adapter SQL fails later. Reusing the same column/type validation here would make bad configs fail much more deterministically.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/tests/test_volume_threshold.sql` around lines 74 - 104: The macro forwards `timestamp_column` straight into `elementary.collect_table_metrics` without validating it. Add a validation step before the `collect_table_metrics` call that checks `timestamp_column` exists on `model_relation` and is a timestamp/datetime type, and raise a compiler error if not (mirroring the validation used in `data_freshness_sla`). Use the same elementary helper(s) that `data_freshness_sla` uses (or a simple column lookup on `model_relation`), and only call `elementary.collect_table_metrics` when validation passes.
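A sketch of such a pre-check using dbt's standard `adapter.get_columns_in_relation` and `exceptions.raise_compiler_error`. The exact helper Elementary's `data_freshness_sla` uses may differ, so treat this as illustrative:

```sql
{# Sketch: fail at compile time on a bad timestamp_column, then (and only
   then) proceed to collect_table_metrics. Mirror data_freshness_sla's
   actual validation helpers where they exist. #}
{%- set columns = adapter.get_columns_in_relation(model_relation) -%}
{%- set matches = columns | selectattr("name", "equalto", timestamp_column) | list -%}
{%- if not matches -%}
    {{ exceptions.raise_compiler_error(
        "timestamp_column '" ~ timestamp_column ~ "' not found on " ~ model_relation) }}
{%- endif -%}
```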
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@macros/edr/tests/test_data_freshness_sla.sql`:
- Around line 262-267: The branch treating any timestamp on the target day as DATA_FRESH is too permissive. Change the CASE condition that currently reads `when mdt.max_timestamp >= sd.target_date_start_utc then 'DATA_FRESH'` to compare `mdt.max_timestamp` against the SLA deadline (e.g., `sd.sla_deadline_utc`, or `sd.target_date_start_utc + sd.sla_offset`) so that `DATA_FRESH` is returned only if `mdt.max_timestamp <= sd.sla_deadline_utc`. Alternatively, narrow the macro/docs to state that it only checks "today's data exists" if SLA enforcement is not intended.

In `@macros/edr/tests/test_volume_threshold.sql`:
- Around line 219-245: The code turns `prev_row_count = 0` into `percent_change = null` (in the percent_change CASE expression) and then maps null to severity 0 in `volume_result`, which suppresses 0 -> N spikes even though validation allows `min_row_count = 0`. Either disallow `min_row_count = 0` or treat zero baselines as infinite spikes: special-case `prev_row_count = 0` in the percent_change computation so that `prev_row_count = 0 and row_count > 0` yields a sentinel "infinite" value (e.g., a very large number) or a non-null marker, and update the `volume_result` CASE (which currently maps `previous_row_count is null or previous_row_count < {{ min_row_count }}` to severity 0) to treat `previous_row_count = 0` with `row_count > 0` as maximum severity instead. Alternatively, add input validation that rejects `min_row_count = 0` early.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 67fb8234-9a40-4e74-a427-81f09df9bdf9

📥 Commits

Reviewing files that changed from the base of the PR and between a2f6944 and 33d17f7.

📒 Files selected for processing (2)
  • macros/edr/tests/test_data_freshness_sla.sql
  • macros/edr/tests/test_volume_threshold.sql

Comment on lines +262 to +267
case
when mdt.max_timestamp is null
then 'NO_DATA'
{# Data from today or future (e.g. pre-loaded records) counts as fresh #}
when mdt.max_timestamp >= sd.target_date_start_utc
then 'DATA_FRESH'


⚠️ Potential issue | 🟠 Major

This still misses late same-day arrivals.

Line 266 treats any timestamp from the target day as DATA_FRESH, so a model that finishes at 09:00 against a 07:00 SLA will pass once the test runs after the build. That makes this a “today’s data exists” check, not a true “arrived by the SLA” check. If the intended contract is really SLA enforcement, this branch needs to be deadline-aware, or the macro/docs should be narrowed to that weaker guarantee.

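The deadline-aware variant of this branch might look like the following. This assumes a column such as `sd.sla_deadline_utc` is available in the same CTE and that `DATA_STALE` is the remaining status, which is a guess at the macro's naming rather than its actual code:

```sql
case
    when mdt.max_timestamp is null
        then 'NO_DATA'
    {# Fresh only if data existed by the SLA deadline, not merely on the target day #}
    when mdt.max_timestamp >= sd.target_date_start_utc
         and mdt.max_timestamp <= sd.sla_deadline_utc
        then 'DATA_FRESH'
    else 'DATA_STALE'
end as freshness_status
```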

Comment on lines +219 to +245
case
when prev_row_count is null or prev_row_count = 0
then null
else
round(
cast(
(row_count - prev_row_count)
* 100.0
/ prev_row_count as {{ elementary.edr_type_numeric() }}
),
2
)
end as percent_change
from bucket_with_prev
where rn = 1
),

volume_result as (
select
*,
case
when
previous_row_count is null
or previous_row_count < {{ min_row_count }}
then 0
when percent_change is null
then 0


⚠️ Potential issue | 🟡 Minor

min_row_count = 0 still suppresses zero-baseline spikes.

Line 220 turns prev_row_count = 0 into percent_change = null, and Line 244 maps that to severity 0. So a 0 -> N jump is always ignored even though the validation allows min_row_count = 0. Either reject 0 explicitly or special-case zero baselines as an infinite spike.


haritamar and others added 5 commits March 14, 2026 00:38
…wing

ClickHouse resolves column references against output aliases regardless
of SELECT clause order. The cast(sla_deadline_utc as string) with the
same alias name caused the is_failure comparison to use the string
version instead of the timestamp, producing DateTime vs String type
mismatch. Renamed to sla_deadline_utc_str internally and re-aliased
in the final SELECT.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DuckDB has an internal binder bug where LAG window functions over
UNION ALL sources confuse TIMESTAMP and FLOAT column types, causing
"Failed to bind column reference bucket_end: inequal types". Using
ROW_NUMBER + self-join achieves the same result without triggering
the bug, and is cross-database compatible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
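The ROW_NUMBER + self-join pattern described in this commit, as a generic sketch (the CTE and column names here are illustrative, not the macro's exact identifiers):

```sql
with bucket_numbered as (
    select
        bucket_end,
        row_count,
        -- explicit cast keeps bucket_num signed so bucket_num - 1 joins cleanly
        cast(row_number() over (order by bucket_end) as bigint) as bucket_num
    from all_metrics
)

select
    cur.bucket_end,
    cur.row_count,
    prev.row_count as prev_row_count  -- same value lag(row_count) would give
from bucket_numbered cur
left join bucket_numbered prev
    on prev.bucket_num = cur.bucket_num - 1
```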
…e test timezone

- sla_utils.sql: replace datetime.timezone.utc with pytz.utc in fusion path.
  dbt-fusion's modules.datetime does not expose datetime.timezone, causing
  dbt1501 "undefined value" errors in all data_freshness_sla tests on fusion.

- test_data_freshness_sla.py: change test_deadline_not_passed_does_not_fail
  from Etc/GMT-14 (UTC+14, deadline = 09:59 UTC) to plain UTC (deadline = 23:59 UTC).
  Etc/GMT-14 caused the test to fail whenever CI ran after 09:59 UTC.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…freshness_sla

volume_threshold (Fabric + ClickHouse):
- Cast row_number() to signed int in bucket_numbered CTE to fix ClickHouse
  NO_COMMON_TYPE error (UInt64 vs Int64) on JOIN with bucket_num - 1.
- Move max(bucket_num) into a separate max_bucket CTE and use INNER JOIN
  instead of a scalar subquery in the WHERE clause to fix SQL Server / Fabric
  "Invalid object name 'bucket_numbered'" error.

data_freshness_sla (Vertica):
- Replace timezone: "Etc/GMT-14" with timezone: "UTC" for tests that need a
  deadline in the past. Etc/GMT-14 behaved incorrectly in some pytz versions,
  causing Vertica tests to return 'pass' instead of 'fail'. 12:01am UTC
  (= 00:01 UTC) is always in the past when CI runs at 07:00+ UTC.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>

3 participants