A retail analytics platform ingests individual transaction records from 200+ store locations. Before the nightly report runs, the raw transactions need grouping by region, aggregation into sums and averages, and outlier detection. A single corrupted record should not invalidate the entire regional rollup.
[agg_load_data]
|
v
[agg_group_by_dimension]
|
v
[agg_compute_aggregates]
|
v
[agg_format_report]
|
v
[agg_emit_results]
Workflow inputs: records, groupBy, aggregateField
ComputeAggregatesWorker (task: agg_compute_aggregates)
Computes aggregate statistics (count, sum, avg, min, max) for each group on a specified numeric field.
- Reads
groups,aggregateField. Writesaggregates
EmitResultsWorker (task: agg_emit_results)
Emits the final aggregation summary.
- Reads
report,groupCount. Writessummary
FormatReportWorker (task: agg_format_report)
Formats aggregate results into human-readable report lines.
- Reads
aggregates. Writesreport
GroupByDimensionWorker (task: agg_group_by_dimension)
Groups records by a specified dimension field.
- Reads
records,groupBy. Writesgroups,groupCount
LoadDataWorker (task: agg_load_data)
Loads input records and passes them through with a count.
- Reads
records. Writesrecords,count
41 tests | Workflow: data_aggregation_wf | Timeout: 120s
See RUNNING.md for setup and usage.