Skip to content

[Fix](variance) Fix sample variance/stddev NaN res for single value#63605

Merged
zclllyybb merged 2 commits into
apache:masterfrom
linrrzqqq:fix-sample-single-val
May 30, 2026
Merged

[Fix](variance) Fix sample variance/stddev NaN res for single value#63605
zclllyybb merged 2 commits into
apache:masterfrom
linrrzqqq:fix-sample-single-val

Conversation

@linrrzqqq
Copy link
Copy Markdown
Collaborator

@linrrzqqq linrrzqqq commented May 25, 2026

Problem Summary:

Fix VAR_SAMP, VARIANCE_SAMP, and STDDEV_SAMP to return NaN when the number of valid input values is less than or equal to 1. Sample variance/stddev are undefined for n <= 1, so returning 0.0 is misleading.

before:

CREATE TABLE t (id INT, v DOUBLE) DUPLICATE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES('replication_num'='1');
INSERT INTO t VALUES (1, 5.0);  -- 单行

SELECT VAR_SAMP(v), STDDEV_SAMP(v) FROM t;
+-------------+----------------+
| VAR_SAMP(v) | STDDEV_SAMP(v) |
+-------------+----------------+
|           0 |              0 |
+-------------+----------------+

now:

SELECT VAR_SAMP(v), STDDEV_SAMP(v) FROM t;
+-------------+----------------+
| VAR_SAMP(v) | STDDEV_SAMP(v) |
+-------------+----------------+
|         NaN |            NaN |
+-------------+----------------+

doc: apache/doris-website#3765

@linrrzqqq
Copy link
Copy Markdown
Collaborator Author

/review

@linrrzqqq
Copy link
Copy Markdown
Collaborator Author

run buildall

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: no blocking issues found. Critical checkpoints: goal is clear and implemented: sample variance/stddev now return NULL for zero or one non-null input, including grouped results; tests cover existing nullable aggregate suites plus a single-row aggregate case. Scope is small and focused on the two sample aggregate functions plus expected output changes. Concurrency/lifecycle: no new shared state, threads, locks, or lifecycle-sensitive objects. Configuration/compatibility: no configs added; BE function compatibility restriction remains in place and aggregate state serialization is unchanged. Parallel paths: variance_samp, var_samp alias, and stddev_samp paths are covered; population variance/stddev behavior is intentionally unchanged. Error handling/memory: no ignored Status paths or new owned allocations. Data correctness: nullable inputs are skipped and result nullability now matches SQL sample aggregate semantics for count <= 1. Tests: regression expectations were updated; I did not run the regression suite in this review runner. User focus: no additional user-provided review focus.

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31648 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0d3684c8240240a8d33abfaea6a896b884d8301a, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17638	4105	4110	4105
q2	q3	10815	1419	824	824
q4	4686	471	354	354
q5	7642	2256	2110	2110
q6	248	180	140	140
q7	1011	760	658	658
q8	9365	1768	1612	1612
q9	6568	4998	4966	4966
q10	6471	2219	1954	1954
q11	458	274	241	241
q12	703	428	298	298
q13	18184	3344	2725	2725
q14	262	255	233	233
q15	q16	815	787	710	710
q17	971	990	929	929
q18	6917	5878	5759	5759
q19	1240	1281	1054	1054
q20	515	383	265	265
q21	5745	2613	2402	2402
q22	428	368	309	309
Total cold run time: 100682 ms
Total hot run time: 31648 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4513	4452	4450	4450
q2	q3	4581	5005	4329	4329
q4	2272	2324	1477	1477
q5	4512	4397	5014	4397
q6	262	216	156	156
q7	2096	1976	1681	1681
q8	2630	2325	2291	2291
q9	8268	8084	8160	8084
q10	4829	4970	4369	4369
q11	612	476	412	412
q12	820	792	585	585
q13	3280	3706	2969	2969
q14	289	323	274	274
q15	q16	765	777	676	676
q17	1421	1414	1394	1394
q18	8457	7551	6822	6822
q19	1086	1108	1105	1105
q20	2251	2237	1970	1970
q21	5438	4806	4649	4649
q22	553	457	420	420
Total cold run time: 58935 ms
Total hot run time: 52510 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 172613 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 0d3684c8240240a8d33abfaea6a896b884d8301a, data reload: false

query5	4315	671	516	516
query6	347	222	198	198
query7	4257	545	297	297
query8	327	230	219	219
query9	8836	4162	4162	4162
query10	453	358	306	306
query11	5816	2535	2264	2264
query12	181	133	127	127
query13	1337	610	444	444
query14	6301	5609	5345	5345
query14_1	4641	4643	4632	4632
query15	219	210	192	192
query16	1015	463	446	446
query17	1160	751	626	626
query18	2674	503	368	368
query19	219	206	168	168
query20	142	145	133	133
query21	216	142	121	121
query22	13665	13555	13395	13395
query23	17410	16562	16294	16294
query23_1	16397	16353	16417	16353
query24	7794	1788	1331	1331
query24_1	1308	1322	1363	1322
query25	595	495	446	446
query26	1312	313	177	177
query27	2707	598	361	361
query28	4447	2034	2027	2027
query29	1052	655	529	529
query30	310	243	202	202
query31	1143	1095	970	970
query32	87	75	81	75
query33	560	366	308	308
query34	1177	1186	683	683
query35	778	792	694	694
query36	1379	1420	1274	1274
query37	153	101	97	97
query38	3272	3195	3078	3078
query39	906	902	863	863
query39_1	870	859	872	859
query40	226	147	131	131
query41	66	64	61	61
query42	110	110	112	110
query43	328	331	299	299
query44	
query45	217	204	196	196
query46	1080	1229	769	769
query47	2404	2324	2234	2234
query48	409	418	305	305
query49	639	499	410	410
query50	1095	362	266	266
query51	4432	4372	4338	4338
query52	117	107	99	99
query53	259	304	205	205
query54	324	292	258	258
query55	95	98	87	87
query56	306	309	313	309
query57	1438	1382	1323	1323
query58	298	266	296	266
query59	1575	1675	1447	1447
query60	320	324	311	311
query61	157	155	153	153
query62	710	655	571	571
query63	239	205	206	205
query64	2375	781	630	630
query65	
query66	1694	489	357	357
query67	29440	29998	29830	29830
query68	
query69	469	354	311	311
query70	1040	1030	970	970
query71	302	283	274	274
query72	2987	2796	2451	2451
query73	834	742	460	460
query74	4861	4682	4559	4559
query75	2689	2599	2275	2275
query76	2294	1186	791	791
query77	416	438	342	342
query78	12489	12467	11833	11833
query79	1473	1009	734	734
query80	1361	547	450	450
query81	518	284	237	237
query82	1219	161	121	121
query83	368	300	255	255
query84	267	140	110	110
query85	947	551	450	450
query86	458	358	323	323
query87	3470	3422	3279	3279
query88	3668	2774	2751	2751
query89	451	387	343	343
query90	1896	187	185	185
query91	177	167	146	146
query92	82	78	74	74
query93	1529	1375	905	905
query94	724	339	313	313
query95	697	483	351	351
query96	1013	778	355	355
query97	2759	2708	2597	2597
query98	239	226	234	226
query99	1158	1165	1030	1030
Total cold run time: 255960 ms
Total hot run time: 172613 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (6/6) 🎉
Increment coverage report
Complete coverage report

@linrrzqqq
Copy link
Copy Markdown
Collaborator Author

run buildall

@linrrzqqq
Copy link
Copy Markdown
Collaborator Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review opinion: no blocking issues found in the PR diff.

Critical checkpoint conclusions:

  • Goal/test proof: The PR changes VAR_SAMP/VARIANCE_SAMP/STDDEV_SAMP to return NULL for zero or one valid input and updates aggregate/window/foreach regression outputs covering the changed behavior.
  • Scope: The actual GitHub PR diff is focused on sample variance/stddev return nullability plus expected results.
  • Concurrency/lifecycle: No new concurrency, locking, thread lifecycle, or static initialization concerns were introduced in the reviewed diff.
  • Configuration/compatibility: No new configuration items. BE already registers variance_samp/stddev_samp as restricted function compatibility paths; aggregate state serialization is unchanged.
  • Parallel paths: BE execution, Nereids FE nullability, aliases through existing registration, window outputs, and foreach outputs were considered. I did not find a missed parallel path in the PR diff.
  • Conditional checks: The count <= 1 result-null condition matches the sample variance/stddev definition and null inputs continue to be ignored.
  • Tests/results: Regression expected outputs were updated for normal aggregate, nullable aggregate, window, MV, and foreach cases. I did not run the tests locally in this review runner.
  • Observability: No new observability needed for this scalar aggregate behavior change.
  • Transaction/persistence/data writes: Not applicable.
  • FE-BE variable passing/protocol: No new transmitted variables or thrift/protocol changes.
  • Performance: The added nullable-column check is local to aggregate add and avoids wrapper overhead for these functions; no obvious performance regression found.

User focus: No additional user-provided review focus was present.

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31169 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 68e902d80bad105073ea4681bb069a7a192b3259, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17613	3957	3931	3931
q2	q3	10759	1367	802	802
q4	4692	475	348	348
q5	7593	2280	2117	2117
q6	250	178	137	137
q7	957	768	643	643
q8	9363	1791	1498	1498
q9	6027	5015	4916	4916
q10	6499	2229	1868	1868
q11	444	274	249	249
q12	689	428	296	296
q13	18257	3327	2800	2800
q14	269	258	240	240
q15	q16	831	782	712	712
q17	880	925	991	925
q18	6970	5670	5644	5644
q19	1242	1267	1095	1095
q20	539	406	261	261
q21	5660	2587	2374	2374
q22	436	362	313	313
Total cold run time: 99970 ms
Total hot run time: 31169 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4330	4261	4259	4259
q2	q3	4542	4966	4392	4392
q4	2105	2231	1426	1426
q5	4450	4333	4374	4333
q6	229	175	163	163
q7	2153	1846	1710	1710
q8	2515	2157	2117	2117
q9	8044	8041	8005	8005
q10	4791	4768	4369	4369
q11	570	597	389	389
q12	749	753	571	571
q13	3323	3574	3021	3021
q14	289	289	293	289
q15	q16	731	745	669	669
q17	1402	1355	1388	1355
q18	8003	7529	6876	6876
q19	1090	1064	1096	1064
q20	2232	2243	1959	1959
q21	5321	4646	4537	4537
q22	527	469	408	408
Total cold run time: 57396 ms
Total hot run time: 51912 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 171137 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 68e902d80bad105073ea4681bb069a7a192b3259, data reload: false

query5	4326	653	523	523
query6	335	225	211	211
query7	4216	540	316	316
query8	323	232	221	221
query9	8816	4128	4115	4115
query10	451	344	300	300
query11	5830	2567	2246	2246
query12	183	123	123	123
query13	1257	595	433	433
query14	6156	5500	5155	5155
query14_1	4504	4488	4472	4472
query15	212	204	186	186
query16	997	454	411	411
query17	1120	718	582	582
query18	2478	474	350	350
query19	221	212	164	164
query20	140	134	134	134
query21	214	140	122	122
query22	13691	13598	13453	13453
query23	17366	16639	16284	16284
query23_1	16293	16453	16307	16307
query24	7497	1792	1321	1321
query24_1	1353	1359	1339	1339
query25	572	497	454	454
query26	1335	342	177	177
query27	2692	552	353	353
query28	4447	2013	2034	2013
query29	1003	657	520	520
query30	307	249	200	200
query31	1141	1091	955	955
query32	86	79	74	74
query33	563	355	312	312
query34	1201	1143	663	663
query35	784	811	690	690
query36	1461	1442	1255	1255
query37	152	105	93	93
query38	3263	3218	3098	3098
query39	912	897	870	870
query39_1	882	858	863	858
query40	233	152	130	130
query41	71	68	67	67
query42	112	111	116	111
query43	331	342	293	293
query44	
query45	220	201	199	199
query46	1084	1177	754	754
query47	2369	2454	2300	2300
query48	421	402	306	306
query49	655	509	399	399
query50	984	375	261	261
query51	4369	4347	4311	4311
query52	111	108	96	96
query53	257	289	206	206
query54	334	308	270	270
query55	97	93	98	93
query56	358	335	327	327
query57	1457	1469	1367	1367
query58	329	284	283	283
query59	1723	1755	1466	1466
query60	362	325	318	318
query61	180	151	145	145
query62	707	646	584	584
query63	245	198	199	198
query64	2423	809	647	647
query65	
query66	1717	484	371	371
query67	29061	29693	28918	28918
query68	
query69	459	360	308	308
query70	1074	951	998	951
query71	307	269	267	267
query72	2996	2659	2030	2030
query73	839	756	438	438
query74	4868	4685	4505	4505
query75	2704	2618	2269	2269
query76	2273	1183	787	787
query77	406	412	321	321
query78	12406	12541	11831	11831
query79	1458	1045	757	757
query80	1300	539	463	463
query81	518	279	240	240
query82	1160	157	127	127
query83	319	282	245	245
query84	266	141	110	110
query85	923	534	469	469
query86	472	353	328	328
query87	3458	3386	3279	3279
query88	3648	2718	2724	2718
query89	457	385	350	350
query90	1897	178	191	178
query91	178	167	139	139
query92	80	77	73	73
query93	1512	1482	876	876
query94	736	330	299	299
query95	676	472	344	344
query96	988	785	363	363
query97	2759	2757	2644	2644
query98	238	229	228	228
query99	1184	1164	1044	1044
Total cold run time: 254488 ms
Total hot run time: 171137 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/19) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.85% (20891/38797)
Line Coverage 37.43% (198010/528947)
Region Coverage 33.73% (155124/459922)
Branch Coverage 34.71% (67507/194480)

@linrrzqqq linrrzqqq force-pushed the fix-sample-single-val branch from 68e902d to dd10b05 Compare May 26, 2026 07:46
@linrrzqqq
Copy link
Copy Markdown
Collaborator Author

run buildall

@linrrzqqq
Copy link
Copy Markdown
Collaborator Author

/review

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31832 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit dd10b052b044cb11030c835fe59b28e3674e7e70, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17869	4133	4196	4133
q2	q3	10755	1392	831	831
q4	4686	479	345	345
q5	7571	2347	2091	2091
q6	235	177	135	135
q7	942	801	656	656
q8	9370	1688	1562	1562
q9	5218	4991	5031	4991
q10	6385	2226	1878	1878
q11	441	274	245	245
q12	631	423	299	299
q13	18163	3374	2813	2813
q14	269	264	242	242
q15	q16	821	777	712	712
q17	985	855	910	855
q18	6940	5825	5585	5585
q19	1353	1308	1176	1176
q20	589	432	298	298
q21	6375	2982	2667	2667
q22	451	367	318	318
Total cold run time: 100049 ms
Total hot run time: 31832 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4836	4756	4728	4728
q2	q3	4999	5360	4631	4631
q4	2160	2259	1458	1458
q5	5015	4841	4707	4707
q6	233	173	137	137
q7	1968	1761	1563	1563
q8	2475	2183	2110	2110
q9	7728	7469	7481	7469
q10	4742	4682	4248	4248
q11	543	380	358	358
q12	740	750	544	544
q13	2973	3407	2802	2802
q14	294	284	254	254
q15	q16	680	695	618	618
q17	1299	1280	1278	1278
q18	7338	6990	6951	6951
q19	1177	1106	1121	1106
q20	2208	2229	1975	1975
q21	5337	4623	4479	4479
q22	548	487	407	407
Total cold run time: 57293 ms
Total hot run time: 51823 ms

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found a blocking correctness issue. The PR goal is to return NULL for VAR_SAMP/VARIANCE_SAMP/STDDEV_SAMP when the valid input count is <= 1, but the BE implementation and expected files now return NaN for the one-valid-value case.

Critical checkpoint conclusions:

  • Goal/test: The PR adds regression coverage for the one-row case, but the expected result verifies NaN rather than the stated NULL behavior, so it does not prove the intended fix.
  • Scope: The code change is small, but it works around result nullability by emitting a floating NaN instead of implementing nullable results for this semantic case.
  • Concurrency/lifecycle/config/compatibility: No new concurrency, lifecycle, config, storage format, or FE-BE protocol concerns found in the touched logic.
  • Parallel paths: Aggregate and window expected outputs are updated consistently, but they consistently encode the wrong value for single valid rows.
  • Tests/results: Existing result files were regenerated around this behavior, but the key new expectation is incorrect for SQL NULL semantics.
  • Observability/transactions/data writes: Not applicable.
  • Performance: No material performance concern found.

User focus: No additional user-provided review focus was supplied.

Comment thread be/src/exprs/aggregate/aggregate_function_stddev.h
@linrrzqqq linrrzqqq changed the title [Fix](variance) Fix sample variance/stddev null res for single value [Fix](variance) Fix sample variance/stddev NaN res for single value May 26, 2026
@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 172222 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit dd10b052b044cb11030c835fe59b28e3674e7e70, data reload: false

query5	4321	670	516	516
query6	348	228	208	208
query7	4252	561	314	314
query8	328	236	220	220
query9	8820	4108	4101	4101
query10	453	376	305	305
query11	5828	2499	2255	2255
query12	192	129	125	125
query13	1313	581	444	444
query14	6224	5549	5253	5253
query14_1	4554	4562	4534	4534
query15	217	210	200	200
query16	1004	469	462	462
query17	1158	772	626	626
query18	2602	526	375	375
query19	241	215	178	178
query20	149	134	137	134
query21	224	143	127	127
query22	13585	13626	13409	13409
query23	17590	16672	16247	16247
query23_1	16510	16397	16425	16397
query24	7474	1823	1321	1321
query24_1	1299	1318	1339	1318
query25	590	516	441	441
query26	1316	336	184	184
query27	2661	583	342	342
query28	4441	1988	2009	1988
query29	1035	653	533	533
query30	314	246	201	201
query31	1139	1087	963	963
query32	91	80	79	79
query33	559	391	308	308
query34	1216	1145	672	672
query35	812	824	701	701
query36	1367	1415	1246	1246
query37	148	106	90	90
query38	3244	3184	3091	3091
query39	938	937	919	919
query39_1	875	867	879	867
query40	230	147	132	132
query41	67	63	62	62
query42	111	108	109	108
query43	329	341	310	310
query44	
query45	218	206	200	200
query46	1117	1199	755	755
query47	2417	2342	2190	2190
query48	412	422	303	303
query49	653	489	405	405
query50	1055	351	258	258
query51	4387	4353	4262	4262
query52	105	104	96	96
query53	257	287	203	203
query54	318	275	254	254
query55	96	91	86	86
query56	298	322	300	300
query57	1452	1429	1343	1343
query58	302	277	280	277
query59	1627	1742	1503	1503
query60	322	328	314	314
query61	165	157	151	151
query62	701	666	584	584
query63	240	202	205	202
query64	2414	819	653	653
query65	
query66	1726	483	354	354
query67	29794	29778	28930	28930
query68	
query69	455	336	305	305
query70	1013	1054	975	975
query71	317	275	270	270
query72	3024	2712	2715	2712
query73	834	802	432	432
query74	5146	4956	4852	4852
query75	2714	2612	2300	2300
query76	2275	1158	748	748
query77	409	417	349	349
query78	12582	12704	11942	11942
query79	1455	1010	738	738
query80	660	541	457	457
query81	465	278	248	248
query82	1392	160	123	123
query83	355	279	253	253
query84	265	147	110	110
query85	891	545	458	458
query86	397	343	317	317
query87	3456	3415	3252	3252
query88	3629	2760	2742	2742
query89	447	397	343	343
query90	1989	179	181	179
query91	178	172	168	168
query92	77	78	77	77
query93	1499	1466	850	850
query94	535	372	304	304
query95	678	500	348	348
query96	979	776	324	324
query97	2733	2776	2621	2621
query98	239	231	227	227
query99	1180	1156	1034	1034
Total cold run time: 255560 ms
Total hot run time: 172222 ms

@linrrzqqq linrrzqqq force-pushed the fix-sample-single-val branch from dd10b05 to ff22897 Compare May 29, 2026 03:40
@linrrzqqq
Copy link
Copy Markdown
Collaborator Author

run buildall

@linrrzqqq
Copy link
Copy Markdown
Collaborator Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed for PR 63605.

Findings: I did not identify any new inline issue that is distinct from the existing review thread on be/src/exprs/aggregate/aggregate_function_stddev.h about returning floating-point NaN for VAR_SAMP/VARIANCE_SAMP/STDDEV_SAMP when the valid-input count is <= 1. I did not duplicate that thread.

Critical checkpoint conclusions:

  • Goal and tests: The PR consistently changes BE sample variance/stddev output and updates regression outputs/adds a single-value aggregate case. Existing review context still covers the main semantic question of NaN vs SQL NULL.
  • Scope: The source change is small and focused; many expected-file changes appear to be generated output updates.
  • Concurrency/lifecycle/configuration: No new concurrency, lifecycle, static initialization, or configuration concerns found.
  • Compatibility: No storage-format or FE/BE protocol compatibility changes found. Function registration and return type remain Float64/Double.
  • Parallel paths: The aliases and window/regression output paths affected by the BE implementation were considered; no additional distinct missed implementation path found.
  • Error handling/memory: No ignored Status, exception-boundary, or memory-tracking issue found in this change.
  • Data correctness: The only substantive correctness concern observed is the already-known NaN vs SQL NULL semantics thread; no separate data visibility, transaction, or persistence concern applies.
  • Test coverage: Regression outputs cover aggregate/window/foreach-style cases, including the added single-row aggregate case. I did not run tests in this review.
  • Observability/performance: No new observability need or meaningful performance regression found.

User focus: No additional user-provided review focus was present in .opencode-review.hAcyGz/review_focus.txt.

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31135 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ff22897b54cb2f14783f83fb2c6a3d8ef3e3977c, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17660	3899	3886	3886
q2	q3	10808	1386	832	832
q4	4690	477	345	345
q5	7752	2293	2090	2090
q6	391	174	133	133
q7	961	775	652	652
q8	9705	1748	1607	1607
q9	7174	4968	4957	4957
q10	6439	2247	1898	1898
q11	433	268	246	246
q12	695	427	305	305
q13	18201	3824	2820	2820
q14	268	260	237	237
q15	q16	822	788	708	708
q17	879	880	868	868
q18	6962	5686	5554	5554
q19	1176	1261	1065	1065
q20	509	386	263	263
q21	5738	2577	2369	2369
q22	429	362	300	300
Total cold run time: 101692 ms
Total hot run time: 31135 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4264	4200	4200	4200
q2	q3	4530	4921	4348	4348
q4	2057	2192	1371	1371
q5	4397	4251	4763	4251
q6	252	192	143	143
q7	1962	1848	1571	1571
q8	2469	2086	2171	2086
q9	7974	7982	8051	7982
q10	4829	4755	4408	4408
q11	564	416	390	390
q12	737	755	536	536
q13	3209	3611	2935	2935
q14	287	307	267	267
q15	q16	706	752	659	659
q17	1368	1328	1329	1328
q18	8122	7381	7070	7070
q19	1119	1106	1088	1088
q20	2227	2225	1945	1945
q21	5216	4536	4458	4458
q22	511	463	422	422
Total cold run time: 56800 ms
Total hot run time: 51458 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 172074 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ff22897b54cb2f14783f83fb2c6a3d8ef3e3977c, data reload: false

query5	4295	647	516	516
query6	336	220	215	215
query7	4237	589	312	312
query8	330	229	218	218
query9	8833	3985	3957	3957
query10	450	355	294	294
query11	5732	2459	2243	2243
query12	179	130	132	130
query13	1296	606	443	443
query14	6133	5460	5149	5149
query14_1	4444	4456	4446	4446
query15	220	210	200	200
query16	1019	457	455	455
query17	1158	737	618	618
query18	2752	503	367	367
query19	229	199	179	179
query20	136	132	129	129
query21	222	139	117	117
query22	13615	13552	13355	13355
query23	17287	16500	16294	16294
query23_1	16404	16302	16270	16270
query24	7449	1770	1284	1284
query24_1	1339	1335	1314	1314
query25	568	501	427	427
query26	1304	317	182	182
query27	2682	570	353	353
query28	4413	2007	2007	2007
query29	987	638	515	515
query30	302	242	194	194
query31	1131	1092	978	978
query32	98	80	76	76
query33	557	368	307	307
query34	1211	1163	659	659
query35	790	805	766	766
query36	1416	1385	1275	1275
query37	148	107	86	86
query38	3224	3155	3071	3071
query39	940	912	903	903
query39_1	872	886	878	878
query40	239	141	124	124
query41	65	61	63	61
query42	110	108	107	107
query43	326	331	287	287
query44	
query45	212	204	202	202
query46	1095	1217	734	734
query47	2415	2394	2321	2321
query48	393	428	315	315
query49	623	495	376	376
query50	1012	350	250	250
query51	4339	4371	4358	4358
query52	106	105	93	93
query53	256	273	205	205
query54	328	274	275	274
query55	94	90	84	84
query56	329	315	306	306
query57	1436	1428	1367	1367
query58	292	270	269	269
query59	1579	1689	1423	1423
query60	322	320	320	320
query61	159	151	160	151
query62	702	648	574	574
query63	237	195	205	195
query64	2360	815	621	621
query65	
query66	1661	477	351	351
query67	29886	29539	29511	29511
query68	
query69	458	341	311	311
query70	1101	1006	923	923
query71	295	273	299	273
query72	2934	2722	2414	2414
query73	843	766	437	437
query74	5105	4990	4759	4759
query75	2710	2614	2269	2269
query76	2299	1166	751	751
query77	404	411	330	330
query78	12475	12503	11857	11857
query79	1506	1073	772	772
query80	1336	565	454	454
query81	520	287	241	241
query82	997	158	125	125
query83	369	279	250	250
query84	262	145	111	111
query85	921	539	451	451
query86	445	326	320	320
query87	3428	3395	3248	3248
query88	3596	2763	2720	2720
query89	446	394	351	351
query90	1922	187	178	178
query91	179	173	144	144
query92	81	75	71	71
query93	1526	1546	885	885
query94	726	349	312	312
query95	686	391	356	356
query96	1111	848	346	346
query97	2747	2737	2651	2651
query98	239	234	228	228
query99	1174	1155	1033	1033
Total cold run time: 255193 ms
Total hot run time: 172074 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (5/5) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.74% (28087/38089)
Line Coverage 57.71% (305213/528913)
Region Coverage 54.87% (255504/465677)
Branch Coverage 56.37% (110374/195808)

auto& col = assert_cast<ColumnFloat64&>(to);
if (this->count == 1 || this->count == 0) {
col.insert_default();
col.get_data().push_back(std::nan(""));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我记得好像这个推荐用 std::numeric_limits::quiet_NaN() ?

zclllyybb
zclllyybb previously approved these changes May 29, 2026
@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label May 29, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions Bot removed the approved Indicates a PR has been approved by one committer. label May 29, 2026
@linrrzqqq
Copy link
Copy Markdown
Collaborator Author

run buildall

@linrrzqqq
Copy link
Copy Markdown
Collaborator Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary
No additional distinct inline issues found in this review. I intentionally did not re-submit the existing thread about VAR_SAMP/VARIANCE_SAMP/STDDEV_SAMP returning NaN versus SQL NULL, because it is already present in the review context.

Critical checkpoint conclusions:

  • Goal/test: The code change implements the PR-stated goal of returning NaN for sample variance/stddev when the valid input count is <= 1, and the changed regression outputs plus the added single-row aggregate query cover that behavior.
  • Scope: The code modification is small and focused in aggregate_function_stddev.h; most other changes are expected-result updates.
  • Concurrency/lifecycle: No new concurrency, locking, static initialization, or non-obvious lifecycle management was introduced.
  • Config/compatibility: No new config items or storage/protocol format changes were introduced.
  • Parallel paths: The changed behavior is applied to variance_samp/var_samp and stddev_samp; population variants remain unchanged.
  • Conditional checks: The new count == 1 || count == 0 behavior is direct and matches the PR description.
  • Test coverage: Existing aggregate, window, foreach, MV, nullable, and Nereids output cases were updated; a direct single-row aggregate query was added.
  • Observability/transactions/data writes: Not applicable for this aggregate result change.
  • FE/BE variable passing: Not applicable.
  • Performance: No material performance concern found; this only changes final result insertion for sample aggregate states.
  • User focus: The focus file says there are no additional user-provided focus points.

Residual risk: The already-known semantic concern about using non-null NaN instead of SQL NULL remains a product/compatibility decision and was not duplicated as a new comment.

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31971 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit afac2f0885cb6e5d067c1fed6f3fa4ffeeaaa33e, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	18065	4205	4019	4019
q2	q3	10759	1440	795	795
q4	4689	474	347	347
q5	7528	2340	2124	2124
q6	239	180	137	137
q7	920	787	638	638
q8	9353	1825	1631	1631
q9	5195	4947	4996	4947
q10	6383	2208	1873	1873
q11	443	275	247	247
q12	637	430	297	297
q13	18100	3389	2799	2799
q14	268	260	241	241
q15	q16	827	766	718	718
q17	995	950	930	930
q18	6830	5917	5677	5677
q19	1339	1289	1237	1237
q20	588	449	300	300
q21	6507	2938	2694	2694
q22	593	384	320	320
Total cold run time: 100258 ms
Total hot run time: 31971 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4861	4820	4756	4756
q2	q3	5020	5293	4675	4675
q4	2133	2222	1408	1408
q5	5105	4688	4736	4688
q6	231	184	136	136
q7	1889	1767	1546	1546
q8	2422	2132	2104	2104
q9	7742	7413	7378	7378
q10	4735	4700	4237	4237
q11	547	385	350	350
q12	739	745	531	531
q13	3031	3328	2823	2823
q14	279	277	251	251
q15	q16	677	698	607	607
q17	1288	1257	1255	1255
q18	7075	6789	6812	6789
q19	1124	1090	1103	1090
q20	2228	2211	1930	1930
q21	5331	4595	4448	4448
q22	535	449	415	415
Total cold run time: 56992 ms
Total hot run time: 51417 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 172055 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit afac2f0885cb6e5d067c1fed6f3fa4ffeeaaa33e, data reload: false

query5	4299	663	526	526
query6	346	235	203	203
query7	4256	578	322	322
query8	329	238	233	233
query9	8775	4035	4000	4000
query10	438	340	297	297
query11	5784	2446	2262	2262
query12	184	127	127	127
query13	1292	615	448	448
query14	6092	5463	5155	5155
query14_1	4486	4489	4460	4460
query15	215	210	186	186
query16	1009	466	493	466
query17	1162	781	620	620
query18	2568	504	376	376
query19	218	217	171	171
query20	143	138	135	135
query21	224	141	123	123
query22	13653	13601	13362	13362
query23	17368	16558	16315	16315
query23_1	16485	16270	16310	16270
query24	7661	1772	1371	1371
query24_1	1337	1332	1344	1332
query25	586	515	454	454
query26	1327	337	175	175
query27	2669	559	359	359
query28	4433	2034	2022	2022
query29	1067	659	521	521
query30	304	233	205	205
query31	1129	1086	967	967
query32	100	80	78	78
query33	556	373	338	338
query34	1192	1156	635	635
query35	751	792	691	691
query36	1373	1397	1295	1295
query37	160	108	90	90
query38	3212	3187	3101	3101
query39	920	918	897	897
query39_1	875	870	876	870
query40	245	148	121	121
query41	71	62	62	62
query42	109	108	106	106
query43	335	330	289	289
query44	
query45	221	215	198	198
query46	1088	1184	753	753
query47	2395	2372	2298	2298
query48	412	405	298	298
query49	631	504	421	421
query50	1011	354	264	264
query51	4376	4319	4182	4182
query52	108	105	94	94
query53	264	280	208	208
query54	309	272	256	256
query55	93	91	87	87
query56	302	297	298	297
query57	1449	1423	1339	1339
query58	294	266	253	253
query59	1610	1646	1447	1447
query60	327	323	304	304
query61	163	160	153	153
query62	714	666	591	591
query63	239	193	202	193
query64	2423	812	655	655
query65	
query66	1687	485	354	354
query67	30244	29774	29555	29555
query68	
query69	460	336	308	308
query70	1019	1004	964	964
query71	313	269	263	263
query72	2984	2699	2431	2431
query73	872	785	449	449
query74	5102	4957	4773	4773
query75	2685	2615	2283	2283
query76	2303	1150	787	787
query77	411	400	334	334
query78	12448	12480	11886	11886
query79	1548	1041	782	782
query80	846	521	437	437
query81	483	280	243	243
query82	1845	155	128	128
query83	353	270	249	249
query84	263	140	109	109
query85	911	565	443	443
query86	442	356	310	310
query87	3429	3395	3238	3238
query88	3628	2731	2744	2731
query89	466	393	336	336
query90	1808	178	177	177
query91	177	168	142	142
query92	81	87	79	79
query93	1524	1411	857	857
query94	623	327	309	309
query95	664	399	346	346
query96	1026	808	355	355
query97	2741	2741	2600	2600
query98	235	230	236	230
query99	1158	1165	1008	1008
Total cold run time: 256086 ms
Total hot run time: 172055 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/6) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.99% (20997/38891)
Line Coverage 37.54% (199052/530261)
Region Coverage 33.83% (156047/461257)
Branch Coverage 34.81% (67913/195078)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (6/6) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.90% (28148/38089)
Line Coverage 57.84% (305926/528913)
Region Coverage 55.04% (256316/465677)
Branch Coverage 56.51% (110643/195808)

@zclllyybb zclllyybb merged commit d7c033f into apache:master May 30, 2026
32 of 33 checks passed
@linrrzqqq linrrzqqq deleted the fix-sample-single-val branch May 30, 2026 15:54
linrrzqqq added a commit to linrrzqqq/doris that referenced this pull request Jun 1, 2026
…pache#63605)

Problem Summary:

Fix `VAR_SAMP`, `VARIANCE_SAMP`, and `STDDEV_SAMP` to return `NaN` when
the number of valid input values is less than or equal to 1. Sample
variance/stddev are undefined for `n <= 1`, so returning `0.0` is
misleading.

before:
```sql
CREATE TABLE t (id INT, v DOUBLE) DUPLICATE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES('replication_num'='1');
INSERT INTO t VALUES (1, 5.0);  -- 单行

SELECT VAR_SAMP(v), STDDEV_SAMP(v) FROM t;
+-------------+----------------+
| VAR_SAMP(v) | STDDEV_SAMP(v) |
+-------------+----------------+
|           0 |              0 |
+-------------+----------------+
```

now:
```sql
SELECT VAR_SAMP(v), STDDEV_SAMP(v) FROM t;
+-------------+----------------+
| VAR_SAMP(v) | STDDEV_SAMP(v) |
+-------------+----------------+
|         NaN |            NaN |
+-------------+----------------+
```

doc: apache/doris-website#3765
linrrzqqq added a commit to linrrzqqq/doris that referenced this pull request Jun 1, 2026
…pache#63605)

Problem Summary:

Fix `VAR_SAMP`, `VARIANCE_SAMP`, and `STDDEV_SAMP` to return `NaN` when
the number of valid input values is less than or equal to 1. Sample
variance/stddev are undefined for `n <= 1`, so returning `0.0` is
misleading.

before:
```sql
CREATE TABLE t (id INT, v DOUBLE) DUPLICATE KEY(id) DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES('replication_num'='1');
INSERT INTO t VALUES (1, 5.0);  -- 单行

SELECT VAR_SAMP(v), STDDEV_SAMP(v) FROM t;
+-------------+----------------+
| VAR_SAMP(v) | STDDEV_SAMP(v) |
+-------------+----------------+
|           0 |              0 |
+-------------+----------------+
```

now:
```sql
SELECT VAR_SAMP(v), STDDEV_SAMP(v) FROM t;
+-------------+----------------+
| VAR_SAMP(v) | STDDEV_SAMP(v) |
+-------------+----------------+
|         NaN |            NaN |
+-------------+----------------+
```

doc: apache/doris-website#3765
yiguolei pushed a commit that referenced this pull request Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants