feat(sort): add locale-aware numeric sorting support(sort-h-thousands-sep.sh) #9848

mattsu2020 · 2025-12-25T23:25:15Z

Implement NumericLocaleSettings to handle thousands separators and decimal points based on locale. Update tokenization logic to accommodate blank thousands separators for numeric and human-numeric modes, ensuring proper parsing of numbers with locale-specific formatting. This enhances compatibility with international number representations.
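To make the description concrete, here is a minimal sketch of the idea, not the PR's actual code. Only the name `NumericLocaleSettings` comes from the description above; its fields and the helper `parse_localized` are assumptions made for illustration. A plain blank stands in for the grouping separator, whereas real locale data may use a no-break space.

```rust
/// Hypothetical stand-in for the PR's `NumericLocaleSettings`.
/// The field names are assumptions, not the actual definition.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct NumericLocaleSettings {
    /// Grouping separator, or `None` when the locale has none (as in C/POSIX).
    pub thousands_sep: Option<char>,
    /// Decimal point character of the locale.
    pub decimal_pt: char,
}

/// Parse the leading number of `field`, skipping the locale's grouping
/// separator inside the integer part. Returns `None` if no digit is found.
/// (Hypothetical helper, simplified: no exponent, no `+` sign.)
pub fn parse_localized(field: &str, loc: &NumericLocaleSettings) -> Option<f64> {
    let mut normalized = String::new();
    let mut seen_digit = false;
    let mut seen_decimal = false;

    let mut rest = field.trim_start();
    if let Some(stripped) = rest.strip_prefix('-') {
        normalized.push('-');
        rest = stripped;
    }

    for c in rest.chars() {
        if c.is_ascii_digit() {
            seen_digit = true;
            normalized.push(c);
        } else if Some(c) == loc.thousands_sep && seen_digit && !seen_decimal {
            // Grouping separator inside the integer part: skip it.
            continue;
        } else if c == loc.decimal_pt && !seen_decimal {
            // Normalize the locale's decimal point to '.' for `f64` parsing.
            seen_decimal = true;
            normalized.push('.');
        } else {
            // Anything else ends the number.
            break;
        }
    }

    if !seen_digit {
        return None;
    }
    normalized.parse().ok()
}
```

With a blank separator and a comma as the decimal point, `"1 234,5"` parses as `1234.5`; with no separator configured, `"1,234"` stops at the comma and parses as `1`.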

codspeed-hq · 2025-12-25T23:47:25Z

CodSpeed Performance Report

Merging this PR will not alter performance

Comparing mattsu2020:sort_sort-h-thousands-sep.sh (6cde13e) with main (8bb31ee)

Summary

✅ 142 untouched benchmarks
⏩ 180 skipped benchmarks¹

¹ 180 benchmarks were skipped, so the baseline results were used instead.

github-actions · 2025-12-25T23:48:09Z

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!
Note: The gnu test tests/tail/tail-n0f is now being skipped but was previously passing.

github-actions · 2025-12-26T05:39:30Z

GNU testsuite comparison:

GNU test failed: tests/sort/sort-stale-thread-mem. tests/sort/sort-stale-thread-mem is passing on 'main'. Maybe you have to rebase?
Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

github-actions · 2025-12-29T02:20:55Z

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

sylvestre · 2026-01-13T21:33:42Z

it needs tests

src/uu/sort/src/sort.rs

github-actions · 2026-01-13T23:44:32Z

GNU testsuite comparison:

Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

github-actions · 2026-01-15T11:30:01Z

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

github-actions · 2026-01-17T12:23:36Z

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

github-actions · 2026-01-17T13:02:16Z

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

github-actions · 2026-01-21T03:28:08Z

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

github-actions · 2026-01-21T11:49:19Z

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

ChrisDryden · 2026-01-21T21:00:56Z

Hey Matt! In #10339 (comment) we had a discussion about the two different PRs for the thousands separator. Do you think it would be possible to rebase your PR onto the latest main to fix the issues related to human-readable suffixes and the tokenizer?

github-actions · 2026-01-21T23:56:01Z

GNU testsuite comparison:

GNU test failed: tests/sort/sort-debug-keys. tests/sort/sort-debug-keys is passing on 'main'. Maybe you have to rebase?
Skipping an intermittent issue tests/shuf/shuf-reservoir (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/sort/sort-stale-thread-mem (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/env/env-signal-handler is now passing!

github-actions · 2026-01-22T00:21:46Z

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

ChrisDryden · 2026-01-22T03:18:47Z

Any idea how we broke sort-debug-keys and sort-debug-warn in the last merge?

I think it's because the default grouping for C is an empty string instead of what's provided by the ICU library.

ChrisDryden · 2026-01-22T03:25:12Z

src/uu/sort/src/sort.rs

Can we add

!matches.contains_id(options::KEY) &&

to fix the issue with sort-debug-keys?

I checked out this PR locally, and I think this is the only thing stopping the sort-debug-warn GNU test from passing. Mind adding it to this PR too?

OK, I'll add it.

ChrisDryden · 2026-01-22T03:26:21Z

In the i18n helper function for GROUPING_SEP, can we override the default grouping separator for the C locale? I was thinking of something like this:

GROUPING_SEP.get_or_init(|| {
    let loc = get_numeric_locale().0.clone();
    // C/POSIX locale (represented as "und") has no grouping separator
    if loc == locale!("und") {
        String::new()
    } else {
        get_grouping_separator(loc)
    }
})
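The suggestion above depends on helpers from the project's i18n code, so it does not run on its own. The following self-contained sketch shows the same pattern with the standard library's `OnceLock`. The function `stub_lookup`, the locale tags, and the separators it returns are placeholders invented for illustration; they stand in for the real locale-data query and are not the project's API.

```rust
use std::sync::OnceLock;

/// Cached grouping separator, initialized on first use.
static GROUPING_SEP: OnceLock<String> = OnceLock::new();

/// Placeholder for a real locale-data query (assumption: the real lookup
/// returns the separator as a `String`).
fn stub_lookup(tag: &str) -> String {
    match tag {
        "sv-SE" => "\u{a0}".to_string(), // no-break space, as an example value
        _ => ",".to_string(),
    }
}

/// Decide the separator for a locale tag. The C/POSIX locale, represented
/// here as "und", gets an empty separator instead of whatever the lookup
/// would return.
fn grouping_separator_for(tag: &str, lookup: impl Fn(&str) -> String) -> String {
    if tag == "und" {
        String::new()
    } else {
        lookup(tag)
    }
}

/// Return the cached separator. The closure passed to `get_or_init` runs
/// at most once, so `tag` only matters on the first call.
pub fn grouping_sep(tag: &str) -> &'static str {
    GROUPING_SEP.get_or_init(|| grouping_separator_for(tag, stub_lookup))
}
```

Keeping the decision in a separate pure function (`grouping_separator_for`) makes the C-locale special case testable without touching the process-wide cache.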

github-actions · 2026-01-22T03:56:37Z

GNU testsuite comparison:

Skip an intermittent issue tests/tty/tty-eof (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/sort/sort-debug-keys is no longer failing!
Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

github-actions · 2026-01-22T05:04:19Z

GNU testsuite comparison:

GNU test failed: tests/tail/tail-n0f. tests/tail/tail-n0f is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/shuf/shuf-reservoir (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/sort/sort-stale-thread-mem (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/sort/sort-debug-keys is no longer failing!
Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

github-actions · 2026-01-22T06:09:32Z

GNU testsuite comparison:

Skip an intermittent issue tests/tty/tty-eof (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/sort/sort-debug-keys is no longer failing!
Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

sylvestre · 2026-01-22T12:43:07Z

Maybe we should do it in a separate PR?

github-actions · 2026-01-22T23:42:13Z

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-debug-keys is no longer failing!
Congrats! The gnu test tests/sort/sort-debug-warn is no longer failing!
Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!
Congrats! The gnu test tests/printf/printf-surprise is now passing!

github-actions · 2026-01-24T04:55:14Z

GNU testsuite comparison:

Congrats! The gnu test tests/sort/sort-debug-keys is no longer failing!
Congrats! The gnu test tests/sort/sort-debug-warn is no longer failing!
Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

Implement NumericLocaleSettings to handle thousands separators and decimal points based on locale. Update tokenization logic to accommodate blank thousands separators for numeric and human-numeric modes, improving parsing of locale-specific numbers. Also refactor numeric locale detection for safety/readability and clean up related initialization/spell-checker ignore.

…in sv_SE locale

Add a new test function `test_human_numeric_blank_thousands_sep_locale` to verify that the sort utility correctly handles human-readable numeric sorting when the locale's thousands separator is a blank space (e.g., in sv_SE.UTF-8 or sv_SE). This ensures proper behavior of the `-h` flag with key-based sorting in such locales, preventing potential sorting errors with space-separated numeric strings.
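As an illustration of what such a test exercises, here is a simplified sketch of human-numeric ordering with a blank grouping separator. It is not the project's implementation: `human_key` and `sort_human` are hypothetical names, and signs, decimals, and suffixes beyond K/M/G/T are left out. The ordering rule used, suffix magnitude first and numeric value second, follows the documented behavior of GNU `sort -h`.

```rust
use std::cmp::Ordering;

/// Split a field into (suffix rank, value), skipping `sep` only when it
/// sits between two digits. Hypothetical helper: unsigned integers only.
pub fn human_key(field: &str, sep: char) -> Option<(u8, f64)> {
    let chars: Vec<char> = field.trim_start().chars().collect();
    let mut digits = String::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        let next_is_digit = chars.get(i + 1).is_some_and(|n| n.is_ascii_digit());
        if c.is_ascii_digit() {
            digits.push(c);
        } else if c == sep && !digits.is_empty() && next_is_digit {
            // Grouping separator between digits: skip it.
        } else {
            break;
        }
        i += 1;
    }
    let value: f64 = digits.parse().ok()?;
    // Rank the suffix that directly follows the digits, if any.
    let rank = match chars.get(i).copied() {
        Some('K' | 'k') => 1,
        Some('M') => 2,
        Some('G') => 3,
        Some('T') => 4,
        _ => 0,
    };
    Some((rank, value))
}

/// Sort lines by suffix rank first, then by numeric value.
/// Lines without a leading number (key `None`) sort first.
pub fn sort_human(lines: &mut [&str], sep: char) {
    lines.sort_by(|a, b| {
        human_key(a, sep)
            .partial_cmp(&human_key(b, sep))
            .unwrap_or(Ordering::Equal)
    });
}
```

Because the separator is skipped only between digits, a blank that ends the number (for example, before the next field) still terminates parsing.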

Use array slice for trim_end_matches and String::len for length check to improve readability and efficiency in test_human_numeric_blank_thousands_sep_locale.

Use struct literal initialization instead of creating a mutable default and assigning fields, improving code conciseness and readability without changing functionality.

- Ignore thousands separators in debug annotations to match GNU output
- Simplify NumInfo parsing by removing redundant thousands separator logic
- Enhance detection of numeric locale settings to handle multibyte separators like NBSP correctly, maintaining single-byte behavior for compatibility with upstream GNU coreutils

- Update detect_numeric_locale to check for C locale (ASCII encoding and "und" locale)
- In C locale, set thousands_sep to None to avoid incorrect grouping separators
- Adjust test expectations to match new sorting behavior for numeric fields in C locale

The assignment of NumInfo::parse result was reformatted by splitting it across two lines to enhance code readability and adhere to line length guidelines.

…ixture

Update the expected output for the multiple decimals numeric sort test to reflect the proper ascending order. The values "576,446.88800000" and "576,446.890" were misplaced and have been repositioned to their correct locations in the sorted sequence, ensuring the test accurately validates the sorting logic. The debug fixture was updated accordingly.

Previously, the ordering_incompatible check was performed unconditionally, causing errors even when the --key option was used, where such incompatibilities might not apply. This change adds a condition to skip the check if --key is present, ensuring correct behavior for key-based sorting.
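The shape of that guard can be sketched as follows. This is a hedged illustration only: `SortMode`, `validate`, and the simplified rule inside `ordering_incompatible` (any two different global modes conflict) are assumptions for this sketch and do not reproduce the project's actual compatibility rules.

```rust
/// Hypothetical set of global ordering modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SortMode {
    Numeric,
    HumanNumeric,
    GeneralNumeric,
    Month,
    Version,
}

/// Simplified stand-in for the incompatibility rule: the global modes
/// conflict as soon as two different ones are selected.
fn ordering_incompatible(global_modes: &[SortMode]) -> bool {
    global_modes.windows(2).any(|pair| pair[0] != pair[1])
}

/// Run the global check only when no `--key` option was given, mirroring
/// the `!matches.contains_id(options::KEY) &&` condition discussed above.
pub fn validate(global_modes: &[SortMode], has_key: bool) -> Result<(), String> {
    if !has_key && ordering_incompatible(global_modes) {
        return Err(format!("options {global_modes:?} are incompatible"));
    }
    Ok(())
}
```

Because `&&` short-circuits, the incompatibility rule is never evaluated when a key is present, so key-based invocations pass through to per-key handling.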

github-actions · 2026-02-01T21:48:08Z

GNU testsuite comparison:

Skip an intermittent issue tests/shuf/shuf-reservoir (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/sort/sort-stale-thread-mem (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tail/overlay-headers (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/env/env-signal-handler is no longer failing!
Congrats! The gnu test tests/sort/sort-debug-keys is no longer failing!
Congrats! The gnu test tests/sort/sort-debug-warn is no longer failing!
Congrats! The gnu test tests/sort/sort-h-thousands-sep is no longer failing!

ChrisDryden mentioned this pull request Dec 26, 2025

GNU coreutils 9.9: Detailed Test report 12/19 #9729

Closed

sylvestre reviewed Jan 13, 2026

src/uu/sort/src/sort.rs (outdated, resolved)

mattsu2020 force-pushed the sort_sort-h-thousands-sep.sh branch from 4e5498a to be1ea65 Compare January 13, 2026 23:30

mattsu2020 force-pushed the sort_sort-h-thousands-sep.sh branch from ff7e69e to 22fa140 Compare January 16, 2026 02:50

mattsu2020 requested a review from sylvestre January 16, 2026 03:29

ChrisDryden mentioned this pull request Jan 20, 2026

GNU coreutils 9.9: Detailed Test report 01/20 #10390

Open

mattsu2020 force-pushed the sort_sort-h-thousands-sep.sh branch from 9688011 to cfbcf7e Compare January 21, 2026 03:16

ChrisDryden mentioned this pull request Jan 21, 2026

fix: numeric sort (-n) does not recognize thousand separators #10339

Merged

mattsu2020 force-pushed the sort_sort-h-thousands-sep.sh branch from 6d50fc4 to 9dc0648 Compare January 21, 2026 23:41

ChrisDryden reviewed Jan 22, 2026

mattsu2020 added 11 commits February 1, 2026 22:34

refactor: simplify separator trimming in locale test

80f49fb

Use array slice for trim_end_matches and String::len for length check to improve readability and efficiency in test_human_numeric_blank_thousands_sep_locale.

refactor(sort): simplify detect_numeric_locale with struct literal

dd72034

Use struct literal initialization instead of creating a mutable default and assigning fields, improving code conciseness and readability without changing functionality.

refactor(sort): split long line assignment for improved readability

5e0d83f

The assignment of NumInfo::parse result was reformatted by splitting it across two lines to enhance code readability and adhere to line length guidelines.

i18n: treat C locale as no grouping separator

b1b2a19

sylvestre force-pushed the sort_sort-h-thousands-sep.sh branch from 4035d6a to 6cde13e Compare February 1, 2026 21:34

sylvestre merged commit d24c343 into uutils:main Feb 1, 2026
128 of 130 checks passed

sylvestre mentioned this pull request Feb 1, 2026

Sort: improve the code after recent changes #10647

Draft

mattsu2020 deleted the sort_sort-h-thousands-sep.sh branch February 1, 2026 23:15

moonfruit mentioned this pull request Feb 3, 2026

uutils-selected 0.6.0 moonfruit/homebrew-tap#453

Closed
