-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
feat(sort): add locale-aware numeric sorting support(sort-h-thousands-sep.sh) #9848
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sort): add locale-aware numeric sorting support(sort-h-thousands-sep.sh) #9848
Conversation
CodSpeed Performance ReportMerging this PR will not alter performanceComparing Summary
Footnotes
|
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
it needs tests |
4e5498a to
be1ea65
Compare
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
ff7e69e to
22fa140
Compare
|
GNU testsuite comparison: |
1 similar comment
|
GNU testsuite comparison: |
9688011 to
cfbcf7e
Compare
|
GNU testsuite comparison: |
1 similar comment
|
GNU testsuite comparison: |
|
Hey Matt! #10339 (comment) we had a discussion here about the two different PR's for the thousands seperator here, do you think it would be possible to rebase your PR to the latest mainline to fix the issues related to humanreadable suffixes and the tokenizer issues. |
6d50fc4 to
9dc0648
Compare
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
Any idea how we broke sort-debug-keys and sort-debug-warn in the last merge? I think its because the default grouping for C is an empty string instead of whats provided by the ICU library |
src/uu/sort/src/sort.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add the
!matches.contains_id(options::KEY)
&&
For fixing the issue with sort-debug-keys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked out this pr locally and I think this is the only thing stopping the debug warn gnu test from passing, mind adding it to this pr too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked out this pr locally and I think this is the only thing stopping the debug warn gnu test from passing, mind adding it to this pr too?
ok add
|
In the i18n helper function for GROUPING_SEP can we override the default grouping seperator for the C locale? I was thinking something like this? |
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
|
maybe do we it in a separate PR ? |
|
GNU testsuite comparison: |
|
GNU testsuite comparison: |
Implement NumericLocaleSettings to handle thousands separators and decimal points based on locale. Update tokenization logic to accommodate blank thousands separators for numeric and human-numeric modes, improving parsing of locale-specific numbers. Also refactor numeric locale detection for safety/readability and clean up related initialization/spell-checker ignore.
…in sv_SE locale Add a new test function `test_human_numeric_blank_thousands_sep_locale` to verify that the sort utility correctly handles human-readable numeric sorting when the locale's thousands separator is a blank space (e.g., in sv_SE.UTF-8 or sv_SE). This ensures proper behavior of the `-h` flag with key-based sorting in such locales, preventing potential sorting errors with space-separated numeric strings.
Use array slice for trim_end_matches and String::len for length check to improve readability and efficiency in test_human_numeric_blank_thousands_sep_locale.
Implement NumericLocaleSettings to handle thousands separators and decimal points based on locale. Update tokenization logic to accommodate blank thousands separators for numeric and human-numeric modes, improving parsing of locale-specific numbers. Also refactor numeric locale detection for safety/readability and clean up related initialization/spell-checker ignore.
Use struct literal initialization instead of creating a mutable default and assigning fields, improving code conciseness and readability without changing functionality.
- Ignore thousands separators in debug annotations to match GNU output - Simplify NumInfo parsing by removing redundant thousands separator logic - Enhance detection of numeric locale settings to handle multibyte separators like NBSP correctly, maintaining single-byte behavior for compatibility with upstream GNU coreutils
- Update detect_numeric_locale to check for C locale (ASCII encoding and "und" locale) - In C locale, set thousands_sep to None to avoid incorrect grouping separators - Adjust test expectations to match new sorting behavior for numeric fields in C locale
The assignment of NumInfo::parse result was reformatted by splitting it across two lines to enhance code readability and adhere to line length guidelines.
…ixture Update the expected output for the multiple decimals numeric sort test to reflect the proper ascending order. The values "576,446.88800000" and "576,446.890" were misplaced and have been repositioned to their correct locations in the sorted sequence, ensuring the test accurately validates the sorting logic. The debug fixture was updated accordingly.
Previously, the ordering_incompatible check was performed unconditionally, causing errors even when the --key option was used, where such incompatibilities might not apply. This change adds a condition to skip the check if --key is present, ensuring correct behavior for key-based sorting.
4035d6a to
6cde13e
Compare
|
GNU testsuite comparison: |
Implement NumericLocaleSettings to handle thousands separators and decimal points based on locale. Update tokenization logic to accommodate blank thousands separators for numeric and human-numeric modes, ensuring proper parsing of numbers with locale-specific formatting. This enhances compatibility with international number representations.