Add APIs for dealing with titlecase#122668
Add APIs for dealing with titlecase#122668Jules-Bertholet wants to merge 10 commits intorust-lang:mainfrom
Conversation
|
☔ The latest upstream changes (presumably #122616) made this pull request unmergeable. Please resolve the merge conflicts. |
5f6fa45 to
fe5d72b
Compare
|
☔ The latest upstream changes (presumably #122013) made this pull request unmergeable. Please resolve the merge conflicts. |
fe5d72b to
b2fdfab
Compare
|
If you want to modify |
This comment has been minimized.
This comment has been minimized.
b2fdfab to
a2512b4
Compare
|
The libs team gave a positive response to the ACP. @rustbot label -S-waiting-on-ACP S-waiting-on-review |
This comment has been minimized.
This comment has been minimized.
a2512b4 to
108a15c
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
108a15c to
15a00b3
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
15a00b3 to
ea8393a
Compare
|
@rustbot reroll |
| c => c > '\x7f' && unicode::Alphabetic(c), | ||
| 'A'..='Z' | 'a'..='z' => true, | ||
| '\0'..='\u{A9}' => false, | ||
| _ => unicode::Alphabetic(self), |
There was a problem hiding this comment.
I think this is from "Extend ASCII fast paths of char methods beyond ASCII"...
Can you confirm we continue to have exhaustive coverage in tests? I'm not sure how much the generated tests by the test generator call the public methods vs check that unicode::Alphabetic (for example) is accurate.
Also, I imagine that flipping the order of capital A-Z vs. lowercase a-z might influence codegen, and lowercase seems more likely to be common. Maybe worth doing something different there?
How did you decide on the particular threshold here (and in other modified functions)? Maybe we can split this out to a separate PR?
There was a problem hiding this comment.
Also, I imagine that flipping the order of capital A-Z vs. lowercase a-z might influence codegen, and lowercase seems more likely to be common. Maybe worth doing something different there?
Good point, I fixed it.
How did you decide on the particular threshold here (and in other modified functions)?
I chose the highest value that would work, using https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp to verify. E.g., for this function, https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5B%3AAlphabetic%3A%5D-%5B%3AASCII%3A%5D&abb=on says that the first non-ASCII alphabetic character is U+AA.
There was a problem hiding this comment.
Can you confirm we continue to have exhaustive coverage in tests? I'm not sure how much the generated tests by the test generator call the public methods vs check that
unicode::Alphabetic(for example) is accurate.
They didn't call the public methods, no. But should now
| /// as returned by [`char::case`]. | ||
| #[unstable(feature = "titlecase", issue = "none")] | ||
| #[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)] | ||
| pub enum CharCase { |
There was a problem hiding this comment.
nit: also #[must_use], probably.
There was a problem hiding this comment.
Why? I don't think we usually label types like this #[must_use]?
| /// Titlecase. Corresponds to the `Titlecase_Letter` Unicode general category. | ||
| Title = 0b10, | ||
| /// Uppercase. Corresponds to the `Uppercase` Unicode property. | ||
| Upper = 0b11, |
There was a problem hiding this comment.
Can you add a comment about why these particular discriminants are chosen?
There was a problem hiding this comment.
I added a doc comment explaining this.
|
Reminder, once the PR becomes ready for a review, use |
514f3bb to
c69ea1e
Compare
This comment has been minimized.
This comment has been minimized.
|
@rustbot ready |
ACP: rust-lang/libs-team#354
Tracking issue: #153892
r? libs-api
@rustbot label T-libs -T-libs-api A-unicode
The last commit has some insta-stable(Edit: will do in follow-up)PartialEqimpls, therefore: @rustbot label -needs-fcpAlternatively, I could split those out into a follow-up PR.