ehuss commented Dec 20, 2025

I realized that we never described what the (non-greedy) text meant. Although I suspect most readers understand it, I felt like it would be best to be explicit about what it means.

rustbot added the S-waiting-on-review label (Status: The marked PR is awaiting review from a maintainer) Dec 20, 2025

When presented with the input `"one" or "two"`, the EXAMPLE_STRING rule will match `"one"` instead of the entire input.

I think that example isn't illustrating what you intend it to: with the `~"` there, greediness doesn't make any difference. The repetition has to stop before the first `"` in any case.
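To make that concrete, here is a minimal sketch of the two readings. It assumes the example rule has roughly the shape `"` `~"`-repetition `"` (the exact rule text is not shown in this thread), and the function names are hypothetical. Because the repeated element cannot match a `"`, both strategies are forced to stop at the same place.

```rust
/// Greedy reading: consume as many non-quote bytes as possible, then require `"`.
fn match_greedy(input: &str) -> Option<&str> {
    let b = input.as_bytes();
    if b.first() != Some(&b'"') {
        return None;
    }
    let mut pos = 1;
    while pos < b.len() && b[pos] != b'"' {
        pos += 1;
    }
    // The repetition stopped either at a quote or at end of input.
    if b.get(pos) == Some(&b'"') {
        Some(&input[..pos + 1])
    } else {
        None
    }
}

/// Non-greedy reading: at each step try the closing `"` first,
/// and only otherwise consume one more non-quote byte.
fn match_lazy(input: &str) -> Option<&str> {
    let b = input.as_bytes();
    if b.first() != Some(&b'"') {
        return None;
    }
    let mut pos = 1;
    loop {
        match b.get(pos) {
            Some(&b'"') => return Some(&input[..pos + 1]),
            Some(_) => pos += 1,
            None => return None,
        }
    }
}
```

On the input `"one" or "two"` both functions return `"one"`, so an example with `~"` in the repetition cannot distinguish greedy from non-greedy matching.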

mattheww commented Dec 21, 2025

I don't think there's a way of interpreting "stops matching as soon as possible" which makes the current tokenisation rules work.

Consider whether `RAW_STRING_LITERAL` accepts `r#"a"b"#`. We need the answer to be "yes".

Looking at the definition of `RAW_STRING_LITERAL`, it must be the case that `RAW_STRING_CONTENT` accepts `#"a"b"#`.

Looking at the definition of `RAW_STRING_CONTENT`, it must be the case that `RAW_STRING_CONTENT` accepts `"a"b"`.

(So in some sense the non-greedy expression in `RAW_STRING_CONTENT` must be willing in this case to skip across one `"`, in order to find a solution for its parent expression.)

Now consider whether `Token` accepts `r"a"b"`. We need the answer to be "no".

`Token` will accept `r"a"b"` if `RAW_STRING_LITERAL` does.

`RAW_STRING_LITERAL` will accept `r"a"b"` if `RAW_STRING_CONTENT` accepts `"a"b"`.

But we've already decided above that it does. That isn't the answer we want.
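The argument above can be sketched as a small recognizer. This is only an illustration under an assumed, simplified grammar (`RAW_STRING_LITERAL -> r RAW_STRING_CONTENT`, `RAW_STRING_CONTENT -> " (any char)*? " | # RAW_STRING_CONTENT #`), ignoring suffixes and any restriction on which characters may appear; the function names are hypothetical. It treats "accepts" as "some derivation matches the whole string", which is the reading that lets the non-greedy repetition skip across a `"`.

```rust
/// Does RAW_STRING_CONTENT (as assumed above) derive exactly `s`?
fn content_accepts(s: &str) -> bool {
    // Alternative: `#` RAW_STRING_CONTENT `#`
    if s.len() >= 2 && s.starts_with('#') && s.ends_with('#') {
        return content_accepts(&s[1..s.len() - 1]);
    }
    // Alternative: `"` (any char)*? `"`
    // As a pure acceptance question, the lazy repetition may absorb
    // any middle, including one that contains `"`.
    s.len() >= 2 && s.starts_with('"') && s.ends_with('"')
}

/// Does RAW_STRING_LITERAL (as assumed above) derive exactly `s`?
fn raw_string_literal_accepts(s: &str) -> bool {
    match s.strip_prefix('r') {
        Some(rest) => content_accepts(rest),
        None => false,
    }
}
```

Under this reading `raw_string_literal_accepts` returns `true` for `r#"a"b"#`, which is wanted, but also for `r"a"b"`, which is the unwanted answer described above.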

I put one suggestion for dealing with this in rust-lang/fls#600 (comment)

I suppose another approach might be to find a way to make the spec say that tokenisation would never consider `r"a"b"` because it would have already found `r"a"`, and won't backtrack from there.
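One way to picture that "commit to the first terminator" behaviour is a hand-written scanner. This is a sketch, not the specification's wording or any particular implementation: `lex_raw_string` is a hypothetical name, and it ignores suffixes and any restriction on the characters inside the literal. It returns the byte length of the raw string token found at the start of the input.

```rust
/// Scan a raw string token at the start of `input`.
/// Stops at the first `"` that is followed by the same number of `#`
/// as appeared after the `r`, and never looks for a later terminator.
fn lex_raw_string(input: &str) -> Option<usize> {
    let b = input.as_bytes();
    if b.first() != Some(&b'r') {
        return None;
    }
    // Count the opening hashes.
    let mut pos = 1;
    let mut hashes = 0;
    while b.get(pos) == Some(&b'#') {
        hashes += 1;
        pos += 1;
    }
    // Require the opening quote.
    if b.get(pos) != Some(&b'"') {
        return None;
    }
    pos += 1;
    // Look for the first quote followed by `hashes` hash characters.
    while pos < b.len() {
        if b[pos] == b'"' {
            let end = pos + 1 + hashes;
            if end <= b.len() && b[pos + 1..end].iter().all(|&c| c == b'#') {
                return Some(end);
            }
        }
        pos += 1;
    }
    None // unterminated
}
```

With this scanner, `r"a"b"` yields a token of length 4 (`r"a"`) and the rest is left for later tokens, while `r#"a"b"#` yields length 8, the whole input.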

ehuss commented Dec 21, 2025

Thanks for the feedback! Indeed this is not right and a bit of a mess. Sorry, I should have posted that we may be considering removing the non-greedy rules, and thus not merging this PR. They are now only used in the raw string rules, and we may take a different approach for those. Hopefully I'll be able to post more on it soonish.

traviscross added the S-waiting-on-author label (Status: The marked PR is awaiting some action (such as code changes) from the PR author.) and removed the S-waiting-on-review label (Status: The marked PR is awaiting review from a maintainer) Dec 22, 2025
traviscross commented Dec 22, 2025

Something we had talked about is that if we did keep `*?`, the two plausible rewrite rules are, I think:

Rule 1:

R -> A E*? S B
// Desugars to:
R -> A _0 B
_0 -> S | E _0

Rule 2:

R -> A E*? S B
// Desugars to:
R -> A _0
_0 -> S B | E _0

The second rule is fairly subtle, especially in its interaction with cut.
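A small sketch of how the two desugarings can differ, assuming PEG-style ordered choice where a nonterminal that has succeeded is not re-entered to try another alternative (that semantics is an assumption here, and cut is not modelled). The terminals are made up for illustration: `A` is `<`, `E` is any byte, `S` is `"`, and `B` is `#`. Each function returns the end position of a successful match.

```rust
fn lit(input: &[u8], pos: usize, c: u8) -> Option<usize> {
    if input.get(pos) == Some(&c) { Some(pos + 1) } else { None }
}

fn any(input: &[u8], pos: usize) -> Option<usize> {
    if pos < input.len() { Some(pos + 1) } else { None }
}

// Rule 1:  R -> A _0 B      _0 -> S | E _0
fn rule1_tail(input: &[u8], pos: usize) -> Option<usize> {
    lit(input, pos, b'"')
        .or_else(|| any(input, pos).and_then(|p| rule1_tail(input, p)))
}

fn rule1(input: &str) -> Option<usize> {
    let b = input.as_bytes();
    let p = lit(b, 0, b'<')?;
    let p = rule1_tail(b, p)?; // commits to the first S
    lit(b, p, b'#') // B is checked only after _0 has finished
}

// Rule 2:  R -> A _0        _0 -> S B | E _0
fn rule2_tail(input: &[u8], pos: usize) -> Option<usize> {
    lit(input, pos, b'"')
        .and_then(|p| lit(input, p, b'#')) // S must be followed by B
        .or_else(|| any(input, pos).and_then(|p| rule2_tail(input, p)))
}

fn rule2(input: &str) -> Option<usize> {
    let b = input.as_bytes();
    let p = lit(b, 0, b'<')?;
    rule2_tail(b, p)
}
```

On `<a"#` both rules match all four bytes. On `<a"b"#`, Rule 1 stops at the first `"`, finds `b` where it needs `#`, and fails; Rule 2 treats that `"` as just another `E` because `S B` did not match there, and goes on to match all six bytes.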
