Skip to content

test: grammar rule conformance#80

Merged
mhanberg merged 2 commits intoelixir-tools:mainfrom
lukaszsamson:ls-grammar-tests
Feb 7, 2026
Merged

test: grammar rule conformance#80
mhanberg merged 2 commits intoelixir-tools:mainfrom
lukaszsamson:ls-grammar-tests

Conversation

@lukaszsamson
Copy link
Contributor

This PR adds test coverage for all rules in original elixir grammar elixir_parser.yrl. Each rule is expanded to a few simple cases of valid elixir code and the parser is expected to produce exactly the same AST

@mhanberg mhanberg changed the title Add grammar rule conformance tests test: grammar rule conformance Jan 31, 2026
mhanberg pushed a commit that referenced this pull request Feb 7, 2026
)

Supercedes #87 

I had originally intended to break down this into multiple PRs like #87,
but changing one part of the parser often broke another or didn't fully
fix the issues, so I apologize for the large diff. Tackling the root
cause ended up being a larger change but it fixed a whole family of
parser issues that would have taken 100 PRs applying bandaids. This is
also the reason this PR includes some tests from that PR.

This PR brings the `with_context` helper from that PR to help scope the
parser context to specific parsing contexts.

The main issue this PR tackles is how matched, unmatched and no-parens
expressions are handled.
The elixir grammar distinguishes three kinds of expressions:

- "matched": expressions with clear boundaries, like literals,
parenthesized calls, tuples, lists, maps, identifiers, modules.
The key property of these is that they can safely appear as an operand
without ambiguity.

- "unmatched": expressions with do-end blocks. These are "unmatched"
because their boundary is the end keyword, not a closing delimiter.
You cannot nest an unmatched expression inside another unmatched
expression without parentheses:
`if if true do true else false end do` - invalid
`if(if true do true else false end) do` - valid

Operators can take unmatched expressions as their RHS (e.g., x = if true
do 1 end), which is why `unmatched_op_expr` exists separately from
`matched_op_expr` in elixir_parser.yrl

- "no_parens": multi-argument calls without parentheses like `foo a, b,
c`. These cannot be nested.
No-parens expressions are further subdivided:
  - `no_parens_one`: single unambiguous arg (`f a, f key: val`)
  - `no_parens_many`: multiple args (`f a, b, c`)
- `no_parens_one_ambig`: single arg that is itself a no_parens_many (`f
g a, b -> f(g(a, b))`)

The grammar enforces that operators with a no_parens RHS can only
combine with matched/unmatched LHS — not another no_parens expression.

In Spitfire we don't work with grammar rules but we rather consume
tokens as we go, so to achieve the same we have to resort to flags like
`nesting`, `in_map`, `stab_state` etc on top of precedence checks). This
PR expands that to mimic the grammar rules behavior in elixir. I clarify
this because to explain the parser behavior I need to refer to the
elixir grammar, but when you look at the diff in this PR it's mostly
flags and extra checks, not explicit grammar rules.

Some examples of the issues this fixes:

- No-parens calls tended to consume too many tokens
Given `%{foo a, b => c}`, Elixir parses this as `%{foo(a, b) => c}`
because `foo a, b` is a `no_parend_many` and `=>` is an operator that
expects `matched_expr` on both sides.
Spitfire would parse this as `{foo(a, b =>c)}` because it parsed
no-parens calls at the `@lowest` precedence regardless of context.
The fix uses `assoc_op` precedence when `is_map == true` to force
`foo(a, b)` to complete before continuing parsing the rhs of `=>`

- `->` would be parsed as infix in the wrong contexts
`->` should only be allowed when wrapped around parens, `do`-`end` or
`fn`-`end`, but *never* as a regular infix operator.
Spitfire would treat `->` as an infix operator controlled by `is_stab`
but this flag was not scoped.
In code like `fn a, b -> (c -> d) end`, `is_stab` from the `fn` body
leaked into the grouped `(c -> d)` or viceversa.
Adding `stop_before_stab_op?` and scoping it with `with_context` each
boundary is managed independently

- keyword lists not stopping at stab boundary
There is a grammar rule `call_args_no_parens_kw_expr -> kw_eol
matched_expr`, it basically says the value of a keyword pair in
no-parens must be a matched expression, so in this code:
`fn key: expr -> body end`
`expr` stops at `->` because `->` can't appear in a matched expression

Spitfire would continue consuming tokens after `->` because it wasn't
aware of stabs in that context. This PR adds stab-aware functions to
deal with this scenario. Also, differently to #87, this does so more
generally so this takes care of the `%{a do :ok end | b c, d => e, f =>
g}` scenario that is broken in that PR.

Some other issues that aren't strictly related to
matched/unmatched/no-parens but were still necessary:

- `.` no-parens calls didn't parse args
`foo.bar 1, 2` is a valid no-parens call, but Spitfire parsed it as
`foo.bar()` followed by dangling `1, 2`. Checking `no_parens` metadata
like elixir does in `build_identifier`) fixes that

- `block_identifier` not treated as terminal
In the Elixir grammar, `block_identifier` (`else`, `rescue`, `catch`,
`after`) only appears in rules within do-blocks. In Spitfire,
`block_identifier` wasn't in `@terminals` or `@peeks` so the parser
could consume `else`/`rescue` as expression tokens instead of stopping
at block boundaries. For example, a stab body inside a do-block could
run past `else`.

---

A lot of those changes don't have corresponding tests in the PR
specifically. I was working on fixing the issues in property tests added
in #78, #80 and #81. The way I tested all this was to copy the new tests
from those files into my branch and running them locally.

After the changes here:
- Only 6 failures in tests from #78 
- All tests from #80 pass, so that PR could be merged after this
- Only 3 failures in tests from #81 (which I believe are unrelated to
this PR)
@mhanberg mhanberg merged commit 2bfd8c9 into elixir-tools:main Feb 7, 2026
37 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants