Replace core parsing/resolution engine with Rubydex #447

Open
paracycle wants to merge 7 commits into main from ufuk/rubydex-rewrite

@paracycle
Member

Summary

  • Replace the entire AST walking, constant extraction, and resolution pipeline with Rubydex, a high-performance Ruby indexer written in Rust
  • Rewrite association detection to use Prism's native AST directly, dropping the parser gem as a direct dependency
  • Net result: +403 / -3,391 lines across 47 files

Architecture change

Before: For each file → parse with Prism → walk AST nodes → extract constant references via ConstNodeInspector → resolve via constant_resolver gem → check package violations

After: Index all workspace files in one Rubydex::Graph batch call → resolve all constants → iterate ResolvedConstantReference objects with direct links to target declarations → check package violations. Association detection (has_many, belongs_to, etc.) runs as a supplementary Prism-based pass since Rubydex doesn't understand ActiveRecord semantics.
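The last two steps of the "After" flow (iterate resolved references, check package violations) can be sketched roughly as follows. This is a minimal illustration and not Packwerk's or Rubydex's code: Reference, Package, package_for, and dependency_violations are hypothetical stand-ins, and the references are hard-coded where the real pipeline would read them from the graph.

```ruby
# Hypothetical stand-ins; these are not Rubydex or Packwerk classes.
Reference = Struct.new(:constant_name, :source_path, :target_path, keyword_init: true)
Package   = Struct.new(:name, :root, :dependencies, keyword_init: true)

# Pick the package whose root is the longest prefix of the path.
def package_for(path, packages)
  packages
    .select { |pkg| path.start_with?("#{pkg.root}/") }
    .max_by { |pkg| pkg.root.length }
end

# A reference is a dependency violation when it crosses a package boundary
# and the source package does not declare the target package as a dependency.
def dependency_violations(references, packages)
  references.filter_map do |ref|
    source = package_for(ref.source_path, packages)
    target = package_for(ref.target_path, packages)
    next if source.nil? || target.nil? || source == target
    next if source.dependencies.include?(target.name)

    "#{ref.source_path}: #{ref.constant_name} belongs to #{target.name}, " \
      "which #{source.name} does not declare as a dependency"
  end
end

packages = [
  Package.new(name: "packs/billing", root: "packs/billing", dependencies: ["packs/orders"]),
  Package.new(name: "packs/orders",  root: "packs/orders",  dependencies: []),
]

references = [
  Reference.new(constant_name: "::Orders::Order",
                source_path: "packs/billing/app/invoice.rb",
                target_path: "packs/orders/app/order.rb"),
  Reference.new(constant_name: "::Billing::Invoice",
                source_path: "packs/orders/app/order.rb",
                target_path: "packs/billing/app/invoice.rb"),
]

# Only the orders -> billing reference is reported, because billing declares
# orders as a dependency but not the other way around.
puts dependency_violations(references, packages)
```

Because each reference already carries its target location, the check needs no name-resolution step of its own; that is the part the resolved references are described as providing.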

Dependencies

Removed              Added
constant_resolver    rubydex
parallel
ast (direct)
parser (direct)

Kept: prism (for association detection), better_html (for ERB), activesupport, sorbet-runtime, zeitwerk, bundler.

Deleted files (16 source + 11 test + 3 RBI)

The entire per-file parsing pipeline: file_processor, node_processor, node_processor_factory, node_visitor, node_helpers, const_node_inspector, constant_name_inspector, constant_discovery, parsed_constant_definitions, reference_extractor, unresolved_reference, association_inspector, cache, parsers/ruby, parsers/factory, parsers/parser_interface.

Key design decisions

  1. Full workspace indexing: index_and_resolve indexes ALL Ruby files in the workspace (not just the scoped check set) so Rubydex can resolve cross-package references. Only the scoped files are checked for violations.
  2. Association pass: Kept as a separate Prism-native AST walk since Rubydex treats has_many :orders as a method call, not a constant reference. Uses graph.resolve_constant to resolve the inferred constant name.
  3. ERB support: Parsers::Erb is simplified to extract_ruby_source, which feeds into graph.index_source.
  4. No more caching: Rubydex's Rust engine is fast enough that the MD5-based file cache is unnecessary.
  5. No more parallelism gem: Rubydex handles parallelism internally in Rust. The parallel flag is accepted but ignored.
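For decision 2, the step of turning an association call into a constant name to resolve might look roughly like this. It is a sketch under stated assumptions, not the PR's implementation: inferred_constant_name, naive_singularize, and naive_camelize are hypothetical helpers, the class_name override is assumed behaviour, and the inflection here is deliberately crude (ActiveSupport's String#singularize and String#camelize would be the natural choice in real code).

```ruby
# Association macros whose first argument names another model.
ASSOCIATION_MACROS = %i[has_many has_one belongs_to has_and_belongs_to_many].freeze
PLURAL_MACROS      = %i[has_many has_and_belongs_to_many].freeze

# Deliberately naive: handles only regular English plurals and gets
# irregular words (for example "houses" or "people") wrong.
def naive_singularize(word)
  case word
  when /ies\z/             then word.sub(/ies\z/, "y")
  when /(s|x|z|ch|sh)es\z/ then word.sub(/es\z/, "")
  when /s\z/               then word.sub(/s\z/, "")
  else word
  end
end

# "line_item" -> "LineItem"
def naive_camelize(name)
  name.split("_").map(&:capitalize).join
end

# Returns the constant name an association call is assumed to refer to,
# or nil when the call is not an association macro.
def inferred_constant_name(macro, association, class_name: nil)
  return nil unless ASSOCIATION_MACROS.include?(macro)
  return class_name if class_name # explicit override wins (assumed behaviour)

  base = association.to_s
  base = naive_singularize(base) if PLURAL_MACROS.include?(macro)
  naive_camelize(base)
end

puts inferred_constant_name(:has_many, :orders)      # Order
puts inferred_constant_name(:belongs_to, :customer)  # Customer
puts inferred_constant_name(:has_many, :line_items)  # LineItem
```

The resulting string is what would then be handed to the resolver, so the association ends up checked like any other constant reference.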

Test results

All 136 tests pass (excluding spring_command_test and autoload_test, which have pre-existing Ruby 4.0 incompatibilities unrelated to this change).

@paracycle paracycle requested a review from a team as a code owner April 13, 2026 23:25
Replace the entire AST walking, constant extraction, and resolution
pipeline with Rubydex, a high-performance Ruby indexer written in Rust.

The key architectural change is that Packwerk no longer parses files
individually and walks AST nodes to find constant references. Instead,
Rubydex indexes all workspace files in a single batch call, resolves
all constants, and provides resolved references with direct links to
their target declarations.

Core changes:
- RunContext: rewritten to create a Rubydex::Graph, index the full
  workspace for resolution, and iterate resolved constant references
  to detect cross-package violations
- ParseRun: simplified to two phases (index_and_resolve + find_offenses)
  instead of per-file parallel processing
- Association detection: rewritten using Prism native AST (no longer
  needs the parser gem translation layer), runs as a supplementary
  pass since Rubydex doesn't understand ActiveRecord associations
- ERB support: simplified to extract_ruby_source which feeds into
  Rubydex's index_source API
- ApplicationValidator: uses Rubydex::Graph instead of ConstantResolver

Removed dependencies: constant_resolver, parallel, ast, parser (direct)
Added dependency: rubydex
Kept: prism (for association detection), better_html (for ERB)

Deleted 16 source files, 11 test files, 3 RBI files (~3,400 lines removed).
Net change: +403 / -3,391 lines.
@paracycle paracycle force-pushed the ufuk/rubydex-rewrite branch 2 times, most recently from ad87dca to 7fa6be2 Compare April 13, 2026 23:57
Update tapioca require file to remove constant_resolver, parallel,
spring, and minitest/autorun; add prism and rubydex requires.
Regenerate all gem RBIs, cleaning up stale files from older gem versions.
Add ostruct gem for Ruby 4.0 compatibility with yard/tapioca.
@paracycle paracycle force-pushed the ufuk/rubydex-rewrite branch from 7fa6be2 to 46eea11 Compare April 14, 2026 00:00
@exterm
Contributor

exterm commented Apr 14, 2026

Cool stuff, Ufuk. Would this mean packwerk doesn't depend on the interrogated codebase using zeitwerk anymore?

@paracycle
Member Author

> Cool stuff, Ufuk. Would this mean packwerk doesn't depend on the interrogated codebase using zeitwerk anymore?

That's correct. Rubydex can do proper Ruby constant resolution, so we don't need to use Zeitwerk heuristics to figure out what constant references resolve to based on their filename. At least, we shouldn't, and, if there are any problems with the resolution, then we should fix them in Rubydex.

ERB files fed to graph.index_source need a file:// URI, not a bare
path. Also add location_to_relative_path helper that catches
NotFileUriError for any edge cases where a location doesn't have a
file:// URI.
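The path and URI handling described in this commit could be sketched as below, using only Ruby's standard library. The helper bodies are assumptions: NotFileUriError here is a local stand-in rather than Rubydex's class, path_to_file_uri and file_uri_to_path are hypothetical names, and the sketch assumes POSIX paths with no characters that need percent-encoding.

```ruby
require "pathname"
require "uri"

# Local stand-in for the error named in the commit message; not Rubydex's class.
class NotFileUriError < StandardError; end

# Turn a filesystem path into a file:// URI string.
def path_to_file_uri(path)
  "file://#{File.expand_path(path)}"
end

# Extract the path from a file:// URI, raising for anything else.
def file_uri_to_path(uri_string)
  uri = URI.parse(uri_string)
  raise NotFileUriError, "not a file:// URI: #{uri_string}" unless uri.scheme == "file"

  uri.path
end

# Path relative to the project root, or nil when the location carries
# something other than a file:// URI.
def location_to_relative_path(uri_string, root)
  absolute = Pathname.new(file_uri_to_path(uri_string))
  absolute.relative_path_from(Pathname.new(root)).to_s
rescue NotFileUriError
  nil
end

uri = path_to_file_uri("/repo/app/views/orders/show.html.erb")
puts uri
puts location_to_relative_path(uri, "/repo")
puts location_to_relative_path("app/models/order.rb", "/repo").inspect
```

Returning nil for the edge case lets the caller skip such locations instead of aborting the whole run.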

The post-graph work (especially the association detection pass that
re-parses all files with Prism) is a significant portion of total
runtime. On the Shopify monolith, the association pass takes ~39s
single-threaded.

Parallelize the Prism parsing phase of association detection using
the parallel gem. The resolution and violation checking phases remain
sequential since they use shared state (graph + package_set).

The parallel flag flows from Configuration -> ParseRun -> RunContext
as before.

Split collect_constant_reference_offenses into two phases:

1. Extract: iterate Rubydex's resolved references and pull all needed
   data into plain Ruby hashes grouped by source file. This must be
   sequential since it crosses the Rust FFI boundary.

2. Check: process each file's references for dependency violations in
   parallel using forked workers. Only plain Ruby objects (strings,
   integers, hashes) cross the fork boundary -- no Rust FFI objects.

On the Shopify monolith, the post-graph reference iteration + violation
checking was the biggest bottleneck at ~134s single-threaded. The
extraction phase remains sequential but the violation checking across
~57k cross-package references is now parallelized.
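The extract-then-check split can be illustrated with a small standalone sketch. It uses plain fork with pipes and Marshal from the standard library instead of the parallel gem, and every name in it (StubReference, extract_by_file, check_in_forks, the allowed hash) is hypothetical; the real references come from Rubydex and the real check is Packwerk's. It assumes a platform where fork is available.

```ruby
# Hypothetical stand-in for an FFI-backed reference object.
StubReference = Struct.new(:source_path, :source_package, :constant_name, :target_package)

# Phase 1 (sequential): copy the needed fields into plain hashes, grouped by
# source file, so nothing FFI-backed has to cross the fork boundary.
def extract_by_file(references)
  rows = references.map do |ref|
    { file: ref.source_path, from: ref.source_package,
      constant: ref.constant_name, to: ref.target_package }
  end
  rows.group_by { |row| row[:file] }
end

# Phase 2 (parallel): fork one worker per slice of files. Each worker sends
# its offenses back to the parent through a pipe as Marshal data.
def check_in_forks(rows_by_file, allowed, workers: 2)
  return [] if rows_by_file.empty?

  slice_size = (rows_by_file.size / workers.to_f).ceil
  children = rows_by_file.each_slice(slice_size).map do |slice|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode
    pid = fork do
      reader.close
      offenses = slice.flat_map do |_file, rows|
        rows.reject do |row|
          row[:from] == row[:to] || allowed.fetch(row[:from], []).include?(row[:to])
        end
      end
      writer.write(Marshal.dump(offenses))
      writer.close
      exit!(0) # skip at_exit hooks inherited from the parent
    end
    writer.close
    [pid, reader]
  end

  children.flat_map do |pid, reader|
    offenses = Marshal.load(reader.read)
    reader.close
    Process.wait(pid)
    offenses
  end
end

allowed = { "packs/billing" => ["packs/orders"] }
references = [
  StubReference.new("packs/billing/invoice.rb", "packs/billing", "::Orders::Order", "packs/orders"),
  StubReference.new("packs/orders/order.rb", "packs/orders", "::Billing::Invoice", "packs/billing"),
]

check_in_forks(extract_by_file(references), allowed).each do |row|
  puts "#{row[:file]}: #{row[:constant]} (#{row[:from]} -> #{row[:to]})"
end
```

Keeping the rows as hashes of strings means they serialize cleanly with Marshal, which is the property the commit message relies on when it says only plain Ruby objects cross the fork boundary.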