Small Python prototype for parsing T-SQL with ANTLR4 and extracting a table-to-column usage map from a SQL file.
The current Python workflow is centered on `parse.py` and `listener.py`:

- Reads SQL from `query.sql`
- Replaces report-style placeholders like `$P{RA_COMP}` with a neutral token before parsing
- Parses the query using the generated ANTLR T-SQL lexer/parser in `lib/tsql`
- Walks the parse tree with a custom listener
- Builds a mapping of table names to referenced columns
- Resolves table aliases back to the original table name
- Deduplicates columns before printing the final result
In practice, this is useful when you want a quick dependency view of which columns are referenced from which tables in a T-SQL statement.
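The placeholder substitution step can be sketched with a small regex. This is an illustrative version, not the script's exact code: the pattern name `PLACEHOLDER_RE`, the helper name `neutralize_placeholders`, and the neutral token `'PARAM'` are all assumptions.

```python
import re

# Matches JasperReports-style parameters such as $P{RA_COMP}.
PLACEHOLDER_RE = re.compile(r"\$P\{[^}]*\}")

def neutralize_placeholders(sql: str, token: str = "'PARAM'") -> str:
    """Replace every $P{...} placeholder with a neutral literal so the
    parser sees plain T-SQL instead of report syntax."""
    return PLACEHOLDER_RE.sub(token, sql)

print(neutralize_placeholders("SELECT * FROM CTSTRS WHERE COMP = $P{RA_COMP}"))
# SELECT * FROM CTSTRS WHERE COMP = 'PARAM'
```

Substituting a quoted literal (rather than deleting the placeholder) keeps the statement syntactically valid, which is all the extractor needs.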
parse.py: entry point that loads the SQL, parses it, walks the tree, and prints the final mappinglistener.py: custom ANTLR listener that collects tables, aliases, and column referencesquery.sql: input SQL file parsed by the scriptlib/tsql: generated T-SQL lexer/parser files and upstream grammar assetsansii.py: separate experimental visitor intended to rewrite implicit joins to ANSI joins
`TableColumnExtractor` listens for two main parse events:

- `enterTable_source_item`: captures table names from the `FROM` clause and records aliases
- `enterFull_column_name`: captures qualified column references such as `a.col`
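The listener's bookkeeping can be sketched as below. This is a simplification: the real `TableColumnExtractor` subclasses the generated `TSqlParserListener` and receives ANTLR parse-tree contexts, whereas here the two handlers take plain strings so the logic is easy to follow in isolation.

```python
class TableColumnExtractor:
    """Sketch of the listener's bookkeeping (string-based, not ANTLR ctx)."""

    def __init__(self):
        self.tables = {}   # original table name -> set of referenced columns
        self.aliases = {}  # alias -> original table name

    def enterTable_source_item(self, source_text):
        # source_text is the text of a FROM-clause item, e.g. "dbo.TableA a"
        parts = source_text.split()
        table = parts[0].rsplit(".", 1)[-1]   # drop any schema prefix
        self.tables.setdefault(table, set())
        if len(parts) > 1:                    # trailing token is the alias
            self.aliases[parts[-1]] = table

    def enterFull_column_name(self, column_text):
        # column_text is a qualified reference, e.g. "a.column1"
        qualifier, column = column_text.rsplit(".", 1)
        table = self.aliases.get(qualifier, qualifier)
        self.tables.setdefault(table, set()).add(column)
```

Using a `set` per table gives deduplication for free; an unknown qualifier simply falls back to being treated as a table name.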
Alias handling works like this:
- If a table appears as `dbo.TableA a`, the listener records `a -> TableA`
- When it later sees `a.column1`, it attributes that column to `TableA`
- After the parse walk completes, alias buckets are merged into the original table entry
For unqualified columns, the listener attempts to assign them to the current table context. That behavior is heuristic and works best for simpler statements.
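The post-walk merge of alias buckets into their original table entries can be sketched as a small fold over the mapping. The function name `merge_alias_buckets` is illustrative, not the script's actual helper.

```python
def merge_alias_buckets(tables, aliases):
    """Fold columns recorded under an alias into the original table's
    entry, deduplicating via set union. Illustrative sketch."""
    merged = {}
    for name, columns in tables.items():
        original = aliases.get(name, name)   # resolve alias -> table name
        merged.setdefault(original, set()).update(columns)
    return merged

tables = {"a": {"column1"}, "TableA": {"column1", "column2"}}
aliases = {"a": "TableA"}
merged = merge_alias_buckets(tables, aliases)
```

After the merge, `column1` appears only once under `TableA`, which is why the final output lists unique columns per original table.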
The Python code expects:
- Python 3
- `antlr4-python3-runtime`
The repository already includes generated parser files, so you do not need to regenerate the grammar just to run the extractor.
Install the ANTLR runtime if needed:
```
pip install antlr4-python3-runtime
```

Run the parser from the repository root:

```
python parse.py
```

The script prints output in this shape:
```
Final Mapping (Original Table Names, Unique Columns):
Original Table: CTSTRS
Column: TRS_DATE
Column: VALUE_DATE
```
- The script always reads from `query.sql`
- Placeholder expressions matching `$P{...}` are replaced before parsing
- The extractor is aimed at `SELECT`-style queries with table aliases and column references
- Results are printed to stdout; nothing is written to a file
- This is currently a script, not a packaged library or CLI tool
- Unqualified columns may be assigned ambiguously
- More complex T-SQL constructs may need additional listener logic
- `ansii.py` appears to be exploratory and is not integrated into the main workflow