Add Python/uv tool to generate Translator component dependency diagrams by gaurav · Pull Request #9 · NCATSTranslator/Core-Components-Working-Group

gaurav · 2026-05-26T00:38:00Z

Sets up a click-based CLI (generate_diagram.py) in translator-components-diagram/ that reads a components CSV, validates id references, and uses Graphviz to produce a dependency diagram. Nodes are clustered by owner team; solid arrows show hard dependencies, dashed arrows show optional "uses" relationships. Components outside the active filter appear as grayed ghost nodes.

Also adds a root .gitignore that excludes all data/ directories (generated outputs and input CSV live there and are not checked in).

WIP

Sets up a click-based CLI (generate_diagram.py) in translator-components-diagram/ that reads a components CSV, validates id references, and uses Graphviz to produce a dependency diagram. Nodes are clustered by owner team; solid arrows show hard dependencies, dashed arrows show optional "uses" relationships. Components outside the active filter appear as grayed ghost nodes. Also adds a root .gitignore that excludes all data/ directories (generated outputs and input CSV live there and are not checked in). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…e Sheet

Reads GOOGLE_SHEET_ID from a gitignored .env file in the script directory and downloads the sheet's CSV export to data/components.csv before processing. Supports --sheet-gid for selecting a non-default tab. Also gitignores .env files and adds python-dotenv as a dependency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
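A minimal sketch of how the CSV export URL for a sheet might be built. The function name `sheet_csv_url` is hypothetical, and the sketch reads GOOGLE_SHEET_ID from the process environment as a stand-in for the python-dotenv loading the commit describes; the actual download code in generate_diagram.py may differ.

```python
import os
from urllib.parse import urlencode


def sheet_csv_url(sheet_id, gid=None):
    """Build a Google Sheets CSV export URL (hypothetical helper).

    `gid` selects a non-default tab, mirroring the --sheet-gid option.
    """
    params = {"format": "csv"}
    if gid is not None:
        params["gid"] = str(gid)
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?{urlencode(params)}"


if __name__ == "__main__":
    # Assumption: the variable has already been loaded into the environment.
    sheet_id = os.environ.get("GOOGLE_SHEET_ID")
    if sheet_id:
        print(sheet_csv_url(sheet_id))
```

Keeping URL construction separate from the download makes the tab-selection logic easy to test without network access.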

- Read 'Name' field (renamed from 'Apps') and 'Gets data from' (renamed from 'Depends on') from the updated Google Sheet column layout
- Node labels now show Name / id / Owner on three lines
- 'Gets data from' edges now run A→B (data flows toward the source)
- 'Uses' edges are dotted bidirectional (A←··→B)
- Add a legend cluster explaining both edge types

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ntal layout

- Read 'Gets results from' (renamed from 'Gets data from') and 'Calls' (renamed from 'Uses') columns
- Solid arrows now run B→A for 'Gets results from' (provider → consumer), consistent with dotted 'Calls' arrows: both point from provider to consumer
- Default layout direction changed to TB for a wide horizontal output
- Legend updated to reflect new column names and corrected arrow directions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Owner is already shown in each node label, so the cluster boxes were cluttering the data flow layout. Nodes now float freely and are arranged purely by their Gets results from / Calls relationships. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- NCATS (red) and UI (pink): vivid/prominent as the main consumers
- DOGSLED (blue), DOGSURF (green), CATRAX (amber): distinct colors for the three main teams
- Core Components WG (purple), DINGO (cyan), Shepherd (lime), Retriever (brown): distinct from the main teams for specialized cross-team groups

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Dotted 'Calls' edges now run A→B (caller to callee)
- Legend node labels: Producer/Consumer for 'Gets results from', Component/Service for 'Calls'
- Legend edge labels: 'Results' and 'API call'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
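The edge directions settled on in these commits can be sketched as plain DOT output. This is an illustration only: `edge_line` is a hypothetical helper that emits DOT text directly, whereas the PR builds edges through the Graphviz Python bindings.

```python
def edge_line(column, row_id, ref_id):
    """Return one DOT edge statement for a reference found in `column` of row `row_id`.

    Hypothetical helper: the column names come from the sheet, the
    function itself is not part of generate_diagram.py.
    """
    if column == "Gets results from":
        # Row A lists provider B, so the solid arrow runs B -> A.
        return f'"{ref_id}" -> "{row_id}" [style=solid];'
    if column == "Calls":
        # Row A calls B, so the dotted arrow runs A -> B.
        return f'"{row_id}" -> "{ref_id}" [style=dotted];'
    raise ValueError(f"unknown relationship column: {column}")
```

Keeping the direction rule in one place avoids the repeated direction flips visible in the earlier commits.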

- 'External data sources' cylinder at the top feeds into kgx-storage-pipeline, marking where the solid-line data flow begins
- 'User' double-border oval at the bottom receives from ui, marking where results ultimately go

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

IDs prefixed with '~' in 'Gets results from' or 'Calls' columns are treated as planned connections. These render in gray: dashed for planned 'Gets results from', dotted for planned 'Calls'. Validation and JSON output also cover planned refs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
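A rough sketch of how the '~' prefix could be separated from the id while parsing a cell. The comma separator and the `(id, planned)` return shape are assumptions; the PR's own parse_id_list may differ on both.

```python
def parse_id_list(cell):
    """Split a cell into (component_id, planned) pairs.

    Assumptions: ids are comma-separated, and a leading '~' marks a
    planned connection.
    """
    refs = []
    for raw in cell.split(","):
        token = raw.strip()
        if not token:
            continue  # tolerate empty cells and stray commas
        planned = token.startswith("~")
        if planned:
            token = token[1:].strip()
        if token:
            refs.append((token, planned))
    return refs
```

Carrying the planned flag alongside the id lets validation check the bare id while rendering still picks the gray dashed or dotted style.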

Covers purpose, quick start, CSV format, all diagram conventions (node colours, edge types, ghost nodes, planned edges, terminal nodes), CLI options, repository layout, and a list of possible future improvements. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- validate() now treats unknown refs and duplicate ids as hard errors; main() raises ClickException so a broken sheet never silently renders.
- build_graph resolves refs via a resolve() helper that returns None for unknowns instead of falling back to the raw ref string, so missing components no longer materialize as phantom ghost nodes.
- Hardcoded entry/exit edges (External-sources → kgx-storage-pipeline and ui → User) are gated on the target id being in active_set or ghost_ids, so filtering or renaming those components no longer leaves default-styled phantom boxes in the diagram.
- Google Sheet download switches from urlretrieve to urlopen with a Content-Type check, so HTML login pages from private/missing sheets raise an error instead of being saved as components.csv.
- CSV is read with utf-8-sig so a UTF-8 BOM (e.g. from an Excel resave) no longer corrupts the first column header and triggers a KeyError on c['id'].

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
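The BOM handling and the hard-error validation can be illustrated with a small stand-alone sketch. The helper names `read_rows` and `validate` and the error strings are illustrative; the sketch checks only the 'Calls' column and returns a list of messages rather than raising ClickException as the PR does.

```python
import csv
import io


def read_rows(raw: bytes):
    """Decode CSV bytes with utf-8-sig so a leading BOM is stripped."""
    text = raw.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))


def validate(rows):
    """Return error messages for duplicate ids and unknown refs (case-insensitive)."""
    errors = []
    seen = {}
    for row in rows:
        key = row["id"].strip().lower()
        if key in seen:
            errors.append(f"duplicate id: {row['id']}")
        seen[key] = row
    for row in rows:
        for ref in (row.get("Calls") or "").split(","):
            ref = ref.strip().lstrip("~")  # planned refs are validated too
            if ref and ref.lower() not in seen:
                errors.append(f"{row['id']}: unknown ref {ref}")
    return errors
```

With plain utf-8 decoding the first header would come back as '\ufeffid', which is the KeyError the commit describes.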

- Encapsulate fallback-color state in ColorAssigner so repeated main() invocations in one process don't drift via a module-level counter.
- cleanup=True on dot.render so the extension-less duplicate of the dot source is removed (we already write {output_name}.dot explicitly).
- Drop rank='min' from the legend cluster: graphviz ignores rank on cluster subgraphs, so the attribute was misleading.
- Drop 'png' from --format choices and update the README: PNG is always produced, so listing it as a togglable option was confusing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
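One way the fallback-color state might be encapsulated, as a sketch. The constructor arguments and the `color_for` method name are assumptions; only the class name ColorAssigner comes from the commit.

```python
class ColorAssigner:
    """Hand out colors for owners, rotating through a fallback palette.

    All state lives on the instance, so a fresh ColorAssigner always
    starts the rotation from the first palette entry.
    """

    def __init__(self, known, palette):
        self._known = dict(known)
        self._palette = list(palette)
        self._assigned = {}  # unknown owner -> fallback color

    def color_for(self, owner):
        if owner in self._known:
            return self._known[owner]
        if owner not in self._assigned:
            index = len(self._assigned) % len(self._palette)
            self._assigned[owner] = self._palette[index]
        return self._assigned[owner]
```

Because nothing is stored at module level, two runs in the same process assign the same fallback colors to the same owners.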

Code:
- Replace stringly-typed dict rows with a Component dataclass so column typos raise KeyError at parse time, not midway through rendering.
- Extract index_by_id() helper used by both validate and build_graph (previously each built its own parallel id_lower_map).
- Sort components by id after load so .dot and .json diffs are stable across CSV row reorderings.
- Split build_graph into _compute_active_set, _compute_ghost_ids, _add_active_nodes, _add_ghost_nodes, _add_edges, _add_terminal_nodes, and _add_legend.
- Add text_color_for(), which picks black/white text from background luminance so dark fills (NCATS red, Retriever brown, etc.) stay readable.
- Component.display_name falls back to id when Name is empty.

Diagram:
- dpi=150 and splines=polyline for sharper, less-busy PNGs.
- Drop owner from node labels: it is already encoded by fill color and shown in a new HTML-table owner legend.
- Recolor planned edges to soft indigo (#7986CB) so they no longer blur with ghost-node gray borders (#999999).
- Expand the legend to cover all visual encodings the README documents: owner color swatches, Producer→Consumer / API-call / Planned edge styles, bold border = New in Refactor, (excluded) ghost node, and the cylinder / double-oval terminal shapes.

Ergonomics:
- Load .env from cwd first (standard dotenv behavior, walks up the tree), then from the script directory as a fallback. Document the search order in --google-sheet help text.
- Add __pycache__, *.pyc, .DS_Store to .gitignore.

README updated for new planned-edge color and to drop the obsolete "compact HTML-table legend" future-improvement bullet (now implemented).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
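A minimal sketch of choosing text color from background luminance. The weighting (the common 0.299/0.587/0.114 perceived-brightness formula) and the midpoint threshold are assumptions; the PR's text_color_for may use a different formula or cutoff.

```python
def text_color_for(fill_hex):
    """Return black or white text for a '#rrggbb' background color.

    Assumption: perceived brightness with a simple midpoint threshold.
    """
    value = fill_hex.lstrip("#")
    red, green, blue = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    brightness = 0.299 * red + 0.587 * green + 0.114 * blue
    return "#000000" if brightness > 127.5 else "#ffffff"
```

Deriving the font color from the fill keeps labels readable when new owner colors are added, without a second hand-maintained table.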

31 tests covering parse_id_list (tilde handling, whitespace, empty), ColorAssigner (known + unknown owners, palette rotation, state isolation across instances), text_color_for (black/white pick by luminance), index_by_id (case insensitivity), validate (clean, unknown ref, duplicate ids case-insensitive, case-mismatch as warning not error), Component (display_name fallback, all_refs), and load_components (sort order, UTF-8 BOM tolerance, empty Owner becomes "None"). Pytest is added as a dev-group dependency (PEP 735), discoverable via `uv sync --group dev && uv run pytest`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The OWNER_COLORS dict in generate_diagram.py is replaced by a sibling owner-colors.csv (two columns: owner, color) loaded at runtime via load_owner_colors(). Row order in the CSV doubles as legend order, so reordering rows reorders the legend without any Python edit. Add tests for the loader (parse, row-order preservation, whitespace trim, missing-file and missing-column ClickExceptions, and a smoke-test that the shipped CSV always loads). README points maintainers at the CSV for colour changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
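A sketch of what such a loader could look like. The split into `parse_owner_colors` and `load_owner_colors` is an assumption made to keep the parsing testable, and this version raises ValueError (or lets FileNotFoundError propagate) where the PR raises ClickException.

```python
import csv


def parse_owner_colors(lines):
    """Parse owner,color rows into a dict that preserves row order."""
    reader = csv.DictReader(lines)
    missing = {"owner", "color"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"owner-colors.csv is missing column(s): {sorted(missing)}")
    colors = {}
    for row in reader:
        owner = (row["owner"] or "").strip()
        if owner:
            colors[owner] = (row["color"] or "").strip()
    return colors


def load_owner_colors(path):
    """Read the CSV at `path`; a missing file surfaces as FileNotFoundError."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return parse_owner_colors(handle)
```

Since Python dicts keep insertion order, iterating the result yields owners in CSV row order, which is what lets the file double as the legend order.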

Cross-cutting infrastructure (Jaeger, etc.) that nearly every component calls produces long converging edges that obscure the actual data-flow structure. A new Ubiquitous boolean column in the sheet (TRUE/yes/1) flags such components to render as a small per-caller copy next to each caller rather than as a single central node.

- Component.ubiquitous parsed from the new column with a tolerant _parse_bool helper (TRUE/yes/y/1, case-insensitive).
- _compute_ghost_ids and _add_active_nodes skip ubiquitous components so no central or ghost copy is rendered.
- _add_edges emits a per-caller clone node (idempotent) with a synthetic id "{caller}__{target}" the first time a caller references a ubiquitous target, then routes the edge to it. The clone reuses the full styling (fill colour, font colour, border weight) so it reads as the same component.
- _emit_component_node factored out from _add_active_nodes for reuse by the clone path.
- write_json includes Ubiquitous so the JSON export stays a complete round-trip of the CSV columns.
- Legend gains an "Ubiquitous (cloned per caller)" entry.

Cheap layout knobs applied alongside (concentrate=true to merge parallel edges, splines=true for free routing, ranksep 1.0→0.5, nodesep 0.5→0.3). Together with the Jaeger duplication, this packs the diagram into roughly a third of its previous footprint.

16 new tests cover _parse_bool variants, the Ubiquitous column being parsed when present, gracefully defaulting to False when the column is missing (back-compat for older sheets), and the dataclass default.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
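The tolerant boolean parsing and the per-caller cloning can be sketched as follows. `parse_bool` approximates the _parse_bool helper named above; `route_calls` is a hypothetical simplification of what _add_edges does, operating on plain tuples instead of a Graphviz graph.

```python
def parse_bool(value):
    """Treat TRUE/yes/y/1 (any case, surrounding spaces ignored) as True."""
    return (value or "").strip().lower() in {"true", "yes", "y", "1"}


def route_calls(calls, ubiquitous_ids):
    """Rewrite (caller, target) pairs so ubiquitous targets get per-caller clones.

    Returns (clone_ids, edges). A clone id is "{caller}__{target}" and
    is created once per caller no matter how often it is referenced.
    """
    clone_ids = set()
    edges = []
    for caller, target in calls:
        if target in ubiquitous_ids:
            clone_id = f"{caller}__{target}"
            clone_ids.add(clone_id)  # set membership makes this idempotent
            edges.append((caller, clone_id))
        else:
            edges.append((caller, target))
    return sorted(clone_ids), edges
```

A missing Ubiquitous cell comes through as None or an empty string, so older sheets default to False without special handling.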

Remove Data source, User, Planned, New in Refactor, Excluded, and Ubiquitous entries — they're either self-evident or no longer needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>