Add Python/uv tool to generate Translator component dependency diagrams #9
Draft
gaurav wants to merge 45 commits into
Sets up a click-based CLI (generate_diagram.py) in translator-components-diagram/ that reads a components CSV, validates id references, and uses Graphviz to produce a dependency diagram. Nodes are clustered by owner team; solid arrows show hard dependencies, dashed arrows show optional "uses" relationships. Components outside the active filter appear as grayed ghost nodes. Also adds a root .gitignore that excludes all data/ directories (generated outputs and input CSV live there and are not checked in). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e Sheet Reads GOOGLE_SHEET_ID from a gitignored .env file in the script directory and downloads the sheet's CSV export to data/components.csv before processing. Supports --sheet-gid for selecting a non-default tab. Also gitignores .env files and adds python-dotenv as a dependency. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Read 'Name' field (renamed from 'Apps') and 'Gets data from' (renamed from 'Depends on') from the updated Google Sheet column layout
- Node labels now show Name / id / Owner on three lines
- 'Gets data from' edges now run A→B (data flows toward the source)
- 'Uses' edges are dotted bidirectional (A←··→B)
- Add a legend cluster explaining both edge types
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ntal layout
- Read 'Gets results from' (renamed from 'Gets data from') and 'Calls' (renamed from 'Uses') columns
- Solid arrows now run B→A for 'Gets results from' (provider → consumer), consistent with dotted 'Calls' arrows: both point from provider to consumer
- Default layout direction changed to TB for a wide horizontal output
- Legend updated to reflect new column names and corrected arrow directions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Owner is already shown in each node label, so the cluster boxes were cluttering the data flow layout. Nodes now float freely and are arranged purely by their Gets results from / Calls relationships. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- NCATS (red) and UI (pink): vivid/prominent as the main consumers
- DOGSLED (blue), DOGSURF (green), CATRAX (amber): distinct colors for the three main teams
- Core Components WG (purple), DINGO (cyan), Shepherd (lime), Retriever (brown): distinct from the main teams for specialized cross-team groups
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Dotted 'Calls' edges now run A→B (caller to callee)
- Legend node labels: Producer/Consumer for 'Gets results from', Component/Service for 'Calls'
- Legend edge labels: 'Results' and 'API call'
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- 'External data sources' cylinder at the top feeds into kgx-storage-pipeline, marking where the solid-line data flow begins
- 'User' double-border oval at the bottom receives from ui, marking where results ultimately go
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
IDs prefixed with '~' in 'Gets results from' or 'Calls' columns are treated as planned connections. These render in gray: dashed for planned 'Gets results from', dotted for planned 'Calls'. Validation and JSON output also cover planned refs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
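A minimal sketch of how the tilde convention could be parsed, assuming cells hold comma-separated ids. The `Ref` type and this exact `parse_id_list` signature are illustrative assumptions, not the PR's code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """One reference from a 'Gets results from' or 'Calls' cell (hypothetical type)."""
    target: str
    planned: bool


def parse_id_list(cell: str) -> list[Ref]:
    """Split a comma-separated cell into refs; a leading '~' marks a planned connection."""
    refs = []
    for raw in cell.split(","):
        token = raw.strip()
        if not token:
            continue  # skip empty entries such as trailing commas
        planned = token.startswith("~")
        target = token[1:].strip() if planned else token
        if target:
            refs.append(Ref(target=target, planned=planned))
    return refs
```

Keeping the `planned` flag on each reference lets validation, rendering, and JSON export all treat planned and implemented connections through the same list.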
Covers purpose, quick start, CSV format, all diagram conventions (node colours, edge types, ghost nodes, planned edges, terminal nodes), CLI options, repository layout, and a list of possible future improvements. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- validate() now treats unknown refs and duplicate ids as hard errors; main() raises ClickException so a broken sheet never silently renders.
- build_graph resolves refs via a resolve() helper that returns None for unknowns instead of falling back to the raw ref string, so missing components no longer materialize as phantom ghost nodes.
- Hardcoded entry/exit edges (External-sources → kgx-storage-pipeline and ui → User) are gated on the target id being in active_set or ghost_ids, so filtering or renaming those components no longer leaves default-styled phantom boxes in the diagram.
- Google Sheet download switches from urlretrieve to urlopen with a Content-Type check, so HTML login pages from private/missing sheets raise an error instead of being saved as components.csv.
- CSV is read with utf-8-sig so a UTF-8 BOM (e.g. from an Excel resave) no longer corrupts the first column header and triggers a KeyError on c['id'].
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
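The BOM problem can be reproduced with the standard library alone. This is a small illustration with made-up sample data, not the PR's loader; `read_rows` is a hypothetical helper:

```python
import csv
import io

# Bytes as a spreadsheet tool might save them: UTF-8 with a leading BOM.
raw = "id,Name\nui,User Interface\n".encode("utf-8-sig")


def read_rows(data: bytes, encoding: str) -> list[dict]:
    """Decode CSV bytes with the given encoding and return the rows as dicts."""
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.DictReader(text))


plain = read_rows(raw, "utf-8")      # the BOM stays glued to the first header
clean = read_rows(raw, "utf-8-sig")  # the BOM is stripped during decoding

print(list(plain[0].keys()))  # first key carries '\ufeff', so row['id'] would raise KeyError
print(list(clean[0].keys()))
```

Because `utf-8-sig` also decodes files that have no BOM, it is a safe default for CSVs of unknown origin.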
- Encapsulate fallback-color state in ColorAssigner so repeated main()
invocations in one process don't drift via a module-level counter.
- cleanup=True on dot.render so the extension-less duplicate of the
dot source is removed (we already write {output_name}.dot explicitly).
- Drop rank='min' from the legend cluster — graphviz ignores rank on
cluster subgraphs, so the attribute was misleading.
- Drop 'png' from --format choices and update the README: PNG is always
produced, so listing it as a togglable option was confusing.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
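A rough sketch of what encapsulating the fallback-color counter could look like. The constructor arguments, the `color_for` method name, and the palette values are assumptions for illustration; the palette is assumed to be non-empty:

```python
class ColorAssigner:
    """Hands out colours per owner; fallback state lives on the instance."""

    def __init__(self, known: dict[str, str], fallback_palette: list[str]):
        self._known = dict(known)
        self._palette = list(fallback_palette)  # assumed non-empty
        self._assigned: dict[str, str] = {}
        self._next = 0  # per-instance counter instead of a module-level global

    def color_for(self, owner: str) -> str:
        """Return the configured colour, or a stable fallback for unknown owners."""
        if owner in self._known:
            return self._known[owner]
        if owner not in self._assigned:
            # rotate through the palette, wrapping around when it runs out
            self._assigned[owner] = self._palette[self._next % len(self._palette)]
            self._next += 1
        return self._assigned[owner]
```

Two instances never share the counter, so a second `main()` call in the same process starts its fallback rotation from the first palette entry again.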
Code:
- Replace stringly-typed dict rows with a Component dataclass so column typos KeyError at parse time, not midway through rendering.
- Extract index_by_id() helper used by both validate and build_graph (previously each built its own parallel id_lower_map).
- Sort components by id after load so .dot and .json diffs are stable across CSV row reorderings.
- Split build_graph into _compute_active_set, _compute_ghost_ids, _add_active_nodes, _add_ghost_nodes, _add_edges, _add_terminal_nodes, and _add_legend.
- Add text_color_for(), which picks black/white text from background luminance so dark fills (NCATS red, Retriever brown, etc.) stay readable.
- Component.display_name falls back to id when Name is empty.
Diagram:
- dpi=150 and splines=polyline for sharper, less-busy PNGs.
- Drop owner from node labels: it is already encoded by fill color and shown in a new HTML-table owner legend.
- Recolor planned edges to soft indigo (#7986CB) so they no longer blur with ghost-node gray borders (#999999).
- Expand the legend to cover all visual encodings the README documents: owner color swatches, Producer→Consumer / API-call / Planned edge styles, bold border = New in Refactor, (excluded) ghost node, and the cylinder / double-oval terminal shapes.
Ergonomics:
- Load .env from cwd first (standard dotenv behavior, walks up the tree), then from the script directory as a fallback. Document the search order in --google-sheet help text.
- Add __pycache__, *.pyc, .DS_Store to .gitignore.
README updated for new planned-edge color and to drop the obsolete "compact HTML-table legend" future-improvement bullet (now implemented).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
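One plausible way to pick text color from background luminance is sketched below. The Rec. 601 luma weights and the 128 cut-off are assumptions; the PR does not state which formula or threshold `text_color_for()` uses:

```python
def text_color_for(fill_hex: str) -> str:
    """Return 'black' or 'white' based on the perceived brightness of a #RRGGBB fill."""
    value = fill_hex.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    # Rec. 601 luma weights; the 128 cut-off is an assumed threshold.
    brightness = 0.299 * r + 0.587 * g + 0.114 * b
    return "black" if brightness >= 128 else "white"
```

Weighting green most heavily reflects that the eye is most sensitive to it, so a saturated red fill still gets white text even though one of its channels is at maximum.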
31 tests covering parse_id_list (tilde handling, whitespace, empty), ColorAssigner (known + unknown owners, palette rotation, state isolation across instances), text_color_for (black/white pick by luminance), index_by_id (case insensitivity), validate (clean, unknown ref, duplicate ids case-insensitive, case-mismatch as warning not error), Component (display_name fallback, all_refs), and load_components (sort order, UTF-8 BOM tolerance, empty Owner becomes "None"). Pytest is added as a dev-group dependency (PEP 735), discoverable via `uv sync --group dev && uv run pytest`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
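The validation rules those tests exercise might look roughly like this. It is a simplified sketch that works on `(id, refs)` pairs and returns message lists; the PR's `validate()` works on `Component` objects and surfaces errors through `ClickException`, and its exact signatures are not shown here:

```python
def index_by_id(ids: list[str]) -> dict[str, str]:
    """Map lower-cased id -> id as written in the sheet (first occurrence wins)."""
    index: dict[str, str] = {}
    for cid in ids:
        index.setdefault(cid.lower(), cid)
    return index


def validate(components: list[tuple[str, list[str]]]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a list of (component id, referenced ids)."""
    errors: list[str] = []
    warnings: list[str] = []
    seen: set[str] = set()
    for cid, _ in components:
        if cid.lower() in seen:
            errors.append(f"duplicate id: {cid}")  # duplicates compare case-insensitively
        seen.add(cid.lower())
    index = index_by_id([cid for cid, _ in components])
    for cid, refs in components:
        for ref in refs:
            canonical = index.get(ref.lower())
            if canonical is None:
                errors.append(f"{cid}: unknown reference {ref}")
            elif canonical != ref:
                # resolvable, but spelled with different casing: warn only
                warnings.append(f"{cid}: {ref} differs in case from {canonical}")
    return errors, warnings
```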
The OWNER_COLORS dict in generate_diagram.py is replaced by a sibling owner-colors.csv (two columns: owner, color) loaded at runtime via load_owner_colors(). Row order in the CSV doubles as legend order, so reordering rows reorders the legend without any Python edit. Add tests for the loader (parse, row-order preservation, whitespace trim, missing-file and missing-column ClickExceptions, and a smoke-test that the shipped CSV always loads). README points maintainers at the CSV for colour changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
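A sketch of an order-preserving loader for a two-column owner/color CSV. It reads from any text stream and raises `ValueError`, whereas the PR's `load_owner_colors()` reads the sibling file and raises `ClickException`; the sample rows are made up:

```python
import csv
import io


def load_owner_colors(stream) -> dict[str, str]:
    """Read owner,color rows; dict insertion order doubles as legend order."""
    reader = csv.DictReader(stream)
    missing = {"owner", "color"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing column(s): {', '.join(sorted(missing))}")
    colors: dict[str, str] = {}
    for row in reader:
        owner = (row["owner"] or "").strip()
        color = (row["color"] or "").strip()
        if owner:
            colors[owner] = color  # whitespace around both cells is trimmed
    return colors


sample = io.StringIO("owner,color\n UI , #FFC0CB \nNCATS,#FF0000\n")
print(load_owner_colors(sample))
```

Since Python dicts keep insertion order, reordering rows in the CSV is enough to reorder whatever iterates over the result.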
Cross-cutting infrastructure (Jaeger, etc.) that nearly every component
calls produces long converging edges that obscure the actual data-flow
structure. A new Ubiquitous boolean column in the sheet (TRUE/yes/1)
flags such components to render as a small per-caller copy next to each
caller rather than as a single central node.
- Component.ubiquitous parsed from the new column with a tolerant
_parse_bool helper (TRUE/yes/y/1, case-insensitive).
- _compute_ghost_ids and _add_active_nodes skip ubiquitous components
so no central or ghost copy is rendered.
- _add_edges emits a per-caller clone node (idempotent) with a synthetic
id "{caller}__{target}" the first time a caller references a
ubiquitous target, then routes the edge to it. The clone reuses the
full styling (fill colour, font colour, border weight) so it reads as
the same component.
- _emit_component_node factored out from _add_active_nodes for reuse by
the clone path.
- write_json includes Ubiquitous so the JSON export stays a complete
round-trip of the CSV columns.
- Legend gains an "Ubiquitous (cloned per caller)" entry.
Cheap layout knobs applied alongside (concentrate=true to merge parallel
edges, splines=true for free routing, ranksep 1.0→0.5, nodesep 0.5→0.3)
— together with the Jaeger duplication this packs the diagram into
roughly a third of its previous footprint.
16 new tests cover _parse_bool variants, the Ubiquitous column being
parsed when present, gracefully defaulting to False when the column is
missing (back-compat for older sheets), and the dataclass default.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Remove Data source, User, Planned, New in Refactor, Excluded, and Ubiquitous entries — they're either self-evident or no longer needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Nodes sharing the same "Part of" value are wrapped in a dotted-border cluster subgraph with the group name as its label. Ubiquitous components are excluded from grouping (they render as per-caller clones). The field is also exported to components.json. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
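To illustrate the grouping, the sketch below emits DOT text for one cluster subgraph per "Part of" value. It builds strings directly rather than using the graphviz Python package the tool relies on, the function name is hypothetical, and group names are assumed to contain no double quotes:

```python
from collections import defaultdict


def cluster_blocks(part_of: dict[str, str]) -> list[str]:
    """Return one DOT 'subgraph cluster_*' block per non-empty group name."""
    groups: dict[str, list[str]] = defaultdict(list)
    for node_id, group in part_of.items():
        if group:  # components with no group stay outside any cluster
            groups[group].append(node_id)
    blocks = []
    for index, (group, members) in enumerate(sorted(groups.items())):
        body = "; ".join(f'"{m}"' for m in sorted(members))
        # Graphviz only draws a box for subgraphs whose name starts with "cluster"
        blocks.append(
            f'subgraph cluster_{index} {{ label="{group}"; style=dotted; {body}; }}'
        )
    return blocks
```

Ubiquitous components would be filtered out of `part_of` before this step, since they render as per-caller clones.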
Each component can now declare external entities in a new "Externals" column using standard CSV syntax. A '<' prefix marks a data source (rendered as a cylinder at rank=min); '>' marks a sink (double-oval at rank=max). Multiple components can reference the same external name — one node is emitted and one edge per referencing component is drawn. Externals are styled in amber (#FFE082) with a bold border (penwidth=2.5) and larger font (13pt) so they stand out as the diagram's entry/exit tier. Removes the hardcoded ENTRY_TARGET / EXIT_SOURCE constants and _add_terminal_nodes in favour of the new _add_external_nodes_and_edges. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
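A sketch of how an Externals cell might be split into sources and sinks, using the `csv` module so that quoted names can contain commas. The function name and return shape are assumptions, and entries without a `<` or `>` prefix are simply ignored here:

```python
import csv


def parse_externals(cell: str) -> tuple[list[str], list[str]]:
    """Split an Externals cell into (sources, sinks) using CSV quoting rules."""
    sources: list[str] = []
    sinks: list[str] = []
    if not cell.strip():
        return sources, sinks
    for raw in next(csv.reader([cell], skipinitialspace=True)):
        token = raw.strip()
        name = token[1:].strip()
        if token.startswith("<") and name:
            sources.append(name)  # data source: cylinder at rank=min
        elif token.startswith(">") and name:
            sinks.append(name)    # sink: double-oval at rank=max
    return sources, sinks
```

Deduplicating names across components (one node, one edge per referencing component) would then be a matter of collecting the returned names into a set keyed by external name.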
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Renders the label as an HTML table cell with a #555555 background and white 13pt bold text, matching the cluster border color. This gives each group a clear header tab at the top of its bounding box. Graphviz cluster labels don't support rotation, so a left-edge label isn't achievable without complex workarounds. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pins the owner-color legend to the bottom of the diagram alongside any sink-external nodes. The invisible side-by-side ordering edge to the edge-style legend cluster is removed so the owner legend can float to wherever the layout engine places it (typically the right side). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Incoming externals (<) are already pinned to rank=min and outgoing (>) to rank=max. _add_external_nodes_and_edges now returns the sink node IDs so build_graph can add invisible constraint=false ordering edges from each sink external to _leg_owners, nudging the owner legend to the right of the sink nodes within the shared rank=max row. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Declaration order in the DOT source doesn't affect dot layout; only rank constraints do. Pins _leg_owners (owner colors) and _leg_p (edge-style examples) to rank=max so both clusters stay at the bottom. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pinning only _leg_p left _leg_c/_leg_a/_leg_b free, causing the legend cluster to span multiple ranks and stretch vertically. Pinning all four keeps them on the same rank row. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds rank=same subgraphs inside cluster_legend to lock each example pair (Producer→Consumer, Component→Service) onto its own horizontal row, with an invisible ordering edge keeping row 1 above row 2. Removes the four edge-legend nodes from the rank=max pin — they no longer need it since the internal rank constraints keep them compact. Only the owner legend stays pinned to rank=max. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a third row to cluster_legend showing the two external-entity shapes: a cylinder (data source, rank=min) and a double-oval (user/agent, rank=max), styled identically to the real external nodes (amber fill, penwidth 2.5). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Components with Hide=TRUE are excluded from the active set, never rendered as ghost nodes, and have all edges to/from them dropped. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
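The edge-dropping rule amounts to a filter over edge pairs. A minimal sketch, with a hypothetical helper name and edges modelled as `(source, target)` tuples:

```python
def drop_hidden(edges: list[tuple[str, str]], hidden: set[str]) -> list[tuple[str, str]]:
    """Remove every edge that starts or ends at a hidden component id."""
    return [(src, dst) for src, dst in edges if src not in hidden and dst not in hidden]
```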
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… the same nodes Fixes visual corruption where concentrate=true caused dotted edges to overwrite solid ones between the same node pair. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Planned/in-development "Gets results from" edges are now solid red, and planned "Calls" edges are dotted red, replacing the previous dashed/dotted indigo that was hard to distinguish from normal edges. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dashed lines are easier to distinguish from solid "Gets results from" edges than the previous widely-spaced dots. Updates both the diagram edges (implemented and planned) and the legend example. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
concentrate=true can cause mixed solid/dashed edges between nearby nodes to render incorrectly merged (e.g. solid edge visually branching off a dashed edge). --no-concentrate disables this behaviour; --concentrate (the default) preserves the existing layout. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mixed solid/dashed edges render more correctly without concentrate, so it is now off by default. --concentrate can still be passed to enable it; --no-concentrate is retained as a no-op for backward compatibility. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents script layout with line-number references, CSV column → Component field mapping, and "I want to change X" navigation patterns so quick edits don't require reading the full 891-line file. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a third line to component labels for components hosted outside ITRB (the default). RENCI shows as "Hosted at: RENCI 🌐", Local as "Hosted at: Local 💻", and Unknown as "Hosted at: Unknown ❓". Also notes in CLAUDE.md that the diagram script should be run by the user, not Claude. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
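A sketch of the label rule, assuming the first two label lines are the component's name and id. The function name, the `HOSTING_ICONS` mapping, and the handling of hosts other than the three listed are illustrative assumptions:

```python
HOSTING_ICONS = {"RENCI": "🌐", "Local": "💻", "Unknown": "❓"}


def node_label(name: str, component_id: str, hosted_at: str = "ITRB") -> str:
    """Build a node label; a 'Hosted at' line appears only for non-ITRB hosts."""
    lines = [name, component_id]
    if hosted_at and hosted_at != "ITRB":
        icon = HOSTING_ICONS.get(hosted_at, "")
        # rstrip drops the trailing space when a host has no icon
        lines.append(f"Hosted at: {hosted_at} {icon}".rstrip())
    return "\n".join(lines)
```

Treating ITRB as the silent default keeps most labels at two lines, so only the exceptions draw attention.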
Adds Hosted at column to the data model table, a "Change node label format" common-change entry, and a note to not run the diagram script (the user runs it). Also creates a top-level CLAUDE.md describing the repo structure. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
WIP