
Add Python/uv tool to generate Translator component dependency diagrams #9

Draft
gaurav wants to merge 45 commits into main from add-translator-components-diagrams-code

gaurav commented May 26, 2026

Collaborator

Sets up a click-based CLI (generate_diagram.py) in translator-components-diagram/ that reads a components CSV, validates id references, and uses Graphviz to produce a dependency diagram. Nodes are clustered by owner team; solid arrows show hard dependencies, dashed arrows show optional "uses" relationships. Components outside the active filter appear as grayed ghost nodes.

Also adds a root .gitignore that excludes all data/ directories (generated outputs and input CSV live there and are not checked in).

WIP

gaurav and others added 30 commits May 25, 2026 20:37
Sets up a click-based CLI (generate_diagram.py) in translator-components-diagram/
that reads a components CSV, validates id references, and uses Graphviz to produce
a dependency diagram. Nodes are clustered by owner team; solid arrows show hard
dependencies, dashed arrows show optional "uses" relationships. Components outside
the active filter appear as grayed ghost nodes.

Also adds a root .gitignore that excludes all data/ directories (generated outputs
and input CSV live there and are not checked in).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e Sheet

Reads GOOGLE_SHEET_ID from a gitignored .env file in the script directory
and downloads the sheet's CSV export to data/components.csv before processing.
Supports --sheet-gid for selecting a non-default tab.

Also gitignores .env files and adds python-dotenv as a dependency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
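
A minimal sketch of how the sheet's CSV export URL could be assembled from `GOOGLE_SHEET_ID` and an optional `--sheet-gid` value. The helper name `sheet_csv_url` and the exact URL shape are assumptions for illustration; the commit does not show the URL the script actually requests.

```python
from typing import Optional
from urllib.parse import urlencode


def sheet_csv_url(sheet_id: str, gid: Optional[str] = None) -> str:
    """Build a CSV-export URL for a Google Sheet (hypothetical helper).

    Assumes the commonly used /export?format=csv endpoint; `gid`
    selects a non-default tab when given.
    """
    params = {"format": "csv"}
    if gid is not None:
        params["gid"] = gid
    return (
        f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?"
        + urlencode(params)
    )
```

In the real script the id would come from the environment (for example `os.environ["GOOGLE_SHEET_ID"]` after python-dotenv has loaded `.env`), and the response would be written to data/components.csv.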
- Read 'Name' field (renamed from 'Apps') and 'Gets data from' (renamed
  from 'Depends on') from the updated Google Sheet column layout
- Node labels now show Name / id / Owner on three lines
- 'Gets data from' edges now run A→B (data flows toward the source)
- 'Uses' edges are dotted bidirectional (A←··→B)
- Add a legend cluster explaining both edge types

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ntal layout

- Read 'Gets results from' (renamed from 'Gets data from') and 'Calls'
  (renamed from 'Uses') columns
- Solid arrows now run B→A for 'Gets results from' (provider → consumer),
  consistent with dotted 'Calls' arrows — both point from provider to consumer
- Default layout direction changed to TB for a wide horizontal output
- Legend updated to reflect new column names and corrected arrow directions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Owner is already shown in each node label, so the cluster boxes were
cluttering the data flow layout. Nodes now float freely and are arranged
purely by their Gets results from / Calls relationships.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- NCATS (red) and UI (pink): vivid/prominent as the main consumers
- DOGSLED (blue), DOGSURF (green), CATRAX (amber): distinct colors for the three main teams
- Core Components WG (purple), DINGO (cyan), Shepherd (lime), Retriever (brown):
  distinct from the main teams for specialized cross-team groups

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Dotted 'Calls' edges now run A→B (caller to callee)
- Legend node labels: Producer/Consumer for 'Gets results from',
  Component/Service for 'Calls'
- Legend edge labels: 'Results' and 'API call'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- 'External data sources' cylinder at the top feeds into kgx-storage-pipeline,
  marking where the solid-line data flow begins
- 'User' double-border oval at the bottom receives from ui,
  marking where results ultimately go

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
IDs prefixed with '~' in 'Gets results from' or 'Calls' columns are
treated as planned connections. These render in gray: dashed for planned
'Gets results from', dotted for planned 'Calls'. Validation and JSON
output also cover planned refs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
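
A rough sketch of the tilde convention described above. `parse_id_list` is named in a later commit's tests, but its signature, the comma separator, and the `(id, planned)` return shape used here are assumptions.

```python
from typing import List, Tuple


def parse_id_list(cell: str) -> List[Tuple[str, bool]]:
    """Split a comma-separated cell into (id, planned) pairs.

    An id prefixed with '~' is treated as a planned connection.
    The comma separator is an assumption about the sheet format.
    """
    refs = []
    for raw in cell.split(","):
        token = raw.strip()
        planned = token.startswith("~")
        ref = token.lstrip("~").strip()
        if not ref:
            continue  # tolerate empty cells, stray commas, a lone '~'
        refs.append((ref, planned))
    return refs
```

With this shape, the edge-drawing code can choose the gray planned style whenever the second element of a pair is `True`.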
Covers purpose, quick start, CSV format, all diagram conventions (node
colours, edge types, ghost nodes, planned edges, terminal nodes), CLI
options, repository layout, and a list of possible future improvements.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- validate() now treats unknown refs and duplicate ids as hard errors;
  main() raises ClickException so a broken sheet never silently renders.
- build_graph resolves refs via a resolve() helper that returns None for
  unknowns instead of falling back to the raw ref string, so missing
  components no longer materialize as phantom ghost nodes.
- Hardcoded entry/exit edges (External-sources → kgx-storage-pipeline
  and ui → User) are gated on the target id being in active_set or
  ghost_ids, so filtering or renaming those components no longer leaves
  default-styled phantom boxes in the diagram.
- Google Sheet download switches from urlretrieve to urlopen with a
  Content-Type check, so HTML login pages from private/missing sheets
  raise an error instead of being saved as components.csv.
- CSV is read with utf-8-sig so a UTF-8 BOM (e.g. from an Excel resave)
  no longer corrupts the first column header and causes a KeyError on c['id'].

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
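
The last two bullets above can be illustrated without touching the network. This is a sketch under assumptions: `check_is_csv` and `read_rows` are hypothetical names, `ValueError` stands in for whatever exception the script raises, and the `text/csv` check is a guess at what the Content-Type test looks like.

```python
import csv
import io
from typing import Dict, List


def check_is_csv(content_type: str) -> None:
    """Reject responses that are not CSV (e.g. an HTML login page)."""
    if "text/csv" not in content_type.lower():
        raise ValueError(f"expected CSV, got Content-Type {content_type!r}")


def read_rows(raw: bytes) -> List[Dict[str, str]]:
    """Decode with utf-8-sig so a leading BOM is stripped, then parse."""
    text = raw.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))


# A file resaved by Excel may start with the bytes EF BB BF.
rows = read_rows(b"\xef\xbb\xbfid,Name\r\nui,UI\r\n")
first_id = rows[0]["id"]  # plain 'utf-8' would yield the key '\ufeffid'
```

Decoding the same bytes with plain `utf-8` leaves the BOM glued to the first header, which is exactly the `KeyError` on `c['id']` that the commit describes.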
- Encapsulate fallback-color state in ColorAssigner so repeated main()
  invocations in one process don't drift via a module-level counter.
- cleanup=True on dot.render so the extension-less duplicate of the
  dot source is removed (we already write {output_name}.dot explicitly).
- Drop rank='min' from the legend cluster — graphviz ignores rank on
  cluster subgraphs, so the attribute was misleading.
- Drop 'png' from --format choices and update the README: PNG is always
  produced, so listing it as a togglable option was confusing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
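
A small sketch of the idea behind the first bullet: keeping the fallback-color cursor on an instance instead of in a module-level counter. Only the class name `ColorAssigner` comes from the commit; the constructor arguments and the `color_for` method are assumptions.

```python
from typing import Dict, List


class ColorAssigner:
    """Hands out colors per owner; fallback state lives on the instance."""

    def __init__(self, known: Dict[str, str], palette: List[str]):
        if not palette:
            raise ValueError("palette must not be empty")
        self._known = dict(known)
        self._palette = list(palette)
        self._assigned: Dict[str, str] = {}  # unknown owner -> fallback color

    def color_for(self, owner: str) -> str:
        if owner in self._known:
            return self._known[owner]
        if owner not in self._assigned:
            # Rotate through the palette in order of first appearance.
            idx = len(self._assigned) % len(self._palette)
            self._assigned[owner] = self._palette[idx]
        return self._assigned[owner]
```

Because each call to main() would build a fresh `ColorAssigner`, the first unknown owner always receives the first palette entry, regardless of earlier runs in the same process.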
Code:
- Replace stringly-typed dict rows with a Component dataclass so column
  typos raise a KeyError at parse time, not midway through rendering.
- Extract index_by_id() helper used by both validate and build_graph
  (previously each built its own parallel id_lower_map).
- Sort components by id after load so .dot and .json diffs are stable
  across CSV row reorderings.
- Split build_graph into _compute_active_set, _compute_ghost_ids,
  _add_active_nodes, _add_ghost_nodes, _add_edges, _add_terminal_nodes,
  and _add_legend.
- Add text_color_for() — picks black/white text from background luminance
  so dark fills (NCATS red, Retriever brown, etc.) stay readable.
- Component.display_name falls back to id when Name is empty.

Diagram:
- dpi=150 and splines=polyline for sharper, less-busy PNGs.
- Drop owner from node labels — already encoded by fill color, and shown
  in a new HTML-table owner legend.
- Recolor planned edges to soft indigo (#7986CB) so they no longer blur
  with ghost-node gray borders (#999999).
- Expand the legend to cover all visual encodings the README documents:
  owner color swatches, Producer→Consumer / API-call / Planned edge
  styles, bold border = New in Refactor, (excluded) ghost node, and the
  cylinder / double-oval terminal shapes.

Ergonomics:
- load .env from cwd first (standard dotenv behavior, walks up the tree)
  then from the script directory as a fallback. Document the search
  order in --google-sheet help text.
- Add __pycache__, *.pyc, .DS_Store to .gitignore.

README updated for new planned-edge color and to drop the obsolete
"compact HTML-table legend" future-improvement bullet (now implemented).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
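
One way `text_color_for()` could pick a readable font color is sketched below. The luma weights (Rec. 601) and the threshold of 128 are assumptions; the commit only says the choice is based on background luminance.

```python
def text_color_for(fill_hex: str) -> str:
    """Return 'black' or 'white' for text drawn on a '#RRGGBB' fill."""
    h = fill_hex.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    # Weighted sum approximating perceived brightness on a 0-255 scale.
    luminance = 0.299 * r + 0.587 * g + 0.114 * b
    return "black" if luminance >= 128 else "white"
```

Light fills such as the amber `#FFE082` get black text, while dark fills get white text, which matches the intent of keeping labels on dark owner colors readable.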
31 tests covering parse_id_list (tilde handling, whitespace, empty),
ColorAssigner (known + unknown owners, palette rotation, state isolation
across instances), text_color_for (black/white pick by luminance),
index_by_id (case insensitivity), validate (clean, unknown ref,
duplicate ids case-insensitive, case-mismatch as warning not error),
Component (display_name fallback, all_refs), and load_components
(sort order, UTF-8 BOM tolerance, empty Owner becomes "None").

Pytest is added as a dev-group dependency (PEP 735), discoverable via
`uv sync --group dev && uv run pytest`.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The OWNER_COLORS dict in generate_diagram.py is replaced by a
sibling owner-colors.csv (two columns: owner, color) loaded at
runtime via load_owner_colors(). Row order in the CSV doubles as
legend order, so reordering rows reorders the legend without any
Python edit.

Add tests for the loader (parse, row-order preservation, whitespace
trim, missing-file and missing-column ClickExceptions, and a
smoke-test that the shipped CSV always loads). README points
maintainers at the CSV for colour changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
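
A hedged sketch of a loader in the spirit of `load_owner_colors()`. It reads from any text stream rather than a path and raises `ValueError` where the commit says the real code raises a ClickException; both choices are simplifications so the example needs only the standard library. The hex values in the usage are made up.

```python
import csv
import io
from typing import Dict, TextIO


def load_owner_colors(stream: TextIO) -> Dict[str, str]:
    """Parse an owner,color CSV, preserving row order for the legend."""
    reader = csv.DictReader(stream)
    fields = reader.fieldnames or []
    if "owner" not in fields or "color" not in fields:
        raise ValueError("owner-colors CSV needs 'owner' and 'color' columns")
    colors: Dict[str, str] = {}
    for row in reader:
        owner = (row["owner"] or "").strip()
        if owner:
            colors[owner] = (row["color"] or "").strip()
    return colors  # dicts keep insertion order, so this is legend order


# Illustrative data only; the shipped owner-colors.csv is not shown here.
colors = load_owner_colors(io.StringIO("owner,color\nNCATS , #E53935\nUI,#F06292\n"))
```

Since a plain `dict` preserves insertion order, iterating over the result yields owners in CSV row order, which is what lets the file double as the legend ordering.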
Cross-cutting infrastructure (Jaeger, etc.) that nearly every component
calls produces long converging edges that obscure the actual data-flow
structure. A new Ubiquitous boolean column in the sheet (TRUE/yes/1)
flags such components to render as a small per-caller copy next to each
caller rather than as a single central node.

- Component.ubiquitous parsed from the new column with a tolerant
  _parse_bool helper (TRUE/yes/y/1, case-insensitive).
- _compute_ghost_ids and _add_active_nodes skip ubiquitous components
  so no central or ghost copy is rendered.
- _add_edges emits a per-caller clone node (idempotent) with a synthetic
  id "{caller}__{target}" the first time a caller references a
  ubiquitous target, then routes the edge to it. The clone reuses the
  full styling (fill colour, font colour, border weight) so it reads as
  the same component.
- _emit_component_node factored out from _add_active_nodes for reuse by
  the clone path.
- write_json includes Ubiquitous so the JSON export stays a complete
  round-trip of the CSV columns.
- Legend gains an "Ubiquitous (cloned per caller)" entry.

Cheap layout knobs applied alongside (concentrate=true to merge parallel
edges, splines=true for free routing, ranksep 1.0→0.5, nodesep 0.5→0.3)
— together with the Jaeger duplication this packs the diagram into
roughly a third of its previous footprint.

16 new tests cover _parse_bool variants, the Ubiquitous column being
parsed when present, gracefully defaulting to False when the column is
missing (back-compat for older sheets), and the dataclass default.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
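
The boolean parsing and per-caller cloning described above might look roughly like this. The function names and the edge-list representation are assumptions; only the accepted truthy spellings and the `{caller}__{target}` id pattern come from the commit. The id `svc-a` is hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

_TRUTHY = {"true", "yes", "y", "1"}


def parse_bool(value: Optional[str]) -> bool:
    """Tolerant flag parser: TRUE/yes/y/1, case-insensitive; else False."""
    return (value or "").strip().lower() in _TRUTHY


def route_calls(
    calls: Iterable[Tuple[str, str]], ubiquitous: Set[str]
) -> Tuple[List[Tuple[str, str]], Set[str]]:
    """Redirect edges that target a ubiquitous component to a per-caller clone."""
    edges: List[Tuple[str, str]] = []
    clones: Set[str] = set()
    for caller, target in calls:
        if target in ubiquitous:
            clone = f"{caller}__{target}"
            clones.add(clone)  # a set keeps clone creation idempotent
            edges.append((caller, clone))
        else:
            edges.append((caller, target))
    return edges, clones


edges, clones = route_calls(
    [("svc-a", "jaeger"), ("ui", "jaeger"), ("ui", "svc-a")], {"jaeger"}
)
```

Each caller ends up with its own short edge to a local copy of the ubiquitous node, so no long edges converge on a single central node.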
Remove Data source, User, Planned, New in Refactor, Excluded, and
Ubiquitous entries — they're either self-evident or no longer needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Nodes sharing the same "Part of" value are wrapped in a dotted-border
cluster subgraph with the group name as its label. Ubiquitous components
are excluded from grouping (they render as per-caller clones). The field
is also exported to components.json.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Each component can now declare external entities in a new "Externals"
column using standard CSV syntax. A '<' prefix marks a data source
(rendered as a cylinder at rank=min); '>' marks a sink (double-oval at
rank=max). Multiple components can reference the same external name —
one node is emitted and one edge per referencing component is drawn.

Externals are styled in amber (#FFE082) with a bold border (penwidth=2.5)
and larger font (13pt) so they stand out as the diagram's entry/exit tier.

Removes the hardcoded ENTRY_TARGET / EXIT_SOURCE constants and
_add_terminal_nodes in favour of the new _add_external_nodes_and_edges.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
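
A sketch of how an "Externals" cell could be split into sources and sinks. The helper name `parse_externals` is hypothetical, and ignoring entries without a `<` or `>` prefix is an assumption; the commit does not say how such entries are handled.

```python
import csv
from typing import List, Tuple


def parse_externals(cell: str) -> Tuple[List[str], List[str]]:
    """Return (sources, sinks) parsed from one Externals cell.

    The cell itself uses CSV syntax, so names containing commas can be
    quoted. '<' marks a data source, '>' marks a sink.
    """
    sources: List[str] = []
    sinks: List[str] = []
    if not cell.strip():
        return sources, sinks
    for token in next(csv.reader([cell], skipinitialspace=True)):
        token = token.strip()
        if token.startswith("<"):
            sources.append(token[1:].strip())
        elif token.startswith(">"):
            sinks.append(token[1:].strip())
    return sources, sinks
```

Collecting names across all components into a set before emitting nodes would give the "one node, one edge per referencing component" behaviour described above.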
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Renders the label as an HTML table cell with a #555555 background and
white 13pt bold text, matching the cluster border color. This gives each
group a clear header tab at the top of its bounding box.

Graphviz cluster labels don't support rotation, so a left-edge label
isn't achievable without complex workarounds.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pins the owner-color legend to the bottom of the diagram alongside any
sink-external nodes. The invisible side-by-side ordering edge to the
edge-style legend cluster is removed so the owner legend can float to
wherever the layout engine places it (typically the right side).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Incoming externals (<) are already pinned to rank=min and outgoing (>)
to rank=max. _add_external_nodes_and_edges now returns the sink node IDs
so build_graph can add invisible constraint=false ordering edges from
each sink external to _leg_owners, nudging the owner legend to the right
of the sink nodes within the shared rank=max row.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Declaration order in the DOT source doesn't affect dot layout; only rank
constraints do. Pins _leg_owners (owner colors) and _leg_p (edge-style
examples) to rank=max so both clusters stay at the bottom.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gaurav and others added 15 commits May 29, 2026 17:13
Pinning only _leg_p left _leg_c/_leg_a/_leg_b free, causing the legend
cluster to span multiple ranks and stretch vertically. Pinning all four
keeps them on the same rank row.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds rank=same subgraphs inside cluster_legend to lock each example pair
(Producer→Consumer, Component→Service) onto its own horizontal row, with
an invisible ordering edge keeping row 1 above row 2.

Removes the four edge-legend nodes from the rank=max pin — they no longer
need it since the internal rank constraints keep them compact. Only the
owner legend stays pinned to rank=max.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
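
The legend layout described above can be expressed directly in DOT. The sketch below builds the text by hand rather than through the graphviz Python package; the node ids follow the `_leg_*` names mentioned in these commits, but which id maps to which label, and the edge styles, are assumptions.

```python
def legend_dot() -> str:
    """Return a small digraph whose legend cluster has two pinned rows."""
    lines = [
        "digraph G {",
        "  subgraph cluster_legend {",
        '    label="Legend";',
        '    _leg_p [label="Producer"]; _leg_c [label="Consumer"];',
        '    _leg_a [label="Component"]; _leg_b [label="Service"];',
        "    { rank=same; _leg_p; _leg_c; }",  # row 1
        "    { rank=same; _leg_a; _leg_b; }",  # row 2
        '    _leg_p -> _leg_c [label="Results"];',
        '    _leg_a -> _leg_b [label="API call", style=dotted];',
        "    _leg_p -> _leg_a [style=invis];",  # keeps row 1 above row 2
        "  }",
        "}",
    ]
    return "\n".join(lines)
```

Saving the returned string to a file and running `dot -Tpng` on it should show the two example pairs stacked as separate rows inside one legend box.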
Adds a third row to cluster_legend showing the two external-entity shapes:
a cylinder (data source, rank=min) and a double-oval (user/agent, rank=max),
styled identically to the real external nodes (amber fill, penwidth 2.5).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Components with Hide=TRUE are excluded from the active set, never
rendered as ghost nodes, and have all edges to/from them dropped.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
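
The Hide rule amounts to a filter over both nodes and edges. A minimal sketch, with a hypothetical `drop_hidden` helper and hypothetical ids `svc-a` and `svc-b`, operating on plain id lists rather than the script's Component objects:

```python
from typing import List, Set, Tuple


def drop_hidden(
    component_ids: List[str],
    edges: List[Tuple[str, str]],
    hidden: Set[str],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Remove hidden components and every edge that touches one."""
    visible = [c for c in component_ids if c not in hidden]
    kept = [(a, b) for a, b in edges if a not in hidden and b not in hidden]
    return visible, kept


visible, kept = drop_hidden(
    ["ui", "svc-a", "svc-b"],
    [("ui", "svc-a"), ("svc-a", "svc-b"), ("ui", "svc-b")],
    {"svc-b"},
)
```

Filtering edges on both endpoints is what prevents a hidden component from reappearing as a ghost node.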
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… the same nodes

Fixes visual corruption where concentrate=true caused dotted edges to
overwrite solid ones between the same node pair.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Planned/in-development "Gets results from" edges are now solid red,
and planned "Calls" edges are dotted red, replacing the previous
dashed/dotted indigo that was hard to distinguish from normal edges.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dashed lines are easier to distinguish from solid "Gets results from"
edges than the previous widely-spaced dots. Updates both the diagram
edges (implemented and planned) and the legend example.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
concentrate=true can cause mixed solid/dashed edges between nearby nodes
to render incorrectly merged (e.g. solid edge visually branching off a
dashed edge). --no-concentrate disables this behaviour; --concentrate
(the default) preserves the existing layout.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mixed solid/dashed edges render more correctly without concentrate.
--concentrate can still be passed to enable it; --no-concentrate is
retained as a no-op for backward compatibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents script layout with line-number references, CSV column → Component
field mapping, and "I want to change X" navigation patterns so quick edits
don't require reading the full 891-line file.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a third line to component labels for components hosted outside ITRB
(the default). RENCI shows as "Hosted at: RENCI 🌐", Local as "Hosted at:
Local 💻", and Unknown as "Hosted at: Unknown ❓". Also notes in CLAUDE.md
that the diagram script should be run by the user, not Claude.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
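
A sketch of the label rule described above, assuming the first two label lines are Name and id as in the earlier commits. The function name `node_label` and the `HOST_ICONS` mapping are hypothetical.

```python
HOST_ICONS = {"RENCI": "🌐", "Local": "💻", "Unknown": "❓"}


def node_label(name: str, comp_id: str, hosted_at: str = "ITRB") -> str:
    """Build a node label; add a 'Hosted at' line only outside ITRB."""
    lines = [name, comp_id]
    if hosted_at and hosted_at != "ITRB":
        icon = HOST_ICONS.get(hosted_at, "")
        # rstrip drops the trailing space when no icon is known
        lines.append(f"Hosted at: {hosted_at} {icon}".rstrip())
    return "\n".join(lines)
```

Components hosted at ITRB keep the shorter two-line label, so the extra line only appears where it carries information.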
Adds Hosted at column to the data model table, a "Change node label format"
common-change entry, and a note to not run the diagram script (the user runs
it). Also creates a top-level CLAUDE.md describing the repo structure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>