Add network, sankey, COG bar chart, and ego-network visualizations#19

Closed
eboyer221 wants to merge 1 commit into main from additional-plots-2
@eboyer221
Contributor

Summary

This PR adds several new visualizations to the dashboard, all integrated with the existing data sources.

New visualizations

  • Drug-feature network (new "Network" tab) — interactive networkD3 force-directed graph linking drugs/drug classes to their top ML-predicted features. Edge thickness encodes feature importance. Optional toggles add cluster and COG tiers from the annotations parquet.

  • Resistance flow Sankey (Metadata tab) — multi-tier diagram showing phenotype → drug class → drug → country → host → isolation source. Flow widths are record counts. A drug-class multi-select lets the user focus the diagram (defaults to top 3 classes by record count).

  • Cross-model accuracy distributions (Model holdouts tab) — box-with-jitter plots of balanced accuracy by drug class, split by country and time, colored Same vs Different (self-eval vs cross-tested). Added as a new "Accuracy distributions" sub-tab.

  • Cross-model performance heatmaps (Model holdouts tab) — country and time heatmaps now render side by side in a unified "Model performance" sub-tab, using a purple gradient (#f2f0f7 → #54278f) with axes labelled "Train data" / "Test data". The panels are titled "Cross-country performance" and "Cross-time performance".

  • COG bar chart (Feature Importance tab) — horizontal bar chart of the most common COGs across the currently displayed features, on both "Across bugs" and "Across drugs" sub-tabs.

  • Feature ego-network (Feature Importance tab) — small networkD3 graph that updates when a row is clicked in the feature importance table, showing the selected feature → its cluster (BV-BRC link) → its COGs (NCBI links).
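For reviewers who want to see how the Sankey flow widths and the "top 3 classes" default could be derived, here is a minimal sketch in base R. It is not the code in this PR: the `records` data frame, its column names, and the `build_sankey_links()` helper are hypothetical, and only three of the six tiers are shown. The optional last step calls `networkD3::sankeyNetwork()`, which expects zero-based node indices.

```r
# Hypothetical records; the column names are assumptions, not the dashboard's schema.
records <- data.frame(
  phenotype  = c("Resistant", "Resistant", "Susceptible", "Resistant"),
  drug_class = c("Penicillins", "Penicillins", "Quinolones", "Quinolones"),
  drug       = c("ampicillin", "ampicillin", "ciprofloxacin", "ciprofloxacin"),
  stringsAsFactors = FALSE
)
tiers <- c("phenotype", "drug_class", "drug")

# Default selection: the (up to) three drug classes with the most records.
top_classes <- names(head(sort(table(records$drug_class), decreasing = TRUE), 3))

# Count the records flowing between each pair of adjacent tiers.
build_sankey_links <- function(df, tiers) {
  pairs <- lapply(seq_len(length(tiers) - 1), function(i) {
    counts <- as.data.frame(
      table(source = df[[tiers[i]]], target = df[[tiers[i + 1]]]),
      stringsAsFactors = FALSE
    )
    counts[counts$Freq > 0, ]  # drop combinations that never occur
  })
  links <- do.call(rbind, pairs)
  names(links)[names(links) == "Freq"] <- "value"
  rownames(links) <- NULL
  links
}

focused <- records[records$drug_class %in% top_classes, ]
links <- build_sankey_links(focused, tiers)

# networkD3 wants zero-based indices into the node table.
nodes <- data.frame(name = unique(c(links$source, links$target)),
                    stringsAsFactors = FALSE)
links$source_id <- match(links$source, nodes$name) - 1L
links$target_id <- match(links$target, nodes$name) - 1L

# Rendering is optional so the sketch also runs without networkD3 installed.
if (requireNamespace("networkD3", quietly = TRUE)) {
  widget <- networkD3::sankeyNetwork(
    Links = links, Nodes = nodes,
    Source = "source_id", Target = "target_id",
    Value = "value", NodeID = "name"
  )
}
```

One caveat with this approach: nodes are matched by label, so a label that appears in two different tiers would be merged into a single node unless the labels are prefixed with their tier name first.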

Annotation enrichment

  • New load_feature_annotations() and enrich_with_annotations() helpers read cluster_feature_COG.parquet and join feature ids on Variablefeature (splitting on _ to extract Pfam ids), collapsing multiple COGs per feature into comma-separated cells.
  • New load_feature_name_map() reads gene_names.parquet / domain_names.parquet / protein_names.parquet from amrdata_root (defaults to ~/amRdata/data if it exists), giving heatmap rows like group_6367 (Glutamate decarboxylase (EC 4.1.1.15)) instead of opaque IDs.
  • Feature importance table now shows cluster (linked to BV-BRC) and COG (linked to NCBI COG) columns. The unhelpful COG_name hyperlink was removed.
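The collapse-then-join step described above can be illustrated with a small base R sketch. This is an assumption-laden illustration rather than the body of `enrich_with_annotations()`: the two data frames, their column names (`feature`, `cluster`, `COG`, `Variable`, `Importance`), and all ID values are placeholders, and the real helper reads its annotations from cluster_feature_COG.parquet.

```r
# Placeholder annotation rows: one row per (feature, COG) pair.
annotations <- data.frame(
  feature = c("group_6367", "group_6367", "group_0042"),
  cluster = c("cluster_A", "cluster_A", "cluster_B"),
  COG     = c("COG0076", "COG1234", "COG0001"),
  stringsAsFactors = FALSE
)

# Placeholder feature importance table keyed by Variable.
importance <- data.frame(
  Variable   = c("group_6367", "group_0042", "group_9999"),
  Importance = c(0.8, 0.5, 0.1),
  stringsAsFactors = FALSE
)

# Collapse multiple COGs per feature into one comma-separated cell.
collapsed <- aggregate(
  COG ~ feature + cluster,
  data = annotations,
  FUN = function(x) paste(unique(x), collapse = ", ")
)

# Left join so that features without an annotation are kept (with NA cells).
enriched <- merge(
  importance, collapsed,
  by.x = "Variable", by.y = "feature",
  all.x = TRUE
)

print(enriched)
```

Using `all.x = TRUE` keeps every row of the importance table, so a feature with no annotation still appears, just with empty cluster and COG cells.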

Other changes

  • launchAMRDashboard() gains an optional amrdata_root argument for the name-map / annotation lookup.
  • networkD3 moved from Suggests to Imports.
  • Included cluster_feature_COG.parquet for Shigella flexneri in inst/extdata/ so the new visualizations work out-of-the-box with the demo data.

Test plan

  1. From the package root, open R and run:
    devtools::load_all(".")
    app <- launchAMRDashboard()
    shiny::runApp(app)

@eboyer221
Contributor Author

Closing in favor of #20, which includes all these changes plus the package rename from amRshiny to amRviz.
