Add network, sankey, COG bar chart, and ego-network visualizations #19
Closed
eboyer221 wants to merge 1 commit into
Closing in favor of #20, which includes all these changes plus the package rename from amRshiny to amRviz.
Summary
This PR adds several new visualizations to the dashboard, all integrated with the existing data sources.
New visualizations
Drug-feature network (new "Network" tab): an interactive networkD3 force-directed graph linking drugs/drug classes to their top ML-predicted features. Edge thickness encodes feature importance. Optional toggles add cluster and COG tiers from the annotations parquet.
Resistance flow Sankey (Metadata tab): a multi-tier diagram showing phenotype → drug class → drug → country → host → isolation source. Flow widths are record counts. A drug-class multi-select lets the user focus the diagram (defaults to the top 3 classes by record count).
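The Sankey tiers can be sketched in base R as follows. This is a hypothetical illustration, not the dashboard's code: it assumes one row per record in a data frame whose tier columns are named phenotype, drug_class, drug, country, host, and isolation_source, and the helper names top_drug_classes() and build_sankey_links() are made up. It counts records between each pair of adjacent tiers and returns zero-based node indices, which is the form networkD3::sankeyNetwork() expects for its Source and Target columns.

```r
# Hypothetical sketch: column names and helper names are assumptions,
# not the dashboard's actual schema or API.
tiers <- c("phenotype", "drug_class", "drug", "country", "host", "isolation_source")

# Default selection: the n drug classes with the most records.
top_drug_classes <- function(records, n = 3) {
  counts <- sort(table(records$drug_class), decreasing = TRUE)
  head(names(counts), n)
}

build_sankey_links <- function(records, tiers, classes = NULL) {
  if (is.null(classes)) classes <- top_drug_classes(records)
  records <- records[records$drug_class %in% classes, , drop = FALSE]

  # Prefix each value with its tier so identical labels in different
  # tiers (e.g. "Unknown") become distinct nodes.
  labelled <- lapply(tiers, function(t) paste(t, records[[t]], sep = ": "))

  # One link per (value in tier i, value in tier i + 1) pair; its value is
  # the number of records flowing between the two nodes.
  links <- do.call(rbind, lapply(seq_len(length(tiers) - 1), function(i) {
    pair <- data.frame(source = labelled[[i]], target = labelled[[i + 1]],
                       stringsAsFactors = FALSE)
    aggregate(list(value = rep(1L, nrow(pair))), by = pair, FUN = sum)
  }))

  nodes <- data.frame(name = unique(c(links$source, links$target)),
                      stringsAsFactors = FALSE)
  # networkD3::sankeyNetwork() expects zero-based node indices.
  links$source <- match(links$source, nodes$name) - 1L
  links$target <- match(links$target, nodes$name) - 1L
  list(nodes = nodes, links = links)
}

# Toy records, for illustration only.
records <- data.frame(
  phenotype        = c("Resistant", "Resistant", "Susceptible"),
  drug_class       = c("beta-lactam", "beta-lactam", "quinolone"),
  drug             = c("ampicillin", "ampicillin", "ciprofloxacin"),
  country          = c("USA", "USA", "Kenya"),
  host             = c("Human", "Human", "Human"),
  isolation_source = c("stool", "blood", "stool"),
  stringsAsFactors = FALSE
)
sk <- build_sankey_links(records, tiers)
```

The result could then be drawn with networkD3::sankeyNetwork(Links = sk$links, Nodes = sk$nodes, Source = "source", Target = "target", Value = "value", NodeID = "name"). Prefixing node names with their tier is one simple way to avoid collisions between tiers, at the cost of longer labels.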
Cross-model accuracy distributions (Model holdouts tab): box-with-jitter plots of balanced accuracy by drug class, split by country and time, colored Same vs Different (self-eval vs cross-tested). They appear in a new "Accuracy distributions" sub-tab.
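The Same/Different coloring comes down to comparing the training and test split of each evaluation. A minimal sketch, assuming a results table with hypothetical columns drug_class, train_country, test_country, and balanced_accuracy (the same comparison would apply to time splits); label_eval_type() is a made-up helper, not a function from this PR:

```r
# Hypothetical sketch: column and helper names are assumptions.
# "Same"      = model evaluated on the split it was trained on (self-eval);
# "Different" = model cross-tested on another split.
label_eval_type <- function(train, test) {
  factor(ifelse(train == test, "Same", "Different"),
         levels = c("Same", "Different"))
}

# Toy values, for illustration only.
results <- data.frame(
  drug_class        = c("beta-lactam", "beta-lactam", "beta-lactam", "quinolone"),
  train_country     = c("USA", "USA", "Kenya", "USA"),
  test_country      = c("USA", "Kenya", "Kenya", "Kenya"),
  balanced_accuracy = c(0.91, 0.78, 0.88, 0.70),
  stringsAsFactors  = FALSE
)
results$comparison <- label_eval_type(results$train_country, results$test_country)

# Median balanced accuracy per drug class and comparison type, i.e. the
# centre line of each box in a standard box plot.
summary_tbl <- aggregate(balanced_accuracy ~ drug_class + comparison,
                         data = results, FUN = median)
```

Keeping the label as a factor with fixed levels makes the Same/Different order (and therefore the color mapping) stable regardless of which comparison happens to appear first in the data.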
Cross-model performance heatmaps (Model holdouts tab): the country and time heatmaps now render side by side in a unified "Model performance" sub-tab, using a purple gradient (#f2f0f7 → #54278f) with axes labelled "Train data" / "Test data". They are titled "Cross-country performance" and "Cross-time performance".
COG bar chart (Feature Importance tab): a horizontal bar chart of the most common COGs across the currently displayed features, on both the "Across bugs" and "Across drugs" sub-tabs.
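Because multiple COGs per feature are collapsed into comma-separated cells (see Annotation enrichment), counting the most common COGs means splitting those cells first. A minimal base-R sketch, assuming the displayed features sit in a data frame with a hypothetical COG column; top_cogs() is a made-up name, not a function from this PR:

```r
# Hypothetical sketch: `COG` is an assumed column holding comma-separated
# COG ids per feature (NA when a feature has no annotation).
top_cogs <- function(features, n = 10) {
  cells <- features$COG[!is.na(features$COG)]
  cogs <- trimws(unlist(strsplit(cells, ",", fixed = TRUE)))
  cogs <- cogs[nzchar(cogs)]
  if (length(cogs) == 0) {
    return(data.frame(COG = character(0), n = integer(0)))
  }
  # Count every occurrence across the displayed features, most frequent first.
  counts <- sort(table(cogs), decreasing = TRUE)
  head(data.frame(COG = names(counts), n = as.integer(counts),
                  stringsAsFactors = FALSE), n)
}

# Toy feature table, for illustration only.
features <- data.frame(
  feature = c("group_1", "group_2", "PF00005"),
  COG     = c("COG0001, COG0002", "COG0001", NA),
  stringsAsFactors = FALSE
)
top <- top_cogs(features)
```

A horizontal bar chart could then be drawn with, for example, barplot(rev(top$n), names.arg = rev(top$COG), horiz = TRUE, las = 1); reversing the vectors puts the most common COG at the top, since horizontal bars are drawn from the bottom up.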
Feature ego-network (Feature Importance tab): a small networkD3 graph that updates when a row is clicked in the feature importance table, showing the selected feature → its cluster (BV-BRC link) → its COGs (NCBI links).
Annotation enrichment
load_feature_annotations() and enrich_with_annotations() helpers read cluster_feature_COG.parquet and join feature ids on Variable → feature (splitting on the underscore "_" to extract Pfam ids), collapsing multiple COGs per feature into comma-separated cells.
load_feature_name_map() reads gene_names.parquet / domain_names.parquet / protein_names.parquet from amrdata_root (defaults to ~/amRdata/data if it exists), giving heatmap rows like group_6367 (Glutamate decarboxylase (EC 4.1.1.15)) instead of opaque IDs.
The COG_name hyperlink was removed.
Other changes
launchAMRDashboard() gains an optional amrdata_root argument for the name-map / annotation lookup.
networkD3 moved from Suggests to Imports.
Added cluster_feature_COG.parquet for Shigella flexneri in inst/extdata/ so the new visualizations work out of the box with the demo data.
Test plan