feat: scaffolding, caching, EGFR #65

Open
tristan-f-r wants to merge 21 commits into main from egfr-and-infrastructure

@tristan-f-r tristan-f-r commented Mar 18, 2026

We bundle EGFR along with the rest of the caching infrastructure. Notes:

  • All motivation for the caching system lives under cache/README.md.
  • We removed pra.yaml for now, as the only PRAs are the synthetic data and the ResponseNet data, and soon the DepMap data.
  • The CONTRIBUTING.md file is not finalized; it is included only so that this PR does not break #57 (Changes to CONTRIBUTING guide). I may move all contributing material into #57 later.
  • directory.py contains files that are not needed here: they come from other datasets but were deemed universal.
  • I would like to keep the web folder even though I'm aware no one is currently in a position to review it.

@tristan-f-r tristan-f-r added the enhancement (New feature or request) label Mar 18, 2026
@tristan-f-r tristan-f-r changed the title from "feat: initial scaffolding, EGFR" to "feat: scaffolding, caching, EGFR" Mar 18, 2026

@ntalluri ntalluri left a comment

I did a light review of the PR and have not looked too hard at the code itself yet. I was mostly gathering ideas about what is happening from the READMEs.

tristan-f-r and others added 2 commits March 18, 2026 16:54
Co-authored-by: Neha Talluri <78840540+ntalluri@users.noreply.github.com>
@tristan-f-r tristan-f-r mentioned this pull request Mar 24, 2026
@tristan-f-r tristan-f-r requested a review from ntalluri March 24, 2026 23:24

The score data (`egfr-prizes.txt`), gold standard nodes `eight-egfr-reference-all.txt`, and the (now-deprecated) manually edited `iRefIndex`-based interactome are all from [_Synthesizing Signaling Pathways from Temporal Phosphoproteomic Data_](https://doi.org/10.1016/j.celrep.2018.08.085).

We also use the StringDB human interactome and UniProt mapping files. See `cache/directory.py` for more info on these.

Suggested change
- We also use the StringDB human interactome and UniProt mapping files. See `cache/directory.py` for more info on these.
+ We also use the StringDB v12 human interactome and UniProt mapping files. See `cache/directory.py` for more info on these.

What is the date of the UniProt mapping file (when was it downloaded)? Please add that date here as well.
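
One way to capture the requested provenance is to keep the version and download date next to each cached file. The following is only a sketch: `CacheEntry`, its fields, and the example values are hypothetical and are not taken from `cache/directory.py`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CacheEntry:
    """Hypothetical provenance record for one cached file (not the real cache API)."""

    name: str
    source: str
    version: Optional[str]      # release label, when the source publishes one
    downloaded: Optional[date]  # retrieval date, for sources without versioned releases

    def describe(self) -> str:
        """Render a one-line provenance note suitable for a README."""
        parts = [self.source]
        if self.version is not None:
            parts.append(self.version)
        if self.downloaded is not None:
            parts.append(f"downloaded {self.downloaded.isoformat()}")
        return f"{self.name}: " + ", ".join(parts)


# Placeholder values for illustration only; the date below is NOT the real download date.
string_entry = CacheEntry("human interactome", "StringDB", "v12", None)
uniprot_entry = CacheEntry("mapping file", "UniProt", None, date(2000, 1, 1))
```

Calling `describe()` on each entry would give a README line that states either the release or the retrieval date, whichever the source provides.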

Can you also add the identifiers we are mapping from and mapping to for each dataset?

That is, for iRefIndex/PhosphoSite we use UniProt IDs for all the data, and for STRING we use ENSP IDs.
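
To make the two identifier spaces concrete, here is a minimal sketch of turning STRING-style ENSP identifiers into UniProt accessions. It assumes a three-column, tab-separated table (accession, ID type, ID) like UniProt's idmapping files; which mapping file and columns this PR actually reads is an assumption, and `ensp_to_uniprot` and the sample rows are illustrative only.

```python
import csv
import io
from typing import Dict, List

# Illustrative rows in the (accession, ID type, ID) layout; not copied from the PR's data.
SAMPLE_IDMAPPING = (
    "P00533\tGene_Name\tEGFR\n"
    "P00533\tSTRING\t9606.ENSP00000275493\n"
)


def ensp_to_uniprot(handle) -> Dict[str, List[str]]:
    """Build an ENSP -> UniProt accession lookup from the STRING rows of the table."""
    mapping: Dict[str, List[str]] = {}
    for accession, id_type, value in csv.reader(handle, delimiter="\t"):
        if id_type != "STRING":
            continue
        # STRING identifiers carry a taxon prefix such as "9606."; drop it to get the bare ENSP ID.
        ensp = value.split(".", 1)[1]
        # One ENSP ID may map to several accessions, so collect them in a list.
        mapping.setdefault(ensp, []).append(accession)
    return mapping


lookup = ensp_to_uniprot(io.StringIO(SAMPLE_IDMAPPING))
```

Stating the direction of the mapping (ENSP to UniProt here) in the README would make it clear which namespace the prizes, gold standard, and interactome each end up in.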

Should this be moved to the new web PR?

Can this be added to #57?

```

`spras` is the cloned submodule of [SPRAS](https://github.com/reed-compbio/spras),
`configs` is the YAML file used to talk to SPRAS, and `datasets` contains the raw data. `cache` is utility for `datasets` which provides a convenient

Suggested change
- `configs` is the YAML file used to talk to SPRAS, and `datasets` contains the raw data. `cache` is utility for `datasets` which provides a convenient
+ `configs` is the YAML file used to set up workflows in SPRAS, and `datasets` contains the raw and processed data. `cache` is utility for `datasets` which provides a convenient

@ntalluri ntalluri left a comment

Updated Review

Are we doing the trimming in this PR?

tristan-f-r and others added 2 commits March 25, 2026 12:05
Co-authored-by: Neha Talluri <78840540+ntalluri@users.noreply.github.com>

Labels

dataset (Mutating datasets in any way), enhancement (New feature or request)
