not needed just yet
ntalluri left a comment
I did a light review of the PR; I did not look too hard at the code itself yet. I was mostly gathering ideas about what was happening from the READMEs.
Co-authored-by: Neha Talluri <78840540+ntalluri@users.noreply.github.com>
The score data (`egfr-prizes.txt`), gold standard nodes `eight-egfr-reference-all.txt`, and the (now-deprecated) manually edited `iRefIndex`-based interactome are all from [_Synthesizing Signaling Pathways from Temporal Phosphoproteomic Data_](https://doi.org/10.1016/j.celrep.2018.08.085).
We also use the StringDB human interactome and UniProt mapping files. See `cache/directory.py` for more info on these.
- We also use the StringDB human interactome and UniProt mapping files. See `cache/directory.py` for more info on these.
+ We also use the StringDB v12 human interactome and UniProt mapping files. See `cache/directory.py` for more info on these.
What is the date of the UniProt mapping file (when was it downloaded)? Please add that date here as well.
Can you also add the identifiers we are mapping from and mapping to for each dataset?
That is, say that for iRefIndex/PhosphoSite we use UniProt IDs for all the data, and for STRING we use ENSP IDs.
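To make the request concrete, here is a minimal sketch of the kind of mapping step the README could describe. It is not the code in this PR: the function names, the three-column layout (accession, ID type, external ID), and the placeholder identifiers are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical mapping rows: accession, ID type, external ID.
# The identifiers are placeholders, not real UniProt or ENSP IDs.
MAPPING_TSV = """\
UNIPROT_A\tSTRING\t9606.ENSP_A
UNIPROT_B\tSTRING\t9606.ENSP_B
UNIPROT_B\tGeneID\t12345
"""


def load_string_to_uniprot(handle):
    """Build a STRING-ID -> accession lookup from tab-separated rows."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        accession, id_type, external_id = row
        if id_type == "STRING":
            # Keep the first accession seen for each STRING ID.
            mapping.setdefault(external_id, accession)
    return mapping


def map_edges(edges, mapping):
    """Translate (source, target, weight) edges, dropping unmapped endpoints."""
    mapped = []
    for source, target, weight in edges:
        if source in mapping and target in mapping:
            mapped.append((mapping[source], mapping[target], weight))
    return mapped


string_to_uniprot = load_string_to_uniprot(io.StringIO(MAPPING_TSV))
edges = [
    ("9606.ENSP_A", "9606.ENSP_B", 0.9),
    ("9606.ENSP_A", "9606.ENSP_C", 0.4),  # ENSP_C has no mapping row
]
print(map_edges(edges, string_to_uniprot))
```

Stating the "from" and "to" identifier types per dataset in the README would let a reader know which lookup like this applies where, and what happens to unmapped nodes.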
Should this be moved to the new web PR?
| ``` | ||
`spras` is the cloned submodule of [SPRAS](https://github.com/reed-compbio/spras),
`configs` is the YAML file used to talk to SPRAS, and `datasets` contains the raw data. `cache` is utility for `datasets` which provides a convenient
- `configs` is the YAML file used to talk to SPRAS, and `datasets` contains the raw data. `cache` is utility for `datasets` which provides a convenient
+ `configs` is the YAML file used to set up workflows in SPRAS, and `datasets` contains the raw and processed data. `cache` is utility for `datasets` which provides a convenient
Are we doing the trimming in this PR?
Co-authored-by: Neha Talluri <78840540+ntalluri@users.noreply.github.com>
We bundle EGFR along with the rest of the caching infrastructure. Notes:
- […] `cache/README.md`.
- […] `pra.yaml` for now, as the only PRAs are the synthetic data and the ResponseNet data, and soon the DepMap data.
- The `CONTRIBUTING.md` file is not finalized, and is simply there to not break Changes to CONTRIBUTING guide #57. I may split all contributing material into Changes to CONTRIBUTING guide #57 later.
- `directory.py` contains unnecessary files from other datasets that were deemed universal.
- […] `web` folder even though I'm aware no one is currently in a position to review it.