Add X chr functionality#66
Conversation
|
|
||
| spec["idn"] = int64 | ||
| spec["nLoci"] = int64 | ||
| spec["sex"] = int64 |
There was a problem hiding this comment.
We don’t need int64 for 0/1 sex code. @XingerTang can you suggest a more appropriate type?
There was a problem hiding this comment.
Based on https://numba.readthedocs.io/en/stable/reference/types.html#numbers. uint8 or boolean might be useful here.
| logGenotypeSegregationTensor_XXChrom = np.log( | ||
| genotypeSegregationTensor | ||
| ) | ||
| for seg0 in [0, 1, 9]: |
There was a problem hiding this comment.
Do we need this second block of for loops? Could we not move the content of the loop to the above loop?
|
|
||
| def generateGenoProbs_Xchr_male(): | ||
| global geno_probs_Xchr_male | ||
| error = 0.0001 |
There was a problem hiding this comment.
@AprilYUZhang @XingerTang we should pull out this error number and move it to an argument with a default value (the same for generateGenoProbs() above). Do you want to do it as part of this PR or open a new issue and we tackle this down the road?
| np.array([0, 0.5, 0, 0.5], dtype=np.float32) * (1 - error) + error / 4 | ||
| ) | ||
|
|
||
| # geno_probs[9, 0, 1] no possible genotype |
There was a problem hiding this comment.
If I am reading these correctly we have father, mother, and male progeny genotypes so mother 0/0—>0 and son 0/1->1 is indeed not a biologically plausible situation, however we should set probabilities for such a case nevertheless - there can be errors in the data, so we need a way to handle this case. Any suggestions what to do here?
| geno_probs[9, 0, 0] = ( | ||
| np.array([0.5, 0, 0.5, 0], dtype=np.float32) * (1 - error) + error / 4 | ||
| ) | ||
| # geno_probs[1, 1, 2] no possible genotype |
| geno_probs[9, 9, 1] = ( | ||
| np.array([0, 0.5, 0.5, 0], dtype=np.float32) * (1 - error) + error / 4 | ||
| ) | ||
| geno_probs[9, 9, 0] = ( |
There was a problem hiding this comment.
Nice additions, but do we need these? I don’t know as I am removed from this code. I notice that below we don’t have all possible combinations of genotypes covered either. @XingerTang what are your thoughts?
There was a problem hiding this comment.
AlphaImpute2 keeps its options as simplistic as possible. I prefer to revise the errors once we have decided to update all the remaining option names according to AlphaPeel changes.
There was a problem hiding this comment.
OK re doing the errors later, but the comment refers to assigning geno_probs[x, y, z] for some combinations of x, y, z but not all of these, unless I got the code wrong, which confused me and hence my question.
There was a problem hiding this comment.
Sorry, I was intending to comment on the other review that related to the errors.
This change is probably not completed. We can review the logic afterwards.
| pointEstimates[3, i] *= e | ||
| if sirehap0 != 9: | ||
| if sirehap0 + damhap0 == 1: | ||
| pointEstimates[0, i] *= 1 - e |
There was a problem hiding this comment.
This e is defined above at the beginning of the function. What kind of error is this?
@AprilYUZhang @XingerTang we should pull out this error number and move it to an argument with a default value. Do you want to do it as part of this PR or open a new issue and we tackle this down the road?
| print_title(f"Imputation cycle {cycle + 1}") | ||
| pedigreePeelDown(pedigree, args, genotype_cutoff) | ||
| pedigreePeelUp(pedigree, args, genotype_cutoff) | ||
| pedigreePeelDown(pedigree, args, genotype_cutoff, isXChrom) |
There was a problem hiding this comment.
Once we move to other type of chromosomes/linkage-group/contig (such as Y chr and mtDNA) we should probably change isXChrom to chrType and have autosomal (A) as default and then we switch to X/Y/M. @AprilYUZhang please keep this in mind for future work.
| pointEstimates[3, i] *= e | ||
|
|
||
| elif sirehap0 == 9 and isXChrom: | ||
| if ind.sex == 0: # individual is male |
There was a problem hiding this comment.
I think this part is ok, but I am struggling since I am doing this on the fly and can’t check. @XingerTang can you have a look too?
| pointEstimates[1, i] *= e | ||
| pointEstimates[2, i] *= 1 - e | ||
| pointEstimates[3, i] *= 1 - e | ||
| elif isXChrom: |
There was a problem hiding this comment.
I am struggling since I am doing this on the fly and can’t check. @XingerTang can you have a look too?
Also, we don’t need the isXChrom block for the second haplotype below?
|
|
||
|
|
||
| def writeGenoProbs(pedigree, genoProbFunc, outputFile): | ||
| def writeGenoProbs(pedigree, outputFile, isXChr=False): |
There was a problem hiding this comment.
Sometimes you use isXChr sometimes isXChrom, consolidate. I prefer isXChr (consistent with the command flag), but most importantly be consistent.
| required=False, | ||
| help="Output genotype probabilities (see format details in the docs).", | ||
| ) | ||
| core_parser.add_argument('-writekey', default="id", required=False, type=str, |
There was a problem hiding this comment.
I don’t think we support sequence data in AlphaImpute2?
There was a problem hiding this comment.
AlphaImpute2 indeed does not support sequence input. This option can apply to a genotype input file in AlphaPeel.
But this option is not working for AlphaImpute2, and it is not in the help message or the documentation.
There was a problem hiding this comment.
It is called out_id_order in AlphaPeel now.
| help="Output genotype probabilities (see format details in the docs).", | ||
| ) | ||
| core_parser.add_argument('-writekey', default="id", required=False, type=str, | ||
| help='Determines the order in which individuals are ordered in the output file based on their order in the corresponding input file. Animals not in the input file are placed at the end of the file and sorted in alphanumeric order. These animals can be surpressed with the "-onlykeyed" option. Options: id, pedigree, genotypes, sequence, segregation. Defualt: id.') |
There was a problem hiding this comment.
| help='Determines the order in which individuals are ordered in the output file based on their order in the corresponding input file. Animals not in the input file are placed at the end of the file and sorted in alphanumeric order. These animals can be surpressed with the "-onlykeyed" option. Options: id, pedigree, genotypes, sequence, segregation. Defualt: id.') | |
| help='Determines the order in which individuals are ordered in the output file based on their order in the corresponding input file. Individuals not in the input file are placed at the end of the file and sorted in alphanumeric order. These individuals can be suppressed with the "-onlykeyed" option. Options: id, pedigree, genotypes, sequence, segregation. Default: id.') |
There was a problem hiding this comment.
Port these edits to AlphaPeel/tinyhouse too?
There was a problem hiding this comment.
These changes are already placed in AlphaPeel, and tinyhouse does not have this docstring.
| core_parser.add_argument('-writekey', default="id", required=False, type=str, | ||
| help='Determines the order in which individuals are ordered in the output file based on their order in the corresponding input file. Animals not in the input file are placed at the end of the file and sorted in alphanumeric order. These animals can be surpressed with the "-onlykeyed" option. Options: id, pedigree, genotypes, sequence, segregation. Defualt: id.') | ||
| core_parser.add_argument('-onlykeyed', action='store_true', required=False, | ||
| help='Flag to surpress the animals who are not present in the file used with -writekey. Also surpresses "dummy" animals.') |
There was a problem hiding this comment.
| help='Flag to surpress the animals who are not present in the file used with -writekey. Also surpresses "dummy" animals.') | |
| help='Flag to suppress the individuals who are not present in the file used with -writekey. Also suppresses "dummy" individuals.') |
There was a problem hiding this comment.
Port these edits to AlphaPeel/tinyhouse too?
There was a problem hiding this comment.
Same here. AlphaPeel is updated, tinyhouse does not have this. This would not be printed as part of the help message of AlphaImpute2 as well.
gregorgorjanc
left a comment
There was a problem hiding this comment.
@AprilYUZhang great that you are pushing this functionality! I reviewed on the go without access to computer to look into the code, so I wasn’t sure in some places if the code is correct or not or it’s just my lack of deep understanding. @XingerTang your review would be also appreciated.
|
@AprilYUZhang I left some more comments at de2b6c8#r184944694 on your latest commit |
|
Github is behaving odd lately - you might need to toggle file by clicking the > symbol to hide and the unhide file contents to see the comments |
XingerTang
left a comment
There was a problem hiding this comment.
- Pre-commit not passed
- pytest failed on my end:
(phase) xtang3@S37-50MAQ05D AlphaImpute2 % pytest tests/functional_tests
======================================== test session starts =========================================
platform darwin -- Python 3.11.11, pytest-9.0.1, pluggy-1.6.0
benchmark: 5.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/xtang3/AlphaImpute2
configfile: pyproject.toml
plugins: benchmark-5.2.3
collected 2 items
tests/functional_tests/run_func_test.py F. [100%]
============================================== FAILURES ==============================================
_______________________________________________ test_1 _______________________________________________
def test_1():
"""basic functionality test with pedigree only mode"""
os.system(
"AlphaImpute2 -genotypes tests/functional_tests/test_1/genotypes.txt -pedigree tests/functional_tests/test_1/pedigree.txt -ped_only -phase_output -seg_output -out tests/functional_tests/outputs/test_1"
)
assert os.path.exists("tests/functional_tests/outputs/test_1.genotypes")
assert os.path.exists("tests/functional_tests/outputs/test_1.haplotypes")
assert os.path.exists("tests/functional_tests/outputs/test_1.segregation")
genotypes = read_file("tests/functional_tests/outputs/test_1.genotypes")
expected_genotypes = read_file("tests/functional_tests/test_1/true_genotypes.txt")
> assert genotypes == expected_genotypes
E AssertionError: assert [['1', 1.0, 2....0, 0.0, 0.0]] == [['1', 1.0, 2....0, 0.0, 0.0]]
E
E At index 2 diff: ['3', 1.0, 2.0, 0.0, 0.0, 1.0] != ['3', 0.0, 2.0, 0.0, 0.0, 1.0]
E Use -v to get more diff
tests/functional_tests/run_func_test.py:53: AssertionError
---------------------------------------- Captured stdout call ----------------------------------------
------------------------------------------
AlphaImpute2
------------------------------------------
Version: 0.0.3
Email: alphagenes.dev@gmail.com
Website: https://github.com/AlphaGenes
------------------------------------------
Reading in AlphaGenes format: tests/functional_tests/test_1/genotypes.txt
Read in: 0.01 seconds
Pedigree Imputation Only
------------------------------------------
Number of peeling cycles: 4
Final cutoff: 0.1
Imputation cycle 1
False 1
[[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]]
False -1
[[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]
[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]]
Peel down: 4.86 seconds
False -1
[[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]
[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]]
False 1
[[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]]
Peel up: 4.89 seconds
Imputation cycle 2
False 1
[[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]]
False -1
[[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]
[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]]
Peel down: 0.00 seconds
False -1
[[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]
[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]]
False 1
[[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]]
Peel up: 0.00 seconds
Imputation cycle 3
False 1
[[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]]
False -1
[[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]
[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]]
Peel down: 0.00 seconds
False -1
[[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]
[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]]
False 1
[[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]]
Peel up: 0.00 seconds
Imputation cycle 4
False 1
[[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]]
False -1
[[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]
[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]]
Peel down: 0.00 seconds
False -1
[[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]
[2.8239352e-01 3.0398920e-01 3.3998200e-01 3.9997000e-01 4.9994999e-01]
[2.1760647e-01 1.9601080e-01 1.6001801e-01 1.0003000e-01 4.9999999e-05]]
False 1
[[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]
[0.25 0.25 0.25 0.25 0.25]]
Peel up: 0.00 seconds
Core peeling cycles: 9.75 seconds
Heuristic Peeling: 11.28 seconds
Writing Out Results
------------------------------------------
Write out: 0.00 seconds
Full Program Run: 11.52 seconds
| required=False, | ||
| help="Output genotype probabilities (see format details in the docs).", | ||
| ) | ||
| core_parser.add_argument('-writekey', default="id", required=False, type=str, |
There was a problem hiding this comment.
AlphaImpute2 indeed does not support sequence input. This option can apply to a genotype input file in AlphaPeel.
But this option is not working for AlphaImpute2, and it is not in the help message or the documentation.
| required=False, | ||
| help="Output genotype probabilities (see format details in the docs).", | ||
| ) | ||
| core_parser.add_argument('-writekey', default="id", required=False, type=str, |
There was a problem hiding this comment.
It is called out_id_order in AlphaPeel now.
| help="Output genotype probabilities (see format details in the docs).", | ||
| ) | ||
| core_parser.add_argument('-writekey', default="id", required=False, type=str, | ||
| help='Determines the order in which individuals are ordered in the output file based on their order in the corresponding input file. Animals not in the input file are placed at the end of the file and sorted in alphanumeric order. These animals can be surpressed with the "-onlykeyed" option. Options: id, pedigree, genotypes, sequence, segregation. Defualt: id.') |
There was a problem hiding this comment.
These changes are already placed in AlphaPeel, and tinyhouse does not have this docstring.
| core_parser.add_argument('-writekey', default="id", required=False, type=str, | ||
| help='Determines the order in which individuals are ordered in the output file based on their order in the corresponding input file. Animals not in the input file are placed at the end of the file and sorted in alphanumeric order. These animals can be surpressed with the "-onlykeyed" option. Options: id, pedigree, genotypes, sequence, segregation. Defualt: id.') | ||
| core_parser.add_argument('-onlykeyed', action='store_true', required=False, | ||
| help='Flag to surpress the animals who are not present in the file used with -writekey. Also surpresses "dummy" animals.') |
There was a problem hiding this comment.
Same here. AlphaPeel is updated, tinyhouse does not have this. This would not be printed as part of the help message of AlphaImpute2 as well.
|
No tests are written for this code. Convert this PR to a draft. |
|
|
||
| ind.out_segregation = smoothedEstimates.copy() | ||
|
|
||
| print(ind.out_segregation) |
There was a problem hiding this comment.
Remember to remove this at the end.
|
|
||
| pointEstimates = np.full((4, nLoci), 1, dtype=np.float32) | ||
| fillPointEstimates(pointEstimates, ind, sire, dam) | ||
| print(isXChrom,ind.sex) |
There was a problem hiding this comment.
Remember to remove this at the end.
| geno_probs[9, 9, 1] = ( | ||
| np.array([0, 0.5, 0.5, 0], dtype=np.float32) * (1 - error) + error / 4 | ||
| ) | ||
| geno_probs[9, 9, 0] = ( |
There was a problem hiding this comment.
Sorry, I was intending to comment on the other review that related to the errors.
This change is probably not completed. We can review the logic afterwards.
|
@AprilYUZhang, please rebase the change in #68 for correct geno prob updates after cycle 2 |
|
Hi @gregorgorjanc, rough modification for X chr, now it works on the simple X chr sample. However, pytest still doesn't work. I will be back after @XingerTang updates the pytest. @XingerTang, please let me know once you finish all basic and general modifications. |
pytest doesn't work on my end, too. I have no idea why pytest isn't working, since I didn't make any modifications that might affect the function test. But the tinyhouse version is different. I used the latest main branch because it includes the updates about sex chromosomes. This is the commit that I stay:f6cd2bc45f3e81d5a2b34e093be0f14b20f4218d. I suggest prioritizing tinyhouse update on your end, and then we can figure out where the problems are together. Finally, merge your branch to this request. |
|
It must be due to code changes you have done for the autosomal case. I have spotted some odd cases and highlighted those via comments but could have missed some, I reckon. Happy bug hunting;) |
|
Just to add that you should attempt to fix this @AprilYUZhang if at all possible. @XingerTang input is of course welcome and valuable. |
| mat = mat_probs[i] * (1 - e) + e / 2 | ||
| if isXChrom and (ind.sex == 0): | ||
| for i in range(nLoci): | ||
| mat = mat_probs[i] * (1 - e) + e / 2 |
There was a problem hiding this comment.
Should the +e/2 be only e?
Rebased. |
I agree that the differences might come from @AprilYUZhang's autosomal-related change in #66 (comment) The function seems to use the known haplotypes to generate genotype probabilities and later to help construct the segregation tensor that indicates the inheritance. Introducing the genotype information here might be distracting for the segregation tensor. I don't have any planned changes in tinyhouse or here in AlphaImpute2 recently. |
Pytest is passing now. The bug was caused by the genoprob variable being shared between the X_male and autosome functions. I had updated it in X_male but forgot the autosome function, so I've now split them so they operate independently. I'm assuming this means the current version is stable now, right? I will check the comments you left periodically. These changes mainly ensure that the X chromosome works in AlphaImputation. |
|
The functional tests passed on my end, while some of the results for the accuracy tests for autosomal cases are changed:
You may want to know what led to these. Before your code change: After your code change: |
Issue #47