goal: latest figures, tables for BCR-ABL paper submission
================================================================
---- Highlight long read lengths.
We are already doing 3 kb CCS for the HIV projects. Could we cover the entire kinase domain?
================================================================
---- Plot all variant positions in time series, not just the variable ones.
Here are all variants:
Some plots are too busy.
================================================================
---- Clustering plots
Clustering plots for CSY time series:
2450177-0033.F1 21/3/05
2450177-0033.F2 28/3/06
2450177-0033.F3 22/1/08
CSY.21305 is simple (mostly f359c and wild type) and has large deletions in about 10% of reads.
CSY.28036 is simple (mostly f359c and wild type, with t315i+f359c at 1%) and has large deletions in about 10% of reads.
CSY.22108 is complex, with 8 compounds above 1%, but has no large deletions.
JLR 12/11/07 has a consistent 183-base deletion (61 amino acids) in
most reads (81%), starting about halfway into the amplicon.
Here are some 256-read multiple sequence alignments to show the
deletions. Note that the aligner does not model large deletions and
tends to fill them in with flanking sequence.
MSA CSY.21305
MSA CSY.28036
MSA CSY.22108
MSA JLR.121107
Note that large deletions are _not_ a PacBio error mode, so they are
most likely not sequencing artifacts. I don't know whether PCR could
cause large, consistent deletions.
================================================================
---- Cross-over PCR in compound mutations
Consider double compounds that might be caused by PCR cross-over.
First consider single cross-over events with rates of 13%-40%. Multiple
cross-overs probably happen too.
I will consider breaking crosses, (A+B) X (None) = (A),(B), and building
crosses, (A) X (B) = (A+B).
Find a minor species that could be a cross product of more abundant
species. Do the probabilities discount it away?
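Here are all the discounted two-component crosses (some are not found in both repeats).
Each line gives the cross type, sample, patient, date, the compound with its frequency,
the two other species with their frequencies, and the cross rate: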
discountBuild 2450177-0029.F2 AHP 6/3/07 q252h.cac,f317l.ttg 0.015522 q252h.cac 0.038806 f317l.ttg 0.52597 cross 0.760480050793
discountBreak 2450177-0044.F5 BHK 24/5/05 t315i.att,l387f.ttc 0.038298 none 0.278723 l387f.ttc 0.010638 cross 0.996577512811
discountBuild 2450177-0035.F7 BRM 21/12/05 m244v.gtg,d276g.ggc 0.013185 m244v.gtg 0.531947 d276g.ggc 0.024848 cross 0.997517059671
discountBuild 2450177-0026.F4 CSC 26/4/05 t315i.att,h396r.cgt 0.032007 t315i.att 0.070069 h396r.cgt 0.472318 cross 0.967129328463
discountBuild 2450177-0027.F4 CSC 26/4/05 t315i.att,h396r.cgt 0.028529 t315i.att 0.064565 h396r.cgt 0.459459 cross 0.961706675511
discountBuild 2450177-0047.F3 DMJ 12/7/06 f317i.atc,w476c.tgt 0.019909 f317i.atc 0.403982 w476c.tgt 0.055164 cross 0.893370652934
discountBuild 2450177-0046.F3 DMJ 12/7/06 f317i.atc,w476c.tgt 0.019769 f317i.atc 0.392391 w476c.tgt 0.054457 cross 0.925149569413
discountBreak 2450177-0036.F4 DWB 21/9/05 t315i.att,m351t.acg 0.236657 t315i.att 0.047331 none 0.246727 cross 0.810605688839
discountBuild 2450177-0052.F0 EAD 5/1/06 g250e.gag,e255k.aag 0.013311 g250e.gag 0.059727 e255k.aag 0.385324 cross 0.578380872571
discountBuild 2450177-0053.F0 EAD 5/1/06 g250e.gag,e255k.aag 0.017427 g250e.gag 0.059497 e255k.aag 0.368597 cross 0.794649779158
discountBuild 2450177-0021.F4 KM 5/7/05 g250e.gag,e255k.aag 0.011838 g250e.gag 0.057321 e255k.aag 0.507165 cross 0.407207063287
discountBuild 2450177-0020.F4 KM 5/7/05 g250e.gag,e255k.aag 0.014414 g250e.gag 0.064702 e255k.aag 0.529148 cross 0.421007326292
discountBreak 2450177-0034.F2 LYS 27/5/05 t240a.gcg,y253f.ttc 0.181511 none 0.409582 y253f.ttc 0.046559 cross 0.626267438624
discountBreak 2450177-0035.F2 LYS 27/5/05 t240a.gcg,y253f.ttc 0.169377 none 0.415031 y253f.ttc 0.055524 cross 0.789852091323
discountBuild 2450177-0030.F3 MDL 02/12/05 l248v.gtg,v299l.ttg 0.012014 l248v.gtg 0.073145 v299l.ttg 0.192226 cross 0.854458264048
discountBuild 2450177-0031.F3 MDL 02/12/05 l248v.gtg,v299l.ttg 0.011028 l248v.gtg 0.076485 v299l.ttg 0.176805 cross 0.815503715054
discountBuild 2450177-0034.F1 NEF 25/9/06 v299l.ctg,f317l.ctc 0.01169 v299l.ctg 0.388818 f317l.ctc 0.03202 cross 0.938959416227
For example, take the first line, from one of the AHP 6/3/07 runs: the
q252h+f317l compound at 1.5% can be explained by taking q252h at 3.9%
and f317l at 52.6% and crossing them at a rate of 76%
(3.9% * 52.6% * 76% = 1.5%).
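The arithmetic in that example can be checked with a short Python sketch.
It assumes, based only on the worked example above, that the "cross" column
is the rate needed to explain the minor species as a product of the two
more abundant frequencies. The function names are hypothetical and are not
from the analysis code.

```python
def implied_build_rate(compound, a, b):
    """Rate r such that compound = a * b * r, for a building cross (A) X (B) = (A+B).

    Assumed model, inferred from the worked example in the notes.
    """
    return compound / (a * b)


def implied_break_rate(minor, compound, none):
    """Rate r such that minor = compound * none * r, for a breaking cross
    (A+B) X (None) = (A),(B). Assumed model, mirroring the building case.
    """
    return minor / (compound * none)


# discountBuild line for AHP 6/3/07: compound, q252h and f317l frequencies
build = implied_build_rate(0.015522, 0.038806, 0.52597)

# discountBreak line for BHK 24/5/05: l387f (minor), compound and "none" frequencies
brk = implied_break_rate(0.010638, 0.038298, 0.278723)

print(f"build rate {build:.4f}, break rate {brk:.4f}")
```

Both values agree with the "cross" column of the corresponding table lines
to about three decimal places, which supports this reading of the column.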
With more logic I could also consider more than two components and
multiple crosses.
================================================================