Biomusings: 2015

Two milestones happened this week in genomics: the 1000 Genomes Project wrapped up, and PacBio announced a new machine. The 1kg papers are setting the new minimum for high-impact “large” genomics studies: cohorts on the order of 1000 genomes (funded by a dedicated R01 or consortium). Any large-scale switch from Illumina would have to be able to sequence this many samples.

Genomics studies using PacBio sequencing will likely target structural variation, since Amplified and Cycled Sequencing (ACS*, Illumina for example) is just fine for SNVs. My current methods for structural variant detection need more than 10x coverage, though they were hastily written, and I will try to bring the coverage requirements down with better methods.

For the time being, many rough estimates place a 30x PacBio-based human genome on Sequel between $10k and $20k. Before you spend any time debating that, remember this is a moving target, with many variables at play in the actual cost. For example, our cost center charges around $800/SMRTCell, and this includes the overhead of labor and cost amortization of the instrument. The lower instrument cost and fixed labor cost mean that a 3x increase in SMRTCells doesn't translate to a 3x increase in total run cost.
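To make the last point concrete, here is a minimal sketch of the arithmetic, assuming a run cost splits into a fixed part (labor, instrument amortization) and a per-SMRTCell consumable part. The split and every dollar figure in it are hypothetical placeholders I picked so that the baseline averages out to about $800 per cell; they are not our cost center's actual breakdown.

```python
def run_cost(n_cells, consumable_per_cell, fixed_overhead):
    """Total cost of a run: fixed overhead plus per-cell consumables.

    All inputs are hypothetical; this is a toy model, not a real price list.
    """
    return fixed_overhead + n_cells * consumable_per_cell


def cost_ratio(scale, n_cells, consumable_per_cell, fixed_overhead):
    """How much the total cost grows when the number of cells grows by `scale`."""
    base = run_cost(n_cells, consumable_per_cell, fixed_overhead)
    scaled = run_cost(scale * n_cells, consumable_per_cell, fixed_overhead)
    return scaled / base


if __name__ == "__main__":
    # Hypothetical baseline: 10 cells, $400 consumables each, $4000 fixed.
    # (4000 + 10 * 400) / 10 = $800 average per cell.
    print(cost_ratio(3, 10, 400.0, 4000.0))  # 3x the cells, 2x the cost
```

The larger the fixed share, the further the ratio falls below 3; with no fixed overhead at all, it is exactly 3.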

For disorders where there is missing heritability, we may see some pilot studies of fewer than 100 genomes launched. Consider the Gilissen et al. paper (http://www.ncbi.nlm.nih.gov/pubmed/24896178), which demonstrated an increase in diagnostic yield moving from WES to WGS. Still, much of the variation detected in this paper is in coding regions missed by exome captures but picked up in WGS. The 3 percent or so of genes missed by ACS reads isn't likely to explain the remainder of the missing heritability, and so we'll be looking for new variation in noncoding sequences.

Large-scale WES studies were pretty much guaranteed to have significant results: either the causal mutations in Mendelian disorders, or enough likely causal genes in simplex cohorts to improve our understanding of a disease. The noncoding space is a bit more unpredictable. There are plenty of examples of disrupted enhancers. The SHH mutation in polydactyly comes to mind, though this is a point mutation. If more examples of small SVs taking out functional regulatory elements are found, it can certainly help increase our understanding of regulatory architecture. Also, consider that the FTD-ALS locus was not detected until several years ago, after a massive dedicated effort from multiple groups. The initial human single-molecule sequencing (SMS) studies are finding lots of tandem repeat variation. Perhaps there are other disorders lurking in our genomes?

The short answer is that an SMS-based study of human disease is almost guaranteed. I apologize for the trite ending, but in the long run, the question will be: what will human genetics mean for SEQUEL?

*This is my acronym for NGS/HTS/short-reads, and I'd love for it to catch on.

I visited Vietnam before starting my postdoc, and there was a popular meme: "Same same, but different". The way people live their lives in this land that seems totally foreign is the same (breakfast, lunch, dinner, friends, family), but the details are different. A few bioinformatics trainees who were working out their careers have asked me about my experiences in industry and academia, since I've done both (at PacBio and now as a postdoc with Evan Eichler), and it turns out this meme comes to mind again.

Historically there has been a pretty deep chasm between the two, and for trainees about to finish their Ph.D., making this decision may seem like choosing between the red and the blue pill. There are plenty of differences, but perhaps more similarities now than there used to be.

Science and innovation. A common characterization of industrial work is that it often involves perfecting a product, rather than innovating it. If this is more appealing to you than the work you did for your thesis, there are ample opportunities to do so in industry and that is probably the right choice for you.

If you are pursuing 'pure' science, then academia is the right path, and you too do not need to continue reading. For the rest of us, much of bioinformatics research is applied science, and the boundaries between what you do in research and what you could do in industry are somewhat blurred. Case in point: the two competing assemblers for PacBio data are HGAP/Falcon (industry) and MHAP (academia), but both have authors from each side on their publications.

Businesses may gain greater acceptance in the scientific community if their research is accepted as equal by scientific peers than if their sole communiques are through marketing. In this way, it is possible to do innovative research in industry, but the main difference to get used to after your training is that the research must ultimately be directed at increasing the company's revenue, and you have less freedom in deciding the direction of your work.

The direct parallel in academia is that your research needs to feed into your next grant application. True, you should maintain a coherent research thread that supports the likelihood that you can carry out the aims in your grant application. Same same, but different. Doing new research in both academia and industry involves convincing some sort of board that your work is worth funding, but academia is built upon new research, and it is a bit more difficult to take on new directions in industry.

Job stability. NIH funding has the stability of a roller coaster, but it is not safe to assume your position in industry is as solid as bedrock either. In both, you need to make sure your skills and knowledge are state of the art. In industry it seems there is a trade-off between risk and innovation, where companies are more likely to cut experimental branches that may have been more intellectually stimulating than their core counterparts. If you are adamant about not going into academia, joining a startup is a good alternative in the current environment. You will have to have laser focus on getting the startup running, but it is at least as challenging as your thesis work, and the lack of job stability at a startup is compensated by the number of opportunities to move to if the first one you start or join does not work out.

Organization. Academic research by nature tends to be a bit more... disorganized. Compare the cathedral and the bazaar analogy of open source development. If you are going to go into industry, be prepared to face the Gantt chart and the program manager. This is not for the scientific faint of heart. Looking at a monitor showing an organized timeline of conservatively chosen milestones that dictate the next 3-6 months of your work, some may see relief, and others a straitjacket.

Work-life balance. With some exaggeration, one can say the work-life balance in academia is perfectly balanced, as long as your work and life are the same. But in seriousness, it is much easier to put in a 9-5 day in industry than in academia, and I tend to see people who have a non-scientific passion more satisfied in industry. As with everything I've stated before, there are plenty of exceptions, but in the Bay Area I often saw a trade-off between work that was not very engaging but allowed time for other activities, and all-consuming positions such as what you often see among faculty. Pavel Pevzner addressed this in a great commencement address at SFU in 2011: as long as you're following your passion, you don't notice you're working at all! It is not a completely dire situation, just that the balance in this part is tipped a bit towards industry.

Thursday, October 1, 2015

What does SEQUEL mean for human genetics?

Wednesday, September 2, 2015

Same same, but different