VT College of Science Magazine Annual 2014 | Page 19

“My collaborators are research medical doctors and genetic epidemiologists who investigate the genetics and the molecular basis of complex diseases – in particular cardiovascular disease, obesity, and type II diabetes – and I provide statistics and statistical genetics expertise for designing their studies and integratively analyzing the resulting data,” Hoeschele said. “We know, for instance, that obesity leads to type II diabetes, but we don’t understand the precise mechanism of how it happens, so they collect data that I help analyze and interpret.”

When Hoeschele says “they collect data,” she uses the phrase casually: data collection has changed significantly since she started analyzing pedigrees in humans and animals, trying to find genes that segregated in families. Today’s data come from much larger populations, span the entire genome, and include not just whether or not a disease is present, but also genome-wide gene expression and its potential epigenetic regulators.

“We measure different regulatory mechanisms of gene expression on a genome-wide scale, such as micro-RNA expression and DNA methylation – a chemical modification of DNA that doesn’t change the sequence but can be inherited,” she said. The link between environment and genes can be seen in smokers, for example, who have vast changes in their DNA methylation patterns.

“We can collect vast amounts of data, but the difficulty is interpreting them and determining what it is they tell us.”

“We collect data on thousands of people, millions of genetic markers, tens of thousands of genes, and hundreds of thousands to millions of epigenetic markers – and we have to interpret this vast amount of data.”

Today, Hoeschele figures she can do in a day what would have taken years just two decades ago. “The computational analysis is the bottleneck in (human) genomics research today. The big issue is that the data can be generated fairly easily, but trying to make sense of so much information can make it hard to find out what is really going on. If you test so many things at the same time, you can have low power to actually find out […] up with a lot of false positives with so much data being analyzed. At the same time, multiple high-dimensional datasets can provide information that classical (e.g., single response) data cannot, for example on how to fit a model that accounts for all major technical and biological sources of variation. Using all the available information in the data, maximizing the power of discovery, and controlling the rate of false positives is what modern statistics and statistical genetics is all about.”

Hoeschele continues to make her mark in the quest she began as a teenager to learn more about the genetics of disease, knowing that one day the work she does in front of a computer will prove invaluable to breakthrough discoveries leading to novel therapies for human diseases.