Data Sciences

Broad Institute researchers generate on the order of 20 terabytes of sequence data every day (roughly equivalent to more than 6.6 billion tweets or 3,300 high-definition feature-length movies). This vast trove of information holds knowledge that could fundamentally transform our understanding of human biology, health, and disease, especially when combined with other sources of data, such as phenotypes, patient medical records, and even information from personal fitness devices.

Generating insights that will lead to breakthroughs requires that those data and the tools we build to study them be stored, curated, analyzed, updated, and shared rapidly, efficiently, openly, accurately, and broadly — all with privacy, security, and informed patient consent remaining top-of-mind.

The computer scientists, software engineers, informaticians, mathematicians, and others who make up the Broad Institute’s data science community share three core principles that form a foundation for addressing the growing computational needs of large-scale genomic and biomedical research. We believe in:

The value of vast and diverse types of data. Biomedical research today requires platforms that allow secure but easy storage, access, analysis, and processing of sequence data, medical records, and other complementary forms of information at very large scale, while protecting patient privacy and ensuring security.
Development of open-source tools and resources. The Broad Institute’s Data Sciences Platform has committed to making all of the software products it develops open source. (Learn more in our blog post, “Open source: Foundation for the future,” and our explainer, “Creating tools to generate data insights.”)
Widespread sharing of ideas and data within the scientific and computational community. Since before the launch of the Human Genome Project, the Broad Institute’s research community has been committed to making data and tools available to researchers worldwide.

Members of the data sciences community are woven tightly into the fabric of the Broad Institute. They play prominent roles in the Institute’s programs, platforms, and initiatives. A few examples:

Cancer Program: The Broad Institute Cancer Program’s many data scientists form the backbone of several large teams, including the Cancer Genome Computational Analysis group. These teams develop, build, and maintain tools and resources for analyzing a wide range of high-throughput screening results and cancer genome data. Many of these tools are available on the Broad Institute’s Data, Software and Tools page.
Data Sciences Platform (DSP): The DSP is a team of software engineers, computational biologists, and other technical contributors who are developing open-source software products for the analysis of genomic and clinical data at large scale, including numerous direct-to-patient portals.
Epigenomics Program: The Broad Institute Epigenomics Program includes robust computational and software engineering efforts responsible for developing tools and generating data for understanding how the genome is regulated.
Imaging Platform: The Broad Institute Imaging Platform develops open-source software tools for analyzing and mining image-based data, and helps biologists apply them to important questions in biomedicine.
LIMS and Analytics: The LIMS and Analytics group develops and maintains information and reporting systems that support the Broad Institute Genomics Platform’s daily activities.
Program in Medical and Population Genetics (MPG): Members of MPG have played key roles in developing a range of portals and computational tools, including a variant browser and a variant analysis and exploration framework.

In addition, Broad Institute data scientists have created two unique activities that support collaboration and provide opportunities for ongoing professional development:

Models, Inference, and Algorithms (MIA): MIA is an initiative that supports learning and collaboration at the interface of biology with mathematics, statistics, machine learning, and computer science.
Software Engineering (SoftEng) Affinity Group: This internal group supports software engineers at the Broad Institute and their professional growth with an ongoing speaker series, career development opportunities, and occasions for community building.