The ENCODE Uniform Analysis Pipelines.

bioRxiv : the preprint server for biology

Authors	Benjamin Hitz, Lee Jin-Wook, Otto Jolanki, Meenakshi Kagda, Keenan Graham, Paul Sud, Idan Gabdank, Seth Strattan, Cricket Sloan, Timothy Dreszer, Laurence Rowe, Nikhil Podduturi, Venkat Malladi, Esther Chan, Jean Davidson, Marcus Ho, Stuart Miyasato, Matt Simison, Forrest Tanaka, Yunhai Luo, Ian Whaling, Eurie Hong, Brian Lee, Richard Sandstrom, Eric Rynes, Jemma Nelson, Andrew Nishida, Alyssa Ingersoll, Michael Buckley, Mark Frerker, Daniel Kim, Nathan Boley, Diane Trout, Alex Dobin, Sorena Rahmanian, Dana Wyman, Gabriela Balderrama-Gutierrez, Fairlie Reese, Neva Durand, Olga Dudchenko, David Weisz, Suhas Rao, Alyssa Blackburn, Dimos Gkountaroulis, Mahdi Sadr, Moshe Olshansky, Yossi Eliaz, Dat Nguyen, Ivan Bochkov, Muhammad Shamim, Ragini Mahajan, Erez Aiden, Tom Gingeras, Simon Heath, Martin Hirst, James Kent, Anshul Kundaje, Ali Mortazavi, Barbara Wold, Michael Cherry
Keywords	NGS analysis; analysis pipelines; Software
Abstract	The Encyclopedia of DNA Elements (ENCODE) project is a collaborative effort to create a comprehensive catalog of functional elements in the human genome. The current database comprises more than 19,000 functional genomics experiments across more than 1,000 cell lines and tissues, using a wide array of experimental techniques to study the chromatin structure and the regulatory and transcriptional landscape of the Homo sapiens and Mus musculus genomes. All experimental data, metadata, and associated computational analyses created by the ENCODE consortium are submitted to the Data Coordination Center (DCC) for validation, tracking, storage, and distribution to community resources and the scientific community. The ENCODE project has engineered and distributed uniform processing pipelines in order to promote data provenance and reproducibility, as well as to allow interoperability between genomic resources and other consortia. All data files, reference genome versions, software versions, and parameters used by the pipelines are captured and available via the ENCODE Portal. The pipeline code, developed using Docker and Workflow Description Language (WDL), is publicly available on GitHub, with images available on Dockerhub, enabling access to a diverse range of biomedical researchers. ENCODE pipelines maintained and used by the DCC can be installed to run on personal computers, local HPC clusters, or in cloud computing environments via Cromwell. Access to the pipelines and data via the cloud gives small labs the ability to use the data or software without access to institutional compute clusters. Standardization of the computational methodologies for analysis and quality control leads to comparable results from different ENCODE collections, a prerequisite for successful integrative analyses.
Year of Publication	2023
Journal	bioRxiv : the preprint server for biology
Date Published	04/2023
DOI	10.1101/2023.04.04.535623
PubMed ID	37066421
