Supplementary materials for paper: Computational challenges to reproducibility, robustness, extensibility and reuse in multi-omics: a meta-workflow-based case study (Under review)


This page provides pointers to the materials (datasets, software, workflows, provenance traces and container image) used in a paper currently under review. More details about the publication will be announced at a later stage.

Summary: The primary goal of our study is to demonstrate an example computational paradigm that uses meta-workflows (i.e., workflows that assess the results of two or more workflows) alongside abstract components (i.e., components that can be implemented with multiple tools) to overcome barriers facing multi-omic discovery. We characterize the reproducibility of multi-omics discoveries by attempting a figure-for-figure reproduction of the seminal paper by Zhang et al. [1]. We then assess the robustness of its findings by re-running all the experiments with newly acquired data and discussing the differences in the results obtained.

This page and the materials described on it (excluding external references) are available in Zenodo under DOI 10.5281/zenodo.4633542.

Datasets.


Input Datasets

Note that the availability of some of the following datasets may depend on the platforms where they are stored:

Software and Workflows.


The pointers for using the main software and workflows can be found below:

Docker image.


Provenance traces.


Below you can find provenance traces for the experiments run in the paper, following the OPMW-PROV model (and extending the W3C PROV standard). Note that the pointers to code and data are relative URLs, which won't resolve (they indicate the names of the files that have been used, but the paths are local to the Docker image). The code components and data referred to can be found in the dataset and code sections.
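Because the pointers in the traces are relative URLs that only carry file names, one way to match them against the dataset and code sections is to reduce each pointer to its final path segment. The following is a minimal sketch under that assumption; the function name and the example pointers are hypothetical and are not taken from the actual traces.

```python
import posixpath
from urllib.parse import unquote, urlparse


def file_name_from_pointer(pointer: str) -> str:
    """Return the last path segment of a (possibly relative) URL.

    The pointer is treated purely as a string: nothing is fetched,
    since the paths are local to the Docker image and will not resolve.
    """
    # Keep only the path part, dropping any scheme, query or fragment.
    path = unquote(urlparse(pointer).path)
    # Strip a trailing slash so directory-style pointers still yield a name.
    return posixpath.basename(path.rstrip("/"))


if __name__ == "__main__":
    # Hypothetical pointers, for illustration only.
    for pointer in ["data/input_table.txt", "./code/normalize/run.sh"]:
        print(pointer, "->", file_name_from_pointer(pointer))
```

The resulting file names can then be looked up by hand in the dataset and code sections of this page.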

Bibliography.


[1] Zhang, B. et al. Proteogenomic characterization of human colon and rectal cancer. Nature 513, 382-387, doi:10.1038/nature13438 (2014)

About the authors.


Ravali Adusumilli

Researcher

Bioinformaticist at the Mallick Lab of Stanford University. She is interested in developing tools and pipelines for multi-omic analysis.

Daniel Garijo

Researcher

Researcher at the Polytechnic University of Madrid and past researcher at the Information Sciences Institute of the University of Southern California. Daniel's research activities focus on e-Science and the Semantic Web, specifically on how to increase the understandability of software and scientific workflows using provenance, metadata, intermediate results and Linked Data.

Varun Ratnakar

Research Programmer

Research programmer at the Information Sciences Institute of the University of Southern California. Ratnakar is the main developer of the WINGS workflow system.

Arunima Srivastava

Data Scientist

Research collaborator at the Mallick Lab, Stanford. Arunima currently works as a Data Scientist at a major pharmaceutical company.

Matt Chambers

Researcher

Research collaborator at Vanderbilt University, now working as a bioinformatics and proteome informatics consultant.

Jing Wang

Research Assistant Professor

Research Assistant Professor of Biomedical Informatics in the School of Medicine at Vanderbilt University. His research interests focus on integrating multi-dimensional omics data to understand cancer mechanisms. His current work focuses on assessing the biological relevance of mRNA and protein profiling data and developing computational tools to discover and interpret novel associations among cancer omics data.

Xiaojing Wang

Research Instructor

Research instructor of biomedical informatics in the School of Medicine at Vanderbilt University. Dr. Wang has extensive experience with biological data, including sequencing data analysis and gene expression. Her research focuses on using such resources in parallel with shotgun proteomics data, an emerging field termed proteogenomics, to understand cancer biology. Her current projects include developing bioinformatics methods for proteogenomics studies and applying them to human cancer studies.

David L. Tabb

Professor

Professor at the South African Tuberculosis Bioinformatics Initiative. Dr Tabb and his team have created a complete software pipeline for proteome informatics, contributing to the field-standard ProteoWizard library for mass spectrometry informatics, creating three distinct peptide identification algorithms, and developing the highly scalable IDPicker protein assembly environment. Dr Tabb moved to Stellenbosch University in 2015 to speed the development of proteomics in South Africa and to add support to tuberculosis researchers throughout Cape Town.

Bing Zhang

Professor

Professor at the Lester and Sue Smith Breast Center of the Baylor College of Medicine (Houston, TX) and professor at the Department of Molecular and Human Genetics of the Baylor College of Medicine. Professor Zhang's research interests focus on developing computational and statistical approaches that help translate multidimensional omics data into biological and clinical insights.

Raghu Machiraju

Professor

TDAI Chief Data Scientist; Professor of Computer Science and Engineering (College of Engineering) and Biomedical Informatics (College of Medicine). Dr. Machiraju’s interests include visual analytics, modeling, and machine learning, especially as they apply to topics in biology, medicine, and engineering. Over the years he has worked increasingly on problems in computational biology and bioinformatics, including the discovery of biomarkers and disease subtypes and the modeling and reconstruction of signal transduction networks.

Yolanda Gil

Research Professor

Director of Knowledge Technologies at the Information Sciences Institute of the University of Southern California, and Research Professor in the Computer Science Department. Her research interests include intelligent user interfaces, social knowledge collection, provenance and assessment of trust, and knowledge management in science. Dr Gil's most recent work focuses on intelligent workflow systems to support collaborative data analytics at scale.

Parag Mallick

Associate Professor in Radiology

Associate Professor in Radiology. Dr. Mallick is also a member of the Stanford Cancer Institute and a faculty fellow of Stanford ChEM-H. After completing his PhD, he trained with Ruedi Aebersold in clinical proteomics and systems biology at the Institute for Systems Biology.

Design derived from w3.css