Bulletin of Chinese Academy of Sciences (Chinese Version)

Keywords

biological and medical; big data; data integration; interaction; data mining

Document Type

Article

Abstract

Biomedical data have entered a new era, moving from exabyte-scale genomic data to petabyte-scale multi-dimensional big data and transforming biological and medical research into a "data-intensive science", also referred to as the fourth paradigm of discovery. This transformation presents a set of new challenges: we must efficiently gather and share high-dimensional, multi-level clinical and research data; facilitate the comprehensive use of omics, clinical, and phenome data from large populations; and ultimately convert big data into new knowledge. Meeting these challenges requires a series of paradigm-shifting ideas. In particular, new frameworks should be developed to turn the current submission-based data storage system into an integration-oriented one, to turn the subjective-based data sharing system into an interaction-oriented one, and to integrate cutting-edge information technologies into the current data mining system. At the same time, substantial effort must be invested in developing data standardization guidelines and quality control technologies. These ideas will be critical to establishing the next generation of biomedical big data centers and represent a new trend in future research.

First Page

853

Last Page

860

Language

Chinese

Publisher

Bulletin of Chinese Academy of Sciences
