Bulletin of Chinese Academy of Sciences (Chinese Version)


scientific big data; integrated query; pipeline; data sharing; elastic expansion




As modern scientific discovery depends heavily on big data management, how to manage scientific big data efficiently has become an urgent research problem. In this paper, we first introduce the application scenarios and requirements of scientific big data. We then summarize four challenges in scientific big data management (SPUS): Scale dynamics, Pipeline management, Unified access, and Sharing management. Next, we present a scientific big data management system consisting of four components: computing & storage management, data processing management, data fusion management, and data sharing management, and we describe the key techniques it relies on. Finally, we introduce the ongoing Big Scientific Data Management System (BigSDMS) project, which is part of the National Key Research and Development Program.



Hey T, Tansley S, Tolle K M. The Fourth Paradigm: Data-intensive Scientific Discovery. Redmond, WA: Microsoft Research, 2009.

黎建辉, 沈志宏, 孟小峰.科学大数据管理:概念, 技术与系统.计算机研究与发展, 2017, 54(2):235-247.

郭华东.大数据大科学大发现——大数据与科学发现国际研讨会综述.中国科学院院刊, 2014, 29(4):500-506.

杨晨, 翁祖建, 孟小峰, 等.天文大数据挑战与实时处理技术.计算机研究与发展, 2017, 54(2):248-257.

程耀东, 张潇, 王培建, 等.高能物理大数据挑战与海量事例特征索引技术研究.计算机研究与发展, 2017, 54(2):258-266.

Ward J S, Barker A. Undefined By Data: A Survey of Big Data Definitions. arXiv, 2013: 1309. 5821v1.

李红臣, 史美林.工作流模型及其形式化描述.计算机学报, 2003, 26(11):1456-1463.

Wang H, Li J, Shen Z, et al. Approximations and Bounds for (n, k) Fork-Join Queues: A Linear Transformation Approach. 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing(CCGRID). arXiv, 2017: 1707. 08860v7.

Lu J, Holubova I. Multi-model Data Management:What's New and What's Next? Extending Database Technology, 2017:602-605.

Sheth A P, Larson J A. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys (CSUR), 1990, 22(3):183-236.

Lenzerini M. Data integration:a theoretical perspective. Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. New York:ACM, 2002:233-246.

Davoudian A, Chen L, Liu M. A Survey on NoSQL Stores. ACM Computing Surveys (CSUR), 2018, 51(2):40.

Gadepally V, Chen P, Duggan J, et al. The BigDAWG polystore system and architecture. IEEE High Performance Extreme Computing Computing Conference, 2016. arXiv: 1609. 07548.

王晴.论科学数据开放共享的运行模式、保障机制及优化策略.国家图书馆学刊, 2014, 23(1):3-9.

李成赞, 张丽丽, 侯艳飞, 等.科学大数据开放共享:模式与机制.情报理论与实践, 2017, 40(11):45-51.

丁伟, 王国成, 许爱东, 等.能源区块链的关键技术及信息安全问题研究.中国电机工程学报, 2018, 38(4):1026-1034.

翁健, 李明, 张悦, 等. 一种基于区块链与代理重加密技术的可信基因检测及数据共享方法. 广东: CN108063752A, 2018-05-22.