ChatGPT, large language model, fundamental science, scientific research integrity, scientific research ecology
Generative large models of the GPT family, represented by ChatGPT, are developing rapidly. They have prompted extensive discussion in academia and industry and are having a far-reaching impact on fundamental scientific research. This study first traces the development of the GPT technological revolution and discusses the changes this technology has brought to scientific research. Then, from three perspectives (application status, core principles, and innovation subjects), it analyzes the impact of the GPT technological revolution on fundamental scientific research and offers suggestions for China's development in this area. The study argues that GPT technology can play a positive role in knowledge production, improve research efficiency, and even promote shifts in the research paradigm, but that it may also give rise to research misconduct, weaken the credibility of research, and amplify the biases inherent in Internet content. The study therefore discusses how China can develop fundamental scientific research based on GPT technology. On the one hand, it recommends investing in the research and development of data and computing platforms that are independently controllable by the country and protected by intellectual property rights. On the other hand, it emphasizes human-computer collaboration and the supervision of scientific research integrity, so as to create an open and transparent environment for AI development. In short, this study aims to give policymakers and front-line researchers a perspective for understanding the impact of GPT technology on fundamental science, to promote the rational use of GPT technology, and to provide a reference for the healthy development of the future academic ecology.
Bulletin of Chinese Academy of Sciences
1 赵朝阳, 朱贵波, 王金桥. ChatGPT给语言大模型带来的启示和多模态大模型新的发展思路. 数据分析与知识发现, 2023, 7(3): 26-35. Zhao C Y, Zhu G B, Wang J Q. The inspiration brought by ChatGPT to LLM and the new development ideas of multi-modal large model. Data Analysis and Knowledge Discovery, 2023, 7(3): 26-35. (in Chinese)
2 Kojima T, Gu S X S, Reid M, et al. Large language models are zero-shot reasoners. (2023-01-29)[2023-07-28]. https://doi.org/10.48550/arXiv.2205.11916.
3 叶鹰, 朱秀珠, 魏雪迎, 等. 从ChatGPT爆发到GPT技术革命的启示. 情报理论与实践, 2023, 46(6): 33-37. Ye Y, Zhu X Z, Wei X Y, et al. Enlightenment from the explosion of ChatGPT to the technological revolution of GPT. Information Studies: Theory & Application, 2023, 46(6): 33-37. (in Chinese)
4 陆伟, 刘家伟, 马永强, 等. ChatGPT为代表的大模型对信息资源管理的影响. 图书情报知识, 2023, 40(2): 6-9. Lu W, Liu J W, Ma Y Q, et al. The influence of large language models represented by ChatGPT on information resources management. Document, Information & Knowledge, 2023, 40(2): 6-9. (in Chinese)
5 卢经纬, 郭超, 戴星原, 等. 问答ChatGPT之后： 超大预训练模型的机遇和挑战. 自动化学报, 2023, 49(4): 705-717. Lu J W, Guo C, Dai X Y, et al. The ChatGPT after: Opportunities and challenges of very large scale pre-trained models. Acta Automatica Sinica, 2023, 49(4): 705-717. (in Chinese)
6 OpenAI. GPT-4 technical report. (2023-03-27)[2023-07-28]. https://doi.org/10.48550/arXiv.2303.08774.
7 张晓林. 从猿到人： 探索知识服务的凤凰涅槃之路. 数据分析与知识发现, 2023, 7(3): 1-4. Zhang X L. From ape to man: Exploring the nirvana road of knowledge service. Data Analysis and Knowledge Discovery, 2023, 7(3): 1-4. (in Chinese)
8 Touvron H, Lavril T, Izacard G, et al. LLaMA: Open and efficient foundation language models. (2023-02-27)[2023-07-28]. https://doi.org/10.48550/arXiv.2302.13971.
9 王飞跃, 缪青海. 人工智能驱动的科学研究新范式： 从AI4S到智能科学. 中国科学院院刊, 2023, 38(4): 536-540. Wang F Y, Miao Q H. Novel paradigm for AI-driven scientific research: From AI4S to intelligent science. Bulletin of Chinese Academy of Sciences, 2023, 38(4): 536-540. (in Chinese)
10 Lin Z M, Akin H, Rao R S, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 2023, 379(6637): 1123-1130.
11 Lin Z M, Akin H, Rao R S, et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. (2022-07-20)[2023-07-28]. https://doi.org/10.1101/2022.07.20.500902.
12 Ferruz N, Schmidt S, Höcker B. ProtGPT2 is a deep unsupervised language model for protein design. Nature Communications, 2022, 13(1): 4348.
13 Madani A, Krause B, Greene E R, et al. Large language models generate functional protein sequences across diverse families. Nature Biotechnology, 2023, doi: 10.1038/s41587-022-01618-2.
14 Jin Q, Yang Y F, Chen Q Y, et al. GeneGPT: Teaching large language models to use NCBI web APIs. (2023-05-19)[2023-07-28]. http://export.arxiv.org/abs/2304.09667v1.
15 Hong Z J. ChatGPT for computational materials science: A perspective. Energy Material Advances, 2023, 4: 0026.
16 Wang S, Zhao Z H, Ouyang X, et al. ChatCAD: Interactive computer-aided diagnosis on medical image using large language models. (2023-02-14)[2023-07-31]. https://doi.org/10.48550/arXiv.2302.07257.
17 Imani S, Du L, Shrivastava H. MathPrompter: Mathematical reasoning using large language models. (2023-03-04)[2023-07-31]. https://doi.org/10.48550/arXiv.2303.05398.
18 Inagaki T, Kato A, Takahashi K, et al. LLMs can generate robotic scripts from goal-oriented instructions in biological laboratory automation. (2023-04-18)[2023-07-28]. https://doi.org/10.48550/arXiv.2304.10267.
19 赵志枭, 王东波. 数字化时代下ChatGPT的开端、发展和影响. 科技情报研究, 2023, 5(2): 37-47. Zhao Z X, Wang D B. The beginning, development and impact of ChatGPT in the digital age. Scientific Information Research, 2023, 5(2): 37-47. (in Chinese)
20 Spirling A. Why open-source generative AI models are an ethical way forward for science. Nature, 2023, 616(7957): 413.
21 骆飞, 马雨璇. 人工智能生成内容（AIGC）对学术生态的影响与应对——基于ChatGPT的讨论与分析. 现代教育技术, 2023, 33(6): 15-25. Luo F, Ma Y X. The impact of artificial intelligence generated content on academic ecology and countermeasures—Discussion and analysis based on ChatGPT. Modern Educational Technology, 2023, 33(6): 15-25. (in Chinese)
22 Hendrycks D, Mazeika M. X-risk analysis for AI research. (2022-09-20)[2023-07-28]. https://doi.org/10.48550/arXiv.2206.05862.
23 Lee J S. Evaluating generative patent language models. World Patent Information, 2023, 72: 102173.
24 Wei J, Tay Y, Bommasani R, et al. Emergent abilities of large language models. (2022-10-26)[2023-07-28]. https://doi.org/10.48550/arXiv.2206.07682.
25 侯丹阳, 庞亮, 丁汉星, 等. 语言模型攻击性的自动评价方法. 中文信息学报, 2022, 36(1): 12-20. Hou D Y, Pang L, Ding H X, et al. Automatic evaluation method for aggressiveness of language model. Journal of Chinese Information Processing, 2022, 36(1): 12-20. (in Chinese)
26 Chung H W, Hou L, Longpre S, et al. Scaling instruction-finetuned language models. (2022-12-06)[2023-07-28]. https://doi.org/10.48550/arXiv.2210.11416.
27 Schaeffer R, Miranda B, Koyejo S. Are emergent abilities of large language models a mirage? (2023-05-22)[2023-07-28]. https://doi.org/10.48550/arXiv.2304.15004.
28 Cornelio C, Dash S, Austel V, et al. Combining data and theory for derivable scientific discovery with AI-Descartes. Nature Communications, 2023, 14(1): 1777.
29 Wu T L, Tegmark M. Toward an AI physicist for unsupervised learning. (2019-09-02)[2023-07-28]. https://doi.org/10.48550/arXiv.1810.10525.
30 夏琪, 程妙婷, 薛翔钟, 等. 从国际视野透视如何将ChatGPT有效纳入教育——基于对72篇文献的系统综述. 现代教育技术, 2023, 33(6): 26-33. Xia Q, Cheng M T, Xue X Z, et al. How to effectively integrate ChatGPT into education from an international perspective—Based on a systematic review on 72 literature. Modern Educational Technology, 2023, 33(6): 26-33. (in Chinese)
31 West C G. Advances in apparent conceptual physics reasoning in ChatGPT-4. (2023-04-06)[2023-07-28]. https://doi.org/10.48550/arXiv.2303.17012.
32 Ahmed N, Wahed M, Thompson N C. The growing influence of industry in AI research. Science, 2023, 379(6635): 884-886.
33 Stokes D E. Pasteur’s Quadrant: Basic Science and Technological Innovation. Washington, DC: Brookings Institution Press, 1997.
34 Bubeck S, Chandrasekaran V, Eldan R, et al. Sparks of artificial general intelligence: Early experiments with GPT-4. (2023-04-13)[2023-07-28]. https://doi.org/10.48550/arXiv.2303.12712.
35 Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold. Nature, 2021, 596(7873): 583-589.
36 Baek M, DiMaio F, Anishchenko I, et al. Accurate prediction of protein structures and interactions using a three-track network. Science, 2021, 373(6557): 871-876.
37 Kitano H. Nobel Turing Challenge: Creating the engine for scientific discovery. NPJ Systems Biology and Applications, 2021, 7: 29.
38 Rahwan I, Cebrian M, Obradovich N, et al. Machine behaviour. Nature, 2019, 568(7753): 477-486.
39 Thorp H H. ChatGPT is fun, but not an author. Science, 2023, 379(6630): 313.
40 Editorials. Tools such as ChatGPT threaten transparent science; here are our ground rules for their use. Nature, 2023, 613(7945): 612.
SUN, Mengge; HAN, Tao; WANG, Yanpeng; HUANG, Yuxin; and LIU, Xiwen
"Impact analysis of GPT technology revolution on fundamental scientific research,"
Bulletin of Chinese Academy of Sciences (Chinese Version): Vol. 38, Article 11.
Available at: https://bulletinofcas.researchcommons.org/journal/vol38/iss8/11