
Bulletin of Chinese Academy of Sciences (Chinese Version)

Keywords

artificial intelligence; training data; data element circulation; intellectual property; personal information protection; public data

Document Type

Artificial Intelligence and Future Society

Abstract

The quantity and quality of training data are critical to the performance of artificial intelligence (AI) models. However, in China, the production of training data is hindered by issues such as insufficient quantity, low quality, and fragmented distribution, compounded by limitations stemming from commercial ecosystems, regulatory frameworks, and restricted development and utilization of public data. To address these challenges, this study proposes several policy recommendations, including incentivizing research institutions to generate open-source datasets, fostering AI application scenarios, adopting a "loose-in, focus-out" regulatory approach, introducing intellectual property exemption provisions, refining personal information protection guidelines, and expediting the establishment of a unified national public data platform.

First Page

672

Last Page

680

Language

Chinese

Publisher

Bulletin of Chinese Academy of Sciences

References

1 Shumailov I, Shumaylov Z, Zhao Y, et al. AI models collapse when trained on recursively generated data. Nature, 2024, 631: 755-759.

2 Dohmatob E, Feng Y, Subramonian A, et al. Strong model collapse. arXiv, 2024, doi: 10.48550/arXiv.2410.04840.

3 Dong F, Li J. Asset bubbles in the data economy. Rochester, NY, 2024, doi: 10.2139/ssrn.4790765.

4 DeepSeek-AI, Guo D, Yang D, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv, 2025, doi: 10.48550/arXiv.2501.12948.

5 齐英程. 我国个人信息匿名化规则的检视与替代选择. 环球法律评论, 2021, (3): 52-66. Qi Y C. A review of China's personal information anonymization rules and alternative options. Global Law Review, 2021, (3): 52-66. (in Chinese)

6 赵精武. 个人信息匿名化的理论基础与制度建构. 中外法学, 2024, 36(2): 326-345. Zhao J W. Theoretical foundation and institutional framework of personal information anonymization. Peking University Law Journal, 2024, 36(2): 326-345. (in Chinese)

7 许可. 复活僵尸法条:个人信息匿名化制度的再造. 财经法学, 2024, (4): 160-177. Xu K. Reviving dormant legislation: Reconstructing the personal information anonymization system. Financial Legal Studies, 2024, (4): 160-177. (in Chinese)

8 Lee J H, You S J. Balancing privacy and accuracy: Exploring the impact of data anonymization on deep learning models in computer vision. IEEE Access, 2024, 12: 8346-8358.

9 Johnson G. Economic research on privacy regulation: Lessons from the GDPR and beyond. National Bureau of Economic Research, 2022, doi: 10.3386/w30705.

10 Demirer M, Jiménez Hernández D J, Li D, et al. Data, privacy laws and firm production: Evidence from the GDPR. National Bureau of Economic Research, 2024, doi: 10.3386/w32146.

11 马亮. 政商关系对数字政府的影响机制与理论进路. 党政研究, 2022, (3): 107-117. Ma L. The influential mechanism and theoretical evolution of government-business relations on the construction of digital government. Studies on Party and Government, 2022, (3): 107-117. (in Chinese)

12 胡业飞. 责任配置、风险共担与激励相容:中国地方公共数据授权运营的治理机制问题研究. 电子政务, 2024, 262(10): 22-31. Hu Y F. Responsibility allocation, risk sharing, and incentive compatibility: A study on the governance mechanism of local public data authorization operations in China. E-Government, 2024, 262(10): 22-31. (in Chinese)

13 郭华东, 陈和生, 闫冬梅, 等. 加强开放数据基础设施建设, 推动开放科学发展. 中国科学院院刊, 2023, 38(6): 806-817. Guo H D, Chen H S, Yan D M, et al. Strengthening open data infrastructure and promoting open science. Bulletin of Chinese Academy of Sciences, 2023, 38(6): 806-817. (in Chinese)

14 Talesh S A. Data breach, privacy, and cyber insurance: How insurance companies act as "compliance managers" for businesses. Law & Social Inquiry, 2018, 43(2): 417-440.
