
Data Compliance Risks and Mitigation Measures Across the Generative AI Model Lifecycle

The article examines the data compliance challenges and legal risks that arise during the training, application, and optimization stages of generative AI models. It offers concrete mitigation strategies such as respecting robots.txt, obtaining user consent, managing cross‑border data transfers, and implementing robust security and governance measures.

DataFunSummit

Oct 23, 2024


Model Training Phase

In this phase, risk is concentrated mainly in data collection. Under the "Interim Measures for the Administration of Generative AI Services," providers must use data from lawful sources. Training data is typically acquired in three ways (web crawling, third‑party datasets, and domain‑specific datasets), and each carries distinct compliance risks.

Web Crawling: Automated collection from public websites (e.g., by OpenAI's GPTBot crawler) can expose the collector to unfair‑competition claims, copyright infringement, privacy violations, and in serious cases criminal liability. Recommended compliance actions: honor the site's robots.txt (the Robots Exclusion Protocol), review the site's terms of service, never circumvent technical protection or anti‑crawling measures, assess the competitive impact on the source site, limit the collection of copyrighted works and personal information, and handle complaints from rights holders promptly.

Third‑Party Data: Using existing datasets such as Databricks Dolly 15k, OASST1, or RedPajama lowers the risk but does not eliminate it. The provider should still verify the credibility of the data source, confirm who owns the data and on what terms it may be used, and check it for embedded personal information.

Domain‑Specific Datasets: Enterprises that train on proprietary data they have accumulated must obtain explicit authorization from the users concerned and follow the principle of minimal necessity, collecting and using only the data the stated purpose requires.
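Of the crawling safeguards above, honoring robots.txt is the most mechanical. The sketch below uses Python's standard-library `urllib.robotparser` to decide, per URL, whether a crawler may fetch a page and how long it should wait between requests. The robots.txt content, the `ExampleBot/1.0` user agent, and the helper names `build_parser` and `plan_fetch` are hypothetical. A real crawler would download the file with `set_url()` and `read()` rather than parse a string. Passing this check is necessary but not sufficient: site terms, copyright, and personal-information limits still apply.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site. A real crawler would instead call
# parser.set_url("https://<site>/robots.txt") followed by parser.read().
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def plan_fetch(parser: RobotFileParser, user_agent: str, url: str) -> tuple[bool, float]:
    """Return (allowed, seconds_to_wait_between_requests) for one URL.

    A disallowed URL is simply skipped. No attempt is made to work around it.
    """
    if not parser.can_fetch(user_agent, url):
        return False, 0.0
    delay = parser.crawl_delay(user_agent)  # None if the site sets no Crawl-delay
    return True, (float(delay) if delay is not None else 0.0)


if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS_TXT)
    print(plan_fetch(parser, "GPTBot", "https://example.com/articles/1"))          # (False, 0.0)
    print(plan_fetch(parser, "ExampleBot/1.0", "https://example.com/articles/1"))  # (True, 5.0)
    print(plan_fetch(parser, "ExampleBot/1.0", "https://example.com/private/x"))   # (False, 0.0)
```

Rules for a named agent take precedence over the `*` group, which is why `GPTBot` is blocked everywhere while other agents are kept out of `/private/` only.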

Model Application Phase

This stage involves user interaction and therefore the collection of user input. The Personal Information Protection Law (PIPL) requires providers to give clear notice and obtain consent before collecting it, and the rules are stricter for children's data. The privacy policy should state the purposes and methods of processing and the types of personal information collected.
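Notice and consent can be enforced in code. The system records which disclosed purposes each user agreed to and checks that record before any input is stored. The following is a minimal in-memory sketch. `PrivacyNotice`, `ConsentRegistry`, `accept_input`, and the purpose names such as `"service_delivery"` are hypothetical illustrations and do not come from any library or from the article. A production system would also persist the records for audit.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PrivacyNotice:
    """What the privacy policy discloses to users (hypothetical structure)."""
    version: str
    purposes: frozenset[str]
    methods: frozenset[str]
    data_types: frozenset[str]


@dataclass(frozen=True)
class ConsentRecord:
    """Audit record: who agreed to which purposes, under which notice, and when."""
    user_id: str
    notice_version: str
    purposes: frozenset[str]
    granted_at: datetime


class ConsentRegistry:
    """In-memory store of consent records (a real system would persist them)."""

    def __init__(self, notice: PrivacyNotice) -> None:
        self.notice = notice
        self._records: dict[str, ConsentRecord] = {}

    def grant(self, user_id: str, purposes: set[str]) -> ConsentRecord:
        # Consent can only cover purposes the notice actually disclosed.
        undisclosed = set(purposes) - self.notice.purposes
        if undisclosed:
            raise ValueError(f"purposes not disclosed in notice: {sorted(undisclosed)}")
        # A new grant replaces the user's previous record.
        record = ConsentRecord(user_id, self.notice.version, frozenset(purposes),
                               datetime.now(timezone.utc))
        self._records[user_id] = record
        return record

    def withdraw(self, user_id: str) -> None:
        # After withdrawal, allows() returns False for every purpose.
        self._records.pop(user_id, None)

    def allows(self, user_id: str, purpose: str) -> bool:
        record = self._records.get(user_id)
        return record is not None and purpose in record.purposes


def accept_input(registry: ConsentRegistry, user_id: str, text: str,
                 store: list[tuple[str, str]]) -> bool:
    """Store input only if the user consented to the "service_delivery" purpose."""
    if not registry.allows(user_id, "service_delivery"):
        return False
    store.append((user_id, text))
    return True


if __name__ == "__main__":
    notice = PrivacyNotice("2024-10", frozenset({"service_delivery", "model_training"}),
                           frozenset({"chat_input"}), frozenset({"prompt_text"}))
    registry = ConsentRegistry(notice)
    registry.grant("user-1", {"service_delivery"})
    log: list[tuple[str, str]] = []
    print(accept_input(registry, "user-1", "hello", log))  # True
    print(registry.allows("user-1", "model_training"))     # False
```

Storing the notice version with each record keeps a trail of which disclosure the user actually saw. Any purpose the user did not agree to, such as training, stays blocked until consent is granted for it.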

Additional considerations:

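Children's Personal Information: If the product targets minors, it needs age verification and a mechanism for obtaining explicit parental consent, comparable to the verifiable parental consent required under the US COPPA. Under the PIPL, processing the personal information of a minor under 14 requires the consent of a parent or other guardian.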

Cross‑Border Data Transfer: Transferring data collected from users in China to recipients abroad triggers obligations under the Data Security Law and the PIPL. Depending on the circumstances, the provider must pass a security assessment, obtain certification, or conclude and file a standard contract.

Foreign Service Restrictions: Providers must comply with trade‑control lists and must not offer services in prohibited jurisdictions.
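The children's-data rule above reduces to a small decision function. The sketch assumes that a separate age-verification step, which is out of scope here, has already established a birth date. It applies the PIPL's under-14 threshold as a parameter because other regimes differ; COPPA, for example, covers children under 13. `age_on` and `consent_satisfied` are hypothetical names.

```python
from datetime import date

# Under the PIPL, processing personal information of minors under 14 requires
# the consent of a parent or other guardian. Other regimes use other thresholds
# (US COPPA: under 13), so the value is kept as a parameter.
GUARDIAN_CONSENT_AGE = 14


def age_on(birth_date: date, today: date) -> int:
    """Number of full years completed on `today`."""
    had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
    return today.year - birth_date.year - (0 if had_birthday else 1)


def consent_satisfied(birth_date: date, today: date, *, user_consented: bool,
                      guardian_consented: bool,
                      threshold: int = GUARDIAN_CONSENT_AGE) -> bool:
    """Return True if the consent required for this user's age is present.

    Assumes `birth_date` came from a separate age-verification step
    (hypothetical here). A self-declared date alone is weak evidence.
    """
    if age_on(birth_date, today) < threshold:
        return guardian_consented  # below threshold: the guardian must consent
    return user_consented  # otherwise the user's own consent is required


if __name__ == "__main__":
    today = date(2024, 10, 23)
    print(consent_satisfied(date(2012, 6, 1), today,
                            user_consented=True, guardian_consented=False))  # False
    print(consent_satisfied(date(2012, 6, 1), today,
                            user_consented=False, guardian_consented=True))  # True
```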

Model Optimization Phase

Collected user data may be repurposed as training data to improve the model. This raises further compliance concerns:

Purpose Limitation: Personal information may be processed only for a clearly defined, lawful purpose, and in the way that has the least impact on users' rights and interests.

Re‑Consent: If data is to be used beyond the scope the user originally consented to, fresh consent must be obtained.

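De‑Identification: Personal data should be de‑identified or anonymized before it is added to training corpora. This prevents inadvertent disclosure, for example a model reproducing personal details from its training data in its output.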

Security Measures: Organizations should establish security governance structures, adopt encryption, conduct regular security testing, obtain relevant certifications (e.g., ISO/IEC 27001), and define incident‑response procedures.

Operational Controls: Limit employee access, enforce strict usage policies, and evaluate the data‑security capabilities of AI products before deploying them.
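Purpose limitation, re-consent, and de-identification can be combined into one step that prepares application logs for training. This is a minimal sketch under stated assumptions. Logs are dicts with hypothetical `user_id` and `text` fields, and the set of users who gave fresh consent for training comes from elsewhere. The two regular expressions catch only e-mail addresses and a simplified mainland-China mobile-number pattern, and real PII detection needs much broader coverage. The keyed hash (HMAC-SHA256) is pseudonymization, not anonymization: anyone holding the key can re-link records, so the output should still be treated as personal information.

```python
import hashlib
import hmac
import re

# ASCII-only matching so that adjacent Chinese characters are not swallowed.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", re.ASCII)
# Simplified pattern for mainland-China mobile numbers: 11 digits starting 1[3-9].
CN_MOBILE_RE = re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)", re.ASCII)


def pseudonymize_id(user_id: str, key: bytes) -> str:
    """Replace a raw user ID with a keyed hash (HMAC-SHA256, truncated to 16 hex chars).

    This is pseudonymization, not anonymization: whoever holds `key` can
    recompute the mapping, so the result is still personal information.
    """
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_text(text: str) -> str:
    """Mask e-mail addresses first, then any phone numbers left in the text."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return CN_MOBILE_RE.sub("[PHONE]", text)


def build_training_rows(logs: list[dict[str, str]], consented_to_training: set[str],
                        key: bytes) -> list[dict[str, str]]:
    """Turn application logs into training rows.

    Purpose limitation / re-consent: logs from users without consent for the
    training purpose are skipped. De-identification: IDs are pseudonymized and
    obvious identifiers in the text are masked.
    """
    rows = []
    for entry in logs:
        if entry["user_id"] not in consented_to_training:
            continue
        rows.append({
            "user": pseudonymize_id(entry["user_id"], key),
            "text": mask_text(entry["text"]),
        })
    return rows


if __name__ == "__main__":
    logs = [
        {"user_id": "u1", "text": "call 13812345678 or mail a.b@example.com"},
        {"user_id": "u2", "text": "no consent for training"},
    ]
    # Demo key only; in practice, load it from a key-management service.
    for row in build_training_rows(logs, {"u1"}, key=b"demo-key"):
        print(row["text"])  # call [PHONE] or mail [EMAIL]
```

E-mail addresses are masked before phone numbers so that an address such as `13812345678@qq.com` is replaced as a whole rather than left half-masked.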

Summary

Data processing occurs throughout the training, application, and optimization phases of generative AI, so developers and operators are subject to the Cybersecurity Law, the Data Security Law, and the Personal Information Protection Law. The recently issued Interim Measures for the Administration of Generative AI Services provide additional guidance. This article has outlined the specific compliance risks at each stage and offered targeted recommendations for lawful and secure AI development.

For a deeper dive into AI safety, regulation, and compliance, see the book Large Model Security, Regulation, and Compliance.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
