Course Outline

Foundations of Safe and Fair AI

  • Key concepts: safety, bias, fairness, transparency
  • Types of bias: dataset, representation, algorithmic
  • Overview of regulatory frameworks (EU AI Act, GDPR, etc.)

Bias in Fine-Tuned Models

  • How fine-tuning can introduce or amplify bias
  • Case studies and real-world failures
  • Identifying bias in datasets and model predictions

Techniques for Bias Mitigation

  • Data-level strategies (rebalancing, augmentation)
  • In-training strategies (regularization, adversarial debiasing)
  • Post-processing strategies (output filtering, calibration)
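
As an illustration of the data-level rebalancing strategy listed above, the following is a minimal sketch of random oversampling, assuming each training example is a dict carrying a hypothetical "group" field. The function name `oversample_to_balance` and the toy records are illustrative only and are not taken from the course materials.

```python
import random
from collections import Counter, defaultdict

def oversample_to_balance(examples, key, seed=0):
    """Duplicate examples from smaller groups until every group
    matches the size of the largest one (random oversampling)."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    buckets = defaultdict(list)
    for ex in examples:
        buckets[key(ex)].append(ex)
    target = max(len(bucket) for bucket in buckets.values())
    balanced = []
    for bucket in buckets.values():
        balanced.extend(bucket)
        # Sample with replacement to fill the gap up to the target size.
        balanced.extend(rng.choices(bucket, k=target - len(bucket)))
    rng.shuffle(balanced)
    return balanced

# Hypothetical toy dataset: group "a" is over-represented.
data = [{"text": f"sample {i}", "group": "a"} for i in range(6)]
data += [{"text": f"sample {i}", "group": "b"} for i in range(6, 8)]

balanced = oversample_to_balance(data, key=lambda ex: ex["group"])
counts = Counter(ex["group"] for ex in balanced)
print(counts)
```

Oversampling only repeats existing minority examples, so it is usually paired with augmentation when the smaller group has little variety.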

Model Safety and Robustness

  • Detecting unsafe or harmful outputs
  • Adversarial input handling
  • Red teaming and stress testing fine-tuned models

Auditing and Monitoring AI Systems

  • Bias and fairness evaluation metrics (e.g., demographic parity)
  • Explainability tools and transparency frameworks
  • Ongoing monitoring and governance practices
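
To make the demographic parity metric mentioned above concrete, here is a minimal sketch in plain Python, assuming binary predictions (1 = positive outcome) and one sensitive-attribute label per prediction. The helper names and the toy data are hypothetical. Libraries such as Fairlearn provide their own implementations of this metric, which this sketch does not reproduce.

```python
from collections import defaultdict

def selection_rates(predictions, groups):
    """Return the fraction of positive (1) predictions per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}

def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest group selection rates.
    A value of 0.0 means every group receives positives at the same rate."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical toy data: four predictions for each of two groups.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(preds, groups)
gap = demographic_parity_difference(preds, groups)
print(rates)  # {'a': 0.75, 'b': 0.25}
print(gap)    # 0.5
```

Demographic parity compares only selection rates and ignores ground-truth labels, so an audit would typically report it alongside error-based metrics.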

Toolkits and Hands-On Practice

  • Using open-source libraries (e.g., Fairlearn, Transformers, CheckList)
  • Hands-on: Detecting and mitigating bias in a fine-tuned model
  • Generating safe outputs through prompt design and constraints

Enterprise Use Cases and Compliance Readiness

  • Best practices for integrating safety in LLM workflows
  • Documentation and model cards for compliance
  • Preparing for audits and external reviews

Summary and Next Steps

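Requirements

  • An understanding of machine learning models and training processes
  • Hands-on experience with fine-tuning and LLMs
  • Familiarity with Python and NLP concepts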

Audience

  • AI compliance teams
  • ML engineers

Duration: 14 hours
