Here’s a summary of the topics for the semester:
- Attention, Transformers, and BERT
- Training LLMs, Risks and Rewards
- Introduction to AI Alignment and Failure Cases
- Red-teaming
- Jail-breaking LLMs
- Prompt Engineering
- Marked Personas
- LLM Capabilities
- Medical Applications of LLMs
- Hallucination Risks
- Potential Solutions
Week 6: Visit from Anton Korinek
Week 7: Generative Adversarial Networks and DeepFakes
- GANs and DeepFakes
- Creation and Detection of DeepFake Videos
- History of Machine Translation
- Neural Machine Translation
- Introduction to Interpretability
- Mechanistic Interpretability
- Data Selection for Fine-tuning LLMs
- Detecting Pretraining Data from Large Language Models
- Impact of Data on Large Language Models
- The Curse of Recursion: Training on Generated Data Makes Models Forget
- Watermarking LLM Outputs
- Watermarking Diffusion Models
- LLM Agents
- Tools and Planning
Week 13: Regulating Dangerous Technologies
- Analogies from other technologies for regulating AI