Readings and Topics

This page collects some potential topics and readings for the seminar.

Introduction (Week 1)

Introduction to Large Language Models (from Stanford course)

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. Attention Is All You Need. NeurIPS 2017.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019.

(optional) Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever. Language Models are Unsupervised Multitask Learners. OpenAI, 2019.

These two blog posts by Jay Alammar are helpful for understanding attention and Transformers:

Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel. Ethical and social risks of harm from Language Models. DeepMind, 2021.

Simon Willison. Catching up on the weird world of LLMs, August 2023.

Viewpoint Essays

Douglas Hofstadter. Gödel, Escher, Bach, and AI. The Atlantic. 8 July 2023.

Marc Andreessen. AI Will Save the World. 11 July 2023.

Paul Kingsnorth. Rage Against the Machine. 12 July 2023.

Broad Overviews

Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, Robert McHardy. Challenges and Applications of Large Language Models.

Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). On the Opportunities and Risks of Foundation Models. Report Page

Katherine Lee, A. Feder Cooper, James Grimmelmann, and Daphne Ippolito. AI and Law: The Next Generation. July 2023.

Inyoung Cheong, Aylin Caliskan, Tadayoshi Kohno. Is the U.S. Legal System Ready for AI’s Challenges to Human Values? 30 August 2023.

Sarah Silverman, Christopher Golden, and Richard Kadrey vs. OpenAI. Legal Complaint against ChatGPT, filed 7 July 2023.

Nikhil Vyas, Sham Kakade, Boaz Barak. On Provable Copyright Protection for Generative Models.

Federal Register. Artificial Intelligence and Copyright. Request for Comment (Published 30 August 2023, Comments due by 15 November 2023)

Governance and Regulation

Michael Veale, Kira Matus, Robert Gorwa. AI and Global Governance: Modalities, Rationales, Tensions. Annual Review of Law and Social Science, 2023.

Amplification Techniques

Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, Ahmed Awadallah. Orca: Progressive Learning from Complex Explanation Traces of GPT-4.

Programming with LLMs

Performance of LLMs

Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman. Studying Large Language Model Generalization with Influence Functions.


Ernest Davis, Scott Aaronson. Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems.


Andrew Slavin Ross, Michael C. Hughes, Finale Doshi-Velez. Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations. IJCAI 2017.

Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, Igor Mordatch. Improving Factuality and Reasoning in Language Models through Multiagent Debate.

Abuses of LLMs

Nicholas Carlini. A LLM Assisted Exploitation of AI-Guardian.

Andy Zou, Zifan Wang, J. Zico Kolter, Matt Fredrikson. Universal and Transferable Adversarial Attacks on Aligned Language Models. Project Website:

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz. Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection.

Fairness and Bias

Shangbin Feng, Chan Young Park, Yuhan Liu, Yulia Tsvetkov. From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models. ACL 2023.

Myra Cheng, Esin Durmus, Dan Jurafsky. Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models. ACL 2023.


Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, Hang Li. Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models’ Alignment.


Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang. Sparks of Artificial General Intelligence: Early experiments with GPT-4. Microsoft, March 2023.

Yejin Choi. The Curious Case of Commonsense Intelligence. Daedalus, Spring 2022.

Konstantine Arkoudas. GPT-4 Can’t Reason.

Natalie Shapira, Mosh Levy, Seyed Hossein Alavi, Xuhui Zhou, Yejin Choi, Yoav Goldberg, Maarten Sap, Vered Shwartz. Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models.

Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi, Yulia Tsvetkov. Minding Language Models’ (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker. ACL 2023.

Boaz Barak. The shape of AGI: Cartoons and back of envelope. July 2023.



Human Dignity and Job Loss




Prompt Engineering and “Jailbreaking”

Alexander Wei, Nika Haghtalab, Jacob Steinhardt. Jailbroken: How Does LLM Safety Training Fail? July 2023.



Patrick Butlin, Robert Long, Eric Elmoznino, Yoshua Bengio, Jonathan Birch, Axel Constant, George Deane, Stephen M. Fleming, Chris Frith, Xu Ji, Ryota Kanai, Colin Klein, Grace Lindsay, Matthias Michel, Liad Mudrik, Megan A. K. Peters, Eric Schwitzgebel, Jonathan Simon, Rufin VanRullen. Consciousness in Artificial Intelligence: Insights from the Science of Consciousness. August 2023.

Memorization and Inference Privacy

R. Thomas McCoy, Paul Smolensky, Tal Linzen, Jianfeng Gao, Asli Celikyilmaz. How Much Do Language Models Copy From Their Training Data? Evaluating Linguistic Novelty in Text Generation Using RAVEN. TACL 2023.

Preventing Learning

Shawn Shan, Jenna Cryan, Emily Wenger, Haitao Zheng, Rana Hanocka, Ben Y. Zhao. Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models. USENIX Security 2023. Glaze Project Website

Pedro Sandoval-Segura, Vasu Singla, Jonas Geiping, Micah Goldblum, Tom Goldstein. What Can We Learn from Unlearnable Datasets?

Giannis Daras, Kulin Shah, Yuval Dagan, Aravind Gollakota, Alexandros G. Dimakis, Adam Klivans. Ambient Diffusion: Learning Clean Distributions from Corrupted Data.

Training on Generated Data

Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, Ross Anderson. The Curse of Recursion: Training on Generated Data Makes Models Forget.

Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, Richard G. Baraniuk. Self-Consuming Generative Models Go MAD.

Environmental Harms

Evaluating LLMs

Percy Liang, et al. Holistic Evaluation of Language Models. HELM Project.

Useful Guides

How to Use AI to Do Stuff: An Opinionated Guide (Ethan Mollick)

OpenAI Cookbook

More Sources

Awesome-LLM: a curated list of Large Language Model Papers and Links

LLM Security — collection of papers on LLM security

COS 597G (Fall 2022): Understanding Large Language Models (Princeton Course taught by Danqi Chen)

CS324 - Large Language Models (Winter 2022) (Stanford Course taught by Percy Liang, Tatsunori Hashimoto, and Christopher Ré)