This page collects some potential topics and readings for the seminar.
Introduction (Week 1)
Introduction to Large Language Models (from Stanford course)
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019.
(optional) Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever. Language Models are Unsupervised Multitask Learners. OpenAI, 2019.
These two blog posts by Jay Alammar are helpful for understanding attention and Transformers:
- Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)
- The Illustrated Transformer
Laura Weidinger, John Mellor, Maribeth Rauh, Conor Griffin, Jonathan Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zac Kenton, Sasha Brown, Will Hawkins, Tom Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William Isaac, Sean Legassick, Geoffrey Irving, Iason Gabriel. Ethical and social risks of harm from Language Models. DeepMind, 2021. https://arxiv.org/abs/2112.04359
Simon Willison. Catching up on the weird world of LLMs, August 2023.
Douglas Hofstadter. Gödel, Escher, Bach, and AI. The Atlantic. 8 July 2023.
Marc Andreessen. AI Will Save the World. 11 July 2023.
Paul Kingsnorth. Rage Against the Machine. 12 July 2023.
Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, Robert McHardy. Challenges and Applications of Large Language Models. https://arxiv.org/abs/2307.10169
Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). On the Opportunities and Risks of Foundation Models. https://arxiv.org/abs/2108.07258. Report Page
Copyright and Law
Inyoung Cheong, Aylin Caliskan, Tadayoshi Kohno. Is the U.S. Legal System Ready for AI’s Challenges to Human Values? 30 August 2023.
Sarah Silverman, Christopher Golden, and Richard Kadrey vs. OpenAI. Legal Complaint against ChatGPT, filed 7 July 2023.
Nikhil Vyas, Sham Kakade, Boaz Barak. On Provable Copyright Protection for Generative Models. https://arxiv.org/abs/2302.10870.
Federal Register. Artificial Intelligence and Copyright. Request for Comment (Published 30 August 2023, Comments due by 15 November 2023)
Governance and Regulation
Michael Veale, Kira Matus, Robert Gorwa. AI and Global Governance: Modalities, Rationales, Tensions. Annual Review of Law and Social Science, 2023.
Subhabrata Mukherjee, Arindam Mitra, Ganesh Jawahar, Sahaj Agarwal, Hamid Palangi, Ahmed Awadallah. Orca: Progressive Learning from Complex Explanation Traces of GPT-4. https://arxiv.org/abs/2306.02707
Programming with LLMs
Performance of LLMs
Roger Grosse, Juhan Bae, Cem Anil, Nelson Elhage, Alex Tamkin, Amirhossein Tajdini, Benoit Steiner, Dustin Li, Esin Durmus, Ethan Perez, Evan Hubinger, Kamilė Lukošiūtė, Karina Nguyen, Nicholas Joseph, Sam McCandlish, Jared Kaplan, Samuel R. Bowman. Studying Large Language Model Generalization with Influence Functions. https://arxiv.org/abs/2308.03296
Ernest Davis, Scott Aaronson. Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems. https://arxiv.org/abs/2308.05713
Andrew Slavin Ross, Michael C. Hughes, Finale Doshi-Velez. Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations. IJCAI 2017.
Yilun Du, Shuang Li, Antonio Torralba, Joshua B. Tenenbaum, Igor Mordatch. Improving Factuality and Reasoning in Language Models through Multiagent Debate. https://arxiv.org/abs/2305.14325.
Abuses of LLMs
Nicholas Carlini. A LLM Assisted Exploitation of AI-Guardian.
Andy Zou, Zifan Wang, J. Zico Kolter, Matt Fredrikson. Universal and Transferable Adversarial Attacks on Aligned Language Models. https://arxiv.org/abs/2307.15043. Project Website: https://llm-attacks.org/.
Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz. Not what you’ve signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection. https://arxiv.org/abs/2302.12173.
Fairness and Bias
Shangbin Feng, Chan Young Park, Yuhan Liu, Yulia Tsvetkov. From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models. ACL 2023.
Myra Cheng, Esin Durmus, Dan Jurafsky. Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models. ACL 2023.
Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, Hang Li. Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models’ Alignment. https://arxiv.org/abs/2308.05374.
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang. Sparks of Artificial General Intelligence: Early experiments with GPT-4. Microsoft, March 2023. https://arxiv.org/abs/2303.12712
Yejin Choi. The Curious Case of Commonsense Intelligence. Daedalus, Spring 2022.
Natalie Shapira, Mosh Levy, Seyed Hossein Alavi, Xuhui Zhou, Yejin Choi, Yoav Goldberg, Maarten Sap, Vered Shwartz. Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models.
Melanie Sclar, Sachin Kumar, Peter West, Alane Suhr, Yejin Choi, Yulia Tsvetkov. Minding Language Models’ (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker. ACL 2023
Boaz Barak. The shape of AGI: Cartoons and back of envelope. July 2023.
Human Dignity and Job Loss
Prompt Engineering and “Jailbreaking”
Alexander Wei, Nika Haghtalab, Jacob Steinhardt. Jailbroken: How Does LLM Safety Training Fail? July 2023.
Patrick Butlin, Robert Long, Eric Elmoznino, Yoshua Bengio, Jonathan Birch, Axel Constant, George Deane, Stephen M. Fleming, Chris Frith, Xu Ji, Ryota Kanai, Colin Klein, Grace Lindsay, Matthias Michel, Liad Mudrik, Megan A. K. Peters, Eric Schwitzgebel, Jonathan Simon, Rufin VanRullen. Consciousness in Artificial Intelligence: Insights from the Science of Consciousness. August 2023.
Memorization and Inference Privacy
R. Thomas McCoy, Paul Smolensky, Tal Linzen, Jianfeng Gao, Asli Celikyilmaz. How Much Do Language Models Copy From Their Training Data? Evaluating Linguistic Novelty in Text Generation Using RAVEN. TACL 2023.
Shawn Shan, Jenna Cryan, Emily Wenger, Haitao Zheng, Rana Hanocka, Ben Y. Zhao. Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models. USENIX Security 2023. Glaze Project Website https://arxiv.org/abs/2302.04222
Giannis Daras, Kulin Shah, Yuval Dagan, Aravind Gollakota, Alexandros G. Dimakis, Adam Klivans. Ambient Diffusion: Learning Clean Distributions from Corrupted Data. https://arxiv.org/abs/2305.19256
Training on Generated Data
Ilia Shumailov, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, Ross Anderson. The Curse of Recursion: Training on Generated Data Makes Models Forget. https://arxiv.org/abs/2305.17493
Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, Richard G. Baraniuk. Self-Consuming Generative Models Go MAD. https://arxiv.org/abs/2307.01850.
How to Use AI to Do Stuff: An Opinionated Guide (Ethan Mollick)
LLM Security — collection of papers on LLM security
COS 597G (Fall 2022): Understanding Large Language Models (Princeton course taught by Danqi Chen)