Summary

This course will prepare students for real-world research in Natural Language Processing, considering ethical requirements from society as well as research communities. The course will explore how biases in data and models can be both identified (using statistical analysis, adversarial attacks, and model explanations) and mitigated (with modified training, fine-tuning, and dataset adaptation). Finally, the course will introduce students to best practices for research (including reproducible code and the reporting of model cards and data statements), as well as to the challenges faced by research communities working on languages other than English. By the end of the course, students will be able to think critically about ethical decisions in research and apply or extend common techniques to new datasets and models.

Assessment and Requirements

Required:

Must have taken AI605 (Deep Learning for NLP) or a similar course

Assessment:

  • Attendance and Participation (10%)
  • Assignments (10%) - 2 small lab / written assignments summarizing key parts of the course
  • Presentation (30%) - students will each deliver one or two paper presentations during the course (2-4 presentations per lecture)
  • Capstone (50%) - students will write an 8-page paper extending one of the papers in the course's reading list

Deadlines:

  • Assignments: week 8, week 16
  • Presentation: any week (randomized order)
  • Capstone: proposal week 4, intermediate report week 12, final report week 15

Each late day used incurs a 10% penalty. Late days cannot be used for presentations, but swaps may be allowed with a valid reason (e.g. medical emergency).

Syllabus and Reading

Textbook: https://fairmlbook.org/pdf/fairmlbook.pdf

- Week 1 : Introduction and Overview

- Week 2 : Societal Bias in Word Embeddings


- Week 3 : Mitigating Biases in NLI


- Week 4 : Dataset Construction


- Weeks 5-6 : Model Explanations


- Week 7 : Causal Reasoning

  • TBD


- Week 8 : No Lectures

- Week 9 : Privacy and Bias in Large Language Models


- Week 10 : Adversarial Attacks


- Week 11 : Adversarial Training


- Week 12 : Low-Resource and Multilingual NLP


- Week 13 : Ethics Advisory Panels

  • TBD


- Week 14 : Data Statements and Model Cards


- Week 15 : Reproducible NLP


- Week 16 : No Lectures