Course Syllabus
Natural Language Processing - Section 2
Fall 2025
Class Time: Friday 1:10-3:40pm
Location: TBA
Instructor: Daniel Bauer <bauer@cs.columbia.edu>, office hours: Monday 1:15-3:00pm, 704 CEPSR and on Zoom.
Please see the Modules section for week-to-week materials.
Teaching Assistants/Course Assistants:
Office Hours will take place in the CS TA room, 122 Mudd, located on the 1st floor (street level) unless specified otherwise.
Course Description
This course provides an introduction to the field of Natural Language Processing (NLP). We will discuss properties of human language at different levels of representation (morphology, syntax, semantics, pragmatics) and learn how to build systems that can analyze, understand, and generate natural language. We will introduce core NLP techniques for language modeling, tagging, parsing, and word-sense disambiguation. We will also discuss applications such as text classification, machine translation, summarization, question answering, dialog systems, and image caption generation. We will study machine learning methods used in NLP, such as various forms of neural networks and large language models. We will discuss ethical aspects of NLP research and applications. Homework assignments will consist of programming projects in Python using PyTorch.
Prerequisites
Data Structures (COMS 3134 or COMS 3137) and Discrete Math (COMS 3203). Some experience programming in Python and a background in probability/statistics, multivariable calculus, and linear algebra are helpful. Some previous or concurrent exposure to AI and machine learning is beneficial, but not required.
Supplemental Textbooks
You do not have to buy any textbooks for this class. The following books are available online. From time to time I will post additional research papers and reading material on Courseworks.
Daniel Jurafsky, James H. Martin, Speech and language processing, 2nd edition. Prentice Hall. 2009.
A draft of the third edition is available here: https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
Yoav Goldberg, Neural Network Methods for Natural Language Processing. Morgan & Claypool (Synthesis Lectures on Human Language Technologies). 2017.
Available as an e-book through the Columbia library (https://clio.columbia.edu/catalog/13676351)
Requirements and Grading
Your course grade will be based on 4 programming assignments with the lowest score dropped (13.33% each = 40% total), a midterm exam (25%), a final exam (25%), and participation (10%; online participation on Ed counts, and CVN students are only expected to participate on Ed).
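To make the drop-lowest rule concrete, here is a minimal, unofficial sketch of how the weights above combine. It assumes every component (each homework, each exam, and participation) is scored on a 0-100 scale; the function name `course_grade` is hypothetical and this is not the course's actual grading script.

```python
def course_grade(homework, midterm, final, participation):
    """Hypothetical weighted course grade; all scores assumed to be 0-100.

    homework: the four programming-assignment scores. The lowest one is
    dropped, and each of the remaining three counts 40/3 (about 13.33) percent.
    """
    if len(homework) != 4:
        raise ValueError("expected exactly 4 homework scores")
    kept = sorted(homework)[1:]  # drop the single lowest homework score
    homework_part = sum(kept) / len(kept) * 0.40  # average of the 3 kept = 40% total
    return homework_part + 0.25 * midterm + 0.25 * final + 0.10 * participation


# Example: the 70 on one homework is dropped; the kept average is 90.
print(round(course_grade([90, 80, 70, 100], 85, 75, 100), 2))  # 86.0
```

Averaging the three kept scores and weighting the average by 40% is equivalent to weighting each kept score by 13.33%.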
Exams
There will be two paper-based exams, a midterm and a final (75 minutes each). The exams will be closed notes / closed book. For the second exam only, one double-sided letter-sized "cheat sheet" will be allowed. Calculators are recommended.
If you need to miss an exam for health or personal reasons (documentation and approval required), we will make a reasonable effort to reschedule the exam within the following week. For the first exam, if no such time can be found, the exam may be dropped from grading and the second exam score will be adjusted accordingly. For the second exam, if no make-up is possible before the end of the semester, we will work with your school to assign an Incomplete grade, and the exam will need to be completed in the Spring semester.
CVN students will be able to take exams on Zoom (synchronously), or asynchronously using Proctorio.
Homework Policy
You will have approximately 2 weeks to complete each programming assignment. Homework assignments may be submitted up to 4 days late for a 20-point penalty (this is a one-time penalty of 20 points, not a deduction per day). Other extensions will be granted for documented emergencies only. Regrade requests will only be accepted within 72 hours after homework scores are released on Courseworks; to request a regrade, email the TA who graded your assignment.
Tentative schedule of topics:
| Date | Topics | Homework schedule (subject to change) |
|---|---|---|
| 1/23 | Introduction and course overview. Levels of linguistic representation. Ambiguity. History of NLP techniques. | |
| 1/30 | Probability review and machine learning basics. Naive Bayes for text classification. n-gram language models and smoothing. | |
| 2/6 | Sequence labeling. Hidden Markov models (HMMs). Part-of-speech tagging. Named-entity recognition. | |
| 2/13 | Context-free grammars (CFGs). Parsing with CFGs and PCFGs (probabilistic CFGs): CKY. Dependency structures. Transition-based dependency parsing. | HW 1 due |
| 2/20 | Linear models (perceptron). Feature functions. Logistic regression and log-linear models. Introduction to neural networks. | |
| 2/27 | Backpropagation. PyTorch basics. Neural language models. Word embeddings (word2vec). Pre-training. | |
| 3/6 | Exam 1 | HW 2 due |
| 3/13 | Recurrent neural networks and LSTMs. Contextualized embeddings. ELMo. Attention. | |
| 3/20 | Spring break | |
| 3/27 | Transformer architecture. BERT. GPT. Pre-training/fine-tuning. | |
| 4/3 | GPT and BERT applications. Other transformer architectures. Chain-of-thought prompting. | HW 3 due |
| 4/10 | Instruction tuning. Retrieval Augmented Generation. RLHF. Conversational AI. | |
| 4/17 | Semantic Role Labeling. Abstract Meaning Representation and semantic parsing. | |
| 4/24 | Summarization. Machine Translation. | |
| 5/1 | Summary and Review. Multimodal NLP / Language and Vision. | HW 4 due |
| | Final exam (TBD) | |
Attendance Policy, Classroom Interaction
Attendance, in person or online with instructor approval, is expected for all sessions and is reflected in the participation grade. CVN students are exempt from this requirement, but are expected to either participate on Zoom or watch recorded lectures. The instructor understands that there may be extenuating circumstances that prevent you from attending all sessions. If you have to miss a session, you are responsible for catching up on the material.
During class, you are expected to behave professionally and to be respectful and courteous to all course participants. You will use your class time most effectively if you participate fully by asking questions and engaging in classroom activities.
If you are participating on Zoom, keep your camera turned on and keep yourself muted. Ask questions in the chat. From time to time, I may unmute individual participants. I will also occasionally use polls and ask that you participate using the yes/no/hands-up functionality.
Disability-Related Accommodations
To receive disability-related academic accommodations for this course, students must first be registered with their school Disability Services (DS) office. Detailed information is available online for both the Columbia and Barnard registration processes. Refer to the appropriate website for information regarding deadlines, disability documentation requirements, and drop-in hours (Columbia)/intake session (Barnard).
Academic Honesty Policy
It is important that you read and understand this section. Any form of academic misconduct will result in a homework or exam grade of zero and can potentially be reported to the appropriate office.
Interaction With Other Students: All homework assignments must be solved individually. You are encouraged to discuss problems with others, but when you sit down to code up your solution you must work on your own, without any further interaction. You are not allowed to share your solutions (literal code and theory solutions) with other students or other groups.
Online Material: Treat coding problems like essay assignments: You are not permitted to copy any part of other people’s work without attribution. This applies to code produced by other students and to material found on the internet. The problems in this course are designed to be solved with the course materials only. However, online sources (for instance, Stack Overflow) can sometimes be useful as a reference. If you use code snippets found online, you must attribute the source in a comment (including the complete link). You are not allowed to copy non-trivial code fragments from these sources.
Code is non-trivial if any of the following applies:
- It directly solves the assignment.
- You do not fully understand it.
- It is longer than three lines.
If we find that you copied a source with attribution but used it to solve part of the problem, we may deduct points, but we will not treat this as academic misconduct.
Note that you may use any code provided in the course materials and in the textbook without attribution.
In addition to this policy, the CS department’s academic honesty policy, as well as the individual policies of your school, apply to this course.
AI-Tools/LLMs: The use of language-model-based AI is governed by the same rules outlined in the “Online Material” section above. You may not use AI tools to generate non-trivial code for your programming assignments. You may use LLMs as a learning tool to generate explanations of concepts discussed in class, provided you are aware that such explanations may be incomplete or incorrect.
Please also consult the university's policy on generative AI: https://provost.columbia.edu/content/office-senior-vice-provost/ai-policy
Campus Resources
The instructor is committed to promoting students' well-being and advancing an inclusive and welcoming campus culture. He is aware that students may experience personal, social, or financial challenges, whether related or unrelated to their coursework, that may affect their health and academic performance. In addition, the high levels of stress experienced by many Columbia students may affect their mental and physical health. These effects may be exacerbated during the Covid-19 pandemic.
If you are in need of support, you are encouraged to reach out to your school's adviser. If you feel comfortable notifying the instructor, he will make every effort to provide support and connect you to available Columbia resources.
If you or someone you know feels overwhelmed or suffers from depression or anxiety, please contact:
- Counseling and Psychological Services (CPS, Columbia) - 212-853-2878
- Furman Counseling Center (Barnard) - 212-854-2092
For additional campus resources, see https://universitylife.columbia.edu/student-resources-directory