 |
11-711 Algorithms for NLP
Fall 2007
Course Description
Algorithms for NLP is an introductory graduate-level
course on the computational properties of natural languages and the fundamental
algorithms for processing natural languages. The course will provide
an in-depth presentation of the major algorithms used in NLP, including
Lexical, Morphological, Syntactic and semantic analysis, with the primary
focus on parsing algorithms and their analysis. The course is a recommended
first-semester class for both the MLT and PhD in Language Technologies
programs. The main objectives of the course are the following:
-
Develop a thorough understanding of the principles and formal methods used
in the design and analysis of language processing algorithms.
-
Provide an in-depth presentation of the major algorithms used in NLP, including
Lexical, Morphological, Syntactic, and Semantic analysis, with the primary
focus on parsing algorithms and their analysis.
Prerequisites & corequisites:
- Minimal exposure to syntax and structure of NL (English).
- College level course on algorithms.
- College level programming skills in some imperative and/or functional programming language.
- The self-paced Laboratory in NLP (11-712) is designed to complement this course with programming assignments on relevant topics. Students are encouraged to take the lab in parallel with the course or in the following semester.
Detailed Class Schedule
General Information
-
Class Meeting Time and Location:
-
Tuesday and Thursday, 1:30 p.m. - 2:50 p.m., NSH 1305
-
Recitation Sessions: (some) Wednesdays, 10:30 a.m. -
11:20 a.m., NSH 3002, see schedule for details
-
Primary Instructor:
-
Alon
Lavie, alavie (at) cs.cmu.edu, NSH 4615, 268-5655, Office Hours: By Appointment
-
Additional Instructors:
-
Bob Frederking, ref (at) cs.cmu.edu,
NSH 4617, 268-6656, Office Hours: By Appointment
-
Teaching Assistant:
-
Greg Hanneman, ghannema (at) cs.cmu.edu, NSH 4622, 268-6748,
Office Hours: Monday and Friday, 1:30 p.m. - 2:30 p.m.
-
Text Books:
-
James Allen, Natural Language Understanding, Second Edition
-
Hopcroft and Ullman, Introduction to Automata Theory, Languages, and
Computation, Chapters 1-6, 9-10.
-
Jurafsky and Martin, SPEECH and LANGUAGE PROCESSING: An Introduction to
Natural Language Processing, Computational Linguistics, and Speech
Recognition.
-
Additional Books and Readings:
-
Ritchie, Russell, Black, and Pulman: Computational Morphology
-
Manning and Schuetze, Foundations of Statistical NLP
-
Rich and Knight, Artificial Intelligence
-
Gazdar and Mellish, Natural Language Processing in Lisp
-
Terry Winograd, Language as a Cognitive Process, Vol. 1 - Syntax
-
Aho, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools
-
Cormen, Leiserson, and Rivest, Introduction to Algorithms
-
Barr, A. and Feigenbaum, E. (eds.), 1981, The Handbook of Artificial
Intelligence, Vol. 1
-
Thomas A. Sudkamp: Languages and Machines, 2nd Edition, Addison-Wesley, 1997.
-
Additional material from various conference and journal papers will be
used to supplement the books mentioned above. These materials will
be announced later, and access or copies will be arranged.
-
Course Directory:
-
/afs/cs.cmu.edu/project/cmt-55/lti/Courses/711/
The class directory will contain postscript files of slides from some of the lectures, other class notes and handouts, homework assignments and solutions, relevant html documents, and other miscellaneous course related documents. All of the above will also be accessible via the class web page.
-
Course Website:
-
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/cmt-55/lti/Courses/711/www/index.html
The website will contain and up-to-date schedule of class lectures and recitations, the postscript files mentioned above, and general information about the class.
-
Course Mailing List:
-
11711-fall07 (at) lists.andrew.cmu.edu
Schedule updates and other important announcements will be periodically sent out via the mailing list. To subscribe to the list, visit the list info page here. List subscribers also have posting (e-mail sending) access to the list.
-
Homework:
-
Homework assignments will be handed out approximately every 2-3 weeks.
-
Assignments will include problem solving, short essay questions, and minor
program development tasks.
-
Homework assignments will be due approximately 1-2 weeks after being handed
out.
-
Grading:
-
Midterm Exam
|
-
25%
|
-
Final Exam
|
-
40%
|
-
Homework Assignments
|
-
25%
|
-
Class Participation
|
-
10%
|
Major Topics to be Covered
-
Introduction/Overview of NLP [1 class]
-
NL Applications, Levels of Language Analysis, Knowledge Representation
for NLP.
-
Introduction to Formal Language Theory [8 classes]
-
Languages, recognition and decision algorithms
-
Regular Languages, Finite State Automata and their properties
-
CFLs, CFGs and their properties
-
Search Techniques and Algorithms [1 class]
-
Search Spaces, DFS, BFS, A* and Beam Search
-
Introduction to Algorithm Analysis [1 class]
-
Time and Space Computational Complexity
-
Average case and worse case complexity analysis
-
Part-of-Speech Tagging [1 class]
-
Statistical and transformation-based tagging
-
Parsing Algorithms for CFLs [4 classes]
-
Top-down and bottom-up parsing
-
Chart Parsers, CYK and Earley's Parsing Algorithms
-
LR Parsing and Generalized LR Parsing
-
Parsing Alternative Grammar Formalisms [3 classes]
-
Features and Feature-structures
-
Feature Unification
-
Unification-based Grammar Formalisms
-
Augmented GLR Parsing
-
PROLOG and DCGs
-
Tree Adjoining Grammars
-
Dependency Grammars
-
Statistical Parsing [1 class]
-
Ambiguity Resolution [2 classes]
-
Types and Levels of Ambiguity
-
Principle-based Methods: Right Association, Minimal Attachment
-
Statistical Methods: Probabilistic CFGs, probabilistic parsing
-
Interactive Ambiguity Resolution
-
Introduction to Semantic Processing [2 classes]
-
Semantic Knowledge Representation, Deep Structure and Logical Form
-
Compositional Semantic Interpretation
-
Semantic Grammars
-
Case Frames and Case Frame-based Parsing
-
Natural Language Generation [1 class]
-
Problems in NL Generation
-
Basic Generation Techniques: GenKit, FUF
-
Hard Problems in NLP [1 class]
-
Speech Understanding and Translation
-
Discourse Processing
-
Anaphora and Ellipsis Resolution
Detailed Class Schedule |