11-711 Algorithms for NLP

Fall 2007


Course Description

Algorithms for NLP is an introductory graduate-level course on the computational properties of natural languages and the fundamental algorithms for processing natural languages.  The course will provide an in-depth presentation of the major algorithms used in NLP, including Lexical, Morphological, Syntactic and semantic analysis, with the primary focus on parsing algorithms and their analysis.  The course is a recommended first-semester class for both the MLT and PhD in Language Technologies programs.  The main objectives of the course are the following:
  1. Develop a thorough understanding of the principles and formal methods used in the design and analysis of language processing algorithms.
  2. Provide an in-depth presentation of the major algorithms used in NLP, including Lexical, Morphological, Syntactic, and Semantic analysis, with the primary focus on parsing algorithms and their analysis.
Prerequisites & corequisites:
  • Minimal exposure to syntax and structure of NL (English).
  • College level course on algorithms.
  • College level programming skills in some imperative and/or functional programming language.
  • The self-paced Laboratory in NLP (11-712) is designed to complement this course with programming assignments on relevant topics. Students are encouraged to take the lab in parallel with the course or in the following semester.
Detailed Class Schedule


General Information

Class Meeting Time and Location:
Tuesday and Thursday, 1:30 p.m. - 2:50 p.m., NSH 1305
Recitation Sessions: (some) Wednesdays, 10:30 a.m. - 11:20 a.m., NSH 3002, see schedule for details

 
Primary Instructor:
Alon Lavie, alavie (at) cs.cmu.edu, NSH 4615, 268-5655, Office Hours: By Appointment

 
Additional Instructors:
Bob Frederking, ref (at) cs.cmu.edu, NSH 4617, 268-6656, Office Hours: By Appointment

 
Teaching Assistant:
Greg Hanneman, ghannema (at) cs.cmu.edu, NSH 4622, 268-6748, Office Hours: Monday and Friday, 1:30 p.m. - 2:30 p.m.

 
Text Books:
James Allen, Natural Language Understanding, Second Edition
Hopcroft and Ullman, Introduction to Automata Theory, Languages, and Computation, Chapters 1-6, 9-10.
Jurafsky and Martin, SPEECH and LANGUAGE PROCESSING: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition.

 
Additional Books and Readings:
Ritchie, Russell, Black, and Pulman: Computational Morphology
Manning and Schuetze, Foundations of Statistical NLP
Rich and Knight, Artificial Intelligence
Gazdar and Mellish, Natural Language Processing in Lisp
Terry Winograd, Language as a Cognitive Process, Vol. 1 - Syntax
Aho, Sethi, and Ullman, Compilers: Principles, Techniques, and Tools
Cormen, Leiserson, and Rivest, Introduction to Algorithms
Barr, A. and Feigenbaum, E. (eds.), 1981, The Handbook of Artificial Intelligence, Vol. 1
Thomas A. Sudkamp: Languages and Machines, 2nd Edition, Addison-Wesley, 1997.

 
Additional material from various conference and journal papers will be used to supplement the books mentioned above.  These materials will be announced later, and access or copies will be arranged.

 
Course Directory:
/afs/cs.cmu.edu/project/cmt-55/lti/Courses/711/

The class directory will contain postscript files of slides from some of the lectures, other class notes and handouts, homework assignments and solutions, relevant html documents, and other miscellaneous course related documents. All of the above will also be accessible via the class web page.


 
Course Website:
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/cmt-55/lti/Courses/711/www/index.html

The website will contain and up-to-date schedule of class lectures and recitations, the postscript files mentioned above, and general information about the class.


 
Course Mailing List:
11711-fall07 (at) lists.andrew.cmu.edu

Schedule updates and other important announcements will be periodically sent out via the mailing list. To subscribe to the list, visit the list info page here. List subscribers also have posting (e-mail sending) access to the list.


 
Homework:
Homework assignments will be handed out approximately every 2-3 weeks.
Assignments will include problem solving, short essay questions, and minor program development tasks.
Homework assignments will be due approximately 1-2 weeks after being handed out.

 
Grading:
Midterm Exam
25%
Final Exam
40%
Homework Assignments
25%
Class Participation
10%

Major Topics to be Covered

  1. Introduction/Overview of NLP [1 class]
    • NL Applications, Levels of Language Analysis, Knowledge Representation for NLP.
  2. Introduction to Formal Language Theory [8 classes]
    • Languages, recognition and decision algorithms
    • Regular Languages, Finite State Automata and their properties
    • CFLs, CFGs and their properties
  3. Search Techniques and Algorithms [1 class]
    • Search Spaces, DFS, BFS, A* and Beam Search
  4. Introduction to Algorithm Analysis [1 class]
    • Time and Space Computational Complexity
    • Average case and worse case complexity analysis
  5. Part-of-Speech Tagging [1 class]
    • Statistical and transformation-based tagging
  6. Parsing Algorithms for CFLs [4 classes]
    • Top-down and bottom-up parsing
    • Chart Parsers, CYK and Earley's Parsing Algorithms
    • LR Parsing and Generalized LR Parsing
  7. Parsing Alternative Grammar Formalisms [3 classes]
    • Features and Feature-structures
    • Feature Unification
    • Unification-based Grammar Formalisms
    • Augmented GLR Parsing
    • PROLOG and DCGs
    • Tree Adjoining Grammars
    • Dependency Grammars
  8. Statistical Parsing [1 class]
  9. Ambiguity Resolution [2 classes]
    • Types and Levels of Ambiguity
    • Principle-based Methods: Right Association, Minimal Attachment
    • Statistical Methods: Probabilistic CFGs, probabilistic parsing
    • Interactive Ambiguity Resolution
  10. Introduction to Semantic Processing [2 classes]
    • Semantic Knowledge Representation, Deep Structure and Logical Form
    • Compositional Semantic Interpretation
    • Semantic Grammars
    • Case Frames and Case Frame-based Parsing
  11. Natural Language Generation [1 class]
    • Problems in NL Generation
    • Basic Generation Techniques: GenKit, FUF
  12. Hard Problems in NLP [1 class]
    • Speech Understanding and Translation
    • Discourse Processing
    • Anaphora and Ellipsis Resolution
Detailed Class Schedule