William W. Cohen

Visiting Professor, Machine Learning Department

Biography

William Cohen is a Visiting Professor at Carnegie Mellon University in the Machine Learning Department. He also holds a 20%-time appointment as a Principal Scientist at Google, where he worked full-time between May 2018 and March 2024. He received his bachelor's degree in Computer Science from Duke University in 1984, and a PhD in Computer Science from Rutgers University in 1990. From 1990 to 2000 Dr. Cohen worked at AT&T Bell Labs and later AT&T Labs-Research, and from April 2000 to May 2002 he worked at Whizbang Labs, a company specializing in extracting information from the web. From 2002 to 2018, Dr. Cohen worked at Carnegie Mellon University in the Machine Learning Department, with a joint appointment in the Language Technologies Institute.

Dr. Cohen is a past president of the International Machine Learning Society. In the past he has also served as an action editor for the AI and Machine Learning series of books published by Morgan & Claypool, for the journal Machine Learning, the journal Artificial Intelligence, the Journal of Machine Learning Research, and the Journal of Artificial Intelligence Research. He was General Chair for the 2008 International Machine Learning Conference, held July 6-9 at the University of Helsinki, in Finland; Program Co-Chair of the 2006 International Machine Learning Conference; and Co-Chair of the 1994 International Machine Learning Conference. Dr. Cohen was also the co-Chair for the 3rd Int'l AAAI Conference on Weblogs and Social Media, which was held May 17-20, 2009 in San Jose, and was the co-Program Chair for the 4th Int'l AAAI Conference on Weblogs and Social Media. He is a AAAI Fellow, and was a winner of the 2008 SIGMOD "Test of Time" Award for the most influential SIGMOD paper of 1998, the 2014 SIGIR "Test of Time" Award for the most influential SIGIR paper of 2002-2004, and the 2023 Semantic Web Science Association's Ten-Year Award for the most influential paper of the ISWC-2013 conference.

Dr. Cohen's research interests include question answering, machine learning for NLP tasks, and neuro-symbolic reasoning, and he has a long-standing interest in statistical relational learning. He holds seven patents related to learning, discovery, information retrieval, and data integration, and is the author of more than 300 publications.

Announcements and FAQs

March 2024: As you can see from my updated bio above, I have returned to CMU's ML department full-time (although I still have a 20% involvement at Google, so that email will work!). I'm really looking forward to re-engaging with my friends and colleagues at CMU.

Projects, Publications, Software, Datasets, and Talks

These are now being distributed from my GitHub page.

Teaching

Past courses:

Spring 2018: Undergraduate Level Machine Learning with Large Datasets, 10-405, Mon-Wed 3:30-4:20 in GHC 4307
Fall 2017: Machine Learning with Large Datasets, 10-605 and 10-805, Tues-Thurs 1:30-2:50pm, PH 100.
Fall 2016: Machine Learning with Large Datasets, 10-605 and 10-805, Tues-Thurs 1:30-2:50pm, Wean Hall 7500.
Spring 2016: Machine Learning 10-601, Mon-Wed 10:30-11:50am, GHC 4401, with Maria-Florina Balcan.
Fall 2015: Machine Learning with Large Datasets, 10-605 and 10-805, Tu-Thu 4:30-5:50pm in Doherty Hall 2210.
Spring 2015: Machine Learning with Large Datasets, 10-605 and 10-805, Tu-Thu 10:30-11:50am in BH A51
Fall 2014: 10-601 Machine Learning, Tu-Thu 1:30-2:50, Wean 7500
Spring 2014: 10-605 Machine Learning with Large Datasets, Mon-Wed 1:30-2:50, Doherty Hall 1112
Fall 2013: 10-601 Machine Learning, Mon-Wed 4:30-5:50, Doherty Hall 2315 (with Eric Xing).
Spring 2013: Machine Learning with Large Datasets, Mon-Wed 1:30-2:50, 4307 GHC
Fall 2012: ML 10-802 and LTI 11-772 (Analysis of Social Media), 10:30-11:50am Tues & Thurs, 4303 Gates Building.
Fall 2012: 10-915, the MLD Journal Club, 12-1:20pm Tue & Thu, 4101 Gates Building (with Roy Maxion).
Spring 2012: Machine Learning with Large Datasets, Tues-Thurs 1:30-2:50pm, NSH 1305
Fall 2011: Structured Prediction for Language and Other Discrete Data (SPLODD-2011), ML 10-710 and LTI 11-763, Tues-Thursday 3:00-4:20 in Gates-Hillman 4211. This is co-taught by myself and Noah Smith, and will include some subjects from Information Extraction and some from Language and Stats 2. A machine learning course (10-701 or consent of the instructors) is a prereq; we don't recommend that you take the course if you have already taken Information Extraction or Language and Stats 2.
Spring 2011: ML 10-802 and LTI 11-772 (Analysis of Social Media), 10:30-11:50am Tues & Thurs, 4303 Gates Building.
Spring 2011: 10-915, the MLD Journal Club, 3-4pm Mon & Wed, 4101 Gates Building.
Fall 2010: 10-707 (Information Extraction - cross-listed in LTI as 11-748), 1:30-2:50pm Mon & Wed, Gates 4101. The first class is 9/8, the Wed after Labor Day, to allow incoming students time to attend the IC courses.
Spring 2010: 10-802 (Analysis of Social Media).
Fall 2009: 10-707 (Information Extraction), 1:30-2:50pm Mon & Wed, 5222 Gates Building.
Spring 2008: 10-601 (Machine Learning) with Tom Mitchell, 3-4:30 Mon & Wed in Wean Hall 5409.
Fall 2007: Analysis of Social Media, Machine Learning 10-802 and LTI 11-772, with Natalie Glance (of Google Pittsburgh) - a brand-new seminar course. 4:30-6:30 Tuesdays in Wean Hall 4623.
- Note: This site is the shattered remains of a once-beautiful wiki, created by the students of 10-802, generously hosted for free by ScribbleWiki, tragically lost (due to a combination of RAID drive failures and low-bidder backup schemes), and then largely recovered using Warrick from various internet caches and archives.
Fall 2007: Current Topics in Computational Biology (Journal Club), 02-701. (Announcements). Thursdays from 4:00-5:00 in 411 Mellon Institute (after Cell & Systems Modeling).
Spring 2007: Information Extraction, Machine Learning 10-707 and LTI 11-748 - back by popular demand for the first time since 2004!
Fall 2006: Current Topics in Computational Biology (Journal Club), 02-701. (Announcements)
Spring 2006: Read the Web, CALD 10-709.
June 21,23,25, 2005: A mini-course on Minorthird. Materials are below.
- Slides, notes, and sample files from first day's lecture.
- Slides, notes, and sample files from second day's lecture.
- Powerpoint slides from third day's lecture.
- Jar file for minorThird, if you only want to run the code, not compile it or read it. The installation process here is:
  1. Install Java 1.4 or higher (actually, JRE is all you need).
  2. Download the jar for minorThird and stick it in some directory.
  3. Optionally, download the sample data repository and unpack it into the same directory.
  4. Change to that same directory and then run Minorthird with the command
    java -Xmx500M -jar minorthird.jar
    A small launch pad will pop up that can be used to start any of the UI programs. You can also start a particular main by specifying minorthird.jar as your classpath, for instance:
    java -Xmx500M -cp minorthird.jar edu.cmu.minorthird.ui.Help
- If you want to do a real install, here's the home page on Sourceforge, and a document on how to do a CVS install of Minorthird.
Spring 2004: "Learning to Turn Words into Data: Machine Learning Approaches to Information Extraction and Information Integration", CALD 10-707 and LTI 11-748.

Students and other colleagues

Daniel Spokoyny, LTI PhD student, co-supervised with Taylor Berg-Kirkpatrick.

Long-term colleagues

Katie Rivard Mazaitis, research programmer/analyst, CMU

Former students

Haitian Sun (former MLD PhD student, now at Google).
Zhilin Yang (former LTI PhD student, now doing a startup)
Bhuwan Dhingra (former LTI PhD student, now an Assistant Professor at Duke)
Fan Yang (former MLD PhD student)
Rose Catherine Kanjirathinkal (former LTI PhD student)
Yifeng Tao, CMU Comp Bio PhD student, co-supervised with Xinghua Lu.
William Yang Wang (former LTI PhD student, now a Professor at UCSB).
Dana Movshovitz-Attias (former CSD PhD student, now at Google).
Bhavana Dalvi Mishra (former LTI PhD student co-advised with Jamie Callan, now at AI2)
Tae Yano (former LTI PhD student, co-advised with Noah Smith, now at Microsoft)
Nan Li (former CSD PhD student, co-advised with Ken Koedinger, now at D. E. Shaw)
Ramnath Balasubramanyan (LTI PhD student)
Mahesh Joshi (former LTI PhD student, co-advised with Carolyn Rosé)
Frank Lin (former LTI PhD student)
Ni Lao (former LTI PhD student, now at Google)
Richard C. Wang (former LTI PhD student co-advised with Bob Frederking).
Andrew Arnold (former MLD PhD student, now at Point 72 Asset Management)
Einat Minkov (former LTI PhD student, now at Haifa University)
Vitor Rocha de Carvalho (former LTI PhD student)
Zhenzhen Kou (former MLD PhD student)
Qiao Jin, School of Medicine, Tsinghua University
Ezra Winston, MLD Master's student.
Lanxio (Karen) Xu, MLD Master's student.
Yuxing Zhang, MLD Master's student.
Jakob Bauer, MLD 5th-year Master's student.
Kavya Srinet, MCDS Master's student.
Bhawna Juneja, MCDS Master's student.
Tom Shen, CMU CSD undergrad
Yu-Hsin Allen Kuo, LTI MLT student, formerly co-advised with Natasa Miskov-Zivanov
Rahul Goutam, former LTI MLT student, co-advised with Natasa Miskov-Zivanov
Malcolm Greaves, former CSD master's student.
Edoardo Airoldi (former MLD/Stats PhD student, co-advised with Steve Fienberg)
Ja-Hui Chang (visiting faculty from National Central University, Taiwan, 2007-2008)
Wen Haw Chong (PhD student at Singapore Management University, visited CMU in 2015-2016).
Tuan Ahn Hoang (PhD student at Singapore Management University, visited CMU for 2012-2013 academic year in my group).
Freddy Chong Tat Chua (PhD student at Singapore Management University, visited CMU for the academic year 2011-2012 in my group.)
Gustavo Lacerda (former research assistant, co-supervised with Noboru Matsuda and Ken Koedinger, now at UBC)
Lidong Bing, former postdoc, now at Tencent.
Ramesh Nallapati (former postdoc, co-supervised with John Lafferty)
Noboru Matsuda (former postdoc, co-supervised with Ken Koedinger, now Associate Professor at NC State)
Pradeep Ravikumar (former MLD PhD student, co-advised with Steve Fienberg)
I have been an external committee member for the PhD theses of
- John Zelle (degree from U Texas)
- Misha Bilenko (from U Texas)
- Daniel Kudenko (Rutgers)
- Chumki Basu (Rutgers)
- Ananlada Chotimongkol (CMU)
- Wei-Hao Lin (CMU)
- Cenk Gazen (CMU)
- David Nadeau (U Ottawa)
- Hanghang Tong (CMU)
- Ben van Durme (Rochester)
- Partha Talukdar (U Penn)
- Andy Carlson (CMU)
- Yifen Huang (CMU)
- Swapna Sundaran (U Pitt)
- Michael Heilman (CMU)
- Jon Elsas (CMU)
- Dipanjan Das (CMU)
- Fan Guo (CMU)
- Jana Diesner (CMU)
- Freddy Chong Tat Chua (Singapore Management University).
- Qirong Ho (CMU)
- Danai Koutra (CMU)
- Reyyan Yeniterzi (CMU)
- YiChi Wang (CMU)
- Steven Gardiner (CMU)
- Jay Pujara (Univ Maryland)
- Derry Wijaya (CMU)
- Lingjia Deng (Univ of Pittsburgh)
- Chenyan Xiong (CMU)
- Pradeep Dasigi (CMU)
- Tiancheng Zhao (CMU)
- Abulhair Saparov (CMU)
- Danish Pruthi (CMU)
- Sanket Vaibhav Mehta (CMU)
- Luyu Gao (CMU)
I have also been an external committee member for the Master's theses of Mehrbod Sharifi (CMU) and Weam Abu-Zaki (CMU).

Contact Info

My preferred email address for CMU-related matters is: wcohen AT cmu DOT edu
