Daniel B. Neill
Assistant Professor of Information Systems
H.J. Heinz III School of Public Policy and Management
Carnegie Mellon University
Hamburg Hall #2105B, x8-3885

neill @ cs.cmu.edu

I am an Assistant Professor of Information Systems in the Heinz School of Public Policy and Management at Carnegie Mellon University. I also hold courtesy appointments in the Machine Learning Department and Robotics Institute in CMU's School of Computer Science, and an adjunct appointment in the Department of Biomedical Informatics at the University of Pittsburgh. I received my Ph.D. in Computer Science from CMU in 2006. Before that, I received my B.S.E. in Electrical Engineering and Computer Science from Duke University, M.Phil. in Computer Speech from Cambridge University, and M.S. in Computer Science from Carnegie Mellon.


Teaching:

I am currently teaching two courses at the Heinz School. Statistics for IT Managers is the core statistics course for students in the Master of Information Systems Management program. Artificial Intelligence Tools for Policy is a new elective course that I developed and taught for the first time in Fall 2008. It is geared primarily toward students in the Master of Science in Public Policy and Management program, but is open to any student interested in applying artificial intelligence and machine learning to real-world policy problems. Course descriptions, sample syllabi, and lecture slides can be obtained by clicking on the course names above, and current course materials are available on Blackboard.

I am also coordinating the new Joint Ph.D. Program in Machine Learning and Public Policy, offered jointly by the Heinz School and Machine Learning Department at CMU. Information about this program is available here.


Research:

My research interests include pattern detection, machine learning, data mining, algorithms, biosurveillance, and health care information systems. I am currently researching new machine learning methods and fast algorithms for pattern detection in massive datasets. One major application of this work is the development of systems for early detection of emerging outbreaks of disease. A more detailed description of my research is available here.

I am currently seeking students in Heinz and SCS for research on the following project:

Pattern Detection, Characterization, and Discovery

This project will develop new statistical and computational techniques for accurate and efficient pattern detection in massive, high-dimensional datasets.

While most previous data mining work has focused on detection and classification of single records, pattern detection extends these methods to groups of records, in order to detect and identify patterns not visible from any individual record alone. A key idea of our work is that pattern detection can often be transformed into a subset scan problem, in which we search over subsets of the data records to find those groups that are likely to correspond to some probabilistically modeled pattern type. However, this idea creates two main challenges: the statistical problem of evaluating the "interestingness" of a given subset (whether it corresponds to some specific pattern, is anomalous, etc.) and the computational problem of efficiently searching a massive dataset for the most interesting subsets (finding a "needle in the haystack").
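As a rough illustration of the two challenges described above, the following Python sketch scores every subset of a tiny dataset and returns the highest-scoring one. It is a minimal sketch under stated assumptions, not the project's actual method: the function names (poisson_score, best_subset) are hypothetical, each record is assumed to carry an observed count and an expected baseline, and the score is an expectation-based Poisson log-likelihood ratio, which is one common choice in the scan statistics literature. The exhaustive search examines all 2^n - 1 non-empty subsets, which is exactly why the computational problem becomes hard for massive datasets.

```python
from itertools import combinations
from math import log


def poisson_score(count, baseline):
    """Expectation-based Poisson log-likelihood ratio (an assumed scoring
    function). Returns 0 unless the observed count exceeds the baseline."""
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * log(count / baseline) + baseline - count


def best_subset(counts, baselines):
    """Exhaustively score every non-empty subset of record indices.

    Returns (score, subset) for the highest-scoring subset, or (0.0, ())
    if no subset has more counts than expected. Runs in O(2^n) time, so
    it is only practical for very small n.
    """
    n = len(counts)
    best = (0.0, ())
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            total_count = sum(counts[i] for i in subset)
            total_baseline = sum(baselines[i] for i in subset)
            score = poisson_score(total_count, total_baseline)
            if score > best[0]:
                best = (score, subset)
    return best


if __name__ == "__main__":
    # Hypothetical data: records 1 and 2 show elevated counts.
    counts = [10, 25, 30, 9]
    baselines = [10, 10, 10, 10]
    score, subset = best_subset(counts, baselines)
    print(subset, round(score, 2))
```

In this toy example the two elevated records score higher together than either does alone, which illustrates how a group of records can reveal a pattern that is weaker in any individual record.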

Our past work on this project has focused primarily on detection of emerging events (e.g. outbreaks of disease) in multivariate spatial time series data. We have developed a variety of new statistical methods which achieve more timely and accurate event detection through better use of spatial and temporal information, integration of multiple data streams, and incorporation of prior knowledge.

Current research topics include:

Primary application areas for this project include disease surveillance (using electronic health data such as hospital visits and medication sales to detect and characterize emerging outbreaks), monitoring of water quality and food safety, detection and prediction of crime patterns, network intrusion detection, fraud detection, and scientific discovery. We are currently involved in the development and deployment of several large-scale systems for health and crime surveillance. These collaborations will provide exciting opportunities to work with real-world data, interact with law enforcement and public health officials, and directly contribute to the public good by improving health, safety, and security.


Here are links to some recent papers. A complete list of publications is available in my CV.

EVENT AND PATTERN DETECTION

Daniel B. Neill and Gregory F. Cooper. A multivariate Bayesian scan statistic for early event detection and characterization. Accepted for publication, preprint available upon request.

Daniel B. Neill, Gregory F. Cooper, Kaustav Das, Xia Jiang, and Jeff Schneider. Bayesian network scan statistics for multivariate pattern detection. Accepted for publication, preprint available upon request.

Kaustav Das, Jeff Schneider, and Daniel B. Neill. Anomaly pattern detection in categorical datasets. Accepted for publication, preprint available upon request.

Maxim Makatchev and Daniel B. Neill. Learning outbreak regions in Bayesian spatial scan statistics. Proceedings of the ICML/UAI/COLT Workshop on Machine Learning for Health Care Applications, 2008. (pdf)

Daniel B. Neill. Detection of spatial and spatio-temporal clusters. Ph.D. thesis, Carnegie Mellon University, Department of Computer Science, Technical Report CMU-CS-06-142, 2006. (pdf)

Daniel B. Neill, Andrew W. Moore, and Gregory F. Cooper. A Bayesian spatial scan statistic. In Y. Weiss, et al., eds. Advances in Neural Information Processing Systems 18, 1003-1010, 2006. (pdf)

Daniel B. Neill, Andrew W. Moore, Maheshkumar Sabhnani, and Kenny Daniel. Detection of emerging space-time clusters. Proceedings of the 11th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 218-227, 2005. (pdf)

Daniel B. Neill and Andrew W. Moore. Anomalous spatial cluster detection. Proceedings of the KDD 2005 Workshop on Data Mining Methods for Anomaly Detection, 2005. (pdf)

FAST DETECTION ALGORITHMS

Daniel B. Neill, Andrew W. Moore, Francisco Pereira, and Tom Mitchell. Detecting significant multidimensional spatial clusters. In L.K. Saul, et al., eds. Advances in Neural Information Processing Systems 17, 969-976, 2005. (pdf)

Daniel B. Neill and Andrew W. Moore. Rapid detection of significant spatial clusters. Proceedings of the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 256-265, 2004. (pdf)

DISEASE SURVEILLANCE

Maheshkumar R. Sabhnani, Daniel B. Neill, Andrew W. Moore, Fu-Chiang Tsui, Michael M. Wagner, and Jeremy U. Espino. Detecting anomalous patterns in pharmacy retail data. Proceedings of the KDD 2005 Workshop on Data Mining Methods for Anomaly Detection, 2005. (pdf)

M. Wagner, F.-C. Tsui, J. Espino, W. Hogan, J. Hutman, J. Hersh, D. Neill, A. Moore, G. Parks, C. Lewis, and R. Aller. A national retail data monitor for public health surveillance. Morbidity and Mortality Weekly Report 53: 40-42, 2004. (pdf)

HEALTH CARE INFORMATION SYSTEMS

Sharique Hasan, George T. Duncan, Daniel B. Neill, and Rema Padman. Towards a collaborative filtering approach to medication reconciliation. Accepted for publication, preprint available upon request.

GAME THEORY

Daniel B. Neill. Cascade effects in heterogeneous populations. Rationality and Society 17(2): 191-241, 2005. (pdf)

Daniel B. Neill. Evolutionary stability for large populations. Journal of Theoretical Biology 227(3): 397-401, 2004. (pdf)

Daniel B. Neill. Evolutionary dynamics with large aggregate shocks. Dept. of Computer Science, Technical Report CMU-CS-03-197, 2003. (pdf)

Daniel B. Neill. Cooperation and coordination in the Turn-Taking Dilemma. Proceedings of the Ninth Conference on Theoretical Aspects of Rationality and Knowledge: 231-244, 2003. (pdf)

Daniel B. Neill. Optimality under noise: higher memory strategies for the Alternating Prisoner's Dilemma. Journal of Theoretical Biology 211(2): 159-180, 2001. (pdf)

NATURAL LANGUAGE PROCESSING

Paul Hsiung, Andrew Moore, Daniel Neill, and Jeff Schneider. Alias detection in link data sets. Proceedings of the First International Conference on Intelligence Analysis, 2005. (pdf)

Daniel B. Neill. Fully automatic word sense induction by semantic clustering. Cambridge University, master's thesis, M.Phil. in Computer Speech, 2002. (pdf)


Links:

My Poetry
Google
CNN.com
The Onion
Arts and Letters Daily