Daniel B. Neill
Assistant Professor of Information Systems
H.J. Heinz III School of Public Policy and Management
Carnegie Mellon University
Hamburg Hall #2105B, x8-3885
I am currently teaching two courses at the Heinz School. Statistics for IT
Managers is the core statistics course for students in the Master of Information Systems Management
program. Artificial
Intelligence Tools for Policy is a new
elective course that I developed and taught for the first time in Fall
2008. It is geared primarily toward students in the Master of Science in Public Policy
and Management program, but is open to any student who is interested
in the application of artificial intelligence and machine learning to
real-world policy problems. Course descriptions, sample syllabi, and
lecture slides can be obtained by clicking on the course names above,
and current course materials are available on Blackboard.
I am also coordinating the new Joint Ph.D. Program in Machine Learning and
Public Policy, offered by the Heinz School and the Machine Learning
Department at CMU. Information about this program is available here.
Research:
My research interests include pattern detection, machine learning, data
mining, algorithms, biosurveillance, and health care information
systems. I am currently researching new machine learning methods and
fast algorithms for pattern detection in massive datasets. One major
application of this work is the development of systems for early detection
of emerging outbreaks of disease. A more detailed description of
my research is available here.
I am currently seeking students in Heinz and SCS for research on
the following project:
Pattern Detection, Characterization, and Discovery
This project will develop new statistical and computational techniques for
accurate and efficient pattern detection in massive, high-dimensional
datasets.
While most previous data mining work has focused on detection
and classification of single records, pattern detection extends
these methods to groups of records, in order to detect and identify
patterns not visible from any individual record alone. A key idea of our
work is that pattern detection can often be transformed into a
subset scan problem, in which we search over subsets of the data
records to find those groups that are likely to correspond to some
probabilistically modeled pattern type. However, this idea creates two
main challenges: the statistical problem of evaluating the
"interestingness" of a given subset (whether it corresponds to some
specific pattern, is anomalous, etc.) and the computational problem of
efficiently searching a massive dataset for the most interesting subsets
(finding a "needle in the haystack").
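To make the subset scan idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the specific methods developed in this project: the scoring function is an expectation-based Poisson log-likelihood ratio chosen for the example, the names poisson_score and best_subset are hypothetical, and the counts and baselines are made-up toy values. The score plays the role of the "interestingness" measure, and the exhaustive loop over subsets shows why the search becomes a computational problem: it examines all 2^n - 1 non-empty subsets, which is only feasible for very small n.

```python
from itertools import combinations
from math import log


def poisson_score(count, baseline):
    """Illustrative 'interestingness' score (an assumption for this sketch):
    expectation-based Poisson log-likelihood ratio, which is positive only
    when the observed count exceeds the expected baseline."""
    if baseline <= 0 or count <= baseline:
        return 0.0
    return count * log(count / baseline) + baseline - count


def best_subset(counts, baselines):
    """Exhaustively score every non-empty subset of records and return
    (score, subset) for the highest-scoring one. Runtime is exponential
    in the number of records."""
    n = len(counts)
    best_score, best_members = 0.0, ()
    for size in range(1, n + 1):
        for members in combinations(range(n), size):
            total_count = sum(counts[i] for i in members)
            total_baseline = sum(baselines[i] for i in members)
            score = poisson_score(total_count, total_baseline)
            if score > best_score:
                best_score, best_members = score, members
    return best_score, best_members


# Toy data (hypothetical): five records, each with an expected count of 10.
counts = [12, 30, 9, 28, 11]
baselines = [10.0, 10.0, 10.0, 10.0, 10.0]

score, subset = best_subset(counts, baselines)
print(subset, round(score, 2))  # the two elevated records, indices 1 and 3
```

In this toy example the group of records 1 and 3 scores higher than either record alone, which is the sense in which a pattern can be more visible at the group level than at the level of an individual record.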
Our past work on this project has focused primarily on detection of
emerging events (e.g. outbreaks of disease) in multivariate spatial time
series data. We have developed a variety of new statistical methods that
achieve more timely and accurate event detection through better use of
spatial and temporal information, integration of multiple data streams,
and incorporation of prior knowledge.
Current research topics include:
- Extending event detection methodology to more general approaches for
  pattern detection in large multivariate datasets.
- Developing novel Bayesian and nonparametric approaches for more
  accurate detection of events and patterns.
- Creating new, fast algorithms for computationally efficient detection
  of patterns in massive datasets.
- Incorporating model learning into the event detection framework,
  enabling us to distinguish between relevant and irrelevant patterns.
- Incorporating active learning from user feedback, enabling us to
  rapidly "zero in" on those patterns that are most relevant to an
  individual user.
- Integrating web-scale data sources, such as search engine queries or
  information from online social networks.
- Providing interactive tools for investigation, tracking, and
  discovery of patterns in massive data.
Primary application areas for this project include disease
surveillance (using electronic health data such as hospital visits and
medication sales to detect and characterize emerging outbreaks),
monitoring of water quality and food safety, detection and prediction of
crime patterns, network intrusion detection, fraud detection, and
scientific discovery. We are currently involved in the development and
deployment of several large-scale systems for health and crime
surveillance. These collaborations will provide exciting opportunities to
work with real-world data, interact with law enforcement and public health
officials, and directly contribute to the public good by improving health,
safety, and security.
Here are links to some recent papers. A complete list of publications is
available in my CV.
EVENT AND PATTERN DETECTION
Daniel B. Neill and Gregory F. Cooper. A multivariate Bayesian scan
statistic for early event detection and characterization. Accepted for
publication, preprint available upon request.
Daniel B. Neill, Gregory F. Cooper, Kaustav Das, Xia Jiang, and Jeff
Schneider. Bayesian network scan statistics for multivariate pattern
detection. Accepted for publication, preprint available upon
request.
Kaustav Das, Jeff Schneider, and Daniel B. Neill. Anomaly pattern
detection in categorical datasets. Accepted for publication, preprint
available upon request.
Maxim Makatchev and Daniel B. Neill. Learning outbreak regions in
Bayesian spatial scan statistics. Proceedings of the ICML/UAI/COLT
Workshop on Machine Learning for Health Care Applications, 2008.
Daniel B. Neill. Detection of spatial and spatio-temporal clusters.
Ph.D. thesis, Carnegie Mellon University, Department of Computer
Science, Technical Report CMU-CS-06-142, 2006.
Daniel B. Neill, Andrew W. Moore, and Gregory F. Cooper. A
Bayesian spatial scan statistic. In Y. Weiss, et al., eds. Advances
in Neural Information Processing Systems 18, 1003-1010, 2006.
Daniel B. Neill, Andrew W. Moore, Maheshkumar Sabhnani, and Kenny
Daniel. Detection of emerging space-time clusters.
Proceedings of the 11th ACM SIGKDD Conference on Knowledge Discovery
and Data Mining, 218-227, 2005.
Daniel B. Neill and Andrew W. Moore. Anomalous spatial cluster
detection. Proceedings of the KDD 2005 Workshop on Data Mining
Methods for Anomaly Detection, 2005.
FAST DETECTION ALGORITHMS
Daniel B. Neill, Andrew W. Moore, Francisco Pereira, and Tom Mitchell.
Detecting significant multidimensional spatial clusters. In L.K. Saul, et
al., eds. Advances in Neural Information Processing Systems 17,
969-976, 2005.
Daniel B. Neill and Andrew W. Moore. Rapid detection of
significant spatial clusters. Proceedings of the 10th ACM
SIGKDD Conference on Knowledge Discovery and Data Mining,
256-265, 2004.
DISEASE SURVEILLANCE
Maheshkumar R. Sabhnani, Daniel B. Neill, Andrew W. Moore, Fu-Chiang
Tsui, Michael M. Wagner, and Jeremy U. Espino. Detecting anomalous
patterns in pharmacy retail data. Proceedings of the KDD 2005
Workshop on Data Mining Methods for Anomaly Detection, 2005.
M. Wagner, F.-C. Tsui, J. Espino, W. Hogan, J. Hutman, J. Hersh, D. Neill,
A. Moore, G. Parks, C. Lewis, and R. Aller. A national retail data
monitor for public health surveillance. Morbidity and Mortality Weekly
Report 53: 40-42, 2004.
HEALTH CARE INFORMATION SYSTEMS
Sharique Hasan, George T. Duncan, Daniel B. Neill, and Rema Padman.
Towards a collaborative filtering approach to medication reconciliation.
Accepted for publication, preprint available upon request.
GAME THEORY
Daniel B. Neill. Cascade effects in heterogeneous
populations. Rationality and Society 17(2): 191-241, 2005.
Daniel B. Neill. Evolutionary stability for large populations.
Journal of Theoretical Biology 227(3): 397-401, 2004.
Daniel B. Neill. Evolutionary dynamics with large aggregate
shocks. Dept. of Computer Science, Technical Report CMU-CS-03-197, 2003.
Daniel B. Neill. Cooperation and coordination in the Turn-Taking
Dilemma. Proceedings of the Ninth Conference on Theoretical Aspects
of Rationality and Knowledge: 231-244, 2003.
Daniel B. Neill. Optimality under noise: higher memory
strategies for the Alternating Prisoner's Dilemma. Journal of
Theoretical Biology 211(2): 159-180, 2001.
NATURAL LANGUAGE PROCESSING
Paul Hsiung, Andrew Moore, Daniel Neill, and Jeff Schneider.
Alias detection in link data sets. Proceedings of the First
International Conference on Intelligence Analysis, 2005.
Daniel B. Neill. Fully automatic word sense induction by
semantic clustering. Cambridge University, master's thesis, M.Phil. in
Computer Speech, 2002.