Collaborative Research: NetMine: Finding Patterns in Network Data
Keywords
Data mining, network traffic, network topology.
Project Award Information
- Award Number: IIS-0209107
- Duration: 09/01/2002-02/29/2004
- Title: Collaborative Research: NetMine: Finding Patterns in Network Data
Project Summary
The project has two major thrusts: the first is to find patterns in
network traffic, and the second is to find trends in the evolution of the
Internet. The technical merit lies in the synergy between the networking and
data mining fields, which advances both. The networking field
gains novel insights and fast tools for predicting network performance.
The data mining field benefits from new problems and new tools (using
fractals, power laws, and large-graph algorithms) that will be stress-tested
on multiple gigabytes of real network data.
This is a joint project with Prof. Michalis Faloutsos of U.C. Riverside.
Goals, Objectives, and Targeted Activities
As mentioned, the first goal is to find patterns and to do forecasting
in network traffic. The second goal is to find patterns in the topology
of the Internet, when viewed as a graph.
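One topology pattern of this kind is the rank-degree power law reported in the SIGCOMM 1999 paper listed under Area References: when nodes are sorted by decreasing degree, degree versus rank is roughly a straight line on a log-log plot. The following is a minimal sketch of how such a slope could be estimated by ordinary least squares. The function names and the edge-list input format are assumptions made for illustration, not the project's actual tools.

```python
import math


def degrees_from_edges(edges):
    """Count the degree of every node in an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return list(degree.values())


def fit_rank_exponent(degrees):
    """Least-squares slope of log(degree) versus log(rank).

    Nodes are ranked by decreasing degree, starting at rank 1.
    Hypothetical helper for illustration only.
    """
    ranked = sorted((d for d in degrees if d > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two nodes with positive degree")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(d) for d in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Toy edge list; a real study would load an AS-level topology snapshot.
    toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
    print(fit_rank_exponent(degrees_from_edges(toy_edges)))
```

A plain log-log regression is the simplest possible estimator and is sensitive to noise in the low-degree tail, so the resulting slope is best treated as a rough diagnostic.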
Indication of Success
We have developed models for non-linear forecasting [Chakrabarti+, CIKM
2002], found patterns in the Internet topology as it evolves over time
[Siganos+, Trans. Netw.], and estimated the so-called 'epidemic
threshold', the condition that determines whether a virus will propagate
throughout the whole network [Wang+, SRDS 2003].
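The eigenvalue viewpoint of the SRDS 2003 paper listed under Project References is usually summarized as follows: the epidemic threshold is the reciprocal of the largest eigenvalue of the network's adjacency matrix, and an infection whose ratio of infection rate to cure rate stays below that threshold is expected to die out. The sketch below assumes this form of the result and an undirected graph stored as an adjacency dictionary. The function names are hypothetical, and the power iteration is a generic numerical method rather than the authors' code.

```python
import math


def largest_eigenvalue(adjacency, iterations=1000, tol=1e-12):
    """Estimate the largest adjacency eigenvalue of an undirected graph.

    `adjacency` maps each node to a list of its neighbors. Power iteration
    is run on A + I (a shift by the identity) so that it also converges on
    bipartite graphs; the Rayleigh quotient is then taken with A itself.
    """
    nodes = list(adjacency)
    x = {v: 1.0 for v in nodes}
    estimate = 0.0
    for _ in range(iterations):
        # y = (A + I) x
        y = {v: x[v] + sum(x[u] for u in adjacency[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
        # Rayleigh quotient x^T A x for the unit vector x
        new_estimate = sum(x[v] * sum(x[u] for u in adjacency[v]) for v in nodes)
        converged = abs(new_estimate - estimate) < tol
        estimate = new_estimate
        if converged:
            break
    return estimate


def epidemic_threshold(adjacency):
    """Return 1 / lambda_max, the assumed form of the threshold."""
    lam = largest_eigenvalue(adjacency)
    if lam <= 0:
        raise ValueError("graph has no edges, so the threshold is undefined")
    return 1.0 / lam


if __name__ == "__main__":
    # Star with four leaves: the largest eigenvalue is sqrt(4) = 2.
    star = {"hub": ["a", "b", "c", "d"],
            "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"]}
    print(epidemic_threshold(star))  # approximately 0.5
```

For graphs with millions of nodes, a sparse eigensolver from a numerical library would be the more practical choice; the pure-Python loop here only illustrates the computation.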
Project Impact
- Human Resources: A Ph.D. candidate, Mr. Deepayan Chakrabarti, is working
on the topic. From UCR, Mr. George Siganos is working on the network aspects
of the project.
- Education and curriculum development: Several lectures on linear
and non-linear forecasting have been incorporated into the course
'Multimedia Databases and Data Mining' (15-826/10-603), which is a required
course in the CALD Masters program at CMU (CALD = Center for Automated
Learning and Discovery), as well as in the newly introduced Ph.D.
program in Computational & Statistical Learning.
Project References
The following refereed publications, published since Sept. 2002,
acknowledge the NSF support:
- Georgos Siganos, Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos,
Power-Laws and the AS-level Internet Topology, IEEE/ACM Transactions
on Networking (to appear).
- Yang Wang, Deepayan Chakrabarti, Chenxi Wang and Christos Faloutsos,
Epidemic Spreading in Real Networks: An Eigenvalue Viewpoint, 22nd
Symposium on Reliable Distributed Systems (SRDS 2003), Florence, Italy,
Oct. 6-8, 2003.
- Deepayan Chakrabarti and Christos Faloutsos, F4: Large-Scale Automated
Forecasting using Fractals, CIKM 2002, Washington DC, Nov. 2002.
Area Background
The project requires familiarity with networks, graph analysis and time-series
analysis.
Area References:
- Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, On Power-Law
Relationships of the Internet Topology, SIGCOMM 1999.
GPRA performance criteria
Discoveries at and across the frontiers of science and engineering:
The project straddles many areas: databases and data mining; networks;
graph analysis; and time series analysis.
Connections between discoveries and their use in the service of society:
Modeling of network traffic can help with forecasting, provisioning, and
detection of anomalies (such as 'denial of service' attacks and misconfigured
routers). Modeling of the graph topology can help with spotting the most
important nodes (to defend them against attacks), as well as with studying
how computer viruses may propagate.