Collaborative Research: NetMine: Finding Patterns in Network Data

 
Christos Faloutsos
Department of Computer Science
Carnegie Mellon Univ.
Pittsburgh, PA 15213
Phone: (412)-268.1457
Fax: (412)-268.5576
Email: christos@cs.cmu.edu
WWW page: http://www.cs.cmu.edu/~christos

Keywords

Data mining, network traffic, network topology.

Project Award Information

Project Summary

The project has two major thrusts: the first is to find patterns in network traffic, and the second is to find trends in the evolution of the Internet. The technical merit lies in the synergy between the networking and data mining fields, pushing the envelope in both. The networking field will gain novel insights and fast tools for predicting network performance. The data mining field will benefit from new problems and new tools (using fractals, power laws, and large-graph algorithms) that will be stress-tested on multiple gigabytes of real network data.

This is a joint project with Prof. Michalis Faloutsos of U.C. Riverside.

Goals, Objectives, and Targeted Activities

As mentioned, the first goal is to find patterns in network traffic and to forecast it. The second goal is to find patterns in the topology of the Internet, viewed as a graph.
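One kind of topology pattern, suggested by the title of the power-law publication listed under Project References, is a power law relating a node's degree to its rank when nodes are sorted by decreasing degree. The following is a minimal sketch of how such a pattern could be checked, not the project's actual code: it assumes a plain least-squares fit on log-log axes, and the function name fit_rank_exponent is hypothetical.

```python
import math


def fit_rank_exponent(degrees):
    """Fit log(degree) = slope * log(rank) + intercept by least squares.

    `degrees` is a list of node degrees; zero-degree nodes are skipped
    because their logarithm is undefined. Rank 1 is the highest degree.
    """
    ranked = sorted((d for d in degrees if d > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two nodes with positive degree")

    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(degree) for degree in ranked]

    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count

    # Ordinary least squares for a straight line in log-log space.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A slope that stays roughly constant across snapshots of the graph would be one indication of a stable power law. A straight-line fit on log-log axes is only a quick check, and more careful estimators exist.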

Indication of Success

We have developed models for non-linear forecasting [Chakrabarti+, CIKM 2002]; we found patterns in the Internet topology as it evolves over time [Siganos+, Trans. Netw.]; and we estimated the so-called 'epidemic threshold', a condition that determines whether a virus will die out or propagate throughout the whole network [Wang+, SRDS 2003].
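To make the epidemic threshold concrete, here is a minimal sketch. It assumes the eigenvalue formulation associated with the SRDS 2003 publication listed under Project References, in which the threshold is the reciprocal of the largest eigenvalue of the graph's adjacency matrix, and an infection with birth rate beta and cure rate delta dies out when beta/delta falls below that threshold. The function names and the power-iteration approach are illustrative choices, not the authors' implementation.

```python
import math


def largest_eigenvalue(adj, iterations=200):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix.

    Power iteration is run on (A + I) rather than A, so that bipartite
    graphs (whose spectrum is symmetric around zero) still converge.
    The shift of 1 is subtracted before returning.
    """
    n = len(adj)
    vec = [1.0 / math.sqrt(n)] * n
    shifted = 1.0
    for _ in range(iterations):
        # product = (A + I) * vec
        product = [
            vec[i] + sum(adj[i][j] * vec[j] for j in range(n))
            for i in range(n)
        ]
        # Rayleigh quotient; vec has unit length at this point.
        shifted = sum(v * p for v, p in zip(vec, product))
        norm = math.sqrt(sum(p * p for p in product))
        vec = [p / norm for p in product]
    return shifted - 1.0


def epidemic_threshold(adj):
    """Return 1 / lambda_1, or infinity for a graph with no edges."""
    lam = largest_eigenvalue(adj)
    return math.inf if lam <= 1e-12 else 1.0 / lam


def dies_out(adj, beta, delta):
    """True if the ratio beta / delta is below the epidemic threshold."""
    return beta / delta < epidemic_threshold(adj)
```

For example, a fully connected graph on four nodes has largest eigenvalue 3, so under this assumption its threshold is 1/3. Denser, better-connected graphs have larger eigenvalues and therefore lower thresholds.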

Project Impact

Project References

The following refereed publications, which have appeared since Sept. 2002, mention the NSF support:
  1. Georgos Siganos, Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. Power-Laws and the AS-level Internet Topology. IEEE/ACM Transactions on Networking (to appear).
  2. Yang Wang, Deepayan Chakrabarti, Chenxi Wang, and Christos Faloutsos. Epidemic Spreading in Real Networks: An Eigenvalue Viewpoint. 22nd Symposium on Reliable Distributed Computing (SRDS 2003), Florence, Italy, Oct. 6-8, 2003.
  3. Deepayan Chakrabarti and Christos Faloutsos. F4: Large-Scale Automated Forecasting using Fractals. CIKM 2002, Washington DC, Nov. 2002.

Area Background

The project requires familiarity with networks, graph analysis and time-series analysis.
 

Area References:

GPRA performance criteria

Discoveries at and across the frontiers of science and engineering: The project straddles many areas: databases and data mining; networks; graph analysis; and time series analysis.

Connections between discoveries and their use in the service of society: Modeling of network traffic can help with forecasting, provisioning, and detection of anomalies (such as denial-of-service attacks and misconfigured routers). Modeling of the graph topology can help with spotting the most important nodes (to defend against attacks) and with studying how computer viruses may propagate.