SC2004 Realtime Data Collection

SC2004 weather map

NLANR's PMA team is developing a prototype real time analysis application, in collaboration with researchers at the Department for Informatik, University of Leipzig, Germany.

The prototype is designed to investigate the following issues:

  • demonstrate the ability to perform real time IP header analysis at 10 Gigabits/sec link speeds using NLANR's OC192MON platform
  • study the performance impact of multiple analyses running in real time on a high-end server PC (OC192MON and equivalent)
  • develop a concept for operating a distributed real time data analysis and collection infrastructure using centralized reporting and archiving
  • develop the required algorithms, architectures and protocols to support such infrastructure and publish those in the course of time
  • field trial and stress test various approaches and implementations

During Supercomputing 2004 we used the opportunity to demonstrate and evaluate our present system under stress conditions, the highlight of which was to monitor and observe a number of contestants of the Bandwidth Challenge. We were supported by colleagues from Internet2 and NCSA as part of SCinet.

This display of data

Some interesting artefacts highlighted:

We operated the OC192MON from Monday, November 8th through Thursday, November 11th, 2004. Most of the time the OC192MON was collecting and analyzing data in real time, with one major gap between Tuesday night and Wednesday morning, during which the system was collecting IP packet header trace data instead. This data set is published here.

The OC192MON was initially tuned to the Abilene link towards New York (see also the weather map above). This configuration was changed on Tuesday night, after which the system observed the Abilene link to Chicago until it was turned off on Thursday afternoon. All times are Eastern Standard (Pittsburgh, PA, local time).

The application computes various parameters in real time. It generates a new set of graphs every five seconds, displaying the performance parameters over windows of the last five minutes, 30 minutes, one hour, six hours and 24 hours. The pages displayed above represent the archived RRD database information and computed graphs over the course of the four days.
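The five display windows can be sketched as follows. This is a minimal illustration, not the application's actual code: the names `WINDOWS`, `window_ranges` and `samples_per_window` are hypothetical, and the sketch assumes one stored sample per five-second refresh, which the text above does not confirm.

```python
from datetime import datetime, timedelta

# Assumption: one data point per graph refresh (every five seconds).
REFRESH_SECONDS = 5

# The five display windows named in the text.
WINDOWS = {
    "5min": timedelta(minutes=5),
    "30min": timedelta(minutes=30),
    "1h": timedelta(hours=1),
    "6h": timedelta(hours=6),
    "24h": timedelta(hours=24),
}


def window_ranges(now):
    """Return {label: (start, end)} for every display window ending at `now`."""
    return {label: (now - span, now) for label, span in WINDOWS.items()}


def samples_per_window(step_seconds=REFRESH_SECONDS):
    """Return how many samples each window spans at the given step."""
    return {
        label: int(span.total_seconds() // step_seconds)
        for label, span in WINDOWS.items()
    }


if __name__ == "__main__":
    now = datetime(2004, 11, 9, 12, 0, 0)
    for label, (start, end) in window_ranges(now).items():
        print(label, start.isoformat(), "to", end.isoformat())
    print(samples_per_window())
```

Under that assumption, the five-minute window spans 60 samples and the 24-hour window spans 17,280.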

This display of data consists of close to 8,000 HTML pages and some 95,000 PNG graphs. Every HTML page has 12 PNGs displaying:

  • Packets per second graphs: Packets/Protocol, Packets/Direction, Packets/IP-Protocol
  • Bits per second graphs: Bandwidth/Protocol, Bandwidth/Direction, Bandwidth/IP-Protocol
  • Active TCP and UDP connections: all connections that are in an open state
  • New TCP and UDP connections: new connections per second; all previously unknown TCP and UDP flows are counted here
  • Connection Duration of TCP and UDP flows: the average time between the first and the last TCP and UDP packet seen
  • Packets per Connection: the average packet count of the flows terminated at this timestamp
  • Bits per Connection: the average bit count of the flows terminated at this timestamp
  • DAG loss counter: packet loss counter as maintained by the DAG firmware
  • One minute CPU load averages: the one minute average number of processes in the run queue, as gathered with UNIX getloadavg(3)
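The connection-oriented graphs in this list could be derived from a flow table along the following lines. This is a sketch under stated assumptions rather than the application's implementation: the class and method names are hypothetical, flows are keyed by an arbitrary hashable value (for example a 5-tuple), and a flow is treated as terminated after an idle timeout, a rule the page does not specify.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    first_seen: float   # timestamp of the first packet
    last_seen: float    # timestamp of the most recent packet
    packets: int = 0
    bits: int = 0


class FlowTable:
    """Hypothetical flow table; the idle timeout is an assumption."""

    def __init__(self, idle_timeout=60.0):
        self.idle_timeout = idle_timeout
        self.flows = {}
        self.new_flows = 0  # flows first seen since the last report

    def observe(self, key, timestamp, length_bytes):
        """Account one packet to the flow identified by `key`."""
        flow = self.flows.get(key)
        if flow is None:
            # Unknown flow: count it as a new connection.
            flow = Flow(first_seen=timestamp, last_seen=timestamp)
            self.flows[key] = flow
            self.new_flows += 1
        flow.last_seen = timestamp
        flow.packets += 1
        flow.bits += 8 * length_bytes

    def report(self, now):
        """Expire idle flows and return the statistics for this interval."""
        expired = [k for k, f in self.flows.items()
                   if now - f.last_seen > self.idle_timeout]
        done = [self.flows.pop(k) for k in expired]
        n = len(done)
        stats = {
            "active": len(self.flows),
            "new": self.new_flows,
            "avg_duration": sum(f.last_seen - f.first_seen for f in done) / n if n else 0.0,
            "avg_packets": sum(f.packets for f in done) / n if n else 0.0,
            "avg_bits": sum(f.bits for f in done) / n if n else 0.0,
        }
        self.new_flows = 0
        return stats
```

Here `new` is a count per reporting interval; dividing it by the interval length gives new connections per second.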

While the first 10 parameters are computed from network traffic on the link, the last two figures are displayed in order to understand the performance (and limitations) of the application itself. From the data collected we can derive that while the host was quite busy at times, there was never any actual data loss; hence we can assume that the application kept up with processing the network information at every point in time.
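For reference, the one minute load figure can be sampled from Python through `os.getloadavg()`, which wraps the same getloadavg(3) call on UNIX systems. The helper name `one_minute_load` is hypothetical, and this is only an illustration of the measurement, not the application's code.

```python
import os


def one_minute_load():
    """Return the one minute load average, or None where it is unavailable."""
    try:
        # getloadavg() returns the 1, 5 and 15 minute averages.
        return os.getloadavg()[0]
    except (AttributeError, OSError):
        # AttributeError: platform has no getloadavg; OSError: value unobtainable.
        return None


if __name__ == "__main__":
    print("1 min load average:", one_minute_load())
```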

Some of the performance highlights captured during SC2004 include peaks of 13 Gigabits/second of combined bidirectional data load and nearly 600,000 packets/second. During the later part of the week the machine was also busy compressing and transferring IP header trace data, which explains the unusually high CPU load. The DAG packet loss anomaly on Wednesday afternoon is explained by the temporary loss of connectivity while the fibers were being switched.

It is fair to note that the SC2004 OC192MON deployment quite vividly demonstrated the proof of concept and the feasibility of implementing useful real time analysis tools that perform at high traffic loads on 10 Gigabit/second network links.

Acknowledgements

We would like to thank Jon Dugan (NCSA) and Matt Zekauskas (Internet2) for their help and support prior to and during SC2004 in setting up the taps and fibers, as well as for providing details on the configuration of the network links observed.

Last modified: 26 Apr 2005, Klaus Degner and Jörg Micheel. Comments and questions are welcome.