SC2004 Realtime Data Collection
NLANR's PMA team is developing a prototype real-time analysis application in collaboration with researchers at the Department for Informatik, University of Leipzig, Germany. The prototype is designed to investigate the following issues:
At Supercomputing 2004 (SC2004) we took the opportunity to demonstrate and evaluate our present system under stress conditions. The highlight was monitoring a number of contestants in the Bandwidth Challenge. We were supported by colleagues from Internet2 and NCSA as part of SCinet.
Some interesting artefacts highlighted:
We operated the OC192MON from Monday, November 8th, through Thursday, November 11th, 2004. Most of the time the OC192MON was collecting and analyzing data in real time, with one major gap between Tuesday night and Wednesday morning, during which the system was collecting IP packet header trace data. This data set is published here.
The OC192MON was initially tuned into the Abilene link towards New York (see also the weathermap above). This configuration was changed on Tuesday night, after which the system observed the Abilene link to Chicago until it was turned off on Thursday afternoon. All times are Eastern Standard (Pittsburgh, PA, local time).
The application computes various parameters in real time. It generates a new set of graphs every five seconds, displaying the performance parameters in windows of the last five minutes, the last 30 minutes, the last hour, the last six hours and the last 24 hours. The pages displayed above represent the archived RRD database information and the graphs computed over the course of the four days.
This display of data consists of close to 8,000 HTML pages and some 95,000 PNG graphs. Every HTML page has 12 PNGs displaying:
While the first 10 parameters are computed from network traffic on the link, the last two figures are displayed in order to understand the performance (and limitations) of the application itself. From the data collected we can derive that while the host is quite busy at times, there is never any actual data loss, hence we can assume that the application can keep up with processing the network information at any point in time.
Some of the performance highlights captured during SC2004 include peaks of 13 Gigabits/second of combined bidirectional data load and a load of nearly 600,000 packets/second. During the later part of the week the machine was busy compressing and transferring IP header trace data as well, which explains the unusually high CPU load. The DAG packet loss anomaly on Wednesday afternoon is explained by the temporary loss of connectivity during the switch of fibers.
It is fair to note that the SC2004 OC192MON deployment quite vividly demonstrated the proof of concept: it is possible to implement useful real-time analysis tools that perform at high traffic loads on 10 Gigabit/second network links.
Acknowledgements
We would like to thank Jon Dugan (NCSA) and Matt Zekauskas (Internet2) for their help and support prior to and during SC2004 in setting up the taps and fibers, as well as for providing details on the configuration of the network links observed.
Last modified: 26 Apr 2005, Klaus Degner and Jörg Micheel. Comments and questions are welcome.