Phase 3

Definitions

  • scenario: a configuration of traffic generators and background traffic.

Directory Layout

The dataset directory is laid out as follows: /data/searchlight/phase3_corpus/{size}/{encryption}/{scenario}. Experiment scenarios are separated by topology size (small, medium, large) and encryption type (unencrypted, or one of several encryption methods). Each scenario directory contains the following:

  • config.json
    • JSON file listing the configurations for the given scenario. The top-level map is keyed by variant (variant: config). For most uses, variant 000000 is the main configuration
  • myexp.etchosts
    • etchosts file for the topology. Necessary for mapping hostnames to IP addresses
    • Used in conjunction with config.json to provide ground truth mapping of flows in a pcap to traffic generators in the configuration
  • pcaps
    • All packet captures of a scenario. These make up the bulk of the dataset
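
As a rough illustration of the hostname-to-IP mapping, the sketch below parses the contents of myexp.etchosts into a dictionary. It assumes the file uses the standard /etc/hosts layout (an IP address followed by one or more hostnames, with # starting a comment), which is not confirmed here. parse_etchosts and the sample entries are hypothetical.

```python
def parse_etchosts(text):
    """Map each hostname to its IP address.

    Assumption: the file follows the usual /etc/hosts layout,
    "<ip> <hostname> [alias ...]", with "#" starting a comment.
    """
    hosts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            hosts[name] = ip
    return hosts


# Made-up sample content, for illustration only.
sample = """
127.0.0.1 localhost
10.0.0.1  e0 e0.example   # hypothetical node entry
"""
print(parse_etchosts(sample))
```

The resulting dictionary could then be joined against the hostnames in config.json to label flows in a pcap, though the exact fields to join on depend on the configuration schema.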

Naming Convention

The path to a PCAP file takes the following form: /data/searchlight/phase3_corpus/{size}/{encryption}/{scenario}/pcaps/{pcap_name}.pcap

A PCAP file name takes the following form: {timestamp}-{node}.{experiment materialization}-{traffic generator info}-{encryption status}{

Example

Example using the following pcap path: /data/searchlight/phase3_corpus/small/ipsec-sts/Phase_3_Small_100_Vtc_Beta/pcaps/1680156562-e0.test.smallencrypted.pharos-1iperf-2vtc-ipsec-sts000000.pcap

The number in parentheses is the position of the component in the path:

  • (4) small refers to the size of the experiment
    • There are three options: small, medium, and large
  • (5) ipsec-sts refers to the encryption status
    • There are four options: noenc, ipsec-sts, wireguard-sts, wireguard-ptp
  • (6) Phase_3_Small_100_Vtc_Beta is the name of the experiment scenario. This is set in the configuration file for the scenario
  • (8) 1680156562-e0.test.smallencrypted.pharos-1iperf-2vtc-ipsec-sts000000.pcap is the pcap name.
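
The positional breakdown above can be sketched in Python as follows. split_pcap_path is a hypothetical helper, and it assumes the path is absolute and rooted at /data/searchlight/phase3_corpus as in the example.

```python
from pathlib import PurePosixPath


def split_pcap_path(path):
    """Pick the documented components out of an absolute pcap path."""
    parts = PurePosixPath(path).parts
    # parts[0] is the root "/", so position N in the list above is parts[N].
    return {
        "size": parts[4],
        "encryption": parts[5],
        "scenario": parts[6],
        "pcap_name": parts[8],
    }


example = (
    "/data/searchlight/phase3_corpus/small/ipsec-sts/"
    "Phase_3_Small_100_Vtc_Beta/pcaps/"
    "1680156562-e0.test.smallencrypted.pharos-1iperf-2vtc-ipsec-sts000000.pcap"
)
print(split_pcap_path(example))
```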

Breaking down the PCAP name, using 1680156562-e0.test.smallencrypted.pharos-1iperf-2vtc-ipsec-sts000000.pcap as an example:

  • 1680156562 is the timestamp at which the pcap was generated
  • e0.test.smallencrypted.pharos refers to the node and the experiment realization:
    • e0 is the name of the node the pcap was captured on
    • test.smallencrypted.pharos is the MergeTB experiment realization
  • 1iperf-2vtc gives the number of unique configurations for each traffic generator:
    • In general, {NUM}{TG NAME} means there are NUM instances of the TG NAME traffic generator
    • In this example, we have one instance of an iperf configuration (with an unspecified number of servers and clients) and two instances of vtc configurations (with an unspecified number of servers and clients)
    • For more specific information, viewing the configuration file is necessary
  • ipsec-sts is the encryption status, repeated from the path
  • 000000 is the scenario variant
    • Each scenario can have multiple variants. Check the config.json file in the directory above the packet capture for variants.
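
The name breakdown above can be approximated with a regular expression. This is a minimal sketch, not the tooling used to build the dataset: parse_pcap_name is a hypothetical helper, and it assumes that the realization contains no hyphen, that the variant is a run of digits, and that the encryption status is one of the four options listed earlier.

```python
import re

# Assumed shape, based on the example name:
# {timestamp}-{node}.{realization}-{NUM}{TG}[-{NUM}{TG}...]-{encryption}{variant}.pcap
PCAP_NAME = re.compile(
    r"^(?P<timestamp>\d+)-"
    r"(?P<node>[^.]+)\.(?P<realization>[^-]+)-"
    r"(?P<tg_info>.+)-"
    r"(?P<encryption>noenc|ipsec-sts|wireguard-sts|wireguard-ptp)"
    r"(?P<variant>\d+)\.pcap$"
)


def parse_pcap_name(name):
    """Split a pcap file name into its documented fields."""
    m = PCAP_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized pcap name: {name!r}")
    generators = {}
    for token in m["tg_info"].split("-"):
        tg = re.fullmatch(r"(\d+)(\D.*)", token)  # e.g. "2vtc" -> ("2", "vtc")
        if tg is None:
            raise ValueError(f"unrecognized traffic generator token: {token!r}")
        generators[tg[2]] = int(tg[1])
    return {
        "timestamp": int(m["timestamp"]),
        "node": m["node"],
        "realization": m["realization"],
        "generators": generators,
        "encryption": m["encryption"],
        "variant": m["variant"],  # kept as a string to preserve leading zeros
    }


print(parse_pcap_name(
    "1680156562-e0.test.smallencrypted.pharos-1iperf-2vtc-ipsec-sts000000.pcap"
))
```

The variant is returned as a string so that it can be used directly as a key into config.json.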

Experiment Scenarios

In general, experiment scenario names follow this convention: Phase_3_{Experiment Size}_{# approx flows}_{TG Type}_{Background traffic}

  • {Experiment Size} is one of Small, Medium, or Large. It will almost always match the experiment size in the path.
    • Exception: Some Medium size configurations were run on a Large topology. They will be labeled as Medium on the scenario folder but reside in the large folder directly under /data/searchlight/phase3_corpus
  • {# approx flows} is the approximate number of flows in the scenario
  • {TG Type} is the primary traffic generator
    • Cloud, Video, VTC, Web_Training
  • {Background traffic} is the type of background traffic in the experiment
    • Alpha refers to a 50/50 split of protonuke and iperf traffic
    • Beta refers to only iperf background traffic

Exceptions:

  • Deterministic is a Web_Training-only type. Background traffic is Beta, and the deterministic flag is set to True for the web application
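
A scenario name that follows the general convention can be split into its fields as sketched below. parse_scenario_name is a hypothetical helper. It treats the last underscore-separated token as the background label and everything between the flow count and that token as the TG type, so that Web_Training stays intact. It does not special-case Deterministic scenarios, and the second name in the usage lines is invented for illustration.

```python
def parse_scenario_name(name):
    """Split Phase_3_{Size}_{flows}_{TG Type}_{Background} into fields."""
    tokens = name.split("_")
    if len(tokens) < 6 or tokens[:2] != ["Phase", "3"]:
        raise ValueError(f"unrecognized scenario name: {name!r}")
    return {
        "size": tokens[2],
        "approx_flows": int(tokens[3]),
        # The TG type may itself contain underscores (e.g. Web_Training).
        "tg_type": "_".join(tokens[4:-1]),
        "background": tokens[-1],
    }


print(parse_scenario_name("Phase_3_Small_100_Vtc_Beta"))
# Invented name, used only to show a multi-token TG type:
print(parse_scenario_name("Phase_3_Medium_1500_Web_Training_Alpha"))
```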

Collection Methods

The Phase 3 dataset was created using the experiment maker and the configurations hosted in the phase-3-experiments repo.

Reproduction

Experiments can be reproduced using the data in the dataset, a MergeTB testbed, and the experiment maker program.

To reproduce a packet capture:

  1. Take the config.json file and extract the variant you want to reproduce
    • e.g., if reproducing the 000000 variant, extract the JSON data under the "000000" key and save it into a new config file
  2. Materialize and initialize the Small, Medium, or Large topology (as indicated by the pcap path)
  3. Run the experiment-maker given the config file made in step 1
    • python experiment.py config.json

This will save the pcap in /local_scratch/artifacts on the node specified in config.json (usually e0 or c0).
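
Step 1 can be sketched as a small script. It relies only on the top-level variant-to-config map described under Directory Layout. extract_variant, the output file name config_000000.json, and the placeholder keys in the demo are all hypothetical, since the real configuration schema is not shown here.

```python
import json
import os
import tempfile


def extract_variant(config_path, variant, out_path):
    """Write one variant's configuration to its own JSON file."""
    with open(config_path) as f:
        configs = json.load(f)  # top-level map: variant -> config
    if variant not in configs:
        raise KeyError(f"variant {variant!r} not found in {config_path}")
    with open(out_path, "w") as f:
        json.dump(configs[variant], f, indent=2)
    return configs[variant]


# Demo on a made-up config.json; the inner keys are placeholders,
# not the real configuration schema.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "config.json")
    dst = os.path.join(tmp, "config_000000.json")
    with open(src, "w") as f:
        json.dump({"000000": {"placeholder": 1}, "000001": {"placeholder": 2}}, f)
    extract_variant(src, "000000", dst)
    with open(dst) as f:
        demo_result = json.load(f)
print(demo_result)
```

Writing to a separate file leaves the original config.json, with all of its variants, untouched. The extracted file would then be the one passed to experiment.py in step 3.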

Additional Information

Traffic breakdown by type

Most small experiments have a 90/10 ratio of background traffic / target application traffic. The following is a breakdown of all target traffic types in the format: {# Flows}: {# background flows}/{# target flows}

  • small: 10% is target
    • 100: 100/10
    • 250:
      • beta: 300/30
      • alpha: 240/30
    • 500: 450/50
  • medium: 5% is target
    • 800: 760/40
    • 1000: 950/50
    • 1500:
      • alpha: 1444/75
      • beta: 1425/75
  • large: 5% is target (2.5% for 10K)
    • 2500: 2375/125
    • 5000: 4750/250
    • 10000: 9750/250

Examples:

  • A small 500 flow video streaming alpha experiment has 450 background flows and 50 target flows. 225 of the background flows are protonuke HTTP and 225 are iperf. All 50 target flows are video streaming.
  • A large, 5,000 flow experiment has 4750 background flows and 250 target flows.
  • A medium 1,500 flow “beta” experiment has 1425 background flows and 75 target flows.
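
The arithmetic behind these examples can be sketched as follows. split_background and target_fraction are hypothetical helpers. The Alpha and Beta splits come from the definitions under Experiment Scenarios, and giving iperf the extra flow when an Alpha count is odd is an assumption.

```python
def split_background(background_flows, background_type):
    """Divide background flows between generators for Alpha or Beta."""
    kind = background_type.lower()
    if kind == "alpha":
        # 50/50 protonuke and iperf; an odd remainder goes to iperf (assumption).
        protonuke = background_flows // 2
        return {"protonuke": protonuke, "iperf": background_flows - protonuke}
    if kind == "beta":
        return {"iperf": background_flows}
    raise ValueError(f"unknown background type: {background_type!r}")


def target_fraction(background_flows, target_flows):
    """Share of all flows that belong to the target application."""
    return target_flows / (background_flows + target_flows)


print(split_background(450, "alpha"))  # {'protonuke': 225, 'iperf': 225}
print(split_background(1425, "beta"))  # {'iperf': 1425}
print(target_fraction(450, 50))        # 0.1
```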