Experiment Automation

Tools to automate experiments

Overview

Ansible is the recommended tool for experiment automation. This guide will go over the basics of using Ansible for Merge experiments. The general Ansible docs can be found here.

While playbooks can be executed from your local workstation using the XDC as a jump host to reach your nodes, we recommend executing playbooks directly from the XDC. This guide assumes you have created an XDC and attached your XDC to a materialization.

Prerequisites

Install Ansible in your XDC as follows:

sudo apt -y update
sudo apt -y install ansible

Topology

The following topology is used in this example. It is the simple two-node topology depicted in the diagram above.

from mergexp import *

net = Network('ansible')

nodes = a, b = [net.node(name) for name in ['a', 'b']]
link = net.connect(nodes)
link[a].socket.addrs = ip4('10.0.0.1/24')
link[b].socket.addrs = ip4('10.0.0.2/24')

experiment(net)

Playbooks

A playbook is a sequence of plays that can do anything from provisioning users, copying files, and configuring software to running generic shell commands and scripts. Here is an example of a playbook that installs and runs the iperf3 network performance tool across two nodes. We assume the nodes are named a and b and that the experiment network IP addresses of these nodes are 10.0.0.1 and 10.0.0.2 respectively.

playbook.yml

- hosts: all
  become: true
  tasks:
    - name: install iperf3
      apt:
        name: iperf3

- hosts: b
  tasks:
    - name: run iperf3 in server mode
      shell: iperf3 --server --daemon --one-off

- hosts: a
  tasks:
    - name: run iperf3 in client mode
      shell: iperf3 -c 10.0.0.2 --logfile /tmp/results

    - name: collect results
      fetch:
        src: /tmp/results
        dest: results
        flat: yes

Inventories

Ansible requires a file called an inventory to tell it how to reach the nodes it’s automating. This file is typically called hosts. Here is an example for our two node topology.

hosts

[all]
a
b

Names in square brackets indicate groups; the names below the square brackets are the members of that group. The all group was used in the first play above, and individual nodes were then used as the hosts of the follow-on plays.

You should be able to retrieve a very basic inventory from the XDC using the mrg CLI. Assuming your materialization is named ansible.hello.murphy, you would generate this as follows:

mrg generate inventory ansible.hello.murphy > hosts

Configuration

Ansible configuration is kept in a file called ansible.cfg. This file can be configured globally at /etc/ansible/ansible.cfg; it can be placed in your home directory at $HOME/.ansible.cfg for all playbooks run by your user; or it can be placed in the local directory where you execute a playbook from to affect only that playbook. There are a few options that are very useful to set for executing from an XDC.

[defaults]
# don't check experiment node keys, if this is not set, you will have to
# explicitly accept the SSH key for each experiment node you run Ansible
# against
host_key_checking = False

# configure up to 5 hosts in parallel
forks = 5

[ssh_connection]
# connection optimization that increases speed significantly
pipelining = True

Execution

To execute the playbook.yml above we simply type

ansible-playbook -i hosts playbook.yml

After executing the playbook, you should have a results file in your local directory like the following

$ cat results
Connecting to host 10.0.0.2, port 5201
[  6] local 10.0.0.1 port 40620 connected to 10.0.0.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  6]   0.00-1.00   sec   114 MBytes   954 Mbits/sec    0    437 KBytes
[  6]   1.00-2.00   sec   113 MBytes   947 Mbits/sec    0    437 KBytes
[  6]   2.00-3.00   sec   112 MBytes   941 Mbits/sec    0    437 KBytes
[  6]   3.00-4.00   sec   112 MBytes   942 Mbits/sec    0    482 KBytes
[  6]   4.00-5.00   sec   112 MBytes   941 Mbits/sec    0    482 KBytes
[  6]   5.00-6.00   sec   111 MBytes   929 Mbits/sec   75    400 KBytes
[  6]   6.00-7.00   sec   112 MBytes   943 Mbits/sec    0    426 KBytes
[  6]   7.00-8.00   sec   113 MBytes   945 Mbits/sec    0    426 KBytes
[  6]   8.00-9.00   sec   112 MBytes   941 Mbits/sec    0    426 KBytes
[  6]   9.00-10.00  sec   113 MBytes   946 Mbits/sec    0    426 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  6]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec   75             sender
[  6]   0.00-10.04  sec  1.09 GBytes   937 Mbits/sec                  receiver

Optimizations

If you are running a larger experiment, or if you need to optimize your ansible runs, you can change some ansible configuration options to try and reduce ansible’s memory load on the XDC, such as caching your hosts’ facts in a json file.

[defaults]
# fact caching to file instead of in memory
fact_caching = jsonfile

# set directory/location of fact cache files
fact_caching_connection = /tmp/.ansible/fc

# ask ansible to gather facts only when necessary
gathering = smart
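
Independently of the cache settings, a play that never reads host facts can skip fact gathering altogether with the play-level gather_facts keyword. The following is a minimal sketch that reuses the iperf3 server play from earlier in this guide; whether a given play can safely skip facts depends on the tasks it runs.

```yaml
# Sketch: this play uses no host facts, so fact gathering is disabled
- hosts: b
  gather_facts: false
  tasks:
    - name: run iperf3 in server mode
      shell: iperf3 --server --daemon --one-off
```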

OpenSSH command update

It is a relatively small optimization, but you can ask ssh to try your ssh keys before any other authentication types.

[ssh_connection]
# add 'PreferredAuthentications' option
ssh_args = -C -o PreferredAuthentications=publickey -o ControlMaster=auto -o ControlPersist=60s

Timing analysis callbacks

If you want to analyze your execution times or other Ansible behavior, you can turn on callbacks in the configuration file to get additional information. See the Ansible documentation for the profile_tasks callback.

You can see which callback plugins are available on your host via ansible-doc -t callback -l. The plugins relevant to timing analysis are profile_tasks, profile_roles, and timer.

Callbacks may also take some time, so feel free to turn them back off once you are no longer in need of timing analysis.

[defaults]
# examine timing for tasks
# (Ansible 2.11 and later name this option callbacks_enabled)
callback_whitelist = profile_tasks, timer

Host grouping and role assignment

Host grouping is another effective optimization technique: define groups in your inventory, then target those groups from your playbook and apply roles to them.

An example of a play running roles against a group of servers:

- name: apply all settings to host group target_servers
  hosts: target_servers
  become: yes
  roles:
    - generic_hosts_configuration
    - target_server_packages
    - make_target_servers_ready

This kind of role grouping is useful to keep ansible working on the same set of hosts through several sets of changes.

Additionally, if you have a set of tasks or a role where there is no ordering necessary between hosts, you can specify an alternative “strategy”. Strategies are discussed minimally in the ansible documentation, but available strategy plugins and their descriptions are available via the command ansible-doc -t strategy -l (list) and ansible-doc -t strategy <strategy name>. You may wish to explore the host_pinned or free strategies.
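
As a minimal sketch, a play selects a strategy with the play-level strategy keyword. The target_servers group is reused from the example above, and the single task is only illustrative.

```yaml
# Sketch: with the free strategy, each host works through the task list
# at its own pace instead of waiting for all hosts to finish each task.
- name: install packages without per-task host synchronization
  hosts: target_servers
  become: yes
  strategy: free
  tasks:
    - name: install iperf3
      apt:
        name: iperf3
```

This only makes sense when no task relies on another host having reached the same point in the play.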

Asynchronous tasks

You can run long-running tasks “in the background” using Ansible’s asynchronous task handler. This is a complex topic, so read the available documentation in its entirety before attempting asynchronous tasks. Keep in mind that any later task that depends on an asynchronous task must wait for it to finish, or the play risks failing.
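
The following sketch shows the usual pattern: start a task with async and poll: 0, then wait for it explicitly with the async_status module before anything uses its output. It assumes an iperf3 server is already listening on 10.0.0.2 as in the earlier playbook. The 300-second run, the /tmp/long-results path, and the variable names long_run and job are illustrative choices, not part of the original example.

```yaml
- hosts: a
  tasks:
    - name: start a long iperf3 run in the background
      shell: iperf3 -c 10.0.0.2 --time 300 --logfile /tmp/long-results
      async: 600    # allow the task up to 600 seconds to finish
      poll: 0       # do not wait; continue with the next task immediately
      register: long_run

    # other tasks that do not depend on the results can go here

    - name: wait for the background run before using its output
      async_status:
        jid: "{{ long_run.ansible_job_id }}"
      register: job
      until: job.finished
      retries: 70   # 70 checks, 5 seconds apart, covers the 300-second run
      delay: 5
```

Any task that reads /tmp/long-results, such as a fetch, belongs after the async_status task.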

Python Dependencies

Many people use a tool called pip to install Python packages, and Ansible includes a built-in pip package management module. However, using pip as a package manager is several times slower than using the system-level package manager (that is, yum/dnf or apt). If you have Python package dependencies, search the system package manager for them to see whether the version it provides is acceptable for your use. If so, you can install them with the system-level package manager and avoid the time a play spends waiting on pip. Grouping these installs into existing system package manager tasks also reduces play run times.
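
As a sketch of this approach on an apt-based system, the install task from the earlier playbook can carry the Python dependencies in the same apt call. The two Python packages listed here are hypothetical dependencies chosen for illustration; substitute the ones your experiment needs.

```yaml
- hosts: all
  become: true
  tasks:
    - name: install iperf3 and Python dependencies in one apt call
      apt:
        name:
          - iperf3
          - python3-numpy    # hypothetical dependency
          - python3-yaml     # hypothetical dependency
```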

Note: PyPI currently disallows pip’s XML-RPC calls. There is a chance that pip installation will be unavailable or limited in the future.