Parsing CSV into an Ansible vars_file

Somebody asked me if it was possible to parse a CSV file into an Ansible vars_file. A little casual googling didn’t yield good results, so I wrote up a little hack.

The context was wanting to feed a big CSV file of data (presumably exported from an Excel spreadsheet) to Ansible’s network automation. So first we gin up a bit of CSV:

name,description,vlanid,state,mtu,inet
spam,spam,1,up,1500,192.168.1.1
eggs,eggs,2,up,1500,192.168.1.2
sausage,sausage,3,up,9000,192.168.1.3

Parsing this with Python is very easy. In fact, it even has a built-in module for CSV, which can produce a nice dict with labels based on the column titles.

#!/usr/bin/env python

import csv
import sys
import yaml

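csv_data = []
with open(sys.argv[1]) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # copy each row to a plain dict so PyYAML emits an ordinary mapping
        csv_data.append(dict(row))

# write the rows under a single top-level key, next to the input file
with open(sys.argv[1] + '.yml', 'w') as outfile:
    yaml.safe_dump({'csv_data': csv_data}, outfile, default_flow_style=False)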

Now using this bit of Python, we can make an Ansible playbook that first parses the CSV into a YAML vars_file, and then imports that vars_file in the playbook.

---
- hosts: localhost
  connection: local
  become: false
  gather_facts: false
  tasks:
    - name: parse csv and make vars file
      command: "python csv_to_yaml.py example.csv"

- hosts: localhost
  connection: local
  become: false
  gather_facts: false
  vars_files:
    - example.csv.yml
  tasks:
    - debug: var=csv_data

Now let’s run it and look at the debug output.

[jason@w550s redhat]$ ansible-playbook csv_vars_playbook.yml
 [WARNING]: Could not match supplied host pattern, ignoring: all

 [WARNING]: provided hosts list is empty, only localhost is available


PLAY [localhost] **********************************************************

TASK [parse csv and make vars file] ***************************************
changed: [localhost]

PLAY [localhost] **********************************************************

TASK [debug] **************************************************************
ok: [localhost] => {
    "csv_data": [
        {
            "description": "spam",
            "inet": "192.168.1.1",
            "mtu": "1500",
            "name": "spam",
            "state": "up",
            "vlanid": "1"
        },
        {
            "description": "eggs",
            "inet": "192.168.1.2",
            "mtu": "1500",
            "name": "eggs",
            "state": "up",
            "vlanid": "2"
        },
        {
            "description": "sausage",
            "inet": "192.168.1.3",
            "mtu": "9000",
            "name": "sausage",
            "state": "up",
            "vlanid": "3"
        }
    ]
}

PLAY RECAP ****************************************************************
localhost                  : ok=2    changed=1    unreachable=0    failed=0
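
Note that every value in the debug output is a string, including mtu and vlanid, because the csv module does no type conversion. If downstream tasks want integers, one option is to coerce selected columns before dumping to YAML. Here’s a minimal sketch; the helper name coerce_ints and the choice of columns are illustrative and not part of the script above.

```python
import csv
import io


def coerce_ints(row, int_columns=("vlanid", "mtu")):
    """Return a plain dict copy of row with the named columns converted to int."""
    out = dict(row)
    for col in int_columns:
        # leave missing or empty cells alone rather than failing on int('')
        if col in out and out[col] not in (None, ""):
            out[col] = int(out[col])
    return out


# Same header and first data row as the example CSV
sample = io.StringIO(
    "name,description,vlanid,state,mtu,inet\n"
    "spam,spam,1,up,1500,192.168.1.1\n"
)
rows = [coerce_ints(r) for r in csv.DictReader(sample)]
print(rows[0]["mtu"], type(rows[0]["mtu"]).__name__)
```

In the conversion script this would replace the plain append inside the loop, so the YAML ends up with mtu: 1500 rather than mtu: '1500'.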

There may be better ways for Ansible to parse CSV, but this was quick and easy. Hit me up if you’ve got a more elegant approach!

Save your AWS budget with Python and boto

My team and I lean heavily on AWS services for prototypes, demos, and training. The challenge that we’ve encountered is that it’s easy to forget about the resources you’ve spun up. So I wrote a quick little utility that shuts down unnecessary EC2 instances at night.

The Python library, boto, provides an AWS SDK. It’s very easy to use, and many good tutorials exist. Instructions can be found in the README, but here’s a quick overview.

First we import the boto and yaml libraries. (We’re using YAML for our config file markup.) Then we read in that config file.

import boto.ec2
import yaml

config_file = '/etc/nightly_shutdown.yml'
with open(config_file) as f:
    config = yaml.safe_load(f)

In that config file, we’ve got our region, access and secret keys, and a white list of instance IDs we’d like to opt out of the nightly shutdown. This last bit is important if you have instances doing long-running jobs like repo syncing, for example.

---
region: us-east-1
access_key: eggseggseggseggs
secret_key: spamspamspamspam
whitelist:
  - i-abcdefgh
  - i-ijklmnop
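
Since a typo in this file would only surface at night when cron runs the job, it can be worth checking the loaded config up front. The sketch below is one way to do that; REQUIRED_KEYS and missing_keys are hypothetical names, and the dict stands in for whatever yaml.safe_load() returns for the file above.

```python
REQUIRED_KEYS = ("region", "access_key", "secret_key")


def missing_keys(config):
    """Return the required keys that are absent or empty in the loaded config."""
    return [k for k in REQUIRED_KEYS if not config.get(k)]


# Stand-in for the dict yaml.safe_load() would return for the example file
config = {
    "region": "us-east-1",
    "access_key": "eggseggseggseggs",
    "secret_key": "spamspamspamspam",
    "whitelist": ["i-abcdefgh", "i-ijklmnop"],
}

problems = missing_keys(config)
if problems:
    raise SystemExit("config is missing: " + ", ".join(problems))

# Treat an absent whitelist as empty rather than failing later with a KeyError
whitelist = set(config.get("whitelist") or [])
```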

Now we connect to the AWS API and get a list of reservations. This itself is interesting, as it gives us a little insight into the guts of EC2. As I understand it, a reservation corresponds to a single launch request and groups the instances that request started, so instances are always reached through their reservation.

conn = boto.ec2.connect_to_region(config['region'],
                                  aws_access_key_id=config['access_key'],
                                  aws_secret_access_key=config['secret_key'])

reservations = conn.get_all_reservations()

Now it’s simply a matter of iterating over those reservations, getting the instance IDs, and filtering out the white-listed IDs.

running_instances = []
for r in reservations:
    for i in r.instances:
        if i.state == "running":
            if i.id not in config['whitelist']:
                running_instances.append(i.id)
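
If you’d like to try the filtering logic without touching a real AWS account, the same loop can be written as a small function and fed stand-in objects. This is only a sketch: instances_to_stop is a hypothetical name, and the fake reservations carry just the attributes the loop reads (.instances, .id, .state).

```python
from types import SimpleNamespace


def instances_to_stop(reservations, whitelist):
    """Collect IDs of running instances that are not on the whitelist."""
    skip = set(whitelist)
    return [
        i.id
        for r in reservations
        for i in r.instances
        if i.state == "running" and i.id not in skip
    ]


# Fake reservations; real ones would come from conn.get_all_reservations()
fake = [
    SimpleNamespace(instances=[
        SimpleNamespace(id="i-abcdefgh", state="running"),  # whitelisted
        SimpleNamespace(id="i-11111111", state="running"),
    ]),
    SimpleNamespace(instances=[
        SimpleNamespace(id="i-22222222", state="stopped"),
    ]),
]

print(instances_to_stop(fake, ["i-abcdefgh", "i-ijklmnop"]))
```

Only i-11111111 survives the filter: one instance is whitelisted and the other isn’t running.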

Finally, we make the API call to stop the instances. Before doing so, we check to be sure there are any running, as this call will throw an exception if the instance ID list is empty.

if len(running_instances) > 0:
    conn.stop_instances(instance_ids=running_instances)

Now you just have to add this to your daily cronjobs and you’ll save a little budget.

Python one-liner: converting JSON to YAML

I’ve been playing with the Titan graph database lately; it’s hella cool, super powerful, and has a great ecosystem. One tool in the Titan toolbox is a REST interface called Rexster.

You can check to see that it’s up and what it’s serving up by curl-ing one of its endpoints.

# curl localhost:8182/graphs/graph
{"version":"2.5.0","name":"graph","graph":"titangraph[cassandrathrift:[127.0.0.1]]","features":{"isWrapper":false,"supportsVertexProperties":true,"supportsMapProperty":true,"supportsUniformListProperty":true,"supportsIndices":false,"ignoresSuppliedIds":true,"supportsFloatProperty":true,"supportsPrimitiveArrayProperty":true,"supportsEdgeIndex":false,"supportsKeyIndices":true,"supportsDoubleProperty":true,"isPersistent":true,"supportsVertexIteration":true,"supportsEdgeProperties":true,"supportsSelfLoops":true,"supportsDuplicateEdges":true,"supportsSerializableObjectProperty":true,"supportsEdgeIteration":true,"supportsVertexIndex":false,"supportsIntegerProperty":true,"supportsBooleanProperty":true,"supportsMixedListProperty":true,"supportsEdgeRetrieval":true,"supportsTransactions":true,"supportsThreadedTransactions":true,"supportsStringProperty":true,"supportsVertexKeyIndex":false,"supportsEdgeKeyIndex":false,"supportsLongProperty":true},"readOnly":false,"type":"com.thinkaurelius.titan.graphdb.database.StandardTitanGraph","queryTime":0.213622,"upTime":"0[d]:00[h]:28[m]:25[s]","extensions":[{"op":"GET","namespace":"tp","name":"gremlin","description":"evaluate an ad-hoc Gremlin script for a graph.","href":"http://localhost:8182/graphs/graph/tp/gremlin","title":"tp:gremlin","parameters":[{"name":"rexster.showTypes","description":"displays the properties of the elements with their native data type (default is false)"},{"name":"language","description":"the gremlin language flavor to use (default is groovy)"},{"name":"params","description":"a map of parameters to bind to the script engine"},{"name":"load","description":"a list of 'stored procedures' to execute prior to the 'script' (if 'script' is not specified then the last script in this argument will return the values"},{"name":"returnTotal","description":"when set to true, the full result set will be iterated and the results returned (default is false)"},{"name":"rexster.returnKeys","description":"an array of element 
property keys to return (default is to return all element properties)"},{"name":"rexster.offset.start","description":"start index for a paged set of data to be returned"},{"name":"rexster.offset.end","description":"end index for a paged set of data to be returned"},{"name":"script","description":"the Gremlin script to be evaluated"}]},{"op":"POST","namespace":"tp","name":"gremlin","description":"evaluate an ad-hoc Gremlin script for a graph.","href":"http://localhost:8182/graphs/graph/tp/gremlin","title":"tp:gremlin","parameters":[{"name":"rexster.showTypes","description":"displays the properties of the elements with their native data type (default is false)"},{"name":"language","description":"the gremlin language flavor to use (default is groovy)"},{"name":"params","description":"a map of parameters to bind to the script engine"},{"name":"load","description":"a list of 'stored procedures' to execute prior to the 'script' (if 'script' is not specified then the last script in this argument will return the values"},{"name":"returnTotal","description":"when set to true, the full result set will be iterated and the results returned (default is false)"},{"name":"rexster.returnKeys","description":"an array of element property keys to return (default is to return all element properties)"},{"name":"rexster.offset.start","description":"start index for a paged set of data to be returned"},{"name":"rexster.offset.end","description":"end index for a paged set of data to be returned"},{"name":"script","description":"the Gremlin script to be evaluated"}]}

Ugly. Python to the rescue.

#!/usr/bin/env python

import json
import sys
import yaml

# read JSON from stdin and print it as block-style YAML
print(yaml.safe_dump(json.load(sys.stdin), default_flow_style=False))

Basically a one-liner.

# curl localhost:32791/graphs/graph | python json2yaml.py 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3581    0  3581    0     0   552k      0 --:--:-- --:--:-- --:--:--  582k
extensions:
- description: evaluate an ad-hoc Gremlin script for a graph.
  href: http://localhost:8182/graphs/graph/tp/gremlin
  name: gremlin
  namespace: tp
  op: GET
  parameters:
  - description: displays the properties of the elements with their native data type
      (default is false)
    name: rexster.showTypes
  - description: the gremlin language flavor to use (default is groovy)
    name: language
  - description: a map of parameters to bind to the script engine
    name: params
  - description: a list of 'stored procedures' to execute prior to the 'script' (if
      'script' is not specified then the last script in this argument will return
      the values
    name: load
  - description: when set to true, the full result set will be iterated and the results
      returned (default is false)
    name: returnTotal
  - description: an array of element property keys to return (default is to return
      all element properties)
    name: rexster.returnKeys
  - description: start index for a paged set of data to be returned
    name: rexster.offset.start
  - description: end index for a paged set of data to be returned
    name: rexster.offset.end
  - description: the Gremlin script to be evaluated
    name: script
  title: tp:gremlin
- description: evaluate an ad-hoc Gremlin script for a graph.
  href: http://localhost:8182/graphs/graph/tp/gremlin
  name: gremlin
  namespace: tp
  op: POST
  parameters:
  - description: displays the properties of the elements with their native data type
      (default is false)
    name: rexster.showTypes
  - description: the gremlin language flavor to use (default is groovy)
    name: language
  - description: a map of parameters to bind to the script engine
    name: params
  - description: a list of 'stored procedures' to execute prior to the 'script' (if
      'script' is not specified then the last script in this argument will return
      the values
    name: load
  - description: when set to true, the full result set will be iterated and the results
      returned (default is false)
    name: returnTotal
  - description: an array of element property keys to return (default is to return
      all element properties)
    name: rexster.returnKeys
  - description: start index for a paged set of data to be returned
    name: rexster.offset.start
  - description: end index for a paged set of data to be returned
    name: rexster.offset.end
  - description: the Gremlin script to be evaluated
    name: script
  title: tp:gremlin
features:
  ignoresSuppliedIds: true
  isPersistent: true
  isWrapper: false
  supportsBooleanProperty: true
  supportsDoubleProperty: true
  supportsDuplicateEdges: true
  supportsEdgeIndex: false
  supportsEdgeIteration: true
  supportsEdgeKeyIndex: false
  supportsEdgeProperties: true
  supportsEdgeRetrieval: true
  supportsFloatProperty: true
  supportsIndices: false
  supportsIntegerProperty: true
  supportsKeyIndices: true
  supportsLongProperty: true
  supportsMapProperty: true
  supportsMixedListProperty: true
  supportsPrimitiveArrayProperty: true
  supportsSelfLoops: true
  supportsSerializableObjectProperty: true
  supportsStringProperty: true
  supportsThreadedTransactions: true
  supportsTransactions: true
  supportsUniformListProperty: true
  supportsVertexIndex: false
  supportsVertexIteration: true
  supportsVertexKeyIndex: false
  supportsVertexProperties: true
graph: titangraph[cassandrathrift:[127.0.0.1]]
name: graph
queryTime: 0.31277
readOnly: false
type: com.thinkaurelius.titan.graphdb.database.StandardTitanGraph
upTime: 0[d]:00[h]:31[m]:27[s]
version: 2.5.0
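
If PyYAML isn’t installed on the box, the standard library alone gets you most of the readability. This is a different technique (indented JSON rather than YAML), sketched here with a hypothetical pretty() helper and a tiny stand-in for the Rexster response.

```python
import json


def pretty(text):
    """Re-serialize a JSON document with indentation and sorted keys."""
    return json.dumps(json.loads(text), indent=2, sort_keys=True)


# A tiny stand-in for the Rexster response
print(pretty('{"version": "2.5.0", "name": "graph", "readOnly": false}'))
```

For ad-hoc use, piping curl into python -m json.tool does the same job without a script.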

I love Python. YAML ain’t bad, either.

Make new KVM VMs in less than 10 seconds

In the course of my day, I tend to spin up lots of VMs on my laptop. KVM is my hypervisor of choice, and since it supports libvirt, there are lots of great tools to make this easier. virt-manager is a nice GUI that’s very helpful for beginners. virt-install is my CLI tool of choice. But if you want to use dnsmasq for guest name resolution and DHCP against libvirt networking, it can be a little tedious to type out everything over and over. So I decided to make a tool to save me some time and typing: kvminstall.

Hat tip to Rich Lucente who shared with me a bash script that inspired me to write kvminstall.

Installation

To install, use pip, the Python package installer. If you haven’t used it before, it’s easy to install with yum.

# yum install python-pip
# pip install kvminstall
# kvminstall --help
usage: kvminstall [-h] [-c CLONE] [-i IMAGE] [-v VCPUS] [-r RAM] [-d DISK]
                  [-D DOMAIN] [-N NETWORK] [--type TYPE] [--variant VARIANT]
                  [-f CONFIGFILE] [--verbose]
                  name

positional arguments:
  name                  name of the new virtual machine

optional arguments:
  -h, --help            show this help message and exit
  -c CLONE, --clone CLONE
                        name of the source logical volume to be cloned
  -i IMAGE, --image IMAGE
                        image file to duplicate
  -v VCPUS, --vcpus VCPUS
                        number of virtual CPUs
  -r RAM, --ram RAM     amount of RAM in MB
  -d DISK, --disk DISK  disk size in GB
  -D DOMAIN, --domain DOMAIN
                        domainname for dhcp / dnsmasq
  -N NETWORK, --network NETWORK
                        libvirt network
  --type TYPE           os type, i.e., linux
  --variant VARIANT     os variant, i.e., rhel7
  -f CONFIGFILE, --configfile CONFIGFILE
                        specify an alternate config file,
                        default=~/.config/kvminstall/config.yaml
  --verbose             verbose output

Configuration

In your .config directory, kvminstall sets up a YAML file with defaults. You can specify any of these interactively, or if you want to minimize typing, you can set these defaults in ~/.config/kvminstall/config.yaml:

---
vcpus: 1
ram: 1024
disk: 10
domain: example.com
network: default
mac: 5c:e0:c5:c4:26
type: linux
variant: rhel7

The MAC address can be specified as up to 5 :-delimited fields. If you want to specify fewer, kvminstall will auto-complete with random, available values.
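
To make the auto-completion idea concrete, here’s a simplified illustration rather than kvminstall’s actual code. The function name complete_mac is hypothetical, it assumes at least one field is given, and it skips the availability check against addresses already in use on the libvirt network.

```python
import random


def complete_mac(prefix, rng=random):
    """Pad a partial MAC (1 to 5 colon-delimited fields) out to 6 fields."""
    fields = [f.lower() for f in prefix.split(":") if f]
    if not 1 <= len(fields) <= 5:
        raise ValueError("expected between 1 and 5 fields, got %d" % len(fields))
    while len(fields) < 6:
        # each missing field becomes a random two-digit hex octet
        fields.append("%02x" % rng.randint(0, 255))
    return ":".join(fields)


mac = complete_mac("5c:e0:c5:c4:26")
print(mac)  # the first five fields are kept; the last one is random
```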

Usage

The current version, 0.1.3, supports only image-based installs, either by snapshotting an LVM volume or by copying an image file. I intend to add kickstart and ISO support, but hey, release early, release often.

Image File

Most people will probably want to copy an image file. Let’s assume that you’ve built a base image, and its root volume lives in /var/lib/libvirt/images/rhel71base.img. (Next post will be on building base images.) To create a new VM, based on that image, called ‘testvm’:

# kvminstall -i /var/lib/libvirt/images/rhel71base.img testvm

You’re mostly I/O bound here, as you’re copying rhel71base.img -> testvm.img. Shortly after that’s finished, you’ve got a new VM with all of your host and guest networking configured.

# virsh list
 Id    Name                           State
----------------------------------------------------
 2     testvm                         running

# grep testvm /etc/hosts
192.168.122.27	testvm.example.com testvm
# ssh testvm
Last login: Thu Aug 27 13:30:25 2015 from 192.168.122.1
[root@testvm ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 5c:e0:c5:c4:26:7a brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.27/24 brd 192.168.122.255 scope global dynamic eth0
       valid_lft 2141sec preferred_lft 2141sec
    inet6 fe80::5ee0:c5ff:fec4:267a/64 scope link 
       valid_lft forever preferred_lft forever
# nslookup testvm.example.com
Server:		192.168.122.1
Address:	192.168.122.1#53

Name:	testvm.example.com
Address: 192.168.122.27

The guest networking has been set up with virsh. An available IP and MAC address have been automatically picked based on your DHCP scope. (In the next version I’ll add support for specifying an IP address.)

# virsh net-dumpxml default
<network connections='1'>
  <name>default</name>
  <uuid>431ea266-8584-4e10-866a-fc1a3ad419b5</uuid>
  <forward mode='nat'>
    <nat>
      <port start='1024' end='65535'/>
    </nat>
  </forward>
  <bridge name='virbr0' stp='on' delay='0'/>
  <mac address='52:54:00:d0:5e:a3'/>
  <dns>
    <host ip='192.168.122.27'>
      <hostname>testvm.example.com</hostname>
    </host>
  </dns>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
      <host mac='5c:e0:c5:c4:26:7a' name='testvm.example.com' ip='192.168.122.27'/>
    </dhcp>
  </ip>
</network>
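
For the curious, picking an address from the DHCP scope can be approximated from this XML alone. The sketch below is an illustration under stated assumptions, not kvminstall’s implementation: first_free_ip is a hypothetical helper, it works on a trimmed copy of the dump above, and it only looks at static host entries, not at active leases.

```python
import ipaddress
import xml.etree.ElementTree as ET

# Trimmed copy of the net-dumpxml output
NETWORK_XML = """
<network>
  <name>default</name>
  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254'/>
      <host mac='5c:e0:c5:c4:26:7a' name='testvm.example.com' ip='192.168.122.27'/>
    </dhcp>
  </ip>
</network>
"""


def first_free_ip(xml_text):
    """Return the lowest address in the DHCP range with no <host> reservation."""
    dhcp = ET.fromstring(xml_text.strip()).find("./ip/dhcp")
    rng = dhcp.find("range")
    start = ipaddress.ip_address(rng.get("start"))
    end = ipaddress.ip_address(rng.get("end"))
    taken = {ipaddress.ip_address(h.get("ip")) for h in dhcp.findall("host")}
    addr = start
    while addr <= end:
        if addr not in taken:
            return str(addr)
        addr += 1
    return None  # range exhausted


print(first_free_ip(NETWORK_XML))
```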

The dnsmasq service will be automatically restarted after /etc/hosts is updated. This way, so long as your resolv.conf is set up properly in your base image, DNS hostname resolution will work in your guest network.

LVM Volume

Now I use LVM volumes on my laptop, served up from an M.2 SATA drive. This gives me better I/O since I’ve split out host and guest storage devices. It’s also much faster to snapshot a base image’s root volume. Using kvminstall with an LVM snapshot, you can get VM creation time down to seconds. My LVM volume group is called libvirt_lvm.

# lvs
  LV                 VG          Attr       LSize   Pool Origin     Data%  Meta%  Move Log Cpy%Sync Convert
  home               fedora      -wi-ao---- 500.00g                                                        
  root               fedora      -wi-ao---- 366.82g                                                        
  swap               fedora      -wi-ao----  64.00g                                                        
  rhel71base         libvirt_lvm owi-a-s---  10.00g                                                        
# time kvminstall -c /dev/libvirt_lvm/rhel71base testvm

real	0m2.217s
user	0m1.012s
sys	0m0.218s
[root@w550 ~]# ssh testvm
Warning: Permanently added the ECDSA host key for IP address '192.168.133.164' to the list of known hosts.
Last login: Sat Aug  8 21:02:29 2015 from 192.168.133.1
[root@testvm ~]# exit
# lvs
  LV                 VG          Attr       LSize   Pool Origin     Data%  Meta%  Move Log Cpy%Sync Convert
  home               fedora      -wi-ao---- 500.00g                                                        
  root               fedora      -wi-ao---- 366.82g                                                        
  swap               fedora      -wi-ao----  64.00g                                                               
  rhel71base         libvirt_lvm owi-a-s---  10.00g                                                        
  testvm             libvirt_lvm swi-aos---  10.00g      rhel71base 0.06                                   
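
The new testvm volume in that listing is an ordinary LVM snapshot of rhel71base. If you wanted to create one by hand from Python, the lvcreate invocation could be assembled roughly like this. It’s a sketch with a hypothetical snapshot_command helper, not necessarily the exact command kvminstall issues, and it only builds the argument list without running anything.

```python
def snapshot_command(origin, name, size_gb):
    """Build an lvcreate argv that snapshots `origin` into a new LV called `name`."""
    return [
        "lvcreate",
        "--snapshot",
        "--name", name,
        "--size", "%dG" % size_gb,
        origin,
    ]


cmd = snapshot_command("/dev/libvirt_lvm/rhel71base", "testvm", 10)
print(" ".join(cmd))
# To actually run it (as root): subprocess.check_call(cmd)
```

Because a snapshot is copy-on-write, nothing is copied up front, which is why the timing above is measured in seconds rather than minutes.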

Upcoming features

It would be nice if we could — just as quickly — remove the VMs, or even reset them back to their base images. In the next version, expect kvmuninstall and kvmreset commands.

I’d love feedback. Please feel free to comment here or open issues on the GitHub project page.

Stay tuned for my next article on building base images for easy cloning.