Measuring cyclomatic complexity with Radare2

Cyclomatic complexity is a metric that’s used to measure the complexity of a program. It’s one of many binary metrics tracked by the Cyber-ITL, and calculating it is a good first step in repeating their results.

Radare2 makes this super easy, and with r2pipe we can start building out our Python SDK for the Fedora Cyber Test Lab’s open source implementation of the Cyber-ITL’s approach.

macOS Environment Setup

It probably makes more sense to do this work on my Fedora box, but I happen to be on my Macbook. So we’ll use macOS as our environment for this post, which is, honestly, way more difficult.

First, let’s set up Homebrew. Note that in order to do so cleanly, we want to first give my user full control over /usr/local. I prefer to use facls to do this than changing ownership, which seems clumsy. Be sure to replace “jason” with your username.

sudo chmod -R +a "jason allow list,add_file,search,delete,add_subdirectory,delete_child,readattr,writeattr,readextattr,writeextattr,readsecurity,writesecurity,chown,file_inherit,directory_inherit" /usr/local

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

I also like to use the Homebrew version of Python, so we’ll install that as well.

brew install python virtualenv radare2
virtualenv -p /usr/local/bin/python ENV
source ENV/bin/activate
pip install r2pipe

We’re setting up a virtual environment here so we can isolate our installed libraries, but also to ensure we don’t run into PYTHONPATH issues, which is easy to do on macOS with multiple Python interpreters installed.

Simple analysis with r2

Radare2 (pronounced like “radar”) is a fantastic tool and, to be honest, I’ve only scratched its surface. Watching an experienced reverse engineer use r2 is like watching Michelangelo paint. Ok, I’ve never seen that, but I assume it was hella cool.

So let’s write a simple program and analyze it with r2. This assumes you have Xcode Command Line Tools installed, which is from where we’re getting gcc.

#include <stdio.h>

int main() {
  printf("hello world\n");
  return 0;
}

That should look familiar to everybody. Now let’s compile it, analyze it, and ask r2 to calculate its cyclomatic complexity.

(ENV) mb:~ jason$ gcc hello_world.c
(ENV) mb:~ jason$ r2 a.out
syntax error: error in error handling
syntax error: error in error handling
syntax error: error in error handling
 -- For a full list of commands see `strings /dev/urandom`
[0x100000f60]> aaa
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze len bytes of instructions for references (aar)
[x] Analyze function calls (aac)
[ ] [*] Use -AA or aaaa to perform additional experimental analysis.
[x] Constructing a function name for fcn.* and sym.func.* functions (aan))
[0x100000f60]> afcc
1
[0x100000f60]> q

Not super-interesting, there’s just a single path the code can take, so when we use the afcc command, or “analyze functions cyclomatic complexity,” we get back 1.

Using the formula the Wikipedia article, we can sanity check that result.

M = E − N + 2P,
where
E = the number of edges of the graph.
N = the number of nodes of the graph.
P = the number of connected components.

We get

M = 0 – 1 + 2 * 1
M = 1

For our purposes, P will almost always be 2, so we can treat that like a constant.

Ok, let’s add some more complexity.

#include <stdio.h>

int main() {
	int a = 1;
	if( a == 1 ){
		printf("hello world\n");
	}
	return 0;
}

Compile it and analyze it.

(ENV) mb:~ jason$ gcc hello_world2.c
(ENV) mb:~ jason$ r2 a.out
syntax error: error in error handling
syntax error: error in error handling
syntax error: error in error handling
 -- Rename a function using the 'afr <newname> @ <offset>' command.
[0x100000f50]> aaa
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze len bytes of instructions for references (aar)
[x] Analyze function calls (aac)
[ ] [*] Use -AA or aaaa to perform additional experimental analysis.
[x] Constructing a function name for fcn.* and sym.func.* functions (aan))
[0x100000f50]> afcc
2
[0x100000f50]> agv
[0x100000f50]> q

Ok, with a conditional in our code we got the complexity to go up. But let’s use the command “agv” or “analyze graph web/png” to get a nice graphic representation of the function graph.

Now let’s use our formula, M = E – N + 2. Three edges, three nodes.

M = 3 – 3 + 2
M = 2

So that tracks. Now once more with an extra conditional statement.

#include <stdio.h>

int main() {
	int a = 1;
	if( a == 1 ){
		printf("hello world\n");
	}
	if(a == 0){
		printf("goodbye world\n");
	}
	return 0; 
}

Lather, rinse, repeat.

(ENV) mb:~ jason$ gcc hello_world3.c
(ENV) mb:~ jason$ r2 a.out
syntax error: error in error handling
syntax error: error in error handling
syntax error: error in error handling
 -- Use '-e bin.strings=false' to disable automatic string search when loading the binary.
[0x100000f20]> aaa
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze len bytes of instructions for references (aar)
[x] Analyze function calls (aac)
[ ] [*] Use -AA or aaaa to perform additional experimental analysis.
[x] Constructing a function name for fcn.* and sym.func.* functions (aan))
[0x100000f20]> afcc
3
[0x100000f20]> agv
[0x100000f20]> q

Verifying once again, six edges, five nodes:

M = 6 – 5 + 2
M = 3

You get the idea. These are really trivial examples, and when you ask r2 to analyze complex binaries, it can take a really long time. For fun, try the docker runtime, and see what a pain it is to deal with statically linked Go binaries.

Now let’s use r2pipe to do this from Python.

#!/usr/bin/env python

import os
import r2pipe
import sys

r2 = r2pipe.open(sys.argv[1])
r2.cmd("aaa")
cc = r2.cmdj("afcc")
print cc

And now we can do it from Python!

(ENV) mb:~ jason$ python r2cc.py a.out
syntax error: error in error handling
syntax error: error in error handling
syntax error: error in error handling
3

BTW, those errors are a known issue and you can ignore them on macOS for now.

Just for fun, let’s look at some macOS binaries.

(ENV) mb:~ jason$ python r2cc.py /bin/ls
syntax error: error in error handling
syntax error: error in error handling
syntax error: error in error handling
36
(ENV) mb:~ jason$ python r2cc.py /usr/sbin/chown
syntax error: error in error handling
syntax error: error in error handling
syntax error: error in error handling
31
(ENV) mb:~ jason$ python r2cc.py /usr/sbin/postfix
syntax error: error in error handling
syntax error: error in error handling
syntax error: error in error handling
24
(ENV) mb:~ jason$ python r2cc.py /bin/bash
syntax error: error in error handling
syntax error: error in error handling
syntax error: error in error handling
177

Crazy that ls and chown are more complex that Postfix! And BASH is far more complex than either. Think of this value as a representation of the number of moving parts in a machine. The more complex it is, the more likely it’ll break.

But what about the quality of those parts in the machine? A complex but well-engineered machine will work better than a simple piece of junk. And so far, we’re only examining the machine while it’s turned off. That’s static analysis. We also need to watch the machine work, which is dynamic analysis. The Cyber-ITL has been doing both.

Stay tuned as we follow them down the quantitative analysis rabbit hole.

Defense in Depth is cool, and you should attend on October 3rd

For five years now, Red Hat and Intel have hosted a small, but very cool security conference called “Defense in Depth (DiD)” in Tyson’s Corner, VA. Its popularity has been increasing, and this year is a sort of watershed event in the show’s history. DiD has, in the past, been very focused on the security of Red Hat’s products; this year we’re casting a wider net around the security of many open source communities.

We even have an infosec A-lister keynoting — none other than David Kennedy of DerbyCon. In case you haven’t been watching CNN, Fox News, or other high-profile media outlets’ reporting on infosec, Dave’s kind of a big deal. His security roots run deep, and I’m super pumped to hear his keynote, “The Changing Tactics of Hackers,” which will talk about how only the first T in TTP tends to change, which can be handy for developing a counter-strategy.

The whole agenda looks cool, but here are some of the talks for which I am particularly stoked:

There are still conference passes available, so if you get geeked on open source security, register here.