Cyclomatic complexity is a metric that’s used to measure the complexity of a program. It’s one of many binary metrics tracked by the Cyber-ITL, and calculating it is a good first step in repeating their results.
Radare2 makes this super easy, and with r2pipe we can start building out our Python SDK for the Fedora Cyber Test Lab’s open source implementation of the Cyber-ITL’s approach.
macOS Environment Setup
It probably makes more sense to do this work on my Fedora box, but I happen to be on my Macbook. So we’ll use macOS as our environment for this post, which is, honestly, way more difficult.
First, let’s set up Homebrew. Note that in order to do so cleanly, we want to first give my user full control over /usr/local. I prefer to use facls to do this than changing ownership, which seems clumsy. Be sure to replace “jason” with your username.
sudo chmod -R +a "jason allow list,add_file,search,delete,add_subdirectory,delete_child,readattr,writeattr,readextattr,writeextattr,readsecurity,writesecurity,chown,file_inherit,directory_inherit" /usr/local /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
I also like to use the Homebrew version of Python, so we’ll install that as well.
brew install python virtualenv radare2 virtualenv -p /usr/local/bin/python ENV source ENV/bin/activate pip install r2pipe
We’re setting up a virtual environment here so we can isolate our installed libraries, but also to ensure we don’t run into PYTHONPATH issues, which is easy to do on macOS with multiple Python interpreters installed.
Simple analysis with r2
Radare2 (pronounced like “radar”) is a fantastic tool and, to be honest, I’ve only scratched its surface. Watching an experienced reverse engineer use r2 is like watching Michelangelo paint. Ok, I’ve never seen that, but I assume it was hella cool.
So let’s write a simple program and analyze it with r2. This assumes you have Xcode Command Line Tools installed, which is from where we’re getting gcc.
#include <stdio.h> int main() { printf("hello world\n"); return 0; }
That should look familiar to everybody. Now let’s compile it, analyze it, and ask r2 to calculate its cyclomatic complexity.
(ENV) mb:~ jason$ gcc hello_world.c (ENV) mb:~ jason$ r2 a.out syntax error: error in error handling syntax error: error in error handling syntax error: error in error handling -- For a full list of commands see `strings /dev/urandom` [0x100000f60]> aaa [x] Analyze all flags starting with sym. and entry0 (aa) [x] Analyze len bytes of instructions for references (aar) [x] Analyze function calls (aac) [ ] [*] Use -AA or aaaa to perform additional experimental analysis. [x] Constructing a function name for fcn.* and sym.func.* functions (aan)) [0x100000f60]> afcc 1 [0x100000f60]> q
Not super-interesting, there’s just a single path the code can take, so when we use the afcc command, or “analyze functions cyclomatic complexity,” we get back 1.
Using the formula the Wikipedia article, we can sanity check that result.
M = E − N + 2P,
where
E = the number of edges of the graph.
N = the number of nodes of the graph.
P = the number of connected components.
We get
M = 0 – 1 + 2 * 1
M = 1
For our purposes, P will almost always be 2, so we can treat that like a constant.
Ok, let’s add some more complexity.
#include <stdio.h> int main() { int a = 1; if( a == 1 ){ printf("hello world\n"); } return 0; }
Compile it and analyze it.
(ENV) mb:~ jason$ gcc hello_world2.c (ENV) mb:~ jason$ r2 a.out syntax error: error in error handling syntax error: error in error handling syntax error: error in error handling -- Rename a function using the 'afr <newname> @ <offset>' command. [0x100000f50]> aaa [x] Analyze all flags starting with sym. and entry0 (aa) [x] Analyze len bytes of instructions for references (aar) [x] Analyze function calls (aac) [ ] [*] Use -AA or aaaa to perform additional experimental analysis. [x] Constructing a function name for fcn.* and sym.func.* functions (aan)) [0x100000f50]> afcc 2 [0x100000f50]> agv [0x100000f50]> q
Ok, with a conditional in our code we got the complexity to go up. But let’s use the command “agv” or “analyze graph web/png” to get a nice graphic representation of the function graph.
Now let’s use our formula, M = E – N + 2. Three edges, three nodes.
M = 3 – 3 + 2
M = 2
So that tracks. Now once more with an extra conditional statement.
#include <stdio.h> int main() { int a = 1; if( a == 1 ){ printf("hello world\n"); } if(a == 0){ printf("goodbye world\n"); } return 0; }
Lather, rinse, repeat.
(ENV) mb:~ jason$ gcc hello_world3.c (ENV) mb:~ jason$ r2 a.out syntax error: error in error handling syntax error: error in error handling syntax error: error in error handling -- Use '-e bin.strings=false' to disable automatic string search when loading the binary. [0x100000f20]> aaa [x] Analyze all flags starting with sym. and entry0 (aa) [x] Analyze len bytes of instructions for references (aar) [x] Analyze function calls (aac) [ ] [*] Use -AA or aaaa to perform additional experimental analysis. [x] Constructing a function name for fcn.* and sym.func.* functions (aan)) [0x100000f20]> afcc 3 [0x100000f20]> agv [0x100000f20]> q
Verifying once again, six edges, five nodes:
M = 6 – 5 + 2
M = 3
You get the idea. These are really trivial examples, and when you ask r2 to analyze complex binaries, it can take a really long time. For fun, try the docker runtime, and see what a pain it is to deal with statically linked Go binaries.
Now let’s use r2pipe to do this from Python.
#!/usr/bin/env python import os import r2pipe import sys r2 = r2pipe.open(sys.argv[1]) r2.cmd("aaa") cc = r2.cmdj("afcc") print cc
And now we can do it from Python!
(ENV) mb:~ jason$ python r2cc.py a.out syntax error: error in error handling syntax error: error in error handling syntax error: error in error handling 3
BTW, those errors are a known issue and you can ignore them on macOS for now.
Just for fun, let’s look at some macOS binaries.
(ENV) mb:~ jason$ python r2cc.py /bin/ls syntax error: error in error handling syntax error: error in error handling syntax error: error in error handling 36 (ENV) mb:~ jason$ python r2cc.py /usr/sbin/chown syntax error: error in error handling syntax error: error in error handling syntax error: error in error handling 31 (ENV) mb:~ jason$ python r2cc.py /usr/sbin/postfix syntax error: error in error handling syntax error: error in error handling syntax error: error in error handling 24 (ENV) mb:~ jason$ python r2cc.py /bin/bash syntax error: error in error handling syntax error: error in error handling syntax error: error in error handling 177
Crazy that ls and chown are more complex that Postfix! And BASH is far more complex than either. Think of this value as a representation of the number of moving parts in a machine. The more complex it is, the more likely it’ll break.
But what about the quality of those parts in the machine? A complex but well-engineered machine will work better than a simple piece of junk. And so far, we’re only examining the machine while it’s turned off. That’s static analysis. We also need to watch the machine work, which is dynamic analysis. The Cyber-ITL has been doing both.
Stay tuned as we follow them down the quantitative analysis rabbit hole.