6. How to symbolically execute Linux binaries?

In this tutorial, we will show how to symbolically (or concolically) execute existing Linux programs, without modifying their source code. In the Testing a Simple Program tutorial, we instrumented the source code with S2E instructions to inject symbolic values. This tutorial shows how to do this directly from the program’s command line.

To do so, we use the init_env shared library and LD_PRELOAD. This library intercepts the call to the main function and inserts user-configured symbolic arguments. This library can also restrict symbolic execution to the program itself or to all the code in the program’s address space.

6.1. 1. Obtaining and compiling init_env

The init_env library can be found in the guest folder of the S2E distribution. Copy the entire guest directory to your guest virtual machine, and run make. This will compile init_env along with some other useful tools.

6.2. 2. Configuring S2E for use with init_env

init_env communicates with several S2E plugins in order to restrict symbolic execution to the program of interest. The S2E configuration file must contain default settings for these plugins, as follows:

plugins = {
  -- Enable S2E custom opcodes
  "BaseInstructions",

  -- Track when the guest loads programs
  "RawMonitor",

  -- Detect when execution enters
  -- the program of interest
  "ModuleExecutionDetector",

  -- Restrict symbolic execution to
  -- the programs of interest
  "CodeSelector",
}

Note that it is not necessary to declare empty configuration blocks for RawMonitor, ModuleExecutionDetector, or CodeSelector.

6.3. 3. Using init_env

The init_env library needs to be pre-loaded to your binary using LD_PRELOAD. init_env intercepts the program’s entry point invocation, parses the command line arguments of the program, configures symbolic execution, and removes init_env-related parameters, before invoking the original program’s entry point.

For example, the following invokes echo from GNU CoreUtils, using up to two symbolic command line arguments, selecting only code from the echo binary, and terminating the execution path after echo returns:

$ LD_PRELOAD=/path/to/guest/init_env/init_env.so /bin/echo \
--select-process-code --sym-args 0 2 4 ; /path/to/guest/s2ecmd/s2ecmd kill 0 "echo done"

The s2ecmd utility can be found in $S2EDIR/guest/ and allows to control S2E from the guest’s command line. Here, we ask it to kill the execution path after echo returns. It is important to do so, otherwise S2E will run forever, all 100s of paths generated by echo will eventually wait indefinitely at the prompt.

The init_env library supports the following commands. Each command is added as a command-line parameter to the program being executed. It is removed before the program sees the actual command line. In the above example, echo would see zero to two command line arguments of up to four bytes, but would not see the --select-process-code or --sym-args argument.

--select-process               Enable forking in the current process only
--select-process-userspace     Enable forking in userspace-code of the
                               current process only
--select-process-code          Enable forking in the code section of the
                               current binary only
--concolic                     Augment all concrete arguments with symbolic values
--sym-arg <N>                  Replace by a symbolic argument of length N
--sym-args <MIN> <MAX> <N>     Replace by at least MIN arguments and at most
                               MAX arguments, each with maximum length N

Additionally, init_env will show a usage message if the sole argument given is --help.

6.4. 4. Analyzing large programs with concolic execution

Depending on the program under analysis, normal symbolic execution may get stuck in the constraint solver. It is better in general to use concolic execution. The following command runs echo in concolic mode, based on the concrete parameter abc (i.e., the first path will print abc, while the others will print other strings):

$ LD_PRELOAD=/path/to/guest/init_env/init_env.so /bin/echo --concolic abc ; /path/to/guest/s2ecmd/s2ecmd kill 0 "echo done"

You may also want to add > /dev/null to prevent the program from forking in the kernel when printing symbolic content.

6.5. 5. What about other symbolic input?

You can easily feed symbolic data to your program via stdin. The idea is to pipe the symbolic output of one program to the input of another. Symbolic output can be generated using the s2ecmd utility, located in the guest tools directory.

$ /path/to/guest/s2ecmd/s2ecmd symbwrite 4 | echo

The command above will pass 4 symbolic bytes to echo.

The easiest way to have your program read symbolic data from files (other than stdin) currently involves a ramdisk. You need to redirect the symbolic output of s2ecmd symbwrite to a file residing on the ramdisk, then have your program under test read that file. On many Linux distributions, the /tmp filesystem resides in RAM, so using a file in /tmp works. This can be checked using the df command: it should print something similar to tmpfs 123 456 123 1% /tmp.