= Fuzzing =

== Introduction ==

This document describes the virtual-device fuzzing infrastructure in QEMU and
how to use it to implement additional fuzzers.

== Basics ==

Fuzzing operates by passing inputs to an entry point/target function. The
fuzzer tracks the code coverage triggered by the input. Based on these
findings, the fuzzer mutates the input and repeats the fuzzing.

To fuzz QEMU, we rely on libfuzzer. Unlike other fuzzers such as AFL, libfuzzer
is an _in-process_ fuzzer. For the developer, this means that it is their
responsibility to ensure that state is reset between fuzzing-runs.
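
Because an in-process fuzzer calls the target function repeatedly inside one
process, anything the target leaves behind in global state is still there on
the next call. The C sketch below illustrates the reset obligation with a toy
target. The names toy_device_state and toy_device_reset, and the "register"
layout, are hypothetical and invented for this example; they are not QEMU code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device state; stands in for QEMU state that persists
 * across calls because the fuzzer runs in-process. */
static uint8_t toy_device_state[16];

/* Restore the state to its initial (all-zero) value. */
static void toy_device_reset(void)
{
    memset(toy_device_state, 0, sizeof(toy_device_state));
}

/* libFuzzer entry point: called once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* Treat each input byte as a write to consecutive toy registers. */
    for (size_t i = 0; i < size && i < sizeof(toy_device_state); i++) {
        toy_device_state[i] ^= data[i];
    }

    /* Without this reset, the next input would start from whatever state
     * this input left behind, making crashes hard to reproduce. */
    toy_device_reset();
    return 0;
}
```

If the reset call were removed, the same input could behave differently
depending on which inputs ran before it.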
					
						
== Building the fuzzers ==

NOTE: If possible, build a 32-bit binary. When forking, the 32-bit fuzzer is
much faster, since the page-map has a smaller size. In a 64-bit binary,
AddressSanitizer mmaps ~20TB of memory as part of its detection, which results
in a large page-map and a much slower fork().

To build the fuzzers, install a recent version of clang, then configure QEMU
with the command below (substitute the clang binaries with the version you
installed). The --enable-sanitizers option is optional, but it allows us to
reliably detect bugs such as out-of-bounds accesses, use-after-frees, and
double-frees.

    CC=clang-8 CXX=clang++-8 /path/to/configure --enable-fuzzing \
                                                --enable-sanitizers
					
						
Fuzz targets are built similarly to system/softmmu:

    make i386-softmmu/fuzz

This builds ./i386-softmmu/qemu-fuzz-i386

The first option to this command is --fuzz-target=FUZZ_NAME, which selects the
fuzzer to run. To list all of the available fuzzers, run qemu-fuzz-i386 with no
arguments.

For example:

    ./i386-softmmu/qemu-fuzz-i386 --fuzz-target=virtio-scsi-fuzz

Internally, libFuzzer parses all arguments that do not begin with "--".
Information about these is available by passing -help=1.

Now the only thing left to do is wait for the fuzzer to trigger potential
crashes.
					
						
== Useful libFuzzer flags ==

As mentioned above, libFuzzer accepts some arguments. Passing -help=1 will list
the available arguments. In particular, these arguments might be helpful:

$CORPUS_DIR/ : Specify a directory as the last argument to libFuzzer. libFuzzer
stores each "interesting" input in this corpus directory. The next time you run
libFuzzer, it will read all of the inputs from the corpus, and continue fuzzing
from there. You can also specify multiple directories. libFuzzer loads existing
inputs from all specified directories, but will only write new ones to the
first one specified.

-max_len=4096 : Specify the maximum byte-length of the inputs libFuzzer will
generate.

-close_fd_mask={1,2,3} : Close stdout (1), stderr (2), or both (3). Useful for
targets that trigger many debug/error messages, or create output on the serial
console.

-jobs=4 -workers=4 : These arguments configure libFuzzer to run 4 fuzzers in
parallel (4 fuzzing jobs in 4 worker processes). Alternatively, with only
-jobs=N, libFuzzer automatically spawns a number of workers less than or equal
to half the available CPU cores. Replace 4 with a number appropriate for your
machine. Make sure to specify a $CORPUS_DIR, which will allow the parallel
fuzzers to share information about the interesting inputs they find.

-use_value_profile=1 : For each comparison operation, libFuzzer computes
(caller_pc&4095) | (popcnt(Arg1 ^ Arg2) << 12) and places this in the coverage
table. Useful for targets with "magic" constants. If Arg1 came from the fuzzer's
input and Arg2 is a magic constant, then each time the Hamming distance
between Arg1 and Arg2 decreases, libFuzzer adds the input to the corpus.

-shrink=1 : Tries to make elements of the corpus "smaller". Might lead to
better coverage performance, depending on the target.

Note that libFuzzer's exact behavior will depend on the version of
clang and libFuzzer used to build the device fuzzers.
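
To make the -use_value_profile=1 expression concrete, the C sketch below
evaluates it for a single comparison. The function name value_profile_feature
is hypothetical, and the sketch only mirrors the formula quoted above; it is
not libFuzzer's actual implementation, which may differ between versions. It
relies on the __builtin_popcountll builtin provided by GCC and clang.

```c
#include <stdint.h>

/* Feature value described for -use_value_profile=1:
 * (caller_pc & 4095) | (popcnt(arg1 ^ arg2) << 12). */
static uint64_t value_profile_feature(uint64_t caller_pc,
                                      uint64_t arg1, uint64_t arg2)
{
    /* Number of differing bits, i.e. the Hamming distance. */
    uint64_t distance = (uint64_t)__builtin_popcountll(arg1 ^ arg2);

    return (caller_pc & 4095) | (distance << 12);
}
```

Each time an input brings Arg1 one bit closer to Arg2, the distance term
changes, so the comparison yields a value that has not been seen before. That
is how intermediate inputs on the way to a magic constant get retained.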
					
						
== Generating Coverage Reports ==

Code coverage is a crucial metric for evaluating a fuzzer's performance.
libFuzzer's output provides a "cov: " column that provides a total number of
unique blocks/edges covered. To examine coverage on a line-by-line basis we
can use Clang coverage:

 1. Configure libFuzzer to store a corpus of all interesting inputs (see
    CORPUS_DIR above)
 2. ./configure the QEMU build with:
        --enable-fuzzing \
        --extra-cflags="-fprofile-instr-generate -fcoverage-mapping"
 3. Re-run the fuzzer. Specify $CORPUS_DIR/* as an argument, telling libfuzzer
    to execute all of the inputs in $CORPUS_DIR and exit. Once the process
    exits, you should find a file, "default.profraw" in the working directory.
 4. Execute these commands to generate a detailed HTML coverage-report:
        llvm-profdata merge -output=default.profdata default.profraw
        llvm-cov show ./path/to/qemu-fuzz-i386 -instr-profile=default.profdata \
            --format html -output-dir=/path/to/output/report
					
						
== Adding a new fuzzer ==

Coverage over virtual devices can be improved by adding additional fuzzers.
Fuzzers are kept in tests/qtest/fuzz/ and should be added to
tests/qtest/fuzz/Makefile.include

Fuzzers can rely on both qtest and libqos to communicate with virtual devices.

1. Create a new source file. For example ``tests/qtest/fuzz/foo-device-fuzz.c``.

2. Write the fuzzing code using the libqtest/libqos API. See existing fuzzers
for reference.

3. Register the fuzzer in ``tests/qtest/fuzz/Makefile.include`` by appending the
corresponding object to fuzz-obj-y

Fuzzers can be more-or-less thought of as special qtest programs which can
modify the qtest commands and/or qtest command arguments based on inputs
provided by libFuzzer. libFuzzer passes a byte array and length. Commonly the
fuzzer loops over the byte-array interpreting it as a list of qtest commands,
addresses, or values.
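
The loop over the byte array can be sketched in plain C as follows. This is an
illustration only: the 4-byte command layout and the names toy_op, toy_cmd and
toy_decode are assumptions made for this example and do not come from an
existing QEMU fuzzer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command set; a real fuzzer would map these to qtest calls. */
enum toy_op { TOY_OP_READ = 0, TOY_OP_WRITE = 1, TOY_OP_COUNT };

struct toy_cmd {
    enum toy_op op;
    uint16_t addr;
    uint8_t value;
};

/* Assumed layout: opcode, address low byte, address high byte, value. */
#define TOY_CMD_LEN 4

/* Decode up to max_cmds commands from the fuzzer-provided bytes.
 * Trailing bytes that do not form a whole command are ignored.
 * Returns the number of commands decoded. */
static size_t toy_decode(const uint8_t *data, size_t size,
                         struct toy_cmd *cmds, size_t max_cmds)
{
    size_t n = 0;

    while (size >= TOY_CMD_LEN && n < max_cmds) {
        /* Reduce the opcode modulo the number of ops so that every byte
         * value selects a valid command instead of being rejected. */
        cmds[n].op = (enum toy_op)(data[0] % TOY_OP_COUNT);
        cmds[n].addr = (uint16_t)(data[1] | (data[2] << 8));
        cmds[n].value = data[3];
        data += TOY_CMD_LEN;
        size -= TOY_CMD_LEN;
        n++;
    }
    return n;
}
```

In an actual fuzzer, each decoded command would be dispatched to a libqtest
helper such as qtest_inb() or qtest_outb() rather than stored in an array.
Mapping every opcode byte to a valid command keeps the fuzzer from wasting
inputs on values that the target would simply discard.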
					
						
== The Generic Fuzzer ==

Writing a fuzz target can be a lot of effort (especially if a device driver has
not been built out within libqos). Many devices can be fuzzed to some degree,
without any device-specific code, using the generic-fuzz target.

The generic-fuzz target is capable of fuzzing devices over their PIO, MMIO,
and DMA input-spaces. To apply the generic-fuzz to a device, we need to define
two env-variables, at minimum:

QEMU_FUZZ_ARGS= is the set of QEMU arguments used to configure a machine, with
the device attached. For example, if we want to fuzz the virtio-net device
attached to a pc-i440fx machine, we can specify:

    QEMU_FUZZ_ARGS="-M pc -nodefaults -netdev user,id=user0 \
                    -device virtio-net,netdev=user0"

QEMU_FUZZ_OBJECTS= is a set of space-delimited strings used to identify the
MemoryRegions that will be fuzzed. These strings are compared against
MemoryRegion names and MemoryRegion owner names, to decide whether each
MemoryRegion should be fuzzed. These strings support globbing. For the
virtio-net example, we could use QEMU_FUZZ_OBJECTS=
 * 'virtio-net'
 * 'virtio*'
 * 'virtio* pcspk' (Fuzz the virtio devices and the PC speaker...)
 * '*' (Fuzz the whole machine)

The "info mtree" and "info qom-tree" monitor commands can be especially useful
for identifying the MemoryRegion and Object names used for matching.
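
The matching of space-delimited glob patterns against a name can be
approximated with the C sketch below. It uses POSIX fnmatch() as a stand-in
for the glob matcher; this is an assumption, and QEMU's own matching code may
use a different function with slightly different glob semantics. The name
toy_name_matches is hypothetical.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if name matches any space-delimited glob in patterns.
 * Patterns of 128 characters or more are skipped in this sketch. */
static bool toy_name_matches(const char *patterns, const char *name)
{
    char pat[128];

    while (*patterns) {
        patterns += strspn(patterns, " ");    /* skip delimiters */
        size_t len = strcspn(patterns, " ");  /* length of this pattern */

        if (len > 0 && len < sizeof(pat)) {
            memcpy(pat, patterns, len);
            pat[len] = '\0';
            if (fnmatch(pat, name, 0) == 0) {
                return true;
            }
        }
        patterns += len;
    }
    return false;
}
```

With the pattern string "virtio* pcspk", a region named "virtio-net-pci" or
"pcspk" would be selected under these assumptions, while "e1000" would not.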
					
						
As a general rule of thumb, the more MemoryRegions/Devices we match, the greater
the input-space, and the smaller the probability of finding crashing inputs for
individual devices. As such, it is usually a good idea to limit the fuzzer to
only a few MemoryRegions.

To ensure that these env variables have been configured correctly, we can use:

    ./qemu-fuzz-i386 --fuzz-target=generic-fuzz -runs=0

The output should contain a complete list of matched MemoryRegions.
					
						
= Implementation Details =

== The Fuzzer's Lifecycle ==

The fuzzer has two entrypoints that libFuzzer calls. libFuzzer provides its
own main(), which performs some setup, and calls the entrypoints:

LLVMFuzzerInitialize: called prior to fuzzing. Used to initialize all of the
necessary state.

LLVMFuzzerTestOneInput: called for each fuzzing run. Processes the input and
resets the state at the end of each run.

In more detail:

LLVMFuzzerInitialize parses the arguments to the fuzzer (must start with two
dashes, so they are ignored by libFuzzer main()). Currently, the arguments
select the fuzz target. Then, the qtest client is initialized. If the target
requires qos, qgraph is set up and the QOM/LIBQOS modules are initialized.
Then the QGraph is walked and the QEMU cmd_line is determined and saved.

After this, the vl.c:qemu_main is called to set up the guest. There are
target-specific hooks that can be called before and after qemu_main, for
additional setup (e.g. PCI setup, or VM snapshotting).
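
The argument handling described for LLVMFuzzerInitialize can be pictured with
the C sketch below. It is a simplified illustration, not QEMU's code: the
variable toy_selected_target is hypothetical, and the qtest, qgraph and guest
setup steps are reduced to a comment.

```c
#include <stddef.h>
#include <string.h>

/* Name chosen with --fuzz-target=NAME, or NULL if none was given. */
static const char *toy_selected_target;

/* libFuzzer calls this once, before any input is run. */
int LLVMFuzzerInitialize(int *argc, char ***argv)
{
    static const char prefix[] = "--fuzz-target=";

    for (int i = 1; i < *argc; i++) {
        const char *arg = (*argv)[i];

        /* libFuzzer's own flags start with a single dash, so arguments
         * starting with two dashes are left for the target to interpret. */
        if (strncmp(arg, prefix, sizeof(prefix) - 1) == 0) {
            toy_selected_target = arg + sizeof(prefix) - 1;
        }
    }

    /* A real target would now initialize qtest and start the guest. */
    return 0;
}
```

In this sketch a flag such as -max_len=4096 passes through the loop untouched,
which matches the split between libFuzzer flags and fuzzer arguments described
above.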
					
						
LLVMFuzzerTestOneInput: Uses qtest/qos functions to act based on the fuzz
input. It is also responsible for manually calling the main loop/main_loop_wait
to ensure that bottom halves are executed, and for performing any cleanup
required before the next input.

Since the same process is reused for many fuzzing runs, QEMU state needs to
be reset at the end of each run. There are currently two implemented
options for resetting state:

1. Reboot the guest between runs.
   Pros: Straightforward and fast for simple fuzz targets.
   Cons: Depending on the device, does not reset all device state. If the
   device requires some initialization prior to being ready for fuzzing
   (common for QOS-based targets), this initialization needs to be done after
   each reboot.
   Example target: i440fx-qtest-reboot-fuzz
2. Run each test case in a separate forked process and copy the coverage
   information back to the parent. This is fairly similar to AFL's "deferred"
   fork-server mode [3]
   Pros: Relatively fast. Devices only need to be initialized once. No need
   to do slow reboots or vmloads.
   Cons: Not officially supported by libFuzzer. Does not work well for devices
   that rely on dedicated threads.
   Example target: virtio-net-fork-fuzz
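
The fork-based option can be illustrated with the POSIX C sketch below. It is
a toy model under stated assumptions: toy_state stands in for QEMU state, and
a single shared counter (toy_coverage) stands in for the coverage information
that the real fuzzer copies back to the parent. All toy_ names are
hypothetical and the actual QEMU mechanism may work differently.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical global state that a test case may modify. */
static int toy_state;

/* Hypothetical coverage counter, placed in memory shared with the child. */
static uint32_t *toy_coverage;

/* Map one shared, anonymous region that stays shared across fork(). */
static int toy_coverage_init(void)
{
    void *p = mmap(NULL, sizeof(*toy_coverage), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1;
    }
    toy_coverage = p;
    *toy_coverage = 0;
    return 0;
}

/* Run one input in a child process. Returns 0 if the child exited
 * normally with status 0, -1 on fork failure or abnormal exit. */
static int toy_run_forked(const uint8_t *data, size_t size)
{
    pid_t pid = fork();

    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: changes to ordinary globals stay private to this process. */
        toy_state += (int)size;
        /* Writes to the shared mapping are visible to the parent. */
        for (size_t i = 0; i < size; i++) {
            *toy_coverage += (data[i] != 0);
        }
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

After toy_run_forked() returns, the parent's toy_state is unchanged, so no
explicit reset is needed, while the shared counter reflects what the child
observed. The sketch also hints at the listed drawback: fork() duplicates only
the calling thread, so state owned by other threads is not carried over into
the child.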