Improving Fuzzing Code Coverage with Neural Networks

Modern fuzzers like AFL (American Fuzzy Lop) use coverage-guided mutation to explore program paths: they mutate inputs and keep those that reach new code. However, because these mutations are largely random, they can plateau on complex branch conditions (such as multi-byte magic values or checksums) or on highly structured inputs. Neural networks offer a way to guide fuzzing beyond random chance by learning patterns in inputs and program behavior, which can improve code coverage. This document surveys the main neural approaches (surrogate models, reinforcement learning, and generative models), explains how they boost code coverage in fuzzing, and summarizes key research and potential workflow improvements.

1. Surrogate Neural Models for Guidance (Neural Program Smoothing)

One successful approach uses a surrogate neural network to approximate a program’s branching behavior, which enables gradient-guided input generation. NEUZZ (2019) pioneered this by training a feed-forward neural network to learn a smooth approximation of the target program’s logic. The surrogate predicts which edges an input will hit, so it captures how input bytes influence code coverage and which input bytes are “critical” for triggering new program paths.

Once the model is trained, the gradients of its outputs with respect to the input bytes can be computed. These gradients indicate which bytes most strongly influence a given edge and in which direction to change them. By following these gradient hints, much as one crafts adversarial examples in machine learning, NEUZZ generates new inputs that reach previously unexplored code. In its original 24-hour evaluation, NEUZZ reported 3× more edge coverage than AFL and other state-of-the-art graybox fuzzers, and it exposed deep bugs that evolutionary fuzzers often miss.
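The gradient step can be sketched in Python under simplifying assumptions: a single sigmoid output for one target edge stands in for NEUZZ’s multilayer network, inputs have a fixed length equal to the number of weights, bytes are scaled to [0, 1], and the names `predict`, `edge_gradient`, and `gradient_mutations` as well as the step sizes are illustrative rather than taken from NEUZZ. The sketch ranks bytes by gradient magnitude, keeps the top `k` as critical bytes, and pushes each one in the direction of its gradient sign with growing step sizes, clipped to the valid byte range.

```python
import math
from typing import Iterator, Sequence

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights: Sequence[float], bias: float, x: bytes) -> float:
    """Surrogate's predicted probability that the target edge is hit by x."""
    return sigmoid(sum(w * (xi / 255.0) for w, xi in zip(weights, x)) + bias)

def edge_gradient(weights: Sequence[float], bias: float, x: bytes) -> list:
    """d predict / d x_i for f = sigmoid(w . x/255 + b): f * (1 - f) * w_i / 255."""
    f = predict(weights, bias, x)
    return [f * (1.0 - f) * w / 255.0 for w in weights]

def gradient_mutations(weights: Sequence[float], bias: float, seed: bytes,
                       top_k: int = 2,
                       steps: Sequence[int] = (8, 32, 128)) -> Iterator[bytes]:
    """Yield children that move the critical bytes toward a higher edge score."""
    grad = edge_gradient(weights, bias, seed)
    # Critical bytes: the k positions where the output is most sensitive.
    critical = sorted(range(len(seed)), key=lambda i: abs(grad[i]), reverse=True)[:top_k]
    for step in steps:
        child = bytearray(seed)
        for i in critical:
            direction = 1 if grad[i] >= 0 else -1
            child[i] = max(0, min(255, child[i] + direction * step))
        yield bytes(child)

# Usage: toy weights for a 4-byte input; bytes 1 and 2 have the largest |weight|.
w = [0.1, 5.0, -3.0, 0.0]
seed = bytes([10, 20, 200, 30])
children = list(gradient_mutations(w, 0.0, seed))
```

Each child is still only a candidate: the surrogate’s prediction can be wrong, so it must be executed on the real program before it is kept.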

How It Works:

Data Collection: The fuzzer executes seeds and collects coverage data.
Model Training: A neural network (typically a simple multilayer perceptron) is trained on <input, coverage> pairs.
Gradient-Guided Mutation: Gradients of the model’s output for a not-yet-covered edge, taken with respect to the input bytes, identify the most influential bytes; mutating those bytes in the direction of the gradient targets that edge.
Validation: Candidate inputs generated through gradient guidance are tested on the real program; those that increase coverage are added to the corpus.
Iterative Learning: The surrogate model is retrained periodically with new data, keeping it relevant as new program areas are discovered.
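The training and validation steps above can be sketched in Python, assuming fixed-length inputs and a multi-label logistic regression (one sigmoid output per edge) in place of NEUZZ’s multilayer perceptron. This is not NEUZZ’s code: `Surrogate`, `toy_target`, and `validate` are hypothetical names, and `toy_target` stands in for executing the instrumented program and reading its edge bitmap.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

class Surrogate:
    """One sigmoid output per edge over normalized input bytes.

    Assumes all inputs have the same length (pad or truncate beforehand).
    """

    def __init__(self, input_len: int, num_edges: int, seed: int = 0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(input_len)]
                  for _ in range(num_edges)]
        self.b = [0.0] * num_edges

    def predict(self, x: bytes) -> list:
        xs = [v / 255.0 for v in x]
        return [sigmoid(sum(wi * xi for wi, xi in zip(w, xs)) + b)
                for w, b in zip(self.w, self.b)]

    def train(self, data, epochs: int = 500, lr: float = 0.5) -> None:
        """SGD on binary cross-entropy; data holds (input, 0/1 edge bitmap) pairs."""
        for _ in range(epochs):
            for x, y in data:
                xs = [v / 255.0 for v in x]
                for e, p in enumerate(self.predict(x)):
                    err = p - y[e]  # derivative of BCE w.r.t. the logit
                    self.b[e] -= lr * err
                    for i, xi in enumerate(xs):
                        self.w[e][i] -= lr * err * xi

def toy_target(data: bytes) -> list:
    """Hypothetical instrumented program with two edges."""
    return [int(data[0] > 128), int(data[1] < 64)]

def validate(candidates, run, corpus, seen_edges) -> None:
    """Run candidates on the real target; keep only those that add coverage."""
    for c in candidates:
        cov = run(c)
        new = {e for e, hit in enumerate(cov) if hit} - seen_edges
        if new:
            corpus.append((c, cov))
            seen_edges |= new

# Collect data, train, then validate candidates (retrain as the corpus grows).
data = [(bytes(x), toy_target(bytes(x)))
        for x in ([200, 100], [10, 100], [10, 10], [250, 20])]
model = Surrogate(input_len=2, num_edges=2)
model.train(data)
corpus, seen = [], set()
validate([bytes([201, 100]), bytes([0, 0])], toy_target, corpus, seen)
```

Retraining is just another `train` call on the enlarged `(input, coverage)` dataset, which keeps the surrogate aligned with newly discovered regions of the program.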

Results & Developments:
In its original evaluation on real-world programs, NEUZZ found 31 new bugs (including 2 CVEs) and often far exceeded AFL’s coverage. Its success has spurred follow-up work:

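MTFuzz (2020): Extended neural guidance with a multi-task neural network that learns several coverage-related tasks for the same program at once (e.g., edge coverage and context-sensitive edge coverage) through shared layers.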
PreFuzz (2022): Improved the selection of program edges for training, making the surrogate more efficient.
Neuzz++: Integrated the neural smoothing approach as a custom mutator in AFL++ with minimal changes.
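AFL++ can load custom mutators written as Python modules (selected with the `AFL_PYTHON_MODULE` environment variable), which is one way to feed gradient hints into an existing fuzzer. The module below is a minimal sketch and not Neuzz++’s actual implementation: `HINTS` is a hypothetical list of (byte offset, direction) pairs that a separately trained surrogate would export, hard-coded here for illustration, and the step range of 1 to 16 is an arbitrary choice.

```python
# neural_mutator.py: a minimal AFL++ Python custom mutator (illustrative only).
import random

# Hypothetical (byte offset, direction) hints from a separately trained surrogate;
# a real setup would reload these periodically instead of hard-coding them.
HINTS = [(0, +1), (3, -1), (7, +1)]

_rng = random.Random()

def init(seed):
    """Called once by AFL++ with a random seed."""
    _rng.seed(seed)

def fuzz(buf, add_buf, max_size):
    """Return a mutated copy of buf, nudging hinted bytes in the hinted direction.

    add_buf (a second test case AFL++ offers for splicing) is unused here.
    """
    out = bytearray(buf[:max_size])
    for offset, direction in HINTS:
        if offset < len(out):
            step = _rng.randint(1, 16)
            out[offset] = max(0, min(255, out[offset] + direction * step))
    return out
```

It would be launched with something like `PYTHONPATH=. AFL_PYTHON_MODULE=neural_mutator afl-fuzz -i in -o out -- ./target @@`; the AFL++ custom mutator documentation lists the optional hooks and related environment variables for each version.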

2. Reinforcement Learning (RL) for Fuzzing Strategy

An alternative strategy is to use reinforcement learning (RL), where a fuzzing agent learns to take actions (e.g., selecting a seed or deciding how to mutate input bytes) that maximize a reward—typically the discovery of new coverage.

Key Aspects:

MDP Formulation:
The fuzzing loop is modeled as a Markov Decision Process (MDP):
- State: Represents the current input or program coverage metrics.
- Actions: Include mutation choices or seed selections.
- Reward: Generally the number of new edges discovered.
Neural Network Policies:
A neural network (such as a deep Q-network or policy network) is trained to output actions that maximize the reward.
- RLFuzz: Utilizes Deep Deterministic Policy Gradient (DDPG) to guide input mutations, outperforming baseline AFL-style random mutations on benchmarks like LAVA-M.
- Hierarchical Seed Scheduling: An RL-based scheduler (NDSS 2021) improved both coverage and bug discovery—finding up to 20% more bugs on the DARPA CGC benchmark and achieving higher coverage on many targets.
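The MDP formulation can be made concrete with tabular Q-learning, a deliberately simpler stand-in for the deep Q-networks or DDPG policies mentioned above. In this sketch the state is any hashable label (for example, the ID of the seed being mutated), the actions are hypothetical mutation operator names, and the reward is the number of edges never seen before; these choices are assumptions for illustration, not RLFuzz’s design.

```python
import random

# Hypothetical mutation operators forming the action space.
ACTIONS = ["bitflip", "arith", "dict_insert"]

def new_edge_reward(coverage: set, global_coverage: set) -> int:
    """Reward = number of edges never seen before; records them globally."""
    new = coverage - global_coverage
    global_coverage |= new
    return len(new)

class QAgent:
    """Tabular Q-learning with epsilon-greedy exploration."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = {}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def value(self, state, action) -> float:
        return self.q.get((state, action), 0.0)

    def act(self, state) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)  # explore
        return max(ACTIONS, key=lambda a: self.value(state, a))  # exploit

    def update(self, state, action, reward, next_state) -> None:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(self.value(next_state, a) for a in ACTIONS)
        old = self.value(state, action)
        self.q[(state, action)] = old + self.alpha * (reward + self.gamma * best_next - old)
```

One fuzzing step would call `act(state)`, apply the chosen operator, run the mutated input, compute `new_edge_reward(...)` from its coverage, and then call `update`. A neural policy replaces the `self.q` table when the state space is too large to enumerate.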

RL-driven fuzzing thus learns to balance exploration (searching for new paths) with exploitation (refining promising inputs). However, challenges include handling a vast state space and dealing with sparse reward signals during training.
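The exploration/exploitation trade-off is easiest to see in a multi-armed bandit over seeds. The sketch below uses plain UCB1, which picks the seed with the best mean reward plus an uncertainty bonus; it is not the hierarchical scheduler from the NDSS 2021 paper, and the reward convention (1 if a run found new coverage, else 0) is an assumption.

```python
import math

class UCB1Scheduler:
    """Pick the seed maximizing mean reward + sqrt(2 ln N / n_i) (UCB1)."""

    def __init__(self, seeds):
        self.seeds = list(seeds)
        self.pulls = [0] * len(self.seeds)
        self.total_reward = [0.0] * len(self.seeds)

    def select(self) -> int:
        # Try every seed once before trusting the statistics.
        for i, n in enumerate(self.pulls):
            if n == 0:
                return i
        total = sum(self.pulls)

        def score(i: int) -> float:
            mean = self.total_reward[i] / self.pulls[i]           # exploitation
            bonus = math.sqrt(2 * math.log(total) / self.pulls[i])  # exploration
            return mean + bonus

        return max(range(len(self.seeds)), key=score)

    def update(self, i: int, reward: float) -> None:
        """reward in [0, 1], e.g. 1.0 if the run found new coverage."""
        self.pulls[i] += 1
        self.total_reward[i] += reward
```

A seed that keeps yielding nothing sees its mean drop while rarely chosen seeds keep a large bonus, so the scheduler eventually revisits them instead of exhausting one input.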

3. Generative Models and Other AI-Driven Techniques

Generative models are also used to produce higher-quality fuzzing inputs:

Generative Adversarial Networks (GANs) and Autoencoders:
These models can be trained on valid inputs to generate new seed files. For instance, SmartSeed (2018) used a WGAN to produce seeds for fuzzing multimedia file formats, aiming for generated inputs with a higher chance of triggering new code paths.
Recurrent Neural Networks (RNNs) / LSTMs:
These models can capture the sequential nature of structured inputs (e.g., PDF, XML) and generate syntactically valid test cases. Early work by Godefroid et al. (Learn&Fuzz, 2017) showed that even a character-level LSTM trained on valid inputs can improve fuzzing efficiency by creating inputs that adhere to the expected format.
Hybrid Approaches:
Combining AI-driven input generation with traditional fuzzing methods can further enhance code coverage. For example, large language models (LLMs) are being explored to infer input formats or suggest mutation strategies based on code context, offering high-level guidance to the fuzzing process.
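An LSTM does not fit in a short dependency-free sketch, so the example below swaps in a character-level n-gram (Markov) model to illustrate the same learn-then-sample workflow: count which character follows each short context in valid inputs, then sample new inputs one character at a time. It captures only short-range structure, and the start/end marker characters and function names are assumptions for illustration.

```python
import random
from collections import defaultdict

START, END = "\x02", "\x03"  # assumed markers absent from the samples

def train_char_model(samples, order=3):
    """Record the next character seen after every length-`order` context."""
    model = defaultdict(list)
    for s in samples:
        padded = START * order + s + END
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, order=3, max_len=200, rng=None):
    """Sample characters in proportion to how often they followed the context."""
    rng = rng or random.Random()
    ctx, out = START * order, []
    while len(out) < max_len:
        choices = model.get(ctx)  # .get does not insert missing keys
        if not choices:
            break
        ch = rng.choice(choices)
        if ch == END:
            break
        out.append(ch)
        ctx = ctx[1:] + ch
    return "".join(out)

samples = ["<a>1</a>", "<b>2</b>"]  # toy "valid inputs"
model = train_char_model(samples, order=3)
sample = generate(model, order=3, rng=random.Random(0))
```

With this tiny corpus and `order=3` every context has a single continuation, so the model merely replays a training sample; larger corpora or lower orders produce novel recombinations that still respect local syntax.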

4. Integrating Neural Techniques into a Fuzzing Workflow

Integrating neural networks with fuzzers like AFL involves several key considerations:

Model Training Overhead:
Neural models require training on input and coverage data. A common strategy is to alternate between fuzzing (to collect data) and model training. Running the training process in parallel or in a separate thread can help mitigate any slowdown in the fuzzing loop.
Application Frequency:
Neural guidance might not be needed at every fuzzing step. For instance, the model can be used periodically (e.g., every few minutes) to generate new candidate inputs based on gradient information, which are then added to AFL’s queue.
Model Selection:
- For numerical or bit-level challenges: A simple feed-forward network (MLP) is effective.
- For sequential inputs: Recurrent networks or transformers are better suited.
- For adaptive scheduling: Deep Q-networks or policy networks can be employed to guide seed selection and mutation.
- For complex binary formats: GANs or autoencoders may be the best choice to capture high-dimensional structure.
Feedback Loop:
Establishing a robust feedback loop is crucial. The fuzzer should supply runtime data (coverage and crashes) to the neural model, which then suggests new mutations or seed priorities. This loop ensures that the system continuously improves over time.
Robustness and Benchmarking:
Continuous benchmarking (e.g., using FuzzBench) is necessary to verify that the neural enhancements lead to better coverage and bug discovery without introducing significant overhead.
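The decoupling of training from the fuzzing loop described above can be sketched with a background thread that consumes `(input, coverage)` samples from a queue and bumps a model version every `retrain_every` samples. `BackgroundTrainer` and its methods are hypothetical, and `_retrain` is a placeholder. One caveat: in CPython, CPU-bound training in a thread does not run truly in parallel because of the GIL, so a real system would train in a separate process or with a framework whose kernels release the GIL.

```python
import queue
import threading

class BackgroundTrainer:
    """Collects (input, coverage) samples off the fuzzing thread."""

    def __init__(self, retrain_every: int = 100):
        self.samples = queue.Queue()
        self.retrain_every = retrain_every
        self.dataset = []
        self.model_version = 0
        self.lock = threading.Lock()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self.thread.start()

    def submit(self, data: bytes, coverage: set) -> None:
        """Called by the fuzz loop after each execution; never blocks on training."""
        self.samples.put((data, coverage))

    def current_version(self) -> int:
        """The fuzz loop polls this to decide when to request new candidates."""
        with self.lock:
            return self.model_version

    def stop(self) -> None:
        self.samples.put(None)  # sentinel: drain remaining samples, then exit
        self.thread.join()

    def _run(self) -> None:
        while True:
            item = self.samples.get()
            if item is None:
                break
            self.dataset.append(item)
            if len(self.dataset) % self.retrain_every == 0:
                self._retrain()

    def _retrain(self) -> None:
        # Placeholder: refit the surrogate on self.dataset, then publish it.
        with self.lock:
            self.model_version += 1

trainer = BackgroundTrainer(retrain_every=10)
trainer.start()
for i in range(50):
    trainer.submit(bytes([i]), {i % 7})
trainer.stop()
```

Polling `current_version()` every N executions gives the "apply periodically" behavior: the fuzzer only asks for new gradient-guided candidates when a fresher model exists.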

5. Potential Improvements and Future Directions

Surrogate Model Enhancements:
Future work might integrate program analysis (e.g., graph neural networks over control-flow graphs and data dependencies) so that the surrogate predicts edge coverage more accurately.
Multi-Objective Learning:
Models could be trained to balance various objectives—such as maximizing edge coverage, increasing path depth, and ensuring path uniqueness—rather than focusing solely on edge coverage.
Transfer Learning in RL:
Pre-training fuzzing policies on simpler programs and then transferring these policies to more complex targets could give RL agents a head start.
Hybrid Approaches:
Combining neural guidance with symbolic or concolic execution could blend the speed of fuzzing with the precision of constraint solving, offering significant coverage improvements.
AI for Mutation Scheduling:
A neural classifier could dynamically adjust the probability distribution over various mutation operators (e.g., bit flips, arithmetic mutations, dictionary insertions) based on past success, thereby optimizing the mutation process.
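The simplest form of this mutation-scheduling idea is a softmax layer with no input features whose logits define the probability of each operator, nudged toward operators that recently found new coverage with a REINFORCE-style update and a running-average baseline. This is a sketch under those assumptions; the operator names, learning rate, and baseline decay are illustrative, and a full classifier would add input or coverage features.

```python
import math
import random

class OperatorScheduler:
    """Softmax policy over mutation operators, trained with REINFORCE."""

    def __init__(self, operators, lr=0.1, seed=0):
        self.ops = list(operators)
        self.logits = [0.0] * len(self.ops)
        self.lr = lr
        self.baseline = 0.0  # running average reward, reduces variance
        self.rng = random.Random(seed)

    def probs(self):
        m = max(self.logits)  # subtract max for numerical stability
        exps = [math.exp(l - m) for l in self.logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(self) -> int:
        return self.rng.choices(range(len(self.ops)), weights=self.probs())[0]

    def update(self, index: int, reward: float) -> None:
        # grad of log softmax(a) w.r.t. logit j is onehot(a)_j - p_j
        advantage = reward - self.baseline
        p = self.probs()
        for j in range(len(self.logits)):
            grad = (1.0 if j == index else 0.0) - p[j]
            self.logits[j] += self.lr * advantage * grad
        self.baseline += 0.05 * (reward - self.baseline)

sched = OperatorScheduler(["bitflip", "arith", "dict_insert"])
for _ in range(200):
    sched.update(2, 1.0)  # dict_insert keeps finding new edges
    sched.update(0, 0.0)  # bitflip keeps failing
```

After these updates `dict_insert` carries the highest probability, while every operator keeps a nonzero chance of being sampled.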

Conclusion

Neural networks have demonstrated clear potential in improving fuzzing code coverage. Surrogate models like NEUZZ use gradient-guided mutations to reach code paths that random mutation had not yet exercised, while reinforcement learning agents learn to optimize mutation and seed selection strategies. Generative models contribute by producing structured inputs that overcome the limitations of random mutations. The best choice of neural model depends on the target and on the desired trade-off between speed and thoroughness. Integrating these techniques into AFL (or AFL++) can lead to significant improvements in both coverage and bug discovery, making the fuzzing process smarter and more efficient.

Sources

NEUZZ: Efficient Fuzzing with Neural Program Smoothing (2019):
Neural surrogate model for fuzzing that achieved significant coverage gains
MTFuzz (2020):
Extended neural guidance with a multi-task neural network that learns several coverage-related tasks of the same program jointly
PreFuzz (2022):
Improved surrogate model training for efficient branch targeting
RLFuzz (2021):
Reinforcement learning-based approach using DDPG for mutation guidance
Hierarchical Seed Scheduling (NDSS 2021):
RL-based scheduler improving both coverage and bug discovery (up to 20% more bugs)
Generative Approaches:
GANs and RNN-based methods for seed generation and language modeling in fuzzing
Neuzz++ (AFL++ Plugin):
Integration of neural network guidance into AFL++ for enhanced performance

R00tkitSMM/deepfuzzing.md