MorphRL is a reinforcement learning framework for generating adversarial PE (Portable Executable) files that evade malware detection. It trains a PPO agent to intelligently compose LLVM compiler optimization passes and binary encoding transformations to morph malware loaders while preserving functionality.
MorphRL frames evasion as a sequential decision-making problem:
- A malware sample (payload) is provided as input.
- The RL agent selects a sequence of LLVM IR passes and binary encoding operations to apply to a loader stub.
- The transformed loader is compiled and the payload is attached.
- The resulting PE file is submitted to an AV scanner for evaluation.
- The agent receives a reward signal based on evasion success.
LLVM Passes:
| ID | Pass | Description |
|---|---|---|
| 0 | sroa | Scalar Replacement of Aggregates |
| 1 | gvn | Global Value Numbering |
| 2 | simplifycfg | Simplify Control Flow Graph |
| 3 | loop-rotate | Loop Rotation |
| 4 | licm | Loop Invariant Code Motion |
| 5 | globalopt | Global Optimization |
| 6 | delay-loop | Custom Delay Loop Injection |
Encoding Operations:
| ID | Operation | Description |
|---|---|---|
| 0 | xor_full | 100% XOR encoding |
| 1 | xor_75 | 75% XOR encoding |
| 2 | xor_50 | 50% XOR encoding |
| 3 | xor_25 | 25% XOR encoding |
| 4 | pad_front | Front padding |
| 5 | pad_back | Back padding |
| 6 | shuffle | Byte shuffle |
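
The two ID tables above can be read as lookup tables from an integer action to a named transformation. The sketch below only illustrates that lookup. It assumes a single flat action index in which LLVM passes come first and encoding operations second; that layout and the name `decode_action` are hypothetical and are not taken from the MorphRL source, which may use a different action space.

```python
# Names copied from the two tables above, in ID order.
LLVM_PASSES = ["sroa", "gvn", "simplifycfg", "loop-rotate",
               "licm", "globalopt", "delay-loop"]
ENCODING_OPS = ["xor_full", "xor_75", "xor_50", "xor_25",
                "pad_front", "pad_back", "shuffle"]


def decode_action(action: int) -> tuple[str, str]:
    """Map a flat action index to (kind, name).

    Assumption: indices 0..6 select an LLVM pass and 7..13 select an
    encoding operation. The real environment may split these differently.
    """
    total = len(LLVM_PASSES) + len(ENCODING_OPS)
    if not 0 <= action < total:
        raise ValueError(f"action must be in [0, {total}), got {action}")
    if action < len(LLVM_PASSES):
        return ("llvm_pass", LLVM_PASSES[action])
    return ("encoding", ENCODING_OPS[action - len(LLVM_PASSES)])


print(decode_action(2))   # ('llvm_pass', 'simplifycfg')
print(decode_action(10))  # ('encoding', 'xor_25')
```
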
- Linux (tested on Debian/Ubuntu)
- Clang/LLVM 20
- MinGW-w64 (for cross-compiling PE files targeting Windows)
Using conda (recommended):

    conda env create -f environment.yml
    conda activate rlevade

Or manually:

    pip install stable_baselines3==2.7.1 numpy requests rich tqdm

The toolchain/ directory must contain mingw32/ and mingw64/ subdirectories with MinGW-w64 cross-compilation toolchains. You can install them via:

    apt install gcc-mingw-w64 g++-mingw-w64

Then update INCLUDE_DIR_32/64 and LIB_DIR_32/64 paths in src/config.py accordingly.
MorphRL requires REST-based AV scanning endpoints. Configure the addresses in src/config.py:

    ENGINE_ADDRESS = {
        'ClamAV': 'http://<YOUR_HOST>:5000/upload_sync',
        ...
    }
    MODEL_ADDRESS = {
        'malconv': 'http://<YOUR_HOST>:7004/upload_sync',
        ...
    }

Each endpoint must accept multipart/form-data file uploads and return:

    { "result": 0 }  // 0 = clean, 1 = detected

Project layout:

    MorphRL/
    ├── train.py              # Training script
    ├── test.py               # Evaluation script
    ├── environment.yml       # Conda environment
    ├── src/
    │   ├── config.py         # All hyperparameters and addresses
    │   ├── scanner.py        # AV scanner client
    │   ├── compiler.py       # LLVM IR compilation pipeline
    │   ├── encoding.py       # Binary encoding operations
    │   ├── pe_utils.py       # PE file utilities
    │   ├── callbacks.py      # SB3 training callbacks
    │   └── envs/
    │       ├── base.py       # Base RL environment
    │       ├── train_env.py  # Training environment
    │       └── test_env.py   # Testing environment
    ├── stub/
    │   └── src/              # Loader stub C source code
    └── llvm/
        └── passes/           # Custom LLVM pass plugins (DelayLoop.so)
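
The layout lists src/scanner.py as the AV scanner client, and the endpoint contract is a JSON body whose `result` field is 0 (clean) or 1 (detected). As a minimal sketch of validating such a response, the helper below parses the body and rejects anything outside that contract. The function name `parse_scan_result` is hypothetical and is not the project's actual client code; the upload step itself is not shown.

```python
import json


def parse_scan_result(body: str) -> bool:
    """Return True if the scanner reported a detection, False if clean.

    Assumes the response body is plain JSON of the form {"result": 0 or 1},
    without the explanatory comment used in the README.
    """
    data = json.loads(body)
    result = data.get("result") if isinstance(data, dict) else None
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(result, bool) or result not in (0, 1):
        raise ValueError(f"unexpected scanner response: {body!r}")
    return result == 1


print(parse_scan_result('{"result": 0}'))  # False
print(parse_scan_result('{"result": 1}'))  # True
```

Failing loudly on a malformed response keeps a broken or unreachable scanner from being mistaken for a clean verdict.
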
Training:

    # Train against an ML model detector
    python train.py --av-type model --av-name malconv --num-envs 16

    # Train against a traditional AV engine
    python train.py --av-type engine --av-name ClamAV --num-envs 8

    # Custom dataset and timesteps
    python train.py \
        --av-type model \
        --av-name malconv \
        --sample-dir data/train_samples \
        --num-envs 16 \
        --total-timesteps 200000

The trained policy is saved to policies/<av_name>_ppo.zip.
Note: n_steps per environment is computed automatically as buffer_size // num_envs (default buffer_size=2048), so the total rollout collected per update stays at about buffer_size transitions and training dynamics remain consistent regardless of the number of parallel environments.
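
The arithmetic in the note can be checked with a short sketch. The function name `compute_n_steps` is hypothetical; only the formula `buffer_size // num_envs` and the default of 2048 come from the note above.

```python
def compute_n_steps(buffer_size: int = 2048, num_envs: int = 16) -> int:
    """Per-environment rollout length, following buffer_size // num_envs."""
    if num_envs <= 0:
        raise ValueError("num_envs must be positive")
    n_steps = buffer_size // num_envs
    if n_steps == 0:
        raise ValueError("num_envs exceeds buffer_size; n_steps would be 0")
    return n_steps


for num_envs in (8, 16, 12):
    n_steps = compute_n_steps(2048, num_envs)
    # Total transitions gathered per update across all environments.
    print(num_envs, n_steps, n_steps * num_envs)
# 8 envs  -> n_steps 256, total 2048
# 16 envs -> n_steps 128, total 2048
# 12 envs -> n_steps 170, total 2040 (integer division drops the remainder)
```

When num_envs does not divide buffer_size evenly, the total falls slightly short of buffer_size, so values of num_envs that divide 2048 keep the rollout size exactly constant.
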
Evaluation:

    # Evaluate a trained policy
    python test.py --av-type model --av-name malconv --max-samples 500

    # Specify a custom model path and test set
    python test.py \
        --av-type model \
        --av-name malconv \
        --model-path policies/malconv_ppo \
        --sample-dir data/test_samples \
        --max-samples 1000

Adversarial samples are saved to outputs/test/<av_name>/adversarial/.
Dataset layout:

    data/
    ├── train_samples/   # PE malware samples for training
    └── test_samples/    # PE malware samples for evaluation

This project is released for academic research purposes only. Do not use it for malicious purposes.