Skip to content

Commit

Permalink
Beef up readme
Browse files Browse the repository at this point in the history
  • Loading branch information
CalebFenton committed Aug 6, 2016
1 parent 3e85eb3 commit 722e7c6
Show file tree
Hide file tree
Showing 2 changed files with 85 additions and 24 deletions.
107 changes: 84 additions & 23 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,22 +4,25 @@

## Generic Android Deobfuscator

Simplify emulates an app to understand what it does. Then, it applies various optimizations to create code that behaves identically but is easier for a human to understand. It's a _generic_ deobfuscator because it doesn't matter how the obfuscation works.
Simplify virtually executes an app to understand its behavior and then tries to optimize the code so that it behaves identically but is easier for a human to understand. Each optimization type is simple and generic, so it doesn't matter what the specific type of obfuscation is used.

### Before and After

The code on the left is a decompilation of an obfuscated app, and the code on the right has been deobfuscated.
<section>
<p align="center">
<img src="https://cloud.githubusercontent.com/assets/1356658/5331911/1e790c86-7df4-11e4-91e7-aba1d2c63b98.png" alt="Lots of method calls, no clear meaning" height="380px" align="center" />
<img src="https://cloud.githubusercontent.com/assets/1356658/5331912/1ecc6d7c-7df4-11e4-9572-bc3d41303842.png" alt="Wow, such literal, much meaning" height="380px" align="center" />
</p>
</section>

There are three parts to the project:
## Overview

1. **smalivm**: Virtual machine library which can execute Android apps. It can execute a method and create a graph which contains every possible execution path and register or class value for every instruction. It'll execute even when it doesn't know the value of something such as a network response from a server. If it encounters an `if` and doesn't know the values of the conditional, it assumes either branch could happen and executes both paths.
2. **simplify**: Takes the graphs from **smalivm** and applies optimizations such as constant propagation, dead code removal, unreflection, and specific peephole optimizations. The optimizations are fairly simple, but when applied together over and over again, it can decrypt strings and greatly simplify code.
3. **demoapp**: Contains simple, heavily commented examples of how to use **smalivm**. It's a good place to start to understand how to use smalivm.
There are three parts to the project: smalivm, simplify, and the demo app.

1. **smalivm**: Virtual machine library which can execute Android apps. It executes a method and returns a graph which contains the register and class values at every instruction for every possible execution path. It works even if certain values are unknown such as a network response from a server. If it encounters an `if` and doesn't know the values of the conditional, it assumes either branch could happen and executes both paths.
2. **simplify**: Analyzes the graphs from **smalivm** and applies optimizations such as constant propagation, dead code removal, unreflection, and specific peephole optimizations. The optimizations are fairly simple, but when applied together and in succession, it can decrypt strings, peel back layers of obfuscation, and greatly simplify code.
3. **demoapp**: Contains simple, heavily commented examples of how to use **smalivm**. It's a good place to start if you want to use smalivm in your own projects.

## Usage

Expand Down Expand Up @@ -47,27 +50,25 @@ deobfuscates a dalvik executable

Because this project contains submodules for Android frameworks, either clone with `--recursive`:

```
```bash
git clone --recursive https://github.com/CalebFenton/simplify.git
```

Or update submodules at any time with:

```
```bash
git submodule update --init --recursive
```

Then, to build a single jar:

```
```bash
./gradlew fatjar
````
The Simplify jar will be in `simplify/build/libs/simplify.jar`
```

You can test it's working by simplifying the obfuscated-example app:
The Simplify jar will be in `simplify/build/libs/simplify.jar`. You can test it's working by simplifying the obfuscated-example app:

```
```bash
java -jar simplify/build/libs/simplify.jar -it 'org/cf' simplify/obfuscated-example
```

Expand All @@ -78,19 +79,79 @@ Simplify is in early stages of development. If you encounter a failure, try thes
1. Include only a few methods or classes with `-it`.
2. If failure is because of maximum visits exceeded, try using higher `--max-address-visits`, `--max-call-depth`, and `--max-method-visits`.
3. Try with `-v` or `-v 2` and report the issue with the logs.
4. Try again, but do not break eye contact. Simpify can sense fear.
4. Try again, but do not break eye contact. Simplify can sense fear.

## Reporting Issues
## Contributing

1. If possible, include a link the APK or DEX. If you can't share the sample, please include either the SHA1 or MD5 checksum.
2. Include the full command used.
3. *Optional*: Include verbose logs.
Don't be shy. I think virtual execution and deobfuscation are fascinating problems. Anyone who's interested is automatically cool and contributions are welcome, even if it's just to fix a typo. Feel free to ask questions in the issues and submit pull requests.

## Contributing
### Reporting Issues

Please include a link to the APK or DEX and the full command you're using. This makes it much easier to reproduce (and thus _fix_) your issue.

If you can't share the sample, please include the file hash (sha1, sha256, etc).

## Optimization Strategies

### Constant Propagation

If an op places a value of a type which can be turned into a constant such as a string, number, or boolean, this optimization will replace that op with the constant. For example:

```smali
const-string v0, "VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4="
invoke-static {v0}, Lmy/string/Decryptor;->decrypt(Ljava/lang/String;)Ljava/lang/String;
# Decrypts to: "Tell me of your homeworld, Usul."
move-result v0
```

In this example, an encrypted string is decrypted and placed into `v0`. Since strings are "constantizable", the `move-result v0` can be replaced with a `const-string`:

```smali
const-string v0, "VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4="
invoke-static {v0}, Lmy/string/Decryptor;->decrypt(Ljava/lang/String;)Ljava/lang/String;
const-string v0, "Tell me of your homeworld, Usul."
```

### Dead Code Removal

Code is dead if removing it cannot possibly alter the behavior of the app. The most obvious case is if the code is unreachable, e.g. `if (false) { // dead }`). If code is reachable, it may be considered dead if it doesn't affect any state outside of the method, i.e. it has no _side effect_. For example, code may not affect the return value for the method, alter any class variables, or perform any IO. This is a difficult to determine in static analysis. Luckily, smalivm doesn't have to be clever. It just stupidly executes everything it can and assumes there are side effects if it can't be sure. Consider the example from Constant Propagation:

```smali
const-string v0, "VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4="
invoke-static {v0}, Lmy/string/Decryptor;->decrypt(Ljava/lang/String;)Ljava/lang/String;
const-string v0, "Tell me of your homeworld, Usul."
```

In this code, the `invoke-static` no longer affects the return value of the method and let's assume it doesn't do anything weird like write bytes to the file system or a network socket so it has no side effects. It can simply be removed.

```smali
const-string v0, "VGVsbCBtZSBvZiB5b3VyIGhvbWV3b3JsZCwgVXN1bC4="
const-string v0, "Tell me of your homeworld, Usul."
```

Finally, the first `const-string` assigns a value to a register, but that value is never used, i.e. the assignment is dead. It can also be removed.

```smali
const-string v0, "Tell me of your homeworld, Usul."
```

Huzzah!

### Unreflection

One major challenge with static analysis of Java is reflection. It's just not possible to know the arguments are for reflection methods without doing careful data flow analysis. There are smart, clever ways of doing this, but smalivm does it by just executing the code. When it finds a reflected method invocation such as:

```smali
invoke-virtual {v0, v1, v2}, Ljava/lang/reflect/Method;->invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
```

It can know the values of `v0`, `v1`, and `v2`. If it's sure what the values are, it can replace the call to `Method.invoke()` with an actual non-reflected method invocation. The same applies for reflected field and class lookups.

### Peephole

Just submit a pull request. We can review and talk through it there. Feel free to ask any questions in issues.
For everything that doesn't fit cleanly into a particular category, there's peephole optimizations. This includes removing useless `check-cast` ops, replacing `Ljava/lang/String;-><init>` calls with `const-string`, and so on.

## Optimization Example
## Deobfuscation Example

### Before Optimization

Expand Down Expand Up @@ -148,7 +209,7 @@ The `move-result v0` is replaced with `const/4 v0, 0x1`. This is because there i

Because the code above `const/4 v0, 0x1` does not affect state outside of the method (no side-effects), it can be removed without changing behavior. If there was a method call that wrote something to the file system or network, it couldn't be removed because it affects state outside the method. Or if `test()I` took a mutable argument, such as a `LinkedList`, any instructions that accessed it couldn't be considered dead.

Some other examples of dead code:
Other examples of dead code:

* unreferenced assignments - assigning registers and not using them
* unreached / unreachable instructions - `if (false) { dead_code(); }`
Expand All @@ -163,7 +224,7 @@ This tool is available under a dual license: a commercial one suitable for close

Depending on your needs, you must choose one of them and follow its policies. A detail of the policies and agreements for each license type are available in the [LICENSE.COMMERCIAL](LICENSE.COMMERCIAL) and [LICENSE.GPL](LICENSE.GPL) files.

# Related Works
# Related / Inspirational

* [Guillot, Yoann, and Alexandre Gazet. "Automatic Binary Deobfuscation." Journal in Computer Virology 6.3 (2010): 261-76](http://metasm.cr0.org/docs/sstic09-metasm-jcv.pdf)
* [Unicorn - The ultimate CPU emulator](http://www.unicorn-engine.org/)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -113,7 +113,7 @@ public void detectsDeadTryCatchBlock() {
}

@Test
public void detectshUnusedResultOfMethodInvocationWithNoSideEffects() {
public void detectsUnusedResultOfMethodInvocationWithNoSideEffects() {
String methodName = "unusedResultNoSideEffects()I";
ExecutionGraphManipulator manipulator = OptimizerTester.getGraphManipulator(CLASS_NAME, methodName);
DeadRemovalStrategy strategy = new DeadRemovalStrategy(manipulator);
Expand Down

0 comments on commit 722e7c6

Please sign in to comment.