## Why is my execution so slow?
Typically, it's due to:
- Instrumenting every instruction executed.
- Instrumenting every memory access.

Optimize your program with less instrumentation, e.g. by using `UC_HOOK_BLOCK` instead of `UC_HOOK_CODE`.
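
As a rough sketch of the lighter-weight approach, the callback below tallies basic blocks rather than single instructions. It follows the Python binding's code/block callback shape `(uc, address, size, user_data)`; the names `on_block` and `stats` are illustrative and not part of Unicorn.

```python
from collections import Counter


def on_block(uc, address, size, stats):
    """Block-level callback: called once per basic block executed."""
    stats["blocks"] += 1    # one call per block, not per instruction
    stats["bytes"] += size  # size of the block in bytes


# `stats` is what gets handed to the callback as user_data.
stats = Counter()
```

It would be registered with `uc.hook_add(UC_HOOK_BLOCK, on_block, stats)` before calling `uc.emu_start`.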
## Why do I get a wrong PC after emulation stops?
Updating the PC is a very large overhead for emulation (10x slower in the worst case, see the FAQ above), so the PC is only synced at certain points. The guarantees for each case are:

- A `UC_HOOK_CODE` hook is installed. In this case, the PC is sync-ed _everywhere_ within the effective range of the hook. However, on some architectures, the PC might be sync-ed all the time if the hook is installed in any range. Note that using `count` in `uc_emu_start` implies installing a `UC_HOOK_CODE` hook.
- A `UC_HOOK_MEM_READ` or `UC_HOOK_MEM_WRITE` hook is installed. In this case, the PC is sync-ed exactly before any read/write events within the effective range of the hook.
- Emulation (`uc_emu_start`) terminates without any exception. In this case, the PC will point to the next instruction.
- No hook mentioned above is installed and emulation terminates with exceptions. In this case, the PC is sync-ed at the basic block boundary, in other words, the first instruction of the basic block where the exception happens.

Below is an example:
```
mov x0, #1 <--- the PC will be here
mov x1, #2
ldr x0, [x1] <--- exception here
```
If `ldr x0, [x1]` fails with memory exceptions, the PC will be left at the beginning of the basic block, in this case `mov x0, #1`.
However, if a `UC_HOOK_MEM_READ` hook is installed, the PC will be sync-ed:
```
mov x0, #1
mov x1, #2
ldr x0, [x1] <--- exception here and PC sync-ed here
```
## I get an “Unhandled CPU Exception”, why?
Unicorn is a pure CPU emulator, and this is usually because no handler is registered for instructions like `syscall` and `SVC`. If you expect system emulation, you would probably like the [Qiling framework](https://github.com/qilingframework/qiling).
## I would like to instrument a specific instruction but get a `UC_ERR_HOOK`, why?
Currently, only a small subset of the instructions can be instrumented.
On x86, all available instructions are: `in` `out` `syscall` `sysenter` `cpuid`.
## Emulating some instructions gives an error like "Invalid Instruction", what should I do?
1. Some instructions are not enabled by default on some architectures. For example, you have to set up CSR on RISC-V or VFP on ARM before emulating floating-point instructions. Refer to the corresponding manual to check whether you have missed any enable switches in special registers.
2. Different CPU models support different sets of instructions. This is especially observed on ARM CPUs. For example, for `THUMB2` big-endian instructions, consider setting CPU model to `cortex-r5` or `arm_max`. See [#1725](https://github.com/unicorn-engine/unicorn/issues/1725) and [#1724](https://github.com/unicorn-engine/unicorn/issues/1724).
3. If you are on ARM, please check whether you are emulating a THUMB instruction. If so, please use `UC_MODE_THUMB` and make sure the starting address is odd.
4. If none of the cases above apply, the instruction might belong to a newer instruction set that QEMU 5 doesn't support.
5. Note some instruction sets are not implemented by the latest QEMU.

If you are still using Unicorn1, please upgrade to Unicorn2 for better support.
## Memory hooks get called multiple times for a single instruction
There are several possibilities, e.g.:
- The instruction might access memory multiple times like `rep stos` in x86.
- The accessed address is misaligned, so the MMU emulation splits the access into several aligned memory accesses. In the worst case on some architectures, this leads to byte-by-byte access.
## I can't recover from unmapped read/write even if I return `true` in the hook, why?
This is a minor change in memory hooks behavior between Unicorn1 and Unicorn2. To gracefully recover from memory read/write error, you have to map the invalid memory before you return true.
It is due to the fact that, if users return `true` without memory mapping set up correctly, we don't know what to do next. In Unicorn1, the behavior is __undefined__ in this case but in Unicorn2 we would like to force users to set up memory mapping in the hook to continue execution.
See the [sample](https://github.com/unicorn-engine/unicorn/blob/c05fbb7e63aed0b60fc2888e08beceb17bce8ac4/samples/sample_x86.c#L1379-L1393) for details.
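
A minimal Python sketch of such a hook is shown below. It follows the memory-hook callback shape of the Python binding, `(uc, access, address, size, value, user_data)`. The name `map_on_fault` is illustrative, and the sketch assumes that none of the pages covering the access are already mapped.

```python
PAGE_SIZE = 0x1000  # uc_mem_map needs 4 KiB-aligned addresses and sizes


def map_on_fault(uc, access, address, size, value, user_data):
    """Map the page(s) covering a faulting access, then ask Unicorn to retry."""
    start = address & ~(PAGE_SIZE - 1)
    # Round the end of the access up to the next page boundary.
    end = (address + max(size, 1) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)
    uc.mem_map(start, end - start)  # the mapping must exist before returning True
    return True
```

It could be installed with `uc.hook_add(UC_HOOK_MEM_READ_UNMAPPED | UC_HOOK_MEM_WRITE_UNMAPPED, map_on_fault)`. Returning `False` instead leaves the access unhandled and stops emulation with an error.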
## My emulation gets weird read/write errors and CPU exceptions.
For MIPS, you might have an address that falls in the MIPS `kseg` segments. In that case, the MMU is bypassed and you have to make sure the corresponding physical memory is mapped. See [#217](https://github.com/unicorn-engine/unicorn/issues/217), [#1371](https://github.com/unicorn-engine/unicorn/issues/1371), [#1550](https://github.com/unicorn-engine/unicorn/issues/1550).

For ARM, you might have an address that falls in a non-executable segment. For example, on M-class ARM CPUs, some memory areas are not executable according to [the ARM document](https://developer.arm.com/documentation/ddi0403/d/System-Level-Architecture/System-Address-Map/The-system-address-map?lang=en).
## KeyboardInterrupt is not raised during `uc.emu_start`
This is intended, as the Python [signal module](https://docs.python.org/3.10/library/signal.html) documentation states:
> A long-running calculation implemented purely in C (such as regular expression matching on a large body of text) may run uninterrupted for an arbitrary amount of time, regardless of any signals received. The Python signal handlers will be called when the calculation finishes.

A workaround is to start emulation in another thread.
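
One way to sketch this workaround with only the standard library is shown below. The helper name `run_in_thread` and its `poll_interval` parameter are illustrative and not part of Unicorn. The main thread joins the worker in short slices, so it keeps returning to the interpreter, where `KeyboardInterrupt` can be delivered.

```python
import threading


def run_in_thread(target, *args, poll_interval=0.1, **kwargs):
    """Run target(*args, **kwargs) in a worker thread and return its result."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = target(*args, **kwargs)
        except Exception as exc:  # hand errors back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    # Join in short slices so a pending KeyboardInterrupt can be raised
    # in the main thread while the worker is still busy.
    while thread.is_alive():
        thread.join(poll_interval)
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")
```

A call would look like `run_in_thread(uc.emu_start, begin, until)`. When `KeyboardInterrupt` is caught around that call, the main thread can call `uc.emu_stop()` to end the emulation.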
## Editing an instruction doesn't take effect/Hooks added during emulation are not called.
Unicorn is a fork of QEMU and inherits most QEMU internal mechanisms, one of which is called TB chaining. In short, every block (in most cases, a `basic block`) is translated, executed and __cached__. Therefore, any operation on cached addresses won't immediately take effect without a call to `uc_ctl_remove_cache`. Check a more detailed discussion here: [#1561](https://github.com/unicorn-engine/unicorn/issues/1561)
Note, this doesn't mean you have to care about Self Modifying Code because the read/write happens within emulation (TB execution) and QEMU would handle such special cases. For technical details, refer to the [QEMU paper](https://www.usenix.org/legacy/event/usenix05/tech/freenix/full_papers/bellard/bellard.pdf).
TLDR: To ensure any modification to an address will take effect:
1. Call `uc_ctl_remove_cache` on the target address.
2. Call `uc_reg_write` to write current PC to the PC register, if the modification happens during emulation. It restarts emulation (but doesn't quit `uc_emu_start`) on current address to re-translate the block.
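
The two steps above can be sketched in Python as follows, assuming the Unicorn2 Python binding, where `uc_ctl_remove_cache` is exposed as `Uc.ctl_remove_cache(begin, end)`. The helper `patch_code` and its `pc_reg` parameter are hypothetical; `pc_reg` would be the architecture's PC register constant, e.g. `UC_X86_REG_RIP`.

```python
def patch_code(uc, address, data, pc_reg=None):
    """Overwrite code at `address` and make sure the new bytes get translated."""
    uc.mem_write(address, data)
    # Step 1: drop cached translations for the patched range (end is exclusive).
    uc.ctl_remove_cache(address, address + len(data))
    # Step 2: only needed when called during emulation (e.g. from a hook):
    # rewriting the PC with its current value forces a re-translation.
    if pc_reg is not None:
        uc.reg_write(pc_reg, uc.reg_read(pc_reg))
```

Outside of emulation, `pc_reg` can be left as `None`, since the next `uc_emu_start` translates the block again anyway.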
## How to emulate interrupts (or ticks) with Unicorn?
As stated, Unicorn is a pure CPU emulator. For such emulation, you have two choices:
- Use the `timeout` parameter of `uc_emu_start`
- Use the `count` parameter of `uc_emu_start`

After emulation stops, you may inspect anything you are interested in and resume emulation accordingly.

Note that for cortex-m `exec_return`, Unicorn has a magic software exception with interrupt number 8. You may register a hook to handle that.
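
A minimal sketch of the `count`-based approach in Python is shown below. The names `run_with_ticks`, `on_tick` and `insns_per_tick` are illustrative, and `pc_reg` stands for the architecture's PC register constant (e.g. `UC_X86_REG_RIP`). The sketch ignores Thumb mode, where the restart address would also need its lowest bit set.

```python
def run_with_ticks(uc, begin, until, pc_reg, on_tick, insns_per_tick=1000):
    """Emulate in slices of `insns_per_tick` instructions, ticking in between."""
    pc = begin
    while pc != until:
        # `count` makes uc_emu_start return after that many instructions.
        uc.emu_start(pc, until, count=insns_per_tick)
        if uc.reg_read(pc_reg) == until:
            break
        on_tick(uc)               # e.g. update a timer or redirect the PC
        pc = uc.reg_read(pc_reg)  # on_tick may have changed where to resume
    return uc.reg_read(pc_reg)
```

The same loop works with `timeout` in place of `count` if wall-clock ticks are preferred over instruction counts.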
## Why not keep up with upstream QEMU?
To provide end users with a simple API, Unicorn does lots of dirty hacks within the QEMU code, which prevent it from syncing painlessly.
## Is there any way to disable softmmu to speed up execution?
Yes, it's possible, but that is not Unicorn's goal and there is no simple switch in QEMU to disable softmmu.

Starting from 2.0.2, Unicorn emulates the MMU depending on the emulated architecture without further hacks. That is, Unicorn offers the full ability of the target MMU implementation. While this enables more possibilities for Unicorn, it has a few drawbacks:

- As a previous question already points out, some memory regions are not writable/executable.
- You always have to check architecture-specific registers to confirm the MMU status.
- `uc_mem_map` always deals with physical addresses while `uc_emu_start` accepts virtual addresses.

Therefore, if you still prefer the previous simple `paddr = vaddr` mapping, we have a simple experimental MMU implementation that can be switched on with `uc_ctl_tlb_mode(uc, UC_TLB_VIRTUAL)`. In this mode, you can also add a `UC_HOOK_TLB_FILL` hook to manage the TLB: the hook is called when a virtual address is not cached. Users can also flush the TLB with `uc_ctl_flush_tlb`.

In theory, `UC_TLB_VIRTUAL` achieves better performance as it skips all MMU details, though this has not been benchmarked.
## Something is wrong - I would like to dig deeper
Unicorn uses QEMU's logging in several places.
This might give a first hint of what could be wrong.
The logs optionally contain the filename and line number, along with
additional messages that indicate what is happening.
The QEMU logs are partially commented out and incomplete, but give them a try.
You might want to dig deeper and add your own log messages where you expect or try to find the bug.

To enable logs, you must recompile Unicorn with `-DUNICORN_LOGGING=yes` passed to cmake.
Logs are written at different log levels, which might result in very verbose logging if enabled.
To control the log level information, two environment variables can be used:
`UNICORN_LOG_LEVEL` and `UNICORN_LOG_DETAIL_LEVEL`.
These environment variables are parsed into `uint32_t` values only once (for performance reasons),
so set them before you execute any line of Unicorn.
Allowed are hexadecimal, decimal and octal values that fit into a buffer of 10 chars (see `strtoul` for details).

To define how detailed and what should be logged, use the following environment variables:
- `UNICORN_LOG_LEVEL`=\<32bit mask\>
  - The QEMU bit mask that selects what should be logged.
  - Use the value of `UINT32_MAX` to log everything.
  - If no bit is set in the mask, there will be no logging.
- `UNICORN_LOG_DETAIL_LEVEL`=\<level\>
  - The level defines how the filename and line are constructed.
    - 0: no filename and no line is used.
    - 1: full filename including the leading path is used with line information.
    - 2: just the filename with line information. It might be a little confusing,
      as the same file name can be used in several places.
  - If unsure or unwanted, leave this variable undefined or set it to 0.

The example below sets up the environment correctly for Python:
```python
import os
os.environ['UNICORN_LOG_LEVEL'] = "0xFFFFFFFF" # verbose - print anything
os.environ['UNICORN_LOG_DETAIL_LEVEL'] = "1" # full filename with line info
```
Please note that file names are statically compiled in and can reveal the paths
of the file system used during compilation.
## My code does not do what I would expect - is this a bug?
Please create a GitHub issue and provide as many details as possible.

- [ ] Simplified version of your script / source
  - Make sure that no external dependencies are needed.
  - E.g. remove additional use of capstone or CTF tools.
- [ ] Used Unicorn git-hash commit
  - Make sure to exclude any changes you made in Unicorn.
  - Alternatively, provide the repo link to your commit.
- [ ] Detailed explanation of what is expected
  - Try to verify whether the instructions can be processed by QEMU.
  - Dumping the registers of Unicorn and QEMU helps a lot.
- [ ] Detailed explanation of what is observed
  - Describe what's going on (and what you might think about it).
- [ ] Output from your executed script
  - You might have additional log messages which could be helpful.
- [ ] Output from the QEMU logs
  - Try to gather more information by enabling the QEMU logging.
- [ ] More details
  - Attach more details to help reproduce the bug.
  - Like attaching a repo link to the CTF challenge containing the binary or source code.
## I'd like to make contributions, where do I start?
See [milestones](https://github.com/unicorn-engine/unicorn/milestones) and [coding convention](https://github.com/unicorn-engine/unicorn/wiki/Coding-Convention).
Be sure to send pull requests for our **dev** branch only.
## Which qemu version is Unicorn based on?
Prior to 2.0.0, Unicorn was based on QEMU 2.2.1. From 2.0.0 onward, Unicorn is based on QEMU 5.0.1.