I saw that issue under gcc 4.9.0. for some reason gcc 4.9.0 didn't optimize next handler call in all fpu opcode handlers.
As result, instead of finishing the handler and jumping to next one, the next handler is called blowing up stack.
After some long period stack overflow might occur.
The fix simply limit the max chaining depth to 1000 traces (should be enough)
The same fix should be able to address the stack overflow problem when compiling with -O0 and handlers chaining speedup enabled.
implemented important CPU statistics which were used for Bochs CPU model performance analysis.
old statistics code from paging.cc and cpu.cc is replaced with new infrastructure.
In order to enale statitics collection in Bochs CPU:
- Enable statistics @ compilation time in cpu/cpustats.h
- Dump statistics periodically by adding -dumpstats N into Bochs command line
The SMC problem was solved in following manner:
- Every trace linked to another remembers when it was linked (a special timestamp value called traceLinkTimeStamp)
- When true SMC happens it incremements the traceLinkTimeStamp
- Jump to the linked trace won't be allowed if traceLinkTimeStamp in the link doesn't match traceLinkTimeStamp
So SMC effectively breaks all trace links and therefore I should not care for them anymore
5%-10% speedup on OS boot benchamarks observed
I observed ~5% emulation slowdown ... thinking about possible mitigations
this fixes TLB issue with SMAP and SMEP features.
these features introduce a new behavior when page can be inaccessible by System (CPL=0).
Current behavior is accessBits was not supporting it but legacy (from Bochs 2.3.6) was.
The wrong behavior can be observed if user access a user page and system access the same page later.
user access is fine and pass SMEP/SMA checks and stores the translation in TLB.
the system access will hit the TLB and nobody could detect that system cannot access that page.
SMP emulation. New implementation uses dynamic CPU quantum value and takes
full advantage of the trace cache. Each emulated processor will execute
the whole trace before switching to the next processor.
* It is also safe to use large (up to 16 instructions) quantum values for
the SMP emulation now and improve performance even further.
The same merge also completely fixes SF bug :
[3312237] stepN command might be not working properly
Handlers chaining speedups are also supported with SMP emulation now.
feature is enabled by default when configure with --enable-all-optimizations
option, to disable handlers chaining speedups configure with
--disable-handlers-chaining
Allows to significantly cut trace cache miss latenct and find data in victim cahe instead of redoding it
8 entries VC in parallel with direct map 64K entries