Implemented VCOMISS/VCOMISD/VUCOMISS/VUCOMISD AVX512 instructions
Fix vector length values for AVX-512 (512-bit vector should have length 4)
support mis-alignment #GP exception for VMOVAPS/PD/DQA32/DQ64 AVX512 instructions
move AVX512 load/store and register move operations into dedicated file avx512_move.cc
The code already knows to disasm most of the opcodes with their operands.
- Split according to OSIZE opcodes RDFSBASE/WRFSBASE / RDGSBASE/WRGSBASE both for disasm and performance
- Minimize amount of opcode forms in ia_opcodes.h again.
For example Udq means the same as Wdq but with no memory form.
The end goal will be also merging of disasm and cpu decoder to one module and remove the disasm.
Two bug fixes on the way:
TBM: fixed 64-bit TBM instructions with memory access (did 32-bit load instead of 64-bit)
BMI2: fixed operands order for PEXT/PDEP instructions
AVX2: fixed gather instruction decoding bug from decoder alias commit
The Windows version looks almost stable, but the GTK version fails in some cases.
That's why the classic wx debugger is still available if BX_DEBUGGER_GUI is set to 0.
- added function close_debug_dialog() to handle the simulation stop case in wx
- disable all the wx debugger related code if BX_DEBUGGER_GUI is set to 1
- added enhanced debugger specific init code similar to the code in sdl.cc
- include debugger related resources on Windows
- TODO: make the GTK / wxGTK case stable and remove the wx debugger
Bugfix: VMX: VmEntry should do TPR Virtualization (TPR Shadow + APIC Access Virtualization case is affected) and even could possibly cause TPR Threshold VMEXIT
Of course no true random numbers will be generated - use standard "C" rand() function as stub.
In future it will be possible to improve (using another random generator) or even use real rdrand/rdseed intrinsics
Minor speedup (of 1-2%) was observed due to new implementation
Remove obsolete dbg_take_irq function and dbg_force_interrupt function from CPU code, the functions were not working properly anyway
fixed enabling of ADX extensions in generic CPUID when enabled through .bochsrc
Small code cleanups on the way to implementation of APIC Registers Virtualization features disclosed in recent Intel SDM rev043
Bochs instruction emulation handlers won't refer to direct fields of instructions like MODRM.NNN or MODRM.RM anymore.
Use generic source/destination indications like SRC1, SRC2 and DST.
All handlers are modified to support new notation. In addition fetchDecode module was modified to assign sources to instructions properly.
Immediate benefits:
- Removal of several duplicated handlers (FMA3 duplicated with FMA4 is a trivial example)
- Simpler to understand fetch-decode code
Future benefits:
- Integration of disassembler into Bochs CPU module, ability to disasm bx_instruction_c instance (planned)
Huge patch. Almost all source files wre modified.
for CPU emulation performance reasons, the alignment check compilation
still can be enabled using configure option --enable-alignment-check.
There is no software in the world which enable #AC exception checking, this
x86 feature is completely legacy but its emulation support costs up to 3-5%
emulation speed.
The checking for #AC exception enable still will be done, if
CPL == 3, EFLAGS.AC = 1 and CR0.AM = 1
but the alignment check is not compiled in, the Bochs will PANIC with corresponding message.
You can press 'always continue' and ignore the PANIC, the simulation will continue as if alignment checking is not enabled.
The problem with Parity is it is generally referenced very rarely so the current lazy flags code is not efficient to updated Parify flag only (because it updates low 8 bits of .result value the existing Zero Flag has to be shadowed in .auxbits.
So I flipped it around, to make Parity be shadowed in auxbits. .result now is only needed to derive Zero Flag, and both Sign and Parify are derived from .result + .auxbits (as Zero Flag is now). For the 90% of the conditional jumps that are JZ or JNZ, this is a speedup.
Parity is now derived from 8 bits in .result and 8 bits in .auxbits, and Sign is derived from one flag in .result and 1 bit in .auxbits by XOR-ing them all together. It makes the code sequences for SAHF and POPF simpler too.
------------
Implemented SVN nested paging support - the Virtual Box boots perfectly with Nested Paging guest !
A lot of code duplication was added for now - major cleanup will follow later.
! Added AMD Phenom X3 8650 (Toliman) configuration to the CPUDB - this configuration has Nested Paging enabled.
Some CPUID modules rework done to enable Toliman configuration.
Ckean up 'executable' attribute from all CPU source files.
- IO intercept is not implemented yet
- MSR intercept is not implemented yet
VMX:
Fixed Bochs PANIC crash when doing I/O access crossing VMX I/O permission bitmaps.
This can happen because access_physical_read and access_physical_write cannot access memory cross 4K boundary.
I am merging the code in order to start making shortcuts between VMX emulation and SVM emulation.
Of course SVM emulation is incomplete, completely untested and not expected to work.
But someone could already take a look one the code and give some suggestions.
Also looking for anybody with existing SVM kernels - as simple as possible - for testing.
Status:
- exceptions intercept is not implemented yet
- IO intercept is not implemented yet
- MSR intercept is not implemented yet
- virtual interrupts are not implemented yet
- CPUID is not implemented yet
No advanced SVM featurez planned - I am implementing the very basic 'Pacifica' document from 2005 using QEMU code as reference.
XOP: few instructions are still missing, coming soon
BX_PANIC(("VPERMILPS_VpsHpsWpsVIbR: not implemented yet"));
BX_PANIC(("VPERMILPD_VpdHpdWpdVIbR: not implemented yet"));
BX_PANIC(("VPMADCSSWD_VdqHdqWdqVIbR: not implemented yet"));
BX_PANIC(("VPMADCSWD_VdqHdqWdqVIbR: not implemented yet"));
BX_PANIC(("VFRCZPS_VpsWpsR: not implemented yet"));
BX_PANIC(("VFRCZPD_VpdWpdR: not implemented yet"));
BX_PANIC(("VFRCZSS_VssWssR: not implemented yet"));
BX_PANIC(("VFRCZSD_VsdWsdR: not implemented yet"));
So now the same single option will choose not only the CPUID flags but also VMX capabilities matching real HW machine.
Removed cpuid of core2_extreme_x9770 from the cpudb. I don't remember its VMX capabilities anyway.
There is another Penryn model in the cpudb - core2_penryn_t9600.
32-bit CPU using Bochs binary compiled with x86-64 support.
The commit also fixes some init.cc issues with initialization of SYSCALL/SYSRET MSR in AMD hosts and also includes code reorg.
SMP emulation. New implementation uses dynamic CPU quantum value and takes
full advantage of the trace cache. Each emulated processor will execute
the whole trace before switching to the next processor.
* It is also safe to use large (up to 16 instructions) quantum values for
the SMP emulation now and improve performance even further.
The same merge also completely fixes SF bug :
[3312237] stepN command might be not working properly
Handlers chaining speedups are also supported with SMP emulation now.