- Fixed critical bug in CPU code added with one of the prev commits
- Disasm support for SSE4
- Rename PNI->SSE3 everywhere in the code
- Correctly decode, disassemble and execute 'XCHG R8, rAX' x86-64 instruction
- Correctly decode, disassemble and execute multi-byte NOP 0F F1 opcode
- Fixed ENTER and LEAVE instructions in x86-64 mode
- Added ability to turn ON instruction trace, only GUI support is missed.
Instruction trace could be enabled if Bochs was compiled with disasm
- More changes Bit32u -> bx_phy_address
- Complete preliminary implementation of SMM in Bochs, SMI is still PANICs but if you press 'continue' everything should work OK
- Small code cleanup
- Update CHANGES and user docs
Averything that required cpu.h include now has it explicitly and there are a lot of files not dependant by CPU at all which will compile a lot faster now ...
1. Review and commit patch
[ 896733 ] Lazy flags, for more instructions, only 1 src op
May be partially, but I hope to get all ideas from patch in
2. Get Bochs speedup after lazy flags optimization
3. Most important for me: improve correctness of emulation by handling several
undocumented EFLAGS modifications. And finally pass
UFLAGS - Undefined Flags Test v 3.0
Copyright (C) Potemkin's Hackers Group (PHG) 1989,1995
The test still fails on > 50% of its checks.
Notes from the author:
Here is another one of my speed up patches. Unlike my previous speedups
this one will help more platforms than just X86. It cleans up the Data
Xfer instructions. Since the Data Xfer instructions are the most often
executed instructions it gives a noticable boost in speed. The basic
optimization technique was to eliminate intermediate variables and pass
a pointer to the final destination or original source to the
read_virtual_whatever and the write_virtual_whatever functions.
there to offer a way to substitute more efficient code
to do the RMW cases. At the moment, they just map to
the normal functions.
Sorry, restored the previous version ...
"bx_bool" which is always defined as Bit32u on all platforms. In Carbon
specific code, Boolean is still used because the Carbon header files
define it to unsigned char.
- this fixes bug [ 623152 ] MacOSX: Triple Exception Booting win95.
The bug was that some code in Bochs depends on Boolean to be a
32 bit value. (This should be fixed, but I don't know all the places
where it needs to be fixed yet.) Because Carbon defined Boolean as
an unsigned char, Bochs just followed along and used the unsigned char
definition to avoid compile problems. This exposed the dependency
on 32 bit Boolean on MacOS X only and led to major simulation problems,
that could only be reproduced and debugged on that platform.
- On the mailing list we debated whether to make all Booleans into "bool" or
our own type. I chose bx_bool for several reasons.
1. Unlike C++'s bool, we can guarantee that bx_bool is the same size on all
platforms, which makes it much less likely to have more platform-specific
simulation differences in the future. (I spent hours on a borrowed
MacOSX machine chasing bug 618388 before discovering that different sized
Booleans were the problem, and I don't want to repeat that.)
2. We still have at least one dependency on 32 bit Booleans which must be
fixed some time, but I don't want to risk introducing new bugs into the
simulation just before the 2.0 release.
Modified Files:
bochs.h config.h.in gdbstub.cc logio.cc main.cc pc_system.cc
pc_system.h plugin.cc plugin.h bios/rombios.c cpu/apic.cc
cpu/arith16.cc cpu/arith32.cc cpu/arith64.cc cpu/arith8.cc
cpu/cpu.cc cpu/cpu.h cpu/ctrl_xfer16.cc cpu/ctrl_xfer32.cc
cpu/ctrl_xfer64.cc cpu/data_xfer16.cc cpu/data_xfer32.cc
cpu/data_xfer64.cc cpu/debugstuff.cc cpu/exception.cc
cpu/fetchdecode.cc cpu/flag_ctrl_pro.cc cpu/init.cc
cpu/io_pro.cc cpu/lazy_flags.cc cpu/lazy_flags.h cpu/mult16.cc
cpu/mult32.cc cpu/mult64.cc cpu/mult8.cc cpu/paging.cc
cpu/proc_ctrl.cc cpu/segment_ctrl_pro.cc cpu/stack_pro.cc
cpu/tasking.cc debug/dbg_main.cc debug/debug.h debug/sim2.cc
disasm/dis_decode.cc disasm/disasm.h doc/docbook/Makefile
docs-html/cosimulation.html fpu/wmFPUemu_glue.cc
gui/amigaos.cc gui/beos.cc gui/carbon.cc gui/gui.cc gui/gui.h
gui/keymap.cc gui/keymap.h gui/macintosh.cc gui/nogui.cc
gui/rfb.cc gui/sdl.cc gui/siminterface.cc gui/siminterface.h
gui/term.cc gui/win32.cc gui/wx.cc gui/wxmain.cc gui/wxmain.h
gui/x.cc instrument/example0/instrument.cc
instrument/example0/instrument.h
instrument/example1/instrument.cc
instrument/example1/instrument.h
instrument/stubs/instrument.cc instrument/stubs/instrument.h
iodev/cdrom.cc iodev/cdrom.h iodev/cdrom_osx.cc iodev/cmos.cc
iodev/devices.cc iodev/dma.cc iodev/dma.h iodev/eth_arpback.cc
iodev/eth_packetmaker.cc iodev/eth_packetmaker.h
iodev/floppy.cc iodev/floppy.h iodev/guest2host.h
iodev/harddrv.cc iodev/harddrv.h iodev/ioapic.cc
iodev/ioapic.h iodev/iodebug.cc iodev/iodev.h
iodev/keyboard.cc iodev/keyboard.h iodev/ne2k.h
iodev/parallel.h iodev/pci.cc iodev/pci.h iodev/pic.h
iodev/pit.cc iodev/pit.h iodev/pit_wrap.cc iodev/pit_wrap.h
iodev/sb16.cc iodev/sb16.h iodev/serial.cc iodev/serial.h
iodev/vga.cc iodev/vga.h memory/memory.h memory/misc_mem.cc
and Jas Sandys-Lumsdaine to split out common instructions into
variants which deal with the mod=11b case (Reg-Reg) and the
other cases (which do memory ops). Actually, I only split
MOV_GwEw and MOV_GdEd for now. According to some instrumentation
of a Win95 boot, they were the most frequently used opcode by far.
of (1 & (val32>>N)), and added a getB_?F() accessor for special
cases which need a strict binary value (exactly 0 or 1). Most
code only needed a value for logical comparison. I modified the
special cases which do need a binary number for shifting and
comparison between flags, to use the special getB_?F() accessor.
Cleaned up memory.cc functions a little, now that all accesses
are within a single page.
Fixed a (not very likely encountered) bug in fetchdecode.cc (and
fetchdecode64.cc) where a 2-byte opcode starting with a prefix
starts at the last offset on a page. There were no checks
on the segment overrides for a boundary condition. I added them.
The eflags enhancements added just a tiny bit of performance.
also extended by the REX.B field on Hammer) is passed to instructions.
I rearranged the bxInstruction_c to free up a field to be used
to pass this info when mod-rm bytes are not used. This got rid
of the ugly ((i->b1 & 7) + i->rex_b) code.
Probably shaved just a very little run time off Hammer emulation,
and even less on x86-32. The resultant is a little cleaner anyways.
in cpu.cc out of the main loop, and into the asynchronous
events handling. I went through all the code paths, and
there doesn't seem to be any reason for that code to be
in the hot loop.
Added another accessor for getting instruction data, called
modC0(). A lot of instructions test whether the mod field
of mod-nnn-rm is 0xc0 or not, ie., it's a register operation
and not memory. So I flag this in fetchdecode{,64}.cc.
This added on the order of 1% performance improvement for
a Win95 boot.
Macroized a few leftover calls to Write_RMV_virtual_xyz()
that didn't get modified in the x86-64 merge. Really, they
just call the real function for now, but I want to have them
available to do direct writes with the guest2host TLB pointers.
to bitfields. bxInstruction_c is now 24 bytes, including 4 for
the memory addr resolution function pointer, and 4 for the
execution function pointer (16 + 4 + 4).
Coded more accessors, to abstract access from most code.
with accessors. Had to touch a number of files to update the
access using the new accessors.
Moved rm_addr to the CPU structure, to slim down bxInstruction_c
and to prevent future instruction caching from getting sprayed
with writes to individual rm_addr fields. There only needs to
be one. Though need to deal with instructions which have
static non-modrm addresses, but which are using rm_addr since
that will change.
bxInstruction_c is down to about 40 bytes now. Trying to
get down to 24 bytes.