With this coding style each instruction could be implemented separatelly even not together with current Bochs FPU emulator.
Step-by-step I am going to transfer all FPU instructions from current Bochs FPU emulator to new style and remove an old bugged emulator.
Anyway, now I could implement all currently missed FPU instructions without hacking wm-fpu-emu.
PNI could be enabled by setting BX_SUPPORT_PNI in config.h
After the feature will be fully validation I'll also add configure option.
The implemntation is ~complete. I've missed only three FPU new opcodes of FUSTTP instruction and MONITOR/WAIT instructions.
Enjoy ! ;)
According to the Intel manuals:
The LOCK prefix can be prepended only to the following instructions
and only to those forms of the instructions where the destination
operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG,
CMPXCH8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. If
the LOCK prefix is used with one of these instructions and the source
operand is a memory operand, an undefined opcode exception (#UD) will
be generated. An undefined opcode exception will also be generated if
the LOCK prefix is used with any instruction not in the above list.
Checking of the LOCK prefix done in fetchDecode state and not overloads
Bochs's execution.
* renamed CPU_ID to BX_CPU_ID.
with this new name there is no possibility for name contentions and BX_CPU_ID
definition could be moved out to NEED_CPU_REG_SHORTCUTS block
* returned back `unsigned BX_CPU::which_cpu(void)` function
* added BX_CPU_ID parameter for
BX_INSTR_PHY_READ(a20addr, len);
BX_INSTR_PHY_WRITE(a20addr, len);
now it will be
BX_INSTR_PHY_READ(cpu_id, a20addr, len);
BX_INSTR_PHY_WRITE(cpu_id, a20addr, len);
Because source files were added/removed it would require an update
of the windows and macos project files, so I want to wait until after 2.0.
M Makefile.in 1.51 back to 1.50
M cpu.h 1.121 back to 1.120
M fetchdecode.cc 1.37 back to 1.36
M fetchdecode64.cc 1.33 back to 1.32
M sse.cc 1.17 back to 1.16
A sse2.cc 1.27 back to 1.26 (added back)
R sse_move.cc removed
R sse_pfp.cc removed
- to bring these changes back again, all we have to do is
"cvs update -j tmp-before1 -j tmp-after1"
sse.cc -> general SSE stuff and SSE integer (MMX extensions)
sse_move.cc -> memory transfer and shuffle opcodes
sse_pfp.cc -> packed floating point operations
"bx_bool" which is always defined as Bit32u on all platforms. In Carbon
specific code, Boolean is still used because the Carbon header files
define it to unsigned char.
- this fixes bug [ 623152 ] MacOSX: Triple Exception Booting win95.
The bug was that some code in Bochs depends on Boolean to be a
32 bit value. (This should be fixed, but I don't know all the places
where it needs to be fixed yet.) Because Carbon defined Boolean as
an unsigned char, Bochs just followed along and used the unsigned char
definition to avoid compile problems. This exposed the dependency
on 32 bit Boolean on MacOS X only and led to major simulation problems,
that could only be reproduced and debugged on that platform.
- On the mailing list we debated whether to make all Booleans into "bool" or
our own type. I chose bx_bool for several reasons.
1. Unlike C++'s bool, we can guarantee that bx_bool is the same size on all
platforms, which makes it much less likely to have more platform-specific
simulation differences in the future. (I spent hours on a borrowed
MacOSX machine chasing bug 618388 before discovering that different sized
Booleans were the problem, and I don't want to repeat that.)
2. We still have at least one dependency on 32 bit Booleans which must be
fixed some time, but I don't want to risk introducing new bugs into the
simulation just before the 2.0 release.
Modified Files:
bochs.h config.h.in gdbstub.cc logio.cc main.cc pc_system.cc
pc_system.h plugin.cc plugin.h bios/rombios.c cpu/apic.cc
cpu/arith16.cc cpu/arith32.cc cpu/arith64.cc cpu/arith8.cc
cpu/cpu.cc cpu/cpu.h cpu/ctrl_xfer16.cc cpu/ctrl_xfer32.cc
cpu/ctrl_xfer64.cc cpu/data_xfer16.cc cpu/data_xfer32.cc
cpu/data_xfer64.cc cpu/debugstuff.cc cpu/exception.cc
cpu/fetchdecode.cc cpu/flag_ctrl_pro.cc cpu/init.cc
cpu/io_pro.cc cpu/lazy_flags.cc cpu/lazy_flags.h cpu/mult16.cc
cpu/mult32.cc cpu/mult64.cc cpu/mult8.cc cpu/paging.cc
cpu/proc_ctrl.cc cpu/segment_ctrl_pro.cc cpu/stack_pro.cc
cpu/tasking.cc debug/dbg_main.cc debug/debug.h debug/sim2.cc
disasm/dis_decode.cc disasm/disasm.h doc/docbook/Makefile
docs-html/cosimulation.html fpu/wmFPUemu_glue.cc
gui/amigaos.cc gui/beos.cc gui/carbon.cc gui/gui.cc gui/gui.h
gui/keymap.cc gui/keymap.h gui/macintosh.cc gui/nogui.cc
gui/rfb.cc gui/sdl.cc gui/siminterface.cc gui/siminterface.h
gui/term.cc gui/win32.cc gui/wx.cc gui/wxmain.cc gui/wxmain.h
gui/x.cc instrument/example0/instrument.cc
instrument/example0/instrument.h
instrument/example1/instrument.cc
instrument/example1/instrument.h
instrument/stubs/instrument.cc instrument/stubs/instrument.h
iodev/cdrom.cc iodev/cdrom.h iodev/cdrom_osx.cc iodev/cmos.cc
iodev/devices.cc iodev/dma.cc iodev/dma.h iodev/eth_arpback.cc
iodev/eth_packetmaker.cc iodev/eth_packetmaker.h
iodev/floppy.cc iodev/floppy.h iodev/guest2host.h
iodev/harddrv.cc iodev/harddrv.h iodev/ioapic.cc
iodev/ioapic.h iodev/iodebug.cc iodev/iodev.h
iodev/keyboard.cc iodev/keyboard.h iodev/ne2k.h
iodev/parallel.h iodev/pci.cc iodev/pci.h iodev/pic.h
iodev/pit.cc iodev/pit.h iodev/pit_wrap.cc iodev/pit_wrap.h
iodev/sb16.cc iodev/sb16.h iodev/serial.cc iodev/serial.h
iodev/vga.cc iodev/vga.h memory/memory.h memory/misc_mem.cc
and Jas Sandys-Lumsdaine to split out common instructions into
variants which deal with the mod=11b case (Reg-Reg) and the
other cases (which do memory ops). Actually, I only split
MOV_GwEw and MOV_GdEd for now. According to some instrumentation
of a Win95 boot, they were the most frequently used opcode by far.
Some things changed in the ctrl_xfer*.cc, fetchdecode*.cc,
and cpu.cc since the original patches, so I did some patch
integration by hand. Check the placement of the
macros BX_INSTR_FETCH_DECODE_COMPLETED() and BX_INSTR_OPCODE()
in cpu.cc to make sure I go them right. Also, I changed the
parameters to BX_INSTR_OPCODE() to update them to the new code.
I put some comments before each of these to help determine if
the placement is right.
These macros are only compiled in if you are gathering instrumentation
data from bochs, so they shouldn't effect others.
Since the SYSCALL replaces the LOADALL instruction, it is incompatible with
earlier CPU types.
At moment, the SYSCALL is only enabled by x86-64 emulation, but the code
can be incorporated in IA32 only emulations.
Instructions added:
0F 05 SYSCALL (replaces LOADALL)
0F 07 SYSRET (new)
TODO: restructure #if ... so that it can be used by non x86-64 emulations.
of (1 & (val32>>N)), and added a getB_?F() accessor for special
cases which need a strict binary value (exactly 0 or 1). Most
code only needed a value for logical comparison. I modified the
special cases which do need a binary number for shifting and
comparison between flags, to use the special getB_?F() accessor.
Cleaned up memory.cc functions a little, now that all accesses
are within a single page.
Fixed a (not very likely encountered) bug in fetchdecode.cc (and
fetchdecode64.cc) where a 2-byte opcode starting with a prefix
starts at the last offset on a page. There were no checks
on the segment overrides for a boundary condition. I added them.
The eflags enhancements added just a tiny bit of performance.
so frequently.
Coded asm() statements for INC/DEC_ERX() instructions.
Cleaned up the iCache a litle including a bug fix. The
generation ID was decrementing the whole field including
some high meta bits. That could roll over after 1 Billion
cycles. I know only decrement if the field is valid, to
save the write.
I implemented inline functions which can serve the value of
the arithmetic flags if they are cached, and redirect to
the lazy_flags.cc routines if not.
Most of this was just prep work for adding more asm() statements
for native eflags processing when on x86.
also extended by the REX.B field on Hammer) is passed to instructions.
I rearranged the bxInstruction_c to free up a field to be used
to pass this info when mod-rm bytes are not used. This got rid
of the ugly ((i->b1 & 7) + i->rex_b) code.
Probably shaved just a very little run time off Hammer emulation,
and even less on x86-32. The resultant is a little cleaner anyways.
in cpu.cc out of the main loop, and into the asynchronous
events handling. I went through all the code paths, and
there doesn't seem to be any reason for that code to be
in the hot loop.
Added another accessor for getting instruction data, called
modC0(). A lot of instructions test whether the mod field
of mod-nnn-rm is 0xc0 or not, ie., it's a register operation
and not memory. So I flag this in fetchdecode{,64}.cc.
This added on the order of 1% performance improvement for
a Win95 boot.
Macroized a few leftover calls to Write_RMV_virtual_xyz()
that didn't get modified in the x86-64 merge. Really, they
just call the real function for now, but I want to have them
available to do direct writes with the guest2host TLB pointers.