With this coding style each instruction could be implemented separatelly even not together with current Bochs FPU emulator.
Step-by-step I am going to transfer all FPU instructions from current Bochs FPU emulator to new style and remove an old bugged emulator.
Anyway, now I could implement all currently missed FPU instructions without hacking wm-fpu-emu.
* Fully supports
* MMX/XMM/3DNOW instruction sets
* FPU instruction
* SSE3 extensions
currently only 16/32 bit mode bug anyway, it is much better that old one ;)
Instructions MOV_CxRx and MOV_RxCx are not supported in v8086 mode according to Intel manuals.
Also these instructions are treated as register-to-register regardless to MODRM byte fields (according to AMD manuals)
Also commit fix for MOV_EwSw by Kevin
PNI could be enabled by setting BX_SUPPORT_PNI in config.h
After the feature will be fully validation I'll also add configure option.
The implemntation is ~complete. I've missed only three FPU new opcodes of FUSTTP instruction and MONITOR/WAIT instructions.
Enjoy ! ;)
From the author (see bug #663320) :
In the code there is a check to verify that an IO bitmap
is defined (io_base > BX_CPU_THIS_PTR
tr.cache.u.tss386.limit_scaled) but there is no check if
an accessed IO port's address actually falls within the
defined limit of the TSS segment. So if I define an IO
bitmap with 100 entries, port 101 may or may not be
allowed depending on whatever bytes follow the TSS in
memory
[x] fixed bug in int01 (opcode 0xF1) emulation
[x] fixed bug in x86 debugger with dr0-dr3 registers
Committed disassembler bugfix from Dirk Thierbach:
[x] fixed bug in relative addresses in Jmp, Jcc, Call and so on
* changed all %ll format descriptions to FMT_LL macro so that
Microsoft Visual C works correctly (it uses %I64)
* missing type conversions added
* cdrom.cc: variable types for win32 fixed
* removed some unused variables in eth_win32.cc and harddrv.cc
* added missing includes in make_cmos_image.c and niclist.c
check.
Commented out a number of instances of invalidate_prefetch_q(),
for branches which do not change CS since the EIP window mechanism
takes care of validating that EIP lands in the current page or not
in the main cpu loop anyways.
Fixed a couple cases (v8086 mode and real mode) of loading CS where
the EIP page window was not invalidated in segment_ctrl_pro.cc.
That may fix some aliasing problems reported before (OS2).
Notes from the author:
Here is another one of my speed up patches. Unlike my previous speedups
this one will help more platforms than just X86. It cleans up the Data
Xfer instructions. Since the Data Xfer instructions are the most often
executed instructions it gives a noticable boost in speed. The basic
optimization technique was to eliminate intermediate variables and pass
a pointer to the final destination or original source to the
read_virtual_whatever and the write_virtual_whatever functions.
According to the Intel manuals:
The LOCK prefix can be prepended only to the following instructions
and only to those forms of the instructions where the destination
operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG,
CMPXCH8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. If
the LOCK prefix is used with one of these instructions and the source
operand is a memory operand, an undefined opcode exception (#UD) will
be generated. An undefined opcode exception will also be generated if
the LOCK prefix is used with any instruction not in the above list.
Checking of the LOCK prefix done in fetchDecode state and not overloads
Bochs's execution.
- it works only on x86 with gcc2.95+
- uses the GCC function atribute "regparm(n)" to declare that certain
functions use the register calling convention
- performance improvement is about 6%
1) fixed the type of "hostPageAddr" and associated typecasts.
2) fixed the type of "pages" and associated typecasts (overloaded variable)
3) patch to cpu.cc to calculate "eipPageBias" correctly in 64 bit mode
1) fixed some errors running 32 bit compat mode. IMPORTANT FIX.
2) added IST processing (uses IST1-IST7 in 64 bit TSS)
3) cosmetic - debugging stuff to console.
a minor optimization. Also in transition from compat mode to 64 bit mode (e.g. interrupt to inner
privelege with mode change), SS may not be properly defined - this avoids other messiness.
* renamed CPU_ID to BX_CPU_ID.
with this new name there is no possibility for name contentions and BX_CPU_ID
definition could be moved out to NEED_CPU_REG_SHORTCUTS block
* returned back `unsigned BX_CPU::which_cpu(void)` function
* added BX_CPU_ID parameter for
BX_INSTR_PHY_READ(a20addr, len);
BX_INSTR_PHY_WRITE(a20addr, len);
now it will be
BX_INSTR_PHY_READ(cpu_id, a20addr, len);
BX_INSTR_PHY_WRITE(cpu_id, a20addr, len);
> CPU_ID is defined as
> #define CPU_ID (BX_CPU_THIS_PTR local_apic.get_id())
> This is not true when the APIC name is changed (true in Linux). Please
> change this to:
> #define CPU_ID (BX_CPU_THIS - BX_CPU(0))
The 64 bit variant of MOVNTI was not decoded. The proper fix for this is to work on
fetchdecode64.cc to call a 64 bit variant of SSE instructions or fail it with a
invalid op. A careful check needs to be done with the AMD manuals to determine if
there are any other SSE instructions that have a special 64 bit decoding.
PSRAW_PqQq (MMX)
PSRAD_PqQq (MMX)
PSRAW_PqIb (MMX)
PSRAD_PqIb (MMX)
PSRAW_VdqWdq (SSE)
PSRAD_VdqWdq (SSE)
PSRAW_PdqIb (SSE)
PSRAD_PdqIb (SSE)
When register was shifted by 0 bits the result produced was incorrect.
Now Bochs fully passes MMX test provided by
Hentai Yagi [hentai_yagi@yahoo.com.au] !
sash to run.
1) fixed fetchdecode64.cc to fix the operand size at 64 bits in long mode for moves
to/from CRx
2) minor patches to sse2.cc to fix unimplemented and 64 bit variants of sse2
instructions.
Because source files were added/removed it would require an update
of the windows and macos project files, so I want to wait until after 2.0.
M Makefile.in 1.51 back to 1.50
M cpu.h 1.121 back to 1.120
M fetchdecode.cc 1.37 back to 1.36
M fetchdecode64.cc 1.33 back to 1.32
M sse.cc 1.17 back to 1.16
A sse2.cc 1.27 back to 1.26 (added back)
R sse_move.cc removed
R sse_pfp.cc removed
- to bring these changes back again, all we have to do is
"cvs update -j tmp-before1 -j tmp-after1"
sse.cc -> general SSE stuff and SSE integer (MMX extensions)
sse_move.cc -> memory transfer and shuffle opcodes
sse_pfp.cc -> packed floating point operations
Description/justification:
Endian Host byte order Guest (x86) byte order
======================================================
Little FFFFFFFFEEAAAAAA FFFFFFFFEEAAAAAA
Big AAAAAAEEFFFFFFFF FFFFFFFFEEAAAAAA
F - fraction/mmx
E - exponent
A - aligment