all available optimizations in one shot.
Finished one last case of an instruction which could but didn't use
the Read-Modify-Write variants of access.cc functions.
Started going through the integer instructions, merging obvious cases
where there are two "if (modrm==11b) {" clauses and very little
action in between, and cleaning up the aweful indentation leftover
from many years ago when those instructions were implemented using
cut-and-paste. We may get a little extra performance out of these
mods, but they'll also be easier after I'm finished to enhance
with asm() statements to knock out the lazy flags processing on x86.
now simply return a cached value which is set upon mode changes.
The biggest problem was protected_mode() which did something like:
return CR0.PM && ! EFLAGS.VM
This adds up when it was being executed many times in branch functions
etc. Now, cached values are set and sampled instead.
- parallel port detection fixed:
* write the value of AX to 0x0410, not BX
* the timeout value is a byte and now stored in CL
* the offset of the port address list is 2 bytes
Used patch.disasm to do
1) clean up the disasm output to make the dispaly of extra stuff optional.
2) included the part of the patch which displays displacements as
proper addresses.
and Jas Sandys-Lumsdaine to split out common instructions into
variants which deal with the mod=11b case (Reg-Reg) and the
other cases (which do memory ops). Actually, I only split
MOV_GwEw and MOV_GdEd for now. According to some instrumentation
of a Win95 boot, they were the most frequently used opcode by far.
Essentially, when I coded a few of the instructions to use
asm()s for acceleration of the eflags, I got lazy and only
used the asm() to compute eflags and let the normal C operation
do the actual operation. Jas's patch, moved the asm()s such
that they now do the work of the operation as well.
The patches look great. The code reads a lot better as well.
Further work can be done to give the compiler more options with
register scheduling.
were simply replacements of the eflags mask constants with
the macro names already in cpu.h for asm() statements. I forgot
to use the macros for some instructions.
0x000008d5 -> EFlagsOSZAPCMask
0x000008d4 -> EFlagsOSZAPMask
Some things changed in the ctrl_xfer*.cc, fetchdecode*.cc,
and cpu.cc since the original patches, so I did some patch
integration by hand. Check the placement of the
macros BX_INSTR_FETCH_DECODE_COMPLETED() and BX_INSTR_OPCODE()
in cpu.cc to make sure I go them right. Also, I changed the
parameters to BX_INSTR_OPCODE() to update them to the new code.
I put some comments before each of these to help determine if
the placement is right.
These macros are only compiled in if you are gathering instrumentation
data from bochs, so they shouldn't effect others.
just the wxwindows ones. This is required on cygwin, for example, because
the CFLAGS and CXXFLAGS include gcc flags that change code generation:
-fno-pcc-struct-return and -fvtable-thunks. It is not safe to mix code
compiled with these flags with code compiled without. I learned this the
hard way when I found that sometimes code that called a virtual member
function was jumping to the WRONG member function.
Created 64-bit versions of some branch instructions and
changed fetchdecode64.cc to use them instead. This keeps the
#ifdef pollution down for 32-bit code and made fixing them
easier. They needed to clear the upper bits of RIP for
16-bit operand sizes. They also should not have had a protection
limit check in them, especially since that field is still
32-bit in cpu.h, so there's no way to set nominal 64-bit values.
The 32-bit versions were also not honoring the upper 32-bits
of RIP.
LOOPNE64_Jb
LOOPE64_Jb
LOOP64_Jb
JCXZ64_Jb
Changed all occurances of JCC_Jw/JCC_Jd in fetchdecode64.cc to
use JCC_Jq, which was coded already. Both JMP_Jq and JCC_Jq are
now fixed w.r.t. 16-bit opsizes and upper RIP bit clearing.
63..16 when a 16-bit operand size JMP is executed. Previous
fix cleared only 63..32. I since realized, this is the case
which does parallel the 32-bit semantics.
fetching 64-bit address opcode info, which was incorrect.
Fixed. Got rid of BxImmediate_Oq. fetchdecode64.cc now
uses BxImmediateO, like the fetch routine does. Addresses which
are embedded in the opcode, have a size which depends on
the current addressing size. For long-mode, this is
either 64 (default) or 32 (AddrSize over-ride). BxImmediate_O
now conditionally fetches based on AddrSize.
64-bit bug#2: In JMP_Jq(), when the current operand size is
16-bits, the upper dword of RIP was not being cleared. The
semantics with this case are weird - one would think the
top 48 bits would be cleared, but apparently only the top
32 bits are. Anyways, I fixed this.
Replaced some of the messy immediate fetching (byte-by-byte) in
fetchdecode64.cc with ReadHost{Q,D}WordFromLittleEndian() calls
for cleanliness. Should do this for all the cases, plus
the 32-bit stuff.
conditionally include <windows.h>. This may seem like a drastic step
for just one little type, but I expect before long we may want to use
other symbols like VK_F12 which are also in windows.h. In a cygwin
compile this is required.
wxWindows guis.
- if cross configuring, don't insist on finding curses library.
- on normal configures, when the target platform is win32 (windows, cygwin,
mingw), don't insist on finding pthread either.
(I'm starting to wonder if when cross_configure=1 we shouldn't just skip over
ALL of the library and header checks. When you're going to configure on one
platform and build on another, all that information is useless anyway.)