- #GP when trying to set reserved bits of CR4_HI in 64-bit mode
- #GP when trying to set reserved bits of EFER MSR
- clear upper part of RSI/RDI when executing rep instructions with 32-bit asize
even if no repeat iterations were executed (because of RCX=0 for example)
- write SYSENTER_EIP_MSR and SYSENTER_ESP_MSR as 64-bit when x86_64 supported
- set MSR_FMASK reset value
- MSR_FMASK should be 32-bit only
- check for fetch permissions when doing ITLB lookup
- #GP when trying to write non-canonical address to MSR_CSTAR or MSR_LSTAR
- correct repeat instructions timing
- mark TSS busy in TR after it is loaded
Averything that required cpu.h include now has it explicitly and there are a lot of files not dependant by CPU at all which will compile a lot faster now ...
bochs.h already not include iodev.h which reduces compilation dependences for almost all cpu and fpu files, now cpu files will not be recompiled if iodev includes was changed
Fixed/updated/cleaned guest2host TLB speedups for Long mode.
I now can boot the Linux x86-64 kernel to the VFS mount message,
using all the accelerations.
but if you hand edit cpu/cpu.h, and change BxICacheEntries,
you can try different sizes. I'll make this more flexible
with configure. For now, use "--enable-icache" with no parameters.
- Modified fetchdecode.cc/fetchdecode64.cc just enough so that
instructions which encode a direct address now use a memory
resolution function which just sticks the immediate address
into rm_addr. With cached instructions we need this.
to bitfields. bxInstruction_c is now 24 bytes, including 4 for
the memory addr resolution function pointer, and 4 for the
execution function pointer (16 + 4 + 4).
Coded more accessors, to abstract access from most code.
with accessors. Had to touch a number of files to update the
access using the new accessors.
Moved rm_addr to the CPU structure, to slim down bxInstruction_c
and to prevent future instruction caching from getting sprayed
with writes to individual rm_addr fields. There only needs to
be one. Though need to deal with instructions which have
static non-modrm addresses, but which are using rm_addr since
that will change.
bxInstruction_c is down to about 40 bytes now. Trying to
get down to 24 bytes.
use accessors. This lets me work on compressing the
size of fetch-decode structure (now called bxInstruction_c).
I've reduced it down to about 76 bytes. We should be able
to do much better soon. I needed the abstraction of the
accessors, so I have a lot of freedom to re-arrange things
without making massive future changes.
Lost a few percent of performance in these mods, but my
main focus was to get the abstraction.
enhancement to bochs. You can now configure with
--enable-guest2host-tlb.
Force the support of big pages (PSE) when x86-64 is configured.
Reverted back to only one kind of TLB entry style, since everything
is ported.
Fixed one bug in io.cc with as_64 and the index registers.
There are others, as noticed by Peter.
be used at all, and Peter didn't want it. "extdb.o" is compiled
into libcpu.a, if configured for it.
Removed a few #warnings for x86-64 compile, based on Peter's
line-item comments regarding the warnings I inserted during
the port/merge.
cpu64 directories. Instead of using the macros introduced in cpu.h rev 1.37
such as GetEFlagsDFLogical and SetEFlagsDF and ClearEFlagsDF, I made inline
methods on the BX_CPU_C object that access the eflags fields. The problem
with the macros is that they cannot be used outside the BX_CPU_C object. The
macros have now been removed, and all references to eflags now use these new
accessors.
- I debated whether to put the accessors as members of the BX_CPU_C object
or members of the bx_flags_reg_t struct. I chose to make them members
of BX_CPU_C for two reasons: 1. the lazy flags are implemented as
members of BX_CPU_C, and 2. the eflags are referenced in many many places
and it is more compact without having to put eflags in front of each. (The
real problem with compactness is having to write BX_CPU_THIS_PTR in front of
everything, but that's another story.)
- Kevin pointed out a major bug in my set accessor code. What a difference a
little tilde can make! That is fixed now.
- modified: load32bitOShack.cc debug/dbg_main.cc
and in both cpu and cpu64 directories:
cpu.cc cpu.h ctrl_xfer_pro.cc debugstuff.cc exception.cc flag_ctrl.cc
flag_ctrl_pro.cc init.cc io.cc io_pro.cc proc_ctrl.cc soft_int.cc
string.cc vm8086.cc
to request bulk IO operations to IO devices which are bulk IO aware.
Currently, I modified only harddrv.cc to be aware. I added some
fields to the bx_devices_c class for the IO instructions to
place requests and receive responses from the IO device emulation.
Devices except the hard drive, don't monitor these fields so they
respond as normal. The hard drive now monitors these fields for
bulk requests, and if enabled, it memcpy()'s data straight from
the disk buffer to memory. This eliminates numerous inp/outp calling
sequences per disk sector.
I used the fields in bx_devices_c so that I would not have to
disrupt most IO device modules. Enhancements can be made to
other devices if they use high-bandwidth IO via in/out instructions.
All the EFLAGS bits used to be cached in separate fields. I left
a few of them in separate fields for now - might remove them
at some point also. When the arithmetic fields are known
(ie they're not in lazy mode), they are all cached in a
32-bit EFLAGS image, just like the x86 EFLAGS register expects.
All other eflags are store in the 32-bit register also, with
a few also mirrored in separate fields for now.
The reason I did this, was so that on x86 hosts, asm() statements
can be #ifdef'd in to do the calculation and get the native
eflags results very cheaply. Just to test that it works, I
coded ADD_EdId() and ADD_EwIw() with some conditionally compiled
asm()s for accelerated eflags processing and it works.
-Kevin