Averything that required cpu.h include now has it explicitly and there are a lot of files not dependant by CPU at all which will compile a lot faster now ...
bochs.h already not include iodev.h which reduces compilation dependences for almost all cpu and fpu files, now cpu files will not be recompiled if iodev includes was changed
Fixed/updated/cleaned guest2host TLB speedups for Long mode.
I now can boot the Linux x86-64 kernel to the VFS mount message,
using all the accelerations.
but if you hand edit cpu/cpu.h, and change BxICacheEntries,
you can try different sizes. I'll make this more flexible
with configure. For now, use "--enable-icache" with no parameters.
- Modified fetchdecode.cc/fetchdecode64.cc just enough so that
instructions which encode a direct address now use a memory
resolution function which just sticks the immediate address
into rm_addr. With cached instructions we need this.
to bitfields. bxInstruction_c is now 24 bytes, including 4 for
the memory addr resolution function pointer, and 4 for the
execution function pointer (16 + 4 + 4).
Coded more accessors, to abstract access from most code.
with accessors. Had to touch a number of files to update the
access using the new accessors.
Moved rm_addr to the CPU structure, to slim down bxInstruction_c
and to prevent future instruction caching from getting sprayed
with writes to individual rm_addr fields. There only needs to
be one. Though need to deal with instructions which have
static non-modrm addresses, but which are using rm_addr since
that will change.
bxInstruction_c is down to about 40 bytes now. Trying to
get down to 24 bytes.
use accessors. This lets me work on compressing the
size of fetch-decode structure (now called bxInstruction_c).
I've reduced it down to about 76 bytes. We should be able
to do much better soon. I needed the abstraction of the
accessors, so I have a lot of freedom to re-arrange things
without making massive future changes.
Lost a few percent of performance in these mods, but my
main focus was to get the abstraction.
enhancement to bochs. You can now configure with
--enable-guest2host-tlb.
Force the support of big pages (PSE) when x86-64 is configured.
Reverted back to only one kind of TLB entry style, since everything
is ported.
Fixed one bug in io.cc with as_64 and the index registers.
There are others, as noticed by Peter.
be used at all, and Peter didn't want it. "extdb.o" is compiled
into libcpu.a, if configured for it.
Removed a few #warnings for x86-64 compile, based on Peter's
line-item comments regarding the warnings I inserted during
the port/merge.
cpu64 directories. Instead of using the macros introduced in cpu.h rev 1.37
such as GetEFlagsDFLogical and SetEFlagsDF and ClearEFlagsDF, I made inline
methods on the BX_CPU_C object that access the eflags fields. The problem
with the macros is that they cannot be used outside the BX_CPU_C object. The
macros have now been removed, and all references to eflags now use these new
accessors.
- I debated whether to put the accessors as members of the BX_CPU_C object
or members of the bx_flags_reg_t struct. I chose to make them members
of BX_CPU_C for two reasons: 1. the lazy flags are implemented as
members of BX_CPU_C, and 2. the eflags are referenced in many many places
and it is more compact without having to put eflags in front of each. (The
real problem with compactness is having to write BX_CPU_THIS_PTR in front of
everything, but that's another story.)
- Kevin pointed out a major bug in my set accessor code. What a difference a
little tilde can make! That is fixed now.
- modified: load32bitOShack.cc debug/dbg_main.cc
and in both cpu and cpu64 directories:
cpu.cc cpu.h ctrl_xfer_pro.cc debugstuff.cc exception.cc flag_ctrl.cc
flag_ctrl_pro.cc init.cc io.cc io_pro.cc proc_ctrl.cc soft_int.cc
string.cc vm8086.cc
to request bulk IO operations to IO devices which are bulk IO aware.
Currently, I modified only harddrv.cc to be aware. I added some
fields to the bx_devices_c class for the IO instructions to
place requests and receive responses from the IO device emulation.
Devices except the hard drive, don't monitor these fields so they
respond as normal. The hard drive now monitors these fields for
bulk requests, and if enabled, it memcpy()'s data straight from
the disk buffer to memory. This eliminates numerous inp/outp calling
sequences per disk sector.
I used the fields in bx_devices_c so that I would not have to
disrupt most IO device modules. Enhancements can be made to
other devices if they use high-bandwidth IO via in/out instructions.
All the EFLAGS bits used to be cached in separate fields. I left
a few of them in separate fields for now - might remove them
at some point also. When the arithmetic fields are known
(ie they're not in lazy mode), they are all cached in a
32-bit EFLAGS image, just like the x86 EFLAGS register expects.
All other eflags are store in the 32-bit register also, with
a few also mirrored in separate fields for now.
The reason I did this, was so that on x86 hosts, asm() statements
can be #ifdef'd in to do the calculation and get the native
eflags results very cheaply. Just to test that it works, I
coded ADD_EdId() and ADD_EwIw() with some conditionally compiled
asm()s for accelerated eflags processing and it works.
-Kevin
checks were honoring the EFLAGS.DF bit, but assuming it was always
equal to 0 (increment upward). Plus some general cleanup of the
acceleration code.
I left the default of '--enable-repeat-speedups' to disabled, but
it seems pretty solid. Definitely adds performance for disk
heavy workloads.
added --enable-repeat-speedups with default to disabled.
Reconfigure/recompile and the speedup code will be #ifdef'd
out for now. It manifested as junk written to the VGA screen
while booting/running Windows.
Also made some more mods to the main cpu loop. Moved the
handling of EXT/errorno outside the main loop, much like
the extra EIP/ESP commits were moved, for a little better
performance.
I changed the fetch_ptr/bytesleft method of fetching to
a slightly different model, which calculates a window
for which EIP will be valid (land on the current page),
and a bias which when applied to EIP will be from
0..upper_page_limit. Speed is about the same for either
method, but a pseudo-op/threaded-interpreter will plug
in better with this and be faster.
- Paging code rehash. You must now use --enable-4meg-pages to
use 4Meg pages, with the default of disabled, since we don't well
support 4Meg pages yet. Paging table walks model a real CPU
more closely now, and I fixed some bugs in the old logic.
- Segment check redundancy elimination. After a segment is loaded,
reads and writes are marked when a segment type check succeeds, and
they are skipped thereafter, when possible.
- Repeated IO and memory string copy acceleration. Only some variants
of instructions are available on all platforms, word and dword
variants only on x86 for the moment due to alignment and endian issues.
This is compiled in currently with no option - I should add a configure
option.
- Added a guest linear address to host TLB. Actually, I just stick
the host address (mem.vector[addr] address) in the upper 29 bits
of the field 'combined_access' since they are unused. Convenient
for now. I'm only storing page frame addresses. This was the
simplest for of such a TLB. We can likely enhance this. Also,
I only accelerated the normal read/write routines in access.cc.
Could also modify the read-modify-write versions too. You must
use --enable-guest2host-tlb, to try this out. Currently speeds
up Win95 boot time by about 3.5% for me. More ground to cover...
- Minor mods to CPUI/MOV_CdRd for CMOV.
- Integrated enhancements from Volker to getHostMemAddr() for PCI
being enabled.
tries to fix it. The shortcuts to register names such as AX and DL are
#defines in cpu/cpu.h, and they are defined in terms of BX_CPU_THIS_PTR.
When BX_USE_CPU_SMF=1, this works fine. (This is what bochs used for
a long time, and nobody used the SMF=0 mode at all.) To make SMP bochs
work, I had to get SMF=0 mode working for the CPU so that there could
be an array of cpus.
When SMF=0 for the CPU, BX_CPU_THIS_PTR is defined to be "this->" which
only works within methods of BX_CPU_C. Code outside of BX_CPU_C must
reference BX_CPU(num) instead.
- to try to enforce the correct use of AL/AX/DL/etc. shortcuts, they are
now only #defined when "NEED_CPU_REG_SHORTCUTS" is #defined. This is
only done in the cpu/*.cc code.
To see the commit logs for this use either cvsweb or
cvs update -r BRANCH-io-cleanup and then 'cvs log' the various files.
In general this provides a generic interface for logging.
logfunctions:: is a class that is inherited by some classes, and also
. allocated as a standalone global called 'genlog'. All logging uses
. one of the ::info(), ::error(), ::ldebug(), ::panic() methods of this
. class through 'BX_INFO(), BX_ERROR(), BX_DEBUG(), BX_PANIC()' macros
. respectively.
.
. An example usage:
. BX_INFO(("Hello, World!\n"));
iofunctions:: is a class that is allocated once by default, and assigned
as the iofunction of each logfunctions instance. It is this class that
maintains the file descriptor and other output related code, at this
point using vfprintf(). At some future point, someone may choose to
write a gui 'console' for bochs to which messages would be redirected
simply by assigning a different iofunction class to the various logfunctions
objects.
More cleanup is coming, but this works for now. If you want to see alot
of debugging output, in main.cc, change onoff[LOGLEV_DEBUG]=0 to =1.
Comments, bugs, flames, to me: todd@fries.net