All the EFLAGS bits used to be cached in separate fields. I left
a few of them in separate fields for now - might remove them
at some point also. When the arithmetic fields are known
(ie they're not in lazy mode), they are all cached in a
32-bit EFLAGS image, just like the x86 EFLAGS register expects.
All other eflags are store in the 32-bit register also, with
a few also mirrored in separate fields for now.
The reason I did this, was so that on x86 hosts, asm() statements
can be #ifdef'd in to do the calculation and get the native
eflags results very cheaply. Just to test that it works, I
coded ADD_EdId() and ADD_EwIw() with some conditionally compiled
asm()s for accelerated eflags processing and it works.
-Kevin
it can decide how to proceed. Some of those bits are necessary
to make TLB invalidation decisions. INVLPG doesn't cause
a whole TLB flush anymore, just one page. Some of the
current CPU behaviours model the P6, especially on CR0
reloads. Earlier processors kept some pre-change pre-fetched
instructions until a branch. We could probably model that
by setting a flag, and letting the revalidate_prefetch_q
function cause serialization.
The TLB flush code only invalidates entries which are not
already invalidated for the case where the TLB invalidation
ID trick is not in use.
mode uses the notion of the guest-to-host TLB. This has the
benefit of allowing more uniform and streamlined acceleration
code in access.cc which does not have to check if CR0.PG
is set, eliminating a few instructions per guest access.
Shaved just a little off execution time, as expected.
Also, access_linear now breaks accesses which span two pages,
into two calls the the physical memory routines, when paging
is off, just like it always has for paging on. Besides
being more uniform, this allows the physical memory access
routines to known the complete data item is contained
within a single physical page, and stop reapplying the
A20ADDR() macro to pointers as it increments them.
Perhaps things can be optimized a little more now there too...
I renamed the routines to {read,write}PhysicalPage() as
a reminder that these routines now operate on data
solely within one page.
I also added a little code so that the paging module is
notified when the A20 line is tweaked, so it can dump
whatever mappings it wants to.
so that a compare of the current access could be done more
efficiently against the cached values, both in the normal
paging routines, and in the accelerated code in access.cc.
This cut down the amount of code path needed to get to
direct use of a host address nicely, and speed definitely
got a boost as a result, especially if you use the
--enable-guest2host-tlb option.
The CR0.WP flag was a real pain, because it imparts
a complication on the way protections work. Fortunately
it's not a high-change flag, so I just base the new
cached info on the current CR0.WP value, and dump
the TLB cache when it changes.
- Paging code rehash. You must now use --enable-4meg-pages to
use 4Meg pages, with the default of disabled, since we don't well
support 4Meg pages yet. Paging table walks model a real CPU
more closely now, and I fixed some bugs in the old logic.
- Segment check redundancy elimination. After a segment is loaded,
reads and writes are marked when a segment type check succeeds, and
they are skipped thereafter, when possible.
- Repeated IO and memory string copy acceleration. Only some variants
of instructions are available on all platforms, word and dword
variants only on x86 for the moment due to alignment and endian issues.
This is compiled in currently with no option - I should add a configure
option.
- Added a guest linear address to host TLB. Actually, I just stick
the host address (mem.vector[addr] address) in the upper 29 bits
of the field 'combined_access' since they are unused. Convenient
for now. I'm only storing page frame addresses. This was the
simplest for of such a TLB. We can likely enhance this. Also,
I only accelerated the normal read/write routines in access.cc.
Could also modify the read-modify-write versions too. You must
use --enable-guest2host-tlb, to try this out. Currently speeds
up Win95 boot time by about 3.5% for me. More ground to cover...
- Minor mods to CPUI/MOV_CdRd for CMOV.
- Integrated enhancements from Volker to getHostMemAddr() for PCI
being enabled.
for BX_CPU_LEVEL >= 6, and to have the CMOV instructions generate
an undefined opcode exception after printing info that they were
called, if BX_CPU_LEVEL <= 5. I suppose we could have a separate
configure option, but mirroring Intel, CMOV is available as of
Pentium Pro.
For now, you have to compile with --enable-cpu-level=6 for CMOV
support to be compiled in.
tries to fix it. The shortcuts to register names such as AX and DL are
#defines in cpu/cpu.h, and they are defined in terms of BX_CPU_THIS_PTR.
When BX_USE_CPU_SMF=1, this works fine. (This is what bochs used for
a long time, and nobody used the SMF=0 mode at all.) To make SMP bochs
work, I had to get SMF=0 mode working for the CPU so that there could
be an array of cpus.
When SMF=0 for the CPU, BX_CPU_THIS_PTR is defined to be "this->" which
only works within methods of BX_CPU_C. Code outside of BX_CPU_C must
reference BX_CPU(num) instead.
- to try to enforce the correct use of AL/AX/DL/etc. shortcuts, they are
now only #defined when "NEED_CPU_REG_SHORTCUTS" is #defined. This is
only done in the cpu/*.cc code.
in BRANCH-smp-bochs revisions.
- The general task was to make multiple CPU's which communicate
through their APICs. So instead of BX_CPU and BX_MEM, we now have
BX_CPU(x) and BX_MEM(y). For an SMP simulation you have several
processors in a shared memory space, so there might be processors
BX_CPU(0..3) but only one memory space BX_MEM(0). For cosimulation,
you could have BX_CPU(0) with BX_MEM(0), then BX_CPU(1) with
BX_MEM(1). WARNING: Cosimulation is almost certainly broken by the
SMP changes.
- to simulate multiple CPUs, you have to give each CPU time to execute
in turn. This is currently implemented using debugger guards. The
cpu loop steps one CPU for a few instructions, then steps the
next CPU for a few instructions, etc.
- there is some limited support in the debugger for two CPUs, for
example printing information from each CPU when single stepping.
To see the commit logs for this use either cvsweb or
cvs update -r BRANCH-io-cleanup and then 'cvs log' the various files.
In general this provides a generic interface for logging.
logfunctions:: is a class that is inherited by some classes, and also
. allocated as a standalone global called 'genlog'. All logging uses
. one of the ::info(), ::error(), ::ldebug(), ::panic() methods of this
. class through 'BX_INFO(), BX_ERROR(), BX_DEBUG(), BX_PANIC()' macros
. respectively.
.
. An example usage:
. BX_INFO(("Hello, World!\n"));
iofunctions:: is a class that is allocated once by default, and assigned
as the iofunction of each logfunctions instance. It is this class that
maintains the file descriptor and other output related code, at this
point using vfprintf(). At some future point, someone may choose to
write a gui 'console' for bochs to which messages would be redirected
simply by assigning a different iofunction class to the various logfunctions
objects.
More cleanup is coming, but this works for now. If you want to see alot
of debugging output, in main.cc, change onoff[LOGLEV_DEBUG]=0 to =1.
Comments, bugs, flames, to me: todd@fries.net