Commit Graph

939 Commits

Author SHA1 Message Date
Kevin Lawton
425ad824c0 I changed the TLB entry from 3 dwords to 4, and (when you compile
with GCC) align them with the GCC special alignment attribute.
Since there was then one available field, I split the protection
attributes and native host pointers into their own fields.

Before, with 3 dwords per TLB entry, some entries (about 3/8)
were spanning two processor cache lines (assuming a 32-byte
cache line).  Now, they all fit within one cache line.

Knocked about 1.4% off Win95 boot time, probably more off normal
software runs.
2002-09-10 00:01:01 +00:00
Bryce Denney
be659a09b3 - check in Stanislav Shwartsman's patch "bochs-mmx.patch-endian-support".
He writes: Detailed description: MMX instruction set support.
  Also supports BIG_ENDIAN systems. Tested on Solaris and HP1100.
- modified files:
    configure.in cpu/Makefile.in cpu/cpu.h cpu/fetchdecode.cc
    cpu/proc_ctrl.cc fpu/fpu_system.h fpu/wmFPUemu_glue.cc
- added files: cpu/i387.h cpu/mmx.cc
2002-09-09 16:11:25 +00:00
Kevin Lawton
0d7a5fdf3c I rehashed the way the EFLAGS register was stored internally.
All the EFLAGS bits used to be cached in separate fields.  I left
a few of them in separate fields for now - might remove them
at some point also.  When the arithmetic fields are known
(ie they're not in lazy mode), they are all cached in a
32-bit EFLAGS image, just like the x86 EFLAGS register expects.
All other eflags are store in the 32-bit register also, with
a few also mirrored in separate fields for now.

The reason I did this, was so that on x86 hosts, asm() statements
can be #ifdef'd in to do the calculation and get the native
eflags results very cheaply.  Just to test that it works, I
coded ADD_EdId() and ADD_EwIw() with some conditionally compiled
asm()s for accelerated eflags processing and it works.

-Kevin
2002-09-08 04:08:14 +00:00
Kevin Lawton
51c93e12a1 The paging unit gets notified of all CR0/CR3/CR4 updates so
it can decide how to proceed.  Some of those bits are necessary
to make TLB invalidation decisions.  INVLPG doesn't cause
a whole TLB flush anymore, just one page.  Some of the
current CPU behaviours model the P6, especially on CR0
reloads.  Earlier processors kept some pre-change pre-fetched
instructions until a branch.  We could probably model that
by setting a flag, and letting the revalidate_prefetch_q
function cause serialization.

The TLB flush code only invalidates entries which are not
already invalidated for the case where the TLB invalidation
ID trick is not in use.
2002-09-07 05:21:28 +00:00
Kevin Lawton
491035fcb2 I extended the guest-to-host TLB acceleration across the
Read-Modify-Write instructions.  The first read phase stores
the host pointer in the "pages" field if a direct use pointer
is available.  The Write phase first checks if a pointer was
issued and uses it for a direct write if available.

I chose the "pages" field since it needs to be checked by the
write_RMW_virtual variants anyways and thus needs to be
cached anyways.

Mostly the mods where to access.cc, but I did also macro-ize
the calls to write_RMW_virtual...() in files which use it
and cpu.h.  Right now, the macro is just a straight pass-through.
I tried expanding it to a quick initial check for the pointer
availability to do the write in-place, with a function call
as a fall-back.  That didn't seemed to matter at all.

Booting is not helped by this really.  The upper bound of
the gain is 5 or 6%, and that's only if you have a loop that
looks like:

label:
  add [eax], ebx   ;; mega read-modify-write instruction
  jmp label        ;; intensive loop.
2002-09-06 21:54:58 +00:00
Gregory Alexander
4f6039f533 Macroize BX_TLB_QUICK_INVALIDATE code.
Kevin Lawton says he doesn't get a performance benefit.

I'm not sure if I do.  Either way, the difference isn't
very large.

This code may get removed if it turns out to be useless.
2002-09-06 19:21:55 +00:00
Gregory Alexander
1c3ae99300 Speed-up for TLB invalidates as proposed by Peter Tattam.
I had been planning on this same thing in a similar form
for the I$, so this made a lot of sense, and was easy to
implement.
2002-09-06 14:58:56 +00:00
Bryce Denney
85b3dfe60f - fix minor problems with static member function declarations:
- bx_gen_reg cannot be declared with BX_SMF or it can't read gen_reg
    when static member functions are turned on.
  - use "BX_CPU_C_PREFIX" instead of "BX_CPU_C::" for get_segment_base.
- the SMF (static member function) tricks are just plain wierd. The only way to
  really be sure that you're not breaking something is to try compiling it with
  SMF on and with SMF off.  e.g. "configure && make" and
  "configure --enable-processors=2 && make".
2002-09-05 20:16:40 +00:00
Stanislav Shwartsman
2d2651a0f3 Added some useful debug/information methods for BX_CPU class 2002-09-05 19:46:20 +00:00
Stanislav Shwartsman
611d983900 Added get_REGISTER functions for all registers 2002-09-05 19:12:02 +00:00
Kevin Lawton
f0c9896964 Now, when you compile with --enable-guest2host-tlb, non-paged
mode uses the notion of the guest-to-host TLB.  This has the
benefit of allowing more uniform and streamlined acceleration
code in access.cc which does not have to check if CR0.PG
is set, eliminating a few instructions per guest access.
Shaved just a little off execution time, as expected.

Also, access_linear now breaks accesses which span two pages,
into two calls the the physical memory routines, when paging
is off, just like it always has for paging on.  Besides
being more uniform, this allows the physical memory access
routines to known the complete data item is contained
within a single physical page, and stop reapplying the
A20ADDR() macro to pointers as it increments them.
Perhaps things can be optimized a little more now there too...
I renamed the routines to {read,write}PhysicalPage() as
a reminder that these routines now operate on data
solely within one page.

I also added a little code so that the paging module is
notified when the A20 line is tweaked, so it can dump
whatever mappings it wants to.
2002-09-05 02:31:24 +00:00
Kevin Lawton
8a1baa6bb8 Added ::{read,write}_virtual_qword() functions as per Stanislav's request.
I have not tested these functions, but they model the format and
acceleration principals of the byte/word/dword functions.  Give them
a try on both little/big endian machines.
2002-09-04 20:23:54 +00:00
Kevin Lawton
d07c1c0bb0 I rehashed the way the paging code stores protection bits,
so that a compare of the current access could be done more
efficiently against the cached values, both in the normal
paging routines, and in the accelerated code in access.cc.

This cut down the amount of code path needed to get to
direct use of a host address nicely, and speed definitely
got a boost as a result, especially if you use the
--enable-guest2host-tlb option.

The CR0.WP flag was a real pain, because it imparts
a complication on the way protections work.  Fortunately
it's not a high-change flag, so I just base the new
cached info on the current CR0.WP value, and dump
the TLB cache when it changes.
2002-09-04 08:59:13 +00:00
Kevin Lawton
3f2d28f86c Added guest2host TLB tricks to read-modify-write variants of
access routines in access.cc, completing the upgrade of
those routines.  You do need '--enable-guest2host-tlb', before
you get the speedups for now.  The guest2host mods seem pretty
solid, though I do need to see what effects the A20 line has
on this cache and the paging TLB in general.
2002-09-03 04:54:28 +00:00
Kevin Lawton
746f09b427 There's a bug in the repeated IO & mem copy speedups. I
added --enable-repeat-speedups with default to disabled.
Reconfigure/recompile and the speedup code will be #ifdef'd
out for now.  It manifested as junk written to the VGA screen
while booting/running Windows.

Also made some more mods to the main cpu loop.  Moved the
handling of EXT/errorno outside the main loop, much like
the extra EIP/ESP commits were moved, for a little better
performance.

I changed the fetch_ptr/bytesleft method of fetching to
a slightly different model, which calculates a window
for which EIP will be valid (land on the current page),
and a bias which when applied to EIP will be from
0..upper_page_limit.  Speed is about the same for either
method, but a pseudo-op/threaded-interpreter will plug
in better with this and be faster.
2002-09-02 18:44:35 +00:00
Kevin Lawton
3d8e5f8b61 Removed the BX_FETCHDECODE_CACHE mods, and the patch that
Bryce created for use of ensuring all mods were removed
cleanly.
2002-09-01 23:02:36 +00:00
Kevin Lawton
3a5f338419 Integrated patches for:
- Paging code rehash.  You must now use --enable-4meg-pages to
    use 4Meg pages, with the default of disabled, since we don't well
    support 4Meg pages yet.  Paging table walks model a real CPU
    more closely now, and I fixed some bugs in the old logic.
  - Segment check redundancy elimination.  After a segment is loaded,
    reads and writes are marked when a segment type check succeeds, and
    they are skipped thereafter, when possible.
  - Repeated IO and memory string copy acceleration.  Only some variants
    of instructions are available on all platforms, word and dword
    variants only on x86 for the moment due to alignment and endian issues.
    This is compiled in currently with no option - I should add a configure
    option.
  - Added a guest linear address to host TLB.  Actually, I just stick
    the host address (mem.vector[addr] address) in the upper 29 bits
    of the field 'combined_access' since they are unused.  Convenient
    for now.  I'm only storing page frame addresses.  This was the
    simplest for of such a TLB.  We can likely enhance this.  Also,
    I only accelerated the normal read/write routines in access.cc.
    Could also modify the read-modify-write versions too.  You must
    use --enable-guest2host-tlb, to try this out.  Currently speeds
    up Win95 boot time by about 3.5% for me.  More ground to cover...
  - Minor mods to CPUI/MOV_CdRd for CMOV.
  - Integrated enhancements from Volker to getHostMemAddr() for PCI
    being enabled.
2002-09-01 20:12:09 +00:00
Gregory Alexander
1be5b1d46c Added a linked list to further speed up icache invalidates.
These should be pretty snappy now.  It's time to generate
some actual statistics.

 Modified Files:
 	cpu/cpu.cc cpu/cpu.h cpu/init.cc memory/memory.cc
2002-06-05 21:51:30 +00:00
Gregory Alexander
c41505e342 Added a RPN directory for the cache to help make invalidates
faster.  Hopefully this won't slow things down too much.

 	config.h.in cpu/cpu.cc cpu/cpu.h memory/memory.cc
2002-06-05 03:59:31 +00:00
Gregory Alexander
fda1b874e9 Check in FETCHDECODE Caching, with changes.
Specific changes from the patch:

1.) renamed fdcache_eip to fdcache_ip, as it is using
the RIP instead of the EIP.

2.) added a Boolean array fdcache_is32 which uses is32
to determine icache hits.  Otherwise we could run 32-bit
code as 16-bit or vice versa.


 Modified Files:
 	config.h.in cpu/cpu.cc cpu/cpu.h memory/memory.cc
2002-06-03 22:39:11 +00:00
Bryce Denney
30aaf4088e - commit patch.wxwindows.gz in the main branch. Now you can try out
the wxwindows interface by just "configure --with-wx; make"

  Modified Files:
    Makefile.in bochs.h config.h.in configure configure.in
    load32bitOShack.cc logio.cc main.cc cpu/cpu.cc cpu/cpu.h
    debug/dbg_main.cc gui/Makefile.in gui/control.cc gui/gui.cc
    gui/siminterface.cc gui/siminterface.h gui/x.cc iodev/cdrom.cc
    iodev/keyboard.cc memory/misc_mem.cc
  Added Files:
    README-wxWindows wxbochs.rc gui/wx.cc gui/wxmain.cc
    gui/wxmain.h gui/bitmaps/cdromd.xpm
    gui/bitmaps/configbutton.xpm gui/bitmaps/copy.xpm
    gui/bitmaps/floppya.xpm gui/bitmaps/floppyb.xpm
    gui/bitmaps/mouse.xpm gui/bitmaps/paste.xpm
    gui/bitmaps/power.xpm gui/bitmaps/reset.xpm
    gui/bitmaps/snapshot.xpm
  Removed Files:
    patches/patch.wxwindows.gz
2002-04-18 00:22:20 +00:00
instinc
22dc1c4f96 added address of the caught watchpoint 2002-04-01 04:42:43 +00:00
Bryce Denney
640d71d017 - check in Zwane Mwaikambo's MSR patch: patch.msr. 2002-03-27 16:04:05 +00:00
Bryce Denney
b8ecf5b118 - apply patch.smp-sync-arb-ids. This patch adds a local APIC behavior
that was missing before, the special "INIT Level Deassert" synchronize
  arbitration ID trick.
2002-03-25 01:58:34 +00:00
instinc
dabe63ef72 added a control variable for debugger to know if register tracing is required or not 2001-10-03 19:53:48 +00:00
Bryce Denney
daf2a9fb55 - add RCS Id to header of every file. This makes it easier to know what's
going on when someone sends in a modified file.
2001-10-03 13:10:38 +00:00
Bryce Denney
6a1c01c8b5 - back out my poorly written patch.virtual-address-checks-overflow 2001-10-02 20:01:29 +00:00
Bryce Denney
67ebaaca87 - apply patch.virtual-addr-checks-overflow to fix bug
[ #433759 ] virtual address checks can overflow
  > Bochs has been crashing in some cases when you try to access data which
  > overlaps the segment limit, when the segment limit is near the 32-bit
  > boundary.  The example that came up a few times is reading/writing 4 bytes
  > starting at 0xffffffff when the segment limit was 0xffffffff.  The
  > condition used to compare offset+length-1 with the limit, but
  > offset+length-1 was overflowing so the comparison went wrong.  This patch
  > changes the condition so that it supports all segment limits except for
  > sizes 0,1,2,3 bytes.  Dave and I figured that these sizes would not be
  > needed, while size 0xffffffff is used quite a lot.
2001-10-02 17:02:28 +00:00
Bryce Denney
fd7e7ee86c - added debugger command "info fpu" which prints all FPU registers
in an output format similar to gdb (when you do info all-registers).
  Also, if you do "info all" you get the CPU registers and the FPU
  registers.
- added bx_cpu_c method called fpu_print_regs, which is implemented
  in wmFPUemu_glue.cc
2001-09-15 06:55:14 +00:00
Bryce Denney
ad11335293 - remove space after line continuation character.
Thanks to Martijn Boekhorst <Martijn@boekhorst.net> for pointing it out.
2001-09-11 23:32:14 +00:00
Bryce Denney
3d29d5d614 - add instrumentation macros for begin and end opcode. These are usually
defined to be empty, so there should be no effect except for instrumentation
2001-06-28 19:45:44 +00:00
Bryce Denney
f822257511 - there were cases where BX_APIC_SUPPORT were used and others where
BX_SUPPORT_APIC were used.  To follow the pattern used by other
  names like this, I changed them all to BX_SUPPORT_APIC.
  Thanks to Tom Lindström for chasing this down!
2001-06-12 13:07:43 +00:00
Bryce Denney
565fa8ea8e - another speed boost: when not using SMP, use
BX_CPU_C bx_cpu;
     BX_MEM_C bx_mem;
  and when more than one processor, use
     BX_CPU_C    *bx_cpu_array[BX_SMP_PROCESSORS];
     BX_MEM_C    *bx_mem_array[BX_ADDRESS_SPACES];
  The changeover is controlled by BX_SMP_PROCESSORS, but there are only
  a few code changes since nearly all code uses the BX_CPU(n) and BX_MEM(n)
  macros.
- This turns out to make a 10% speed difference!  With this revision,
  the CVS version now gets 95% of the performance of the 3/25/2000
  snapshot, which I've been using as my baseline.
2001-06-05 17:35:08 +00:00
Bryce Denney
49664f7503 - parts of the SMP merge apparantly broke the debugger and this revision
tries to fix it.  The shortcuts to register names such as AX and DL are
  #defines in cpu/cpu.h, and they are defined in terms of BX_CPU_THIS_PTR.
  When BX_USE_CPU_SMF=1, this works fine.  (This is what bochs used for
  a long time, and nobody used the SMF=0 mode at all.)  To make SMP bochs
  work, I had to get SMF=0 mode working for the CPU so that there could
  be an array of cpus.

  When SMF=0 for the CPU, BX_CPU_THIS_PTR is defined to be "this->" which
  only works within methods of BX_CPU_C.  Code outside of BX_CPU_C must
  reference BX_CPU(num) instead.
- to try to enforce the correct use of AL/AX/DL/etc. shortcuts, they are
  now only #defined when "NEED_CPU_REG_SHORTCUTS" is #defined.  This is
  only done in the cpu/*.cc code.
2001-05-24 18:46:34 +00:00
Bryce Denney
e61d00351f - merged BRANCH-smp-bochs into main branch. For details see comments
in BRANCH-smp-bochs revisions.
- The general task was to make multiple CPU's which communicate
  through their APICs.  So instead of BX_CPU and BX_MEM, we now have
  BX_CPU(x) and BX_MEM(y).  For an SMP simulation you have several
  processors in a shared memory space, so there might be processors
  BX_CPU(0..3) but only one memory space BX_MEM(0).  For cosimulation,
  you could have BX_CPU(0) with BX_MEM(0), then BX_CPU(1) with
  BX_MEM(1).  WARNING: Cosimulation is almost certainly broken by the
  SMP changes.
- to simulate multiple CPUs, you have to give each CPU time to execute
  in turn.  This is currently implemented using debugger guards.  The
  cpu loop steps one CPU for a few instructions, then steps the
  next CPU for a few instructions, etc.
- there is some limited support in the debugger for two CPUs, for
  example printing information from each CPU when single stepping.
2001-05-23 08:16:07 +00:00
Todd T.Fries
bdb89cd364 merge in BRANCH-io-cleanup.
To see the commit logs for this use either cvsweb or
cvs update -r BRANCH-io-cleanup and then 'cvs log' the various files.

In general this provides a generic interface for logging.

logfunctions:: is a class that is inherited by some classes, and also
.   allocated as a standalone global called 'genlog'.  All logging uses
.   one of the ::info(), ::error(), ::ldebug(), ::panic() methods of this
.   class through 'BX_INFO(), BX_ERROR(), BX_DEBUG(), BX_PANIC()' macros
.   respectively.
.
.   An example usage:
.     BX_INFO(("Hello, World!\n"));

iofunctions:: is a class that is allocated once by default, and assigned
as the iofunction of each logfunctions instance.  It is this class that
maintains the file descriptor and other output related code, at this
point using vfprintf().  At some future point, someone may choose to
write a gui 'console' for bochs to which messages would be redirected
simply by assigning a different iofunction class to the various logfunctions
objects.

More cleanup is coming, but this works for now.  If you want to see alot
of debugging output, in main.cc, change onoff[LOGLEV_DEBUG]=0 to =1.

Comments, bugs, flames, to me: todd@fries.net
2001-05-15 14:49:57 +00:00
Bryce Denney
a6fef54678 - update copyright dates to 2001 for all mandrake headers
- for bochs files with other header, replaced with current mandrake header
2001-04-10 02:20:02 +00:00
Bryce Denney
4e04f4cb58 - change all inline declarations to one of two macros: BX_C_INLINE or
BX_CPP_INLINE.  Then in config.h.in you can define these two as you
  wish.
2001-04-10 02:10:09 +00:00
cvs
beff63eb32 - entered original Bochs snapshot bochs-2000_0325a.tar.gz from
ftp.bochs.com
2001-04-10 01:04:59 +00:00