Commit Graph

55 Commits

Author SHA1 Message Date
Stanislav Shwartsman
167c7075fb Use fastcall gcc attribute for all cpu execution functions - this pure "compiler helper" optimization brings additional 2% speedup to Bochs code 2008-03-22 21:29:41 +00:00
Stanislav Shwartsman
eea5e6eac5 Simplify RepeatSpeedups optimizations - restrict them only to segments which already passed access/limit validation and avoid mass of heavy checks during repeat speedup itself. 2008-03-21 20:35:46 +00:00
Stanislav Shwartsman
063d896226 Optimization in 16-bit resolve functions
Fixes for hosts which can't support misaligned memory access
2008-02-07 20:43:13 +00:00
Stanislav Shwartsman
a2897933a3 white space cleanup 2008-02-02 21:46:54 +00:00
Stanislav Shwartsman
37fbb82baa Cleanups. Move bxInstruction_c definition to separate file instr.h 2008-01-29 17:13:10 +00:00
Stanislav Shwartsman
79fc57dec8 Fixed more VCPP2008 warnings 2007-12-26 23:07:44 +00:00
Stanislav Shwartsman
38fb3d78be small cleanup in repeat code 2007-12-23 18:09:34 +00:00
Stanislav Shwartsman
5d4e32b8da Avoid pointer params for every read_virtual_* except 16-byte SSE and 10-byte x87 reads 2007-12-20 20:58:38 +00:00
Stanislav Shwartsman
b516589e4e Changes in write_virtual_* and pop_* functions -> avoid moving parameteres by pointer 2007-12-20 18:29:42 +00:00
Stanislav Shwartsman
fe2e0525da More optimization for string instructions 2007-12-17 19:52:01 +00:00
Stanislav Shwartsman
0af87ab63b Split string instructions according to the address size - simpler and faster 2007-12-17 18:48:26 +00:00
Stanislav Shwartsman
46366b5064 Speedup simulation by eliminating CPL==3 check from read/write_virtual* functions 2007-12-16 21:03:46 +00:00
Stanislav Shwartsman
1af7010e50 Optimized memory access for 64-bit mode
Starting convergence to new lazy flags scheme by Darek Mihocka (www.emulators.com). The new flags code is still being validated and perfected but I try to minimize the diff between 2 versionS
2007-11-20 17:15:33 +00:00
Stanislav Shwartsman
83f6eb6945 Changes copyrights for the files I wrote :)
Also split EqId G1 group for x86-64
2007-11-17 23:28:33 +00:00
Stanislav Shwartsman
a83b8ae843 Slight speed improvement in string functions 2007-10-29 15:39:18 +00:00
Stanislav Shwartsman
fbcdfa49c2 Cleanup 2007-10-10 22:20:32 +00:00
Stanislav Shwartsman
deb79e9675 [Bochs-developers] [PATCH] avoid RCX without BX_SUPPORT_X86_64 2007-09-27 16:11:32 +00:00
Stanislav Shwartsman
e812f81e7b Fixes in zero upper ECX 2007-09-25 16:11:32 +00:00
Stanislav Shwartsman
38d1f39c77 Converted CR0 bits to one register similar to CR4 - a bit slower but helps with other features implemntation 2007-07-09 15:16:14 +00:00
Stanislav Shwartsman
5c21f7821f Speed simulation between 3 to 5% by eliminating several checks from cpu loop.
The checks were related to repeat instructions - handle them differently
2007-01-05 13:40:47 +00:00
Stanislav Shwartsman
a4129e5341 Handle NULL_SEG_REG (no segment override) case in fetchdecode.cc 2006-05-24 20:57:37 +00:00
Stanislav Shwartsman
d69eba6c07 Split in/out instructions based on operand size 2006-05-07 18:27:36 +00:00
Stanislav Shwartsman
b8be848943 Use access_type param in getHostMemAddr, less efficient but no copy-paste at least 2006-03-26 19:39:37 +00:00
Stanislav Shwartsman
7b6c2587a9 Now devices could be compiled separatelly from CPU
Averything that required cpu.h include now has it explicitly and there are a lot of files not dependant by CPU at all which will compile a lot faster now ...
2006-03-06 22:03:16 +00:00
Stanislav Shwartsman
f096a80716 Fix code duplication for check_cs descriptor
The function will execute
 - segment is executable code segment
 - conforming/non-conforming segment priviledge checks
 - segment is present
2005-08-01 21:40:17 +00:00
Stanislav Shwartsman
01d8a97613 Try to cleanup/rewrite RepeatSpeedups optimization
This code doesn't add new speedups but makes it very easy
After some validation it could be no problem to enable repeat speedups optimization for REP MOVSx with any address size. And REP STOSx too.
2005-07-04 17:44:08 +00:00
Stanislav Shwartsman
ce8f1ade07 Some not really significant speedups 2005-06-21 17:01:21 +00:00
Stanislav Shwartsman
b25088bf2f Merge patch [1153327] ignore segment bases in x86-64 by Avi Kivity 2005-02-28 18:56:05 +00:00
Stanislav Shwartsman
41578589c1 Merge two patches by Avi Kivity (avik) 2005-02-22 18:24:19 +00:00
Stanislav Shwartsman
0d09a8c8a8 fix code duplication 2004-11-26 19:53:04 +00:00
Stanislav Shwartsman
69c0b06955 fixes in disassembler
split REPEAT instructions according to opsize to speedup execution
now each REPEATABLE instruction splitted to 3 different instructions, one for 16-bit operand size, one for 32-bit and one for 64-bit. Choosing of correct instruction occure in fetchdecode step.
2004-11-20 23:26:32 +00:00
Stanislav Shwartsman
231cd533c6 1. Update changes
2. Fix lazy_flags duplicate instruction patterns
2004-08-16 20:18:01 +00:00
Stanislav Shwartsman
12b68ede54 XADD and ADD instructions have same flags modification rules - remove redundant switch case 2004-08-14 20:44:48 +00:00
Christophe Bothamy
f4dbefad66 - fix bug reported by Thomas Weidner [ 877510 ] amd64 fixes... 2004-04-28 19:57:37 +00:00
Bryce Denney
5e520261db Add plugin support to Bochs by merging all the changes from the
BRANCH_PLUGINS branch!

Authors:
  Bryce Denney
  Christophe Bothamy
  Kevin Lawton (we grabbed a lot of plugin code from plex86)
Testing help from:
  Volker Ruppert
  Don Becker (Psyon)
  Jeremy Parsons (Br'fin)

The change log is too long to paste in here.  To read the change log, do
  cvs log patches/patch.final-from-BRANCH_PLUGINS.gz

All the changes and a detailed description are contained in a patch
called patch.final-from-BRANCH_PLUGINS.gz.  To look at the complete
patch, do
  cvs upd -r1.1 patches/patch.final-from-BRANCH_PLUGINS.gz

Then you will have a local copy of the patch, which you can gunzip and
play with however you want.

Modified Files:
    .bochsrc Makefile.in aclocal.m4 bochs.h config.h.in configure
    configure.in gdbstub.cc logio.cc main.cc pc_system.cc
    pc_system.h state_file.h bios/Makefile.in bios/rombios.c
    cpu/Makefile.in cpu/access.cc cpu/apic.cc cpu/arith16.cc
    cpu/arith32.cc cpu/arith8.cc cpu/cpu.cc cpu/cpu.h
    cpu/ctrl_xfer32.cc cpu/exception.cc cpu/fetchdecode.cc
    cpu/fetchdecode64.cc cpu/flag_ctrl.cc cpu/flag_ctrl_pro.cc
    cpu/init.cc cpu/io.cc cpu/logical16.cc cpu/logical32.cc
    cpu/logical8.cc cpu/paging.cc cpu/proc_ctrl.cc
    cpu/protect_ctrl.cc cpu/segment_ctrl_pro.cc cpu/shift16.cc
    cpu/shift32.cc cpu/stack64.cc cpu/string.cc cpu/tasking.cc
    debug/Makefile.in debug/dbg_main.cc disasm/Makefile.in
    doc/docbook/user/user.dbk dynamic/Makefile.in fpu/Makefile.in
    gui/Makefile.in gui/amigaos.cc gui/beos.cc gui/carbon.cc
    gui/control.cc gui/control.h gui/gui.cc gui/gui.h
    gui/keymap.cc gui/keymap.h gui/macintosh.cc gui/nogui.cc
    gui/rfb.cc gui/sdl.cc gui/sdlkeys.h gui/siminterface.cc
    gui/siminterface.h gui/term.cc gui/win32.cc gui/wx.cc
    gui/wxdialog.cc gui/wxdialog.h gui/wxmain.cc gui/wxmain.h
    gui/x.cc gui/keymaps/sdl-pc-de.map gui/keymaps/sdl-pc-us.map
    gui/keymaps/x11-pc-de.map instrument/example0/instrument.h
    instrument/example1/instrument.h
    instrument/stubs/instrument.cc instrument/stubs/instrument.h
    iodev/Makefile.in iodev/biosdev.cc iodev/biosdev.h
    iodev/cdrom.cc iodev/cmos.cc iodev/cmos.h iodev/devices.cc
    iodev/dma.cc iodev/dma.h iodev/eth_fbsd.cc iodev/eth_linux.cc
    iodev/eth_null.cc iodev/eth_tap.cc iodev/floppy.cc
    iodev/floppy.h iodev/guest2host.cc iodev/guest2host.h
    iodev/harddrv.cc iodev/harddrv.h iodev/iodebug.cc
    iodev/iodebug.h iodev/iodev.h iodev/keyboard.cc
    iodev/keyboard.h iodev/ne2k.cc iodev/ne2k.h iodev/parallel.cc
    iodev/parallel.h iodev/pci.cc iodev/pci.h iodev/pci2isa.cc
    iodev/pci2isa.h iodev/pic.cc iodev/pic.h iodev/pit.cc
    iodev/pit.h iodev/pit_wrap.cc iodev/pit_wrap.h iodev/sb16.cc
    iodev/sb16.h iodev/scancodes.cc iodev/scancodes.h
    iodev/serial.cc iodev/serial.h iodev/slowdown_timer.cc
    iodev/slowdown_timer.h iodev/unmapped.cc iodev/unmapped.h
    iodev/vga.cc iodev/vga.h memory/Makefile.in memory/memory.cc
    memory/memory.h memory/misc_mem.cc misc/bximage.c
    misc/niclist.c
Added Files:
    README-plugins extplugin.h ltdl.c ltdl.h ltdlconf.h.in
    ltmain.sh plugin.cc plugin.h
2002-10-24 21:07:56 +00:00
Stanislav Shwartsman
c5f0ef8c76 Removed duplicated definition of BX_SEG_REGS 2002-10-16 22:10:07 +00:00
Kevin Lawton
b8d7f5c88e Moved the asm() statements from the arithmetic instruction emulation
into inline functions with asm() statements in cpu.h.  This cleans
  up the *.cc code (which now doesn't have any asm()s in it), and
  centralizes the asm() code so constraints can be modified in one
  place.  This also makes it easier to cover more instructions
  with asm()s for more efficient eflags handling.
2002-10-07 22:51:58 +00:00
Kevin Lawton
8e33d2eda1 Integrated patches/patch.extra_eflags_asms from Jas, for more asm()
coverage of the high-frequency eflags instructions.  That should
  complete the asm() eflags updates for now, as we should be stabilizing
  moving towards bochs 2.0.
2002-10-03 18:12:40 +00:00
Kevin Lawton
1e5343b421 As a 1st effort to understand/debug the timer code, I cleanup
up pc_system.h.  Moved all variables under the private: section,
  as well as a few member functions.  The string instructions
  were accessing a field directly (only reads), so I indirected
  that via an inline member function for better abstraction.
2002-09-30 16:43:59 +00:00
Kevin Lawton
82fd79c546 Fixed/updated/cleaned repeat IO & memcpy speedups for Long mode.
Fixed/updated/cleaned guest2host TLB speedups for Long mode.

I now can boot the Linux x86-64 kernel to the VFS mount message,
using all the accelerations.
2002-09-24 04:43:59 +00:00
Kevin Lawton
0cd7346b9c - Added an instruction cache. Size is fixed for the moment,
but if you hand edit cpu/cpu.h, and change BxICacheEntries,
  you can try different sizes.  I'll make this more flexible
  with configure.  For now, use "--enable-icache" with no parameters.

- Modified fetchdecode.cc/fetchdecode64.cc just enough so that
  instructions which encode a direct address now use a memory
  resolution function which just sticks the immediate address
  into rm_addr.  With cached instructions we need this.
2002-09-19 19:17:20 +00:00
Kevin Lawton
4e51dcae40 Converted all the remaining available separate fields in bxInstruction_c
to bitfields.  bxInstruction_c is now 24 bytes, including 4 for
the memory addr resolution function pointer, and 4 for the
execution function pointer (16 + 4 + 4).

Coded more accessors, to abstract access from most code.
2002-09-18 08:00:43 +00:00
Kevin Lawton
6723ca9bf4 Moved more separate fields in the bxInstruction_c into bitfields
with accessors.  Had to touch a number of files to update the
access using the new accessors.

Moved rm_addr to the CPU structure, to slim down bxInstruction_c
and to prevent future instruction caching from getting sprayed
with writes to individual rm_addr fields.  There only needs to
be one.  Though need to deal with instructions which have
static non-modrm addresses, but which are using rm_addr since
that will change.

bxInstruction_c is down to about 40 bytes now.  Trying to
get down to 24 bytes.
2002-09-18 05:36:48 +00:00
Kevin Lawton
07b0df2a8a Updated accessing of modrm/sib addressing information to
use accessors.  This lets me work on compressing the
size of fetch-decode structure (now called bxInstruction_c).

I've reduced it down to about 76 bytes.  We should be able
to do much better soon.  I needed the abstraction of the
accessors, so I have a lot of freedom to re-arrange things
without making massive future changes.

Lost a few percent of performance in these mods, but my
main focus was to get the abstraction.
2002-09-17 22:50:53 +00:00
Kevin Lawton
798fc08585 (cpu64) Merged string.cc. 2002-09-15 05:09:18 +00:00
Bryce Denney
5fc31bcfda - this revision changes the way eflags are accessed throughout the cpu and
cpu64 directories.  Instead of using the macros introduced in cpu.h rev 1.37
  such as GetEFlagsDFLogical and SetEFlagsDF and ClearEFlagsDF, I made inline
  methods on the BX_CPU_C object that access the eflags fields.  The problem
  with the macros is that they cannot be used outside the BX_CPU_C object.  The
  macros have now been removed, and all references to eflags now use these new
  accessors.
- I debated whether to put the accessors as members of the BX_CPU_C object
  or members of the bx_flags_reg_t struct.  I chose to make them members
  of BX_CPU_C for two reasons: 1. the lazy flags are implemented as
  members of BX_CPU_C, and 2. the eflags are referenced in many many places
  and it is more compact without having to put eflags in front of each.  (The
  real problem with compactness is having to write BX_CPU_THIS_PTR in front of
  everything, but that's another story.)
- Kevin pointed out a major bug in my set accessor code.  What a difference a
  little tilde can make!  That is fixed now.
- modified: load32bitOShack.cc debug/dbg_main.cc
  and in both cpu and cpu64 directories:
    cpu.cc cpu.h ctrl_xfer_pro.cc debugstuff.cc exception.cc flag_ctrl.cc
    flag_ctrl_pro.cc init.cc io.cc io_pro.cc proc_ctrl.cc soft_int.cc
    string.cc vm8086.cc
2002-09-12 18:10:46 +00:00
Kevin Lawton
0d7a5fdf3c I rehashed the way the EFLAGS register was stored internally.
All the EFLAGS bits used to be cached in separate fields.  I left
a few of them in separate fields for now - might remove them
at some point also.  When the arithmetic fields are known
(ie they're not in lazy mode), they are all cached in a
32-bit EFLAGS image, just like the x86 EFLAGS register expects.
All other eflags are store in the 32-bit register also, with
a few also mirrored in separate fields for now.

The reason I did this, was so that on x86 hosts, asm() statements
can be #ifdef'd in to do the calculation and get the native
eflags results very cheaply.  Just to test that it works, I
coded ADD_EdId() and ADD_EwIw() with some conditionally compiled
asm()s for accelerated eflags processing and it works.

-Kevin
2002-09-08 04:08:14 +00:00
Kevin Lawton
54bc40971c Fixed repeated IO/string instruction acceleration bug. Not all the
checks were honoring the EFLAGS.DF bit, but assuming it was always
equal to 0 (increment upward).  Plus some general cleanup of the
acceleration code.

I left the default of '--enable-repeat-speedups' to disabled, but
it seems pretty solid.  Definitely adds performance for disk
heavy workloads.
2002-09-03 19:38:27 +00:00
Kevin Lawton
746f09b427 There's a bug in the repeated IO & mem copy speedups. I
added --enable-repeat-speedups with default to disabled.
Reconfigure/recompile and the speedup code will be #ifdef'd
out for now.  It manifested as junk written to the VGA screen
while booting/running Windows.

Also made some more mods to the main cpu loop.  Moved the
handling of EXT/errorno outside the main loop, much like
the extra EIP/ESP commits were moved, for a little better
performance.

I changed the fetch_ptr/bytesleft method of fetching to
a slightly different model, which calculates a window
for which EIP will be valid (land on the current page),
and a bias which when applied to EIP will be from
0..upper_page_limit.  Speed is about the same for either
method, but a pseudo-op/threaded-interpreter will plug
in better with this and be faster.
2002-09-02 18:44:35 +00:00
Kevin Lawton
3a5f338419 Integrated patches for:
- Paging code rehash.  You must now use --enable-4meg-pages to
    use 4Meg pages, with the default of disabled, since we don't well
    support 4Meg pages yet.  Paging table walks model a real CPU
    more closely now, and I fixed some bugs in the old logic.
  - Segment check redundancy elimination.  After a segment is loaded,
    reads and writes are marked when a segment type check succeeds, and
    they are skipped thereafter, when possible.
  - Repeated IO and memory string copy acceleration.  Only some variants
    of instructions are available on all platforms, word and dword
    variants only on x86 for the moment due to alignment and endian issues.
    This is compiled in currently with no option - I should add a configure
    option.
  - Added a guest linear address to host TLB.  Actually, I just stick
    the host address (mem.vector[addr] address) in the upper 29 bits
    of the field 'combined_access' since they are unused.  Convenient
    for now.  I'm only storing page frame addresses.  This was the
    simplest for of such a TLB.  We can likely enhance this.  Also,
    I only accelerated the normal read/write routines in access.cc.
    Could also modify the read-modify-write versions too.  You must
    use --enable-guest2host-tlb, to try this out.  Currently speeds
    up Win95 boot time by about 3.5% for me.  More ground to cover...
  - Minor mods to CPUI/MOV_CdRd for CMOV.
  - Integrated enhancements from Volker to getHostMemAddr() for PCI
    being enabled.
2002-09-01 20:12:09 +00:00