2. Fixed bug
[ 989478 ] I-Cache and undefined Instruktions
The L4 microkernel uses an undefined instruction to
trap for a special requests into the kernel (LOCK NOP).
The handler fixes this up and gives the user a special
code page with syscall stubs. If you're not using the
I-Cache optimization everthing works find on bochs. But
if you enable the I-Cache (--enable-icache), then the
undefined opcode exception is thrown only once for ever
virtual address it occurs. See the demodisk of the
L4KA::pistachio
(http://www.l4ka.org/projects/pistachio/download.php).
In this case the pingpong benchmark of this demo is of
interest. Everything runs fine until the program tries
to spawn a new task for its measurements. This new task
shares the code of the creating program. But the new
task stops executing at the undefined instruction
explained above and no exception is thrown.
bochs.h already not include iodev.h which reduces compilation dependences for almost all cpu and fpu files, now cpu files will not be recompiled if iodev includes was changed
configure script option --enable-magic-breakpoints (enabled by default).
Documented the instruction required to trigger the magic breakpoint
(xchgw %bx,%bx).
* changed all %ll format descriptions to FMT_LL macro so that
Microsoft Visual C works correctly (it uses %I64)
* missing type conversions added
* cdrom.cc: variable types for win32 fixed
* removed some unused variables in eth_win32.cc and harddrv.cc
* added missing includes in make_cmos_image.c and niclist.c
check.
Commented out a number of instances of invalidate_prefetch_q(),
for branches which do not change CS since the EIP window mechanism
takes care of validating that EIP lands in the current page or not
in the main cpu loop anyways.
Fixed a couple cases (v8086 mode and real mode) of loading CS where
the EIP page window was not invalidated in segment_ctrl_pro.cc.
That may fix some aliasing problems reported before (OS2).
1) fixed the type of "hostPageAddr" and associated typecasts.
2) fixed the type of "pages" and associated typecasts (overloaded variable)
3) patch to cpu.cc to calculate "eipPageBias" correctly in 64 bit mode
* renamed CPU_ID to BX_CPU_ID.
with this new name there is no possibility for name contentions and BX_CPU_ID
definition could be moved out to NEED_CPU_REG_SHORTCUTS block
* returned back `unsigned BX_CPU::which_cpu(void)` function
* added BX_CPU_ID parameter for
BX_INSTR_PHY_READ(a20addr, len);
BX_INSTR_PHY_WRITE(a20addr, len);
now it will be
BX_INSTR_PHY_READ(cpu_id, a20addr, len);
BX_INSTR_PHY_WRITE(cpu_id, a20addr, len);
In bx_cpu_c::reset method I set bx_cpu->async_event to 2
so execution in the cpu_loop gets stopped early.
Previously, async_event was set to 0, and with repeatable
instructions, after reset, eip was incremented by the instruction
length, so execution would resume at 0xffffX (X being >0, the current
instruction length).
In halt state I check now for reset with async_event is 2, so
reset works also when the cpu is halted. (update to Peter change)
I hope I fixed this the right way, please report any strange behaviour.
- Now compiles for plain ia-32
- Fixed some printf formatting for ia32 only.
- Update to latest Win32 DLL
- Added an ICEBP (Undoc 0xF8, INT 01) facility.
- updated to use latest VGA refresh routine
"bx_bool" which is always defined as Bit32u on all platforms. In Carbon
specific code, Boolean is still used because the Carbon header files
define it to unsigned char.
- this fixes bug [ 623152 ] MacOSX: Triple Exception Booting win95.
The bug was that some code in Bochs depends on Boolean to be a
32 bit value. (This should be fixed, but I don't know all the places
where it needs to be fixed yet.) Because Carbon defined Boolean as
an unsigned char, Bochs just followed along and used the unsigned char
definition to avoid compile problems. This exposed the dependency
on 32 bit Boolean on MacOS X only and led to major simulation problems,
that could only be reproduced and debugged on that platform.
- On the mailing list we debated whether to make all Booleans into "bool" or
our own type. I chose bx_bool for several reasons.
1. Unlike C++'s bool, we can guarantee that bx_bool is the same size on all
platforms, which makes it much less likely to have more platform-specific
simulation differences in the future. (I spent hours on a borrowed
MacOSX machine chasing bug 618388 before discovering that different sized
Booleans were the problem, and I don't want to repeat that.)
2. We still have at least one dependency on 32 bit Booleans which must be
fixed some time, but I don't want to risk introducing new bugs into the
simulation just before the 2.0 release.
Modified Files:
bochs.h config.h.in gdbstub.cc logio.cc main.cc pc_system.cc
pc_system.h plugin.cc plugin.h bios/rombios.c cpu/apic.cc
cpu/arith16.cc cpu/arith32.cc cpu/arith64.cc cpu/arith8.cc
cpu/cpu.cc cpu/cpu.h cpu/ctrl_xfer16.cc cpu/ctrl_xfer32.cc
cpu/ctrl_xfer64.cc cpu/data_xfer16.cc cpu/data_xfer32.cc
cpu/data_xfer64.cc cpu/debugstuff.cc cpu/exception.cc
cpu/fetchdecode.cc cpu/flag_ctrl_pro.cc cpu/init.cc
cpu/io_pro.cc cpu/lazy_flags.cc cpu/lazy_flags.h cpu/mult16.cc
cpu/mult32.cc cpu/mult64.cc cpu/mult8.cc cpu/paging.cc
cpu/proc_ctrl.cc cpu/segment_ctrl_pro.cc cpu/stack_pro.cc
cpu/tasking.cc debug/dbg_main.cc debug/debug.h debug/sim2.cc
disasm/dis_decode.cc disasm/disasm.h doc/docbook/Makefile
docs-html/cosimulation.html fpu/wmFPUemu_glue.cc
gui/amigaos.cc gui/beos.cc gui/carbon.cc gui/gui.cc gui/gui.h
gui/keymap.cc gui/keymap.h gui/macintosh.cc gui/nogui.cc
gui/rfb.cc gui/sdl.cc gui/siminterface.cc gui/siminterface.h
gui/term.cc gui/win32.cc gui/wx.cc gui/wxmain.cc gui/wxmain.h
gui/x.cc instrument/example0/instrument.cc
instrument/example0/instrument.h
instrument/example1/instrument.cc
instrument/example1/instrument.h
instrument/stubs/instrument.cc instrument/stubs/instrument.h
iodev/cdrom.cc iodev/cdrom.h iodev/cdrom_osx.cc iodev/cmos.cc
iodev/devices.cc iodev/dma.cc iodev/dma.h iodev/eth_arpback.cc
iodev/eth_packetmaker.cc iodev/eth_packetmaker.h
iodev/floppy.cc iodev/floppy.h iodev/guest2host.h
iodev/harddrv.cc iodev/harddrv.h iodev/ioapic.cc
iodev/ioapic.h iodev/iodebug.cc iodev/iodev.h
iodev/keyboard.cc iodev/keyboard.h iodev/ne2k.h
iodev/parallel.h iodev/pci.cc iodev/pci.h iodev/pic.h
iodev/pit.cc iodev/pit.h iodev/pit_wrap.cc iodev/pit_wrap.h
iodev/sb16.cc iodev/sb16.h iodev/serial.cc iodev/serial.h
iodev/vga.cc iodev/vga.h memory/memory.h memory/misc_mem.cc
into inline functions with asm() statements in cpu.h. This cleans
up the *.cc code (which now doesn't have any asm()s in it), and
centralizes the asm() code so constraints can be modified in one
place. This also makes it easier to cover more instructions
with asm()s for more efficient eflags handling.
hack with longjmp() back to cpu.cc main decode loop, and added a
check in there to return control when bx_guard.special_unwind_stack
is set (compiling with debugger enabled only).
If in the debugger you try to execute further instructions
(which you shouldn't), other fields need to be reset I would
think, such as EXT and errorno, and have to make sure ESP/EIP
are corrected properly. Basically, this hack is only good
for examining the current situation of a nasty fault.
exit out of cpu_loop() and back to the caller can be honored.
Previously, code in this function was a part of cpu_loop so
a "return;" would already do that. Now, a value is passed
back to cpu_loop() to denote such a request, and then a return
is executed from cpu_loop().
I haven't tested this yet, but previously I must have broke
certain debugging requests by moving the code to a separate
function and not fixing the "return;" statements.
Symptom: Linux kernel 2.4.19 would hang in random places. CPU still
running, but in dle loop.
Cause: if APIC interrupt occurred while a PIC interrupt was pending, the
PIC interrupt would be lost. This is because either an APIC or PIC
interrupt would trash any pending interrupt event because INTR is only a state,
not an event queue.
Temporary fix: reworked apic.cc to have it's own copy of INTR state. cpu.cc now
checks for both cpu.INTR and local_apic.INTR.
Need to do further research to see if local_apic and pic can be integrated in such
a way as properly manage the combined effects of both devices accessing INTR state.
32-bits rather than 64. This is possible, because there is
always an active null (heartbeat) timer, with periodicity
of less than or equal to the maximum 32-bit int value.
This generates a little less code in the hot part of cpu_loop,
and saved about 3% execution time on a Win95 boot.
Moved the asynchronous handling code from cpu_loop() to its
own function since it's a long path. This neatened up the
code a little (less gotos and all), and made it more clear
to use a "while (1)" around the iterative code in cpu_loop().
now simply return a cached value which is set upon mode changes.
The biggest problem was protected_mode() which did something like:
return CR0.PM && ! EFLAGS.VM
This adds up when it was being executed many times in branch functions
etc. Now, cached values are set and sampled instead.
Some things changed in the ctrl_xfer*.cc, fetchdecode*.cc,
and cpu.cc since the original patches, so I did some patch
integration by hand. Check the placement of the
macros BX_INSTR_FETCH_DECODE_COMPLETED() and BX_INSTR_OPCODE()
in cpu.cc to make sure I go them right. Also, I changed the
parameters to BX_INSTR_OPCODE() to update them to the new code.
I put some comments before each of these to help determine if
the placement is right.
These macros are only compiled in if you are gathering instrumentation
data from bochs, so they shouldn't effect others.
use getB_CF() etc. getB_CF() and friends are only for a relatively
small number of cases where a true boolean/binary number (0 or 1) is required
rather than 0 or non-0 as is returned by get_CF().
of (1 & (val32>>N)), and added a getB_?F() accessor for special
cases which need a strict binary value (exactly 0 or 1). Most
code only needed a value for logical comparison. I modified the
special cases which do need a binary number for shifting and
comparison between flags, to use the special getB_?F() accessor.
Cleaned up memory.cc functions a little, now that all accesses
are within a single page.
Fixed a (not very likely encountered) bug in fetchdecode.cc (and
fetchdecode64.cc) where a 2-byte opcode starting with a prefix
starts at the last offset on a page. There were no checks
on the segment overrides for a boundary condition. I added them.
The eflags enhancements added just a tiny bit of performance.
so frequently.
Coded asm() statements for INC/DEC_ERX() instructions.
Cleaned up the iCache a litle including a bug fix. The
generation ID was decrementing the whole field including
some high meta bits. That could roll over after 1 Billion
cycles. I know only decrement if the field is valid, to
save the write.
I implemented inline functions which can serve the value of
the arithmetic flags if they are cached, and redirect to
the lazy_flags.cc routines if not.
Most of this was just prep work for adding more asm() statements
for native eflags processing when on x86.
in cpu.cc out of the main loop, and into the asynchronous
events handling. I went through all the code paths, and
there doesn't seem to be any reason for that code to be
in the hot loop.
Added another accessor for getting instruction data, called
modC0(). A lot of instructions test whether the mod field
of mod-nnn-rm is 0xc0 or not, ie., it's a register operation
and not memory. So I flag this in fetchdecode{,64}.cc.
This added on the order of 1% performance improvement for
a Win95 boot.
Macroized a few leftover calls to Write_RMV_virtual_xyz()
that didn't get modified in the x86-64 merge. Really, they
just call the real function for now, but I want to have them
available to do direct writes with the guest2host TLB pointers.
but if you hand edit cpu/cpu.h, and change BxICacheEntries,
you can try different sizes. I'll make this more flexible
with configure. For now, use "--enable-icache" with no parameters.
- Modified fetchdecode.cc/fetchdecode64.cc just enough so that
instructions which encode a direct address now use a memory
resolution function which just sticks the immediate address
into rm_addr. With cached instructions we need this.
to bitfields. bxInstruction_c is now 24 bytes, including 4 for
the memory addr resolution function pointer, and 4 for the
execution function pointer (16 + 4 + 4).
Coded more accessors, to abstract access from most code.
with accessors. Had to touch a number of files to update the
access using the new accessors.
Moved rm_addr to the CPU structure, to slim down bxInstruction_c
and to prevent future instruction caching from getting sprayed
with writes to individual rm_addr fields. There only needs to
be one. Though need to deal with instructions which have
static non-modrm addresses, but which are using rm_addr since
that will change.
bxInstruction_c is down to about 40 bytes now. Trying to
get down to 24 bytes.
use accessors. This lets me work on compressing the
size of fetch-decode structure (now called bxInstruction_c).
I've reduced it down to about 76 bytes. We should be able
to do much better soon. I needed the abstraction of the
accessors, so I have a lot of freedom to re-arrange things
without making massive future changes.
Lost a few percent of performance in these mods, but my
main focus was to get the abstraction.
be used at all, and Peter didn't want it. "extdb.o" is compiled
into libcpu.a, if configured for it.
Removed a few #warnings for x86-64 compile, based on Peter's
line-item comments regarding the warnings I inserted during
the port/merge.
cpu64 directories. Instead of using the macros introduced in cpu.h rev 1.37
such as GetEFlagsDFLogical and SetEFlagsDF and ClearEFlagsDF, I made inline
methods on the BX_CPU_C object that access the eflags fields. The problem
with the macros is that they cannot be used outside the BX_CPU_C object. The
macros have now been removed, and all references to eflags now use these new
accessors.
- I debated whether to put the accessors as members of the BX_CPU_C object
or members of the bx_flags_reg_t struct. I chose to make them members
of BX_CPU_C for two reasons: 1. the lazy flags are implemented as
members of BX_CPU_C, and 2. the eflags are referenced in many many places
and it is more compact without having to put eflags in front of each. (The
real problem with compactness is having to write BX_CPU_THIS_PTR in front of
everything, but that's another story.)
- Kevin pointed out a major bug in my set accessor code. What a difference a
little tilde can make! That is fixed now.
- modified: load32bitOShack.cc debug/dbg_main.cc
and in both cpu and cpu64 directories:
cpu.cc cpu.h ctrl_xfer_pro.cc debugstuff.cc exception.cc flag_ctrl.cc
flag_ctrl_pro.cc init.cc io.cc io_pro.cc proc_ctrl.cc soft_int.cc
string.cc vm8086.cc
This adds a whole new directory cpu64 with the new emulation code.
Very few changes were necessary outside cpu64. To try it, configure
with --enable-x86-64 and make.
- also this adds Peter Tattam's external debugger interface.
- modified files: Makefile.in bochs.h config.h.in configure.in
load32bitOShack.cc logio.cc cpu/Makefile.in cpu/cpu.cc debug/dbg_main.cc
- added files: cpu/extdb.cc cpu/extdb.h and cpu64/*
> This is the bug fix to make the reset button work properly when the cpu
> is in the halt state. There is another patch in init.cc as well to clear
> async_event. If you don't do this, if a cpu goes into HLT, the only thing
> which will fix it is another interrupt. The reset button won't work.
All the EFLAGS bits used to be cached in separate fields. I left
a few of them in separate fields for now - might remove them
at some point also. When the arithmetic fields are known
(ie they're not in lazy mode), they are all cached in a
32-bit EFLAGS image, just like the x86 EFLAGS register expects.
All other eflags are store in the 32-bit register also, with
a few also mirrored in separate fields for now.
The reason I did this, was so that on x86 hosts, asm() statements
can be #ifdef'd in to do the calculation and get the native
eflags results very cheaply. Just to test that it works, I
coded ADD_EdId() and ADD_EwIw() with some conditionally compiled
asm()s for accelerated eflags processing and it works.
-Kevin
access routines in access.cc, completing the upgrade of
those routines. You do need '--enable-guest2host-tlb', before
you get the speedups for now. The guest2host mods seem pretty
solid, though I do need to see what effects the A20 line has
on this cache and the paging TLB in general.
added --enable-repeat-speedups with default to disabled.
Reconfigure/recompile and the speedup code will be #ifdef'd
out for now. It manifested as junk written to the VGA screen
while booting/running Windows.
Also made some more mods to the main cpu loop. Moved the
handling of EXT/errorno outside the main loop, much like
the extra EIP/ESP commits were moved, for a little better
performance.
I changed the fetch_ptr/bytesleft method of fetching to
a slightly different model, which calculates a window
for which EIP will be valid (land on the current page),
and a bias which when applied to EIP will be from
0..upper_page_limit. Speed is about the same for either
method, but a pseudo-op/threaded-interpreter will plug
in better with this and be faster.