Commit Graph

472 Commits

Author SHA1 Message Date
Paolo Bonzini
90f641531c target/i386: use separate MMU indexes for 32-bit accesses
Accesses from a 32-bit environment (32-bit code segment for instruction
accesses, EFER.LMA==0 for processor accesses) have to mask away the
upper 32 bits of the address.  While a bit wasteful, the easiest way
to do so is to use separate MMU indexes.  These days, QEMU anyway is
compiled with a fixed value for NB_MMU_MODES.  Split MMU_USER_IDX,
MMU_KSMAP_IDX and MMU_KNOSMAP_IDX in two.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-28 00:23:39 +01:00
Paolo Bonzini
5f97afe254 target/i386: introduce function to query MMU indices
Remove knowledge of specific MMU indexes (other than MMU_NESTED_IDX and
MMU_PHYS_IDX) from mmu_translate().  This will make it possible to split
32-bit and 64-bit MMU indexes.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-28 00:23:39 +01:00
Paolo Bonzini
d09c79010f target/i386: check validity of VMCB addresses
MSR_VM_HSAVE_PA bits 0-11 are reserved, as are the bits above the
maximum physical address width of the processor.  Setting them to
1 causes a #GP (see "15.30.4 VM_HSAVE_PA MSR" in the AMD manual).

The same is true of VMCB addresses passed to VMRUN/VMLOAD/VMSAVE,
even though the manual is not clear on that.

Cc: qemu-stable@nongnu.org
Fixes: 4a1e9d4d11 ("target/i386: Use atomic operations for pte updates", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-28 00:23:39 +01:00
Paolo Bonzini
68fb78d7d5 target/i386: mask high bits of CR3 in 32-bit mode
CR3 bits 63:32 are ignored in 32-bit mode (either legacy 2-level
paging or PAE paging).  Do this in mmu_translate() to remove
the last where get_physical_address() meaningfully drops the high
bits of the address.

Cc: qemu-stable@nongnu.org
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Fixes: 4a1e9d4d11 ("target/i386: Use atomic operations for pte updates", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-28 00:23:38 +01:00
Ziqiao Kong
99d0dcd7f1 target/i386: Generate an illegal opcode exception on cmp instructions with lock prefix
target/i386: As specified by Intel Manual Vol2 3-180, cmp instructions
are not allowed to have lock prefix and a `UD` should be raised. Without
this patch, s1->T0 will be uninitialized and used in the case OP_CMPL.

Signed-off-by: Ziqiao Kong <ziqiaokong@gmail.com>
Message-ID: <20240215095015.570748-2-ziqiaokong@gmail.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-16 13:56:09 +01:00
Bui Quang Minh
774204cf98 apic, i386/tcg: add x2apic transitions
This commit adds support for x2APIC transitions when writing to
MSR_IA32_APICBASE register and finally adds CPUID_EXT_X2APIC to
TCG_EXT_FEATURES.

The set_base in APICCommonClass now returns an integer to indicate error in
execution. apic_set_base return -1 on invalid APIC state transition,
accelerator can use this to raise appropriate exception.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Message-Id: <20240111154404.5333-4-minhquangbui99@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-02-14 06:09:32 -05:00
Bui Quang Minh
b2101358e5 i386/tcg: implement x2APIC registers MSR access
This commit creates apic_register_read/write which are used by both
apic_mem_read/write for MMIO access and apic_msr_read/write for MSR access.

The apic_msr_read/write returns -1 on error, accelerator can use this to
raise the appropriate exception.

Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com>
Message-Id: <20240111154404.5333-2-minhquangbui99@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-02-14 06:09:32 -05:00
Richard Henderson
3b91614004 include/exec: Change cpu_mmu_index argument to CPUState
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-03 16:46:10 +10:00
Philippe Mathieu-Daudé
ec1d32af12 target/i386: Extract x86_cpu_exec_halt() from accel/tcg/
Move this x86-specific code out of the generic accel/tcg/.

Reported-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240124101639.30056-10-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-01-29 21:04:10 +10:00
Philippe Mathieu-Daudé
6ae754815f target/i386: Extract x86_need_replay_interrupt() from accel/tcg/
Move this x86-specific code out of the generic accel/tcg/.

Reviewed-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240124101639.30056-8-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-01-29 21:04:10 +10:00
Richard Henderson
1764ad70ce include/qemu: Add TCGCPUOps typedef to typedefs.h
QEMU coding style recommends using structure typedefs.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-01-29 21:04:10 +10:00
Anton Johansson
32f0c394bb target: Use vaddr in gen_intermediate_code
Makes gen_intermediate_code() signature target agnostic so the function
can be called from accel/tcg/translate-all.c without target specifics.

Signed-off-by: Anton Johansson <anjo@rev.ng>
Message-Id: <20240119144024.14289-9-anjo@rev.ng>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-01-29 07:06:03 +10:00
Peter Maydell
3f2a357b95 HW core patch queue
. Deprecate unmaintained SH-4 models (Samuel)
 . HPET: Convert DPRINTF calls to trace events (Daniel)
 . Implement buffered block writes in Intel PFlash (Gerd)
 . Ignore ELF loadable segments with zero size (Bin)
 . ESP/NCR53C9x: PCI DMA fixes (Mark)
 . PIIX: Simplify Xen PCI IRQ routing (Bernhard)
 . Restrict CPU 'start-powered-off' property to sysemu (Phil)
 
 . target/alpha: Only build sys_helper.c on system emulation (Phil)
 . target/xtensa: Use generic instruction breakpoint API & add test (Max)
 . Restrict icount to system emulation (Phil)
 . Do not set CPUState TCG-specific flags in non-TCG accels (Phil)
 . Cleanup TCG tb_invalidate API (Phil)
 . Correct LoongArch/KVM include path (Bibo)
 . Do not ignore throttle errors in crypto backends (Phil)
 
 . MAINTAINERS updates (Raphael, Zhao)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmWqXbkACgkQ4+MsLN6t
 wN6VVBAAkP/Bs2JfQYobPZVV868wceM97KeUJMXP2YWf6dSLpHRCQN5KtuJcACM9
 y3k3R7nMeVJSGmzl/1gF1G9JhjoCLoVLX/ejeBppv4Wq//9sEdggaQfdCwkhWw2o
 IK/gPjTZpimE7Er4hPlxmuhSRuM1MX4duKFRRfuZpE7XY14Y7/Hk12VIG7LooO0x
 2Sl8CaU0DN7CWmRVDoUkwVx7JBy28UVarRDsgpBim7oKmjjBFnCJkH6B6NJXEiYr
 z1BmIcHa87S09kG1ek+y8aZpG9iPC7nUWjPIQyJGhnfrnBuO7hQHwCLIjHHp5QBR
 BoMr8YQNTI34/M/D8pBfg96LrGDjkQOfwRyRddkMP/jJcNPMAPMNGbfVaIrfij1e
 T+jFF4gQenOvy1XKCY3Uk/a11P3tIRFBEeOlzzQg4Aje9W2MhUNwK2HTlRfBbrRr
 V30R764FDmHlsyOu6/E3jqp4GVCgryF1bglPOBjVEU5uytbQTP8jshIpGVnxBbF+
 OpFwtsoDbsousNKVcO5+B0mlHcB9Ru9h11M5/YD/jfLMk95Ga90JGdgYpqQ5tO5Y
 aqQhKfCKbfgKuKhysxpsdWAwHZzVrlSf+UrObF0rl2lMXXfcppjCqNaw4QJ0oedc
 DNBxTPcCE2vWhUzP3A60VH7jLh4nLaqSTrxxQKkbx+Je1ERGrxs=
 =KmQh
 -----END PGP SIGNATURE-----

Merge tag 'hw-cpus-20240119' of https://github.com/philmd/qemu into staging

HW core patch queue

. Deprecate unmaintained SH-4 models (Samuel)
. HPET: Convert DPRINTF calls to trace events (Daniel)
. Implement buffered block writes in Intel PFlash (Gerd)
. Ignore ELF loadable segments with zero size (Bin)
. ESP/NCR53C9x: PCI DMA fixes (Mark)
. PIIX: Simplify Xen PCI IRQ routing (Bernhard)
. Restrict CPU 'start-powered-off' property to sysemu (Phil)

. target/alpha: Only build sys_helper.c on system emulation (Phil)
. target/xtensa: Use generic instruction breakpoint API & add test (Max)
. Restrict icount to system emulation (Phil)
. Do not set CPUState TCG-specific flags in non-TCG accels (Phil)
. Cleanup TCG tb_invalidate API (Phil)
. Correct LoongArch/KVM include path (Bibo)
. Do not ignore throttle errors in crypto backends (Phil)

. MAINTAINERS updates (Raphael, Zhao)

# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmWqXbkACgkQ4+MsLN6t
# wN6VVBAAkP/Bs2JfQYobPZVV868wceM97KeUJMXP2YWf6dSLpHRCQN5KtuJcACM9
# y3k3R7nMeVJSGmzl/1gF1G9JhjoCLoVLX/ejeBppv4Wq//9sEdggaQfdCwkhWw2o
# IK/gPjTZpimE7Er4hPlxmuhSRuM1MX4duKFRRfuZpE7XY14Y7/Hk12VIG7LooO0x
# 2Sl8CaU0DN7CWmRVDoUkwVx7JBy28UVarRDsgpBim7oKmjjBFnCJkH6B6NJXEiYr
# z1BmIcHa87S09kG1ek+y8aZpG9iPC7nUWjPIQyJGhnfrnBuO7hQHwCLIjHHp5QBR
# BoMr8YQNTI34/M/D8pBfg96LrGDjkQOfwRyRddkMP/jJcNPMAPMNGbfVaIrfij1e
# T+jFF4gQenOvy1XKCY3Uk/a11P3tIRFBEeOlzzQg4Aje9W2MhUNwK2HTlRfBbrRr
# V30R764FDmHlsyOu6/E3jqp4GVCgryF1bglPOBjVEU5uytbQTP8jshIpGVnxBbF+
# OpFwtsoDbsousNKVcO5+B0mlHcB9Ru9h11M5/YD/jfLMk95Ga90JGdgYpqQ5tO5Y
# aqQhKfCKbfgKuKhysxpsdWAwHZzVrlSf+UrObF0rl2lMXXfcppjCqNaw4QJ0oedc
# DNBxTPcCE2vWhUzP3A60VH7jLh4nLaqSTrxxQKkbx+Je1ERGrxs=
# =KmQh
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 19 Jan 2024 11:32:09 GMT
# gpg:                using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD  6BB2 E3E3 2C2C DEAD C0DE

* tag 'hw-cpus-20240119' of https://github.com/philmd/qemu: (36 commits)
  configure: Add linux header compile support for LoongArch
  MAINTAINERS: Update hw/core/cpu.c entry
  MAINTAINERS: Update Raphael Norwitz email
  hw/elf_ops: Ignore loadable segments with zero size
  hw/scsi/esp-pci: set DMA_STAT_BCMBLT when BLAST command issued
  hw/scsi/esp-pci: synchronise setting of DMA_STAT_DONE with ESP completion interrupt
  hw/scsi/esp-pci: generate PCI interrupt from separate ESP and PCI sources
  hw/scsi/esp-pci: use correct address register for PCI DMA transfers
  target/riscv: Rename tcg_cpu_FOO() to include 'riscv'
  target/i386: Rename tcg_cpu_FOO() to include 'x86'
  hw/s390x: Rename cpu_class_init() to include 'sclp'
  hw/core/cpu: Rename cpu_class_init() to include 'common'
  accel: Rename accel_init_ops_interfaces() to include 'system'
  cpus: Restrict 'start-powered-off' property to system emulation
  system/watchpoint: Move TCG specific code to accel/tcg/
  system/replay: Restrict icount to system emulation
  hw/pflash: implement update buffer for block writes
  hw/pflash: use ldn_{be,le}_p and stn_{be,le}_p
  hw/pflash: refactor pflash_data_write()
  hw/i386/pc_piix: Make piix_intx_routing_notifier_xen() more device independent
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-01-19 11:39:38 +00:00
Philippe Mathieu-Daudé
e129593f6f target/i386: Rename tcg_cpu_FOO() to include 'x86'
The tcg_cpu_FOO() names are x86 specific, so rename
them as x86_tcg_cpu_FOO() (as other names in this file)
to ease navigating the code.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Message-ID: <20240111120221.35072-5-philmd@linaro.org>
2024-01-19 12:28:59 +01:00
Paolo Bonzini
729ba8e933 target/i386: pcrel: store low bits of physical address in data[0]
For PC-relative translation blocks, env->eip changes during the
execution of a translation block, Therefore, QEMU must be able to
recover an instruction's PC just from the TranslationBlock struct and
the instruction data with.  Because a TB will not span two pages, QEMU
stores all the low bits of EIP in the instruction data and replaces them
in x86_restore_state_to_opc.  Bits 12 and higher (which may vary between
executions of a PCREL TB, since these only use the physical address in
the hash key) are kept unmodified from env->eip.  The assumption is that
these bits of EIP, unlike bits 0-11, will not change as the translation
block executes.

Unfortunately, this is incorrect when the CS base is not aligned to a page.
Then the linear address of the instructions (i.e. the one with the
CS base addred) indeed will never span two pages, but bits 12+ of EIP
can actually change.  For example, if CS base is 0x80262200 and EIP =
0x6FF4, the first instruction in the translation block will be at linear
address 0x802691F4.  Even a very small TB will cross to EIP = 0x7xxx,
while the linear addresses will remain comfortably within a single page.

The fix is simply to use the low bits of the linear address for data[0],
since those don't change.  Then x86_restore_state_to_opc uses tb->cs_base
to compute a temporary linear address (referring to some unknown
instruction in the TB, but with the correct values of bits 12 and higher);
the low bits are replaced with data[0], and EIP is obtained by subtracting
again the CS base.

Huge thanks to Mark Cave-Ayland for the image and initial debugging,
and to Gitlab user @kjliew for help with bisecting another occurrence
of (hopefully!) the same bug.

It should be relatively easy to write a testcase that performs MMIO on
an EIP with different bits 12+ than the first instruction of the translation
block; any help is welcome.

Fixes: e3a79e0e87 ("target/i386: Enable TARGET_TB_PCREL", 2022-10-11)
Cc: qemu-stable@nongnu.org
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: Richard Henderson <richard.henderson@linaro.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1759
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1964
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2012
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-01-18 10:43:42 +01:00
guoguangyao
2926eab896 target/i386: fix incorrect EIP in PC-relative translation blocks
The PCREL patches introduced a bug when updating EIP in the !CF_PCREL case.
Using s->pc in func gen_update_eip_next() solves the problem.

Cc: qemu-stable@nongnu.org
Fixes: b5e0d5d22f ("target/i386: Fix 32-bit wrapping of pc/eip computation")
Signed-off-by: guoguangyao <guoguangyao18@mails.ucas.ac.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240115020804.30272-1-guoguangyao18@mails.ucas.ac.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-01-18 10:43:14 +01:00
Richard Henderson
a58506b748 target/i386: Do not re-compute new pc with CF_PCREL
With PCREL, we have a page-relative view of EIP, and an
approximation of PC = EIP+CSBASE that is good enough to
detect page crossings.  If we try to recompute PC after
masking EIP, we will mess up that approximation and write
a corrupt value to EIP.

We already handled masking properly for PCREL, so the
fix in b5e0d5d2 was only needed for the !PCREL path.

Cc: qemu-stable@nongnu.org
Fixes: b5e0d5d22f ("target/i386: Fix 32-bit wrapping of pc/eip computation")
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240101230617.129349-1-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-01-18 10:43:14 +01:00
Stefan Hajnoczi
195801d700 system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()
The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().

The "iothread" name is historic and comes from when the main thread was
split into into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a separate concept unrelated to the BQL.

The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void bql_lock(void)
- void bql_unlock(void)
- bool bql_locked(void)

There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Fabiano Rosas <farosas@suse.de>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com>
Acked-by: Hyman Huang <yong.huang@smartx.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-id: 20240102153529.486531-2-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-01-08 10:45:43 -05:00
Paolo Bonzini
405c7c0708 target/i386: implement CMPccXADD
The main difficulty here is that a page fault when writing to the destination
must not overwrite the flags.  Therefore, the flags computation must be
inlined instead of using gen_jcc1*.

For simplicity, I am using an unconditional cmpxchg operation, that becomes
a NOP if the comparison fails.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:04:40 +01:00
Paolo Bonzini
e7bbb7cb71 target/i386: introduce flags writeback mechanism
ALU instructions can write to both memory and flags.  If the CC_SRC*
and CC_DST locations have been written already when a memory access
causes a fault, the value in CC_SRC* and CC_DST might be interpreted
with the wrong CC_OP (the one that is in effect before the instruction.

Besides just using the wrong result for the flags, something like
subtracting -1 can have disastrous effects if the current CC_OP is
CC_OP_EFLAGS: this is because QEMU does not expect bits outside the ALU
flags to be set in CC_SRC, and env->eflags can end up set to all-ones.
In the case of the attached testcase, this sets IOPL to 3 and would
cause an assertion failure if SUB is moved to the new decoder.

This mechanism is not really needed for BMI instructions, which can
only write to a register, but put it to use anyway for cleanliness.
In the case of BZHI, the code has to be modified slightly to ensure
that decode->cc_src is written, otherwise the new assertions trigger.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:04:30 +01:00
Paolo Bonzini
4b2baf4a55 target/i386: adjust decoding of J operand
gen_jcc() has been changed to accept a relative offset since the
new decoder was written.  Adjust the J operand, which is meant
to be used with jump instructions such as gen_jcc(), to not
include the program counter and to not truncate the result, as
both operations are now performed by common code.

The result is that J is now the same as the I operand.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:04:30 +01:00
Paolo Bonzini
d4f611711a target/i386: move operand load and writeback out of gen_cmovcc1
Similar to gen_setcc1, make gen_cmovcc1 receive TCGv.  This is more friendly
to simultaneous implementation in the old and the new decoder.

A small wart is that s->T0 of CMOV is currently the *second* argument (which
would ordinarily be in T1).  Therefore, the condition has to be inverted in
order to overwrite s->T0 with cpu_regs[reg] if the MOV is not performed.

This only applies to the old decoder, and this code will go away soon.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:04:15 +01:00
Paolo Bonzini
3497f1646f target/i386: prepare for implementation of STOS/SCAS in new decoder
Do not use gen_op, and pull the load from the accumulator into
disas_insn.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:03:59 +01:00
Paolo Bonzini
9a5922d6bd target/i386: do not use s->tmp0 for jumps on ECX ==/!= 0
Create a new temporary, to ease the register allocator's work.

Creation of the temporary is pushed into gen_ext_tl, which
also allows NULL as the first parameter now.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:03:55 +01:00
Paolo Bonzini
1ec46bf237 target/i386: do not use s->tmp4 for push
Just create a temporary for the occasion.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:03:53 +01:00
Paolo Bonzini
80e55f54ac target/i386: split eflags computation out of gen_compute_eflags
The new x86 decoder wants the gen_* functions to compute EFLAGS before
writeback, which can be an issue for instructions with a memory
destination such as ARPL or shifts.

Extract code to compute the EFLAGS without clobbering CC_SRC, in case
the memory write causes a fault.  The flags writeback mechanism will
take care of copying the result to CC_SRC.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:03:50 +01:00
Paolo Bonzini
c0099cd40e target/i386: do not clobber T0 on string operations
The new decoder would rather have the operand in T0 when expanding SCAS, rather
than use R_EAX directly as gen_scas currently does.  This makes SCAS more similar
to CMP and SUB, in that CC_DST = T0 - T1.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:03:24 +01:00
Paolo Bonzini
24c0573bb0 target/i386: do not clobber A0 in POP translation
The new decoder likes to compute the address in A0 very early, so the
gen_lea_v_seg in gen_pop_T0 would clobber the address of the memory
operand.  Instead use T0 since it is already available and will be
overwritten immediately after.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:03:21 +01:00
Paolo Bonzini
a71e0b246a target/i386: do not decode string source/destination into decode->mem
decode->mem is only used if one operand has has_ea == true.  String
operations will not use decode->mem and will load A0 on their own, because
they are the only case of two memory operands in a single instruction.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:03:18 +01:00
Paolo Bonzini
8a36bbcf6c target/i386: add X86_SPECIALs for MOVSX and MOVZX
Usually the registers are just moved into s->T0 without much care for
their operand size.  However, in some cases we can get more efficient
code if the operand fetching logic syncs with the emission function
on what is nicer.

All the current uses are mostly demonstrative and only reduce the code
in the emission functions, because the instructions do not support
memory operands.  However the logic is generic and applies to several
more instructions such as MOVSXD (aka movslq), one-byte shift
instructions, multiplications, XLAT, and indirect calls/jumps.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:03:15 +01:00
Paolo Bonzini
5baf5641cc target/i386: rename zext0/zext2 and make them closer to the manual
X86_SPECIAL_ZExtOp0 and X86_SPECIAL_ZExtOp2 are poorly named; they are a hack
that is needed by scalar insertion and extraction instructions, and not really
related to zero extension: for PEXTR the zero extension is done by the generation
functions, for PINSR the high bits are not used at all and in fact are *not*
filled with zeroes when loaded into s->T1.

Rename the values to match the effect described in the manual, and explain
better in the comments.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:03:11 +01:00
Paolo Bonzini
6dd2afed55 target/i386: avoid trunc and ext for MULX and RORX
Use _tl operations for 32-bit operands on 32-bit targets, and only go
through trunc and extu ops for 64-bit targets.  While the trunc/ext
ops should be pretty much free after optimization, the optimizer also
does not like having the same temporary used in multiple EBBs.
Therefore it is nicer to not use tmpN* unless necessary.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:03:08 +01:00
Paolo Bonzini
b609db9477 target/i386: reimplement check for validity of LOCK prefix
The previous check erroneously allowed CMP to be modified with LOCK.
Instead, tag explicitly the instructions that do support LOCK.

Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:03:08 +01:00
Paolo Bonzini
8147df44da target/i386: document more deviations from the manual
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:03:05 +01:00
Paolo Bonzini
2455e9cf5a target/i386: clean up cpu_cc_compute_all
cpu_cc_compute_all() has an argument that is always equal to CC_OP for historical
reasons (dating back to commit a7812ae412, "TCG variable type checking.", 2008-11-17,
which added the argument to helper_cc_compute_all).  It does not make sense for the
argument to have any other value, so remove it and clean up some lines that are not
too long anymore.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:03:02 +01:00
Paolo Bonzini
8cc746525c target/i386: remove unnecessary truncations
gen_lea_v_seg (called by gen_add_A0_ds_seg) already zeroes any
bits of s->A0 beyond s->aflag.  It does so before summing the
segment base and, if not in 64-bit mode, also after summing it.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:02:58 +01:00
Paolo Bonzini
83280f6a62 target/i386: remove unnecessary arguments from raise_interrupt
is_int is always 1, and error_code is always zero.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:02:55 +01:00
Paolo Bonzini
1e7dde8008 target/i386: speedup JO/SETO after MUL or IMUL
OF is equal to the carry flag, so use the same CCPrepare.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:02:52 +01:00
Paolo Bonzini
6032627f07 target/i386: optimize computation of JL and JLE from flags
Take advantage of the fact that there can be no 1 bits between SF and OF.
If they were adjacent, you could sum SF and get a carry only if SF was
already set.  Then the value of OF in the sum is the XOR of OF itself,
the carry (which is SF) and 0 (the value of the OF bit in the addend):
this is OF^SF exactly.

Because OF and SF are not adjacent, just place more 1 bits to the
left so that the carry propagates, which means summing CC_O - CC_S.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-12-29 22:02:48 +01:00
Richard Henderson
b5e0d5d22f target/i386: Fix 32-bit wrapping of pc/eip computation
In 32-bit mode, pc = eip + cs_base is also 32-bit, and must wrap.
Failure to do so results in incorrect memory exceptions to the guest.
Before 732d548732, this was implicitly done via truncation to
target_ulong but only in qemu-system-i386, not qemu-system-x86_64.

To fix this, we must add conditional zero-extensions.
Since we have to test for 32 vs 64-bit anyway, note that cs_base
is always zero in 64-bit mode.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2022
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20231212172510.103305-1-richard.henderson@linaro.org>
2023-12-12 13:35:08 -08:00
Paolo Bonzini
e000687f12 target/i386: validate VEX.W for AVX instructions
Instructions in VEX exception class 6 generally look at the value of
VEX.W.  Note that the manual places some instructions incorrectly in
class 4, for example VPERMQ which has no non-VEX encoding and no legacy
SSE analogue.  AMD does a mess of its own, as documented in the comment
that this patch adds.

Most of them are checked for VEX.W=0, and are listed in the manual
(though with an omission) in table 2-16; VPERMQ and VPERMPD check for
VEX.W=1, which is only listed in the instruction description.  Others,
such as VPSRLV, VPSLLV and the FMA3 instructions, use VEX.W to switch
between a 32-bit and 64-bit operation.

Fix more of the class 4/class 6 mismatches, and implement the check for
VEX.W in TCG.

Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25 17:35:07 +02:00
Paolo Bonzini
183e6679e3 target/i386: group common checks in the decoding phase
In preparation for adding more similar checks, move the VEX.L=0 check
and several X86_SPECIAL_* checks to a new field, where each bit represent
a common check on unused bits, or a restriction on the processor mode.

Likewise, many SVM intercepts can be checked during the decoding phase,
the main exception being the selective CR0 write, MSR and IOIO intercepts.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25 17:35:07 +02:00
Paolo Bonzini
e582b629f0 target/i386: implement SHA instructions
The implementation was validated with OpenSSL and with the test vectors in
https://github.com/rust-lang/stdarch/blob/master/crates/core_arch/src/x86/sha.rs.

The instructions provide a ~25% improvement on hashing a 64 MiB file:
runtime goes down from 1.8 seconds to 1.4 seconds; instruction count on
the host goes down from 5.8 billion to 4.8 billion with slightly better
IPC too.  Good job Intel. ;)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25 17:35:07 +02:00
Richard Henderson
23f3d586e4 target/i386: Use tcg_gen_ext_tl
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-10-22 16:43:31 -07:00
Richard Henderson
46c684c862 target/i386: Use i128 for 128 and 256-bit loads and stores
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-10-22 16:32:28 -07:00
Paolo Bonzini
24b34590d0 target/i386: check intercept for XSETBV
Note that this intercept is special; it is checked before the #GP
exception.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-17 15:20:53 +02:00
Paolo Bonzini
c35b2fb1fd target/i386: fix shadowed variable pasto
Commit a908985971 ("target/i386/seg_helper: introduce tss_set_busy",
2023-09-26) failed to use the tss_selector argument of the new function,
which was therefore unused.

This shows up as a #GP fault when booting old versions of 32-bit
Linux.

Fixes: a908985971 ("target/i386/seg_helper: introduce tss_set_busy", 2023-09-26)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20231011135350.438492-1-pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2023-10-12 16:37:31 +02:00
Philippe Mathieu-Daudé
1da389c5db target/i386: Check for USER_ONLY definition instead of SOFTMMU one
Since we *might* have user emulation with softmmu,
replace the system emulation check by !user emulation one.

(target/ was cleaned from invalid CONFIG_SOFTMMU uses at
commit cab35c73be, but these files were merged few days
after, thus missed the cleanup.)

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20231004082239.27251-1-philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-07 19:02:33 +02:00
Richard Henderson
b77af26e97 accel/tcg: Replace CPUState.env_ptr with cpu_env()
Reviewed-by: Anton Johansson <anjo@rev.ng>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-10-04 11:03:54 -07:00
Richard Henderson
ad75a51e84 tcg: Rename cpu_env to tcg_env
Allow the name 'cpu_env' to be used for something else.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-10-03 08:01:02 -07:00
Philippe Mathieu-Daudé
6294e502a9 accel: Rename AccelCPUClass::cpu_realizefn() -> cpu_target_realize()
The AccelCPUClass::cpu_realizefn handler is meant for target
specific code, rename it using '_target_' to emphasis it.

Suggested-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20231003123026.99229-3-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-10-03 08:00:25 -07:00
Paolo Bonzini
1bce34aaa9 target/i386/svm_helper: eliminate duplicate local variable
This shadows an outer "cs" variable that is initialized to the
same expression.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-26 16:41:49 +02:00
Paolo Bonzini
49958057a2 target/i386/seg_helper: remove shadowed variable
Return the width of the new task directly from switch_tss_ra.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-26 16:41:49 +02:00
Paolo Bonzini
a908985971 target/i386/seg_helper: introduce tss_set_busy
Eliminate a shadowed local variable in the process.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-26 16:41:49 +02:00
Paolo Bonzini
19729affe1 target/i386/translate: avoid shadowed local variables
Just remove the declaration.  There is nothing in the function after the
switch statement, so it is safe to do.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-26 16:41:49 +02:00
Michael Tokarev
bad5cfcd60 i386: spelling fixes
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2023-09-20 07:54:34 +03:00
Stefan Hajnoczi
03a3a62fbd * only build util/async-teardown.c when system build is requested
* target/i386: fix BQL handling of the legacy FERR interrupts
 * target/i386: fix memory operand size for CVTPS2PD
 * target/i386: Add support for AMX-COMPLEX in CPUID enumeration
 * compile plugins on Darwin
 * configure and meson cleanups
 * drop mkvenv support for Python 3.7 and Debian10
 * add wrap file for libblkio
 * tweak KVM stubs
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmT5t6UUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroMmjwf+MpvVuq+nn+3PqGUXgnzJx5ccA5ne
 O9Xy8+1GdlQPzBw/tPovxXDSKn3HQtBfxObn2CCE1tu/4uHWpBA1Vksn++NHdUf2
 P0yoHxGskJu5iYYTtIcNw5cH2i+AizdiXuEjhfNjqD5Y234cFoHnUApt9e3zBvVO
 cwGD7WpPuSb4g38hHkV6nKcx72o7b4ejDToqUVZJ2N+RkddSqB03fSdrOru0hR7x
 V+lay0DYdFszNDFm05LJzfDbcrHuSryGA91wtty7Fzj6QhR/HBHQCUZJxMB5PI7F
 Zy4Zdpu60zxtSxUqeKgIi7UhNFgMcax2Hf9QEqdc/B4ARoBbboh4q4u8kQ==
 =dH7/
 -----END PGP SIGNATURE-----

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* only build util/async-teardown.c when system build is requested
* target/i386: fix BQL handling of the legacy FERR interrupts
* target/i386: fix memory operand size for CVTPS2PD
* target/i386: Add support for AMX-COMPLEX in CPUID enumeration
* compile plugins on Darwin
* configure and meson cleanups
* drop mkvenv support for Python 3.7 and Debian10
* add wrap file for libblkio
* tweak KVM stubs

# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmT5t6UUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroMmjwf+MpvVuq+nn+3PqGUXgnzJx5ccA5ne
# O9Xy8+1GdlQPzBw/tPovxXDSKn3HQtBfxObn2CCE1tu/4uHWpBA1Vksn++NHdUf2
# P0yoHxGskJu5iYYTtIcNw5cH2i+AizdiXuEjhfNjqD5Y234cFoHnUApt9e3zBvVO
# cwGD7WpPuSb4g38hHkV6nKcx72o7b4ejDToqUVZJ2N+RkddSqB03fSdrOru0hR7x
# V+lay0DYdFszNDFm05LJzfDbcrHuSryGA91wtty7Fzj6QhR/HBHQCUZJxMB5PI7F
# Zy4Zdpu60zxtSxUqeKgIi7UhNFgMcax2Hf9QEqdc/B4ARoBbboh4q4u8kQ==
# =dH7/
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 07 Sep 2023 07:44:37 EDT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (51 commits)
  docs/system/replay: do not show removed command line option
  subprojects: add wrap file for libblkio
  sysemu/kvm: Restrict kvm_pc_setup_irq_routing() to x86 targets
  sysemu/kvm: Restrict kvm_has_pit_state2() to x86 targets
  sysemu/kvm: Restrict kvm_get_apic_state() to x86 targets
  sysemu/kvm: Restrict kvm_arch_get_supported_cpuid/msr() to x86 targets
  target/i386: Restrict declarations specific to CONFIG_KVM
  target/i386: Allow elision of kvm_hv_vpindex_settable()
  target/i386: Allow elision of kvm_enable_x2apic()
  target/i386: Remove unused KVM stubs
  target/i386/cpu-sysemu: Inline kvm_apic_in_kernel()
  target/i386/helper: Restrict KVM declarations to system emulation
  hw/i386/fw_cfg: Include missing 'cpu.h' header
  hw/i386/pc: Include missing 'cpu.h' header
  hw/i386/pc: Include missing 'sysemu/tcg.h' header
  Revert "mkvenv: work around broken pip installations on Debian 10"
  mkvenv: assume presence of importlib.metadata
  Python: Drop support for Python 3.7
  configure: remove dead code
  meson: list leftover CONFIG_* symbols
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-09-07 10:29:06 -04:00
Paolo Bonzini
abd41884c5 target/i386: fix memory operand size for CVTPS2PD
CVTPS2PD only loads a half-register for memory, unlike the other
operations under 0x0F 0x5A.  "Unpack" the group into separate
emission functions instead of using gen_unary_fp_sse.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-01 23:44:39 +02:00
Paolo Bonzini
a48b26978a target/i386: generalize operand size "ph" for use in CVTPS2PD
CVTPS2PD only loads a half-register for memory, like CVTPH2PS.  It can
reuse the "ph" packed half-precision size to load a half-register,
but rename it to "xh" because it is now a variation of "x" (it is not
used only for half-precision values).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-01 23:44:39 +02:00
Paolo Bonzini
c1f27a0c6a target/i386: raise FERR interrupt with iothread locked
Otherwise tcg_handle_interrupt() triggers an assertion failure:

  #5  0x0000555555c97369 in tcg_handle_interrupt (cpu=0x555557434cb0, mask=2) at ../accel/tcg/tcg-accel-ops.c:83
  #6  tcg_handle_interrupt (cpu=0x555557434cb0, mask=2) at ../accel/tcg/tcg-accel-ops.c:81
  #7  0x0000555555b4d58b in pic_irq_request (opaque=<optimized out>, irq=<optimized out>, level=1) at ../hw/i386/x86.c:555
  #8  0x0000555555b4f218 in gsi_handler (opaque=0x5555579423d0, n=13, level=1) at ../hw/i386/x86.c:611
  #9  0x00007fffa42bde14 in code_gen_buffer ()
  #10 0x0000555555c724bb in cpu_tb_exec (cpu=cpu@entry=0x555557434cb0, itb=<optimized out>, tb_exit=tb_exit@entry=0x7fffe9bfd658) at ../accel/tcg/cpu-exec.c:457

Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1808
Reported-by: NyanCatTW1 <https://gitlab.com/a0939712328>
Co-developed-by: Richard Henderson <richard.henderson@linaro.org>'
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-01 23:44:39 +02:00
Philippe Mathieu-Daudé
09b07f286d target/translate: Include missing 'exec/cpu_ldst.h' header
All these files access the CPU LD/ST API declared in "exec/cpu_ldst.h".

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230828221314.18435-4-philmd@linaro.org>
2023-08-31 19:47:43 +02:00
Matt Borgerson
b2ea6450d8 target/i386: Check CR0.TS before enter_mmx
When CR0.TS=1, execution of x87 FPU, MMX, and some SSE instructions will
cause a Device Not Available (DNA) exception (#NM). System software uses
this exception event to lazily context switch FPU state.

Before this patch, enter_mmx helpers may be generated just before #NM
generation, prematurely resetting FPU state before the guest has a
chance to save it.

Signed-off-by: Matt Borgerson <contact@mborgerson.com>
Message-ID: <CADc=-s5F10muEhLs4f3mxqsEPAHWj0XFfOC2sfFMVHrk9fcpMg@mail.gmail.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-08-04 11:42:18 +02:00
Paolo Bonzini
40a205da41 target/i386: emulate 64-bit ring 0 for linux-user if LM feature is set
32-bit binaries can run on a long mode processor even if the kernel
is 64-bit, of course, and this can have slightly different behavior;
for example, SYSCALL is allowed on Intel processors.

Allow reporting LM to programs running under user mode emulation,
so that "-cpu" can be used with named CPU models even for qemu-i386
and even without disabling LM by hand.

Fortunately, most of the runtime code in QEMU has to depend on HF_LMA_MASK
or on HF_CS64_MASK (which is anyway false for qemu-i386's 32-bit code
segment) rather than TARGET_X86_64, therefore all that is needed is an
update of linux-user's ring 0 setup.

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1534
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-06-29 10:49:43 +02:00
Paolo Bonzini
63fd8ef080 target/i386: implement SYSCALL/SYSRET in 32-bit emulators
AMD supports both 32-bit and 64-bit SYSCALL/SYSRET, but the TCG only
exposes it for 64-bit targets.  For system emulation just reuse the
helper; for user-mode emulation the ABI is the same as "int $80".

The BSDs does not support any fast system call mechanism in 32-bit
mode so add to bsd-user the same stub that FreeBSD has for 64-bit
compatibility mode.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-06-26 10:23:56 +02:00
Paolo Bonzini
6750485bf4 target/i386: implement RDPID in TCG
RDPID corresponds to a RDMSR(TSC_AUX); however, it is unprivileged
so for user-mode emulation we must provide the value that the kernel
places in the MSR.  For Linux, it is a combination of the current CPU
and the current NUMA node, both of which can be retrieved with getcpu(2).
Also try sched_getcpu(), which might be there on the BSDs.  If there is
no portable way to retrieve the current CPU id from userspace, return 0.

RDTSCP is reimplemented as RDTSC + RDPID ECX; the differences in terms
of serializability are not relevant to QEMU.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-06-26 10:23:56 +02:00
Paolo Bonzini
53b9b4cc9f target/i386: sysret and sysexit are privileged
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-06-26 10:23:56 +02:00
Paolo Bonzini
75a02adf81 target/i386: AMD only supports SYSENTER/SYSEXIT in 32-bit mode
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-06-26 10:23:44 +02:00
Paolo Bonzini
fd5dcb1ccd target/i386: Intel only supports SYSCALL/SYSRET in long mode
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-06-26 10:23:07 +02:00
Paolo Bonzini
431c51e9d4 target/i386: TCG supports WBNOINVD
WBNOINVD is the same as INVD or WBINVD as far as TCG is concerned,
since there is no cache in TCG and therefore no invalidation side effect
in WBNOINVD.

With respect to SVM emulation, processors that do not support WBNOINVD
will ignore the prefix and treat it as WBINVD, while those that support
it will generate exactly the same vmexit.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-06-26 10:23:07 +02:00
Paolo Bonzini
f9e0dbae78 target/i386: do not accept RDSEED if CPUID bit absent
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-06-26 10:23:02 +02:00
Paolo Bonzini
4d714d1a0b target/i386: fix INVD vmexit
Due to a typo or perhaps a brain fart, the INVD vmexit was never generated.
Fix it (but not that fixing just the typo would break both INVD and WBINVD,
due to a case of two wrongs making a right).

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-06-26 10:23:01 +02:00
Philippe Mathieu-Daudé
de6cd7599b meson: Replace softmmu_ss -> system_ss
We use the user_ss[] array to hold the user emulation sources,
and the softmmu_ss[] array to hold the system emulation ones.
Hold the latter in the 'system_ss[]' array for parity with user
emulation.

Mechanical change doing:

  $ sed -i -e s/softmmu_ss/system_ss/g $(git grep -l softmmu_ss)

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230613133347.82210-10-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-20 10:01:30 +02:00
Philippe Mathieu-Daudé
c7b64948f8 meson: Replace CONFIG_SOFTMMU -> CONFIG_SYSTEM_ONLY
Since we *might* have user emulation with softmmu,
use the clearer 'CONFIG_SYSTEM_ONLY' key to check
for system emulation.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230613133347.82210-9-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-20 10:01:30 +02:00
Philippe Mathieu-Daudé
1dc7bb0e96 target/i386: Simplify i386_tr_init_disas_context()
Since cpu_mmu_index() is well-defined for user-only,
we can remove the surrounding #ifdef'ry entirely.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230613133347.82210-2-philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-20 10:01:30 +02:00
Philippe Mathieu-Daudé
f1cc7c28b6 target/i386: Rename helper template headers as '.h.inc'
Since commit 139c1837db ("meson: rename included C source files
to .c.inc"), QEMU standard procedure for included C files is to
use *.c.inc.

Besides, since commit 6a0057aa22 ("docs/devel: make a statement
about includes") this is documented as the Coding Style:

  If you do use template header files they should be named with
  the ``.c.inc`` or ``.h.inc`` suffix to make it clear they are
  being included for expansion.

Therefore move the included templates in the tcg/ directory and
rename as '.h.inc'.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230608133108.72655-5-philmd@linaro.org>
2023-06-13 11:28:58 +02:00
Richard Henderson
dfd1b81274 accel/tcg: Introduce translator_io_start
New wrapper around gen_io_start which takes care of the USE_ICOUNT
check, as well as marking the DisasContext to end the TB.
Remove exec/gen-icount.h.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-05 12:04:29 -07:00
Richard Henderson
d53106c997 tcg: Pass TCGHelperInfo to tcg_gen_callN
In preparation for compiling tcg/ only once, eliminate
the all_helpers array.  Instantiate the info structs for
the generic helpers in accel/tcg/, and the structs for
the target-specific helpers in each translate.c.

Since we don't see all of the info structs at startup,
initialize at first use, using g_once_init_* to make
sure we don't race while doing so.

Reviewed-by: Anton Johansson <anjo@rev.ng>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-06-05 12:04:29 -07:00
Ricky Zhou
8bf171c2d1 target/i386: Fix exception classes for MOVNTPS/MOVNTPD.
Before this change, MOVNTPS and MOVNTPD were labeled as Exception Class
4 (only requiring alignment for legacy SSE instructions). This changes
them to Exception Class 1 (always requiring memory alignment), as
documented in the Intel manual.
Message-Id: <20230501111428.95998-3-ricky@rzhou.org>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18 08:53:50 +02:00
Ricky Zhou
cab529b0dc target/i386: Fix exception classes for SSE/AVX instructions.
Fix the exception classes for some SSE/AVX instructions to match what is
documented in the Intel manual.

These changes are expected to have no functional effect on the behavior
that qemu implements (primarily >= 16-byte memory alignment checks). For
instance, since qemu does not implement the AC flag, there is no
difference in behavior between Exception Classes 4 and 5 for
instructions where the SSE version only takes <16 byte memory operands.
Message-Id: <20230501111428.95998-2-ricky@rzhou.org>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18 08:53:50 +02:00
Ricky Zhou
afa94dabc5 target/i386: Fix and add some comments next to SSE/AVX instructions.
Adds some comments describing what instructions correspond to decoding
table entries and fixes some existing comments which named the wrong
instruction.
Message-Id: <20230501111428.95998-1-ricky@rzhou.org>

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18 08:53:50 +02:00
Xinyu Li
056d649007 target/i386: fix avx2 instructions vzeroall and vpermdq
vzeroall: xmm_regs should be used instead of xmm_t0
vpermdq: bit 3 and 7 of imm should be considered

Signed-off-by: Xinyu Li <lixinyu20s@ict.ac.cn>
Message-Id: <20230510145222.586487-1-lixinyu20s@ict.ac.cn>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18 08:53:50 +02:00
Paolo Bonzini
2b55e479e6 target/i386: fix operand size for VCOMI/VUCOMI instructions
Compared to other SSE instructions, VUCOMISx and VCOMISx are different:
the single and double precision versions are distinguished through a
prefix, however they use no-prefix and 0x66 for SS and SD respectively.
Scalar values usually are associated with 0xF2 and 0xF3.

Because of these, they incorrectly perform a 128-bit memory load instead
of a 32- or 64-bit load.  Fix this by writing a custom decoding function.

I tested that the reproducer is fixed and the test-avx output does not
change.

Reported-by: Gabriele Svelto <gsvelto@mozilla.com>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1637
Fixes: f8d19eec0d ("target/i386: reimplement 0x0f 0x28-0x2f, add AVX", 2022-10-18)
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-05-18 08:53:50 +02:00
Richard Henderson
732e89f4c4 tcg: Replace tcg_abort with g_assert_not_reached
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-04-23 08:17:46 +01:00
Peter Maydell
987b63f24a target/i386: Avoid unreachable variable declaration in mmu_translate()
Coverity complains (CID 1507880) that the declaration "int error_code;"
in mmu_translate() is unreachable code. Since this is only a declaration,
this isn't actually a bug, but:
 * it's a bear-trap for future changes, because if it was changed to
   include an initialization 'int error_code = foo;' then the
   initialization wouldn't actually happen (being dead code)
 * it's against our coding style, which wants declarations to be
   at the start of blocks
 * it means that anybody reading the code has to go and look up
   exactly what the C rules are for skipping over variable declarations
   using a goto

Move the declaration to the top of the function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20230406155946.3362077-1-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-20 11:17:35 +02:00
Richard Henderson
3df11bb14a target/i386: Avoid use of tcg_const_* throughout
All uses are strictly read-only.  Most of the obviously so,
as direct arguments to gen_helper_*.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-13 06:44:37 -07:00
Richard Henderson
3e7da311d7 target/i386: Simplify POPF
Compute the eflags write mask separately, leaving one call
to the helper.  Use tcg_constant_i32.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-05 13:45:31 -08:00
Richard Henderson
5128d58480 target/i386: Drop tcg_temp_free
Translators are no longer required to free tcg temporaries.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-05 13:44:08 -08:00
Richard Henderson
3a5d177322 target/i386: Don't use tcg_temp_local_new
Since tcg_temp_new is now identical, use that.
In some cases we can avoid a copy from A0 or T0.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-01 07:33:28 -10:00
Richard Henderson
597f9b2d30 accel/tcg: Pass max_insn to gen_intermediate_code by pointer
In preparation for returning the number of insns generated
via the same pointer.  Adjust only the prototypes so far.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-01 07:33:27 -10:00
Anton Johansson
34a39c2443 target/i386: Replace tb_pc() with tb->pc
Signed-off-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230227135202.9710-23-anjo@rev.ng>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-01 07:33:20 -10:00
Anton Johansson
2e3afe8e19 target/i386: Replace TARGET_TB_PCREL with CF_PCREL
Signed-off-by: Anton Johansson <anjo@rev.ng>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230227135202.9710-8-anjo@rev.ng>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-01 07:31:56 -10:00
Richard Henderson
d507e6c565 accel/tcg: Add 'size' param to probe_access_full
Change to match the recent change to probe_access_flags.
All existing callers updated to supply 0, so no change in behaviour.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-02-28 10:32:31 -10:00
Richard Henderson
9ad2ba6e8e target/i386: Fix BZHI instruction
We did not correctly handle N >= operand size.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1374
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230114233206.3118472-1-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-27 09:18:55 +01:00
Richard Henderson
6fbef9426b target/i386: Fix 32-bit AD[CO]X insns in 64-bit mode
Failure to truncate the inputs results in garbage for the carry-out.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1373
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230115012103.3131796-1-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-16 16:57:34 +01:00
Paolo Bonzini
60c7dd22e1 target/i386: fix ADOX followed by ADCX
When ADCX is followed by ADOX or vice versa, the second instruction's
carry comes from EFLAGS and the condition codes use the CC_OP_ADCOX
operation.  Retrieving the carry from EFLAGS is handled by this bit
of gen_ADCOX:

        tcg_gen_extract_tl(carry_in, cpu_cc_src,
            ctz32(cc_op == CC_OP_ADCX ? CC_C : CC_O), 1);

Unfortunately, in this case cc_op has been overwritten by the previous
"if" statement to CC_OP_ADCOX.  This works by chance when the first
instruction is ADCX; however, if the first instruction is ADOX,
ADCX will incorrectly take its carry from OF instead of CF.

Fix by moving the computation of the new cc_op at the end of the function.
The included exhaustive test case fails without this patch and passes
afterwards.

Because ADCX/ADOX need not be invoked through the VEX prefix, this
regression bisects to commit 16fc5726a6 ("target/i386: reimplement
0x0f 0x38, add AVX", 2022-10-18).  However, the mistake happened a
little earlier, when BMI instructions were rewritten using the new
decoder framework.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1471
Reported-by: Paul Jolly <https://gitlab.com/myitcv>
Fixes: 1d0b926150 ("target/i386: move scalar 0F 38 and 0F 3A instruction to new decoder", 2022-10-18)
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-11 09:07:25 +01:00
Richard Henderson
99282098dc target/i386: Fix C flag for BLSI, BLSMSK, BLSR
We forgot to set cc_src, which is used for computing C.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1370
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230114180601.2993644-1-richard.henderson@linaro.org>
Cc: qemu-stable@nongnu.org
Fixes: 1d0b926150 ("target/i386: move scalar 0F 38 and 0F 3A instruction to new decoder", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-11 09:07:25 +01:00
Richard Henderson
b14c009897 target/i386: Fix BEXTR instruction
There were two problems here: not limiting the input to operand bits,
and not correctly handling large extraction length.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1372
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230114230542.3116013-3-richard.henderson@linaro.org>
Cc: qemu-stable@nongnu.org
Fixes: 1d0b926150 ("target/i386: move scalar 0F 38 and 0F 3A instruction to new decoder", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-02-11 09:07:25 +01:00
Richard Henderson
5f0dd8cd33 target/i386: Inline cmpxchg16b
Use tcg_gen_atomic_cmpxchg_i128 for the atomic case,
and tcg_gen_qemu_ld/st_i128 otherwise.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-02-04 06:19:43 -10:00
Richard Henderson
326ad06cf5 target/i386: Inline cmpxchg8b
Use tcg_gen_atomic_cmpxchg_i64 for the atomic case,
and tcg_gen_nonatomic_cmpxchg_i64 otherwise.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-02-04 06:19:43 -10:00
Richard Henderson
6218c177af target/i386: Split out gen_cmpxchg8b, gen_cmpxchg16b
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-02-04 06:19:43 -10:00
Paolo Bonzini
3d304620ec target/i386: fix operand size of unary SSE operations
VRCPSS, VRSQRTSS and VCVTSx2Sx have a 32-bit or 64-bit memory operand,
which is represented in the decoding tables by X86_VEX_REPScalar.  Add it
to the tables, and make validate_vex() handle the case of an instruction
that is in exception type 4 without the REP prefix and exception type 5
with it; this is the cas of VRCP and VRSQRT.

Reported-by: yongwoo <https://gitlab.com/yongwoo36>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1377
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11 10:44:35 +01:00
Joe Richey
b585edca34 i386: Emit correct error code for 64-bit IDT entry
When in 64-bit mode, IDT entiries are 16 bytes, so `intno * 16` is used
for base/limit/offset calculations. However, even in 64-bit mode, the
exception error code still uses bits [3,16) for the invlaid interrupt
index.

This means the error code should still be `intno * 8 + 2` even in 64-bit
mode.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1382
Signed-off-by: Joe Richey <joerichey@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-11 09:59:38 +01:00
Richard Henderson
8218c048be target/i386: Always completely initialize TranslateFault
In get_physical_address, the canonical address check failed to
set TranslateFault.stage2, which resulted in an uninitialized
read from the struct when reporting the fault in x86_cpu_tlb_fill.

Adjust all error paths to use structure assignment so that the
entire struct is always initialized.

Reported-by: Daniel Hoffman <dhoff749@gmail.com>
Fixes: 9bbcf37219 ("target/i386: Reorg GET_HPHYS")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221201074522.178498-1-richard.henderson@linaro.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1324
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-01 09:53:24 +01:00
Paolo Bonzini
38e65936a8 target/i386: allow MMX instructions with CR4.OSFXSR=0
MMX state is saved/restored by FSAVE/FRSTOR so the instructions are
not illegal opcodes even if CR4.OSFXSR=0.  Make sure that validate_vex
takes into account the prefix and only checks HF_OSFXSR_MASK in the
presence of an SSE instruction.

Fixes: 20581aadec ("target/i386: validate VEX prefixes via the instructions' exception classes", 2022-10-18)
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1350
Reported-by: Helge Konetzka (@hejko on gitlab.com)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-12-01 09:05:05 +01:00
Paolo Bonzini
35d95e4126 target/i386: hardcode R_EAX as destination register for LAHF/SAHF
When translating code that is using LAHF and SAHF in combination with the
REX prefix, the instructions should not use any other register than AH;
however, QEMU selects SPL (SP being register 4, just like AH) if the
REX prefix is present.  To fix this, use deposit directly without
going through gen_op_mov_v_reg and gen_op_mov_reg_v.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/130
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-11-15 09:34:42 +10:00
Paolo Bonzini
d1bb978ba1 target/i386: fix cmpxchg with 32-bit register destination
Unlike the memory case, where "the destination operand receives a write
cycle without regard to the result of the comparison", rm must not be
touched altogether if the write fails, including not zero-extending
it on 64-bit processors.  This is not how the movcond currently works,
because it is always followed by a gen_op_mov_reg_v to rm.

To fix it, introduce a new function that is similar to gen_op_mov_reg_v
but writes to a TCG temporary.

Considering that gen_extu(ot, oldv) is not needed in the memory case
either, the two cases for register and memory destinations are different
enough that one might as well fuse the two "if (mod == 3)" into one.
So do that too.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/508
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[rth: Add a test case ]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-11-15 09:34:42 +10:00
Stefan Hajnoczi
7f5acfcb66 * bug fixes
* reduced memory footprint for IPI virtualization on Intel processors
 * asynchronous teardown support (Linux only)
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmNiVykUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroN0Swf/YxjphCtFgYYSO14WP+7jAnfRZLhm
 0xWChWP8rco5I352OBFeFU64Av5XoLGNn6SZLl8lcg86lQ/G0D27jxu6wOcDDHgw
 0yTDO1gevj51UKsbxoC66OWSZwKTEo398/BHPDcI2W41yOFycSdtrPgspOrFRVvf
 7M3nNjuNPsQorZeuu8NGr3jakqbt99ZDXcyDEWbrEAcmy2JBRMbGgT0Kdnc6aZfW
 CvL+1ljxzldNwGeNBbQW2QgODbfHx5cFZcy4Daze35l5Ra7K/FrgAzr6o/HXptya
 9fEs5LJQ1JWI6JtpaWwFy7fcIIOsJ0YW/hWWQZSDt9JdAJFE5/+vF+Kz5Q==
 =CgrO
 -----END PGP SIGNATURE-----

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* bug fixes
* reduced memory footprint for IPI virtualization on Intel processors
* asynchronous teardown support (Linux only)

# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmNiVykUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroN0Swf/YxjphCtFgYYSO14WP+7jAnfRZLhm
# 0xWChWP8rco5I352OBFeFU64Av5XoLGNn6SZLl8lcg86lQ/G0D27jxu6wOcDDHgw
# 0yTDO1gevj51UKsbxoC66OWSZwKTEo398/BHPDcI2W41yOFycSdtrPgspOrFRVvf
# 7M3nNjuNPsQorZeuu8NGr3jakqbt99ZDXcyDEWbrEAcmy2JBRMbGgT0Kdnc6aZfW
# CvL+1ljxzldNwGeNBbQW2QgODbfHx5cFZcy4Daze35l5Ra7K/FrgAzr6o/HXptya
# 9fEs5LJQ1JWI6JtpaWwFy7fcIIOsJ0YW/hWWQZSDt9JdAJFE5/+vF+Kz5Q==
# =CgrO
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 02 Nov 2022 07:40:25 EDT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [full]
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu:
  target/i386: Fix test for paging enabled
  util/log: Close per-thread log file on thread termination
  target/i386: Set maximum APIC ID to KVM prior to vCPU creation
  os-posix: asynchronous teardown for shutdown on Linux
  target/i386: Fix calculation of LOCK NEG eflags

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-11-03 10:54:37 -04:00
Richard Henderson
03a60ae9ca target/i386: Fix test for paging enabled
If CR0.PG is unset, pg_mode will be zero, but it will also be zero
for non-PAE/non-PSE page tables with CR0.WP=0.  Restore the
correct test for paging enabled.

Fixes: 98281984a3 ("target/i386: Add MMU_PHYS_IDX and MMU_NESTED_IDX")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1269
Reported-by: Andreas Gustafsson <gson@gson.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221102091232.1092552-1-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-11-02 12:35:16 +01:00
Richard Henderson
6317933086 target/i386: Expand eflags updates inline
The helpers for reset_rf, cli, sti, clac, stac are
completely trivial; implement them inline.

Drop some nearby #if 0 code.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-11-01 08:31:41 +11:00
Richard Henderson
3d419a4dd2 accel/tcg: Remove will_exit argument from cpu_restore_state
The value passed is always true, and if the target's
synchronize_from_tb hook is non-trivial, not exiting
may be erroneous.

Reviewed-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-11-01 08:31:41 +11:00
Qi Hu
1215317510 target/i386: Fix calculation of LOCK NEG eflags
After:

        lock negl -0x14(%rbp)
        pushf
        pop    %rax

%rax will contain the wrong value because the "lock neg" calculates the
wrong eflags.  Simple test:

        #include <assert.h>

        int main()
        {
          __volatile__ unsigned test = 0x2363a;
          __volatile__ char cond = 0;
          asm(
              "lock negl %0 \n\t"
              "sets %1"
              : "=m"(test), "=r"(cond));
          assert(cond & 1);
          return 0;
        }

Reported-by: Jinyang Shen <shenjinyang@loongson.cn>
Co-Developed-by: Xuehai Chen <chenxuehai@loongson.cn>
Signed-off-by: Xuehai Chen <chenxuehai@loongson.cn>
Signed-off-by: Qi Hu <huqi@loongson.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-31 09:46:34 +01:00
Richard Henderson
434382e640 target/i386: Convert to tcg_ops restore_state_to_opc
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-10-26 11:11:28 +10:00
Paolo Bonzini
2872b0f390 target/i386: implement FMA instructions
The only issue with FMA instructions is that there are _a lot_ of them (30
opcodes, each of which comes in up to 4 versions depending on VEX.W and
VEX.L; a total of 96 possibilities).  However, they can be implement with
only 6 helpers, two for scalar operations and four for packed operations.
(Scalar versions do not do any merging; they only affect the bottom 32
or 64 bits of the output operand.  Therefore, there is no separate XMM
and YMM of the scalar helpers).

First, we can reduce the number of helpers to one third by passing four
operands (one output and three inputs); the reordering of which operands
go to the multiply and which go to the add is done in emit.c.

Second, the different instructions also dispatch to the same softfloat
function, so the flags for float32_muladd and float64_muladd are passed
in the helper as int arguments, with a little extra complication to
handle FMADDSUB and FMSUBADD.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-22 09:05:54 +02:00
Paolo Bonzini
cf5ec6641e target/i386: implement F16C instructions
F16C only consists of two instructions, which are a bit peculiar
nevertheless.

First, they access only the low half of an YMM or XMM register for the
packed-half operand; the exact size still depends on the VEX.L flag.
This is similar to the existing avx_movx flag, but not exactly because
avx_movx is hardcoded to affect operand 2.  To this end I added a "ph"
format name; it's possible to reuse this approach for the VPMOVSX and
VPMOVZX instructions, though that would also require adding two more
formats for the low-quarter and low-eighth of an operand.

Second, VCVTPS2PH is somewhat weird because it *stores* the result of
the instruction into memory rather than loading it.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-20 15:16:18 +02:00
Paolo Bonzini
314d3eff66 target/i386: introduce function to set rounding mode from FPCW or MXCSR bits
VROUND, FSTCW and STMXCSR all have to perform the same conversion from
x86 rounding modes to softfloat constants.  Since the ISA is consistent
on the meaning of the two-bit rounding modes, extract the common code
into a wrapper for set_float_rounding_mode.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-20 15:16:13 +02:00
Paolo Bonzini
0d4bcac3ca target/i386: decode-new: avoid out-of-bounds access to xmm_regs[-1]
If the destination is a memory register, op->n is -1.  Going through
tcg_gen_gvec_dup_imm path is both useless (the value has been stored
by the gen_* function already) and wrong because of the out-of-bounds
access.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-20 15:15:50 +02:00
Paolo Bonzini
653fad2497 target/i386: remove old SSE decoder
With all SSE (and AVX!) instructions now implemented in disas_insn_new,
it's possible to remove gen_sse, as well as the helpers for instructions
that now use gvec.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
71a0891d61 target/i386: move 3DNow to the new decoder
This adds another kind of weirdness when you thought you had seen it all:
an opcode byte that comes _after_ the address, not before.  It's not
worth adding a new X86_SPECIAL_* constant for it, but it's actually
not unlike VCMP; so, forgive me for exploiting the similarity and just
deciding to dispatch to the right gen_helper_* call in a single code
generation function.

In fact, the old decoder had a bug where s->rip_offset should have
been set to 1 for 3DNow! instructions, and it's fixed now.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
57f6bba023 target/i386: implement VLDMXCSR/VSTMXCSR
These are exactly the same as the non-VEX version, but one has to be careful
that only VEX.L=0 is allowed.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
892544317f target/i386: implement XSAVE and XRSTOR of AVX registers
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
f8d19eec0d target/i386: reimplement 0x0f 0x28-0x2f, add AVX
Here the code is a bit uglier due to the truncation and extension
of registers to and from 32-bit.  There is also a mistake in the
manual with respect to the size of the memory operand of CVTPS2PI
and CVTTPS2PI, reported by Ricky Zhou.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
7170a17ec3 target/i386: reimplement 0x0f 0x10-0x17, add AVX
These are mostly moves, and yet are a total pain.  The main issue
is that:

1) some instructions are selected by mod==11 (register operand)
vs. mod=00/01/10 (memory operand)

2) stores to memory are two-operand operations, while the 3-register
and load-from-memory versions operate on the entire contents of the
destination; this makes it easier to separate the gen_* function for
the store case

3) it's inefficient to load into xmm_T0 only to move the value out
again, so the gen_* function for the load case is separated too

The manual also has various mistakes in the operands here, for example
the store case of MOVHPS operates on a 128-bit source (albeit discarding
the bottom 64 bits) and therefore should be Mq,Vdq rather than Mq,Vq.
Likewise for the destination and source of MOVHLPS.

VUNPCK?PS and VUNPCK?PD are the same as VUNPCK?DQ and VUNPCK?QDQ,
but encoded as prefixes rather than separate operands.  The helpers
can be reused however.

For MOVSLDUP, MOVSHDUP and MOVDDUP I chose to reimplement them as
helpers.  I named the helper for MOVDDUP "movdldup" in preparation
for possible future introduction of MOVDHDUP and to clarify the
similarity with MOVSLDUP.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
aba2b8ecb9 target/i386: reimplement 0x0f 0xc2, 0xc4-0xc6, add AVX
Nothing special going on here, for once.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
16fc5726a6 target/i386: reimplement 0x0f 0x38, add AVX
There are several special cases here:

1) extending moves have different widths for the helpers vs. for the
memory loads, and the width for memory loads depends on VEX.L too.
This is represented by X86_SPECIAL_AVXExtMov.

2) some instructions, such as variable-width shifts, select the vector element
size via REX.W.

3) VSIB instructions (VGATHERxPy, VPGATHERxy) are also part of this group,
and they have (among other things) two output operands.

3) the macros for 4-operand blends (which are under 0x0f 0x3a) have to be
extended to support 2-operand blends.  The 2-operand variant actually
came a few years earlier, but it is clearer to implement them in the
opposite order.

X86_TYPE_WM, introduced earlier for unaligned loads, is reused for helpers
that accept a Reg* but have a M argument.

These three-byte opcodes also include AVX new instructions, for which
the helpers were originally implemented by Paul Brook <paul@nowt.org>.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Richard Henderson
d4af67a27a target/i386: Use tcg gvec ops for pmovmskb
As pmovmskb is used by strlen et al, this is the third
highest overhead sse operation at %0.8.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
[Reorganize to generate code for any vector size. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
7906847768 target/i386: reimplement 0x0f 0x3a, add AVX
The more complicated operations here are insertions and extractions.
Otherwise, there are just more entries than usual because the PS/PD/SS/SD
variations are encoded in the opcode rater than in the prefixes.

These three-byte opcodes also include AVX new instructions, whose
implementation in the helpers was originally done by Paul Brook
<paul@nowt.org>.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
6bbeb98d10 target/i386: reimplement 0x0f 0xd0-0xd7, 0xe0-0xe7, 0xf0-0xf7, add AVX
The more complicated ones here are d6-d7, e6-e7, f7.  The others
are trivial.

For LDDQU, using gen_load_sse directly might corrupt the register if
the second part of the load fails.  Therefore, add a custom X86_TYPE_WM
value; like X86_TYPE_W it does call gen_load(), but it also rejects a
value of 11 in the ModRM field like X86_TYPE_M.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
ce4fcb9478 target/i386: reimplement 0x0f 0x70-0x77, add AVX
This includes shifts by immediate, which use bits 3-5 of the ModRM byte
as an opcode extension.  With the exception of 128-bit shifts, they are
implemented using gvec.

This also covers VZEROALL and VZEROUPPER, which use the same opcode
as EMMS.  If we were wanting to optimize out gen_clear_ymmh then this
would be one of the starting points.  The implementation of the VZEROALL
and VZEROUPPER helpers is by Paul Brook.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
d1c1a4222c target/i386: reimplement 0x0f 0x78-0x7f, add AVX
These are a mixed batch, including the first two horizontal
(66 and F2 only) operations, more moves, and SSE4a extract/insert.

Because SSE4a is pretty rare, I chose to leave the helper as they are,
but it is possible to unify them by loading index and length from the
source XMM register and generating deposit or extract TCG ops.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
03b4588070 target/i386: reimplement 0x0f 0x50-0x5f, add AVX
These are mostly floating-point SSE operations.  The odd ones out
are MOVMSK and CVTxx2yy, the others are straightforward.

Unary operations are a bit special in AVX because they have 2 operands
for PD/PS operands (VEX.vvvv must be 1111b), and 3 operands for SD/SS.
They are handled using X86_OP_GROUP3 for compactness.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:05 +02:00
Paolo Bonzini
1d0efbdb35 target/i386: reimplement 0x0f 0xd8-0xdf, 0xe8-0xef, 0xf8-0xff, add AVX
These are more simple integer instructions present in both MMX and SSE/AVX,
with no holes that were later occupied by newer instructions.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
92ec056a6b target/i386: reimplement 0x0f 0x60-0x6f, add AVX
These are both MMX and SSE/AVX instructions, except for vmovdqu.  In both
cases the inputs and output is in s->ptr{0,1,2}, so the only difference
between MMX, SSE, and AVX is which helper to call.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
b98f886c8f target/i386: Introduce 256-bit vector helpers
The new implementation of SSE will cover AVX from the get go, because
all the work for the helper functions is already done.  We just need to
build them.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
620f75566a target/i386: provide 3-operand versions of unary scalar helpers
Compared to Paul's implementation, the new decoder will use a different approach
to implement AVX's merging of dst with src1 on scalar operations.  Adjust the
old SSE decoder to be compatible with new-style helpers.

The affected instructions are CVTSx2Sx, ROUNDSx, RSQRTSx, SQRTSx, RCPSx.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
f05f9789f5 target/i386: extend helpers to support VEX.V 3- and 4- operand encodings
Add to the helpers all the operands that are needed to implement AVX.

Extracted from a patch by Paul Brook <paul@nowt.org>.

Message-Id: <20220424220204.2493824-26-paul@nowt.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
1d0b926150 target/i386: move scalar 0F 38 and 0F 3A instruction to new decoder
Because these are the only VEX instructions that QEMU supports, the
new decoder is entered on the first byte of a valid VEX prefix, and VEX
decoding only needs to be done in decode-new.c.inc.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
55a3328669 target/i386: validate SSE prefixes directly in the decoding table
Many SSE and AVX instructions are only valid with specific prefixes
(none, 66, F3, F2).  Introduce a direct way to encode this in the
decoding table to avoid using decode groups too much.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
20581aadec target/i386: validate VEX prefixes via the instructions' exception classes
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paul Brook
608db8dbfb target/i386: add AVX_EN hflag
Add a new hflag bit to determine whether AVX instructions are allowed

Signed-off-by: Paul Brook <paul@nowt.org>
Message-Id: <20220424220204.2493824-4-paul@nowt.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
caa01fadbe target/i386: add CPUID feature checks to new decoder
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
268dc4648f target/i386: add CPUID[EAX=7,ECX=0].ECX to DisasContext
TCG will shortly implement VAES instructions, so add the relevant feature
word to the DisasContext.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
6ba13999be target/i386: add ALU load/writeback core
Add generic code generation that takes care of preparing operands
around calls to decode.e.gen in a table-driven manner, so that ALU
operations need not take care of that.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
b3e22b2318 target/i386: add core of new i386 decoder
The new decoder is based on three principles:

- use mostly table-driven decoding, using tables derived as much as possible
  from the Intel manual.  Centralizing the decode the operands makes it
  more homogeneous, for example all immediates are signed.  All modrm
  handling is in one function, and can be shared between SSE and ALU
  instructions (including XMM<->GPR instructions).  The SSE/AVX decoder
  will also not have duplicated code between the 0F, 0F38 and 0F3A tables.

- keep the code as "non-branchy" as possible.  Generally, the code for
  the new decoder is more verbose, but the control flow is simpler.
  Conditionals are not nested and have small bodies.  All instruction
  groups are resolved even before operands are decoded, and code
  generation is separated as much as possible within small functions
  that only handle one instruction each.

- keep address generation and (for ALU operands) memory loads and writeback
  as much in common code as possible.  All ALU operations for example
  are implemented as T0=f(T0,T1).  For non-ALU instructions,
  read-modify-write memory operations are rare, but registers do not
  have TCGv equivalents: therefore, the common logic sets up pointer
  temporaries with the operands, while load and writeback are handled
  by gvec or by helpers.

These principles make future code review and extensibility simpler, at
the cost of having a relatively large amount of code in the form of this
patch.  Even EVEX should not be _too_ hard to implement (it's just a crazy
large amount of possibilities).

This patch introduces the main decoder flow, and integrates the old
decoder with the new one.  The old decoder takes care of parsing
prefixes and then optionally drops to the new one.  The changes to the
old decoder are minimal and allow it to be replaced incrementally with
the new one.

There is a debugging mechanism through a "LIMIT" environment variable.
In user-mode emulation, the variable is the number of instructions
decoded by the new decoder before permanently switching to the old one.
In system emulation, the variable is the highest opcode that is decoded
by the new decoder (this is less friendly, but it's the best that can
be done without requiring deterministic execution).

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
a61ef762f9 target/i386: make rex_w available even in 32-bit mode
REX.W can be used even in 32-bit mode by AVX instructions, where it is retroactively
renamed to VEX.W.  Make the field available even in 32-bit mode but keep the REX_W()
macro as it was; this way, that the handling of dflag does not use it by mistake and
the AVX code more clearly points at the special VEX behavior of the bit.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Paolo Bonzini
12a2c9c72c target/i386: make ldo/sto operations consistent with ldq
ldq takes a pointer to the first byte to load the 64-bit word in;
ldo takes a pointer to the first byte of the ZMMReg.  Make them
consistent, which will be useful in the new SSE decoder's
load/writeback routines.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Richard Henderson
8629e77be5 target/i386: Use probe_access_full for final stage2 translation
Rather than recurse directly on mmu_translate, go through the
same softmmu lookup that we did for the page table walk.
This centralizes all knowledge of MMU_NESTED_IDX, with respect
to setup of TranslationParams, to get_physical_address.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221002172956.265735-10-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Richard Henderson
4a1e9d4d11 target/i386: Use atomic operations for pte updates
Use probe_access_full in order to resolve to a host address,
which then lets us use a host cmpxchg to update the pte.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/279
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221002172956.265735-9-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Richard Henderson
11b4e971dc target/i386: Combine 5 sets of variables in mmu_translate
We don't need one variable set per translation level,
which requires copying into pte/pte_addr for huge pages.
Standardize on pte/pte_addr for all levels.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221002172956.265735-8-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Richard Henderson
726ea33531 target/i386: Use MMU_NESTED_IDX for vmload/vmsave
Use MMU_NESTED_IDX for each memory access, rather than
just a single translation to physical.  Adjust svm_save_seg
and svm_load_seg to pass in mmu_idx.

This removes the last use of get_hphys so remove it.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221002172956.265735-7-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00
Richard Henderson
98281984a3 target/i386: Add MMU_PHYS_IDX and MMU_NESTED_IDX
These new mmu indexes will be helpful for improving
paging and code throughout the target.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20221002172956.265735-6-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18 13:58:04 +02:00