CPUState *cpu gets added to the cpus list during cpu_exec_init(). It
should be removed from cpu_exec_exit().
cpu_exec_exit() is called from generic CPU::instance_finalize and some
archs like PowerPC call it from CPU unrealizefn. So ensure that we
dequeue the cpu only once.
Now -1 value for cpu->cpu_index indicates that we have already dequeued
the cpu for CONFIG_USER_ONLY case also.
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
This will enable decoding of hrfid
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Otherwise tight loops at smt_low for example, which OPAL does,
eat so much CPU that we can't boot a kernel anymore. With that,
I can boot 8 CPUs just fine with powernv.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Otherwise it will trip on the forms used in recent architecture.
Ideally, we should have different handlers for different architecture
levels but our current implementation of TLB flushing is dumb enough
that this will do for now.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Not that anything remotely recent supports tlbia but ...
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
On ppc64 especially, we flush the tlb on any slbie or tlbie instruction.
However, those instructions often come in bursts of 3 or more (context
switch will favor a series of slbie's for example to an slbia if the
SLB has less than a certain number of entries in it, and tlbie's can
happen in a series, with PAPR, H_BULK_REMOVE can remove up to 4 entries
at a time.
Doing a tlb_flush() each time is a waste of time. We end up doing a memset
of the whole TLB, reloading it for the next instruction, memset'ing again,
etc...
Those instructions don't have to take effect immediately. For slbie, they
can wait for the next context synchronizing event. For tlbie, the next
tlbsync.
This implements batching by keeping a flag that indicates that we have a
TLB in need of flushing. We check it on interrupts, rfi's, isync's and
tlbsync and flush the TLB if needed.
This reduces the number of tlb_flush() on a boot to a ubuntu installer
first dialog screen from roughly 360K down to 36K.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[clg: added a 'CPUPPCState *' variable in h_remove() and
h_bulk_remove() ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
[dwg: removed spurious whitespace change, use 0/1 not true/false
consistently, since tlb_need_flush has int type]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We rework the way the MMU indices are calculated, providing separate
indices for I and D side based on MSR:IR and MSR:DR respectively,
and thus no longer need to flush the TLB on context changes. This also
adds correct support for HV as a separate address space.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
We don't use the resulting accessors and this gets in the way of
the split I/D TLB work.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Let users of qemu_get_ram_ptr and qemu_ram_ptr_length pass in an
address that is relative to the MemoryRegion. This basically means
what address_space_translate returns.
Because the semantics of the second parameter change, rename the
function to qemu_map_ram_ptr.
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move the old qemu_ram_addr_from_host to memory_region_from_host and
make it return an offset within the region. For qemu_ram_addr_from_host
return the ram_addr_t directly, similar to what it was before
commit 1b5ec23 ("memory: return MemoryRegion from qemu_ram_addr_from_host",
2013-07-04).
Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Of the two callers, one does not use it, and the other can compute
it itself based on the other output argument (offset) and the RAMBlock.
Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Remove direct uses of ram_addr_t and optimize memory_region_{get,set}_fd
now that a MemoryRegion knows its RAMBlock directly.
Reviewed-by: Marc-André Lureau <marcandre.lureau@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The rationale is similar to the above mode sense response interception:
this is practically the only channel to communicate restraints from
elsewhere such as host and block driver.
The scsi bus we attach onto can have a larger max xfer len than what is
accepted by the host file system (guarding between the host scsi LUN and
QEMU), in which case the SG_IO we generate would get -EINVAL.
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-Id: <1464243305-10661-3-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Using pread/pwrite or io_submit has the advantage of eliminating the
bounce buffer, but drops the SCSI status. This keeps the guest from
seeing unit attention codes, as well as statuses such as RESERVATION
CONFLICT. Because we know scsi-block operates on an SBC device we can
still use the DMA helpers with SG_IO; just remember to patch the CDBs
if the transfer is split into multiple segments.
This means that scsi-block will always use the thread-pool unfortunately,
instead of respecting aio=native.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commonize all the checks for canceled requests and errors. The next patch
will add another case to check for, in order to handle passthrough commands.
There is no semantic change here; the only nontrivial modification is in
scsi_write_do_fua, where cancellation has been checked earlier by both
callers. Thus, the check is replaced with an assertion.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
scsi-block will be able to do FUA just by passing the request through
to the LUN (which is also more efficient); there is no need to emulate
it like we do for scsi-disk.
Add a new method to distinguish this.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
These are replacements for blk_aio_readv and blk_aio_writev that allow
customization of the data path. They reuse the DMA helpers' DMAIOFunc
callback type, so that the same function can be used in either the
QEMUSGList or the bounce-buffered case.
This customization will be needed in the next patch to do zero-copy
SG_IO on scsi-block.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This will be the place to add DMAIOFuncs in the next patch. There
are also a couple DeviceClass members that can be moved to the
abstract class's initialization function.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Since Xen will correctly handle accesses to unimplemented I/O ports (by
returning all 1's for reads and ignoring writes) there is no need for
QEMU to register backgroud I/O sections.
This patch therefore adds checks to xen_io_add/del so that sections with
memory-region ops pointing at 'unassigned_io_ops' are ignored.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1462811480-16295-1-git-send-email-paul.durrant@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Over time, some differences between QEMU and Linux atomics are getting
smoothed. In particular, Linux grew atomic_fetch_or (and in general
the differences regarding RMW operations were not described accurately)
and smp_load_acquire/smp_store_release. Also, set_mb was renamed to
smp_store_mb(). Include these changes in the documentation.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently we emit a consume-load in atomic_rcu_read. Because of
limitations in current compilers, this is overkill for non-Alpha hosts
and it is only useful to make Thread Sanitizer work.
This patch leaves the consume-load in atomic_rcu_read when
compiling with Thread Sanitizer enabled, and resorts to a
relaxed load + smp_read_barrier_depends otherwise.
On an RMO host architecture, such as aarch64, the performance
improvement of this change is easily measurable. For instance,
qht-bench performs an atomic_rcu_read on every lookup. Performance
before and after applying this patch:
$ tests/qht-bench -d 5 -n 1
Before: 9.78 MT/s
After: 10.96 MT/s
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1464120374-8950-4-git-send-email-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
For correctness, smp_read_barrier_depends() is only required to
emit a barrier on Alpha hosts. However, we are currently emitting
a consume fence unconditionally, and most compilers currently treat
consume and acquire fences as equivalent.
Fix it by keeping the consume fence if we're compiling with Thread
Sanitizer, since this might help prevent false warnings. Otherwise,
only emit the barrier for Alpha hosts. Note that we still guarantee
that smp_read_barrier_depends() is a compiler barrier.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1464120374-8950-3-git-send-email-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Recently Linux did a mass conversion of its atomic_read/set calls
so that they at least are READ/WRITE_ONCE. See Linux's commit
62e8a325 ("atomic, arch: Audit atomic_{read,set}()"). It seems though
that their documentation hasn't been updated to reflect this.
The appended updates our documentation to reflect the change, which
means there is effectively no difference between our atomic_read/set
and the current Linux implementation.
While at it, fix the statement that a barrier is implied by
atomic_read/set, which is incorrect. Volatile/atomic semantics prevent
transformations pertaining the variable they apply to; this, however,
has no effect on surrounding statements like barriers do. For more
details on this, see:
https://gcc.gnu.org/onlinedocs/gcc/Volatiles.html
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1464120374-8950-2-git-send-email-cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The usage of INT_MAX in this function confuses Coverity. I think
the defect is bogus, however there is no protection against
getting more than sizeof(s->inpkt) bytes from the character device
backend.
Rewrite the function to only fill in as much data as needed from
buf into s->inpkt. The plen variable is replaced by a simple
state machine and there is no need anymore to shift contents to
the beginning of s->inpkt.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
at least in the path via virtio-blk the maximum size is not
restricted.
Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1464080368-29584-1-git-send-email-pl@kamp.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
While doing MegaRAID SAS controller command frame lookup, routine
'megasas_lookup_frame' uses 'read_queue_head' value as an index
into 'frames[MEGASAS_MAX_FRAMES=2048]' array. Limit its value
within array bounds to avoid any OOB access.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1464179110-18593-1-git-send-email-ppandit@redhat.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When reading MegaRAID SAS controller configuration via MegaRAID
Firmware Interface(MFI) commands, routine megasas_dcmd_cfg_read
uses an uninitialised local data buffer. Initialise this buffer
to avoid stack information leakage.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1464178304-12831-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
When setting MegaRAID SAS controller properties via MegaRAID
Firmware Interface(MFI) commands, a user supplied size parameter
is used to set property value. Use appropriate size value to avoid
OOB access issues.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Message-Id: <1464172291-2856-2-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The LSI SAS1068 Host Bus Adapter emulator in Qemu, periodically
looks for requests and fetches them. A loop doing that in
mptsas_fetch_requests() could run infinitely if 's->state' was
not operational. Move check to avoid such a loop.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-stable@nongnu.org
Message-Id: <1464077264-25473-1-git-send-email-ppandit@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Vmware Paravirtual SCSI emulation uses command descriptors to
process SCSI commands. These descriptors come with their ring
buffers. A guest could set the ring buffer size to an arbitrary
value leading to OOB access issue. Add check to avoid it.
Reported-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org>
Cc: qemu-stable@nongnu.org
Message-Id: <1464000485-27041-1-git-send-email-ppandit@redhat.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@ravellosystems.com>
Reviewed-by: Dmitry Fleytman <dmitry@daynix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Similar to commit df7b97ff, we are mishandling clients that
give an unaligned NBD_CMD_TRIM request, and potentially
trimming bytes that occur before their request; which in turn
can cause potential unintended data loss (unlikely in
practice, since most clients are sane and issue aligned trim
requests). However, while we fixed read and write by switching
to the byte interfaces of blk_, we don't yet have a byte
interface for discard. On the other hand, trim is advisory, so
rounding the user's request to simply ignore the first and last
unaligned sectors (or the entire request, if it is sub-sector
in length) is just fine.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1464173965-9694-1-git-send-email-eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
drop the qemu_char_get_next_serial and use chardev prop instead
Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-6-git-send-email-zxq_yx_007@163.com>
Tested-by: Michael Walle <michael@walle.cc>
Acked-by: Michael Walle <michael@walle.cc>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Drop the old SysBus init function and use instance_init
* Call qemu_chr_add_handlers in the realize callback
* Use qdev chardev prop instead of qemu_char_get_next_serial
* Add lm32_uart_create function to create lm32 uart device
Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-5-git-send-email-zxq_yx_007@163.com>
Tested-by: Michael Walle <michael@walle.cc>
Acked-by: Michael Walle <michael@walle.cc>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Drop the old SysBus init function
* Call qemu_chr_add_handlers in the realize callback
* Use qdev chardev prop instead of qemu_char_get_next_serial
Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-4-git-send-email-zxq_yx_007@163.com>
Tested-by: Michael Walle <michael@walle.cc>
Acked-by: Michael Walle <michael@walle.cc>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Drop the old SysBus init function and use instance_init
* Call qemu_chr_add_handlers in the realize callback
* Use qdev chardev prop instead of qemu_char_get_next_serial
* Add etraxfs_ser_create function to create etraxfs serial device
Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-3-git-send-email-zxq_yx_007@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Drop the old SysBus init function and use instance_init
* Call qemu_chr_add_handlers in the realize callback
Signed-off-by: xiaoqiang zhao <zxq_yx_007@163.com>
Message-Id: <1464158344-12266-2-git-send-email-zxq_yx_007@163.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
This reverts commit 5b5660adf1,
as it breaks the UEFI guest firmware (known as ArmVirtPkg or AAVMF)
running in the "virt" machine type of "qemu-system-aarch64":
Contrary to the commit message, (a->mr == b->mr) does *not* imply
that (a->romd_mode == b->romd_mode): the pflash device model calls
memory_region_rom_device_set_romd() -- for switching between the above
modes --, and that function changes mr->romd_mode but the current
AddressSpaceDispatch's FlatRange keeps the old value. Therefore
region_del/region_add are not called on the KVM MemoryListener.
Reported-by: Drew Jones <drjones@redhat.com>
Tested-by: Drew Jones <drjones@redhat.com>
Analyzed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAV0g0rrRIkN7ePJvAAQh+hA//TZOrmRskN7/PzlGp90pkNVL4Xm6LDNsj
fvgzB4ceuA+eRv426ttSpYgZestZjxmb/29c5UJWvH0Otnhrxj0aNKimxAO4FCIV
u2u73Mn4cTGZPfoMJesCYihC19LIFIpYFPjXTKz2UJ4Dg/ZLBd6KORDTC7kfXoSx
9wTgaHZkbrUBCwWQ+0SKrbibPJWY4RHUkYCK4dxCCPFRVpGhYFt1vihjZpzDzzNH
njjxr/GB8W1OByBI90TWUpx6e0YRqhKluuvr1ObezrK2/nCZhLwOkvgIB1yuew3e
AwFz5wUFtx80FV3ThAKDoJadz6tVKPG67OG2w5DmgUrK6gpOTP9n1LcypTK+wti4
/LNb9AhyEqsDD/apFQWNEEdYwCG6qjlaE01zp6ZQPgNUWpN1BSEDmikoHwpUqLaq
fXfgJZyOItp6LP3Tajkto2h9iAdXI6eoJHQtMK562J89WZz1Bb4IJxONAkWZq5oV
aQCOj0wD/vObUhMU72jLy2cd3kHM/RT5NliMhb+nbR9JY/WXbTeOn41hOhEnkJJJ
bjRPCN2O3ULfi3yn3St2ByCngPUG954uqmR2OaOGwEHGx6E/UEkWsduUlRwwY1zK
96pZvsHO8DSIDc5gNmUlvQlBc62v8b+KUcwsbloBCl2YGt9mbZNM9zi0Gg4R97MJ
jHkpVjF4L4I=
=/Nln
-----END PGP SIGNATURE-----
Merge remote-tracking branch 'remotes/riku/tags/pull-linux-user-20160527' into staging
linux-user pull request v2 for may 2016
# gpg: Signature made Fri 27 May 2016 12:51:10 BST using RSA key ID DE3C9BC0
# gpg: Good signature from "Riku Voipio <riku.voipio@iki.fi>"
# gpg: aka "Riku Voipio <riku.voipio@linaro.org>"
* remotes/riku/tags/pull-linux-user-20160527: (38 commits)
linux-user,target-ppc: fix use of MSR_LE
linux-user/signal.c: Use s390 target space address instead of host space
linux-user/signal.c: Use target address instead of host address for microblaze restorer
linux-user/signal.c: Generate opcode data for restorer in setup_rt_frame
linux-user: arm: Remove ARM_cpsr and similar #defines
linux-user: Use direct syscalls for setuid(), etc
linux-user: x86_64: Don't use 16-bit UIDs
linux-user: Use g_try_malloc() in do_msgrcv()
linux-user: Handle msgrcv error case correctly
linux-user: Handle negative values in timespec conversion
linux-user: Use safe_syscall for futex syscall
linux-user: Use safe_syscall for pselect, select syscalls
linux-user: Use safe_syscall for execve syscall
linux-user: Use safe_syscall for wait system calls
linux-user: Use safe_syscall for open and openat system calls
linux-user: Use safe_syscall for read and write system calls
linux-user: Provide safe_syscall for fixing races between signals and syscalls
linux-user: Add debug code to exercise restarting system calls
linux-user: Support for restarting system calls for Microblaze targets
linux-user: Set r14 on exit from microblaze syscall
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
setup_frame()/setup_rt_frame()/restore_user_regs() are using
MSR_LE as the similar kernel functions do: as a bitmask.
But in QEMU, MSR_LE is a bit position, so change this
accordingly.
The previous code was doing nothing as MSR_LE is 0,
and "env->msr &= ~MSR_LE" doesn't change the value of msr.
And yes, a user process can change its endianness,
see linux kernel commit:
fab5db9 [PATCH] powerpc: Implement support for setting little-endian mode via prctl
and prctl(2): PR_SET_ENDIAN, PR_GET_ENDIAN
Reviewed-by: Thomas Huth <huth@tuxfamily.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
The return address is in target space, so the restorer address needs to
be target space, too.
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
The return address is in target space, so the restorer address needs to
be target space, too.
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Original implementation uses do_rt_sigreturn directly in host space,
when a guest program is in unwind procedure in guest space, it will get
an incorrect restore address, then causes unwind failure.
Also cleanup the original incorrect indentation.
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
The #defines of ARM_cpsr and friends in linux-user/arm/target-syscall.h
can clash with versions in the system headers if building on an
ARM or AArch64 build (though this seems to be dependent on the version
of the system headers). The QEMU defines are not very useful (it's
not clear that they're intended for use with the target_pt_regs struct
rather than (say) the CPUARMState structure) and we only use them in one
function in elfload.c anyway. So just remove the #defines and directly
access regs->uregs[].
Reported-by: Christopher Covington <cov@codeaurora.org>
Tested-by: Christopher Covington <cov@codeaurora.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
On Linux the setuid(), setgid(), etc system calls have different semantics
from the libc functions. The libc functions follow POSIX and update the
credentials for all threads in the process; the system calls update only
the thread which makes the call. (This impedance mismatch is worked around
in libc by signalling all threads to tell them to do a syscall, in a
byzantine and fragile way; see http://ewontfix.com/17/.)
Since in linux-user we are trying to emulate the system call semantics,
we must implement all these syscalls to directly call the underlying
host syscall, rather than calling the host libc function.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
The 64-bit x86 syscall ABI uses 32-bit UIDs; only define
USE_UID16 for 32-bit x86.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
In do_msgrcv() we want to allocate a message buffer, whose size
is passed to us by the guest. That means we could legitimately
fail, so use g_try_malloc() and handle the error case, in the same
way that do_msgsnd() does.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
The msgrcv ABI is a bit odd -- the msgsz argument is a size_t, which is
unsigned, but it must fail EINVAL if the value is negative when cast
to a long. We were incorrectly passing the value through an
"unsigned int", which meant that if the guest was 32-bit longs and
the host was 64-bit longs an input of 0xffffffff (which should trigger
EINVAL) would simply be passed to the host msgrcv() as 0xffffffff,
where it does not cause the host kernel to reject it.
Follow the same approach as do_msgsnd() in using a ssize_t and
doing the check for negative values by hand, so we correctly fail
in this corner case.
This fixes the msgrcv03 Linux Test Project test case, which otherwise
hangs.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
In a struct timespec, both fields are signed longs. Converting
them from guest to host with code like
host_ts->tv_sec = tswapal(target_ts->tv_sec);
mishandles negative values if the guest has 32-bit longs and
the host has 64-bit longs because tswapal()'s return type is
abi_ulong: the assignment will zero-extend into the host long
type rather than sign-extending it.
Make the conversion routines use __get_user() and __set_user()
instead: this automatically picks up the signedness of the
field type and does the correct kind of sign or zero extension.
It also handles the possibility that the target struct is not
sufficiently aligned for the host's requirements.
In particular, this fixes a hang when running the Linux Test Project
mq_timedsend01 and mq_timedreceive01 tests: one of the test cases
sets the timeout to -1 and expects an EINVAL failure, but we were
setting a very long timeout instead.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>