Commit Graph

1037 Commits

Author SHA1 Message Date
Michael Tokarev
1e35d32789 linux-user: fix getgroups/setgroups allocations
linux-user getgroups(), setgroups(), getgroups32() and setgroups32()
used alloca() to allocate grouplist arrays, with unchecked gidsetsize
coming from the "guest".  With NGROUPS_MAX being 65536 (linux, and it
is common for an application to allocate NGROUPS_MAX for getgroups()),
this means a typical allocation is half the megabyte on the stack.
Which just overflows stack, which leads to immediate SIGSEGV in actual
system getgroups() implementation.

An example of such issue is aptitude, eg
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=811087#72

Cap gidsetsize to NGROUPS_MAX (return EINVAL if it is larger than that),
and use heap allocation for grouplist instead of alloca().  While at it,
fix coding style and make all 4 implementations identical.

Try to not impose random limits - for example, allow gidsetsize to be
negative for getgroups() - just do not allocate negative-sized grouplist
in this case but still do actual getgroups() call.  But do not allow
negative gidsetsize for setgroups() since its argument is unsigned.

Capping by NGROUPS_MAX seems a bit arbitrary, - we can do more, it is
not an error if set size will be NGROUPS_MAX+1. But we should not allow
integer overflow for the array being allocated. Maybe it is enough to
just call g_try_new() and return ENOMEM if it fails.

Maybe there's also no need to convert setgroups() since this one is
usually smaller and known beforehand (KERN_NGROUPS_MAX is actually 63, -
this is apparently a kernel-imposed limit for runtime group set).

The patch fixes aptitude segfault mentioned above.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Message-Id: <20230409105327.1273372-1-mjt@msgid.tls.msk.ru>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-05-17 07:20:29 +02:00
Thomas Weißschuh
f443a26cc6 linux-user: Don't require PROT_READ for mincore
The kernel does not require PROT_READ for addresses passed to mincore.
For example the fincore(1) tool from util-linux uses PROT_NONE and
currently does not work under qemu-user.

Example (with fincore(1) from util-linux 2.38):

$ fincore /proc/self/exe
RES PAGES  SIZE FILE
24K     6 22.1K /proc/self/exe

$ qemu-x86_64 /usr/bin/fincore /proc/self/exe
fincore: failed to do mincore: /proc/self/exe: Cannot allocate memory

With this patch:

$ ./build/qemu-x86_64 /usr/bin/fincore /proc/self/exe
RES PAGES  SIZE FILE
24K     6 22.1K /proc/self/exe

Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20230422100314.1650-3-thomas@t-8ch.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-05-17 07:20:29 +02:00
Thomas Weißschuh
7f696cddd9 linux-user: Add open_tree() syscall
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20230424153429.276788-2-thomas@t-8ch.de>
[lv: move declaration at the beginning of the block,
     define syscall]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-05-17 07:20:29 +02:00
Thomas Weißschuh
4b2d2753e8 linux-user: Add move_mount() syscall
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
[lv: define syscall]
Message-Id: <20230424153429.276788-1-thomas@t-8ch.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-05-17 07:20:29 +02:00
Thomas Weißschuh
59d1172776 linux-user: report ENOTTY for unknown ioctls
The correct error number for unknown ioctls is ENOTTY.

ENOSYS would mean that the ioctl() syscall itself is not implemented,
which is very improbable and unexpected for userspace.

ENOTTY means "Inappropriate ioctl for device". This is what the kernel
returns on unknown ioctls, what qemu is trying to express and what
userspace is prepared to handle.

Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230426070659.80649-1-thomas@t-8ch.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-05-17 07:20:29 +02:00
Afonso Bordado
8ddc171b7b linux-user: Emulate /proc/cpuinfo output for riscv
RISC-V does not expose all extensions via hwcaps, thus some userspace
applications may want to query these via /proc/cpuinfo.

Currently when querying this file the host's file is shown instead
which is slightly confusing. Emulate a basic /proc/cpuinfo file
with mmu info and an ISA string.

Signed-off-by: Afonso Bordado <afonsobordado@gmail.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Message-Id: <167873059442.9885.15152085316575248452-0@git.sr.ht>
[lv: removed the test that fails in CI for unknown reason]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-05-17 07:19:47 +02:00
Richard Henderson
49840a4a09 accel/tcg: Pass last not end to page_set_flags
Pass the address of the last byte to be changed, rather than
the first address past the last byte.  This avoids overflow
when the last page of the address space is involved.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1528
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-28 15:23:10 -07:00
Richard Henderson
720ace24ae *: Add missing includes of qemu/plugin.h
This had been pulled in from hw/core/cpu.h,
but that will be removed.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230310195252.210956-6-richard.henderson@linaro.org>
[AJB: also syscall-trace.h]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230315174331.2959-16-alex.bennee@linaro.org>
Reviewed-by: Emilio Cota <cota@braap.org>
2023-03-22 15:06:57 +00:00
Helge Deller
895ce8bb53 linux-user: Emulate CLONE_PIDFD flag in clone()
Add emulation for the CLONE_PIDFD flag of the clone() syscall.
This flag was added in Linux kernel 5.2.

Successfully tested on a x86-64 Linux host with hppa-linux target.
Can be verified by running the testsuite of the qcoro debian package,
which breaks hard and kills the currently logged-in user without this
patch.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>

Message-Id: <Y4XoJCpvUA1JD7Sj@p100>
[lv: define CLONE_PIDFD if it is not]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:45:47 +01:00
Helge Deller
fe080593dd linux-user: Add translation for argument of msync()
msync() uses the flags MS_ASYNC, MS_INVALIDATE and MS_SYNC, which differ
between platforms, specifcally on alpha and hppa.

Add a target to host translation for those and wire up a nicer strace
output.

This fixes the testsuite of the macaulay2 debian package with a hppa-linux
guest on a x86-64 host.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>

Message-Id: <Y5rMcts4qe15RaVN@p100>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:45:47 +01:00
Mathis Marion
44cf6731d6 linux-user: fix sockaddr_in6 endianness
The sin6_scope_id field uses the host byte order, so there is a
conversion to be made when host and target endianness differ.

Signed-off-by: Mathis Marion <mathis.marion@silabs.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230307154256.101528-2-Mathis.Marion@silabs.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:45:47 +01:00
Helge Deller
86f04735ac linux-user: Fix brk() to release pages
The current brk() implementation does not de-allocate pages if a lower
address is given compared to earlier brk() calls.
But according to the manpage, brk() shall deallocate memory in this case
and currently it breaks a real-world application, specifically building
the debian gcl package in qemu-user.

Fix this issue by reworking the qemu brk() implementation.

Tested with the C-code testcase included in qemu commit 4d1de87c75, and
by building debian package of gcl in a hppa-linux guest on a x86-64
host.

Signed-off-by: Helge Deller <deller@gmx.de>
Message-Id: <Y6gId80ek49TK1xB@p100>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:42:00 +01:00
Andreas Schwab
25bb27c715 linux-user: fill out task state in /proc/self/stat
Some programs want to match an actual task state character.

Signed-off-by: Andreas Schwab <schwab@suse.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <mvmedq2kxoe.fsf@suse.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:42:00 +01:00
Ilya Leoshkevich
9c1da8b5ee linux-user: Fix unaligned memory access in prlimit64 syscall
target_rlimit64 contains uint64_t fields, so it's 8-byte aligned on
some hosts, while some guests may align their respective type on a
4-byte boundary. This may lead to an unaligned access, which is an UB.

Fix by defining the fields as abi_ullong. This makes the host alignment
match that of the guest, and lets the compiler know that it should emit
code that can deal with the guest alignment.

While at it, also use __get_user() and __put_user() instead of
tswap64().

Fixes: 163a05a839 ("linux-user: Implement prlimit64 syscall")
Reported-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20230224003907.263914-2-iii@linux.ibm.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:42:00 +01:00
Mathis Marion
d759a62b12 linux-user: fix timerfd read endianness conversion
When reading the expiration count from a timerfd, the endianness of the
64bit value read is the one of the host, just as for eventfds.

Signed-off-by: Mathis Marion <mathis.marion@silabs.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20230220085822.626798-2-Mathis.Marion@silabs.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:42:00 +01:00
Helge Deller
258bec39f3 linux-user: Fix access to /proc/self/exe
When accsssing /proc/self/exe from a userspace program, linux-user tries
to resolve the name via realpath(), which may fail if the process
changed the working directory in the meantime.

An example:
- a userspace program ist started with ./testprogram
- the program runs chdir("/tmp")
- then the program calls readlink("/proc/self/exe")
- linux-user tries to run realpath("./testprogram") which fails
  because ./testprogram isn't in /tmp
- readlink() will return -ENOENT back to the program

Avoid this issue by resolving the full path name of the started process
at startup of linux-user and store it in real_exec_path[]. This then
simplifies the emulation of readlink() and readlinkat() as well, because
they can simply copy the path string to userspace.

I noticed this bug because the testsuite of the debian package "pandoc"
failed on linux-user while it succeeded on real hardware.  The full log
is here:
https://buildd.debian.org/status/fetch.php?pkg=pandoc&arch=hppa&ver=2.17.1.1-1.1%2Bb1&stamp=1670153210&raw=0

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20221205113825.20615-1-deller@gmx.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:41:30 +01:00
Ilya Leoshkevich
7de0816f69 linux-user: Always exit from exclusive state in fork_end()
fork()ed processes currently start with
current_cpu->in_exclusive_context set, which is, strictly speaking, not
correct, but does not cause problems (even assertion failures).

With one of the next patches, the code begins to rely on this value, so
fix it by always calling end_exclusive() in fork_end().

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20230214140829.45392-2-iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-02-21 08:44:13 -10:00
Helge Deller
3f0744f98b linux-user: Allow sendmsg() without IOV
Applications do call sendmsg() without any IOV, e.g.:
 sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=NULL, msg_iovlen=0,
            msg_control=[{cmsg_len=36, cmsg_level=SOL_ALG, cmsg_type=0x2}],
            msg_controllen=40, msg_flags=0}, MSG_MORE) = 0
 sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="The quick brown fox jumps over t"..., iov_len=183}],
            msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_ALG, cmsg_type=0x3}],
            msg_controllen=24, msg_flags=0}, 0) = 183

The function do_sendrecvmsg_locked() is used for sndmsg() and recvmsg()
and calls lock_iovec() to lock the IOV into memory. For the first
sendmsg() above it returns NULL and thus wrongly skips the call the host
sendmsg() syscall, which will break the calling application.

Fix this issue by:
- allowing sendmsg() even with empty IOV
- skip recvmsg() if IOV is NULL
- skip both if the return code of do_sendrecvmsg_locked() != 0, which
  indicates some failure like EFAULT on the IOV

Tested with the debian "ell" package with hppa guest on x86_64 host.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20221212173416.90590-2-deller@gmx.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03 22:55:12 +01:00
Helge Deller
27404b6c15 linux-user: Implement SOL_ALG encryption support
Add suport to handle SOL_ALG packets via sendmsg() and recvmsg().
This allows emulated userspace to use encryption functionality.

Tested with the debian ell package with hppa guest on x86_64 host.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20221212173416.90590-1-deller@gmx.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03 22:55:12 +01:00
Helge Deller
e0174afeea linux-user: Fix /proc/cpuinfo output for hppa
The hppa architectures provides an own output for the emulated
/proc/cpuinfo file.

Some userspace applications count (even if that's not the recommended
way) the number of lines which start with "processor:" and assume that
this number then reflects the number of online CPUs. Since those 3
architectures don't provide any such line, applications may assume "0"
CPUs.  One such issue can be seen in debian bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024653

Avoid such issues by adding a "processor:" line for each of the online
CPUs.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <Y9QvyRSq1I1k5/JW@p100>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03 22:55:12 +01:00
Helge Deller
cb88b7c214 linux-user: Fix SO_ERROR return code of getsockopt()
Add translation for the host error return code of:
    getsockopt(19, SOL_SOCKET, SO_ERROR, [ECONNREFUSED], [4]) = 0

This fixes the testsuite of the cockpit debian package with a
hppa-linux guest on a x86-64 host.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <Y9QzNzXg0hrzHQeo@p100>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03 22:55:12 +01:00
Daniel P. Berrangé
6003159ce1 Revert "linux-user: fix compat with glibc >= 2.36 sys/mount.h"
This reverts commit 3cd3df2a95.

glibc has fixed (in 2.36.9000-40-g774058d729) the problem
that caused a clash when both sys/mount.h annd linux/mount.h
are included, and backported this to the 2.36 stable release
too:

  https://sourceware.org/glibc/wiki/Release/2.36#Usage_of_.3Clinux.2Fmount.h.3E_and_.3Csys.2Fmount.h.3E

It is saner for QEMU to remove the workaround it applied for
glibc 2.36 and expect distros to ship the 2.36 maint release
with the fix. This avoids needing to add a further workaround
to QEMU to deal with the fact that linux/brtfs.h now also pulls
in linux/mount.h via linux/fs.h since Linux 6.1

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230110174901.2580297-3-berrange@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03 22:55:12 +01:00
Daniel P. Berrangé
9f0246539a Revert "linux-user: add more compat ioctl definitions"
This reverts commit c5495f4ecb.

glibc has fixed (in 2.36.9000-40-g774058d729) the problem
that caused a clash when both sys/mount.h annd linux/mount.h
are included, and backported this to the 2.36 stable release
too:

  https://sourceware.org/glibc/wiki/Release/2.36#Usage_of_.3Clinux.2Fmount.h.3E_and_.3Csys.2Fmount.h.3E

It is saner for QEMU to remove the workaround it applied for
glibc 2.36 and expect distros to ship the 2.36 maint release
with the fix. This avoids needing to add a further workaround
to QEMU to deal with the fact that linux/brtfs.h now also pulls
in linux/mount.h via linux/fs.h since Linux 6.1

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230110174901.2580297-2-berrange@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03 22:55:12 +01:00
Richard Henderson
6490d9aa62 linux-user: un-parent OBJECT(cpu) when closing thread
This reinstates commit 52f0c16076:

While forcing the CPU to unrealize by hand does trigger the clean-up
code we never fully free resources because refcount never reaches
zero. This is because QOM automatically added objects without an
explicit parent to /unattached/, incrementing the refcount.

Instead of manually triggering unrealization just unparent the object
and let the device machinery deal with that for us.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/866
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220811151413.3350684-2-alex.bennee@linaro.org>

The original patch tickled a problem in target/arm, and was reverted.
But that problem is fixed as of commit 3b07a936d3.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230124201019.3935934-1-richard.henderson@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03 22:55:12 +01:00
Drew DeVault
55bbe4d5ee linux-user/syscall: Implement execveat()
References: https://gitlab.com/qemu-project/qemu/-/issues/1007
Signed-off-by: Drew DeVault <sir@cmpwn.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20221104081015.706009-1-sir@cmpwn.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20221104173632.1052-6-philmd@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-01-25 10:44:48 +01:00
Drew DeVault
156e1f6718 linux-user/syscall: Extract do_execve() from do_syscall1()
execve() is a particular case of execveat(). In order
to add do_execveat(), first factor do_execve() out.

Signed-off-by: Drew DeVault <sir@cmpwn.com>
Message-Id: <20221104081015.706009-1-sir@cmpwn.com>
[PMD: Split of bigger patch, filled description, fixed style]
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20221104173632.1052-5-philmd@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-01-25 10:44:48 +01:00
Markus Armbruster
3d558330ad Drop more useless casts from void * to pointer
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20221123133811.1398562-1-armbru@redhat.com>
2022-12-14 16:19:35 +01:00
Icenowy Zheng
16c81dd563 linux-user: always translate cmsg when recvmsg
It's possible that a message contains both normal payload and ancillary
data in the same message, and even if no ancillary data is available
this information should be passed to the target, otherwise the target
cmsghdr will be left uninitialized and the target is going to access
uninitialized memory if it expects cmsg.

Always call the function that translate cmsg when recvmsg, because that
function should be empty-cmsg-safe (it creates an empty cmsg in the
target).

Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20221028081220.1604244-1-uwu@icenowy.me>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-11-02 17:29:17 +01:00
Helge Deller
af804f39cc linux-user: Add close_range() syscall
Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <Y1dLJoEDhJ2AAYDn@p100>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-11-02 17:17:07 +01:00
Helge Deller
bd5ccd6108 linux-user: Add guest memory layout to exception dump
When the emulation stops with a hard exception it's very useful for
debugging purposes to dump the current guest memory layout (for an
example see /proc/self/maps) beside the CPU registers.

The open_self_maps() function provides such a memory dump, but since
it's located in the syscall.c file, various changes (add #includes, make
this function externally visible, ...) are needed to be able to call it
from the existing EXCP_DUMP() macro.

This patch takes another approach by re-defining EXCP_DUMP() to call
target_exception_dump(), which is in syscall.c, consolidates the log
print functions and allows to add the call to dump the memory layout.

Beside a reduced code footprint, this approach keeps the changes across
the various callers minimal, and keeps EXCP_DUMP() highlighted as
important macro/function.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <Y1bzAWbw07WBKPxw@p100>
[lv: remove pc declaration and setting]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-25 09:20:40 +02:00
WANG Xuerui
35a2c85f7d linux-user: Implement faccessat2
User space has been preferring this syscall for a while, due to its
closer match with C semantics, and newer platforms such as LoongArch
apparently have libc implementations that don't fallback to faccessat
so normal access checks are failing without the emulation in place.

Tested by successfully emerging several packages within a Gentoo loong
stage3 chroot, emulated on amd64 with help of static qemu-loongarch64.

Reported-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Signed-off-by: WANG Xuerui <xen0n@gentoo.org>
Message-Id: <20221009060813.2289077-1-xen0n@gentoo.org>
[lv: removing defined(__NR_faccessat2) in syscall.c,
     adding defined(TARGET_NR_faccessat2) on print_faccessat()]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-21 17:46:19 +02:00
Daniel P. Berrangé
c5495f4ecb linux-user: add more compat ioctl definitions
GLibc changes prevent us from including linux/fs.h anymore,
and we previously adjusted to this in

  commit 3cd3df2a95
  Author: Daniel P. Berrangé <berrange@redhat.com>
  Date:   Tue Aug 2 12:41:34 2022 -0400

    linux-user: fix compat with glibc >= 2.36 sys/mount.h

That change required adding compat ioctl definitions on the
QEMU side for any ioctls that we would otherwise obtain
from linux/fs.h.  This commit adds more that were initially
missed, due to their usage being conditionalized in QEMU.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20221004093206.652431-2-berrange@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-21 17:46:19 +02:00
Laurent Vivier
00ed8a3459 linux-user: don't use AT_EXECFD in do_openat()
AT_EXECFD gives access to the binary file even if
it is not readable (only executable).

Moreover it can be opened with flags and mode that are not the ones
provided by do_openat() caller.

And it is not available because loader_exec() has closed it.

To avoid that, use only safe_openat() with the exec_path.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220927124357.688536-3-laurent@vivier.eu>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-21 17:46:19 +02:00
Laurent Vivier
f07eb1c4f8 linux-user: handle /proc/self/exe with execve() syscall
If path is /proc/self/exe, use the executable path
provided by exec_path.

Don't use execfd as it is closed by loader_exec() and otherwise
will survive to the exec() syscall and be usable child process.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220927124357.688536-2-laurent@vivier.eu>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-21 17:46:19 +02:00
Laurent Vivier
46187d707e linux-user: fix pidfd_send_signal()
According to pidfd_send_signal(2), info argument can be a NULL pointer.
Fix strace to correctly manage ending comma in parameters.

Fixes: cc054c6f13 ("linux-user: Add pidfd_open(), pidfd_send_signal() and pidfd_getfd() syscalls")
cc: Helge Deller <deller@gmx.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Helge Deller <deller@gmx.de>
Message-Id: <20221005163826.1455313-1-laurent@vivier.eu>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-21 17:46:19 +02:00
WANG Xuerui
eeed22916b linux-user: Fix more MIPS n32 syscall ABI issues
In commit 80f0fe3a85 ("linux-user: Fix syscall parameter handling for
MIPS n32") the ABI problem regarding offset64 on MIPS n32 was fixed,
but still some cases remain where the n32 is incorrectly treated as any
other 32-bit ABI that passes 64-bit arguments in pairs of GPRs. Fix by
excluding TARGET_ABI_MIPSN32 from various TARGET_ABI_BITS == 32 checks.

Closes: https://gitlab.com/qemu-project/qemu/-/issues/1238
Signed-off-by: WANG Xuerui <xen0n@gentoo.org>
Cc: Philippe Mathieu-Daudé <f4bug@amsat.org>
Cc: Jiaxun Yang <jiaxun.yang@flygoat.com>
Cc: Andreas K. Hüttel <dilfridge@gentoo.org>
Cc: Joshua Kinard <kumba@gentoo.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Tested-by: Andreas K. Huettel <dilfridge@gentoo.org>
Message-Id: <20221006085500.290341-1-xen0n@gentoo.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-21 16:37:36 +02:00
Richard Henderson
c72a90df47 linux-user: Implement PI futexes
Define the missing FUTEX_* constants in syscall_defs.h

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220829021006.67305-6-richard.henderson@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27 13:19:05 +02:00
Richard Henderson
0f94673112 linux-user: Convert signal number for FUTEX_FD
The val argument to FUTEX_FD is a signal number.  Convert to match
the host, as it will be converted back when the signal is delivered.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220829021006.67305-5-richard.henderson@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27 13:19:05 +02:00
Richard Henderson
a6180f8aed linux-user: Implement FUTEX_WAKE_BITSET
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220829021006.67305-4-richard.henderson@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27 13:19:05 +02:00
Richard Henderson
57b9ccd4c0 linux-user: Sink call to do_safe_futex
Leave only the argument adjustments within the shift,
and sink the actual syscall to the end.  Sink the
timespec conversion as well, as there will be more users.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220829021006.67305-3-richard.henderson@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27 13:19:05 +02:00
Richard Henderson
0fbc0f8da1 linux-user: Combine do_futex and do_futex_time64
Pass a boolean to select between time32 and time64.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220829021006.67305-2-richard.henderson@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27 13:19:05 +02:00
Peter Maydell
9e59899f8c linux-user: Don't assume 0 is not a valid host timer_t value
For handling guest POSIX timers, we currently use an array
g_posix_timers[], whose entries are a host timer_t value, or 0 for
"this slot is unused".  When the guest calls the timer_create syscall
we look through the array for a slot containing 0, and use that for
the new timer.

This scheme assumes that host timer_t values can never be zero.  This
is unfortunately not a valid assumption -- for some host libc
versions, timer_t values are simply indexes starting at 0.  When
using this kind of host libc, the effect is that the first and second
timers end up sharing a slot, and so when the guest tries to operate
on the first timer it changes the second timer instead.

Rework the timer allocation code, so that:
 * the 'slot in use' indication uses a separate array from the
   host timer_t array
 * we grab the free slot atomically, to avoid races when multiple
   threads call timer_create simultaneously
 * releasing an allocated slot is abstracted out into a new
   free_host_timer_slot() function called in the correct places

This fixes:
 * problems on hosts where timer_t 0 is valid
 * the FIXME in next_free_host_timer() about locking
 * bugs in the error paths in timer_create where we forgot to release
   the slot we grabbed, or forgot to free the host timer

Reported-by: Jon Alduan <jon.alduan@gmail.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20220725110035.1273441-1-peter.maydell@linaro.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27 13:19:05 +02:00
fanwenjie
9b9145f04d linux-user: fix bug about missing signum convert of sigqueue
Fixes: 66fb9763af ("basic signal handling")
Fixes: cf8b8bfc50 ("linux-user: add support for rt_tgsigqueueinfo() system call")
Signed-off-by: fanwenjie <fanwj@mail.ustc.edu.cn>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27 13:19:05 +02:00
Helge Deller
0a3346b593 linux-user/hppa: Increase guest stack size to 80MB for hppa target
The hppa target requires a much bigger stack than many other targets,
and the Linux kernel allocates 80 MB by default for it.

This patch increases the guest stack for hppa to 80MB, and prevents
that this default stack size gets reduced by a lower stack limit on the
host.

Since the stack grows upwards on hppa, the stack_limit value marks the
upper boundary of the stack. Fix the output of /proc/self/maps (in the
guest) to show the [stack] marker on the correct memory area.

Signed-off-by: Helge Deller <deller@gmx.de>
Message-Id: <20220924114501.21767-6-deller@gmx.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27 09:33:56 +02:00
Helge Deller
cc054c6f13 linux-user: Add pidfd_open(), pidfd_send_signal() and pidfd_getfd() syscalls
I noticed those were missing when running the glib2.0 testsuite.
Add the syscalls including the strace output.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220918194555.83535-4-deller@gmx.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27 09:29:33 +02:00
Jameson Nash
65d4830dac linux-user: fix readlinkat handling with magic exe symlink
Exactly the same as f17f4989fa before was
for readlink. I suppose this was simply missed at the time.

Signed-off-by: Jameson Nash <vtjnash@gmail.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220808190727.875155-1-vtjnash@gmail.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-23 23:43:45 +02:00
Richard Henderson
976a55c0fe Revert "linux-user: un-parent OBJECT(cpu) when closing thread"
This reverts commit 52f0c16076.

This caused a regression in arm/aarch64.

We are hard-coding ARMCPRegInfo pointers into TranslationBlocks,
for calling into helper_{get,set}cp_reg{,64}.  So we have a race
condition between whichever cpu thread translates the code first
(encoding the pointer), and that cpu thread exiting, so that the
next execution of the TB references a freed data structure.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-08-18 18:08:57 -07:00
Alex Bennée
52f0c16076 linux-user: un-parent OBJECT(cpu) when closing thread
While forcing the CPU to unrealize by hand does trigger the clean-up
code we never fully free resources because refcount never reaches
zero. This is because QOM automatically added objects without an
explicit parent to /unattached/, incrementing the refcount.

Instead of manually triggering unrealization just unparent the object
and let the device machinery deal with that for us.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/866
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20220811151413.3350684-2-alex.bennee@linaro.org>
2022-08-16 09:57:07 +01:00
Daniel P. Berrangé
3cd3df2a95 linux-user: fix compat with glibc >= 2.36 sys/mount.h
The latest glibc 2.36 has extended sys/mount.h so that it
defines the FSCONFIG_* enum constants. These are historically
defined in linux/mount.h, and thus if you include both headers
the compiler complains:

In file included from /usr/include/linux/fs.h:19,
                 from ../linux-user/syscall.c:98:
/usr/include/linux/mount.h:95:6: error: redeclaration of 'enum fsconfig_command'
   95 | enum fsconfig_command {
      |      ^~~~~~~~~~~~~~~~
In file included from ../linux-user/syscall.c:31:
/usr/include/sys/mount.h:189:6: note: originally defined here
  189 | enum fsconfig_command
      |      ^~~~~~~~~~~~~~~~
/usr/include/linux/mount.h:96:9: error: redeclaration of enumerator 'FSCONFIG_SET_FLAG'
   96 |         FSCONFIG_SET_FLAG       = 0,    /* Set parameter, supplying no value */
      |         ^~~~~~~~~~~~~~~~~
/usr/include/sys/mount.h:191:3: note: previous definition of 'FSCONFIG_SET_FLAG' with type 'enum fsconfig_command'
  191 |   FSCONFIG_SET_FLAG       = 0,    /* Set parameter, supplying no value */
      |   ^~~~~~~~~~~~~~~~~
...snip...

QEMU doesn't include linux/mount.h, but it does use
linux/fs.h and thus gets linux/mount.h indirectly.

glibc acknowledges this problem but does not appear to
be intending to fix it in the forseeable future, simply
documenting it as a known incompatibility with no
workaround:

  https://sourceware.org/glibc/wiki/Release/2.36#Usage_of_.3Clinux.2Fmount.h.3E_and_.3Csys.2Fmount.h.3E
  https://sourceware.org/glibc/wiki/Synchronizing_Headers

To address this requires either removing use of sys/mount.h
or linux/fs.h, despite QEMU needing declarations from
both.

This patch removes linux/fs.h, meaning we have to define
various FS_IOC constants that are now unavailable.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Message-Id: <20220802164134.1851910-1-berrange@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-08-10 18:37:46 +02:00
Rainer Müller
5b63de6b54 linux-user: Use memfd for open syscall emulation
For certain paths in /proc, the open syscall is intercepted and the
returned file descriptor points to a temporary file with emulated
contents.

If TMPDIR is not accessible or writable for the current user (for
example in a read-only mounted chroot or container) tools such as ps
from procps may fail unexpectedly. Trying to read one of these paths
such as /proc/self/stat would return an error such as ENOENT or EROFS.

To relax the requirement on a writable TMPDIR, use memfd_create()
instead to create an anonymous file and return its file descriptor.

Signed-off-by: Rainer Müller <raimue@codingfarm.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20220729154951.76268-1-raimue@codingfarm.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-08-02 15:44:27 +02:00