Commit Graph

1065 Commits

Author SHA1 Message Date
Helge Deller
b8002058c4 linux-user: Fix openat() emulation to correctly detect accesses to /proc
In qemu we catch accesses to files like /proc/cpuinfo or /proc/net/route
and return to the guest contents which would be visible on a real system
(instead what the host would show).

This patch fixes a bug, where for example the accesses
    cat /proc////cpuinfo
or
    cd /proc && cat cpuinfo
will not be recognized by qemu and where qemu will wrongly show
the contents of the host's /proc/cpuinfo file.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20230803214450.647040-2-deller@gmx.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-09 09:31:30 -07:00
Richard Henderson
a05cee93f4 linux-user: Use ARRAY_SIZE with bitmask_transtbl
Rather than using a zero tuple to end the table, use a macro
to apply ARRAY_SIZE and pass that on to the convert functions.

This fixes two bugs in which the conversion functions required
that both the target and host masks be non-zero in order to
continue, rather than require both target and host masks be
zero in order to terminate.

This affected mmap_flags_tbl when the host does not support
all of the flags we wish to convert (e.g. MAP_UNINITIALIZED).
Mapping these flags to zero is good enough, and matches how
the kernel ignores bits that are unknown.

Fixes: 4b840f96 ("linux-user: Populate more bits in mmap_flags_tbl")
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-09 07:17:42 -07:00
Richard Henderson
9ab8d07149 linux-user: Split out do_mmap
New function that rejects unsupported map types and flags.
In 4b840f96 we should not have accepted MAP_SHARED_VALIDATE
without actually validating the rest of the flags.

Fixes: 4b840f96 ("linux-user: Populate more bits in mmap_flags_tbl")
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-09 07:17:04 -07:00
Richard Henderson
3ce3dd8ca9 util/selfmap: Rewrite using qemu/interval-tree.h
We will want to be able to search the set of mappings.
For this patch, the two users iterate the tree in order.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-08 13:27:17 -07:00
Akihiko Odaki
2aea137a42 linux-user: Do not align brk with host page size
do_brk() minimizes calls into target_mmap() by aligning the address
with host page size, which is potentially larger than the target page
size. However, the current implementation of this optimization has two
bugs:

- The start of brk is rounded up with the host page size while brk
  advertises an address aligned with the target page size as the
  beginning of brk. This makes the beginning of brk unmapped.
- Content clearing after mapping is flawed. The size to clear is
  specified as HOST_PAGE_ALIGN(brk_page) - brk_page, but brk_page is
  aligned with the host page size so it is always zero.

This optimization actually has no practical benefit. It makes difference
when brk() is called multiple times with values in a range of the host
page size. However, sophisticated memory allocators try to avoid to
make such frequent brk() calls. For example, glibc 2.37 calls brk() to
shrink the heap only when there is a room more than 128 KiB. It is
rare to have a page size larger than 128 KiB if it happens.

Let's remove the optimization to fix the bugs and make the code simpler.

Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1616
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20230802071754.14876-7-akihiko.odaki@daynix.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-06 16:46:03 -07:00
Akihiko Odaki
cb9d5d1fda linux-user: Do nothing if too small brk is specified
Linux 6.4.7 does nothing when a value smaller than the initial brk is
specified.

Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20230802071754.14876-6-akihiko.odaki@daynix.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-06 16:45:03 -07:00
Akihiko Odaki
e69e032d1a linux-user: Use MAP_FIXED_NOREPLACE for do_brk()
MAP_FIXED_NOREPLACE can ensure the mapped address is fixed without
concerning that the new mapping overwrites something else.

Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20230802071754.14876-5-akihiko.odaki@daynix.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-06 16:44:52 -07:00
Akihiko Odaki
c6cc059eca linux-user: Do not call get_errno() in do_brk()
Later the returned value is compared with -1, and negated errno is not
expected.

Fixes: 00faf08c95 ("linux-user: Don't use MAP_FIXED in do_brk()")
Reviewed-by: Helge Deller <deller@gmx.de>
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Message-Id: <20230802071754.14876-4-akihiko.odaki@daynix.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-06 16:39:00 -07:00
Helge Deller
eac78a4b0b linux-user: Fix signed math overflow in brk() syscall
Fix the math overflow when calculating the new_malloc_size.

new_host_brk_page and brk_page are unsigned integers. If userspace
reduces the heap, new_host_brk_page is lower than brk_page which results
in a huge positive number (but should actually be negative).

Fix it by adding a proper check and as such make the code more readable.

Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: "Markus F.X.J. Oberhumer" <markus@oberhumer.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Cc: qemu-stable@nongnu.org
Buglink: https://github.com/upx/upx/issues/683
2023-07-18 20:42:05 +02:00
Helge Deller
dfe49864af linux-user: Prohibit brk() to to shrink below initial heap address
Since commit 86f04735ac ("linux-user: Fix brk() to release pages") it's
possible for userspace applications to reduce their memory footprint by
calling brk() with a lower address and free up memory. Before that commit
guest heap memory was never unmapped.

But the Linux kernel prohibits to reduce brk() below the initial memory
address which is set at startup by the set_brk() function in binfmt_elf.c.
Such a range check was missed in commit 86f04735ac.

This patch adds the missing check by storing the initial brk value in
initial_target_brk and verify any new brk addresses against that value.

Tested with the i386 upx binary from
https://github.com/upx/upx/releases/download/v4.0.2/upx-4.0.2-i386_linux.tar.xz

Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: "Markus F.X.J. Oberhumer" <markus@oberhumer.com>
Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Cc: qemu-stable@nongnu.org
Buglink: https://github.com/upx/upx/issues/683
2023-07-18 20:42:05 +02:00
Helge Deller
15ad98536a linux-user: Fix qemu brk() to not zero bytes on current page
The qemu brk() implementation is too aggressive and cleans remaining bytes
on the current page above the last brk address.

But some existing applications are buggy and read/write bytes above their
current heap address. On a phyiscal machine this does not trigger a
runtime error as long as the access happens on the same page. Additionally
the Linux kernel allocates only full pages and does no zeroing on already
allocated pages, even if the brk address is lowered.

Fix qemu to behave the same way as the kernel does. Do not touch already
allocated pages, and - when running with different page sizes of guest and
host - zero out only those memory areas where the host page size is bigger
than the guest page size.

Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: "Markus F.X.J. Oberhumer" <markus@oberhumer.com>
Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Cc: qemu-stable@nongnu.org
Buglink: https://github.com/upx/upx/issues/683
2023-07-18 20:42:05 +02:00
Peter Maydell
aab746106c linux-user: Remove pointless NULL check in clock_adjtime handling
In the code for TARGET_NR_clock_adjtime, we set the pointer phtx to
the address of the local variable htx.  This means it can never be
NULL, but later in the code we check it for NULL anyway.  Coverity
complains about this (CID 1507683) because the NULL check comes after
a call to clock_adjtime() that assumes it is non-NULL.

Since phtx is always &htx, and is used only in three places, it's not
really necessary.  Remove it, bringing the code structure in to line
with that for TARGET_NR_clock_adjtime64, which already uses a simple
'&htx' when it wants a pointer to 'htx'.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20230623144410.1837261-1-peter.maydell@linaro.org
2023-07-17 11:05:07 +01:00
Juan Quintela
ac42f44310 linux-user: Drop uint and ulong
These are types not used anymore anywhere else.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-id: <20230511085056.13809-1-quintela@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:33 +01:00
Richard Henderson
bef6f008b9 accel/tcg: Return bool from page_check_range
Replace the 0/-1 result with true/false.
Invert the sense of the test of all callers.
Document the function.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230707204054.8792-25-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
55baec0f4c linux-user: Widen target_mmap offset argument to off_t
We build with _FILE_OFFSET_BITS=64, so off_t = off64_t = uint64_t.
With an extra cast, this fixes emulation of mmap2, which could
overflow the computation of the full value of offset.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230707204054.8792-14-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
4b840f9609 linux-user: Populate more bits in mmap_flags_tbl
Fix translation of TARGET_MAP_SHARED and TARGET_MAP_PRIVATE,
which are types not single bits.  Add TARGET_MAP_SHARED_VALIDATE,
TARGET_MAP_SYNC, TARGET_MAP_NONBLOCK, TARGET_MAP_POPULATE,
TARGET_MAP_FIXED_NOREPLACE, and TARGET_MAP_UNINITIALIZED.

Update strace to match.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230707204054.8792-9-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Andreas Schwab
d28b3c90cf linux-user: Make sure initial brk(0) is page-aligned
Fixes: 86f04735ac ("linux-user: Fix brk() to release pages")
Signed-off-by: Andreas Schwab <schwab@suse.de>
Message-Id: <mvmpm55qnno.fsf@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Richard Henderson
9b61f77f40 linux-user: Fix do_shmat type errors
The guest address, raddr, should be unsigned, aka abi_ulong.
The host addresses should be cast via *intptr_t not long.
Drop the inline and fix two other whitespace issues.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Anton Johansson <anjo@rev.ng>
Message-Id: <20230626140250.69572-1-richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Pierrick Bouvier
7a8d9f3a0e linux-user/syscall: Implement execve without execveat
Support for execveat syscall was implemented in 55bbe4 and is available
since QEMU 8.0.0. It relies on host execveat, which is widely available
on most of Linux kernels today.

However, this change breaks qemu-user self emulation, if "host" qemu
version is less than 8.0.0. Indeed, it does not implement yet execveat.
This strange use case happens with most of distribution today having
binfmt support.

With a concrete failing example:
$ qemu-x86_64-7.2 qemu-x86_64-8.0 /bin/bash -c /bin/ls
/bin/bash: line 1: /bin/ls: Function not implemented
-> not implemented means execve returned ENOSYS

qemu-user-static 7.2 and 8.0 can be conveniently grabbed from debian
packages qemu-user-static* [1].

One usage of this is running wine-arm64 from linux-x64 (details [2]).
This is by updating qemu embedded in docker image that we ran into this
issue.

The solution to update host qemu is not always possible. Either it's
complicated or ask you to recompile it, or simply is not accessible
(GitLab CI, GitHub Actions). Thus, it could be worth to implement execve
without relying on execveat, which is the goal of this patch.

This patch was tested with example presented in this commit message.

[1] http://ftp.us.debian.org/debian/pool/main/q/qemu/
[1] https://www.linaro.org/blog/emulate-windows-on-arm/

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Message-Id: <20230705121023.973284-1-pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-15 08:02:32 +01:00
Robbin Ehn
9e1c7d982d linux-user/riscv: Add syscall riscv_hwprobe
This patch adds the new syscall for the
"RISC-V Hardware Probing Interface"
(https://docs.kernel.org/riscv/hwprobe.html).

Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Robbin Ehn <rehn@rivosinc.com>
Message-Id: <06a4543df2aa6101ca9a48f21a3198064b4f1f87.camel@rivosinc.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-07-10 22:29:15 +10:00
Helge Deller
dca4c8384d linux-user: Fix accept4(SOCK_NONBLOCK) syscall
The Linux accept4() syscall allows two flags only: SOCK_NONBLOCK and
SOCK_CLOEXEC, and returns -EINVAL if any other bits have been set.

Change the qemu implementation accordingly, which means we can not use
the fcntl_flags_tbl[] translation table which allows too many other
values.

Beside the correction in behaviour, this actually fixes the accept4()
emulation for hppa, mips and alpha targets for which SOCK_NONBLOCK is
different than TARGET_SOCK_NONBLOCK (aka O_NONBLOCK).

The fix can be verified with the testcase of the debian lwt package,
which hangs forever in a read() syscall without this patch.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-08 16:55:08 +02:00
Helge Deller
e0ddf8eac9 linux-user: Fix fcntl() and fcntl64() to return O_LARGEFILE for 32-bit targets
When running a 32-bit guest on a 64-bit host, fcntl[64](F_GETFL) should
return with the TARGET_O_LARGEFILE flag set, because all 64-bit hosts
support large files unconditionally.

But on 64-bit hosts, O_LARGEFILE has the value 0, so the flag
translation can't be done with the fcntl_flags_tbl[]. Instead add the
TARGET_O_LARGEFILE flag afterwards.

Note that for 64-bit guests the compiler will optimize away this code,
since TARGET_O_LARGEFILE is zero.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2023-07-08 16:55:08 +02:00
Ilya Leoshkevich
77ae5761f3 linux-user: Emulate /proc/self/smaps
/proc/self/smaps is an extension of /proc/self/maps: it provides the
same lines, plus additional information about each range.

GDB uses /proc/self/smaps when available, which means that
generate-core-file tries it first before falling back to
/proc/self/maps. This, in turn, causes it to dump the host mappings,
since /proc/self/smaps is not emulated and is just passed through.

Fix by emulating /proc/self/smaps. Provide true values only for
Size, KernelPageSize, MMUPageSize and VmFlags. Leave all other values
at 0, which is a valid conservative estimate.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230621203627.1808446-4-iii@linux.ibm.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-34-alex.bennee@linaro.org>
2023-07-03 12:52:34 +01:00
Ilya Leoshkevich
35be898e2f linux-user: Add "safe" parameter to do_guest_openat()
gdbstub cannot meaningfully handle QEMU_ERESTARTSYS, and it doesn't
need to. Add a parameter to do_guest_openat() that makes it use
openat() instead of safe_openat(), so that it becomes usable from
gdbstub.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230621203627.1808446-3-iii@linux.ibm.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-33-alex.bennee@linaro.org>
2023-07-03 12:52:34 +01:00
Ilya Leoshkevich
a4dab0a0d3 linux-user: Expose do_guest_openat() and do_guest_readlink()
These functions will be required by the GDB stub in order to provide
the guest view of /proc to GDB.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230621203627.1808446-2-iii@linux.ibm.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230630180423.558337-32-alex.bennee@linaro.org>
2023-07-03 12:52:34 +01:00
Peter Maydell
8fbf89a966 linux-user: Return EINVAL for getgroups() with negative gidsetsize
Coverity doesn't like the way we might end up calling getgroups()
with a NULL grouplist pointer. This is fine for the special case
of gidsetsize == 0, but we will also do it if the guest passes
us a negative gidsetsize. (CID 1512465)

Explicitly fail the negative gidsetsize with EINVAL, as the kernel
does. This means we definitely only call the libc getgroups()
with valid parameters. It also brings the getgroups() code in
to line with the setgroups() code.

Possibly Coverity may still complain about getgroups(0, NULL), but
that would be a false positive.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-06-10 00:00:24 +03:00
Michael Tokarev
725160fe56 linux-user: add comments for TARGET_NR_[gs]etgroups{,32}
There are 2 pairs of identical code (with different types)
for TARGET_NR_setgroups & TARGET_NR_setgroups32, and
for TARGET_NR_getgroups & TARGET_NR_getgroups32.  Add
comments stating this fact, so that further modifications
are done in two places.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-06-09 23:59:11 +03:00
Ilya Leoshkevich
1fb9bdaf59 linux-user: Emulate /proc/cpuinfo on s390x
Some s390x userspace programs are confused when seeing a foreign
/proc/cpuinfo [1]. Add the emulation for s390x; follow the respective
kernel code structure where possible.

Output example:

	vendor_id       : IBM/S390
	# processors    : 12
	bogomips per cpu: 13370.00
	max thread id   : 0
	features	: esan3 zarch stfle msa
	facilities      : 0 1 2 3 4 7 9 16 17 18 19 21 22 24 25 27 30 31 32 33 34 35 37 40 41 45 49 51 52 53 57 58 61 69 71 72 75 76 77 129 130 131 135 138 146 148
	processor 0: version = 00,  identification = 000000,  machine = 8561
	processor 1: version = 00,  identification = 100000,  machine = 8561
	[...]

	cpu number      : 0
	version         : 00
	identification  : 000000
	machine         : 8561

	cpu number      : 1
	version         : 00
	identification  : 100000
	machine         : 8561
	[...]

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2211472

Reported-by: Tulio Magno Quites Machado Filho <tuliom@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20230605113950.1169228-5-iii@linux.ibm.com>
Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-06-05 20:48:34 +02:00
Michael Tokarev
1e35d32789 linux-user: fix getgroups/setgroups allocations
linux-user getgroups(), setgroups(), getgroups32() and setgroups32()
used alloca() to allocate grouplist arrays, with unchecked gidsetsize
coming from the "guest".  With NGROUPS_MAX being 65536 (linux, and it
is common for an application to allocate NGROUPS_MAX for getgroups()),
this means a typical allocation is half the megabyte on the stack.
Which just overflows stack, which leads to immediate SIGSEGV in actual
system getgroups() implementation.

An example of such issue is aptitude, eg
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=811087#72

Cap gidsetsize to NGROUPS_MAX (return EINVAL if it is larger than that),
and use heap allocation for grouplist instead of alloca().  While at it,
fix coding style and make all 4 implementations identical.

Try to not impose random limits - for example, allow gidsetsize to be
negative for getgroups() - just do not allocate negative-sized grouplist
in this case but still do actual getgroups() call.  But do not allow
negative gidsetsize for setgroups() since its argument is unsigned.

Capping by NGROUPS_MAX seems a bit arbitrary, - we can do more, it is
not an error if set size will be NGROUPS_MAX+1. But we should not allow
integer overflow for the array being allocated. Maybe it is enough to
just call g_try_new() and return ENOMEM if it fails.

Maybe there's also no need to convert setgroups() since this one is
usually smaller and known beforehand (KERN_NGROUPS_MAX is actually 63, -
this is apparently a kernel-imposed limit for runtime group set).

The patch fixes aptitude segfault mentioned above.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Message-Id: <20230409105327.1273372-1-mjt@msgid.tls.msk.ru>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-05-17 07:20:29 +02:00
Thomas Weißschuh
f443a26cc6 linux-user: Don't require PROT_READ for mincore
The kernel does not require PROT_READ for addresses passed to mincore.
For example the fincore(1) tool from util-linux uses PROT_NONE and
currently does not work under qemu-user.

Example (with fincore(1) from util-linux 2.38):

$ fincore /proc/self/exe
RES PAGES  SIZE FILE
24K     6 22.1K /proc/self/exe

$ qemu-x86_64 /usr/bin/fincore /proc/self/exe
fincore: failed to do mincore: /proc/self/exe: Cannot allocate memory

With this patch:

$ ./build/qemu-x86_64 /usr/bin/fincore /proc/self/exe
RES PAGES  SIZE FILE
24K     6 22.1K /proc/self/exe

Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20230422100314.1650-3-thomas@t-8ch.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-05-17 07:20:29 +02:00
Thomas Weißschuh
7f696cddd9 linux-user: Add open_tree() syscall
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20230424153429.276788-2-thomas@t-8ch.de>
[lv: move declaration at the beginning of the block,
     define syscall]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-05-17 07:20:29 +02:00
Thomas Weißschuh
4b2d2753e8 linux-user: Add move_mount() syscall
Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
[lv: define syscall]
Message-Id: <20230424153429.276788-1-thomas@t-8ch.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-05-17 07:20:29 +02:00
Thomas Weißschuh
59d1172776 linux-user: report ENOTTY for unknown ioctls
The correct error number for unknown ioctls is ENOTTY.

ENOSYS would mean that the ioctl() syscall itself is not implemented,
which is very improbable and unexpected for userspace.

ENOTTY means "Inappropriate ioctl for device". This is what the kernel
returns on unknown ioctls, what qemu is trying to express and what
userspace is prepared to handle.

Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230426070659.80649-1-thomas@t-8ch.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-05-17 07:20:29 +02:00
Afonso Bordado
8ddc171b7b linux-user: Emulate /proc/cpuinfo output for riscv
RISC-V does not expose all extensions via hwcaps, thus some userspace
applications may want to query these via /proc/cpuinfo.

Currently when querying this file the host's file is shown instead
which is slightly confusing. Emulate a basic /proc/cpuinfo file
with mmu info and an ISA string.

Signed-off-by: Afonso Bordado <afonsobordado@gmail.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Message-Id: <167873059442.9885.15152085316575248452-0@git.sr.ht>
[lv: removed the test that fails in CI for unknown reason]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-05-17 07:19:47 +02:00
Richard Henderson
49840a4a09 accel/tcg: Pass last not end to page_set_flags
Pass the address of the last byte to be changed, rather than
the first address past the last byte.  This avoids overflow
when the last page of the address space is involved.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1528
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-03-28 15:23:10 -07:00
Richard Henderson
720ace24ae *: Add missing includes of qemu/plugin.h
This had been pulled in from hw/core/cpu.h,
but that will be removed.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230310195252.210956-6-richard.henderson@linaro.org>
[AJB: also syscall-trace.h]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20230315174331.2959-16-alex.bennee@linaro.org>
Reviewed-by: Emilio Cota <cota@braap.org>
2023-03-22 15:06:57 +00:00
Helge Deller
895ce8bb53 linux-user: Emulate CLONE_PIDFD flag in clone()
Add emulation for the CLONE_PIDFD flag of the clone() syscall.
This flag was added in Linux kernel 5.2.

Successfully tested on a x86-64 Linux host with hppa-linux target.
Can be verified by running the testsuite of the qcoro debian package,
which breaks hard and kills the currently logged-in user without this
patch.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>

Message-Id: <Y4XoJCpvUA1JD7Sj@p100>
[lv: define CLONE_PIDFD if it is not]
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:45:47 +01:00
Helge Deller
fe080593dd linux-user: Add translation for argument of msync()
msync() uses the flags MS_ASYNC, MS_INVALIDATE and MS_SYNC, which differ
between platforms, specifcally on alpha and hppa.

Add a target to host translation for those and wire up a nicer strace
output.

This fixes the testsuite of the macaulay2 debian package with a hppa-linux
guest on a x86-64 host.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>

Message-Id: <Y5rMcts4qe15RaVN@p100>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:45:47 +01:00
Mathis Marion
44cf6731d6 linux-user: fix sockaddr_in6 endianness
The sin6_scope_id field uses the host byte order, so there is a
conversion to be made when host and target endianness differ.

Signed-off-by: Mathis Marion <mathis.marion@silabs.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20230307154256.101528-2-Mathis.Marion@silabs.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:45:47 +01:00
Helge Deller
86f04735ac linux-user: Fix brk() to release pages
The current brk() implementation does not de-allocate pages if a lower
address is given compared to earlier brk() calls.
But according to the manpage, brk() shall deallocate memory in this case
and currently it breaks a real-world application, specifically building
the debian gcl package in qemu-user.

Fix this issue by reworking the qemu brk() implementation.

Tested with the C-code testcase included in qemu commit 4d1de87c75, and
by building debian package of gcl in a hppa-linux guest on a x86-64
host.

Signed-off-by: Helge Deller <deller@gmx.de>
Message-Id: <Y6gId80ek49TK1xB@p100>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:42:00 +01:00
Andreas Schwab
25bb27c715 linux-user: fill out task state in /proc/self/stat
Some programs want to match an actual task state character.

Signed-off-by: Andreas Schwab <schwab@suse.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <mvmedq2kxoe.fsf@suse.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:42:00 +01:00
Ilya Leoshkevich
9c1da8b5ee linux-user: Fix unaligned memory access in prlimit64 syscall
target_rlimit64 contains uint64_t fields, so it's 8-byte aligned on
some hosts, while some guests may align their respective type on a
4-byte boundary. This may lead to an unaligned access, which is an UB.

Fix by defining the fields as abi_ullong. This makes the host alignment
match that of the guest, and lets the compiler know that it should emit
code that can deal with the guest alignment.

While at it, also use __get_user() and __put_user() instead of
tswap64().

Fixes: 163a05a839 ("linux-user: Implement prlimit64 syscall")
Reported-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20230224003907.263914-2-iii@linux.ibm.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:42:00 +01:00
Mathis Marion
d759a62b12 linux-user: fix timerfd read endianness conversion
When reading the expiration count from a timerfd, the endianness of the
64bit value read is the one of the host, just as for eventfds.

Signed-off-by: Mathis Marion <mathis.marion@silabs.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20230220085822.626798-2-Mathis.Marion@silabs.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:42:00 +01:00
Helge Deller
258bec39f3 linux-user: Fix access to /proc/self/exe
When accsssing /proc/self/exe from a userspace program, linux-user tries
to resolve the name via realpath(), which may fail if the process
changed the working directory in the meantime.

An example:
- a userspace program ist started with ./testprogram
- the program runs chdir("/tmp")
- then the program calls readlink("/proc/self/exe")
- linux-user tries to run realpath("./testprogram") which fails
  because ./testprogram isn't in /tmp
- readlink() will return -ENOENT back to the program

Avoid this issue by resolving the full path name of the started process
at startup of linux-user and store it in real_exec_path[]. This then
simplifies the emulation of readlink() and readlinkat() as well, because
they can simply copy the path string to userspace.

I noticed this bug because the testsuite of the debian package "pandoc"
failed on linux-user while it succeeded on real hardware.  The full log
is here:
https://buildd.debian.org/status/fetch.php?pkg=pandoc&arch=hppa&ver=2.17.1.1-1.1%2Bb1&stamp=1670153210&raw=0

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20221205113825.20615-1-deller@gmx.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-03-10 20:41:30 +01:00
Ilya Leoshkevich
7de0816f69 linux-user: Always exit from exclusive state in fork_end()
fork()ed processes currently start with
current_cpu->in_exclusive_context set, which is, strictly speaking, not
correct, but does not cause problems (even assertion failures).

With one of the next patches, the code begins to rely on this value, so
fix it by always calling end_exclusive() in fork_end().

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20230214140829.45392-2-iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-02-21 08:44:13 -10:00
Helge Deller
3f0744f98b linux-user: Allow sendmsg() without IOV
Applications do call sendmsg() without any IOV, e.g.:
 sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=NULL, msg_iovlen=0,
            msg_control=[{cmsg_len=36, cmsg_level=SOL_ALG, cmsg_type=0x2}],
            msg_controllen=40, msg_flags=0}, MSG_MORE) = 0
 sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="The quick brown fox jumps over t"..., iov_len=183}],
            msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_ALG, cmsg_type=0x3}],
            msg_controllen=24, msg_flags=0}, 0) = 183

The function do_sendrecvmsg_locked() is used for sndmsg() and recvmsg()
and calls lock_iovec() to lock the IOV into memory. For the first
sendmsg() above it returns NULL and thus wrongly skips the call the host
sendmsg() syscall, which will break the calling application.

Fix this issue by:
- allowing sendmsg() even with empty IOV
- skip recvmsg() if IOV is NULL
- skip both if the return code of do_sendrecvmsg_locked() != 0, which
  indicates some failure like EFAULT on the IOV

Tested with the debian "ell" package with hppa guest on x86_64 host.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20221212173416.90590-2-deller@gmx.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03 22:55:12 +01:00
Helge Deller
27404b6c15 linux-user: Implement SOL_ALG encryption support
Add suport to handle SOL_ALG packets via sendmsg() and recvmsg().
This allows emulated userspace to use encryption functionality.

Tested with the debian ell package with hppa guest on x86_64 host.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <20221212173416.90590-1-deller@gmx.de>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03 22:55:12 +01:00
Helge Deller
e0174afeea linux-user: Fix /proc/cpuinfo output for hppa
The hppa architectures provides an own output for the emulated
/proc/cpuinfo file.

Some userspace applications count (even if that's not the recommended
way) the number of lines which start with "processor:" and assume that
this number then reflects the number of online CPUs. Since those 3
architectures don't provide any such line, applications may assume "0"
CPUs.  One such issue can be seen in debian bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024653

Avoid such issues by adding a "processor:" line for each of the online
CPUs.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <Y9QvyRSq1I1k5/JW@p100>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03 22:55:12 +01:00
Helge Deller
cb88b7c214 linux-user: Fix SO_ERROR return code of getsockopt()
Add translation for the host error return code of:
    getsockopt(19, SOL_SOCKET, SO_ERROR, [ECONNREFUSED], [4]) = 0

This fixes the testsuite of the cockpit debian package with a
hppa-linux guest on a x86-64 host.

Signed-off-by: Helge Deller <deller@gmx.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Message-Id: <Y9QzNzXg0hrzHQeo@p100>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03 22:55:12 +01:00
Daniel P. Berrangé
6003159ce1 Revert "linux-user: fix compat with glibc >= 2.36 sys/mount.h"
This reverts commit 3cd3df2a95.

glibc has fixed (in 2.36.9000-40-g774058d729) the problem
that caused a clash when both sys/mount.h annd linux/mount.h
are included, and backported this to the 2.36 stable release
too:

  https://sourceware.org/glibc/wiki/Release/2.36#Usage_of_.3Clinux.2Fmount.h.3E_and_.3Csys.2Fmount.h.3E

It is saner for QEMU to remove the workaround it applied for
glibc 2.36 and expect distros to ship the 2.36 maint release
with the fix. This avoids needing to add a further workaround
to QEMU to deal with the fact that linux/brtfs.h now also pulls
in linux/mount.h via linux/fs.h since Linux 6.1

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20230110174901.2580297-3-berrange@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03 22:55:12 +01:00