If a file or device's open function returns ERESTART, respect that --
restart the syscall; don't pretend a signal has been delivered when
it was not. If an SA_RESTART signal was delivered, POSIX does not
allow open(2) to fail with EINTR:
SA_RESTART
This flag affects the behavior of interruptible functions;
that is, those specified to fail with errno set to [EINTR].
If set, and a function specified as interruptible is
interrupted by this signal, the function shall restart and
shall not fail with [EINTR] unless otherwise specified. If
an interruptible function which uses a timeout is restarted,
the duration of the timeout following the restart is set to
an unspecified value that does not exceed the original
timeout value. If the flag is not set, interruptible
functions interrupted by this signal shall fail with errno
set to [EINTR].
https://pubs.opengroup.org/onlinepubs/9699919799/functions/sigaction.html
Nothing in the POSIX definition of open specifies otherwise.
In 1990, Kirk McKusick added these lines with a mysterious commit
message:
Author: Kirk McKusick <mckusick>
Date: Tue Apr 10 19:36:33 1990 -0800
eliminate longjmp from the kernel (for karels)
diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c
index 7bc7b39bbf..d572d3a32d 100644
--- a/sys/kern/vfs_syscalls.c
+++ b/sys/kern/vfs_syscalls.c
@@ -14,7 +14,7 @@
* IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.
*
- * @(#)vfs_syscalls.c 7.42 (Berkeley) 3/26/90
+ * @(#)vfs_syscalls.c 7.43 (Berkeley) 4/10/90
*/
#include "param.h"
@@ -530,8 +530,10 @@ copen(scp, fmode, cmode, ndp, resultfd)
if (error = vn_open(ndp, fmode, (cmode & 07777) &~ S_ISVTX)) {
crfree(fp->f_cred);
fp->f_count--;
- if (error == -1) /* XXX from fdopen */
- return (0); /* XXX from fdopen */
+ if (error == EJUSTRETURN) /* XXX from fdopen */
+ return (0); /* XXX from fdopen */
+ if (error == ERESTART)
+ error = EINTR;
scp->sc_ofile[indx] = NULL;
return (error);
}
(found via this git import of the CSRG history:
cce2869b7a)
This change appears to have served two related purposes:
1. The fdopen function (the erstwhile open routine for /dev/fd/N)
used to return -1 as a hack to mean it had just duplicated the fd.
Mike Karels had recently changed it, in kern_descrip.c 7.9, to
return EJUSTRETURN, now defined to be -2, presumably to avoid a
conflict with ERESTART, defined to be -1. So this commit completed
part of Karels's switch to a different magic return code from
fdopen, by making copen check for EJUSTRETURN instead of -1.
Of course, today we use still another disgusting hack, EDUPFD, for
the same purpose, so none of this is relevant any more.
2. Prior to April 1990, the kernel handled signals during tsleep(9)
by longjmping out to the system call entry point or similar. In
April 1990, Mike Karels worked to convert all of that into
explicit unwind logic by passing through EINTR or ERESTART as
appropriate, instead of setjmp at each entry point.
However, it's not clear to me why this setjmp/longjmp and
fdopen/-1/EJUSTRETURN renovation justifies unconditional logic to map
ERESTART to EINTR in open(2). I suspect it was a mistake.
In 2013, the corresponding logic to map ERESTART to EINTR in open(2)
was removed from FreeBSD:
r246472 | kib | 2013-02-07 14:53:33 +0000 (Thu, 07 Feb 2013) | 11 lines
Stop translating the ERESTART error from the open(2) into EINTR.
Posix requires that open(2) is restartable for SA_RESTART.
For non-posix objects, in particular, devfs nodes, still disable
automatic restart of the opens. The open call to a driver could have
significant side effects for the hardware.
Noted and reviewed by: jilles
Discussed with: bde
MFC after: 2 weeks
Index: vfs_syscalls.c
===================================================================
--- vfs_syscalls.c (revision 246471)
+++ vfs_syscalls.c (revision 246472)
@@ -1106,8 +1106,6 @@
goto success;
}
- if (error == ERESTART)
- error = EINTR;
goto bad;
}
td->td_dupfd = 0;
https://cgit.freebsd.org/src/commit/sys/kern/vfs_syscalls.c?id=2ca49983425886121b506cb5126b60a705afc38c
It's not clear to me that there's any reason to treat device nodes
specially here; in fact, if a driver's .d_open routine sleeps and is
woken by a concurrent revoke without a signal pending or with an
SA_RESTART signal pending, it is wrong for open(2) to fail with EINTR.
But the driver MUST return ERESTART so the whole system call is
restarted, rather than continue sleeping in a loop or just exit the
loop and continue to open, because the security model of revoke
requires open(2) to retry the permissions check at that point.
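The restart requirement above can be sketched in userland as follows.
This is only a model under stated assumptions: SKETCH_ERESTART,
fake_dev, fake_open and dispatch_open are hypothetical names, not kernel
interfaces, and a real kernel restarts by re-entering the system call
from the trap return path rather than by looping in place. The point it
illustrates is that every restart reruns the whole open path, including
the permission check.

```c
/* Hypothetical stand-in for the kernel-internal ERESTART value. */
enum { SKETCH_ERESTART = -3 };

typedef int (*open_fn)(void *);

/* Hypothetical device state used only by this sketch. */
struct fake_dev {
	int perm_checks;	/* how many times permissions were checked */
	int restarts_left;	/* how many times the open sleep is cut short */
};

/* Models the full open path: permission check first, then a sleep. */
static int
fake_open(void *cookie)
{
	struct fake_dev *d = (struct fake_dev *)cookie;

	d->perm_checks++;		/* permission check runs on every attempt */
	if (d->restarts_left > 0) {
		d->restarts_left--;
		return SKETCH_ERESTART;	/* woken early, e.g. by revoke */
	}
	return 0;
}

/* Models the syscall layer: ERESTART is not mapped to EINTR, it restarts. */
static int
dispatch_open(open_fn fn, void *cookie)
{
	int error;

	do {
		error = fn(cookie);	/* redo the whole call from the top */
	} while (error == SKETCH_ERESTART);
	return error;
}
```

With one early wakeup, the model checks permissions twice and the open
succeeds instead of failing with EINTR.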
PR kern/57260
XXX pullup-8
XXX pullup-9
XXX pullup-10
- Use cv_timedwait() rather than cv_timedwait_sig(); the wait here is
bounded (and fairly short besides) and seems appropriate to treat like
other uninterruptible waits. The behavior is now consistent with com(4)
in this regard.
- Map EWOULDBLOCK return from cv_timedwait() to 0, as the successful passage
of time is not an error in this case.
- If the HUP-wait time has passed, clear the HUP-wait timestamp.
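The EWOULDBLOCK mapping in the second item can be sketched like this.
It is a userland model, not the driver code: timedwait_fn,
always_times_out and bounded_wait are hypothetical names, with the
function pointer standing in for cv_timedwait() so the sketch compiles
outside the kernel.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a bounded, uninterruptible timed wait. */
typedef int (*timedwait_fn)(void *cookie, int timo);

/* A wait that always runs to its timeout, as a timed wait may. */
static int
always_times_out(void *cookie, int timo)
{
	(void)cookie;
	(void)timo;
	return EWOULDBLOCK;
}

/* Wait for a bounded time; the timeout expiring is not an error. */
static int
bounded_wait(timedwait_fn wait, void *cookie, int timo)
{
	int error = wait(cookie, timo);

	if (error == EWOULDBLOCK)
		error = 0;	/* time passed, which is what we wanted */
	return error;
}
```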
PR kern/57259 (although insufficient -- another change to
vfs_syscalls.c is required)
The previous variable names V42, V66, iV1 and iV2 didn't carry enough
information to be readily readable, making the test hard to understand.
Rename the variables to be more expressive. While here, properly
explain what happened behind the scenes in 2020 and how the evaluation
of conditions was fixed after discovering the actual cause of the
unexpected error messages.
16 bytes is not enough.
(Is this why it never worked on Xen some years back? Got lucky and
accidentally had 64-byte alignment on native x86, but not in the call
stack in Xen?)
XXX pullup-10
Entropy sources should all have nonempty names, and this will enable
an operator to, for example, disable all but a specific entropy
source.
XXX pullup-10
If the time pointer is null, then write permission on the file is
also sufficient.
From FreeBSD.
Should fix PR kern/57246 "NFS group permissions regression"
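The rule can be sketched as a predicate. This is a simplified model,
not the kernel code: may_set_times and its boolean parameters are
hypothetical, and it assumes the usual rule that the owner or a
privileged caller may always set the times.

```c
#include <stdbool.h>

/*
 * May the caller change the file's timestamps?
 * times_null means the caller asked for "now" by passing a null
 * time pointer.
 */
static bool
may_set_times(bool is_owner, bool is_privileged, bool has_write,
    bool times_null)
{
	if (is_owner || is_privileged)
		return true;
	/* With a null time pointer, write permission is also sufficient. */
	return times_null && has_write;
}
```

For example, a group member with write permission who is not the owner
can set the times to the current time, but cannot set arbitrary times.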
fstab that have nothing to do with swapping (fs_type is neither "sw" nor "dp")
before running getfsspecname() on the fs_spec field of the line.
This avoids entries like this:
NAME=OFTEN_UNCONNECTED /local/archived ffs rw,log,noauto 0 0
in fstab from generating spurious error messages when the wedge named
is not currently connected to the system, that is, when the drive on
which the wedge exists is not connected or not powered on. "noauto"
handles that for some other uses, and the "0"s in fs_freq and fs_passno
work for others, but swap{on,ctl} never looks at those fields (not for
this purpose).
Non "sw"/"dp" lines were being ignored anyway, but not until (a little) later.
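The ordering described above amounts to checking the type before
resolving the spec. A minimal sketch, assuming a hypothetical helper
is_swap_type; the actual swapctl code may be structured differently:

```c
#include <stdbool.h>
#include <string.h>

/*
 * True only for fstab entries that concern swapping ("sw") or dump
 * devices ("dp"). Callers would test this before resolving fs_spec,
 * so a NAME= entry for an ffs wedge is never looked up at all.
 */
static bool
is_swap_type(const char *fs_type)
{
	return strcmp(fs_type, "sw") == 0 || strcmp(fs_type, "dp") == 0;
}
```

With the example line above, fs_type is "ffs", so the entry is skipped
before any attempt to resolve NAME=OFTEN_UNCONNECTED.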
When we are triggering a softint, it can't already hold any mutexes.
So any path to mutex_exit(mtx) must go via mutex_enter(mtx), which is
always done with atomic r/m/w, and we need not issue any explicit
barrier between ci->ci_curlwp = softlwp and a potential load of
mtx->mtx_owner in mutex_exit.
PR kern/57240
XXX pullup-8
XXX pullup-9
XXX pullup-10
This is called with `hardware' interrupts enabled (between sti and
cli), so presumably preemption is possible here.
XXX pullup-8
XXX pullup-9
XXX pullup-10