Commit Graph

119 Commits

Author SHA1 Message Date
kre f55c8670e1 PR bin/57894
For jobs -p for a non-job-control job, avoid just printing 0 (as
there is no process group pid) and instead output what we used to,
the pid of one of the processes in the job (usually the right one!)

XXX pullup -10 (9 and earlier not affected).
2024-01-30 19:05:07 +00:00
kre 0827e1f954 The great shell trailing whitespace cleanup of 2023...
Inspired by private e-mail comments from mouse@

NFCI.
2023-04-07 10:34:13 +00:00
kre 1c627bdffa PR bin/57053 is related (peripherally) here.
sh has been remembering the process group of a job for a while now, but
using that for almost nothing.

The old way to resume a job, was to try each pid in the job with a
SIGCONT (using it as the process group identifier via killpg()) until
one worked (or none did, in which case resuming would be impossible,
but that never actually happened).   This wasn't as bad as it seems,
as in practice the first process attempted was *always* the correct
one.  Why the loop was considered necessary I am not sure.  Nothing
but the first could possibly work.

This worked until a fix for an obscure possible bug was added a
while ago - now a process which has already finished, and had its
zombie collected via wait*() is no longer ever considered to have
a pid which is a candidate for use in any system call.  That's
because the kernel might have reassigned that pid for some newly
created process (we have no idea how much time might have passed
since the pid was returned to the kernel for reuse, it might have
happened weeks ago).

This is where the example in bin/57053 revealed a problem.

That PR is really about a quite different problem in zsh (from pksrc)
and should be pkg/57053, but as the test case also hit the problem
here, it was assumed (by some) they were the same issue.

The example is (in a small directory)
	ls | less
which is then suspended (^Z), and resumed (fg).   Since the directory
is small, ls will be finished, and reaped by sh - so the code would
now refuse to use its pid for the killpg() call to send the SIGCONT.
The (useless) loop would attempt to use less's pid for this purpose
(it is still alive at this point) but that would fail, as that pid
is not a process group identifier, of anything.   Hence the job
could not be resumed.

Before the PR (or preceding mailing list discussion) the change here
had already been made (part of a much bigger set of changes, some of
which might follow - sometime).   We now actually use the job's
remembered process group identifier when we want the process group
identifier, instead of trying to guess which pid it happens to be
(which actually never took any guessing, it was, and is always the
pid of the first process created for the job).   A couple of minor
fixes to how the pgrp is obtained, and used, accompany the changes
to use it when appropriate.
2022-10-30 01:46:16 +00:00
kre 21f0086877 Introduce a new macro JNUM to replace the idiom jp-jobtab+1
(the job number, given jp a pointer to a jobs table entry)
used open coded previously in many places (mostly in DEBUG mode
trace messages, so not included in most shells, but there are
a few others).

Make the type of JNUM() be int rather than the ptrdiff_t the
open coded version became ... which when used in some printf()
type function arg list was cast to some other arbitrary (but not
consistent) int type for which there is a standard %Xd type
format conversion.   Now we can (and do) just use %d for this.

If the number of jobs ever exceeds the range of an int, we would
have far more serious problems than the broken output this would
cause.

While here improve a comment or two, and use JOBRUNNING instead
of 0 where the intent is the former (JOBRUNNING is #defined as 0).

NFCI.
2022-04-18 06:02:27 +00:00
andvar 92e3dd6443 s/forground/foreground/ in comments. 2021-12-19 21:15:27 +00:00
rillig 4453f5596f sh: remove useless do-while-0 loop
28 years was more than enough for the useless 'continue' statement in
this do-while-0 "loop". Without the 'continue' statement, there is no
need for the "loop" anymore. The comment at its top was confusing since
the word 'while' suggested a loop, but there was none, so remove that as
well.

Pointed out by Tom Ivar Helbekkmo on source-changes-d.

No change to the resulting binary.
2021-10-10 18:46:25 +00:00
rillig 074335e1f1 sh: remove no-op 'continue' from do-while-0 loop
With Clang, the only change to the binary are the line number changes
from __LINE__, GCC generates a bit different code.

No functional change.
2021-10-10 08:35:34 +00:00
kre 777df0a3cd Don't dereference NULL on "jobs -Z" (with no title given), instead
do setproctitle(NULL) (which is not the same thing at all).  Do the
same with jobs -Z '' as setting the title to "sh: " isn't useful.

Improve the way this is documented, and note that it is only done
this way because zsh did it first (ie: pass on the balme, doing this
in the jobs command is simply absurd.)
2021-09-12 01:30:41 +00:00
christos 4e2a477813 Add jobs -Z (like in zsh(1)) to setproctitle(3). 2021-09-11 20:43:32 +00:00
kre 8821fc2c7a Related to PR bin/48875
Correct an issue found by Oguz <oguzismailuysal@gmail.com> and reported
in e-mail (on the bug-bash list initially!) with the code changed to deal
with PR bin/48875

With:

	 sh -c 'echo start at $SECONDS;
			(sleep 3 & (sleep 1& wait) );
		echo end at $SECONDS'

The shell should say "start at 0\nend at 1\n", but instead (before
this fix, in -9 and HEAD, but not -8) does "start at 0\nend at 3\n"
(Not in -8 as the 48875 changes were never pulled up)>

There was an old problem, fixed years ago, which cause the same symptom,
related to the way the jobs table was cleared (or not) in subshells, and
it seemed like that might have resurfaced.

But not so, the issue here is the sub-shell elimination, which was part
of the 48875 "fix" (not really, it wasn't really a bug, just sub-optimal
and unexpected behaviour).

What the shell actually has been running in this case is:

	 sh -c 'echo start at $SECONDS;
			(sleep 3 & sleep 1& wait );
		echo end at $SECONDS'

as the inner subshell was deemed unnecessary - all its parent would
do is wait for its exit status, and then exit with that status - we
may as well simply replace the current sub-shell with the new one,
let it do its thing, and we're done...

But not here, the running "sleep 3" will remain a child of that merged
sub-shell, and the "wait" will thus wait for it, along with the sleep 1
which is all it should be seeing.

For now, fix this by not eliminating a sub-shell if there are existing
unwaited upon children in the current one.  It might be possible to
simply disregard the old child for the purposes of wait (and "jobs", etc,
all cmds which look at the jobs table) but the bookkeeping required to
make that work reliably is likely to take some time to get correct...

Along with this fix comes a fix to DEBUG mode shells, which, in situations
like this, could dump core in the debug code if the relevant tracing was
enabled, and add a new trace for when the jobs table is cleared (which was
added predating the discovery of the actual cause of this issue, but seems
worth keeping.)   Neither of these changes have any effect on shells
compiled normally.

XXX pullup -9
2021-04-04 13:24:07 +00:00
kre 8ad9ebd911 Since "struct job" gained a pgrp member some time ago now, use it
instead of simply assuming that the pid of the first (leftmost) process
in a pipeline is the pgrp - someday we may switch things around and
create pipelines right to left instead, which has several advantages,
but which would invalidate the assumption which was being made here.
2020-08-30 19:45:05 +00:00
kre 7a2f8a050c Add lots of comments explaining what is happening in here.
Also enhance some of the DEBUG mode trace output (nothing visible
in a normal shell build).

A couple of very minor code changes that no-one should ever notice
(eg: one less wait() call in the case that there is nothing pending).
2020-08-20 23:03:17 +00:00
kre 7aa4a7e25f Avoid a core dump if a child process that is not one of our
children happens to exit while we are waiting for another child
to exit.

This can happen with code like

	sh -c '
		sleep 5 &
		exec sh -c "sleep 10 & wait !$"
	      '

when the inner "sh" is waiting for the 10 second sleep to be
done, the 5 second sleep started earlier terminates.   It is
a child of our process, as the inner shell is the same process
as the outer one, but not a known child (the inner shell has no
idea what the outer one did before it started).

This was observed in the wild by Martijn Dekker (where the outer
shell was bash but that's irrelevant).

XXX pullup -9
2020-02-07 02:06:12 +00:00
kre da46072561 Fix a logic botch that prevented "wait -n" (with no pid args) from
finding a job that had previously terminated.

Now in that case JOBWANTED is set on all jobs (since any will do)
which then simplifies a later test which no longer needs to special
case "wait -n".   Further, we always look to see if any wanted
job has already terminated, even if there are still running jobs
we can wait upon - if anything is already ready, that's where we start
harvesting (and finish, if -n is specified).
2019-03-26 13:32:26 +00:00
kre ccf5ffdbe9 In the unlikely event that restarting a job fails (the fg bg and various
%x commands) generate the most useful error message (from errno value)
rather than whichever happened last.

In posix mode, cause the "jobs" command to delete records of completed
jobs it reports on (as posix requires) as is done in interactive shells.
We don't (won't) do this in !posix mode, as the ability to throw in a
"jobs" command in a script to debug what is happening is too useful to
lose -- and any script that is relying on "jobs" instead of "wait" to
cleanup background processes (from the sh jobs table, sh always collects
zombies from the kernel) is absurd and not worth considering (besides
which I've never seen one).
2019-02-09 09:31:33 +00:00
kre e8999de45c INTON / INTOFF audit and cleanup.
No visible differences expected - there is a remote chance that
some internal lossage may no longer occur in interactive shells
that receive SIGINT (untrapped) at inopportune times, but you would
have had to have been very unlucky to have ever suffered from that.
2019-02-09 03:35:55 +00:00
kre 7f63ac72c5 When forking a child shell, arrange for errors/exit to always unwind
to the main handler, rather than wherever the parent shell would go.

nb: not needed for vfork(), after vfork() we never go that path - which
is good or we'd be corrupting the parent's handler.

This allows the child to always exit (when it should) rather than being
caught up doing something else (and while it would eventually exit, the
status would be incorrect in some cases).

One test is:
	sh -c 'trap "(! :) && echo BUG || echo nobug" EXIT'
from Martijn Dekker

Fix from FreeBSD (missed earlier).

XXX - 2b part of the 48875 pullup to -8
2018-12-03 02:38:30 +00:00
kre 185226c2be Use strsignal() rather than direct reference to sys_siglist[]
(apart from being cleaner, it also simplifies the code, as
strsignal() never fails ... it also removes one reference to NSIG).
2018-10-28 18:16:01 +00:00
kre feb6abd7ba A change in rev 1.91 interacted badly with the way that showjobs()
worked, preventing $(jobs) (and more usefully $(jobs -p) from
working.   Fix that.

XXX pullup -8
2018-09-13 22:12:35 +00:00
kre f53fd6e91f Change the way the pipefail option works. Now it is the setting of
the option when a pipeline is created that controls the way the exit
status of the pipeline is calculated.  Previously it was the state of
the option when the exit status of the pipeline was collected.

This makes no difference at all for foreground pipelines (there is
no way to change the option between starting and completing the
pipeline) but it does for asynchronous (background) pipelines.

This was always the right way to implement it - it was originally
done the other way as I could not find any other shell implemented
this way - they all seemed to do it our previous way, and I could
not see a good reason to be the sole different shell.

However, now I know that ksh93 works as we will now work, and I
am told that if the option is added to the FreeBSD shell (apparently
the code exists, uncommitted) it will be the same.
2018-09-04 23:16:30 +00:00
kre ee301070ed PR bin/38004
Save more characters of command in non-interactive jobs, in case of
core dumps and similar (16 effective chars was a few too little).

Arrange for number to increase if command buffer size increases.
2018-09-04 01:09:28 +00:00
kre 13689a6c8a In addition to previous the which fixed a (harmless) MSAN detected ref
of uninit'd field also fix a couple more (still harmless) related
technical C usage bugs.

Explaining why these issues were harmless would take too long to include here.
2017-12-30 23:24:19 +00:00
christos d27db487e5 initialize just used and prev_job 2017-12-30 20:42:28 +00:00
christos e11969ea5d initialize the jobtab; it is easier than putting checks for used everywhere. 2017-12-30 01:21:25 +00:00
kre 4e9fc30dca Add '-n' and '-p var' args to the wait command (-n: wait for any,
-p var: set var to identifier, from arg list, or PID if no job args)
of the job for which status is returned (becomes $? after wait.)

Note: var is unset if the status returned from wait came from wait
itself rather than from some job exiting (so it is now possible to
tell whether 127 means "no such job" or "job did exit(127)", and
whether $? > 128 means "wait was interrupted" or "job was killed
by a signal or did exit(>128)".   ($? is too limited to to allow
indicating whether the job died with a signal, or exited with a
status such that it looks like it did...)
2017-10-28 06:36:17 +00:00
kre 786714f50f Another %zu for size_t (this one in a DEBUG mode trace call, so it
doesn't actually ever bother anyone in practice.)
2017-10-28 04:50:38 +00:00
martin 6df8f46e34 Use %zu for size_t 2017-10-25 08:50:05 +00:00
kre f697d47ee3 Add options to the builtin jobid command to allow discovering the
process group (-g), the process leader pid (-p) ($! if the job was &'d)
and the job identifier (-j) (the %n that refers to the job) in addition to
(default) the list of all pids in the job (which it has always done).
No change to the (single) "job" arg, which is a specifier of the job:
the process leader pid, or one of the % forms, and defaults to %% (aka %+).
(This is all now documented in sh(1))

Also document the jobs command properly (no change to the command, just
document what it actually is.)

And while here, a whole new section in sh(1) "Job Control".  It probably
needs better wording, but this is (perhaps) better than the nothing that
was there before.
2017-10-25 05:42:56 +00:00
kre 9572a5c228 PR bin/52640 PR bin/52641
Don't delete jobs from the jobs table merely because they finished,
if they are not the job we are waiting upon.   (bin/52640 part 1)

In a sub-shell environment, don't allow wait to find jobs from the
parent shell that had already exited (before the sub-shell was
created) and return status for them as if they are our children.
(bin/52640 part 2)

Don't have the "jobs" command also be an implicit "wait" command
in non-interactive shells.  (bin/52641)

Use WCONTINUED (when it exists) so we can report on stopped jobs that
"mysteriously" move back to running state without the user issuing
a "bg" command (eg: kill -CONT <pid>)   Previously they would keep
being reported as stopped until they exited.

When a job is detected as having changed status just as we're
issuing a "jobs" command (i.e.: the change occurred between the last
prompt and the jobs command being entered) don't report it twice,
once from the status change, and then again in the jobs command
output.   Once is enough (keep the jobs output, suppress the other).

Apply some sanity to the way jobs_invalid is processed - ignore it
in getjob() instead of just ignoring it most of the time there, and
instead always check it before calling getjob() in situations where
we can handle only children of the current shell.  This allows the
(totally broken) save/clear/restore of jobs_invalid in jobscmd() to
be done away with (previously an error while in the clear state would
have left jobs_invalid incorrectly cleared - shouldn't have mattered
since jobs_invalid => subshell => error causes exit, but better to be safe).

Add/improve the DEBUG more tracing.

XXX pullup -8
2017-10-23 10:52:07 +00:00
kre bffe519047 Re-factor the code that extracts status from exited jobs, avoiding
code duplication, and reducing the size of /bin/sh by a trivial amount.

NFCI.

This is being done now as there are two other changes forthcoming, both
of which benefit - one would result in even more code duplication without
this, the other might need to alter how this is done, and doing it after this
means there's just one place to change (if required).
2017-10-19 01:57:18 +00:00
kre f07d3f9b6b DEBUG only changes (non-debug, ie: normal, shell unaffected)
Add a little extra info in a few of the trace messages.
2017-09-29 17:53:57 +00:00
kre ed2c7aaa15 Implement the "pipefail" option (same semantics as in other shells)
to cause (when set, which it is not by default) the exit status of a
pipe to be 0 iff all commands in the pipe exited with status 0, and
otherwise, the status of the rightmost command to exit with a non-0
status.

In the doc, while describing this, also reword some of the text about
commands in general, how they are structured, and when they are executed.
2017-07-24 14:17:11 +00:00
kre 16bbc7c6f2 NFC - DEBUG mode only change - convert this to the new TRACE() format. 2017-06-17 12:12:50 +00:00
kre 727a69dc1d A better LINENO implementation. This version deletes (well, #if 0's out)
the LINENO hack, and uses the LINENO var for both ${LINENO} and $((LINENO)).
(Code to invert the LINENO hack when required, like when de-compiling the
execution tree to provide the "jobs" command strings, is still included,
that can be deleted when the LINENO hack is completely removed - look for
refs to VSLINENO throughout the code.  The var funclinno in parser.c can
also be removed, it is used only for the LINENO hack.)

This version produces accurate results: $((LINENO)) was made as accurate
as the LINENO hack made ${LINENO} which is very good.  That's why the
LINENO hack is not yet completely removed, so it can be easily re-enabled.
If you can tell the difference when it is in use, or not in use, then
something has broken (or I managed to miss a case somewhere.)

The way that LINENO works is documented in its own (new) section in the
man page, so nothing more about that, or the new options, etc, here.

This version introduces the possibility of having a "reference" function
associated with a variable, which gets called whenever the value of the
variable is required (that's what implements LINENO).  There is just
one function pointer however, so any particular variable gets at most
one of the set function (as used for PATH, etc) or the reference function.
The VFUNCREF bit in the var flags indicates which func the variable in
question uses (if any - the func ptr, as before, can be NULL).

I would not call the results of this perfect yet, but it is close.
2017-06-07 05:08:32 +00:00
kre 352391ffc1 DEBUG mode only change - correctly track internal shell sub-shell nesting
levels for debug output.  This change accidentally omitted earlier (only
effect is incorrect nesting levels shown in trace output when the option
to show them is enabled.)   NFC for any normal shell build.
2017-05-18 13:34:17 +00:00
kre d7c8afdf82 Avoid truncating the command string saved with background jobs if
one of the words happens to contain ${#var}.  (This is the command
string shown by the "jobs" command, and when a background job completes)

While here, undo the LINENO hack when building that string.
And one ot two other foibles...
2017-05-11 14:57:14 +00:00
kre aa563ca425 If we are going to permit
! ! pipeline
(And for now the other places where ! is permitted)
we should at least generate the logically correct exit
status:
	! ! (exit 5); echo $?
should print 1, not 5.   ksh and bosh do it this way - and it makes sense.
bash and the FreeBSD sh echo "5" (as did we until now.)
dash, zsh, yash all enforce the standard syntax, and prohibit this.
2017-05-09 05:14:03 +00:00
kre 7d41ae4eb6 Implement the ';&' (used instead of ';;') case statement list terminator
which causes fall through the to command list of the following pattern
(wuthout evaluating that pattern).   This has been approved for inclusion
in the next major version of the POSIX standard (Issue 8), and is
implemented by most other shells.

Now all form a circle and together attempt to summon the great wizd
in the hopes that his magic spells can transform the poor attempt
at documenting this feature into something rational...
2017-05-04 04:37:51 +00:00
kre 8eee1dabed Don't forget the ! reserved word exists (node ype NNOT) when displaying
"jobs" output (or other places where the cmd string is shown - like
when reporting status when a background job completes.)

Without this fix, try
	! sleep 5 &
	jobs
	wait
and try not to wonder at the '???" that appears instead of "! sleep 5"
2017-05-03 21:31:03 +00:00
kre 51c4dfe49c Keep track of which file descriptors the shell is using for its
own purposes, and move them elsewhere whenever a user redirection
happens to pick the same number.   With this we can move the shell
file descriptors back to lower values (be slightly kinder to the kernel)
since we can no longer clash.   (Also get rid of a little old unneeded code.)

This also completes the fdflags command, which no longer permits access
to (by way or either obtaining, or changing) the shell's internal fds.
2017-04-29 15:14:28 +00:00
kre 09ecfab926 Slightly improve "jobs" command output in cases where a job includes
embedded background commands or pipelines.   (just slightly...)
OK christos@
2016-05-07 20:07:47 +00:00
kre 0fe4e12852 Unbreak build ... again... gcc is insane. 2016-05-03 23:55:12 +00:00
kre 3c6d76cd74 PR bin/51114 - print the correct values for >&- and >& N (N > 9)
in output from the "jobs" command (and other places that use the
same routines.)
2016-05-03 20:46:35 +00:00
christos 1fad4bb60c Fix handing of user file descriptors outside the 0..9 range.
Also, move (most of) the shell's internal use fd's to much
higher values (depending upon what ulimit -n allows) so they
are less likely to clash with user supplied fd numbers.  A future
patch will (hopefully) avoid this problem completely by dynamically
moving the shell's internal fds around as needed. (From kre@)
2016-05-02 01:46:31 +00:00
christos f95d5940cc report the signal that wait was interrupted by, which is not always SIGINT
anymore.
2015-08-22 12:12:47 +00:00
christos c0195771da Process pending signals while waiting for a job:
$ cat << EOF > hup.sh
    #!/bin/sh
    trap 'echo SIGHUP; exit 1' 1
    sleep 10000 &
    wait
    EOF
    $ chmod +x ./hup.sh
    $ ./hup.sh &
    $ kill -HUP %1
2015-08-22 09:55:23 +00:00
christos 318c2b5cda PR/48729: Torbjörn Granlund: Avoid negative index in array ref. 2014-04-11 01:49:45 +00:00
christos 1468e9a310 explain why forks fail 2014-01-26 22:38:20 +00:00
dsl 658a58d038 Add support for '%n' being a shorthand for 'fg %n'. 2012-12-31 14:10:15 +00:00
joerg a401c50446 Don't use a for-loop with empty body. 2012-02-23 18:23:33 +00:00