xargs: an example for parallel batch jobs

Last modification on 2023-12-17

This describes a simple shell script pattern for processing a list of jobs in parallel. The whole example is contained in a single script file.

Simple but less optimal example

#!/bin/sh
maxjobs=4

# fake program for example purposes.
someprogram() {
	echo "Yep yep, I'm totally a real program!"
	sleep "$1"
}

# run(arg1, arg2)
run() {
	echo "[$1] $2 started" >&2
	someprogram "$1" >/dev/null
	status="$?"
	echo "[$1] $2 done" >&2
	return "$status"
}

# process the jobs.
j=1
for f in 1 2 3 4 5 6 7 8 9 10; do
	run "$f" "something" &

	jm=$((j % maxjobs)) # shell arithmetic: modulo
	test "$jm" = "0" && wait
	j=$((j+1))
done
wait

Why is this less optimal

This is less optimal because it waits until all jobs in the same batch are finished (each batch contains $maxjobs items).

For example, with 2 items per batch and 4 jobs in total, the order could be:

Job 1 is started.
Job 2 is started.
Job 2 is done.
Job 1 is done.
Wait: the script waits until all background processes of the batch have exited.
Job 3 is started in a new batch.

This could be optimized to:

Job 1 is started.
Job 2 is started.
Job 2 is done.
Job 3 is started immediately.
Job 1 is done.
...

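It also does not handle signals such as SIGINT (^C) well: in a non-interactive shell, jobs started in the background with & ignore SIGINT, so they keep running after the script is interrupted. However, the xargs example below does handle this: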

Example

#!/bin/sh
maxjobs=4

# fake program for example purposes.
someprogram() {
	echo "Yep yep, I'm totally a real program!"
	sleep "$1"
}

# run(arg1, arg2)
run() {
	echo "[$1] $2 started" >&2
	someprogram "$1" >/dev/null
	status="$?"
	echo "[$1] $2 done" >&2
	return "$status"
}

# child process job.
if test "$CHILD_MODE" = "1"; then
	run "$1" "$2"
	exit "$?"
fi

# generate a list of jobs for processing.
list() {
	for f in 1 2 3 4 5 6 7 8 9 10; do
		printf '%s\0%s\0' "$f" "something"
	done
}

# process jobs in parallel.
list | CHILD_MODE="1" xargs -r -0 -P "${maxjobs}" -L 2 "$(readlink -f "$0")"

Run and timings

Although the above example is kind of stupid, it already shows that queueing the jobs is more efficient.

Script 1:

time ./script1.sh
[...snip snip...]
real    0m22.095s

Script 2:

time ./script2.sh
[...snip snip...]
real    0m18.120s
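The two timings can be roughly checked by hand. Assuming job N simply sleeps N seconds and ignoring process startup overhead (a simplification), the batched script needs 4 + 8 + 10 = 22 seconds, because each batch of 4 lasts as long as its slowest job. With a queue, the next job starts as soon as any of the 4 slots is free, and the last job finishes after 18 seconds. The sketch below models both schedules; it is a calculation, not a measurement, and the variable names are made up for illustration.

```sh
#!/bin/sh
# Model of the two schedules (assumption: job N sleeps N seconds,
# startup overhead is ignored).
maxjobs=4
n=10

# Script 1: fixed batches of $maxjobs; a batch lasts as long as its slowest job.
batched=0
batchmax=0
j=1
while [ "$j" -le "$n" ]; do
	[ "$j" -gt "$batchmax" ] && batchmax=$j
	if [ $((j % maxjobs)) -eq 0 ] || [ "$j" -eq "$n" ]; then
		batched=$((batched + batchmax))
		batchmax=0
	fi
	j=$((j + 1))
done

# Script 2 (xargs -P): each job goes to the slot that becomes free first.
# Slot end times are kept in a space-separated list (POSIX sh has no arrays).
slots=""
i=0
while [ "$i" -lt "$maxjobs" ]; do
	slots="$slots 0"
	i=$((i + 1))
done
j=1
while [ "$j" -le "$n" ]; do
	# find the earliest end time among the slots
	min=""
	for s in $slots; do
		if [ -z "$min" ] || [ "$s" -lt "$min" ]; then min=$s; fi
	done
	# start job $j in the first slot with that end time
	new=""
	placed=0
	for s in $slots; do
		if [ "$placed" -eq 0 ] && [ "$s" -eq "$min" ]; then
			new="$new $((s + j))"
			placed=1
		else
			new="$new $s"
		fi
	done
	slots=$new
	j=$((j + 1))
done
# the total time is the latest end time of any slot
queued=0
for s in $slots; do
	if [ "$s" -gt "$queued" ]; then queued=$s; fi
done

echo "batched: ${batched}s"
echo "queued: ${queued}s"
```

This prints "batched: 22s" and "queued: 18s", which matches the measured 22.095s and 18.120s above.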

How it works

The parent process:

The parent, using xargs, manages the job queue and runs each job as a child process.
The list function writes the parameters to stdout, each terminated by a NUL byte. The NUL byte is used as the separator because it cannot appear in filenames (which can contain spaces or even newlines) and cannot appear in command-line arguments or other C strings, where it marks the end of the string.
The -L option must match the number of arguments per job: here each job takes 2 arguments, so xargs passes them to the script in groups of 2.
The expression "$(readlink -f "$0")" resolves the absolute path of the shell script itself. xargs is given this path as the utility to run.
xargs then runs the script itself with the parameters it reads from its input. The environment variable $CHILD_MODE is set in the environment of xargs, so every child process inherits it and the script knows it is running as a child process.
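A small standalone sketch of the NUL separation and the per-job grouping. To keep it self-contained, the utility is an inline sh -c command instead of the script calling itself, and it uses -n 2 (group 2 arguments per invocation) rather than -L 2, because -n is the option the workaround in the incompatibilities section relies on. The arguments deliberately contain a space and a newline.

```sh
#!/bin/sh
# Two jobs, each with 2 NUL-terminated arguments: an id and a name.
list() {
	printf '%s\0%s\0' "1" "file with spaces"
	printf '%s\0%s\0' "2" "file with
newline"
}

# -0: split input on NUL bytes; -n 2: pass 2 arguments per invocation.
# In the inline script, $0 is set to "sh", and $1 and $2 are the job arguments.
out=$(list | xargs -0 -n 2 sh -c 'printf "job %s: [%s]\n" "$1" "$2"' sh)
printf '%s\n' "$out"
```

Each name arrives as one intact argument, including the embedded space and newline, which would not be the case with whitespace-separated input.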

The child process:

The command-line arguments are passed by the parent using xargs.
The environment variable $CHILD_MODE tells the script that it is running as a child process.
In child mode, the script only executes the task and returns its exit status to xargs, and through xargs to the parent.
The exit status of the child program is reported to xargs. It could be used, for example, to stop on the first failure (this example does not do that). If the program is killed by a signal, is stopped, or exits with status 255, xargs stops as well.
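A minimal sketch of the exit status 255 behaviour, using a hypothetical inline child that fails on job 2. It runs without -P so the jobs are sequential and the output is deterministic. The exact non-zero exit status of xargs differs between implementations, so only "non-zero" is checked.

```sh
#!/bin/sh
# Job 2 exits with status 255, so xargs stops and job 3 is never run.
out=$(printf '%s\0' 1 2 3 |
	xargs -0 -n 1 sh -c 'test "$1" = "2" && exit 255; echo "job $1 done"' sh)
status=$?
printf '%s\n' "$out"
echo "xargs exit status: $status"
```

xargs also writes a diagnostic message to stderr when this happens.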

Description of used xargs options

From the OpenBSD man page: https://man.openbsd.org/xargs

xargs - construct argument list(s) and execute utility

Options explained:

-r: Do not run the command if there are no arguments. Normally the command is executed at least once even if there are no arguments.
-0: Change xargs to expect NUL ('\0') characters as separators, instead of spaces and newlines.
-P maxprocs: Parallel mode: run at most maxprocs invocations of utility at once.
-L number: Call utility for every number of non-empty lines read. A line ending in unescaped white space and the next non-empty line are considered to form one single line. If EOF is reached and fewer than number lines have been read then utility will be called with the available lines.
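The -r option matters when the list function produces no jobs at all. A minimal sketch, assuming an xargs that accepts -r like the OpenBSD one quoted above; the empty list function is hypothetical.

```sh
#!/bin/sh
# Hypothetical list function that produces no jobs.
empty() {
	:
}

# With -r, the utility is not run because there are no arguments.
out=$(empty | xargs -r -0 -n 2 echo "ran with:")
if [ -z "$out" ]; then
	echo "no jobs, utility not run"
fi
```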

xargs options -0 and -P, portability and historic context

Some of the options, like -P, are non-POSIX as of writing (2023): https://pubs.opengroup.org/onlinepubs/9699919799/utilities/xargs.html. However, many systems have supported this useful extension for many years.

The specification even mentions implementations which support parallel operations:

"The version of xargs required by this volume of POSIX.1-2017 is required to wait for the completion of the invoked command before invoking another command. This was done because historical scripts using xargs assumed sequential execution. Implementations wanting to provide parallel operation of the invoked utilities are encouraged to add an option enabling parallel invocation, but should still wait for termination of all of the children before xargs terminates normally."

Some historic context:

On OpenBSD, the xargs -0 option was added on 1996-06-11 by Theo de Raadt, about a year after the NetBSD import (over 27 years ago at the time of writing):

CVS log

On OpenBSD, the xargs -P option was added on 2003-12-06 by syncing with the FreeBSD code:

CVS log

Looking at the imported git history of GNU findutils (which includes xargs), the very first commit already had the -0 and -P options:

git log

commit c030b5ee33bbec3c93cddc3ca9ebec14c24dbe07
Author: Kevin Dalley <kevin@seti.org>
Date:   Sun Feb 4 20:35:16 1996 +0000

    Initial revision

xargs: some incompatibilities found

With the -0 option, empty fields are handled differently across implementations.
The -n and -L options do not work consistently in many of the BSD implementations: some count empty fields, some don't. Early implementations in FreeBSD and OpenBSD only processed the first line. OpenBSD improved this around 2017.

Depending on what you want to do, a workaround is to use the -0 option with a single field per job together with the -n flag, and then split that field by a separator in each child program invocation.
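A sketch of this workaround under stated assumptions: the ':' separator is an arbitrary choice and must not occur in the first parameter, and the child is an inline sh -c command for brevity, whereas the real script would call itself in CHILD_MODE as in the example above.

```sh
#!/bin/sh
# One NUL-terminated field per job: "id:name".
list() {
	for f in 1 2 3; do
		printf '%s:%s\0' "$f" "something"
	done
}

# -n 1: exactly one field per invocation, so empty-field and -L
# differences between implementations do not affect the grouping.
# The child splits the field with parameter expansion:
#   ${1%%:*} is the text before the first ':'
#   ${1#*:}  is the text after the first ':'
out=$(list | xargs -0 -n 1 sh -c 'id=${1%%:*}; name=${1#*:}; echo "[$id] $name"' sh)
printf '%s\n' "$out"
```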

References

xargs: https://man.openbsd.org/xargs
printf: https://man.openbsd.org/printf
ksh, wait: https://man.openbsd.org/ksh#wait
wait(2): https://man.openbsd.org/wait