Function std.parallelism.TaskPool.parallel
Implements a parallel foreach loop over a range. This works by implicitly
creating and submitting one Task to the TaskPool for each worker thread.
A work unit is a set of consecutive elements of range to be processed by a
worker thread between communication with any other thread. The number of
elements processed per work unit is controlled by the workUnitSize
parameter. Smaller work units provide better load balancing, but larger
work units avoid the overhead of communicating with other threads
frequently to fetch the next work unit. Large work units also avoid false
sharing in cases where the range is being modified. The less time a single
iteration of the loop takes, the larger workUnitSize should be. For very
expensive loop bodies, workUnitSize should be 1. An overload that chooses a
default work unit size is also available.
ParallelForeach!R parallel(R)
(
R range,
size_t workUnitSize
);
ParallelForeach!R parallel(R)
(
R range
);
Example
import std.math : log;
import std.parallelism : taskPool;

// Find the logarithm of every number from 1 to
// 10_000_000 in parallel.
auto logs = new double[10_000_000];

// Parallel foreach works with or without an index
// variable. It can iterate by ref if range.front
// returns by ref.

// Iterate over logs using work units of size 100.
foreach (i, ref elem; taskPool.parallel(logs, 100))
{
    elem = log(i + 1.0);
}

// Same thing, but use the default work unit size.
//
// Timings on an Athlon 64 X2 dual core machine:
//
// Parallel foreach: 388 milliseconds
// Regular foreach: 619 milliseconds
foreach (i, ref elem; taskPool.parallel(logs))
{
    elem = log(i + 1.0);
}
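The work unit guidance in the description can be sketched with two loops of very different per-iteration cost. This is an illustrative sketch only: the function names, the size 10_000, and the helper slowSeries are assumptions made for the example, not part of std.parallelism, and suitable sizes should be measured on the target machine.

```d
import std.math : sqrt;
import std.parallelism : taskPool;

// Cheap loop body: each iteration is a single sqrt, so a large
// work unit (10_000 here, an arbitrary illustrative value) keeps
// threads from fetching new work units too often.
double[] parallelSquareRoots(size_t n)
{
    auto roots = new double[n];
    foreach (i, ref r; taskPool.parallel(roots, 10_000))
    {
        r = sqrt(cast(double) i);
    }
    return roots;
}

// Hypothetical expensive computation: partial sum of 1/k^2.
double slowSeries(size_t terms)
{
    double sum = 0.0;
    foreach (k; 1 .. terms + 1)
    {
        sum += 1.0 / (cast(double) k * k);
    }
    return sum;
}

// Expensive loop body: one element per work unit gives the best
// load balancing, and the fetch overhead is small by comparison.
double[] parallelSeries(size_t[] termCounts)
{
    auto sums = new double[termCounts.length];
    foreach (i, ref s; taskPool.parallel(sums, 1))
    {
        s = slowSeries(termCounts[i]);
    }
    return sums;
}
```

Each iteration writes only to its own element through the ref variable, so no extra synchronization is needed in either loop.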
Notes
The memory usage of this implementation is guaranteed to be constant
in range.
Breaking from a parallel foreach loop via a break, labeled break,
labeled continue, return or goto statement throws a ParallelForeachError.
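Since the statements listed above are rejected, one way to skip elements is a plain, unlabeled continue, which is not in that list and only ends the current iteration. The following is a minimal sketch under that assumption; sumOfPositives is a hypothetical helper, and the shared accumulator is updated atomically because iterations run on several threads.

```d
import core.atomic : atomicLoad, atomicOp;
import std.parallelism : taskPool;

// Sums the positive elements of values in parallel.
long sumOfPositives(int[] values)
{
    shared long total;
    foreach (v; taskPool.parallel(values))
    {
        if (v <= 0)
            continue; // unlabeled continue: skip this element only

        // Several threads may add at once, so use an atomic update.
        atomicOp!"+="(total, v);
    }
    return atomicLoad(total);
}
```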
In the case of non-random access ranges, parallel foreach buffers lazily
to an array of size workUnitSize before executing the parallel portion
of the loop. The exception is that, if a parallel foreach is executed
over a range returned by asyncBuf or map, the copying is elided
and the buffers are simply swapped. In this case workUnitSize is
ignored and the work unit size is set to the buffer size of range.
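As a rough illustration of the buffered case, the sketch below runs a parallel foreach over a filtered range, which is not random access, so elements are copied into buffers of workUnitSize (50 here, an arbitrary choice) before the parallel portion runs. The names isEven and sumOfEvens are hypothetical; the asyncBuf and map special case is not shown.

```d
import core.atomic : atomicLoad, atomicOp;
import std.algorithm.iteration : filter;
import std.parallelism : taskPool;
import std.range : iota;

// Hypothetical predicate used to build a non-random access range.
bool isEven(int n)
{
    return n % 2 == 0;
}

// Sums the even numbers in 1 .. limit (inclusive) in parallel.
long sumOfEvens(int limit)
{
    shared long total;

    // filter yields a forward range without indexing or length,
    // so parallel foreach has to buffer it in chunks.
    auto evens = iota(1, limit + 1).filter!isEven;

    foreach (n; taskPool.parallel(evens, 50))
    {
        atomicOp!"+="(total, n);
    }
    return atomicLoad(total);
}
```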
A memory barrier is guaranteed to be executed on exit from the loop, so that results produced by all threads are visible in the calling thread.
Exception Handling:
When at least one exception is thrown from inside a parallel foreach loop,
the submission of additional Task objects is terminated as soon as
possible, in a non-deterministic manner. All executing or
enqueued work units are allowed to complete. Then, all exceptions that
were thrown by any work unit are chained using Throwable.next and
rethrown. The order of the exception chaining is non-deterministic.
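The chaining behavior can be observed by catching the rethrown exception and walking its next links. This is a sketch rather than a prescribed pattern: countChainedExceptions is a hypothetical helper, and because submission stops non-deterministically, the number of exceptions in the chain can vary from run to run.

```d
import std.parallelism : taskPool;
import std.range : iota;

// Runs a loop whose body sometimes throws, then counts how many
// exceptions arrived in the chain rethrown by parallel foreach.
size_t countChainedExceptions()
{
    size_t count = 0;
    try
    {
        foreach (i; taskPool.parallel(iota(0, 1000), 10))
        {
            if (i % 100 == 0)
                throw new Exception("iteration failed");
        }
    }
    catch (Exception e)
    {
        // Follow Throwable.next from the head of the chain.
        for (Throwable t = e; t !is null; t = t.next)
        {
            ++count;
        }
    }
    return count;
}
```

At least one exception is always caught here, since the iteration for index 0 throws; how many more are chained depends on thread scheduling.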