Function std.parallelism.TaskPool.reduce.reduce
Parallel reduce on a random access range.  Except as otherwise noted,
        usage is similar to std.algorithm.reduce.  There is
        also fold, which does the same thing with a different parameter
        order.
						
				auto reduce(Args...)(Args args);
						
					
				This function works by splitting the range to be reduced into work units, which are slices to be reduced in parallel. Once the results from all work units are computed, a final serial reduction is performed on these results to compute the final answer. Therefore, care must be taken to choose the seed value appropriately.
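The work-unit scheme described above can be imitated serially to see why the seed matters. The following is a minimal sketch, not the library's implementation: chunkedSum is a hypothetical helper that assumes integer addition, a fixed work unit size, and a seed that is applied once per work unit and once more in the final reduction.

```d
import std.algorithm.iteration : map, reduce;
import std.array : array;
import std.range : chunks, iota;

// Hypothetical helper: imitates the work-unit scheme without any threads.
int chunkedSum(int[] data, size_t workUnitSize, int seed)
{
    // Reduce each work unit on its own, starting from the seed.
    auto partials = data.chunks(workUnitSize)
                        .map!(unit => reduce!"a + b"(seed, unit))
                        .array;
    // Final serial reduction over the per-unit results, again from the seed.
    return reduce!"a + b"(seed, partials);
}

void main()
{
    import std.stdio : writeln;

    auto data = iota(1, 9).array;      // 1 .. 8, which sums to 36
    writeln(chunkedSum(data, 4, 0));   // 36: an identity seed leaves the sum unchanged
    writeln(chunkedSum(data, 4, 10));  // 66: the seed is added for 2 units plus the final pass
    writeln(chunkedSum(data, 2, 10));  // 86: the seed is added for 4 units plus the final pass
}
```

With a seed of 0 the result does not depend on the work unit size. With a seed of 10 it changes with the number of work units, which is the effect a non-identity seed has on a parallel reduction.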
        Because the reduction is being performed in parallel, functions
        must be associative.  For notational simplicity, let # be an
        infix operator representing functions.  Then, (a # b) # c must equal
        a # (b # c).  Floating point addition is not associative
        even though addition in exact arithmetic is.  Summing floating
        point numbers using this function may give different results than summing
        serially.  However, for many practical purposes floating point addition
        can be treated as associative.
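As a small illustration of the non-associativity mentioned above, the sketch below applies the same three additions with two different groupings. The helper names groupLeft and groupRight are hypothetical, and the values are chosen to exaggerate the effect; typical data shows much smaller differences.

```d
// Hypothetical helpers: the same additions, grouped differently.
double groupLeft(double a, double b, double c)  { return (a + b) + c; }
double groupRight(double a, double b, double c) { return a + (b + c); }

void main()
{
    import std.stdio : writeln;

    // The small term survives only if the two large terms cancel first.
    writeln(groupLeft(1e300, -1e300, 1.0));   // 1
    writeln(groupRight(1e300, -1e300, 1.0));  // 0
}
```

A parallel reduction regroups the operations according to how the range is split into work units, so a floating point sum can differ from the serial result in the same way.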
        Note that, since functions are assumed to be associative,
        additional optimizations are made to the serial portion of the reduction
        algorithm. These take advantage of the instruction level parallelism of
        modern CPUs, in addition to the thread-level parallelism that the rest
        of this module exploits.  This can lead to better than linear speedups
        relative to std.algorithm.reduce, especially for
        fine-grained benchmarks like dot products.
        An explicit seed may be provided as the first argument.  If
        provided, it is used as the seed for all work units and for the final
        reduction of results from all work units.  Therefore, if it is not the
        identity value for the operation being performed, results may differ
        from those generated by std.algorithm.reduce, or vary
        depending on how many work units are used.  The next argument must be
        the range to be reduced.
// Find the sum of squares of a range in parallel, using
// an explicit seed.
//
// Timings on an Athlon 64 X2 dual core machine:
//
// Parallel reduce:                     72 milliseconds
// Using std.algorithm.reduce instead:  181 milliseconds
auto nums = iota(10_000_000.0f);
auto sumSquares = taskPool.reduce!"a + b"(
    0.0, std.algorithm.map!"a * a"(nums)
);
        If no explicit seed is provided, the first element of each work unit
        is used as a seed.  For the final reduction, the result from the first
        work unit is used as the seed.
// Find the sum of a range in parallel, using the first
// element of each work unit as the seed.
auto sum = taskPool.reduce!"a + b"(nums);
        An explicit work unit size may be specified as the last argument.
        Specifying too small a work unit size will effectively serialize the
        reduction, as the final reduction of the result of each work unit will
        dominate computation time.  If TaskPool.size for this instance
        is zero, this parameter is ignored and one work unit is used.
// Use a work unit size of 100.
auto sum2 = taskPool.reduce!"a + b"(nums, 100);
        Parallel reduce supports multiple functions, like
        std.algorithm.reduce.
// Find both the min and max of nums.
auto minMax = taskPool.reduce!(min, max)(nums);
Exception Handling:
        After this function is finished executing, any exceptions thrown
        are chained together via Throwable.next and rethrown.  The chaining
        order is non-deterministic.
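The rethrow behavior described above might be observed with a sketch like the following. The function addNonNegative and the helper reduceAndCatch are hypothetical names introduced for illustration, and the example assumes the rethrown exception can be caught in the calling thread like any other.

```d
import std.parallelism : taskPool;

// Hypothetical reduction function that rejects negative operands.
int addNonNegative(int a, int b)
{
    if (a < 0 || b < 0)
        throw new Exception("negative operand");
    return a + b;
}

// Returns the message of the rethrown exception, or null if nothing was thrown.
string reduceAndCatch(int[] data)
{
    try
    {
        taskPool.reduce!addNonNegative(0, data);
        return null;
    }
    catch (Exception e)
    {
        // If several work units had thrown, the others would be reachable via e.next.
        return e.msg;
    }
}

void main()
{
    import std.stdio : writeln;

    writeln(reduceAndCatch([1, 2, 3]) is null);  // true: nothing is thrown
    writeln(reduceAndCatch([1, -2, 3]));         // negative operand
}
```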
See Also
fold is functionally equivalent to reduce except the
            range parameter comes first and there is no need to use
            tuple for multiple seeds.
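The difference in parameter order can be sketched side by side. The functions adder and multiplier and the two wrapper functions are hypothetical names used only for this illustration; the seeds 0 and 1 are the identity values for addition and multiplication.

```d
import std.parallelism : taskPool;
import std.typecons : tuple;

int adder(int a, int b) { return a + b; }
int multiplier(int a, int b) { return a * b; }

// reduce: the seeds come first, bundled in a tuple, followed by the range.
auto sumAndProductReduce(int[] data)
{
    return taskPool.reduce!(adder, multiplier)(tuple(0, 1), data);
}

// fold: the range comes first, followed by one seed argument per function.
auto sumAndProductFold(int[] data)
{
    return taskPool.fold!(adder, multiplier)(data, 0, 1);
}

void main()
{
    import std.stdio : writeln;

    auto data = [1, 2, 3, 4];
    auto r = sumAndProductReduce(data);
    auto f = sumAndProductFold(data);
    writeln(r[0], " ", r[1]);  // 10 24
    writeln(f[0], " ", f[1]);  // 10 24
}
```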