vspline 1.1.0
Generic C++11 Code for Uniform B-Splines
multithread.h File Reference

code to distribute the processing of bulk data to several threads More...

#include <assert.h>
#include <thread>
#include <mutex>
#include <queue>
#include <condition_variable>
#include <atomic>
#include "thread_pool.h"
#include "common.h"


Namespaces

namespace  vspline
 

Typedefs

template<typename T >
using vspline::atomic = std::atomic< T >
 

Functions

template<typename index_t >
bool vspline::fetch_descending (vspline::atomic< index_t > &source, index_t &index)
 fetch_descending fetches the next index from an atomic, counting down. Indexes will range from one less than the value the atomic was initialized with, down to zero. If all indexes were distributed already, false is returned, true otherwise. Like the other fetch_XXX routines, the return of a boolean makes these functions good candidates to be used as conditional in a loop. More...
 
template<typename index_t >
bool vspline::fetch_ascending (vspline::atomic< index_t > &source, const index_t &total, index_t &index)
 fetch_ascending counts up from zero to total-1, which is more efficient if the indexes are used to address memory. This is due to the side effects of accessing memory: if memory is accessed at an address x, the OS will typically fetch a chunk of data starting at or shortly before x. If the next fetch requires data just after x, there is a good chance that they are already in cache. More...
 
template<typename index_t >
bool vspline::fetch_range_descending (vspline::atomic< index_t > &source, const index_t &count, index_t &low, index_t &high)
 fetch_range_descending fetches the beginning and end of a range of indexes (in iterator semantics, low <= i < high) from a vspline::atomic which has been initialized with the total number of indexes that are to be processed. If the vspline::atomic, when accessed, already holds a value of or below zero, fetch_range_descending returns false and leaves low and high unchanged. Otherwise it returns true, and low and high are set to the values gleaned from the atomic, raising 'low' to zero if it would come out below. This function (and the next) enables calling code to process batches of adjacent indexes without any index artistry: the code is perfectly general and simple, and the use of the atomic and fetch_sub guarantees that each fetch provides a distinct batch of indexes. More...
 
template<typename index_t >
bool vspline::fetch_range_ascending (vspline::atomic< index_t > &source, const index_t &count, const index_t &total, index_t &low, index_t &high)
 fetch_range_ascending also uses an atomic initialized to the total number of indexes to be distributed, but the successive ranges are handed out in ascending order, which is more efficient if the indexes are used to address memory. More...
 
template<bool dummy = true>
int vspline::multithread (std::function< void() > payload, std::size_t nr_workers=default_njobs)
 multithread uses a thread pool of worker threads to perform a multithreaded operation. It receives a functor (a single-threaded function used for all individual tasks), and, optionally, the desired number of worker instances to be used. These tasks are wrapped with a wrapper which takes care of signalling when the last task has completed. More...
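The fetch_XXX semantics described above can be illustrated against a plain std::atomic. The following is a sketch of the documented behaviour, not vspline's actual source; the function bodies here are reconstructions from the descriptions above:

```cpp
#include <atomic>
#include <algorithm>
#include <cassert>

// sketch of fetch_descending: yields total-1, total-2, ... 0, then
// returns false once all indexes have been handed out
template < typename index_t >
bool fetch_descending ( std::atomic < index_t > & source , index_t & index )
{
  index_t fetched = source.fetch_sub ( index_t ( 1 ) ) ;
  if ( fetched <= 0 )
    return false ;
  index = fetched - 1 ;
  return true ;
}

// sketch of fetch_range_ascending: the atomic counts down internally,
// but the [low, high) ranges are handed out in ascending order
template < typename index_t >
bool fetch_range_ascending ( std::atomic < index_t > & source ,
                             const index_t & count ,
                             const index_t & total ,
                             index_t & low ,
                             index_t & high )
{
  index_t fetched = source.fetch_sub ( count ) ;
  if ( fetched <= 0 )
    return false ;
  low  = total - fetched ;
  high = total - std::max ( index_t ( 0 ) , index_t ( fetched - count ) ) ;
  return true ;
}
```

Because fetch_sub is atomic, concurrent callers are guaranteed to receive distinct indexes or ranges.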
 

Variables

const int vspline::ncores = vspline_threadpool::ncores
 
const int vspline::default_njobs = vspline_threadpool::default_njobs
 

Detailed Description

code to distribute the processing of bulk data to several threads

The code in this header provides a reasonably general method to process manifolds of data with several threads in parallel. In vspline, there are several areas where potentially large numbers of individual values have to be processed, either independently of each other or with dependencies which can be preserved in partitioning. To process such 'bulk' data effectively, vspline employs two strategies: multithreading and vectorization. This file handles the multithreading.

Multithreading, the use of related headers and linkage with pthread are optional in vspline and can be switched off by #defining VSPLINE_SINGLETHREAD. This is the reason for the template vspline::atomic, which normally uses std::atomic, but falls back to a stand-in type providing 'mock' functionality if VSPLINE_SINGLETHREAD is defined, so that the same logic can be used in both cases.
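Such a single-threaded stand-in might look like the sketch below. The name fake_atomic and the exact member set are hypothetical; the real stand-in is defined in this header. It only needs to mirror the parts of std::atomic's interface which the fetch_XXX routines use:

```cpp
// hypothetical stand-in for std::atomic in single-threaded builds:
// same interface as the members used by the fetch_XXX routines,
// but plain, unsynchronized operations
template < typename T >
struct fake_atomic
{
  T value ;
  fake_atomic ( const T & _value ) : value ( _value ) { }
  T load ( ) const { return value ; }
  T fetch_sub ( const T & arg ) { T v = value ; value -= arg ; return v ; }
  T fetch_add ( const T & arg ) { T v = value ; value += arg ; return v ; }
} ;
```

With this, code written against vspline::atomic compiles and behaves identically in single-threaded builds, without pulling in <atomic> or pthread.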

As of March 2019, the multithreading code has been simplified: the routine 'multithread' takes a std::function<void()> as 'payload'. The payload code is wrapped in an outer function which keeps track of terminating worker threads, and when the last of a set of worker threads assigned to a job terminates, control is returned to the caller of 'multithread'. This logic ensures that a multithreaded task is complete when control returns.
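The completion logic can be sketched as follows. This is a simplified stand-in which launches plain std::threads instead of drawing workers from vspline's thread pool, and the name multithread_sketch is hypothetical:

```cpp
#include <thread>
#include <mutex>
#include <condition_variable>
#include <functional>
#include <atomic>
#include <vector>

// sketch: run 'payload' on nr_workers threads and return only after
// the last worker has terminated
int multithread_sketch ( std::function < void() > payload ,
                         std::size_t nr_workers )
{
  std::mutex mtx ;
  std::condition_variable cv ;
  std::size_t pending = nr_workers ;

  // wrap the payload so the last terminating worker wakes the caller
  auto wrapped = [&]
  {
    payload() ;
    std::unique_lock < std::mutex > lk ( mtx ) ;
    if ( --pending == 0 )
      cv.notify_one() ;
  } ;

  std::vector < std::thread > workers ;
  for ( std::size_t i = 0 ; i < nr_workers ; i++ )
    workers.emplace_back ( wrapped ) ;

  // block until every worker has run the payload to completion
  {
    std::unique_lock < std::mutex > lk ( mtx ) ;
    cv.wait ( lk , [&] { return pending == 0 ; } ) ;
  }
  for ( auto & t : workers )
    t.join() ;
  return 0 ;
}
```

The real implementation hands the wrapped tasks to a persistent thread pool rather than spawning fresh threads, but the signalling idea is the same.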

This logic implies that the caller and the payload code cooperate to split up the work load, since 'multithread' itself does not concern itself with this aspect. In vspline, this is done using a vspline::atomic instance in the caller, which is passed into the workers and used by them to obtain 'joblet numbers', which they interpret as signifying a (small-ish) share of some larger data set. The passing-to-the-workers is done by per-reference lambda capture, which conveniently transports the caller's context into the workers. The payload code keeps taking 'joblet numbers' from the atomic until they are finished - then it terminates.
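This pattern might be sketched like so, as a self-contained toy rather than vspline code; the function name count_processed_lines is hypothetical, and a plain int flag per 'line' stands in for real processing:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// 'atomic surfing' demo: workers pull joblet numbers from a shared
// atomic until all are taken, then terminate
int count_processed_lines ( int nlines , int nworkers )
{
  std::vector < int > data ( nlines , 0 ) ;
  std::atomic < int > source ( nlines ) ;   // the caller's atomic

  // the payload captures the caller's context by reference
  auto payload = [&]
  {
    while ( true )
    {
      int fetched = source.fetch_sub ( 1 ) ;
      if ( fetched <= 0 )
        break ;                      // joblet numbers are finished
      data [ fetched - 1 ] = 1 ;     // 'process' one 1D subarray
    }
  } ;

  std::vector < std::thread > workers ;
  for ( int i = 0 ; i < nworkers ; i++ )
    workers.emplace_back ( payload ) ;
  for ( auto & t : workers )
    t.join() ;

  int sum = 0 ;
  for ( int v : data )
    sum += v ;
  return sum ;   // equals nlines iff every line was processed once
}
```

Note that no partitioning happens up front: however many workers run, and however unevenly the OS schedules them, each joblet number is handed out exactly once.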

This strategy separates the granularity of the workload distribution from the number of worker threads, resulting in even load distribution and little tail-end idling (when most jobs are complete and only a few aren't). It also makes any data partitioning code unnecessary: jobs which are laid to rest by the OS may, on re-awakening, have some data left to process, but if the remainder of the job is done (the joblet numbers are used up), that's all they have to do, which takes much less time than having to complete some previously scheduled fixed workload. It also allows running a task 'on the back burner', employing only a small number of workers: since load distribution is automatic, the job will simply take longer to execute.

I like this technique, and I've dubbed it 'atomic surfing' ;)

So now it should be clear why a stand-in type is needed if VSPLINE_SINGLETHREAD is #defined: using the atomic to share joblet numbers between caller and workers is part of the logic, and it's easier to code with atomics throughout, substituting the stand-in type in the single-threaded case. This preserves the logic, but gets by without any reference to multithreading-related headers or libraries.

In vspline, the granularity of 'joblets' is single 1D subarrays of a larger data set. If the data are images, joblets process lines. The number of threads used to process an entire data set is fixed in thread_pool.h, and is some small multiple of the number of available physical cores. This may seem strange, because one might assume that having as many threads as cores would be best, but I found that using more (up to a point) increases performance, and since one of my main concerns in vspline is speed, I've coded it so that per default a number of threads is chosen which runs best on my system - this may not be optimal for other hardware. To change the number, see the definition of default_njobs in thread_pool.h.

The multithreading code in this header is used both by vspline's digital filter code and by 'wielding.h', which provides the data flow logic for vspline's transform-like routines.

Definition in file multithread.h.