Reimplementation of wield using the new 'neutral' multithread. All workers now receive the same task: process one line at a time until all lines are processed. This simplifies the code; the wield object calls 'multithread' directly in its operator(). It also improves performance, presumably because tail-end idling is reduced: every active thread has data to process until the last line has been picked up by an aggregator. Tail-end idling is therefore on the order of a single line's worth of data, in contrast to half a worker's share of the data in the previous implementation. The current implementation does away with specialized partitioning code (at least for the time being); performance appears to be decent throughout, even without exploiting locality by partitioning into tiles.
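The line-at-a-time dispatch described above can be sketched generically: a shared atomic counter hands out line indices, and each worker loops, claiming and processing lines until none remain. This is a minimal illustration of the scheme, not vspline's actual code; `process_lines` and its parameters are hypothetical names, and `std::atomic`/`std::thread` stand in for vspline's wrappers.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: each worker repeatedly fetches the next unclaimed
// line index from a shared atomic counter and processes that line, until
// all lines have been handed out. No partitioning is needed, and at the
// tail end at most one line per thread remains in flight.
template <typename F>
void process_lines(std::size_t nlines, int njobs, F process_line)
{
  std::atomic<std::size_t> next{0};  // next line to hand out
  auto worker = [&]()
  {
    while (true)
    {
      std::size_t line = next.fetch_add(1);
      if (line >= nlines)
        break;                       // all lines have been claimed
      process_line(line);
    }
  };
  std::vector<std::thread> pool;
  for (int i = 0; i < njobs; ++i)
    pool.emplace_back(worker);
  for (auto &t : pool)
    t.join();
}
```

Because every worker runs the same loop, load balancing is automatic: fast threads simply claim more lines, which is what keeps tail-end idling down to roughly one line per thread.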
template<size_t vsz, typename ... types>
void operator() (const in_view_type &in_view, out_view_type &out_view, coupled_aggregator< vsz, types ... > func, int axis=0, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0, std::ptrdiff_t segment_size=WIELDING_SEGMENT_SIZE)

template<size_t vsz, typename ... types>
void operator() (out_view_type &out_view, indexed_aggregator< vsz, types ... > func, int axis=0, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0, std::ptrdiff_t segment_size=WIELDING_SEGMENT_SIZE)

template<size_t vsz, typename ... types>
void operator() (shape_type in_shape, const indexed_reductor< vsz, types ... > &func, int axis=0, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0, std::ptrdiff_t segment_size=WIELDING_SEGMENT_SIZE)

template<size_t vsz, typename ... types>
void operator() (const in_view_type &in_view, const yield_reductor< vsz, types ... > &func, int axis=0, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0, std::ptrdiff_t segment_size=WIELDING_SEGMENT_SIZE)

template<size_t vsz, typename ... types>
void generate (out_view_type &out_view, generate_aggregator< vsz, types ... > func, int axis=0, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0)
template<int dimension, class in_type, class out_type = in_type>
struct wielding::wield< dimension, in_type, out_type >
Definition at line 1154 of file wielding.h.
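Every overload above accepts a `vspline::atomic< bool > *p_cancel`, which allows the caller to abort a running invocation. The sketch below shows one way such a flag can interact with line-at-a-time dispatch: workers poll the flag before claiming each new line. This is a hypothetical illustration (`process_lines_cancellable` is not a vspline name, and `std::atomic` stands in for `vspline::atomic`), not the library's actual implementation.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: workers check a shared cancellation flag before
// claiming each new line, mirroring the role of the p_cancel parameter
// in the signatures above. Returns the number of lines processed.
template <typename F>
std::size_t process_lines_cancellable(std::size_t nlines, int njobs,
                                      std::atomic<bool> *p_cancel,
                                      F process_line)
{
  std::atomic<std::size_t> next{0};  // next line to hand out
  std::atomic<std::size_t> done{0};  // lines completed so far
  auto worker = [&]()
  {
    while (true)
    {
      // premature termination: leave remaining lines unprocessed
      if (p_cancel && p_cancel->load())
        break;
      std::size_t line = next.fetch_add(1);
      if (line >= nlines)
        break;                       // all lines have been claimed
      process_line(line);
      done.fetch_add(1);
    }
  };
  std::vector<std::thread> pool;
  for (int i = 0; i < njobs; ++i)
    pool.emplace_back(worker);
  for (auto &t : pool)
    t.join();
  return done.load();
}
```

Passing a null pointer (the default `p_cancel=0` in the signatures above) simply disables the check, so uncancellable runs pay essentially nothing for the feature.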