vspline 1.1.0
Generic C++11 Code for Uniform B-Splines
Namespaces

namespace detail

Classes
struct coupled_aggregator
an aggregator for separate - possibly different - source and target. If source and target are in fact different, the inner functor will read data from source, process them, and then write them to target. If source and target are the same, the operation will be in-place, but not explicitly so. vspline uses this style of two-argument functor, and this is the aggregator used for vspline's array-based transforms. The code in this template is only used for vectorized operation; if vectorization is not used, only the specialization for vsize == 1 below is used.
struct coupled_aggregator< 1, ic_type, functor_type >
specialization for vsz == 1. Here the data are simply processed one by one in a loop, without vectorization.
struct generate_aggregator
generate_aggregator is very similar to indexed_aggregator, but instead of managing and passing a coordinate to the functor, the functor now manages the argument side of the operation: it acts as a generator. To make this possible, the generator has to hold run-time modifiable state and can't be const like the functors used in the other aggregators, which are 'pure' in a functional-programming sense. A 'generator' functor to be used with this body of code is expected to behave in a certain fashion, sketched below.
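As an illustration only, here is a minimal sketch of such a stateful generator functor. All names (ramp_generator and its members) are hypothetical and not part of vspline; the authoritative contract is defined by generate_aggregator's code.

    // a stateful generator: it holds run-time modifiable state and
    // yields a new value per call, so operator() can't be const
    struct ramp_generator
    {
      typedef float out_type ;

      float current ;   // run-time modifiable state
      float step ;

      ramp_generator ( float start , float step )
      : current ( start ) , step ( step ) { }

      void operator() ( out_type & result )
      {
        result = current ;   // emit the current value
        current += step ;    // advance the state
      }
    } ;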
struct generate_aggregator< 1, ic_type, functor_type >
specialization for vsz == 1. Here the data are simply processed one by one in a loop, without vectorization.
struct indexed_aggregator
indexed_aggregator receives the start coordinate and processing axis along with the data to process; this is meant for index transforms. The coordinate is updated for every call to the 'inner' functor, so the inner functor always receives the current coordinate as input. The code in this template is only used for vectorized operation; without vectorization, only the specialization for vsize == 1 below is used.
struct indexed_aggregator< 1, ic_type, functor_type >
specialization for vsz == 1. Here the data are simply processed one by one in a loop, without vectorization.
struct indexed_reductor
indexed_reductor is used for reductions and has no output. The actual reduction is handled by the functor: each thread has its own copy of the functor, which does its own part of the reduction and 'offloads' its result to some mutex-protected receptacle when it is destructed; see the 'reduce' functions in transform.h for a more detailed explanation and an example of such a functor (a sketch also follows below). indexed_reductor processes discrete coordinates, whereas yield_reductor (the next class down) processes values. This variant works just like an indexed_aggregator, only that it produces no output - at least not for every coordinate fed to the functor; the functor itself holds state (the reduction) and is also responsible for offloading per-thread results when the worker threads terminate. This class holds a copy of the functor, and each thread has an instance of this class, ensuring that each worker thread can reduce its share of the workload independently.
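For illustration, a minimal sketch of such a reduction functor follows; sum_reductor and its members are hypothetical, and the authoritative contract is with the 'reduce' functions in transform.h. Each copy accumulates privately and offloads under a mutex when it is destructed:

    #include <mutex>

    struct sum_reductor
    {
      typedef double in_type ;

      double & collected ;   // shared receptacle for all threads
      std::mutex & guard ;   // protects 'collected'
      double partial ;       // this copy's private partial sum

      sum_reductor ( double & collected , std::mutex & guard )
      : collected ( collected ) , guard ( guard ) , partial ( 0.0 ) { }

      // copies share the receptacle but start with a fresh partial sum
      sum_reductor ( const sum_reductor & other )
      : collected ( other.collected ) , guard ( other.guard ) , partial ( 0.0 ) { }

      void operator() ( const in_type & v )
      {
        partial += v ;   // thread-private, no locking needed here
      }

      ~sum_reductor()
      {
        // offload once, when this copy dies at thread termination
        std::lock_guard < std::mutex > lk ( guard ) ;
        collected += partial ;
      }
    } ;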
struct indexed_reductor< 1, ic_type, functor_type >
specialization for vsz == 1. Here the data are simply processed one by one in a loop, without vectorization.
struct vs_adapter
vs_adapter wraps a vspline::unary_functor to produce a functor which is compatible with the wielding code. This is necessary because vspline's unary_functors take 'naked' arguments if the data are 1D, while the wielding code always passes TinyVectors. The operation of this wrapper class should not have a run-time effect; it simply converts references. The wrapped functor is only used via operator(), so this is what we provide. While it would be nice to simply pass through the unwrapped unary_functor, this would force us to deal with the distinction between data in TinyVectors and 'naked' fundamentals deeper down in the code, and here is a good central place where we can route to uniform access via TinyVectors - possibly with only one element. By inheriting from inner_type, we provide all of inner_type's type system which we don't explicitly override. Rest assured: the reinterpret_cast is safe. If the data are single-channel, the containerized version takes up the same memory as the uncontainerized version of the datum; multi-channel data are containerized anyway.
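The layout argument behind that safety claim can be demonstrated with a small stand-in for a one-element container (wrapper_t below is hypothetical, standing in for a single-element TinyVector); this is an illustration of the size/layout equivalence, not vspline's code:

    #include <cassert>

    // a single-element aggregate has the same size and layout as the
    // wrapped fundamental, which is why reinterpreting a pointer works
    template < typename T >
    struct wrapper_t { T v [ 1 ] ; } ;

    int main()
    {
      static_assert ( sizeof ( wrapper_t < float > ) == sizeof ( float ) ,
                      "one-element wrapper occupies the same memory" ) ;
      float f = 42.0f ;
      wrapper_t < float > * p = reinterpret_cast < wrapper_t < float > * > ( & f ) ;
      assert ( p -> v [ 0 ] == 42.0f ) ;
      return 0 ;
    }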
struct vs_sink_adapter
the same procedure for a vspline::sink_type
struct wield
reimplementation of wield using the new 'neutral' multithread. The workers now all receive the same task: to process one line at a time until all lines are processed. This simplifies the code; the wield object directly calls 'multithread' in its operator(). And it improves performance, presumably because tail-end idling is reduced: all active threads have data to process until the last line has been picked up by an aggregator. So tail-end idling is on the order of one line's worth of data, in contrast to half a worker's share of the data in the previous implementation. The current implementation does away with specialized partitioning code (at least for the time being); it looks like the performance is decent throughout, even without exploiting locality by partitioning to tiles. A sketch of the line-at-a-time strategy follows below.
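This is not vspline's actual code, but the strategy can be sketched with a shared atomic line counter from which all workers draw until the lines are exhausted (all names here are hypothetical):

    #include <atomic>
    #include <thread>
    #include <vector>
    #include <functional>

    // every worker repeatedly fetches the next unprocessed line index;
    // fetch_add hands out each line exactly once, so at the tail end a
    // worker idles for at most one line's worth of processing
    void process_lines ( int nlines , int nworkers ,
                         std::function < void ( int ) > process_line )
    {
      std::atomic < int > next ( 0 ) ;
      auto worker = [&] ( )
      {
        int line ;
        while ( ( line = next.fetch_add ( 1 ) ) < nlines )
          process_line ( line ) ;
      } ;
      std::vector < std::thread > pool ;
      for ( int i = 0 ; i < nworkers ; i++ )
        pool.emplace_back ( worker ) ;
      for ( auto & t : pool )
        t.join() ;
    }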
struct wield< 1, in_type, out_type >
struct yield_reductor
an aggregator to reduce arrays. This is like using indexed_reductor with a functor gathering from an array, but due to the use of 'bunch' this class is faster for certain array types, because it can use load/shuffle operations instead of always gathering.
struct yield_reductor< 1, ic_type, functor_type >
specialization for vsz == 1. Here the data are simply processed one by one in a loop, without vectorization.
Typedefs

typedef int ic_type

Functions
template<typename ele_type , int chn, std::size_t vsz>
void bunch (const vigra::TinyVector< ele_type, chn > *const &src, vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &trg, const ic_type &stride)
bunch picks up data from interleaved, strided memory and stores them in a data type representing a package of vector data.
template<typename ele_type , std::size_t vsz>
void bunch (const vigra::TinyVector< ele_type, 1 > *const &src, vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, 1 > &trg, std::true_type)
overload for unstrided single-channel data. Here we can use a SIMD load; the implementation is very straightforward, and the performance gain is large.
template<typename ele_type , int chn, std::size_t vsz>
std::enable_if< vsz%Vc::Vector< ele_type >::size()==0 >::type bunch (const vigra::TinyVector< ele_type, chn > *const &src, vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &trg, std::false_type)
the third overload, which is only enabled if vsz is a multiple of the SIMD vector capacity, delegates to detail::fetch, which handles the data acquisition with a Vc::InterleavedMemoryWrapper. This overload is only for unstrided multichannel data.
template<typename ele_type , int chn, std::size_t vsz>
void fluff (const vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &src, vigra::TinyVector< ele_type, chn > *const &trg, const ic_type &stride)
reverse operation: a package of vectorized data is written to interleaved, strided memory. We have the same sequence of overloads as for 'bunch'.
template<typename ele_type , std::size_t vsz>
void fluff (const vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, 1 > &src, vigra::TinyVector< ele_type, 1 > *const &trg, std::true_type)
template<typename ele_type , int chn, std::size_t vsz>
std::enable_if< vsz%Vc::Vector< ele_type >::size()==0 >::type fluff (const vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &src, vigra::TinyVector< ele_type, chn > *const &trg, std::false_type)
template<typename target_type , typename ele_type >
void bunch (const vigra::TinyVector< ele_type, 1 > *const &src, target_type &trg, std::true_type)
template<typename ele_type , typename source_type >
void fluff (const source_type &src, vigra::TinyVector< ele_type, 1 > *const &trg, std::true_type)
template<typename target_type , typename ele_type , int chn>
void _bunch (const vigra::TinyVector< ele_type, chn > *const &src, target_type &trg, const ic_type &stride)
template<typename ele_type , typename source_type , int chn>
void _fluff (const source_type &src, vigra::TinyVector< ele_type, chn > *const &trg, const ic_type &stride)
template<typename target_type , typename ele_type , int chn>
void bunch (const vigra::TinyVector< ele_type, chn > *const &src, target_type &trg, const ic_type &stride)
template<typename ele_type , typename source_type , int chn>
void fluff (const source_type &src, vigra::TinyVector< ele_type, chn > *const &trg, const ic_type &stride)
template<class functor_type , int dimension>
void index_wield (const functor_type functor, vigra::MultiArrayView< dimension, typename functor_type::out_type > *output, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0)
index_wield uses vspline's 'multithread' function to invoke an index-transformation functor for all indexes into an array. We use functors which are vector-capable; typically they will be derived from vspline::unary_functor. index_wield internally uses a 'wield' object to invoke the functor on the chunks of data.
template<class functor_type , int dimension>
void index_reduce (const functor_type &functor, vigra::TinyVector< long, dimension > shape, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0)
template<class functor_type , int dimension>
void value_reduce (const functor_type &functor, const vigra::MultiArrayView< dimension, typename functor_type::in_type > *input, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0)
template<class functor_type , int dimension>
void coupled_wield (const functor_type functor, const vigra::MultiArrayView< dimension, typename functor_type::in_type > *input, vigra::MultiArrayView< dimension, typename functor_type::out_type > *output, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0)
coupled_wield processes two arrays. The first array is taken as input, the second for output. Both arrays must have the same dimensionality and shape. Their data types have to be the same as the 'in_type' and the 'out_type' of the functor which was passed in.
template<class functor_type , unsigned int dimension>
void generate_wield (const functor_type functor, vigra::MultiArrayView< dimension, typename functor_type::out_type > &output, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0)
generate_wield uses a generator function to produce data. Inside vspline, this is used for grid_eval, which can produce performance gains by precalculating frequently reused b-spline evaluation weights. The generator holds these weights in readily vectorized form, shared for all worker threads.
typedef int wielding::ic_type |
Definition at line 98 of file interleave.h.
void wielding::_bunch ( const vigra::TinyVector< ele_type, chn > *const & src,
                        target_type & trg,
                        const ic_type & stride )
Definition at line 330 of file interleave.h.
void wielding::_fluff ( const source_type & src,
                        vigra::TinyVector< ele_type, chn > *const & trg,
                        const ic_type & stride )
Definition at line 344 of file interleave.h.
void wielding::bunch ( const vigra::TinyVector< ele_type, 1 > *const & src,
                       target_type & trg,
                       std::true_type )
Definition at line 310 of file interleave.h.
void wielding::bunch ( const vigra::TinyVector< ele_type, 1 > *const & src,
                       vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, 1 > & trg,
                       std::true_type )
overload for unstrided single-channel data. Here we can use a SIMD load; the implementation is very straightforward, and the performance gain is large.
Definition at line 236 of file interleave.h.
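A sketch of this fast path, assuming the Vc 1.x API: with contiguous single-channel data, one vector load replaces per-lane gathering. The function name is hypothetical; only the load call reflects Vc's interface.

    #include <Vc/Vc>

    // load one vector's worth of contiguous floats with a single
    // (possibly unaligned) SIMD load instead of a gather
    void load_contiguous ( const float * src , Vc::float_v & v )
    {
      v.load ( src , Vc::Unaligned ) ;
    }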
void wielding::bunch ( const vigra::TinyVector< ele_type, chn > *const & src,
                       target_type & trg,
                       const ic_type & stride )
Definition at line 359 of file interleave.h.
void wielding::bunch ( const vigra::TinyVector< ele_type, chn > *const & src,
                       vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > & trg,
                       const ic_type & stride )
bunch picks up data from interleaved, strided memory and stores them in a data type representing a package of vector data.
The first overload of 'bunch' uses a gather operation to obtain the data from memory. This overload is used if the source data are strided and are therefore not contiguous in memory. It's also used if unstrided data are multi-channel and the vector width is not a multiple of the hardware vector width, because I haven't fully implemented the use of Vc::InterleavedMemoryWrapper for SimdArrays. This first routine can be used for all situations; the two overloads below are optimizations, increasing performance for specific cases.
Definition at line 220 of file interleave.h.
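A sketch of such a gather, again assuming the Vc 1.x API: for strided interleaved memory, each lane's element is fetched via an index vector. The function name and parameters are hypothetical illustrations.

    #include <Vc/Vc>

    // gather one channel from interleaved, strided memory:
    // lane i reads base [ i * stride ]
    void gather_channel ( const float * base , int stride , Vc::float_v & v )
    {
      auto ix = Vc::float_v::IndexType::IndexesFromZero() ;
      ix *= stride ;
      v.gather ( base , ix ) ;
    }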
std::enable_if< vsz%Vc::Vector< ele_type >::size()==0 >::type
wielding::bunch ( const vigra::TinyVector< ele_type, chn > *const & src,
                  vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > & trg,
                  std::false_type )
the third overload, which is only enabled if vsz is a multiple of the SIMD vector capacity, delegates to detail::fetch, which handles the data acquisition with a Vc::InterleavedMemoryWrapper. This overload is only for unstrided multichannel data.
Definition at line 251 of file interleave.h.
void wielding::coupled_wield ( const functor_type functor,
                               const vigra::MultiArrayView< dimension, typename functor_type::in_type > * input,
                               vigra::MultiArrayView< dimension, typename functor_type::out_type > * output,
                               int njobs = vspline::default_njobs,
                               vspline::atomic< bool > * p_cancel = 0 )
coupled_wield processes two arrays. The first array is taken as input, the second for output. Both arrays must have the same dimensionality and shape. Their data types have to be the same as the 'in_type' and the 'out_type' of the functor which was passed in.
Definition at line 2122 of file wielding.h.
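A hypothetical usage sketch, reflecting only the contract stated above (matching dimensionality and shape, value types equal to the functor's in_type and out_type). scale_fn and double_array are stand-ins; a real functor may additionally need to satisfy vspline's vectorization requirements, such as providing vectorized evaluation.

    #include <vigra/multi_array.hxx>

    // stand-in functor: doubles each (single-channel) element
    struct scale_fn
    {
      typedef vigra::TinyVector < float , 1 > in_type ;
      typedef vigra::TinyVector < float , 1 > out_type ;

      void operator() ( const in_type & in , out_type & out ) const
      {
        out = in * 2.0f ;
      }
    } ;

    void double_array
      ( const vigra::MultiArrayView < 2 , vigra::TinyVector < float , 1 > > & in ,
        vigra::MultiArrayView < 2 , vigra::TinyVector < float , 1 > > & out )
    {
      // input and output must agree in dimensionality and shape
      wielding::coupled_wield ( scale_fn() , & in , & out ) ;
    }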
void wielding::fluff ( const source_type & src,
                       vigra::TinyVector< ele_type, 1 > *const & trg,
                       std::true_type )
Definition at line 321 of file interleave.h.
void wielding::fluff ( const source_type & src,
                       vigra::TinyVector< ele_type, chn > *const & trg,
                       const ic_type & stride )
Definition at line 367 of file interleave.h.
void wielding::fluff ( const vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, 1 > & src,
                       vigra::TinyVector< ele_type, 1 > *const & trg,
                       std::true_type )
Definition at line 279 of file interleave.h.
void wielding::fluff ( const vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > & src,
                       vigra::TinyVector< ele_type, chn > *const & trg,
                       const ic_type & stride )
reverse operation: a package of vectorized data is written to interleaved, strided memory. We have the same sequence of overloads as for 'bunch'.
Definition at line 267 of file interleave.h.
std::enable_if< vsz%Vc::Vector< ele_type >::size()==0 >::type
wielding::fluff ( const vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > & src,
                  vigra::TinyVector< ele_type, chn > *const & trg,
                  std::false_type )
Definition at line 289 of file interleave.h.
void wielding::generate_wield ( const functor_type functor,
                                vigra::MultiArrayView< dimension, typename functor_type::out_type > & output,
                                int njobs = vspline::default_njobs,
                                vspline::atomic< bool > * p_cancel = 0 )
generate_wield uses a generator function to produce data. Inside vspline, this is used for grid_eval, which can produce performance gains by precalculating frequently reused b-spline evaluation weights. The generator holds these weights in readily vectorized form, shared for all worker threads.
Definition at line 2152 of file wielding.h.
void wielding::index_reduce ( const functor_type & functor,
                              vigra::TinyVector< long, dimension > shape,
                              int njobs = vspline::default_njobs,
                              vspline::atomic< bool > * p_cancel = 0 )
Definition at line 2081 of file wielding.h.
void wielding::index_wield ( const functor_type functor,
                             vigra::MultiArrayView< dimension, typename functor_type::out_type > * output,
                             int njobs = vspline::default_njobs,
                             vspline::atomic< bool > * p_cancel = 0 )
index_wield uses vspline's 'multithread' function to invoke an index-transformation functor for all indexes into an array. We use functors which are vector-capable; typically they will be derived from vspline::unary_functor. index_wield internally uses a 'wield' object to invoke the functor on the chunks of data.
Definition at line 2052 of file wielding.h.
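To illustrate the kind of functor index_wield expects, here is a hypothetical scalar sketch: the value written to each target element depends only on that element's discrete coordinate. checker_fn is a stand-in; real functors are typically derived from vspline::unary_functor and also provide vectorized evaluation.

    #include <vigra/tinyvector.hxx>

    // stand-in index-transform functor producing a checkerboard pattern
    struct checker_fn
    {
      typedef vigra::TinyVector < int , 2 > in_type ;     // a 2D coordinate
      typedef vigra::TinyVector < float , 1 > out_type ;

      void operator() ( const in_type & crd , out_type & out ) const
      {
        out [ 0 ] = ( ( crd [ 0 ] + crd [ 1 ] ) & 1 ) ? 1.0f : 0.0f ;
      }
    } ;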
void wielding::value_reduce ( const functor_type & functor,
                              const vigra::MultiArrayView< dimension, typename functor_type::in_type > * input,
                              int njobs = vspline::default_njobs,
                              vspline::atomic< bool > * p_cancel = 0 )
Definition at line 2099 of file wielding.h.