vspline 1.1.0
Generic C++11 Code for Uniform B-Splines
Implementation of 'bunch' and 'fluff'.
Namespaces | |
namespace | wielding |
namespace | wielding::detail |
Typedefs | |
typedef int | wielding::ic_type |
Functions | |
template<typename T , size_t N, size_t K, size_t ... seq> | |
void | wielding::detail::fetch (vigra::TinyVector< vspline::simdized_type< T, K *Vc::Vector< T >::size() >, N > &v, const vigra::TinyVector< T, N > *_data, const size_t &sz, Vc::index_sequence< seq ... >) |
template<typename T , size_t N, size_t K, size_t ... seq> | |
void | wielding::detail::stash (const vigra::TinyVector< vspline::simdized_type< T, K *Vc::Vector< T >::size() >, N > &v, vigra::TinyVector< T, N > *_data, const size_t &sz, Vc::index_sequence< seq ... >) |
template<typename ele_type , int chn, std::size_t vsz> | |
void | wielding::bunch (const vigra::TinyVector< ele_type, chn > *const &src, vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &trg, const ic_type &stride) |
bunch picks up data from interleaved, strided memory and stores them in a data type representing a package of vector data. | |
template<typename ele_type , std::size_t vsz> | |
void | wielding::bunch (const vigra::TinyVector< ele_type, 1 > *const &src, vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, 1 > &trg, std::true_type) |
Overload for unstrided single-channel data. Here we can use a SIMD load; the implementation is very straightforward, and the performance gain is large. | |
template<typename ele_type , int chn, std::size_t vsz> | |
std::enable_if< vsz%Vc::Vector< ele_type >::size()==0 >::type | wielding::bunch (const vigra::TinyVector< ele_type, chn > *const &src, vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &trg, std::false_type) |
The third overload, which is only enabled if vsz is a multiple of the SIMD vector capacity, delegates to detail::fetch, which handles the data acquisition with a Vc::InterleavedMemoryWrapper. This overload is only for unstrided multichannel data. | |
template<typename ele_type , int chn, std::size_t vsz> | |
void | wielding::fluff (const vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &src, vigra::TinyVector< ele_type, chn > *const &trg, const ic_type &stride) |
Reverse operation: a package of vectorized data is written to interleaved, strided memory. We have the same sequence of overloads as for 'bunch'. | |
template<typename ele_type , std::size_t vsz> | |
void | wielding::fluff (const vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, 1 > &src, vigra::TinyVector< ele_type, 1 > *const &trg, std::true_type) |
template<typename ele_type , int chn, std::size_t vsz> | |
std::enable_if< vsz%Vc::Vector< ele_type >::size()==0 >::type | wielding::fluff (const vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &src, vigra::TinyVector< ele_type, chn > *const &trg, std::false_type) |
template<typename target_type , typename ele_type > | |
void | wielding::bunch (const vigra::TinyVector< ele_type, 1 > *const &src, target_type &trg, std::true_type) |
template<typename ele_type , typename source_type > | |
void | wielding::fluff (const source_type &src, vigra::TinyVector< ele_type, 1 > *const &trg, std::true_type) |
template<typename target_type , typename ele_type , int chn> | |
void | wielding::_bunch (const vigra::TinyVector< ele_type, chn > *const &src, target_type &trg, const ic_type &stride) |
template<typename ele_type , typename source_type , int chn> | |
void | wielding::_fluff (const source_type &src, vigra::TinyVector< ele_type, chn > *const &trg, const ic_type &stride) |
template<typename target_type , typename ele_type , int chn> | |
void | wielding::bunch (const vigra::TinyVector< ele_type, chn > *const &src, target_type &trg, const ic_type &stride) |
template<typename ele_type , typename source_type , int chn> | |
void | wielding::fluff (const source_type &src, vigra::TinyVector< ele_type, chn > *const &trg, const ic_type &stride) |
The two function templates 'bunch' and 'fluff' provide code to access interleaved memory holding 'xel' data: pixels, coordinates and similar data types which consist of several fundamentals of the same type. 'bunch' fetches data from interleaved memory and deposits them in a set of vectors; 'fluff' does the reverse. Single-channel variants route to the more efficient simple load/store operations. Strided data are handled correctly, but the specialized library code which realizes the access as load/shuffle or shuffle/store requires unstrided data, so it is only used if the stride is one (measured in 'xel' units). Strides larger than one are routed to the less specialized code.
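To make the data movement concrete, here is a minimal scalar sketch of what 'bunch' and 'fluff' compute. It is not the vspline implementation: the names xel_t, package_t, bunch_sketch and fluff_sketch are hypothetical, and plain std::array stands in for vigra::TinyVector and the SIMD vector types. The stride is measured in 'xel' units, as described above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins (assumptions, not vspline types):
// a 'xel' holds chn fundamentals of the same type ...
template <typename T, std::size_t chn>
using xel_t = std::array<T, chn>;

// ... and a 'package' holds one vector of vsz lanes per channel.
template <typename T, std::size_t chn, std::size_t vsz>
using package_t = std::array<std::array<T, vsz>, chn>;

// De-interleave: lane e of channel ch is read from xel number e * stride.
template <typename T, std::size_t chn, std::size_t vsz>
void bunch_sketch(const xel_t<T, chn>* src,
                  package_t<T, chn, vsz>& trg,
                  std::ptrdiff_t stride)
{
  for (std::size_t e = 0; e < vsz; ++e)
    for (std::size_t ch = 0; ch < chn; ++ch)
      trg[ch][e] = src[static_cast<std::ptrdiff_t>(e) * stride][ch];
}

// Re-interleave: lane e of channel ch is written to xel number e * stride.
template <typename T, std::size_t chn, std::size_t vsz>
void fluff_sketch(const package_t<T, chn, vsz>& src,
                  xel_t<T, chn>* trg,
                  std::ptrdiff_t stride)
{
  for (std::size_t e = 0; e < vsz; ++e)
    for (std::size_t ch = 0; ch < chn; ++ch)
      trg[static_cast<std::ptrdiff_t>(e) * stride][ch] = src[ch][e];
}
```

With a stride of one, the package covers vsz adjacent xels; with a stride of two, it covers every second xel, and 'fluff_sketch' leaves the xels in between untouched.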
De/interleaving is a common operation, and speeding it up usually pays off. The most basic approach used here is 'goading': the memory access is coded as a small loop, in the hope that the compiler will 'get it' and autovectorize the operation. Mileage will vary. One step up is the use of 'regular gather/scatter': a gather or scatter operation with fixed indices. This may still route to 'goading' code if the current ISA does not provide gather/scatter. The best performance will usually arise from routing to dedicated de/interleaving code, like Vc's InterleavedMemoryWrapper or highway's StoreInterleaved function templates.
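The 'regular gather' idea can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the code in interleave.h: make_indexes, gather_sketch and bunch_by_gather are hypothetical names, and the gather itself is written as a 'goading' loop which a SIMD back end could replace with a hardware gather. The point is that the index set depends only on the channel count and the stride, so it is computed once and reused for every channel.

```cpp
#include <array>
#include <cstddef>

// Fixed index set: lane e addresses fundamental number e * chn * stride,
// counted from the start of the first xel.
template <std::size_t vsz>
std::array<std::size_t, vsz> make_indexes(std::size_t chn, std::size_t stride)
{
  std::array<std::size_t, vsz> ix{};
  for (std::size_t e = 0; e < vsz; ++e)
    ix[e] = e * chn * stride;
  return ix;
}

// Gather of a single channel, coded as a small loop ('goading').
// A SIMD library could substitute a true gather instruction here.
template <typename T, std::size_t vsz>
void gather_sketch(const T* base,
                   const std::array<std::size_t, vsz>& ix,
                   std::array<T, vsz>& trg)
{
  for (std::size_t e = 0; e < vsz; ++e)
    trg[e] = base[ix[e]];
}

// De-interleave chn channels: the same indices are used for each channel,
// only the base address moves on by one fundamental per channel.
template <typename T, std::size_t chn, std::size_t vsz>
void bunch_by_gather(const T* mem,
                     std::array<std::array<T, vsz>, chn>& trg,
                     std::size_t stride)
{
  const auto ix = make_indexes<vsz>(chn, stride);
  for (std::size_t ch = 0; ch < chn; ++ch)
    gather_sketch(mem + ch, ix, trg[ch]);
}
```

The scatter direction mirrors this: the same index set is used, and each channel's lanes are written to base[ix[e]] instead of being read from it.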
Because the access to interleaved memory is a recognizably separate operation, I have factored out the code to this header. The code is used extensively by wielding.h.
Definition in file interleave.h.