vspline 1.1.0
Generic C++11 Code for Uniform B-Splines
interleave.h File Reference

Implementation of 'bunch' and 'fluff'.

Namespaces

namespace  wielding
 
namespace  wielding::detail
 

Typedefs

typedef int wielding::ic_type
 

Functions

template<typename T , size_t N, size_t K, size_t ... seq>
void wielding::detail::fetch (vigra::TinyVector< vspline::simdized_type< T, K *Vc::Vector< T >::size() >, N > &v, const vigra::TinyVector< T, N > *_data, const size_t &sz, Vc::index_sequence< seq ... >)
 
template<typename T , size_t N, size_t K, size_t ... seq>
void wielding::detail::stash (const vigra::TinyVector< vspline::simdized_type< T, K *Vc::Vector< T >::size() >, N > &v, vigra::TinyVector< T, N > *_data, const size_t &sz, Vc::index_sequence< seq ... >)
 
template<typename ele_type , int chn, std::size_t vsz>
void wielding::bunch (const vigra::TinyVector< ele_type, chn > *const &src, vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &trg, const ic_type &stride)
 bunch picks up data from interleaved, strided memory and stores them in a data type representing a package of vector data.
 
template<typename ele_type , std::size_t vsz>
void wielding::bunch (const vigra::TinyVector< ele_type, 1 > *const &src, vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, 1 > &trg, std::true_type)
 Overload for unstrided single-channel data. Here we can use a SIMD load; the implementation is very straightforward, and the performance gain is large.
 
template<typename ele_type , int chn, std::size_t vsz>
std::enable_if< vsz%Vc::Vector< ele_type >::size()==0 >::type wielding::bunch (const vigra::TinyVector< ele_type, chn > *const &src, vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &trg, std::false_type)
 The third overload, which is only enabled if vsz is a multiple of the SIMD vector capacity, delegates to detail::fetch, which handles the data acquisition with a Vc::InterleavedMemoryWrapper. This overload is only for unstrided multichannel data.
 
template<typename ele_type , int chn, std::size_t vsz>
void wielding::fluff (const vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &src, vigra::TinyVector< ele_type, chn > *const &trg, const ic_type &stride)
 Reverse operation: a package of vectorized data is written to interleaved, strided memory. We have the same sequence of overloads as for 'bunch'.
 
template<typename ele_type , std::size_t vsz>
void wielding::fluff (const vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, 1 > &src, vigra::TinyVector< ele_type, 1 > *const &trg, std::true_type)
 
template<typename ele_type , int chn, std::size_t vsz>
std::enable_if< vsz%Vc::Vector< ele_type >::size()==0 >::type wielding::fluff (const vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &src, vigra::TinyVector< ele_type, chn > *const &trg, std::false_type)
 
template<typename target_type , typename ele_type >
void wielding::bunch (const vigra::TinyVector< ele_type, 1 > *const &src, target_type &trg, std::true_type)
 
template<typename ele_type , typename source_type >
void wielding::fluff (const source_type &src, vigra::TinyVector< ele_type, 1 > *const &trg, std::true_type)
 
template<typename target_type , typename ele_type , int chn>
void wielding::_bunch (const vigra::TinyVector< ele_type, chn > *const &src, target_type &trg, const ic_type &stride)
 
template<typename ele_type , typename source_type , int chn>
void wielding::_fluff (const source_type &src, vigra::TinyVector< ele_type, chn > *const &trg, const ic_type &stride)
 
template<typename target_type , typename ele_type , int chn>
void wielding::bunch (const vigra::TinyVector< ele_type, chn > *const &src, target_type &trg, const ic_type &stride)
 
template<typename ele_type , typename source_type , int chn>
void wielding::fluff (const source_type &src, vigra::TinyVector< ele_type, chn > *const &trg, const ic_type &stride)
 

Detailed Description

Implementation of 'bunch' and 'fluff'.

The two function templates 'bunch' and 'fluff' provide code to access interleaved memory holding 'xel' data: stuff like pixels, or coordinates - data types which consist of several equally-typed fundamentals. 'bunch' fetches data from interleaved memory and deposits them in a set of vectors, and 'fluff' does the reverse operation. There are single-channel variants routing to the more efficient simple load/store operation. Strided data will be handled correctly, but the specialized library code which realizes the access as a load/shuffle or shuffle/store requires unstrided data, so it will only be used if the stride is one (measured in 'xel' units). Strides larger than one will be routed to the less specialized code.
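To make the data movement concrete, here is a minimal sketch of what 'bunch' and 'fluff' do, written in plain C++ without vigra or Vc. The names xel, bundle, bunch_sketch and fluff_sketch are hypothetical stand-ins introduced for illustration, not vspline's types or functions: xel takes the place of vigra::TinyVector< ele_type, chn >, and bundle takes the place of the TinyVector of SIMD types. The loops correspond to the least specialized code path only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins, not vspline types: 'xel' plays the role of
// vigra::TinyVector<T, chn>; 'bundle' holds one array of vsz lanes per channel.
template <typename T, std::size_t chn>
using xel = std::array<T, chn>;

template <typename T, std::size_t chn, std::size_t vsz>
using bundle = std::array<std::array<T, vsz>, chn>;

// De-interleave: lane e of channel c is read from the xel at src[e * stride].
// The stride is measured in xel units, as in the text above.
template <typename T, std::size_t chn, std::size_t vsz>
void bunch_sketch(const xel<T, chn>* src, bundle<T, chn, vsz>& trg, int stride)
{
    assert(stride >= 1);
    const std::size_t step = static_cast<std::size_t>(stride);
    for (std::size_t c = 0; c < chn; ++c)
        for (std::size_t e = 0; e < vsz; ++e)
            trg[c][e] = src[e * step][c];
}

// Re-interleave: the reverse operation, writing lane e of channel c
// to the xel at trg[e * stride]. Xels between the strided positions
// are left untouched.
template <typename T, std::size_t chn, std::size_t vsz>
void fluff_sketch(const bundle<T, chn, vsz>& src, xel<T, chn>* trg, int stride)
{
    assert(stride >= 1);
    const std::size_t step = static_cast<std::size_t>(stride);
    for (std::size_t c = 0; c < chn; ++c)
        for (std::size_t e = 0; e < vsz; ++e)
            trg[e * step][c] = src[c][e];
}
```

With three channels, four lanes and a stride of two, bunch_sketch reads xels 0, 2, 4 and 6 and produces one four-element array per channel; fluff_sketch writes the same positions back.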

De/interleaving is a common operation, and speeding it up usually pays off. The most basic approach used here is 'goading': the memory access is coded as a small loop, hoping the compiler will 'get it' and autovectorize the operation. Mileage will vary. 'One step up' is the use of 'regular gather/scatter' - a gather or scatter operation with fixed indices. This may still route to 'goading' code if the current ISA does not provide gather/scatter. The best performance will usually arise from routing to dedicated de/interleaving code, like Vc's InterleavedMemoryWrapper or highway's StoreInterleaved function templates.
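The 'regular gather' idea can be sketched as follows, again in plain C++ and under stated assumptions: make_indexes, gather_sketch and deinterleave_by_gather are hypothetical names, and the gather is emulated with a loop, which is what the text calls 'goading'. A SIMD back-end with hardware gather would replace the inner loop with a single instruction. The point of the sketch is only that the index set is fixed and can be computed once, then reused for every channel by shifting the base pointer.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: offsets, in units of the fundamental type, of the
// vsz values belonging to one channel. Consecutive xels are chn * stride
// fundamentals apart.
template <std::size_t vsz>
std::array<std::size_t, vsz> make_indexes(std::size_t chn, std::size_t stride)
{
    std::array<std::size_t, vsz> ix{};
    for (std::size_t e = 0; e < vsz; ++e)
        ix[e] = e * chn * stride;
    return ix;
}

// Emulated gather with a fixed index set. This loop is the 'goading'
// fallback; it stands in for a hardware gather instruction.
template <typename T, std::size_t vsz>
void gather_sketch(const T* base,
                   const std::array<std::size_t, vsz>& ix,
                   std::array<T, vsz>& trg)
{
    for (std::size_t e = 0; e < vsz; ++e)
        trg[e] = base[ix[e]];
}

// De-interleave chn channels: one gather per channel, all sharing the
// same indexes, with the base pointer advanced by one fundamental each time.
template <typename T, std::size_t chn, std::size_t vsz>
void deinterleave_by_gather(const T* base,
                            std::array<std::array<T, vsz>, chn>& trg,
                            std::size_t stride)
{
    const std::array<std::size_t, vsz> ix = make_indexes<vsz>(chn, stride);
    for (std::size_t c = 0; c < chn; ++c)
        gather_sketch(base + c, ix, trg[c]);
}
```

Scatter is the mirror image: the same index set is used, and the assignment in the inner loop is reversed.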

Because the access to interleaved memory is a recognizably separate operation, I have factored out the code to this header. The code is used extensively by wielding.h.

Definition in file interleave.h.