Implementation of 'bunch' and 'fluff'.

Namespaces
namespace	wielding

namespace	wielding::detail

Typedefs
typedef int	wielding::ic_type

Functions
template<typename T , size_t N, size_t K, size_t ... seq>
void	wielding::detail::fetch (vigra::TinyVector< vspline::simdized_type< T, K * Vc::Vector< T >::size() >, N > &v, const vigra::TinyVector< T, N > *_data, const size_t &sz, Vc::index_sequence< seq ... >)

template<typename T , size_t N, size_t K, size_t ... seq>
void	wielding::detail::stash (const vigra::TinyVector< vspline::simdized_type< T, K * Vc::Vector< T >::size() >, N > &v, vigra::TinyVector< T, N > *_data, const size_t &sz, Vc::index_sequence< seq ... >)

template<typename ele_type , int chn, std::size_t vsz>
void	wielding::bunch (const vigra::TinyVector< ele_type, chn > *const &src, vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &trg, const ic_type &stride)
	bunch picks up data from interleaved, strided memory and stores them in a data type representing a package of vector data.
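A minimal sketch of the simplest ('goading') form of strided access, assuming plain standard containers in place of the vigra and Vc types: the hypothetical alias `xel` (a `std::array<T, chn>`) stands in for `vigra::TinyVector<ele_type, chn>`, and each inner `std::array<T, vsz>` stands in for one `vc_simd_type` vector. The name `bunch_sketch` is also hypothetical; this only illustrates where each value ends up, not how the header implements it.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for vigra::TinyVector<ele_type, chn>.
template <typename T, std::size_t chn>
using xel = std::array<T, chn>;

// Sketch of a strided 'bunch' (not the library code): lane i of the result
// takes the xel at src + i * stride, with the stride counted in xels, and
// channel c of that xel goes into vector c.
template <typename T, std::size_t chn, std::size_t vsz>
void bunch_sketch(const xel<T, chn>* src,
                  std::array<std::array<T, vsz>, chn>& trg,
                  std::ptrdiff_t stride)
{
  for (std::size_t i = 0; i < vsz; ++i) {
    const xel<T, chn>& px = src[static_cast<std::ptrdiff_t>(i) * stride];
    for (std::size_t c = 0; c < chn; ++c)
      trg[c][i] = px[c];  // de-interleave: one output vector per channel
  }
}
```

With three channels, four lanes and a stride of two, the lanes pick up xels 0, 2, 4 and 6, so the first vector holds the first channel of exactly those four xels.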

template<typename ele_type , std::size_t vsz>
void	wielding::bunch (const vigra::TinyVector< ele_type, 1 > *const &src, vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, 1 > &trg, std::true_type)
	overload for unstrided single-channel data. Here we can use an SIMD load; the implementation is very straightforward, and the performance gain is large.
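With one channel and a stride of one, the `vsz` source values sit back to back in memory, which is why a single load (or store) suffices. The sketch below assumes a trivially copyable element type and views the xels as a flat array of `T`; it uses `std::memcpy` as a portable stand-in for the SIMD load/store, and the names `bunch_single` and `fluff_single` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Sketch (hypothetical name): one contiguous copy stands in for an SIMD load.
template <typename T, std::size_t vsz>
void bunch_single(const T* src, std::array<T, vsz>& trg)
{
  static_assert(std::is_trivially_copyable<T>::value, "memcpy needs trivially copyable T");
  std::memcpy(trg.data(), src, vsz * sizeof(T));
}

// Sketch (hypothetical name): the matching store back to contiguous memory.
template <typename T, std::size_t vsz>
void fluff_single(const std::array<T, vsz>& src, T* trg)
{
  static_assert(std::is_trivially_copyable<T>::value, "memcpy needs trivially copyable T");
  std::memcpy(trg, src.data(), vsz * sizeof(T));
}
```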

template<typename ele_type , int chn, std::size_t vsz>
std::enable_if< vsz%Vc::Vector< ele_type >::size()==0 >::type	wielding::bunch (const vigra::TinyVector< ele_type, chn > *const &src, vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &trg, std::false_type)
	the third overload, which is only enabled if vsz is a multiple of the SIMD vector capacity, delegates to detail::fetch, which handles the data acquisition with a Vc::InterleavedMemoryWrapper. This overload is only for unstrided multichannel data.
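The enable_if condition on this overload can be illustrated in isolation. In the sketch below, `lanes<T>()` is a hypothetical stand-in for `Vc::Vector<T>::size()` that simply assumes 128-bit registers, and `uses_wrapper_path` is an invented name; the point is only that SFINAE removes one candidate from overload resolution depending on whether `vsz` is a whole number of SIMD vectors.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical lane count standing in for Vc::Vector<T>::size();
// assumes 16-byte (128-bit) SIMD registers.
template <typename T>
constexpr std::size_t lanes() { return 16 / sizeof(T); }

// Viable only if vsz is a multiple of the lane count, mirroring the
// condition on the third 'bunch' overload.
template <typename T, std::size_t vsz>
typename std::enable_if<vsz % lanes<T>() == 0, bool>::type
uses_wrapper_path() { return true; }

// Viable only otherwise; exactly one of the two templates survives.
template <typename T, std::size_t vsz>
typename std::enable_if<vsz % lanes<T>() != 0, bool>::type
uses_wrapper_path() { return false; }
```

Under the 128-bit assumption, `float` has four lanes, so `vsz = 8` selects the first template and `vsz = 6` the second.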

template<typename ele_type , int chn, std::size_t vsz>
void	wielding::fluff (const vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &src, vigra::TinyVector< ele_type, chn > *const &trg, const ic_type &stride)
	reverse operation: a package of vectorized data is written to interleaved, strided memory. We have the same sequence of overloads as for 'bunch'.
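A minimal sketch of the reverse ('goading') operation, again assuming standard containers in place of the vigra and Vc types. The alias `xel` and the function `fluff_sketch` are hypothetical; lane `i` of channel vector `c` is written to channel `c` of the xel at `trg + i * stride`, with the stride counted in xels.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for vigra::TinyVector<ele_type, chn>.
template <typename T, std::size_t chn>
using xel = std::array<T, chn>;

// Sketch of a strided 'fluff' (not the library code): re-interleave a
// package of channel vectors into strided xel memory.
template <typename T, std::size_t chn, std::size_t vsz>
void fluff_sketch(const std::array<std::array<T, vsz>, chn>& src,
                  xel<T, chn>* trg, std::ptrdiff_t stride)
{
  for (std::size_t i = 0; i < vsz; ++i) {
    xel<T, chn>& px = trg[static_cast<std::ptrdiff_t>(i) * stride];
    for (std::size_t c = 0; c < chn; ++c)
      px[c] = src[c][i];  // xels between the strided positions are untouched
  }
}
```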

template<typename ele_type , std::size_t vsz>
void	wielding::fluff (const vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, 1 > &src, vigra::TinyVector< ele_type, 1 > *const &trg, std::true_type)

template<typename ele_type , int chn, std::size_t vsz>
std::enable_if< vsz%Vc::Vector< ele_type >::size()==0 >::type	wielding::fluff (const vigra::TinyVector< vspline::vc_simd_type< ele_type, vsz >, chn > &src, vigra::TinyVector< ele_type, chn > *const &trg, std::false_type)

template<typename target_type , typename ele_type >
void	wielding::bunch (const vigra::TinyVector< ele_type, 1 > *const &src, target_type &trg, std::true_type)

template<typename ele_type , typename source_type >
void	wielding::fluff (const source_type &src, vigra::TinyVector< ele_type, 1 > *const &trg, std::true_type)

template<typename target_type , typename ele_type , int chn>
void	wielding::_bunch (const vigra::TinyVector< ele_type, chn > *const &src, target_type &trg, const ic_type &stride)

template<typename ele_type , typename source_type , int chn>
void	wielding::_fluff (const source_type &src, vigra::TinyVector< ele_type, chn > *const &trg, const ic_type &stride)

template<typename target_type , typename ele_type , int chn>
void	wielding::bunch (const vigra::TinyVector< ele_type, chn > *const &src, target_type &trg, const ic_type &stride)

template<typename ele_type , typename source_type , int chn>
void	wielding::fluff (const source_type &src, vigra::TinyVector< ele_type, chn > *const &trg, const ic_type &stride)

Detailed Description

Implementation of 'bunch' and 'fluff'.

The two function templates 'bunch' and 'fluff' provide code to access interleaved memory holding 'xel' data: stuff like pixels, or coordinates - data types which consist of several equally-typed fundamentals. 'bunch' fetches data from interleaved memory and deposits them in a set of vectors, and 'fluff' does the reverse operation. There are single-channel variants routing to the more efficient simple load/store operation. Strided data will be handled correctly, but the specialized library code which realizes the access as a load/shuffle or shuffle/store requires unstrided data, so it will only be used if the stride is one (measured in 'xel' units); strides larger than one will be routed to the less specialized code.
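The routing described above can be pictured as tag dispatch on the channel count combined with a run-time check of the stride. The enum `access_path` and the functions `unstrided` and `route` are hypothetical and only report which kind of code would be chosen; they perform no memory access. The tags mirror the `std::true_type`/`std::false_type` parameters of the overloads listed above.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical labels for the three kinds of access code.
enum class access_path { single_channel_load, interleaved_load, strided_loop };

// Unstrided, single channel: a plain SIMD load/store would do.
inline access_path unstrided(std::true_type) { return access_path::single_channel_load; }

// Unstrided, several channels: dedicated de/interleaving code applies.
inline access_path unstrided(std::false_type) { return access_path::interleaved_load; }

// The stride is a run-time value, so it is tested first; the channel count
// is a template parameter and picks the 'unstrided' overload at compile time.
template <std::size_t chn>
access_path route(std::ptrdiff_t stride)
{
  if (stride == 1)
    return unstrided(std::integral_constant<bool, chn == 1>());
  return access_path::strided_loop;
}
```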

De/interleaving is a common operation, and speeding it up usually pays off. The most basic approach used here is 'goading': the memory access is coded as a small loop, in the hope that the compiler will 'get it' and autovectorize the operation. Mileage will vary. 'One step up' is the use of 'regular gather/scatter': a gather or scatter operation with fixed indices. This may still route to 'goading' code if the current ISA does not provide gather/scatter. The best performance will usually arise from routing to dedicated de/interleaving code, like Vc's InterleavedMemoryWrapper or Highway's StoreInterleaved function templates.
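'Regular gather' means the gather indices follow a fixed pattern that can be computed once and reused for every channel. A minimal sketch, assuming the interleaved memory is viewed as a flat array of `T`; the function name `gather_bunch` is hypothetical, and the inner loop only emulates what a hardware gather instruction would do in a single step.

```cpp
#include <array>
#include <cstddef>

// Sketch (hypothetical name): de-interleave with one fixed index vector.
template <typename T, std::size_t chn, std::size_t vsz>
void gather_bunch(const T* base,
                  std::array<std::array<T, vsz>, chn>& trg,
                  std::ptrdiff_t stride)
{
  // Fixed ("regular") indices: offset, in units of T, of the first
  // channel of the xel that feeds lane i.
  std::array<std::ptrdiff_t, vsz> idx;
  for (std::size_t i = 0; i < vsz; ++i)
    idx[i] = static_cast<std::ptrdiff_t>(i) * stride
             * static_cast<std::ptrdiff_t>(chn);

  // One 'gather' per channel: same indices, base pointer shifted by c.
  for (std::size_t c = 0; c < chn; ++c) {
    const T* ch_base = base + c;
    for (std::size_t i = 0; i < vsz; ++i)
      trg[c][i] = ch_base[idx[i]];
  }
}
```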

Because the access to interleaved memory is a recognizably separate operation, I have factored the code out into this header. The code is used extensively by wielding.h.

Definition in file interleave.h.
