Code for horizontal vectorization in vspline.

#include "common.h"
#include "simd_type.h"
#include "vc_simd_type.h"

Classes
struct	vspline::simd_traits< T >
	traits class simd_traits provides three traits.

struct	vspline::vector_traits< T, _vsize, Enable >
	With 'simd_traits' defined, we can proceed to implement 'vector_traits': struct vector_traits is a traits class fixing the types used for vectorized code in vspline. These types go beyond mere vectors of fundamentals: most of the time, the data vspline has to process are not fundamentals but what I call 'xel' data (pixels, voxels, stereo sound samples, etc.), that is, small aggregates of a fundamental type. vector_traits defines how fundamentals and 'xel' data are to be vectorized. The types defined by vector_traits introduce a system of type names which follows a set of patterns.

struct	vspline::vector_traits< T, _vsize, typename std::enable_if< vspline::is_element_expandable< T > ::value > ::type >
	Specialization of vector_traits for 'element-expandable' types. These types are recognized by vigra's ExpandElementResult mechanism, resulting in the formation of a 'vectorized' version of the type. These data are what I call 'xel' data. Since vectorization is horizontal, if T is, say, a pixel of three floats, the type generated here will be a TinyVector of three vectors of vsize floats each.

Namespaces
namespace	vspline

Macros
#define	VC_SIMD(T)

Typedefs
template<typename T , size_t N>
using	vspline::simdized_type = typename vector_traits< T, N > ::type
	This alias is used as a shorthand to pick the vectorized type for a given type T and a size N from 'vector_traits'.

Functions
template<typename T , typename U >
void	vspline::assign (T &t, const U &u)

template<typename T , typename U , int N>
void	vspline::assign (vigra::TinyVector< T, N > &t, const vigra::TinyVector< U, N > &u)

template<typename VT1 , typename PT , typename VT2 >
void	vspline::assign_if (VT1 &target, const PT &predicate, const VT2 &source)

template<typename T >
void	vspline::assign_if (T &target, const bool &predicate, const T &source)

Detailed Description

code for horizontal vectorization in vspline

vspline currently has three ways of approaching vectorization:

1. No vectorization. Scalar code is less complex since it does not have to aggregate the data into vectorization-friendly parcels, and for some data types its performance is just as good as with vectorization. Scalar code is selected by setting the vectorization width to 1; the width is usually a template argument named 'vsize'.

2. Use of Vc for vectorization. This requires the presence of Vc during compilation and results in explicit vectorization for all elementary types Vc can handle. Vc provides code for several operations which are outside the scope of autovectorization, most prominently hardware gather and scatter operations, and explicit vectorization with Vc makes sure that vector code is indeed used whenever possible, rather than relying on the compiler to recognize the opportunity. Use of Vc has to be activated explicitly by defining USE_VC during compilation. This option usually produces the fastest code. The downside is the dependence on an external library which may or may not implement the intended vector operations with vector code for a given target: newer processors may not yet be supported, or support may be implemented for part of the instructions only. Also, the Vc version coming from the distro's package management may not be up to date. On the other hand, building processing pipelines based on Vc::SimdArray is straightforward: the type is well thought out and there is good library support for many operations. With Vc in use, elementary types which Vc can't vectorize are handled by fallback code: such types are pseudo-vectorized with the type described in the third option.

3. Code designed to be easily recognized by the compiler as amenable to autovectorization. This option is implemented in simd_type.h, which defines an arithmetic type 'vspline::simd_type' holding data in a C vector. This is a technique I call 'goading': data are processed in small aggregates of vector-friendly size, resulting in inner loops which are often recognized by the autovectorization stage and turned into hardware vector code, provided the compiler flags allow for it and the compiler can generate code for the intended target. Since this approach relies entirely on the compiler's capability to autovectorize the (deliberately vectorization-friendly) code, the mileage varies. If it works, this is a clean and simple solution. A disadvantage is the use of class simd_type for vectorization, which is mildly exotic and very much a vspline creature: building processing pipelines using this type will not be as effortless as using Vc::SimdArray. As long as you're not building your own functors to be used with vspline's family of transform-like functions, the precise mode of vectorization remains an internal issue and you needn't concern yourself with it beyond choosing whether you want vspline to use Vc or not, and choosing a suitable vectorization width if the default does not suit you. Class vspline::simd_type can 'vectorize' every fundamental and is used as the fallback type when Vc use is allowed but Vc can't provide a vectorized data type, as for 'long double' data, so it will be used even with Vc active when the need arises.
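The 'goading' idea from the third option can be illustrated with a minimal sketch. This is not vspline's simd_type: the names goading_vec and axpy are hypothetical and exist only for this illustration. The point is the shape of the code: a small fixed-size C array, with every operation written as a plain loop whose trip count is known at compile time, which is the kind of loop an autovectorizer has a good chance of recognizing.

```cpp
#include <cstddef>

// Hypothetical stand-in for a 'goading' type (NOT vspline::simd_type):
// N values of type T held in a plain C array.
template <typename T, std::size_t N>
struct goading_vec
{
  T data[N];

  // Element-wise addition: a fixed-trip-count loop without branches.
  goading_vec operator+(const goading_vec& rhs) const
  {
    goading_vec result;
    for (std::size_t i = 0; i < N; ++i)
      result.data[i] = data[i] + rhs.data[i];
    return result;
  }

  // Element-wise multiplication, same loop shape.
  goading_vec operator*(const goading_vec& rhs) const
  {
    goading_vec result;
    for (std::size_t i = 0; i < N; ++i)
      result.data[i] = data[i] * rhs.data[i];
    return result;
  }
};

// Hypothetical helper: computes a * x + y for all N lanes at once.
template <typename T, std::size_t N>
goading_vec<T, N> axpy(const goading_vec<T, N>& a,
                       const goading_vec<T, N>& x,
                       const goading_vec<T, N>& y)
{
  return a * x + y;
}
```

Whether the loops above end up as hardware vector instructions depends on the compiler, the optimization level and the target flags, which is exactly the 'mileage varies' caveat made for this option.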

It's important to understand that using SIMD is not simply mapping, say, pixels of three floats to a vector of three floats - that would be 'vertical' vectorization, which is represented by vspline's scalar code. Instead, vspline is coded to use horizontal vectorization, which produces vector data fitting the size of the vector unit's registers, where each element held by the vector has exactly the same meaning as every other: rather than vectors holding, say, the colour channels of a pixel, we have a 'red', a 'green' and a 'blue' vector holding, say, eight floats each. Horizontal vectorization is best coded explicitly, and if it is, the code structure itself suggests vectorization to the compiler. Using a library like Vc gives more structure to this process and adds capabilities beyond the scope of autovectorization, but having the horizontal vectorization manifest in the code's structure already goes a long way, and if the 'structurally' vectorized code autovectorizes well, that may well be 'good enough' as it is. In my experience, it is often significantly faster than scalar code - provided the processor has vector units.
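The difference between the two layouts can be sketched as follows. The names used here (vsize as a constant, rgb_pixel, rgb_pixel_v, deinterleave, brightness) are hypothetical and not part of vspline; std::array stands in for whatever vector type is actually used. Eight pixels of three floats are regrouped into a 'red', a 'green' and a 'blue' vector of eight floats each, and a per-pixel computation then runs over all eight lanes in one loop.

```cpp
#include <array>
#include <cstddef>

// Assumed vectorization width for this sketch.
constexpr std::size_t vsize = 8;

// One 'xel' in its natural (vertical) layout: three channels side by side.
struct rgb_pixel { float r, g, b; };

// Horizontally vectorized pixels: one vector of vsize floats per channel.
struct rgb_pixel_v
{
  std::array<float, vsize> r, g, b;
};

// De-interleave vsize consecutive pixels into channel-wise vectors.
rgb_pixel_v deinterleave(const rgb_pixel* src)
{
  rgb_pixel_v v;
  for (std::size_t i = 0; i < vsize; ++i)
  {
    v.r[i] = src[i].r;
    v.g[i] = src[i].g;
    v.b[i] = src[i].b;
  }
  return v;
}

// Mean of the three channels, computed for vsize pixels at once.
// Every lane performs the same operation on data of the same meaning.
std::array<float, vsize> brightness(const rgb_pixel_v& v)
{
  std::array<float, vsize> out;
  for (std::size_t i = 0; i < vsize; ++i)
    out[i] = (v.r[i] + v.g[i] + v.b[i]) / 3.0f;
  return out;
}
```

In the vertical layout, a three-float pixel would leave most of an eight-lane register unused; in the horizontal layout each register is filled with eight values from the same channel.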

So it turns out that successful vectorization is, to a large degree, a conceptual change making the intended vectorization explicit by choosing appropriate data types. I am indebted to Matthias Kretz, the author of the Vc library, who has opened my eyes to this fact with his thesis: 'Extending C++ for explicit data-parallel programming via SIMD vector types'.

With vspline::simd_type held in reserve, vspline code can rely on the presence of a vectorized type for every fundamental and, by extension, on vectorized 'xel' data - i.e. vectorized pixels, voxels etc., which are implemented as vigra::TinyVectors of vectorized fundamentals. This allows vspline to be coded so that it relies on vectorization, but not necessarily on Vc: Vc is an option to provide extra-fast, tailor-made vector code for some operations, but when it can't be used, vspline's own vector code will be used instead, providing the same interface. This makes maintenance much easier compared to a scenario where, without Vc, the code would have to fall back to a scalar version - as indeed it did in early vspline versions, giving me plenty of headaches.

Note that this header is included by vspline/common.h, so this code is available throughout vspline.

Definition in file vector.h.

Macro Definition Documentation

◆ VC_SIMD

#define VC_SIMD ( T )

Value:

template<> struct simd_traits<T> \
{ \
  static const size_t hsize = Vc::Vector < T > :: size() ; \
  template < size_t sz > using type = \
    typename std::conditional \
             < sz == 1 , \
               T , \
               vc_simd_type < T , sz > \
             > :: type ; \
  enum { default_size =   sizeof ( T ) > VSPLINE_VECTOR_NBYTES \
                        ? 1 \
                        : VSPLINE_VECTOR_NBYTES / sizeof ( T ) } ; \
} ;

Definition at line 248 of file vector.h.
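The type selection and the default_size rule in the macro can be reproduced without Vc in a small sketch. Everything named here is an assumption made for illustration: vector_nbytes stands in for VSPLINE_VECTOR_NBYTES (the value 64 is a guess, not taken from this page), lane_array stands in for vc_simd_type, and sketch_traits and big_type are hypothetical. The hsize member is left out because it depends on Vc::Vector.

```cpp
#include <cstddef>
#include <type_traits>

// Assumption: stands in for VSPLINE_VECTOR_NBYTES; the real value may differ.
constexpr std::size_t vector_nbytes = 64;

// Hypothetical stand-in for vc_simd_type<T, sz>.
template <typename T, std::size_t sz>
struct lane_array { T lanes[sz]; };

// Sketch of the two Vc-independent traits generated by the macro.
template <typename T>
struct sketch_traits
{
  // A width of 1 yields the plain scalar type, any other width a vector type.
  template <std::size_t sz>
  using type = typename std::conditional<sz == 1,
                                         T,
                                         lane_array<T, sz>>::type;

  // As many elements as fit into vector_nbytes, but never fewer than one.
  static constexpr std::size_t default_size =
      sizeof(T) > vector_nbytes ? 1 : vector_nbytes / sizeof(T);
};

// Hypothetical type larger than vector_nbytes, to show the fallback to 1.
struct big_type { char bytes[2 * vector_nbytes]; };

static_assert(std::is_same<sketch_traits<float>::type<1>, float>::value,
              "width 1 selects the scalar type");
static_assert(std::is_same<sketch_traits<float>::type<8>,
                           lane_array<float, 8>>::value,
              "other widths select the vector type");
static_assert(sketch_traits<big_type>::default_size == 1,
              "oversized types default to width 1");
```

With these assumptions, a four-byte float gets a default width of 16, while a type wider than vector_nbytes falls back to a width of 1, which corresponds to scalar code.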
