vspline 1.1.0
Generic C++11 Code for Uniform B-Splines
SIMD type using small loops.
#include <iostream>
Classes
struct vspline::simd_type<_value_type, _vsize>
Class template simd_type provides a fixed-size container for small sets of fundamental values, stored in a plain C array. Its arithmetic is implemented as loops over the elements, in the expectation that the compiler will autovectorize these loops into 'proper' SIMD code. The interface is modelled to be compatible with Vc's SimdArray. Unfortunately, Vc::SimdArray takes additional template arguments, so it is sometimes difficult to use the two types interchangeably. Interface compatibility does not mean that the arithmetic produces the same results: this is intended, but neither tested nor enforced.
struct vspline::simd_type<_value_type, _vsize>::masked_type
struct std::allocator_traits<vspline::simd_type<T, N>>
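The nested masked_type points to Vc-style masked assignment, where only the lanes selected by a mask are modified. Below is a minimal sketch of that idea under stated assumptions: mini_simd, its mask_type and masked_ref are hypothetical names, the mask is simply one bool per lane, and the real masked_type in simd_type.h may work differently.

```cpp
#include <cstddef>

// Hypothetical stand-in for a fixed-size SIMD container; this is not
// vspline's simd_type or its masked_type.
template <typename T, std::size_t N>
struct mini_simd {
  T data[N];

  // Assumed mask representation for this sketch: one bool per lane.
  struct mask_type {
    bool data[N];
  };

  // Proxy which applies assignments only to lanes where the mask is set.
  struct masked_ref {
    mini_simd& target;
    const mask_type& mask;

    masked_ref& operator=(T value) {
      for (std::size_t i = 0; i < N; ++i)
        if (mask.data[i]) target.data[i] = value;
      return *this;
    }

    masked_ref& operator+=(T value) {
      for (std::size_t i = 0; i < N; ++i)
        if (mask.data[i]) target.data[i] += value;
      return *this;
    }
  };

  // Vc-like syntax: v(mask) = x; modifies only the selected lanes.
  masked_ref operator()(const mask_type& m) { return masked_ref{*this, m}; }
};
```

With this, `v(m) = 0;` zeroes exactly those lanes of `v` whose mask entry is true and leaves the others untouched.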
Namespaces
namespace vspline
Macros
#define BUILD_FROM_CONTAINER(SIZE_TYPE, VSZ)
#define BROADCAST_STD_FUNC(FUNC)
#define BROADCAST_STD_FUNC2(FUNC)
#define BROADCAST_STD_FUNC3(FUNC)
#define INTEGRAL_ONLY
#define BOOL_ONLY
#define OPEQ_FUNC(OPFUNC, OPEQ, CONSTRAINT)
#define C_PROMOTE(A, B)
#define OP_FUNC(OPFUNC, OP, CONSTRAINT)
#define OP_FUNC(OPFUNC, OP, CONSTRAINT)
#define COMPARE_FUNC(OPFUNC, OP)
#define OPEQ_FUNC(OPFUNC, OPEQ, CONSTRAINT)
#define CLAMP(FNAME, REL)
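Judging by their names, the BROADCAST_STD_FUNC macros generate element-wise versions of standard math functions. The following sketches that kind of macro, assuming a hypothetical container vec_sketch and a hypothetical macro SKETCH_BROADCAST; neither is taken from simd_type.h, and the real macros may differ.

```cpp
#include <cmath>
#include <cstddef>

namespace sketch {

// Hypothetical fixed-size container; not vspline::simd_type.
template <typename T, std::size_t N>
struct vec_sketch {
  T data[N];
};

// Generates an overload FUNC(vec_sketch) which applies std::FUNC to every
// lane in a short loop the compiler can vectorize.
#define SKETCH_BROADCAST(FUNC)                         \
  template <typename T, std::size_t N>                 \
  vec_sketch<T, N> FUNC(const vec_sketch<T, N>& arg) { \
    vec_sketch<T, N> result;                           \
    for (std::size_t i = 0; i < N; ++i)                \
      result.data[i] = std::FUNC(arg.data[i]);         \
    return result;                                     \
  }

SKETCH_BROADCAST(sqrt)
SKETCH_BROADCAST(floor)
SKETCH_BROADCAST(exp)

#undef SKETCH_BROADCAST

}  // namespace sketch
```

One macro invocation per function keeps the per-lane loop in a single place, so every generated function has the same vectorization-friendly shape.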
Typedefs
template<typename T>
using vspline::is_scalar = typename std::integral_constant<bool, std::is_fundamental<T>::value || std::is_same<T, bool>::value>::type
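is_scalar holds for fundamental types only. The std::is_same<T, bool> test is redundant, since bool is itself a fundamental type, but it does no harm. Unlike std::is_scalar, the alias excludes pointers and enums. The sketch below repeats the alias outside namespace vspline so that it compiles on its own and checks it against a few types; pixel is a hypothetical struct.

```cpp
#include <type_traits>

// Same definition as vspline::is_scalar, placed in the global namespace.
template <typename T>
using is_scalar = typename std::integral_constant<
    bool, std::is_fundamental<T>::value || std::is_same<T, bool>::value>::type;

// Hypothetical aggregate, standing in for multi-channel data.
struct pixel {
  float r, g, b;
};

static_assert(is_scalar<float>::value, "float is fundamental");
static_assert(is_scalar<int>::value, "int is fundamental");
static_assert(is_scalar<bool>::value, "bool is fundamental");
static_assert(!is_scalar<pixel>::value, "a struct is not fundamental");

// Pointers are not fundamental types, although std::is_scalar accepts them.
static_assert(!is_scalar<float*>::value, "pointer excluded here");
static_assert(std::is_scalar<float*>::value, "but std::is_scalar says yes");
```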
Functions
template<typename T, std::size_t vsize>
bool vspline::any_of(simd_type<T, vsize> arg)
template<typename T, std::size_t vsize>
bool vspline::all_of(simd_type<T, vsize> arg)
template<typename T, std::size_t vsize>
bool vspline::none_of(simd_type<T, vsize> arg)
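any_of, all_of and none_of reduce a vector of truth values to a single bool. The following sketches such reductions under one assumption: the mask is a plain array with one bool per lane. mask_sketch is a hypothetical type; the vspline functions instead take a simd_type<T, vsize>.

```cpp
#include <cstddef>

namespace sketch {

// Hypothetical mask type: one bool per lane.
template <std::size_t N>
struct mask_sketch {
  bool data[N];
};

// True if at least one lane is set. No early exit, so the loop stays
// branch-free and easy to vectorize.
template <std::size_t N>
bool any_of(const mask_sketch<N>& m) {
  bool result = false;
  for (std::size_t i = 0; i < N; ++i) result |= m.data[i];
  return result;
}

// True if every lane is set.
template <std::size_t N>
bool all_of(const mask_sketch<N>& m) {
  bool result = true;
  for (std::size_t i = 0; i < N; ++i) result &= m.data[i];
  return result;
}

// True if no lane is set.
template <std::size_t N>
bool none_of(const mask_sketch<N>& m) {
  return !any_of(m);
}

}  // namespace sketch
```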
SIMD type using small loops.
vspline can use Vc for explicit vectorization, and at the time of this writing, this is usually the best option. But Vc is not available everywhere, or its use may be unwanted. To help in such situations, vspline defines its own 'SIMD' type, which is implemented as a simple C vector with small loops operating on it. If these constructs are compiled with a compiler capable of autovectorization (and with the relevant flags activating the use of SIMD instruction sets like AVX), the resulting code will often be 'proper' SIMD code, because the small loops are presented so that the compiler can easily recognize them as candidates for loop vectorization. I call this technique 'goading': by presenting the data flow in a deliberately vector-friendly format, the compiler is more likely to 'get it'.
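A stripped-down sketch of 'goading': the data live in a plain C array, and every operation is a short loop with a compile-time trip count. With optimization enabled (for example -O3 together with an ISA flag such as -mavx2 on g++ or clang++), such loops are often turned into SIMD instructions. The names goad_vec and axpy are hypothetical; this is not vspline's simd_type.

```cpp
#include <cstddef>

// Plain C array plus small loops: nothing which might hide the data flow.
template <typename T, std::size_t N>
struct goad_vec {
  T data[N];

  goad_vec& operator+=(const goad_vec& rhs) {
    for (std::size_t i = 0; i < N; ++i) data[i] += rhs.data[i];
    return *this;
  }

  goad_vec& operator*=(const goad_vec& rhs) {
    for (std::size_t i = 0; i < N; ++i) data[i] *= rhs.data[i];
    return *this;
  }
};

// Binary operators are built on the compound-assignment loops.
template <typename T, std::size_t N>
goad_vec<T, N> operator+(goad_vec<T, N> lhs, const goad_vec<T, N>& rhs) {
  return lhs += rhs;
}

template <typename T, std::size_t N>
goad_vec<T, N> operator*(goad_vec<T, N> lhs, const goad_vec<T, N>& rhs) {
  return lhs *= rhs;
}

// a * x + y on eight floats: two loops of eight iterations, a natural
// fit for one 256-bit register per operand.
inline goad_vec<float, 8> axpy(const goad_vec<float, 8>& a,
                               const goad_vec<float, 8>& x,
                               const goad_vec<float, 8>& y) {
  return a * x + y;
}
```

Whether the loops actually become vector instructions depends on compiler and flags. Inspecting the generated assembly is the only way to be sure.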
Class template simd_type is designed to provide an interface similar to Vc::SimdArray, so that it can be used as a drop-in replacement. It aims to provide only those SIMD capabilities which vspline actually uses and is not a complete replacement for Vc::SimdArray.
Wherever possible, the code is kept simple, avoiding frills and trickery which might keep the compiler from recognizing potentially autovectorizable constructs. In my limited experience, the resulting code is often not too far from explicit SIMD code. Some constructs actually produce binary code which is on par with code using Vc, namely code which does not use gather, scatter or masked operations. So b-spline prefiltering, restoration of the original data, and general filtering are very fast, while code involving b-spline evaluation shows a speed penalty: vectorized b-spline evaluation (as coded in vspline) relies heavily on gather operations of a kind which seem not to be autovectorized into hardware gather instructions. This is my guess; I have not investigated the binary closely.
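For contrast, the following sketches a loop-coded gather next to a contiguous load. In the gather, each lane reads from an address which depends on runtime index data, which may be one reason compilers are reluctant to map such loops to hardware gather instructions. The function names are hypothetical and do not come from vspline.

```cpp
#include <cstddef>

// Gather: lane i loads base[index[i]]. The addresses are data-dependent.
template <typename T, std::size_t N>
void gather_sketch(const T* base, const int (&index)[N], T (&out)[N]) {
  for (std::size_t i = 0; i < N; ++i) out[i] = base[index[i]];
}

// Contiguous load: lane i loads base[i], which maps directly onto a
// single vector load.
template <typename T, std::size_t N>
void load_sketch(const T* base, T (&out)[N]) {
  for (std::size_t i = 0; i < N; ++i) out[i] = base[i];
}
```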
The code presented here adds some memory access functions which are not present in Vc::SimdArray, namely strided load/store operations and load/store using functors.
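A sketch of strided and functor-based memory access for a small fixed-size container: with a stride, lane i is read from or written to p[i * stride], and with a functor, lane i receives f(i). The type strided_sketch and its member names are hypothetical and are not the signatures of vspline::simd_type's member functions.

```cpp
#include <cstddef>

template <typename T, std::size_t N>
struct strided_sketch {
  T data[N];

  // Strided load: lane i <- p[i * stride].
  void load_strided(const T* p, std::ptrdiff_t stride) {
    for (std::size_t i = 0; i < N; ++i)
      data[i] = p[static_cast<std::ptrdiff_t>(i) * stride];
  }

  // Strided store: p[i * stride] <- lane i.
  void store_strided(T* p, std::ptrdiff_t stride) const {
    for (std::size_t i = 0; i < N; ++i)
      p[static_cast<std::ptrdiff_t>(i) * stride] = data[i];
  }

  // Functor-based load: the functor yields the value for each lane index.
  template <typename F>
  void load_with(F f) {
    for (std::size_t i = 0; i < N; ++i) data[i] = f(i);
  }
};
```

A typical use of strided access is pulling a single channel out of interleaved data. For example, the green channel of RGB triplets is read with a start offset of 1 and a stride of 3.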
Note that I use clang++ most of the time, and the code has evolved to produce fast binaries with clang++. Your mileage may vary with other compilers.
Class vspline::simd_type is actually quite similar to vigra::TinyVector, which also stores its data in a plain C array and provides arithmetic. But that type is quite complex: it uses CRTP with a base class, explicitly codes loop unrolling, caters for deficient compilers and uses vigra's sophisticated type promotion mechanism. vspline::simd_type, on the other hand, is stripped down to the bare essentials, to keep the code as simple as possible in the hope that 'goading' will indeed work. It replaces vspline's previous SIMD type, vspline::simd_tv, which was derived from vigra::TinyVector.
One word of warning: the lack of type promotion requires you to pick a value_type of sufficient precision and capacity for the intended operation. In other words: you won't get an int when multiplying two shorts.
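The consequence can be shown with a loop-coded multiply which, like simd_type, stores its result in the operands' value_type. The sketch uses std::uint8_t rather than short: the wrap-around is then well-defined, and no intermediate product can overflow int. small_vec, mul and widen are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>

template <typename T, std::size_t N>
struct small_vec {
  T data[N];
};

// Element-wise product, stored back as T: no promotion to a wider type.
template <typename T, std::size_t N>
small_vec<T, N> mul(const small_vec<T, N>& a, const small_vec<T, N>& b) {
  small_vec<T, N> r;
  for (std::size_t i = 0; i < N; ++i)
    r.data[i] = static_cast<T>(a.data[i] * b.data[i]);  // reduced to T
  return r;
}

// Remedy: convert to a value_type with sufficient capacity first.
template <typename U, typename T, std::size_t N>
small_vec<U, N> widen(const small_vec<T, N>& a) {
  small_vec<U, N> r;
  for (std::size_t i = 0; i < N; ++i) r.data[i] = static_cast<U>(a.data[i]);
  return r;
}
```

Multiplying lanes holding 20 yields 144, because 400 is reduced modulo 256. After `widen<int>`, the same multiplication yields 400.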
Note also that this type is intended for horizontal vectorization, and you'll get the best results with a vector size which is a small-ish power of two, preferably at least the number of values of the given value_type which a register of the intended vector ISA can hold.
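The lower bound for the vector size follows from simple arithmetic: the register width in bits divided by the width of one value. The sketch below computes it at compile time. The register widths used are those of common x86 ISAs (SSE 128 bits, AVX/AVX2 256 bits, AVX-512 512 bits), and the float checks assume a 32-bit float. lanes_per_register is a hypothetical helper.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Number of values of type T which fit into a register of the given width.
template <typename T>
constexpr std::size_t lanes_per_register(std::size_t register_bits) {
  return register_bits / (CHAR_BIT * sizeof(T));
}

static_assert(lanes_per_register<float>(256) == 8, "AVX: 8 floats");
static_assert(lanes_per_register<double>(256) == 4, "AVX: 4 doubles");
static_assert(lanes_per_register<float>(512) == 16, "AVX-512: 16 floats");
static_assert(lanes_per_register<std::int16_t>(128) == 8, "SSE: 8 shorts");

// E.g. pick the vector size for float data on an AVX target.
constexpr std::size_t float_vsize_avx = lanes_per_register<float>(256);
```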
vspline uses TinyVectors of SIMD data types, but their operations are coded with loops over the TinyVector's elements throughout vspline's code base. In vspline's opt directory, you can find 'xel_of_vector.h', which can provide overloads for all operator functions involving TinyVectors of vspline::simd_type - or, more generally, small aggregates of vector data. Please see this header's comments for more detailed information.
Note also that throughout vspline, there is almost no explicit use of vspline::simd_type. vspline picks appropriate SIMD data types with mechanisms 'one level up', coded in vector.h. vector.h checks if use of Vc is possible and whether Vc can vectorize a given type, and produces a 'simdized type', which you mustn't confuse with a simd_type.
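The 'one level up' selection can be pictured as a compile-time switch over the value type. The sketch below shows only the general mechanism. loop_simd, explicit_simd, backend_can_vectorize and simdized_sketch are all hypothetical, and the rule that only float and int are 'vectorizable' is an arbitrary assumption. vector.h's actual logic is not reproduced here.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for two SIMD back ends.
template <typename T, std::size_t N> struct loop_simd { T data[N]; };
template <typename T, std::size_t N> struct explicit_simd { T data[N]; };

// Hypothetical trait: pretend the explicit back end handles float and int only.
template <typename T>
struct backend_can_vectorize
    : std::integral_constant<bool, std::is_same<T, float>::value ||
                                       std::is_same<T, int>::value> {};

// Pick the SIMD type per value type at compile time.
template <typename T, std::size_t N>
using simdized_sketch =
    typename std::conditional<backend_can_vectorize<T>::value,
                              explicit_simd<T, N>, loop_simd<T, N>>::type;

static_assert(std::is_same<simdized_sketch<float, 8>,
                           explicit_simd<float, 8>>::value, "float: explicit");
static_assert(std::is_same<simdized_sketch<double, 8>,
                           loop_simd<double, 8>>::value, "double: loops");
```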
Definition in file simd_type.h.
#define BOOL_ONLY
Definition at line 544 of file simd_type.h.
#define BROADCAST_STD_FUNC(FUNC)
Definition at line 470 of file simd_type.h.
#define BROADCAST_STD_FUNC2(FUNC)
Definition at line 497 of file simd_type.h.
#define BROADCAST_STD_FUNC3(FUNC)
Definition at line 515 of file simd_type.h.
#define BUILD_FROM_CONTAINER(SIZE_TYPE, VSZ)
Definition at line 247 of file simd_type.h.
#define C_PROMOTE(A, B)
Definition at line 591 of file simd_type.h.
#define CLAMP(FNAME, REL)
Definition at line 873 of file simd_type.h.
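From its parameters, CLAMP(FNAME, REL) appears to generate a function named FNAME built around a relational operator REL. The following is a guess at that pattern, not the macro's actual expansion: the generated function replaces every lane for which 'lane REL limit' holds by the limit. clamp_vec, SKETCH_CLAMP, at_least and at_most are hypothetical names.

```cpp
#include <cstddef>

template <typename T, std::size_t N>
struct clamp_vec {
  T data[N];
};

// Generates FNAME(v, limit): lanes for which (lane REL limit) is true
// are set to limit.
#define SKETCH_CLAMP(FNAME, REL)                      \
  template <typename T, std::size_t N>                \
  clamp_vec<T, N> FNAME(clamp_vec<T, N> v, T limit) { \
    for (std::size_t i = 0; i < N; ++i)               \
      if (v.data[i] REL limit) v.data[i] = limit;     \
    return v;                                         \
  }

SKETCH_CLAMP(at_least, <)  // raises lanes below limit to limit
SKETCH_CLAMP(at_most, >)   // lowers lanes above limit to limit

#undef SKETCH_CLAMP
```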
#define COMPARE_FUNC(OPFUNC, OP)
Definition at line 707 of file simd_type.h.
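COMPARE_FUNC takes an operator function name and an operator. In a SIMD type, a comparison yields a per-lane mask rather than a single bool. The sketch below generates comparison operators in that style. cmp_vec, cmp_mask and SKETCH_COMPARE are hypothetical, and the mask representation (one bool per lane) is an assumption.

```cpp
#include <cstddef>

template <typename T, std::size_t N>
struct cmp_vec {
  T data[N];
};

// Assumed mask representation: one bool per lane.
template <std::size_t N>
struct cmp_mask {
  bool data[N];
};

// Generates OPFUNC(a, b), returning the lane-wise result of a OP b.
#define SKETCH_COMPARE(OPFUNC, OP)                                     \
  template <typename T, std::size_t N>                                 \
  cmp_mask<N> OPFUNC(const cmp_vec<T, N>& a, const cmp_vec<T, N>& b) { \
    cmp_mask<N> m;                                                     \
    for (std::size_t i = 0; i < N; ++i)                                \
      m.data[i] = (a.data[i] OP b.data[i]);                            \
    return m;                                                          \
  }

SKETCH_COMPARE(operator<, <)
SKETCH_COMPARE(operator==, ==)
SKETCH_COMPARE(operator!=, !=)

#undef SKETCH_COMPARE
```

Such a mask is what reductions like any_of and all_of consume.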
#define INTEGRAL_ONLY
Definition at line 540 of file simd_type.h.
#define OP_FUNC(OPFUNC, OP, CONSTRAINT)
Definition at line 663 of file simd_type.h.
#define OPEQ_FUNC(OPFUNC, OPEQ, CONSTRAINT)
Definition at line 786 of file simd_type.h.
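OPEQ_FUNC takes a CONSTRAINT argument, and the macro names INTEGRAL_ONLY and BOOL_ONLY suggest that some operators are restricted to certain value types. The sketch below shows one way such a constraint can work: a static_assert pasted into the generated body, so that misuse such as %= on float data fails with a readable message. All SKETCH_* macros and opeq_vec are hypothetical; the real macros may be implemented differently.

```cpp
#include <cstddef>
#include <type_traits>

// Constraint snippets which are pasted into the generated function bodies.
#define SKETCH_INTEGRAL_ONLY \
  static_assert(std::is_integral<T>::value, "operator needs an integral value_type");
#define SKETCH_ANY_TYPE

// Generates a compound-assignment member OPFUNC which checks CONSTRAINT
// and then applies OPEQ lane by lane.
#define SKETCH_OPEQ(OPFUNC, OPEQ, CONSTRAINT) \
  opeq_vec& OPFUNC(const opeq_vec& rhs) {     \
    CONSTRAINT                                \
    for (std::size_t i = 0; i < N; ++i)       \
      data[i] OPEQ rhs.data[i];               \
    return *this;                             \
  }

template <typename T, std::size_t N>
struct opeq_vec {
  T data[N];

  SKETCH_OPEQ(operator+=, +=, SKETCH_ANY_TYPE)
  SKETCH_OPEQ(operator*=, *=, SKETCH_ANY_TYPE)
  SKETCH_OPEQ(operator%=, %=, SKETCH_INTEGRAL_ONLY)
  SKETCH_OPEQ(operator&=, &=, SKETCH_INTEGRAL_ONLY)
};

#undef SKETCH_OPEQ
#undef SKETCH_ANY_TYPE
#undef SKETCH_INTEGRAL_ONLY
```

Because member functions of a class template are only instantiated when used, opeq_vec<float, 4> remains usable with += and *=. Only an actual call to %= on it triggers the static_assert.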