vspline 1.1.0
Generic C++11 Code for Uniform B-Splines
vector.h
Go to the documentation of this file.
1/************************************************************************/
2/* */
3/* vspline - a set of generic tools for creation and evaluation */
4/* of uniform b-splines */
5/* */
6/* Copyright 2015 - 2023 by Kay F. Jahnke */
7/* */
8/* The git repository for this software is at */
9/* */
10/* https://bitbucket.org/kfj/vspline */
11/* */
12/* Please direct questions, bug reports, and contributions to */
13/* */
14/* kfjahnke+vspline@gmail.com */
15/* */
16/* Permission is hereby granted, free of charge, to any person */
17/* obtaining a copy of this software and associated documentation */
18/* files (the "Software"), to deal in the Software without */
19/* restriction, including without limitation the rights to use, */
20/* copy, modify, merge, publish, distribute, sublicense, and/or */
21/* sell copies of the Software, and to permit persons to whom the */
22/* Software is furnished to do so, subject to the following */
23/* conditions: */
24/* */
25/* The above copyright notice and this permission notice shall be */
26/* included in all copies or substantial portions of the */
27/* Software. */
28/* */
29/* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */
30/* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */
31/* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */
32/* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */
33/* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */
34/* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */
35/* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */
36/* OTHER DEALINGS IN THE SOFTWARE. */
37/* */
38/************************************************************************/
39
40/*! \file vector.h
41
42 \brief code for horizontal vectorization in vspline
43
44 vspline currently has three ways of approaching vectorization:
45
46 - no vectorization. Scalar code is less complex since it does not
47 have to aggregate the data into vectorization-friendly parcels,
48 and for some data types, the performance is just as good as with
49 vectorization. Use of scalar code results from setting the
50 vectorization width to 1. This is usually a template argument
51 going by the name 'vsize'.
52
53 - Use of Vc for vectorization. This requires the presence of Vc
54 during compilation and results in explicit vectorization for all
55 elementary types Vc can handle. Vc provides code for several
56 operations which are outside the scope of autovectorization,
57 most prominently hardware gather and scatter operations, and the
58 explicit vectorization with Vc makes sure that vectorization is
59 indeed used whenever possible, rather than having to rely on the
60 compiler to recognize the opportunity. Use of Vc has to be
61 explicitly activated by defining USE_VC during compilation.
62 Using this option usually produces the fastest code. The downside
63 is the dependence on an external library which may or may not
64 actually implement the intended vector operations with vector
65 code for a given target: Newer processors may not yet be supported,
66 or support may be implemented for part of the instructions only.
67 Also, the Vc version coming from the distro's packet management
68 may not be up-to-date. Building processing pipelines based on
69 Vc::SimdArray is, on the other hand, straightforward - the type
70 is well-thought-out and there is good library support for many
71 operations. Use of Vc triggers use of fallback code for elementary
72 types which Vc can't vectorize - such types are pseudo-vectorized:
73
74 - The third option is to produce code which is designed to be
75 easily recognized by the compiler as amenable to autovectorization.
76 This option is implemented in simd_type.h, which defines an
77 arithmetic type 'vspline::simd_type' holding data in a C vector.
78 This is a technique I call 'goading': data are processed in small
79 aggregates of vector friendly size, resulting in inner loops
80 which oftentimes are recognized by the autovectorization stage,
81 resulting in hardware vector code if the compiler flags allow
82 for it and the compiler can generate code for the intended target.
83 Since this approach relies entirely on the compiler's capability
84 to autovectorize the (deliberately vectorization-friendly) code,
85 the mileage varies. If it works, this is a clean and simple
86 solution. A disadvantage is the use of class simd_type for
87 vectorization, which is mildly exotic and very much a vspline
88 creature - building processing pipelines using this type will
89 not be as effortless as using Vc::SimdArray. As long as you're
90 not building your own functors to be used with vspline's family
91 of transform-like functions, the precise mode of vectorization
92 remains an internal issue and you needn't concern yourself with
93 with it beyond choosing whether you want vspline to use Vc or not,
94 and choosing a suitable vectorization width if the default does
95 not suit you. Class vspline::simd_type can 'vectorize' every
96 fundamental and is used as fallback type when Vc use is allowed
97 but Vc can't provide a vectorized data type, like for 'long double'
98 data, so it will be used even with Vc active when the need arises.
99
100 It's important to understand that using SIMD is not simply mapping,
101 say, pixels of three floats to a vector of three floats - that would
102 be 'vertical' vectorization, which is represented by vspline's *scalar*
103 code. Instead, vspline is coded to use *horizontal* vectorization,
104 which produces vector data fitting the size of the vector unit's
105 registers, where each element held by the vector has exactly the same
106 meaning as every other: rather than vectors holding, like, the colour
107 channels of a pixel, we have a 'red', a 'green' and a 'blue' vector
108 holding, say, eight floats each. Horizontal vectorization is best
109 explicitly coded, and if it is coded explicitly, the code structure
110 itself suggests vectorization to the compiler. Using code like Vc
111 gives more structure to this process and adds capabilities beyond the
112 scope of autovectorization, but having the horizontal vectorization
113 manifest in the code's structure already goes a long way, and if the
114 'structurally' vectorized code autovectorizes well, that may well be
115 'good enough' as it is. In my experience, it is often significantly
116 faster than scalar code - provided the processor has vector units.
117
118 So it turns out that successful vectorization is, to a large degree,
119 a *conceptual change* making the intended vectorization explicit by
120 choosing appropriate data types. I am indebted to Matthis Kretz, the
121 author of the Vc library, who has opened my eyes to this fact with his
122 thesis: 'Extending C++ for explicit data-parallel programming via SIMD
123 vector types'.
124
125 With vspline::simd_type 'in the back hand' vspline code can rely on
126 the presence of a vectorized type for every fundamental, and, by
127 extension, vectorized 'xel' data - i.e. vectorized pixels, voxels
128 etc. which are implemented as vigra::TinyVectors of vectorized
129 fundamentals. This allows vspline to be coded so that it relies
130 on vectorization, but not necessarily on Vc: Vc is an option to
131 provide extra-fast, tailormade vector code for some operations,
132 but when it can't be used, vspline's own vector code will be used
133 instead, providing *the same interface*. This makes maintainance
134 much easier compared to a scenario where, without Vc, the code
135 would have to fall back to a scalar version - as indeed it did in
136 early vspline versions, giving me plenty of headaches.
137
138 Note that this header is included by vspline/common.h, so this code
139 is available throughout vspline.
140*/
141
142#ifndef VSPLINE_VECTOR_H
143#define VSPLINE_VECTOR_H
144
145#include "common.h"
146
147// we have several SIMD back-ends to provide SIMD code, which are all
148// activated by preprocessor #defines. 'Proper' SIMD code may be
149// generated with Vc (USE_VC), highway (USE_HWY) and std::simd
150// (USE_STDSIMD). Only one of these should be defined - it may be
151// possible to mix them, but this is uncharted territory. If none
152// of the 'proper' SIMD backends are specified, vspline will fall
153// back to it's own 'goading' implementation, which codes SIMD
154// operations as small loops, hoping that they will be autovectorized.
155
156// when std::simd is used as backend, this directly implements
157// vspline::simd_type, which is possible because std::simd supports
158// all elementary types, so vspline::simd_type is not needed as
159// fallback for some elementary types.
160
161#if defined USE_STDSIMD
162
163#include "std_simd_type.h"
164
165// #define VSPLINE_SIMD_TYPE_H
166
167#else // defined USE_STDIMD
168
169// simd_type.h is needed as a fallback for those fundamentals
170// which can't be vectorized, and it's also used if no SIMD library
171// is available, in which case the operations are vectorized with
172// 'goading', relying on autovectorization of small loops.
173
174#include "simd_type.h"
175
176#if defined USE_VC
177
178#include "vc_simd_type.h"
179
180#elif defined USE_HWY
181
182#include "hwy_simd_type.h"
183
184#endif
185
186#endif // else to defined USE_STDIMD
187
188namespace vspline
189{
190#ifndef VSPLINE_VECTOR_NBYTES
191
192#if defined USE_VC
193
194#define VSPLINE_VECTOR_NBYTES (2*sizeof(Vc::Vector<float>))
195
196#else
197
198#define VSPLINE_VECTOR_NBYTES 64
199
200#endif
201
202#endif
203
204/// traits class simd_traits provides three traits:
205///
206/// - 'hsize' holds the hardware vector width if applicable (used only with Vc)
207/// - 'type': template yielding the vector type for a given vectorization width
208/// - 'default_size': the default vectorization width to use for T
209
210/// default simd_traits: without further specialization, T will be vectorized
211/// as a vspline::simd_type. This way, *all* types will be vectorized, there is
212/// no fallback to scalar code for certain types. Scalar code will only be
213/// produced if the vectorization width is set to 1 in code taking this
214/// datum as a template argument. Note that the type which simd_traits produces
215/// for sz == 1 is T itself, not a simd_type of one element.
216
217template < typename T >
219{
220 template < size_t sz > using type =
221 typename std::conditional < sz == 1 ,
222 T ,
224 > :: type ;
225
226 static const size_t hsize = 0 ;
227
228 // the default vector size picked here comes out as 16 for floats,
229 // which is twice the vector size used by AVX2.
230
231 enum { default_size = sizeof ( T ) > VSPLINE_VECTOR_NBYTES
232 ? 1
233 : VSPLINE_VECTOR_NBYTES / sizeof ( T ) } ;
234} ;
235
236// next, for some SIMD backends, we specialize simd_traits for a given
237// set of fundamentals. fundamental T which we don't mark this way
238// will be routed to use vspline::simd_type, the 'goading' implementation.
239
240#if defined USE_VC
241
242// in Vc ML discussion M. Kretz states that the set of types Vc can vectorize
243// (with 1.3) is consistent throughout all ABIs, so we can just list the
244// acceptable types without having to take the ABI into account.
245// So, for these types we specialize 'simd_traits', resulting in the use of
246// the appropriate Vc::SimdArray.
247
248#define VC_SIMD(T) \
249template<> struct simd_traits<T> \
250{ \
251 static const size_t hsize = Vc::Vector < T > :: size() ; \
252 template < size_t sz > using type = \
253 typename std::conditional \
254 < sz == 1 , \
255 T , \
256 vc_simd_type < T , sz > \
257 > :: type ; \
258 enum { default_size = sizeof ( T ) > VSPLINE_VECTOR_NBYTES \
259 ? 1 \
260 : VSPLINE_VECTOR_NBYTES / sizeof ( T ) } ; \
261} ;
262
263VC_SIMD(float)
264VC_SIMD(double)
265VC_SIMD(int)
266VC_SIMD(unsigned int)
267VC_SIMD(short)
268VC_SIMD(unsigned short)
269
270#undef VC_SIMD
271
272#endif // USE_VC
273
274#if defined USE_HWY
275
276// with highway, we route pretty much all fundamentals to hwy_simd_type;
277// highway seems to have a wider 'feeding spectrum' of fundamentals.
278// additionally, route only vector widths which are a power of two
279// to produce hwy_simd_type. hwy_simd_type does not handle other
280// vector sizes.
281
282#define HWY_SIMD(T) \
283template<> struct simd_traits<T> \
284{ \
285 static const size_t hsize = 0 ; \
286 template < size_t sz > using type = \
287 typename std::conditional \
288 < sz == 1 , \
289 T , \
290 typename std::conditional \
291 < ( sz & ( sz - 1 ) ) == 0 , \
292 hwy_simd_type < T , sz > , \
293 vspline::simd_type < T , sz > \
294 > :: type \
295 > :: type ; \
296 enum { default_size = sizeof ( T ) > VSPLINE_VECTOR_NBYTES \
297 ? 1 \
298 : VSPLINE_VECTOR_NBYTES / sizeof ( T ) } ; \
299} ;
300
301HWY_SIMD(float)
302HWY_SIMD(double)
303HWY_SIMD(long)
304HWY_SIMD(unsigned long)
305HWY_SIMD(int)
306HWY_SIMD(unsigned int)
307HWY_SIMD(short)
308HWY_SIMD(unsigned short)
309HWY_SIMD(signed char)
310HWY_SIMD(unsigned char)
311
312#undef HWY_SIMD
313
314#endif // USE_HWY
315
316/// with the definition of 'simd_traits', we can proceed to implement
317/// 'vector_traits':
318/// struct vector_traits is a traits class fixing the types used for
319/// vectorized code in vspline. These types go beyond mere vectors of
320/// fundamentals: most of the time, the data vspline has to process
321/// are *not* fundamentals, but what I call 'xel' data: pixels, voxels,
322/// stereo sound samples, etc. - so, small aggregates of a fundamental
323/// type. vector_traits defines how fundamentals and 'xel' data are to
324/// be vectorized.
325/// with the types defined by vector_traits, a system of type names is
326/// introduced which uses a set of patterns:
327/// - 'ele' stands for 'elementary', the type of an aggregate's component
328/// - 'nd' stands for 'n-dimensional', a type of an aggregate of one or
329/// more components of a given elementary type.
330/// - 'v' suffix indicates a 'simdized' type, vspline uses Vc::SimdArrays
331/// and vigra::TinyVectors of Vc::SimdArrays if Vc is used and the type
332/// can be used with Vc::SimdArray, and the equivalent types using
333/// vspline::simd_type instead of Vc::SimdArray otherwise.
334/// the unspecialized definition of class vector_traits will vectorize
335/// by concatenating instances of T into the type simd_traits produces,
336/// taking, per default, as many T as the default_size given there.
337/// This will work with any type T, even though it makes most sense with
338/// fundamentals.
339
340template < typename T ,
341 size_t _vsize = 0 ,
342 typename Enable = void >
344{
345 // T is not 'element-expandable', so 'dimension' is 1 and T is ele_type
346 enum { dimension = 1 } ;
347 typedef T ele_type ;
348
349 // find the right vectorization width
350 enum { size = _vsize == 0
351 ? simd_traits < ele_type > :: default_size
352 : _vsize } ;
353
354 enum { vsize = size } ;
356
357 // produce the 'synthetic' type,
358 typedef vigra::TinyVector < ele_type , 1 > nd_ele_type ;
359
360 // the vectorized type
361 template < typename U , size_t sz >
362 using vector = typename simd_traits < U > :: template type < sz > ;
363
365
366 // and the 'synthetic' vectorized type
367 typedef vigra::TinyVector < ele_v , 1 > nd_ele_v ;
368
369 // for not 'element-expandable' T, we produce ele_v as 'type'
370 typedef ele_v type ;
371} ;
372
373/// specialization of vector_traits for 'element-expandable' types.
374/// These types are recognized by vigra's ExpandElementResult mechanism,
375/// resulting in the formation of a 'vectorized' version of the type.
376/// These data are what I call 'xel' data. As explained above, vectorization
377/// is *horizontal*, so if T is, say, a pixel of three floats, the type
378/// generated here will be a TinyVector of three vectors of vsize floats.
379
380template < typename T , size_t _vsize >
382 < T ,
383 _vsize ,
384 typename std::enable_if
385 < vspline::is_element_expandable < T > :: value
386 > ::type
387 >
388{
389 // T is 'element-expandable' - meaning it can be element-expanded
390 // with vigra's ExpandElementResult mechanism. We use that to obtain
391 // the elementary type and the dimension of T. Note that, if T is
392 // fundamental, the resulting traits are the same as they would be
393 // for the unspecialized case. What we're interested in here are
394 // multi-channel types; that fundamentals are routed through here
395 // is just as good as if they were routed through the unspecialized
396 // case above.
397
398 enum { dimension = vigra::ExpandElementResult < T > :: size } ;
399 typedef typename vigra::ExpandElementResult < T > :: type ele_type ;
400
401 // given the elementary type, we define nd_ele_type as a vigra::TinyVector
402 // of ele_type. This is the 'synthetic' type.
403
404 typedef vigra::TinyVector < ele_type , dimension > nd_ele_type ;
405
406 // next we glean the number of elements a 'vector' should contain.
407 // if the template argument 'vsize' was passed as 0, which is the default,
408 // We use the default vector size which simd_traits provides. For
409 // explicitly specified _vsize we take the explicitly specified value.
410
411 enum { size = _vsize == 0
412 ? simd_traits < ele_type > :: default_size
413 : _vsize } ;
414
415 // I prefer to use 'vsize' as it is more specific than mere 'size'
416
417 enum { vsize = size } ;
418
419 // hardware vector register size, if applicable - only used with Vc
420
422
423 // now we obtain the template for a vector of a given size. This will
424 // be either Vc::SimdArray or vspline::simd_type
425
426 template < typename U , size_t sz >
427 using vector = typename simd_traits < U > :: template type < sz > ;
428
429 // using this template and the vectorization width we have established,
430 // we obtain the vectorized data type for a component:
431
433
434 // nd_ele_v is the 'synthetic' vectorized type, which is always a
435 // TinyVector of the vectorized component type, possibly with only
436 // one element:
437
438 typedef vigra::TinyVector < ele_v , dimension > nd_ele_v ;
439
440 // finally, 'type' is the 'canonical' vectorized type, meaning that if
441 // T is a fundamental we produce the component vector type itself, but if
442 // it is some aggregate (like a TinyVector) we produce a TinyVector of the
443 // component vector data type. So if T is float, 'type' is a vector of float,
444 // If T is a TinyVector of one float, 'type' is a TinyVector of one vector
445 // of float.
446
447 typedef typename std::conditional
448 < std::is_fundamental < T > :: value ,
449 ele_v ,
452
453} ;
454
455/// this alias is used as a shorthand to pick the vectorized type
456/// for a given type T and a size N from 'vector_traits':
457
458template < typename T , size_t N >
459using simdized_type = typename vector_traits < T , N > :: type ;
460
461// In order to avoid syntax which is specific to a specific vectorization
462// method, I use some the free function 'assign' for assignments, which
463// avoids member functions of the vector objects. While this produces some
464// notational inconvenience, it allows a formulation which is independent
465// of the vector type used. this way I can use Vc::SimdArray as a target
466// of an assignment from another vectorized data type, which would be
467// impossible with operator=, which has to be a member function.
468
469// the fallback is to use an assignment via operator=. Most of the time,
470// this is defined and works as expected.
471
472template < typename T , typename U >
473void assign ( T & t , const U & u )
474{
475 t = T ( u ) ;
476}
477
478// assignment between two vigra::TinyVectors of equal size. This 'rolls
479// out' the per-element assignment
480
481template < typename T , typename U , int N >
482void assign ( vigra::TinyVector < T , N > & t ,
483 const vigra::TinyVector < U , N > & u )
484{
485 for ( int i = 0 ; i < N ; i++ )
486 vspline::assign ( t [ i ] , u [ i ] ) ;
487}
488
489// conditional assignment as a free function. This is helpful to code
490// uniformly, avoiding the idiomatic difference between
491// if ( p ) t = s ; and t ( p ) = s ;
492
493template < typename VT1 , typename PT , typename VT2 >
494void assign_if ( VT1 & target ,
495 const PT & predicate ,
496 const VT2 & source )
497{
498 target ( predicate ) = source ;
499}
500
501template < typename T >
502void assign_if ( T & target ,
503 const bool & predicate ,
504 const T & source )
505{
506 if ( predicate )
507 target = source ;
508}
509
510} ; // end of namespace vspline
511
512#endif // #ifndef VSPLINE_VECTOR_H
class template simd_type provides a fixed-size container type for small sets of fundamentals which ar...
Definition: simd_type.h:147
definitions common to all files in this project, utility code
SIMD type using highway.
#define VSPLINE_VECTOR_NBYTES
Definition: basis.h:79
void assign(T &t, const U &u)
Definition: vector.h:473
void assign_if(VT1 &target, const PT &predicate, const VT2 &source)
Definition: vector.h:494
typename vector_traits< T, N > ::type simdized_type
this alias is used as a shorthand to pick the vectorized type for a given type T and a size N from 'v...
Definition: vector.h:459
SIMD type using small loops.
traits class simd_traits provides three traits:
Definition: vector.h:219
typename std::conditional< sz==1, T, vspline::simd_type< T, sz > > ::type type
Definition: vector.h:224
static const size_t hsize
Definition: vector.h:226
std::conditional< std::is_fundamental< T >::value, ele_v, nd_ele_v >::type type
Definition: vector.h:451
with the definition of 'simd_traits', we can proceed to implement 'vector_traits': struct vector_trai...
Definition: vector.h:344
vigra::TinyVector< ele_type, 1 > nd_ele_type
Definition: vector.h:358
vigra::TinyVector< ele_v, 1 > nd_ele_v
Definition: vector.h:367
typename simd_traits< U > ::template type< sz > vector
Definition: vector.h:362
vector< ele_type, vsize > ele_v
Definition: vector.h:364
SIMD type derived from Vc::SimdArray.
#define VC_SIMD(T)
Definition: vector.h:248