vspline 1.1.0
Generic C++11 Code for Uniform B-Splines
simd_type.h
Go to the documentation of this file.
1/************************************************************************/
2/* */
3/* vspline - a set of generic tools for creation and evaluation */
4/* of uniform b-splines */
5/* */
6/* Copyright 2015 - 2023 by Kay F. Jahnke */
7/* */
8/* The git repository for this software is at */
9/* */
10/* https://bitbucket.org/kfj/vspline */
11/* */
12/* Please direct questions, bug reports, and contributions to */
13/* */
14/* kfjahnke+vspline@gmail.com */
15/* */
16/* Permission is hereby granted, free of charge, to any person */
17/* obtaining a copy of this software and associated documentation */
18/* files (the "Software"), to deal in the Software without */
19/* restriction, including without limitation the rights to use, */
20/* copy, modify, merge, publish, distribute, sublicense, and/or */
21/* sell copies of the Software, and to permit persons to whom the */
22/* Software is furnished to do so, subject to the following */
23/* conditions: */
24/* */
25/* The above copyright notice and this permission notice shall be */
26/* included in all copies or substantial portions of the */
27/* Software. */
28/* */
29/* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */
30/* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */
31/* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */
32/* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */
33/* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */
34/* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */
35/* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */
36/* OTHER DEALINGS IN THE SOFTWARE. */
37/* */
38/************************************************************************/
39
40/*! \file simd_type.h
41
42 \brief SIMD type using small loops
43
44 vspline can use Vc for explicit vectorization, and at the time of
45 this writing, this is usually the best option. But Vc is not available
46 everywhere, or it's use may be unwanted. To help with such situations,
47 vspline defines it's own 'SIMD' type, which is implemented as a simple
48 C vector and small loops operating on it. If these constructs are
49 compiled with compilers capable of autovectorization (and with the
50 relevent flags activating use of SIMD instruction sets like AVX)
51 the resulting code will oftentimes be 'proper' SIMD code, because
52 the small loops are presented so that the compiler can easily recognize
53 them as potential clients of loop vectorization. I call this technique
54 'goading': By presenting the data flow in deliberately vector-friendly
55 format, the compiler is more likely to 'get it'.
56
57 class template simd_type is designed to provide an interface similar
58 to Vc::SimdArray, to be able to use it as a drop-in replacement.
59 It aims to provide those SIMD capabilities which are actually used by
60 vspline and is *not* a complete replacement for Vc::SimdArray.
61
62 Wherever possible, the code is as simple as possible, avoiding frills
63 and trickery which might keep the compiler from recognizing potentially
64 auto-vectorizable constructs. The resulting code is - in my limited
65 experience - often not too far from explicit SIMD code. Some constructs
66 do actually produce binary which is en par with code using Vc, namely
67 such code which does not use gather, scatter or masked operations.
68 So b-spline prefiltering, restoration of original data, and general
69 filtering is very fast, while code involving b-spline evaluation
70 shows a speed penalty, since vectorized b-spline evaluation (as coded
71 in vspline) relies massively on gather operations of a kind which seem
72 not to be auto-vectorized into binary gather commands - this is my guess,
73 I have not investigated the binary closely.
74
75 The code presented here adds some memory access functions which are
76 not present in Vc::SimdArray, namely strided load/store operations
77 and load/store using functors.
78
79 Note that I use clang++ most of the time, and the code has evolved to
80 produce fast binary with clang++. Your mileage will vary with other
81 compilers.
82
83 Class vspline::simd_type is actually quite similar to vigra::TinyVector
84 which also stores in a plain C array and provides arithmetic. But that
85 type is quite complex, using CRTP with a base class, explicitly coding
86 loop unrolling, catering for deficient compilers and using vigra's
87 sophisticated type promotion mechanism. vspline::simd_type on the other
88 hand is stripped down to the bare essentials, to make the code as simple
89 as possible, in the hope that 'goading' will indeed work. It replaces
90 vspline's previous SIMD type, vspline::simd_tv, which was derived
91 from vigra::TinyVector.
92
93 One word of warning: the lack of type promotion requires you to pick
94 a value_type of sufficient precision and capacity for the intended
95 operation. In other words: you won't get an int when multiplying two
96 shorts.
97
98 Note also that this type is intended for *horizontal* vectorization,
99 and you'll get the best results when picking a vector size which is
100 a small-ish power of two - preferably at least the number of values
101 of the given value_type which a register of the intended vector ISA
102 will contain.
103
104 vspline uses TinyVectors of SIMD data types, but their operations are
105 coded with loops over the TinyVector's elements throughout vspline's
106 code base. In vspline's opt directory, you can find 'xel_of_vector.h',
107 which can provide overloads for all operator functions involving
108 TinyVectors of vspline::simd_type - or, more generally, small
109 aggregates of vector data. Please see this header's comments for
110 more detailed information.
111
112 Note also that throughout vspline, there is almost no explicit use of
113 vspline::simd_type. vspline picks appropriate SIMD data types with
114 mechanisms 'one level up', coded in vector.h. vector.h checks if use
115 of Vc is possible and whether Vc can vectorize a given type, and
116 produces a 'simdized type', which you mustn't confuse with a simd_type.
117*/
118
119#ifndef VSPLINE_SIMD_TYPE_H
120#define VSPLINE_SIMD_TYPE_H
121
122#include <iostream>
123
124namespace vspline
125{
126
127template < typename T >
128using is_scalar = typename std::integral_constant < bool ,
129 std::is_fundamental < T > ::value
130 || std::is_same < T , bool > :: value > :: type ;
131
132/// class template simd_type provides a fixed-size container type for
133/// small sets of fundamentals which are stored in a C vector.
134/// The type offers arithmetic capabilities which are implemented by
135/// using loops over the elements in the vector, expecting that the
136/// compiler will autovectorize these loops into 'proper' SIMD code.
137/// The interface of this type is modelled to be compatible with
138/// Vc's SimdArray. Unfortunately, Vc::SimdArray requires additional
139/// template arguments, so at times it's difficult to use the two types
140/// instead of each other. The interface compatibility does *not* mean
141/// that the arithmetic will produce the same results - this is intended
142/// but neither tested nor enforced.
143
144template < typename _value_type ,
145 std::size_t _vsize >
147{
148 // storage of data is in a simple C array. This array is private,
149 // and the only access to it is via member functions. Using a plain
150 // C array will not overalign the data, even though this may be
151 // desirable, especially on older hardware. We rely on the
152 // compiler to handle this as efficiently as possible.
153
154 _value_type _store [ _vsize ] ;
155
156public:
157
158 // some common typedefs for a container type
159
160 typedef std::size_t size_type ;
161 typedef _value_type value_type ;
162 static const size_type vsize = _vsize ;
163 static const int ivsize = _vsize ; // finessing for g++
164
165 // provide the size as a constexpr
166
167 static constexpr size_type size()
168 {
169 return vsize ;
170 }
171
172 // types used for masks and index vectors. In terms of 'true' SIMD
173 // arithmetics, these definitions may not be optimal - especially the
174 // definition of a mask as a simd_type of bool is questionable - one
175 // might consider using a bit field or a sufficiently large integral
176 // type. But using a simd_type of bool makes processing simple, in
177 // a way it's the 'generic' mask type, whereas SIMD masks used by
178 // the hardware are the truly 'exotic' types. The problem here is
179 // the way C++ encodes booleans - they are usually encoded as some
180 // smallish integral type, rather than a single bit.
181
182 // we define both the 'old school' and the 'camel case' variants
183
186
189
190 // operator[] is mapped to ordinary element access on a C vector.
191 // This is the only place where we explicitly access _store, the
192 // remainder of the code uses operator[].
193
195 {
196 return _store[i] ;
197 }
198
199 const value_type & operator[] ( const size_type & i ) const
200 {
201 return _store[i] ;
202 }
203
204 // assignment from a value_type. The assignment is coded as a loop,
205 // but it should be obvious to the compiler's loop vectorizer that
206 // the loop is a 'SIMD operation in disguise', so here we have the
207 // first appearance of 'goading'.
208
210 {
211 for ( size_type i = 0 ; i < vsize ; i++ )
212 (*this) [ i ] = rhs ;
213 return *this ;
214 }
215
216 // c'tor from value_type. We use the assignment operator for
217 // initialization.
218
219 simd_type ( const value_type & ini )
220 {
221 *this = ini ;
222 }
223
224 // these two c'tors are left in default mode
225
226 simd_type() = default ;
227 simd_type ( const simd_type & ) = default ;
228
229 // assignment from equally-sized container. Most containers use std::size_t
230 // for the template argument defining the number of elements they hold,
231 // but some (notably vigra::TinyVector) use int, which is probably a relic
232 // from times when non-type template arguments were of a restricted type
233 // set only. By providing a specialization for SIZE_TYPE int, we make
234 // equally-sized vigra::TinyVectors permitted initializers.
235 // the c'tor from an equally-sized container also uses the corresponding
236 // operator= overload, so we use one macro for both.
237 // we also need two different variants of vsize for g++; clang++ accepts
238 // size_type vsize for both places where VSZ is used, but g++ requires
239 // an integer.
240 // Note that the rhs can use any elementary type which can be legally
241 // assigned to value_type. This allows transport of information from
242 // differently typed objects, but there are no further constraints on
243 // the types involved, which may degrade precision. It's the user's
244 // responsibility to make sure such assignments have the desired effect
245 // and overload them if necessary.
246
247 #define BUILD_FROM_CONTAINER(SIZE_TYPE,VSZ) \
248 template < typename U , template < typename , SIZE_TYPE > class V > \
249 simd_type & operator= ( const V < U , VSZ > & rhs ) \
250 { \
251 for ( size_type i = 0 ; i < vsize ; i++ ) \
252 (*this) [ i ] = rhs [ i ] ; \
253 return *this ; \
254 } \
255 template < typename U , template < typename , SIZE_TYPE > class V > \
256 simd_type ( const V < U , VSZ > & ini ) \
257 { \
258 *this = ini ; \
259 }
260
261 BUILD_FROM_CONTAINER(std::size_t,vsize)
263
264 #undef BUILD_FROM_CONTAINER
265
266 // this operator= overload caters for Vc::SimdArray as rhs type.
267 // Vc::SimdArray requires additional template arguments - 'an
268 // unfortunate implementation detail shining through'.
269 // We don't want to refer explicitly to Vc::SimdArray here, so
270 // we just code a template which will 'catch' it. In result, we
271 // can now easily assign a Vc::SimdArray to a vspline::simd_type.
272 // The reverse operation can't be coded like this, though, since
273 // assignment operators and c'tors have to be coded as members of
274 // the 'receiving end'. See 'offload' for an efficient way of
275 // moving simd_type data to Vc::SimdArrays
276
277 template < typename U ,
278 typename V ,
279 std::size_t Wt ,
280 template < typename , std::size_t ,
281 typename , std::size_t > class W >
282 simd_type & operator= ( const W < U , vsize , V , Wt > & rhs )
283 {
284 for ( size_type i = 0 ; i < vsize ; i++ )
285 (*this) [ i ] = rhs [ i ] ;
286 return *this ;
287 }
288
289 // corresponding c'tor:
290
291 template < typename U ,
292 typename V ,
293 std::size_t Wt ,
294 template < typename , std::size_t ,
295 typename , std::size_t > class W >
296 simd_type ( const W < U , vsize , V , Wt > & ini )
297 {
298 *this = ini ;
299 }
300
301 static const simd_type iota()
302 {
303 simd_type result ;
304 for ( size_type i = 0 ; i < vsize ; i++ )
305 result [ i ] = value_type ( i ) ;
306 return result ;
307 }
308
309 // mimick Vc's IndexesFromZero. This function produces an index
310 // vector filled with indexes starting with zero.
311
313 {
314 typedef typename index_type::value_type IT ;
315 static const IT ceiling = std::numeric_limits < IT > :: max() ;
316 assert ( ( vsize - 1 ) <= std::size_t ( ceiling ) ) ;
317
318 static const index_type ix ( index_type::iota() ) ;
319 return ix ;
320 }
321
322 // variant which starts from a different starting point and optionally
323 // uses steps other than one.
324
325 static const index_type IndexesFrom ( std::size_t start ,
326 std::size_t step = 1 )
327 {
328 typedef typename index_type::value_type IT ;
329 static const IT ceiling = std::numeric_limits < IT > :: max() ;
330 assert ( start + ( vsize - 1 ) * step <= std::size_t ( ceiling ) ) ;
331
332 return ( IndexesFromZero() * int(step) ) + int(start) ;
333 }
334
335 // functions Zero and One produce simd_type objects filled with
336 // 0, or 1, respectively
337
338 static const simd_type Zero()
339 {
340 return simd_type ( value_type ( 0 ) ) ;
341 }
342
343 static const simd_type One()
344 {
345 return simd_type ( value_type ( 1 ) ) ;
346 }
347
348 // echo the vector to a std::ostream, read it from an istream
349
350 friend std::ostream & operator<< ( std::ostream & osr ,
351 simd_type it )
352 {
353 osr << "(" ;
354 for ( size_type i = 0 ; i < vsize - 1 ; i++ )
355 osr << it [ i ] << ", " ;
356 osr << it [ vsize - 1 ] << ")" ;
357 return osr ;
358 }
359
360 friend std::istream & operator>> ( std::istream & isr ,
361 simd_type it )
362 {
363 for ( size_type i = 0 ; i < vsize ; i++ )
364 isr >> it [ i ] ;
365 return isr ;
366 }
367
368 // memory access functions, which load and store vector data.
369 // We start out with functions transporting data from memory into
370 // the simd_type. Some of these operations have corresponding
371 // c'tors which use the member function to initialize (*this).
372
373 // load uses a simple loop, which is about as easy to recognize as
374 // an autovectorizable construct as it gets:
375
376 void load ( const value_type * const p_src )
377 {
378 for ( size_type i = 0 ; i < vsize ; i++ )
379 (*this) [ i ] = p_src [ i ] ;
380 }
381
382 // generic gather performs the gather operation using a loop.
383 // Rather than loading consecutive values, The offset from 'p_src'
384 // is looked up in 'indexes' for each vector element. We allow
385 // any old indexable type as index_type, not just 'index_type'
386 // defined above.
387
388 template < typename index_type >
389 void gather ( const value_type * const p_src ,
390 const index_type & indexes )
391 {
392 for ( size_type i = 0 ; i < vsize ; i++ )
393 (*this) [ i ] = p_src [ indexes [ i ] ] ;
394 }
395
396 // c'tor from pointer and indexes, uses gather
397
398 template < typename index_type >
399 simd_type ( const value_type * const p_src ,
400 const index_type & indexes )
401 {
402 gather ( p_src , indexes ) ;
403 }
404
405 // store saves the content of the container to memory
406
407 void store ( value_type * const p_trg ) const
408 {
409 for ( size_type i = 0 ; i < vsize ; i++ )
410 p_trg [ i ] = (*this) [ i ] ;
411 }
412
413 // scatter is the reverse operation to gather, see the comments there.
414
415 template < typename index_type >
416 void scatter ( value_type * const p_trg ,
417 const index_type & indexes ) const
418 {
419 for ( size_type i = 0 ; i < vsize ; i++ )
420 p_trg [ indexes [ i ] ] = (*this) [ i ] ;
421 }
422
423 // 'regular' gather and scatter, accessing strided memory so that the
424 // first address visited is p_src/p_trg, and successive addresses are
425 // 'step' apart - in units of T. Might also be done with goading, the
426 // loop should autovectorize.
427
428 void rgather ( const value_type * const p_src ,
429 const int & step )
430 {
431 index_type indexes ( IndexesFrom ( 0 , step ) ) ;
432 gather ( p_src , indexes ) ;
433 }
434
435 void rscatter ( value_type * p_trg ,
436 const int & step ) const
437 {
438 index_type indexes ( IndexesFrom ( 0 , step ) ) ;
439 scatter ( p_trg , indexes ) ;
440 }
441
442 // use 'indexes' to perform a gather from the data held in '(*this)'
443 // and return the result of the gather operation.
444
445 template < typename index_type >
447 {
448 simd_type result ;
449 for ( size_type i = 0 ; i < vsize ; i++ )
450 result [ i ] = (*this) [ indexes [ i ] ] ;
451 return result ;
452 }
453
454 // operator[] with an index_type argument performs the same
455 // operation
456
458 {
459 return shuffle ( indexes ) ;
460 }
461
462 // apply functions from namespace std to each element in a vector,
463 // or to each corresponding set of elements in a set of vectors
464 // - going up to three for fma.
465 // many standard functions autovectorize well. Note that the
466 // autovectorization of standard functions often needs additional
467 // compiler flags, like, e.g., -fno-math-errno for clang++, to
468 // produce hardware SIMD instructions.
469
470 #define BROADCAST_STD_FUNC(FUNC) \
471 friend simd_type FUNC ( simd_type arg ) \
472 { \
473 simd_type result ; \
474 for ( size_type i = 0 ; i < vsize ; i++ ) \
475 result [ i ] = std::FUNC ( arg [ i ] ) ; \
476 return result ; \
477 }
478
480 BROADCAST_STD_FUNC(trunc)
481 BROADCAST_STD_FUNC(round)
482 BROADCAST_STD_FUNC(floor)
487
494
495 #undef BROADCAST_STD_FUNC
496
497 #define BROADCAST_STD_FUNC2(FUNC) \
498 friend simd_type FUNC ( simd_type arg1 , \
499 simd_type arg2 ) \
500 { \
501 simd_type result ; \
502 for ( size_type i = 0 ; i < vsize ; i++ ) \
503 result [ i ] = std::FUNC ( arg1 [ i ] , arg2 [ i ] ) ; \
504 return result ; \
505 }
506
507 // a short note on atan2: Vc provides a hand-written vectorized version
508 // of atan2 which is especially fast and superior to autovectorized code.
509
512
513 #undef BROADCAST_STD_FUNC2
514
515 #define BROADCAST_STD_FUNC3(FUNC) \
516 friend simd_type FUNC ( simd_type arg1 , \
517 simd_type arg2 , \
518 simd_type arg3 ) \
519 { \
520 simd_type result ; \
521 for ( size_type i = 0 ; i < vsize ; i++ ) \
522 result [ i ] = FUNC ( arg1 [ i ] , arg2 [ i ] , arg3[i] ) ; \
523 return result ; \
524 }
525
527
528 #undef BROADCAST_STD_FUNC3
529
530 // macros used for the parameter 'CONSTRAINT' in the definitions
531 // further down. Some operations are only allowed for integral types
532 // or boolans. This might be enforced by enable_if, here we use a
533 // static_assert with a clear error message.
534 // TODO: might relax constraints by using 'std::is_convertible'
535 // TODO: some operators make sense as simply manipulating the 'raw'
536 // bits in the data (e.g. ~ or ^) - I currently limit them to int,
537 // but this might be relaxed, casting the data to some unsigned
538 // integral format of equal size.
539
540 #define INTEGRAL_ONLY \
541 static_assert ( std::is_integral < value_type > :: value , \
542 "this operation is only allowed for integral types" ) ;
543
544 #define BOOL_ONLY \
545 static_assert ( std::is_same < value_type , bool > :: value , \
546 "this operation is only allowed for booleans" ) ;
547
548 // augmented assignment operators. Some operators are only applicable
549 // to specific data types, which is enforced by 'CONSTRAINT'.
550 // One might consider widening the scope by making these operator
551 // functions templates and accepting arbitrary indexable types.
552 // Only value_type and simd_type itself are taken as rhs arguments.
553
554 #define OPEQ_FUNC(OPFUNC,OPEQ,CONSTRAINT) \
555 simd_type & OPFUNC ( value_type rhs ) \
556 { \
557 CONSTRAINT \
558 for ( size_type i = 0 ; i < vsize ; i++ ) \
559 (*this) [ i ] OPEQ rhs ; \
560 return *this ; \
561 } \
562 simd_type & OPFUNC ( simd_type rhs ) \
563 { \
564 CONSTRAINT \
565 for ( size_type i = 0 ; i < vsize ; i++ ) \
566 (*this) [ i ] OPEQ rhs [ i ] ; \
567 return *this ; \
568 }
569
570 OPEQ_FUNC(operator+=,+=,)
571 OPEQ_FUNC(operator-=,-=,)
572 OPEQ_FUNC(operator*=,*=,)
573 OPEQ_FUNC(operator/=,/=,)
574
575 OPEQ_FUNC(operator%=,%=,INTEGRAL_ONLY)
576 OPEQ_FUNC(operator&=,&=,INTEGRAL_ONLY)
577 OPEQ_FUNC(operator|=,|=,INTEGRAL_ONLY)
578 OPEQ_FUNC(operator^=,^=,INTEGRAL_ONLY)
579 OPEQ_FUNC(operator<<=,<<=,INTEGRAL_ONLY)
580 OPEQ_FUNC(operator>>=,>>=,INTEGRAL_ONLY)
581
582 #undef OPEQ_FUNC
583
584 // we use a simple scheme for type promotion: the promoted type
585 // of two values should be the same as the type we would receive
586 // when adding the two values. That's standard C semantics, but
587 // it won't widen the result type to avoid overflow or increase
588 // precision - such conversions have to be made by user code if
589 // necessary.
590
591 #define C_PROMOTE(A,B) \
592 typename std::conditional \
593 < std::is_same < A , B > :: value , \
594 A , \
595 decltype ( std::declval < A > () \
596 + std::declval < B > () ) \
597 > :: type
598
599 // binary operators and left and right scalar operations with
600 // value_type, unary operators -, ! and ~
601
602#define OP_FUNC(OPFUNC,OP,CONSTRAINT) \
603 template < typename RHST , \
604 typename = typename std::enable_if \
605 < is_scalar < RHST > :: value \
606 > :: type \
607 > \
608 simd_type < C_PROMOTE ( value_type , RHST ) , vsize > \
609 OPFUNC ( simd_type < RHST , vsize > rhs ) const \
610 { \
611 CONSTRAINT \
612 simd_type < C_PROMOTE ( value_type , RHST ) , vsize > help ( *this ) ; \
613 for ( size_type i = 0 ; i < vsize ; i++ ) \
614 help [ i ] = ( (*this) [ i ] OP rhs [ i ] ) ; \
615 return help ; \
616 } \
617 template < typename RHST , \
618 typename = typename std::enable_if \
619 < is_scalar < RHST > :: value \
620 > :: type \
621 > \
622 simd_type < C_PROMOTE ( value_type , RHST ) , vsize > \
623 OPFUNC ( RHST rhs ) const \
624 { \
625 CONSTRAINT \
626 simd_type < C_PROMOTE ( value_type , RHST ) , vsize > help ( *this ) ; \
627 for ( size_type i = 0 ; i < vsize ; i++ ) \
628 help [ i ] = ( (*this) [ i ] OP rhs ) ; \
629 return help ; \
630 } \
631 template < typename LHST , \
632 typename = typename std::enable_if \
633 < is_scalar < LHST > :: value \
634 > :: type \
635 > \
636 friend simd_type < C_PROMOTE ( LHST , value_type ) , vsize > \
637 OPFUNC ( LHST lhs , simd_type rhs ) \
638 { \
639 CONSTRAINT \
640 simd_type < C_PROMOTE ( LHST , value_type ) , vsize > help ( lhs ) ; \
641 for ( size_type i = 0 ; i < vsize ; i++ ) \
642 help [ i ] = ( lhs OP rhs [ i ] ) ; \
643 return help ; \
644 }
645
646 OP_FUNC(operator+,+,)
647 OP_FUNC(operator-,-,)
648 OP_FUNC(operator*,*,)
649 OP_FUNC(operator/,/,)
650
651 OP_FUNC(operator%,%,INTEGRAL_ONLY)
652 OP_FUNC(operator&,&,INTEGRAL_ONLY)
653 OP_FUNC(operator|,|,INTEGRAL_ONLY)
654 OP_FUNC(operator^,^,INTEGRAL_ONLY)
655 OP_FUNC(operator<<,<<,INTEGRAL_ONLY)
656 OP_FUNC(operator>>,>>,INTEGRAL_ONLY)
657
658 OP_FUNC(operator&&,&&,BOOL_ONLY)
659 OP_FUNC(operator||,||,BOOL_ONLY)
660
661 #undef OP_FUNC
662
663 #define OP_FUNC(OPFUNC,OP,CONSTRAINT) \
664 simd_type OPFUNC() const \
665 { \
666 simd_type help ; \
667 for ( size_type i = 0 ; i < vsize ; i++ ) \
668 help [ i ] = OP (*this) [ i ] ; \
669 return help ; \
670 }
671
672 OP_FUNC(operator-,-,)
673 OP_FUNC(operator!,!,BOOL_ONLY)
674 OP_FUNC(operator~,~,INTEGRAL_ONLY)
675
676 #undef OP_FUNC
677
678 // provide methods to produce a mask on comparing a vector
679 // with another vector or a value_type.
680
681 // #define COMPARE_FUNC(OP,OPFUNC) \
682 // friend mask_type OPFUNC ( simd_type lhs , \
683 // simd_type rhs ) \
684 // { \
685 // mask_type m ; \
686 // for ( size_type i = 0 ; i < vsize ; i++ ) \
687 // m [ i ] = ( lhs [ i ] OP rhs [ i ] ) ; \
688 // return m ; \
689 // } \
690 // friend mask_type OPFUNC ( simd_type lhs , \
691 // value_type rhs ) \
692 // { \
693 // mask_type m ; \
694 // for ( size_type i = 0 ; i < vsize ; i++ ) \
695 // m [ i ] = ( lhs [ i ] OP rhs ) ; \
696 // return m ; \
697 // } \
698 // friend mask_type OPFUNC ( value_type lhs , \
699 // simd_type rhs ) \
700 // { \
701 // mask_type m ; \
702 // for ( size_type i = 0 ; i < vsize ; i++ ) \
703 // m [ i ] = ( lhs OP rhs [ i ] ) ; \
704 // return m ; \
705 // }
706
707#define COMPARE_FUNC(OPFUNC,OP) \
708 template < typename RHST , \
709 typename = typename std::enable_if \
710 < is_scalar < RHST > :: value \
711 > :: type \
712 > \
713 mask_type OPFUNC ( simd_type < RHST , vsize > rhs ) const \
714 { \
715 mask_type m ; \
716 for ( size_type i = 0 ; i < vsize ; i++ ) \
717 m [ i ] = ( (*this) [ i ] OP rhs [ i ] ) ; \
718 return m ; \
719 } \
720 template < typename RHST , \
721 typename = typename std::enable_if \
722 < is_scalar < RHST > :: value \
723 > :: type \
724 > \
725 mask_type OPFUNC ( RHST rhs ) const \
726 { \
727 mask_type m ; \
728 for ( size_type i = 0 ; i < vsize ; i++ ) \
729 m [ i ] = ( (*this) [ i ] OP rhs ) ; \
730 return m ; \
731 } \
732 template < typename LHST , \
733 typename = typename std::enable_if \
734 < is_scalar < LHST > :: value \
735 > :: type \
736 > \
737 friend mask_type OPFUNC ( LHST lhs , simd_type rhs ) \
738 { \
739 mask_type m ; \
740 for ( size_type i = 0 ; i < vsize ; i++ ) \
741 m [ i ] = ( lhs OP rhs [ i ] ) ; \
742 return m ; \
743 }
744
745 COMPARE_FUNC(operator<,<) ;
746 COMPARE_FUNC(operator<=,<=) ;
747 COMPARE_FUNC(operator>,>) ;
748 COMPARE_FUNC(operator>=,>=) ;
749 COMPARE_FUNC(operator==,==) ;
750 COMPARE_FUNC(operator!=,!=) ;
751
752 #undef COMPARE_FUNC
753
754 // next we define a masked vector as an object holding two pieces of
755 // information: one mask, determining which of the vector's elements
756 // will be 'open' to an effect, and one reference to a vector, which
757 // will be affected by an assignment or augmented assignment to the
758 // masked_type object. The mask is not held by reference - it is
759 // typically created right inside an invocation of the V(M) = ...
760 // idiom and will only persist when copied to the masked_type object.
761 // The compiler will take care of the masks's lifetime and storage and
762 // make sure the process is efficient and avoids unnecessary copying.
763 // The resulting object will only be viable as long as the referred-to
764 // vector is kept 'alive' - it's meant as a construct to be processed
765 // in the same scope, as the lhs of an assignment, typically using
766 // notation introduced by Vc: a vector's operator() is overloaded to
767 // to produce a masked_type when called with a mask_type object, and
768 // the resulting masked_type object is then assigned to - the V(M)=...
769 // idiom mentioned above.
770 // Note that this does not have any effect on those values in 'whither'
771 // for which the mask is false. They remain unchanged.
772
774 {
775 mask_type whether ; // if the mask is true at whether[i]
776 simd_type & whither ; // whither[i] will be assigned to
777
779 simd_type & _whither )
780 : whether ( _whether ) ,
781 whither ( _whither )
782 { }
783
784 // for the masked vector, we define the complete set of assignments:
785
786 #define OPEQ_FUNC(OPFUNC,OPEQ,CONSTRAINT) \
787 simd_type & OPFUNC ( value_type rhs ) \
788 { \
789 CONSTRAINT \
790 for ( size_type i = 0 ; i < vsize ; i++ ) \
791 { \
792 if ( whether [ i ] ) \
793 whither [ i ] OPEQ rhs ; \
794 } \
795 return whither ; \
796 } \
797 simd_type & OPFUNC ( simd_type rhs ) \
798 { \
799 CONSTRAINT \
800 for ( size_type i = 0 ; i < vsize ; i++ ) \
801 { \
802 if ( whether [ i ] ) \
803 whither [ i ] OPEQ rhs [ i ] ; \
804 } \
805 return whither ; \
806 }
807
808 OPEQ_FUNC(operator=,=,)
809 OPEQ_FUNC(operator+=,+=,)
810 OPEQ_FUNC(operator-=,-=,)
811 OPEQ_FUNC(operator*=,*=,)
812 OPEQ_FUNC(operator/=,/=,)
813 OPEQ_FUNC(operator%=,%=,INTEGRAL_ONLY)
814 OPEQ_FUNC(operator&=,&=,INTEGRAL_ONLY)
815 OPEQ_FUNC(operator|=,|=,INTEGRAL_ONLY)
816 OPEQ_FUNC(operator^=,^=,INTEGRAL_ONLY)
817 OPEQ_FUNC(operator<<=,<<=,INTEGRAL_ONLY)
818 OPEQ_FUNC(operator>>=,>>=,INTEGRAL_ONLY)
819
820 #undef OPEQ_FUNC
821
822 #undef INTEGRAL_ONLY
823 #undef BOOL_ONLY
824
825 } ;
826
827 // mimicking Vc, we define operator() with a mask_type argument
828 // to produce a masked_type object, which can be used later on to
829 // masked-assign to the referred-to vector. With this definition
830 // we can use the same syntax Vc uses, e.g. v1 ( v1 > v2 ) = v3
831 // This helps write code which compiles with Vc and without,
832 // because this idiom is 'very Vc'.
833
835 {
836 return masked_type ( mask , *this ) ;
837 }
838
839 // next we have a few functions creating masks - mainly for Vc
840 // compatibility.
841
842 static mask_type isnegative ( const simd_type & rhs )
843 {
844 return ( rhs < value_type(0) ) ;
845 }
846
847 static mask_type isfinite ( const simd_type & rhs )
848 {
849 for ( std::size_t i = 0 ; i < vsize ; i++ )
850 {
851 if ( isInf ( rhs[i] ) )
852 return false ;
853 }
854 return true ;
855 }
856
857 static mask_type isnan ( const simd_type & rhs )
858 {
859 for ( std::size_t i = 0 ; i < vsize ; i++ )
860 {
861 if ( isNan ( rhs[i] ) )
862 return true ;
863 }
864 return false ;
865 }
866
867 // member functions at_least and at_most. These functions provide the
868 // same functionality as max, or min, respectively. Given simd_type X
869 // and some threshold Y, X.at_least ( Y ) == max ( X , Y )
870 // Having the functionality as a member function makes it easy to
871 // implement, e.g., min as: min ( X , Y ) { return X.at_most ( Y ) ; }
872
873 #define CLAMP(FNAME,REL) \
874 simd_type FNAME ( simd_type threshold ) const \
875 { \
876 simd_type result ( threshold ) ; \
877 for ( std::size_t i = 0 ; i < vsize ; i++ ) \
878 { \
879 if ( (*this) [ i ] REL threshold ) \
880 result [ i ] = (*this) [ i ] ; \
881 } \
882 return result ; \
883 }
884
885 CLAMP(at_least,>)
886 CLAMP(at_most,<)
887
888 #undef CLAMP
889
890 // sum of vector elements. Note that there is no type promotion; the
891 // summation is done to value_type. Caller must make sure that overflow
892 // is not a problem.
893
895 {
896 value_type s ( 0 ) ;
897 for ( std::size_t e = 0 ; e < vsize ; e++ )
898 s += (*this) [ e ] ;
899 return s ;
900 }
901} ;
902
903// reductions for masks. It's often necessary to determine whether
904// a mask is completely full or empty, or has at least some non-false
905// members. We allow arbitrary simd_type to be testes with any_of,
906// all_of and none_of, rather than enforcing that only 'proper' masks
907// (namely simd_type < bool , vsize >) can be tested.
908
909template < typename T , std::size_t vsize >
911{
912 bool result = false ;
913 for ( std::size_t i = 0 ; i < vsize ; i++ )
914 result |= bool ( arg [ i ] ) ;
915 return result ;
916}
917
918template < typename T , std::size_t vsize >
920{
921 bool result = true ;
922 for ( std::size_t i = 0 ; i < vsize ; i++ )
923 result &= bool ( arg [ i ] ) ;
924 return result ;
925}
926
927template < typename T , std::size_t vsize >
929{
930 bool result = true ;
931 for ( std::size_t i = 0 ; i < vsize ; i++ )
932 result &= ( ! bool ( arg [ i ] ) ) ;
933 return result ;
934}
935
936} ;
937
938template < typename T , std::size_t N >
939struct std::allocator_traits < vspline::simd_type < T , N > >
940{
941 typedef std::allocator < vspline::simd_type < T , N > > type ;
942} ;
943
944#endif // #define VSPLINE_SIMD_TYPE_H
class template simd_type provides a fixed-size container type for small sets of fundamentals which ar...
Definition: simd_type.h:147
COMPARE_FUNC(operator<,<)
void load(const value_type *const p_src)
Definition: simd_type.h:376
COMPARE_FUNC(operator!=,!=)
friend simd_type abs(simd_type arg)
std::size_t size_type
Definition: simd_type.h:160
void rscatter(value_type *p_trg, const int &step) const
Definition: simd_type.h:435
static const int ivsize
Definition: simd_type.h:163
simd_type< bool, vsize > MaskType
Definition: simd_type.h:187
value_type sum() const
Definition: simd_type.h:894
void store(value_type *const p_trg) const
Definition: simd_type.h:407
static mask_type isfinite(const simd_type &rhs)
Definition: simd_type.h:847
_value_type value_type
Definition: simd_type.h:161
simd_type(const simd_type &)=default
COMPARE_FUNC(operator>,>)
COMPARE_FUNC(operator>=,>=)
COMPARE_FUNC(operator==,==)
friend std::istream & operator>>(std::istream &isr, simd_type it)
Definition: simd_type.h:360
void rgather(const value_type *const p_src, const int &step)
Definition: simd_type.h:428
static const simd_type Zero()
Definition: simd_type.h:338
static constexpr size_type size()
Definition: simd_type.h:167
simd_type(const value_type &ini)
Definition: simd_type.h:219
static const index_type IndexesFromZero()
Definition: simd_type.h:312
void scatter(value_type *const p_trg, const index_type &indexes) const
Definition: simd_type.h:416
friend std::ostream & operator<<(std::ostream &osr, simd_type it)
Definition: simd_type.h:350
simd_type & operator=(const value_type &rhs)
Definition: simd_type.h:209
static const simd_type iota()
Definition: simd_type.h:301
simd_type< bool, vsize > mask_type
Definition: simd_type.h:184
value_type & operator[](const size_type &i)
Definition: simd_type.h:194
static mask_type isnegative(const simd_type &rhs)
Definition: simd_type.h:842
simd_type< int, vsize > IndexType
Definition: simd_type.h:188
COMPARE_FUNC(operator<=,<=)
static const size_type vsize
Definition: simd_type.h:162
simd_type shuffle(index_type indexes)
Definition: simd_type.h:446
static mask_type isnan(const simd_type &rhs)
Definition: simd_type.h:857
static const index_type IndexesFrom(std::size_t start, std::size_t step=1)
Definition: simd_type.h:325
simd_type(const value_type *const p_src, const index_type &indexes)
Definition: simd_type.h:399
simd_type< int, vsize > index_type
Definition: simd_type.h:185
void gather(const value_type *const p_src, const index_type &indexes)
Definition: simd_type.h:389
masked_type operator()(mask_type mask)
Definition: simd_type.h:834
static const simd_type One()
Definition: simd_type.h:343
@ vsize
Definition: eval.cc:96
Definition: basis.h:79
bool any_of(simd_type< T, vsize > arg)
Definition: simd_type.h:910
bool none_of(simd_type< T, vsize > arg)
Definition: simd_type.h:928
bool all_of(simd_type< T, vsize > arg)
Definition: simd_type.h:919
typename std::integral_constant< bool, std::is_fundamental< T > ::value||std::is_same< T, bool > ::value > ::type is_scalar
Definition: simd_type.h:130
#define BUILD_FROM_CONTAINER(SIZE_TYPE, VSZ)
Definition: simd_type.h:247
#define BROADCAST_STD_FUNC3(FUNC)
Definition: simd_type.h:515
#define OPEQ_FUNC(OPFUNC, OPEQ, CONSTRAINT)
Definition: simd_type.h:786
#define BROADCAST_STD_FUNC(FUNC)
Definition: simd_type.h:470
#define OP_FUNC(OPFUNC, OP, CONSTRAINT)
Definition: simd_type.h:663
#define CLAMP(FNAME, REL)
Definition: simd_type.h:873
#define INTEGRAL_ONLY
Definition: simd_type.h:540
#define BOOL_ONLY
Definition: simd_type.h:544
#define BROADCAST_STD_FUNC2(FUNC)
Definition: simd_type.h:497
std::allocator< vspline::simd_type< T, N > > type
Definition: simd_type.h:941
masked_type(mask_type _whether, simd_type &_whither)
Definition: simd_type.h:778