vspline 1.1.0
Generic C++11 Code for Uniform B-Splines
convolve.h
Go to the documentation of this file.
1/************************************************************************/
2/* */
3/* vspline - a set of generic tools for creation and evaluation */
4/* of uniform b-splines */
5/* */
6/* Copyright 2015 - 2023 by Kay F. Jahnke */
7/* */
8/* The git repository for this software is at */
9/* */
10/* https://bitbucket.org/kfj/vspline */
11/* */
12/* Please direct questions, bug reports, and contributions to */
13/* */
14/* kfjahnke+vspline@gmail.com */
15/* */
16/* Permission is hereby granted, free of charge, to any person */
17/* obtaining a copy of this software and associated documentation */
18/* files (the "Software"), to deal in the Software without */
19/* restriction, including without limitation the rights to use, */
20/* copy, modify, merge, publish, distribute, sublicense, and/or */
21/* sell copies of the Software, and to permit persons to whom the */
22/* Software is furnished to do so, subject to the following */
23/* conditions: */
24/* */
25/* The above copyright notice and this permission notice shall be */
26/* included in all copies or substantial portions of the */
27/* Software. */
28/* */
29/* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND */
30/* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES */
31/* OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND */
32/* NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT */
33/* HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, */
34/* WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING */
35/* FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR */
36/* OTHER DEALINGS IN THE SOFTWARE. */
37/* */
38/************************************************************************/
39
40/*! \file convolve.h
41
42 \brief separable convolution of nD arrays
43
44 This file provides the core filtering code for convolution, which
45 can be used by itself to filter 1D arrays, or is used with the
46 'wielding' code in filter.h to filter nD arrays. The latter use is
47 what's used throughout most of vspline, since it provides automatic
48 multithreading and vectorization by buffering the data and applying
49 the 1D code to the buffer.
50
51 The implementation of convolution in this file can safely operate
52 in-place. The actual convolution operation is done using a small
53 kernel-sized circular buffer, which is multiplied with an adequately
54 shifted and rotated representation of the kernel. This is done
55 avoiding conditionals as best as possible. The 1D data are extrapolated
56 with one of the boundary condition codes known to class extrapolator
57 (see extrapolate.h). This is done transparently by putting extrapolated
58 data into the small circular buffer where this is needed.
59
60 The code is trivial insofar as it only uses indexed assignments, addition
61 and multiplication. So it can operate on a wide variety of data types,
62 prominently among them SIMD vector types.
63
64 Note how I use the kernel front-to-back, in the same forward sequence as
65 the data it is applied to. This is contrary to the normal convention of
66 using the kernel values back-to-front. Inside vspline, where only
67 symmetrical kernels are used, this makes no difference, but when vpline's
68 convolution code is used for other convolutions, this has to be kept in
69 mind.
70*/
71
72#ifndef VSPLINE_CONVOLVE_H
73#define VSPLINE_CONVOLVE_H
74
75#include "common.h"
76#include "extrapolate.h"
77#include "filter.h"
78
79namespace vspline {
80
81/// fir_filter_specs holds the parameters for a filter performing
82/// a convolution along a single axis. In vspline, the place where
83/// the specifications for a filter are fixed and the place where
84/// it is finally created are far apart: the filter is created
85/// in the separate worker threads. So this structure serves as
86/// a vehicle to transport the arguments.
87/// Note the specification of 'headroom': this allows for
88/// non-symmetrical and even kernels. When applying the kernel
89/// to obtain output[i], the kernel is applied to
90/// input [ i - headroom ] , ... , input [ i - headroom + ksize - 1 ]
91
93{
94 vspline::bc_code bc ; // boundary conditions
95 int ksize ; // kernel size
96 int headroom ; // part of kernel 'to the left'
97 const xlf_type * kernel ; // pointer to kernel values
98
100 int _ksize ,
101 int _headroom ,
102 const xlf_type * _kernel )
103 : bc ( _bc ) ,
104 ksize ( _ksize ) ,
105 headroom ( _headroom ) ,
106 kernel ( _kernel )
107 {
108 assert ( headroom < ksize ) ;
109 } ;
110} ;
111
112/// class fir_filter provides the 'solve' routine which convolves
113/// a 1D signal with selectable extrapolation. Here, the convolution
114/// kernel is applied to the incoming signal and the result is written
115/// to the specified output location. Note that this operation
116/// can be done in-place, but input and output may also be different.
117/// While most of the time this routine will be invoked by class
118/// convolve (below), it is also directly used by the specialized
119/// code for 1D filtering.
120/// Note how we conveniently inherit from the specs class. This also
121/// enables us to use an instance of fir_filter or class convolve
122/// as specs argument to create further filters with the same arguments.
123
124// TODO: some kernels are symmetric, which might be exploited.
125
126// TODO: special code for filters with 0-valued coefficients, like
127// sinc-derived half band filters
128
129template < typename in_type ,
130 typename out_type = in_type ,
131 typename _math_type = out_type >
133: public fir_filter_specs
134{
135 // this filter type does not need storage of intermediate results.
136
137 static const bool is_single_pass { true } ;
138
139 typedef vigra::MultiArrayView < 1 , in_type > in_buffer_type ;
140 typedef vigra::MultiArrayView < 1 , out_type > out_buffer_type ;
141 typedef _math_type math_type ;
142
143 // we put all state data into a single area of memory called 'reactor'.
144 // The separate parts holding the small circular buffer, the repeated
145 // kernel and the tail buffer are implemented as views to 'reactor'.
146 // This way, all data participating in the arithmetics are as close
147 // together in memory as possible.
148 // note how the current implementation does therefore hold the kernel
149 // values in the 'reactor' as simdized types (if math_type is simdized).
150 // this may be suboptimal, since the kernel values might be supplied
151 // as scalars and could be kept in a smaller area of memory.
152 // TODO: investigate
153
155 = typename vspline::allocator_traits < math_type > :: type ;
156
157 vigra::MultiArray < 1 , math_type , allocator_t > reactor ;
158 vigra::MultiArrayView < 1 , math_type > circular_buffer ;
159 vigra::MultiArrayView < 1 , math_type > kernel_values ;
160 vigra::MultiArrayView < 1 , math_type > tail_buffer ;
161
162
163 fir_filter ( const fir_filter_specs & specs )
164 : fir_filter_specs ( specs ) ,
165 reactor ( vigra::Shape1 ( specs.ksize * 4 ) )
166 {
167 circular_buffer = reactor.subarray
168 ( vigra::Shape1 ( 0 ) , vigra::Shape1 ( ksize ) ) ;
169
170 kernel_values = reactor.subarray
171 ( vigra::Shape1 ( ksize ) , vigra::Shape1 ( ksize * 3 ) ) ;
172
173 tail_buffer = reactor.subarray
174 ( vigra::Shape1 ( ksize * 3 ) , vigra::Shape1 ( ksize * 4 ) ) ;
175
176 for ( int i = 0 ; i < ksize ; i++ )
177 kernel_values [ i ] = kernel_values [ i + ksize ] = kernel [ i ] ;
178 } ;
179
180 /// calling code may have to set up buffers with additional
181 /// space around the actual data to allow filtering code to
182 /// 'run up' to the data, shedding margin effects in the
183 /// process. We stay on the safe side and return the width
184 /// of the whole kernel, which is always sufficient to
185 /// provide safe runup.
186
188 {
189 return ksize ;
190 }
191
192 /// public 'solve' routine. This is for calls 'from outside',
193 /// like when this object is used by itself, not as a base class
194 /// of class convolve below.
195 /// an extrapolator for the boundary condition code 'bc'
196 /// (see fir_filter_specs) is made, then the call is delegated
197 /// to the protected routine below which accepts an extrapolator
198 /// on top of input and output.
199
200 void solve ( const in_buffer_type & input ,
201 out_buffer_type & output )
202 {
203 int size = output.size() ;
204 extrapolator < in_buffer_type > source ( bc , input ) ;
205 solve ( input , output , source ) ;
206 }
207
208protected:
209
210 /// protected solve routine taking an extrapolator on top of
211 /// input and output. This way, the derived class (class convolve)
212 /// can maintain an extrapolator fixed to it's buffer and reuse
213 /// it for subsequent calls to this routine.
214 /// we use the following strategy:
215 /// - keep a small circular buffer as large as the kernel
216 /// - have two kernels concatenated in another buffer
217 /// - by pointing into the concatenated kernels, we can always
218 /// have ksize kernel values in sequence so that this sequence
219 /// is correct for the values in the circular buffer.
220 /// this strategy avoids conditionals as best as possible and
221 /// should be easy to optimize. the actual code is a bit more
222 /// complex to account for the fact that at the beginning and
223 /// end of the data, a few extrapolated values are used. The
224 /// central loop can directly read from input without using the
225 /// extrapolator, which is most efficient.
226
227 void solve ( const in_buffer_type & input ,
228 out_buffer_type & output ,
229 const extrapolator < in_buffer_type > & source )
230 {
231 if ( ksize < 1 )
232 {
233 // if kernel size is zero or even negative, then,
234 // if operation isn't in-place, copy input to output
235
236 if ( (void*) ( input.data() ) != (void*) ( output.data() ) )
237 {
238 for ( std::ptrdiff_t i = 0 ; i < output.size() ; i++ )
239 output[i] = out_type ( input[i] ) ;
240 }
241
242 return ; // we're done prematurely
243 }
244 else if ( ksize == 1 )
245 {
246 // for kernel size 1 we perform the multiplication of the
247 // single kernel value with the input in a simple loop without
248 // using the circular buffering mechanism below. This is an
249 // optimization, the circular buffer code can also handle
250 // single-value kernels.
251
252 math_type factor ( kernel[0] ) ;
253
254 for ( std::ptrdiff_t i = 0 ; i < output.size() ; i++ )
255 output[i] = out_type ( factor * math_type ( input[i] ) ) ;
256
257 return ; // we're done prematurely
258 }
259
260 int si = - headroom ; // read position
261 int ti = 0 ; // store position
262
263 // initialize circular buffer using the extrapolator
264 // note: initially I coded to fetch only the first 'headroom'
265 // values from the extrapolator, then up to ksize straight
266 // from 'input'. but this is *not* correct: 'input' may by
267 // very small, and with a large kernel we also need the
268 // extrapolator further on after the input is already
269 // consumed. So this is the correct way of doing it:
270
271 for ( int i = 0 ; i < ksize ; i++ , si++ )
272 circular_buffer[i] = source ( si ) ;
273
274 // see how many full cycles we can run, directly accessing
275 // 'input' without resorting to extrapolation
276
277 int size = output.size() ;
278 int leftover = size - si ;
279 int full_cycles = 0 ;
280 if ( leftover > 0 )
281 full_cycles = leftover / ksize ;
282
283 // stash the trailing extrapolated values: we want to be able
284 // to operate in-place, and if we write to the buffer we can't
285 // use the extrapolator over it anymore. note how we only fill
286 // in ksize - headroom values. this is all we'll need, the buffer
287 // may be slightly larger.
288
289 int ntail = ksize - headroom ;
290 int z = size ;
291 for ( int i = 0 ; i < ntail ; i++ , z++ )
292 tail_buffer[i] = source ( z ) ;
293
294 // central loop, reading straight from input without extrapolation
295
296 for ( int cycle = 0 ; cycle < full_cycles ; cycle++ )
297 {
298 auto p_kernel = kernel_values.data() + ksize ;
299 auto p_data = circular_buffer.data() ;
300
301 for ( int i = 0 ; i < ksize ; )
302 {
303 // perform the actual convolution
304 // TODO: exploit symmetry
305
306 math_type result = circular_buffer[0] * p_kernel[0] ;
307
308 // KFJ 2019-02-12 tentative use of fma
309
310#ifdef USE_FMA
311 for ( int j = 1 ; j < ksize ; j++ )
312 result = fma ( circular_buffer[j] , p_kernel[j] , result ) ;
313#else
314 for ( int j = 1 ; j < ksize ; j++ )
315 result += circular_buffer[j] * p_kernel[j] ;
316#endif
317
318 // stash result
319
320 output [ ti ] = out_type ( result ) ;
321
322 // fetch next input value
323
324 * p_data = input [ si ] ;
325
326 // adjust pointers and indices
327
328 ++ si ;
329 ++ ti ;
330 ++ i ;
331
332 if ( i == ksize )
333 break ;
334
335 ++ p_data ;
336 -- p_kernel ;
337 }
338 }
339
340 // produce the last few values, resorting to tail_buffer
341 // where it is necessary
342
343 while ( ti < size )
344 {
345 auto p_kernel = kernel_values.data() + ksize ;
346 auto p_data = circular_buffer.data() ;
347
348 for ( int i = 0 ; i < ksize && ti < size ; i++ )
349 {
350 math_type result = circular_buffer[0] * p_kernel[0] ;
351 for ( int j = 1 ; j < ksize ; j++ )
352 result += circular_buffer[j] * p_kernel[j] ;
353
354 output [ ti ] = out_type ( result ) ;
355
356 if ( si < size )
357 // still sweet
358 * p_data = input [ si ] ;
359 else
360 // input used up, use stashed extrapolated values
361 * p_data = tail_buffer [ si - size ] ;
362
363 ++ si ;
364 ++ ti ;
365
366 ++ p_data ;
367 -- p_kernel ;
368 }
369 }
370 }
371} ;
372
373/// class convolve provides the combination of class fir_filter
374/// above with a vector-friendly buffer. Calling code provides
375/// information about what should be buffered, the data are sucked
376/// into the buffer, filtered, and moved back from there.
377/// The operation is orchestrated by the code in filter.h, which
378/// is also used to 'wield' the b-spline prefilter. Both operations
379/// are sufficiently similar to share the wielding code.
380
381template < template < typename , size_t > class _vtype ,
382 typename _math_ele_type ,
383 size_t _vsize >
385: public buffer_handling < _vtype , _math_ele_type , _vsize > ,
386 public vspline::fir_filter < _vtype < _math_ele_type , _vsize > >
387{
388 // provide this type for queries
389
390 typedef _math_ele_type math_ele_type ;
391
392 // we'll use a few types from the buffer_handling type
393
396
397 using typename buffer_handling_type::vtype ;
400
401 // instances of class convolve hold the buffer as state:
402
404 = typename vspline::allocator_traits < vtype > :: type ;
405
406 typedef vigra::MultiArray < 1 , vtype , allocator_t > buffer_type ;
407 typedef vigra::MultiArrayView < 1 , vtype > buffer_view_type ;
408
410
411 // and also an extrapolator, which is fixed to the buffer
412
414
415 // the filter's 'solve' routine has the workhorse code to filter
416 // the data inside the buffer:
417
418 typedef _vtype < _math_ele_type , _vsize > simdized_math_type ;
420 using filter_type::solve ;
422
423 // by defining arg_type, we allow code to infer what type of
424 // initializer ('specs') the filter takes
425
427
428 // the constructor invokes the filter's constructor,
429 // sets up the buffer and initializes the buffer_handling
430 // component to use the whole buffer to accept incoming and
431 // provide outgoing data.
432
433 convolve ( const fir_filter_specs & specs , size_t size )
434 : filter_type ( specs ) ,
435 buffer ( size ) ,
436 buffer_extrapolator ( specs.bc , buffer )
437 {
438 init ( buffer , buffer ) ;
439 } ;
440
441 // operator() simply delegates to the filter's 'solve' routine,
442 // which filters the data in the buffer. Note how the solve
443 // overload accepting an extrapolator is used: the extrapolator
444 // remains the same, so there's no point creating a new one
445 // with every call.
446
448 {
450 }
451
452 // factory function to provide a filter with the same set of
453 // parameters, but possibly different data types. this is used
454 // for processing of 1D data, where the normal buffering mechanism
455 // may be sidestepped.
456
457 template < typename in_type ,
458 typename out_type = in_type ,
459 typename math_type = out_type >
462 {
464 ( specs ) ;
465 }
466
467} ;
468
469} ; // namespace vspline
470
471#endif // VSPLINE_CONVOLVE_H
buffer_handling provides services needed for interfacing with a buffer of simdized/goading data....
Definition: filter.h:227
void init(vigra::MultiArrayView< 1, vtype > &_in_window, vigra::MultiArrayView< 1, vtype > &_out_window)
Definition: filter.h:237
definitions common to all files in this project, utility code
extrapolation of 1D data sets with specific boundary conditions
generic implementation of separable filtering for nD arrays
Definition: basis.h:79
long double xlf_type
Definition: common.h:102
bc_code
This enumeration is used for codes connected to boundary conditions. There are two aspects to boundar...
Definition: common.h:71
vspline creates vigra::MultiArrays of vectorized types. As long as the vectorized types are Vc::SimdA...
Definition: common.h:267
class convolve provides the combination of class fir_filter above with a vector-friendly buffer....
Definition: convolve.h:387
fir_filter_specs arg_type
Definition: convolve.h:426
extrapolator< buffer_view_type > buffer_extrapolator
Definition: convolve.h:413
buffer_type buffer
Definition: convolve.h:409
vigra::MultiArrayView< 1, vtype > buffer_view_type
Definition: convolve.h:407
convolve(const fir_filter_specs &specs, size_t size)
Definition: convolve.h:433
void operator()()
Definition: convolve.h:447
void solve(const in_buffer_type &input, out_buffer_type &output)
public 'solve' routine. This is for calls 'from outside', like when this object is used by itself,...
Definition: convolve.h:200
buffer_handling< _vtype, _math_ele_type, _vsize > buffer_handling_type
Definition: convolve.h:395
typename vspline::allocator_traits< vtype > ::type allocator_t
Definition: convolve.h:404
_vtype< _math_ele_type, _vsize > simdized_math_type
Definition: convolve.h:418
vigra::MultiArray< 1, vtype, allocator_t > buffer_type
Definition: convolve.h:406
_math_ele_type math_ele_type
Definition: convolve.h:390
static vspline::fir_filter< in_type, out_type, math_type > get_raw_filter(const fir_filter_specs &specs)
Definition: convolve.h:461
vspline::fir_filter< simdized_math_type > filter_type
Definition: convolve.h:419
struct extrapolator is a helper class providing extrapolated values for a 1D buffer indexed with poss...
Definition: extrapolate.h:70
fir_filter_specs holds the parameters for a filter performing a convolution along a single axis....
Definition: convolve.h:93
fir_filter_specs(vspline::bc_code _bc, int _ksize, int _headroom, const xlf_type *_kernel)
Definition: convolve.h:99
const xlf_type * kernel
Definition: convolve.h:97
vspline::bc_code bc
Definition: convolve.h:94
class fir_filter provides the 'solve' routine which convolves a 1D signal with selectable extrapolati...
Definition: convolve.h:134
_math_type math_type
Definition: convolve.h:141
static const bool is_single_pass
Definition: convolve.h:137
vigra::MultiArrayView< 1, math_type > circular_buffer
Definition: convolve.h:158
vigra::MultiArrayView< 1, math_type > kernel_values
Definition: convolve.h:159
int get_support_width() const
calling code may have to set up buffers with additional space around the actual data to allow filteri...
Definition: convolve.h:187
vigra::MultiArray< 1, math_type, allocator_t > reactor
Definition: convolve.h:157
void solve(const in_buffer_type &input, out_buffer_type &output)
public 'solve' routine. This is for calls 'from outside', like when this object is used by itself,...
Definition: convolve.h:200
vigra::MultiArrayView< 1, out_type > out_buffer_type
Definition: convolve.h:140
typename vspline::allocator_traits< math_type > ::type allocator_t
Definition: convolve.h:155
void solve(const in_buffer_type &input, out_buffer_type &output, const extrapolator< in_buffer_type > &source)
protected solve routine taking an extrapolator on top of input and output. This way,...
Definition: convolve.h:227
vigra::MultiArrayView< 1, in_type > in_buffer_type
Definition: convolve.h:139
vigra::MultiArrayView< 1, math_type > tail_buffer
Definition: convolve.h:160
fir_filter(const fir_filter_specs &specs)
Definition: convolve.h:163