prefilter_8h_source.html

/************************************************************************/

/*                                                                      */

/*    vspline - a set of generic tools for creation and evaluation      */

/*              of uniform b-splines                                    */

/*                                                                      */

/*            Copyright 2015 - 2023 by Kay F. Jahnke                    */

/*                                                                      */

/*    The git repository for this software is at                        */

/*                                                                      */

/*    https://bitbucket.org/kfj/vspline                                 */

/*                                                                      */

/*    Please direct questions, bug reports, and contributions to        */

/*                                                                      */

/*    kfjahnke+vspline@gmail.com                                        */

/*                                                                      */

/*    Permission is hereby granted, free of charge, to any person       */

/*    obtaining a copy of this software and associated documentation    */

/*    files (the "Software"), to deal in the Software without           */

/*    restriction, including without limitation the rights to use,      */

/*    copy, modify, merge, publish, distribute, sublicense, and/or      */

/*    sell copies of the Software, and to permit persons to whom the    */

/*    Software is furnished to do so, subject to the following          */

/*    conditions:                                                       */

/*                                                                      */

/*    The above copyright notice and this permission notice shall be    */

/*    included in all copies or substantial portions of the             */

/*    Software.                                                         */

/*                                                                      */

/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */

/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */

/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */

/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */

/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */

/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */

/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */

/*    OTHER DEALINGS IN THE SOFTWARE.                                   */

/*                                                                      */

/************************************************************************/


/*! \file prefilter.h


    \brief Code to create the coefficient array for a b-spline.


    Note: the bulk of the code was factored out to filter.h, while this text still

    outlines the complete filtering process.


    B-spline coefficients can be generated in two ways (that I know of): the first

    is by solving a set of equations which encode the constraints of the spline.

    A good example of how this is done can be found in libeinspline. I term it

    the 'linear algebra approach'. In this implementation, I have chosen what I

    call the 'DSP approach'. In a nutshell, the DSP approach looks at the b-spline's

    reconstruction as a convolution of the coefficients with a specific kernel. This

    kernel acts as a low-pass filter. To counteract the effect of this filter and

    obtain the input signal from the convolution of the coefficients, a high-pass

    filter with the inverse transfer function to the low-pass is used. This high-pass

    has infinite support, but can still be calculated precisely within the bounds of

    the arithmetic precision the CPU offers, due to the properties it has.


    I recommend [CIT2000] for a formal explanation. At the core of my prefiltering

    routines there is code from Philippe Thevenaz' accompanying code to this paper,

    with slight modifications translating it to C++ and making it generic.

    The greater part of this file deals with 'generifying' the process and to

    employing multithreading and the CPU's vector units to gain speed.


    This code makes heavy use of vigra, which provides handling of multidimensional

    arrays and efficient handling of aggregate types - to only mention two of it's

    many qualities. Explicit vectorization is done with Vc, which allowed me to code

    the horizontal vectorization I use in a generic fashion. If Vc is not available,

    the code falls back to presenting the data so that autovectorization becomes

    very likely - a technique I call 'goading'.


    In another version of this code I used vigra's BSplineBase class to obtain prefilter

    poles. This required passing the spline degree/order as a template parameter. Doing it

    like this allows to make the Poles static members of the solver, but at the cost of

    type proliferation. Here I chose not to follow this path and pass the spline order as a

    parameter to the spline's constructor, thus reducing the number of solver specializations

    and allowing automated testing with loops over the degree. This variant may be slightly

    slower. The prefilter poles I use are precalculated externally with gsl/blas and polished

    in high precision to provide the most precise data possible. this avoids using

    vigra's polynomial root code which failed for high degrees when I used it.


    [CIT2000] Interpolation Revisited by Philippe Thévenaz, Member,IEEE, Thierry Blu, Member, IEEE, and Michael Unser, Fellow, IEEE in IEEE TRANSACTIONS ON MEDICAL IMAGING, VOL. 19, NO. 7, JULY 2000,

*/


#ifndef VSPLINE_PREFILTER_H

#define VSPLINE_PREFILTER_H


#include <limits>

#include "common.h"

#include "poles.h"

#include "filter.h"


namespace vspline {


using namespace std ;

using namespace vigra ;


/// overall_gain is a helper routine:

/// Simply executing the filtering code by itself will attenuate the signal. Here

/// we calculate the gain which, pre-applied to the signal, will cancel this effect.

/// While this code was initially part of the filter's constructor, I took it out

/// to gain some flexibility by passing in the gain as a parameter.

///

/// Note that higher-degree splines need filtering with some poles which are *very*

/// small numerically. This is a problem: The data get 'squashed', since there are

/// mathematical operations between attenuated and unattenuated values. So for high

/// spline degrees, float data aren't suitable, and even doubles and long doubles

/// suffer from squashing and lose precision.

///

/// Note also how we perform the arithmetics in this routine in the highest precision

/// available. Calling code will cast the product down to the type it uses for maths.


static xlf_type overall_gain ( const int & nbpoles ,

                               const xlf_type * const pole )

{

  xlf_type lambda = 1 ;


  for ( int k = 0 ; k < nbpoles ; k++ )


    lambda *= ( 1 - pole[k] ) * ( 1 - 1 / pole[k] ) ;


  return lambda ;

}


/// overload of overall_gain taking the spline's degree


static xlf_type overall_gain ( const int & spline_degree )

{

  if ( spline_degree < 2 )

    return 1 ;

  assert ( spline_degree <= vspline_constants::max_degree ) ;

  return overall_gain ( spline_degree / 2 ,

                        vspline_constants::precomputed_poles [ spline_degree ] ) ;

}


/// structure to hold specifications for an iir_filter object.

/// This set of parameters has to be passed through from

/// the calling code through the multithreading code to the worker threads

/// where the filter objects are finally constructed. Rather than passing

/// the parameters via some variadic mechanism, it's more concise and

/// expressive to contain them in a structure and pass that around.

/// The filter itself inherits its specification type, and if the code

/// knows the handler's type, it can derive the spec type. This way the

/// argument passing can be formalized, allowing for uniform handling of

/// several different filter types with the same code. Here we have the

/// concrete parameter set needed for b-spline prefiltering. We'll pass

/// one set of 'specs' per axis; it contains:

/// - the boundary condition for this axis

/// - the number of filter poles (see poles.h)

/// - a pointer to npoles poles

/// - the acceptable tolerance


// TODO: KFJ 2018-03-21 added another member 'boost' to the filter specs.

// This value is used as a factor on 'gain', resulting in the signal

// being amplified by this factor at no additional computational cost,

// which might be desirable when pulling integral signals up to the

// maximal dynamic range. But beware: there are some corner cases with

// splines holding integral data which may cause wrong results

// if 'boost' is too large. Have a look at int_spline.cc and also the

// comments above _process_1d in filter.h


struct iir_filter_specs

{

  vspline::bc_code bc ;

  int npoles ;

  const xlf_type * pole ;

  xlf_type tolerance ;

  xlf_type boost ;


  iir_filter_specs ( vspline::bc_code _bc ,

                     int _npoles ,

                     const xlf_type * _pole ,

                     xlf_type _tolerance ,

                     xlf_type _boost = xlf_type ( 1 )

                   )

  : bc ( _bc ) ,

    npoles ( _npoles ) ,

    pole ( _pole ) ,

    tolerance ( _tolerance ) ,

    boost ( _boost )

  { } ;

} ;


/// class iir_filter implements an n-pole forward/backward recursive filter

/// to be used for b-spline prefiltering. It inherits from the 'specs'

/// class for easy initialization.


template < typename in_type ,

           typename out_type = in_type ,

           typename _math_type = out_type >

class iir_filter

: public iir_filter_specs

{

  typedef _math_type math_type ;


  typedef vigra::MultiArrayView < 1 , in_type > in_buffer_type ;

  typedef vigra::MultiArrayView < 1 , out_type > out_buffer_type ;


  /// typedef the fully qualified type for brevity, to make the typedefs below

  /// more legible


  typedef iir_filter < in_type , out_type , math_type > filter_type ;


  xlf_type gain ;

  std::vector < int > horizon ;


  // we handle the polymorphism internally, working with method pointers.

  // this saves us having to set up a base class with virtual member functions

  // and inheriting from it.


  typedef void  ( filter_type::*p_solve ) ( const in_buffer_type & input ,

                                                 out_buffer_type & output ) const ;

  typedef math_type ( filter_type::*p_icc )   ( const in_buffer_type & buffer , int k ) const ;

  typedef math_type ( filter_type::*p_iccx )  ( const out_buffer_type & buffer , int k ) const ;

  typedef math_type ( filter_type::*p_iacc )  ( const out_buffer_type & buffer , int k ) const ;


  // these are the method pointers used:


  p_solve _p_solve ; ///< pointer to the solve method

  p_icc   _p_icc ;   ///< pointer to calculation of initial causal coefficient (from in_)

  p_iccx  _p_iccx ;  ///< pointer to calculation of initial causal coefficient (from out_)

  p_iacc  _p_iacc ;  ///< pointer to calculation of initial anticausal coefficient


public:


  // this filter runs over the data several times and stores the result

  // of each run back to be picked up by the next run. This has certain

  // implications: if out_type is an integral type, using it to store

  // intermediates will produce quantization errors with every run.

  // this flag signals to the wielding code in filter.h that intermediates

  // need to be stored, so it can avoid the problem by providing a buffer

  // in a 'better' type as output ('output' is used to store intermediates)

  // and converting the data back to the 'real' output afterwards.


  static const bool is_single_pass { false } ;


  /// calling code may have to set up buffers with additional

  /// space around the actual data to allow filtering code to

  /// 'run up' to the data, shedding margin effects in the

  /// process. For an IIR filter, this is theoretically

  /// infinite , but since we usually work to a specified precision,

  /// we can pass 'horizon' - horizon[0] containing the largest

  /// of the horizon values.


  int get_support_width ( ) const

  {

    if ( npoles )

      return horizon [ 0 ] ;


    // TODO quick fix. I think this case never occurs, since the filtering

    // code is avoided for npoles < 1


    return 64 ;

  }


 /// solve() takes two buffers, one to the input data and one to the output space.

 /// The containers must have the same size. It's safe to use solve() in-place.


 void solve ( const in_buffer_type & input , out_buffer_type & output )

 {

   assert ( input.size ( ) == output.size ( ) ) ;

   ( this->*_p_solve ) ( input , output ) ;

 }


 /// for in-place operation we use the same filter routine.


 void solve ( out_buffer_type & data )

 {

   ( this->*_p_solve ) ( data , data ) ;

 }


// I use adapted versions of P. Thevenaz' code to calculate the initial causal and

// anticausal coefficients for the filter. The code is changed just a little to work

// with an iterator instead of a C vector.


private:


/// The code for mirrored BCs is adapted from P. Thevenaz' code, the other routines are my

/// own doing, with aid from a digest of spline formulae I received from P. Thevenaz and which

/// were helpful to verify the code against a trusted source.

///

/// note how, in the routines to find the initial causal coefficient, there are two different

/// cases: first the 'accelerated loop', which is used when the theoretically infinite sum of

/// terms has reached sufficient precision , and the 'full loop', which implements the mathematically

/// precise representation of the limes of the infinite sum towards an infinite number of terms,

/// which happens to be calculable due to the fact that the absolute value of all poles is < 1 and

///

///  lim     n                a

///         sum a * q ^ k =  ---

/// n->inf  k=0              1-q

///

/// first are mirror BCs. This is mirroring 'on bounds',

/// f ( -x ) == f ( x ) and f ( n-1 - x ) == f (n-1 + x)

///

/// note how mirror BCs are equivalent to requiring the first derivative to be zero in the

/// linear algebra approach. Obviously with mirrored data this has to be the case, the location

/// where mirroring occurs is always an extremum. So this case covers 'FLAT' BCs as well

///

/// the initial causal coefficient routines are templated by buffer type, because depending

/// on the circumstances, they may be used either on the input or the output.


// TODO format to vspline standard


/// we use accessor classes to access the input and output buffers.

/// To access an input buffer (which remains constant), we use

/// 'as_math_type' which simply provides the ith element cast to

/// math_type. This makes for legible, concise code. We return

/// const math_type from operator[] to make sure X[..] won't be

/// accidentally assigned to.


template < typename buffer_type >

struct as_math_type

{

  const buffer_type & c ;


  as_math_type ( const buffer_type & _c )

  : c ( _c )

  { } ;


  const math_type operator[] ( int i ) const

  {

    return math_type ( c [ i ] ) ;

  }

} ;


/// the second helper class, as_target, is meant for output

/// buffers. Here we need to read as well as write. Writing is

/// rare, so I use a method 'store' in preference to doing artistry

/// with a proxy. We return const math_type from operator[] to make

/// sure X[..] won't be accidentally assigned to.


template < typename buffer_type >

struct as_target

{

  buffer_type & x ;


  as_target ( buffer_type & _x )

  : x ( _x )

  { } ;


  const math_type operator[] ( int i ) const

  {

    return math_type ( x [ i ] ) ;

  }


  void store ( const math_type & v , const int & i )

  {

    x [ i ] = typename buffer_type::value_type ( v ) ;

  }

} ;


template < class buffer_type >

math_type icc_mirror ( const buffer_type & _c , int k ) const

{

  int M = _c.size ( ) ;

  as_math_type < buffer_type > c ( _c ) ;


  math_type z = math_type ( pole[k] ) ;

  math_type zn , z2n , iz ;

  math_type Sum ;

  int  n ;


  if ( horizon[k] < M )

  {

    /* accelerated loop */

    zn = z ;

    Sum = c[0] ;

    for ( n = 1 ; n < horizon[k] ; n++ )

    {

      Sum += zn * c[n] ;

      zn *= z ;

    }

  }

  else

  {

    /* full loop */

    zn = z ;

    iz = math_type ( 1.0 ) / z ;

    z2n = math_type ( pow ( xlf_type ( pole[k] ) , xlf_type ( M - 1 ) ) ) ;

    Sum = c[0] + z2n * c[M - 1] ;

    z2n *= z2n * iz ;

    for ( n = 1 ; n <= M - 2 ; n++ )

    {

      Sum += ( zn + z2n ) * c[n] ;

      zn *= z ;

      z2n *= iz ;

    }

    Sum /= ( math_type ( 1.0 ) - zn * zn ) ;

  }

 return ( Sum ) ;

}


/// the initial anticausal coefficient routines are always called with the output buffer,

/// so they needn't be templated like the icc routines.

///

/// I still haven't understood the 'magic' which allows to calculate the initial anticausal

/// coefficient from just two results of the causal filter, but I assume it's some exploitation

/// of the symmetry of the data. This code is adapted from P. Thevenaz'.


math_type iacc_mirror ( const out_buffer_type & _c , int k ) const

{

  int M = _c.size ( ) ;

  as_math_type < out_buffer_type > c ( _c ) ;


  math_type z = math_type ( pole[k] ) ;


  return ( math_type ( z / ( z * z - math_type ( 1.0 ) ) ) * ( c [ M - 1 ] + z * c [ M - 2 ] ) ) ;

}


/// next are 'antimirrored' BCs. This is the same as 'natural' BCs: the signal is

/// extrapolated via point mirroring at the ends, resulting in point-symmetry at the ends,

/// which is equivalent to the second derivative being zero, the constraint used in

/// the linear algebra approach to calculate 'natural' BCs:

///

/// f ( x ) - f ( 0 ) == f ( 0 ) - f ( -x ) ;

/// f ( x+n-1 ) - f ( n-1 ) == f ( n-1 ) - f (n-1-x)


template < class buffer_type >

math_type icc_natural ( const buffer_type & _c , int k ) const

{

  int M = _c.size ( ) ;

  as_math_type < buffer_type > c ( _c ) ;


  math_type z = math_type ( pole[k] ) ;

  math_type zn , z2n , iz ;

  math_type Sum , c02 ;

  int  n ;


  // f ( x ) - f ( 0 ) == f ( 0 ) - f (-x)

  // f ( -x ) == 2 * f ( 0 ) - f (x)


  if ( horizon[k] < M )

  {

    c02 = c[0] + c[0] ;

    zn = z ;

    Sum = c[0] ;

    for ( n = 1 ; n < horizon[k] ; n++ )

    {

      Sum += zn * ( c02 - c[n] ) ;

      zn *= z ;

    }

    return ( Sum ) ;

  }

  else {

    zn = z ;

    iz = math_type ( 1.0 ) / z ;

    z2n = math_type ( pow ( xlf_type ( pole[k] ) , xlf_type ( M - 1 )) ) ;

    Sum = math_type ( ( math_type ( 1.0 ) + z ) / ( math_type ( 1.0 ) - z ) )

          * ( c[0] - z2n * c[M - 1] ) ;

    z2n *= z2n * iz ;                                                   // z2n == z^2M-3

    for ( n = 1 ; n <= M - 2 ; n++ )

    {

      Sum -= ( zn - z2n ) * c[n] ;

      zn *= z ;

      z2n *= iz ;

    }

    return ( Sum / ( math_type ( 1.0 ) - zn * zn )) ;

  }

}


/// I still haven't understood the 'magic' which allows to calculate the initial anticausal

/// coefficient from just two results of the causal filter, but I assume it's some exploitation

/// of the symmetry of the data. This code is adapted from P. Thevenaz' formula.


math_type iacc_natural ( const out_buffer_type & _c , int k ) const

{

  int M = _c.size ( ) ;

  as_math_type < out_buffer_type > c ( _c ) ;


  math_type z = math_type ( pole[k] ) ;


  return - math_type ( z / ( ( math_type ( 1.0 ) - z ) * ( math_type ( 1.0 ) - z ) ) ) * ( c [ M - 1 ] - z * c [ M - 2 ] ) ;

}


/// next are reflective BCs. This is mirroring 'between bounds':

///

/// f ( -1 - x ) == f ( x ) and f ( n + x ) == f (n-1 - x)

///

/// I took Thevenaz' routine for mirrored data as a template and adapted it.

/// 'reflective' BCs have some nice properties which make them more suited than mirror BCs in

/// some situations:

/// - the artificial discontinuity is 'pushed out' half a unit spacing

/// - the extrapolated data are just as long as the source data

/// - they play well with even splines


template < class buffer_type >

math_type icc_reflect ( const buffer_type & _c , int k ) const

{

  int M = _c.size ( ) ;

  as_math_type < buffer_type > c ( _c ) ;


  math_type z = math_type ( pole[k] ) ;

  math_type zn , z2n , iz ;

  math_type Sum ;

  int  n ;


  if ( horizon[k] < M )

  {

    zn = z ;

    Sum = c[0] ;

    for ( n = 0 ; n < horizon[k] ; n++ )

    {

      Sum += zn * c[n] ;

      zn *= z ;

    }

    return ( Sum ) ;

  }

  else

  {

    zn = z ;

    iz = math_type ( 1.0 ) / z ;

    z2n = math_type ( pow ( xlf_type ( pole[k] ) , xlf_type ( 2 * M )) ) ;

    Sum = 0 ;

    for ( n = 0 ; n < M - 1 ; n++ )

    {

      Sum += ( zn + z2n ) * c[n] ;

      zn *= z ;

      z2n *= iz ;

    }

    Sum += ( zn + z2n ) * c[n] ;

    return c[0] + Sum / ( math_type ( 1.0 ) - zn * zn ) ;

  }

}


/// I still haven't understood the 'magic' which allows to calculate the initial anticausal

/// coefficient from just one result of the causal filter, but I assume it's some exploitation

/// of the symmetry of the data. I have to thank P. Thevenaz for his formula which let me code:


math_type iacc_reflect ( const out_buffer_type & _c , int k ) const

{

  int M = _c.size ( ) ;

  as_math_type < out_buffer_type > c ( _c ) ;


  math_type z = math_type ( pole[k] ) ;


  return c[M - 1] / ( math_type ( 1.0 ) - math_type ( 1.0 ) / z ) ;

}


/// next is periodic BCs. so, f ( x ) = f (x+N)

///

/// Implementing this is more straightforward than implementing the various mirrored types.

/// The mirrored types are, in fact, also periodic, but with a period twice as large, since they

/// repeat only after the first reflection. So especially the code for the full loop is more complex

/// for mirrored types. The down side here is the lack of symmetry to exploit, which made me code

/// a loop for the initial anticausal coefficient as well.


template < class buffer_type >

math_type icc_periodic ( const buffer_type & _c , int k ) const

{

  int M = _c.size ( ) ;

  as_math_type < buffer_type > c ( _c ) ;


  math_type z = math_type ( pole[k] ) ;

  math_type zn ;

  math_type Sum ;

  int  n ;


  if ( horizon[k] < M )

  {

    zn = z ;

    Sum = c[0] ;

    for ( n = M - 1 ; n > ( M - horizon[k] ) ; n-- )

    {

      Sum += zn * c[n] ;

      zn *= z ;

    }

   }

  else

  {

    zn = z ;

    Sum = c[0] ;

    for ( n = M - 1 ; n > 0 ; n-- )

    {

      Sum += zn * c[n] ;

      zn *= z ;

    }

    Sum /= ( math_type ( 1.0 ) - zn ) ;

  }

 return Sum ;

}


math_type iacc_periodic ( const out_buffer_type & _c , int k ) const

{

  int M = _c.size ( ) ;

  as_math_type < out_buffer_type > c ( _c ) ;


  math_type z = math_type ( pole[k] ) ;

  math_type zn ;

  math_type Sum ;


  if ( horizon[k] < M )

  {

    zn = z ;

    Sum = c[M-1] * z ;

    for ( int n = 0 ; n < horizon[k] ; n++ )

    {

      zn *= z ;

      Sum += zn * c[n] ;

    }

    Sum = -Sum ;

  }

  else

  {

    zn = z ;

    Sum = c[M-1] ;

    for ( int n = 0 ; n < M - 1 ; n++ )

    {

      Sum += zn * c[n] ;

      zn *= z ;

    }

    Sum = z * Sum / ( zn - math_type ( 1.0 ) ) ;

  }

  return Sum ;

}


/// guess the initial coefficient. This tries to minimize the effect

/// of starting out with a hard discontinuity as it occurs with zero-padding,

/// while at the same time requiring little arithmetic effort

///

/// for the forward filter, we guess an extrapolation of the signal to the left

/// repeating c[0] indefinitely, which is cheap to compute:


template < class buffer_type >

math_type icc_guess ( const buffer_type & _c , int k ) const

{

  as_math_type < buffer_type > c ( _c ) ;


  return c[0] * math_type ( 1.0 / ( 1.0 - pole[k] ) ) ;

}


// for the backward filter , we assume mirror BC, which is also cheap to compute:


math_type iacc_guess ( const out_buffer_type & c , int k ) const

{

  return iacc_mirror ( c , k ) ;

}


template < class buffer_type >

math_type icc_identity ( const buffer_type & _c , int k ) const

{

  as_math_type < buffer_type > c ( _c ) ;


  return c[0] ;

}


math_type iacc_identity ( const out_buffer_type & _c , int k ) const

{

  int M = _c.size ( ) ;

  as_math_type < out_buffer_type > c ( _c ) ;


  return c[M-1] ;

}


/// now we come to the solving, or prefiltering code itself.

/// The code is adapted from P. Thevenaz' code.

///

/// I use a 'carry' element, 'X', to carry the result of the recursion

/// from one iteration to the next instead of using the direct implementation

/// of the recursion formula, which would read the previous value of the

/// recursion from memory by accessing x[n-1], or x[n+1], respectively.


void solve_gain_inlined ( const in_buffer_type & _c ,

                          out_buffer_type & _x ) const

{

  int M = _c.size ( ) ;

  assert ( _x.size ( ) == M ) ;

  as_math_type < in_buffer_type > c ( _c ) ;

  as_target < out_buffer_type > x ( _x ) ;


  if ( M == 1 )

  {

    x.store ( c[0] , 0 ) ;

    return ;

  }


  assert ( M > 1 ) ;


  // use a buffer of one math_type for the recursion (see below)


  math_type X ;

  math_type p = math_type ( pole[0] ) ;


  // use first filter pole, applying overall gain in the process

  // of consuming the input.

  // Note that the application of the gain is performed during the processing

  // of the first (maybe the only) pole of the filter, instead of running a separate

  // loop over the input to apply it before processing starts.


  // note how the gain is applied to the initial causal coefficient. This is

  // equivalent to first applying the gain to the input and then calculating

  // the initial causal coefficient from the processed input.


  X = math_type ( gain ) * ( this->*_p_icc ) ( _c , 0 ) ;

  x.store ( X , 0 ) ;


  /* causal recursion */

  // the gain is applied to each input value as it is consumed


  for ( int n = 1 ; n < M ; n++ )

  {

    // KFJ 2019-02-12 tentative use of fma

#ifdef USE_FMA

    math_type cc = math_type ( gain ) * c[n] ;

    X = fma ( X , p , cc ) ;

#else

    X = math_type ( gain ) * c[n] + p * X ;

#endif

    x.store ( X , n ) ;

  }


  // now the input is used up and won't be looked at any more; all subsequent

  // processing operates on the output.


  /* anticausal initialization */


  X = ( this->*_p_iacc ) ( _x , 0 ) ;

  x.store ( X , M - 1 ) ;


  /* anticausal recursion */

  for ( int n = M - 2 ; 0 <= n ; n-- )

  {

    X = p * ( X - x[n] ) ;

    x.store ( X , n ) ;

  }


  // for the remaining poles, if any, don't apply the gain

  // and process the result from applying the first pole


  for ( int k = 1 ; k < npoles ; k++ )

  {

    p = math_type ( pole[k] ) ;

    /* causal initialization */

    X = ( this->*_p_iccx ) ( _x , k ) ;

    x.store ( X , 0 ) ;


    /* causal recursion */

    for ( int n = 1 ; n < M ; n++ )

    {

    // KFJ 2019-02-12 tentative use of fma

#ifdef USE_FMA

      math_type xx = x[n] ;

      X = fma ( X , p , xx ) ;

#else

      X = x[n] + p * X ;

#endif

      x.store ( X , n ) ;

    }


    /* anticausal initialization */

    X = ( this->*_p_iacc ) ( _x , k ) ;

    x.store ( X , M - 1 ) ;


    /* anticausal recursion */

    for ( int n = M - 2 ; 0 <= n ; n-- )

    {

      X = p * ( X - x[n] ) ;

      x.store ( X , n ) ;

    }

  }

}


/// solve_identity is used for spline degrees 0 and 1. In this case

/// there are no poles to apply, but if the operation is not in-place

/// and/or there is a 'boost' factor which is different from 1, the

/// data are copied and/or amplified with 'boost'.


void solve_identity ( const in_buffer_type & _c ,

                           out_buffer_type & _x ) const

{

  int M = _c.size ( ) ;

  assert ( _x.size ( ) == M ) ;

  as_math_type < in_buffer_type > c ( _c ) ;

  as_target < out_buffer_type > x ( _x ) ;


  if ( boost == xlf_type ( 1 ) )

  {

    // boost is 1, check if operation is not in-place

    if ( ( void* ) ( _c.data ( ) ) != ( void* ) ( _x.data ( ) ) )

    {

      // operation is not in-place, copy input to output

      for ( int n = 0 ; n < M ; n++ )

      {

        x.store ( c[n] , n ) ;

      }

    }

  }

  else

  {

    // we have a boost factor, so we apply it.

    math_type factor = math_type ( boost ) ;


    for ( int n = 0 ; n < M ; n++ )

    {

      x.store ( factor * c[n] , n ) ;

    }

  }

}


/// The last bit of work left is the constructor. This simply passes

/// the specs to the base class constructor, as iir_filter inherits

/// from the specs type.


public:


  iir_filter ( const iir_filter_specs & specs )

  : iir_filter_specs ( specs )

{

  // TODO we have a problem if the gain is getting very large, as it happens

  // for high spline degrees. The iir_filter attenuates the signal to next-to-nothing,

  // then it's amplified back to the previous amplitude. This degrades the signal,

  // most noticeably when the numeric type is lo-fi, since there are operations involving

  // both the attenuated and unattenuated data ('squashing').


  if ( npoles < 1 )

  {

    // zero poles means there's nothing to do but possibly

    // copying the input to the output, which solve_identity

    // will do if the operation isn't in-place.

    _p_solve = & filter_type::solve_identity ;

    return ;

  }


  // calculate the horizon for each pole, this is the number of iterations

  // the filter must perform on a unit pulse to decay below 'tolerance'


  // If tolerance is 0 (or negative) we set 'horizon' to MAX_INT. This

  // will have the effect of making it larger than M, or at least so

  // large that there won't be a difference between the accelerated and

  // the full loop. We might use a smaller value which still guarantees

  // the complete decay.


  for ( int i = 0 ; i < npoles ; i++ )

  {

    if ( tolerance > 0 )

      horizon.push_back (   ceil ( log ( tolerance )

                          / log ( std::abs ( pole[i] ) ) ) ) ;

    else

      horizon.push_back ( INT_MAX ) ; // TODO quick fix, think about it

  }


  // contrary to my initial implementation I use per-axis gain instead of

  // cumulating the gain for all axes. This may perform slightly worse, but

  // is more stable numerically and simplifies the code.


  gain = boost * vspline::overall_gain ( npoles , pole ) ;

  _p_solve = & filter_type::solve_gain_inlined ;


  // while the forward/backward IIR iir_filter in the solve_... routines is the same for all

  // boundary conditions, the calculation of the initial causal and anticausal coefficients

  // depends on the boundary conditions and is handled by a call through a method pointer

  // in the solve_... routines. Here we fix these method pointers:


  if ( bc == MIRROR )

  {

    _p_icc = & filter_type::icc_mirror<in_buffer_type> ;

    _p_iccx = & filter_type::icc_mirror<out_buffer_type> ;

    _p_iacc = & filter_type::iacc_mirror ;

  }

  else if ( bc == NATURAL )

  {

    _p_icc = & filter_type::icc_natural<in_buffer_type> ;

    _p_iccx = & filter_type::icc_natural<out_buffer_type> ;

    _p_iacc = & filter_type::iacc_natural ;

  }

  else if ( bc == PERIODIC )

  {

    _p_icc = & filter_type::icc_periodic<in_buffer_type> ;

    _p_iccx = & filter_type::icc_periodic<out_buffer_type> ;

    _p_iacc = & filter_type::iacc_periodic ;

  }

  else if ( bc == REFLECT )

  {

    _p_icc = & filter_type::icc_reflect<in_buffer_type> ;

    _p_iccx = & filter_type::icc_reflect<out_buffer_type> ;

    _p_iacc = & filter_type::iacc_reflect ;

  }

  else if ( bc == ZEROPAD )

  {

    _p_icc = & filter_type::icc_identity<in_buffer_type> ;

    _p_iccx = & filter_type::icc_identity<out_buffer_type> ;

    _p_iacc = & filter_type::iacc_identity ;

  }

  else if ( bc == GUESS )

  {

    _p_icc = & filter_type::icc_guess<in_buffer_type> ;

    _p_iccx = & filter_type::icc_guess<out_buffer_type> ;

    _p_iacc = & filter_type::iacc_guess ;

  }

  else

  {

    throw not_supported ( "boundary condition not supported by vspline::filter" ) ;

  }

}


} ; // end of class iir_filter


/// class to provide b-spline prefiltering, using 'iir_filter' above.

/// The actual filter object has to interface with the data handling

/// routine ('present', see filter.h). So this class functions as an

/// adapter, combining the code needed to set up adequate buffers

/// and creation of the actual IIR filter itself.

/// The interface to the data handling routine is provided by

/// inheriting from class buffer_handling


// KFJ 2019-04-16 added default for _vsize template argument


template < template < typename , size_t > class _vtype ,

           typename _math_ele_type ,

           size_t _vsize =

             vspline::vector_traits<_math_ele_type>::size >

struct bspl_prefilter

: public buffer_handling < _vtype , _math_ele_type , _vsize > ,

  public vspline::iir_filter < _vtype < _math_ele_type , _vsize > >

{

  // provide this type for queries


  typedef _math_ele_type math_ele_type ;


  // we'll use a few types from the buffer_handling type


  typedef buffer_handling < _vtype , _math_ele_type , _vsize > buffer_handling_type ;


  using typename buffer_handling_type::vtype ;

  using buffer_handling_type::vsize ;

  using buffer_handling_type::init ;


  // instances of class bspl_prefilter hold the buffer:


  using allocator_t

  = typename vspline::allocator_traits < vtype > :: type ;


  vigra::MultiArray < 1 ,  vtype , allocator_t > buffer ;


  // the filter's 'solve' routine has the workhorse code to filter

  // the data inside the buffer:


  typedef _vtype < _math_ele_type , _vsize > simdized_math_type ;

  typedef vspline::iir_filter < simdized_math_type > filter_type ;

  using filter_type::solve ;


  // by defining arg_type, we allow code to infer what type of

  // argument initializer the filter takes


  typedef iir_filter_specs arg_type ;


  // the constructor invokes the filter's constructor,

  // sets up the buffer and initializes the buffer_handling

  // component to use the whole buffer to accept incoming and

  // provide outgoing data.


  bspl_prefilter ( const iir_filter_specs & specs , size_t size )

  : filter_type ( specs ) ,

    buffer ( size )

  {

    // operate in-place and use the whole buffer to receive and

    // deliver data


    init ( buffer , buffer ) ;

  } ;


  // operator() simply delegates to the filter's 'solve' routine,

  // which filters the data in the buffer.


  void operator() ( )

  {

    solve ( buffer , buffer ) ;

  }


  // factory function to provide a filter with the same set of

  // parameters, but possibly different data types. this is used

  // for processing of 1D data, where the normal buffering mechanism

  // may be sidestepped


  template < typename in_type ,

             typename out_type = in_type ,

             typename math_type = out_type >

  static vspline::iir_filter < in_type , out_type , math_type >

         get_raw_filter ( const iir_filter_specs & specs )

  {

    return vspline::iir_filter < in_type , out_type , math_type >

           ( specs ) ;

  }


} ;


/// amplify is used to copy input to output, optionally applying

/// 'boost' in the process. If the operation is in-place and 'boost'

/// is 1, 'amplify' returns prematurely.


template < unsigned int dimension ,

           typename in_value_type ,

           typename out_value_type ,

           typename math_ele_type >

void amplify ( const vigra::MultiArrayView

                     < dimension , in_value_type >  & input ,

               vigra::MultiArrayView

                      < dimension , out_value_type > & output ,

               math_ele_type boost = 1 ,

               int njobs = vspline::default_njobs

             )

{

  // if the operation is in-place and boost is 1,

  // there is nothing to do.


  if (    (void*) ( input.data() ) == (void*) ( output.data() )

       && boost == math_ele_type ( 1 ) )

    return ;


  assert ( input.size() == output.size() ) ;


  // set up variables to orchestrate the batchwise processing

  // of the data by multithread's payload routine


  std::ptrdiff_t batch_size = 1024 ;

  std::ptrdiff_t total_size = input.size() ;

  vspline::atomic < std::ptrdiff_t > tickets ( total_size ) ;


  // the payload routine will process batches of up to batch_size

  // and continue to do so until the input is exhausted


  std::function < void() > payload =

  [&]()

  {

    // start and end index for the batches to be processed;

    // these are fetched from 'tickets' in the caller, above


    std::ptrdiff_t lo , hi ;


    while ( vspline::fetch_range_ascending

              ( tickets , batch_size , total_size , lo , hi ) )

    {

      if ( boost == math_ele_type ( 1 ) )

      {

        while ( lo < hi )

        {

          output[lo] = out_value_type ( input[lo] ) ;

          ++lo ;

        }

      }

      else

      {

        while ( lo < hi )

        {

          output[lo] = out_value_type ( input[lo] * boost ) ;

          ++lo ;

        }

      }

    }

  } ;


  // launch njobs workers executing the payload routine


  vspline::multithread ( payload , njobs ) ;

}


/// 'prefilter' handles b-spline prefiltering for the whole range of

/// acceptable input and output. It combines two bodies of code to

/// achieve this goal:

/// - the b-spline filtering code above

/// - 'wielding' code in filter.h, which is not specific to b-splines.

///

/// Note that vsize , the vectorization width, can be passed explicitly.

/// If Vc is in use and math_ele_type can be used with hardware

/// vectorization, the arithmetic will be done with Vc::SimdArrays

/// of the given size. Otherwise 'goading' will be used: the data are

/// presented in TinyVectors of vsize math_ele_type, hoping that the

/// compiler may autovectorize the operation.


// KFJ 2018-12-20 added default for math_ele_type, static_cast to

// int for bcv's dimension, default for 'tolerance' - so now the

// prototype matches that of the functions in general_filter.h


template < unsigned int dimension ,

           typename in_value_type ,

           typename out_value_type ,

           typename math_ele_type =

                    typename vspline::common_math_ele_type

                             < in_value_type , out_value_type > ,

           size_t vsize =

                  vspline::vector_traits < math_ele_type > :: size

         >

void prefilter ( const

                 vigra::MultiArrayView

                   < dimension ,

                     in_value_type > & input ,

                 vigra::MultiArrayView

                   < dimension ,

                     out_value_type > & output ,

                 vigra::TinyVector

                   < bc_code ,

                     static_cast < int > ( dimension ) > bcv ,

                 int degree ,

                 xlf_type tolerance

                  = std::numeric_limits < math_ele_type > :: epsilon(),

                 xlf_type boost = xlf_type ( 1 ) ,

                 int njobs = default_njobs )

{

  if ( degree <= 1 )

  {

    // if degree is <= 1, there is no filter to apply, but we may need

    // to apply 'boost' and/or copy input to output. We use 'amplify'

    // for the purpose, which multithreads the operation (if it is at

    // all necessary). I found this is (slightly) faster than doing the

    // job in a single thread - the process is mainly memory-bound, so

    // the gain is moderate.


    amplify < dimension , in_value_type , out_value_type , math_ele_type >

      ( input , output , math_ele_type ( boost ) ) ;


    return ;

  }


  std::vector < vspline::iir_filter_specs > vspecs ;


  // package the arguments to the filter; one set of arguments

  // per axis of the data


  auto poles = vspline_constants::precomputed_poles [ degree ] ;


  for ( int axis = 0 ; axis < dimension ; axis++ )

  {

    vspecs.push_back

      ( vspline::iir_filter_specs

        ( bcv [ axis ] , degree / 2 , poles , tolerance , 1 ) ) ;

  }


  // 'boost' is only applied to dimension 0, since it is meant to

  // affect the whole data set just once, not once per axis.


  vspecs [ 0 ] . boost = boost ;


  // KFJ 2018-05-08 with the automatic use of vectorization the

  // distinction whether math_ele_type is 'vectorizable' or not

  // is no longer needed: simdized_type will be a Vc::SimdArray

  // if possible, a vspline::simd_type otherwise.


  typedef typename vspline::bspl_prefilter

                            < vspline::simdized_type ,

                              math_ele_type ,

                              vsize

                            > filter_type ;


  // now call the 'wielding' code in filter.h


    vspline::filter

    < in_value_type , out_value_type , dimension , filter_type >

    ( input , output , vspecs ) ;

}


} ; // namespace vspline


#endif // VSPLINE_PREFILTER_H

vspline::buffer_handling
buffer_handling provides services needed for interfacing with a buffer of simdized/goading data....
Definition: filter.h:227

vspline::buffer_handling< _vtype, _math_ele_type, _vsize >::vsize
@ vsize
Definition: filter.h:230

vspline::buffer_handling< _vtype, _math_ele_type, _vsize >::vtype
_vtype< dtype, vsize > vtype
Definition: filter.h:232

vspline::buffer_handling< _vtype, _math_ele_type, _vsize >::init
void init(vigra::MultiArrayView< 1, vtype > &_in_window, vigra::MultiArrayView< 1, vtype > &_out_window)
Definition: filter.h:237

vspline::iir_filter
class iir_filter implements an n-pole forward/backward recursive filter to be used for b-spline prefi...
Definition: prefilter.h:193

vspline::iir_filter::solve
void solve(const in_buffer_type &input, out_buffer_type &output)
solve() takes two buffers, one to the input data and one to the output space. The containers must hav...
Definition: prefilter.h:259

vspline::iir_filter::is_single_pass
static const bool is_single_pass
Definition: prefilter.h:235

vspline::iir_filter::solve
void solve(out_buffer_type &data)
for in-place operation we use the same filter routine.
Definition: prefilter.h:267

vspline::iir_filter::iir_filter
iir_filter(const iir_filter_specs &specs)
The last bit of work left is the constructor. This simply passes the specs to the base class construc...
Definition: prefilter.h:805

vspline::iir_filter::get_support_width
int get_support_width() const
calling code may have to set up buffers with additional space around the actual data to allow filteri...
Definition: prefilter.h:245

common.h
definitions common to all files in this project, utility code

vsize
@ vsize
Definition: eval.cc:96

filter.h
generic implementation of separable filtering for nD arrays

std
Definition: hwy_simd_type.h:2101

vspline_constants::precomputed_poles
const xlf_type *const precomputed_poles[]
Definition: poles.h:1902

vspline
Definition: basis.h:79

vspline::prefilter
void prefilter(const vigra::MultiArrayView< dimension, in_value_type > &input, vigra::MultiArrayView< dimension, out_value_type > &output, vigra::TinyVector< bc_code, static_cast< int >(dimension) > bcv, int degree, xlf_type tolerance=std::numeric_limits< math_ele_type > ::epsilon(), xlf_type boost=xlf_type(1), int njobs=default_njobs)
'prefilter' handles b-spline prefiltering for the whole range of acceptable input and output....
Definition: prefilter.h:1082

vspline::filter
void filter(const vigra::MultiArrayView< D, in_type > &input, vigra::MultiArrayView< D, out_type > &output, types ... args)
vspline::filter is the common entry point for filter operations in vspline. This routine does not yet...
Definition: filter.h:1495

vspline::default_njobs
const int default_njobs
Definition: multithread.h:220

vspline::multithread
int multithread(std::function< void() > payload, std::size_t nr_workers=default_njobs)
multithread uses a thread pool of worker threads to perform a multithreaded operation....
Definition: multithread.h:412

vspline::common_math_ele_type
typename vigra::NumericTraits< promote_ele_type< T1, T2 > > ::RealPromote common_math_ele_type
Definition: common.h:192

vspline::fetch_range_ascending
bool fetch_range_ascending(vspline::atomic< index_t > &source, const index_t &count, const index_t &total, index_t &low, index_t &high)
fetch_range_ascending also uses an atomic initialized to the total number of indexes to be distribute...
Definition: multithread.h:336

vspline::xlf_type
long double xlf_type
Definition: common.h:102

vspline::amplify
void amplify(const vigra::MultiArrayView< dimension, in_value_type > &input, vigra::MultiArrayView< dimension, out_value_type > &output, math_ele_type boost=1, int njobs=vspline::default_njobs)
amplify is used to copy input to output, optionally applying 'boost' in the process....
Definition: prefilter.h:994

vspline::simdized_type
typename vector_traits< T, N > ::type simdized_type
this alias is used as a shorthand to pick the vectorized type for a given type T and a size N from 'v...
Definition: vector.h:459

vspline::atomic
std::atomic< T > atomic
Definition: multithread.h:224

vspline::bc_code
bc_code
This enumeration is used for codes connected to boundary conditions. There are two aspects to boundar...
Definition: common.h:71

vspline::GUESS
@ GUESS
Definition: common.h:78

vspline::NATURAL
@ NATURAL
Definition: common.h:75

vspline::REFLECT
@ REFLECT
Definition: common.h:74

vspline::PERIODIC
@ PERIODIC
Definition: common.h:73

vspline::MIRROR
@ MIRROR
Definition: common.h:72

vspline::ZEROPAD
@ ZEROPAD
Definition: common.h:77

poles.h
precalculated prefilter poles and basis function values

vspline::allocator_traits
vspline creates vigra::MultiArrays of vectorized types. As long as the vectorized types are Vc::SimdA...
Definition: common.h:267

vspline::bspl_prefilter
class to provide b-spline prefiltering, using 'iir_filter' above. The actual filter object has to int...
Definition: prefilter.h:914

vspline::bspl_prefilter::buffer_handling_type
buffer_handling< _vtype, _math_ele_type, _vsize > buffer_handling_type
Definition: prefilter.h:921

vspline::bspl_prefilter::arg_type
iir_filter_specs arg_type
Definition: prefilter.h:944

vspline::bspl_prefilter::filter_type
vspline::iir_filter< simdized_math_type > filter_type
Definition: prefilter.h:938

vspline::bspl_prefilter::operator()
void operator()()
Definition: prefilter.h:964

vspline::bspl_prefilter::math_ele_type
_math_ele_type math_ele_type
Definition: prefilter.h:917

vspline::bspl_prefilter::simdized_math_type
_vtype< _math_ele_type, _vsize > simdized_math_type
Definition: prefilter.h:937

vspline::bspl_prefilter::bspl_prefilter
bspl_prefilter(const iir_filter_specs &specs, size_t size)
Definition: prefilter.h:951

vspline::bspl_prefilter::solve
void solve(const in_buffer_type &input, out_buffer_type &output)
solve() takes two buffers, one to the input data and one to the output space. The containers must hav...
Definition: prefilter.h:259

vspline::bspl_prefilter::buffer
vigra::MultiArray< 1, vtype, allocator_t > buffer
Definition: prefilter.h:932

vspline::bspl_prefilter::allocator_t
typename vspline::allocator_traits< vtype > ::type allocator_t
Definition: prefilter.h:930

vspline::bspl_prefilter::get_raw_filter
static vspline::iir_filter< in_type, out_type, math_type > get_raw_filter(const iir_filter_specs &specs)
Definition: prefilter.h:978

vspline::iir_filter_specs
structure to hold specifications for an iir_filter object. This set of parameters has to be passed th...
Definition: prefilter.h:163

vspline::iir_filter_specs::npoles
int npoles
Definition: prefilter.h:165

vspline::iir_filter_specs::tolerance
xlf_type tolerance
Definition: prefilter.h:167

vspline::iir_filter_specs::iir_filter_specs
iir_filter_specs(vspline::bc_code _bc, int _npoles, const xlf_type *_pole, xlf_type _tolerance, xlf_type _boost=xlf_type(1))
Definition: prefilter.h:170

vspline::iir_filter_specs::pole
const xlf_type * pole
Definition: prefilter.h:166

vspline::iir_filter_specs::boost
xlf_type boost
Definition: prefilter.h:168

vspline::iir_filter_specs::bc
vspline::bc_code bc
Definition: prefilter.h:164

vspline::not_supported
exception which is thrown if an opertion is requested which vspline does not support
Definition: common.h:307

vspline::vector_traits
with the definition of 'simd_traits', we can proceed to implement 'vector_traits': struct vector_trai...
Definition: vector.h:344