transform_8h_source.html

/************************************************************************/

/*                                                                      */

/*    vspline - a set of generic tools for creation and evaluation      */

/*              of uniform b-splines                                    */

/*                                                                      */

/*            Copyright 2015 - 2023 by Kay F. Jahnke                    */

/*                                                                      */

/*    The git repository for this software is at                        */

/*                                                                      */

/*    https://bitbucket.org/kfj/vspline                                 */

/*                                                                      */

/*    Please direct questions, bug reports, and contributions to        */

/*                                                                      */

/*    kfjahnke+vspline@gmail.com                                        */

/*                                                                      */

/*    Permission is hereby granted, free of charge, to any person       */

/*    obtaining a copy of this software and associated documentation    */

/*    files (the "Software"), to deal in the Software without           */

/*    restriction, including without limitation the rights to use,      */

/*    copy, modify, merge, publish, distribute, sublicense, and/or      */

/*    sell copies of the Software, and to permit persons to whom the    */

/*    Software is furnished to do so, subject to the following          */

/*    conditions:                                                       */

/*                                                                      */

/*    The above copyright notice and this permission notice shall be    */

/*    included in all copies or substantial portions of the             */

/*    Software.                                                         */

/*                                                                      */

/*    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND    */

/*    EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES   */

/*    OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND          */

/*    NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT       */

/*    HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,      */

/*    WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING      */

/*    FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR     */

/*    OTHER DEALINGS IN THE SOFTWARE.                                   */

/*                                                                      */

/************************************************************************/


/*! \file transform.h


    \brief set of generic remap, transform and apply functions


    My foremost reason to have efficient B-spline processing is the

    formulation of generic remap-like functions. remap() is a function

    which takes an array of real-valued nD coordinates and an interpolator

    over a source array. Now each of the real-valued coordinates is fed

    into the interpolator in turn, yielding a value, which is placed in

    the output array at the same place the coordinate occupies in the

    coordinate array. To put it concisely, if we have


    - c, the coordinate array (or 'warp' array, 'arguments' array)

    - a, the source array (containing 'original' or 'knot point' data)

    - i, the interpolator over a

    - j, a coordinate into both c and t

    - t, the target array (receiving the 'result' of the remap)


    remap defines the operation


    t[j] = i(c[j]) for all j


    Now we widen the concept of remapping to a 'transform' function.

    Instead of limiting the process to the use of an 'interpolator',

    we use an arbitrary unary functor transforming incoming values to

    outgoing values, where the type of the incoming and outgoing values

    is determined by the functor. If the functor actually is an

    interpolator, we have a 'true' remap transforming coordinates

    into values, but this is merely a special case. So here we have:


    - c, an array containing input values

    - f, a unary functor converting input to output values

    - j, a coordinate into c and t

    - t, the target array


    transform performs the operation


    t[j] = f(c[j]) for all j


    remaps/transforms to other-dimensional objects are supported.

    This makes it possible to, for example, remap from a volume to a

    2D image, using a 2D coordinate array containing 3D coordinates

    ('slicing' a volume)


    There is also a variant of this transform function in this file,

    which doesn't take an input array. Instead, for every target

    location, the location's discrete coordinates are passed to the

    unary_functor type object. This way, transformation-based remaps

    can be implemented easily: the user code just has to provide a

    suitable functor to yield values for coordinates. This functor

    will internally take the discrete incoming coordinates (into the

    target array) and take it from there, eventually producing values

    of the target array's value_type.


    Here we have:


    - f, a unary functor converting discrete coordinates to output values

    - j, a discrete coordinate into t

    - t, the target array


    'index-based' transform performs the operation


    t[j] = f(j) for all j


    This file also has code to evaluate a b-spline at coordinates in a

    grid, which can be used for scaling, and for separable geometric

    transformations.


    Finally there is a function to restore the original data from a

    b-spline to the precision possible with the given data type and

    degree of the spline. This is done with a separable convolution,

    using a unit-stepped sampling of the basis function as the

    convolution kernel along every axis.


    Let me reiterate the strategy used to perform the transforms and

    remaps in this file. The approach is functional: A 'processing chain'

    is set up and encoded as a functor providing two evaluation functions:

    one for 'single' data and one for vectorized data. This functor is

    applied to the data by 'wielding' code, which partitions the data

    into several jobs to be performed by individual worker threads,

    invokes the worker threads, and, once in the worker thread, feeds the

    data to the functor in turn, using hardware vectorization if possible.

    So while at the user code level a single call to some 'transform'

    or 'remap' routine is issued, passing in arrays of data and functors,

    all the 'wielding' is done automatically without any need for the user

    to even be aware of it, while still using highly efficient vector code

    with a tread pool, potentially speeding up the operation greatly as

    compared to single-threaded, unvectorized operation, of course

    depending on the presence of several cores and vector units. On

    my system (Haswell 4-core i5 with AVX2) the speedup is about one

    order of magnitude. The only drawback is the production of a

    hardware-specific binary if vectorization is used. Both Vc use

    and multithreading can be easily activated/deactivated by #define

    switches, providing a clean fallback solution to produce code suitable

    for any target, even simple single-core machines with no vector units.

    Vectorization will be used if possible - either explicit vectorization

    (by defining USE_VC) - or autovectorization per default.

    Defining VSPLINE_SINGLETHREAD will disable multithreading.

    The code accessing multithreading and/or Vc use is #ifdeffed, so if

    these features are disabled, their code 'disappears' and the relevant

    headers are not included, nor do the corresponding libraries have to

    be present.

*/


// TODO: don't multithread or reduce number of jobs for small data sets


#ifndef VSPLINE_TRANSFORM_H

#define VSPLINE_TRANSFORM_H


#include "multithread.h" // vspline's multithreading code

#include "eval.h"        // evaluation of b-splines

#include "poles.h"

#include "convolve.h"


// The bulk of the implementation of vspline's two 'transform' functions

// is now in wielding.h:


#include "wielding.h"


namespace vspline {


/// implementation of two-array transform using wielding::coupled_wield.

///

/// 'array-based' transform takes two template arguments:

///

/// - 'unary_functor_type', which is a class satisfying the interface

///   laid down in unary_functor.h. Typically, this would be a type

///   inheriting from vspline::unary_functor, but any type will do as

///   long as it provides the required typedefs and an the relevant

///   eval() routines.

///

/// - the dimensionality of the input and output array

///

/// this overload of transform takes three parameters:

///

/// - a reference to a const unary_functor_type object providing the

///   functionality needed to generate values from arguments.

///

/// - a reference to a const MultiArrayView holding arguments to feed to

///   the unary functor object. It has to have the same shape as the target

///   array and contain data of the unary_functor's 'in_type'.

///

/// - a reference to a MultiArrayView to use as a target. This is where the

///   resulting data are put, so it has to contain data of unary_functor's

///   'out_type'. It has to have the same shape as the input array.

///

/// transform can be used without template arguments, they will be inferred

/// by ATD from the arguments.

///

/// trailing the argument list, all transform overloads have two further

/// parameters with default values, which are only used for special purposes:

///

/// - njobs gives the number of worker threads to be used when multithreading

///   the transform. The default results in using as many worker threads as

///   there are worker threads in the thread pool. This is a generous number,

///   and several times the number of physical cores, but this benefits

///   performance. Passing '1' here will execute the transform in 'this'

///   thread rather than invoking a single worker thread from the pool.

///

/// - p_cancel, when passed, has to point to a std::atomic<bool> which

///   is initially 'true' but can be (atomically) set to 'false' by calling

///   code. This will result in the transform stopping 'at the next convenient

///   point', usually after having processed a complete line or line segment

///   of data, resulting in an invalid result. While the result can not be

///   used, this is a graceful way out if the calling code finds that

///   continuing with the transform is futile - for instance because the

///   user asks for the whole program to terminate. The cancellation still

///   takes 'a while' to 'trickle down' to all worker threads.


template < typename unary_functor_type ,

           unsigned int dimension >

void transform ( const unary_functor_type & functor ,

                 const vigra::MultiArrayView

                     < dimension ,

                       typename unary_functor_type::in_type

                     > & input ,

                  vigra::MultiArrayView

                     < dimension ,

                       typename unary_functor_type::out_type

                     > & output ,

                  int njobs = vspline::default_njobs ,

                  vspline::atomic < bool > * p_cancel = 0

               )

{

  // check shape compatibility


  if ( output.shape() != input.shape() )

  {

    throw vspline::shape_mismatch

     ( "transform: the shapes of the input and output array do not match" ) ;

  }


  // wrap the vspline::unary_functor to be used with wielding code.

  // The wrapper is necessary because the code in wielding.h feeds

  // arguments as TinyVectors, even if the data are 'singular'.

  // The wrapper simply reinterpret_casts any TinyVectors of one

  // element to their corresponding value_type before calling the

  // functor.


  typedef wielding::vs_adapter < unary_functor_type > coupled_functor_type ;

  coupled_functor_type coupled_functor ( functor ) ;


  // we'll cast the pointers to the arrays to these types to be

  // compatible with the wrapped functor above.


  typedef typename coupled_functor_type::in_type src_type ;

  typedef typename coupled_functor_type::out_type trg_type ;


  typedef vigra::MultiArrayView < dimension , src_type > src_view_type ;

  typedef vigra::MultiArrayView < dimension , trg_type > trg_view_type ;


  wielding::coupled_wield < coupled_functor_type , dimension >

    ( coupled_functor ,

      (src_view_type*)(&input) ,

      (trg_view_type*)(&output) ,

      njobs ,

      p_cancel

    ) ;

}


/// reduce function processing by value. sink_functor_type is a functor

/// taking values (which are loaded from an array). The reduce function,

/// by delegation to wielding::value_reduce, performs the feeding part

/// by parcelling the data, the functor receives either single in_type

/// or vectorized data. The functor has to be a construct capable of

/// accumulating data with several threads in parallel. A typical

/// example would be a functor cumulating to some internal structure

/// and 'offloading' to a mutex-protected pooling target during it's

/// destructor. The wielding code creates a copy of the functor which

/// is passed here for each worker thread and guarantees that this

/// copy's destructor is executed before the call to reduce returns,

/// so the user can be assured of having collected the partial results

/// of all worker threads in the pooling target. The functor needs a

/// copy constructor to communicate the pooling target to the copies

/// made for the individual threads, see the index-based reduce function

/// below this one for an illustration of the simple and elegant mechanism.

/// Note that sink_functor will normally be a type inheriting from

/// vspline::sink_functor, a close relative of vspline::unary_functor

/// and also defined in unary_functor.h. This inheritance does not

/// produce any functionality, just the standard vspline functor

/// argument type system, which this code relies on. Of course the

/// sink functor can be coded 'manually' as well - inheriting form

/// vspline::sink_functor is merely convenient.


template < typename sink_functor_type ,

           unsigned int dimension >

void reduce ( const sink_functor_type & functor ,

              const vigra::MultiArrayView

                  < dimension ,

                    typename sink_functor_type::in_type

                  > & input ,

              int njobs = vspline::default_njobs ,

              vspline::atomic < bool > * p_cancel = 0

             )

{

  // wrap the vspline::sink_functor to be used with wielding code.

  // The wrapper is necessary because the code in wielding.h feeds

  // all arguments as TinyVectors, even if the data are 'singular'.

  // The wrapper simply reinterpret_casts any TinyVectors of one

  // element to the corresponding value_type before calling the

  // functor. At the same time, this function reinterpret_casts

  // the incoming array to an array of TinyVectors, even if the

  // data in the array are fundamentals. This is because the

  // wielding code expects an array of TinyVectors in every case.


  typedef wielding::vs_sink_adapter

            < sink_functor_type > adapted_functor_type ;

  adapted_functor_type adapted_functor ( functor ) ;


  // we'll cast the pointers to the array to this type to be

  // compatible with the wrapped functor above.


  typedef typename adapted_functor_type::in_type src_type ;


  typedef vigra::MultiArrayView < dimension , src_type > src_view_type ;


  // now delegate to the wielding code


  wielding::value_reduce < adapted_functor_type , dimension >

    ( adapted_functor ,

      (src_view_type*)(&input) ,

      njobs ,

      p_cancel

    ) ;

}


/// reduce function processing by coordinate. sink_functor_type is

/// a functor receiving integral coordinates, with no fixed meaning,

/// but usually pertaining to an array. The reduce function feeds

/// all coordinates encompassed by the shape, either in vectorized

/// or single form. Here's an example of a functor for this function:


/*


template < typename coordinate_type > struct count_items

: public vspline::sink_functor < coordinate_type >

{

  typedef vspline::sink_functor < coordinate_type > base_type ;

  using typename base_type::in_type ;

  using typename base_type::in_v ;

  using base_type::vsize ;


  // reference to a thread-safe 'pooling target'


  std::atomic < long > & collector ;

  long sum ;


  // this c'tor is used to create the initial count_items object and

  // fixes the reference to the 'pooling target'


  count_items ( std::atomic < long > & _collector )

  : collector ( _collector )

  { sum = 0 ; }


  // count_items needs a copy c'tor to propagate the reference to

  // the 'pooling target'. Note that copy assignment won't work.


  count_items ( const count_items & other )

  : collector ( other.collector )

  { sum = 0 ; }


  // finally, two operator() overloads, one for scalar input and

  // one for SIMD input.


  void operator() ( const in_type & crd )

  { sum ++ ; }


  void operator() ( const in_v & crd )

  { sum += vsize ; }


  ~count_items()

  { collector += sum ; }

} ;


*/


/// implementation of index-based transform using wielding::index_wield

///

/// this overload of transform() is very similar to the first one, but

/// instead of picking input from an array, it feeds the discrete coordinates

/// of the successive places data should be rendered to to the

/// unary_functor_type object.

///

/// This sounds complicated, but is really quite simple. Let's assume you have

/// a 2X3 output array to fill with data. When this array is passed to transform,

/// the functor will be called with every coordinate pair in turn, and the result

/// the functor produces is written to the output array. So for the example given,

/// with 'ev' being the functor, we have this set of operations:

///

/// output [ ( 0 , 0 ) ] = ev ( ( 0 , 0 ) ) ;

///

/// output [ ( 1 , 0 ) ] = ev ( ( 1 , 0 ) ) ;

///

/// output [ ( 2 , 0 ) ] = ev ( ( 2 , 0 ) ) ;

///

/// output [ ( 0 , 1 ) ] = ev ( ( 0 , 1 ) ) ;

///

/// output [ ( 1 , 1 ) ] = ev ( ( 1 , 1 ) ) ;

///

/// output [ ( 2 , 1 ) ] = ev ( ( 2 , 1 ) ) ;

///

/// this transform overload takes one template argument:

///

/// - 'unary_functor_type', which is a class satisfying the interface laid

///   down in unary_functor.h. This is an object which can provide values

///   given *discrete* coordinates, like class evaluator, but generalized

///   to allow for arbitrary ways of achieving it's goal. The unary functor's

///   'in_type' determines the number of dimensions of the coordinates - since

///   they are coordinates into the target array, the functor's input type

///   has to have the same number of dimensions as the target. The functor's

///   'out_type' has to be the same as the data type of the target array, since

///   the target array stores the results of calling the functor.

///

/// this transform overload takes two parameters:

///

/// - a reference to a const unary_functor_type object providing the

///   functionality needed to generate values from discrete coordinates

///

/// - a reference to a MultiArrayView to use as a target. This is where the

///   resulting data are put.

///

/// Please note that vspline holds with vigra's coordinate handling convention,

/// which puts the fastest-changing index first. In a 2D, image processing,

/// context, this is the column index, or the x coordinate. C and C++ do

/// instead put this index last when using multidimensional array access code.

///

/// transform can be used without template arguments, they will be inferred

/// by ATD from the arguments.

///

/// See the remarks on the two trailing parameters given with the two-array

/// overload of transform.


template < class unary_functor_type >

void transform ( const unary_functor_type & functor ,

                 vigra::MultiArrayView

                        < unary_functor_type::dim_in ,

                          typename unary_functor_type::out_type

                        > & output ,

                 int njobs = vspline::default_njobs ,

                 vspline::atomic < bool > * p_cancel = 0 )

{

  enum { dimension = unary_functor_type::dim_in } ;


  // wrap the vspline::unary_functor to be used with wielding code


  typedef wielding::vs_adapter < unary_functor_type > index_functor_type ;

  index_functor_type index_functor ( functor ) ;


  // we'll cast the pointer to the target array to this type to be

  // compatible with the wrapped functor above


  typedef typename index_functor_type::out_type trg_type ;

  typedef vigra::MultiArrayView < dimension , trg_type > trg_view_type ;


  // now delegate to the wielding code


  wielding::index_wield < index_functor_type , dimension >

    ( index_functor ,

      (trg_view_type*)(&output) ,

      njobs ,

      p_cancel ) ;

}


/// while the function above fills an array by calling a functor with

/// the target coordinates, this function accumulates the results by

/// means of a sink_functor. The effect is as if an index-based transform

/// was executed first, filling an intermediate array, followed by a call

/// to reduce taking the intermediate array as input. The combination of

/// both operations 'cuts out' the intermediate. Since this function does

/// not have a target array, the shape has to be communicated explicitly.

/// The functor will be invoked for every possible coordinate within the

/// shape's scope. This is done by the function 'index_reduce' in wielding.h

/// the type of the argument 'shape' is derived from the sink_functor's

/// 'dim_in' value, to make the code 'respect the singular': if the data

/// are 1D, reduce will accept a single integer as 'shape', whereas nD

/// data require a TinyVector of int.

/// For suitable sink_functors, see the remarks and example given above

/// in the comments for the array-base overload of 'reduce'.


template < typename sink_functor_type >

void reduce ( const sink_functor_type & functor ,

              typename std::conditional

                     < ( sink_functor_type::dim_in == 1 ) ,

                       int ,

                       vigra::TinyVector < int , sink_functor_type::dim_in >

                     > :: type shape ,

              int njobs = vspline::default_njobs ,

              vspline::atomic < bool > * p_cancel = 0

            )

{

  enum { dimension = sink_functor_type::dim_in } ;


  // adapt the functor to be compatible with the wielding code


  typedef wielding::vs_sink_adapter

            < sink_functor_type > adapted_functor_type ;


  adapted_functor_type adapted_functor ( functor ) ;


  // the wielding code expects all shape information as TinyVectors,

  // even if the data are 1D:


  typename sink_functor_type::in_nd_ele_type nd_shape ( shape ) ;


  wielding::index_reduce < adapted_functor_type , dimension >

    ( adapted_functor ,

      nd_shape ,

      njobs ,

      p_cancel

    ) ;

}


/// we code 'apply' as a special variant of 'transform' where the output

/// is also used as input, so the effect is to feed the unary functor

/// each 'output' value in turn, let it process it and store the result

/// back to the same location. While this looks like a rather roundabout

/// way of performing an apply, it has the advantage of using the same

/// type of functor (namely one with const input and writable output),

/// rather than a different functor type which modifies it's argument

/// in-place. While, at this level, using such a functor looks like a

/// feasible idea, It would require specialized code 'further down the

/// line' when complex functors are built with vspline's functional

/// programming tools: the 'apply-capable' functors would need to read

/// the output values first and write them back after anyway, resulting

/// in the same sequence of loads and stores as we get with the current

/// 'fake apply' implementation.

/// From the direct delegation to the two-array overload of 'transform',

/// you can see that this is merely syntactic sugar.


template < typename unary_functor_type  , // functor to apply

           unsigned int dimension >       // input/output array's dimension

void apply ( const unary_functor_type & ev ,

             vigra::MultiArrayView

                    < dimension ,

                      typename unary_functor_type::out_type >

                    & output ,

             int njobs = vspline::default_njobs ,

             vspline::atomic < bool > * p_cancel = 0

           )

{

  // make sure the functor's input and output type are the same


  static_assert ( std::is_same < typename unary_functor_type::in_type ,

                                 typename unary_functor_type::out_type > :: value ,

                  "apply: functor's input and output type must be the same" ) ;


  // delegate to transform


  transform ( ev , output , output , njobs , p_cancel ) ;

}


/// a type for a set of boundary condition codes, one per axis


template < unsigned int dimension >

using bcv_type = vigra::TinyVector < vspline::bc_code , dimension > ;


/// Implementation of 'classic' remap, which directly takes an array of

/// values and remaps it, internally creating a b-spline of given order

/// just for the purpose. This is used for one-shot remaps where the spline

/// isn't reused, and specific to b-splines, since the functor used is a

/// b-spline evaluator. The spline defaults to a cubic b-spline with

/// mirroring on the bounds.

///

/// So here we have the 'classic' remap, where the input array holds

/// coordinates and the functor used is actually an interpolator. Since

/// this is merely a special case of using transform(), we delegate to

/// transform() once we have the evaluator.

///

/// The template arguments are chosen to allow the user to call 'remap'

/// without template arguments; the template arguments can be found by ATD

/// by looking at the MultiArrayViews passed in.

///

/// - original_type is the value_type of the array holding the 'original' data over

///   which the interpolation is to be performed

///

/// - result_type is the value_type of the array taking the result of the remap,

///   namely the values produced by the interpolation. these data must have as

///   many channels as original_type

///

/// - coordinate_type is the type for coordinates at which the interpolation is to

///   be performed. coordinates must have as many components as the input array

///   has dimensions.

///

/// optionally, remap takes a set of boundary condition values and a spline

/// degree, to allow creation of splines for specific use cases beyond the

/// default. I refrain from extending the argument list further; user code with

/// more specific requirements will have to create an evaluator and use 'transform'.

///

/// Note that remap can be called without template arguments, the types will

/// be inferred by ATD from the arguments passed in.

///

/// See the remarks on the two trailing parameters given with the two-array

/// overload of transform.


template < typename original_type ,   // data type of original data

           typename result_type ,     // data type for interpolated data

           typename coordinate_type , // data type for coordinates

           unsigned int cf_dimension ,  // dimensionality of original data

           unsigned int trg_dimension , // dimensionality of result array

           int bcv_dimension = cf_dimension > // see below. g++ ATD needs this.

void remap ( const vigra::MultiArrayView

                          < cf_dimension , original_type > & input ,

             const vigra::MultiArrayView

                          < trg_dimension , coordinate_type > & coordinates ,

             vigra::MultiArrayView

                    < trg_dimension , result_type > & output ,

             bcv_type < bcv_dimension > bcv

              = bcv_type < bcv_dimension > ( MIRROR ) ,

             int degree = 3 ,

             int njobs = vspline::default_njobs ,

             vspline::atomic < bool > * p_cancel = 0 )

{

  static_assert (    vigra::ExpandElementResult < original_type > :: size

                  == vigra::ExpandElementResult < result_type > :: size ,

                  "input and output data type must have same nr. of channels" ) ;


  static_assert (    cf_dimension

                  == vigra::ExpandElementResult < coordinate_type > :: size ,

                  "coordinate type must have the same dimension as input array" ) ;


  // this is silly, but when specifying bcv_type < cf_dimension >, the code failed

  // to compile with g++. So I use a separate template argument bcv_dimension

  // and static_assert it's the same as cf_dimension. TODO this sucks...


  static_assert (    cf_dimension

                  == bcv_dimension ,

                  "boundary condition specification needs same dimension as input array" ) ;


  // check shape compatibility


  if ( output.shape() != coordinates.shape() )

  {

    throw shape_mismatch

    ( "the shapes of the coordinate array and the output array must match" ) ;

  }


  // get a suitable type for the b-spline's coefficients


  typedef typename vigra::PromoteTraits < original_type , result_type >

                          :: Promote _cf_type ;


  typedef typename vigra::NumericTraits < _cf_type >

                          :: RealPromote cf_type ;


  // create the bspline object


  typedef typename vspline::bspline < cf_type , cf_dimension > spline_type ;

  spline_type bsp ( input.shape() , degree , bcv ) ;


  // prefilter, taking data in 'input' as knot point data


  bsp.prefilter ( input ) ;


  // since this is a commodity function, we use a 'safe' evaluator.

  // If maximum performance is needed and the coordinates are known to be

  // in range, user code should create a 'naked' vspline::evaluator and

  // use it with vspline::transform.

  // Note how we pass in 'rc_type', the elementary type of a coordinate.

  // We want to allow the user to pass float or double coordinates.


  typedef typename vigra::ExpandElementResult < coordinate_type > :: type rc_type ;


  auto ev = vspline::make_safe_evaluator < spline_type , rc_type > ( bsp ) ;


  // call transform(), passing in the evaluator,

  // the coordinate array and the target array


  transform ( ev , coordinates , output , njobs , p_cancel ) ;

}


// next we have code for evaluation of b-splines over grids of coordinates.

// This code lends itself to some optimizations, since part of the weight

// generation used in the evaluation process is redundant, and by

// precalculating all redundant values and referring to the precalculated

// values during the evaluation a good deal of time can be saved - provided

// that the data involved are nD.


// TODO: as in separable convolution, it might be profitable here to apply

// weights for one axis to the entire array, then repeat with the other axes

// in turn. storing, modifying and rereading the array may still

// come out faster than the rather expensive DDA needed to produce the

// value with weighting in all dimensions applied at once, as the code

// below does (by simply applying the weights in the innermost eval

// class evaluator offers). The potential gain ought to increase with

// the dimensionality of the data.


// for evaluation over grids of coordinates, we use a vigra::TinyVector

// of 1D MultiArrayViews holding the component coordinates for each

// axis.

// When default-constructed, this object holds default-constructed

// MultiArrayViews, which, when assigned another MultiArrayView,

// will hold another view over the data, rather than copying them.

// initially I was using a small array of pointers for the purpose,

// but that is potentially unsafe and does not allow passing strided

// data.


template < unsigned int dimension , typename rc_ele_type = float >

using grid_spec =

vigra::TinyVector

  < vigra::MultiArrayView < 1 , rc_ele_type > ,

    dimension

  > ;


// the implementation of grid-based evaluation is in namespace detail;

// I assume that this code is not very useful for users and it would

// only clutter the vspline namespace.


namespace detail

{


/// we need a 'generator functor' to implement grid_eval using the code

/// in wielding.h.

/// this functor precalculates the b-spline evaluation weights corresponding

/// to the coordinates in the grid and stores them in vectorized format,

/// to speed up their use as much as possible.


template < typename ET >

struct grev_generator

{

  typedef ET evaluator_type ;


  static const size_t dimension = evaluator_type::dimension ;

  static const size_t vsize = evaluator_type::vsize ;

  static const size_t channels = evaluator_type::channels ;


  typedef typename evaluator_type::inner_type inner_type ;

  typedef typename evaluator_type::math_ele_type weight_type ;


  typedef typename evaluator_type::out_type out_type ;

  typedef typename evaluator_type::out_ele_type out_ele_type ;

  typedef typename evaluator_type::out_nd_ele_type out_nd_ele_type ;

  typedef typename evaluator_type::out_v out_v ;

  typedef typename evaluator_type::out_ele_v out_ele_v ;

  typedef typename evaluator_type::out_nd_ele_v out_nd_ele_v ;


  typedef typename vspline::vector_traits

                   < weight_type , vsize > :: type weight_v ;

  typedef typename evaluator_type::rc_ele_type rc_type ;

  typedef vigra::MultiArrayView

          < dimension , typename evaluator_type::trg_type > target_type ;

  typedef vigra::TinyVector < size_t , dimension > shape_type ;


  typedef typename vspline::vector_traits

                   < int , vsize > :: type ofs_ele_v ;


  typedef typename vspline::grid_spec < dimension , rc_type > grid_spec_type ;


  const int spline_order ; // order of the b-spline

  const shape_type shape ; // shape of result array

  const grid_spec_type & grid ; // the grid coordinates

  const evaluator_type & itp ; // b-spline evaluator

  const size_t axis ;       // axis to process, other axes will vary

  const size_t aggregates ; // number of full weight/offset packets

  const size_t rest ;       // number leftover single coordinates

  const size_t w_pack_sz ;  // number of weight vectors in a packet


  int socket_offset ; // cumulated offset for axes != axis


  // keeps track of ownership of allocated data: it's set to 'this'

  // when the buffers are allocated in the c'tor. If default copy

  // construction or default assignment take place, memory_guard

  // and this don't match anymore and the destructor does not free

  // the data.


  void * memory_guard ;


  // dimension X pointers to sets of weights. Each set of weights

  // consists of spline_order single weights, and there are as many

  // sets as there are corresponding coordinates in the grid spec.


  weight_type * grid_weight [ dimension ] ;


  // for the dimension 'axis', we keep a reshuffled copy of the

  // weights, so that we have groups of spline_order weight vectors

  // followed by single weight sets which don't fit a vectorized

  // weight pack (the latter might be dropped)


  weight_type * shuffled_weights ;


  // per dimesnion, one offset per coordinate:


  int * grid_ofs [ dimension ] ;


  // We'll count the vectorized packets up, this state variable

  // holds the next packet's number


  size_t current_pack ;


  // tail_crd will hold the nD coordinate of the first coordinate

  // along the grid coordinates for axis which did not fit into

  // a full packet, so that single-value processing cas start there

  // after peeling


  shape_type tail_crd ;


  // set of weights for a single coordinate


  vigra::MultiArray < 2 , weight_type > weight ;


  // have storage ready for one set of vectorized weights


  using allocator_t

  = typename vspline::allocator_traits < weight_v > :: type ;


  vigra::MultiArray < 2 , weight_v , allocator_t > vweight ;


  // reset is called when starting another line. The packet number

  // is reset to zero, tail_crd is set to the first unvectorized

  // position (starting point after peeling is complete). Then those

  // components of the coordinate 'crd' which are *not* 'axis' are

  // broadcast to the corresponding vectors in 'vweight', which

  // holds the current packet; they will remain constant for the

  // whole line, whereas the row in vweight corresponding to 'axis'

  // will receive fresh values for each (vectorized) iteration.

  // The extra argument _aggregates will usually be the same as

  // 'aggregates', but a smaller number may be specified here to

  // process subsets (used for 1D operation, TODO needs testing)


  void reset ( shape_type crd , size_t _aggregates )

  {

    // we require starting coordinates to be a multiple of vsize for

    // the processing axis, so that we can fetch vectorized weights,

    // which have been stored in batches of vsize, starting at 0.

    // If we'd allow 'odd' starting coordinates, we'd have to first

    // process a few single coordinates before switching to vectors,

    // which would complicate the logic.


    assert ( crd [ axis ] % vsize == 0 ) ;


    current_pack = crd [ axis ] / vsize ;

    tail_crd [ axis ] = crd [ axis ] + _aggregates * vsize ;

    socket_offset = 0 ;


    for ( size_t d = 0 ; d < dimension ; d++ )

    {

      if ( d != axis )

      {

        tail_crd [ d ] = crd [ d ] ;

        socket_offset += grid_ofs [ d ] [ crd [ d ] ] ;

        for ( size_t k = 0 ; k < spline_order ; k++ )

        {

          vweight [ vigra::Shape2 ( k , d ) ]

            = weight [ vigra::Shape2 ( k , d ) ]

            = grid_weight [ d ] [ crd [ d ] * spline_order + k ] ;

        }

      }

    }

  }


  // fetch_weights copies a set of vectorized weights from

  // 'shuffled_weights' (where it has been put in vectorized form

  // to be picked up by fast SIMD loads, rather than having to

  // deinterleave the data from grid_weight [ axis ])


  void fetch_weights()

  {

    auto p_wgt = shuffled_weights + current_pack * w_pack_sz ;

    auto w_line = vweight.bindAt ( 1 , axis ) ;

    for ( auto & wvr : w_line )

    {

      wvr.load ( p_wgt ) ;

      p_wgt += vsize ;

    }

  }


  // the c'tor sets up all invariant data of the generator. The most

  // important of these are the precalculated weights, which allow for

  // fast evaluation, avoiding the weight generation which is otherwise

  // needed for every evaluation locus.

  // Note that grev_generator objects may be default-copied, retaining their

  // validity.


  grev_generator ( const vspline::grid_spec < dimension , rc_type > & _grid ,

                   const evaluator_type & _itp ,

                   size_t _axis ,

                   shape_type _shape

                 )

  : grid ( _grid ) ,

    itp ( _itp ) ,

    axis ( _axis ) ,

    shape ( _shape ) ,

    aggregates ( _shape [ _axis ] / vsize ) ,

    rest ( _shape [ _axis ] % vsize ) ,

    w_pack_sz ( _itp.inner.get_order() * vsize ) ,

    spline_order ( _itp.inner.get_order() ) ,

    vweight ( vigra::Shape2 ( _itp.inner.get_order() , dimension ) ) ,

    weight ( vigra::Shape2 ( _itp.inner.get_order() , dimension ) ) ,

    memory_guard ( this )

  {

    // get some metrics from the b-spline evaluator needed to translate

    // coordinates to offsets in memory


    vigra::TinyVector < int , dimension >

      estride ( itp.inner.get_estride() ) ;


    // allocate space for the per-axis weights and offsets


    for ( int d = 0 ; d < dimension ; d++ )

    {

      grid_weight[d] = new weight_type [ spline_order * shape [ d ] ] ;

      grid_ofs[d] = new int [ shape [ d ] ] ;

    }


    // fill in the weights and offsets, using the interpolator's

    // split() to split the coordinates received in grid_coordinate,

    // the interpolator's obtain_weights() method to produce the

    // weight components, and the strides of the coefficient array

    // to convert the integral parts of the coordinates into offsets.


    for ( int d = 0 ; d < dimension ; d++ )

    {

      int select ;   // integral part of coordinate

      rc_type tune ; // delta, so that select + tune == crd [ d ]


      // split all coordinates received in the grid spec into

      // integral and remainder part, then use the remainder part

      // to obtain a set of weights. The integral part of the

      // coordinates is transformed to an offset in memory.


      for ( int c = 0 ; c < shape [ d ] ; c++ )

      {

        itp.inner.split

            ( grid [ d ] [ c ] , select , tune ) ;


        itp.inner.obtain_weights

            ( grid_weight [ d ] + spline_order * c , d , tune ) ;


        grid_ofs [ d ] [ c ] = select * estride [ d ] ;

      }

    }


    // reshuffle the weights along axis 'axis' to vectorized order

    // so that the evaluation can pick them up with SIMD loads.

    // This might be coded more efficiently, but since this

    // calculation is done only once for a whole grid, it's no big deal.


    shuffled_weights = new weight_type [ spline_order * shape [ axis ] ] ;


    if ( shape [ axis ] >= vsize )

    {

      for ( size_t a = 0 ; a < aggregates ; a++ )

      {

        auto source = grid_weight[axis] + a * w_pack_sz ;

        auto target = shuffled_weights + a * w_pack_sz ;


        for ( size_t e = 0 ; e < vsize ; e++ )

        {

          for ( size_t k = 0 ; k < spline_order ; k++ )

          {

            target [ vsize * k + e ] = source [ e * spline_order + k ] ;

          }

        }

      }

    }


    // fill up with weights which didn't fit a pack. might be omitted.


    for ( size_t i = aggregates * w_pack_sz ;

          i < spline_order * shape [ axis ] ;

          i++ )

      shuffled_weights [ i ] = grid_weight [ axis ] [ i ] ;

  }


  // evaluate, using the current packet of weights/offsets. This

  // blindly trust that calling code keeps track of the number of

  // packets which can be processed during the peeling phase.


  template < typename = std::enable_if < ( vsize > 1 ) > >

  void eval ( out_v & result )

  {

    // pick up the varying weights into the current package


    fetch_weights() ;


    // load current set offsets for 'axis'


    ofs_ele_v select ;


    select.load ( grid_ofs [ axis ] + current_pack * vsize ) ;


    // add offsets for the remaining axes; these are broadcast,

    // there is only a single offset instead of a whole vector of

    // offsets needed for 'axis'.


    select += socket_offset ;


    // a bit of type artistry to provide a reference to an nD

    // result type, even if the data are single-channel:

    // trg_t is an nD type even for single-channel data, so

    // vector_traits produces an nD vectorized type.

    // The inner evaluator which is used for evaluation uses

    // 'synthetic' data types for all data, and we want to

    // use it for evaluation, so we have to present 'result'

    // in proper guise.


    typedef typename inner_type::trg_type trg_t ;

    typedef typename vspline::vector_traits

                     < trg_t , vsize > :: type trg_v ;


    trg_v & nd_result = reinterpret_cast < trg_v & > ( result ) ;


    // finally we perfrom the evaluation


    itp.inner.eval ( select , vweight , nd_result ) ;


    // and advance current_pack for the next cycle


    current_pack++ ;

  }


  // here we have the single-value evaluation which is used after

  // all full vectorized packs have been processed


  void eval ( out_type & result )

  {

    auto c = tail_crd [ axis ] ;


    int select = socket_offset + grid_ofs [ axis ] [ c ] ;


    for ( int k = 0 ; k < spline_order ; k++ )

    {

      weight [ vigra::Shape2 ( k , axis ) ] =

        grid_weight [ axis ] [ c * spline_order + k ] ;

    }


    // move to the 'synthetic' type


    typedef typename inner_type::trg_type trg_t ;

    trg_t & nd_result = reinterpret_cast < trg_t & > ( result ) ;

    itp.inner.eval ( select , weight , nd_result ) ;


    // count up coordinate along 'axis'


    ++ tail_crd [ axis ] ;

  }


  // evaluation into a buffer. Here the data for one line are produced

  // all at once.


  void eval ( vigra::MultiArrayView < 1 , out_ele_v > & vresult ,

              vigra::MultiArrayView < 1 , out_type > & leftover )

  {

    assert ( vresult.shape(0) == aggregates * channels ) ;

    assert ( leftover.shape(0) == rest ) ;


    // TODO: would be nice to simply have a MultiArrayView of

    // aggregates * out_v, but that crashes

    // hence the detour via the nD type and storing (and reloading)

    // individual vectors


    out_v vr ;

    out_nd_ele_v & ndvr = reinterpret_cast < out_nd_ele_v & > ( vr ) ;


    for ( size_t a = 0 ; a < aggregates ; a++ )

    {

      eval ( vr ) ;

      for ( size_t ch = 0 ; ch < channels ; ch++ )

        vresult [ a * channels + ch ] = ndvr [ ch ] ;

    }

    for ( size_t a = 0 ; a < rest ; a++ )

    {

      eval ( leftover[a] ) ;

    }

  }


  void eval ( vigra::MultiArrayView < 1 , out_type > & result )

  {

    assert ( result.shape(0) == shape [ axis ] ) ;


    for ( size_t a = 0 ; a < shape [ axis ] ; a++ )

    {

      eval ( result[a] ) ;

    }

  }


  // TODO: I also want an evaluator which only applies the weights along

  // 'axis', and no weights pertaining to the other axes - to obtain data

  // which are 'partially evaluated' along 'axis'. If this process is

  // repeated for each axis, the result should be the same. This should

  // be more efficient if evaluation loci in the grid are 'close' so that

  // multiple full evaluations share coefficients, and the advantage

  // should increase with dimensionality.


  ~grev_generator()

  {

    // clean up. memory_guard is only set in the c'tor, so copied objects

    // will not have matching memory_guard and this. Only the object which

    // has initially allocated the memory will free it.


    if ( memory_guard == this )

    {

      for ( int d = 0 ; d < dimension ; d++ )

      {

        delete[] grid_weight[d] ;

        delete[] grid_ofs[d] ;

      }

      delete[] shuffled_weights ;

    }

  }

} ;


/// given the generator functor 'grev_generator' (above), performing

/// the grid_eval becomes trivial: construct the generator and pass it

/// to the wielding code. The signature looks complex, but the types

/// needed can all be inferred by ATD - callers just need to pass a

/// grid_spec object, a b-spline evaluator and a target array to accept

/// the result - plus, optionally, a pointer to a cancellation atomic,

/// which, if given, may be set by the calling code to gracefully abort

/// the calculation at the next convenient occasion.

/// Note that grid_eval is specific to b-spline evaluation and will not

/// work with any evaluator type which is not a 'plain', 'raw' b-spline

/// evaluator. So, even functors gained by calling vspline's factory

/// functions (make_evaluator, make_safe_evaluator) will *not* work here,

/// since they aren't of type vspline::evaluator (they are in fact grok_type).

/// To use arbitrary functors on a grid, use gen_grid_eval, below.


template < typename functor_type >

void transform ( std::true_type , // functor is a b-spline evaluator

                 grid_spec < functor_type::dim_in ,

                             typename functor_type::in_ele_type > & grid ,

                 const functor_type & functor ,

                 vigra::MultiArrayView < functor_type::dim_in ,

                                         typename functor_type::out_type >

                   & result  ,

                 int njobs = vspline::default_njobs ,

                 vspline::atomic < bool > * p_cancel = 0

               )

{

  // create the generator functor


  grev_generator < functor_type > gg ( grid ,

                                       functor ,

                                       0 ,

                                       result.shape() ) ;


  // and 'wield' it


  wielding::generate_wield ( gg , result , njobs , p_cancel ) ;

}


/// grid_eval_functor is used for 'generalized' grid evaluation, where

/// the functor passed in is not a bspline evaluator. For this general

/// case, we can't precalculate anything to speed things up, all we

/// need to do is pick the grid values and put them together to form

/// arguments for calls to the functor.


template < typename _inner_type ,

           typename ic_type = int >

struct grid_eval_functor

: public vspline::unary_functor

         < typename vspline::canonical_type

                    < ic_type , _inner_type::dim_in > ,

           typename _inner_type::out_type ,

           _inner_type::vsize >

{

  typedef _inner_type inner_type ;


  enum { vsize = inner_type::vsize } ;

  enum { dimension = inner_type::dim_in } ;


  typedef typename vspline::canonical_type

                    < ic_type , dimension > in_type ;


  typedef typename inner_type::out_type out_type ;


  typedef typename inner_type::in_ele_type rc_type ;


  typedef typename vspline::unary_functor

          < in_type , out_type , vsize > base_type ;


  static_assert ( std::is_integral < ic_type > :: value ,

                  "grid_eval_functor: must use integral coordinates" ) ;


  const inner_type inner ;


  typedef grid_spec < inner_type::dim_in ,

                      typename inner_type::in_ele_type > grid_spec_t ;


  grid_spec_t grid ;


  grid_eval_functor ( grid_spec_t & _grid ,

                      const inner_type & _inner )

  : grid ( _grid ) ,

    inner ( _inner )

  { } ;


  void eval ( const in_type & c , out_type & result ) const

  {

    typename inner_type::in_type cc ;


    // for uniform access, we use reinterpretations of the coordinates

    // as nD types, even if they are only 1D. This is only used to

    // fill in 'cc', the cordinate to be fed to 'inner'.


    typedef typename base_type::in_nd_ele_type nd_ic_type ;

    typedef typename inner_type::in_nd_ele_type nd_rc_type ;


    const nd_ic_type & nd_c ( reinterpret_cast < const nd_ic_type & > ( c ) ) ;

    nd_rc_type & nd_cc ( reinterpret_cast < nd_rc_type & > ( cc ) ) ;


    for ( int d = 0 ; d < dimension ; d++ )

      nd_cc [ d ] = grid [ d ] [ nd_c[d] ] ;


    inner.eval ( cc , result ) ;

  }


  template < typename = std::enable_if < ( vsize > 1 ) > >

  void eval ( const typename base_type::in_v & c ,

              typename base_type::out_v & result ) const

  {

    typename inner_type::in_v cc ;


    typedef typename base_type::in_nd_ele_v nd_ic_v ;

    typedef typename inner_type::in_nd_ele_v nd_rc_v ;


    const nd_ic_v & nd_c ( reinterpret_cast < const nd_ic_v & > ( c ) ) ;

    nd_rc_v & nd_cc ( reinterpret_cast < nd_rc_v & > ( cc ) ) ;


    // TODO: we might optimize in two ways:

    // if the grid data are contiguous, we can issue a gather,

    // and if the coordinates above dimension 0 are equal for all e,

    // we can assign a scalar to nd_cc[d] for d > 0.


    for ( int d = 0 ; d < dimension ; d++ )

      for ( int e = 0 ; e < vsize ; e++ )

        nd_cc[d][e] = grid[d][ nd_c[d][e] ] ;


    inner.eval ( cc , result ) ;

  }

} ;


/// generalized grid evaluation. The production of result values from

/// input values is done by an instance of grid_eval_functor, see above.

/// The template argument, ev_type, has to be a functor (usually this

/// will be a vspline::unary_functor). If the functor's in_type has

/// dim_in components, grid_spec must also contain dim_in 1D arrays,

/// since ev's input is put together by picking a value from each

/// of the arrays grid_spec points to. The result obviously has to have

/// as many dimensions.


template < typename functor_type >

void transform ( std::false_type , // functor is not a b-spline evaluator

                 grid_spec < functor_type::dim_in ,

                             typename functor_type::in_ele_type > & grid ,

                 const functor_type & functor ,

                 vigra::MultiArrayView < functor_type::dim_in ,

                                         typename functor_type::out_type >

                   & result  ,

                 int njobs = vspline::default_njobs ,

                 vspline::atomic < bool > * p_cancel = 0

                )

{

  grid_eval_functor < functor_type > gev ( grid , functor ) ;


  vspline::transform ( gev , result , njobs , p_cancel ) ;

}


} ; // namespace detail


/// This overload of 'transform' takes a vspline::grid_spec as it's

/// first parameter. This object defines a grid where each grid point

/// is made up from components taken from the per-axis grid values.

/// Let's say the grid_spec holds

///

/// { 1 , 2 } and { 3 , 4 },

///

/// then we get grid points

///

/// ( 1 , 3 ) , ( 2 , 3 ) , ( 1 , 4 ) , ( 2 , 4 )

///

/// The grid points are taken as input for the functor, and the results

/// are stored in 'result'. So with the simple example above, result

/// would have to be a 2D array and after the transform it would contain

///

/// { { f ( ( 1 , 3 ) ) , f ( ( 2 , 3 ) ) } ,

///

///   { f ( ( 1 , 4 ) ) , f ( ( 2 , 4 ) ) } }

///

/// The grid specification obviously takes up less space than the whole

/// grid would occupy, and if the functor is a b-spline evaluator,

/// additional performance gain is possible by precalculating the

/// evaluation weights. The distinction is made here by dispatching to

/// two specific overloads in namespace detail, which provide the

/// implementation. Note that only functors of type vspline::evaluator

/// will qualify as b-spline evaluators. The functors produced by the

/// factory functions 'make_evaluator' and 'make_safe_evaluator' will

/// *not* qualify; they are of vspline::grok_type and even though they

/// have a b-spline evaluator 'somewhere inside' this can't be accessed

/// due to the type erasure.

///

/// See the remarks on the two trailing parameters given with the two-array

/// overload of transform.


template < typename functor_type >

void transform ( grid_spec < functor_type::dim_in ,

                             typename functor_type::in_ele_type > & grid ,

                 const functor_type & functor ,

                 vigra::MultiArrayView < functor_type::dim_in ,

                                         typename functor_type::out_type >

                   & result  ,

                 int njobs ,

                 vspline::atomic < bool > * p_cancel ,

                 std::false_type ) // is not 1D

{

  // make sure the grid specification has enough coordinates


  for ( int d = 0 ; d < functor_type::dim_in ; d++ )

    assert ( grid[d].size() >= result.shape ( d ) ) ;


  using uses_spline_t = typename

    std::is_base_of < bspline_evaluator_tag , functor_type > :: type ;


  detail::transform ( uses_spline_t() , grid , functor , result ,

                      njobs , p_cancel ) ;

}


/// overload of grid-based transform for 1D grids. Obviously, this is

/// a case for using 'transform' directly taking the single set of

/// per-axis grid values as input, and this overload is here only for

/// completeness' sake - 'normal' user code will use an array-based

/// transform straight away.


template < typename functor_type >

void transform ( grid_spec < 1 ,

                             typename functor_type::in_ele_type > & grid ,

                 const functor_type & functor ,

                 vigra::MultiArrayView < 1 ,

                                         typename functor_type::out_type >

                   & result  ,

                 int njobs ,

                 vspline::atomic < bool > * p_cancel ,

                 std::true_type ) // is 1D

{

  assert ( grid[0].size() >= result.shape ( 0 ) ) ;

  vspline::transform ( functor , grid[0] , result , njobs , p_cancel ) ;

}


// this is the dispatch routine, differentiating between the 1D and the

// nd case. The two variants are above.


template < typename functor_type >

void transform ( grid_spec < functor_type::dim_in ,

                             typename functor_type::in_ele_type > & grid ,

                 const functor_type & functor ,

                 vigra::MultiArrayView < functor_type::dim_in ,

                                         typename functor_type::out_type >

                   & result  ,

                 int njobs = vspline::default_njobs ,

                 vspline::atomic < bool > * p_cancel = 0 )

{

  static const bool is_1d ( functor_type::dim_in == 1 ) ;


  transform ( grid , functor , result , njobs , p_cancel ,

              std::integral_constant < bool , is_1d > () ) ;

}


/// for backward compatibility, we provide 'grid_eval'. This is deprecated

/// and will eventually go away, new code should use 'transform' above.


template < typename functor_type >

void grid_eval ( grid_spec < functor_type::dim_in ,

                             typename functor_type::in_ele_type > & grid ,

                 const functor_type & functor ,

                 vigra::MultiArrayView < functor_type::dim_in ,

                                         typename functor_type::out_type >

                   & result  ,

                 int njobs = vspline::default_njobs ,

                 vspline::atomic < bool > * p_cancel = 0 )

{

  transform ( grid , functor , result , njobs , p_cancel ) ;

}


/// for backward compatibility, we provide 'gen_grid_eval'. This is deprecated

/// and will eventually go away, new code should use 'transform' above.


template < typename functor_type >

void gen_grid_eval ( grid_spec < functor_type::dim_in ,

                             typename functor_type::in_ele_type > & grid ,

                     const functor_type & functor ,

                     vigra::MultiArrayView < functor_type::dim_in ,

                                             typename functor_type::out_type >

                       & result  ,

                     int njobs = vspline::default_njobs ,

                     vspline::atomic < bool > * p_cancel = 0 )

{

  transform ( grid , functor , result , njobs , p_cancel ) ;

}


/// restore restores the original data from the b-spline coefficients.

/// This is done efficiently using a separable convolution, the kernel

/// is simply a unit-spaced sampling of the basis function.

/// Since the filter uses internal buffering, using this routine

/// in-place is safe - meaning that 'target' may be bspl.core itself.

/// math_type, the data type for performing the actual maths on the

/// buffered data, and the type the data are converted to when they

/// are placed into the buffer, can be chosen, but I could not detect

/// any real benefits from using anything but the default, which is to

/// leave the data in their 'native' type.

///

/// an alternative way to restore is running an index-based

/// transform with an evaluator for the spline. This is much

/// less efficient, but the effect is the same:

///

///   auto ev = vspline::make_evaluator ( bspl ) ;

///   vspline::transform ( ev , target ) ;

///

/// Note that vsize, the vectorization width, can be passed explicitly.

/// If Vc is in use and math_ele_type can be used with hardware

/// vectorization, the arithmetic will be done with Vc::SimdArrays

/// of the given size. Otherwise 'goading' will be used: the data are

/// presented in simd_type of vsize math_ele_type, hoping that the

/// compiler may autovectorize the operation.

///

/// 'math_ele_type', the type used for arithmetic inside the filter,

/// defaults to the vigra RealPromote type of value_type's elementary.

/// This ensures appropriate treatment of integral-valued splines.


// TODO hardcoded default vsize


template < unsigned int dimension ,

           typename value_type ,

           typename math_ele_type

             = typename vigra::NumericTraits

                        < typename vigra::ExpandElementResult

                                   < value_type > :: type

                        > :: RealPromote ,

           size_t vsize

             = vspline::vector_traits < math_ele_type > :: size

         >

void restore ( const vspline::bspline < value_type , dimension > & bspl ,

               vigra::MultiArrayView < dimension , value_type > & target )

{

  if ( target.shape() != bspl.core.shape() )

    throw shape_mismatch

     ( "restore: spline's core shape and target array shape must match" ) ;


  if ( bspl.spline_degree < 2 )

  {

    // we can handle the degree 0 and 1 cases very efficiently,

    // since we needn't apply a filter at all. This is an

    // optimization, the filter code would still perform

    // correctly without it.


    if ( (void*) ( bspl.core.data() ) != (void*) ( target.data() ) )

    {

      // operation is not in-place, copy data to target

      target = bspl.core ;

    }

    return ;

  }


  // first assemble the arguments for the filter


  int degree = bspl.spline_degree ;

  int headroom = degree / 2 ;

  int ksize = headroom * 2 + 1 ;


  // KFJ 2019-09-18 now using new convenience function 'get_kernel'


  vigra::MultiArray < 1 , xlf_type > kernel ( ksize ) ;

  get_kernel ( degree , kernel ) ;


//   xlf_type kernel [ ksize ] ;

//

//   // pick the precomputed basis function values for the kernel.

//   // Note how the values in precomputed_basis_function_values

//   // (see poles.h) are provided at half-unit steps, hence the

//   // index acrobatics.

//

//   for ( int k = - headroom ; k <= headroom ; k++ )

//   {

//     int pick = 2 * std::abs ( k ) ;

//     kernel [ k + headroom ]

//     = vspline_constants

//       ::precomputed_basis_function_values [ degree ]

//         [ pick ] ;

//   }


  // the arguments have to be passed one per axis. While most

  // arguments are the same throughout, the boundary conditions

  // may be different for each axis.


  std::vector < vspline::fir_filter_specs > vspecs ;


  for ( int axis = 0 ; axis < dimension ; axis++ )

  {

    vspecs.push_back

      ( vspline::fir_filter_specs

        ( bspl.bcv [ axis ] , ksize , headroom , kernel.data() ) ) ;

  }


  // KFJ 2018-05-08 with the automatic use of vectorization the

  // distinction whether math_ele_type is 'vectorizable' or not

  // is no longer needed: simdized_type will be a Vc::SimdArray

  // if possible, a vspline::simd_type otherwise.


  typedef typename vspline::convolve

                            < vspline::simdized_type ,

                              math_ele_type ,

                              vsize

                            > filter_type ;


  // now we have the filter's type, create an instance and

  // use it to affect the restoration of the original data.


  vspline::filter

  < value_type , value_type , dimension , filter_type >

  ( bspl.core , target , vspecs ) ;

}


/// overload of 'restore' writing the result of the operation back to

/// the array which is passed in. This looks like an in-place operation,

/// but the data are in fact moved to a buffer stripe by stripe, then

/// the arithmetic is done on the buffer and finally the buffer is

/// written back. This is repeated for each dimension of the array.


template < int dimension ,

           typename value_type ,

           typename math_ele_type

             = typename vigra::NumericTraits

                        < typename vigra::ExpandElementResult

                                   < value_type > :: type

                        > :: RealPromote ,

           size_t vsize = vspline::vector_traits<math_ele_type>::size >

void restore

  ( vspline::bspline < value_type , dimension > & bspl )

{

  restore < dimension , value_type , math_ele_type , vsize >

    ( bspl , bspl.core ) ;

}


} ; // end of namespace vspline


#endif // VSPLINE_TRANSFORM_H

coordinate_type
vigra::TinyVector< float, 2 > coordinate_type
Definition: ca_correct.cc:107

spline_type
vspline::bspline< pixel_type, 2 > spline_type
Definition: ca_correct.cc:111

convolve.h
separable convolution of nD arrays

rc_type
double rc_type
Definition: eval.cc:94

vsize
@ vsize
Definition: eval.cc:96

eval.h
code to evaluate uniform b-splines

multithread.h
code to distribute the processing of bulk data to several threads

vspline::detail::transform
void transform(std::true_type, grid_spec< functor_type::dim_in, typename functor_type::in_ele_type > &grid, const functor_type &functor, vigra::MultiArrayView< functor_type::dim_in, typename functor_type::out_type > &result, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0)
given the generator functor 'grev_generator' (above), performing the grid_eval becomes trivial: const...
Definition: transform.h:1117

vspline
Definition: basis.h:79

vspline::reduce
void reduce(const sink_functor_type &functor, const vigra::MultiArrayView< dimension, typename sink_functor_type::in_type > &input, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0)
reduce function processing by value. sink_functor_type is a functor taking values (which are loaded f...
Definition: transform.h:286

vspline::get_kernel
void get_kernel(const int &degree, vigra::MultiArrayView< 1, target_type > &kernel, const bool &odd=true)
this function deposits the reconstruction kernel in the array 'kernel'. This kernel can be used to co...
Definition: basis.h:589

vspline::transform
void transform(const unary_functor_type &functor, const vigra::MultiArrayView< dimension, typename unary_functor_type::in_type > &input, vigra::MultiArrayView< dimension, typename unary_functor_type::out_type > &output, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0)
implementation of two-array transform using wielding::coupled_wield.
Definition: transform.h:211

vspline::filter
void filter(const vigra::MultiArrayView< D, in_type > &input, vigra::MultiArrayView< D, out_type > &output, types ... args)
vspline::filter is the common entry point for filter operations in vspline. This routine does not yet...
Definition: filter.h:1495

vspline::ET
typename vigra::ExpandElementResult< T > ::type ET
using definition for the 'elementary type' of a type via vigra's ExpandElementResult mechanism....
Definition: common.h:110

vspline::apply
void apply(const unary_functor_type &ev, vigra::MultiArrayView< dimension, typename unary_functor_type::out_type > &output, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0)
we code 'apply' as a special variant of 'transform' where the output is also used as input,...
Definition: transform.h:531

vspline::restore
void restore(const vspline::bspline< value_type, dimension > &bspl, vigra::MultiArrayView< dimension, value_type > &target)
restore restores the original data from the b-spline coefficients. This is done efficiently using a s...
Definition: transform.h:1429

vspline::gen_grid_eval
void gen_grid_eval(grid_spec< functor_type::dim_in, typename functor_type::in_ele_type > &grid, const functor_type &functor, vigra::MultiArrayView< functor_type::dim_in, typename functor_type::out_type > &result, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0)
for backward compatibility, we provide 'gen_grid_eval'. This is deprecated and will eventually go awa...
Definition: transform.h:1376

vspline::default_njobs
const int default_njobs
Definition: multithread.h:220

vspline::bcv_type
vigra::TinyVector< vspline::bc_code, dimension > bcv_type
a type for a set of boundary condition codes, one per axis
Definition: transform.h:554

vspline::canonical_type
typename std::conditional< N==1, ET< T >, vigra::TinyVector< ET< T >, N > > ::type canonical_type
produce the 'canonical type' for a given type T. If T is single-channel, this will be the elementary ...
Definition: common.h:225

vspline::simdized_type
typename vector_traits< T, N > ::type simdized_type
this alias is used as a shorthand to pick the vectorized type for a given type T and a size N from 'v...
Definition: vector.h:459

vspline::remap
void remap(const vigra::MultiArrayView< cf_dimension, original_type > &input, const vigra::MultiArrayView< trg_dimension, coordinate_type > &coordinates, vigra::MultiArrayView< trg_dimension, result_type > &output, bcv_type< bcv_dimension > bcv=bcv_type< bcv_dimension >(MIRROR), int degree=3, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0)
Implementation of 'classic' remap, which directly takes an array of values and remaps it,...
Definition: transform.h:600

vspline::grid_eval
void grid_eval(grid_spec< functor_type::dim_in, typename functor_type::in_ele_type > &grid, const functor_type &functor, vigra::MultiArrayView< functor_type::dim_in, typename functor_type::out_type > &result, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0)
for backward compatibility, we provide 'grid_eval'. This is deprecated and will eventually go away,...
Definition: transform.h:1360

vspline::atomic
std::atomic< T > atomic
Definition: multithread.h:224

vspline::MIRROR
@ MIRROR
Definition: common.h:72

vspline::grid_spec
vigra::TinyVector< vigra::MultiArrayView< 1, rc_ele_type >, dimension > grid_spec
Definition: transform.h:701

wielding::generate_wield
void generate_wield(const functor_type functor, vigra::MultiArrayView< dimension, typename functor_type::out_type > &output, int njobs=vspline::default_njobs, vspline::atomic< bool > *p_cancel=0)
generate_wield uses a generator function to produce data. Inside vspline, this is used for grid_eval,...
Definition: wielding.h:2152

wielding::ic_type
int ic_type
Definition: interleave.h:98

poles.h
precalculated prefilter poles and basis function values

vspline::allocator_traits
vspline creates vigra::MultiArrays of vectorized types. As long as the vectorized types are Vc::SimdA...
Definition: common.h:267

vspline::bspline
class bspline now builds on class bspline_base, adding coefficient storage, while bspline_base provid...
Definition: bspline.h:499

vspline::bspline::core
view_type core
Definition: bspline.h:566

vspline::bspline::spline_degree
int spline_degree
Definition: bspline.h:588

vspline::bspline::prefilter
void prefilter(vspline::xlf_type boost=vspline::xlf_type(1), int njobs=vspline::default_njobs)
prefilter converts the knot point data in the 'core' area into b-spline coefficients....
Definition: bspline.h:815

vspline::bspline::bcv
const bcv_type bcv
Definition: bspline.h:212

vspline::convolve
class convolve provides the combination of class fir_filter above with a vector-friendly buffer....
Definition: convolve.h:387

vspline::detail::grev_generator
we need a 'generator functor' to implement grid_eval using the code in wielding.h....
Definition: transform.h:718

vspline::detail::grev_generator::current_pack
size_t current_pack
Definition: transform.h:786

vspline::detail::grev_generator::target_type
vigra::MultiArrayView< dimension, typename evaluator_type::trg_type > target_type
Definition: transform.h:739

vspline::detail::grev_generator::weight_v
vspline::vector_traits< weight_type, vsize >::type weight_v
Definition: transform.h:736

vspline::detail::grev_generator::weight
vigra::MultiArray< 2, weight_type > weight
Definition: transform.h:797

vspline::detail::grev_generator::vsize
static const size_t vsize
Definition: transform.h:722

vspline::detail::grev_generator::ofs_ele_v
vspline::vector_traits< int, vsize >::type ofs_ele_v
Definition: transform.h:743

vspline::detail::grev_generator::weight_type
evaluator_type::math_ele_type weight_type
Definition: transform.h:726

vspline::detail::grev_generator::grev_generator
grev_generator(const vspline::grid_spec< dimension, rc_type > &_grid, const evaluator_type &_itp, size_t _axis, shape_type _shape)
Definition: transform.h:872

vspline::detail::grev_generator::fetch_weights
void fetch_weights()
Definition: transform.h:854

vspline::detail::grev_generator::rc_type
evaluator_type::rc_ele_type rc_type
Definition: transform.h:737

vspline::detail::grev_generator::socket_offset
int socket_offset
Definition: transform.h:756

vspline::detail::grev_generator::reset
void reset(shape_type crd, size_t _aggregates)
Definition: transform.h:818

vspline::detail::grev_generator::allocator_t
typename vspline::allocator_traits< weight_v > ::type allocator_t
Definition: transform.h:802

vspline::detail::grev_generator::dimension
static const size_t dimension
Definition: transform.h:721

vspline::detail::grev_generator::shape_type
vigra::TinyVector< size_t, dimension > shape_type
Definition: transform.h:740

vspline::detail::grev_generator::tail_crd
shape_type tail_crd
Definition: transform.h:793

vspline::detail::grev_generator::memory_guard
void * memory_guard
Definition: transform.h:764

vspline::detail::grev_generator::out_ele_v
evaluator_type::out_ele_v out_ele_v
Definition: transform.h:732

vspline::detail::grev_generator::inner_type
evaluator_type::inner_type inner_type
Definition: transform.h:725

vspline::detail::grev_generator::vweight
vigra::MultiArray< 2, weight_v, allocator_t > vweight
Definition: transform.h:804

vspline::detail::grev_generator::eval
void eval(out_v &result)
Definition: transform.h:968

vspline::detail::grev_generator::shuffled_weights
weight_type * shuffled_weights
Definition: transform.h:777

vspline::detail::grev_generator::w_pack_sz
const size_t w_pack_sz
Definition: transform.h:754

vspline::detail::grev_generator::axis
const size_t axis
Definition: transform.h:751

vspline::detail::grev_generator::grid_spec_type
vspline::grid_spec< dimension, rc_type > grid_spec_type
Definition: transform.h:745

vspline::detail::grev_generator::out_ele_type
evaluator_type::out_ele_type out_ele_type
Definition: transform.h:729

vspline::detail::grev_generator::out_v
evaluator_type::out_v out_v
Definition: transform.h:731

vspline::detail::grev_generator::~grev_generator
~grev_generator()
Definition: transform.h:1083

vspline::detail::grev_generator::eval
void eval(out_type &result)
Definition: transform.h:1013

vspline::detail::grev_generator::eval
void eval(vigra::MultiArrayView< 1, out_type > &result)
Definition: transform.h:1065

vspline::detail::grev_generator::out_type
evaluator_type::out_type out_type
Definition: transform.h:728

vspline::detail::grev_generator::out_nd_ele_type
evaluator_type::out_nd_ele_type out_nd_ele_type
Definition: transform.h:730

vspline::detail::grev_generator::shape
const shape_type shape
Definition: transform.h:748

vspline::detail::grev_generator::eval
void eval(vigra::MultiArrayView< 1, out_ele_v > &vresult, vigra::MultiArrayView< 1, out_type > &leftover)
Definition: transform.h:1039

vspline::detail::grev_generator::aggregates
const size_t aggregates
Definition: transform.h:752

vspline::detail::grev_generator::grid_ofs
int * grid_ofs[dimension]
Definition: transform.h:781

vspline::detail::grev_generator::out_nd_ele_v
evaluator_type::out_nd_ele_v out_nd_ele_v
Definition: transform.h:733

vspline::detail::grev_generator::grid
const grid_spec_type & grid
Definition: transform.h:749

vspline::detail::grev_generator::evaluator_type
ET evaluator_type
Definition: transform.h:719

vspline::detail::grev_generator::channels
static const size_t channels
Definition: transform.h:723

vspline::detail::grev_generator::itp
const evaluator_type & itp
Definition: transform.h:750

vspline::detail::grev_generator::spline_order
const int spline_order
Definition: transform.h:747

vspline::detail::grev_generator::rest
const size_t rest
Definition: transform.h:753

vspline::detail::grev_generator::grid_weight
weight_type * grid_weight[dimension]
Definition: transform.h:770

vspline::detail::grid_eval_functor
grid_eval_functor is used for 'generalized' grid evaluation, where the functor passed in is not a bsp...
Definition: transform.h:1154

vspline::detail::grid_eval_functor::vsize
@ vsize
Definition: transform.h:1157

vspline::detail::grid_eval_functor::grid_eval_functor
grid_eval_functor(grid_spec_t &_grid, const inner_type &_inner)
Definition: transform.h:1180

vspline::detail::grid_eval_functor::eval
void eval(const in_type &c, out_type &result) const
Definition: transform.h:1186

vspline::detail::grid_eval_functor::rc_type
inner_type::in_ele_type rc_type
Definition: transform.h:1165

vspline::detail::grid_eval_functor::in_type
vspline::canonical_type< ic_type, dimension > in_type
Definition: transform.h:1161

vspline::detail::grid_eval_functor::dimension
@ dimension
Definition: transform.h:1158

vspline::detail::grid_eval_functor::base_type
vspline::unary_functor< in_type, out_type, vsize > base_type
Definition: transform.h:1168

vspline::detail::grid_eval_functor::grid
grid_spec_t grid
Definition: transform.h:1178

vspline::detail::grid_eval_functor::grid_spec_t
grid_spec< inner_type::dim_in, typename inner_type::in_ele_type > grid_spec_t
Definition: transform.h:1176

vspline::detail::grid_eval_functor::inner
const inner_type inner
Definition: transform.h:1173

vspline::detail::grid_eval_functor::inner_type
_inner_type inner_type
Definition: transform.h:1155

vspline::detail::grid_eval_functor::eval
void eval(const typename base_type::in_v &c, typename base_type::out_v &result) const
Definition: transform.h:1207

vspline::detail::grid_eval_functor::out_type
inner_type::out_type out_type
Definition: transform.h:1163

vspline::fir_filter_specs
fir_filter_specs holds the parameters for a filter performing a convolution along a single axis....
Definition: convolve.h:93

vspline::shape_mismatch
shape mismatch is the exception which is thrown if the shapes of an input array and an output array d...
Definition: common.h:297

vspline::unary_functor
class unary_functor provides a functor object which offers a system of types for concrete unary funct...
Definition: unary_functor.h:190

vspline::unary_functor::in_nd_ele_type
vigra::TinyVector< in_ele_type, dim_in > in_nd_ele_type
Definition: unary_functor.h:224

vspline::unary_functor::in_nd_ele_v
vector_traits< IN, vsize >::nd_ele_v in_nd_ele_v
Definition: unary_functor.h:247

vspline::unary_functor::in_v
vector_traits< IN, vsize >::type in_v
vectorized in_type and out_type. vspline::vector_traits supplies these types so that multidimensional...
Definition: unary_functor.h:254

vspline::unary_functor::out_v
vector_traits< OUT, vsize >::type out_v
Definition: unary_functor.h:255

vspline::vector_traits
with the definition of 'simd_traits', we can proceed to implement 'vector_traits': struct vector_trai...
Definition: vector.h:344

wielding::vs_adapter
vs_adapter wraps a vspline::unary_functor to produce a functor which is compatible with the wielding ...
Definition: wielding.h:1960

wielding::vs_sink_adapter
same procedure for a vspline::sink_type
Definition: wielding.h:2002

wielding.h
Implementation of vspline::transform.