vspline 1.1.0
Generic C++11 Code for Uniform B-Splines
Cvspline::map_functor< nd_rc_type, _vsize, gate_types >::_map< level, dimension, nd_coordinate_type > | |
Cvspline::map_functor< nd_rc_type, _vsize, gate_types >::_map< 0, 1, coordinate_type > | |
Cvspline::map_functor< nd_rc_type, _vsize, gate_types >::_map< 0, dimension, nd_coordinate_type > | |
►Cstd::allocator | |
Cvspline::allocator_traits< T > | Vspline creates vigra::MultiArrays of vectorized types. As long as the vectorized types are Vc::SimdArray or vspline::simd_type, using std::allocator is fine, but when using other types, using a specific allocator may be necessary. Currently this is never the case, but I have the lookup of allocator type from this traits class in place if it should become necessary |
Cvspline::allocator_traits< hwy_simd_type< T, N > > | |
Cvspline::allocator_traits< vc_simd_type< T, N > > | |
Cstd::allocator_traits< vspline::hwy_simd_type< T, N > > | |
Cstd::allocator_traits< vspline::simd_type< T, N > > | |
►Cstd::array | |
Cvspline::basis_functor< math_type > | Basis_functor is an object producing the b-spline basis function value for given arguments, or optionally a derivative of the basis function. While basis_functor can produce single basis function values for single arguments, it can also produce a set of basis function values for a given 'delta'. This set is a unit-spaced sampling of the basis function sampled at n + delta for all n ∈ N. Such samplings are used to evaluate b-splines; they constitute the set of weights which have to be applied to a set of b-spline coefficients to form the weighted sum which is the spline's value at a given position |
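For illustration, a minimal sketch of how such a set of weights is applied - assuming the weights (the unit-spaced sampling of the basis function at offset 'delta') and the relevant window of coefficients have already been obtained; this is not vspline's actual evaluation code:

    #include <vector>
    #include <cstddef>

    // weighted sum over the support window: weights[i] multiplies the
    // coefficient at base + i; the result is the spline's value
    double weighted_sum ( const std::vector < double > & coeffs ,
                          const std::vector < double > & weights ,
                          std::size_t base )
    {
      double sum = 0.0 ;
      for ( std::size_t i = 0 ; i < weights.size() ; i++ )
        sum += weights [ i ] * coeffs [ base + i ] ;
      return sum ;
    }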
Cvspline::bf_grok_type< _delta_et, _target_et, _vsize > | If there are several differently-typed basis functors to be combined in a multi_bf_type object, we can erase their type, just like grok_type does for vspline::unary_functors. Grokking a basis functor may cost a little bit of performance, but it makes the code to handle multi_bf_types simple: instead of having to cope with several, potentially differently-typed per-axis functors, there is only one type - which may be a bf_grok_type if the need arises to put differently-typed basis functors into the multi_bf_type. With this mechanism, the code to build evaluators can be kept simple (handling only one uniform type of basis functor used for all axes) and still use different basis functors |
Cvspline::bracer< _dimension, _value_type > | Class bracer encodes the entire bracing process. Note that contrary to my initial implementation, class bracer is now used exclusively for populating the frame around a core area of data. It has no code to determine which size a brace/frame should have. This is now determined in class bspline, see especially class bspline's methods get_left_brace_size(), get_right_brace_size() and setup_metrics() |
►Cvspline::bspline_base< _dimension > | Struct bspline is the object in vspline holding b-spline coefficients. In a way, the b-spline 'is' its coefficients, since it is totally determined by them - while, of course, the 'actual' spline is an n-dimensional curve. So, even if this is a bit sloppy, I often refer to the coefficients as 'the spline', and have named struct bspline so, even though it just holds the coefficients |
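A sketch of typical use, assuming the interface shown in vspline's examples (a constructor taking shape and spline degree, the 'core' view and prefilter()); consult the library documentation for the exact signatures. After prefiltering, the object holds the coefficients rather than the original data:

    #include <vspline/vspline.h>
    #include <vigra/multi_array.hxx>

    void make_spline ( const vigra::MultiArrayView < 2 , float > & image )
    {
      vspline::bspline < float , 2 > bspl ( image.shape() , 3 ) ; // cubic spline
      bspl.core = image ;   // copy the knot point data into the spline's core
      bspl.prefilter() ;    // solve for the coefficients, in place
    }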
►Cvspline::bspline_evaluator_tag | Tag class used to identify all vspline::evaluator instantiations |
Cvspline::buffer_handling< _vtype, _dtype, _vsize > | Buffer_handling provides services needed for interfacing with a buffer of simdized/goading data. The init() routine receives two views: one to a buffer accepting incoming data, and one to a buffer providing results. Currently, all filters used in vspline operate in-place, but the two-argument form leaves room to manoeuvre. get() and put() receive 'bundle' arguments which are used to transfer incoming data to the view defined in in_window, and to transfer result data from the view defined in out_window back to target memory |
►Cvspline::buffer_handling< _vtype, _math_ele_type, _vsize > | |
►Cvspline::buffer_handling< _vtype, _math_ele_type, vspline::vector_traits< _math_ele_type >::size > | |
Cvspline::detail::build_ev< spline_type, rc_type, _vsize, math_ele_type, result_type > | Helper object to create a type-erased vspline::evaluator for a given bspline object. The evaluator is specialized to the spline's degree, so that degree-0 splines are evaluated with nearest neighbour interpolation, degree-1 splines with linear interpolation, and all other splines with general b-spline evaluation. The resulting vspline::evaluator is 'grokked' to erase its type to make it easier to handle on the receiving side: build_ev will always return a vspline::grok_type, not one of the several possible evaluators which it produces initially. Why the type erasure? Because a function can only return one distinct type. With specialization for degree-0, degree-1 and arbitrary spline degrees, there are three distinct types of evaluator to take care of. If they are to be returned as a common type, type erasure is the only way |
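A generic illustration of this reasoning, using std::function as the type-erasing wrapper (vspline's grok_type plays the analogous role, with a richer interface): three structurally different callables can only be returned from one function if they are first wrapped in a common type.

    #include <functional>

    // hypothetical stand-ins for the three evaluator variants
    std::function < double ( double ) > pick_evaluator ( int degree )
    {
      if ( degree == 0 )
        return [] ( double ) { return 1.0 ; } ;   // stand-in for NN evaluation
      if ( degree == 1 )
        return [] ( double x ) { return x ; } ;   // stand-in for linear evaluation
      return [] ( double x ) { return x * x ; } ; // stand-in for general evaluation
    }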
Cvspline::detail::build_safe_ev< level, spline_type, rc_type, _vsize, math_ele_type, result_type, gate_types > | Helper object to create a vspline::mapper object with gate types matching a bspline's boundary conditions and extents matching the spline's lower and upper limits. Please note that these limits depend on the boundary conditions and are not always simply 0 and N-1, as they are for, say, mirror boundary conditions. See lower_limit() and upper_limit() in vspline::bspline |
Cvspline::detail::build_safe_ev< -1, spline_type, rc_type, _vsize, math_ele_type, result_type, gate_types ... > | At level -1, there are no more axes to deal with, here the recursion ends and the actual mapper object is created. Specializing on the spline's degree (0, 1, or indeterminate), an evaluator is created and chained to the mapper object. The resulting functor is grokked to produce a uniform return type, which is returned to the caller |
Cvspline::bundle< dtype, vsize > | Class 'bundle' holds all information needed to access a set of vsize 1D subarrays of an nD array. This is the data structure we use to tell the buffering and unbuffering code which data we want it to put into the buffer or distribute back out. The buffer itself holds the data in compact form, ready for vector code to access them at maximum speed |
Cvspline::callable< derived_type, IN, OUT, vsize > | Mixin 'callable' is used with CRTP: it serves as additional base to unary functors which are meant to provide operator() and takes the derived class as its first template argument, followed by the argument types and vectorization width, so that the parameter and return type for operator() and - if vsize is greater than 1 - its vectorized overload can be produced. This formulation has the advantage of not having to rely on the 'out_type_of' mechanism I was using before and provides precisely the operator() overload(s) which are appropriate |
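A generic sketch of the CRTP idiom described above, with simplified, hypothetical names: because the mixin knows the derived type, it can provide an operator() with exactly the right argument and return types, delegating to the derived class' eval().

    // simplified CRTP mixin providing operator() on top of eval()
    template < class derived_t , class in_t , class out_t >
    struct callable_sketch
    {
      out_t operator() ( const in_t & in ) const
      {
        out_t out ;
        static_cast < const derived_t & > ( *this ) . eval ( in , out ) ;
        return out ;
      }
    } ;

    // a functor only defines eval(); operator() comes from the mixin
    struct twice_f : callable_sketch < twice_f , float , float >
    {
      void eval ( const float & in , float & out ) const { out = in + in ; }
    } ;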
►Cvspline::callable< amplify_type< _in_type, _in_type, _in_type, vspline::vector_traits< _in_type > ::vsize >, _in_type, _in_type, vspline::vector_traits< _in_type > ::vsize > | |
►Cvspline::callable< chain_type< T1, T2 >, T1::in_type, T2::out_type, T1::vsize > | |
►Cvspline::callable< domain_type< coordinate_type, _vsize >, coordinate_type, coordinate_type, _vsize > | |
►Cvspline::callable< domain_type< coordinate_type, vspline::vector_traits< coordinate_type >::vsize >, coordinate_type, coordinate_type, vspline::vector_traits< coordinate_type >::vsize > | |
►Cvspline::callable< evaluator< _coordinate_type, _trg_type, vspline::vector_traits< _trg_type > ::size, -1, default_math_type< _coordinate_type, _trg_type >, _trg_type, multi_bf_type< basis_functor< default_math_type< _coordinate_type, _trg_type > >, vigra::ExpandElementResult< _coordinate_type > ::size > >, _coordinate_type, _trg_type, vspline::vector_traits< _trg_type > ::size > | |
►Cvspline::callable< evaluator< coordinate_type, float, vspline::vector_traits< float > ::size, -1, default_math_type< coordinate_type, float >, float, multi_bf_type< basis_functor< default_math_type< coordinate_type, float > >, vigra::ExpandElementResult< coordinate_type > ::size > >, coordinate_type, float, vspline::vector_traits< float > ::size > | |
►Cvspline::callable< flip< _in_type, vspline::vector_traits< _in_type > ::vsize >, _in_type, _in_type, vspline::vector_traits< _in_type > ::vsize > | |
►Cvspline::callable< grok_type< IN, IN, vspline::vector_traits< IN > ::size >, IN, IN, vspline::vector_traits< IN > ::size > | |
►Cvspline::callable< grok_type< IN, OUT, 1 >, IN, OUT, 1 > | |
►Cvspline::callable< yield_type< crd_t, data_t, _vsize >, crd_t, data_t, _vsize > | |
Cwielding::coupled_aggregator< vsz, ic_type, functor_type, typename > | Aggregator for separate - possibly different - source and target. If source and target are in fact different, the inner functor will read data from source, process them and then write them to target. If source and target are the same, the operation will be in-place, but not explicitly so. vspline uses this style of two-argument functor, and this is the aggregator we use for vspline's array-based transforms. The code in this template will only be used for vectorized operation; if vectorization is not used, only the specialization for vsize == 1 below is used |
Cwielding::coupled_aggregator< 1, ic_type, functor_type > | Specialization for vsz == 1. Here the data are simply processed one by one in a loop, without vectorization |
Cvspline::extrapolator< buffer_type > | Struct extrapolator is a helper class providing extrapolated values for a 1D buffer indexed with possibly out-of-range indices. The extrapolated value is returned by value. Boundary conditions PERIODIC, MIRROR, REFLECT, NATURAL and CONSTANT are currently supported. An extrapolator is set up by passing the boundary condition code (see common.h) and a const reference to the 1D data set, coded as a 1D vigra::MultiArrayView. The view has to refer to valid data for the time the extrapolator is in use. Now the extrapolator object can be indexed with arbitrary indices, and it will return extrapolated values. The indexing is done with operator() rather than operator[] to mark the semantic difference. Note how buffers with size 1 are treated specially for some boundary conditions: here we simply return the value at index 0 |
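For illustration, a hypothetical sketch of the index folding a MIRROR boundary condition implies for a buffer of n values; this shows the concept only and is not vspline's code:

    #include <cstdlib>

    // fold an arbitrary index into [0, n-1] by mirroring; buffers of size 1
    // simply yield the value at index 0, as noted above
    long mirror_index ( long i , long n )
    {
      if ( n == 1 )
        return 0 ;
      long period = 2 * ( n - 1 ) ;          // length of one mirrored cycle
      long j = std::labs ( i ) % period ;    // fold into a single cycle
      return ( j < n ) ? j : period - j ;    // reflect the upper half
    }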
Cvspline::extrapolator< buffer_view_type > | |
►Cvspline::fir_filter_specs | Fir_filter_specs holds the parameters for a filter performing a convolution along a single axis. In vspline, the place where the specifications for a filter are fixed and the place where it is finally created are far apart: the filter is created in the separate worker threads. So this structure serves as a vehicle to transport the arguments. Note the specification of 'headroom': this allows for non-symmetrical and even-sized kernels. When applying the kernel to obtain output[i], the kernel is applied to input[i - headroom], ..., input[i - headroom + ksize - 1] |
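A direct transcription of the indexing rule quoted above into a small sketch (hypothetical helper, boundary handling omitted): with headroom h and kernel size ksize, output[i] is formed from input[i-h] ... input[i-h+ksize-1].

    // one output value of the convolution, following the rule above
    double fir_at ( const double * input , const double * kernel ,
                    int ksize , int headroom , int i )
    {
      double sum = 0.0 ;
      for ( int k = 0 ; k < ksize ; k++ )
        sum += kernel [ k ] * input [ i - headroom + k ] ;
      return sum ;
    }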
Cwielding::generate_aggregator< _vsize, ic_type, functor_type, typename > | Generate_aggregator is very similar to indexed_aggregator, but instead of managing and passing a coordinate to the functor, the functor now manages the argument side of the operation: it acts as a generator. To make this possible, the generator has to hold run-time modifiable state and can't be const like the functors used in the other aggregators, where the functors are 'pure' in a functional programming sense. A 'generator' functor to be used with this body of code is expected to behave in a certain fashion: |
Cwielding::generate_aggregator< 1, ic_type, functor_type > | Specialization for vsz == 1. Here the data are simply processed one by one in a loop, without vectorization |
Cvspline::detail::grev_generator< ET > | We need a 'generator functor' to implement grid_eval using the code in wielding.h. This functor precalculates the b-spline evaluation weights corresponding to the coordinates in the grid and stores them in vectorized format, to speed up their use as much as possible |
Cvspline::homogeneous_mbf_type< bf_type > | Homogeneous_mbf_type can be used for cases where all basis functors are the same. The evaluation code uses operator[] to pick the functor for each axis, so here we merely override operator[] to always yield a const reference to the same basis functor |
CHWY_NAMESPACE::hwy_simd_type< _value_type, _vsize > | |
►Cvspline::iir_filter_specs | Structure to hold specifications for an iir_filter object. This set of parameters has to be passed through from the calling code through the multithreading code to the worker threads where the filter objects are finally constructed. Rather than passing the parameters via some variadic mechanism, it's more concise and expressive to contain them in a structure and pass that around. The filter itself inherits its specification type, and if the code knows the handler's type, it can derive the spec type. This way the argument passing can be formalized, allowing for uniform handling of several different filter types with the same code. Here we have the concrete parameter set needed for b-spline prefiltering. We'll pass one set of 'specs' per axis; it contains: |
Cimage_base_type | |
Cimage_type< vsize > | |
Cimage_type< S > | |
Cwielding::indexed_aggregator< vsz, ic_type, functor_type, typename > | Indexed_aggregator receives the start coordinate and processing axis along with the data to process; this is meant for index-transforms. The coordinate is updated for every call to the 'inner' functor so that the inner functor has the current coordinate as input. The code in this template will only be used for vectorized operation; without vectorization, only the specialization for vsize == 1 below is used |
Cwielding::indexed_aggregator< 1, ic_type, functor_type > | Specialization for vsz == 1. Here the data are simply processed one by one in a loop, without vectorization |
Cwielding::indexed_reductor< vsz, ic_type, functor_type, typename > | Indexed_reductor is used for reductions and has no output. The actual reduction is handled by the functor: each thread has its own copy of the functor, which does its own part of the reduction and 'offloads' its result to some mutex-protected receptacle when it is destructed; see the 'reduce' functions in transform.h for a more detailed explanation and an example of such a functor. indexed_reductor processes discrete coordinates, whereas yield_reductor (the next class down) processes values. This variant works just like an indexed_aggregator, only that it produces no output - at least not for every coordinate fed to the functor; the functor itself holds state (the reduction) and is also responsible for offloading per-thread results when the worker threads terminate. This class holds a copy of the functor, and each thread has an instance of this class, ensuring that each worker thread can reduce its share of the workload independently |
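A hypothetical sketch of this reduction scheme: each worker thread owns a copy of the functor, accumulates into it privately, and the functor's destructor offloads the partial result into a shared, mutex-protected total when the thread's copy goes out of scope. The names are illustrative, not vspline's.

    #include <mutex>

    struct sum_collector
    {
      double total = 0.0 ;
      std::mutex guard ;
    } ;

    struct sum_functor
    {
      sum_collector & collector ;
      double partial = 0.0 ;

      explicit sum_functor ( sum_collector & c ) : collector ( c ) { }

      void operator() ( double v ) { partial += v ; }  // per-thread accumulation

      ~sum_functor()                                   // offload on destruction
      {
        std::lock_guard < std::mutex > lock ( collector.guard ) ;
        collector.total += partial ;
      }
    } ;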
Cwielding::indexed_reductor< 1, ic_type, functor_type > | Specialization for vsz == 1. Here the data are simply processed one by one in a loop, without vectorization |
Cvspline::detail::inner_evaluator< _ic_ele_type, _rc_ele_type, _ofs_ele_type, _cf_ele_type, _math_ele_type, _trg_ele_type, _dimension, _channels, _specialize, _mbf_type > | 'inner_evaluator' implements evaluation of a uniform b-spline, or some other spline-like construct relying on basis functions which can provide sets of weights for given deltas. While class evaluator (below, after namespace detail ends) provides objects derived from vspline::unary_functor which are meant to be used by user code, here we have a 'workhorse' object to which 'evaluator' delegates. This code 'rolls out' the per-axis weights the basis functor produces to the set of coefficients relevant to the current evaluation locus (the support window). We rely on a few constraints: |
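A conceptual 2D sketch (hypothetical, scalar, unoptimized) of 'rolling out' per-axis weights over the support window: the weight applied to each coefficient is the product of the per-axis weights, and the result is the sum of the weighted coefficients.

    #include <vector>
    #include <cstddef>

    // window: the coefficients in the support window, indexed [y][x]
    // wx, wy: per-axis weight sets produced by the basis functors
    double roll_out_2d ( const std::vector < std::vector < double > > & window ,
                         const std::vector < double > & wx ,
                         const std::vector < double > & wy )
    {
      double sum = 0.0 ;
      for ( std::size_t y = 0 ; y < wy.size() ; y++ )
        for ( std::size_t x = 0 ; x < wx.size() ; x++ )
          sum += wy [ y ] * wx [ x ] * window [ y ] [ x ] ;
      return sum ;
    }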
►Cinner_type | |
Cstd14::integer_sequence< T, Ints > | |
►Cstd14::integer_sequence< T, Is... > | |
►Cstd::invalid_argument | |
Cvspline::invalid_scalar< T, sz > | |
Cstd14::make_integer_sequence< T, N, Is > | |
CHWY_NAMESPACE::hwy_simd_type< _value_type, _vsize >::masked_type | |
Cvspline::simd_type< _value_type, _vsize >::masked_type | |
Cvspline::vc_simd_type< _value_type, _vsize >::masked_type | |
CHWY_NAMESPACE::mchunk_t< D, _vsize > | Mask type for hwy_simd_type. This is a type which holds a set of masks stored in uint8_t, as the highway mask storing function provides. So this type is memory-backed, just like hwy_simd_type. Template arguments are the corresponding hwy_simd_type's tag type and its lane count. highway is strict about which vectors and masks can interoperate, and only allows 'direct' interoperation if the types involved 'match' in size. Masks pertaining to vectors of differently-sized T aren't directly interoperable because they don't have the same lane count. One requires k masks of one type and k * 2 ^ i of the other. Here, we follow a different paradigm: The top-level objects we're dealing with have a fixed 'vsize', the number of lanes they hold. This should be a power of two. The paradigm is that objects with equal vsize should be interoperable, no matter what lane count the hardware vectors have which are used to implement their functionality. This makes user code simpler: users pick a vsize which they use for a body of code, all vector-like objects use the common vsize, and the implementation of the vector-like objects takes care of 'rolling out' the operations to hardware vectors. At times this produces what I call 'friction' - if the underlying hardware vectors and masks are not directly compatible, code is needed to interoperate them, and this code can at times be slow. So the recommendation for users is to avoid 'friction' by avoiding mixing differently-sized types, but with the given paradigm, this is a matter of performance tuning rather than imposing constraints on code structure. Some of the 'friction' might be mitigated by additional code using highway's up- and down-scaling routines, but for now the code rather uses 'goading' with small loops over the backing memory, relying on the compiler to handle this efficiently |
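For illustration, a minimal sketch of the 'goading' technique mentioned above (not vspline's code): the operation on a vsize-element object is written as a small fixed-size loop over backing memory, and the compiler is relied upon to turn it into hardware SIMD instructions.

    #include <cstddef>

    template < typename T , std::size_t vsize >
    struct goading_vec
    {
      T lane [ vsize ] ;  // backing memory for vsize lanes

      goading_vec & operator+= ( const goading_vec & rhs )
      {
        // small fixed-size loop the compiler can autovectorize
        for ( std::size_t i = 0 ; i < vsize ; i++ )
          lane [ i ] += rhs.lane [ i ] ;
        return *this ;
      }
    } ;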
Cmultitest< dim, tuple_type, ntypes > | |
Cmultitest< dim, tuple_type, 0 > | |
Cvspline::out_of_bounds | Out_of_bounds is thrown by mapping mode REJECT for out-of-bounds coordinates. This exception is left without a message; it only has a very specific application, and there it may be thrown often, so we don't want anything slowing it down |
Crandom_polynomial< dtype > | |
Cvspline::detail::separable_filter< input_array_type, output_array_type, stripe_handler_type > | Struct separable_filter is the central object used for 'wielding' filters. The filters themselves are defined as 1D operations, which is sufficient for a separable filter: the 1D operation is applied to each axis in turn. If the data themselves are 1D, this is inefficient if the run of data is very long: we'd end up with a single thread processing the data without vectorization. So for this special case, we use a bit of trickery: long runs of 1D data are folded up, processed as 2D (with multithreading and vectorization) and the result of this operation, which isn't correct everywhere, is 'mended' where it is wrong. If the data are nD, we process them by buffering chunks collinear to the processing axis and applying the 1D filter to these chunks. 'Chunks' isn't quite the right word to use here - what we're buffering are 'bundles' of 1D subarrays, where a bundle holds as many 1D subarrays as a SIMD vector is wide. This makes it possible to process the buffered data with vectorized code. While most of the time the buffering will simply copy data into and out of the buffer, we use a distinct data type for the buffer which makes sure that arithmetic can be performed in floating point and with sufficient precision to do the data justice. With this provision we can safely process arrays of integral type. Such data are 'promoted' to this type when they are buffered and converted to the result type afterwards. Of course there will be quantization errors if the data are converted to an integral result type; it's best to use a real result type. The type for arithmetic operations inside the filter is fixed via stripe_handler_type, which takes a template argument '_math_ele_type'. This way, the arithmetic type is distributed consistently. Also note that an integral target type will receive the data via a simple type conversion and not with saturation arithmetic. If this is an issue, filter to a real-typed target and process separately. A good way of using integral data is to have integral input and real-typed output. Promoting the integral data to a real type preserves them precisely, and the 'exact' result is then stored in floating point. With such a scheme, raw data (like image data, which are often 8 or 16 bit integers) can be 'sucked in' without need for previous conversion, producing filtered data in, say, float for further processing |
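A minimal 2D illustration of the separability idea (hypothetical helper, not vspline's interface, no buffering or multithreading): the same 1D filter is applied first along the rows, then along the columns; the nD result is the composition of the per-axis passes.

    #include <vector>
    #include <functional>
    #include <cstddef>

    using line_filter = std::function < void ( std::vector < double > & ) > ;

    void separable_2d ( std::vector < std::vector < double > > & data ,
                        const line_filter & filter_1d )
    {
      // axis 0: each row is a contiguous line
      for ( auto & row : data )
        filter_1d ( row ) ;

      // axis 1: gather each column into a line, filter, scatter back
      if ( data.empty() ) return ;
      std::vector < double > column ( data.size() ) ;
      for ( std::size_t x = 0 ; x < data [ 0 ] .size() ; x++ )
      {
        for ( std::size_t y = 0 ; y < data.size() ; y++ )
          column [ y ] = data [ y ] [ x ] ;
        filter_1d ( column ) ;
        for ( std::size_t y = 0 ; y < data.size() ; y++ )
          data [ y ] [ x ] = column [ y ] ;
      }
    }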
►Cstd::experimental::simd | |
Cvspline::simd_traits< T > | Traits class simd_traits provides three traits: |
►CVc::SimdArray | |
Cvspline::sink_functor_tag< _vsize > | While 'normal' unary_functors are all derived from unary_functor_tag, sink functors will be derived from sink_functor_tag |
►Cvspline::sink_functor_tag< vspline::vector_traits< IN > ::size > | |
►Csink_type | |
Ctest< dim, T > | |
Ctest< 0, T > | |
Cvspline_threadpool::thread_pool | |
►Cvspline::unary_functor_tag< _vsize > | We derive all vspline::unary_functors from this empty class, to have a common base type for all of them. This enables us to easily check if a type is a vspline::unary_functor without having to wrangle with unary_functor's template arguments |
►Cvspline::unary_functor_tag< vspline::vector_traits< coordinate_type > ::size > | |
►Cvspline::unary_functor_tag< vspline::vector_traits< double > ::size > | |
►Cvspline::unary_functor_tag< vspline::vector_traits< float > ::size > | |
►Cvspline::unary_functor_tag< vspline::vector_traits< IN > ::size > | |
Cvspline::vector_traits< T, _vsize, Enable > | With the definition of 'simd_traits', we can proceed to implement 'vector_traits': struct vector_traits is a traits class fixing the types used for vectorized code in vspline. These types go beyond mere vectors of fundamentals: most of the time, the data vspline has to process are not fundamentals, but what I call 'xel' data: pixels, voxels, stereo sound samples, etc. - so, small aggregates of a fundamental type. vector_traits defines how fundamentals and 'xel' data are to be vectorized. With the types defined by vector_traits, a system of type names is introduced which uses a set of patterns: |
Cvspline::vector_traits< T, _vsize, typename std::enable_if< vspline::is_element_expandable< T > ::value > ::type > | Specialization of vector_traits for 'element-expandable' types. These types are recognized by vigra's ExpandElementResult mechanism, resulting in the formation of a 'vectorized' version of the type. These data are what I call 'xel' data. As explained above, vectorization is horizontal, so if T is, say, a pixel of three floats, the type generated here will be a TinyVector of three vectors of vsize floats |
Cwielding::wield< dimension, in_type, out_type > | Reimplementation of wield using the new 'neutral' multithread. The workers now all receive the same task to process one line at a time until all lines are processed. This simplifies the code; the wield object directly calls 'multithread' in its operator(). And it improves performance, presumably because tail-end idling is reduced: all active threads have data to process until the last line has been picked up by an aggregator. So tail-end idling is on the order of a single line's worth of data, in contrast to half a worker's share of the data in the previous implementation. The current implementation does away with specialized partitioning code (at least for the time being); it looks like the performance is decent throughout, even without exploiting locality by partitioning to tiles |
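A hypothetical sketch of this line-at-a-time work sharing (vspline uses its own thread pool; this only illustrates the scheme): all workers draw line indices from a shared atomic counter until every line has been claimed, so no thread idles for longer than roughly one line's worth of work at the tail end.

    #include <atomic>
    #include <thread>
    #include <vector>
    #include <functional>

    void process_lines ( int nlines , int nthreads ,
                         const std::function < void ( int ) > & process_line )
    {
      std::atomic < int > next ( 0 ) ;
      std::vector < std::thread > pool ;
      for ( int t = 0 ; t < nthreads ; t++ )
        pool.emplace_back ( [&] ()
        {
          int line ;
          while ( ( line = next++ ) < nlines )  // claim the next unprocessed line
            process_line ( line ) ;
        } ) ;
      for ( auto & th : pool )
        th.join() ;
    }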
Cwielding::wield< 1, in_type, out_type > | |
Cwielding::yield_reductor< vsz, ic_type, functor_type, typename > | Aggregator to reduce arrays. This is like using indexed_reductor with a functor gathering from an array, but due to the use of 'bunch' this class is faster for certain array types, because it can use load/shuffle operations instead of always gathering |
Cwielding::yield_reductor< 1, ic_type, functor_type > | Specialization for vsz == 1. Here the data are simply processed one by one in a loop, without vectorization |
Cvspline::yield_type< crd_t, data_t, _vsize, enable > | At times we require reading access to an nD array at given coordinates, as a functor which, receiving the coordinates, produces the values from the array. In the scalar case, this is trivial: if the coordinate is integral, we have a simple indexed access, and if it is real, we can use std::round to produce a nearby discrete coordinate. But for the vectorized case we need a bit more effort: we need to translate the access with a vector of coordinates into a gather operation. We start out with a generalized template class 'yield_type': |
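For illustration, the scalar and 'vectorized' access described above in a simplified, hypothetical form: with a single real coordinate we round and index; with a set of coordinates we perform a gather, fetching one value per lane from the rounded coordinates.

    #include <cmath>
    #include <cstddef>

    // scalar case: round the real coordinate, then index
    double yield_scalar ( const double * array , double x )
    {
      return array [ ( std::ptrdiff_t ) std::round ( x ) ] ;
    }

    // vectorized case: one gather, fetching a value per lane
    template < std::size_t vsize >
    void yield_gather ( const double * array , const double ( & x ) [ vsize ] ,
                        double ( & result ) [ vsize ] )
    {
      for ( std::size_t i = 0 ; i < vsize ; i++ )
        result [ i ] = array [ ( std::ptrdiff_t ) std::round ( x [ i ] ) ] ;
    }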