Bayesian Filtering is a probabilistic technique for data fusion. The technique combines a concise mathematical formulation of a system with observations of that system. Probabilities are used to represent the state of a system, likelihood functions to represent their relationships. In this form Bayesian inference can be applied and further related probabilities deduced. See Wikipedia for information on Probability theory, Bayes theorem, Bayesian Inference.
For discrete systems the Bayesian formulation results in a naturally iterative data fusion solution. For dynamic systems there is a class of solutions, discrete filters, that combine observed outputs of the system with the system's dynamic model. An estimator computes a estimate of the systems state with each observation of the system. Linear estimators such as the Kalman Filter are commonly applied.
Bayes++ is an open source library of C++ classes. These classes represent and implement a wide variety of numerical algorithms for Bayesian Filtering of discrete systems. The classes provide tested and consistent numerical methods and the class hierarchy explicitly represents the variety of filtering algorithms and system model types
The following documentation summaries the classes available and provides some general comments on their use. Each class's .hpp file provides a complete interface specification and the numerical form of the solution. Implementation issues are documented in the .cpp files.
There is a wide range of different numerical techniques for filtering. For example the Kalman filter (a linear estimator which calculates the 2^{nd} order statistics of the system) is represented using either a state vector and a covariance matrix, or alternatively in information form.
Each numerical technique is a Scheme, and each Scheme is implemented as a class. Each scheme implements the algorithms required by a filter's particular numerical form. The schemes provide a common interface that allows:
Each Scheme has both advantages and disadvantages. The numerical complexity and efficiency varies, as does the numerical stability. The table below lists all Schemes together with any requirement on the representation associated with the algorithm.
Scheme |
Formulation and |
Representation Requirements |
Covariance_schemeExtended Kalman Filter (EKF) |
Classic 2nd order filter with state mean and covariance representation. Non-linear models require a gradient linearisation (Jacobian). Uses the common innovation update formulation. |
Innovation covariance invertible |
Unscented_schemeKalman filter using unscented non-linear approximations |
Unscented non-linear transformations replace the Jacobians used in the EKF- reducing linearisation errors in all cases. |
Innovation covariance invertible |
Iterated_covariance_schemeModified EKF update |
Kalman filter for highly non-linear observation models. Observation update is iterated using an Euler approximation. |
State, observation and innovation covariance invertible |
Information_schemeExtended Information Filter (EIF) |
Information (inverse covariance) filter. Invertible linear model allows direct information form
prediction. |
State, observation and innovation covariance invertible |
Information_root_schemeSquare root information filter (SRIF) |
Numerically stable solution using factorization of inverse covariance as R'R |
Invertible prediction model. Prediction and observation covariance invertible |
UD_schemeSquare root covariance filter |
Numerically stable solution using UdU' factorization of covariance. |
Innovation covariance invertible. Linearized observation requires uncorrelated noise |
CI_scheme |
CI is interesting as it provides a weaker but more robust fusion then traditional covariance based method such as the Kalman filter. It estimates state and an upper bound of what its covariance could be. |
No solution if covariances don't intersect. |
SIR_schemeSequential Importance Re-sampling filter |
A bootstrap representation of the state distribution – no linear or Gaussian assumptions required. Uncorrelated linear roughening of samples. |
Bootstrap samples collapse to a degenerate from with fewer samples then states. |
SIR_kalman_schemeSequential Importance Re-sampling filter |
A bootstrap representation of the state distribution – no linear or Gaussian assumptions required. Additional computes state mean and covariance. Correlated linear roughening of samples. |
As above. Correlation non-computable from degenerate samples. |
This is very simple example for those who have never used the Bayesian Filtering Classes before. The example shows how two classes are built. The first is the prediction model, the second the observation model. In this example they represent a simple linear problem with only one state variable and constant model noises.
The example then constructs a filter. The Unscented filter scheme is chosen to illustrate how this works, even on a simple linear problem. After construction the filter is given the problem’s initial conditions. A prediction using the predefine prediction model is then made. This prediction is then fused with an external observation given by the example and the defined observation model. At each stage, the filter’s state estimate and variance estimate are printed.
This example solves a simple position and velocity observation problem using the Bayesian filter classes. The system has two states, position and velocity, and a noisy position observation is simulated. Two variants:
are demonstrated using a specified numerical scheme. The example shows how to build linear prediction and observation models and to cycle the filter to produce state estimates.
This example implements a “Quadratic Calibration” filter. The observer is trying to estimate the state of a system as well as the systems scale factor and bias. A simple system is simulated where the state is affected by a known perturbation.
The example shows how to build linearized prediction and observation model and cycle a filter to produce state estimates.
The aim of Bayes++ is to provide a powerful and extensible structure for Bayesian filtering. This is achieved by hierarchical composition of related objects. In C++ these are represented by a hierarchy of polymorphic classes. Numerical Schemes are grouped by their state representation. The Schemes themselves provided the filtering operations on these representations.
To complete a filter, an associated prediction and observation model must also be specified. These model classes parameterize the filtering operations. The users responsibility is to chose a model type and provide the numerical algorithm for that model. Again model classes composed using a hierarchy of polymorphic abstract classes to represents the structure of each model type.
This dual abstraction, separating numerical schemes from model structure, provides a great deal of flexibility. In particular it allows a well-specified model to be used with a variety of numerical schemes.
The table below lists a few of the key abstract filter classes upon which filter Schemes are build. The C++ class hierarchy documentation gives the complete structure.
Bayes_filter |
Base class for everything |
Likelihood_filter |
Represents only the Bayesian Likelihood of a state observation |
Functional_filter |
Represents only the filter prediction by a simple functional (non-stochastic) model |
State_filter |
Represents only the filter state and an update on that state |
Kalman_state_filter |
Kalman representation of state statistics. |
Information_state_filter |
Information representation of state statistics. |
Linrz_filter |
Model interface for a linear or gradient linearized Kalman
filters. |
Sample_filter |
Discreate reprensation of state statistic. |
These two tables show some of the classes in the hierarchy upon which models are built.
Sampled_predict_model |
Sampled stochastic prediction model |
Functional_predict_model |
Functional (non-stochastic) prediction model |
Additive_predict_model |
Additive Gaussian noise prediction model. |
Linrz_predict_model |
Linearized prediction model with Jacobian of non-linear functional part |
Linear_predict_model |
Linear prediction model |
Linear_invertable_predict_model |
Linear prediction model which is invertible |
Likelihood_observe_model |
Likelihood observation model |
Functional_observe_model |
Functional (non-stochastic) observation model |
Linrz_uncorrelated_observe_model |
Linearized observation model with Jacobian of non-linear functional part and additive uncorrelated noise |
Linrz_correlated_observe_model |
as above, but with additive correlated noise |
Bayes rule is usually defined in term of Probability Density Functions. However PDFs never appear in Bayes++. They are always represented by their statistics.
This is for good reason, there is very little that can be done algorithmically with a function. However the sufficient statistics, given the assumptions of a filter, can be easily manipulated to implement Bayes rule. Each filter class is derived from a base classes that represent the statistics used. For example the Kalman_filter and Sample_filter base classes.
It would be possible to use a common abstract base class for that enforces the implementation of a PDF function; a function that maps state to probability. This would provide a very weak method to view the PDF of state. However such a function could not be efficiently implemented by all schemes. Therefore to enforce this requirement in the base class would be highly restrictive.
Many of the filters use classical linear estimation techniques, such as the Kalman filter. To make them useful they are applied in modified forms to cope with linearized models of some kind. Commonly a gradient linearized model is used to update uncertainty, while the state is directly propagated through the non-linear model. This is the Extended form used by the Extended Kalman Filter.
However, some numerical schemes cannot be modified using the Extended form. In particular, it is not always possible to use the extended form with correlated noises. Where this is the case, the linearized model is used for uncertainty and state.
There are also many Bayesian filters that work with non-Gaussian noise and non-linear models - such as the SIR_filter. The SIR scheme has been built so it works with Likelihood model.
The filters support discontinuous models such as those depending on angles. In this case the model must be specifically formulated to normalize the states. However, some schemes need to rely on an additional normalization function. Normalization works well for functions that are locally continuous after normalization (such as angles). Non-linear functions that cannot be made into locally continuous models are not appropriate for linear filters.
Where possible, the Schemes have been designed to all have the same interface, providing for easy interchange. However, this is not possible in all cases as the mathematical methods vary greatly. For efficiency some schemes also implement additional functions. The functions can be use to avoid inefficiencies where the general form is not required.
Scheme class constructors are irregular (their parameter list varies) and must be used with care. Each numerical scheme has different requirements, and the representation size needs to be parameterized. A template class Filter_scheme can be used to provide a generic constructor interface. It provides provides specialization for all the Schemes so they can be constructed with a common parameter list.
The design of the class hierarchy is deliberately open. Many of the variables associated with schemes are exposed as public members. For example the covariance filter’s innovation covariance is public. This is to allow efficient algorithms to be implemented using the classes. In particular it is often the case that subsequent computations reuse the values that have already been computed by the numerical schemes. Each scheme defines a public state representation.
Furthermore many temporaries are protected members to allow derived classes to modify a scheme without requiring any additional overhead to allocate its own temporaries.
Open interfaces are potentially hazardous. The danger is that abuse could result in unnecessary dependencies on particular implementation characteristics.
The two functions init and update are defined in the filter class hierarchy for classes with names ending _state_filter. They are very important in allowing the public state representation to be managed.
After the public representation of a scheme is changed (externally) the filter should be initialized with an init function. The scheme may define additional init functions that allow it to be initialized from alternative representations. init() is defined for schemes derived from Kalman_state_filter.
Before the public representation of a scheme is used the filter should be updated with an update function. The scheme may define additional update functions that allow it to update alternative representations. update() is defined for schemes derived from Kalman_state_filter.
The filter hierarchy has been specifically designed to allow state representation to be shared. Schemes state representations are inherited using one or more _state_filter 's as virtual base classes. It is therefore possible to combine multiple schemes (using multiple inheritance) and only a single copy of each state representation will exist.
The init and update functions should be used to coordinate between the schemes and state representation. This allows precise control numerical conversions necessary for different schemes to share the state.
Filter classes can be assigned when they are of the same size. In cases where the class includes members in addition to the public representation this is optimized so that only public representation is assigned. Assignment is equivalent to: update, assignment of public representation and initialization from the new state.
Copy Constructors are NOT defined. Generally the classes are expensive to copy and so copies should be avoided. Instead references or (smart) pointers should be combined with assignment to create copies if necessary.
The classes assume their state representation is a constant size. They define their matrix sizes on construction. Matrices (and vectors!) in public representation should NOT be externally resized.
Since matrix resizing invariable involves reallocation, it is just as efficient to create new matrices as to resize them. Also for a filter scheme to change it size it must recomputed its internal states. Therefore if the size of filter needs to change, the solution is to create a new filter. The new filter can then be initialized from the state of the old filter plus some new states. What you do with these new states usually influences the state representation you choose. Generally new states either enter the system with nothing being know about them (zero information), or extract value being know (zero covariance).
It is important to be able to rely on the numerical results of Bayes++. Generally the Schemes are implemented using the most numerically stable approach available in the literature. Bayes++ provides its own UdU' factorization functions to factorize PSD positive (semi)definite matrices. The factorization is numerically stable, computing and checking the conditioning of matrices.
Exceptions are thrown in the case of numerical failure. They
follow a simple rule:
Bayes_filter_exception
is thrown if an algorithm cannot continue or guarantee the numerical
post-conditions of a function
The only exception guarantee that Bayes++ makes when throwing Bayes_filter_exception from any function, is that no resources will be leaked. That is the Weak guarantee as defined by Reference[9]. Therefore, unless otherwise specified, the numeric value of a class's public state is undefined after this exception is thrown by a member of a class.
Be aware that numerical post-conditions may be met even with extreme input values. For example some inputs may result in overflow of a result. This does not invalidate the post-conditions that the result is positive. Even some Not a Number values may be valid. Therefore the results may be computed without exception, but include NaN or non-number values.
What does this mean when using a filter Scheme? If the Scheme throws an exception then, unless otherwise specified, the numeric value of its state is undefined. Therefore you cannot use any function of the filter that has any preconditions on its numerical state. To continue using a filter either the init() function should be used (which has no pre-conditions) or the filter destructed.
Generally to access the public state of a Scheme you must first call the update() function. If no exception is thrown then the post-conditions of update are guaranteed. The post-conditions of update() may include such useful properties as the covariance matrix is PSD. Many application specific tests may also be required. Just being PSD doesn't say much about X. It could even have Infinity values on the diagonal! Conditioning, trace, determinate, and variance bounds can all be applied at his point.
Why are models and filter Schemes polymorphic ? OR Why are they implemented with virtual functions?
One alternative would be to use generic instead of polymorphic Schemes. This would implement Schemes as templates. If the sizes of matrix operations could be fixed generic models and schemes would directly manipulate matrices for predefined size.
Bayes++ relies on a dual abstraction, separating numerical schemes from model structure. To represent these, a polymorphic design was a very important part of Bayes++ design. After a fair bit of use I still think it was the correct one. Why?
A) Usage : There are many (run time) polymorphic things I
really want to do with the library. I want to be able to build
composite filtering algorithms that switch models and schemes at run
time. This requires both polymorphic filters and models and
runtime sizing of matrices.
A generic approach, especially one
parametrized on matrix size would not allow this. The code size
produced would also bloat massively. Not even STL parameterizes its
containers template with size!
B) Type safety: There is surprising little orthogonality
in filtering schemes. The numerics often dictate restrictions or
additional functionality. The type hierarchy in Bayes++ provides a
succinct and safe way to enforce much of this structure.
In a
Generic approach checking that a particular Scheme models a generic
concept correct would be very difficult to enforce.
C) Compiler problems: Generic programming in C++ has a nasty syntax, is very slow to compile and pushes the limits of compiler reliability.
D) Efficiency: The run time overhead of a polymorphic filter is negligible compared to the matrix algebra required. In fact using common code may increase efficiency on a many computing architectures.
Bayes++ requires an external library to represent matrices and vectors, and to support basic linear algebra operations. Bayes++ uses the uBLAS library from Boost for all these requirements. It an excellent basic linear algebra library. The interface and syntax are easy to use.
Previous versions of the filters were implemented with the Matrix Template Library, MPP and TNT. Public releases using MTL are still available.
The Bayes++ implementation uses a set of adapter classes in the namespace Bayesian_filter_matrix (FM for short) for matrix access. This system should make it simpler to port the library to other matrix types. For uBLAS these function generally have no runtime overhead. This efficiency is due to uBLAS's excellent expression template implementation. For MTL most functions are reasonably efficient but some functions create matrix temporaries to return values.
Most of the filters avoid using Matrix and Vector temporaries. They also have optimizations for a constant (on construction) state size, noise and observations. In particularly the UD_filter scheme avoids creating any temporary objects. All matrices maintain a fixed size, other then when the observation state size is varied.
Why are temporary matrix objects bad? The main difficulty is construction and destruction of Matrices. This generally requires dynamic memory allocation which is very slow. For small matrices this dominates compared to the cost of simple operations such as addition and multiplication.
There are three ways out of this problem.
1. Use fixed size matrices; they can (nearly) always be efficiently allocated on the stack.
2. Use stack based dynamic allocators (such as C99's dynamic arrays or alloca) for temporaries.
3. Don't create temporaries. This is achievable with a combination of hard coding in the algorithms (pre-allocation) and by using a Matrix library with expression templates to avoid creating temporaries for return values.
Bayes++ is implemented to avoid creating temporaries. At present my solution is somewhat of a compromise. MTL is only moderately efficient on most C++ compilers. On release code this results in a 50% performance reduction if temporaries are avoided. uBLAS attains close to optimal efficiency in many situations.
The UD_filter is a good illustration. It has been hand crafted to avoid construction of any temporaries (I use it for embedding) and can be compiled with either uBLAS, MTL or a special fast matrix library. UD_filter is heavily optimized to use row operations to avoid indexing overheads. uBLAS and the special library achieve comparable results, which I believe are as fast as can be achieved.
On the flip side, the current Unscented_filter implementation shows some of the problems. Because of my wrapper classes Row/Column of a 2D matrix are not compatible with the Vec type. This results in a lot of unnecessary copying into pre-allocated temporary vectors.
Future work includes a general method of avoiding implementation temporaries. Filter schemes will be parameterised with helper objects that deal with all the temporaries required. These helper objects can then hide all the hard work of pre-allocation or dynamic allocation of temporary objects.
For numerical precision, all filter calculations use doubles. Templates could be used o provide numerical type parameterization but are not. However the base class Bayes_filter defines Float, which is used throughout as the numerical type.
The Bayes_filter base class is constructed with constant state size. This allows efficient use of the matrix library to implement the algorithms. Each derived class requires additional representation and matrix temporaries and where possible they are restricted to a constant size. In general, the only unknown is the size of the observation state as parameterized by the observation model.
Where the state size varies efficient solutions are still possible using either spares matrix representations or by recreating new filters with increased size.
The Bayesian Filtering Classes are now a stable set of solution for the majority of linear, linearisable and non-linear filtering problems. I am very happy with their form and function. In particular, the two-layer abstraction works extremely well. The classes show best practice in both their design and in efficiency and numerical stability of the algorithms used. So where to go from here?
The basic tenant of Bayesian theory is that the Likelihood function complete captures all that is know to update the state. The Bayesian Filtering classes now allow the modeling systems using Likelihood functions. At present the SIR filter is the only Likelihood filter. Sample filters will grow into a separate branch in the class hierarchy. A general set of adapters will be required to move between these varied representations.
To further improve the abstraction, the method of internal variable exposure needs to be regularized. This will require the addition of a few classes that hold and limit access to filters internal variables.
The ordering of parameters used in Scheme class constructors is prone to error. It requires the parameter list to be varied for each scheme used. An extensible method of model parameterization is required. A common parameterization could then be used with scheme constructor extracting the information they require.
Can the state representation be abstracted away from the numerics of a filter Scheme?
At the moment Bayes++ filter Schemes are a combination of the numerical solution (e.g. innovation update) AND the representation (e.g. Covariance matrix). Can these two functions be separated?
A highly sophisticated solution could use polymorphic (or generic) filter algorithms that depend on the type (or traits) of the representation. I think this in unnecessarily complex. In Bayes++ a Scheme should only implement one algorithm.
The problem I would like to address is a bit more limited. A lot
of representations are naturally hierarchical and dynamic.
e.g.
state := vehicle states & feature states, feature states :=
{feature1 state & feature2 state ....}
There would seem to be too possible ways to solve this
A) HARD: Use a state representation that represents this kind of hierarchy. The filter schemes could therefore use hierarchical numerical solutions that exploit the properties of the hierarchical state.
B) EASY: Allow a sparse matrix representation of state. If the sparse representation is a Graph then the sort of augmented state representation in the example can be easily built without any copy overhead. Each scheme would then just use sparse matrix techniques in its numerical solution. Mostly what is needed is Cholesky and QR decomposition and these have good sparse solutions. Obviously there would be a runtime overhead for this but it would be great for Simultaneous Location and Mapping problems!
Copyright (c) 2003,2004,2005 Michael Stevens. This document is part of the Bayes++ library - see Bayes++.html for copyright license details.
[1] "A New Approach for Filtering Nonlinear Systems", SJ Julier JK Uhlmann HF Durrant-Whyte, American Control Conference 1995
[2] "Factorisation Methods for Discrete Sequential Estimation", Gerald J. Bierman, ISBN 0-12-097350-2
[3] "Real time Kalman Filter Application", Mohinder S. Grewal, Angus P. Andrews, ISBN 0-13-211335-X
[4] "Extension of Square-Root Filtering to Include Process Noise", P. Dyer and S. McReynolds, Journal of Optimization Theory and Applications, Vol.3 No.6 1969
[5] "Novel approach to nonlinear-non-Guassian Bayesian state estimation", NJ Gordon, DJ Salmond, AFM Smith, IEE Proceeding-F, Vol.140 No.2 April 1993
[6] "Tracking and Data Association", Y Bar-Shalom and T Fortmann, Academic Press, ISBN 0-12-079760-7
[7] "Stochastic Models, Estimation, and Control", Peter S Maybeck, Academic Press, ISBN 0-12-480701-1
[8] "A non-divergent Estimation Algorithm in th Presence of Unknown Correlaltion", SJ Julier, JK Uhlmann, Proc. American Control Conference 6/1997
[9] "Exception-Safety in Generic Components", David Abrahams, Proc. of a Dagstuhl Seminar 'Generic Programming', Lecture Notes on Computer Science 1766, http://www.boost.org/more/error_handling.html