Intel's OpenCV is a great computer vision library with high-quality implementations of the most common algorithms in the field. In our project we will be using its implementation of the CONDENSATION[1] algorithm.
CONDENSATION stands for "Conditional Density Propagation" and uses particle filter techniques to track objects; the density here is the probability distribution of the object's location. Previously the famous Kalman filter was used for tracking, but because it is based on Gaussian (unimodal) densities it proves inadequate in many cases, for example when there are several possible objects to track simultaneously. Even with a single object to track, a cluttered background can provide false alternative hypotheses. Another drawback of the Kalman filter is that it estimates the state of a linear dynamic system, which cannot be assumed in general: the motion of a bouncing ball, for example, stops being linear the moment it hits the floor.
Another advantage of the CONDENSATION algorithm is that it is much simpler than the Kalman filter. It is based on factored sampling, a method that approximates a weighted (posterior) density by drawing samples from a prior and weighting them with the observation, applied iteratively. Factored sampling consists of the following steps:
1. Generate a sample set {s_1, ..., s_N} from a prior density p(x).
2. Choose a sample s_n, with index n in the range {1, ..., N}, with probability pi_n, where pi_n is a weight calculated from the observation, normalised by the total weight sum.
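The two steps above can be sketched in a few lines of numpy. This is a minimal 1-D illustration, not the OpenCV implementation: the prior, the observation value and the likelihood function are all made-up assumptions chosen so the effect is visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D example: a broad Gaussian prior p(x), and an
# observation likelihood p(z|x) that peaks at the true position (2.0).
N = 1000
samples = rng.normal(loc=0.0, scale=3.0, size=N)  # step 1: sample the prior

def likelihood(x, z=2.0, sigma=0.5):
    # Unnormalised observation density p(z|x); names are illustrative.
    return np.exp(-0.5 * ((x - z) / sigma) ** 2)

weights = likelihood(samples)
weights /= weights.sum()          # normalise to obtain pi_n

# Step 2: choose samples with probability pi_n (repeats allowed).
resampled = rng.choice(samples, size=N, p=weights)

# The resampled set concentrates near the observation.
print(resampled.mean())
```

The resampled set is a fair (weighted) representation of the posterior even though it was generated from the prior, which is exactly what each CONDENSATION iteration exploits.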
The CONDENSATION algorithm applies factored sampling iteratively to the successive frames of the image sequence in which we want to track our object. Each iteration is an execution of the factored sampling algorithm using as prior density the weighted sample set of the previous iteration; the algorithm therefore has to be started with an initial prior density. The prior density at time t is denoted p(x_t | z_{t-1}).
Given the old N-sample set {s^(n)_{t-1}, pi^(n)_{t-1}, c^(n)_{t-1}, n = 1, ..., N} at time t-1, where c^(n) is the cumulative weight, a new sample set is constructed for time t. Each new sample is generated as follows:
1. We select a sample by:
(a) ...
(b) ...
(c) ...
The cumulative probability is used for efficiency. Since we select a number from the uniform distribution, all numbers have the same probability of being drawn; equivalently, we can take equally spaced values from the range [total_confidence/num_samples, total_confidence]. Samples with greater weight span a greater range of the cumulative distribution, so elements with higher probability are likely to be chosen several times, while others with lower probability might not be chosen at all.
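The cumulative-weight selection can be sketched as follows. The weights are invented for illustration; the point is only that high-weight samples cover wider intervals of the cumulative array and are therefore picked more often.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative weights for an old sample set of N = 5 (not normalised).
weights = np.array([0.1, 0.4, 0.05, 0.3, 0.15])
cumulative = np.cumsum(weights)   # c^(n) in the text
total = cumulative[-1]            # total confidence

# One uniform draw per new sample, located in the cumulative array by
# binary search.  A sample with a larger weight covers a wider interval
# of [0, total], so it is selected more often, while a low-weight sample
# may not be selected at all.  (The equally spaced variant described
# above replaces the draws with total/N, 2*total/N, ..., total.)
u = rng.uniform(0.0, total, size=1000)
picked = np.searchsorted(cumulative, u)

counts = np.bincount(picked, minlength=5)
print(counts / counts.sum())  # roughly proportional to the weights
```

The binary search is why keeping the cumulative array is efficient: each selection costs O(log N) instead of a linear scan over the weights.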
2. Prediction stage. This stage consists of two processes: drift and diffusion. The first, the drift, is a deterministic process which predicts where the object will be in the next iteration according to its motion dynamics. The second, the diffusion, is a stochastic process which adds randomness to the density so as to model prediction and measurement noise, and effectively separates the elements that were chosen several times.
s^(n)_t = A s'^(n)_t + B w^(n)_t, where s'^(n)_t is the sample selected in step 1, A is the object dynamics matrix, B scales the process noise and w^(n)_t is a vector of independent samples from the standard Gaussian distribution.
At the end of this stage the new sample has been generated by prediction and its weight in the new density has to be measured.
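A sketch of the prediction stage, assuming a constant-velocity model; the matrices A and B are illustrative values, not taken from the post or from OpenCV.

```python
import numpy as np

rng = np.random.default_rng(2)

# Constant-velocity model: state = [position, velocity].
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # drift: deterministic dynamics
B = np.array([[0.5, 0.0],
              [0.0, 0.1]])   # diffusion: process-noise scale

# Suppose selection picked the same high-weight sample 100 times.
selected = np.tile([0.0, 1.0], (100, 1))

w = rng.standard_normal(selected.shape)   # w^(n)_t, standard Gaussian
predicted = selected @ A.T + w @ B.T      # s^(n)_t = A s'^(n)_t + B w^(n)_t

# Drift moves every copy to position ~1.0; diffusion spreads the
# identical copies apart so they no longer coincide.
print(predicted[:, 0].mean(), predicted[:, 0].std())
```

Without the diffusion term the 100 identical copies would remain one point and the sample set would collapse, which is why the stochastic stage matters even when the dynamics are well known.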
3. Measurement stage. In this stage the observation comes into play. The location of the sample is considered and its weight is calculated according to the observation using a defined weight function.
When defining this weight function it should be noted that a weight of 0 must never be assigned: a sample with weight 0 spans no range in the cumulative distribution, so it can never be selected and effectively disappears from the set, and we lose randomness when generating samples.
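One possible weight function is sketched below: a Gaussian in the distance between the sample's location and the observation, plus a small floor so that no sample ever receives weight exactly 0. The function shape, names and floor value are assumptions for illustration, not the post's (or OpenCV's) definition.

```python
import numpy as np

FLOOR = 1e-6  # assumed small constant; keeps every weight strictly > 0

def weight(sample_pos, observed_pos, sigma=2.0):
    # Gaussian of the squared distance between the sample and the
    # observation, with a floor so that far-away samples survive.
    d2 = np.sum((np.asarray(sample_pos) - np.asarray(observed_pos)) ** 2)
    return np.exp(-0.5 * d2 / sigma**2) + FLOOR

print(weight([0.0, 0.0], [0.0, 0.0]))    # near the observation: ~1
print(weight([50.0, 50.0], [0.0, 0.0]))  # far away: floor keeps it > 0
```

The floor is what prevents the "disappearing sample" problem described above: even a sample far from every observation keeps a nonzero slice of the cumulative distribution.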
[1] Isard, M. and Blake, A. 1998. CONDENSATION -- conditional density propagation for visual tracking, Int. J. Computer Vision, 29, 1, 5--28.
Tuesday, 20 October 2009