Wednesday, 16 December 2009

Possible improvements outside the scope of this project

1. Additional skin models to cover other skin tones.
2. Motion recognition.
3. Expand the gesture set using the fingertips and the angles between them.
4. Recognise gestures in movement, useful for gaming, for example.
5. Improve recognition in heavy clutter.
6. Improve performance.
7. Track multiple hands for enhanced functionality.
8. Improve hand presence detection.
9. Find a way to segment the hand and discard the forearm when no sleeves are worn.
10. Improve the dynamical model.
11. Find a more cost-effective segmentation method and improve robustness to changes in lighting.
12. Enhance mouse motion, with acceleration for example.

TODO list

As of today, the bulk of the project can be considered finished; what remains is polishing what we already have to make it more robust.

The final aim is to have at least two gestures that can be reliably recognised. To get there, the following issues need to be addressed:

1. False holes in the perimeter cause false convexity defects to be detected.
2. Detection of genuine convexity defects should also be made more robust.
3. Think of a way to deal with the image borders.
4. Adaptive skin modelling.

Friday, 11 December 2009

Implementing the first gestures: left click

So far the segmentation process is not robust enough, and many incorrect gestures are detected.

As a first test, we decided to implement the left button click, which would be triggered when no convexity defects were detected (a closed fist): no convexity defects would be interpreted as 'left button down', anything else as 'left button up'. TODO: check whether the button is already up or down and call the functions only when the state changes (at the moment they are called on every frame in which the conditions are met).

Firstly, it was necessary to check that the hand was completely within the viewport. Only in that case were convexity defects detected and the left-button functions triggered.

Gesture recognition was only allowed while the pointer was moving by less than 4 pixels in either direction. This was necessary because the tracker is not perfectly precise.
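
A minimal sketch of this logic in Python, with placeholder functions standing in for the actual mouse calls (the function and variable names, update_left_click, mouse_left_down and mouse_left_up, are hypothetical; the 4-pixel threshold is the one mentioned above):

CLICK_MOVEMENT_THRESHOLD = 4   # pixels, as described above
left_button_down = False       # remember the state so the calls only fire on changes

def mouse_left_down():
    print("left button down")  # placeholder for the real mouse call

def mouse_left_up():
    print("left button up")    # placeholder for the real mouse call

def update_left_click(num_defects, hand_fully_visible, dx, dy):
    """Closed fist (no convexity defects) means button down, otherwise button up."""
    global left_button_down
    if not hand_fully_visible:
        return
    # Only recognise the gesture while the pointer is (almost) still.
    if abs(dx) >= CLICK_MOVEMENT_THRESHOLD or abs(dy) >= CLICK_MOVEMENT_THRESHOLD:
        return
    if num_defects == 0 and not left_button_down:
        mouse_left_down()
        left_button_down = True
    elif num_defects > 0 and left_button_down:
        mouse_left_up()
        left_button_down = False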

Wednesday, 9 December 2009

Gesture recognition

In "Learning OpenCV" a method using histograms is suggested for basic gesture recognition. They suggested computing the histogram of the picture to detect the hand region, calculate the image gradient and then compute the histograms for the gesture.

This method, however, is not rotation invariant, and we wanted that property. A similar method, in the sense that it also relies on counting skin pixels, was used instead. It consists of computing the difference between the area of the convex hull of the hand and that of its polygonal approximation; what remains are the convexity defects.

Steps (sketched in code below):
1. Detect the contour of the hand.
2. Apply the Douglas-Peucker algorithm for polygonal approximation.
3. Compute the convex hull.
4. Compute the convexity defects and keep their deepest points.
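
The project itself was built on the OpenCV of the time (C API), but as an illustration the four steps map quite directly onto today's Python bindings; the binary mask below is assumed to come from the segmentation stage:

import cv2

def deepest_defect_points(mask):
    """mask: binary image produced by the skin segmentation stage (assumed)."""
    # 1. Contour detection: keep the largest contour as the hand
    #    (return signature of OpenCV >= 4).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return []
    hand = max(contours, key=cv2.contourArea)

    # 2. Douglas-Peucker polygonal approximation.
    poly = cv2.approxPolyDP(hand, 0.01 * cv2.arcLength(hand, True), True)

    # 3. Convex hull, as indices into poly (required by convexityDefects).
    hull = cv2.convexHull(poly, returnPoints=False)

    # 4. Convexity defects: each row holds (start, end, farthest point, depth*256).
    defects = cv2.convexityDefects(poly, hull)
    if defects is None:
        return []
    return [tuple(poly[f][0]) for _, _, f, _ in defects[:, 0]]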

The problem with the deepest points of the convexity defects, however, is that there are often points which are not of interest to us.

At first, the minimum bounding box was considered as a way to discard the unwanted points. The idea was to fix a corner of the box, join it to the estimated hand location (approximately the centre of the hand) and compute the angle each point formed with this line. The problem was that it was not easy to determine the orientation of the hand, and therefore which corner of the bounding box to fix.

Another approach was needed, so the distance of the points to the convex hull was considered. Observing the results we had so far, we noticed that the points at the valleys between fingers lay further from the convex hull than the unwanted points. The maximum distance to the convex hull was calculated, and points at less than 0.6 of this maximum distance were discarded, which generally worked fairly well. Important valleys were sometimes discarded too, but this seemed to be related more to poor segmentation than to the heuristic itself.
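
A sketch of this filtering, assuming the poly and defects arrays from the previous sketch (cv2.convexityDefects already reports each defect's depth, i.e. the distance of its deepest point to the hull, as a fixed-point value):

def filter_defect_points(poly, defects, ratio=0.6):
    """Keep only defect points whose distance to the convex hull is at least
    `ratio` times the maximum such distance (the 0.6 heuristic above)."""
    if defects is None or len(defects) == 0:
        return []
    depths = defects[:, 0, 3] / 256.0            # fixed-point depths -> pixels
    keep = depths >= ratio * depths.max()
    return [tuple(poly[f][0]) for f in defects[keep, 0, 2]]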

Friday, 4 December 2009

Hand detection

Initially, when there is no hand, the tracker is in a steady state with all particles spread randomly across the image, giving an estimate of the object's location roughly at the centre of the image. However, we cannot use this estimate because there is no actual hand. The problem we face is therefore detecting when the hand comes into the scene.

Using the median of densities

Let the probability density at a pixel be measured as the Mahalanobis distance from its colour to the mean skin colour. At first, the median of these distances over an 8x8 window centred at the estimated location was used to decide whether a hand was present. The reasoning was that if the median pixel was classified as 'skin', then at least half of the pixels within the window were 'skin' and hence a hand was present.
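
A sketch of this check, assuming the skin model is given by a mean colour and an inverse covariance matrix in whatever colour space the segmentation uses; the default threshold of 3.5 is the value quoted in the mouse-interface entry further down, the rest is illustrative:

import numpy as np

def hand_present(frame, centre, mean_skin, inv_cov, window=8, threshold=3.5):
    """Median Mahalanobis distance over a window x window patch centred at the
    estimated hand location; a skin-like median means a hand is present."""
    x, y = centre
    half = window // 2
    patch = frame[max(y - half, 0):y + half, max(x - half, 0):x + half]
    diff = patch.reshape(-1, 3).astype(np.float64) - mean_skin
    # Mahalanobis distance of every pixel colour to the mean skin colour.
    dists = np.sqrt(np.einsum('ij,jk,ik->i', diff, inv_cov, diff))
    return np.median(dists) < threshold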

This method worked only intermittently, since the tracker could not follow the hand precisely enough when it moved at varying speeds and directions. Whenever the estimate fell close to an edge of the moving hand, the median of the window changed drastically.

Using the standard deviation

Another method tried was using the standard deviation of the particles, under the assumption that if it fell below a certain threshold the particles had clustered on an object and a hand was detected.

This method proved to be very robust, though it still had a weakness: if noise in the image was not properly removed, the tracker could end up following the wrong object and thus mistakenly detect a hand.
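
A minimal sketch of the test, assuming the particle filter exposes its particle positions as an array; the threshold value is illustrative, the post does not state it:

import numpy as np

def hand_detected(particles, max_std=15.0):
    """particles: (N, 2) array of particle positions in pixels. If the spread
    is small the particles have converged on something, presumably the hand.
    max_std is an illustrative value, not taken from the post."""
    std_x, std_y = particles.std(axis=0)
    return std_x < max_std and std_y < max_std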

Motion detection

To start the tracker, basic motion detection of the hand could be used, although one could argue that some automation and convenience would be lost.
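
For illustration, the kind of basic motion detection meant here could be as simple as frame differencing; the thresholds below are illustrative values:

import cv2

def motion_detected(prev_gray, gray, pixel_threshold=25, area_threshold=500):
    """Frame differencing on greyscale frames: if enough pixels changed,
    something moved and the tracker can be started."""
    diff = cv2.absdiff(prev_gray, gray)
    _, moving = cv2.threshold(diff, pixel_threshold, 255, cv2.THRESH_BINARY)
    return cv2.countNonZero(moving) > area_threshold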

Monday, 23 November 2009

Mouse interface

The second part of the project consists of creating a mouse interface. The first objective is to be able to move the mouse pointer with the movement of our tracked hand. Afterwards, the system has to be able to recognise basic hand gestures corresponding to mouse events or other functionality.

To achieve the first objective, the tracker has to be able to recognise when the hand is in sight. To do so, we find the median m of the Mahalanobis distances of the pixels in the window centred at the estimated location of the hand and accept the detection as a hand when m < 3.5.

Once the hand is detected, making the mouse pointer follow the movement of the hand is quite straightforward. The algorithm is described below:

Variables: hand_out_of_sight = true, current_position = undefined.

1. Detect hand.
2. If hand_detected and hand_out_of_sight = false, then move the pointer by (new_hand_position - current_position), making sure the result stays within the screen bounds.
3. If hand_detected, set current_position to the new position of the hand.
4. Set hand_out_of_sight according to step 1.

The movement of the pointer is therefore relative to the movement of the hand from the position where it was first detected. This way we avoid absolute positioning and hence sudden pointer jumps.
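
A sketch of this relative-positioning scheme, with a placeholder for the platform-specific call that actually moves the pointer and an illustrative screen size:

SCREEN_W, SCREEN_H = 1280, 800        # illustrative screen size

def set_mouse_position(x, y):
    print("pointer at", x, y)         # placeholder for the platform-specific call

class RelativePointer:
    def __init__(self):
        self.hand_out_of_sight = True
        self.last_hand = None
        self.mouse = (SCREEN_W // 2, SCREEN_H // 2)

    def update(self, hand_detected, hand_pos):
        if hand_detected:
            if not self.hand_out_of_sight:
                # Move the pointer by the hand displacement since the last frame,
                # clamping so it never leaves the screen.
                dx = hand_pos[0] - self.last_hand[0]
                dy = hand_pos[1] - self.last_hand[1]
                x = min(max(self.mouse[0] + dx, 0), SCREEN_W - 1)
                y = min(max(self.mouse[1] + dy, 0), SCREEN_H - 1)
                self.mouse = (x, y)
                set_mouse_position(x, y)
            self.last_hand = hand_pos
        self.hand_out_of_sight = not hand_detected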

Tuesday, 3 November 2009

Hand segmentation

Before tracking, we segmented the hand so that the measurement stage could be performed against a binary image. Speed was therefore a major concern, since the tracker had to run in real time.

Several algorithms were considered, especially those found to have the highest true-positive rates in the survey by Vezhnevets et al. [1].
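
The algorithm finally chosen is not described in this excerpt, so purely as an illustration of a pixel-based classifier producing the binary image the tracker measures against, here is the Mahalanobis-distance skin test used elsewhere in this log, applied per pixel (the colour space and the threshold are assumptions):

import numpy as np

def skin_mask(frame, mean_skin, inv_cov, max_distance=3.5):
    """Per-pixel Mahalanobis distance to the mean skin colour, thresholded to
    give the binary image the measurement stage works on."""
    diff = frame.reshape(-1, 3).astype(np.float64) - mean_skin
    dists = np.sqrt(np.einsum('ij,jk,ik->i', diff, inv_cov, diff))
    return (dists < max_distance).reshape(frame.shape[:2]).astype(np.uint8) * 255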

...

[1] Vezhnevets, V., Sazonov, V., Andreeva, A., 2003. A survey on pixel-based skin color detection techniques. In: Proceedings of GraphiCon 2003, pp. 85-92.