Friday, December 18, 2009

Further improving gesture recognition

So far the minimal bounding box has been used to check that the whole hand is inside the viewport, but this proved unreliable when the hand orientation was horizontal or vertical, as the box would be horizontal or vertical as well. Instead, the minimal bounding circle was used: only if it was completely inside the viewport would the gesture be considered recognised.
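A minimal sketch of this check, using OpenCV's C++ interface (which postdates the original project, so names and types here are illustrative):

#include <opencv2/opencv.hpp>
#include <vector>

// Returns true when the hand's minimum enclosing circle lies fully
// inside the frame, i.e. it does not cross any image border.
bool handInsideViewport(const std::vector<cv::Point>& handContour,
                        const cv::Size& frameSize)
{
    cv::Point2f centre;
    float radius = 0.f;
    cv::minEnclosingCircle(handContour, centre, radius);

    return centre.x - radius >= 0 &&
           centre.y - radius >= 0 &&
           centre.x + radius < frameSize.width &&
           centre.y + radius < frameSize.height;
}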

Aside from this, pointer movement could be enabled and disabled with gestures, which would make the pointer easier to control. Since the camera will quite probably not cover the whole screen, the user might otherwise find himself moving the hand in and out of the frame towards the desired direction in order to get the pointer to the desired location.


Wednesday, December 16, 2009

Improving gesture recognition: discarding invalid gestures

Several problems arose with our method of gesture recognition based on convexity defects with respect to the convex hull. We were able to discard some invalid convexity points based on the distance from the deepest point of the defect to the convex hull.

However, this approach still had a problem, which is shown in the next picture.

Poor segmentation can yield false positives, as is shown in the picture.

We did not find an easy way to tackle it, so in the end it was decided that this gesture would not be recognised. Instead, we decided that the only gesture with one valid convexity defect that would be accepted would be the thumb-up gesture, recognised in a rotation-invariant way.


As can be observed, this gesture has a characteristic width to height ratio, so gestures which do not meet a certain ratio threshold can be discarded. In particular, 1.6 was selected as the threshold for the ratio width/height or its inverse, whichever is greater than 1.
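A sketch of this ratio test (the comparison direction is our assumption, a thumb-up silhouette being elongated; the post does not spell it out):

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// Accepts a candidate gesture only if its bounding-box aspect ratio
// reaches the 1.6 threshold mentioned above.
bool aspectRatioOk(const std::vector<cv::Point>& handContour)
{
    cv::Rect box = cv::boundingRect(handContour);
    double w = box.width, h = box.height;
    // width/height or its inverse, whichever is greater than 1.
    double ratio = std::max(w, h) / std::min(w, h);
    return ratio >= 1.6; // assumption: elongated shapes pass
}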

Another criterion we used to validate gestures was to establish a minimum and a maximum number of sides that the gesture's polygonal approximation can have. With the polygonal approximation we have chosen, the start and end points of a convexity defect, the fingertips, can be either spikes or flat.

We chose the minimum number of sides to be 6, which we found to be the case for a closed fist. Then, for every convexity defect we can count 4 sides; invalid defects turn what would have been one side into two or more. Removing the sides already counted, we get the following formula:

min_sides = 6
max_sides = min_sides + (4 x valid_defects) - (valid_defects - 1) - invalid_defects
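In code, the check could look like this (a direct transcription of the formula; names are illustrative):

// Validates the number of sides of the polygonal approximation against
// the bounds derived above.
bool sideCountOk(int sides, int validDefects, int invalidDefects)
{
    const int minSides = 6; // observed for a closed fist
    int maxSides = minSides + 4 * validDefects
                   - (validDefects - 1) - invalidDefects;
    return sides >= minSides && sides <= maxSides;
}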

Though not perfect, these two simple methods allow us to discard many invalid gestures caused by poor segmentation, and thus make recognition more robust.

Possible improvements out of our scope

1. Other skin models, so as to cover a wider range of skin tones.
2. Motion recognition.
3. Expand the gesture set, making use of the fingertips and the angles between them.
4. Recognise gestures in movement - useful for gaming, for example.
5. Improve recognition in heavy clutter.
6. Improve performance.
7. Track multiple hands for enhanced functionality.
8. Improve hand presence detection.
9. Find a way to segment the hand and discard the forearm when no sleeves are worn.
10. Improve the dynamical model.
11. Find a more cost-effective segmentation method and enhance robustness to changes in lighting.
12. Enhance mouse motion, with acceleration for example.

TODO list

As of today, the bulk of the project can be considered finished; what remains is polishing what we already have in order to make it more robust.

The final aim is to have at least 2 gestures that can be solidly recognised. For that the following issues need to be addressed:

1. False holes in the perimeter causing false convexity defects to be recognised.
2. Detection of the correct convexity defects should be made more robust as well.
3. Think of a way to deal with borders.
4. Adaptive skin modelling.

Friday, December 11, 2009

Implementing the first gestures: left click

So far the segmentation process is not robust enough and lots of incorrect gestures are detected.

As a first test, we decided to implement the left button click, triggered when no convexity defects were detected (a closed fist): no convexity defects would be interpreted as 'left button down', and anything else as 'left button up'. TODO: check whether the button is already up/down and call the functions only when necessary (at the moment they are called whenever the aforementioned conditions are met).

Firstly, it was necessary to check that the hand was completely within the viewport. Only in that case were convexity defects detected and the left button functions triggered.

Gesture recognition was only allowed when the pointer was moving less than 4 pixels in either direction. This was necessary since the tracker is not completely precise.
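A sketch of this logic, including the state check from the TODO above; mouseLeftDown() and mouseLeftUp() are hypothetical stand-ins for the platform-specific mouse calls:

#include <opencv2/opencv.hpp>
#include <cstdlib>

void mouseLeftDown(); // hypothetical platform-specific calls,
void mouseLeftUp();   // assumed to be provided elsewhere

// Maps the defect count to left-button events, remembering the button
// state so the functions are only called on actual transitions.
void updateLeftButton(int convexityDefects, bool handInViewport,
                      const cv::Point& pointerDelta)
{
    static bool pressed = false;

    // Only recognise gestures when the hand is fully visible and the
    // pointer is nearly still (tracker jitter under 4 pixels).
    if (!handInViewport ||
        std::abs(pointerDelta.x) >= 4 || std::abs(pointerDelta.y) >= 4)
        return;

    bool fist = (convexityDefects == 0); // closed fist
    if (fist && !pressed) {
        mouseLeftDown();
        pressed = true;
    } else if (!fist && pressed) {
        mouseLeftUp();
        pressed = false;
    }
}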

Wednesday, December 9, 2009

Gesture recognition

In "Learning OpenCV" a method using histograms is suggested for basic gesture recognition. They suggested computing the histogram of the picture to detect the hand region, calculate the image gradient and then compute the histograms for the gesture.

This method, however, was not rotation invariant, and we wanted our recognition to be. A similar method, in the sense that it also counts skin pixels, was used instead. It consisted in calculating the difference between the area enclosed by the convex hull of the hand and that enclosed by its polygonal approximation; the resulting regions are the convexity defects.

Steps:
1. Contour detection of the hand.
2. Douglas-Peucker algorithm for polygonal approximation.
3. Convex hull computation.
4. Calculation of the convexity defects, keeping their deepest points.
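These steps map quite directly onto OpenCV. A sketch with the modern C++ interface (the project itself predates it, so treat this as illustrative):

#include <opencv2/opencv.hpp>
#include <vector>

// Runs steps 1-4 on a binary skin mask and returns the convexity
// defects of the largest contour. Each cv::Vec4i holds the start, end
// and deepest point indices plus the depth (fixed-point, scaled by 256).
std::vector<cv::Vec4i> handDefects(const cv::Mat& skinMask)
{
    std::vector<cv::Vec4i> defects;

    // 1. Contour detection: keep the largest contour as the hand.
    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(skinMask.clone(), contours,
                     cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    if (contours.empty())
        return defects;
    size_t hand = 0;
    for (size_t i = 1; i < contours.size(); ++i)
        if (cv::contourArea(contours[i]) > cv::contourArea(contours[hand]))
            hand = i;

    // 2. Douglas-Peucker polygonal approximation.
    std::vector<cv::Point> poly;
    cv::approxPolyDP(contours[hand], poly, 8.0 /* epsilon, tuneable */, true);

    // 3. Convex hull, as indices into the polygon (required by step 4).
    std::vector<int> hull;
    cv::convexHull(poly, hull, false, false);

    // 4. Convexity defects; their deepest points are kept for later.
    if (poly.size() > 3 && hull.size() > 2)
        cv::convexityDefects(poly, hull, defects);
    return defects;
}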

The problem with the convexity defects' deepest points, however, is that often there are points which are not of interest to us.

At first, the minimum bounding box was considered as a way to discard the unwanted points. The idea was to fix a point of the box, join it with the estimated hand location (approximately at the centre of the hand) and compute the angle with the defect points. The problem was that detecting the orientation of the hand, and thus fixing the point on the bounding box, was not so easy.

Another way was necessary, so the distance of the points to the convex hull was considered. From observing the results we had so far, we noticed that the points at the valleys between fingers were at a greater distance from the hull than the unwanted points. The maximum distance to the convex hull was calculated, and the points lying at less than 0.6 of this maximum distance were considered discardable, which generally worked fairly well. Important valleys were sometimes discarded, but those cases seemed to be related to poor segmentation.
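In code, the filter is a one-pass threshold on the defect depths (a sketch; defects as returned by cv::convexityDefects, with depths stored fixed-point, scaled by 256):

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// Keeps only the defects whose depth reaches 0.6 of the maximum depth;
// shallower ones are considered unwanted points.
std::vector<cv::Vec4i> filterDefects(const std::vector<cv::Vec4i>& defects)
{
    float maxDepth = 0.f;
    for (size_t i = 0; i < defects.size(); ++i)
        maxDepth = std::max(maxDepth, defects[i][3] / 256.f);

    std::vector<cv::Vec4i> kept;
    for (size_t i = 0; i < defects.size(); ++i)
        if (defects[i][3] / 256.f >= 0.6f * maxDepth)
            kept.push_back(defects[i]);
    return kept;
}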

Friday, December 4, 2009

Hand detection

Initially, when there is no hand, the tracker will be in a steady state with all particles randomly spread across the image, giving an estimate of the object's location at roughly the centre of the image. However, we cannot use this estimate because there is no actual hand. The problem we face is therefore detecting when the hand comes into the scene.

Using the median of densities

Let the probability densities be measured as the Mahalanobis distance from the colour of a pixel to the mean colour of a skin pixel. At first, the median of the probability densities at each pixel within an 8x8 window centred at the estimated location was used to determine whether a hand was in place. The reasoning behind this approach was that if the median was evaluated as 'skin', then most pixels within the window would be 'skin' and hence a hand was in place.
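A sketch of this test, assuming a precomputed float image skinDistance holding the Mahalanobis distance of each pixel's colour to the mean skin colour (smaller means more skin-like); the threshold is illustrative:

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// Returns true when the median skin distance inside the 8x8 window
// centred at the estimated location is below the skin threshold.
bool handPresent(const cv::Mat& skinDistance /* CV_32F */,
                 const cv::Point& estimate, float skinThreshold)
{
    cv::Rect window(estimate.x - 4, estimate.y - 4, 8, 8);
    window &= cv::Rect(0, 0, skinDistance.cols, skinDistance.rows);

    std::vector<float> values;
    for (int y = window.y; y < window.y + window.height; ++y)
        for (int x = window.x; x < window.x + window.width; ++x)
            values.push_back(skinDistance.at<float>(y, x));
    if (values.empty())
        return false;

    // If the median pixel is skin-like, most of the window is.
    std::nth_element(values.begin(), values.begin() + values.size() / 2,
                     values.end());
    return values[values.size() / 2] < skinThreshold;
}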

This method worked intermittently, since the tracker was not able to follow the hand precisely enough when it moved at varying speeds and directions. A moving hand would cause the window's median to change drastically whenever the estimate fell close to an edge of the hand.

Using the standard deviation

Another method we tried used the standard deviation of the particles' positions, under the assumption that if it fell below a certain threshold then a hand had been detected.

This method proved very robust, though it still had a weakness: if the noise in the image was not properly removed, the tracker could end up following the wrong object and thus mistakenly detect a hand.
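A sketch of this criterion (the Particle type and the threshold are illustrative):

#include <cmath>
#include <vector>

struct Particle { float x, y; };

// When the particle filter locks onto an object the particles cluster
// together, so a small positional standard deviation signals a hand.
bool particlesConverged(const std::vector<Particle>& particles,
                        float maxStdDev)
{
    if (particles.empty())
        return false;

    float mx = 0.f, my = 0.f;
    for (size_t i = 0; i < particles.size(); ++i) {
        mx += particles[i].x;
        my += particles[i].y;
    }
    mx /= particles.size();
    my /= particles.size();

    float var = 0.f;
    for (size_t i = 0; i < particles.size(); ++i) {
        float dx = particles[i].x - mx, dy = particles[i].y - my;
        var += dx * dx + dy * dy;
    }
    var /= particles.size();

    return std::sqrt(var) < maxStdDev;
}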

Motion detection

To start the tracker, basic motion detection of the hand could be used. However, one could argue that some automation and convenience would be lost.
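For completeness, a sketch of what such a trigger could look like, using simple frame differencing (thresholds are illustrative):

#include <opencv2/opencv.hpp>

// Returns true when enough pixels changed between two consecutive
// greyscale frames, which could be used to start the tracker.
bool motionDetected(const cv::Mat& prevGray, const cv::Mat& currGray,
                    int minChangedPixels = 500)
{
    cv::Mat diff, mask;
    cv::absdiff(prevGray, currGray, diff);
    cv::threshold(diff, mask, 30, 255, cv::THRESH_BINARY);
    return cv::countNonZero(mask) > minChangedPixels;
}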