Posts Tagged ‘virtual reality’
A quick peek behind the curtain: Position detection, “Where are you?” (Part 3)
Hi everyone, and Happy new year to you from 3DCalifornia!
First of all, we would like to thank you all for the tremendous year we had in 2010, and wish you the best for 2011! And because we love 3D, we made a small demo for you using our partner’s technology, D’Fusion, and it is available here. Feel free to try it and tell us what you think of it!
Let’s open our eyes
So we were in the middle of this series of articles about position detection, and this episode is meant to show how computer vision can be used for that. Here we go!
First things first, what is computer vision? We explained briefly in one of the previous episodes what light was and what colors were. Our eyes can perceive lights and colors, and that’s mainly what they do. Then they send the information to the brain where lights and colors (low level information) become distances, objects, faces (known, unknown), words, etc (high level information).
Computer vision is the domain that studies the algorithms a computer needs in order to see, and to see high level information: to recognize that 2 objects are identical in 2 different images, to recognize a bunch of pixels as a tree or a bike, to recognize a face in an image and to know that it’s someone’s face in particular. And, as this is the subject of this article, to determine the position of an object inside an image.
Two birds with one stone
So, why should we use computer vision for position detection? Because in most cases, we already have all the hardware up and running: we’re trying to do augmented reality, and for that, we need to add virtual objects to a live video stream. So in most cases, we already have a camera. This single piece of hardware lets us do both reality sampling and position detection. No need for an expensive magnetic device, no need for infrared lights. Just a camera and a computer.
A computer vision algorithm can be more or less complicated, but it will usually rely on one thing: the value of the pixels. We have a pixel. Its color, or brightness can be quantified. So if we replace all the pixels by their numerical value, we now have a grid of numbers, a matrix, and mathematics are really good at taking information out of matrices, so that’s all. We throw some maths at our digital image, and we get the info.
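To make the "grid of numbers" idea concrete, here is a minimal Python sketch. The 3x3 image and its pixel values are made up for illustration, and the brightness weights are the commonly used Rec. 601 luma coefficients, which is an assumption on our side rather than something a particular tracker is known to use.

```python
# A tiny 3x3 "image": each pixel is an (R, G, B) triple with values 0-255.
# These values are invented purely for illustration.
image = [
    [(0, 0, 0), (255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(10, 10, 10), (200, 200, 200), (90, 90, 90)],
]


def brightness(pixel):
    """Approximate brightness using the common Rec. 601 luma weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_gray_matrix(rgb_image):
    """Replace every pixel by a single number: the image becomes a matrix."""
    return [[round(brightness(p)) for p in row] for row in rgb_image]


def brightest_pixel(gray):
    """Return (row, column) of the largest value in the matrix."""
    coords = ((r, c) for r in range(len(gray)) for c in range(len(gray[0])))
    return max(coords, key=lambda rc: gray[rc[0]][rc[1]])


gray = to_gray_matrix(image)
for row in gray:
    print(row)
print("brightest pixel at", brightest_pixel(gray))
```

Once the image is a matrix like `gray`, "throwing some maths at it" simply means running computations such as `brightest_pixel` over the numbers.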
So what’s the battle plan? We have an incoming video stream and we want to output the coordinates of the object we are tracking, as fast as possible. There are multiple solutions, and most of them differ in 3 aspects: Assumptions, Learning and Running. In order to find your favorite teapot, you first have to know what a teapot is and what your favorite teapot looks like, and then you need to look everywhere and figure out if you can see it. The steps are the same here.
“The least questioned assumptions are often the most questionable”
The quote is from Paul Broca. The question is “what do we know about the object we want to search for?”. For example, if it is a building, we know that we will probably see a lot of horizontal and vertical lines. If we’re looking for an old augmented reality marker, we will see a thick black square with black and white squares inside. And with Total Immersion’s technology, Markerless Tracking (MLT), only a few assumptions are necessary: we only assume that we will be seeing an image that has no symmetry and that has some contrast in it. So first, we decide what kind of assumptions we keep. The stronger the assumptions, the easier the computation will be, but the more constraints they will create.
“The moment you stop learning, you stop leading”
This quote from Rick Warren will surely be a good introduction. Learning is the phase where we give the algorithm a way to tell the difference between any image that fits the assumptions and the image we’re specifically looking for. With MLT, for example, it is the step where we give the algorithm a clear view of the target we’ll be tracking. In this phase, which is usually not real time, we extract features (middle level information) from images, such as borders, corners, interest points or keypoints.
Keypoints are points that have a special property (usually a mathematical property), and this property is chosen to be stable, which means that when the object moves within your video stream, these properties will stay with the object and still be visible. During the learning phase, you learn how the keypoints are positioned relative to one another. And now you’re ready for the race.
Running… For president?
So this is it. Now, we have a user in front of the camera, and we need to get the position of a target he has in his hands. So what do we do? We use the algorithm we prepared. If we learned the positions of the corners, we’ll be analyzing the image, searching for corners. If we learned the keypoints, we’ll look for the keypoints. It’s easy to determine that the image we’re looking for is indeed in the video stream. The hard part is to determine where. That’s where some clever filtering and modeling algorithms like RANSAC come in. RANSAC (for RANdom SAmple Consensus) takes some data as input (let’s say the points in image A) and a model (let’s say a line) whose parameters we want to find (for a line, these would be the intercept and the slope, for example). It repeatedly picks a few points at random, fits the model to them, and counts how many of the other points agree with it; it keeps the model that the most points agree with and completely ignores the points that don’t fit. It then outputs the good points and the model that fits them (the blue line).
In exactly the same way, we give RANSAC all the keypoints found in the image and a model (the keypoints from the learning phase, whose unknown parameters are their position and orientation), and RANSAC gives us the model parameters (position and orientation) that fit our object’s keypoints, ignoring the background keypoints.
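The line example can be sketched in a few lines of Python. This is a minimal, illustrative version of the RANSAC idea for a line only, not the code of any particular tracking product; the function names, the number of iterations, the tolerance and the sample points are all assumptions chosen for the demo. A real tracker would fit a position-and-orientation model instead of a line.

```python
import random


def fit_line(p, q):
    """Line y = slope * x + intercept through two points (None if vertical)."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def ransac_line(points, iterations=200, tolerance=0.5, seed=0):
    """Return (slope, intercept, inliers) for the line most points agree with."""
    rng = random.Random(seed)  # fixed seed so the demo is repeatable
    best = (0.0, 0.0, [])
    for _ in range(iterations):
        # 1. Pick the minimum number of points needed to define the model.
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        slope, intercept = model
        # 2. Count the points that lie close enough to this candidate line.
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tolerance]
        # 3. Keep the candidate with the largest consensus.
        if len(inliers) > len(best[2]):
            best = (slope, intercept, inliers)
    return best


# Ten points exactly on y = 2x + 1, plus four points that do not belong.
line_points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
outliers = [(1.0, 9.0), (3.0, -4.0), (6.0, 20.0), (8.0, 2.0)]
slope, intercept, inliers = ransac_line(line_points + outliers)
print(slope, intercept, len(inliers))
```

The four stray points play the role of the background keypoints: they never gather enough support, so they are left out of the result.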
Conclusion
Phew, that was quite a trip, wasn’t it? We did it! We now have the position and orientation of our object in the video stream. And, even better: now that we know the position in this image, it will be even easier to find it in the next image, because we know it can’t have moved that much. We can make new assumptions, and new assumptions mean it’s easier to compute, which means it runs faster!
So those were a few selected ideas of how mathematics and its applications in computer vision can really make your life easier when you’re trying to augment reality. And this concludes this three-part Quick peek behind the curtain article. I hope you liked it! If you want me to tackle a specific subject on Augmented and Virtual technologies next time, feel free to drop a comment, a mail, or anything.
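Here is a small Python sketch of why knowing the previous position helps. It assumes a 640x480 frame and a hypothetical limit of 20 pixels of movement between two frames; both numbers, and the function names, are made up for illustration.

```python
def search_window(prev_x, prev_y, max_move, width, height):
    """Square region around the previous position, clamped to the image."""
    left = max(0, prev_x - max_move)
    top = max(0, prev_y - max_move)
    right = min(width - 1, prev_x + max_move)
    bottom = min(height - 1, prev_y + max_move)
    return left, top, right, bottom


def pixels_to_scan(window):
    """Number of pixels inside an inclusive (left, top, right, bottom) window."""
    left, top, right, bottom = window
    return (right - left + 1) * (bottom - top + 1)


full = pixels_to_scan((0, 0, 639, 479))  # scanning the whole frame
window = search_window(prev_x=320, prev_y=240, max_move=20, width=640, height=480)
local = pixels_to_scan(window)  # scanning only around the last position
print(full, local)
```

Under these assumptions the tracker looks at a tiny fraction of the frame, which is where the speed-up comes from.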
A quick peek behind the curtain: Position detection, “Where are you?” (Part 2)
And here we go again on our trip to understanding the coolest technologies for 3D and Augmented Reality. Today’s subject will be, again, tracking: we have so much to say that the last article was not nearly enough. First of all, I would like to thank you for the feedback on these articles. And now, let’s go!
Actually, three of the things I’m going to talk about today are present in a single small object, and I’m talking about this one:
I see Infrared waves
Did you know that the Nintendo Wiimote is equipped with one of the best infrared cameras you can find on the market for this price? You may wonder why there is an infrared camera, and maybe even what an infrared camera is… So let’s start from the beginning. The human eye can see colors. But what is a color? It’s an electromagnetic wave whose wavelength is between roughly 400 nanometers and 700 nanometers (1 nanometer is a billionth of a meter). Wavelength is linked to frequency: the shorter the wavelength, the higher the frequency. Red has the lowest frequency of all colors; the order of colors then follows the rainbow and finishes with violet.
Frequencies above violet can’t be seen by the human eye. The band just above violet is called ultraviolet, and beyond it come X-rays and gamma rays. Frequencies below red can’t be seen either. The band just below red is, as you have probably guessed, called infrared, and further down come the microwaves from your oven, Wi-Fi and cell phone waves, and the waves used to listen to the radio.
But if the eye can’t see infrared, some kinds of cameras can: to them, it is just another source of light. This kind of camera is mounted at the front of your Wiimote. What is it staring at? The small device Nintendo called the “Sensor bar”, which is not a logical name, as it has no sensors in it. The sensor bar simply carries infrared LEDs at each of its two ends. When the Wiimote sees them, it can measure the angle between them and the camera, and thus derive its position relative to the sensor bar.
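The link between wavelength and frequency is simply frequency = speed of light / wavelength. A quick Python sketch shows the ordering of violet, red and infrared; the 940 nm value is an assumed, typical wavelength for an infrared LED and is not taken from any Wiimote documentation.

```python
SPEED_OF_LIGHT = 299_792_458  # metres per second


def frequency_hz(wavelength_nm):
    """Frequency of an electromagnetic wave from its wavelength in nanometres."""
    return SPEED_OF_LIGHT / (wavelength_nm * 1e-9)


violet = frequency_hz(400)         # short-wavelength end of visible light
red = frequency_hz(700)            # long-wavelength end of visible light
near_infrared = frequency_hz(940)  # assumed wavelength of an infrared LED

for name, f in [("violet", violet), ("red", red), ("infrared", near_infrared)]:
    print(f"{name}: {f / 1e12:.0f} THz")
```

The longer the wavelength, the lower the frequency, so the infrared value comes out below red.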
A cool feature of infrared is that it is linked to heat. Whenever a body is warm, it emits infrared waves, and your body creates quite an amount of heat. The idea behind some multi-touch screens is similar: an infrared camera sits behind a flat surface and sees an infrared point wherever you touch it. Body heat is emitted at much longer wavelengths than the near-infrared a camera like the Wiimote’s detects, so DIY builds usually rely on an infrared LED (in a pen, for example) or on infrared light shone across the surface. The video below shows how to create a DIY touch screen with a normal computer, a Wiimote and a few things.
Move with style!
Let’s continue with 2 other sensors found in the Wiimote and its MotionPlus add-on: the accelerometer and the gyroscope.
Newton’s law of universal gravitation states that the Earth applies a force on everything that has a mass, pulling it toward our planet’s center; this force is called gravity. Newton’s second law states that a force creates an acceleration in the same direction. So if we measure the acceleration in 3 directions, we can find which one is “the direction to the center of the Earth”, which is usually called “down”. So all we need is an accelerometer, and that’s good because an accelerometer is easy to build. We simply use a piezoelectric material, like quartz, which generates a voltage when a force is applied. And that’s it. Measure the voltage, you have the force; measure the force in the 3 directions and you know where “down” is, so you know your orientation. Well, not exactly, because you are still missing some orientation information: turning around the vertical axis does not change where “down” is. And that’s where the gyroscope comes in handy. This small device is mainly a wheel that is rotating fast. Physics (the conservation of angular momentum) says that the faster the wheel turns, the more difficult it is to change its rotation axis, so you can have an object that keeps the same orientation even if the thing it’s attached to turns. With the gyroscope and the accelerometer, you can have your complete orientation.
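As a rough illustration of finding “down”, here is a Python sketch that turns a resting accelerometer reading into two tilt angles. The axis convention (a device lying flat reads (0, 0, 1)) and the function name are assumptions made for this example, and real devices differ in axis order and sign.

```python
import math


def tilt_from_gravity(ax, ay, az):
    """Pitch and roll in degrees from a resting accelerometer reading.

    Assumed convention: a device lying flat reads (0, 0, 1), in units of g.
    """
    pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
    roll = math.degrees(math.atan2(ay, az))
    return pitch, roll


print(tilt_from_gravity(0.0, 0.0, 1.0))   # lying flat
print(tilt_from_gravity(0.0, 1.0, 0.0))   # rolled a quarter turn onto its side
print(tilt_from_gravity(-1.0, 0.0, 0.0))  # pitched a quarter turn
```

Note that no third angle comes out of this: spinning the device around the vertical axis leaves the reading unchanged, which is exactly the missing information the gyroscope is there to provide.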
If you want to know more about the Wiimote, you can look here.
And if you want to see the power of a gyroscope, watch this:
So there will be a part 3, for Computer Vision, the coolest of the coolest technology out there (in my humblest opinion). Stay tuned! Goodbye!
A quick peek behind the curtain: Position detection, “Where are you?” (Part 1)
Ah, there you are! It’s time for another short article on the insides of augmented (and virtual) reality techniques. One of the big challenges is that we need to put our real camera and our virtual camera in the exact same position when trying to merge real and virtual objects. This means that our real camera must be able to communicate its coordinates relative to a reference object, which will have a virtual representation. So the point is: given a real world (let’s say, ours, for example), we need the coordinates of a moving object. How can we do this position detection, or tracking?
I, robot arm
The simplest solution is to attach the object to a robot arm equipped with angle or slide sensors. Moving the object will change the angles and distances measured, which can be detected by a computer that will update the virtual object’s position using the new values. It can even work both ways: if your robot arm is equipped with sensors AND motors, it can provide force feedback to prevent you from entering solid (virtual) objects. You can see an example below of a haptic (i.e. linked to the sense of touch) device. The main problem is the quite limited range: the arm is usually expensive, and building an arm that has more than 50 cm^3 of freedom of movement is not easy.
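To give an idea of how measured angles become a position, here is a minimal Python sketch for a flat (2D) arm. The segment lengths and the convention that each angle is measured relative to the previous segment are assumptions for the example; a real haptic arm works in 3D with more joints.

```python
import math


def arm_tip_position(segment_lengths, joint_angles_deg):
    """Tip (x, y) of a planar arm whose base sits at the origin.

    Each joint angle is measured relative to the previous segment.
    """
    x = y = 0.0
    heading = 0.0
    for length, angle in zip(segment_lengths, joint_angles_deg):
        heading += math.radians(angle)   # accumulate the joint rotations
        x += length * math.cos(heading)  # walk along the current segment
        y += length * math.sin(heading)
    return x, y


# Two segments of 30 cm and 20 cm (hypothetical sizes).
print(arm_tip_position([30, 20], [0, 0]))     # arm fully stretched out
print(arm_tip_position([30, 20], [90, -90]))  # first segment up, second forward
```

Every time the sensors report new angles, the computer reruns this kind of calculation and moves the virtual object accordingly.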
A whole new (magnetic) field
Another way to detect an object’s position is using a magnetic field. There are different ways to do it, but many of them are really similar. The plan is to generate a magnetic field with an electromagnet, and to measure the intensity of this field along the 3 directions of space: the closer to the source, the more intense. That gives us the distance from the source to the sensor in all 3 directions, and thus the coordinates, et voilà. How can we do it? This is a question of physics: if we make a small circuit that has a coil in it, then when the magnetic field through the coil changes, a current is induced in the circuit, and we can measure it. It’s the same principle that we use to generate electricity in power plants. But there is a problem. We need to know the shape of the magnetic field we generate, and it’s highly dependent on the environment we’re in, especially on the presence of metal objects. So this is a great system, but if someone brings a metallic chair to watch it, it will not work anymore. And how can I talk about magnetic tracking without mentioning another device that has been used for centuries: the good old compass, which uses the Earth’s magnetic field and a natural magnet to point North and help lost travelers. Well, even this old trick found its way to Augmented Reality:
Let’s throw our friends into outer space!
While we’re talking about compasses, we may also consider another (more recent) technology that can be used for tracking, and that has a huge range: the Global Positioning System, better known as GPS. Imagine. You’re on a road, but you don’t know where. You have a friend at the 30th mile of the road. If you know that you’re 10 miles away from him, then you’re either at the 20th mile or at the 40th. Now if you know you have another friend at the 50th mile, and you’re 10 miles away from him too, well, you know where you are: at the 40th mile. And you’ve just created a one-dimensional GPS.
For a three-dimensional one, you need 4 friends. So let’s say your friends are satellites revolving around the Earth, and their positions are known. If you can tell how far you are from each of them, then you can derive your own position. So your 4 satellite friends, who have very precise clocks in their electronic parts, each send a message containing the current time, and this message travels through space at the speed of light to get to the GPS device in your car. When you compare the time in the message you receive with the current time, there is a slight difference due to the travelling time, and knowing the speed of light, this time difference can be converted into a distance to the satellite. And thanks to your space friends, you now know the shortest path to the nearest grocery store!
GPS is used for most of the Augmented Reality features we can find in iPhone apps today: when an app points out a direction and a distance, it uses GPS. But there are some limitations to this principle. GPS is not very accurate on most devices, and even if you can get your position with an error of less than a meter, many Augmented Reality applications require something more like a centimeter or even a millimeter. And GPS does not give you your orientation, so even if you use a compass, you will still need other information to get everything we need.
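The one-dimensional road example and the “time difference becomes a distance” step can both be sketched in Python. The function names and the 0.07 second delay are made up for illustration; a real receiver also has to deal with clock errors, which this sketch ignores.

```python
SPEED_OF_LIGHT = 299_792_458  # metres per second


def distance_from_delay(sent_time_s, received_time_s):
    """Distance travelled by a signal, from its sending and arrival times."""
    return SPEED_OF_LIGHT * (received_time_s - sent_time_s)


def locate_on_road(friends):
    """Positions on the road consistent with every (position, distance) pair."""
    candidates = None
    for position, distance in friends:
        options = {position - distance, position + distance}
        candidates = options if candidates is None else candidates & options
    return candidates


print(locate_on_road([(30, 10)]))            # one friend: two possible places
print(locate_on_road([(30, 10), (50, 10)]))  # two friends: only mile 40 is left
print(distance_from_delay(0.0, 0.07) / 1000, "km")  # hypothetical 0.07 s delay
```

Each extra friend removes the places that do not match, which is the same reasoning GPS applies in three dimensions.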
Pretty much all we can do with GPS alone for Augmented Reality is something like this:
So that’s it for today, but “part 2” will soon be there for you 3D fans. We’ll be talking about infrared light, accelerometers and gyroscopes, and finally Computer Vision. Stay tuned!
From Virtual Reality to Augmented Reality
There’s more to remember from 2009 than the economic slowdown. TechCrunch Co-Editor Erick Schonfeld quoted a very interesting chart showing that, from a Google Trends point of view, “Augmented Reality” just passed “Virtual Reality” a couple of months ago.
Source: Google Trends
Actually, this Google Trends chart shows many different facts; let’s go through some of them.
When Press Magazine plays with AR
The men’s magazine Esquire dedicated the cover of its December edition to the actor Robert Downey Jr. On this occasion, the magazine proposed an AR (Augmented Reality) issue including several AR experiences for the reader.
The choice was made to use big black & white markers on each targeted page of the magazine, and an exe download with no clear information about the publisher in the Windows dialog box (download from the Esquire website). So far, the interaction with the content is very simple, and having celebrities involved in this kind of initiative is always exciting.
Another style of AR experience comes from www.instyle.com (which belongs to Time Inc), still involving celebrities… This time, we have markerless technology (your webcam will track nice images directly from the magazine) and you download a well-identified plugin (ActiveX in a browser).
Who will be next? Which magazine? Which celebrity? Let me know your thoughts…









