Since iOS 11, Apple has stepped into the AR revolution with the introduction of ARKit, a set of developer tools for creating augmented reality apps for iPhone and iPad devices.
While the average user might never use ARKit directly, you have almost certainly used apps built on the technology and experienced the results firsthand.
What is Apple’s ARKit and what does it offer?
ARKit works by placing three-dimensional content in your world using a technique called visual-inertial odometry: the camera feed and the device's built-in motion sensors together track your surroundings and pinpoint the device's orientation and position relative to the scene you're looking at.
The beauty of AR applications is that they allow you to integrate virtual content with the world around you. With ARKit, you don't need any special hardware to run AR apps, just an Apple device with an A9 chip or later. In practice, that means an iPhone 6s or newer, or a comparably recent iPad.
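To get a feel for how little setup this requires, here is a minimal sketch that checks device support and starts a world-tracking session (names like `sceneView` and `startARSession` are our own placeholders, not part of any official sample):

```swift
import ARKit

// A minimal sketch: start a world-tracking AR session if the device
// supports it. `sceneView` is an assumed ARSCNView created elsewhere.
func startARSession(on sceneView: ARSCNView) {
    // isSupported is false on devices older than the A9 generation.
    guard ARWorldTrackingConfiguration.isSupported else {
        print("This device does not support world tracking.")
        return
    }
    sceneView.session.run(ARWorldTrackingConfiguration())
}
```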
A practical use case is IKEA's demo app, which showcases the ability to try out various pieces of furniture within any given space: you can browse the catalog, see what's available, and preview how each item would fit in your own room.
With the release of iOS 11.3, ARKit can place virtual objects on vertical surfaces such as walls and doors, and can recognize irregularly shaped surfaces in the real world.
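Enabling that vertical-surface detection is a one-line change to the configuration. A hedged sketch, reusing the hypothetical `sceneView` from above:

```swift
import ARKit

// A minimal sketch of ARKit 1.5 (iOS 11.3) plane detection: .vertical finds
// walls and doors alongside the .horizontal floors and tabletops.
func startPlaneDetection(on sceneView: ARSCNView) {
    let configuration = ARWorldTrackingConfiguration()
    configuration.planeDetection = [.horizontal, .vertical]
    sceneView.session.run(configuration)
}
```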
How do face-detection algorithms work?
One of the key updates in ARKit 3.0 is improved face detection, but before we delve into that, we must first understand what face detection is. It is a special type of object recognition in which the task is to find the locations and sizes of all objects in an image that belong to a given class (e.g., human faces).
“Face-detection algorithms focus on the detection of frontal human faces. It is analogous to image detection in which the image of a person is matched bit by bit. Image matches with the image stored in the database. Any facial feature changes in the database will invalidate the matching process.” (Wikipedia)
When a candidate image is presented, it is first normalized to reduce both the lighting effect caused by uneven illumination and the shearing effect caused by motion of the head, after which it is matched bit by bit and processed.
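On Apple platforms, you rarely implement this pipeline yourself; the Vision framework exposes it directly. A minimal sketch (the `detectFaces` helper is our own name) that finds the locations and sizes of all faces in an image:

```swift
import UIKit
import Vision

// A minimal sketch: find the bounding boxes of all faces in a UIImage.
func detectFaces(in image: UIImage) {
    guard let cgImage = image.cgImage else { return }
    let request = VNDetectFaceRectanglesRequest { request, _ in
        guard let faces = request.results as? [VNFaceObservation] else { return }
        for face in faces {
            // boundingBox is normalized (0...1) relative to the image.
            print("Face at \(face.boundingBox)")
        }
    }
    try? VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
}
```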
How is ARKit being used for face tracking?
One of the cool new features in ARKit is the ability to track your face in real time. This is made possible by the TrueDepth camera, but we will talk more about that shortly.
According to Allan Schaffer in an Apple Tech Talks session:
“ARKit turns its focus to you, providing face tracking using the front-facing camera. This new ability enables robust face detection and positional tracking in six degrees of freedom.
Facial expressions are also tracked in real-time, and your apps [are] provided with a fitted triangle mesh and weighted parameters representing over 50 specific muscle movements of the detected face. For AR, we provide the front-facing color image from the camera, as well as a front-depth image.
And ARKit uses your face as a light probe to estimate lighting conditions, and generates spherical harmonics coefficients that you can apply to your rendering.”
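In code, all of that arrives through a single configuration and the standard delegate callbacks. A minimal sketch, assuming an `ARSCNView` named `sceneView` with a delegate already wired up (`startFaceTracking` is our own helper name):

```swift
import ARKit

// A minimal sketch: face tracking needs TrueDepth hardware, hence the check.
func startFaceTracking(on sceneView: ARSCNView) {
    guard ARFaceTrackingConfiguration.isSupported else { return }
    sceneView.session.run(ARFaceTrackingConfiguration())
}

// ARSCNViewDelegate callback: each blend shape is a 0...1 weight for one
// of the tracked facial muscle movements mentioned in the quote above.
func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
    guard let faceAnchor = anchor as? ARFaceAnchor else { return }
    let jawOpen = faceAnchor.blendShapes[.jawOpen]?.floatValue ?? 0
    print("jawOpen weight: \(jawOpen)")
}
```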
TrueDepth Camera
The key to ARKit’s face-tracking technology is the TrueDepth camera. It can be considered one of the most intriguing features of Apple’s new flagship phone, the iPhone X.
Face ID replaces the fingerprint sensor and Touch ID with facial recognition. This is made possible by Apple's new TrueDepth front-facing camera system. TrueDepth also enables Apple's Animoji and other special effects that require a 3D model of the user's face and head.
Let’s take a look at how the TrueDepth camera and sensors work.
There is a lot packed into this little camera. TrueDepth starts with a traditional 7MP front-facing “selfie” camera. It adds an infrared emitter that projects over 30,000 dots in a known pattern onto the user’s face. These dots are then photographed by a dedicated infrared camera for analysis.
There is a proximity sensor, presumably used by the system to know when a user is close enough to activate, plus an ambient light sensor that helps the system set output light levels.
Apple also unveiled a flood illuminator, though we have yet to get an explicit explanation of what it does. It would make sense, however, that in low light, flooding the scene with IR would help the system capture an image of the user's face to complement the depth map. That would explain how Face ID can work in the dark. IR can also pick up sub-surface features of skin, which might be useful in making sure masks can't fool the system.
ARKit 3 improvements
We see a number of improvements and new features in version 3.0 of ARKit, and some of them bring new life to the toolset.
People occlusion
With ARKit 3.0, AR content realistically passes behind and in front of people in the real world, making AR experiences more immersive while also enabling green screen-style effects in almost any environment.
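Turning this on is a matter of opting into the right frame semantics; person segmentation also requires an A12 chip or later, hence the capability check. A minimal sketch (`enablePeopleOcclusion` and `sceneView` are our own names):

```swift
import ARKit

// A minimal sketch of ARKit 3 people occlusion. The "with depth" variant
// lets virtual content pass both in front of and behind people.
func enablePeopleOcclusion(on sceneView: ARSCNView) {
    let configuration = ARWorldTrackingConfiguration()
    if ARWorldTrackingConfiguration.supportsFrameSemantics(.personSegmentationWithDepth) {
        configuration.frameSemantics.insert(.personSegmentationWithDepth)
    }
    sceneView.session.run(configuration)
}
```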
Motion capture
Capture the motion of a person in real-time with a single camera. By understanding body position and movement as a series of joints and bones, you can use motion and poses as an input to the AR experience — placing people at the center of AR.
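A minimal sketch of how those joints surface in code, using ARKit 3's body-tracking configuration (the helper names here are our own):

```swift
import ARKit

// A minimal sketch: body tracking requires an A12-or-later device.
func startBodyTracking(on sceneView: ARSCNView) {
    guard ARBodyTrackingConfiguration.isSupported else { return }
    sceneView.session.run(ARBodyTrackingConfiguration())
}

// ARSessionDelegate callback: the skeleton models the body as joints and bones.
func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
    for case let bodyAnchor as ARBodyAnchor in anchors {
        // Model-space transform of a single tracked joint, e.g. the head.
        if let head = bodyAnchor.skeleton.modelTransform(for: .head) {
            print("Head joint transform: \(head)")
        }
    }
}
```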
Simultaneous use of the front and back camera
Now you can simultaneously use face and world tracking on the front and back cameras, opening up new possibilities. For example, users can interact with AR content in the back camera view using just their faces.
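In ARKit 3 this is a single opt-in flag on the world-tracking configuration; the front camera's face data then arrives as ordinary `ARFaceAnchor`s. A minimal sketch (`startCombinedTracking` is our own name):

```swift
import ARKit

// A minimal sketch: world tracking on the back camera with simultaneous
// face tracking on the front camera (supported hardware only).
func startCombinedTracking(on sceneView: ARSCNView) {
    let configuration = ARWorldTrackingConfiguration()
    if ARWorldTrackingConfiguration.supportsUserFaceTracking {
        configuration.userFaceTrackingEnabled = true
    }
    sceneView.session.run(configuration)
}
```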
Collaborative sessions
With live collaborative sessions between multiple people, you can build a collaborative world map, making it faster for you to develop AR experiences and for users to get into shared AR experiences like multiplayer games.
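ARKit maintains the shared map, but transporting the collaboration data between devices is left to you (MultipeerConnectivity is the usual choice). A minimal sketch of the ARKit side, with our own helper names:

```swift
import ARKit

// A minimal sketch of an ARKit 3 collaborative session.
func startCollaborativeSession(on sceneView: ARSCNView) {
    let configuration = ARWorldTrackingConfiguration()
    configuration.isCollaborationEnabled = true
    sceneView.session.run(configuration)
}

// ARSessionDelegate callback: archive the data and send it to peers over
// your own networking layer; receivers feed it back via session.update(with:).
func session(_ session: ARSession, didOutputCollaborationData data: ARSession.CollaborationData) {
    let blob = try? NSKeyedArchiver.archivedData(withRootObject: data,
                                                 requiringSecureCoding: true)
    // ... transmit `blob` to the other participants ...
}
```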
ARFaceGeometry
ARFaceGeometry is an ARKit class that provides a general model for the detailed topology of a face, in the form of a 3D mesh suitable for use with various rendering technologies or for exporting 3D assets.
When you obtain a face geometry from an ARFaceAnchor object in a face-tracking AR session, the model conforms to match the dimensions, shape, and current expression of the detected face.
You can also create a face mesh using a dictionary of named blend shape coefficients, which provides a detailed but compact description of the face's current expression. In an AR session, you can use this model as the basis for overlaying content that follows the shape of the user's face, for example to apply virtual makeup or tattoos.
You can also use this model to create occlusion geometry, which hides other virtual content behind the 3D shape of the detected face in the camera image.
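A minimal sketch of that occlusion trick with SceneKit's `ARSCNFaceGeometry` wrapper: the mesh writes depth but no color, so virtual content behind the face disappears (the `makeOcclusionNode` helper is our own name):

```swift
import ARKit
import SceneKit

// A minimal sketch: a face mesh that occludes without being visible itself.
func makeOcclusionNode(for sceneView: ARSCNView) -> SCNNode? {
    guard let device = sceneView.device,
          let faceGeometry = ARSCNFaceGeometry(device: device) else { return nil }
    faceGeometry.firstMaterial?.colorBufferWriteMask = []  // depth only, no color
    let node = SCNNode(geometry: faceGeometry)
    node.renderingOrder = -1  // render before other virtual content
    return node
}

// ARSCNViewDelegate callback: keep the mesh matched to the current expression.
func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
    guard let faceAnchor = anchor as? ARFaceAnchor,
          let faceGeometry = node.geometry as? ARSCNFaceGeometry else { return }
    faceGeometry.update(from: faceAnchor.geometry)
}
```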
The possibilities for devs
Imagine you are at a significant corporate event with colleagues from different offices getting together. You want to use this occasion to personally meet a colleague with whom you have been exchanging emails while working on the same project, but how can you find that single individual amongst over two hundred people?
To make matters more complicated, that person might be dressed up more than usual and look a little different from normal. Or take a scenario where you are trying to distinguish the guests who were actually invited to an event from those who are just crashing it. Or better yet, how about making sure a speaker stays in focus while moving all over the stage?
All of these are instances where good facial tracking and recognition would make a world of difference. And we are not talking about just being able to recognize those around you, which is a plus in itself: the technology has applications ranging from gaming to security to entertainment.
Conclusion: why is ARKit 3.0 worth trying?
The kind of business that can benefit from facial recognition is one that involves people, be it in terms of security, entertainment, or just social interactions. This basically means every business you can think of.
While the current uses of facial recognition seem to be championed by governments, large businesses, or tech startups, there’s no reason why your business can’t benefit from it. Just the use of ID verification alone would improve security for any business establishment.
The possibilities are truly endless when we apply a little creative thinking:
- greeting and identifying customers in a hotel,
- locating your friend in a sea of people,
- finding people with similar faces (maybe to be used as actors),
- detecting personalities for job interviews (again, we’re just letting the imagination run wild here; there may not be anything substantial in such a study),
- customizing the banking experience when a high-value client walks in.
There are many ways to use facial recognition to expand the scope and user experience of your business.
Pretty soon, facial recognition will be so widespread and so common that we won't even notice it. The underlying technology has matured rapidly, but in the real world, it's not just about detecting faces; it's about what we can do with that ability.