
The iPhone X deep-dive – Why this phone will disrupt the mobile imaging ecosystem



Finally, the iPhone X

After months of speculation, on Wednesday Apple unveiled its new flagship phone, the iPhone X. This new addition to the Apple family marks the 10th anniversary of the iPhone, the device that changed the world at a scale and in a time span that no one, not even Steve Jobs, could have foreseen.

What makes the iPhone X unique isn’t its design (I am personally of the opinion that Apple’s design has been uninspired for years), or even its massively immersive 5.8″ OLED screen – it is the machine learning, depth imaging sensors and processors that set it apart and justify the phone’s $999 entry price point. The iPhone X is a groundbreaking device that gives us a window into the future of mobile imaging.

The media has already compared the iPhone X to a concept car, a comparison that neither holds up nor does justice to Apple’s achievement. A concept car shows us a future we won’t get to enjoy anytime soon, if ever. The iPhone X’s future arrives on November 3, when the phone goes on sale.

Visual communication – wasn’t that the domain of camera makers?

The future of the smartphone lies in capturing and communicating the full spectrum of emotions associated with human communication. The iPhone X is an important step towards this future. Camera manufacturers woke up to a new reality today (one that was years in the making) and should be truly concerned by now. The smartphone already ate the entry-level point-and-shoot market and is now dipping its toes into high-end point-and-shoot and DSLR territory through truly differentiated imaging capabilities that leverage the computing prowess of its maker, the richest and biggest tech company in the world, which also happens to excel in design. And you can bet that Google and its ecosystem of Android manufacturers will follow suit with the next generation of Pixel, Galaxy, and other phones.

Computational imaging + full hardware control

In this world of increasingly rich communication, computational imaging is the big game changer. Thanks to the iPhone X’s custom hardware chips and powerful algorithms, consumers can capture the world with uncanny detail and depth. Apple offers this capability by increasingly taking full control of its own hardware stack (starting with the most critical chipsets), and now by designing its own CPUs, GPUs, performance controller, and wireless, motion, haptic and machine learning chips. More importantly, Apple is the only company that controls both hardware and software across the board and is thus capable of offering vertically integrated hardware, resulting in synergies which are the envy of most other tech companies.

 

Apple highlighted many of the new features on stage:

●     Two performance cores

●     Four high-efficiency cores

●     2nd-gen Apple-designed performance controller

●     Apple-designed GPU

●     Apple-designed ISP

●     Apple-designed video encoder

●     Neural engine

●     Secure enclave

All of these hardware components are Apple-designed; the ones Apple highlighted on stage are those announced this week.

The neural engine is the most interesting of all these chips since it will accelerate Apple’s machine learning algorithms, which are increasingly at the core of the iOS experience. The neural engine is responsible for the iPhone X’s most innovative features: Face ID, Animoji, and some of the new imaging features that rely on neural networks. The main benefit of a dedicated neural engine is that neural networks run on the device itself, which greatly speeds up processing, improves the quality of the results, and reduces the load on the battery.
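For developers, this on-device inference is exposed through Core ML and the Vision framework. Below is a minimal sketch of running an image classifier entirely on the phone; the model name ("ExpressionClassifier") is hypothetical, and any compiled image-classification model bundled with the app would work the same way.

```swift
import CoreML
import Vision

// Minimal sketch: on-device image classification with Core ML + Vision (iOS 11 APIs).
// "ExpressionClassifier.mlmodelc" is a hypothetical compiled model bundled with the app.
func classifyExpression(in image: CGImage) throws {
    let modelURL = Bundle.main.url(forResource: "ExpressionClassifier",
                                   withExtension: "mlmodelc")!
    let visionModel = try VNCoreMLModel(for: MLModel(contentsOf: modelURL))

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        // Top result, e.g. "smiling: 0.93".
        if let top = (request.results as? [VNClassificationObservation])?.first {
            print("\(top.identifier): \(top.confidence)")
        }
    }

    // The whole pipeline runs locally; no image data leaves the phone.
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])
}
```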

Depth – the next battleground

The most important feature of the new iPhone is depth. Although the iPhone 7 Plus already offered a Portrait Mode, as was covered in Mobile Photo Perspectives before, the iPhone X doubles down on depth-measuring capabilities and applications.

Our understanding of the world is transitioning from a 2D to a 3D paradigm, and Apple is leading the way. Technologies like AR, VR, robotics, drones, and self-driving cars are all manifestations of this shift from a 2D to a 3D world. These technologies all use a 3D model and understanding of our world to deliver specific value to the consumer. With massive amounts of VC money pouring in, I have no doubt that we’ll see rapid and ongoing progress in how devices capture and make sense of 3D data, including virtually real-time scanning, analysis, and 3D model computation. Imagine a world in which Google Street View updates in real time, as soon as it is fed new photos.

Two years ago, Google announced Project Tango, an AR computing platform that specifies how mobile devices can leverage advanced depth-sensing sensors to map the world in point clouds, so that apps can add AR content when the user views indoor environments, such as retail stores, through a Tango-type smartphone. The concern of many observers has been whether Project Tango could be appealing enough to attract hardware manufacturers to build these types of phones. But guess what? The new iPhone X is Apple’s Project Tango phone. It features similar tech and also costs you an arm and a leg. Google may have led the way and made the first move, but Apple got it right and is moving full steam ahead by making its Tango-like device available to millions. Would you rather buy a Lenovo Phab 2 or an iPhone X, all else being equal? Ultimately, the iPhone X playground will be a fruitful one that benefits the tech industry at large.

Here’s a more detailed overview of all the imaging features that depth technologies, combined with machine learning, make possible:

Depth Sensors – front + back

While the iPhone 7 Plus already had a dual camera on the back with depth-sensing capabilities, the iPhone X features a front-facing sensor bar chock-full of state-of-the-art technology, including a 7MP TrueDepth camera. The impact of a front-facing depth camera cannot be overstated. It is the camera that we use for shooting selfies and doing video chats. For many people, the front camera is more important than the rear camera(s). Ask Snapchat, which has already released filters optimized for the iPhone X’s selfie-focused front camera.
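To give a sense of what this sensor bar means for app makers: AVFoundation in iOS 11 lets any app subscribe to the per-frame depth stream from the TrueDepth camera. A minimal sketch, with error handling and capability checks trimmed for brevity:

```swift
import AVFoundation

// Minimal sketch: streaming per-frame depth from the front TrueDepth camera (iOS 11 APIs).
// Capability checks (canAddInput/canAddOutput) and error handling are trimmed for brevity.
final class DepthStreamer: NSObject, AVCaptureDepthDataOutputDelegate {
    let session = AVCaptureSession()
    private let depthOutput = AVCaptureDepthDataOutput()

    func start() throws {
        guard let device = AVCaptureDevice.default(.builtInTrueDepthCamera,
                                                   for: .video, position: .front) else { return }
        session.beginConfiguration()
        session.addInput(try AVCaptureDeviceInput(device: device))
        session.addOutput(depthOutput)
        depthOutput.setDelegate(self, callbackQueue: DispatchQueue(label: "depth"))
        session.commitConfiguration()
        session.startRunning()
    }

    // Called for every depth frame; depthData wraps a per-pixel depth/disparity map.
    func depthDataOutput(_ output: AVCaptureDepthDataOutput,
                         didOutput depthData: AVDepthData,
                         timestamp: CMTime,
                         connection: AVCaptureConnection) {
        let map = depthData.depthDataMap  // CVPixelBuffer of depth values
        _ = map                            // feed into rendering, segmentation, AR, etc.
    }
}
```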

 

The depth bar uses technology originally developed by PrimeSense, an Israeli imaging company whose 3D-sensing technology powered the Kinect camera for Microsoft’s Xbox 360 video game console. The Kinect was a relatively bulky device the size of a portable speaker, and it is amazing that this technology has now been shrunk to fit into a 7mm-thin smartphone. A tiny space now houses some of the most sophisticated technologies, including the cameras and sensors that enable Face ID and many of the most innovative features of the device.

One app that makes good use of the TrueDepth camera is Apple’s own Clips app. I have always said that this app was a precursor to the company’s AR efforts and it appears that this is indeed the case. The new Selfie Scenes on iPhone X allow you to immerse yourself in 360-degree animated landscapes, making use of your 3D model (again). This also hints that the TrueDepth camera will be incredibly useful for Augmented Virtuality (AV) applications in which a real person or object is projected onto a predominantly virtual environment. More about this in our upcoming AR report! (contact us if you would like to know more about this report).

Portrait Mode – improved + front and back

The Portrait Mode feature enabled the iPhone 7 Plus to emulate the bokeh and shallow depth of field of expensive lenses by making smart use of its depth sensors and algorithms. The iPhone X doubles down on this feature. Not only is Portrait Mode improved with better low-light sensitivity, but the real game changer is that it is now offered through the front camera as well. Soon you will be taking your selfies with beautiful bokeh in the background, less noise, and professional lighting.
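Under the hood, this kind of simulated bokeh boils down to blurring pixels more the farther they sit from the subject, using the disparity map captured alongside the photo. The sketch below approximates the idea with stock Core Image filters (CIMaskedVariableBlur); it is an illustration of the technique, not Apple’s actual Portrait Mode pipeline, and it assumes a disparity map with values roughly normalized to 0 to 1.

```swift
import AVFoundation
import CoreImage

// Rough approximation of a Portrait-style background blur:
// use the disparity map as a mask so blur increases with distance from the camera.
func simulatedBokeh(from photo: CIImage, depthData: AVDepthData) -> CIImage {
    // Convert to disparity (near = large values) and scale the map to match the photo.
    let disparity = depthData.converting(toDepthDataType: kCVPixelFormatType_DisparityFloat32)
    var mask = CIImage(cvPixelBuffer: disparity.depthDataMap)
    mask = mask.transformed(by: CGAffineTransform(
        scaleX: photo.extent.width / mask.extent.width,
        y: photo.extent.height / mask.extent.height))

    // Invert so the far background gets the most blur and the near subject the least
    // (assumes disparity values roughly in the 0...1 range).
    let inverted = mask.applyingFilter("CIColorInvert")

    return photo.applyingFilter("CIMaskedVariableBlur",
                                parameters: ["inputMask": inverted,
                                             "inputRadius": 12.0])
}
```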

Portrait Lighting – post-production lighting effects

The biggest highlight (no pun intended) is a new feature named “Portrait Lighting,” which puts both the Quad-LED True Tone Flash and the new neural engine to work to emulate professional photographic lighting. To do so, it leverages the new machine learning chip and powerful algorithms, trained on real-world data of how professional photographers use light in their pictures. Portrait Lighting allows users to apply different lighting styles to the already impressive “simulated bokeh portraits,” which we know from the iPhone 7 Plus. The initial modes are Natural Light, Studio Light, Contour Light, Stage Light, and Stage Light Mono. Apple uses the depth image of the camera sensors to build a 3D model of the subject and then analyzes it through neural networks, after which you can apply different lighting styles as if they were Instagram filters. Your $1000 iPhone is now emulating flash lighting setups that normally cost thousands of dollars. Portrait Lighting is a major breakthrough in mobile imaging and one of the signature features of the iPhone X.
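A crude way to picture what a mode like Stage Light does is to treat the disparity map as a mask that separates the near subject from the background and then relight the two differently. The sketch below does exactly that with standard Core Image filters; Apple’s real Portrait Lighting goes much further, building a facial 3D model and applying lighting learned from professional photography.

```swift
import AVFoundation
import CoreImage

// Crude "Stage Light"-style approximation: keep the near subject lit,
// heavily darken everything else, using the disparity map as the blend mask.
// This is an illustrative sketch, not Apple's Portrait Lighting pipeline.
func stageLightApproximation(photo: CIImage, depthData: AVDepthData) -> CIImage {
    let disparity = depthData.converting(toDepthDataType: kCVPixelFormatType_DisparityFloat32)
    var mask = CIImage(cvPixelBuffer: disparity.depthDataMap)
    mask = mask.transformed(by: CGAffineTransform(
        scaleX: photo.extent.width / mask.extent.width,
        y: photo.extent.height / mask.extent.height))

    // Darken a copy of the whole image to serve as the "background".
    let darkened = photo.applyingFilter("CIExposureAdjust", parameters: ["inputEV": -4.0])

    // Where the mask is bright (near subject), keep the original pixels.
    return photo.applyingFilter("CIBlendWithMask",
                                parameters: ["inputBackgroundImage": darkened,
                                             "inputMaskImage": mask])
}
```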

Another new professional photography feature is Slow sync flash, a function already found on many cameras that combines a longer shutter speed with firing the flash. With Slow sync flash, you get the best of both worlds: a sharp shot of your main subject, combined with ambient light coming in from the background and foreground.

Compression – more than a nice-to-have

The same kind of wizardry is applied in Apple’s HEIF and HEVC compression algorithms. Apple’s keynote mentioned 4K video at 60 fps and slow-motion video at 240 fps, with Apple’s machine learning dividing each frame into thousands of squares that are analyzed on the fly to optimize the compression. The result is the highest possible image quality with the lowest possible storage and bandwidth requirements. Why is this important? VR and AR are around the corner and require massive amounts of bandwidth and storage to offer rich and immersive experiences. For more about on-device and in-cloud compression through Apple’s new compression standards, hear from Beamr President, Eli Lubitch, at Mobile Photo Connect.
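On the stills side, apps opt into the new formats through AVCapturePhotoOutput: when the HEVC codec is available, the captured photo is written as HEIF, often at roughly half the file size of a comparable JPEG. A minimal sketch, assuming the capture session and photo delegate are set up elsewhere:

```swift
import AVFoundation

// Minimal sketch: request HEVC-compressed (HEIF) stills when the device supports it (iOS 11 APIs).
func captureHEIFPhoto(with output: AVCapturePhotoOutput,
                      delegate: AVCapturePhotoCaptureDelegate) {
    var settings = AVCapturePhotoSettings()
    if output.availablePhotoCodecTypes.contains(.hevc) {
        // HEVC-encoded stills are containerized as HEIF/HEIC.
        settings = AVCapturePhotoSettings(format: [AVVideoCodecKey: AVVideoCodecType.hevc])
    }
    output.capturePhoto(with: settings, delegate: delegate)
}
```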

Augmented Reality – making it look real

Although the iPhone X is not marketed as an AR device, it was definitely designed with AR in mind. And I agree with tech website The Verge: “Apple might be right to undersell augmented reality.” As mentioned earlier, the iPhone X is Apple’s answer to Google’s Project Tango. Most of the (imaging) technology in the iPhone X is going to be a boon for high-quality AR experiences on the device.

For instance, Apple demoed a new version of Snapchat to show that the anchoring of virtual content on photos and videos will now be leagues above what is feasible today. When AR content is anchored to your face on an older iPhone, shaking the phone can make the imagery look artificial and glitchy. Not only does the iPhone X avoid generating these artifacts, but the blending of real and augmented content also looks much more convincing. Unlike the current Snapchat filters, which look 2D and sit on top of your image like stickers, the new filters are more akin to masks wrapped around your face in 3D, and thus appear more natural and realistic. This is all made possible by the new front-facing depth-measuring sensors, which acquire a 3D representation of your face in incredibly high detail. App makers can leverage this for better content positioning.
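For app makers, this face anchoring is exposed through ARKit’s face tracking. The sketch below attaches a wireframe mesh that follows the face geometry in real time; a real filter would swap the wireframe for textured 3D content.

```swift
import UIKit
import ARKit
import SceneKit

// Minimal sketch: anchor virtual content to the face with ARKit face tracking (iOS 11).
// The ARSCNView is assumed to be wired up in the storyboard.
final class FaceFilterViewController: UIViewController, ARSCNViewDelegate {
    @IBOutlet var sceneView: ARSCNView!

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        sceneView.delegate = self
        sceneView.session.run(ARFaceTrackingConfiguration())  // requires the TrueDepth camera
    }

    // Called when ARKit detects a face; attach a mesh that hugs the face geometry.
    func renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) -> SCNNode? {
        guard anchor is ARFaceAnchor,
              let device = sceneView.device,
              let faceGeometry = ARSCNFaceGeometry(device: device) else { return nil }
        let node = SCNNode(geometry: faceGeometry)
        node.geometry?.firstMaterial?.fillMode = .lines  // wireframe "mask" for illustration
        return node
    }

    // Keep the mask glued to the face as it moves and changes expression.
    func renderer(_ renderer: SCNSceneRenderer, didUpdate node: SCNNode, for anchor: ARAnchor) {
        guard let faceAnchor = anchor as? ARFaceAnchor,
              let faceGeometry = node.geometry as? ARSCNFaceGeometry else { return }
        faceGeometry.update(from: faceAnchor.geometry)
    }
}
```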

Face ID – more than just a way to unlock your phone

 

Another signature feature of the new iPhone is Face ID, the successor to Touch ID according to Apple. Face ID is enabled by the TrueDepth camera and is as simple to set up as Touch ID. It projects and analyzes more than 30,000 invisible dots to create a precise depth map of your face. Much has been said about the security advantages of Face ID over Touch ID, but since this article is about the phone’s imaging-related features, I just want to highlight the technical sophistication of Face ID, which uses advanced algorithms and the new neural chip to make a 3D point cloud of your face, including your facial expressions and appearances (e.g., with or without a hat, glasses, or makeup). The system is self-learning: if you change your look, say by growing a beard, it adapts, so you don’t have to set up the feature again with a new 3D model.
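From an app developer’s perspective, Face ID sits behind the same LocalAuthentication API that Touch ID used, so existing biometric code keeps working. A minimal sketch (the reason string is just an example):

```swift
import LocalAuthentication

// Minimal sketch: biometric unlock via LocalAuthentication.
// On the iPhone X this resolves to Face ID; on Touch ID devices, to the fingerprint sensor.
func authenticate(completion: @escaping (Bool) -> Void) {
    let context = LAContext()
    var error: NSError?
    guard context.canEvaluatePolicy(.deviceOwnerAuthenticationWithBiometrics, error: &error) else {
        completion(false)
        return
    }
    context.evaluatePolicy(.deviceOwnerAuthenticationWithBiometrics,
                           localizedReason: "Unlock your photo library") { success, _ in
        completion(success)  // the enrolled face data itself stays in the Secure Enclave
    }
}
```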

Animoji – Snapchat Lenses on steroids

While Face ID is mostly a security feature, the same technology is put to use to make communication richer, more emotional, and fun. Apple’s first attempt in this area was with the Apple Watch, which allows the user to send “haptic messages,” most notably your heartbeat, a custom-tapped rhythm on the screen, or custom doodles. Apple Watch also debuted the first animated emoji that bounced around when they were viewed.

Apple is taking all of this to the next level with animated emoji. Animoji, as they are called, are emoji that are animated with human-like expressions that can communicate our full emotional spectrum. This is achieved by scanning your facial expressions with the depth camera (the same one used for Face ID), detecting your emotions, and mapping these onto the Animoji. Like the Portrait feature on the iPhone 7 Plus before it, it launches in beta with the new iPhone X. It analyzes more than 50 different muscle movements to mirror your expressions in 12 different emoji, including the panda, robot… and poop emoji. In doing so, Apple leverages the technology it acquired from FaceShift in 2015, a maker of motion-capture technology known for its Hollywood work, including on Star Wars.

A user can create an Animoji by pointing the front-facing camera of the iPhone X at their face and hitting record; the phone’s depth sensors then record a 3D point cloud of the face. This 3D point cloud is mapped onto the Animoji so that it mirrors the face’s expression. It even records the audio as you speak, performing full lip-sync in the process. If Snapchat’s Lens feature is any indication, I expect Animoji to be a huge success, with users creating 3D avatars of themselves that reflect their emotional state and facial expression.
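Developers get access to the same expression data through ARKit, which exposes per-expression blend shape coefficients (0.0 to 1.0) for the tracked face. The sketch below maps a few of them onto a character rig; AvatarRig and the morph names are hypothetical stand-ins for whatever 3D character system an app uses.

```swift
import ARKit

// Hypothetical stand-in for an app's character rig / morph-target system.
protocol AvatarRig {
    func setMorph(_ name: String, weight: Float)
}

// Minimal sketch: drive an Animoji-style avatar from ARKit's blend shape coefficients.
// Each coefficient says how strongly that facial muscle movement is expressed right now.
func applyExpressions(from faceAnchor: ARFaceAnchor, to avatar: AvatarRig) {
    let shapes = faceAnchor.blendShapes
    avatar.setMorph("jaw_open", weight: shapes[.jawOpen]?.floatValue ?? 0)
    avatar.setMorph("smile_L",  weight: shapes[.mouthSmileLeft]?.floatValue ?? 0)
    avatar.setMorph("blink_L",  weight: shapes[.eyeBlinkLeft]?.floatValue ?? 0)
    avatar.setMorph("brow_up",  weight: shapes[.browInnerUp]?.floatValue ?? 0)
}
```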

Dual-stabilized dual camera – boosting the capture quality

 

What would otherwise be a highlight in a new iPhone announcement is now a mere footnote: the new iPhone will have optical image stabilization on both of its rear cameras. Both the 12MP f/1.8 wide-angle lens and the 12MP f/2.4 telephoto lens have a faster aperture than their predecessors; both sensors are larger and faster, have deeper pixels, and come with a new color filter.

Parting thoughts

All of these iPhone announcements really make you wonder: why is this technology available on a $1000 iPhone but not on a $5000 Nikon or Canon camera? Have the traditional camera manufacturers lost their edge? And are they now lagging so far behind that it is doubtful they will ever be able to catch up? The main advantage they have always had was in optical engineering, but that edge is being eroded by the sheer computational power of phones such as the iPhone X. When thinking about the leading (consumer) imaging companies these days, you don’t think first and foremost of Japan – it’s the US, in most cases Silicon Valley to be precise. Companies like Apple, Google, Microsoft, Facebook, Amazon, Qualcomm, and Nvidia are all pushing the boundaries of computation-heavy imaging, in many cases leveraging the steep investments made to develop self-driving cars. Computational imaging is the future.

One thing that keeps coming back to my mind is privacy. In the next three years, as this technology trickles down to the entire iPhone line, Apple will be scanning the faces, appearance, and emotions of a billion people on earth, dozens of times a day. What are the privacy implications of this? Is technology getting too intimate? Apple says it does not gather any of this data, since everything is stored locally on the device, but one still wonders what happens to all the metadata that is used to train the algorithms.

Related to this is our physical and mental health. Many health problems can be detected by monitoring changes over time in our facial expressions, eyes, and voice. Is this where Face ID technology is heading? Apple’s depth technology could potentially be of great value to the medical field in the future. Apple announced on stage that it is working with Stanford researchers on an app to detect abnormal heart rhythms. And that will only be the beginning. Apple is poised to transform healthcare in the coming decades with a bottom-up approach, starting from the user experience and focused on prevention. Imaging technologies and neural networks will play a major role!

***

And a few more things…

C+A Global. Polaroid and Kodak brands reunited in the instant printing business: in addition to licensing the Polaroid brand for its instant print cameras, C+A announced a Kodak-branded instant print camera. Hear more about C+A Global’s instant print camera strategies, as well as its portfolio of other photo businesses, from CEO Chaim Pikarski at Mobile Photo Connect, Oct. 24-25 in San Francisco.

Albelli. Albelli buys ReSnap. Auto-curated instant photobooks have great potential to expand beyond the traditional photobook market. So it is not a complete surprise that Dutch photobook provider Albelli/Albumprinter – in the process of being spun off from VistaPrint/Cimpress – has acquired fellow Dutch company ReSnap.

Lightricks. After Facetune, Enlight, and Enlight Photofox, Lightricks is at it again with Quickshot, leveraging Lightricks’ imaging expertise through a camera app – and yes, another subscription-based app! Hear more about the latest and greatest Lightricks apps at Mobile Photo Connect, Oct. 24-25 in San Francisco, when co-founder Itai Tsiddon will participate in a special fireside chat session!

Author: Floris van Eck

Floris van Eck is a technology strategist, visual culture anthropologist, and speaker on emerging technologies. He is the co-founder of Imaging Mind and Notilde. Imaging Mind is a visual culture community and futurist agency dedicated to uncovering the future of imaging and how it manifests itself in technology and society, aiming to build an ‘Imaging Mind’ of connected nodes. Notilde helps organisations explore and navigate new technological frontiers at the intersection of culture and technology. It does this through a combination of investigative journalism and experiential content, producing narratives that provide insights beyond the radar of traditional R&D.
