It might have all started when Snapchat rebranded itself as “a camera company”. Or even long before, in 2002, when Nokia decided to put a camera on its cell phones. Whenever it happened, it is still really only the beginning…
“It” is the moment the camera became forever separated from its original format, a lens on a movable box, and its original intent, recording personal memories. The shift was triggered when film gave way to digital, transforming images into data files. Not only could images be easily and instantaneously transferred from one point to another, they could also be parsed, modified, analyzed and interpreted. From a dumb copying machine, the camera became a data-collecting device. Cameras now create documents for processing. And while they continue to record memories, they are also now used to communicate (Snapchat, WhatsApp, Instagram), analyze (content recognition), identify (face recognition), classify (inventory analysis), monitor (industrial QA) and drive vehicles (self-driving cars, trucks, and boats). Cameras as data-collecting devices are becoming so prominent that, according to a new study released by LDV Capital, there will be 45 billion of them by 2022. And that’s a conservative estimate…
As technology continues to progress and more of our tools become interconnected, the need to include context in order to process information accordingly grows as well. And as natural evolution has already proven, there is no better way to capture that context than vision.
Take a fridge, for example. Currently, it sits dumbly in a kitchen. To be really helpful, it needs to know two things: who is using it and what type of products it contains. From there, it can not only adjust its temperature accordingly but also predict when an item will run out (and order more), when another is about to turn sour, and even which two or three items would taste great together, especially for that one person who seems to love sweets. It can only do this if it uses computer vision. There are 8 million fridges sold in the US each year. If each has a minimum of two cameras (one outside, one inside), that’s already 16 million cameras added each year.
In the LDV Capital study, over 30 industries, including IoT, were identified as having or expected to have products with embedded cameras in the next five years. The growth is mostly driven by 3D/depth capture, which is estimated to be responsible for the greatest increase in cameras by 2022. Any self-moving machine needs depth perception: cars, trains, boats, robots or drones. So does any virtual-reality-enabled device, as do manufacturing, farming, medical, scientific and law enforcement users, who seek to understand not only content but also its position in absolute and relative space.
In fact, continues the LDV Capital study, there is no real AI without computer vision: “Major technology companies and startups are at war over having the most valuable artificial intelligence. At the core of this war is possessing unique, high-quality visual data. This battle will be won by owning the connected camera.” This is both the challenge and the Holy Grail. None of the data collected by these 45 billion cameras will ever be seen by a human being. And most of it will be useless, even for a machine. Thus, behind this explosion of “visual data recording devices,” or VDRDs, is the race to process, analyze and interpret this massive amount of data at scale. At stake is a market estimated at USD 38.92 billion by 2021.
For now, image recognition (IR) does a good job of identifying objects, people, and locations and is already used in many business-critical operations. But as demand grows and deep learning evolves, IR will become instrumental in the operation of all our devices, much as electricity is today. There will be 45 billion cameras running, with all their data instantly processed via image recognition.
But as the study points out, there are still major hurdles to clear: “Extensive computing power is needed to teach a data model how to make predictions based on high-quality visual data signals. The enormous computing power and time required to train deep learning models cannot yet be done on a device and is being developed by cloud computing services. It is still incredibly expensive.” It remains to be seen whether this can be solved within the next five years or whether these 45 billion cameras will need to sit idle until processing can catch up.
The full LDV Capital study can be downloaded here.
Photo by MarcosCousseau
Author: Paul Melcher
Paul Melcher is the founder of Kaptur. He is an entrepreneur, advisor, and consultant with a rich background in visual tech, content licensing, business strategy, and technology, with more than 20 years of experience in developing world-renowned photo-based companies and two successful exits already.