Every year you enter the hallway of the SVA Theatre in the heart of New York thinking that there is no way the organizers of the LDV Vision Summit can outperform the previous year. And every year you step out two days later, happy to have been proven wrong. The combination of the organizing team’s (Evan Nisselson, Serge Belongie, and Rebecca Paoletti) talent for surfacing captivating speakers and the topic’s (visual tech) seemingly boundless reach into almost every vertical makes this event a feast for the eyes (of course), the ears and, most importantly, the brain. The sheer volume of intelligence and knowledge gathered for two days, even amongst the attendees, is reason enough never to want to miss this event.
The 2018 edition of the LDV Vision Summit was, like its four predecessors, a refreshing display of creative ingenuity, stunning engineering and brilliant problem-solving. With successive keynotes never longer than 5 minutes each, subtly interrupted by more extended fireside chats or panels, there is no room for boredom. Even if a topic or speaker is not in your field of interest, it is not long before another keeps you glued to your chair and wanting to know more. Which, thanks to the casual and friendly networking, is not hard to do. Speakers are all available to explain more if needed and do business, if appropriate. No superstars here leaving in limos the minute they step off the stage. Instead, a succession of people as passionate about their work as the audience is.
The topics ranged far and wide, with some making frequent appearances in different talks. The star of this edition was undoubtedly the Internet of Eyes. Coined by Evan Nisselson a few years back, the Internet of Eyes is the realization that cameras will be everywhere, if they are not already. From our manufacturing plants to our satellites, from our cell phones to our fridges, soon in our cars and within our walls. And to process all this data? Smarter and more efficient computer vision algorithms that will quickly be able to pull more information out of visual content than humans can. And in a way, they already do.
The key conversation on that theme was probably the fireside chat with Eric Fossum, inventor of the CMOS image sensor and currently working on the quanta image sensor. From his early days working with NASA to his current work, Professor Fossum seems to have been instrumental in every significant development of computer vision. His latest research, the quanta image sensor, is an image sensor that captures one photon at a time. Each and every smallest element of light. One can access the location and arrival time of every single photon. The first apparent large-scale application is low-light imaging. Beyond that, however, in medical, space and even consumer applications like driverless cars, the potential is limited only by the ingenuity of those creating algorithms to interpret all of this data. Dr. Fossum, along with partners, has formed a company called Gigajot around this invention.
AR also had a recurring role over the two days of the summit. It had its advocates, who proclaimed its reign (Facebook, for example), and its skeptics, who doubted its maturity. Either way, it kept resurfacing in many presentations and panels, along with VR, which took more of a side role this year.
Amazon Go and its shopping tracking technology were also an important focal point, at least during the first day. Not only for how it could benefit everyone’s lives (goodbye lines at the register) and save hours of unproductive time, but also for how it could be applied to other uses beyond retail. All this powered, naturally, by image recognition.
Standing out amongst the presentations was a company, Twentybn, that is patiently teaching computers to understand human gestures. In a world that is becoming more and more visual (and tracked by cameras everywhere), understanding what humans mean via their body language is critical. It can, for example, allow for wordless high-level communication with machines. It can also enable computers to comprehend intent, sometimes even before the human is conscious of it, such as whether a pedestrian is about to cross the street.
Every year, the summit also brings forth advances in medical imaging, where computer vision is solving incredibly hard problems. From mapping the flows in our brains to help diagnose depression and anxiety to predicting genetic diseases using facial recognition and A.I. However, it’s not all for large research hospitals. Companies like Nanit help everyday parents monitor their children’s sleep patterns for better cycle management…and longer nights.
All this brings us to the more significant underlying theme of this conference. Large and complicated datasets. Whether they are sent for analysis or used to train A.I. visual recognition, datasets are key to this industry. For training, they have to be perfectly organized and free of any human biases to avoid inaccurate results. The more data, the better. However, for some models, datasets are hard to find, if they exist at all. Enter CGI, or synthetic data, which can either replicate or create large quantities of training data without ever needing to seek real-life examples. As this technology becomes cheaper and more performant, it will be possible for A.I. to recognize objects and situations in more detail than ever imagined. And it will cut into the current competitive advantage of big data hoarders like Google, Apple, Amazon, and others.
Just as vision is critical to humans, visual tech is becoming essential in our lives. Even if most see photography and video as a cute pastime, there are thousands of researchers and entrepreneurs who see them as invaluable data that can significantly increase the quality of our lives. This is what this Summit is all about. Teaching machines what we know about our visual environment so they can, in turn, make us better human beings.
Author: Paul Melcher
Paul Melcher is the founder of Kaptur and Managing Director of Melcher System, a consultancy for visual technology firms. He is an entrepreneur, advisor, and consultant with a rich background in visual tech, content licensing, business strategy, and technology, with more than 20 years of experience in developing world-renowned photo-based companies and two successful exits already.