In February, just before the worldwide lockdown, I flew to the Weizmann Institute, unaware of what was coming, to begin a 3-month internship in Artificial Intelligence.
My intended 3-month internship ran up against Covid-19, and I was unable to leave Israel at the end of that time! My stay was extended by another month, then another, until I managed to return to my native Sofia, Bulgaria, in September. Unfortunate as that may sound, the lockdown and the restrictions on international travel turned out perfectly for me: I remained at the Institute, employed and gaining work experience.
Now, “Artificial Intelligence”, or AI, is one of those buzzwords nowadays. People attach it to virtually anything that needs to attract attention, so I am pleased to offer a more understandable breakdown of just what my work was about.
One of the main areas in which AI is used is the processing of images; the detection of cars by a Tesla’s autopilot and the recognition of cancer cells from visual data are good examples of AI’s capabilities in that regard. The AI programs which perform those tasks are often thought of as resembling human vision; indeed, that is one reason why such programs are referred to as neural networks, albeit artificial ones.
However, does an AI program see the same thing in an image that you see in real life? One may say that we have a sense of depth in what we see, which an image doesn’t provide, but the difference is even more fundamental. The human eye has what is called “variable” resolution. You’re reading this article now, but would you be able to do so if you moved your screen into your peripheral vision, without turning your eyes in that direction?
Unless you’re using a 70pt+ font, you probably won’t be very successful at reading. Why is that? Only the centre of our vision, served by a small region of the retina called the fovea, is high-resolution, which makes sense from an evolutionary perspective; vision is expensive! Over 50% of our brain is already dedicated to processing what we see, so if we saw in high resolution everywhere, we would probably need a massive, alien-sized head to house our brain!
This discrepancy between what our eyes see in a particular field of view and what a camera captures is well illustrated in the car images. The resolution of the blurred image falls away from the central point in view, resembling the “variable” resolution of the human eye, whereas the good-quality image displays what a conventional digital camera captures: uniform high resolution. This visible difference between the images also translates to the amount of information each one captures: the “variable” resolution image contains only about 5% of what its high-resolution counterpart does.
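For readers who like to experiment, the idea of resolution falling away from the centre can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the method used in the research: the function name `foveate`, the `levels` parameter and the ring-based block averaging are all hypothetical choices made for this example, and the fraction of information it retains depends on those choices rather than matching the 5% figure.

```python
import numpy as np

def foveate(image, levels=4):
    """Blur a grayscale image more strongly the further a pixel is from the centre.

    Illustrative assumption: the image is split into `levels` concentric rings,
    and ring k is drawn with blocks of 2**k x 2**k pixels, each block holding
    one averaged value. Returns the blurred image and the fraction of
    independent samples it retains compared with the full-resolution image.
    """
    h, w = image.shape
    largest_block = 2 ** (levels - 1)
    if h % largest_block or w % largest_block:
        raise ValueError("image sides must be divisible by 2**(levels - 1)")

    ys, xs = np.mgrid[0:h, 0:w]
    # Normalised distance from the centre of the image (0 in the middle).
    r = np.hypot((ys - (h - 1) / 2) / (h / 2), (xs - (w - 1) / 2) / (w / 2))
    ring = np.minimum((r * levels).astype(int), levels - 1)

    out = np.empty_like(image, dtype=float)
    retained = 0.0
    for k in range(levels):
        b = 2 ** k
        # Average each b x b block, then stretch the result back to full size.
        coarse = image.reshape(h // b, b, w // b, b).mean(axis=(1, 3))
        blurred = np.repeat(np.repeat(coarse, b, axis=0), b, axis=1)
        mask = ring == k
        out[mask] = blurred[mask]
        # Every b x b block contributes a single sample.
        retained += mask.sum() / (b * b)
    return out, retained / (h * w)
```

Calling `foveate(img)` on a 2-D NumPy array returns the blurred picture together with the retained fraction; raising `levels` makes the periphery coarser and the fraction smaller.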
It is often claimed that AI programs outperform humans in image classification, which is true in the conventional sense. However, if our brain receives “variable” resolution input, containing a lot less information, to recognize what we see, while an AI program receives full-resolution input, is it fair to compare the two? Probably not. Does that mean we’re actually “smarter” for being able to recognize objects given less information, or can AI do the same if required?
Pondering this question gave birth to my research topic: What would happen if we gave AI programs the same “variable” resolution “mess” that we see? Would they still do well? Can we build an AI program that is tuned specifically for such “variable” resolution input?
In my research, I designed and experimented with a number of artificial neural networks to come up with an answer. I constructed neural networks for two comparable tasks: classification of high-resolution images and classification of “variable” resolution images.
The pictures of a table lamp illustrate the tasks of the two separate neural network programs: one program seeks to classify an object using a “variable” resolution picture, which provides only about 5% of the original data (such objects look like the blurred image), and the other seeks to classify the same object using a high-resolution picture.
The two types of AI programs competed in classifying the same set of 1.2 million images of various objects, with each program receiving the images in its own resolution.
I found that the AI given full-resolution images identified 85% of them correctly. The AI given “variable” resolution images, on the other hand, identified 76% correctly, BUT, and this is the interesting part, it did so using a staggering 20 times less information (about 5% of the data, or 95% less)! Just a 9 percentage point difference for so much less data!
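The arithmetic behind these figures can be checked with a short Python sketch. The accuracies (85% and 76%) and the roughly 5% data fraction come from the article; the function name `compare_results` and the dictionary keys are made up for this illustration.

```python
def compare_results(full_acc, variable_acc, data_fraction):
    """Restate the comparison between the two networks in several ways.

    full_acc, variable_acc: accuracies as fractions (0.85 means 85%).
    data_fraction: share of the full-resolution data that the
    "variable" resolution input keeps (0.05 means about 5%).
    """
    return {
        # How many times less data the "variable" resolution input contains.
        "times_less_data": round(1 / data_fraction, 2),
        # The same reduction expressed as a percentage of data removed.
        "percent_less_data": round((1 - data_fraction) * 100, 2),
        # Accuracy gap in percentage points.
        "accuracy_gap_points": round((full_acc - variable_acc) * 100, 2),
        # Accuracy gap relative to the full-resolution accuracy.
        "relative_accuracy_drop_percent": round(
            (full_acc - variable_acc) / full_acc * 100, 2
        ),
    }

summary = compare_results(full_acc=0.85, variable_acc=0.76, data_fraction=0.05)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that a 20-fold reduction means 95% less data, and that the 9-point gap is measured in percentage points; relative to the 85% baseline, the drop in accuracy is a little over 10%.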
One now begins to understand why our human vision has evolved to be “variable” – much visual information is redundant and evolution has figured this out on its own!
Such was the result of my research work carried out at the Institute. I found it fascinating to be able to reproduce evolutionary conclusions through artificial neural networks.
None of this work would have been possible without the generous financial support of SIM, which sustained me through the beginning of my stay, before I was employed by the Institute.
I sincerely thank my Mentor Dr Barrie Reece (in photo above with Andrey), previous Master Kenneth Sanders, and Clerk Misha Hebel for making this possible.
Andrey Gizdov, November 2020