Convolutional Neural Net - Studying the Human Eye

Arjun Tambe / amtambe

In the human eye, there are physical limits on the number of connections between the cells that directly receive visual data and the cells that process that data. More of those connections come from the center of the retina (the fovea) than from more peripheral areas, so the human visual system has higher visual acuity in the center of the visual field, at the cost of lower acuity in the periphery. This project attempts to address why evolution selected this architecture by comparing the classification accuracy of a neural network on images that implement a retina-like model of the visual field against images that implement a more standard, uniform model of the visual field.

I implemented AlexNet and ran it on a "Fisheye" (Retina Warp) dataset and a "Uniform Blur" dataset. The first models the eye's visual field, while the second is an alternative model in which visual acuity is uniform across the visual field. Because the number of physical connections to the retina is limited, this alternative model would have lower visual acuity than the peak acuity of the retina, which occurs in the center of the visual field. Therefore, we perform two transformations on the data: the first is a model of the retina, developed by the DiCarlo and Cox lab, and the second is a uniform blur over the whole image, where the size of the kernel is equal to the average blur size of the retina model. We train and then test the model on the Fisheye data, and compare its performance to that obtained when we train and test the model on the Uniform Blur data.
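The uniform-blur transformation described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: it assumes a square mean (box) filter with edge padding, and `uniform_blur` and `kernel_size` are hypothetical names. The actual kernel shape, and the kernel size derived from the retina model's average blur, are not specified here.

```python
import numpy as np


def uniform_blur(image, kernel_size):
    """Blur an H x W x C image with a kernel_size x kernel_size mean filter.

    Assumption: a box filter with edge padding stands in for whatever
    blur the project actually applies.
    """
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    pad = kernel_size // 2
    # Repeat border pixels so the output keeps the input's height and width.
    padded = np.pad(
        image.astype(np.float64),
        ((pad, pad), (pad, pad), (0, 0)),
        mode="edge",
    )
    height, width = image.shape[:2]
    blurred = np.zeros(image.shape, dtype=np.float64)
    # Sum every shifted copy of the image that falls under the kernel window.
    for dy in range(kernel_size):
        for dx in range(kernel_size):
            blurred += padded[dy:dy + height, dx:dx + width, :]
    return blurred / (kernel_size ** 2)
```

Because every pixel is averaged over the same window, acuity is reduced equally everywhere, in contrast to the retina model, where the amount of blur depends on position in the visual field.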

If better recognition is the evolutionary mechanism explaining the greater density of retinal connections in the fovea, then we should find that the Uniform Blur dataset has lower classification accuracy than the Retina Warp dataset. We hypothesize that this will be the case.

We use cross-entropy loss, as in the original AlexNet paper, with the Adam optimizer from TensorFlow. We used the CIFAR-10 dataset to speed up the learning process, especially because learning is performed locally; for the same reason, we used only a subset of CIFAR-10. Interestingly, much of the time required to produce this model is spent downloading the dataset and applying a transformation to each element, one at a time. Other processing, including the actual training of the model, batches the images into groups of 32; along with various TensorFlow optimizations, this keeps the training time quite low once the data has been processed.
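To make the loss and batching concrete, here is a small NumPy sketch of softmax cross-entropy over a batch and of splitting data into groups of 32. It is an illustration under stated assumptions rather than the project's TensorFlow code: `softmax_cross_entropy` and `iterate_batches` are hypothetical helper names, and the Adam update step itself is left out.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over a batch.

    logits: float array of shape (batch, num_classes), raw network outputs.
    labels: int array of shape (batch,), true class indices.
    """
    # Subtract each row's max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick out the log-probability assigned to each example's true class.
    true_class_log_probs = log_probs[np.arange(labels.shape[0]), labels]
    return -true_class_log_probs.mean()


def iterate_batches(images, labels, batch_size=32):
    """Yield consecutive (images, labels) batches; the last may be smaller."""
    for start in range(0, len(images), batch_size):
        end = start + batch_size
        yield images[start:end], labels[start:end]
```

With 10 classes, a network that outputs identical logits for every class has a loss of ln(10), which is a useful sanity check for the starting loss of an untrained model.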

After training the model on 1000 CIFAR-10 images and testing on 250 new images, we found that the "Fisheye" model obtains an accuracy of 0.868. With 1000 training images, AlexNet trained and tested on completely unmodified CIFAR-10 data can achieve 100% accuracy, so 1000 training images (for 10 output classes) should be sufficient.

Meanwhile, the Uniform Blur model reaches an accuracy of 0.968. This disconfirms our hypothesis that the Uniform Blur model would have a lower classification accuracy than the Fisheye model; because the result runs in the opposite direction, no additional statistical testing is required to disconfirm it. We speculate, based on a qualitative assessment of the images on which the Fisheye model fails and the Uniform Blur model succeeds, that the Fisheye transformation shrinks and distorts features relevant for classification, for instance the eyes of a cat or the limbs of a horse.
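For readers who want to put the size of the accuracy gap in context, the sketch below runs a two-proportion z-test on the reported figures. It is not part of the project's code, and it rests on an assumption worth stating loudly: it treats the two test sets as independent samples of 250 images each. If both models were evaluated on the same 250 underlying images, a paired test such as McNemar's would be more appropriate. The counts follow from the reported accuracies (0.868 × 250 = 217 and 0.968 × 250 = 242).

```python
import math


def two_proportion_z_test(correct_a, correct_b, n_a, n_b):
    """Return (z, two-sided p-value) for the difference between two accuracies.

    Assumption: the two samples are independent.
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled accuracy under the null hypothesis that both models are equal.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Fisheye: 217 of 250 correct; Uniform Blur: 242 of 250 correct.
z, p_value = two_proportion_z_test(correct_a=217, correct_b=242, n_a=250, n_b=250)
print(f"z = {z:.2f}, two-sided p = {p_value:.2g}")
```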

There are several limitations of this work. First, we only use CIFAR-10 data. It is possible that on a task with more classes, our results would change, as more precise visual acuity in the image center may be of more help in classifying an image when there are more classes. It is also possible that our result would change on a dataset like ImageNet, in which objects are not always in the center of the image. However, when objects are not centered, the distortion to them would be more, not less, severe, which should worsen classification accuracy. We also train and test on a relatively small subset of the data (1250 of the 60,000 CIFAR-10 images); with greater computing resources, we could train on more data and perhaps find different results. Our finding suggests that something other than classification accuracy explains the greater density of retinal connections in the fovea.

The Retina Warp function is from a DiCarlo and Cox lab GitHub repository. Most of the other code uses TensorFlow functions, but I wrote it myself as I familiarized myself with TensorFlow. The AlexNet code is, of course, modeled on the original AlexNet paper, which I gave a presentation on several weeks ago. To run the code, you should just be able to run Psych250Final.py with Python 3. Some of the results may show slight differences because the process for downloading and iterating through CIFAR-10 appears to load the data in a different order, and since I did not set the random seed at first, the order may be different on another trial, resulting in different behavior in training.
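One way to make the choice of training and test images repeatable is to select the subset with a seeded random generator. The sketch below is a hypothetical illustration and not the code in Psych250Final.py: `seeded_split` and its parameters are made-up names, and it assumes the dataset can be addressed by a stable integer index.

```python
import random


def seeded_split(dataset_size, n_train, n_test, seed):
    """Pick disjoint train/test index lists reproducibly from range(dataset_size)."""
    # A private generator keeps the result independent of global random state.
    rng = random.Random(seed)
    chosen = rng.sample(range(dataset_size), n_train + n_test)
    return chosen[:n_train], chosen[n_train:]


# 1000 training and 250 test images drawn from the 60,000 CIFAR-10 images.
train_idx, test_idx = seeded_split(dataset_size=60000, n_train=1000, n_test=250, seed=0)
```

This only fixes which images are selected. TensorFlow's own sources of randomness, such as weight initialization, would need to be seeded separately, for example with `tf.random.set_seed`.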
