I am currently taking a psych class on campus. It’s all about sensation and perception: how we see, smell, touch, and feel things, and how our brain processes all of this information to build the world we perceive. This semester I have also spent a considerable amount of time investigating neural networks and their applications to opinion and sentiment understanding in English text. Now, after reading about the biological advances in neuroscience, I have come to fully appreciate how much machine learning has begged, borrowed, and stolen from neuroscience to get to where it is today.
Consider the visual pathway for a second. In humans, the pathway begins in the retina, where light-sensitive pigments in the photoreceptors (opsin proteins bound to a chromophore, for those who are curious) absorb photons and convert light energy into an electrical signal. This loosely mirrors the concept of a 1 bit in computer science: a photon either was or was not absorbed. The signal is then passed from the rod or cone (rods do most of the work in dim light, cones in daylight) through intermediate cells to the ganglion neuron at the end of the chain. This ganglion neuron transmits its signal through the optic nerve all the way up to the Lateral Geniculate Nucleus (LGN) in the thalamus. In machine learning, we would call that front end an input layer. You can think of the retina’s rods and cones as a preprocessing step that converts the input for whatever function (NLP, visual processing, etc.) we are trying to approximate into a series of numbers that can be expressed as bits. The ganglion cells then become essentially the input layer of the neural network (NN). So far, it’s a complete copy!
The LGN serves as a relay point for data. On the way there, the fibers from both eyes are re-sorted at the optic chiasm, so that all the bits carrying information about the left side of the visual space end up in the right LGN, and vice versa for the right side of the visual space. One can think about this like a fully connected layer where exactly half of the connections have zero weight, because each input should only be routed to a single side of the next layer. The dimensionality, however, has not changed yet. While the LGN in reality is multiple layers deep and wide, for our comparison we can treat it as something simpler: its primary purpose is to serve as a relay, amplifying and organizing the signal before forwarding it to the correct destination. One can think of the LGN as a set of fully connected hidden layers of a neural network. The LGN then relays this signal into V1, the primary visual cortex. V1 is where the fun begins.
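Before moving on to V1, the optic chiasm analogy can be made concrete with a minimal sketch. It is plain Python, and everything in it is an assumption made for illustration: the four-unit layout, the input values, the uniform weight of 0.5, and the helper name `masked_dense` are all made up, and none of it is meant as real anatomy. It only shows what "a fully connected layer where half of the connections have zero weight" looks like while the dimensionality stays the same.

```python
# Toy sketch of the optic chiasm as a fully connected layer with a 0/1 mask.
# Unit layout, weights, and input values are illustrative assumptions.

def masked_dense(inputs, weights, mask):
    """Return one output per row: sum_i inputs[i] * weights[row][i] * mask[row][i]."""
    return [
        sum(x * w * m for x, w, m in zip(inputs, w_row, m_row))
        for w_row, m_row in zip(weights, mask)
    ]

# Inputs (ganglion cells), in this order:
#   0: left eye, left visual field    1: left eye, right visual field
#   2: right eye, left visual field   3: right eye, right visual field
ganglion = [4.0, 1.0, 2.0, 3.0]

# Outputs: rows 0-1 are right-LGN units, rows 2-3 are left-LGN units.
# The right LGN listens only to the left visual field (inputs 0 and 2),
# the left LGN only to the right visual field (inputs 1 and 3).
mask = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 1],
]

# Every potential connection gets the same arbitrary weight here.
weights = [[0.5] * 4 for _ in range(4)]

lgn = masked_dense(ganglion, weights, mask)
print(lgn)  # [3.0, 3.0, 2.0, 2.0]
```

Eight of the sixteen possible connections are masked out, four inputs still produce four outputs, and changing a right-visual-field input leaves the two right-LGN units untouched.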
V1 first transforms the incoming signals from the LGN into the responses of bar-shaped receptive fields that are tuned to orientation and spatial frequency. We can consider this to be essentially a multi-dimensional convolutional layer: the spatial frequency and orientation tuning come from a very high dimensional convolution, and the tuning plays the same role as the weights applied to each connection in a NN. The big mystery that remains is how object recognition is done in V1. Thanks to spatial frequency tuning, we know that V1 can separate out different frequency bands, such as low, medium, and high, but this doesn’t help us understand how the information is arranged so that objects can be recognized largely regardless of their position.
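To illustrate the comparison between orientation tuning and convolution weights, here is a small sketch in plain Python. The two 3x3 kernels, the 5x5 test images, and the helper names `correlate2d` and `peak_response` are hypothetical choices for this example, not a model of real V1 cells. The sliding-window operation is the cross-correlation that machine learning libraries usually call "convolution".

```python
# Toy sketch: orientation tuning expressed as convolution weights.
# Kernels, images, and helper names are illustrative assumptions.

def correlate2d(image, kernel):
    """Valid-mode 2D cross-correlation of a list-of-lists image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def peak_response(image, kernel):
    """Strongest response of the kernel anywhere in the image."""
    return max(max(row) for row in correlate2d(image, kernel))

# Hypothetical receptive fields: the weights are the "tuning".
VERTICAL_BAR = [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]]
HORIZONTAL_BAR = [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]]

# 5x5 images containing a single vertical bar, in column 2 and in column 3.
bar_center = [[0, 0, 1, 0, 0] for _ in range(5)]
bar_shifted = [[0, 0, 0, 1, 0] for _ in range(5)]

print(peak_response(bar_center, VERTICAL_BAR))    # 6
print(peak_response(bar_center, HORIZONTAL_BAR))  # 0
print(peak_response(bar_shifted, VERTICAL_BAR))   # 6
```

The vertically tuned kernel responds strongly to the vertical bar while the horizontally tuned one does not respond at all. Taking the peak over all positions also gives the same answer when the bar is shifted sideways, which is one simple way a model can ignore position. Whether V1 does anything comparable is exactly the open question.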
The current theory is that there exist certain position invariant features that V1 focuses on. T junctions (where a line is met by a perpendicular one), Y junctions, and arrow junctions are considered position invariant because they remain recognizable as the same junction no matter how you rotate them. However, position invariance is clearly not a perfect theory, as there are certain angles from which our brains cannot tell what object we are looking at. Specifically, this happens when an accidental viewpoint produces an image that seems impossible. As we uncover how the brain works, it will become more and more evident how computers can mimic it to recognize objects as well.