An illustration of two visual pathways in the brain: the dorsal stream (green) and the ventral stream (purple). DiCarlo investigated how activity in the ventral processing stream supports visual object recognition, focusing particularly on neuronal spiking in higher levels of this region. (Image Source: Wikimedia Commons)

James DiCarlo, a professor of neuroscience, department head, and investigator at MIT, discussed his research on the neural mechanisms underlying humans’ seemingly effortless ability to solve complex problems of object recognition.

Humans rapidly and accurately analyze visual environments, extracting latent content (such as category, position, and size) from a pattern of pixels. Yet this ease in digesting a scene belies the real computational complexity of the task: information that appears obvious is only implicit in the pixel representation, and perceptual processes that seem automatic involve a series of transformations from pixels to higher-level visual representations. DiCarlo and his team sought to understand this transformation. Specifically, their research focused on core object recognition, a subdomain of object perception that involves categorizing images containing a single object.

The human brain excels at core object recognition: when shown a rapid succession of image frames, people easily recognize specific objects. The challenge, then, is not the diversity of objects but rather that a single physical object can generate an effectively infinite number of images. How does the brain determine the identity of an object that is partially occluded, tilted, or altered in color? This challenge, known as the invariance problem, is the computational crux of core object recognition, and the ability to overcome it separates humans from computers.

The neural mechanisms that solve core object recognition lie in the ventral visual stream, a hierarchy of cortical areas (V1, V2, and V4) that culminates in the inferior temporal cortex (IT) and encodes visual information with increasing selectivity and tolerance. DiCarlo generated models to explain two processes that occur in the ventral stream: encoding, the transformation of retinal images into population patterns of neural activity, and decoding, the transformation of these population patterns into object recognition behaviors such as verbal reports.

To generate stimuli that tested object recognition, DiCarlo applied identity-preserving variations, such as translations, rotations, and placement onto random backgrounds, to pictures of objects. By manipulating these latent variables to create non-naturalistic scenes, he tested the ability of humans and non-human primates to make distinctions based on identity rather than context.
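
The idea of an identity-preserving variation can be illustrated with a toy sketch. It assumes a tiny hand-made object (the hypothetical `L_SHAPE`), rotations restricted to 90-degree steps, and uniform-noise backgrounds; none of these details come from the lecture, and the actual stimuli were presumably far richer.

```python
import random

def rotate90(patch):
    """Rotate a 2D list of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]

def make_variant(obj, size, rng):
    """Paste obj onto a random background and return (image, latents).

    obj is a 2D list in which None marks transparent pixels. The object's
    identity is unchanged; only rotation, position, and background vary.
    """
    turns = rng.randrange(4)              # rotation, in 90-degree steps
    patch = obj
    for _ in range(turns):
        patch = rotate90(patch)
    h, w = len(patch), len(patch[0])
    top = rng.randrange(size - h + 1)     # translation: row offset
    left = rng.randrange(size - w + 1)    # translation: column offset
    # Random background: uniform noise in [0, 1)
    image = [[rng.random() for _ in range(size)] for _ in range(size)]
    for r in range(h):
        for c in range(w):
            if patch[r][c] is not None:
                image[top + r][left + c] = patch[r][c]
    return image, {"turns": turns, "top": top, "left": left}

# Hypothetical L-shaped object (an assumption for illustration only)
L_SHAPE = [[1.0, None, None],
           [1.0, 1.0, 1.0]]

rng = random.Random(0)
variants = [make_variant(L_SHAPE, 8, rng) for _ in range(5)]
```

Every image in `variants` contains the same object, so a recognizer that reports "L shape" for all of them is tolerant to the latent variables recorded in the second element of each pair.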

The researchers used non-human primates (NHPs) because of parallels between primate and human brains. Non-human primates are relatively easy to train and display visual acuity and recognition abilities similar to those of humans. In one test, for instance, the researchers compared human and rhesus macaque recognition behavior for twenty-four objects and found identical confusion patterns across the images. These similarities suggest that basic object recognition in humans and non-human primates is indistinguishable on categorization tasks and independent of how the subject reports the result.
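
One plausible way to compare confusion patterns between two groups is to correlate the error cells of their confusion matrices. The sketch below assumes that approach and uses made-up rates for three objects; the metric and the numbers are illustrative assumptions, not the analysis or data from the lecture.

```python
import math

def off_diagonal(matrix):
    """Flatten the off-diagonal (error) cells of a square confusion matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Made-up confusion rates (rows: object shown, columns: object reported)
human = [[0.90, 0.08, 0.02],
         [0.10, 0.85, 0.05],
         [0.03, 0.07, 0.90]]
monkey = [[0.88, 0.09, 0.03],
          [0.12, 0.82, 0.06],
          [0.04, 0.08, 0.88]]

similarity = pearson(off_diagonal(human), off_diagonal(monkey))
```

Only the off-diagonal cells are compared because two groups can share the same overall accuracy while confusing different pairs of objects; a high correlation of errors is the stronger sign of a shared recognition strategy.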

DiCarlo hypothesized that learning an object recognition task occurs when neurons downstream from IT either prune or strengthen their synaptic inputs from ventral stream neurons. His decoding model used simple linear classifiers that approximated this downstream learning and predicted behavioral performance. Based on object recognition tests, he estimated that downstream neurons sample approximately fifty thousand neurons spatially distributed over IT and measure their average spiking responses, forming a weighted sum of those responses that judges the likelihood that a particular object is present. DiCarlo’s decoding model, subsequently named Learned Weighted Sums of Randomly selected Average neuronal responses Distributed over IT (LaWS of RAD IT), accurately predicted human confusion patterns and behavior with a correlation of 0.91.
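
A learned weighted sum over a random sample of neurons can be sketched on simulated data. Everything here is an assumption for illustration: the population is 200 simulated sites rather than tens of thousands of neurons, responses are Gaussian, and the weights come from a simple mean-difference rule standing in for whatever linear classifier the researchers actually trained.

```python
import random

rng = random.Random(0)
N_SITES, N_SAMPLED, N_TRIALS = 200, 50, 40   # toy sizes, not real estimates

# Hypothetical mean response of each simulated IT site to objects "A" and "B"
tuning = {obj: [rng.gauss(0, 1) for _ in range(N_SITES)] for obj in "AB"}

def trial(obj, sites):
    """One noisy population response: mean rate plus unit Gaussian noise."""
    return [tuning[obj][s] + rng.gauss(0, 1) for s in sites]

def mean_vector(rows):
    """Average each site's response across trials."""
    return [sum(col) / len(rows) for col in zip(*rows)]

# The downstream readout samples a random subset of sites
sites = rng.sample(range(N_SITES), N_SAMPLED)

# Learn the weights from training trials (mean-difference rule)
train_a = [trial("A", sites) for _ in range(N_TRIALS)]
train_b = [trial("B", sites) for _ in range(N_TRIALS)]
mu_a, mu_b = mean_vector(train_a), mean_vector(train_b)
weights = [a - b for a, b in zip(mu_a, mu_b)]
bias = -sum(w * (a + b) / 2 for w, a, b in zip(weights, mu_a, mu_b))

def decode(response):
    """Weighted sum of sampled responses; a positive score favors object A."""
    score = sum(w * r for w, r in zip(weights, response)) + bias
    return "A" if score > 0 else "B"

# Evaluate on held-out trials
held_out = [(obj, trial(obj, sites)) for obj in "AB" for _ in range(N_TRIALS)]
accuracy = sum(decode(r) == obj for obj, r in held_out) / len(held_out)
```

The readout never sees pixels, only firing rates, which mirrors the division of labor in the text: the ventral stream does the encoding, and a single weighted sum suffices for decoding once the population code is good enough.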

If downstream neurons sample from specific patches of the IT cortex, then suppression of IT neurons should generate predictable patterns of behavioral deficits. To establish a causal link between neuronal activity and object recognition behavior, the researchers directly perturbed neuronal activity and measured the effects on behavior. Using optogenetic techniques, which are preferable to pharmacological manipulations whose effects persist for hours, they briefly inhibited neurons in the IT cortex with subdural LEDs. Selective suppression of neural subpopulations associated with identifying gender resulted in a 2% deficit in gender discrimination tasks, a significant decrement in line with what was expected given the size of the regions and the amount of knockdown. This result suggests a causal link between neuronal firing patterns in the inferior temporal cortex and object discrimination behavior.

Despite the impressive ability of his models to capture the neural mechanisms of core object recognition, DiCarlo noted limitations in his research. His models predict behavior for only a subset of behavioral tasks and do not explain the function of multiple cortical layers. In the future, DiCarlo plans to expand his research to cover the entire domain of core object recognition. Eventually, he hopes to generalize his models to all sensory domains.



  1. DiCarlo, James. “Neural Mechanisms Underlying Visual Object Recognition.” PBS Colloquium. Hanover, New Hampshire. 21 Oct. 2016. Lecture.
  2. Rosselli, Federica B., Alireza Alemi, Alessio Ansuini, and Davide Zoccolan. “Object Similarity Affects the Perceptual Strategy Underlying Invariant Visual Object Recognition in Rats.” Frontiers in Neural Circuits 9 (2015): n. pag. Web. 22 Oct. 2016.
  3. Ventral-dorsal Streams. Digital image. Wikimedia Commons. N.p., 15 Dec. 2007. Web. 25 Oct. 2016.