Recent research published in Nature Neuroscience suggests that the brain learns to associate certain signals with rewards based on the amount of time that passes between rewards, rather than the number of repetitions. This challenges century-old assumptions about conditioning and provides evidence that the total amount of learning over a period of time depends entirely on timing. These findings could change our understanding of learning in both animals and humans.
For over 100 years, scientists have generally accepted that associative learning works through repeated trial and error. Associative learning is the process by which humans and animals learn to link certain signals with certain outcomes, much as a dog learns that a bell means dinner is ready. The common assumption has been that more practice leads to better learning.
The researchers had previously developed a mathematical model suggesting that animals learn by looking back in time to identify the causes of meaningful outcomes. In this framework, rather than trying to predict the future effects of a cue, the brain works backwards from the reward to figure out what predicted it. While testing this idea, the scientists noticed that when the time between rewards was extended, the animals learned proportionately faster.
“Shortly after publication of this paper, we realized that the model predicted that animals would learn the cue-reward association proportionately faster when trials were spaced out. This should mean that the total amount of learning over a fixed period should be independent of the number of cue-reward pairings experienced,” said study author Vijay Mohan K. Namboodiri, an associate professor at the University of California, San Francisco.
This observation led the researchers to test whether strict mathematical rules govern the speed of learning: specifically, whether learning speeds up in proportion to the time elapsed between reward experiences. They designed a series of experiments that measured both behavior and brain chemistry in real time.
“We decided to test whether there are rules that govern the control of the learning rate and whether the learning rate varies linearly with the time between cue and reward experience,” Namboodiri explained.
The researchers conducted the study using 101 adult male and female mice. They classically conditioned thirsty mice by playing a short auditory tone and then delivering sugar-sweetened water. The mice were head-fixed so that testing conditions were controlled and uniform across all subjects.
After learning this association, the mice started licking the spout as soon as they heard the tone, expecting sugar water to come out. To measure the underlying brain activity, the researchers used a technique called fiber photometry. They injected a fluorescent sensor into the nucleus accumbens core, a brain region deeply involved in reward processing.
This sensor lights up when the brain releases dopamine, a chemical messenger strongly associated with pleasure, motivation, and learning. This allowed the scientists to monitor exactly when the brain processed tones and rewards. The researchers divided the mice into groups that differed in the interval between pairings: some mice experienced the tone and reward every 60 seconds, while others waited 600 seconds between pairings.
Mice that waited 600 seconds learned the association in about one-tenth the number of trials compared to mice on a 60-second schedule. This indicates a proportional relationship in which the trial-by-trial learning rate increases as the time between rewards increases. As a result, both groups of mice learned the association in exactly the same total conditioning time, even though one group experienced far fewer total tone-reward pairs.
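The arithmetic behind this equivalence can be illustrated with a toy calculation (the rate constant and learning threshold here are made-up numbers for illustration, not values from the study): if the per-pairing learning rate is proportional to the inter-reward interval, the number of trials needed shrinks in exact proportion, so total conditioning time stays the same.

```python
# Toy illustration (hypothetical numbers, not the authors' model): assume the
# per-trial learning rate is proportional to the inter-reward interval (IRI).

BASE_RATE_PER_SECOND = 0.001   # assumed constant of proportionality
THRESHOLD = 0.95               # assumed associative strength for "learned"

def trials_to_learn(iri_seconds):
    """Trials needed if each pairing adds BASE_RATE_PER_SECOND * IRI."""
    rate_per_trial = BASE_RATE_PER_SECOND * iri_seconds
    return THRESHOLD / rate_per_trial  # fractional trials, for illustration

for iri in (60, 600):
    n = trials_to_learn(iri)
    print(f"IRI={iri:>3} s: {n:5.1f} trials, {n * iri:.0f} s of conditioning")
# The 600 s group needs one-tenth the trials, but total time is identical.
```

Under this linear rule, trials-to-criterion scales as 1/IRI while each trial takes IRI seconds, so the two factors cancel exactly.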
“The main finding of the study was that the learning rate (how much you learn from each experience) changes linearly with the time between rewards, which was quite surprising,” Namboodiri told PsyPost. “This was a prediction made by the retrospective learning model described above, but we expected that the prediction would be incorrect in the first experiment and require an update of the model.”
Measurements of dopamine provided evidence consistent with behavioral observations. Mice with longer gaps between rewards needed proportionally less experience before their brains started releasing dopamine in response to sound alone. The actual dopamine response occurred several trials before the mice began to physically lick the spout in anticipation.
“In each experiment, we tracked how dopamine responses to cues evolved during learning under the same timing manipulations that we used behaviorally,” Namboodiri said. “We found that dopamine signals followed the same learning rules, meaning that the rate and magnitude of changes in dopamine cue responses depended on the average time between rewards, rather than the raw number of cue-reward combinations. This similarity between behavior and dopamine activity shows that the brain’s reward system is enacting time-based learning rules, and reveals a simple biological basis for how animals learn from rewards.”
To ensure that the results were not driven by other factors, the scientists performed several control experiments. They tested whether the mice learned faster simply because they received fewer rewards each day, which could have made the sugar water seem more novel or valuable.
The researchers also tested whether simply spending more time in the testing chamber without any cues had an effect. The proportional scaling rule held even when controlling for these variables: the time between rewards consistently determined the rate of trial-by-trial learning.
The scientists then tested aversive learning by combining sound and mild foot shocks in freely moving mice. They observed the same proportional scaling rule in this scenario. Mice with longer time between shocks learned to freeze in response to the sound in proportionately fewer trials.
In another variation, the researchers tested partial reinforcement. They played a tone every 60 seconds but delivered sugar water on only 10 to 50 percent of trials. Because the actual rewards were further apart in time, these mice developed dopamine responses to the cue in proportionally fewer rewarded trials than mice that received a reward every time.
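On this reading, partial reinforcement simply stretches the effective time between rewards. A back-of-the-envelope check (the tone interval matches the article; the reward probabilities are illustrative) shows why a 10 percent reinforcement schedule behaves like the 600-second group:

```python
# Back-of-the-envelope: with a tone every 60 s but reward on only a fraction
# of trials, the average gap between actual rewards grows as 1/probability.

TONE_INTERVAL_S = 60

for p_reward in (1.0, 0.5, 0.1):
    mean_iri = TONE_INTERVAL_S / p_reward   # expected inter-reward interval
    print(f"reward probability {p_reward:.0%}: "
          f"mean inter-reward interval {mean_iri:.0f} s")
# At 10% reinforcement, rewards are on average 600 s apart -- the same
# effective spacing as the 600 s group, so learning per reward scales up.
```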
Traditional learning theory assumes that the brain calculates prediction errors moment by moment. Prediction error is the difference between the reward an animal expects and the reward it actually receives. The researchers compared these older models to a new framework that calculates relevance retrospectively only when a reward is received.
In computer simulations of these competing theories, the traditional models did not match the mouse behavior: they cannot explain why the learning rate increases linearly with the time between rewards. The new retrospective model naturally predicted this exact proportional scaling, providing strong theoretical support for the experimental results.
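The contrast can be sketched with a minimal simulation (a generic Rescorla-Wagner-style prediction-error learner with made-up parameters, not the authors' published model): a fixed learning rate predicts the same number of trials to criterion regardless of spacing, while a rate that scales with the inter-reward interval reproduces the observed speed-up.

```python
# Minimal contrast (illustrative parameters, not the published model):
# a prediction-error update V += alpha * (reward - V), with alpha either
# fixed (classic assumption) or scaled by the inter-reward interval.

def trials_to_criterion(alpha, criterion=0.95):
    """Count cue-reward pairings until associative strength hits criterion."""
    v, trials = 0.0, 0
    while v < criterion:
        v += alpha * (1.0 - v)   # prediction-error update after each pairing
        trials += 1
    return trials

for iri in (60, 600):
    fixed = trials_to_criterion(alpha=0.1)            # classic: ignores IRI
    scaled = trials_to_criterion(alpha=0.0006 * iri)  # rate scales with IRI
    print(f"IRI={iri:>3} s: fixed-rate model {fixed} trials, "
          f"time-scaled model {scaled} trials")
```

The fixed-rate learner needs the same trial count at both spacings; only the time-scaled learner shows the large drop in trials-to-criterion at 600 s (the drop is roughly, not exactly, tenfold here because this toy update saturates near criterion).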
“The key takeaway from our study is that what actually drives reward-based learning is not how many cue-reward pairings an animal experiences, but rather how much time passes between rewards,” Namboodiri summarized. “Simply put, we found that when rewards are separated in time, each reward leads to proportionally more learning. Therefore, if rewards occur 10 times farther apart in time, each reward leads to approximately 10 times more learning.”
“As a result, over a given period of time, the total amount of learning is the same even though the number of cue-reward experiences varies widely (by a factor of 20 or more). This previously unknown learning rule suggests that the total number of experiences is not the primary determinant of learning, contrary to a long-standing assumption in neuroscience and reinforcement learning. Although it was known in the field that spacing pairings out in time speeds up learning per pairing, it was assumed that the final amount of learning still depended on the number of pairings. In our experiments, we found that the total amount of learning is determined by time, not count.”
Readers may be tempted to conflate these findings with the well-known spacing effect, a broad educational concept holding that spacing study sessions apart produces better learning than cramming. The new study points to something much more specific than the general benefit of taking breaks.
“We would like to emphasize that our results do not simply restate the spacing effect or its biological basis, but rather identify a previously unknown learning rule,” Namboodiri told PsyPost. “The spacing effect can be broadly summarized as ‘spaced out experiences = better learning,’ meaning that as experiences are brought closer together in time, there is less benefit from their contribution to learning.”
“However, our finding that the learning rate varies linearly with the time between rewards requires a fundamental change from the above perspective, because (as we show) it means that over a given period of time, the number of cue and reward experiences has no effect on overall learning.”
One potential limitation is that the researchers tested this particular rule primarily in a simple setup using mice. They also noted that the proportional scaling rule tended to break down at extreme intervals, such as when mice waited a full hour before being rewarded.
Future research will investigate where exactly this time is calculated in the brain. Scientists also plan to investigate whether this rule also applies to drug rewards, which could provide insight into addiction and habit formation. For example, nicotine patches provide a continuous supply of nicotine, which can disrupt the brain’s association between the act of smoking and reward, blunting the urge to smoke.
Applying these timing principles to artificial intelligence systems could also help machines learn faster from less data. Current systems make small improvements after billions of interactions, making them slow to learn. Models that borrow from these new biological discoveries have the potential to accelerate artificial learning.
The study, “Duration between rewards controls the rate of behavioral and dopaminergic learning,” was authored by Dennis A. Burke, Annie Taylor, Huijeong Jeong, SeulAh Lee, Leo Zsembik, Brenda Wu, Joseph R. Floeder, Gautam A. Naik, Ritchie Chen, and Vijay Mohan K Namboodiri.

