Recent research published in Psychological Science provides evidence that people naturally absorb music's underlying harmonic rules simply by listening to music over a lifetime. The findings suggest that both trained musicians and people with no musical background use harmonic context in strikingly similar ways to predict and remember musical patterns.
Scholars have long debated whether formal training is necessary to understand music’s deeper harmonic framework. Just as language is structured into words and sentences, music is organized into layers of notes, phrases, and sections. Some experts believe that understanding this organization requires explicit instruction in music theory.
Other scientists argue that the brain can learn these rules implicitly, through passive exposure to music alone. Previous studies have yielded mixed results regarding how formal training affects listeners' ability to process tonal context. Tonal context refers to the overarching harmonic structure or key of a musical piece.
“There has been considerable research into how listeners construct musical context, but an open question has been how much context is actually used,” said corresponding author Riesa Y. Cassano-Coleman, a doctoral candidate at the University of Rochester and member of the SoNIC (“Science of Neural, Interpersonal Communication”) lab.
“This study was also motivated by a similar line of research on language/narrative: different regions of the brain respond to different amounts of coherent context in a story (Lerner et al., 2011 J. Neurosci.). Basically everyone is an ‘expert’ in storytelling (at least in the sense that people use language and stories to communicate in everyday life), but not everyone is an expert in music. Music therefore provides an interesting test case: is formal training in music necessary to understand musical structure?”
To resolve this debate, researchers designed a systematic way to test listeners. They wanted to find out how the amount of consistent musical information affects a person’s ability to encode, predict, and segment music. The scientists manipulated the amount of musical context available to listeners by scrambling songs at different time intervals.
The researchers conducted four separate experiments using piano music from Pyotr Ilyich Tchaikovsky’s Album for Youth. They created different versions of the music by scrambling the songs on different timescales. Conditions included 1-bar scramble, 2-bar scramble, 8-bar scramble, and completely intact music.
A musical bar, or measure, is a small segment of time that contains a set number of beats. By holding features such as volume, instrument timbre, and tempo constant, the scientists ensured that participants were responding only to changes in harmonic structure.
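The scrambling manipulation can be sketched in a few lines of code. This is a hedged illustration, not the authors' actual stimulus-generation code: it assumes a piece is represented as a simple list of bars and that chunks of a given length are reordered randomly, which is one plausible reading of the procedure described above.

```python
import random

def scramble(bars, window):
    """Reorder a piece at a given timescale.

    bars: list of bars (each element stands for one measure of music)
    window: number of consecutive bars kept intact within each shuffled chunk
    """
    # Split the piece into chunks of `window` consecutive bars...
    chunks = [bars[i:i + window] for i in range(0, len(bars), window)]
    # ...then shuffle the chunks, destroying structure longer than `window`
    # while preserving everything within each chunk.
    random.shuffle(chunks)
    return [bar for chunk in chunks for bar in chunk]

piece = list(range(16))            # stand-in for 16 bars of music
one_bar = scramble(piece, 1)       # 1-bar scramble: maximally disrupted
eight_bar = scramble(piece, 8)     # 8-bar scramble: phrases left intact
assert sorted(one_bar) == piece    # same bars either way, different order
```

The key design point is that every condition contains exactly the same bars, so surface features stay constant and only the amount of coherent harmonic context varies.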
“On the surface, our various musical stimuli sound fairly uniform: the same piano tone, the same tempo, no changes in dynamics (volume),” Cassano-Coleman explained. “So it’s just the underlying structure that changes depending on the conditions. What we wanted to test was how much listeners use that structure, specifically to remember and predict in music.”
In the first experiment, researchers tested musical memory. They recruited 108 adults between the ages of 19 and 41, split evenly between musicians with at least five years of training and non-musicians with no training.
Participants heard a 16-second prompt from one of the scramble conditions. After a short delay, they identified which of two short clips, each about one and a half seconds long, had appeared in the prompt. The results showed that both groups' memory improved as the music became less scrambled.
Although the musicians performed better overall on the memory task, both groups benefited from longer stretches of intact music at the same rate. Even among the musicians, more years of practice did not necessarily mean better memory. This suggests that the mechanisms underlying memory encoding are shared.
In the second experiment, the researchers tested musical prediction using a separate sample of 108 adults, again split evenly between musicians and non-musicians. Participants listened to a 14-second prompt and chose which of two short clips best completed the musical sequence. The data revealed that prediction accuracy increased as the amount of intact musical context increased.
Musicians did not perform better than non-musicians on this prediction task. Both groups made equally good use of the available harmonic information to anticipate what would come next. This provides evidence that people implicitly apply harmonic rules to predict upcoming music, regardless of whether they have formal training.
In the third experiment, the scientists investigated event segmentation, or how people mentally break up continuous sounds into meaningful chunks. Ninety-five adults listened to one-minute-long musical pieces and were instructed to press the spacebar on their keyboard every time they heard a meaningful change in the music.
“Event segmentation tasks require real-time context integration. Segmenting music into meaningful events requires, while listening, remembering what you just heard, predicting what will happen next, and deciding whether a change is meaningful enough to mark an event boundary,” Cassano-Coleman told PsyPost. “This gives us some insight into how these processes unfold over tens of seconds or minutes under more natural listening conditions, rather than the 15 seconds or so in the memory and prediction tasks.”
As the music became more heavily scrambled, all participants pressed the button more frequently, and their responses aligned with the new boundaries created by the scrambling process. When the music was intact, both groups reliably identified standard eight-bar phrases as meaningful events.
Differences between groups became apparent when looking at longer musical structures. Musicians tended to mark boundaries at 16-bar hyperphrases, large musical sections made up of multiple smaller phrases, while non-musicians tended to focus on the shorter 8-bar phrases.
The researchers conducted additional checks to ensure that simple changes in pitch height or rhythmic speed were not driving these responses, providing evidence that listeners were tracking the underlying harmonic structure itself.
The fourth experiment tested explicit perception of structural disruption. The researchers asked the 108 participants from the first two experiments to listen to a one-minute piece of music and identify its level of scrambling: whether the music had been scrambled every bar, every 2 bars, every 8 bars, or left intact.
Musicians performed better on this classification task than non-musicians, suggesting that explicit theory training helps people consciously reason about musical structure. However, both groups had the most trouble discriminating completely intact music from heavily disrupted one-bar scrambling, and achieved their best accuracy at intermediate scrambling levels.
“What we found is that listeners integrate musical context over time and don’t need formal training to take advantage of it,” Cassano-Coleman summarized. “In other words, disrupting the structure (by scrambling) disrupted listeners’ ability to remember and accurately predict what would happen next. What surprised us most was how similarly musicians and non-musicians performed on these tasks. Explicit labeling appeared to give musicians an advantage (Experiment 4), but otherwise both groups’ performance improved (at similar rates) in a more complete context.”
Although this study examines music cognition in detail, it has several limitations. The researchers used pieces of Western classical music that follow very specific harmonic rules. It remains unclear whether listeners show exactly the same pattern of contextual integration when listening to music from different genres or from other cultures.
Future research could investigate how different musical features, such as rhythmic changes or different combinations of instruments, drive event segmentation. The scientists also plan to investigate how highly trained musicians use this type of context when actively performing. A combination of behavioral tests and brain imaging could help pinpoint how the mind integrates multiple streams of auditory information.
“In terms of future directions, we are interested in what happens in the brain when people listen to these scrambled stimuli in an fMRI scanner or when they are played by an expert pianist,” Cassano-Coleman said. “Look forward to that!”
The study, “Listeners Systematically Integrate Hierarchical Tonal Contexts Regardless of Musical Training,” was authored by Riesa Y. Cassano-Coleman, Sarah C. Izen, and Elise A. Piazza.

