When we see a photo of someone holding another photo, we implicitly judge that the person in the nested photo is less capable of thinking and feeling. A new study published in the journal Cognition reveals that this bias holds whether the face is upside down, covered by a mask, or generated entirely by artificial intelligence. The findings suggest that the nested structure of an image outweighs the actual physical details of the face itself.
Psychologists study how we perceive the inner lives of others through a concept called mind perception. This framework proposes that people intuitively judge the mental abilities of different beings along two main dimensions. How we attribute these mental states shapes our moral judgments, our empathy, and our expectations about how others will behave.
The first dimension is agency: the ability to think, plan, and act according to one’s own will. The second dimension is experience: the ability to sense the environment and feel emotions. We naturally ascribe high levels of agency and experience to living humans, placing them firmly at the top of this hierarchy. By contrast, we tend to see animals, robots, and two-dimensional representations such as photographs as having substantially diminished mental qualities.
Previous research has established a stepwise decline in mind perception with each added layer of visual abstraction, a phenomenon called the Medusa effect. Observers consistently judge people depicted in pictures within pictures to have less mental capacity and realism than people depicted directly in a photograph. A single photo represents a first level of abstraction. A picture within a picture functions as a nested representation, distancing the viewer through a second level of abstraction.
Kyushu University researcher Jing Han and colleagues conducted a new study to investigate the cognitive mechanisms underlying this phenomenon. The researchers wanted to know whether the Medusa effect could be disrupted or erased by manipulating how facial information is processed. They tested the bias using culturally adapted photographs, synthetic media, and physical obstructions that interfere with normal face processing.
Recognizing human faces relies on two complementary modes of visual processing. Holistic processing involves perceiving the overall configuration of the face, intuitively grasping the arrangement of its parts as a unified whole. Featural processing relies on identifying specific individual components, such as the shape of the eyes or the curve of the mouth. The researchers designed a series of eight psychological experiments to systematically disrupt these processes.
In the first experiment, the team recruited Japanese participants online and showed them a new set of culturally adapted images featuring Asian models. Each image showed a primary person holding a nested portrait of a secondary person. Participants rated both people on agency, experience, and realism, using scores from 0 to 10. The Medusa effect replicated with the Asian models, just as it had with Western samples in previous research.
Subsequent tests targeted holistic processing. The researchers turned the images from the first experiment upside down. Face recognition suffers disproportionately from inversion compared with recognition of other objects, such as houses or vehicles. Although inversion lowered overall mind attribution scores for everyone in the photos, participants still rated the people in the nested photos lower than the primary subjects holding them.
Next, the team targeted featural processing by occluding specific parts of the face. In three consecutive experiments, they photographed models wearing surgical face masks, dark sunglasses, or both at the same time. Covering the lower face and eyes typically deprives observers of key visual cues that signal emotions and inner mental states.
The accessories significantly reduced overall mind perception across experiments. Observers found it much harder to recognize agency and experience in subjects hidden behind masks and sunglasses. However, the relative difference in mind perception remained intact: nested subjects were consistently judged to have significantly less mind than directly photographed subjects.
The researchers also investigated the impact of authenticity and artificial intelligence. The steady proliferation of synthetic media has made it incredibly easy to generate faces that are indistinguishable from photos of real people. The researchers used image generation software to create a completely artificial scene in which synthetic people held photos of other synthetic people. Participants rated these images without being informed that they were generated by artificial intelligence.
Observers attributed less mind to the synthetic people than they had to the real humans in the previous experiments. Even within these artificially generated images, however, the gap remained. The synthetic person in the primary photo was rated higher than the synthetic person in the nested photo.
The final manipulation involved spatial scrambling. The team rearranged the models’ internal facial features, leaving the eyes, nose, eyebrows, and mouth unnaturally scattered. Scrambling severely disrupts the ability to interpret the stimulus as a coherent social agent. Ratings plummeted, producing the lowest mind perception scores in the entire study. Even so, observers still showed the Medusa effect, rating the nested scrambled faces lower than the primary scrambled faces.
The results show that the Medusa effect is a remarkably robust phenomenon that withstands fundamental perceptual disruptions. It appears to operate almost independently of the physical and structural information present in recognizable faces. The researchers suggest the effect may be explained by construal level theory, which posits that spatial, temporal, or hypothetical distance leads people to think about things in more abstract terms.
Nested photographs may signal psychological distance, making the depicted person seem more remote to the viewer. The Medusa effect may also reflect deeper categorization processes: observers may unconsciously treat a person in an image embedded within another image more like a decorative object than a human agent.
The researchers noted several limitations that warrant further study. The current stimuli were largely framed around the face and upper body, leaving out the effects of full-body posture. The body conveys a great deal of social information about emotion and identity that could alter these judgments.
Future research should test how nested abstractions influence our social judgments by changing body language or introducing animated or robotic figures. Assessing individual differences in visual processing speed and accuracy may also help explain why some people are more susceptible to this visual bias than others.
The study, “Robust Medusa Effect Through Facial Manipulation,” was authored by Jing Han, Kyoshiro Sasaki, Fumiya Yonemitsu, Kaito Takashima, and Yuki Yamada.

