Conclusion: A machine learning model that analyzes patient demographics, electronic medical record data, and routine blood test results predicted patients’ risk for hepatocellular carcinoma (HCC), the most common type of liver cancer, with high accuracy.
Journal in which the study was published: Cancer Discovery, a journal of the American Association for Cancer Research (AACR)
Authors: Carolyn Schneider, MD, co-senior author and corresponding author, is an assistant professor at RWTH Aachen University in Germany. Schneider co-led the study with Jakob Kather, MD, professor of clinical artificial intelligence at Dresden University of Technology in Germany.
Jan Clusmann, MD, the study's lead author, is a clinical scientist at Dresden University of Technology.
Background: People considered to be at high risk for HCC may be eligible for imaging- and blood-based cancer screening to allow for early detection of the disease. However, Schneider explained that current guidelines focus on a limited number of high-risk populations and miss many at-risk people.
“Screening tests are recommended in patients with known cirrhosis or severe liver disease, as they typically have more cases of hepatocellular carcinoma, but many people have undiagnosed cirrhosis or other risk factors and may benefit from liver cancer screening,” she said.
Other factors that increase a patient’s risk of developing HCC include being male, smoking, and heavy alcohol consumption, Clusmann added.
“With so many factors influencing risk, there is an urgent need for effective tools to help clinicians identify high-risk patients,” he said. “Machine learning tools that can work with different types of clinical data simultaneously could be particularly useful for this large clinical challenge.”
How the study was conducted: In this study, Clusmann, Kather, Schneider and colleagues used data from the UK Biobank to develop a machine learning model to analyze different types of clinical data to assess HCC risk. The UK Biobank contains data from more than 500,000 individuals in the UK and included 538 HCC cases, 69% of which occurred in patients without a prior diagnosis of cirrhosis, viral hepatitis, or other chronic liver disease.
The researchers trained the model on 80% of the data from the UK Biobank and performed initial validation on the remaining 20%. External validation was performed using the All of Us registry, which includes data from more than 400,000 individuals in the United States, including substantial representation of populations historically underrepresented in medical research, the authors noted. The registry included 445 HCC cases.
The model the authors developed used a random forest architecture, a technique that combines hundreds of decision trees. Each tree makes a series of simple “yes” or “no” decisions based on a set of variables from the patient data, and the final prediction is determined by aggregating the results of all the trees, making the model more robust, reliable, and interpretable, Clusmann explained.
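The aggregation idea described above can be sketched in a few lines. This is only an illustration, not the authors' model: the three "trees" are hand-written and hypothetical (a real random forest learns hundreds of trees from data), and the variable names and thresholds are assumptions chosen for the example.

```python
# Toy illustration of how a random forest aggregates tree votes.
# All trees, variable names, and thresholds below are hypothetical.

def tree_age_platelets(p):
    # Yes/no questions on age, then platelet count.
    if p["age"] >= 60:
        return 1 if p["platelets"] < 150 else 0
    return 0

def tree_sex_alt(p):
    # Yes/no questions on sex, then ALT level.
    if p["male"]:
        return 1 if p["alt"] > 40 else 0
    return 0

def tree_alcohol(p):
    # A single yes/no question on alcohol use.
    return 1 if p["heavy_alcohol"] else 0

TREES = [tree_age_platelets, tree_sex_alt, tree_alcohol]

def forest_risk(patient):
    """Return the fraction of trees that vote 'high risk' (0.0 to 1.0)."""
    votes = [tree(patient) for tree in TREES]
    return sum(votes) / len(votes)

patient = {"age": 67, "platelets": 120, "male": True,
           "alt": 30, "heavy_alcohol": False}
risk = forest_risk(patient)  # one of three trees votes 'high risk'
```

Averaging many imperfect trees tends to smooth out the mistakes of any single tree, which is the intuition behind the robustness mentioned above.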
Separate random forest models were trained on each of five different types of clinical data and on progressive combinations of data in order of decreasing clinical availability: patient demographics, electronic medical record data, blood test results, genomics, and metabolomics. The performance of these models was evaluated by calculating the area under the receiver operating characteristic curve (AUROC). AUROC measures the ability of the algorithm to distinguish between two groups (in this case, patients in the validation cohort with HCC and patients without HCC); a score of 1 indicates perfect discrimination.
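To make the metric concrete, AUROC can be read as the probability that a randomly chosen patient with HCC receives a higher risk score than a randomly chosen patient without HCC, with ties counted as half. The sketch below computes it that way on a handful of made-up scores; the data and the function name `auroc` are illustrative assumptions, not values or code from the study.

```python
def auroc(labels, scores):
    """AUROC as the share of (case, non-case) pairs ranked correctly.

    labels: 1 for a case (e.g. HCC), 0 for a non-case.
    scores: predicted risk, higher meaning more likely to be a case.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0   # case ranked above non-case
            elif c == n:
                wins += 0.5   # tie counts as half
    return wins / (len(cases) * len(controls))

# Made-up example: two cases, three non-cases.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
value = auroc(labels, scores)  # 5 of 6 pairs are ranked correctly
```

Under this reading, a value of 0.5 corresponds to ranking by chance and 1.0 to ranking every case above every non-case.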
Results: The researchers found that a model that combined demographics, electronic medical record data, and blood test results (Model C) yielded the best performance, with an AUROC of 0.88. Adding genomics or metabolomics data did not significantly improve performance.
“This showed that HCC risk can be predicted using simple, readily available data without the need for complex and expensive genetic sequencing,” Schneider said, noting that this feature increases the likelihood that the model will be widely used, especially in resource-limited settings.
The researchers then compared the performance of their model with that of previously reported liver cancer risk prediction models. These included the clinically available FIB-4, APRI, and NFS scores, which are commonly used to estimate a patient’s likelihood of liver fibrosis, a known risk factor for liver cancer. They also included the aMAP score, which predicts liver cancer risk in patients with chronic liver disease using clinical factors such as age, sex, albumin levels, bilirubin levels, and platelet counts.
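For readers unfamiliar with these scores, FIB-4 and APRI are simple formulas over routine blood values. The sketch below uses the commonly published definitions; the function and parameter names are assumptions for illustration, the example values are made up, and nothing here is taken from the study itself.

```python
import math

def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 index: (age x AST) / (platelets x sqrt(ALT)).

    AST and ALT in U/L, platelet count in 10^9 per liter.
    """
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))

def apri(ast_u_l, ast_upper_limit_u_l, platelets_10e9_l):
    """APRI: (AST / upper limit of normal AST) x 100 / platelets.

    The upper limit of normal depends on the laboratory.
    """
    return (ast_u_l / ast_upper_limit_u_l) * 100 / platelets_10e9_l

# Made-up example values.
example_fib4 = fib4(age_years=60, ast_u_l=40, alt_u_l=25, platelets_10e9_l=200)
example_apri = apri(ast_u_l=40, ast_upper_limit_u_l=40, platelets_10e9_l=200)
```

Both scores rely on only a few inputs, which is part of why a model drawing on many clinical variables at once may capture risk that they miss.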
The authors found that their model outperformed existing scores at finding true cases of HCC and also produced fewer false positives. To make Model C more practical for clinical use, the researchers reduced the number of clinical features it required through so-called “ablation experiments.” A simplified version of the model that examined just 15 routinely collected clinical features still outperformed existing risk prediction models.
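The article does not describe how the ablation was carried out. One common way to shrink a feature set, shown here purely as an assumed illustration, is backward elimination: repeatedly drop the feature whose removal costs the least performance until the desired number remains. The `evaluate` function, feature names, and weights below are hypothetical stand-ins; in practice `evaluate` would retrain the model and return a validation metric such as AUROC.

```python
def ablate(features, evaluate, keep):
    """Backward elimination: drop the least useful feature until `keep` remain.

    evaluate(subset) must return a score where higher is better.
    """
    selected = list(features)
    while len(selected) > keep:
        # Score each candidate subset with exactly one feature left out.
        candidates = [
            (evaluate([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        # The feature whose removal leaves the highest score matters least.
        _, least_useful = max(candidates)
        selected.remove(least_useful)
    return selected

# Hypothetical stand-in for model performance: each feature adds a fixed amount.
TOY_WEIGHTS = {"age": 30, "platelets": 25, "alt": 20, "smoking": 5, "height": 1}

def toy_evaluate(subset):
    return sum(TOY_WEIGHTS[f] for f in subset)

kept = ablate(list(TOY_WEIGHTS), toy_evaluate, keep=3)
# kept == ["age", "platelets", "alt"]
```

With a real model, each call to `evaluate` is expensive, so the number of retraining rounds is usually the practical limit on this kind of search.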
Authors’ comments: “Our study highlights the potential of simple and easily available machine learning models to improve risk stratification of hepatocellular carcinoma using only routinely collected clinical data,” Schneider said. “If our model is validated in a larger population, primary care physicians will be able to efficiently identify at-risk patients and refer them for liver cancer screening. This could lead to earlier detection and improved outcomes for patients with this aggressive disease.”
Clusmann added that the final model showed strong generalizability. Despite being trained primarily on data from Caucasian participants in the UK Biobank, it maintained robust performance when specifically evaluated in the non-Caucasian subgroup of the more ethnically diverse All of Us cohort, suggesting broad applicability across populations.
Research limitations: Limitations of this study include its retrospective design and the low proportion of patients with viral hepatitis, a known risk factor for liver cancer, in the training and validation cohorts. The authors note that further validation is needed to evaluate the performance of machine learning models in different populations.
Source:
American Association for Cancer Research
Journal reference:
Clusmann, J., et al. (2026). Machine learning predicts hepatocellular carcinoma risk from routine clinical data: a large population-based multicentric study. Cancer Discovery. DOI: 10.1158/2159-8290.CD-25-1323. https://aacrjournals.org/cancerdiscovery/article/doi/10.1158/2159-8290.CD-25-1323/775574/Machine-learning-predicts-hepatocelle-carcinoma

