Large Language Models and Artificial Intelligence in Systematic Review

Danyang Dai

The University of Queensland and National Ageing Research Institute

Latest release of ChatGPT-5

GPT: Generative Pre-trained Transformer
GPTs are a family of transformer‑based models pre‑trained on massive text corpora and often fine‑tuned for specific tasks; they are a subset of LLMs.
Large Language Models (LLMs): models with many parameters, trained on vast amounts of text to process and generate language.

Conducting systematic reviews and meta‑analyses is labour‑intensive.
Several studies have explored adopting LLMs and AI within the review process.
For example, GPT-3.5 was used for screening and achieved more than 80% accuracy ¹.
Similarly, Claude 2 was used for data extraction and achieved an overall accuracy of 96.3% ².
More resources from Cochrane on the use of AI and LLMs in systematic reviews: Collection: Artificial Intelligence (AI) methods in evidence synthesis

What is Elicit? An AI‑powered research assistant that helps researchers find, organise, and extract information from academic literature.
Designed to assist systematic reviews.
Functionality includes literature searches, screening, summarisation, classification, data extraction, and report generation.

Example 1: Setting standards in residential aged care: identifying achievable benchmarks of care for long-term aged care services
Example 2: Performance indicators on long-term care for older people in 43 high- and middle-income countries: literature review, web search and expert consultation

Using artificial intelligence for systematic review: the example of Elicit ¹.
Evaluated Elicit for literature searching, title/abstract screening, full‑text review, and final inclusion.
At final inclusion, Elicit included only 17.6% (3/17) of the studies that human reviewers included using the classic screening method.
This result suggests that AI and LLM tools such as Elicit can complement the systematic review process, but they have not yet reached a level of development where they can fully replace traditional approaches.
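The 3/17 figure can be read as the share of human-included studies that the tool also included. A minimal Python sketch of that calculation follows, assuming that reading is correct; the function name and the study identifiers are hypothetical and are not taken from the cited paper.

```python
def inclusion_recall(human_included, tool_included):
    """Share of human-included studies that the tool also included."""
    human = set(human_included)
    if not human:
        raise ValueError("human_included must not be empty")
    return len(human & set(tool_included)) / len(human)


# Hypothetical study IDs: 17 studies included by human reviewers,
# 3 of which the tool also kept (mirroring the 3/17 figure above).
human = [f"study_{i:02d}" for i in range(1, 18)]
tool = ["study_01", "study_02", "study_03"]

recall = inclusion_recall(human, tool)
print(f"{recall:.1%}")  # 17.6%
```

Studies that only the tool included do not change this number, so it measures what the tool missed rather than how many extra records it returned.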

Extracting numeric data (e.g., sample sizes in control and treatment groups).
From experience, LLM/AI tools work best on structured, consistent tasks.
Stay tuned for my package: metaextractoR, an open-source R package that extracts data for systematic reviews.
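One reason structured tasks suit these tools is that the output can be checked automatically. The Python sketch below illustrates this for sample sizes, assuming the LLM has been asked to return a JSON object. The field names `n_control`, `n_treatment`, and `n_total` are hypothetical, and this is not the metaextractoR interface.

```python
import json

REQUIRED_FIELDS = ("n_control", "n_treatment")  # hypothetical field names


def parse_sample_sizes(raw_json):
    """Parse a JSON string and return validated sample sizes, or raise ValueError."""
    try:
        record = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")

    sizes = {}
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{field} must be a non-negative integer, got {value!r}")
        sizes[field] = value

    # If a total is reported, it should match the sum of the two groups.
    total = record.get("n_total")
    if total is not None and total != sum(sizes.values()):
        raise ValueError("n_total does not equal n_control + n_treatment")
    return sizes


example = '{"n_control": 48, "n_treatment": 52, "n_total": 100}'
print(parse_sample_sizes(example))  # {'n_control': 48, 'n_treatment': 52}
```

Records that fail these checks can be routed back to a human reviewer instead of entering the meta-analysis unverified.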