Large Language Models and Artificial Intelligence in Systematic Review

Danyang Dai
The University of Queensland and National Ageing Research Institute

Latest release: GPT-5 in ChatGPT

What is GPT, and how do LLMs work?

  • GPT: Generative Pre-trained Transformer

  • GPTs are a family of LLMs: transformer‑based models pre‑trained on massive text corpora and often fine‑tuned for specific tasks.

  • Large Language Models (LLMs): models with many parameters, trained on vast amounts of text to process and generate language.

Use of LLMs in systematic reviews

  • Conducting systematic reviews and meta‑analyses is labour‑intensive.

  • Several studies have explored adopting LLMs and AI within the review process.

  • For example, GPT‑3.5 was used for screening and achieved more than 80% accuracy [1].

  • Similarly, Claude 2 was used for data extraction and achieved an overall accuracy of 96.3% [2].

  • More resources on the use of AI and LLMs in systematic reviews: the Cochrane collection "Artificial Intelligence (AI) methods in evidence synthesis".
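Accuracy figures like these are typically obtained by comparing the model's include/exclude decisions with human reviewers' decisions on the same records. Below is a minimal Python sketch, assuming the human decisions are the reference standard; the function name `screening_metrics` and the toy decisions are hypothetical and are not data from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class ScreeningMetrics:
    accuracy: float     # share of records where LLM and humans agree
    sensitivity: float  # share of human-included records the LLM also included
    specificity: float  # share of human-excluded records the LLM also excluded


def screening_metrics(llm: list[bool], human: list[bool]) -> ScreeningMetrics:
    """Compare LLM include (True) / exclude (False) decisions with human ones.

    Assumption: human decisions are treated as the reference standard.
    """
    if len(llm) != len(human) or not human:
        raise ValueError("need two non-empty decision lists of equal length")
    tp = sum(l and h for l, h in zip(llm, human))          # both include
    tn = sum(not l and not h for l, h in zip(llm, human))  # both exclude
    pos = sum(human)                                       # human includes
    neg = len(human) - pos                                 # human excludes
    return ScreeningMetrics(
        accuracy=(tp + tn) / len(human),
        sensitivity=tp / pos if pos else float("nan"),
        specificity=tn / neg if neg else float("nan"),
    )


# Hypothetical toy decisions for 10 records (not from any cited study).
human = [True, True, False, False, False, True, False, False, False, False]
llm = [True, False, False, False, True, True, False, False, False, False]
m = screening_metrics(llm, human)
print(f"{m.accuracy:.2f} {m.sensitivity:.2f} {m.specificity:.2f}")  # prints: 0.80 0.67 0.86
```

In this toy example accuracy is 0.80 while sensitivity is only 0.67: because most screened records are excludes, accuracy alone can look good even when relevant studies are missed, so sensitivity is worth reporting alongside it.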

Example of using Elicit

  • What is Elicit? An AI‑powered research assistant that helps researchers find, organise, and extract information from academic literature.

  • Designed to assist with systematic reviews.

  • Functionality includes literature searches, screening, summarisation, classification, data extraction, and report generation.

Examples of using Elicit

  • Example 1: Setting standards in residential aged care: identifying achievable benchmarks of care for long-term aged care services

  • Example 2: Performance indicators on long-term care for older people in 43 high- and middle-income countries: literature review, web search and expert consultation

Academic evaluation of Elicit

  • Study: "Using artificial intelligence for systematic review: the example of Elicit" [1].

  • Evaluated Elicit for literature searching, title/abstract screening, full‑text review, and final inclusion.

  • At the final inclusion stage, Elicit included only 3 of the 17 studies (17.6%) that human reviewers included using the classic screening method.

  • This result suggests that AI and LLM tools such as Elicit can complement the systematic review process, but they have not yet developed to the point where they can fully replace traditional approaches.
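Reading the human reviewers' final set of 17 studies as the reference standard, the 17.6% figure is the recall (sensitivity) of Elicit at the final-inclusion stage; the complementary miss rate follows from the same numbers:

```latex
\text{recall} = \frac{\text{studies included by both Elicit and humans}}{\text{studies included by humans}}
             = \frac{3}{17} \approx 0.176 = 17.6\%,
\qquad
\text{miss rate} = 1 - \frac{3}{17} = \frac{14}{17} \approx 82.4\%
```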

So what’s next?

  • Learning what works with AI/LLM tools in systematic reviews and meta‑analyses.

  • Learning how to use these tools to expedite labour‑intensive tasks.

  • Learning how to effectively verify LLM outputs.

  • Extracting numeric data (e.g., sample sizes in control and treatment groups).

  • In my experience, LLM/AI tools work best on structured, consistent tasks.

  • Stay tuned for my package metaextractoR, an open‑source R package that extracts data for systematic reviews.
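One practical way to verify numeric extractions such as sample sizes is to run simple consistency checks on the LLM's structured output before accepting it. Below is a minimal Python sketch, assuming the LLM returns a dictionary with the hypothetical fields `n_control`, `n_treatment` and `n_total`; the function names are illustrative only and are not the metaextractoR API.

```python
import re


def numbers_in(text: str) -> set[int]:
    """Return all integers written in the text, e.g. '1,204' -> 1204."""
    return {int(tok.replace(",", "")) for tok in re.findall(r"\d[\d,]*", text)}


def check_sample_sizes(extracted: dict, source_text: str) -> list[str]:
    """Return a list of problems; an empty list means every check passed.

    Hypothetical field names (n_control, n_treatment, n_total); not the
    metaextractoR API.
    """
    problems = []
    found = numbers_in(source_text)
    for field in ("n_control", "n_treatment", "n_total"):
        value = extracted.get(field)
        if value is None:
            continue  # field not reported; nothing to check
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{field}={value!r} is not a positive integer")
        elif value not in found:
            # Flag values the LLM may have hallucinated or miscopied.
            problems.append(f"{field}={value} does not appear in the source text")
    c, t, total = (extracted.get(k) for k in ("n_control", "n_treatment", "n_total"))
    # Arms should add up to the reported total when all three are present.
    if all(isinstance(v, int) for v in (c, t, total)) and c + t != total:
        problems.append(f"n_control + n_treatment = {c + t}, but n_total = {total}")
    return problems


# Hypothetical source sentence and two hypothetical LLM extractions.
source = "A total of 120 participants were randomised: 58 to usual care and 62 to the intervention."
good = {"n_control": 58, "n_treatment": 62, "n_total": 120}
bad = {"n_control": 58, "n_treatment": 66, "n_total": 120}
print(check_sample_sizes(good, source))  # prints: []
print(check_sample_sizes(bad, source))
```

Checks like these do not prove an extraction is correct (a number can appear in the text for another reason), but they cheaply flag values a human should re-check, which suits the structured, consistent tasks where LLM tools tend to perform best.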

Thank you.

  • Professor Tracy Comans

  • Alyssa, Tofunmi and Zhichao

  • Professor Jason Pole and Dr. Emi Tanaka