Exploring large language models in paleontology: a prototype RAG-based knowledge QA system
Abstract:
In recent years, AI for Science has profoundly transformed scientific research paradigms. With the rapid advancement of Large Language Models (LLMs), an increasing number of researchers have begun integrating these tools into various aspects of their scientific work. However, the potential of LLMs in scientific research extends far beyond basic applications such as literature review. Developing AI systems for scientific research requires a more systematic engineering approach. This paper presents a case study from graduate paleontology education, demonstrating the implementation and application of Naive RAG-based large language models in scientific research and discussing the future development of related systems. The study details the complete workflow of building a domain-specific knowledge base, including data cleaning, text segmentation, vectorization, and the implementation of retrieval-augmented generation (RAG). Through this practical case, we demonstrate how locally deployed RAG-based open-source LLMs can ensure data security and privacy while effectively facilitating domain knowledge acquisition through interactive Q&A, specifically in teaching paleontological taxonomy. Our experience shows that current developments in Graph RAG systems, Multi-agent RAG systems, and AI agents, which are already demonstrating promising results across various disciplines, could potentially enhance paleontological education and research. As paleontology increasingly embraces interdisciplinary approaches, staying abreast of these technological advancements is crucial for the field's future progress. Based on current technological trends, this paper further explores the application prospects, challenges, and potential strategies of retrieval-augmented generation adaptive reasoning systems that integrate Test-Time Compute (TTC) and Test-Time Scaling (TTS) in the context of paleontology's interdisciplinary development.
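The workflow named in the abstract (data cleaning, text segmentation, vectorization, retrieval, and prompt assembly) can be illustrated with a minimal sketch. This is not the system described in the paper: the function names, the toy passages, and the chunk sizes are all hypothetical, and a simple bag-of-words vector stands in for the embedding model that a real deployment would use. The final call to a locally deployed LLM is omitted; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def clean(text: str) -> str:
    """Rejoin words hyphenated across line breaks, then collapse whitespace.
    (Simplification: a legitimate hyphen at a line end is also removed.)"""
    text = re.sub(r"-\s*\n\s*", "", text)
    return re.sub(r"\s+", " ", text).strip()


def segment(text: str, max_words: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of at most max_words words."""
    words = text.split()
    if not words:
        return []
    step = max(max_words - overlap, 1)
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + max_words]) for i in range(0, stop, step)]


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts; an assumed stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = vectorize(query)
    return sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the retrieved chunks and the question into a single prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )


# Toy passages for illustration only (not taken from the paper's knowledge base).
raw_passages = [
    "Trilo-\nbites are extinct marine arthropods that first appeared in the Cambrian.",
    "Ammonites are extinct cephalopods with typically coiled shells.",
    "Brachiopods are marine animals with two hinged valves.",
]

chunks = [c for p in raw_passages for c in segment(clean(p))]
query = "When did trilobites first appear?"
top = retrieve(query, chunks, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

In a setup like the one the abstract describes, `vectorize` would be replaced by an embedding model, the chunk vectors would be stored in a vector index rather than recomputed per query, and `prompt` would be passed to the locally hosted model.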