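[Keywords]
[Abstract]
AI for Science is driving a paradigm shift in scientific research. The potential of large language models (LLMs) in research extends well beyond basic applications such as literature reading, and many disciplines have begun building AI systems for domain-specific research. Commercial models are constrained by high barriers to use and by data security that is difficult to guarantee, whereas local deployment of open-source LLMs offers a new way to address these problems. Taking as an example the construction of a brachiopod knowledge question-answering system developed in the course of teaching, this paper explores the value of open-source LLMs based on naive retrieval-augmented generation (Naive RAG) for acquiring specialized knowledge. Through interactive question answering, the system can help students master specialized knowledge, but clear room for optimization remains. To address the current limitations, this paper proposes possible improvements based on graph retrieval-augmented generation (Graph RAG), agent retrieval-augmented generation (Agent RAG), and related approaches. In light of current technological trends, it also discusses the prospects, difficulties, and strategies for retrieval-augmented adaptive reasoning systems as paleontology becomes increasingly interdisciplinary.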
[Keywords]
[Abstract]
In recent years, AI for Science has profoundly transformed scientific research paradigms. With the rapid advancement of Large Language Models (LLMs), a growing number of researchers have begun integrating these tools into their scientific work. However, the potential of LLMs in scientific research extends far beyond basic applications such as literature review, and developing AI systems for scientific research requires a more systematic engineering approach. This paper presents a case study from graduate paleontology education that demonstrates the implementation and application of Naive RAG-based large language models in scientific research and discusses the future development of related systems. The study details the complete workflow of building a domain-specific knowledge base, including data cleaning, text segmentation, vectorization, and the implementation of retrieval-augmented generation (RAG). Through this practical case, we demonstrate how locally deployed, RAG-based open-source LLMs can ensure data security and privacy while effectively facilitating domain knowledge acquisition through interactive Q&A, specifically in teaching paleontological taxonomy. Our experience suggests that current developments in Graph RAG systems, multi-agent RAG systems, and AI agents, which are already showing promising results across various disciplines, could enhance paleontological education and research. As paleontology increasingly embraces interdisciplinary approaches, staying abreast of these technological advancements is crucial for the field's future progress. Based on current technological trends, this paper further explores the application prospects, challenges, and potential strategies of retrieval-augmented generation adaptive reasoning systems that integrate Test-Time Compute (TTC) and Test-Time Scaling (TTS) in the context of paleontology's interdisciplinary development.
[CLC number]
[Funding]
Supported by the National Natural Science Foundation of China (NSFC42272007)