[Keywords]
[Abstract]
Over the past half-century, data-based research has become increasingly prominent in paleontology. It is widely acknowledged that scientific research has entered the era of Big Data. Because paleontology is, on the whole, a non-experimental discipline, its data are produced at a limited rate and do not yet meet most of the basic characteristics of Big Data. Nevertheless, the Big Data era and its associated concepts have clearly had a positive effect on paleontological research: the recent diversification of paleontological data output and the growing complexity of mathematical methods and models are both closely linked to it. Drawing mainly on the author’s research experience, this article briefly reviews three stages in the history of quantitative paleontological research. Considering the commonalities among data from different fossil groups, it classifies paleontological data, in the context of Big Data, as structured, semi-structured, or unstructured, and briefly introduces the basic research methods. After discussing the similarities and differences between two major research perspectives, quantitative paleontology and analytical paleobiology, and based on a small sample of recent papers from representative paleontological journals, the article emphasizes the advantages of the research approach and statistical models of analytical paleobiology over traditional statistical methods. In recent years paleontology has shown the characteristics of data-driven research; in the future, a model-driven perspective may deserve equal attention, combining top-down, question-oriented model design with bottom-up, data-based collection and analysis to ensure the sustainable development of paleontological data research. Furthermore, because paleontology is not a data-intensive science, integrating its data with those of other geoscience fields will deepen its interdisciplinary development. Finally, recent initiatives in statistics also deserve the attention of paleontologists: the choice of statistical models and the interpretation of data should take into account the complexity of the system and the possibility of multiple solutions, and particular caution is needed when inferring causal relationships from statistical significance.
[CLC number]
[Funding]
Supported by the Strategic Priority Research Program (Category B) of the Chinese Academy of Sciences (XDB26000000)