Automated Mineral Identification by Stoichiometry (MIST): A Tool for Geochemical Dataset Standardization
Session: Transforming Earth and Planetary Science Through Data and Data Management: In Honor of MSA Distinguished Public Service Medal Awardee, Kerstin Lehnert
Presenting Author: Kirsten Siebach
Authors: Siebach, Kirsten Leigh (1); Moreland, Eleanor Louise (2); Costin, Gelu (3); Jiang, Yueyang (4)
(1) Rice University, Dept. of Earth, Environmental and Planetary Sciences, Houston, TX, USA
(2) Rice University, Dept. of Earth, Environmental and Planetary Sciences, Houston, TX, USA
(3) Rice University, Dept. of Earth, Environmental and Planetary Sciences, Houston, TX, USA
(4) Rice University, Dept. of Earth, Environmental and Planetary Sciences, Houston, TX, USA
Abstract:
Mineral chemistry databases are opening the field of mineralogy to data-driven studies that provide new insights into fields including natural resources, mineral paragenesis, and terrestrial and planetary geology. However, databases that compile mineral chemistry data from generations of publications inherit any inaccurate labels, poor measurements, or missing data present in the original publications. This is problematic for machine learning models, which must be trained on high-quality data. If individual users filter the data differently, it becomes harder to interpret the accuracy of such models and to compare their results.
MIST (Mineral Identification by Stoichiometry) is a stoichiometry-based computational algorithm that identifies geochemical observations whose normalized elemental ratios match natural minerals. The current version of MIST encodes 246 minerals; it was recently published in Computers & Geosciences and is free to use via an online API at https://mist.rice.edu or on GitHub. MIST can standardize the process of filtering geochemical measurements or databases to recognize encoded minerals, preparing datasets for better use in research projects and for training machine learning models. The stoichiometric filters that were manually coded in MIST for each mineral species are based on reported mineral formulas and well-documented examples of mineral chemistry reported in RRUFF and associated databases, typically including a ~5-10% tolerance in stoichiometric ratios to allow for measurement errors, vacancies, and substitutions. MIST uses normalized oxides, which makes it agnostic to the instrument or source of mineral chemistry data; furthermore, when total oxides are available, they can provide a secondary test on MIST identifications. An additional benefit of MIST is that the output includes detailed stoichiometric mineral formulas for each observation, atoms per formula unit (a.p.f.u.) for 24 oxygens, standardized mineral endmembers (e.g., Fo for olivine), and standardized formats for all oxides.
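The idea of a stoichiometric filter can be illustrated with a minimal Python sketch for olivine, (Mg,Fe)2SiO4. This is not MIST's code or API: the function names, the rounded molar masses, the choice of M-site cations, the single 10% tolerance, and the Fo >= 50 split between forsterite and fayalite are all assumptions made for illustration. The sketch converts oxide wt% to cations on a chosen oxygen basis and then checks the Si and M-site totals against the ideal formula.

```python
# Illustrative sketch only; names, constants, and thresholds are assumptions,
# not taken from MIST.

# oxide -> (cation, cations per oxide, oxygens per oxide, molar mass in g/mol, rounded)
OXIDES = {
    "SiO2": ("Si", 1, 2, 60.084),
    "MgO": ("Mg", 1, 1, 40.304),
    "FeO": ("Fe", 1, 1, 71.844),
    "MnO": ("Mn", 1, 1, 70.937),
    "CaO": ("Ca", 1, 1, 56.077),
}


def cations_per_formula_unit(oxides_wt, oxygen_basis=4.0):
    """Convert oxide wt% to cations per formula unit on a fixed oxygen basis.

    Only ratios matter, so the oxide total does not need to be 100.
    Oxides missing from OXIDES raise KeyError in this simplified version.
    """
    cation_moles = {}
    oxygen_moles = 0.0
    for oxide, wt in oxides_wt.items():
        cation, n_cat, n_ox, molar_mass = OXIDES[oxide]
        moles = wt / molar_mass
        cation_moles[cation] = cation_moles.get(cation, 0.0) + moles * n_cat
        oxygen_moles += moles * n_ox
    if oxygen_moles <= 0:
        raise ValueError("composition contains no oxygen-bearing oxides")
    scale = oxygen_basis / oxygen_moles
    return {cation: moles * scale for cation, moles in cation_moles.items()}


def classify_olivine(oxides_wt, tolerance=0.10):
    """Return (label, Fo) if the composition fits M2SiO4 within tolerance, else None."""
    apfu = cations_per_formula_unit(oxides_wt, oxygen_basis=4.0)
    si = apfu.get("Si", 0.0)
    m_site = sum(apfu.get(c, 0.0) for c in ("Mg", "Fe", "Mn", "Ca"))
    # Ideal olivine on 4 oxygens: Si = 1 and M-site cations = 2.
    if abs(si - 1.0) > tolerance * 1.0 or abs(m_site - 2.0) > tolerance * 2.0:
        return None
    mg, fe = apfu.get("Mg", 0.0), apfu.get("Fe", 0.0)
    if mg + fe == 0:
        return None
    fo = 100.0 * mg / (mg + fe)  # forsterite number, Mg / (Mg + Fe) in mol%
    return ("forsterite" if fo >= 50.0 else "fayalite", fo)


sample = {"SiO2": 40.8, "MgO": 49.4, "FeO": 9.7}
print(classify_olivine(sample))  # a forsterite with Fo near 90
print(cations_per_formula_unit(sample, oxygen_basis=24.0))  # same cations on 24 oxygens
```

Because the calculation depends only on elemental ratios, rescaling all oxides by a constant leaves the result unchanged, which is the sense in which a normalized-oxide approach is independent of the analytical total.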
We filtered several of the GEOROC mineral compilations using MIST to identify appropriate mineral compositions for future studies and will post all the mineral results in the DIGIS repository. For example, for the 267,944 olivine compositions, 92.8% of the input compositions included major oxides (SiO2 > 0 wt%), and of those, 97.6% passed MIST filters as either fayalite or forsterite. MIST consistently filters natural compositions for stoichiometric agreement with mineral formulas and can enable filtering of generated datasets to significantly expand available mineral data.
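The olivine percentages above can be converted to approximate counts with a short calculation. The counts are not reported in the abstract; they are derived here from the rounded percentages, so they are estimates only, and the variable names are illustrative.

```python
# Approximate counts implied by the reported olivine percentages.
# Inputs are the rounded figures from the abstract, so results are estimates.
total_olivine = 267_944
frac_with_major_oxides = 0.928  # SiO2 > 0 wt%
frac_passing_mist = 0.976       # of those with major oxides

n_with_major_oxides = total_olivine * frac_with_major_oxides
n_passing = n_with_major_oxides * frac_passing_mist
overall_fraction = frac_with_major_oxides * frac_passing_mist

print(f"with major oxides: about {n_with_major_oxides:,.0f}")
print(f"passing MIST:      about {n_passing:,.0f}")
print(f"share of all input compositions: about {overall_fraction:.1%}")
```

Under these figures, roughly 248,650 compositions include major oxides and roughly 242,700 pass the filters, which is about 90.6% of all olivine inputs.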
Category: Topical Sessions
Preferred Presentation Format: Either
Categories: Mineralogy/Crystallography; Geoinformatics and Data Science