Introduction
The Commission on New Minerals, Nomenclature and Classification (CNMNC) of the International Mineralogical Association (IMA) has given a definitive explanation for the term ‘mineral’ (Nickel, Reference Nickel1995), and the terms ‘mineral species’ and ‘mineral’ are considered to be identical (Dunn and Mandarino, Reference Dunn and Mandarino1987; Nickel and Grice, Reference Nickel and Grice1998). There are many standard classifications available for classifying minerals on the basis of their chemical and/or crystal-chemical composition e.g.: Dana's New Mineralogy (Gaines et al., Reference Gaines, Dana and Dana1997); the Nickel–Strunz classification (Strunz and Nickel, Reference Strunz and Nickel2001); and a ‘Systematic Classification’ of minerals (Ferraiolo et al., Reference Ferraiolo, Dana and Dana1982). However, there has been no systematic approach to classifying and arranging mineral-related names and substances, or a definite regulatory nomenclature system. The existing names of mineral varieties or synonyms, which are not regarded as valid species, do not come under the jurisdiction of the IMA–CNMNC or the former Commission on New Minerals and Mineral Names (CNMMN) and are therefore unregulated (Nickel and Grice, Reference Nickel and Grice1998).
Much work has been done on developing a mineral naming system (Nickel and Grice, Reference Nickel and Grice1998; De Fourestier, Reference De Fourestier2002), tidying up mineral names (Burke, Reference Burke2006), and mass discreditation of GQN (‘G’: Grandfathered, i.e. names considered to represent valid species, described before 1959; ‘Q’: Questionable minerals, i.e. considered not to represent valid species, described before 1959; and ‘N’: non-approved names, i.e. published after 1959 without CNMMN approval) minerals (Burke, Reference Burke2006; Hatert et al., Reference Hatert, Pasero, Mills and Hålenius2017). In addition, serious efforts have been applied to providing a view on suffix nomenclature versus prefix nomenclature, correcting mineral names with diacritical marks, converting two-word mineral names to one-word only (Burke, Reference Burke2006; Hatert et al., Reference Hatert, Mills, Pasero and Williams2013; Reference Hatert, Mills, Pasero, Miyawaki and Bosi2023), adopting an international standard for mineral abbreviations (Whitney and Evans, Reference Whitney and Evans2010; Warr, Reference Warr2021), and organising the unnamed species (Smith and Nickel, Reference Smith and Nickel2007).
Substantially less work has been done regarding non-approved mineral-related names or substances, including synonyms and varieties. One of the earliest systematic works on structuring and arranging mineral synonyms was probably done by Thomas Allan in 1814 when he published “Mineralogical Nomenclature: Alphabetically Arranged, with Synoptic Tables of the Chemical Analyses of Minerals” (Allan, Reference Allan1814). The latter nomenclature follows Haüy's system (Haüy, Reference Haüy1801) and provides thousands of synonyms of English, French and German origins. The other outstanding contributor to cataloguing mineral synonyms was Thomas Egleston, who compiled “A Glossary of Minerals and Synonyms” (Egleston, Reference Egleston1892). The latter was an ongoing project for nearly 30 years and was finished by adopting synonyms of mostly English, French and German spellings. The aim was to create a reliable management system for Columbia University's School of Mines mineral collection, now the Fu Foundation School of Engineering and Applied Science. Other distinguished works on listing and documenting mineral names and synonyms are the earlier published Endlich (Reference Endlich1888) and later publications by Chester (Reference Chester1896), Hey (Reference Hey1962), De Fourestier and Ivanyuk (Reference De Fourestier and Ivanyuk1999) and Bayliss (Reference Bayliss2000).
A definition of chemical, structural, and chemical–structural mineral varieties developed by Bulakh (Reference Bulakh2008, Reference Bulakh2010) had earlier been mentioned by Rogers (Reference Rogers1913) and later by Povarennykh (Reference Povarennykh1972). Bulakh (Reference Bulakh2008) defined a system of taxa where minerals are subdivided into subspecies and varieties according to their composition and structure. Recently, a taxonomy for anthropotype or human-mediated mineral-like compounds and a taxonomy of historical mineral kinds have been proposed (Hazen et al., Reference Hazen, Grew, Origlieri and Downs2017; Hazen, Reference Hazen2019; Cleland et al., Reference Cleland, Hazen and Morrison2021). In 2009, a hierarchical scheme for group nomenclature and mineral classification was introduced and applied to recent nomenclature within the IMA–CNMNC framework (Mills et al., Reference Mills, Hatert, Nickel and Ferraris2009). In addition, nomenclatures for supergroups are continuously improved and proposed (Hawthorne et al., Reference Hawthorne, Oberti, Harlow, Maresch, Martin, Schumacher and Welch2012; Christy and Atencio, Reference Christy and Atencio2013).
From the data analytics point of view, there has been a massive increase in mineral and related data worldwide over the last few decades. Several web resources list and provide data on thousands of synonyms, varieties and obsolete mineral names. The most complete web resource for storing these data is mindat.org, launched in 2000. According to Mindat's statistics, the database contains 5815 IMA-approved species, 38,176 synonyms, 1850 varieties, 3014 rock names, 128 commodity names and more than 5000 unclassified entries (as of 23rd of July, 2022).
As Rogers (Reference Rogers1913, p. 615) mentioned, “The task of descriptive mineralogy is to establish and define the distinctive minerals or mineral species, but the science is greatly handicapped by hundreds of varietal names which are worse than useless”. Recently, the problem of storing these unrelated and, until now, uncategorised data at the database level has become more acute for two reasons: (1) the number of contributors to open-access data repositories has increased enormously during the last decade; and (2) automated data extraction and parsing algorithms allow data to be collected more efficiently. As a result, the data storage volume grows daily while more than 80% of its entries remain unregulated, lacking a simple and reliable relational scheme that would link minerals to their related entries and allow those links to be traversed through iterative or recursive queries. The categorising rules for mineral-related species, such as synonyms, varieties, or polytypes, remain essentially undefined. From the database design point of view, such a categorisation system is an asset for searching and traversing relationships between minerals and their related names. Another complexity lies in the lack of terminology and definition of these related materials.
This research aims at providing the first database-oriented scheme for classifying mineral-related names, specifically ones non-approved by the IMA–CNMNC. This classification would allow linking of geological materials, minerals, and their obsolete names by storing all entries at a single relational database level with a many-to-many (M:N) type of cardinality mapping. This classification is developed for differentiating and arranging non-IMA species on the SQL (Structured Query Language) database level. It has full potential for internal usage in mineralogical databases like mindat.org, earthchem.org, or georoc.eu. The terminology and methodology proposed could be expanded, improved, and used in broader geo-oriented communities, specifically for solving the issues of database interoperability and creating a standard schema or structure for the data in each database. The scheme could also be applied as a middleware solution, allowing different mineralogical databases to communicate using the same standard.
Note that in this paper, mineral names not currently approved by the IMA are not placed in quotes because of the number of different types of ‘mineral’ names involved; instead, each is explained in context.
Materials and methods
Open-access web resources were the primary data source studied extensively for developing this classification. Additionally, a wide range of journals was consulted in developing the classification presented in the paper. In particular, substantial use was made of the available classification schemes in American Mineralogist and The Canadian Mineralogist.
Materials
Mindat
The online mineral database Mindat (www.mindat.org) was founded by Jolyon Ralph and went live in October 2000. The database is considered the most complete resource for storing mineral and related data. As of 23rd July 2022, Mindat lists data on over 54,000 species, most synonyms, and over a million mineral photos. The data on mineral-related names and their descriptions, kindly provided by Mindat's management team, were the primary source of information. These data were then screened visually, followed by a search for each name's origins.
The IMA list of mineral species
The official list of IMA-approved mineral species (http://cnmnc.units.it/; Pasero, Reference Pasero2023) is also accessible via the RRUFF project (www.rruff.info/ima, Lafuente et al., Reference Lafuente, Downs, Yang, Stone, Armbruster and Danisi2015), maintained at the Department of Geosciences, The University of Arizona. The RRUFF IMA list allows users to search over 5829 (as of 8th August 2022) species by mineral name, IMA status, and several mineral attributes. The RRUFF list of IMA-approved minerals was used to filter out and assign a specific status to minerals regulated by the IMA–CNMNC.
Athena
The Athena mineral database was the first complete mineral database on the web, created during 1986–1987 at the Department of Mineralogy of the Geneva Natural History Museum. In 1994, the database was published online (https://athena.unige.ch/athena/mineral/mineral.html) and has become accessible to the geoscientific community.
Handbook of Mineralogy
The Handbook of Mineralogy (www.handbookofmineralogy.org) is another web resource for accessing data on 5330 IMA-approved minerals (as of 8th August 2022), maintained by the Mineralogical Society of America (MSA) since 2001 (Anthony et al., Reference Anthony, Bideaux, Bladh and Nichols2001). A variety of data are stored on the website: crystal data, physical properties, optical properties, occurrence, association, distribution, name origins, and references.
Over 10,000 related names were analysed during the research to distinguish levels and ranks of the classification proposed. Classification of related names was performed within the core database of mineralogy.rocks computing community (https://github.com/mineralogy-rocks), synced with Mindat's data mentioned above in September 2022. The code for syncing the databases is developed using the best practices of clean code development and is accessible under the MIT license in the repository https://github.com/mineralogy-rocks/db-sync.
All web resources cited above, other sources used in insignificant amounts in this research, and online and hard-copy publications used to develop the core analytics of the research were accessed under ‘fair use’ conditions governed by The Copyright Act of 1976 – a United States copyright law (17 U.S.C. § 107) and what is known as ‘fair dealing’ in other countries (Canada, Australia, UK, EU and its Member States). All the data in this research resides in the public domain and is freely accessible through web interfaces.
The database for the classification is developed using PostgreSQL 13, and public access to it through API (Application Programming Interface) is currently a work in progress (Gavryliv et al., Reference Gavryliv, Ponomar, Bermanec and Putiš2022a). A database schema with tables storing the classification data and relations is provided in Fig. 1. The master table is called ‘mineral_log’, and it is referenced in all other one-to-one and many-to-many tables with several mineral attributes. As a rule, the database administrators prefer to store a unique serial key identifier of each entry (species) and a hash for it in one main table and use it for retrieving the referenced or related entries. The latter increases the security, consistency and integrity of data models and the performance of the database.
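The layout described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL 13. Only the table name ‘mineral_log’ comes from the text; the tables ‘status_list’ and ‘mineral_status’, all column names, and the choice of SHA-256 for the hash are assumptions made for illustration and are not taken from Fig. 1.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE mineral_log (            -- master table named in the text
    id   INTEGER PRIMARY KEY,         -- unique serial key of the entry
    hash TEXT NOT NULL UNIQUE,        -- assumed: SHA-256 of the name
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE status_list (            -- hypothetical lookup of status codes
    id   INTEGER PRIMARY KEY,
    code TEXT NOT NULL UNIQUE
);
CREATE TABLE mineral_status (         -- hypothetical M:N link table
    mineral_id INTEGER NOT NULL REFERENCES mineral_log(id),
    status_id  INTEGER NOT NULL REFERENCES status_list(id),
    PRIMARY KEY (mineral_id, status_id)
);
""")


def add_mineral(name):
    """Insert an entry into the master table and return its serial key."""
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    cur = conn.execute(
        "INSERT INTO mineral_log (hash, name) VALUES (?, ?)", (digest, name)
    )
    return cur.lastrowid


def add_status(code):
    """Insert a status code and return its key."""
    cur = conn.execute("INSERT INTO status_list (code) VALUES (?)", (code,))
    return cur.lastrowid


def statuses_of(name):
    """Return all status codes linked to one name, sorted alphabetically."""
    rows = conn.execute(
        """SELECT s.code
           FROM mineral_log AS m
           JOIN mineral_status AS ms ON ms.mineral_id = m.id
           JOIN status_list AS s ON s.id = ms.status_id
           WHERE m.name = ?
           ORDER BY s.code""",
        (name,),
    )
    return [code for (code,) in rows]


def names_with(code):
    """Return all names that carry a given status code, sorted by name."""
    rows = conn.execute(
        """SELECT m.name
           FROM status_list AS s
           JOIN mineral_status AS ms ON ms.status_id = s.id
           JOIN mineral_log AS m ON m.id = ms.mineral_id
           WHERE s.code = ?
           ORDER BY m.name""",
        (code,),
    )
    return [name for (name,) in rows]


# Example rows: chemical-name synonyms that the text also treats as general terms.
chm = add_status("syn:chm")
gnr = add_status("syn:gnr")
for mineral_name, status_ids in [
    ("lead oxide", [chm, gnr]),
    ("iron oxide", [chm, gnr]),
    ("calcium nitrate", [chm]),
]:
    mineral_id = add_mineral(mineral_name)
    conn.executemany(
        "INSERT INTO mineral_status (mineral_id, status_id) VALUES (?, ?)",
        [(mineral_id, status_id) for status_id in status_ids],
    )
```

In this sketch, a name can carry several statuses and a status can apply to many names, which is the M:N cardinality the scheme relies on; every lookup starts from the key stored in the master table.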
Classification scheme
This classification scheme is subdivided into two hierarchical levels: a major category and, within it, a more specific type. For instance, several synonym types are defined in this research, but all belong to one collective category – the ‘synonym category’ with the shorthand ‘syn’.
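The two-level shorthand used throughout this paper (for example ‘syn’, ‘syn:chm’ or ‘gr:spr’) can be handled with a small helper such as the one sketched below. The class name `StatusCode` and its methods are hypothetical and serve only to show how a code splits into a category and an optional subcategory; they are not part of the database described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StatusCode:
    """A one- or two-level classification code, e.g. 'syn' or 'syn:chm'."""

    category: str
    subcategory: Optional[str] = None

    @classmethod
    def parse(cls, code: str) -> "StatusCode":
        # Split on the colon that separates the two hierarchical levels.
        parts = code.strip().lower().split(":")
        if not 1 <= len(parts) <= 2 or not all(parts):
            raise ValueError(f"not a one- or two-level code: {code!r}")
        return cls(*parts)

    def belongs_to(self, category: str) -> bool:
        """True when the code falls under the given major category."""
        return self.category == category

    def __str__(self) -> str:
        if self.subcategory is None:
            return self.category
        return f"{self.category}:{self.subcategory}"


# Codes quoted in the text.
codes = [StatusCode.parse(c) for c in ("syn:chm", "syn:cmr", "gr:spr", "ima")]
synonyms = [str(c) for c in codes if c.belongs_to("syn")]
```

Keeping the major category as a separate field means that all synonym types can be selected with one comparison, whatever their specific type.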
From a database point of view, a mineral-related name is a term or label used to describe and classify minerals and mineral-related concepts in a structured and organised way. These names are used in mineralogical databases to organise and categorise the information about minerals, making it easier to search, retrieve and analyse. The database software can then use these mineral-related names to create indexes and search criteria, allowing users to find and retrieve the information they need quickly. Some of these names are officially regulated; however, the bulk of them remain unregulated, and no systematics have been applied until now. For instance, the names of mineral varieties, which are not regarded as species, are unregulated and do not come under the jurisdiction of the IMA–CNMNC (Nickel and Grice, Reference Nickel and Grice1998). Moreover, the introduction of new varietal names is discouraged. However, the classification is intended to subordinate these names under a scalable scheme, easily applicable in a relational database.
Grouping category (gr)
The grouping level covers the names of supergroups, groups and subgroups or series, as defined by Mills et al. (Reference Mills, Hatert, Nickel and Ferraris2009). Additionally, the term ‘root name’ is added. Most of these names are regulated by the Subcommittee on Mineral Groups of the IMA–CNMNC under the nomenclature of mineral groups such as amphiboles, micas and pyrochlores. However, there have been no strict boundaries on how some supergroup or group names are approved, and some have been discarded. Another problem is that some of these names were approved just recently; for example, the gadolinite supergroup was approved only in August 2016. Until that time, the use of the name and nomenclature of the supergroup was not clear at all – the gadolinite ‘group’ was referred to as ‘datolite group’ and ‘gadolinite–datolite group’ (Bačík et al., Reference Bačík, Miyawaki, Atencio, Cámara and Fridrichová2017). Therefore, some of the recently widely used non-approved grouping names may eventually be approved by the IMA–CNMNC. The classification proposed covers all of those names, both approved and non-approved, as the consistency of relations and links between names is more critical for relational database maintenance.
Supergroup (gr:spr)
There are 34 supergroup entries defined and available through mindat.org (as of 17th Aug 2022). Typically, a mineral supergroup consists of two or more groups with the same structure and similar chemistry and may also contain isolated mineral species (Mills et al., Reference Mills, Hatert, Nickel and Ferraris2009). Examples are the garnet supergroup (Grew et al., Reference Grew, Locock, Mills, Galuskina, Galuskin and Hålenius2013), apatite supergroup (Pasero et al., Reference Pasero, Kampf, Ferraris, Pekov, Rakovan and White2010), tourmaline supergroup (Henry et al., Reference Henry, Novák, Hawthorne, Ertl, Dutrow, Uher and Pezzotta2011) and alunite supergroup (Jambor, Reference Jambor1999). Feldspathoid is not considered a valid mineral supergroup because of substantial structural differences and is likely to remain unapproved by the IMA–CNMNC. However, even though it has never been approved, this term has become very common in geology. Another problem is that formal rules defined for supergroups are not always followed. For instance, mindat.org and athena.unige.ch provide ‘Gypsum Supergroup’, which is not approved and contains four isolated mineral species without any intermediate hierarchy level – brushite, churchite-(Y), gypsum and pharmacolite. The same applies to the ‘Marokite Supergroup’, which is mentioned in Sharygin et al. (Reference Sharygin, Britvin, Kaminsky, Wirth, Nigmatulina, Yakovlev, Novoselov and Murashko2021) and is also available on mindat.org, athena.unige.ch and mineralienatlas.de. These discrepancies apply even to some IMA-approved supergroups; for example, the nordite supergroup contains only one group member and one isolated mineral (Miyawaki et al., Reference Miyawaki, Hatert, Pasero and Mills2021).
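How a supergroup with intermediate groups and a supergroup with only isolated species can be traversed by the same recursive query is sketched below, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table names `entry` and `hierarchy`, their columns, and the function `descendants` are hypothetical, and the few rows are illustrative only (the garnet group with almandine and pyrope, and two of the four species listed under the non-approved ‘Gypsum Supergroup’).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (                  -- hypothetical table of names
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE,
    status TEXT NOT NULL              -- classification code, e.g. 'gr:spr'
);
CREATE TABLE hierarchy (              -- hypothetical parent/child links
    parent_id INTEGER NOT NULL REFERENCES entry(id),
    child_id  INTEGER NOT NULL REFERENCES entry(id),
    PRIMARY KEY (parent_id, child_id)
);
""")

conn.executemany(
    "INSERT INTO entry (id, name, status) VALUES (?, ?, ?)",
    [
        (1, "garnet supergroup", "gr:spr"),
        (2, "garnet group", "gr:grp"),
        (3, "almandine", "ima"),
        (4, "pyrope", "ima"),
        (5, "gypsum supergroup", "gr:spr"),  # not IMA-approved, see text
        (6, "gypsum", "ima"),
        (7, "brushite", "ima"),
    ],
)
conn.executemany(
    "INSERT INTO hierarchy (parent_id, child_id) VALUES (?, ?)",
    [(1, 2), (2, 3), (2, 4), (5, 6), (5, 7)],
)


def descendants(root_name):
    """Return (name, status, depth) for every entry below root_name."""
    sql = """
    WITH RECURSIVE tree(id, depth) AS (
        SELECT id, 0 FROM entry WHERE name = ?
        UNION ALL
        SELECT h.child_id, tree.depth + 1
        FROM hierarchy AS h
        JOIN tree ON h.parent_id = tree.id
    )
    SELECT e.name, e.status, tree.depth
    FROM tree
    JOIN entry AS e ON e.id = tree.id
    WHERE tree.depth > 0
    ORDER BY tree.depth, e.name
    """
    return conn.execute(sql, (root_name,)).fetchall()
```

With these rows, species under the garnet supergroup are reached at depth 2 through the garnet group, whereas the species under the gypsum supergroup appear directly at depth 1, so a missing intermediate level does not break the traversal.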
Group (gr:grp)
A substantially greater number of group entries are available on mindat.org – 469 names (as of 18th Aug 2022). Some group names have synonyms, for example, feldspar is a synonym of the feldspar group and is known to be the most common mineral group found on Earth (Smith and Brown, Reference Smith and Brown1988). As with supergroups, the majority of groups are approved, though some are only formally defined or are synonyms, e.g.: alum group; hollandite group (a synonym of coronadite group) – a term used in petrological texts to describe hollandite and related species; whitmoreite group (a synonym of arthurite group); the gadolinite–datolite group (a synonym of gadolinite group, which was redefined as a supergroup by the IMA in 2016); and jarosite group (a synonym of alunite group).
Subgroup (gr:sbg)
More than a hundred subgroup names are available on mindat.org (as of the 18th of August, 2022). Most of these are official names and clearly defined. These can also have synonyms or alternative names, e.g.: thomsonite is a synonym of the thomsonite subgroup; allanite is a synonym of the allanite group; and the biotite subgroup is a synonym of trioctahedral mica, where there are three octahedrally coordinated D cations per formula unit. An exceptional case for subgroup names is defined in the tourmaline nomenclature (Henry et al., Reference Henry, Novák, Hawthorne, Ertl, Dutrow, Uher and Pezzotta2011). According to it, subgroups are defined within groups using subgroup names combined with ordinal numbers where subgroup 1 is the fundamental one, e.g. alkali-subgroup 1, alkali-subgroup 2.
Root (gr:rt)
Although not officially covered by the IMA–CNMNC, there are 31 root names provided by mindat.org, most of which were proposed as intermediary names when forming the name of species in official nomenclatures. The root names are assigned to distinct arrangements of formal charges at the ion sites. The appropriate prefix modifiers (e.g. chloro-, ferri- or fluoro-) are assigned to describe homovalent variation in the dominant ion of the root composition (Henry et al., Reference Henry, Novák, Hawthorne, Ertl, Dutrow, Uher and Pezzotta2011; Hawthorne et al., Reference Hawthorne, Oberti, Harlow, Maresch, Martin, Schumacher and Welch2012; Oberti et al., Reference Oberti, Cannillo and Toscani2012). Examples are actinolite root, tremolite root and edenite root. These names have a core meaning when assigning a full species name. The root names designate another hierarchical grouping level from the database relationality point of view and consistency of the scheme proposed. Additionally, the term ‘root’ is widely used by mineral collectors in a slightly different sense and is likely to appear in every mineral data storage.
Series (gr:srs)
The names of solid-solution series are regulated by the IMA–CNMNC (Nickel, Reference Nickel1992). Around 400 series names are available on mindat.org. Some of these names have alternative synonyms with a reversed order of their end members, e.g.: the forsterite–fayalite series is a synonym of the fayalite–forsterite series, but only the forsterite–fayalite series is approved; and the dolomite–ankerite series is a synonym of the ankerite–dolomite series. Some unusual cases are discarded series names. For example, the biotite–phlogopite series is out of use, and now biotite is defined as a series name and invalid species (Rieder et al., Reference Rieder, Cavazzini, D'yakonov, Frank-Kamenetskii, Gottardi, Guggenheim, Koval, Mueller, Neiva and Radoslovich1998). Therefore, the biotite–phlogopite series is a synonym for biotite. Likewise, the bobdownsite–whitlockite series is now obsolete as ‘bobdownsite’ was determined to be identical to whitlockite.
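Series names that differ only in the order of their end members can be detected with an order-independent key, as in the sketch below. The function names `series_key` and `same_series` are hypothetical, and the sketch assumes that end members are joined by an en dash and that the name ends with the word ‘series’, as in the examples above.

```python
import re

EN_DASH = "\u2013"  # separator between end members in the names quoted above


def series_key(name: str) -> frozenset:
    """Return an order-independent key for a solid-solution series name."""
    base = re.sub(r"\s+series$", "", name.strip().lower())
    # Only the en dash is treated as a separator: a plain hyphen can occur
    # inside a species name, e.g. churchite-(Y).
    return frozenset(part.strip() for part in base.split(EN_DASH))


def same_series(a: str, b: str) -> bool:
    """True when two series names list the same end members in any order."""
    return series_key(a) == series_key(b)


pairs = [
    ("forsterite\u2013fayalite series", "fayalite\u2013forsterite series"),
    ("dolomite\u2013ankerite series", "ankerite\u2013dolomite series"),
]
reversed_synonyms = [same_series(a, b) for a, b in pairs]
```

Such a key only flags candidate synonyms; which of the two spellings is the approved one still has to be recorded explicitly.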
Officially-regulated category
These are the officially approved and adopted names of minerals or rocks that are regulated by international committees such as the IMA–CNMNC, or that fall under the recommendations of the International Union of Geological Sciences (IUGS) subcommissions.
IMA-approved minerals (ima)
This category comprises the minerals approved as valid species by the IMA–CNMNC according to procedures and guidelines on the criteria for a new mineral species (Nickel and Grice, Reference Nickel and Grice1998). As a rule, new reports are published monthly on the main website of the IMA–CNMNC, http://cnmnc.units.it/. As of 30th Sep 2022, there are 5834 IMA-approved species listed by mindat.org, which exceeded the official website (then at cnmnc.main.jp) by five species. The latter frequently happens as mindat.org attracts the attention of mineral collectors, enthusiasts, and researchers, who contribute to improving the data consistency of the resource. Therefore, it is common for yet-to-be-approved minerals to appear faster on mindat.org than it is for official IMA reports to be published at http://cnmnc.units.it/.
Rocks (rck)
Rock names often appear in mineralogical databases such as Mindat or even as misleading mineral names in collections and auctions. For example, many different rocks and minerals have been marketed as jade, especially nephrite and serpentine, green quartz and vesuvianite (californite). Unlike IMA-approved systematics, there is no unique worldwide resource for approved rock names except the recommendations and systematics proposed by the IUGS (Cross et al., Reference Cross, Iddings, Pirsson and Washington1902; Streckeisen, Reference Streckeisen1980; Le Maitre, Reference Le Maitre1984; Le Bas and Streckeisen, Reference Le Bas and Streckeisen1991; Le Maitre et al., Reference Le Maitre, Streckeisen, Zanettin, Le Bas, Bonin and Bateman2005; Oliveira et al., Reference Oliveira, Brod, Junqueira-Brod, Reimold and Fuck2022), comments, proposals, glossaries and classifications published by individual research groups. Note that the IUGS Subcommission on the Systematics of Igneous Rocks recommends avoiding the terms ‘diabase’ and ‘dolerite’ and advises using ‘microgabbro’ instead. However, all such names fall under the ‘rocks’ category within the scope of the classification proposed as long as no solid approval/discreditation mechanism is proposed. As of 30th Sep 2022, 3052 rock names were provided by mindat.org.
Anthropotype category
According to Hazen et al. (Reference Hazen, Grew, Origlieri and Downs2017), more than 200 IMA-approved species occur principally or exclusively as a consequence of three human processes: (1) the manufacturing of synthetic mineral compounds, (2) the movements of rocks and sediments as a consequence of mining operations, and (3) redistributing select natural minerals. In the current research, this level designates human-mediated phases (anthropogenic) as defined by Hazen et al. (Reference Hazen, Grew, Origlieri and Downs2017) and synthetic phases.
Anthropogenic phases (ant)
These are phases of anthropogenic origin, typically non-approved by the IMA because current regulations do not allow such substances to be approved as valid mineral species. They include alteration phases recovered from ore dumps or associated with mine tunnel walls or dump fires, and minerals found in slag or in the walls of smelters. Examples are arnhemite, (K,Na)4Mg2(P2O7)2⋅5H2O, a hydrated pyrophosphate that originated from the hydration of slag formed by bat guano combustion in Arnhem Cave, Namibia, about 2000 years ago (Martini, Reference Martini1994); and igumnovite, Ca3Al2[SiO4]2[□Cl4], a substance that originated from a burning coal-mine dump (Chesnokov et al., Reference Chesnokov, Kotrly and Nisanbajev1998). Many of the IMA-approved minerals can also be formed by anthropogenic processes. However, they do not receive a separate mineral name and can be distinguished by the description of the location. It is worth noting that some anthropogenic minerals can also be found in natural environments, for example where industrial slags and mine tailings have been transported by rivers or other natural processes. In these cases, the location of the mineral may not be a definitive indicator of whether it is anthropogenic or natural.
Synthetic phases (ant:snt)
The IMA–CNMNC does not consider the synthetic products of human intervention, industry and commercial activities. Because these phases are not regarded as minerals, the unmodified mineral names should not generally be used for synthetic substances corresponding to existing minerals (Nickel and Grice, Reference Nickel and Grice1998). Most synthetic phases are manufactured for specific applications, e.g. durable metal alloys, abrasives, or laser crystals. Unlike anthropogenic phases, synthetic products are produced intentionally for their valuable properties. Examples include yttrium aluminium garnet (YAG), Y3Al5O12, and gadolinium gallium garnet (GGG), Gd3Ga5O12, both used as synthetic gemstones and diamond simulants. However, although both an enhanced or treated variety and a synthetic phase are produced artificially and on purpose, there is a firm boundary between these statuses. Unlike ‘treatment’, a synthesis implies the production of an entirely new phase from scratch, typically in a laboratory, whereas treatment implies the enhancement of already existing material.
Unnamed category (unm)
The Subcommittee for Unnamed Minerals of the IMA–CNMNC, founded in 2007 by Dorian G.W. Smith and Ernest H. Nickel, regulates the assignment of numbers to future unnamed minerals and makes recommendations to the CNMNC regarding the status of unnamed minerals (De Fourestier, Reference De Fourestier2014). Accordingly, there is only one approved systematic method for codifying unnamed minerals (Smith and Nickel, Reference Smith and Nickel2007). However, many unnamed species do not follow this approach; therefore, the category is divided into two subcategories.
Unnamed Mineral (by Smith and Nickel, Reference Smith and Nickel2007) (unm:cod)
According to Mindat's database, more than 1600 unnamed species follow the adopted codification system (as of 17th Aug 2022). All of these names start with ‘UM’, followed by two groups of numerals, representing the year and a serial number. However, species may also be assigned a code that begins with ‘UKI’ (standing for ‘unknown’ and ‘interim’) for interim coding when pending approval of the final coding, for example, UKI-2006-(PO:AlCuFeH), Fe2+Al3+2(PO4)2(OH)2⋅4H2O (Sejkora et al., Reference Sejkora, Skoda and Ondrus2006).
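The codification described above lends itself to automated parsing. The sketch below recognises only the two patterns quoted in this paper (codes of the form UM1990-49-Sb:Ni and UKI-2006-(PO:AlCuFeH)); the names `UnnamedCode`, `parse_unnamed` and `unnamed_status` are hypothetical, and other variants of the codes that may exist are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional


class UnnamedCode(NamedTuple):
    prefix: str            # 'UM' or 'UKI'
    year: int
    serial: Optional[int]  # interim 'UKI' codes carry no serial number here
    chemistry: str         # trailing chemical descriptor, kept verbatim


# 'UM' + four-digit year + '-' + serial number + '-' + chemical descriptor
_UM = re.compile(r"^UM(\d{4})-(\d+)-(.+)$")
# 'UKI-' + four-digit year + '-(' + chemical descriptor + ')'
_UKI = re.compile(r"^UKI-(\d{4})-\((.+)\)$")


def parse_unnamed(code: str) -> Optional[UnnamedCode]:
    """Parse a codified unnamed-mineral name, or return None if it does not fit."""
    code = code.strip()
    match = _UM.match(code)
    if match:
        return UnnamedCode("UM", int(match.group(1)), int(match.group(2)), match.group(3))
    match = _UKI.match(code)
    if match:
        return UnnamedCode("UKI", int(match.group(1)), None, match.group(2))
    return None


def unnamed_status(name: str) -> str:
    """Assign 'unm:cod' to codified names and the plain 'unm' code otherwise."""
    return "unm:cod" if parse_unnamed(name) is not None else "unm"
```

A name that fails to parse is not rejected; it simply falls back to the general unnamed category.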
Unnamed Mineral (unm)
More than 800 unnamed species do not follow the recognised codification system, according to Mindat's data. These names start with ‘Unnamed’ and contain a short description in parentheses. For instance, Unnamed (Ni Antimonide), Ni3Sb, is probably identical with UM1990-49-Sb:Ni (Tredoux et al., Reference Tredoux, Zaccarini, Garuti and Miller2016). In exceptional cases, a custom codification system is devised for private purposes, such as making the codes indexable in private management systems or collections. For example, there are 56 species following the custom coding system, starting with ‘MSH UK’ letters followed by ordinal numbers. The designation ‘UK’ (unknown) has been used for partially studied or unnamed minerals from Mont-Saint-Hilaire (MSH), Quebec, Canada (Larsen, Reference Larsen2020).
Questionable and non-approved species (qst)
This category comprises the mineral names that represent distinct mineral matter but were not approved by the IMA because of rejection or discreditation, or that were never submitted to the IMA. Common reasons are a questionable or unknown origin of the mineral, or inadequately studied material. For example, agardite-(Dy), (Dy,La)Cu6(AsO4)3(OH)6⋅3H2O, is an incompletely characterised and highly questionable member of the mixite group, and allophite is a doubtful magnesium aluminosilicate that was probably never submitted to the IMA.
In many cases, the reasons for the discreditation and rejection of the minerals from this group are uncertain and unavailable through public access. For example, almbosite, Fe2+5Fe3+4V5+4Si3O27, is a discredited iron silicate mineral (Hey, Reference Hey1982). An interesting example is antitaenite, Fe3Ni, a meteoritic mineral rejected by the IMA as a variety of taenite (Wojnarowska et al., Reference Wojnarowska, Dziel, Gałązka-Friedman and Karwowski2008). However, later research showed that they differ in their electronic structures: taenite has a high magnetic moment, whereas antitaenite has a low magnetic moment (Lagarec et al., Reference Lagarec, Rancourt, Bose, Sanyal and Dunlap2001). The minerals that are probably identical to other IMA-approved species, such as apatelite and argyropyrite, also fall in this category when their identity has yet to be confirmed experimentally.
Synonyms category (syn)
The alternative names of approved rock names or mineral species or their varieties are collectively called ‘synonyms’. Some of these names are used more commonly than approved ones; some are old, discarded, discredited, or created for trade purposes only. None of the names classified as synonyms in this scheme are valid, approved, or regulated. Nearly 6000 random synonym names from mindat.org were carefully studied to recognise the general types of synonyms. As a result, at least 11 distinct categories of synonyms were discovered.
Chemical name synonym (syn:chm)
These are the names where a complete chemical name for a mineral formula is used instead of a valid mineral name. Note that there could be two subdivisions of this category: an exact match of the chemical name and an inexact one. For example, calcium nitrate is an exact synonym for nitrocalcite, as calcium oxide is for lime. However, iron oxide is an inexact synonym for magnetite, hematite and maghemite. Another similar case is lead oxide, a synonym of massicot, PbO, and minium, Pb3O4 – a different approved species. These examples can also be classified as general term synonyms (syn:gnr).
Commercial synonym (syn:cmr)
These are the commercial, commodity, or trade names of valid minerals or rocks used in the construction, natural stone and jewellery industries for trade purposes. The trade practices for applying descriptions to most of these materials are regulated by The Commissions for Diamonds, Gemstones, Pearls, and Precious Metals (CIBJO, The World Jewellery Confederation). The CIBJO series of blue books define grading standards and nomenclature for diamonds, coloured gemstones, pearls and precious metals. Typical examples of commercial names for minerals and their varieties are golden beryl (heliodor, Be3Al2[Si6O18]), pink beryl (morganite, Be3Al2[Si6O18]) and Adelaide Ruby (almandine, Fe2+3Al2[SiO4]3).
It should be noted that some commercial names fall under other synonyms or variety categories. For example, Adelaide Ruby and Alabandine Ruby are common commercial names for almandine from different regions. However, within the scheme's scope, they are also considered misleading terms and should be classified accordingly (see the ‘Misleading synonym’ subsection). Damsonite is a trade name for a light violet to dark purple variety of chalcedony from Arizona, thus falling into two categories – commercial synonym and variety.
Common synonym (syn:cmn)
Common names are synonyms currently widely used in communication and mineralogical publications. Typically, these names are more commonly used than the official ones. For example, morion is a common synonym of smoky quartz, more frequently used to designate nearly black smoky quartz. Another example is sphene – a common alternative name for titanite, CaTi(SiO4)O; wolfram for wolframite, (Fe2+)WO4 to (Mn2+)WO4. The category could be applied to more general terms – placer gold is commonly used instead of alluvial gold. The common synonym does not change a general meaning when applied to other synonyms – cherry quartz is used instead of strawberry quartz, and flower stone instead of chrysanthemum stone (an ornamental stone of variable composition, typically aragonite).
Obsolete synonym (syn:obs)
A substantial number of obsolete names are listed in “Obsolete Mineral Names” by Bayliss (Reference Bayliss2000), which is a complement to “Glossary of Mineral Species” by Fleischer (Reference Fleischer1995). All non-approved species, their varieties, synonyms, and mixtures are considered obsolete in Bayliss's work. In contrast, in the current work, obsolete names are those without a clear mineralogical context, or names deprecated owing to the lack of historical or other data about their relevance. For example, dekalbite is a synonym of diopside, but its origin and history are difficult to establish, so this mineral name has little mineralogical meaning.
This category also includes identical names accidentally described as new species but later recognised as identical to already described minerals. For instance, until 1871, acmite (Ström, Reference Ström1821) was considered a separate species from aegirine, NaFe3+Si2O6, one belonging to the amphiboles and the other to the pyroxenes. However, it was eventually shown that both minerals belong to pyroxenes and are identical. Whereas aegirine is a preferable name now, and acmite is regarded as a synonym, this name is still widely used in publications because acmite had priority for several years (Dana, Reference Dana1868). Moreover, it was a common practice in experimental petrology for decades to use the abbreviation Ac for NaFe3+Si2O6 (Fabries et al., Reference Fabries, Ferguson, Ginzburg, Ross, Seifert, Zussman, Aoki and Gottardi1988). The same applies to dakeite (Larsen Jr and Gonyer, Reference Larsen and Gonyer1937), which is now considered identical to schröckingerite, NaCa3(UO2)(CO3)3(SO4)F⋅10H2O (Schrauf, Reference Schrauf1873), or droogmansite, which was found to be identical to kasolite, Pb(UO2)[SiO4]⋅H2O (Deliens, Reference Deliens1978).
The old or renamed synonyms designate historical and old names that were subsequently replaced by approved or more appropriate names. For instance, abukumalite, (Y,Ca)5(SiO4)3OH, was first described by Hata (Reference Hata1938) from the Suishoyama pegmatite, Fukushima Prefecture, Japan. Later, it was renamed to britholite-(Y) due to its relation to britholite-(Ce) and the dominance of Y in the composition as part of the changes in nomenclature for rare-earth minerals (Levinson, Reference Levinson1966). Another example is bergmannite, Na2Al2Si3O10⋅2H2O, named by Schumacher (Reference Schumacher1801) and later renamed to spreustein by Werner (Reference Werner1817). Note that there is no strict boundary between obsolete and old names; therefore, they are combined into one category. Additionally, this category contains names typically used by miners, e.g. chalybite for siderite and fluorspar for fluorite. Fluorspar is an old traditional British name for fluorite. At the same time, it is also a commercial synonym for fluorite, introduced in 1530 by Agricola (Morello, Reference Morello1994). Therefore, fluorspar can be assigned two statuses in a database – syn:cmr and syn:obs.
General term synonym (syn:gnr)
The general names are widely-used synonyms for groups of minerals, individual species, or rocks. For example, garnierite is a generic name for a green nickel ore that has formed due to lateritic weathering of ultramafic rocks and thus can be considered a synonymic name for several nickel silicates, such as népouite, pimelite and willemseite. This category can also be applied to synonyms of the grouping level names. Olivine is typically used for the forsterite–fayalite series, plagioclase for the albite–anorthite series, and biotite for the K-rich subgroup of the trioctahedral mica group. The latter is usually appropriate for geological and related disciplines where the exact name of a group or solid-solution member is not as crucial in the context.
Language synonym (syn:lng)
Language synonyms are names originating from other languages, typically Old English, Old German, French, Italian, and Russian. Generally, these names mean exactly or almost the same in a given language. For example, moor's head tourmaline, A(D3)G6(T6O18)(BO3)3X3Z, is a variety of tourmaline with a colourless to pale greenish body and a dark brown to black cap (Wilk and Medenbach, Reference Wilk and Medenbach1986). At the same time, the name is just an English translation of the German mohrenkopfturmalin (Rinne, Reference Rinne1924). Other examples are braunbleierz (an Old German name, translating to “brown lead ore”) as a synonym of pyromorphite, Pb5(PO4)3Cl, and malaquita (Spanish) for malachite, Cu2(CO3)(OH)2.
Regional name (syn:rgn)
These are the names linked to a country, region, or a specific locality of species. As a rule, these are not the type localities of species but rather the rare localities where these species occur within the specific region. A typical example is josephinite as a synonym of awaruite, Ni3Fe, sampled from Josephine County, Oregon, USA; Malaia garnet is a synonym of umbalite, Mg3Al2(SiO4)3, from Beseva Malaya garnets mining area, Madagascar. Also, many regional synonyms can be applied to tektite, a natural glass formed from a meteorite impact melting the local rock. For example, bikolite (Bikol area of the Philippines), billitonite (Billiton Island, Indonesia), bediasite (Chesapeake Bay impact crater, Texas, USA) and zhamanshinite (Zhamanshin meteor crater, Kazakhstan) – are all considered regional synonyms for a tektite. A country name is also applied to these synonyms in exceptional cases to highlight their provenance: rumanite for opal or amber from Romania or chinite for a tektite from China.
Spelling synonym (syn:spl)
Several erroneous, alternative and spelling variations of names are present in the literature, and all point to the same species. These include the capitalisation of the first letter, a hyphen to distinguish a prefix from the root name, the presence or absence of an apostrophe, etc. The most widespread are those missing diacritic marks, e.g. achavalite for achávalite, (Fe,Cu)Se, or felsobanyite for felsőbányaite, Al4(SO4)(OH)10⋅4H2O. The remainder are names with different suffixes or endings, e.g. heliodore for heliodor, Be3Al2(Si6O18); cobaltian calcite and cobalt calcite – both considered spelling synonyms of cobalt-bearing calcite, (Ca,Co)CO3; and chromium dravite and chromian dravite – both synonyms of chromium-bearing dravite, Na([Mg,Cr]3)(Al,Cr)6(Si6O18)(BO3)3(OH)3(OH). Note that spelling synonyms should not be confused with language synonyms: bastnasite is a spelling synonym of bastnäsite, but bastnaesit and bastnäsita are language synonyms owing to their German and Spanish origins. However, the latter does not apply to British and American spellings, which are also considered spelling synonyms, e.g. beta-sulphur (British English) and beta-sulfur (American English); recently, beta-sulfur was renamed to clinosulfur (Miyawaki et al., Reference Miyawaki, Hatert, Pasero and Mills2022; Hatert et al., Reference Hatert, Mills, Pasero, Miyawaki and Bosi2023). The spelling and language synonyms are closely related categories and are therefore hard to distinguish in some cases. On a database level, each mineral status entry has a corresponding Boolean field that records the certainty of the classification. Therefore, the data system can identify which entries are doubtful or need further system-manager approval or revision.
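The Boolean certainty field described above, together with the possibility of one name carrying several statuses, can be illustrated with a minimal Python sketch. The class name StatusEntry, its field names and the helper functions are hypothetical and are not taken from the mineralogy.rocks schema; the certainty values are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusEntry:
    """One status assignment for a name (hypothetical structure)."""
    name: str      # the name being classified
    status: str    # classification code used in this scheme, e.g. "syn:spl"
    target: str    # the name that the status refers to
    certain: bool  # False marks the classification as doubtful


# Example rows built from names discussed in the text.
# The certainty flags are illustrative assumptions, not published data.
entries = [
    StatusEntry("bastnasite", "syn:spl", "bastnäsite", True),
    StatusEntry("bastnaesit", "syn:lng", "bastnäsite", False),
    StatusEntry("fluorspar", "syn:cmr", "fluorite", True),
    StatusEntry("fluorspar", "syn:obs", "fluorite", True),
]


def needs_review(rows):
    """Return the entries whose classification is flagged as uncertain."""
    return [e for e in rows if not e.certain]


def statuses_of(rows, name):
    """Return all status codes assigned to a given name, sorted."""
    return sorted(e.status for e in rows if e.name == name)


print([e.name for e in needs_review(entries)])
print(statuses_of(entries, "fluorspar"))
```

Keeping the flag on the status entry rather than on the name means that a name with two statuses can be certain in one classification and doubtful in the other.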
Misleading name (syn:msl)
The misleading or false names are the trade gemstone, decorative, dimension, or ornamental stone names pointing to completely different species or rocks to advertise or artificially increase the stone's price. For instance, very often, the trade names of rubies do not reflect their actual composition (Lytvynov, Reference Lytvynov2011): American ruby is pyrope, garnet, or rose quartz; Siberian ruby is a red tourmaline; and balas ruby is a spinel. More than 50 false trade gemstone names are provided by Schumann (Reference Schumann2002), including Madeira topaz for citrine, matura diamond for colourless zircon, and Ural sapphire for blue tourmaline. Another part of misleading names comes from the decorative and dimension stone market. A general principle for naming stones in the construction industry is based on their hardness and capability of taking a polish. Therefore, all hard stones are called granites, and the soft ones are marbles, independently of their colour or origins, e.g.: black pearl granite is preferred instead of gabbro; black galaxy granite for norite; Ashford black marble for limestone; Purbeck marble for fossiliferous limestone; and St. Genevieve marble for oolitic limestone.
Polytype (plt)
The definition of polytypes and general recommendations, with examples of the application of the modified Gard nomenclature, are given in Bailey (Reference Bailey1977). Nearly a hundred polytypes are available in the mindat.org database. Several polytypes of sapphirine are known: -2M and -1A are the most common ones. The others, e.g. -3A, -4M and -5A, are found as domains ranging from <100 Å to several thousand Å thick (Christy and Putnis, Reference Christy and Putnis1988). Other examples include the numerous polytypes of högbomite, taaffeite and nigerite (McKie, Reference McKie1963; Hudson et al., Reference Hudson, Wilson and Threadgold1967; Armbruster, Reference Armbruster2002).
Variety level (var)
Numerous variety types could be distinguished based on crystal habit, structure, physical properties, optical properties and variations in composition (Rogers, Reference Rogers1913). Based on literature data and our observations, a subdivision of varieties is provided to link valid species with their varieties through a specific physical or chemical pattern on a database level.
Chemical variety (var:chm)
The definition of a chemical variety is given by Bulakh (Reference Bulakh2008). These are the varieties with isomorphic substitution of some chemical component in a valid species. In most cases, these names combine the ion or element name and a valid species name, e.g. Al-lizardite is an aluminium-rich variety of lizardite, Mg3(Si2O5)(OH)4 (Bentabol et al., Reference Bentabol, Cruz and Sobrados2010); cobalt-bearing calcite is a variety of calcite, CaCO3, with Co2+ replacing Ca. In other cases, the name completely deviates from the valid species name. For example, hallerite is a lithium-bearing mica (Schaller and Stevens, Reference Schaller and Stevens1941) and ishkulite is a Cr3+ variety of magnetite (Burns and Burns, Reference Burns, Burns and Irvine1976).
Composition variety (var:cmp)
Unlike chemical varieties, these deviate from the valid species due to containing inclusions or intergrowths with other minerals. Examples are aventurine – a variety of quartz containing fragments of mica or hematite that can be polished as a gemstone (Monroe, Reference Monroe1986); chiastolite – a variety of andalusite with cross-shaped inclusions of carbon (Mason et al., Reference Mason, Burton, Yuan and She2010).
Physical variety (var:phs)
Compared to their valid parental species, these varieties differ in physical properties, commonly colour, habit, fracture and lustre. It should be noted that a different colour is often attributed to chemical impurities, as in alexandrite, a green chromian variety of chrysoberyl, BeAl2O4, or bredbergite, (Ca,Mg)3Fe2(SiO4)3, a green magnesian variety of andradite. Therefore, these varieties are assigned two statuses in a database – chemical and physical, because the impurity leads to a change of physical property. In other cases, the colour change may be attributed to treatment, as in burnt amethyst – the heating results in a yellow–orange or brownish colour (Neumann and Schmetzer, Reference Neumann and Schmetzer1984). Hence it is a physical and treated (see below) variety. An example of pure physical variety is delawarite, K(AlSi3O8), a variety of orthoclase with a pearly lustre. When the colour is attributed to inclusions, a species is assigned both composition and physical variety status, for example, cymophane, BeAl2O4, is an opalescent variety of chrysoberyl with bluish chatoyancy which is caused by tube-like cavities or needle-like inclusions of rutile.
Origin and regional variety (var:org)
These varieties are distinguished by their specific environments or types of formations, commonly as pseudomorphs and products of alteration. Examples are bone opal, SiO2⋅nH2O, an opal replacing fossil bone (Jones and Segnit, Reference Jones and Segnit1971), calcium-gümbelite – Ca-bearing hydromuscovite pseudomorphs after plagioclase and cliftonite – a graphite pseudomorph after kamacite (Brett and Higgins, Reference Brett and Higgins1967). Other examples include houghite, Mg6Al2(OH)16[CO3]⋅4H2O, a variety of hydrotalcite derived from the alteration of spinel (Johnson, Reference Johnson1851); and brecciated agate – a naturally cemented matrix of broken agate fragments. This group also includes varieties from those localities where the sample acquires unique properties, often physical. For example, a Teis sphere is datolite with a geode texture from Tiso (Teis) in Italy. Notably, the origin varieties are closely related to the mineral kinds defined in the new evolutionary system under different paragenetic modes (Hazen et al., Reference Hazen, Morrison, Krivovichev and Downs2022). Accordingly, stellar diamond, mantle diamond and impact diamond are origin varieties. Additionally, this subcategory includes classical names of varieties from a particular locality or region that are not necessarily unique in the environment of form worldwide. However, the name includes meaning about a specific type or form of the mineral. The examples include blue john, CaF2, a variety of fluorite from Blue John Mine, Castleton, UK and ‘Herkimer-style’ quartz from Herkimer region, New York, US.
Structural variety (var:str)
These varieties differ from their valid parent species in structural properties other than polytypism, and have not been approved by the IMA. The nomenclature of polytypes, polytypoids, and polymorphs is provided by Guinier et al. (Reference Guinier, Bokij, Boll-Dornberger, Cowley, Ďurovič, Jagodzinski, Krishna, De Wolff, Zvyagin and Cox1984), Angel (Reference Angel1986), Nickel (Reference Nickel1993), and Nickel and Grice (Reference Nickel and Grice1998). Examples are polywurtzite, (Zn,Fe)S, a hexagonal polymorph of wurtzite; kolloid-magnesite – a colloidal variety of magnesite; and geltenorite, CuO, a gel form of tenorite.
Enhanced or treated variety (var:enh)
Sometimes, a material may be subjected to special treatment to increase its commercial value or for other gemological purposes. These include bleaching, surface coating, dyeing, fracture or cavity filling, heat treatment and high pressure, high temperature (HPHT) treatment (Nassau, Reference Nassau1984; McClure et al., Reference McClure, Kane and Sturman2010). Examples of such treated varieties are titanium quartz, a quartz crystal coated with titanium to give a metallic blue colour; aqua aura, quartz coated with an ultra-thin gold layer to produce an iridescent sheen; and London blue topaz, sky blue topaz, and Swiss blue topaz, which are all irradiated topazes with a distinguishing light-blue colour (Zhang et al., Reference Zhang, Lu, Wang and Chen2011). In most cases, an enhanced variety will appear together with a physical variety status in a database entry, as treatment invariably changes some physical properties, typically colour.
Uncertain variety (var:unc)
In exceptional cases, the type of variety is unknown or unclear due to a lack of information, empirical data, or controversies in the description. For instance, keramite (of Hunt) is probably impure kaolinite or possibly dehydrated halloysite (De Fourestier and Ivanyuk, Reference De Fourestier and Ivanyuk1999); bucaramangite is a fossil resin and, possibly, a variety of retinite – a large group of resins containing no succinic acid (Vavra, Reference Vavra, Höck and Koller1993); helenite (of Nawratil) is paraffin wax and a variety of ozocerite – a naturally-occurring odoriferous mineral wax or paraffin.
Mineraloid category (mnd)
A mineraloid is a naturally occurring mineral-like phase that is not (or is only partly) crystalline (Hatert et al., Reference Hatert, Mills, Hawthorne and Rumsey2021). As these substances are non-crystalline, they do not meet the standard requirements for mineral species as defined by Nickel and Grice (Reference Nickel and Grice1998). However, due to specific purposes of the current classification, the status ‘mineraloid’ is subdivided into organic (biotic) and inorganic (abiotic), according to origins.
Organic mineraloid (mnd:org)
These are the mineraloids of organic, or as defined by Fairbridge (Reference Fairbridge1972), biotic origins. A typical example is a pearl – an organic mineraloid formed within the soft living tissue of a shelled mollusc. It is typically composed of aragonite or a mixture of aragonite and calcite, sometimes with vaterite. Other examples are amber, a fossil tree resin, and coral, the hard mineralised skeleton of marine animals, composed primarily of calcium carbonate.
Inorganic mineraloid (mnd:inr)
These mineraloids are of inorganic, i.e. abiotic, origins. Examples include various supergene materials such as opal and limonite, both crystallised from gels or colloids in the shallow subsurface.
Mixture (mix)
Many mineral mixtures were regarded as distinct mineral species until, eventually, it was proven that they were a mixture of two or more phases. In most cases, these are rejected and discouraged after thorough analytical studies. An example is achrematite, first regarded as a new molybdo-arsenate of lead (Mallet, Reference Mallet1875) and later proved to be a mixture of mimetite, Pb5(AsO4)3Cl, and wulfenite, Pb(MoO4) (Dunn, Reference Dunn1977). In specific cases, the study of the material is complicated, e.g. ashanite was initially thought to be the Nb-dominant analogue of ixiolite (Zhang et al., Reference Zhang, Peng, Tian, Peng, Ma, Han and Jing1980), later the type material was believed to be a mixture of several minerals, including ixiolite, samarskite-(Y) and uranmicrolite. However, the latest research poses questions about identification only based on the chemical–composition data and proves ashanite to be the Nb-dominant ixiolite analogue (Zubkova et al., Reference Zubkova, Chukanov, Pekov, Bernes, Schüller and Pushcharovskii2021). Another example is maufite – an interstratified clinochlore–lizardite (Burke, Reference Burke2006).
Invalid names category (inv)
The non-relevant or obsolete nomenclature names and hypothetical species and solid-solution members are included in the invalid classification level with two subdivisions.
Obsolete nomenclature name (inv:obs)
The current subcategory represents obsolete, old nomenclature names that lost their mineralogical meaning due to successive nomenclature modifications. They represent neither varieties nor synonyms of any currently approved minerals. For example, natrobistantite studied by Voloshin et al. (Reference Voloshin, Pakhomovskii, Stepanov and Tyusheva1983) and Beurlen et al. (Reference Beurlen, Soares, Thomas, Prado-Borges and Castro2005) is a zero-valent dominant member of the microlite group with significant contents of Bi and Cs. According to current microprobe data, at least some natrobistantite specimens are hydroxynatromicrolite. Other examples are the discredited members of the amphibole group, such as ferri-nybøite, clinoferroholmquistite and chloro-potassic-ferri-magnesiotaramite.
Hypothetical and highly unstable minerals and solid-solution members (inv:hpt)
This subcategory indicates the mineral names that do not exist in crustal conditions, including hypothetical minerals and hypothetical members of solid-solution series. For example, permanganogrunerite, □Mn2+4Fe2+3(Si8O22)(OH)2, is a theoretical member of the Mg–Fe–Mn clino-amphibole subgroup of the amphibole group. Such names are often still used in the literature, especially in a petrological context and in compositional plots between end-members. Thus, permanganogrunerite is often used to plot the Mn–Mg–Fe diagram with the Mg end-member cummingtonite and the Fe end-member grunerite (Hawthorne and Oberti, Reference Hawthorne, Oberti, Hawthorne, Oberti, Della Ventura and Mottana2007). The majority of the group comprises end-members, such as ferri-gehlenite as an Fe end-member of the melilite group and blythite, Mn2+3Mn3+2[SiO4]3, as a hypothetical Mn end-member of the garnet supergroup, and intermediate members, such as potassic-leakeite, KNa2(Mg2Al2Li)(Si8O22)(OH)2, as a hypothetical clino-amphibole in the leakeite root-name group, and many other amphibole-related names.
This subcategory also includes species whose crystal structures are unstable at current environmental conditions or are hypothetical for Earth surface conditions. Examples are argentite, Ag2S, which is stable only above 177°C and converts to acanthite at lower temperatures (Emmons et al., Reference Emmons, Stockwell and Jones1926); and monalbite, NaAlSi3O8, a polymorph of albite with monoclinic symmetry that is stable under equilibrium conditions at temperatures 980–1060°C (Winter et al., Reference Winter, Okamura and Ghose1979). An example of a paramorph is beta-quartz, which is not considered a mineral species by the IMA as it is not stable at room temperature; therefore, all beta-quartz in mineral collections are paramorphs of quartz after beta-quartz. This broad category refers to minerals that are highly unstable under standard conditions. However, the definition could be more precise by describing the specific conditions where the mineral is stable. Additionally, the kinetics of the instability, or how fast the mineral changes, is also an essential factor to consider when characterising unstable minerals.
Discussion
An exciting task to solve in recent mineralogical data science discussions is to develop an interoperable platform acting as a middleware between the databases. The latter platform should allow seamless data retrieval and be shared consistently across all mineralogical communities, such as earthchem.org, georoc.eu and mindat.org. The first issue to tackle is to develop a standard schema or structure for the data that all resources could easily apply to their databases.
According to Markus Winand (Reference Winand2011), the most critical information for proper database indexing is not the storage system configuration or the hardware setup but how the potential consumers will query the data. The principal goal of the classification is to design a set of rules for structuring relations and standardising the mineral terms in the relational database so that they can be retrieved and applied by other mineralogical resources. This design is intended to contribute to the interoperability of the data platform, optimise the database schema, improve the performance of table lookups and queries, and make identifying related species and the types of relations much less computationally expensive. The proposed classification is applied to classify the species provided by mindat.org and to compile a base layer for the relational database management system (RDBMS) of mineralogy.rocks – a computing community oriented towards solving practical data analysis tasks and related issues in mineralogy and geochemistry (https://github.com/orgs/mineralogy-rocks/repositories).
The data model compiled during this research has the potential for discovering links between historical and obsolete names, finding hidden patterns through recursive queries (e.g. a synonym of variety or a synonym of synonym), and coherently structuring the database. Moreover, the latter database is fully interoperable with mindat.org and is kept in one-directional sync with the latter.
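As a minimal sketch of such a recursive query, the following Python example uses the standard sqlite3 module to follow a chain from a synonym to a variety and on to the species. The table name relation, its columns and the function resolve_chain are hypothetical and do not reproduce the mineralogy.rocks schema; the two example rows are taken from the spelling synonym and chemical variety subsections.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: each row links a name to a related name
-- through one classification status.
CREATE TABLE relation (
    name   TEXT NOT NULL,
    status TEXT NOT NULL,
    target TEXT NOT NULL
);
INSERT INTO relation VALUES
    ('cobaltian calcite',      'syn:spl', 'cobalt-bearing calcite'),
    ('cobalt-bearing calcite', 'var:chm', 'calcite');
""")


def resolve_chain(conn, name, max_depth=10):
    """Follow relations from a name until no further target exists.

    max_depth guards against accidental cycles in the data.
    """
    return conn.execute(
        """
        WITH RECURSIVE chain(name, status, target, depth) AS (
            SELECT name, status, target, 1
            FROM relation
            WHERE name = ?
            UNION ALL
            SELECT r.name, r.status, r.target, c.depth + 1
            FROM relation AS r
            JOIN chain AS c ON r.name = c.target
            WHERE c.depth < ?
        )
        SELECT name, status, target FROM chain ORDER BY depth
        """,
        (name, max_depth),
    ).fetchall()


for step in resolve_chain(conn, "cobaltian calcite"):
    print(step)
```

Here a spelling synonym resolves first to a chemical variety and then to the species calcite, which is the "synonym of variety" pattern mentioned above.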
The regulatory scheme allows calculation of a dependency tree between species attributes stored in a database (e.g. chemical properties and physical properties) with their classification status assigned and design of a self-check system prototype for database integrity and consistency (Table 2). The latter allows an algorithm to be developed for checking which entries in the database are: (1) awaiting revision or approval of the admin; (2) incomplete and therefore awaiting further data search/hunt; or (3) incorrect due to the overlap of specific attributes from different contexts. The database structure is developed from the principle of normalising all attributes to at least a second normal form (Beeri et al., Reference Beeri, Bernstein, Goodman, Mylopolous and Brodie1989; Demba, Reference Demba2013). The normalisation reduces data redundancy and improves data integrity in the early stages of development. The database design and general species attribute contexts according to the normalised database schema are as follows:
Note: abbreviations are as follows: * – essential attribute, an entry is considered incomplete without this context or a part of the context; ? – optional, could be present, but doesn't influence the consistency of data; X – prohibited, should never be present in the database. Otherwise, the entry is considered invalid and needs further check or a revision.
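The self-check described above can be sketched in Python as follows. The rule rows in RULES are placeholders chosen for illustration only and do not reproduce Table 2; the status key "approved", the context keys and the function check_entry are likewise hypothetical. Only the three marks (*, ?, X) and the three kinds of findings follow the text.

```python
# Marks follow the note above: "*" essential, "?" optional, "X" prohibited.
# The rows below are illustrative placeholders, NOT the content of Table 2.
RULES = {
    "approved": {"historical": "*", "compositional": "*", "physical": "?"},
    "syn:chm": {"historical": "?", "compositional": "X", "physical": "X"},
}


def check_entry(status, contexts, certain=True):
    """Return a list of findings for one database entry.

    status   -- classification status of the entry
    contexts -- set of context names that hold data for the entry
    certain  -- the Boolean certainty flag of the status assignment
    """
    findings = []
    if not certain:
        # Case (1): awaiting revision or approval of the admin.
        findings.append("awaiting revision")
    for context, mark in RULES[status].items():
        present = context in contexts
        if mark == "*" and not present:
            # Case (2): incomplete, further data search is needed.
            findings.append(f"incomplete: missing {context}")
        elif mark == "X" and present:
            # Case (3): incorrect, a prohibited context is present.
            findings.append(f"invalid: {context} not allowed")
    return findings


print(check_entry("approved", {"historical"}))
print(check_entry("syn:chm", {"historical", "physical"}))
```

An empty list means the entry passes all three checks under the assumed rules.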
Historical and social context
The attributes are first usage date, first usage note, discovery year, discovery country, first publication year, IMA approval year, and IMA submission year. The context can be present for all categories but is essential only for officially regulated mineral and rock names. The historical and social context can be used to trace the spatial and temporal evolution of mineral discoveries (Ponomar et al., Reference Ponomar, Gavryliv and Putiš2023), which was shown to impact the mineral rarity classification (Gavryliv et al., Reference Gavryliv, Ponomar, Bermanec and Putiš2022b).
Compositional context
This includes idealised and empirical mineral formulas, ions, elements, impurities, inclusions, and intergrowth material. The denormalised attributes include unprocessed raw analytical chemical data linked to each species and stored as-is in a non-relational data lake centralised repository. Note, in a database, the rock names and mineral names are differentiated only through their specific classification statuses. The latter allows for storing whole-rock chemical data using the same data structure. A compositional context is a viable tool for more precise predictions and calculations of the diversity, availability, and distribution of different metallic and non-metallic resources.
Physical context
The attributes are colour, streak, hardness, tenacity, lustre, transparency, cleavage, fracture, density, radioactivity, and optical properties in transmitted and reflected lights. The physical context can serve as a tool for more reliable mineral identification and for exploring how the other variabilities (i.e. chemical composition, admixtures, isomorphism) can affect the physical properties of minerals.
Hierarchy context
This includes the linked ‘parent’ name in a database, e.g. the entity id or a foreign key referencing the upper grouping level of species. For example, the closest hierarchical parent of augite is the clinopyroxene subgroup, and the closest parent of the subgroup is the pyroxene group. Therefore, the context represents a tree data structure, and a recursive SQL query could establish the entire hierarchy chain. The hierarchy, which is mostly formally regulated, seems helpful for the statistical representation of the properties of mineral groups, including the consistency or inconsistency of the properties of certain members.
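The recursive query mentioned above can be sketched with Python's standard sqlite3 module and the augite example. The table name species, the column parent_id and the function ancestors are hypothetical names chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical self-referencing table: parent_id is a foreign key
-- pointing at the upper grouping level of the same table.
CREATE TABLE species (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE,
    parent_id INTEGER REFERENCES species(id)
);
INSERT INTO species (id, name, parent_id) VALUES
    (1, 'pyroxene group',         NULL),
    (2, 'clinopyroxene subgroup', 1),
    (3, 'augite',                 2);
""")


def ancestors(conn, name):
    """Return the hierarchy chain from a name up to the root grouping."""
    rows = conn.execute(
        """
        WITH RECURSIVE lineage(id, name, parent_id, level) AS (
            SELECT id, name, parent_id, 0
            FROM species
            WHERE name = ?
            UNION ALL
            SELECT s.id, s.name, s.parent_id, l.level + 1
            FROM species AS s
            JOIN lineage AS l ON s.id = l.parent_id
        )
        SELECT name FROM lineage ORDER BY level
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


print(ancestors(conn, "augite"))
```

The recursion stops on its own at the root, because a NULL parent_id matches no row in the join.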
Relations context
The layer provides a reference from a species to its related species, together with the type or direction of the relation (direct or indirect). The relations should always contain a specific classification status assigned so that the type of dependency between species can be established. For example, African emerald is a misleading synonym for fluorite; therefore, the status of this synonym is directly related to fluorite. Conversely, fluorite is assigned the same status related to African emerald but with an indirect relation indication so that all synonyms of fluorite could be retrieved in one query.
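One possible way to store the direct and indirect sides of a relation is sketched below with Python's standard sqlite3 module. The table layout, the integer direct flag and the helpers add_relation and synonyms_of are assumptions made for this illustration and are not the mineralogy.rocks implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: direct = 1 for the stated relation,
# direct = 0 for the mirrored (indirect) row.
conn.execute(
    "CREATE TABLE relation ("
    "name TEXT NOT NULL, status TEXT NOT NULL, "
    "related TEXT NOT NULL, direct INTEGER NOT NULL)"
)


def add_relation(conn, name, status, related):
    """Store a relation together with its indirect counterpart."""
    conn.executemany(
        "INSERT INTO relation VALUES (?, ?, ?, ?)",
        [(name, status, related, 1), (related, status, name, 0)],
    )


add_relation(conn, "African emerald", "syn:msl", "fluorite")
add_relation(conn, "fluorspar", "syn:cmr", "fluorite")
add_relation(conn, "fluorspar", "syn:obs", "fluorite")


def synonyms_of(conn, species):
    """Retrieve all synonyms of a species in a single query."""
    return conn.execute(
        "SELECT related, status FROM relation "
        "WHERE name = ? AND direct = 0 AND status LIKE 'syn:%' "
        "ORDER BY related, status",
        (species,),
    ).fetchall()


print(synonyms_of(conn, "fluorite"))
```

Storing the mirrored row costs extra space but lets the lookup from the species side use a plain equality filter on one column rather than a search across both sides of the relation.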
Crystallography context
The crystallography context consists of the crystal system, space group, setting, cell parameters, etc. The data set has recently attracted attention for the study of mineral complexity playing a role in global and local geological processes associated with the formation and transformation of crystalline phases (e.g. Evain et al., Reference Evain, Bindi and Menchetti2006; Plášil, Reference Plášil2018; Bačík and Fridrichová, Reference Bačík and Fridrichová2021).
All of the context entries might have a text description or note in a database explaining the relation of the context to a referenced entry. For instance, chemical synonyms would never have a deviation from the referenced material except for the name and the historical, linguistic or social context. Therefore, a short note is assigned to the synonym entry, explaining the type of deviation.
Note that the contexts do not represent the structure of individual tables in the database as decomposition and normalisation of the data often end up with several one-to-many and many-to-many tables within a single context, referencing multiple list tables with unique normalised context properties or attributes. The normalisation often leads to better database performance and overall integrity (Demba, Reference Demba2013).
Conclusions
This work contributes to open access in data engineering and data analysis in mineralogy (what is now regarded as ‘mineral informatics’), clean code development, and optimal design and structure of mineralogical data warehouses. The regulatory classification has been successfully applied to standardise and organise the entries of the mineralogy.rocks database in its early stage of development. The scheme allows an optimised database structure to be designed with the relations accessible through the classification layer of the data model. The data revision and information hunt allowed us to distinguish several categories of mineral-related names, including those regulated by official bodies (e.g. IMA and IUGS) and, more critically, unregulated entries. The relationship between the status of a species and its attributes can serve as a unique tool for studying the relationship between minerals, their diversity, distribution, chemical and structural properties, and complexity.
Acknowledgements
Liubomyr Gavryliv's project No. 3007/01/01 has received funding from the European Union's Horizon 2020 research and innovation Programme based on a grant agreement under the Marie Skłodowska-Curie scheme No. 945478. Additionally, the research was supported by the Slovak Research and Development Agency (contract APVV–19–0065).
Competing interests
The authors declare none.