The Instituto de Biología of the Universidad Nacional Autónoma de México (IBUNAM) celebrates its 95th anniversary this year. Faculty and students at IBUNAM are devoted to the study, conservation, and sustainable use of the biota of Mexico and of other regions of the world. Research at IBUNAM touches on virtually all branches of the tree of life and uses a wide variety of methodological and analytical tools to discover, describe, document, and understand biological diversity. Among research institutions in Mexico, IBUNAM stands out for housing several National Biological Collections, including 10 National Zoological Collections and the National Herbarium of Mexico (MEXU). The specimens deposited in IBUNAM’s National Biological Collections are the foundation of a myriad of taxonomic, evolutionary, ecological, biogeographic, social, and conservation studies, which generate an enormous amount of associated data (hereafter “metadata”).
Every day, these collections are actively consulted, both in person and virtually through IBdata (http://ibdata4.ib.unam.mx), a web system for consulting the records of the National Biological Collections housed at IBUNAM (Murguía-Romero et al., 2024). Under UNAM’s open data policy (http://www.datosabiertos.unam.mx/informacion/terminosdeuso.html), IBdata currently provides free, easy, and continuous access to digitized information on over 1.7 million biological specimens, supporting the dissemination of knowledge and transdisciplinary research and benefiting the scientific, governmental, and educational sectors of society, as well as private users. For each physical specimen, the digitized information in IBdata usually includes high-resolution images together with data on the collection locality (including geographic coordinates, when available), collection date, and collector(s), as well as the collectors’ notes on habitat and on morphological and socio-cultural aspects.
Genetic resources are among the most commonly generated metadata. They are often made publicly available through the International Nucleotide Sequence Database Collaboration (INSDC; https://www.insdc.org/), which comprises 3 international databases that exchange data daily: the DNA Data Bank of Japan (DDBJ; https://www.ddbj.nig.ac.jp/index-e.html), the European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena/browser/), and GenBank, maintained by the US National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/genbank/). When properly submitted, these records include information on the voucher specimens from which the sequences were obtained and on the institutions where those specimens are deposited. Access to such genetic information is essential for the sustainable use and conservation of global biodiversity (Cowell et al., 2022).
Here we used the records of MEXU’s Collection of Types of Vascular Plants available in IBdata (10,972 records) to search for public genetic resources in GenBank and link them to the corresponding specimens. For this, we downloaded all the type records and built URL calls for NCBI’s Entrez Programming Utilities (E-utilities; https://www.ncbi.nlm.nih.gov/books/NBK25501/). Each query combined the species’ scientific name with the collection number assigned by the collector and the unique identifier of the specimen (MEXU’s catalogue number), joined by OR, as in the following example: “https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Agave%20isthmensis%22[organism]+AND+(4177+OR+628489)”.
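The query construction described above can be sketched in Python as follows. This is a minimal illustration only, assuming Python 3.9+ and the standard-library `urllib.parse.quote`; the function name `build_esearch_url` is hypothetical and not part of the original workflow. It reproduces the example URL: the organism name is quoted and percent-encoded, and the identifiers are OR-ed together with `+` standing for spaces between query terms.

```python
from urllib.parse import quote

# Base endpoint of NCBI's ESearch E-utility (as in the example URL above).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(scientific_name: str, identifiers: list[str]) -> str:
    """Build an ESearch URL against the nucleotide database (hypothetical helper).

    The scientific name is wrapped in double quotes and percent-encoded
    (quote -> %22, space -> %20); identifiers such as the collection number
    and MEXU catalogue number are joined with OR.
    """
    organism = quote(f'"{scientific_name}"')  # -> %22Agave%20isthmensis%22
    ids = "+OR+".join(quote(str(i)) for i in identifiers)
    term = f"{organism}[organism]+AND+({ids})"
    return f"{ESEARCH_BASE}?db=nucleotide&term={term}"


print(build_esearch_url("Agave isthmensis", ["4177", "628489"]))
# -> https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Agave%20isthmensis%22[organism]+AND+(4177+OR+628489)
```

Generating the URLs programmatically from the downloaded type records keeps the encoding consistent across all 10,972 queries, whatever tool is then used to submit them.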
When the collection number contained non-numeric characters or spaces, the query used the main collector’s last name instead (only the first last name when 2 last names were present). URL calls were submitted to the NCBI Nucleotide database with the Bulk URL Opener extension for Google Chrome, with a 1 s delay between search requests to avoid overloading the NCBI E-utilities servers. Calls returning hits were saved, and their query translations were used to download the associated GenBank accession numbers. The retrieved accession numbers were merged into a text file and submitted to an NCBI Batch Entrez search (https://www.ncbi.nlm.nih.gov/sites/batchentrez). The resulting records were further filtered with a custom filter using the flag “MEXU”, and the filtered results were checked individually to confirm their association with a vascular plant type specimen deposited at MEXU. To facilitate linking the type specimens back to their genetic metadata, IBdata records of type specimens with genetic information in GenBank were complemented with a new data field, “GenBank”, which lists the available molecular markers and their GenBank accession numbers. In addition, a web link to the corresponding NCBI records was added to IBdata as a further data field, “GenBank Search” (Fig. 1).
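The fallback rule, the throttled submission, and the “MEXU” flag filter described above can be sketched as three small Python functions. This is a hypothetical sketch, not the procedure actually used (which relied on a browser extension and Batch Entrez); the function names are illustrative, and two details are assumptions: that the MEXU catalogue number is kept in the query in both cases, and that last names are split on spaces or hyphens. NCBI’s usage guidelines allow up to 3 requests per second without an API key, so a 1 s pause stays well below that limit.

```python
import re
import time
from typing import Callable, Iterable


def choose_identifiers(collection_number: str, catalogue_number: str,
                       collector_surnames: str) -> list[str]:
    """Return the identifiers to OR together in an ESearch query.

    A purely numeric collection number is used as-is; otherwise the main
    collector's first last name replaces it. ASSUMPTIONS: the MEXU catalogue
    number is kept in both cases; last names are split on spaces or hyphens.
    """
    if collection_number.isdigit():
        return [collection_number, catalogue_number]
    first_surname = re.split(r"[\s-]+", collector_surnames.strip())[0]
    return [first_surname, catalogue_number]


def run_throttled(urls: Iterable[str], fetch: Callable[[str], str],
                  delay_s: float = 1.0) -> dict[str, str]:
    """Call `fetch` on each URL, pausing `delay_s` seconds between calls."""
    results: dict[str, str] = {}
    for i, url in enumerate(urls):
        if i > 0:  # no pause before the first request
            time.sleep(delay_s)
        results[url] = fetch(url)
    return results


def flags_mexu(record_text: str) -> bool:
    """Flag filter: keep records whose text mentions 'MEXU' (for example in a
    /specimen_voucher qualifier). Flagged records still need manual review."""
    return "MEXU" in record_text
```

In practice `fetch` could be `lambda u: urllib.request.urlopen(u).read().decode()`. Passing it in as a parameter keeps the throttling logic testable without network access. A substring flag is deliberately permissive, which is why each flagged record was still checked by hand.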

Figure 1 IBdata “Summary data sheet of the specimen” of the isotype of Epiphyllum chrysocardium (MEXU’s catalogue number: 72938), showing within a red rectangle the 2 newly implemented fields, “GenBank” and “GenBank Search”, which link the specimen to its available genetic information in GenBank.
Our search identified 71 GenBank accession numbers corresponding to 23 angiosperm species in 20 genera, 7 families, and 5 orders (Table 1). Type specimens linked to genetic resources in GenBank can be easily retrieved in IBdata through the “Simple Search” option using the keyword “genbanksearch”. The sequenced molecular markers comprise 18 plastid regions (the genes accD, atpA, matK, ndhF, psbA, rbcL, and ycf1; the introns of the genes rpl16 and trnL; the intergenic spacers rpl20-rps12, rpl32-trnL, rps16-trnQ, trnS-trnG, and ycf6-psbM; and the regions trnD-trnT, trnH-psbA, trnK-matK, and trnL-trnF) and different portions of the nuclear ribosomal Internal Transcribed Spacer (ITS) region. Each type specimen had 1 to 7 associated sequences; the most frequently sequenced markers were the plastid regions trnL-trnF and trnK-matK and the nuclear ribosomal ITS region. Most sequenced type specimens were collected in the 1980s, 1990s, and 2000s. However, the sequenced isotype of Epiphyllum chrysocardium Alexander (Cactaceae) was collected in 1951 (Fig. 1). The authors of the sequences from this type specimen informed us that the plant tissue used for DNA extraction did come from the type collection (MacDougall, 198), but from a division of the plant maintained in cultivation at the Botanical Garden of IBUNAM, which explains why sequencing succeeded for such an “old” specimen. Although the sequencing method is explicitly stated for only 19 accessions, all the recovered markers appear to have been generated by capillary (Sanger) sequencing.
Table 1 MEXU’s types of vascular plants with available genetic information at GenBank.
| Scientific name | GenBank accession number | Collector, collection number | Type category |
|---|---|---|---|
| Asparagales | |||
| Asparagaceae | |||
| Agave isthmensis García-Mend. & F. Palma | MN900422.1 | García Mendoza, 4177 | Holotype |
| Agave rzedowskiana P. Carrillo, Vega & R. Delgad. | MN900449.1 | Carrillo-Reyes, 1503 | Isotype |
| Agave tenuifolia Zamudio & E. Sanchez | MN900461.1 | Carranza, 1905 | Isotype |
| Yucca mixtecana García-Mend. | MN900508.1, MN893703.1 | García Mendoza, 6198 | Holotype |
| Milla valliflora J. Gut. & E. Solano | MF189697.1, MF189646.1, MF189596.1 | Gutiérrez, 1151 | Holotype |
| Orchidaceae | |||
| Bletia riparia Sosa & Palestina | KU054381.1, KU054368.1, KU054356.1, KU054344.1 | Palestina, 590 | Isotype |
| Dichromanthus yucundaa Salazar & García-Mend. | FN996950.1, FN996962.1 | García Mendoza, 8774 | Holotype |
| Encyclia × nizandensis Pérez-García & Hágsater | KP057187.1, KM385692.1, KM385889.1, KM386017.1 | Pérez-García, 2085 | Holotype |
| Galeoglossum cactorum Salazar & C. Chávez | FN645940.1, FN645939.1 | Chávez-Rendón, 1604 | Holotype |
| Malaxis molotensis Salazar & J.R. Santiago | HG970131.1, HG970153.1 | Santiago, 1320 | Holotype |
| Myrmecophila christinae Carnevali & Gómez-Juárez | EF065697.1 | Carnevali, 4445 | Isotype |
| Asterales | |||
| Asteraceae | |||
| Sinclairia ismaelis Panero & Villaseñor | JN837193.1, JN837373.1, JN837476.1, JN837283.1 | Panero, 3572 | Holotype |
| Caryophyllales | |||
| Cactaceae | |||
| Epiphyllum chrysocardium Alexander | KU598136.1, KU598186.1, KU597978.1, KU597925.1, KU598083.1, KU598030.1 | MacDougall, 198 | Isotype |
| Selenicereus dorschianus Ralf Bauer | LT745712.1, LT745480.1, LT745595.1 | Böhme, s/n | Isotype |
| Cephalocereus parvispinus S. Arias, H.J. Tapia & U. Guzmán | MK165436.1, MK165437.1, MK165439.1, MK165435.1, MK165434.1, MK165433.1, MK165438.1 | Tapia Héctor, 38 | Holotype |
| Nyctaginaceae | |||
| Mirabilis polonii Le Duc | KY952455.1 | Le Duc, 259 | Paratype |
| Cucurbitales | |||
| Cucurbitaceae | |||
| Microsechium gonzalo-palomae Lira | JN560193.1, JN560568.1, JN560294.1, JN560474.1, JN560640.1 | Lira, 1230 | Holotype |
| Sicyos davilae Rodrí.-Arév. & Lira | JN560230.1, JN560595.1, JN560330.1, JN560507.1, JN560663.1, JN560419.1 | Lira, 949 | Paratype |
| Sicyos dieterleae Rodrí.-Arév. & Lira | JN560232.1, JN560596.1, JN560332.1, JN560509.1, JN560664.1, JN560421.1 | Lira, 1385 | Isotype |
| Fabales | |||
| Fabaceae | |||
| Caesalpinia oyamae (synonym of Erythrostemon oyamae (Sotuyo & G.P. Lewis) Gagnon & G.P. Lewis) | KX373079.1, KX379300.1 | Hawkins, 23 | Holotype |
| Phaseolus albescens McVaugh ex R. Delgad. & A. Delgado | AF115150.1, DQ445955.1 | Delgado, 1705 | Holotype |
| Platymiscium calyptratum M. Sousa & Klitg. | EU735872.1, EU735933.1, EU735990.1, EU736047.1 | Tenorio, 126 | Holotype |
| Harpalyce torresii São-Mateus & M. Sousa | PP250089.1, PP238799.1 | Téllez, 950 | Paratype |
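As a quick consistency check, the summary figures given in the text (71 accessions, 23 species, 20 genera, 7 families, 5 orders, and 1 to 7 sequences per specimen) can be recomputed from Table 1. The sketch below condenses the table into hypothetical Python tuples of (order, family, species, number of accession numbers listed). The genus is assumed to be the first word of each name as it appears in the table, so Caesalpinia oyamae is counted under Caesalpinia.

```python
# Table 1 condensed: (order, family, scientific name, number of GenBank accessions).
TABLE_1 = [
    ("Asparagales", "Asparagaceae", "Agave isthmensis", 1),
    ("Asparagales", "Asparagaceae", "Agave rzedowskiana", 1),
    ("Asparagales", "Asparagaceae", "Agave tenuifolia", 1),
    ("Asparagales", "Asparagaceae", "Yucca mixtecana", 2),
    ("Asparagales", "Asparagaceae", "Milla valliflora", 3),
    ("Asparagales", "Orchidaceae", "Bletia riparia", 4),
    ("Asparagales", "Orchidaceae", "Dichromanthus yucundaa", 2),
    ("Asparagales", "Orchidaceae", "Encyclia × nizandensis", 4),
    ("Asparagales", "Orchidaceae", "Galeoglossum cactorum", 2),
    ("Asparagales", "Orchidaceae", "Malaxis molotensis", 2),
    ("Asparagales", "Orchidaceae", "Myrmecophila christinae", 1),
    ("Asterales", "Asteraceae", "Sinclairia ismaelis", 4),
    ("Caryophyllales", "Cactaceae", "Epiphyllum chrysocardium", 6),
    ("Caryophyllales", "Cactaceae", "Selenicereus dorschianus", 3),
    ("Caryophyllales", "Cactaceae", "Cephalocereus parvispinus", 7),
    ("Caryophyllales", "Nyctaginaceae", "Mirabilis polonii", 1),
    ("Cucurbitales", "Cucurbitaceae", "Microsechium gonzalo-palomae", 5),
    ("Cucurbitales", "Cucurbitaceae", "Sicyos davilae", 6),
    ("Cucurbitales", "Cucurbitaceae", "Sicyos dieterleae", 6),
    ("Fabales", "Fabaceae", "Caesalpinia oyamae", 2),
    ("Fabales", "Fabaceae", "Phaseolus albescens", 2),
    ("Fabales", "Fabaceae", "Platymiscium calyptratum", 4),
    ("Fabales", "Fabaceae", "Harpalyce torresii", 2),
]


def summarize(rows):
    """Recompute the taxonomic and accession totals reported in the text."""
    counts = [n for *_, n in rows]
    return {
        "accessions": sum(counts),
        "species": len({name for _, _, name, _ in rows}),
        "genera": len({name.split()[0] for _, _, name, _ in rows}),  # first word = genus
        "families": len({family for _, family, _, _ in rows}),
        "orders": len({order for order, *_ in rows}),
        "min_per_specimen": min(counts),
        "max_per_specimen": max(counts),
    }


print(summarize(TABLE_1))
# -> {'accessions': 71, 'species': 23, 'genera': 20, 'families': 7, 'orders': 5, 'min_per_specimen': 1, 'max_per_specimen': 7}
```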
Previous studies have stressed the importance of open access to the digitized information of type specimens (Nicolson et al., 2023), which are the key reference elements for scientific names. The value of both the digitized information of type specimens and the genetic information derived from them increases when the two can be linked and easily accessed. Including genetic sequences from type specimens in molecular taxonomic studies often plays an important role in circumscribing taxa or placing them in the tree of life. Explicitly recognizing the inclusion of sequences from type specimens in molecular studies can promote progress in molecular systematics and taxonomy (Chakrabarty, 2010).
Given improvements in sequencing technology, sequencing of type and non-type herbarium specimens should adopt more efficient strategies that maximize the amount of sequence data generated. The combination of supervised sampling of herbarium specimens with high-throughput DNA sequencing and bioinformatics has given rise to “herbariomics”, i.e., access to genome-scale genetic information from specimens kept in herbaria. This approach makes it possible to include in genomic, phylogenomic, and population genetic studies taxa that might otherwise be inaccessible, such as extinct or extremely rare species, or species that live in hard-to-reach places or are subject to collecting regulations (Davis, 2023; Strijk et al., 2020). The wealth of potential sources of new genomic information that are already available is illustrated by the recent report by Thiers (2023) on the world’s herbaria, based on data from Index Herbariorum (https://sweetgum.nybg.org/science/ih/): the 3,567 active herbaria in the world hold over 396.7 million specimens. Incorporating these valuable sources of already collected and curated specimens into worldwide initiatives such as the “global biodiversity cyberbank” (Wen et al., 2017), which aims to integrate all existing resources to promote free access to, and the generation of, information on biological diversity, should be a priority.