About

What is OpenBiodiv?
What data is in OpenBiodiv?
What knowledge can be obtained from OpenBiodiv?
How to find information about biodiversity in OpenBiodiv?
      General search
      User applications
      Application Programming Interface (API)

What is OpenBiodiv?

OpenBiodiv is a biodiversity database containing knowledge extracted from scientific literature, built as an Open Biodiversity Knowledge Management System (OBKMS). OpenBiodiv consists of a knowledge graph, a Linked Open Dataset, an ontology (OpenBiodiv-O) and a website. The knowledge graph contains semantic statements about authors, articles, treatments, taxonomic names, examined materials, institutions, genomic sequences, habitats, localities, and more. Each entity in the Linked Open Dataset has its own globally unique, persistent and resolvable identifier (GUPRI).

Data is modelled according to the OpenBiodiv-O ontology, which integrates semantic resource types from recognised biodiversity and publishing ontologies with biodiversity-specific resource types that had not been modelled before.

The aim of OpenBiodiv is to make biodiversity knowledge easily findable and accessible by both humans and machines. OpenBiodiv has several user-oriented applications, a RESTful API and a SPARQL endpoint where experienced users can write complex queries.

What data is in OpenBiodiv?

OpenBiodiv gathers knowledge from two sources: semantically enhanced biodiversity-related articles published in Pensoft’s journals (e.g. ZooKeys, PhytoKeys, MycoKeys, Biodiversity Data Journal), and taxonomic treatments harvested and semantically annotated by Plazi from journals of other publishers (e.g. Zootaxa, European Journal of Taxonomy). It exposes the links between and within articles.

What knowledge can be obtained from OpenBiodiv?

OpenBiodiv offers a broad biodiversity-related querying system that answers open-ended queries based on the data. OpenBiodiv can be used to obtain new knowledge about taxa, scientific articles and their subsections, the examined materials and their metadata, localities, sequences and much more. OpenBiodiv can discover hidden links within biodiversity data and can guide research into how data is used in scholarly articles.

The system can return information, with a relevant visual representation, about any one or a combination of its major data classes within a certain scope and semantic context.

Data classes are:

Taxon name (Taxon Name Usage, TNU)
Taxon treatment
Specimen
Sequence
Person (author)
Collection/Institution

Examples of data properties are:

Location
Date (of publication, sample collection, etc.)
Geo-coordinates
Habitat

Article metadata and sections are:

Title
Authors
Abstract
Keywords
Bibliographic metadata (DOI, publication date, journal name, article number, pages)
Introduction
Material and methods
Data resources
Results
Taxon treatments
      Nomenclature
      Material citations (specimen records)
      Type locality
      Description
      Diagnosis
      Taxonomy
      Etymology
      Distribution
      Molecular data
      Ecology and biology
      Conservation
      Uses
      Identification keys
Discussion
Conclusions
General (or Undefined) sections
Figures (including figure legends)
Tables
Appendices
Supplementary files
Reference lists
Acknowledgements
Usage rights
Funding information
Author contributions
Notes

Semantic classes are article sections grouped by topic:

Taxonomy & Nomenclature
Diagnoses
Identification keys
Conservation
Biology & Ecology
Distribution
Uses (e.g. ethnobotanical information)

Using OpenBiodiv one can answer complex questions like these (see Sample SPARQL queries for more detail):

Which articles contain treatments which describe specimens in forest or wood habitats?
Which taxa that are mentioned together in a treatment have a potential feeding relationship?
Which are the most cited resources and which are the journal articles that cite them?
What are the life stages and collection dates of all specimens from genus Eupolybothrus?
What are the storing institutions of collected holotypes from family Theraphosidae?
Which treatments describe materials stored in the Natural History Museum, London? Which taxa are described?
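As an illustration of how one of the questions above (life stages and collection dates of specimens from genus Eupolybothrus) might be phrased in SPARQL, here is a minimal sketch that builds the query text in Python. It assumes, purely for illustration, that specimen records carry the Darwin Core properties `dwc:genus`, `dwc:lifeStage` and `dwc:eventDate` directly; the actual OpenBiodiv-O modelling may differ, and the function name `specimens_by_genus_query` is hypothetical. The sample SPARQL queries provided by OpenBiodiv remain the authoritative reference.

```python
# Darwin Core terms namespace (a real vocabulary; its use here is an assumption
# about how specimen records might be described, not the confirmed OpenBiodiv model).
DWC = "http://rs.tdwg.org/dwc/terms/"


def specimens_by_genus_query(genus: str, limit: int = 100) -> str:
    """Build a SPARQL SELECT query for life stage and collection date by genus."""
    # Escape backslashes and double quotes so the genus is a valid SPARQL string literal.
    safe = genus.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX dwc: <{DWC}>
SELECT ?material ?lifeStage ?eventDate
WHERE {{
  ?material dwc:genus "{safe}" .
  OPTIONAL {{ ?material dwc:lifeStage ?lifeStage . }}
  OPTIONAL {{ ?material dwc:eventDate ?eventDate . }}
}}
LIMIT {int(limit)}"""


print(specimens_by_genus_query("Eupolybothrus"))
```

The `OPTIONAL` clauses keep specimens in the result even when a life stage or collection date is missing, which is usually what an exploratory query wants.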

How to find information about biodiversity in OpenBiodiv?

There are four approaches to exploring the data stored in the graph: general search, user applications, the API and the SPARQL endpoint.

General search

The general search is available on the homepage of OpenBiodiv and allows exploration of the knowledge graph based on key terms such as taxonomic names, persons and articles. The user only needs to type the name of an entity of interest belonging to one of these types, and the system finds information about it. A misspelled name is not a problem, because the Elasticsearch index supports fuzzy matching: a term still matches if it lies within a maximum allowed edit distance of the indexed name. The search can also automatically determine the semantic type of the searched entity.
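To make the idea of matching within a maximum edit distance concrete, the following sketch computes the plain Levenshtein distance and uses it to filter a list of names. It illustrates the concept only and is not the Elasticsearch implementation; the helper names and the default of two edits are assumptions chosen for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(query: str, names: list[str], max_edits: int = 2) -> list[str]:
    """Return the names within max_edits of the query, ignoring case."""
    lowered = query.lower()
    return [name for name in names if edit_distance(lowered, name.lower()) <= max_edits]


names = ["Eupolybothrus", "Theraphosidae"]
# "Eupolibothrus" differs from "Eupolybothrus" by one substituted letter.
print(fuzzy_matches("Eupolibothrus", names))  # ['Eupolybothrus']
```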

User applications

Literature exploration

This application is designed to answer the following general question: Find me information about an entity mentioned within a certain article section in OpenBiodiv. The results show the number of mentions of this entity (e.g. a taxonomic name) in each section of interest (e.g. Titles (X), Abstracts (Y), Treatments (Z)), aggregated by article.

By clicking on the hyperlinked number, the user is redirected to the article section where that entity is mentioned.

A simple plot of the results, for example comparing the Y titles and Z abstracts in which Element X is mentioned, illustrates how mentions of the element are distributed across the searched sections.

In addition to being visualised in the web page, the results can be exported to a CSV file for further use.
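The kind of per-section tally and CSV export described above can be sketched as follows. The sample data, column names and helper functions are hypothetical and only illustrate the shape of such a result, not the format the application actually produces.

```python
import csv
import io
from collections import Counter

# Hypothetical sample: one (article ID, section type) pair per mention of an entity.
mentions = [
    ("article-1", "Title"),
    ("article-1", "Abstract"),
    ("article-1", "Treatment"),
    ("article-2", "Abstract"),
    ("article-2", "Treatment"),
    ("article-2", "Treatment"),
]


def count_by_section(rows):
    """Count mentions per section type across all articles."""
    return Counter(section for _, section in rows)


def to_csv(counts):
    """Serialise the counts as CSV text with a header row, sorted by section name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["section", "mentions"])
    for section, total in sorted(counts.items()):
        writer.writerow([section, total])
    return buffer.getvalue()


counts = count_by_section(mentions)
print(to_csv(counts))
```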

Co-occurrences

This application extends the functionality of the Literature exploration app by allowing two or more data elements (named entities), e.g. taxon names, sequences, specimens or specific terms, to be searched together within a given context. For example, some possible questions are:

Give me article sections or taxon treatment sections where Data element 1 and Data element 2 are mentioned together, e.g.:

Taxon name A & Taxon name B
Sequence C & Taxon name Y
Taxon name X & Treatment Y
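Conceptually, a co-occurrence search keeps only those sections in which every requested element is mentioned. A minimal sketch of that filtering step, using an invented in-memory index of section IDs and the entities they mention (all names here are hypothetical placeholders):

```python
# Hypothetical sample: section ID -> set of named entities mentioned in that section.
sections = {
    "article-1/abstract": {"Taxon name A", "Taxon name B"},
    "article-1/treatment-1": {"Taxon name A", "Sequence C"},
    "article-2/treatment-1": {"Taxon name A", "Taxon name B", "Sequence C"},
}


def co_occurrences(index, *entities):
    """Return the IDs of sections that mention all of the given entities."""
    wanted = set(entities)
    # A section qualifies when the wanted entities are a subset of its mentions.
    return sorted(sid for sid, mentioned in index.items() if wanted <= mentioned)


print(co_occurrences(sections, "Taxon name A", "Taxon name B"))
# ['article-1/abstract', 'article-2/treatment-1']
```

Because the function accepts any number of entities, the same check covers pairs as well as larger combinations.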

External links

The basic aim of this data discovery application is to search, discover and display data available from trusted external resources, for example specimens, collections, sequences, taxon names, literature, persons and others. The element of interest may also be present in OpenBiodiv.

This service is also available as an additional step in other apps. For example, a user making a bibliographic exploration of a certain named entity could be offered the option to ask for additional information about that entity from external resources.

The data records and their identifiers obtained as a result of the search across various resources can be stored as a CSV file or as RDF using the SCOR ontology.

Alerts

OpenBiodiv performs a number of queries at regular intervals to generate reports and send these to the users subscribed to the RSS & E-mail Alert service. The queries can deliver, for example:

All mentions of specimens from a collection or institution, based on either citations of a particular collection/institution code or use of specimen identifiers in the examined materials (material citations).
All taxon treatments (new taxa, re-descriptions, nomenclatural changes and others) published within a particular taxon.
All newly published literature that mentions a certain taxon or other named entity of interest (e.g. sequence).

Application Programming Interface (API)

OpenBiodiv can be explored through an unlimited variety of SPARQL queries, but it also provides an API for programmatic access to the data. The API is documented in Swagger. The API construction and functionalities follow the recommendations elaborated by the Technical Research Infrastructures forum of the BiCIKL project.
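For the SPARQL route, a client typically sends the query over HTTP as defined by the W3C SPARQL 1.1 Protocol and reads the answer in the SPARQL JSON results format. The sketch below builds such a request and flattens a results document into rows using only the Python standard library. The endpoint URL is a placeholder, not the OpenBiodiv address, and both helper names are hypothetical; the real endpoint and the REST API routes should be taken from the OpenBiodiv site and its Swagger documentation.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint_url: str, query: str) -> urllib.request.Request:
    """Build a GET request carrying the query in the 'query' parameter."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint_url}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def bindings_to_rows(results_json: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into variable -> value dicts."""
    data = json.loads(results_json)
    return [
        {variable: cell["value"] for variable, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


# Placeholder endpoint; replace with the real SPARQL endpoint URL.
request = build_sparql_request(
    "https://example.org/sparql",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
)
# To send it: body = urllib.request.urlopen(request).read().decode("utf-8")

# A hand-written response in the standard results format, parsed offline.
sample = (
    '{"head": {"vars": ["s"]}, "results": {"bindings": '
    '[{"s": {"type": "uri", "value": "https://example.org/a"}}]}}'
)
print(bindings_to_rows(sample))  # [{'s': 'https://example.org/a'}]
```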

SPARQL Endpoint

Sample SPARQL queries
