opnMINER FAQs
opnMINER is a convenient way for research scientists to search heterogeneous data sources to find and analyze articles, patents, databases or web pages using semantic, ontology-based concepts rather than just words. It also allows you to identify relationships between proteins, chemical compounds, organisms, pharmacological effects and other life science properties.
Thus, opnMINER can help you to answer typical questions arising in pharmacological research:
- Diseases related to a specific target,
- Targets related to a disease,
- Most recent patents for targets and small molecules,
- Small molecule modulators, methods and assays for my target,
- Mode-of-Action (MoA) of compounds or interactions between proteins.
“Basic Search” retrieves documents that contain the search concepts or terms. While you type in the search field, our auto-complete function checks whether the text matches any terms or parts of terms in our knowledge domains. A domain term matches if one of its word parts starts with your text or if the term matches your text completely. Terms in the auto-complete suggestion list are ranked by their prevalence. For example, if you type “sensiti”, the suggestion list will include matches such as sensitivity, pressure-sensitive adhesives and sensitive skin, but not insensitive or desensitization.
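The matching rule described above can be sketched in a few lines of Python. This is only an illustration, not opnMINER's actual implementation: the function name `suggest`, the splitting of terms into word parts at spaces and hyphens, and the prevalence counts are all assumptions made for the example.

```python
import re

def suggest(text, term_prevalence):
    """Return terms matching the typed text, most prevalent first."""
    text = text.lower()
    matches = []
    for term, prevalence in term_prevalence.items():
        lowered = term.lower()
        # Assumption: word parts are separated by whitespace or hyphens.
        parts = re.split(r"[\s\-]+", lowered)
        # A term matches completely, or one of its word parts starts with the text.
        if lowered == text or any(part.startswith(text) for part in parts):
            matches.append((prevalence, term))
    # Rank by prevalence (highest first); ties are broken alphabetically.
    matches.sort(key=lambda item: (-item[0], item[1]))
    return [term for _, term in matches]

# Hypothetical prevalence counts, for illustration only.
terms = {
    "sensitivity": 900,
    "pressure-sensitive adhesives": 120,
    "sensitive skin": 300,
    "insensitive": 500,
    "desensitization": 400,
}
print(suggest("sensiti", terms))
```

With these toy counts, "insensitive" and "desensitization" are left out because none of their word parts starts with the typed text.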
If a search term is part of our knowledge domains, we search the document collections for its occurrences, including all of its synonyms and child concepts. Example: if you search for “steroids”, the result documents will include matches of steroids but also of pregnenolone, testosterone, cholesterol etc., because all of those are known to be steroids.
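The expansion of a concept into its synonyms and child concepts can be pictured as a walk through the ontology. The sketch below assumes a toy ontology stored in two plain dictionaries (`children` and `synonyms`); both the data layout and the function name `expand_concept` are hypothetical and only use the examples mentioned in this FAQ.

```python
def expand_concept(concept, children, synonyms):
    """Collect a concept, all of its descendants, and all of their synonyms."""
    terms = set()
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against concepts reachable via several parents
        seen.add(current)
        terms.add(current)
        terms.update(synonyms.get(current, []))
        stack.extend(children.get(current, []))
    return terms

# Toy ontology fragment (illustrative only).
children = {"steroids": ["pregnenolone", "testosterone", "cholesterol"]}
synonyms = {"bovine": ["cow", "cattle"]}

print(sorted(expand_concept("steroids", children, synonyms)))
print(sorted(expand_concept("bovine", children, synonyms)))
```

A document would then count as a hit if it contains any of the returned terms.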
Otherwise, if the term is not known in our domains, we search documents for an exact match of the given term. Using * as a wildcard in the search term expands the query, e.g. “a*gonist” searches for both agonist and antagonist.
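To see why “a*gonist” covers both words, it helps to think of * as "any run of characters, including none". The following sketch translates such a pattern into a regular expression. It is an assumption about the behaviour, not the engine's real code; in particular, restricting * to non-whitespace characters and ignoring case are choices made for this example.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a search term with * wildcards into a compiled regex."""
    # Escape the literal pieces so that only * acts as a wildcard.
    parts = [re.escape(piece) for piece in pattern.split("*")]
    # Assumption: * stands for zero or more non-whitespace characters.
    return re.compile(r"\S*".join(parts), re.IGNORECASE)

def wildcard_match(pattern, word):
    """Return True if the whole word matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None

for word in ["agonist", "antagonist", "protagonist"]:
    print(word, wildcard_match("a*gonist", word))
```

Here "agonist" matches because * may stand for nothing at all, while "protagonist" does not match because it does not start with "a".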
Unlike simple text-based searching, semantic searching also lets you search for concepts and sub-concepts. Available documents have been indexed with 10 ontologies from different knowledge domains: chemistry, companies, diseases, drugs, effects, methods, proteins, species, polymers, and toxicology. Thus, if the entered search term matches a concept in one of those knowledge domains, the respective domain is shown in parentheses after the term in the auto-complete list: e.g. entering “bovine” will show “bovine (Species)” as the first proposed term. This concept can be selected by clicking on it. A subsequent search will yield documents that contain e.g. “cow” or “cattle”, as these are all synonyms of the concept in our species ontology. As a consequence, the search term “cattle” will yield the same document hits as the search term “bovine”. Another example is the knowledge domain “companies”: search for “pharmaceutical companies” to narrow down your search to companies of relevance in the life science sector. Similarly, search for “disease” instead of “indication”. After a while you will understand the strength of this approach and be able to adapt your searches accordingly.
High-level concepts are especially useful when searching for conceptual relations. Using a high-level concept such as “cancer (Disease)” as a search term will return all documents that include any term representing any type of cancer. Combining it with a second concept term, e.g. “drugs (Drug)”, in a co-occurrence search (see “What are co-occurrences?”) shows you drugs of interest for the treatment of cancer.
Please note that some terms may occur in multiple ontologies, each representing a specific semantic meaning. For example, “Vitamin D” occurs in the domains of chemistry, drugs and substance. In each of those ontologies, the “Vitamin D” concept has different synonyms and sub-concepts and will thus retrieve different document hits.
Basic Search results are presented as an adjustable list of documents that can be refined further. The length of the list can be adjusted, and the list can be sorted by “Relevance” or “Date”. The list can be limited by where the search terms or concepts are found: in the “Whole document”, in the “Abstract”, or, for Patents, in the “Claims”. The required document date can also be set via the “Release date” option. A release date search accepts a date range, or only a start or end date, given either as full dates or as years only. The release date corresponds to the respective date in the sources, e.g. the publication date for patents.
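The release date options (a range, only a start or only an end date, full dates or years only) can be illustrated with a small filter. This is a sketch under assumptions: the input formats "YYYY" and "YYYY-MM-DD", the function names, and the rule that a bare year covers the whole calendar year are not taken from opnMINER itself.

```python
from datetime import date

def parse_bound(text, is_end):
    """Parse 'YYYY' or 'YYYY-MM-DD'; None means the bound is open."""
    if text is None:
        return None
    if len(text) == 4 and text.isdigit():
        year = int(text)
        # Assumption: a year-only bound expands to the first or last day of that year.
        return date(year, 12, 31) if is_end else date(year, 1, 1)
    return date.fromisoformat(text)

def in_release_range(release, start=None, end=None):
    """Check whether a release date lies within the (possibly open) range."""
    lower = parse_bound(start, is_end=False)
    upper = parse_bound(end, is_end=True)
    if lower is not None and release < lower:
        return False
    if upper is not None and release > upper:
        return False
    return True

print(in_release_range(date(2019, 6, 15), start="2019", end="2019"))
print(in_release_range(date(2020, 1, 1), end="2019"))
print(in_release_range(date(2018, 3, 1), start="2018-03-01"))
```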
The hit lists generated by a search and their display options depend on the repository used. For example, when searching PubMed Central, the hit list can be used to find the original document (click on Original Document: PMC as shown above). It also shows a document relevancy score related to the executed query: the higher the score, the more relevant the document. The “Find: similar documents” button allows you to search for conceptually similar documents. Please be aware, however, that this search may take quite a while, as many millions of documents are classified on the fly based on the concepts found in the selected document. When no document with sufficient conceptual similarity is found, no hit list is returned.
For the Patents repository, it is also possible to restrict the search to specific parts of the document, e.g. the whole document, the abstract only, or the claims only, which narrows down the search results:
Our basic search allows searching for compounds via their synonyms or chemical names, or alternatively via their chemical classes (e.g. "steroids"). This will retrieve all mentions of such compounds or chemistry terms in text documents. For patents, we also extract compounds and their structures from images.
The structure-based chemistry search option ("Compound Search") allows you to draw the chemical structure of a compound using a chemical structure editor. After drawing, you can choose to perform a "Full" (including isotopes and isomers), "Similarity", "Substructure", or "Duplicate" (exact match of all atomic properties) chemistry search:
Alternatively, you may open a file containing a chemical structure, e.g. an SDF or SMILES file, from a folder on your local computer and import the structure(s) into the editor.
When you are done editing the compound, you can SEARCH for it in our compound database. The search returns a table of matching compound hits together with their structures, names, and IDs. The "Select And Export" selection field lets you choose compounds for export. When you click "EXPORT", a file named "compound_export.csv" is saved to your download folder; it contains the compound information for use in other chemistry programs.
Please be aware that the number of hit molecules for any chemistry query is currently limited to 100 results.
Clicking the "Compound details" link in the structure search result table returns information on the respective compound, such as parent classes, synonyms and links to other databases. Clicking the "Search in documents" link performs a search for this specific compound in the document collections, see "How to search chemical structures in documents?". The result of this search is shown in the basic search document hit list view, while the search field contains the 12-digit unique structural identifier (OCID) of the compound. The different names of the compound (here 2-methylnaphthalene) are shown in bold in the document snippets:
The search for compounds may be performed in two ways: using “Basic search” or using “Compound search”. In “Basic search”, our cognitive search engine detects when you enter a chemistry term and proposes it as a search term. For example, when you enter “aspirin”, the engine knows that it is either a chemical compound term or, when used as a drug ingredient, a drug term. Searching for either of the two concepts will return different results, as these concepts may have different synonyms and meanings; these are displayed for your information below the search field.
However, you may also search for terms that describe chemical classes, e.g. “steroids”, instead of a specific chemical compound. This search is especially powerful because our cognitive search engine knows what steroids are and uses all compounds that are classified as steroids for searching, e.g. estradiol or testosterone.
When searching for chemical compounds using “Compound search”, our constantly growing chemistry database returns a list of compounds with information such as synonyms and chemical structure. Documents that contain these structures can then be retrieved via the "Search in documents" option in the resulting compound hit list, providing an alternative way of discovering documents that contain a specific chemical structure.
The “Search for Co-occurrences” search strategy allows searching for sentences that contain two concepts of interest. For example, if you are interested in which enzyme proteins/genes are connected to the disease cancer, simply put “enzyme” and “cancer” into the two search fields Concept 1 and Concept 2, select a repository such as “Patents” and SEARCH:
As a search result, you will get a graphical representation of the connected, co-referenced enzymes and cancer types, in which the size of a bullet indicates how frequent the co-occurrence is. The total number of sub-bullets per query concept is, however, limited to the 20 most frequent concepts found. Thus, you get the co-occurrences of up to 20 of the most frequent sub-concepts of query concept 1 with up to 20 of the most frequent sub-concepts of query concept 2:
If you are interested in the sentences that produced these co-occurrences, click the list beneath the graph to see them as a list (the number of retrieved sentences is limited to 1000).
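The idea behind the co-occurrence search can be sketched as follows: count how often a sub-concept of Concept 1 and a sub-concept of Concept 2 appear in the same sentence, keep the most frequent sub-concepts per side, and cap the sentence list. This is a simplified illustration only. The function names, the plain substring matching, and the made-up example sentences are assumptions; the real system relies on ontology-based annotation.

```python
from collections import Counter

MAX_SUBCONCEPTS = 20   # limit per query concept, as described above
MAX_SENTENCES = 1000   # limit of the sentence list, as described above

def cooccurrences(sentences, subconcepts_1, subconcepts_2):
    """Count sub-concept pairs that appear together in one sentence."""
    pair_counts = Counter()
    hits = []
    for sentence in sentences:
        text = sentence.lower()
        # Assumption: naive case-insensitive substring matching.
        found_1 = [c for c in subconcepts_1 if c.lower() in text]
        found_2 = [c for c in subconcepts_2 if c.lower() in text]
        if found_1 and found_2:
            hits.append(sentence)
            for first in found_1:
                for second in found_2:
                    pair_counts[(first, second)] += 1
    return pair_counts, hits[:MAX_SENTENCES]

def top_subconcepts(pair_counts, side, limit=MAX_SUBCONCEPTS):
    """Most frequent sub-concepts on one side (0 = Concept 1, 1 = Concept 2)."""
    totals = Counter()
    for pair, count in pair_counts.items():
        totals[pair[side]] += count
    return [concept for concept, _ in totals.most_common(limit)]

# Made-up toy sentences, for illustration only.
sentences = [
    "AKT1 was studied in breast cancer.",
    "AKT1 and EGFR were measured in lung cancer samples.",
    "AKT1 was also measured in breast cancer tissue.",
    "This sentence mentions no concept of interest.",
]
pair_counts, hits = cooccurrences(
    sentences, ["AKT1", "EGFR"], ["breast cancer", "lung cancer"]
)
print(pair_counts[("AKT1", "breast cancer")])
print(top_subconcepts(pair_counts, side=0))
```

In the graph, the pair counts would determine the bullet sizes, and `hits` corresponds to the sentence list beneath the graph.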
Clicking on a bullet of the 2D graph, e.g. the “AKT1 family” bullet, allows you to drill down into more specific relationships:
You may also want to explore and discover important relationships of any single concept. For example, when you type “Bcl6” into Concept 1, it is recognized as a protein term. Leaving Concept 2 empty and selecting a document repository such as “NIH Grants” performs a co-occurrence search between this protein and other concepts that co-occur with it in one sentence of any NIH grant application. In the co-occurrence graph display you will see the concept types that have been identified, e.g. “Disease”. By clicking on the Disease bullet link, you re-focus on co-occurrences between “Bcl6” and “Diseases”. You can continue in this way to drill down the hierarchy of concepts to a very specific co-occurrence. Alternatively, you may select any other hierarchy, such as Chemistry, Toxicology, Effect or process, Species, Substance, Polymer, Company, Drug or Method.