In my last post I introduced the latest output from the Knowledgeblog project, the KCite plugin for adding citations and bibliographies to blog posts. In this post, I’m using the plugin to add citations to the introduction from one of my papers. The paper is “An integrated dataset for in silico drug discovery”, published last year in the Journal of Integrative Bioinformatics under an unspecified “Open Access” license [cite source=’doi’]10.2390/biecoll-jib-2010-116[/cite].
The drug development process is increasing in cost and becoming less productive. In order to arrest the decline in the productivity curve, pharmaceutical companies, biotechnology companies and academic researchers are turning to systems biology approaches to discover new uses for existing pharmacotherapies and, in some cases, to revive abandoned ones [cite]10.1038/nrd2265[/cite]. Here, we describe the use of the Ondex data integration platform for this purpose.
1.1 Drug Repositioning
There is recognition in the pharmaceutical industry that the current paradigm of research and development needs to change. Drugs based on novel chemistry still take 10-15 years to reach the market, and development costs are usually between $500 million and $2 billion [cite]10.1016/S0167-6296(02)00126-1[/cite] [cite]10.1377/hlthaff.25.2.420[/cite]. Most novel drug candidates fail in or before the clinic, and the costs of these failures must be borne by the companies concerned. These costs make it difficult even for large pharmaceutical companies to bring truly new drugs to market, and are completely prohibitive for publicly-funded researchers. An alternative means of discovering new treatments is to find new uses for existing drugs or for drug candidates for which there is substantial safety data. This repositioning approach bypasses the need for many of the pre-approval tests required of completely new therapeutic compounds, since the agent has already been documented as safe for its original purpose [cite]10.1038/nrd1468[/cite].
There are a number of examples where a new use for a drug has been discovered by a chance observation. New uses have been discovered for drugs from the observation of interesting side-effects during clinical trials, or by drug administration for one condition having unintended effects on a second. Sildenafil is probably the best-known example of the former. The drug was developed by Pfizer as a treatment for pulmonary arterial hypertension; during clinical trials, the serendipitous discovery was made that it was a potential treatment for erectile dysfunction in men. The direction of research was changed and sildenafil was renamed “Viagra” [cite]10.1056/NEJM199805143382001[/cite].
In order that a systematic approach may be taken to repositioning, a methodology that is less dependent on chance observation is required for the identification of compounds for alternative use. For instance, duloxetine (Cymbalta) was originally developed as an anti-depressant, and was postulated to be a more effective alternative to selective serotonin reuptake inhibitors (SSRIs) such as fluoxetine (Prozac). However, a secondary indication, as a treatment for stress urinary incontinence, was found by examining its mode of action [cite source=’pubmed’]7636716[/cite].
Performing such an analysis on a drug-by-drug basis is impractical, time-consuming and inappropriate for systematic screens. Nevertheless, such a re-screening approach, in which alternative single targets for existing drugs or drug candidates are sought by simple screening, has been attempted by Ore Pharmaceuticals [cite]10.1007/s00011-009-0053-3[/cite]. Systems biology provides a complementary method to manual reductionist approaches, by taking an integrated view of cellular and molecular processes. Combining data integration technology with systems approaches facilitates the analysis of an entire knowledgebase at once, and is therefore more likely to identify promising leads. This general approach, of using systems approaches to search for repositionable candidates, is also being developed by e-Therapeutics plc and others exploring network pharmacology [cite]10.1038/nchembio.118[/cite]. However, network pharmacology differs from the approach we set out here, by examining the broadest range of the interventions in the proteome caused by a molecule, and using complex network analysis to interpret these in terms of efficacy in multiple clinical indications.
1.2 The Ondex data integration and visualisation platform
Biological data exhibit a wide variety of technical, syntactic and semantic heterogeneity. To use these data in a common analysis regime, the differences between datasets need to be tackled by assigning a common semantics. Different data integration platforms tackle this complicated problem in a variety of ways. BioMart [cite]10.1093/nar/gkp265[/cite], for instance, relies on transforming disparate database schemas into a unified Mart format, which can then be accessed through a standard query interface. On the other hand, systems such as the Distributed Annotation System (DAS) take a federated approach to data integration, leaving data on multiple, distributed servers and drawing it together on a client application to provide an integrated view [cite]10.1186/1471-2105-8-333[/cite].
Ondex is a data integration platform for Systems Biology [cite]10.1093/bioinformatics/btl081[/cite], which addresses the problem of data integration by representing many types of data as a network of interconnected nodes. By allowing the nodes (or concepts) and edges (or relations) of the graph to be annotated with semantically rich metadata, multiple sources of information can be brought together meaningfully in the same graph. So, each concept has a Concept Class, and each relation a Relation Type. In this way it is possible to encode complex biological relationships within the graph structure; for example, two concepts of class Protein may be joined by an interacts_with relation, or a Transcription Factor may be joined to a Gene by a regulates relation. The Ondex data structure also allows both concepts and relations to have attributes, accessions and names. This feature means that almost any information can be attached to the graph in a systematic way. The parsing mechanism also records the provenance of the data in the graph. Ondex data is stored in the OXL data format [cite]10.2390/biecoll-jib-2007-62[/cite], a custom XML format designed for the exchange of integrated datasets, and closely coupled with the design of the data structure of Ondex.
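To make the data structure described above more concrete, here is a minimal Python sketch of a graph whose nodes carry a concept class and whose edges carry a relation type, with names, accessions, attributes and provenance attached. It is only an illustration of the idea and not the Ondex API: the class names `Concept`, `Relation` and `Graph`, the `targets` helper, and the toy identifiers such as `protein_a` and `toy_source_1` are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node in the graph (hypothetical model, not the Ondex API)."""
    id: int
    concept_class: str                       # e.g. "Protein", "Gene"
    names: list = field(default_factory=list)
    accessions: dict = field(default_factory=dict)
    attributes: dict = field(default_factory=dict)
    provenance: str = "unknown"              # which source the node came from


@dataclass
class Relation:
    """A typed, directed edge between two concepts."""
    source: int
    target: int
    relation_type: str                       # e.g. "interacts_with", "regulates"
    attributes: dict = field(default_factory=dict)
    provenance: str = "unknown"


class Graph:
    def __init__(self):
        self.concepts = {}    # concept id -> Concept
        self.relations = []   # list of Relation

    def add_concept(self, concept_class, name, provenance, **attributes):
        concept = Concept(
            id=len(self.concepts),
            concept_class=concept_class,
            names=[name],
            attributes=attributes,
            provenance=provenance,
        )
        self.concepts[concept.id] = concept
        return concept

    def add_relation(self, source, target, relation_type, provenance, **attributes):
        relation = Relation(source.id, target.id, relation_type, attributes, provenance)
        self.relations.append(relation)
        return relation

    def targets(self, source, relation_type):
        """Concepts reached from `source` by relations of the given type."""
        return [
            self.concepts[r.target]
            for r in self.relations
            if r.source == source.id and r.relation_type == relation_type
        ]


# Toy data, invented purely for illustration.
graph = Graph()
protein_a = graph.add_concept("Protein", "protein_a", provenance="toy_source_1")
protein_b = graph.add_concept("Protein", "protein_b", provenance="toy_source_1")
tf = graph.add_concept("Transcription Factor", "tf_1", provenance="toy_source_2")
gene = graph.add_concept("Gene", "gene_1", provenance="toy_source_2")

graph.add_relation(protein_a, protein_b, "interacts_with",
                   provenance="toy_source_1", confidence=0.9)
graph.add_relation(tf, gene, "regulates", provenance="toy_source_2")

regulated = graph.targets(tf, "regulates")
print([c.names[0] for c in regulated])
```

Because every concept and relation keeps its own provenance, data parsed from different sources can sit in the same graph without losing track of where each piece came from.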
The Ondex framework therefore combines large-scale database integration with sequence analysis, text mining and graph-based analysis. The system is not only useful for integrating disparate data, but can also be used as a novel analysis platform.
Using Ondex, we have built an integrated dataset of around 120,000 concepts and 570,000 relations to visualise the links between drugs, proteins and diseases. We have included information from a wide variety of publicly available databases, allowing analysis on the basis of: drug molecule similarity; protein similarity; tissue-specific gene expression; metabolic pathways; and protein family analysis. We analysed this integrated dataset to highlight known examples of repositioned drugs, and their connectivity across multiple data sources. We also suggest methods of automated analysis for discovery of new repositioning opportunities on the basis of indicative semantic motifs.
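One way to picture a search for a semantic motif is as a search for paths whose concept classes and relation types follow a fixed pattern. The Python sketch below is an assumption-laden illustration, not the method used in the paper: the function `find_motif`, the relation types `binds_to` and `involved_in`, and the toy node names are all hypothetical.

```python
from collections import defaultdict


def find_motif(classes, relations, motif):
    """Return all paths matching an alternating class/relation pattern.

    classes:   dict mapping node name -> concept class
    relations: list of (source, relation_type, target) tuples
    motif:     [class, relation_type, class, relation_type, class, ...]
    """
    # Index outgoing edges by (source node, relation type).
    out_edges = defaultdict(list)
    for source, rel_type, target in relations:
        out_edges[(source, rel_type)].append(target)

    node_classes = motif[0::2]
    rel_types = motif[1::2]

    # Start from every node of the first class, then extend one step at a time.
    paths = [[node] for node, cls in classes.items() if cls == node_classes[0]]
    for rel_type, wanted_class in zip(rel_types, node_classes[1:]):
        paths = [
            path + [nxt]
            for path in paths
            for nxt in out_edges[(path[-1], rel_type)]
            if classes[nxt] == wanted_class and nxt not in path
        ]
    return paths


# Toy graph, invented purely for illustration.
classes = {
    "drug_A": "Drug",
    "drug_B": "Drug",
    "protein_X": "Protein",
    "protein_Y": "Protein",
    "disease_1": "Disease",
}
relations = [
    ("drug_A", "binds_to", "protein_X"),
    ("drug_B", "binds_to", "protein_Y"),
    ("protein_X", "involved_in", "disease_1"),
]

motif = ["Drug", "binds_to", "Protein", "involved_in", "Disease"]
matches = find_motif(classes, relations, motif)
print(matches)  # [['drug_A', 'protein_X', 'disease_1']]
```

In this toy graph only `drug_A` completes the pattern, since `protein_Y` has no link to a disease. On a dataset of the size described above, a dedicated graph query engine would likely be a better fit than this brute-force path extension.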