Brand new same old job

The week before last I had my first job interview in 6 years (actually 6 years and one day, to be precise), and I’m delighted that 15 minutes after I left the interview room, I was offered the job.

I wasn’t expecting to get a new job this year; I’m perfectly happy where I am. I love my job. I enjoy the challenge of continually changing focus to work on different people’s projects, and of driving multiple lines of research all at once. However, this was an opportunity I simply couldn’t ignore. Because it is the same job, only better. Heck, I don’t even have to move desks (if I don’t want to).

See, this all came about because my esteemed colleague, and the founding head of the Newcastle Bioinformatics Support Unit, Daniel Swan, has decided to move to pastures new, at Oxford Gene Technology. This means there was an opening to do pretty much what I already do, but while running the show too.

I’m very excited to have this opportunity. Dan will be sorely missed – I have very big boots to fill – but I’ll be working very hard to make sure the unit goes from strength to strength. Also, since I have effectively vacated my old job, we will be recruiting very shortly to fill that gap too. So watch this space if you’re interested in working in Bioinformatics support in the North East.

Announcing a Bioinformatics Kblog writeathon

(Reposted from Knowledgeblog.org)

The Knowledgeblog team is holding a ‘writeathon’ to produce content for a tutorial-focused bioinformatics kblog.

The event will be taking place in Newcastle on the 21st June 2011. We’re looking for volunteer contributors who would like to join us in Newcastle on the day, or would like to contribute tutorial material remotely to the project.

We will shortly be sending invitations to a few selected contributors, but are looking for 15 to 20 participants in total.

Travel and accommodation costs (where appropriate) can be reimbursed.

If you would like to contribute tutorial material on microarray analysis, proteomics, next-generation sequencing, bioinformatics workflow development, bioinformatics database resources, network analysis or data integration, and receive a citable DOI for your work, please get in touch with us at admin@knowledgeblog.org

For more information about Knowledgeblog, please see http://knowledgeblog.org. For examples of existing Knowledgeblogs, please see http://ontogenesis.knowledgeblog.org and http://taverna.knowledgeblog.org.

Automatic citation processing with Zotero and KCite

Writing papers. It’s a pain, right? Journals are finicky about formatting. You write the content and then the journal wants you to make it look right. You finally get the content in the right shape and then they tell you that you’ve formatted the bibliography wrong. Your bibliography is clearly in Harvard format, when the journal only accepts papers where the bibliography is formatted Chicago style. Another hour or two of spitting and cursing as you try to massage the citations and bibliography into the “correct” format. You’re not even allowed to cite everything you want to, because the internet is clearly so untrusted a resource.

I’m of the opinion that publishing should be lightweight; the publishers should get out of the way of the author’s process, not actively get in the way. Working on the Knowledgeblog project has only reinforced this opinion. Why should I spend days formatting the content, when any web content management system (CMS) worth its salt will take raw content and format it in a consistent way? Why should I process all the citations and format the bibliography when it should be (relatively) simple to do this in software? Why should I spend time producing complicated figures that compromise what I am able to show, when data+code would give the reader far more power to visualise my results themselves?

This document is written in Word 2007 on a Windows 7 virtual machine. On this virtual machine I have also installed Standalone Zotero. The final piece of this particular jigsaw is a Citation Style Language (CSL) style document I wrote (you can download it from the Knowledgeblog Google Code site) that formats a citation in such a way that KCite, Knowledgeblog’s citation engine, can understand it. Now, when I insert citations into my Word document via the Zotero Add-In, I can pick the “KCite” style from the list, and the citation is popped into my document. Then, when I hit “Publish” in Word, the document is pushed to my blog; KCite sees the citation Zotero added and processes it, producing a nicely formatted bibliography. We are also working on a citeproc-js implementation that will let the reader format this bibliography any way they choose (Phil has a working prototype of this). The biggest current limitation is that your Zotero library entry must have a DOI in it for everything to join up.

So, here is a paragraph with some (contextually meaningless) citations in it [cite]10.1006/jmbi.1990.9999[/cite]. All citations have been added into the Word doc via Zotero, and processed in the page you’re viewing by KCite [cite]10.1073/pnas.0400782101[/cite]. Adding a reference into the document from your Zotero library takes 3-4 clicks, no further processing is needed [cite]10.1093/bioinformatics/btr134[/cite].

Other popular reference management tools, such as Mendeley and Papers, also use CSL styles to format citations and bibliographies, so this same style could be employed to enable KCite referencing with those tools as well. This opens up a wide range of possible tool chains for effective blogging. Mendeley + OpenOffice on Ubuntu. Papers + TextMate on OS X (Papers can be used to insert citations into more than just office suite documents – more on that in a later post). The possibilities are broad (but not endless, not yet anyway). Hopefully this means many people’s existing authoring toolchain is already fully supported by Knowledgeblog.

Image credit: http://www.flickr.com/photos/sybrenstuvel/2468506922/ (Sybren Stüvel on Flickr)

Stack Exchange and the future of BioStar

Over the weekend I saw this tweet from Stack Overflow/Exchange founder Joel Spolsky. The content of the link he posted has crystallised some of my thinking over the last couple of weeks in relation to the Bioinformatics question and answer site BioStar.

The link Spolsky posted in the tweet was to a failed Stack Exchange proposal, and I found the page interesting not for the proposal, or the fact that it failed, but for the clearly enumerated reasons why it failed. Here’s a screenshot:

Atheism SE Proposal Screen Cap

To clarify the procedure here, new Stack Exchange sites are proposed by a community of users. That community was originally drawn from Stack Overflow, the extremely successful programming Q&A site, but now that there are nearly 50 active sites, the available community of proposers is much larger. Newly proposed sites have to overcome a series of hurdles before they go live: from proposal, through commitment and a private beta, to a public beta, before finally becoming a fully-fledged SE site. At the end of each of these stages, sites are assessed for the likelihood that they will become healthy and active. Crucially, this assessment does not appear to be tailored to the individual site. It is obviously the view of the SE powers-that-be that all Q&A sites are created equal, and what works for one will work for all of them. What is worrying about this attitude is that sites that are genuinely niche, and likely to have a small but active and dedicated community, will be left by the wayside, since presumably they will be unable to generate the kind of ad revenue that Spolsky et al. are going to require to repay their investors.

BioStar is a web community reaching a crossroads. The site is running on the now-free, but inevitably unsupported Stack Exchange 1.0 platform (the process discussed above is for the SE 2.0 community). To continue to thrive, I firmly believe the site needs to move on from this platform, since it is almost certainly going to be closed down from under it within the next 12-18 months. This presents the site owners (and us, the community) with a choice.

  1. Migrate the site to SE 2.0
  2. Change to an open-source alternative Q&A platform
  3. Roll our own site, with the functionality we require

I will start by ruling out option 3. Bioinformatics teaches us the perils of reinventing the wheel when it is not necessary. An effort to write a custom-built platform for BioStar would be almost entirely redundant, would be undertaken on the free time of the community (time which could be better spent answering questions on BioStar), and would almost certainly offer no tangible benefit over using one of the already available Q&A engines. (Think Facebook-for-Scientists…)

I used to be firmly in the camp supporting option 1. I genuinely love Stack Overflow. I have found great utility in some of the Stack Exchange family of sites. However, the attitude betrayed in both Spolsky’s tweet and the closure notice on the Atheism Stack Exchange site makes me think that BioStar would be left out in the cold if we attempted this migration. Let’s look at how BioStar measures up to these numbers (there is a quick check of the arithmetic, in code, after the list):

  • Questions per day (SE 2.0 recommends – “15 questions per day on average is a healthy beta”)
    • Since 30th September 2009 BioStar has received 1,681 questions – that’s 3.13 questions per day
  • Percentage answered (SE 2.0 – “90% answered is a healthy beta”)
    • BioStar does well here. There are currently 47 questions with no upvoted answers – about 2.8%
  • User group (SE 2.0 – 150 users with 200+, 10 with 2,000+, 5 with 3,000+)
    • We have 14 users with 3,000+, 24 with 2,000+ and (by my count) 142 with 200+. But BioStar has been going for 18 months; the Atheism SE site was shut down after 2 months in public beta
  • Answer ratio (SE 2.0 – “2.5 answers per question is good”)
    • I don’t have easy access to precise numbers for this, but it’s around 3 answers per question on BioStar
  • Visits per day (SE 2.0 – “1,500 visits per day is good, 500 visits per day is worrying.”)
    • I have no stats at all for this, but I’m willing to put good money on the fact that daily numbers are much closer to 500 than 1,500.
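
For the curious, here is a quick Python check of the arithmetic behind the first two bullets. The snapshot date is my assumption (roughly when this post was written), not a figure from the site:

```python
from datetime import date

# Sanity check of the BioStar numbers quoted above. The snapshot
# date is an assumption, not taken from the site itself.
launch = date(2009, 9, 30)
snapshot = date(2011, 3, 21)  # assumed "today" for this post

total_questions = 1681
unanswered = 47  # questions with no upvoted answer

days_live = (snapshot - launch).days  # 537 days
print(f"Questions per day: {total_questions / days_live:.2f}")  # ~3.13
print(f"Unanswered: {100 * unanswered / total_questions:.1f}%")  # ~2.8%
```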

By these criteria, and judging by the Atheism Stack Exchange linked to by Spolsky, BioStar would fail to emerge from SE 2.0 beta on current numbers, and any effort the existing community put in to get it that far would be wasted. Nor do I think the site’s audience would grow dramatically simply because it became a Stack Exchange 2.0 site. I think we have to accept that Bioinformatics is a niche subject with a relatively small potential audience, one that is not going to be especially interesting to a commercially driven exercise (such as Stack Exchange necessarily has to be).

So that leaves us with migration to an OSS alternative as the only remaining option. There are a number of platforms available, some of which offer an experience extremely close to ‘real’ Stack Exchange. I would pick one of these that allows an existing SE XML dump to be imported, and migrate the site as soon as possible, certainly within the next 6 months. There is no question that the changeover will be painful, and will probably cost the site a few users, and some traffic in the first instance (the biostar.stackexchange.com URL will have to go, for example), but I am confident in the community that has been built around the site – it will survive, and will be all the stronger for the change.

Besides, if we look at the facts in the cold, hard light of day, we really have no choice.

CASE PhD studentship in Bioinformatics available

I’m delighted to announce we’re offering a PhD studentship, commencing in October. I’ve spent most of my time on the Ondex project building an integrated network focussed on drug repositioning (see [cite source=’doi’]10.2390/biecoll-jib-2010-116[/cite]). I’m very excited that we’ve managed to secure a CASE studentship, in collaboration with Philippe Sanseau at GSK, to continue and considerably extend this work. Full details below.

Where? – Newcastle University – School of Computing Science

What? – Development of Novel Computational Approaches to Mine Integrated Datasets for Drug Repurposing Opportunities

The blurb

We invite applications for a CASE PhD studentship in Bioinformatics at Newcastle University in the North East of England. The project is a 3-year EPSRC PhD sponsored by GlaxoSmithKline (GSK) and involves the development of novel methods of finding new targets for existing drugs using data integration.

Ondex is a data integration computational platform for Systems Biology (SB). The student will research the optimization and application of Ondex integrated datasets to the identification of repurposing opportunities for existing compounds with a particular, but not exclusive, focus in the infectious diseases therapeutic area. The student will also use the dataset to explore the interplay between microbial targets and perturbations in the metabolic and community structure of the human gut microbiome.

An ideal student will have a background in computing science, good programming skills (preferably in Java), and an interest in biology and bioinformatics. Applicants should also possess an upper second class undergraduate degree. Only students who meet the EPSRC home student requirements are eligible for a full award (fees and stipend); other EU students are eligible for fees-only support. Students from outside the EU are not eligible to apply – please see the EPSRC website for details.

The studentship will start in October 2011, jointly supervised by Prof. Anil Wipat and Dr. Simon Cockell at Newcastle University, and Dr. Philippe Sanseau at GSK. The student will spend at least three months at GSK in Stevenage as part of the project. Home students are eligible for payment of full fees and an enhanced stipend of approximately £18,000 tax free. To apply, please send an email to [anil dot wipat at ncl dot ac dot uk] with a CV (including the contact details of at least two referees) and a cover letter indicating your suitability for the position. Please include “Application CASE PhD” in the subject of the email. Applications will be dealt with as they arrive – there is no closing date.



The Problem with DOIs

This article was jointly authored by Phillip Lord and Simon Cockell.

Rhodopsin is a protein found in the eye, which mediates low-light-level vision. It is one of the 7-transmembrane domain proteins and is found in many organisms, including humans.

Rhodopsin has a number of identifiers attached to it, which allow you to get additional data about the protein. For instance, the human version is identified by the string “OPSD_HUMAN” in UniProt. If you wish, you can go to http://www.uniprot.org/OPSD_HUMAN and find additional information. Actually, this URI redirects to http://www.uniprot.org/P08100.html. P08100 is an alternative (semantic-free) identifier for the same protein; P08100 is called the accession number and it is stable, as you can read in the user manual. If you don’t like the HTML presentation, you can always get the traditional structured text so beloved of bioinformatics; this is at http://www.uniprot.org/P08100.txt. Or the UniProt XML (that is at http://www.uniprot.org/P08100.xml). Or http://www.uniprot.org/P08100.rdf if you want RDF. If you just want the sequence, that is at http://www.uniprot.org/P08100.fasta, or http://www.uniprot.org/P08100.gff if you want the sequence features. You might be worried about changes over time, in which case you can see all versions at http://www.uniprot.org/uniprot/P08100?version=*. Or if you are worried about changes in the future, then http://www.uniprot.org/uniprot/P08100.rss?version=* is the place to be. Obviously, if you want to move outward from here to the DNA sequence, or a report about the protein family, or any of the domains, then all of that is linked from here. If you don’t want to code this for yourself, there are libraries in Perl, Python and Java which will handle these forms of data for you.
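
Since the paragraph above is really describing a tiny REST-style API, here is a minimal Python sketch of that access pattern, using the URLs exactly as listed; treat it as an illustration rather than an official UniProt client:

```python
import urllib.request

# Fetch the same UniProt record in several of the formats listed
# above; only the file extension changes.
BASE = "http://www.uniprot.org/"
ACCESSION = "P08100"  # the stable accession for OPSD_HUMAN

for fmt in ("txt", "xml", "rdf", "fasta", "gff"):
    url = f"{BASE}{ACCESSION}.{fmt}"
    with urllib.request.urlopen(url) as response:
        data = response.read()
    print(f"{url} -> {len(data)} bytes")
```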

So this might be overkill, but the point is surely clear enough. It’s very easy to get the data in a wide variety of formats, through stable identifiers. The history is clear, and the future as clear as it can be. The technology is simple, and straightforward for both humans and computers to access. The world of the biologist is a good place to be.

What does this have to do with DOIs? Let’s consider a selection of publications from one of us. Of course, one of the nice things about DOIs is that you can convert them into URIs. But what do they point to? Well, a variety of different things. Maybe the full HTML article. Or, perhaps an HTML abstract and a picture of the front page. Or more links. Or, bizarrely, a list of the author biographies. Or just another image of a printout of the front page of an identified digital object.

These are a selection from our conference and journal publications. Obviously, this doesn’t cover many of our conference papers, as most don’t have DOIs unless they are published by a big publisher. Or our books. These are published by big publishers, but obviously they are books, which is different. We’ve also organised or been on the PC for a number of workshops. They don’t have DOIs either. All of them do have URIs.

In no case can we guarantee that what we see today will be the same as what we get tomorrow, even though DOIs are supposedly persistent. The presentation of the HTML on those pages that display HTML is wildly different; in many cases, there is no standard metadata. Given the DOI, there doesn’t appear to be a standard way to get hold of the metadata. If you poke around really hard on the DOI website, you may get to http://www.doi.org/tools.html. At this point, you probably already know about http://dx.doi.org, which allows you to resolve a DOI through HTTP. The list of links doesn’t take that long to work through, so you might eventually get to http://www.crossref.org. From here, you can perform searches, including extracting metadata for articles; obviously, you need to register, and you need an API key for this. It doesn’t always work, so if that fails, you can try http://www.pubmed.org, which returns metadata for some DOIs that CrossRef doesn’t, but doesn’t hold a DOI for every publication it lists (even those that have them), so it also fails in unpredictable ways.
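
To make the contrast concrete, here is a short Python sketch that resolves a DOI through http://dx.doi.org. The DOI used is one cited elsewhere in these posts, and the point is that you cannot know in advance what kind of page the redirect will land on:

```python
import urllib.request

# Resolve a DOI via the dx.doi.org HTTP proxy. urlopen follows the
# redirect chain; the final URL is whatever landing page the
# publisher registered - an article, an abstract, a front-page
# image... there is no way to tell without fetching it.
doi = "10.1093/bioinformatics/btl081"  # the Ondex paper, cited below
url = "http://dx.doi.org/" + doi

with urllib.request.urlopen(url) as response:
    print("Resolved to: ", response.geturl())
    print("Content type:", response.headers.get("Content-Type"))
```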

The difference between the two situations couldn’t really be clearer. Within biology, we have an open, accessible and usable system. With DOIs, we don’t. The DOI handbook spends an awful lot of time describing the advantages of DOIs for publishers; very little is spent on the advantages for the people generating and accessing the content. It is totally unclear to us what use case DOIs are trying to address from our point of view; whatever it is, they certainly seem to fail in their purpose.

So, why do we care about this? Well, recently, we have been implementing DOIs for kblogs. Ontogenesis articles now all have DOIs. When we were originally thinking about kblogs, our investigations on how to mint new DOIs came to very little. If DOIs are hard to use, creating them is even worse: you need a Registration Authority, and setting this up within a university would be a nightmare. Compare this to the £9 credit card transaction required for a domain name (even this can be quite hard in a University setting!). In the end, we have managed to achieve this using DataCite. Ironically, they are misusing technology intended for articles to represent data; we are misusing DataCite to represent articles again. We also have to keep a hard record of our own of the DOIs we have minted because, despite the fact that all this information is stored in the DataCite database, there is no way of discovering if a DOI points at a given URL using the DataCite API, so we have no way of doing a reverse lookup from a blog post to discover its DOI.
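
For illustration, the sort of ‘hard record’ we are forced to keep could be as simple as a JSON file mapping post URLs to DOIs. This sketch is hypothetical (the file name and structure are mine), not the actual kblog implementation:

```python
import json

# Minimal reverse-lookup store: post URL -> minted DOI. Needed only
# because the DataCite API offers no way to ask "which DOI points at
# this URL?". Entirely illustrative; not what kblog actually uses.
STORE = "minted_dois.json"

def record_doi(url, doi, store=STORE):
    try:
        with open(store) as f:
            mapping = json.load(f)
    except FileNotFoundError:
        mapping = {}
    mapping[url] = doi
    with open(store, "w") as f:
        json.dump(mapping, f, indent=2)

def doi_for_post(url, store=STORE):
    with open(store) as f:
        return json.load(f).get(url)
```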

We’ve also created a referencing system for WordPress. This does DOI lookups for the user, currently using CrossRef, or PubMed. We are not sure yet whether we can retrieve DataCite metadata in this way also.

The irony of this is that it is all totally pointless. WordPress already creates permalinks, based on a URI. These URIs are trackback/pingback capable so can be used bi-directionally. We have added support so that URIs maintain their own version history, so that you can see all previous versions. If you do not trust us, or if we go away, then URIs are archived and versioned by the UK Web archive. Currently, we are adding features for better metadata support, which will use a simple REST style API like Uniprot. Hopefully, multiple format and subsection access will follow also.

So, why are we using DOIs at all? For the same reason as DataCite, which has as one of its aims “to increase acceptance of research data as legitimate, citable contributions to the scientific record”. We need DOIs for kblog because, although DOIs are pointless, they have become established, they are used for assigning credit, and they are used as a badge of worth. We find it unfortunate that, in the process of using DOIs, we are supporting their credentials as a badge of worth, but it seems the course of least resistance.

Blogging with KCite – a real world test

In my last post I introduced the latest output from the Knowledgeblog project, the KCite plugin for adding citations and bibliographies to blog posts. In this post, I’m using the plugin to add citations to the introduction from one of my papers. The paper is “An integrated dataset for in silico drug discovery”, published last year in the Journal of Integrative Bioinformatics under an unspecified “Open Access” license [cite source=’doi’]10.2390/biecoll-jib-2010-116[/cite].

1. Introduction

The drug development process is increasing in cost and becoming less productive. In order to arrest the decline in the productivity curve, pharmaceutical companies, biotechnology companies and academic researchers are turning to systems biology approaches to discover new uses for existing pharmacotherapies, and in some cases, reviving abandoned ones [cite]10.1038/nrd2265[/cite]. Here, we describe the use of the Ondex data integration platform for this purpose.

1.1 Drug Repositioning

There is recognition in the pharmaceutical industry that the current paradigm of research and development needs to change. Drugs based on novel chemistry still take 10-15 years to reach the market, and development costs are usually between $500 million and $2 billion [cite]10.1016/S0167-6296(02)00126-1[/cite] [cite]10.1377/hlthaff.25.2.420[/cite]. Most novel drug candidates fail in or before the clinic, and the costs of these failures must be borne by the companies concerned. These costs make it difficult even for large pharmaceutical companies to bring truly new drugs to market, and are completely prohibitive for publicly-funded researchers. An alternative means of discovering new treatments is to find new uses for existing drugs or for drug candidates for which there is substantial safety data. This repositioning approach bypasses the need for many of the pre-approval tests required of completely new therapeutic compounds, since the agent has already been documented as safe for its original purpose [cite]10.1038/nrd1468[/cite].

There are a number of examples where a new use for a drug has been discovered by a chance observation. New uses have been discovered for drugs from the observation of interesting side-effects during clinical trials, or by drug administration for one condition having unintended effects on a second. Sildenafil is probably the best-known example of the former; this drug was developed by Pfizer as a treatment for pulmonary arterial hypertension; during clinical trials, the serendipitous discovery was made that the drug was a potential treatment of erectile dysfunction in men. The direction of research was changed and sildenafil was renamed “Viagra” [cite]10.1056/NEJM199805143382001[/cite].

In order that a systematic approach may be taken to repositioning, a methodology that is less dependent on chance observation is required for the identification of compounds for alternative use. For instance, duloxetine (Cymbalta) was originally developed as an antidepressant, and was postulated to be a more effective alternative to selective serotonin reuptake inhibitors (SSRIs) such as fluoxetine (Prozac). However, a secondary indication, as a treatment for stress urinary incontinence, was found by examining its mode of action [cite source=’pubmed’]7636716[/cite].

Performing such an analysis on a drug-by-drug basis is impractical, time consuming and inappropriate for systematic screens. Nevertheless, such a re-screening approach, in which alternative single targets for existing drugs or drug candidates are sought by simple screening, has been attempted by Ore Pharmaceuticals [cite]10.1007/s00011-009-0053-3[/cite]. Systems biology provides a complementary method to manual reductionist approaches, by taking an integrated view of cellular and molecular processes. Combining data integration technology with systems approaches facilitates the analysis of an entire knowledgebase at once, and is therefore more likely to identify promising leads. This general approach, of using Systems approaches to search for repositionable candidates, is also being developed by e-Therapeutics plc and others exploring Network Pharmacology [cite]10.1038/nchembio.118[/cite]. However, network pharmacology differs from the approach we set out here, by examining the broadest range of the interventions in the proteome caused by a molecule, and using complex network analysis to interpret these in terms of efficacy in multiple clinical indications.

1.2 The Ondex data integration and visualisation platform

Biological data exhibit a wide variety of technical, syntactic and semantic heterogeneity. To use these data in a common analysis regime, the differences between datasets need to be tackled by assigning a common semantics. Different data integration platforms tackle this complicated problem in a variety of ways. BioMart [cite]10.1093/nar/gkp265[/cite], for instance, relies on transforming disparate database schemas into a unified Mart format, which can then be accessed through a standard query interface. On the other hand, systems such as the Distributed Annotation System (DAS) take a federated approach to data integration, leaving data on multiple, distributed servers and drawing it together in a client application to provide an integrated view [cite]10.1186/1471-2105-8-333[/cite].

Ondex is a data integration platform for Systems Biology [cite]10.1093/bioinformatics/btl081[/cite], which addresses the problem of data integration by representing many types of data as a network of interconnected nodes. By allowing the nodes (or concepts) and edges (or relations) of the graph to be annotated with semantically rich metadata, multiple sources of information can be brought together meaningfully in the same graph. So, each concept has a Concept Class, and each relation a Relation Type. In this way it is possible to encode complex biological relationships within the graph structure; for example, two concepts of class Protein may be joined by an interacts_with relation, or a Transcription Factor may be joined to a Gene by a regulates relation. The Ondex data structure also allows both concepts and relations to have attributes, accessions and names. This feature means that almost any information can be attached to the graph in a systematic way. The parsing mechanism also records the provenance of the data in the graph. Ondex data is stored in the OXL data format [cite]10.2390/biecoll-jib-2007-62[/cite], a custom XML format designed for the exchange of integrated datasets, and closely coupled with the design of the data structure of Ondex.
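
As a rough illustration of that data structure (Ondex itself is a Java platform; this Python sketch and its example names are mine, not the Ondex API), a concept/relation graph might look like this:

```python
from dataclasses import dataclass, field

# Toy model of an Ondex-style graph: semantically typed nodes
# (concepts) and edges (relations), each able to carry attributes,
# accessions and names. Purely illustrative.
@dataclass
class Concept:
    name: str
    concept_class: str                  # e.g. "Protein", "Gene"
    accessions: dict = field(default_factory=dict)
    attributes: dict = field(default_factory=dict)

@dataclass
class Relation:
    source: Concept
    target: Concept
    relation_type: str                  # e.g. "interacts_with", "regulates"
    attributes: dict = field(default_factory=dict)

# A transcription factor regulating a gene (names are hypothetical):
tf = Concept("ExampleTF", "Transcription Factor")
gene = Concept("ExampleGene", "Gene", accessions={"source_db": "XYZ123"})
edge = Relation(tf, gene, "regulates", attributes={"provenance": "parser-X"})
```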

The Ondex framework therefore combines large-scale database integration with sequence analysis, text mining and graph-based analysis. The system is not only useful for integrating disparate data, but can also be used as a novel analysis platform.

Using Ondex, we have built an integrated dataset of around 120,000 concepts and 570,000 relations to visualise the links between drugs, proteins and diseases. We have included information from a wide variety of publicly available databases, allowing analysis on the basis of: drug molecule similarity; protein similarity; tissue specific gene expression; metabolic pathways and protein family analysis. We analysed this integrated dataset to highlight known examples of repositioned drugs, and their connectivity across multiple data sources. We also suggest methods of automated analysis for discovery of new repositioning opportunities on the basis of indicative semantic motifs.

KCite – easy citations in WordPress

I’m excited about this one.

For a couple of months now, I’ve been working on a referencing plugin for Knowledgeblog. The idea is to make it easy for authors to add citations to their posts, and have a bibliography produced automatically. Key to this approach (as with everything we’re doing on Knowledgeblog) is enabling authors to use their pre-existing workflow. So, if they are used to writing documents/papers in Word, they should be able to continue using it for writing posts for Knowledgeblog. If, on the other hand, they prefer to write collaboratively using Google Docs, we shouldn’t put unnecessary obstacles in their path, and so on. So the tool that we have produced, called KCite, uses simple text-based tags to process citations. These tags can be added from any platform (they are extremely simple to just type in), and WordPress will interpret them when it renders the post.

There is no attempt to manage references – to create a database and allow selection from that database when adding new citations. This is quite deliberate: researchers already use such tools; they are external to WordPress and (as yet) incompatible with it. By keeping the system as simple (I hope) as possible, citations should be perfectly manageable by copy-and-paste from a browser or reference manager of your choosing, into the tool of your choosing.

I will publish an example of the plugin in action as a separate post, but in short the idea is that you surround either a DOI or a PMID with a cite shortcode. The plugin queries the CrossRef API or PubMed (via NCBI eUtils) in order to retrieve metadata about each publication, and uses that data to build the bibliography, which is then appended to the foot of the post. As yet this is far from being completely generic, and there will be circumstances where the lookup fails, but I have attempted to handle these situations as gracefully as possible, so hopefully a usable bibliography will be produced in as many cases as possible.
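
As an illustration of the PMID half of that lookup (KCite itself is a WordPress/PHP plugin, so this Python is a sketch of the idea rather than the plugin’s code), here is roughly what a metadata query against NCBI eUtils looks like, using a PMID cited earlier in these posts:

```python
import urllib.request
import xml.etree.ElementTree as ET

# Look up basic metadata for a PubMed ID via NCBI eUtils esummary.
# This mirrors the kind of query KCite makes; it is not KCite itself.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"

def pubmed_summary(pmid):
    with urllib.request.urlopen(f"{EUTILS}?db=pubmed&id={pmid}") as response:
        docsum = ET.fromstring(response.read()).find("DocSum")
    return {
        "title": docsum.find("Item[@Name='Title']").text,
        "journal": docsum.find("Item[@Name='Source']").text,
        "date": docsum.find("Item[@Name='PubDate']").text,
        "authors": [a.text for a in docsum.find("Item[@Name='AuthorList']")],
    }

print(pubmed_summary(7636716))  # the duloxetine reference
```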

This is a 0.1 release, intended almost as a preview. The plugin is currently nowhere near what we would consider to be feature complete. There are a number of things on my TODO list to address over the next few weeks, but I would welcome feature requests and bug reports. You can follow development, and contact us, through the Google Code page for Knowledgeblog.

A final reminder, you can download KCite from http://wordpress.org/extend/plugins/kcite/.

The Taverna Knowledgeblog

Today I am sat in a room with a fairly large group of people, who all work on the Taverna project. They are writing a Knowledgeblog book about the workflow manager, and I am providing help and technical assistance as a part of my role on the Knowledgeblog project. As well as producing a hopefully useful product (a beginner’s guide to Taverna), we are testing some of the procedures and products that we have been working on over the last few months on the project.

Posts on a Knowledgeblog now have several features that were in our plan for the project. Specifically, post revisions are now publicly exposed, providing a public provenance trail and preventing someone from ‘unsaying’ anything without the proper process. The editorial workflow is also better defined than it was for Ontogenesis (the Knowledgeblog prototype), so requests for reviews, and the provision of the reviews themselves, should be more streamlined; despite the approach we have taken today, the workflow does not require all of the collaborators on a publication to be sitting in the same room (for this we are using the excellent EditFlow plugin, which provides ‘editorial comments’ on posts, and can fire email events upon certain pre-defined operations).

Posts can have multiple authors, which, combined with the ability to author posts in genuinely collaborative tools such as Google Docs (as opposed to totally non-collaborative tools like Word documents shared by email, although you can write posts like that too if you like), allows jointly authored posts to be both simple to generate and properly attributed. Finally, easy to generate tables of contents, for both posts and whole sites, makes navigating the content simple.

There are still a number of pieces of the puzzle that need to be slotted into place for us to have a fully functional platform, but I can’t help but feel we’re getting there. As I mentioned, I was here for technical support, and I didn’t really have a massive amount to do today (I spent most of it tinkering with the chosen theme to get it to support CoAuthors Plus).

The next major step will be a plugin, which I am currently writing, to assist with citing papers and generating bibliographies – more on that in a future post. I agree with many of Martin Fenner’s points in his post of a few days ago: citations are not currently well supported by WordPress, or by any plugin so far. I am working on the dynamic generation of citations and bibliographies from specific tags within posts. This should allow simple management of referencing by authors, and provide a range of tools for readers of articles, such as BibTeX/RIS export and on-the-fly bibliography reformatting.