
Making an RSS feed from Google Reader shared items

A month ago there would have been no reason to write this post, because Google Reader made its own RSS feed of the posts you wanted to share. See, Google wants to drive people to use Google+, and they seem to be doing this by crippling their other services that have even a smidgen of social usefulness. The thing is, I liked the social bits of Google Reader. Sure, they were a bit of an afterthought, and mildly dysfunctional, but I got a lot of value from reading what other people shared, and I liked that by sharing things I was building a kind of archive of stuff I liked on Reader, with an RSS feed of its very own.

The great thing about RSS is that it can be consumed by arbitrary third-party applications. The likes of Google, Facebook and Twitter don’t like this, because they want control over the third parties that can access their streams. So they are quite prepared to kill off RSS services in their applications, because RSS does not serve their mission. You can no longer get an RSS feed of your Twitter stream (as far as I know), and there is no RSS built into Google+ (which I have yet to find the time to fully grok).

My needs are small: I want an RSS feed of the stuff I share from Google Reader, so that other people can follow the things I share (if they want), and so that I can pipe that information elsewhere (I use dlvr.it to post selected RSS feeds to Twitter). Google doesn’t want to provide that anymore, so I’ll hack something together.

The ingredients:

  1. These simple instructions for how to render an RSS feed from a MySQL backend.
  2. The instructions for how to create your own “Send to:” item in Google Reader
  3. My rudimentary PHP hackery skills

The code:
All source is available on BitBucket.

First, we need a database connection. The database is set up exactly as described in (1), above.

<?php
DEFINE('DB_USER', 'db_user');
DEFINE('DB_PASSWORD', 'db_password');
DEFINE('DB_HOST', 'localhost');
DEFINE('DB_NAME', 'db_name');
// Make the connection and select the database in one step.
$dbc = mysqli_connect(DB_HOST, DB_USER, DB_PASSWORD, DB_NAME) OR die(mysqli_connect_error());
?>

Now, when the page is visited, we want to render what is in the database as an RSS feed (again, this is a simple adaptation of the code in (1)):

<?php
class RSS {
    private $dbc;

    public function __construct() {
        // mysql_connect.php defines the DB_* constants and opens $dbc
        require_once('mysql_connect.php');
        $this->dbc = $dbc;
    }

    public function GetFeed() {
        return $this->getDetails() . $this->getItems();
    }

    // Escape text so that &, < and > in the data cannot break the XML
    private function esc($text) {
        return htmlspecialchars((string) $text, ENT_XML1 | ENT_QUOTES, 'ISO-8859-1');
    }

    private function getDetails() {
        // header of the RSS feed
        $result = mysqli_query($this->dbc, "SELECT * FROM webref_rss_details")
            OR die(mysqli_error($this->dbc));
        $details = '';
        while ($row = mysqli_fetch_assoc($result)) {
            // fairly minimal description of the feed
            $details = '<?xml version="1.0" encoding="ISO-8859-1" ?>
                <rss version="2.0">
                    <channel>
                        <title>' . $this->esc($row['title']) . '</title>
                        <link>' . $this->esc($row['link']) . '</link>
                        <description>' . $this->esc($row['description']) . '</description>
                        <language>' . $this->esc($row['language']) . '</language>
                        ';
        }
        return $details;
    }

    private function getItems() {
        // return all the items for the RSS feed
        $result = mysqli_query($this->dbc, "SELECT * FROM webref_rss_items")
            OR die(mysqli_error($this->dbc));
        $items = '';
        while ($row = mysqli_fetch_assoc($result)) {
            $items .= '<item>
                <title>' . $this->esc($row['title']) . '</title>
                <link>' . $this->esc($row['link']) . '</link>
                <description>' . $this->esc($row['description']) . '</description>
            </item>';
        }
        // close the feed
        $items .= '</channel>
            </rss>';
        return $items;
    }
}
?>

Last of all, we need a way to add new items to the feed. This code takes the GET variables passed to it by Google Reader and stores them in the DB:

<?php
if (!empty($_GET['url'])) {
    // receive Google Reader 'Send To' items, and store them in the MySQL db
    $url = $_GET['url'];
    $source = isset($_GET['source']) ? $_GET['source'] : '';
    $title = isset($_GET['title']) ? $_GET['title'] : '';
    $simple_check = isset($_GET['check']) ? $_GET['check'] : '';
    // stops anyone adding new items to your feed unless they have the key
    if ($simple_check === 'uniquepasscodehere') {
        require_once('mysql_connect.php');
        // a prepared statement stops quotes in titles breaking (or injecting into) the SQL
        $stmt = mysqli_prepare($dbc, "INSERT INTO webref_rss_items (title, description, link) VALUES (?, ?, ?)")
            OR die(mysqli_error($dbc));
        mysqli_stmt_bind_param($stmt, 'sss', $title, $source, $url);
        if (mysqli_stmt_execute($stmt)) {
            echo "<p>Success!</p>";
            // would be nice to close the window automatically after a couple of seconds
        }
        else {
            die('<p>Invalid query: ' . mysqli_stmt_error($stmt) . '</p>');
        }
    }
}
else {
    // render everything in the db as RSS
    header("Content-Type: application/xml; charset=ISO-8859-1");
    include("RSS.class.php");
    $rss = new RSS();
    echo $rss->GetFeed();
}
?>

Now, I can set up the Send To: item in Google Reader:

Finally, click ‘Send To: -> Readershare’ in the footer of an item in Google Reader, and it is rendered into my RSS feed, which can then be consumed by other applications, including Google Reader itself (so if you want to subscribe to my Google Reader shared items feed, you can find it at http://fuzzierlogic.com/readershare). Oh, and I can pipe my Google Reader shares back into Twitter again.
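Google Reader builds the request itself from the URL template entered in its Send To settings, so nothing here is needed in normal use. To exercise the endpoint by hand, though, a small Python helper can build the same kind of GET request the PHP handler reads (url, title, source and check). The helper name and sample values are illustrative only; urlencode takes care of characters such as & and ? inside the shared item's own URL and title.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def send_to_url(base, url, title, source, check):
    """Build a GET request in the shape the PHP handler reads:
    url, title, source and check as query parameters."""
    query = urlencode({"url": url, "title": title, "source": source, "check": check})
    return f"{base}?{query}"


if __name__ == "__main__":
    link = send_to_url(
        "http://fuzzierlogic.com/readershare",
        "http://example.com/post?id=7",
        "Cats & dogs",
        "Example Blog",
        "uniquepasscodehere",
    )
    print(link)
    # Decoding the query string gives back the original values intact.
    print(parse_qs(urlsplit(link).query))
```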

Stack Exchange and the future of BioStar

Over the weekend I saw this tweet from Stack Overflow/Exchange founder Joel Spolsky. The content of the link he posted has served to crystallise some of my thinking over the last couple of weeks in relation to the bioinformatics question and answer site BioStar.

The link Spolsky posted in the tweet was to a failed Stack Exchange proposal. I found the page interesting not for the proposal, or the fact that it failed, but for the clearly enumerated reasons why it failed. Here’s a screenshot:

Atheism SE Proposal Screen Cap

To clarify the procedure here: new Stack Exchange sites are proposed by a community of users. That community was originally drawn from Stack Overflow, the extremely successful programming Q&A site, but now that there are nearly 50 active sites, the available pool of proposers is much larger. Newly proposed sites have to clear a series of hurdles before they go live (proposal, commitment, private beta and public beta) before finally becoming a fully-fledged SE site. At the end of each stage, sites are assessed for the likelihood that they will become healthy and active. Crucially, this assessment does not appear to be tailored to the individual site. It is obviously the view of the SE powers-that-be that all Q&A sites are created equal, and that what works for one will work for all of them. What is worrying about this attitude is that genuinely niche sites, likely to have a small but active and dedicated community, will be left by the wayside, presumably because they will be unable to generate the kind of ad revenue that Spolsky et al. will need to repay their investors.

BioStar is a web community reaching a crossroads. The site runs on the now-free, but inevitably unsupported, Stack Exchange 1.0 platform (the process discussed above is for the SE 2.0 community). To continue to thrive, I firmly believe the site needs to move off this platform, since the platform is almost certainly going to be shut down from under it within the next 12-18 months. This presents the site owners (and us, the community) with a choice:

  1. Migrate the site to SE 2.0
  2. Change to an open-source alternative Q&A platform
  3. Roll our own site, with the functionality we require

I will start by ruling out option 3. Bioinformatics teaches us the perils of reinventing the wheel when it is not necessary. An effort to write a custom-built platform for BioStar would be almost entirely redundant, would be undertaken in the community’s free time (time which could be better spent answering questions on BioStar), and would almost certainly offer no tangible benefit over one of the already available Q&A engines. (Think Facebook-for-Scientists…)

I used to be firmly in the camp supporting option 1. I genuinely love Stack Overflow. I have found great utility in some of the Stack Exchange family of sites. However, the attitude betrayed in both Spolsky’s tweet and the closure notice on the Atheism Stack Exchange site makes me think that BioStar would be left out in the cold if we attempted this migration. Let’s look at how BioStar measures up to these numbers:

  • Questions per day (SE 2.0 recommends – “15 questions per day on average is a healthy beta”)
    • Since 30th September 2009 BioStar has received 1,681 questions – that’s 3.13 questions per day
  • Percentage answered (SE 2.0 – “90% answered is a healthy beta”)
    • BioStar does well here. There are currently 47 questions with no upvoted answers – about 2.8%
  • User group (SE 2.0 – 150 users with 200+, 10 with 2,000+, 5 with 3,000+)
    • We have 14 users with 3,000+, 24 with 2,000+ and (by my count) 142 with 200+. But BioStar has been going for 18 months; the Atheism SE site was shut down after 2 months in public beta
  • Answer ratio (SE 2.0 – “2.5 answers per question is good”)
    • I don’t have easy access to precise numbers for this, but it’s around 3 answers per question on BioStar
  • Visits per day (SE 2.0 – “1,500 visits per day is good, 500 visits per day is worrying.”)
    • I have no stats at all for this, but I’d put good money on daily numbers being much closer to 500 than 1,500.
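The comparison in the list above can be tabulated in a few lines of Python. The thresholds and BioStar figures are the ones quoted there. Counting a question as answered when it has an upvoted answer is an assumption about how SE measures it, the answers-per-question figure is only the rough estimate given above, visits per day are left out for lack of data, and treating every guideline as a simple minimum is a simplification of how SE actually judges a beta.

```python
# SE 2.0 beta guidelines quoted in the list above, treated as minimums.
GUIDELINES = {
    "questions_per_day": 15,
    "percent_answered": 90,
    "answers_per_question": 2.5,
    "users_200_plus": 150,
    "users_2000_plus": 10,
    "users_3000_plus": 5,
}

TOTAL_QUESTIONS = 1681
NO_UPVOTED_ANSWER = 47

# BioStar's figures from the same list; answers per question is a rough estimate.
BIOSTAR = {
    "questions_per_day": 3.13,
    "percent_answered": 100 * (1 - NO_UPVOTED_ANSWER / TOTAL_QUESTIONS),
    "answers_per_question": 3,
    "users_200_plus": 142,
    "users_2000_plus": 24,
    "users_3000_plus": 14,
}


def shortfalls(figures, guidelines):
    """Return, sorted, the metrics where the figure falls below the guideline."""
    return sorted(name for name, minimum in guidelines.items()
                  if figures[name] < minimum)


if __name__ == "__main__":
    print(f"answered: {BIOSTAR['percent_answered']:.1f}%")
    # 1,681 questions at 3.13 per day implies roughly 537 days, about 18 months.
    print("implied days:", round(TOTAL_QUESTIONS / BIOSTAR["questions_per_day"]))
    print("below guideline:", shortfalls(BIOSTAR, GUIDELINES))
```

On these numbers BioStar misses only two guidelines, questions per day and users with 200+ reputation, but the questions-per-day gap (3.13 against 15) is the large one.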

By these criteria, and judging by the Atheism Stack Exchange site linked to by Spolsky, BioStar would, on current numbers, fail to emerge from SE 2.0 beta, and any effort the existing community put in to get it that far would be wasted. Nor do I think the site’s audience would grow dramatically by it becoming a Stack Exchange 2.0 site. I think we have to accept that bioinformatics is a niche subject with a relatively small potential audience, one that is not going to be especially interesting to a commercially driven exercise (which Stack Exchange necessarily has to be).

So that leaves us with migration to an OSS alternative as the only remaining option. There are a number of platforms available, some of which offer an experience extremely close to ‘real’ Stack Exchange. I would pick one of these that allows an existing SE XML dump to be imported, and migrate the site as soon as possible, certainly within the next 6 months. There is no question that the changeover will be painful, and will probably cost the site a few users and some traffic in the first instance (the biostar.stackexchange.com URL will have to go, for example), but I am confident in the community that has been built around the site: it will survive, and will be all the stronger for the change.

Besides, if we look at the facts in the cold, hard light of day, we really have no choice.
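Whichever platform is chosen, the XML dump is also the easiest place to get the precise answer ratio that was only estimated above. A minimal sketch, assuming the dump follows the layout of Stack Exchange’s public data dumps (a Posts.xml file of row elements with PostTypeId 1 for questions and 2 for answers); it is not confirmed here that an SE 1.0 export uses exactly this layout, and the sample rows are made up.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def answer_ratio(source):
    """Return answers per question for a Posts.xml-style dump.

    `source` is a filename or binary file object. Each post is assumed to be
    a <row> element whose PostTypeId is "1" for a question, "2" for an answer.
    """
    counts = Counter()
    # iterparse reads the file incrementally; clearing each row after use
    # keeps memory use modest on a large dump.
    for _, elem in ET.iterparse(source):
        if elem.tag == "row":
            counts[elem.get("PostTypeId")] += 1
            elem.clear()
    if counts["1"] == 0:
        return 0.0
    return counts["2"] / counts["1"]


if __name__ == "__main__":
    sample = io.BytesIO(
        b'<posts>'
        b'<row Id="1" PostTypeId="1"/><row Id="2" PostTypeId="2"/>'
        b'<row Id="3" PostTypeId="2"/><row Id="4" PostTypeId="1"/>'
        b'<row Id="5" PostTypeId="2"/><row Id="6" PostTypeId="2"/>'
        b'<row Id="7" PostTypeId="2"/>'
        b'</posts>'
    )
    print(answer_ratio(sample))  # 5 answers / 2 questions -> 2.5
```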

5 Best Data Visualization Projects of the Year – 2009 | FlowingData

The top visualization project of the year, according to FlowingData, is a project by Ben Fry, which shows changes to the theory of evolution over time. The project takes advantage of the publication of the full works of Darwin online to trace changes in the text of On the Origin of Species over time.
A deserved winner, and very apt, considering the batch of anniversaries that have gone by this year.

Google Acquires AppJet – are there any live, functional alternatives to Etherpad?

We are happy to announce that AppJet Inc. has been acquired by Google. The EtherPad team will continue its work on realtime collaboration by joining the Google Wave team.

[…]

The EtherPad site will stay online through March 2010 with some restrictions.

[…]

No new free public pads may be created. Your pads will no longer be accessible after March 31, 2010, at which time your pads and any associated personally identifiable information will be deleted.

[…]

Etherpad was a nice little tool, very effective at what it offered, and I’m sure the guys who developed it will bring a lot to the Wave party. But seriously, Wave is nowhere near functional yet: it’s confusing, and glacially slow. So is there a decent alternative to Etherpad that is usable right now?

ISMB – attending without travelling

It’s ISMB time again, and as colleagues jet off to Stockholm, I can’t help but feel a twinge of envy. Lucky, then, that the conference organisers have a fantastic attitude towards live- and micro-bloggers, and after the success of last year’s efforts (see the FriendFeed room, and the paper), they are positively encouraging more of the same this year. There’s a FriendFeed room again, where a new thread will be posted 10 minutes before each talk, and with a number of dedicated FFers present, there’s sure to be some fantastic live coverage. I’ll be following along.

MMR scaremongerer sicks the legal dogs on Ben Goldacre

Let the blogosphere and twittersphere spring to his defence!

See here for full details, but in short, a London broadcaster ran a half-hour rant on her show about the ‘dangers’ of the MMR jab on 7th January. Ben Goldacre subsequently posted the entire, repulsive segment on his blog, to show this woman up for the scaremongerer she is. The radio station she works for has now set its lawyers on him, insisting he cease and desist.

So I am reposting his plea for help, and posting links to the relevant content (original post here, complain about the broadcast here). If you have the know-how to help him out, please do so.

EDIT – You can get the audio of the original broadcast from YouTube or WikiLeaks… if you want your head to explode with frustration. Also note this graph, which illustrates the very real effect of irresponsible woo like this.

Twitter and Me

This is a bit of a follow-up to my post about FriendFeed. I registered for Twitter at the same time as FriendFeed, and while I immediately saw the value of FF, for a long time I only saw Twitter as a tool to broadcast work-related ideas and thoughts to FriendFeed. I saw it as having little utility in its own right.

My follower/following count slowly increased, driven by FF: I tended to follow people reciprocally, and the few people who were following me found me there. Then, early this year, David Bradley posted his list of 100+ Scientwists, and I thought: ‘hey, I’m a scientist, and on Twitter… maybe I should be on that list’. So I got included, and then got a sudden upsurge in followers.

Twitter Counter

Not all these followers were on FF anymore, so I couldn’t follow my Twitter traffic on FF (without creating a whole bunch of imaginary friends, which I didn’t want to do). So I had to start following Twitter properly.

This has led to a more interesting conversation developing. I post more @replies, and am receiving a few more in return (though often from the desk next to me in the office), and though I don’t have any specific examples like I did for FF, I feel I am gaining more value from Twitter as a tool in its own right.

I have tried a number of apps to monitor Twitter traffic, but none of them quite fits into my workflow (though TweetDeck comes closest, and is much better than Twhirl). However, I got a 3G iPhone yesterday, and the Twitterrific app seems great, so I suspect most of my tweeting will be done on that platform in future.

Finally, there has been a lot written about Twitter in the last couple of weeks, and in particular about the number of ‘celebrities’ tweeting, both real and fake. For my part, I do follow a few, and get most value from @stephenfry (bonus linky), who really seems to ‘get it’ (as he does most things, though how he keeps track while following over 30k people, I’ll never know), and @dave_gorman (bonus linky), who, like me, is just learning the value of the platform. But the real value of Twitter, as with FF, lies in the quality of the science conversation on there, and in how it makes me feel connected to a worldwide community of like-minded people.

Wordle

I’m a bit behind the zeitgeist here, I know, but Wordle is a pretty cool little tool. I think the cloud of my CiteULike tags fairly reflects my academic interests, and given that ‘wet’ techniques still dominate even after 5 years away from the lab, I really should make more of the bioinformatics involved in the papers I bookmark.

Wordle Tag Cloud of my CiteULike Library

Find it on Wordle here.

A few notes on technique. I extracted the tags from the RIS export of my CiteULike library using this Python script. I had to cap the count of any one word at 25, otherwise ‘proteomics’ would have dominated to such an extent that virtually none of the other tags would be visible (except maybe ‘protein-protein-interactions’). But then I am responsible for supporting proteomics researchers, and predicting and validating protein-protein interactions is my major personal research interest, so I guess this is fair enough.
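The tag extraction and the cap of 25 can be sketched in Python along these lines. This is not the script linked above: it assumes each CiteULike tag appears in the export as its own KW (keyword) field, the standard RIS tag for keywords, and the function names and sample record are illustrative.

```python
import re
from collections import Counter

# An RIS field line: two-character tag, two spaces, a hyphen, then the value.
RIS_FIELD = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def ris_keywords(lines):
    """Yield the value of every KW (keyword) field in an RIS export."""
    for line in lines:
        match = RIS_FIELD.match(line.rstrip("\r\n"))
        if match and match.group(1) == "KW":
            value = match.group(2).strip()
            if value:
                yield value


def capped_counts(tags, cap=25):
    """Count each tag, but never let one tag count for more than `cap`."""
    return {tag: min(n, cap) for tag, n in Counter(tags).items()}


if __name__ == "__main__":
    # A tiny made-up RIS record; a real export would be read from a file.
    sample = [
        "TY  - JOUR",
        "KW  - proteomics",
        "KW  - protein-protein-interactions",
        "KW  - proteomics",
        "ER  - ",
    ]
    print(capped_counts(ris_keywords(sample), cap=1))
```

Capping rather than dropping the dominant tags keeps them as the largest words in the cloud while leaving room for everything else to be legible.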

FriendFeed and Me

Apparently it’s bad for a blog to be introspective, and always about the author. But I’m unrepentant: what do people write about if not themselves, even indirectly? So here’s another post about me, and about my participation in a small web revolution.

Next week marks 6 months since I registered my account at FriendFeed (and simultaneously, Twitter). Ally posted yesterday about her moment of epiphany with the ‘lifestreaming’ site, and I know other people have blogged about its impact on their online lives, so I thought I’d do the same as a bit of a retrospective.

Briefly, FriendFeed is a site that aggregates information from other sites, and shares it with the world. I collate the feeds from this blog, Twitter, CiteULike.org, del.icio.us, Flickr, Google Reader and a few others there. People can subscribe to this amalgamated feed, and get an idea of my interests and what I am up to.
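At its simplest, an aggregated stream like this is a merge of several time-ordered feeds. As a rough illustration only (FriendFeed’s actual implementation is not described here), merging per-service lists that are each already sorted newest first takes one call to heapq.merge; the entries below are made-up examples.

```python
import heapq
from datetime import datetime


def lifestream(*feeds):
    """Merge per-service feeds, each sorted newest first, into one stream
    that is also newest first. Entries are (timestamp, service, title)."""
    return list(heapq.merge(*feeds, key=lambda entry: entry[0], reverse=True))


if __name__ == "__main__":
    # Made-up entries standing in for a blog, Twitter and CiteULike feed.
    blog = [(datetime(2009, 1, 20, 9, 0), "blog", "New post")]
    twitter = [
        (datetime(2009, 1, 20, 12, 30), "twitter", "Reading about SCA"),
        (datetime(2009, 1, 19, 17, 5), "twitter", "Heading home"),
    ]
    citeulike = [(datetime(2009, 1, 20, 10, 15), "citeulike", "Bookmarked a paper")]
    for when, service, title in lifestream(blog, twitter, citeulike):
        print(when.isoformat(), service, title)
```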

I try to limit my activity on FF (and, consequently, Twitter) to stuff that’s purely work-related (although real life does occasionally creep in), and because of this ‘work-stream’ approach, it has become an increasingly indispensable tool in the pipeline of information discovery and my scientific ‘social life’.

The following are (direct or indirect) results of my participation in FF:

  • I have finally learnt Python, and made it my programming language of choice
  • I have adopted Git for version control, and have several repos on GitHub
  • I bought this domain, and set up this blog
  • I have found countless papers and blogs I might otherwise have missed

As a more concrete example of the power of FF, I am currently involved in a project looking at co-evolution of bacterial proteins, and am employing Statistical Coupling Analysis to score multiple sequence alignments. This method produces thousands of scores across an alignment, and the best way of viewing them is as a sort of heatmap. I was using Gnuplot to do this, and my maps looked something like this:

This is not terribly useful, because you keep having to check the legend to see whether red is ‘hotter’ or ‘colder’ than yellow, etc. Then, one morning last week, I saw a link on FriendFeed to this blog post, and following the very wise suggestions in that post, I worked out how to redraw my plots so they now look like this:

This makes it much easier to tell at a glance where the hotspots are to be found in the alignment. It is just one blog post, but I would never have found it without FF, and it is a useful illustration of how this new workflow has changed my productivity.
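The difference comes down to swapping a multi-hue palette, where you have to check the legend to know whether red beats yellow, for a sequential one that runs steadily from light to dark. A minimal Python sketch of such a mapping follows; the white-to-dark-red end points are an arbitrary choice for illustration, not necessarily the scheme used in the plots above or the one the linked post recommends, and in practice a plotting library’s built-in sequential colour maps do the same job.

```python
def sequential_colour(score, lo, hi, light=(255, 255, 255), dark=(128, 0, 0)):
    """Map a score in [lo, hi] onto a single light-to-dark ramp (RGB 0-255).

    Higher scores always get darker colours, so two cells can be ranked
    at a glance without checking the legend.
    """
    t = 0.0 if hi == lo else (score - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp scores that fall outside [lo, hi]
    return tuple(round(a + (b - a) * t) for a, b in zip(light, dark))


if __name__ == "__main__":
    for score in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(score, sequential_colour(score, 0.0, 1.0))
```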

So, for the next six months, and on into the more distant future, what role do I see for FF in my work life? Well, for a start I need to participate more. I am constantly aware that I should comment more, and even just ‘like’ more stuff. Contribution should also take the form of propagating things to FF for others to see. Most of my feed consists of articles at CiteULike and tweets. By posting more stuff to FF directly, and by sharing interesting articles on Google Reader, I’ll be providing more grist to the mill of conversation than I currently do. And I want to be an active member of this community: I like the people, and I’ve got a lot out of the last 6 months of (relatively) passive interaction. I want that to continue, but I should no longer be a passenger.