I built a blog aggregator - waywework.it

November 09, 2008

I've been spending some time recently putting together a blog aggregator site for some of the folks I work with. It's now up and running at http://waywework.it. I hope this will be an interesting place for our community to share what we write publicly and, as one of my colleagues said, "this keeps my Google Reader much neater".

Today I'd like to talk about the code running this site which is posted and available on github at http://github.com/alexrothenberg/waywework.

I started out thinking I would use an existing aggregator site and just apply my own skin, but a quick search on github showed me that most of the hard work already existed in atom and rss gems and plugins. I also wanted to take advantage of the just-released Rails 2.2, so I decided to build my own. This turned out to be not too much work. Here is how I put it together.

First I created my project with some scaffolding for feeds, each of which would have_many posts:

class Feed < ActiveRecord::Base
  has_many :posts, :dependent => :delete_all
end

class Post < ActiveRecord::Base
  belongs_to :feed
end



I soon found the atom gem and the rss parser built into ruby. Using them was a piece of cake, as all I had to do was create a method to call each one in my Feed model:

class Feed < ActiveRecord::Base
  def get_posts_from_atom atom_xml
    feed = Atom::Feed.new(atom_xml)
    feed.entries.each { |entry|
      link = entry.links.detect {|l| l.rel == 'alternate'}
      create_post(:contents=>entry.content.value, :url=>link.href, :title=>entry.title, 
                  :published=>entry.published.to_s(:db), :updated=>entry.updated.to_s(:db))
    }
    return !feed.entries.blank?
  end  
  
  def get_posts_from_rss rss_xml
    rss = RSS::Parser.parse(rss_xml, false)
    rss.items.each { |entry|
      create_post(:contents=>entry.description, :url=>entry.link, :title=>entry.title, 
                  :published=>entry.date.to_formatted_s(:db), :updated=>entry.date.to_formatted_s(:db))
    }
    return !rss.items.blank?
  end
end
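
To see what the RSS half of this does outside of Rails, here is a minimal sketch that uses only the rss library that ships with ruby. The sample feed, the example.com URLs and the helper name post_attributes_from_rss are made up for illustration; the hash keys mirror the ones passed to create_post above.

```ruby
require 'rss'

# A tiny, made-up RSS 2.0 document used only for this sketch.
SAMPLE_RSS = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Example Blog</title>
      <link>http://example.com/</link>
      <description>A made-up feed used only for this sketch</description>
      <item>
        <title>First post</title>
        <link>http://example.com/first</link>
        <description>Hello world</description>
        <pubDate>Sun, 09 Nov 2008 12:00:00 GMT</pubDate>
      </item>
    </channel>
  </rss>
XML

# Hypothetical helper: turn each RSS item into a hash of post attributes.
def post_attributes_from_rss(rss_xml)
  rss = RSS::Parser.parse(rss_xml, false) # false = do not validate strictly
  return [] if rss.nil?                   # the parser could not recognise the document
  rss.items.map do |item|
    { :title => item.title, :url => item.link,
      :contents => item.description, :published => item.date }
  end
end

posts = post_attributes_from_rss(SAMPLE_RSS)
puts posts.first[:title]
```

Each hash could then be handed to something like create_post, which is where the real model takes over.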



Of course I had to create the glue wrapping it all together: a rake task to be called on a schedule

namespace :feeds do
  desc "Load the feeds"
  task :populate => :environment do
    feeds = Feed.all
    feeds.each do |feed|
      feed.get_latest
    end
  end
end



and the logic to load the feed, parse it and update the posts.

class Feed < ActiveRecord::Base
  def get_latest
    puts "getting feed for #{name}"
    xml = get_feed
    got_atom_posts = get_posts_from_atom xml
    get_posts_from_rss xml unless got_atom_posts
  end

  def get_feed
    # URI#read is provided by open-uri, so it must be loaded (require 'open-uri')
    uri = URI.parse(feed_url)
    uri.read
  end
  
  def create_post params
    params.merge!(:feed_id=>id) 
    existing_post = Post.find_by_url(params[:url])
    if existing_post
      existing_post.update_attributes(params)
    else
      Post.create(params) 
    end
  end
end
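
The create-or-update step is what lets the rake task run again and again without duplicating posts. As a rough illustration of that idea only, here is a sketch with a plain hash standing in for the posts table. PostStore and its methods are hypothetical names, not part of the real site.

```ruby
# Hypothetical in-memory stand-in for the posts table, keyed by URL.
class PostStore
  def initialize
    @posts_by_url = {}
  end

  # Update the post that already has this URL, or create a new one.
  # Returns :updated or :created so the caller can see which path ran.
  def create_or_update(attrs)
    url = attrs.fetch(:url)
    if @posts_by_url.key?(url)
      @posts_by_url[url].merge!(attrs)
      :updated
    else
      @posts_by_url[url] = attrs.dup
      :created
    end
  end

  def find_by_url(url)
    @posts_by_url[url]
  end

  def count
    @posts_by_url.size
  end
end

store = PostStore.new
store.create_or_update(:url => 'http://example.com/first', :title => 'First post')
store.create_or_update(:url => 'http://example.com/first', :title => 'First post (edited)')
puts store.count
```

Fetching the same feed twice leaves a single post per URL, with the latest title and contents winning.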



The next step was to publish an atom feed of my site. Again there was a plugin, atom_feed_helper, waiting to help me. I installed the plugin and created a view builder:

atom_feed(:url => atom_feed_url) do |feed|
  feed.title("WayWeWork")
  feed.updated(@posts.first.published)

  for post in @posts
    feed.entry(post, :url=>post.url, :published=>post.published, :updated=>post.updated) do |entry|
      entry.title("#{post.feed.author}: #{post.title}")
      entry.content(post.contents, :type => 'html')
    end
  end
end
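
If you want to play with the same idea outside of Rails, ruby's own rss library can also build an Atom document. This is a different tool from the atom_feed_helper plugin used above, so treat it as a sketch only: AggregatedPost, aggregate_feed and the example.com URLs are invented for illustration, and it assumes at least one post exists, just as the view does with @posts.first.

```ruby
require 'rss'

# Hypothetical stand-in for a post joined with its feed's author.
AggregatedPost = Struct.new(:author, :title, :url, :published)

# Build an Atom feed whose entry titles look like "author: title".
def aggregate_feed(posts)
  newest_first = posts.sort_by(&:published).reverse
  RSS::Maker.make('atom') do |maker|
    maker.channel.about   = 'http://example.com/atom.xml' # feed id, placeholder URL
    maker.channel.title   = 'WayWeWork'
    maker.channel.author  = 'WayWeWork'
    maker.channel.updated = newest_first.first.published.to_s

    newest_first.each do |post|
      maker.items.new_item do |item|
        item.link    = post.url
        item.title   = "#{post.author}: #{post.title}"
        item.updated = post.published.to_s
      end
    end
  end
end

posts = [
  AggregatedPost.new('Alex', 'First post',  'http://example.com/first',  Time.utc(2008, 11, 8)),
  AggregatedPost.new('Pat',  'Second post', 'http://example.com/second', Time.utc(2008, 11, 9))
]
puts aggregate_feed(posts)
```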


This was all so easy I had hardly done anything other than glue these plugins together. Now I finished up with a few bells and whistles.

I added "who's talking" and "archive by date" sections to my homepage, populated from my controller like this:

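class PostsController < ApplicationController
  # Assumed action name: the assignments must run inside an action (or a
  # before_filter) so the view can see them, not in the class body.
  def index
    @active_feeds = Feed.by_author
    @activity_by_date = Post.activity_by_date
  end
end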
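
The post does not show what Post.activity_by_date looks like, so the following is only a guess at the kind of grouping an archive section needs: counting posts per month, newest month first. ArchivedPost, activity_by_month and the monthly granularity are all assumptions made for this sketch.

```ruby
require 'date'

# Hypothetical stand-in for a Post record; only the published time matters here.
ArchivedPost = Struct.new(:title, :published)

# Count posts per calendar month and return [first_day_of_month, count]
# pairs with the most recent month first.
def activity_by_month(posts)
  counts = Hash.new(0)
  posts.each do |post|
    month = Date.new(post.published.year, post.published.month, 1)
    counts[month] += 1
  end
  counts.sort.reverse
end

posts = [
  ArchivedPost.new('One',   Time.utc(2008, 11, 9)),
  ArchivedPost.new('Two',   Time.utc(2008, 11, 2)),
  ArchivedPost.new('Three', Time.utc(2008, 10, 20))
]

activity_by_month(posts).each do |month, count|
  puts "#{month.strftime('%B %Y')} (#{count})"
end
```

In the real application the same grouping would more likely be done in SQL, but the shape of the result handed to the view would be similar.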



I added security to restrict who can administer feeds:

class FeedsController < ApplicationController
  before_filter :authenticate

protected
  # Compare the HTTP basic credentials with those stored in config/password.yml
  def authenticate
    authenticate_or_request_with_http_basic do |user_name, password|
      credentials = YAML::load_file(File.join(RAILS_ROOT, %w[config password.yml]))
      user_name == credentials['username'] && password == credentials['password']
    end
  end
end
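
For reference, the check above only needs a YAML file with two keys, username and password. Here is a small standalone sketch of that file's assumed shape and of the comparison itself. The values are placeholders and valid_credentials? is a hypothetical helper, not something from the site's code.

```ruby
require 'yaml'

# Assumed shape of config/password.yml; the values are placeholders.
SAMPLE_PASSWORD_YML = <<~YAML
  username: admin
  password: secret
YAML

# Hypothetical helper doing the same comparison as the authenticate block.
def valid_credentials?(credentials, user_name, password)
  user_name == credentials['username'] && password == credentials['password']
end

credentials = YAML.load(SAMPLE_PASSWORD_YML)
puts valid_credentials?(credentials, 'admin', 'secret')
puts valid_credentials?(credentials, 'admin', 'wrong')
```

Keeping the file out of version control is what makes this arrangement useful when the rest of the code is public.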



For the UI I am somewhat graphically challenged, so I got some help. Here github was very cool: I could add lessallan as a collaborator, and he could check in his changes so they just appeared!

Finally a little work with capistrano (mostly just creating a Capfile) and I could deploy!

Overall I spent a few days and now have a site that does exactly what I want. Most of the code I wrote is specific to my site, while the general-purpose plumbing was downloaded. I'm very pleased with the availability of plugins and gems and how easy it was to collaborate using github!

Now I just hope others find the site interesting to use!