sysadmin


12
Feb 10

Unfuddle Git Backups – How to Actually Use Them

I really like Unfuddle. The service is easy to use, and there are a lot of great features in there. The documentation is… lacking, however.

One of the things I like is the ability to get a full backup of all my project data, repositories, etc. in a single tarball. You can even ask them to keep a copy in your own S3 account.

To create a backup, do the following:

1) Log into Unfuddle and go to the Project page.
2) Click the ‘settings’ tab.
3) Scroll down until you see the link that says ‘Request a backup of this project now’, and click it.

In a few moments you’ll get an email, and you’ll see a new link on the right-hand side of your project settings page for a timestamped backup. This backup is a tarball that contains all your Git repositories plus some other files, such as backup.xml, which appears to contain all your tickets.

To restore a Git dump, run the following:

mkdir reponame
cd reponame
git init
git fast-import < ../my-unfuddle-backup.git.dmp
git checkout master
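Since the tarball can hold several repositories, here’s a rough Ruby sketch that repeats those same steps for every dump in an extracted backup directory. It assumes each dump is named <repo>.git.dmp (like the example filename above), that git is on your PATH, and that master is the branch you want checked out. The helper names repo_name_for, restore_dump and restore_all are hypothetical, not anything Unfuddle provides.

```ruby
require 'fileutils'

# Hypothetical helper: derive a repository name from a dump file name,
# assuming dumps are named "<repo>.git.dmp".
def repo_name_for(dump_path)
  File.basename(dump_path, '.git.dmp')
end

# Restore one dump into a fresh repository under dest_root, mirroring the
# manual steps: git init, git fast-import < dump, git checkout master.
def restore_dump(dump_path, dest_root)
  repo_dir = File.join(dest_root, repo_name_for(dump_path))
  FileUtils.mkdir_p(repo_dir)
  ok = system('git', 'init', chdir: repo_dir) &&
       system('git', 'fast-import', chdir: repo_dir, in: File.expand_path(dump_path)) &&
       system('git', 'checkout', 'master', chdir: repo_dir)
  raise "restore failed for #{dump_path}" unless ok
  repo_dir
end

# Restore every *.git.dmp file found in an extracted backup directory and
# return the list of repository directories that were created.
def restore_all(backup_dir, dest_root = Dir.pwd)
  Dir.glob(File.join(backup_dir, '*.git.dmp')).sort.map do |dump|
    restore_dump(dump, dest_root)
  end
end
```

For example, restore_all('unfuddle-backup', 'restored') would create one repository per dump under restored/ (both directory names are placeholders). Passing the dump through the in: option is just Ruby’s way of doing the shell’s "<" redirection.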

You’re done!

If you’re using Subversion repositories, there’s documentation on Unfuddle’s website about how to use those backups.


5
Feb 09

Why Server-side JavaScript Matters

JavaScript is a popular scripting language that comes embedded in most browsers. It’s usually what’s responsible for making your browsing experience as rich as it is, and for this reason we tend to categorize it in the realm of client-side development. In fact, running JavaScript on the server is odd enough that the phrase ‘server-side JavaScript’ had to be coined in the first place, but it isn’t exactly a new idea: Netscape’s LiveWire, part of its Enterprise Server product, included server-side JavaScript functionality in 1996. It just never really caught on. Writing server-side code in PHP, Ruby, Python, Perl, ASP.NET and Java has been the “way we do things”, and JavaScript remained something you messed around with once you wanted to spoil your users with a richer experience. Before I make the case for server-side JavaScript, we need one important piece of background information.

There are economic concepts that dictate how you use services and hosting on the internet.

Do tell.

Computing is really cheap. Think about all the email that Gmail handles in a day: it’s so cheap that advertising can pay for it. But the “network is the computer” after all, so we have to think about what it takes to get information in and out of these clusters of cheap computing, and that’s the rub. Amazon charges $0.17/GB to get your data out of EC2, which is equivalent to almost two hours of their cheapest computing instance. That’s fine if the task you send to your cheap compute cluster can be defined in a very small package and yields a relatively small result, but typical web services and applications don’t work this way. The point is: it’s cheaper to move the computing than it is to move the data.
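To make the trade-off concrete, here’s a back-of-the-envelope calculation in Ruby. The $0.17/GB transfer price is the figure quoted above; the $0.10/hour price for the cheapest instance, the 50 GB dataset, the 2 hours of compute and the 1 MB answer are all assumptions picked for illustration.

```ruby
# Transfer-out rate is from the post; the instance rate is an assumption.
TRANSFER_OUT_PER_GB = 0.17 # USD per GB leaving EC2
INSTANCE_PER_HOUR   = 0.10 # USD per hour, assumed cheapest instance

# How many instance-hours cost the same as moving `gb` gigabytes out.
def compute_hours_equivalent(gb)
  (gb * TRANSFER_OUT_PER_GB) / INSTANCE_PER_HOUR
end

# Option 1: ship the whole dataset out and process it elsewhere.
def move_data_cost(dataset_gb)
  dataset_gb * TRANSFER_OUT_PER_GB
end

# Option 2: compute next to the data, ship out only the answer.
def move_compute_cost(hours, answer_gb)
  hours * INSTANCE_PER_HOUR + answer_gb * TRANSFER_OUT_PER_GB
end

puts compute_hours_equivalent(1).round(2)      # prints 1.7 (instance-hours per GB out)
puts move_data_cost(50).round(2)               # prints 8.5 (USD to pull 50 GB out)
puts move_compute_cost(2, 1.0 / 1024).round(2) # prints 0.2 (USD for 2 hours + a 1 MB answer)
```

Under these assumed prices, 1 GB out costs about 1.7 instance-hours, which lines up with the "almost two hours" above, and computing beside the data wins by a wide margin whenever the answer is small.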

So what?

This all clicked for me when I messed around with Freebase’s development environment, “Acre”. Acre is great. It lets you create, edit, and host your applications through a browser. Not only had I been messing around with Acre, but I’d also been toying with the idea of using Freebase as a mechanism for validating and normalizing data. The problem is that asking Freebase for a bunch of information on, say, “every city on the planet” is pretty expensive. Not only do you incur a network transfer cost, but you then have to process the information. Not exactly ideal. But what if I could pose a question to an application running at Freebase? What if, instead of pulling out all the information about every movie and creating your own Freebase-based IMDb, you could host it right next to the data source? The ‘heavy stuff’ never has to cross the WAN, and the browser gets the good stuff, but only when it asks for it.

This is why server-side JavaScript is perfect

Hosting Ruby, PHP, Python, etc. is kind of a pain in the ass. Well, it’s easier than it used to be, but it could be a lot better. If I had to choose something relatively lightweight to interface with my data source and create that rich browsing experience, I’d probably pick JavaScript. My initial impression is that, depending on your data source, scaling it would be easy, too. Running code close (as in LAN-close) to the dataset means a few things:

1. You can create cheaper mashups

2. You can eliminate all the cruft from your data before it gets sent over the wire

3. You can create nifty applications and ask them short questions that yield short answers but require huge amounts of data to determine
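Here’s a toy illustration of points 2 and 3, sketched in Ruby. The dataset, its field names and the largest_city question are all made up; the only point is to compare the bytes you’d ship if the client pulled everything against the bytes a handler running next to the data would send back.

```ruby
require 'json'

# Hypothetical dataset that lives next to the code (same LAN/host).
CITIES = (1..10_000).map do |i|
  { 'name' => "City #{i}", 'country' => i.even? ? 'CA' : 'US',
    'population' => i * 37 % 100_003, 'notes' => 'x' * 200 }
end

# Option A: the client pulls the whole dataset over the wire and filters it.
def full_dump
  JSON.generate(CITIES)
end

# Option B: a handler beside the data answers a short question
# ("largest city in this country?") and returns only the answer.
def largest_city(country)
  best = CITIES.select { |c| c['country'] == country }
               .max_by { |c| c['population'] }
  return '{}' if best.nil?
  JSON.generate({ 'name' => best['name'], 'population' => best['population'] })
end

puts "full dump:    #{full_dump.bytesize} bytes"
puts "short answer: #{largest_city('CA').bytesize} bytes"
```

The scan over every row still happens, but it happens on the cheap side of the network; only a few dozen bytes cross the wire.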

ZOMG How do I start?

You’ll have to learn JavaScript, and as a hosting or service operator you’ll have to choose an application for running it server-side. There are a few options; trusty Wikipedia has a lengthy list of server-side JavaScript implementations. I’d recommend checking out the following:

Rhino

SpiderMonkey

V8

AppJet

Jaxer

-Trevor


16
Nov 08

How to Monitor Usage Patterns in ActiveMQ

ActiveMQ is an enterprise message bus that’s completely open source. It’s great if you want to tie together a bunch of different services, or to run your own private equivalent of Amazon’s Simple Queue Service (SQS). It supports several interfaces, such as JMS, Stomp, XMPP and plain REST. You can learn more about ActiveMQ on the Apache ActiveMQ website.

There are a few monitoring solutions for ActiveMQ that will let you know when it’s broken, but I needed to capture usage patterns over time so I could automatically spin up more workers. I didn’t see anything readily available, so I threw this together:

#queuemonitor.rb

# Queries the ActiveMQ status XML page and returns relevant data.
# I'm sure there's a more elegant way of creating methods directly out of XML.

# Rather than having each QueueMonitor be a separate class tied to one queue
# name, we might want to parse multiple queues on the same server. So we keep
# an array of queue names (@queues) containing the queues we want to monitor.
# If you want to monitor more than one server, create a separate QueueMonitor
# instance for each.

require 'net/http'
require 'uri'
require 'rexml/document'

class QueueMonitor

  attr_accessor :url, :queues

  def initialize
    @url = "http://127.0.0.1:8161/admin/xml/queues.jsp"
    @queues = []
  end

  # add a queue to the list of queues that we'll pay attention to
  def addqueue(queue)
    @queues.push(queue) unless @queues.include?(queue)
  end

  # stop paying attention to a queue
  def delqueue(queue)
    @queues.delete(queue)
  end

  # Fetch the status XML once and return one hash per monitored queue.
  # Counts are converted from attribute strings to integers.
  def query
    results = []
    date = Time.now
    xml_data = Net::HTTP.get_response(URI.parse(@url)).body
    doc = REXML::Document.new(xml_data)

    doc.elements.each('queues/queue') do |queue|
      name = queue.attributes["name"]
      # only report queues listed in @queues
      next unless @queues.include?(name)

      queue.elements.each('stats') do |ele|
        entry = { 'name'      => name,
                  'size'      => ele.attributes["size"].to_i,
                  'consumers' => ele.attributes["consumerCount"].to_i,
                  'enqueue'   => ele.attributes["enqueueCount"].to_i,
                  'dequeue'   => ele.attributes["dequeueCount"].to_i,
                  'date'      => date
        }

        results << entry
      end
    end
    results
  end

end
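If you want to check the parsing without a running broker, the same REXML traversal can be pointed at a hardcoded string. This is only a sketch: the sample XML is an assumed shape, inferred from the element and attribute names the monitor reads (queues/queue, stats, size, consumerCount, enqueueCount, dequeueCount), so real broker output may contain more than this. parse_queue_stats is a hypothetical helper, not part of ActiveMQ.

```ruby
require 'rexml/document'

# Assumed shape of queues.jsp output, inferred from the attributes the
# monitor reads; a real broker's output may include additional data.
SAMPLE_XML = <<~XML
  <queues>
    <queue name="pubsub.pings.spider">
      <stats size="42" consumerCount="2" enqueueCount="1000" dequeueCount="958"/>
    </queue>
    <queue name="some.other.queue">
      <stats size="0" consumerCount="1" enqueueCount="5" dequeueCount="5"/>
    </queue>
  </queues>
XML

# Hypothetical helper: same traversal as the monitor's query, minus HTTP.
def parse_queue_stats(xml, wanted)
  doc = REXML::Document.new(xml)
  results = []
  doc.elements.each('queues/queue') do |queue|
    name = queue.attributes['name']
    next unless wanted.include?(name)
    queue.elements.each('stats') do |stats|
      results << {
        'name'      => name,
        'size'      => stats.attributes['size'].to_i,
        'consumers' => stats.attributes['consumerCount'].to_i,
        'enqueue'   => stats.attributes['enqueueCount'].to_i,
        'dequeue'   => stats.attributes['dequeueCount'].to_i
      }
    end
  end
  results
end

p parse_queue_stats(SAMPLE_XML, ['pubsub.pings.spider'])
```

Queues that aren't in the wanted list are skipped, just as the monitor ignores queues you haven't added.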

You can then start monitoring your queue servers with something like this:

#!/usr/bin/env ruby

require_relative 'queuemonitor'

MONITOR_INTERVAL = 30 # seconds between polls

# flush each line immediately so redirected output lands in the file
$stdout.sync = true

monitor = QueueMonitor.new
monitor.url = "http://67.202.41.64:8161/admin/xml/queues.jsp"
monitor.addqueue "pubsub.pings.spider"

# Do something interesting here, like write to a file,
# the screen or a database.
def report(monitor)
  monitor.query.each do |q|
    puts "#{q['date']} #{q['name']} #{q['size']} #{q['consumers']}"
  end
end

# You'd want to use daemonize here...

loop do
  report(monitor)
  sleep MONITOR_INTERVAL
end

It works pretty well. Right now I’m just writing the results to a file to parse later, but the same data could drive another part of a program that automatically spins up new workers under certain conditions. The algorithm for that is the hard part and will depend on a bunch of your own rules.
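As one possible starting point, here’s an entirely hypothetical rule sketched in Ruby: compare two samples of the hashes the monitor returns for the same queue, estimate whether the backlog is growing, and suggest how many extra workers to start. It assumes the counts are integers (call .to_i on the raw attribute strings if they aren’t) and that 'date' is a Time. The thresholds max_backlog and per_worker, and both helper names, are made up for illustration.

```ruby
# Net change in backlog per second between two samples of one queue.
def backlog_growth_per_sec(prev, curr)
  seconds = curr['date'] - prev['date']
  return 0.0 if seconds <= 0
  arrived = curr['enqueue'] - prev['enqueue']
  handled = curr['dequeue'] - prev['dequeue']
  (arrived - handled) / seconds.to_f
end

# Suggest one extra worker per `per_worker` messages of backlog above
# `max_backlog`, but only while the backlog is still growing.
def extra_workers(prev, curr, max_backlog: 100, per_worker: 50)
  growth = backlog_growth_per_sec(prev, curr)
  excess = curr['size'] - max_backlog
  return 0 if growth <= 0 || excess <= 0
  (excess.to_f / per_worker).ceil
end

t = Time.now
prev = { 'size' => 120, 'enqueue' => 1000, 'dequeue' => 880, 'date' => t }
curr = { 'size' => 260, 'enqueue' => 1300, 'dequeue' => 1040, 'date' => t + 30 }
puts extra_workers(prev, curr) # prints 4
```

In that example 300 messages arrived and 160 were handled in 30 seconds, so the backlog is growing; 160 messages over the limit at 50 per worker rounds up to 4. Waiting for growth, not just size, keeps a short burst that workers are already draining from triggering new instances.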


15
Jul 08

Feature as a Service

Websites have gone from hand-typed static pages to massive applications with every feature under the sun. Most applications have some secret sauce that does magical things in the background – whether that be the ability to handle massive volume, reduce the barrier to entry into a market, or just keep users engaged with an endless stream of quick, short updates.

Take Amazon as an example. Amazon operates its environment as a bunch of different groups, each running different services within the same company: S3, EC2, Payment Services. They’re all independent, highly scalable functions, tied together in the application we call Amazon.com.

Companies and startups are starting to break this operational model open and put those individual functions online for everyone. They’re building services that do something really well – or rather that do one thing really, really well. They’re companies that focus on a specific function or feature and are open enough that creative people can say “I’m going to take this, this, and this, mix it in a pot and voila!”

Do you want to build your own Twitter? Find an SMS gateway, Cloud Computing Host and XMPP service provider.

Do you want to build an interesting RSS/Atom service? Find an RSS aggregator service and pour on some glue – see what sticks.

It’s a Feature-as-a-Service world (to use an already overused description). Eventually cloud companies will realize that doing one thing really *really* well is tremendously valuable. Why does everyone have to build their own DNS service? Why does everyone have to build their own hosting system? What about enterprise storage, authentication, SMS gateways, massively scalable XMPP services? How come I have to do that myself? Can 10,000 messages sent through a Jabber server be worth a dollar? I think they can (maybe the math needs adjusting, but you get my point). We’re all really just building a massive computer called the internet, only with each big trend we replace ‘The Internet’ with something else. First it was ‘The Web’, then it was ‘Web 2.0’, and now it’s ‘The Cloud’. The fact of the matter remains: the further along we go, the more tightly knit the internet becomes, and that means there’s an opportunity for programmable white-label services to propel us further and faster.