March, 2008


15
Mar 08

Open Virtual Machine Format

Open standardized protocols are what made the Web possible. We have standards all the way up the computational stack, from agreeing on which pins mean what in a wire, to what an X button means in a user interface. Companies that don’t embrace them are destined to isolate themselves on tiny technological islands. Specific implementations, however, don’t have to be shared and open. Huge markets with tiny verticals of implementation lock out competition, but they also prevent innovation. But sometimes something beautiful happens, and people get together to support a new kind of standard: an open, extensible standard that can be written and read by anyone. One place where this is just starting to happen is computer virtualization.

The Open Virtual Machine Format, or OVF, is a proposed universal format that aims to create a secure, extensible method of describing and packaging virtual containers. Because the standard is open, any environment that supports it can import and export those virtual machines between different hypervisor platforms. The current OVF specification includes definitions ranging from virtual machine metadata and disk format all the way to detailed hardware specifications and logical network information. It also lets the virtual machine itself get information from the hypervisor host, meaning that if you’re creative you could build some really nifty automated integration and deployment tools.

If that doesn’t mean much to you, then consider this: industry heavyweights like Dell, HP, IBM, Microsoft, VMware, and XenSource all took part in drafting the specification. As far as support tools go, VMware has published what appears to be the first OVF container creation tool, available here.

There IS a big problem with OVF right now, and a lot of bloggers and analysts out there are getting it wrong. OVF does not define a new virtual disk format; it is simply a wrapper around existing ones. This means that OVF support alone doesn’t enable you to drag and drop virtual machines between Xen and VMware. Some formats can be converted externally using tools; however, most of the current techniques involve booting up a system and running a migration tool to convert the image – not exactly ideal. OVF does include the ability to describe your specification in an HREF, which means that you could publish your spec and create a system that could modify containers on the fly.
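To make the "wrapper, not a disk format" point concrete, here is a rough Python sketch. The descriptor below is a simplified, made-up illustration: its element and attribute names are assumptions and do not follow the real OVF schema. The idea it shows is only that the package points at disk files and names their format, so the importing hypervisor still has to understand that format.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical descriptor. Element and attribute names are
# illustrative only and are NOT taken from the actual OVF schema.
DESCRIPTOR = """
<Envelope>
  <References>
    <File id="disk1" href="webserver-disk1.vmdk"/>
  </References>
  <Disks>
    <Disk fileRef="disk1" format="vmdk"/>
  </Disks>
</Envelope>
"""


def list_disks(xml_text):
    """Return (href, format) pairs for every disk the descriptor references."""
    root = ET.fromstring(xml_text.strip())
    files = {f.get("id"): f.get("href") for f in root.iter("File")}
    return [(files[d.get("fileRef")], d.get("format")) for d in root.iter("Disk")]


def can_import(xml_text, supported_formats):
    """The package is only usable if every referenced disk format is understood."""
    return all(fmt in supported_formats for _, fmt in list_disks(xml_text))
```

A host that only reads a different disk format would get `False` from `can_import` here, even though it can parse the wrapper itself.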

If VMware, Xen, and Parallels are technological islands, then OVF may one day be the bridge that will allow you to travel between them.

Update: It looks like OVF will be announced formally at the Catalyst 2008 conference. More information here.


13
Mar 08

Google Sky Online

Google Sky is online now – you don’t need Google Earth to browse the heavens. I expect this to be part of every science class curriculum from now on. Now that everyone has access to this kind of data (w/ zoom!), amateur astronomers can help catalog and identify phenomena. Never underestimate the power of the crowd!

Is nice.


11
Mar 08

Picture This – A Better Kind of CAPTCHA

In the defense against bots and automated scripts, people often employ the CAPTCHA strategy. Typically, a CAPTCHA is a series of letters that are not OCR-friendly. They stop computers because the letters are deformed using graphics transforms such as pinch, whirl and rotate, which are difficult for computers to undo, while people have an easier time understanding what’s there. Not that easy though. Oftentimes captchas are complicated and difficult to read. Captchas that use random strings of letters and numbers are a deterrent to signing up for a service – especially if you can’t get it right after a couple of tries. Some sites will generate sound files that tell you what to type in, others will simply use words and ask you to put them into the text box, but both approaches make it slightly easier for a bot to sign up for your service. Anyone who writes automated tasks can tell you that once you can automate interaction with any kind of service, it can be exploited for *some* purpose. Text-based captchas such as questions about history or arithmetic are cool, but after you get to a certain market share and have enough exposure, someone will automate answering those questions.

An alternative to text-based captchas is the picture-based captcha. Simply put, a picture captcha asks a user to identify objects in an image – which is MUCH harder for a computer. This isn’t a new technique, but it seems to be getting a little bit more market share. Microsoft Research (don’t say it) has Asirra, which is similar to (or copied from?) the much-hyped KittenAuth. HumanAuth is another implementation, as is ESP-PIX.

My favourite by far is HotCaptcha. It uses Hot or Not and requires you to select 3 images of ‘hot people’. Because we all tend to perceive beauty the same way (symmetry, proportions, etc.) it’s pretty well understood what ‘hot’ is. The 3 images are required because otherwise it would just be a 50/50 guess. The same goes with KittenAuth. You need multiple identifications in order to prove that you are indeed a human.
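Some quick arithmetic shows why multiple identifications matter. This is a back-of-the-envelope sketch in Python, not a description of how any of these services actually score answers. The 9-image grid in the second function is my assumption for illustration, and both functions model a bot that guesses purely at random.

```python
from math import comb


def pass_chance_independent(rounds):
    """Chance a random guesser passes `rounds` independent 50/50 choices."""
    return 0.5 ** rounds


def pass_chance_grid(grid_size, required):
    """Chance of picking exactly the right `required` images out of `grid_size`."""
    return 1 / comb(grid_size, required)


if __name__ == "__main__":
    print(pass_chance_independent(1))  # a single yes/no question
    print(pass_chance_independent(3))  # three yes/no questions in a row
    print(pass_chance_grid(9, 3))      # assumed grid: 3 correct images out of 9
```

One yes/no question lets a random bot through half the time. Three in a row cuts that to 1 in 8, and picking the right 3 out of an assumed grid of 9 cuts it to 1 in 84.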


All in all it’s cool stuff, and because user-registration pages suck, we should make them easier to use – maybe even fun.

-T


10
Mar 08

Cloud Economies – The Compute Time Black Market

Here’s an idea. A ‘black market’ for cloud computing. As with any market there are two parties – Buyers and Sellers. Buyers are looking to have big chunks of data processed, and sellers are looking to make the most use of their environment.

Buyers

At night I’d like to render a batch of 100 high-def videos to a DVD image, or perform some statistical analysis on files using a system like Hadoop. As this kind of user, my data isn’t time-sensitive. It was going to take a few hours to do anyway, so I might as well just have it in the morning. But I don’t want to maintain all those server instances, and besides I don’t need them all the time.

Sellers

I run a website that gets ridiculous amounts of traffic during the day. My environment requires many instances of web and database servers, all of which run at about 70% CPU utilization during the day. At night my servers lie mostly dormant, consuming only as many cycles as required to answer a quick burst in traffic, balanced across my many servers. To offset the costs of running a 24×7 operation I’d like to sell some of my compute time to the highest bidder. “50 GigaFlops – Starting bid is $20/hour”
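The "highest bidder" part could be as simple as the sketch below. Everything here is hypothetical: the `Offer` and `Bid` types, the field names, and the rule that a bid must meet the starting price are just my assumptions about how such a market might work.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    seller: str
    gigaflops: int
    starting_bid: float  # dollars per hour


@dataclass
class Bid:
    buyer: str
    price: float  # dollars per hour


def pick_winner(offer, bids):
    """Return the highest bid at or above the starting bid, or None if none qualify."""
    valid = [b for b in bids if b.price >= offer.starting_bid]
    return max(valid, key=lambda b: b.price, default=None)


if __name__ == "__main__":
    offer = Offer(seller="busy-website", gigaflops=50, starting_bid=20.0)
    bids = [Bid("render-farm", 22.0), Bid("stats-lab", 27.5), Bid("lowballer", 5.0)]
    print(pick_winner(offer, bids))
```

A real market would also need time windows, payment, and some way to trust the numbers a seller advertises, none of which is modeled here.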

Just run a package like Folding@Home or the SETI project and sell access. The system has to support saving state so you could resume a job that carried over the time window. You could then sell your CPU time to offset the cost of hosting your website.
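The "saving state" requirement is the interesting bit. Here is a minimal sketch of the idea, assuming the job is a list of work items and the checkpoint is a small JSON file. The `run_job` function, the file layout, and the summing "work" are all stand-ins I made up, not how Folding@Home or SETI actually do it.

```python
import json
import os


def run_job(items, checkpoint_path, budget):
    """Process up to `budget` items, resuming from a saved position.

    Returns True once every item has been processed.
    """
    state = {"next": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        # A previous time window already did part of the work.
        with open(checkpoint_path) as f:
            state = json.load(f)

    end = min(state["next"] + budget, len(items))
    for i in range(state["next"], end):
        state["total"] += items[i]  # stand-in for the real computation
    state["next"] = end

    # Save progress so the next window can pick up where this one stopped.
    with open(checkpoint_path, "w") as f:
        json.dump(state, f)
    return state["next"] == len(items)
```

Each call represents one rented time window: the seller's machine does `budget` items of work, writes the checkpoint, and the buyer can carry on tomorrow night, possibly on someone else's servers.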

I’m not quite sure how many people out there would need that kind of power, but I’m curious as to what people would pay for grid computing by the hour, and how you might effectively host it on an already existing environment.

