“The world is diverse. Act accordingly.”
—Prof. Dr. Stefan Edlich, in his talk on object databases.
Where do you store your data? In a relational database, of course. It’s so convenient to use the persistence store we are used to, which has been there for us since the day we started programming. But in the spirit of using the right tool for the job, and making our lives easier, it pays to know other persistence stores: those which aren’t based on the RDBMS/SQL paradigm. They promise to be better suited to some of the problems we face day to day, among them mapping the real world to persistent storage, scaling, and reliability.
The NoSQL meetup in Berlin gave a great overview of this active and growing scene, and shed some light on the characteristics of the main tools. Here are some rough notes from the meetup. For the full monty, all videos and slides of the talks are available at the NoSQL Berlin website. Thanks guys for the perfect organization!
Consistency in Key-Value Stores (Monika Moser)
The only talk which wasn’t about a specific database. It gave an introduction to the problems and solutions that we face when working with many database servers (nodes). Since the written data has to be distributed across many physical machines, there will be a noticeable delay until every node has received the updated data: the replication lag. Only once the replication lag has passed do all nodes contain the same (= consistent) data.
Two types of consistency were distinguished: Strong consistency (updated data is immediately available to all processes in the system) and eventual consistency (at some point in time all processes will get the update).
Strong consistency is usually expensive to implement on larger systems and isn’t always necessary, so eventual consistency is often acceptable. Depending on the use case, one can go for one of these subtypes of eventual consistency:
“read your writes” consistency
The process that wrote the data will always get the latest data. Other processes may still get old data for some time.
session consistency
A special case of the above: only the session that wrote the data is guaranteed to get the latest data immediately.
monotonic read consistency
Once a process has read the new data, all of its following reads get it as well. So once that process has seen the new data, the old data doesn’t appear to it again.
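To make these guarantees a bit more tangible, here is a minimal sketch in Python. It is not from the talk: the names `Replica`, `Cluster` and `Session` are made up for illustration, there is a single primary, and replication only happens when `replicate()` is called, which stands in for the replication lag. A session remembers the highest version it has written or read and refuses to read from a replica that is behind it, which gives "read your writes" and monotonic reads for that session.

```python
class Replica:
    """One node holding a versioned copy of a single value."""

    def __init__(self):
        self.version = 0
        self.value = None


class Cluster:
    """Toy cluster: replica 0 is the primary, the others lag behind."""

    def __init__(self, size):
        self.replicas = [Replica() for _ in range(size)]
        self.pending = []  # (replica index, version, value) not yet delivered

    def write(self, value):
        # The write is applied on the primary immediately ...
        primary = self.replicas[0]
        primary.version += 1
        primary.value = value
        # ... and only queued for the other replicas (replication lag).
        for index in range(1, len(self.replicas)):
            self.pending.append((index, primary.version, value))
        return primary.version

    def replicate(self):
        # Deliver all queued updates; afterwards every replica is consistent.
        for index, version, value in self.pending:
            replica = self.replicas[index]
            if version > replica.version:
                replica.version, replica.value = version, value
        self.pending.clear()


class Session:
    """Client session that enforces read-your-writes and monotonic reads."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.seen = 0  # highest version this session has written or read

    def write(self, value):
        self.seen = max(self.seen, self.cluster.write(value))

    def read(self, index):
        replica = self.cluster.replicas[index]
        if replica.version < self.seen:
            # This replica is staler than what we already know:
            # fall back to the most up-to-date replica instead.
            replica = max(self.cluster.replicas, key=lambda r: r.version)
        self.seen = max(self.seen, replica.version)
        return replica.value
```

A session that has written a value gets it back even from a lagging replica, while a fresh session reading the same replica may still see the old state until `replicate()` has run.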
Monika went on to describe the CAP theorem (choose 2 of 3 for your storage setup: partition tolerance, availability, consistency), the reasons strong consistency is expensive, and the Paxos algorithm (good trade-off between fault tolerance and consistency). See the slides and video for details!
My personal summary of the talk: good overview with lots of pointers to further info. I’ll look into the details I didn’t grasp once I actually need them.
Redis, Fast and Furious (Mathias Meyer)
Redis is awesome, I heard someone say.
Oh … Redis is also like memcached, but with extra features: persistence, additional commands (increment values, sets, push/pop, sorting) and a simple text-based protocol. It is also slower than memcached, but not by so much that you would care.
According to Mathias, Redis is put to good use when storing statistical data (as long as it fits in memory!) and implementing worker queues.
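The two use cases, counters for statistics and a worker queue, can be sketched without a running server. The following Python snippet is only an in-memory stand-in, not a Redis client: `TinyStore` and `run_worker` are hypothetical names, and the three methods merely mimic the behaviour of the Redis commands INCR, LPUSH and RPOP.

```python
from collections import defaultdict, deque


class TinyStore:
    """In-memory stand-in that mimics three Redis commands (not a real client)."""

    def __init__(self):
        self.counters = defaultdict(int)
        self.lists = defaultdict(deque)

    def incr(self, key):
        # Like INCR: increment a counter and return the new value.
        self.counters[key] += 1
        return self.counters[key]

    def lpush(self, key, value):
        # Like LPUSH: add to the head of a list, return the new length.
        self.lists[key].appendleft(value)
        return len(self.lists[key])

    def rpop(self, key):
        # Like RPOP: remove and return the tail of a list, or None if empty.
        queue = self.lists[key]
        return queue.pop() if queue else None


def run_worker(store, queue, handler):
    """Drain the queue, oldest job first, and count the processed jobs."""
    processed = 0
    while True:
        job = store.rpop(queue)
        if job is None:
            break
        handler(job)
        store.incr("jobs:processed")  # statistics counter
        processed += 1
    return processed
```

Producers push at the head and the worker pops from the tail, so jobs are handled in the order they were enqueued.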
Peer-to-peer Applications with CouchDB (Jan Lehnardt)
Jan contradicted himself on the first slide. It read: “Relax.” Then he started a 10_000 WPM (words per minute) presentation that still managed to raise my interest in CouchDB again. The presentation was about the “what can it do” instead of “how to do it”. Good choice to go this way.
a nice explanation: “CouchDB is built ‘of the Web’”. REST, JSON and HTTP are core technologies of the database.
Learning curve: store full documents, not relations (JSON). No data normalization into tables => make developers happy, not computers.
meant to be robust: append-only design for the database file. on crash, old data is not damaged.
scales out (horizontally). Does Master-Master replication. No scaling built in, but prepared (use couchdb-lounge). Then a scaled CouchDB cluster looks like a single DB from the outside.
scales down (runs on small devices). Own your own data, take it with you on your device.
incremental map-reduce: after updates, only the affected documents get reindexed
as with any document-oriented database, store full documents as JSON, not relations. Good tip in the Q&A: a document is something that will be updated and used as a whole. “put stuff into separate documents when it is updated separately”. There’s no clear guideline, however; it depends on the use case.
RESTful HTTP: “text-based protocol is not slower than binary” / “all HTTP infrastructure and tools can be used”
BBC uses CouchDB in production, after a survey/comparison of storage solutions.
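The incremental map-reduce point from the CouchDB notes above is easier to picture with a toy example. The Python sketch below is not how CouchDB implements its views; `IncrementalView` and `tags_of` are made-up names. It only shows the idea: when a document changes, its old contribution is retracted and just that one document is mapped again, instead of re-running the map function over everything.

```python
from collections import Counter


class IncrementalView:
    """Count-per-key view that re-maps only the documents that changed."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.emitted = {}        # doc id -> keys emitted by that document
        self.totals = Counter()  # reduced result: key -> count
        self.map_calls = 0       # how often the map function actually ran

    def update(self, doc_id, doc):
        """Insert or replace a document; pass None as doc to delete it."""
        # Retract whatever the previous version of this document emitted.
        for key in self.emitted.pop(doc_id, []):
            self.totals[key] -= 1
            if self.totals[key] == 0:
                del self.totals[key]
        if doc is None:
            return
        # Map only the changed document and fold it into the totals.
        keys = list(self.map_fn(doc))
        self.map_calls += 1
        self.emitted[doc_id] = keys
        for key in keys:
            self.totals[key] += 1


def tags_of(doc):
    """Example map function: emit every tag of a document."""
    for tag in doc.get("tags", []):
        yield tag
```

With two documents stored and one of them updated, the map function has run three times in total, not four: the untouched document is never looked at again.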
document-oriented DB like CouchDB. “Riak combines a decentralized key-value store, a flexible map/reduce engine, and a friendly HTTP/JSON query interface to provide a database ideally suited for Web applications.”
100% awesome. Though disputable, even more awesome than CouchDB.
the Riak “Data-Sphere” consists of: Bucket x Key x Document
GET/POST/PUT /jiak/<bucket>/<key>
travel the graph/links between documents with map/reduce
but: travelling links is expensive (no caching of map/reduce result, although possible to implement it yourself)
Bucket: can have as many keys as you want
chainable map/reduce stages: a unique feature of Riak
“It is extensible and configurable in many ways. Riak is a perfect fit for building reliable and scalable custom data storage systems.”
unfortunately my brain went offline through the second half of the talk … See the video
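As a rough illustration of the bucket/key/document model and of following links between documents, here is a toy sketch in Python. It does not use Riak or its API; `ToyStore`, `put` and `walk` are hypothetical names, and the link walk is a plain loop rather than a map/reduce job.

```python
class ToyStore:
    """Documents addressed by (bucket, key), each with links to other documents."""

    def __init__(self):
        self.docs = {}  # (bucket, key) -> {"value": ..., "links": [...]}

    def put(self, bucket, key, value, links=()):
        self.docs[(bucket, key)] = {"value": value, "links": list(links)}

    def walk(self, bucket, key, hops):
        """Follow links for the given number of hops; return the addresses reached."""
        frontier = [(bucket, key)]
        for _ in range(hops):
            reached = []
            for address in frontier:
                doc = self.docs.get(address, {"links": []})
                for target in doc["links"]:
                    if target not in reached:
                        reached.append(target)
            frontier = reached
        return frontier
```

Every hop has to fetch all documents of the previous step, which hints at why travelling links gets expensive when nothing is cached.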
MongoDB (Mathias Stearn)
Quote: “I won’t tell you MongoDB is awesome. But I hope you’ll know it is after my talk.”
Mongo as in “HuMONGOus” scaling
Schemaless; data organized into Databases and Collections (like tables). But document-oriented (not a K/V store)
Good when you don’t know up front what you will be looking for (example: logfile analysis), and want to store everything.
extended JSON with data types Date, Int32/64, OID, Binary and called it BSON. B as in binary.
Wants to integrate with the native language as well as possible. I.e. “db.users.find({$where: "this.a + this.b >= 42"})” instead of “RestClient.get 'http://example.com/resource'”. And, btw. old-school C++.
changing only part of a document is possible. features: $set, $inc, $push, $pull, $remove for subdocuments
“you can put all data in one place, MongoDB”. Get rid of RDBMS.
works for 1 billion documents.
map/reduce + finalizers.
uses the eventual consistency model (see first talk)
uses MMAP database files (OS kernel) to automatically use available RAM
async modifications: no server response, client doesn’t wait. good for bulk inserts.
good for: websites, complex objects, high and low volume sites, real-time analysis.
bad for: complex transactions, business intelligence
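To show what changing only part of a document means, here is a small Python sketch that applies a few of the update operators mentioned above to a plain dictionary. It is not the MongoDB driver and covers only flat fields; `apply_update` is a made-up helper, and `$remove` is left out because I am not sure of its exact semantics.

```python
import copy


def apply_update(doc, update):
    """Return a copy of doc with $set, $inc, $push and $pull applied (flat fields only)."""
    result = copy.deepcopy(doc)
    # $set: overwrite or create a field
    for field, value in update.get("$set", {}).items():
        result[field] = value
    # $inc: add a number to a field, starting from 0 if it is missing
    for field, amount in update.get("$inc", {}).items():
        result[field] = result.get(field, 0) + amount
    # $push: append a value to an array field
    for field, value in update.get("$push", {}).items():
        result.setdefault(field, []).append(value)
    # $pull: remove all occurrences of a value from an array field
    for field, value in update.get("$pull", {}).items():
        if field in result:
            result[field] = [item for item in result[field] if item != value]
    return result
```

The point is that the client sends only the small update description, such as `{"$inc": {"logins": 1}}`, instead of loading, changing and saving the whole document.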
4th Generation Object Databases (Stefan Edlich)
these have been around for 15 years
“no impedance mismatch”. These DBs are very nice to work with in OOP: no disassembling objects into tables and reassembling them later. Just dump full objects, and load them again.
but: when refactoring code, DB has to change. No insulation between code and data.
looks as if they are alive and well in a specific niche (extremely large datasets)
typical applications: transportation networks, tree structures, social graphs, object traversal, capture space (grok this!). He gave an example of one OOD application which stores 3.2 million objects per second, 1TB of data per hour.
no convincing answer as to why OO databases haven’t entered the mainstream while OO programming has; it sounds like such a good idea. My impression was they are a great tool for specific uses (high performance, huge scale), but exotic and commercial solutions with high up-front investment.
Somehow I wonder if document-oriented databases will make it, when object-oriented DBs haven’t …
A talk not held: Neo4j – The Benefits of Graph Databases
There was no talk on GraphDBs, which are designed to store nodes and the relationships between them. As in social networks (nodes = people, relationships = connections/friendships). Slideshare to the rescue, it has Neo4j – The Benefits of Graph Databases. There also was a talk on Neo4j at NoSQLEast.
Update: added links to conference-related stuff at the bottom of this post.
Some notes, roughly chronological, left in draft state.
Rails developers usually don’t separate the data access layer from the domain model.
This can constrain how easily the domain model can be changed. If the two are separated, saving/loading and validating data is the job of the DAOs, and “the interesting stuff” (business logic) lives in the model objects.
Q: how do you develop a domain model? A: this should be explained in Analysis Patterns
SASS and lesscss are nice extensions to CSS. They need a processing step to produce the CSS, however.
at least three German-speaking universities now have courses where they use Rails (Bremen, Potsdam, Salzburg).
Refactor vs. Rewrite. First, “find out the hard core of what the client actually needs”. Be brave and delete, change.
clients of “rescue mission” projects didn’t get what they wanted from their last dev shop. The time and money reserved for the project are usually already spent, so they are in a hurry. => as a dev team, you need to show progress as early as possible.
do the agile thing as well—prioritize by business need
Don’t change code that you don’t like but which works well. Overcome your own prejudice and deal with the client’s money responsibly. Part of being professional, imho. Resist the Not invented here syndrome. Especially if the code is well tested. You can always refactor it when you continue to work in that area.
don’t dive into removing complexity as a first refactoring step. Look for easy targets first.
Watch team morale on legacy code projects. Always pair.
Read the Refactoring book before starting, and really apply the techniques step by step when doing non-trivial stuff. Always keep the application running while changing structure.
When coding normal apps, refactor as you go. Don’t see it as a separate activity, and don’t specially reserve time for it.
always manage your client’s expectations. Underpromise, overdeliver.
JRuby has by far the best compatibility of the alternative Ruby implementations. It has an extensive test suite.
It allows you to change between 1.8 and 1.9 with a command-line switch.
ActiveRecord via JDBC is slow.
JRuby is the only Ruby implementation with real native threads.
Rack allows inserting code before and after the application handles a request. And allows plugging together different frameworks and components, and access session data from one in the other via Rack::Session. “Middleware” examples: Rack::Profiler, Rack::MailExceptions, Rack::Cache.
Rails 3 release: “could roll it up and ship” any time. Rails development has always been like that. There’s never a “Todo” list of what will go into a release.
They will do so when they feel they have done enough. But at least one thing Yehuda would like to do is get ActionMailer on the rewritten ActionController code.
to introduce new technologies in places reluctant to change, first do ugly or boring stuff no one wants to do anyhow. With Ruby that could be: automate manual processes, write a test tool, small internal applications, quickly build prototypes, wire together systems. Realize that Ruby is perfect for glue code. Introduce the techniques (agile), not only the technologies.
A couple of experienced people fear that the new JVM scripting languages (Clojure, Scala, ...) may stop the stream of Java defectors to Ruby.
CouchFoo is intended to allow smooth ActiveRecord/RDBMS => CouchDB migration. This is a good first step to get on the couch. Then you can start wrapping your head around how to persist stuff with document-oriented databases, which I find the hardest part. “Performance tuning” of CouchDB is a whole new topic to be discovered.
With CouchDB, the cost of index updates is incurred at read time, not at write time as with an RDBMS. Index updates at read can be suppressed with :update => false. Read CouchFoo::Base for performance info.
#bulk_save for performance.
a good use for document-oriented DBs is when the data structure changes often and future “schema” development is unpredictable.
CouchFoo generates views for simple AR-style finders on the fly. Nice!
Dr Nic once more proved to be the best Rails entertainer (_why is in his own league, of course, but wasn’t present to present).
the i18n gem has great new features in 0.2.0 and edge: pluggable extensions, translation procs, advanced pluralization rules (implemented with procs), translation fallbacks, backend fallbacks, etc. Using it with current Rails requires a hack, however. See the Unicode CLDR Project for a massive amount of localization information.
Globalize 1 happily overused metaprogramming, had to hack into Rails big-time, and as such is a PITA to migrate to the new Rails i18n. Any solutions?
Kasabian kick Oasis’ ass on stage (according to London press).
Rough Trade in Brick Lane reminds you what’s cool about a real-world record store.
LBI has 400 employees, a large terrace where you can work, and friendly people doing lots of barbecues.
ExtJS is a useful rich client library with nice client-server data transportation, interface elements and data binding. It doesn’t have to look like Windows. It lacks a high-level architecture, though. It’s not free for commercial work (150 per developer), only for open source.
Food in London is better than expected; even the traditional (Lamb stew, Apple Crumble & custard). Girls are cuter than expected, as well.
London weather follows the same patterns as in Hamburg. Quick rains, lots of grey skies, sometimes sun. A bit warmer.
Kevin Davy played the trumpet for Lamb, on tracks like Merge. Today he has fun playing around with electronic effects at his Jazz gigs.
“Now wash your hands” (http://www.flickr.com/photos/phil76/3759350196/in/set-72157621719325175/) was a design agency that built cool stuff in their time. Today only toilets in Indian restaurants remind us of their glory.
Hashrocket has guest pairs regularly. You can visit them in Jacksonville, Florida, stay at their guest house, and pair with them on the regular work.
London is green, can be sunny and beautiful.
a taxi from Russell Square to Denmark Hill costs less than 20 pounds. Good if you’ve already spent the same amount on beer.
the mapping of the British pound shapes and sizes to their value is only obvious to the British themselves. They lovingly call the coins shrapnel.
Connascence can be seen as the underlying principle of many OOP design rules. And it’s a word that only Jim Weirich uses, so far.
Paul Campbell is the best storyteller of the Rails world.
sheepexchange.com is a new agile venture from Ireland you should be following.
Embedding video is easy in HTML 5, use the native <video> tag. Firefox 3.1+, Safari 4+ and Google Chrome (Windows) already have experimental support for it.
The HTML 5 spec does not specify which video format and codec should be used, so naturally ;-) the browser vendors have picked different formats. Getting videos to play cross-browser is still no problem, since the <video> tag can contain more than one source. So provide an Ogg and a QuickTime version of the video, and all browsers currently supporting the <video> tag will be happy.
Example
The code
<video controls width="320" height="240">
  <!-- Firefox 3.1+, Google Chrome -->
  <source src="/files/swiss.ogg" type="video/ogg">
  <!-- Safari 4+, Google Chrome -->
  <source src="/files/swiss.mov" type="video/quicktime">
  <!-- All others (including Internet Explorer and Opera) -->
  Sorry, your browser doesn't seem to support the <code>video</code> element.
</video>
How to create the video formats (OS X)
Easy. Use QuickTime Player’s export feature (File > Export). But first get the Xiph QuickTime Components to add .ogg support to QuickTime.
I just published my first gem on github, all_tweets_must_die. It will become the core of a web app that regularly deletes your old tweets automatically. If you want to use twitter but value your privacy, this thing is for you.
I’m building the whole thing for fun and exercise. I’ll try to use as much new stuff as possible: Sinatra for the web app, Cucumber for testing it, CouchDB for storing user auth and preferences, and (if I actually get a couple of users on the site) possibly a small Erlang or Scala program for tweet deletion, connected to Ruby with Thrift. Lots of cool new technology, a lot of buzz. I know :-)
Ubiquity is a Mozilla Labs add-on for Firefox. It’s a new way of interacting with the browser and web content. Imagine Quicksilver, but for everything that can be reached from the browser. Common examples are controlling the browser, translating text on a web page in place, looking up an address on Google Maps in place, you name it. It lets you throw out half of the other add-ons. Oh, and on the Mac, it integrates with Growl.
And it’s super-easy to extend with JavaScript!
Example – using is.gd to shorten URLs
I hacked this together using the pretty good documentation. The code takes the text selection (a URL) and shortens it via is.gd.
CmdUtils.CreateCommand({
  name: "is-gd",
  description: "Replaces the selected URL with a short URL generated with is.gd.",
  author: { name: "Phillip Oertel" },
  // the command takes one argument: arbitrary text, here the selected URL
  takes: {"url to shorten": noun_arb_text},
  execute: function(urlToShorten) {
    var baseUrl = "http://is.gd/api.php";
    var params = {longurl: urlToShorten.text};
    // ask is.gd for the short URL, then replace the selection with the response
    jQuery.get( baseUrl, params, function( shortUrl ) {
      CmdUtils.setSelection( shortUrl );
    });
  }
});
I embedded the above script into this page: if you installed the Ubiquity add-on, Firefox will notify you. Install the script, then select a URL anywhere on the page or in the location bar, press Alt-Space and type “is-gd”. The selected text will be replaced with a shortened URL. Neat!
They really have easy administration and extension in mind: go to chrome://ubiquity/content/cmdlist.html to get an in-browser interface to Ubiquity. You can write the scripts directly in there. Oh, and did I mention it already ships with jQuery?
I’m currently preparing a talk on Software Craftsmanship for Euruko 2009 in Barcelona. It’s the first time I’ve tried out a mind map for structuring everything related to a topic, and I really like it. By looking at the whole at once, you’re able to see new connections, redundancy, overlap, etc. And it’s more fun to draw stuff than to put everything in nested bullet lists.
I’ve also become a fan of Big Visible Charts, so I prefer creating it by hand instead of using a computerized tool like MindMeister. Don’t get me wrong, MindMeister is very cool. It’s just that I don’t need its advantages of collaboration, change tracking etc. since I’m working on my own.
Two learnings already:
always start with a really big sheet of paper on a big wall. You never know in which direction you’re gonna need more space! I failed with that, and it’s probably limiting me currently.
use stickies to dump ideas that come to your mind when you’re not really working on the preparation. You don’t waste much time drawing and sorting, but you don’t forget and can pick up or discard the ideas later.
By the way, the Euruko conference artwork is the best I have ever seen. It’s a beautiful blend of Barcelona’s omnipresent Gaudi tiles and color tones and the Ruby logo.
Have you consciously decided to use bash as your shell? I haven’t. How often do you use the shell? For me, it’s one of the tools I use most.
So while bash does the job, why not find out if there’s something better? I’ve heard zsh mentioned a couple of times, and it seems to be smarter and more modern, while maintaining good compatibility with bash. So I’ll give it a try.
A selection of zsh features
tab completion is programmable and depends on the command it is used with. examples:
cd<TAB> shows a list containing only directories
tab completion for options: `ls -<TAB>` shows a menu of available options, selectable with cursor
cd -<TAB> shows you a list of recently visited directories; select with the cursor where to go. Instead of pressing <TAB>, you could also have typed cd -2 to go to the 2nd last dir.
ssh<TAB> shows a list of all known hosts
kill<TAB> gives you a list of processes. another example: kill memca<TAB> finds the correct pid and inserts it
cd-less directory switching (just type name of directory)
all shell windows share one history, i.e. you can access a command from window A’s history in a new window B
can shorten the path shown in the prompt (“Resources/Styles/Marble $” instead of ”/Applications/Chess.app/Contents/Resources/Styles/Marble $”)
more powerful globbing. example: ‘ls **/*_helper.rb’ lists all *_helper.rb files in the current tree
temporary aliases for directories, using ~dirname:
~ $ work=`pwd`
~ $ cd /
/ $ cd ~work
~ $ pwd
/Users/phillip
bash-compatibility
optional prompt on the right-hand side
much more to be discovered … continued on my zsh cheatsheet
How to start using it on OS X
OS X already ships with zsh installed. Type zsh to try it out! It’s pretty dumb without proper configuration, so …
get a .zshrc config file, like from here for example. That page is also a good intro and contains more links to useful resources.
to use zsh permanently (OS X Leopard): go to System Preferences > User, click the lock and authenticate, right-click on your user and select “Advanced Options”, then select /bin/zsh. Open a Terminal. To check, type echo $SHELL.
if you like using Ctrl-A/Ctrl-E/Ctrl-K … to go to the start or end of your command or delete to the end of the line, you’ll need to add ‘bindkey -e’ to the .zshrc file.
A tip, independent of zsh: if you create a .hushlogin file in your home dir, the shell will not display the “Last login …” blah on startup.
Such a pleasant experience. It took less than ten minutes to enter, calculate and lay out this diagram. Having opened Numbers for the first time, without reading the documentation, searching for features, swearing. This is what working with computers should be like, all the time.