<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>World Wide Webber</title>
	<atom:link href="http://jim.webber.name/feed/" rel="self" type="application/rss+xml" />
	<link>http://jim.webber.name</link>
	<description>Jim Webber&#039;s Blog</description>
	<lastBuildDate>Sat, 18 May 2013 09:55:18 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>My New Book on Graph Databases</title>
		<link>http://jim.webber.name/2013/02/my-new-book-on-graph-databases/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=my-new-book-on-graph-databases</link>
		<comments>http://jim.webber.name/2013/02/my-new-book-on-graph-databases/#comments</comments>
		<pubDate>Thu, 28 Feb 2013 23:03:37 +0000</pubDate>
		<dc:creator>jim</dc:creator>
				<category><![CDATA[Books]]></category>
		<category><![CDATA[neo4j]]></category>
		<category><![CDATA[NOSQL]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://jimwebber.org/?p=718</guid>
		<description><![CDATA[Over the last few months as well as working on the day job developing Neo4j, Ian Robinson, Emil Eifrem and I have been working on writing a new book for O&#8217;Reilly that showcases the expressive power and technical capabilities of<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://jim.webber.name/2013/02/my-new-book-on-graph-databases/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
				<content:encoded><![CDATA[<p>Over the last few months as well as working on the day job developing <a href="http://neo4j.org">Neo4j</a>, <a href="http://iansrobinson.com">Ian Robinson</a>, <a href="https://twitter.com/emileifrem">Emil Eifrem</a> and I have been working on writing a new book for <a href="http://oreilly.com">O&#8217;Reilly</a> that showcases the expressive power and technical capabilities of graph databases. I&#8217;m really happy to announce that the fruits of that labour has now been released as a <strong>full</strong>, <strong>free</strong> early-access eBook, aptly named <em>Graph Databases</em>.</p>
<p><a href="http://graphdatabases.com"><img class="alignnone" alt="" src="http://graphdatabases.com/wp-content/uploads/2013/02/graphdatabases_v31.png" width="320" height="396" /></a></p>
<p>The cover of the book is an octopus, chosen because it looks a bit like a node in a graph connected via some relationships. The book&#8217;s available for <strong>free download</strong> from the <a href="http://graphdatabases.com">GraphDatabases.com</a> site, so head on over there and get downloading.</p>
]]></content:encoded>
			<wfw:commentRss>http://jim.webber.name/2013/02/my-new-book-on-graph-databases/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Neo4j: Facebook GraphSearch for the Rest of Us</title>
		<link>http://jim.webber.name/2013/02/neo4j-facebook-graphsearch-for-the-rest-of-us/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=neo4j-facebook-graphsearch-for-the-rest-of-us</link>
		<comments>http://jim.webber.name/2013/02/neo4j-facebook-graphsearch-for-the-rest-of-us/#comments</comments>
		<pubDate>Mon, 04 Feb 2013 12:31:07 +0000</pubDate>
		<dc:creator>jim</dc:creator>
				<category><![CDATA[neo4j]]></category>

		<guid isPermaLink="false">http://jimwebber.org/?p=681</guid>
		<description><![CDATA[The recent big announcement from Facebook was a search platform that provides answers to contextual questions. They&#8217;ve called it &#8220;Facebook Graph Search&#8221; which is a pretty big deal for those of us into graph computing since it moves the notion<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://jim.webber.name/2013/02/neo4j-facebook-graphsearch-for-the-rest-of-us/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
				<content:encoded><![CDATA[<p>The recent big announcement from <a href="http://facebook.com">Facebook</a> was a search platform that provides answers to contextual questions. They&#8217;ve called it &#8220;<a href="https://en-gb.facebook.com/about/graphsearch">Facebook Graph Search</a>&#8221; which is a pretty big deal for those of us into graph computing since it moves the notion of graphs from an interesting niche to centre stage for many developers.</p>
<p>The key aspect of Facebook&#8217;s Graph Search &#8211; at least from an external perspective &#8211; is that the quality of the answers stems from querying data <em>and</em> relationships, enabling applications to reason about multiple overlapping facts and, in turn, to return both discrete (path-centric) and probabilistic (weighted) information. Importantly, the ability to, for example, find friends from a particular city who like a certain food or movie genre is an important move beyond the large but simple graph of friends and likes and into rich semantic data. Naturally Facebook have applied this technology first to their core domain &#8211; social &#8211; and their platform now supports applications that connect us even better.</p>
<p>But Facebook isn&#8217;t alone in needing better insight into its domain. Whether our domain consists, like Facebook&#8217;s, of users, or of widgets, data centres or protein interactions, technologies like Graph Search could be hugely valuable for the rest of us too. And yet so few of us can hope to operate at the same operational and intellectual scale as Facebook. Hiring world-class graph boffins to build and run a platform like Graph Search is a non-starter for practically everyone except the handful of global technology giants.</p>
<p>And yet the technology already exists, in the form of graph databases like <a href="http://neo4j.org">Neo4j</a>, for business IT solutions, Web applications and APIs to take advantage of graph data. To demonstrate that point, let&#8217;s see how we could implement Facebook&#8217;s current social Graph Search features atop Neo4j. It won&#8217;t quite have the same operational scale as the Facebook implementation, since Neo4j has only(!) been used to store around half the Facebook graph to date, but it&#8217;ll show how Neo4j provides those same powerful graph features for the rest of us.</p>
<p>Let&#8217;s start at the beginning, by solving the kind of question that Facebook posed when it announced Graph Search: find all of the Sushi restaurants in New York that my friends like. However, since I don&#8217;t live in New York, nor am I keen on sushi, I&#8217;m going to localise it to my conditions here in London. Let&#8217;s see what happens when I try to find the curry houses in Southwark (the borough where I live) which my friends like.</p>
<p>Parsing the previous sentence carefully reveals the interesting intersection of several domains: social, geospatial, taxonomical, and so on. Each can be considered both as a graph in isolation and composed within a larger graph. The first domain is the familiar social graph where I&#8217;m simply looking for my friends, and (thankfully) find a few by following the relationships marked <em>FRIEND</em> to other people.</p>
<p><img class="alignnone size-large wp-image-692" alt="1" src="http://jimwebber.org/wp-content/uploads/2013/02/1-1024x185.png" width="550" height="99" /></p>
<blockquote><p>Note that here I&#8217;m using the diagrammatic shortcut of double-ended relationships to show reciprocal friendships. However in real life relationships aren&#8217;t always reciprocal, and so in Neo4j we explicitly model this with two relationships, which is expressed as (Jim)-[:FRIEND]-&gt;(Simon) and (Simon)-[:FRIEND]-&gt;(Jim) in Cypher, Neo4j&#8217;s query language.</p></blockquote>
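<p>To make that directionality concrete, here is a minimal Python sketch (plain in-memory data, not Neo4j&#8217;s API) that stores each <em>FRIEND</em> relationship as a directed (from, to) pair. The data is assumed for illustration, including a one-way friendship with a hypothetical person, Ann; a friendship only counts as mutual when both directed relationships are present.</p>
<pre>
# Hypothetical sample data: each FRIEND relationship is a directed
# (from, to) pair, so a reciprocal friendship needs two entries.
friend_edges = {
    ("Jim", "Simon"), ("Simon", "Jim"),
    ("Jim", "Kath"), ("Kath", "Jim"),
    ("Jim", "Ann"),  # hypothetical one-way friendship
}


def is_mutual(a, b):
    """True only when FRIEND relationships exist in both directions."""
    return (a, b) in friend_edges and (b, a) in friend_edges


print(is_mutual("Jim", "Simon"))  # True
print(is_mutual("Jim", "Ann"))    # False
</pre>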
<p>The next aspect to consider is the curry houses themselves, which can easily be represented by the following graph:</p>
<p><img class="alignnone size-large wp-image-693" alt="2" src="http://jimwebber.org/wp-content/uploads/2013/02/2-1024x226.png" width="550" height="121" /></p>
<p>We can see that these restaurants are indeed curry houses since they&#8217;re connected to a node representing that food category via <em>CUISINE</em> relationships. This arrangement acts as a kind of simple index or tag cloud, allowing us to find all the establishments offering a particular kind of cuisine (Neo4j&#8217;s indexes work here too, but I want to stay explicitly in the graph for now). It&#8217;s noteworthy that the restaurant &#8220;Indian Mischief&#8221; is easily identifiable as a vegetarian Indian restaurant by being connected to two such category nodes.</p>
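<p>As a rough illustration of a category node acting as an index, the Python sketch below groups restaurants by the category their <em>CUISINE</em> relationships point to. It is a plain in-memory model rather than Neo4j code, and the relationship data and the &#8220;Vegetarian&#8221; category name are assumptions made for the example.</p>
<pre>
from collections import defaultdict

# Hypothetical CUISINE relationships as (restaurant, category) pairs; the
# "Vegetarian" category name is assumed for illustration.
cuisine_edges = [
    ("Tandoori Nights", "Indian"),
    ("Babur", "Indian"),
    ("Indian Mischief", "Indian"),
    ("Indian Mischief", "Vegetarian"),
]

# Following the relationships backwards from each category node gives
# the set of restaurants "tagged" with that category.
by_category = defaultdict(set)
for restaurant, category in cuisine_edges:
    by_category[category].add(restaurant)

print(sorted(by_category["Indian"]))
# ['Babur', 'Indian Mischief', 'Tandoori Nights']

# A restaurant connected to both category nodes appears in both sets.
print(sorted(by_category["Indian"].intersection(by_category["Vegetarian"])))
# ['Indian Mischief']
</pre>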
<p>Now things get more interesting when we bring those domains together to see the curry houses my friends like, which is easily expressed with <em>LIKES</em> relationships between people and the restaurants they&#8217;ve enjoyed. In this case we can see that both Kath and I like <a href="http://www.tandoorinightsdulwich.co.uk/">Tandoori Nights</a>, Martin and Simon both like Babur (hat tip to the real-life <a href="http://martinfowler.com">Martin Fowler</a> &#8211; co-author of <a href="http://www.amazon.com/gp/product/0321826620/ref=as_li_ss_tl?ie=UTF8&amp;camp=1789&amp;creative=390957&amp;creativeASIN=0321826620&amp;linkCode=as2&amp;tag=jimwebbesblog-20">NoSQL Distilled</a> &#8211; and <a href="https://twitter.com/shs96c">Simon Stewart</a> &#8211; a Facebook employee no less &#8211; for recommending <a href="http://www.babur.info/">Babur</a>), and nobody explicitly likes Indian Mischief (I can&#8217;t vouch for it, since I&#8217;ve never eaten there, but it looks nice from the outside).</p>
<p><img class="alignnone size-large wp-image-694" alt="3" src="http://jimwebber.org/wp-content/uploads/2013/02/3-1024x441.png" width="550" height="236" /></p>
<p>Next we have the geospatial aspects of the domain, something you might not immediately recognise as a graph. However geospatial data is easily represented as a tree, and in our example we can see that our target restaurants are in the desired borough (though in different neighbourhoods). Furthermore, if I&#8217;d used an <a href="http://en.wikipedia.org/wiki/R-tree">R-Tree</a> (the canonical structure for spatial data) to represent bounding boxes rather than a simple hierarchy, we&#8217;d see that these neighbourhoods are actually close by.</p>
<p><img class="alignnone size-large wp-image-695" alt="4" src="http://jimwebber.org/wp-content/uploads/2013/02/4-1024x695.png" width="550" height="373" /></p>
<p>Finally we can bring the whole domain together ready to query it for detailed insight.</p>
<p><img class="alignnone size-large wp-image-696" alt="5" src="http://jimwebber.org/wp-content/uploads/2013/02/5-1024x765.png" width="550" height="410" /></p>
<p>Finding the Indian restaurants in Southwark that my friends like is a trivial Cypher query:</p>
<pre>START jim = node:node_auto_index(name='Jim'), 
      southwark = node:node_auto_index(borough='Southwark'),
      indian = node:node_auto_index(cuisine='Indian')
MATCH jim-[:FRIEND]-&gt;friend-[:LIKES]-&gt;restaurant-[:IN]-&gt;()-[:IN]-&gt;southwark, restaurant-[:CUISINE]-&gt;indian
WHERE friend-[:FRIEND]-&gt;jim
RETURN DISTINCT restaurant</pre>
<p>The <em>START</em> clause looks up some well-known nodes in the graph to act as starting points for our query. Under the covers it uses Neo4j indexing, but we can conveniently think of it as a naming server here, where it&#8217;s used to remember the node representing me (that is <em>name=&#8217;Jim&#8217;</em>), my borough (<em>borough=&#8217;Southwark&#8217;</em>) and the type of cuisine I&#8217;m interested in (<em>cuisine=&#8217;Indian&#8217;</em>). Each of these start nodes is bound to a name so they can be referred to in the rest of the query.</p>
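<p>If it helps to picture that naming server, the following Python sketch models an auto-index as a dictionary keyed by (property, value) pairs, with each lookup result bound to a name for later use. It is a simplified stand-in for illustration only, not how Neo4j implements <em>node_auto_index</em>, and the node ids are made up.</p>
<pre>
# Hypothetical nodes: made-up ids mapped to their properties.
nodes = {
    1: {"name": "Jim"},
    2: {"borough": "Southwark"},
    3: {"cuisine": "Indian"},
}

# Build a simple auto-index: (property, value) -- list of node ids.
auto_index = {}
for node_id, props in nodes.items():
    for key, value in props.items():
        auto_index.setdefault((key, value), []).append(node_id)


def lookup(key, value):
    """Return the ids of nodes whose property key equals value."""
    return auto_index.get((key, value), [])


# Bind the well-known start nodes to names, as the START clause does.
jim = lookup("name", "Jim")
southwark = lookup("borough", "Southwark")
indian = lookup("cuisine", "Indian")
print(jim, southwark, indian)  # [1] [2] [3]
</pre>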
<p>The <em>MATCH</em> clause is the heart of any Cypher query. Here we describe, using ASCII art, the patterns that we want the database to discover. In this <em>MATCH</em> clause, I&#8217;m first asking to match jim-[:FRIEND]-&gt;friend, which reads &#8220;match from the bound node jim those other nodes connected by an outgoing <em>FRIEND</em> relationship and bind them to the identifier <em>friend</em>.&#8221; Following on, I&#8217;m then asking to match where any friend of mine <em>LIKES</em> a restaurant somewhere in Southwark. That&#8217;s expressed as <em>friend-[:LIKES]-&gt;restaurant-[:IN]-&gt;()-[:IN]-&gt;(southwark)</em> where friend and restaurant are nodes matched in the graph, and the node southwark is bound to a particular node in the start clause. The other interesting piece of syntax in there is the pair of empty brackets (), which indicates an anonymous node that we&#8217;re not interested in naming for this query. We use that syntax here because we&#8217;re interested in any neighbourhood in Southwark but not in its specifics, so we don&#8217;t bother to name the nodes that will match on neighbourhoods in the graph. Finally, in the second part of the <em>MATCH</em> clause (after the comma) we specify that the restaurant matched must serve Indian food.</p>
<blockquote><p>Remember that Neo4j is a schemaless database. In this example I have safely inferred that certain nodes represent boroughs, restaurants, people and so on by the way they connect to the graph. However to be totally certain, it can be wise in some situations to check the contents of nodes too, particularly where the relationships in the domain tend to be very homogeneous.</p></blockquote>
<p>Following the match clause we have a <em>WHERE</em> filter that ensures we only accept recommendations from people who reciprocate our friendship. Since friendship is otherwise unilateral, this seems a sensible thing to do &#8211; it might be unwise to go to places recommended by people we like but who don&#8217;t return the favour.</p>
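<p>To see how the <em>MATCH</em> pattern and the <em>WHERE</em> filter combine, here is a hedged, in-memory Python re-creation of the query. The graph data (friends, likes, placeholder neighbourhood names and a &#8220;Vegetarian&#8221; category) is assumed from the figures rather than taken from the example repository, and every friendship is assumed to be reciprocated.</p>
<pre>
# Hypothetical re-creation of the sample graph: one set of directed
# (start, end) pairs per relationship type.
FRIEND = {("Jim", "Kath"), ("Kath", "Jim"), ("Jim", "Martin"),
          ("Martin", "Jim"), ("Jim", "Simon"), ("Simon", "Jim")}
LIKES = {("Jim", "Tandoori Nights"), ("Kath", "Tandoori Nights"),
         ("Martin", "Babur"), ("Simon", "Babur")}
IN = {("Tandoori Nights", "Neighbourhood A"), ("Babur", "Neighbourhood B"),
      ("Indian Mischief", "Neighbourhood B"),
      ("Neighbourhood A", "Southwark"), ("Neighbourhood B", "Southwark")}
CUISINE = {("Tandoori Nights", "Indian"), ("Babur", "Indian"),
           ("Indian Mischief", "Indian"), ("Indian Mischief", "Vegetarian")}


def out(rels, node):
    """End nodes of node's outgoing relationships (a linear scan, for clarity)."""
    return {end for start, end in rels if start == node}


def friends_liked_restaurants(me, borough, cuisine):
    results = set()  # a set, so a restaurant liked by several friends appears once
    for friend in out(FRIEND, me):
        # WHERE filter: skip friends with no FRIEND relationship back to me
        if me not in out(FRIEND, friend):
            continue
        for restaurant in out(LIKES, friend):
            # restaurant IN an anonymous neighbourhood IN the borough: two hops
            in_borough = any(borough in out(IN, area) for area in out(IN, restaurant))
            if in_borough and cuisine in out(CUISINE, restaurant):
                results.add(restaurant)
    return results


print(sorted(friends_liked_restaurants("Jim", "Southwark", "Indian")))
# ['Babur', 'Tandoori Nights']
</pre>
<p>Removing one direction of a friendship (for example Kath&#8217;s <em>FRIEND</em> relationship back to Jim) drops Tandoori Nights from the results, since only friends&#8217; <em>LIKES</em> relationships are considered, not Jim&#8217;s own.</p>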
<p>So far, so good. By adding an engaging user interface that understands natural language (like the one my colleague <a href="http://maxdemarzi.com/2013/01/28/facebook-graph-search-with-cypher-and-neo4j/">Max de Marzi wrote in a weekend</a>), we could declare functional equivalence with the fundamental Facebook Graph Search functionality. And yet with graphs it&#8217;s so very easy to keep adding dimensions to our data and to support increasingly sophisticated queries, just like Graph Search.</p>
<p>Since graphs allow us to explore any number of dimensions in a domain, separately or together, they provide exceptional expressive power and insight. Once you&#8217;re hooked on that expressive power, it&#8217;s easy to see how you can go much further with relatively little effort. We can trivially extend the Facebook examples to encompass other facts encoded in the graph. Whether that&#8217;s musical preference, language, job history or any other facet that we choose to store in the graph, we can query that structure efficiently and gain great insight.</p>
<p>Now, why not try it yourself? <a href="http://www.neo4j.org/install">Installing Neo4j</a> takes just a minute and is only a click away, and the source code for the examples in this post is available at <a href="https://github.com/jimwebber/graphsearchexample">my GitHub repo</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://jim.webber.name/2013/02/neo4j-facebook-graphsearch-for-the-rest-of-us/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Neo4j Koans update</title>
		<link>http://jim.webber.name/2012/12/neo4j-koans-update/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=neo4j-koans-update</link>
		<comments>http://jim.webber.name/2012/12/neo4j-koans-update/#comments</comments>
		<pubDate>Fri, 14 Dec 2012 14:18:02 +0000</pubDate>
		<dc:creator>jim</dc:creator>
				<category><![CDATA[neo4j]]></category>

		<guid isPermaLink="false">http://jimwebberorg.ipage.com/?p=582</guid>
		<description><![CDATA[Over the last couple of years, Ian Robinson and I (along with a little help from friends in the community) have been building a set of koans for learning Neo4j. These koans follow the same mantra as the Ruby Koans, where<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://jim.webber.name/2012/12/neo4j-koans-update/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
				<content:encoded><![CDATA[<p>Over the last couple of years, <a href="http://iansrobinson.com">Ian Robinson</a> and I (along with a little help from friends in the community) have been building a set of koans for learning <a href="http://neo4j.orf">Neo4j</a>. These koans follow the same mantra as the <a href="http://rubykoans.com/">Ruby Koans</a>, where you&#8217;re given a set of failing unit tests and as you make them green you learn something. For the Ruby Koans that learning is the Ruby language and idioms, for the Neo4j Koans you learn the <a href="http://api.neo4j.org/">Neo4j APIs</a> and <a href="http://docs.neo4j.org/chunked/stable/cypher-query-lang.html">Cypher query language</a> as a side-effect of making the unit tests go green.</p>
<p>Oh, and you learn lots about the TV show &#8220;<a href="http://www.bbc.co.uk/programmes/b006q2x0">Doctor Who</a>&#8221; too since that&#8217;s the (lovingly and extensively curated) data set that underlies the tests. But we&#8217;ve now decided to update the Neo4j Koans so if you make use of them, read on.</p>
<p>The Koans themselves, along with all the scripts required to turn them from &#8220;teacher&#8217;s version&#8221; into &#8220;students&#8217; version&#8221; and the associated dependency management and tool download actions, are staying in the public repository. These will form the basis of chapters in our next book project (since our current book project, <em>Graph Databases</em> (O&#8217;Reilly), is nearing completion). However, all the teaching materials (and in particular that big, slow-to-clone PowerPoint deck) are going away, since the teaching will be concentrated in the book itself rather than in slides. The deck will &#8211; of course &#8211; remain in version control history, but over time it will become inconsistent with the koans, so be careful if you use historical materials with current koans.</p>
<p>We don&#8217;t think this is going to inconvenience too many people, but if it&#8217;s going to affect you then don&#8217;t hesitate to <a href="http://jimwebber.org/contact/">reach out to me</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://jim.webber.name/2012/12/neo4j-koans-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>jimwebber.org is live again</title>
		<link>http://jim.webber.name/2012/11/jimwebber-org-is-live-again/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=jimwebber-org-is-live-again</link>
		<comments>http://jim.webber.name/2012/11/jimwebber-org-is-live-again/#comments</comments>
		<pubDate>Mon, 26 Nov 2012 16:26:41 +0000</pubDate>
		<dc:creator>jim</dc:creator>
				<category><![CDATA[Admin]]></category>

		<guid isPermaLink="false">http://jimwebber.org/?p=577</guid>
		<description><![CDATA[After a decade or so of being somewhat surreptitiously hosted by the formidable folks at Newcastle University (thanks guys, you&#8217;re awesome) my blog site finally had to move to a proper commercial provider. Following Savas&#8217;s lead, I&#8217;ve retired his trusty PBlog (ASP.NET!) engine and<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://jim.webber.name/2012/11/jimwebber-org-is-live-again/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
				<content:encoded><![CDATA[<p>After a decade or so of being somewhat surreptitiously hosted by the formidable folks at <a title="Newcastle University School of Computing Science" href="http://www.cs.ncl.ac.uk">Newcastle University</a> (thanks guys, you&#8217;re awesome) my blog site finally had to move to a proper commercial provider.</p>
<p>Following <a title="Savas Parastatids" href="http://savas.me">Savas</a>&#8217;s lead, I&#8217;ve retired his trusty PBlog (ASP.NET!) engine and moved to WordPress, mostly so that I can ask him for tech support when I can&#8217;t do mod_rewrite syntax :-)</p>
<p>Thanks to those of you (especially on Twitter) who took the time to tell me my blog was down; this is my hat tip to you to say it&#8217;s back up again. My apologies to those of you who&#8217;ve commented on my blog over the years: those comments are still languishing in a relational database (ha!) and I&#8217;ll port &#8216;em over on my next sick day.</p>
]]></content:encoded>
			<wfw:commentRss>http://jim.webber.name/2012/11/jimwebber-org-is-live-again/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Governments should not subsidise business through the working poor</title>
		<link>http://jim.webber.name/2011/08/governments-should-not-subsidise-business-through-the-working-poor/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=governments-should-not-subsidise-business-through-the-working-poor</link>
		<comments>http://jim.webber.name/2011/08/governments-should-not-subsidise-business-through-the-working-poor/#comments</comments>
		<pubDate>Fri, 26 Aug 2011 11:18:48 +0000</pubDate>
		<dc:creator>jim</dc:creator>
				<category><![CDATA[Politics]]></category>

		<guid isPermaLink="false">http://jim.webber.name/2011/08/26/d352bea8-cfe3-4499-b5e3-c2dc9142b54a.aspx</guid>
		<description><![CDATA[I don&#8217;t often blog about politics, but the decline of the UK exchequer and the shift of blame from the corporate greedy to the working needy have seen some fundamentally unfair and downright nasty pieces of legislation being progressed<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://jim.webber.name/2011/08/governments-should-not-subsidise-business-through-the-working-poor/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
				<content:encoded><![CDATA[<div>
<p>I don&#8217;t often blog about politics, but the decline of the UK exchequer and the shift of blame from the corporate greedy to the working needy have seen some fundamentally unfair and downright nasty pieces of legislation being progressed by our minority government.</p>
<p>The latest of these policies designed to help prop up the exchequer (which we all know is carrying enormous debt caused by bailing out the greedy and reckless, and is now being bullied by the same because of its debts) is the plan to cut council tax benefits by 10% and devolve the blame from central government to local government.</p>
<p>For those of you who are not familiar with the UK taxation system (and I&#8217;m no expert), council tax is a local tax paid by residents according to the value of the homes they live in, and funds local amenities like refuse collection. However, some people in the UK are on sufficiently low incomes that they receive benefits to help them pay their share of this tax.</p>
<p>Let me re-iterate that: <em>some people in the UK are so poor that the government steps in to help them pay their tax</em>.</p>
<p>On the surface, you could assert that the government is currently doing a fine thing by helping those with few means, and that by reducing the council tax help it gives to some of its most needy citizens it would be doing them quite a disservice. That&#8217;s an easy argument to make, especially since the relatively few recipients of this benefit need all the help they can get.</p>
<p>But the easy answer is not necessarily the right answer. I disagree with this benefit, not because I think it is unworthy, but because I think it places the burden in the wrong place. By subsidising a citizen&#8217;s tax obligations, the government is effectively subsidising employers and continuing to allow employment for far below a living wage (including the ability to meet the tax obligations that implies).</p>
<p>This is an outrageous situation where the government (on behalf of the citizens) subsidises corporations through beneficial corporate tax regimes, provides them with a stable and regulated market, educates and medicates their employees, and provides a relatively competent transport network to move their goods and services around. But by paying its own tax, the government is also effectively subsidising industry&#8217;s wage bill.</p>
<p>Governments shouldn&#8217;t be in the business of paying taxes to themselves. Governments should be in the business of provisioning services through taxes they collect. We must undo the moral and financial decay that has allowed business to free-ride on the taxpayer, force them to pay a living wage, and end this perversion where we subsidise the rich by using the working poor as a government-sponsored investment vehicle.</p>
<p>It&#8217;s time for business to pay their fair share.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jim.webber.name/2011/08/governments-should-not-subsidise-business-through-the-working-poor/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Graph Processing versus Graph Databases</title>
		<link>http://jim.webber.name/2011/08/graph-processing-versus-graph-databases/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=graph-processing-versus-graph-databases</link>
		<comments>http://jim.webber.name/2011/08/graph-processing-versus-graph-databases/#comments</comments>
		<pubDate>Wed, 24 Aug 2011 11:20:35 +0000</pubDate>
		<dc:creator>jim</dc:creator>
				<category><![CDATA[neo4j]]></category>
		<category><![CDATA[NOSQL]]></category>

		<guid isPermaLink="false">http://jim.webber.name/2011/08/24/66f1fb4b-83c3-4f52-af40-ee6382ad2155.aspx</guid>
		<description><![CDATA[There&#8217;s recently been a great deal of discussion on the subject of graph processing. For those of us in the graph database space, this is an exciting development since it reinforces the utility of graphs as both a storage and<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://jim.webber.name/2011/08/graph-processing-versus-graph-databases/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
				<content:encoded><![CDATA[<div>
<p>There&#8217;s recently been a great deal of discussion on the subject of graph processing. For those of us in the graph database space, this is an exciting development since it reinforces the utility of graphs as both a storage and a computational model. Confusingly however, processing graph-like data is often mistakenly conflated with graph databases because they share the same data model, yet each tool addresses a fundamentally different problem.</p>
<p>For example, graph processing platforms like <a href="http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html">Google&#8217;s Pregel</a> achieve high aggregate computational throughput by adopting the <a href="http://en.wikipedia.org/wiki/Bulk_synchronous_parallel">Bulk Synchronous Parallel</a> (BSP) model from the parallel computing community. Pregel supports large-scale graph processing by partitioning a graph across many machines and allowing those machines to efficiently compute at vertices using localised data. Localised information is exchanged only during synchronisation phases (cf. the BSP model). This gives Google the ability to process huge volumes of interconnected data, albeit at relatively high latencies, to gain greater business insight than with traditional (non-graph optimised) map-reduce approaches.</p>
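<p>To give a feel for the BSP model, here is a minimal single-process Python sketch of Pregel-style supersteps, using maximum-value propagation as the computation. It is not Pregel&#8217;s API: partitioning across machines is omitted, each loop iteration simply stands in for a superstep, and the inbox/outbox swap plays the role of the synchronisation barrier.</p>
<pre>
def pregel_max_value(initial_values, out_edges):
    """Propagate the maximum vertex value through a graph in BSP supersteps.

    Single-process simulation: each loop iteration is one superstep, and
    messages sent in a superstep are only delivered in the next one.
    Returns (final values, number of supersteps).
    """
    values = dict(initial_values)
    # Superstep 0: every vertex sends its current value to its out-neighbours.
    inbox = {v: [] for v in values}
    for v, value in values.items():
        for neighbour in out_edges.get(v, []):
            inbox[neighbour].append(value)
    supersteps = 1
    # Continue while any vertex has messages; a vertex with no messages
    # stays "halted" for that superstep.
    while any(inbox.values()):
        outbox = {v: [] for v in values}
        for v, messages in inbox.items():
            if not messages:
                continue
            # Compute locally using only this vertex's value and its messages.
            new_value = max(values[v], *messages)
            if new_value != values[v]:
                values[v] = new_value
                for neighbour in out_edges.get(v, []):
                    outbox[neighbour].append(new_value)
        # Barrier: swap mailboxes before the next superstep begins.
        inbox = outbox
        supersteps += 1
    return values, supersteps


# An undirected four-vertex chain a - b - c - d, stored as out-edges both ways.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(pregel_max_value({"a": 3, "b": 6, "c": 2, "d": 1}, chain))
# ({'a': 6, 'b': 6, 'c': 6, 'd': 6}, 4)
</pre>
<p>Each vertex only ever looks at its own value and the messages delivered to it, which is what lets a real BSP platform spread vertices across machines and exchange data only at the barriers.</p>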
<p>Sadly, few of us have Google-scale resources at our disposal to invent novel platforms on demand. In enterprise-scale scenarios, <a href="http://hadoop.apache.org/">Hadoop</a> (incidentally an implementation of Google&#8217;s earlier <a href="http://en.wikipedia.org/wiki/MapReduce">map-reduce framework</a>) has become a popular platform for batch processing large volumes of data. Like Pregel, Hadoop is a high-latency, high-throughput processing tool that optimises computational throughput by processing large volumes of data in parallel outside the database.</p>
<p>Unlike Pregel, Hadoop is a general-purpose framework, which means that while it can be used for graph processing, it&#8217;s not optimised for that purpose, nor are the underlying storage mechanisms HDFS (a distributed file system) and HBase (a distributed tabular database designed for large numbers of rows and columns) graph-oriented in nature (though interestingly the <a href="http://www.raveldata.com/goldenorb/">Ravel Golden Orb platform</a> claims to add a Pregel-like programming model above Hadoop).</p>
<p>What Pregel and Hadoop have in common is their tendency towards the data analytics (<a href="http://en.wikipedia.org/wiki/Online_analytical_processing">OLAP</a>) end of the spectrum, rather than being focussed on transaction processing. This is in stark contrast to graph databases like <a href="http://neo4j.org">Neo4j</a> which optimise storage and querying of connected data for online transaction processing (<a href="http://en.wikipedia.org/wiki/Online_transaction_processing">OLTP</a>) scenarios &#8211; much like a regular <a href="http://en.wikipedia.org/wiki/Relational_database_management_system">RDBMS</a>, only with a more expressive and powerful data model. We can visualise these differing capabilities easily as in the figure below:</p>
<p><a href="http://jimwebber.org/2011/08/graph-processing-versus-graph-databases/slide1/" rel="attachment wp-att-593"><img class="alignnone size-medium wp-image-593" alt="Slide1" src="http://jimwebber.org/wp-content/uploads/2011/08/Slide1-300x225.png" width="300" height="225" /></a></p>
<p>In this breakdown, Pregel is positioned firmly in the OLAP graph processing space, much as Hadoop is positioned in the general-purpose OLAP space (though closer to the OLTP axis because of recent advances in so-called <a href="http://www.slideshare.net/lusciouspear/real-time-bi-with-hadoop">real-time Hadoop</a>). Relational databases are positioned as general-purpose OLTP engines that can be somewhat adapted to OLAP needs. <a href="http://neo4j.org">Neo4j</a> has strong graph affinity and is designed primarily for OLTP scenarios, though as a native graph database with strong read-scalability, it can also be suited to OLAP work.</p>
<p>However, the Hadoop community continues to foster innovation in the area of graph processing, and there are regular announcements about how Hadoop can be adapted towards solving graph problems. Recently, Daniel Abadi <a href="http://dbmsmusings.blogspot.com/2011/07/hadoops-tremendous-inefficiency-on.html">publicised work from his team at Yale University on solving graph problems more efficiently with Hadoop</a>.</p>
<p>This work is novel empirical science and presents an important observation: by skilfully partitioning data in <a href="http://hbase.apache.org/">HBase</a> to exploit locality, (graph) computational throughput in Hadoop can be substantially increased. And yet casual observers of the <a href="http://nosql-database.org/">NOSQL</a> community could easily read this as heralding the demise of graph databases, which appear to have much more modest throughput. However, I don&#8217;t believe this is a valid comparison:</p>
<ul>
<li>Hadoop is a batch processing framework, and operates at high latencies compared to graph databases (even real-time Hadoop involves seconds of latency, compared to the millisecond scale at which <a href="http://neo4j.org">Neo4j</a> operates). The work done to improve graph processing through data locality means that batches will be executed more efficiently, and so throughput will be higher (or similar throughput will be achievable with fewer computational resources). Yet latency will remain comparatively high, so this approach is unlikely to be well-suited to the on-demand processing (OLTP) which is the mainstay of most applications, where data latency is more helpfully measured in milliseconds. Instead it is likely to remain firmly in the OLAP domain for the foreseeable future.</li>
<li>For generating regular reports from a data warehouse or pre-computing results, batch processing can be a sensible strategy, especially if it can be made efficient through laying out data carefully. Making this efficient comes at a cost, namely that data has to be denormalised within HBase, widening the cognitive gap between your data and how it is represented for processing. Conversely, <a href="http://neo4j.org">Neo4j</a> works in OLAP scenarios consistently with how it works in OLTP scenarios &#8211; your OLTP database is your OLAP database (usually a read slave, with the same data model). This means Neo4j doesn&#8217;t need denormalisation or special processing infrastructure, and it scales very well for large read queries such as reporting jobs, even under heavy and unpredictable online loads.</li>
<li>Batch-oriented approaches are best suited where data can be read and processed outside the database rather than manipulated in place. That is, efficiently processing static graph-like data (or triples) not only requires careful placement of data in HBase, but practically rules out mutating the graph during processing. In contrast, Neo4j supports in-place mutation of graphs, which is a more powerful tool for real-time Web analytics than (even efficiently processed) batches.</li>
</ul>
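<p>The locality observation can be illustrated with a toy count of how many edges straddle machines under two placements. One is a structure-blind round-robin placement, standing in for hashing node ids. The other keeps each tightly connected group on one machine. This is not the Yale team&#8217;s partitioning algorithm. The graph and the group labels are made up, and in practice the grouping has to come from a partitioner or from domain knowledge. Every cut edge is a traversal step that needs a network hop, which is the cost that careful placement avoids.</p>

```python
from itertools import combinations


def cut_edges(edges, placement):
    """Count edges whose two endpoints are stored on different machines."""
    return sum(1 for u, v in edges if placement[u] != placement[v])


def round_robin_placement(nodes, machines):
    # Structure-blind placement, similar in spirit to hashing node ids.
    return {n: i % machines for i, n in enumerate(nodes)}


def locality_placement(nodes, group_of, machines):
    # Locality-aware placement: nodes in the same group share a machine.
    groups = sorted({group_of[n] for n in nodes})
    machine_of_group = {g: i % machines for i, g in enumerate(groups)}
    return {n: machine_of_group[group_of[n]] for n in nodes}


# Two tightly knit clusters of four nodes joined by a single bridge edge.
nodes = list(range(8))
group_of = {n: n // 4 for n in nodes}
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]

naive = cut_edges(edges, round_robin_placement(nodes, 2))
local = cut_edges(edges, locality_placement(nodes, group_of, 2))
print(naive, local)  # 9 1  (only the bridge crosses machines when placed by group)
```
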
<p>Bringing all of these sentiments together, it&#8217;s clear that we&#8217;re looking at two different tools for two different sets of problems. The Hadoop-based solution is batch-oriented processing at high throughput, with correspondingly high latency and substantial denormalisation. The <a href="http://neo4j.org">Neo4j</a> approach emphasises OLTP native graph processing with real-time OLAP and more modest throughput at very low latency (milliseconds), and since work happens in the database it&#8217;s always consistent.</p>
<p>So if you need OLTP and deep insight (OLAP-style) in near real-time at enterprise scale, then <a href="http://neo4j.org">Neo4j</a> is a sensible choice. For niche problems where you can afford high latency in exchange for higher throughput, graph processing platforms like Pregel or Hadoop could be beneficial. But it’s important to understand that <em>they are not the same</em>.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jim.webber.name/2011/08/graph-processing-versus-graph-databases/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Square Pegs and Round Holes in the NOSQL World</title>
		<link>http://jim.webber.name/2011/04/square-pegs-and-round-holes-in-the-nosql-world/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=square-pegs-and-round-holes-in-the-nosql-world</link>
		<comments>http://jim.webber.name/2011/04/square-pegs-and-round-holes-in-the-nosql-world/#comments</comments>
		<pubDate>Thu, 21 Apr 2011 17:10:52 +0000</pubDate>
		<dc:creator>jim</dc:creator>
				<category><![CDATA[neo4j]]></category>
		<category><![CDATA[NOSQL]]></category>

		<guid isPermaLink="false">http://jim.webber.name/2011/04/21/e2f48ace-7dba-4709-8600-f29da3491cb4.aspx</guid>
		<description><![CDATA[The graph database space is a peculiar corner of the NOSQL universe. In general, the NOSQL movement has pushed towards simpler data models with more sophisticated computation infrastructure compared to traditional RDBMS. In contrast graph databases like Neo4j actually provide<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://jim.webber.name/2011/04/square-pegs-and-round-holes-in-the-nosql-world/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
				<content:encoded><![CDATA[<div>
<p>The graph database space is a peculiar corner of the NOSQL universe. In general, the NOSQL movement has pushed towards simpler data models with more sophisticated computation infrastructure compared to traditional RDBMS. In contrast graph databases like <a href="http://neo4j.org/">Neo4j</a> actually provide a far richer data model than a traditional RDBMS and a search-centric rather than compute-intensive method for data processing.</p>
<p>Strangely, the expressive data model supported by graphs can be difficult to understand amid the general clamour of the simpler-is-better NOSQL movement. But what makes this doubly strange is that other NOSQL database types can support limited graph processing too.</p>
<p>This strange duality, where non-graph stores can be used for limited graph applications, was the subject of a thread on the Neo4j mailing list, which was the inspiration for this post. In that thread, community members discussed the value of using non-graph stores for graph data, particularly since prominent Web services are known to use this approach (like Twitter&#8217;s <a href="https://github.com/twitter/flockdb">FlockDB</a>). But as it happens the use-case for those graphs tends to be relatively shallow &#8211; &#8220;friend&#8221; and &#8220;follow&#8221; relationships and suchlike. In those situations, it can be a reasonable solution to have information in your values (or document properties, columns, or even rows in a relational database) to indicate a shallow relation, as we can see in this diagram:</p>
<p><a href="http://jimwebber.org/2011/04/square-pegs-and-round-holes-in-the-nosql-world/attachment/1/" rel="attachment wp-att-596"><img class="alignnone size-medium wp-image-596" alt="1" src="http://jimwebber.org/wp-content/uploads/2011/04/1-300x94.png" width="300" height="94" /></a></p>
<p>At runtime, the application using the datastore (remember: that’s code <em>you</em> typically have to write) follows the logical links between stored documents and creates a logical graph representation. This means the application code needs to understand how to create a graph representation from those loosely linked documents.</p>
<p><a href="http://jimwebber.org/2011/04/square-pegs-and-round-holes-in-the-nosql-world/attachment/2/" rel="attachment wp-att-598"><img class="alignnone size-medium wp-image-598" alt="2" src="http://jimwebber.org/wp-content/uploads/2011/04/2-300x214.png" width="300" height="214" /></a></p>
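<p>Here is a minimal sketch of that application-side reification, in Python with a plain dict standing in for the document store. The document shape, the <code>friends</code>/<code>follows</code> field names and the <code>reify_graph</code> function are all hypothetical. The knowledge of which fields are links lives in application code, and that is the coupling at issue here.</p>

```python
# Hypothetical documents, as they might come back from a document or KV store.
documents = {
    "alice": {"name": "Alice", "friends": ["bob"], "follows": ["carol"]},
    "bob": {"name": "Bob", "friends": ["alice", "dave"]},
    "carol": {"name": "Carol"},
}


def reify_graph(docs, link_fields=("friends", "follows")):
    """Application code that rebuilds a graph from loosely linked documents.

    Which fields count as links is baked in here, so a change to the document
    layout means a change to this code as well.
    """
    graph = {}
    for doc_id, doc in docs.items():
        edges = []
        for field in link_fields:
            for target in doc.get(field, []):
                if target in docs:  # ids with no matching document are dropped
                    edges.append((field.upper(), target))
        graph[doc_id] = edges
    return graph


graph = reify_graph(documents)
print(graph["alice"])  # [('FRIENDS', 'bob'), ('FOLLOWS', 'carol')]
print(graph["bob"])    # [('FRIENDS', 'alice')]  (no "dave" document exists)
```
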
<p>If the graphs are shallow, this approach can work reasonably well. Twitter&#8217;s FlockDB is an existence proof of that. But as relationships between data become deeper and more interesting, this approach rapidly runs out of steam. It requires graphs to be structured early on in the system lifecycle (design time), meaning a specific topology is baked into the datastore and into the application layer. This implies tight coupling between the code that reifies the graphs and the mechanism through which they&#8217;re flattened in the datastore. Any structural changes to the graph now require changes to the stored data <em>and</em> the logic that reifies the data.</p>
<p><a href="http://neo4j.org/">Neo4j</a> takes a different approach: it stores graphs natively and so separates application and storage concerns. That is, where your documents have relationships between them, that&#8217;s the way they&#8217;re stored, searched, and processed in Neo4j, even if those relationships are very deep. In this case, the logical graph that we reified from the document store can be natively (and efficiently) persisted in <a href="http://neo4j.org/">Neo4j</a>.</p>
<p><a href="http://jimwebber.org/2011/04/square-pegs-and-round-holes-in-the-nosql-world/attachment/3/" rel="attachment wp-att-600"><img class="alignnone size-medium wp-image-600" alt="3" src="http://jimwebber.org/wp-content/uploads/2011/04/3-300x90.png" width="300" height="90" /></a></p>
<p>What&#8217;s often deceptive is that in some use-cases, projecting a graph from a document or KV store and using <a href="http://neo4j.org/">Neo4j</a> might begin with seemingly similar levels of complexity. For example, we might create an e-commerce application with customers and items they have bought. In a KV or document case we might store the identifiers of products our customers had bought inside the customer entity. In <a href="http://neo4j.org/">Neo4j</a>, we&#8217;d simply add relationships like <em>PURCHASED</em> between customer nodes and the product nodes they&#8217;d bought. Since <a href="http://neo4j.org/">Neo4j</a> is schema-less, adding these relationships doesn’t require migrations, nor should it affect any existing code using the data. The next diagram shows this contrast: the graph structure is explicit in the graph database, but implicit in a document store.</p>
<p><a href="http://jimwebber.org/2011/04/square-pegs-and-round-holes-in-the-nosql-world/attachment/4/" rel="attachment wp-att-602"><img class="alignnone size-medium wp-image-602" alt="4" src="http://jimwebber.org/wp-content/uploads/2011/04/4-300x197.png" width="300" height="197" /></a></p>
<p>Even at this stage, the graph shows its flexibility. Imagine that a number of customers bought a product that had to be recalled. In the document case we&#8217;d run a query (typically using a map/reduce framework) that grabs the document for each customer and checks whether that customer has the identifier for the defective product in their purchase history. This is a big undertaking if every customer has to be checked, though thankfully, because it&#8217;s an embarrassingly parallel operation, we can throw hardware at the problem. We could also design a clever indexing scheme, provided we can tolerate the write latency and space costs that indexing implies.</p>
<p>With <a href="http://neo4j.org/">Neo4j</a>, all we need to do is locate the product (by graph traversal or index lookup) and look for incoming <em>PURCHASED</em> relations to determine immediately which customers need to be informed about the product recall. Easy peasy!</p>
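<p>The difference between the two recall queries can be sketched in a few lines of Python. The data and the <code>TinyGraph</code> class are hypothetical stand-ins, not the Neo4j API. The document-style query has to inspect every customer, while the graph-style query starts at the product and visits only the relationships attached to it.</p>

```python
from collections import defaultdict

# Document-store style: each customer document embeds the ids it bought.
customers = {
    "c1": {"purchased": ["p1", "p2"]},
    "c2": {"purchased": ["p2"]},
    "c3": {"purchased": ["p3"]},
}


def recall_by_scan(customer_docs, product_id):
    """Checks every customer document, relevant or not."""
    return sorted(c for c, doc in customer_docs.items()
                  if product_id in doc["purchased"])


class TinyGraph:
    """Minimal in-memory stand-in for a graph store (not the Neo4j API)."""

    def __init__(self):
        self.incoming = defaultdict(list)  # node -> [(rel_type, start_node)]

    def relate(self, start, rel_type, end):
        self.incoming[end].append((rel_type, start))

    def incoming_from(self, node, rel_type):
        # Only the relationships attached to `node` are visited.
        return sorted(s for t, s in self.incoming[node] if t == rel_type)


graph = TinyGraph()
for customer, doc in customers.items():
    for product in doc["purchased"]:
        graph.relate(customer, "PURCHASED", product)

print(recall_by_scan(customers, "p2"))         # ['c1', 'c2']
print(graph.incoming_from("p2", "PURCHASED"))  # ['c1', 'c2']
```
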
<p>As the e-commerce solution grows, we want to evolve a social aspect to shopping so that customers can receive buying recommendations based on what their social group has purchased. In the non-native graph store, we now have to encode the notion of friends and even friends of friends into the store and into the logic that reifies the graph. This is where things start to get tricky, since we now have a deeper traversal from a customer to customers (friends) to customers (friends of friends) and then into purchases. What initially seemed simple is now starting to look dauntingly like a fully fledged graph store, albeit one we have to build ourselves.</p>
<p><a href="http://jimwebber.org/2011/04/square-pegs-and-round-holes-in-the-nosql-world/attachment/6/" rel="attachment wp-att-604"><img class="alignnone size-medium wp-image-604" alt="6" src="http://jimwebber.org/wp-content/uploads/2011/04/6-300x130.png" width="300" height="130" /></a></p>
<p>Conversely, in the <a href="http://neo4j.org/">Neo4j</a> case, we simply use the <em>FRIEND</em> relationships between customers. For recommendations, we traverse the graph across all outgoing <em>FRIEND</em> relationships (limited to depth 1 for immediate friends, or depth 2 for friends-of-friends), and then across outgoing <em>PURCHASED</em> relationships to see what they&#8217;ve bought. What&#8217;s important here is that it&#8217;s Neo4j that handles the hard work of traversing the graph, not the application code, as we can see in the diagram above.</p>
<p>But there&#8217;s much more value the e-commerce site can drive from this data. Not only can recommendations be based on what close friends have bought, but the e-commerce site can also start to look for trends and base recommendations on them. This is precisely the kind of thing that supermarket loyalty schemes do with big iron and long-running SQL queries &#8211; but we can do it on commodity hardware very rapidly using <a href="http://neo4j.org/">Neo4j</a>.</p>
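<p>For illustration, here is roughly the traversal the database performs on our behalf, written out in Python over two dictionaries that stand in for the <em>FRIEND</em> and <em>PURCHASED</em> relationships. It is a sketch under assumptions: the ranking by number of purchasers, the data layout and the function name are illustrative choices, not Neo4j&#8217;s API. It shows the depth-limited walk: breadth-first across <em>FRIEND</em> up to the chosen depth, then one hop across <em>PURCHASED</em>.</p>

```python
from collections import defaultdict, deque


def recommend(friends, purchases, customer, depth=2):
    """Products bought within `customer`'s social circle that they lack.

    friends:   customer -> list of friends (outgoing FRIEND relationships)
    purchases: customer -> list of products (outgoing PURCHASED relationships)
    depth=1 covers immediate friends, depth=2 adds friends-of-friends.
    Results are ranked by how many people in the circle bought each product.
    """
    seen = {customer}
    frontier = deque([(customer, 0)])
    circle = []
    while frontier:
        person, hops = frontier.popleft()
        if hops == depth:
            continue  # do not walk FRIEND relationships beyond the limit
        for friend in friends.get(person, []):
            if friend not in seen:
                seen.add(friend)
                circle.append(friend)
                frontier.append((friend, hops + 1))

    owned = set(purchases.get(customer, []))
    counts = defaultdict(int)
    for person in circle:
        for product in set(purchases.get(person, [])):
            if product not in owned:
                counts[product] += 1
    return sorted(counts, key=lambda p: (-counts[p], p))


friends = {"ann": ["bob"], "bob": ["ann", "cat"], "cat": ["dan"]}
purchases = {"ann": ["book"], "bob": ["book", "lamp"],
             "cat": ["lamp", "mug"], "dan": ["kite"]}
print(recommend(friends, purchases, "ann", depth=1))  # ['lamp']
print(recommend(friends, purchases, "ann", depth=2))  # ['lamp', 'mug']
```
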
<p>For example, one set of customers that we might want to incentivise are those people who we think are young performers. These are customers that perhaps have told us something about their age, and we&#8217;ve noticed a particular buying pattern surrounding them &#8211; they buy DJ-quality headphones. Often those same customers buy DJ-quality decks too, but there&#8217;s a potentially profitable set of those customers that &#8211; shockingly &#8211; don&#8217;t yet own decks (much to the gratitude of their flatmates and neighbours I suspect).</p>
<p>With a document or KV store, looking for this pattern by trawling through all the customer documents and projecting a graph is laborious. But matching these patterns in a graph is quite straightforward and efficient – simply by specifying a prototype to match against and then by efficiently traversing the graph structure looking for matches.</p>
<p><a href="http://jimwebber.org/2011/04/square-pegs-and-round-holes-in-the-nosql-world/attachment/69/" rel="attachment wp-att-613"><img class="alignnone size-medium wp-image-613" alt="69" src="http://jimwebber.org/wp-content/uploads/2011/04/69-300x229.png" width="300" height="229" /></a></p>
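<p>Here is a sketch of that pattern match in Python. It assumes the <em>PURCHASED</em> relationships are available in both directions, as they are in a graph store. The product keys and customer data are made up, and an age threshold of 30 stands in for &#8220;young&#8221;. Rather than trawling every customer, the query starts at the headphones and follows only their incoming <em>PURCHASED</em> relationships, checking each buyer for the absence of decks.</p>

```python
from collections import defaultdict


def incoming_purchases(purchases):
    """product -> customers; a graph store keeps these relationships for us."""
    bought_by = defaultdict(list)
    for customer, products in purchases.items():
        for product in products:
            bought_by[product].append(customer)
    return bought_by


def deck_prospects(bought_by, purchases, ages, max_age=30):
    """Young buyers of DJ headphones with no PURCHASED relationship to DJ decks."""
    prospects = []
    for customer in bought_by.get("dj_headphones", []):  # start at the product
        young = customer in ages and max_age >= ages[customer]
        if young and "dj_decks" not in purchases.get(customer, []):
            prospects.append(customer)
    return sorted(prospects)


purchases = {
    "amy": ["dj_headphones"],
    "ben": ["dj_headphones", "dj_decks"],
    "cal": ["dj_headphones"],
    "dee": ["kettle"],
}
ages = {"amy": 22, "ben": 24, "cal": 45}  # only customers who told us their age
print(deck_prospects(incoming_purchases(purchases), purchases, ages))  # ['amy']
```
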
<p>This shows a wonderful emergent property of graphs &#8211; simply store all the data you like as nodes and relationships in <a href="http://neo4j.org/">Neo4j</a> and later you&#8217;ll be able to extract useful business information that perhaps you can&#8217;t imagine today, without the performance penalties associated with joins on large datasets.</p>
<p>In these kinds of situations, choosing a non-graph store for storing graphs is a gamble. You may find that you&#8217;ve designed your graph topology far too early in the system lifecycle and lose the ability to evolve the structure and perform business intelligence on your data. That&#8217;s why <a href="http://neo4j.org/">Neo4j</a> is cool &#8211; it keeps graph and application concerns separate, and allows you to defer data modelling decisions to more responsible points throughout the lifetime of your application.</p>
<p>So if you&#8217;re fighting with graph data imprisoned in Key-Value, Document or relational datastores, then <a href="http://neo4j.org/download">try Neo4j</a>.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jim.webber.name/2011/04/square-pegs-and-round-holes-in-the-nosql-world/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Neo4j 1.3GA: Free as in Beer!</title>
		<link>http://jim.webber.name/2011/04/neo4j-1-3ga-free-as-in-beer/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=neo4j-1-3ga-free-as-in-beer</link>
		<comments>http://jim.webber.name/2011/04/neo4j-1-3ga-free-as-in-beer/#comments</comments>
		<pubDate>Wed, 13 Apr 2011 16:26:39 +0000</pubDate>
		<dc:creator>jim</dc:creator>
				<category><![CDATA[neo4j]]></category>
		<category><![CDATA[NOSQL]]></category>

		<guid isPermaLink="false">http://jim.webber.name/2011/04/13/f940e27d-6a72-4842-aabd-a254ddcbfd10.aspx</guid>
		<description><![CDATA[Over at Neo Tech HQ, we&#8217;ve been working away for the last 3 months, and today we&#8217;re finally releasing Neo4j 1.3 GA. That in itself is usually cause enough for a celebration and a bit of a hangover, but this<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://jim.webber.name/2011/04/neo4j-1-3ga-free-as-in-beer/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
				<content:encoded><![CDATA[<div>
<p>Over at <a href="http://neotechnology.com/">Neo Tech HQ</a>, we&#8217;ve been working away for the last 3 months, and today we&#8217;re finally <a href="http://blog.neo4j.org/2011/04/neo4j-13-abisko-lampa-released.html">releasing Neo4j 1.3 GA</a>. That in itself is usually cause enough for a celebration and a bit of a hangover, but this release marks an <a href="http://blogs.neotechnology.com/emil/2011/04/graph-databases-licensing-and-mysql.html">important turning point in Neo4j&#8217;s licensing</a>, and I think for the NOSQL space in general.</p>
<p>From here on the core database (or community edition) is <strong>licensed under the GPL</strong>! That is, you can use as many instances of Neo4j community edition as you like (except for OEM use) <strong>for ever, for free!</strong></p>
<p>For more sophisticated deployments, we&#8217;ve <a href="http://neo4j.org/licensing-guide/">simplified our product structure</a> and are now offering Neo4j Advanced (with management features) and Neo4j Enterprise (with Advanced features plus HA) both under a dual AGPL and commercial license (so you can still stay open source, or simply see what you&#8217;re buying).</p>
<p>The <a href="http://blog.neo4j.org/2011/04/neo4j-13-abisko-lampa-released.html">release announcement</a> has just been posted to the <a href="http://blog.neo4j.org">Neo4j blog</a> with all the juicy details on features and licensing, so what are you waiting for? Go <a href="http://neo4j.org/download/">download</a> and install everywhere!</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jim.webber.name/2011/04/neo4j-1-3ga-free-as-in-beer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Strategies for Scaling Neo4j</title>
		<link>http://jim.webber.name/2011/03/strategies-for-scaling-neo4j/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=strategies-for-scaling-neo4j</link>
		<comments>http://jim.webber.name/2011/03/strategies-for-scaling-neo4j/#comments</comments>
		<pubDate>Tue, 22 Mar 2011 00:34:09 +0000</pubDate>
		<dc:creator>jim</dc:creator>
				<category><![CDATA[neo4j]]></category>
		<category><![CDATA[NOSQL]]></category>

		<guid isPermaLink="false">http://jim.webber.name/2011/03/22/ef4748c3-6459-40b6-bcfa-818960150e0f.aspx</guid>
		<description><![CDATA[As I&#8217;ve discussed before, graph databases like Neo4j can lack the same predictability in terms of scaling when compared to other kinds of NOSQL stores (that’s the cost of a rich data model). But with a little thought, we&#8217;ve<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://jim.webber.name/2011/03/strategies-for-scaling-neo4j/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
				<content:encoded><![CDATA[<div>
<p>As <a href="http://jim.webber.name/2011/02/16/3b8f4b3d-c884-4fba-ae6b-7b75a191fa22.aspx">I&#8217;ve discussed before</a>, graph databases like <a href="http://neo4j.org/">Neo4j</a> can lack the same predictability in terms of scaling when compared to other kinds of NOSQL stores (that’s the cost of a rich data model). But with a little thought, we&#8217;ve seen how both cache-sharding and the application of domain-specific strategies can help to improve throughput and increase the dataset size that Neo4j can store and process.</p>
<p>Those blog posts triggered some <a href="http://www.listware.net/201102/neo4j-user/84469-neo4j-cache-sharding-blog-post.html">very useful discussion</a> on the <a href="http://neo4j.org/community/list/">Neo4j mailing list</a>, with several community members adding their own thoughts and experiences. In particular, Mark Harwood suggested a simple heuristic for choosing a scaling strategy that I found so useful I thought I&#8217;d share it here (with his permission).</p>
<ol>
<li><em>Dataset size:</em> Many tens of Gigabytes<br />
<em>Strategy:</em> Fill a single machine with RAM<br />
<em>Reasoning:</em> Modern rack servers contain a lot of RAM, typically of the order of 128GB per machine. Since Neo4j likes to use RAM to cache data, where O(dataset) ≈ O(memory) we can keep all the data cached in memory and operate on it at extremely high speed.<br />
<em>Weaknesses:</em> Still need to cluster for availability; write scalability limited by disk performance.</li>
<li><em>Dataset size:</em> Many hundreds of Gigabytes<br />
<em>Strategy:</em> <a href="http://jim.webber.name/2011/02/23/abe72f61-27fb-4c1b-8ce1-d0db7583497b.aspx">Cache sharding</a><br />
<em>Reasoning:</em> The dataset is too big to hold all in RAM on a single server but small enough to allow replicating it on disk across machines. Using <a href="http://jim.webber.name/2011/02/23/abe72f61-27fb-4c1b-8ce1-d0db7583497b.aspx">cache sharding</a> improves the likelihood of reaching a warm cache which maintains high performance. The cost of a cache miss is not catastrophic (a local disk read), and can be mitigated by replacing spinning disks with SSDs.<br />
<em>Weaknesses:</em> Consistent routing requires a router on top of the Neo4j infrastructure; the write master in the cluster is a limiting factor in write scalability.</li>
<li><em>Dataset size:</em> Terabytes and above<br />
<em>Strategy:</em> Domain-specific sharding<br />
<em>Reasoning:</em> At this scale, the dataset is too big for a single memory space and it&#8217;s too big to practically replicate across machines, so sharding is the only viable alternative. However, given that there is no perfect algorithm (yet) for arbitrary sharding of graphs, we rely on domain-specific knowledge to predict which nodes should be allocated to which machines.<br />
<em>Weaknesses:</em> Not all domains may be amenable to domain-specific sharding.</li>
</ol>
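<p>The heuristic is simple enough to write down as a function. In the sketch below the 128GB default comes from the first rule. The largest dataset it is still practical to replicate to every machine (<code>max_replicated_gb</code>) is a hypothetical knob: the heuristic gives no exact figure, so set it for your own hardware.</p>

```python
def choose_scaling_strategy(dataset_gb, ram_per_machine_gb=128, max_replicated_gb=1000):
    """Mark Harwood's scaling heuristic as a function (thresholds are adjustable).

    ram_per_machine_gb: RAM one server can devote to caching the graph.
    max_replicated_gb: hypothetical upper bound on a dataset that is still
    practical to copy in full onto every cluster member.
    """
    if ram_per_machine_gb >= dataset_gb:
        return "fill a single machine with RAM"
    if max_replicated_gb >= dataset_gb:
        return "cache sharding"
    return "domain-specific sharding"


for size_gb in (64, 500, 5000):
    print(size_gb, "GB:", choose_scaling_strategy(size_gb))
```
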
<p>As an architect, I really like this heuristic. It&#8217;s easy to find where I am on the scale and plan a Neo4j deployment accordingly. It also provides quite an exciting pointer towards the future &#8211; while I think most enterprise-scale deployments are currently in the tens to hundreds of gigabytes range, there are clearly applications out there for connected data that do &#8211; or will &#8211; require more horsepower.</p>
<p>Neo4j 2.0 will address these challenges, and it&#8217;s going to be a fun ride. But until then, I hope you&#8217;ll find this heuristic as helpful as I did.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://jim.webber.name/2011/03/strategies-for-scaling-neo4j/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Scaling Neo4j with Cache Sharding and Neo4j HA</title>
		<link>http://jim.webber.name/2011/02/scaling-neo4j-with-cache-sharding-and-neo4j-ha/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=scaling-neo4j-with-cache-sharding-and-neo4j-ha</link>
		<comments>http://jim.webber.name/2011/02/scaling-neo4j-with-cache-sharding-and-neo4j-ha/#comments</comments>
		<pubDate>Wed, 23 Feb 2011 17:15:29 +0000</pubDate>
		<dc:creator>jim</dc:creator>
				<category><![CDATA[neo4j]]></category>
		<category><![CDATA[NOSQL]]></category>

		<guid isPermaLink="false">http://jim.webber.name/2011/02/23/abe72f61-27fb-4c1b-8ce1-d0db7583497b.aspx</guid>
		<description><![CDATA[In the Neo4j world, we consider large datasets to be those which are substantially larger than main memory. With such large datasets, it&#8217;s impossible for a Neo4j instance to cache the whole database in RAM and therefore provide extremely rapid<span class="ellipsis">&#8230;</span><div class="read-more"><a href="http://jim.webber.name/2011/02/scaling-neo4j-with-cache-sharding-and-neo4j-ha/">Read more &#8250;</a></div><!-- end of .read-more -->]]></description>
				<content:encoded><![CDATA[<p>In the <a href="http://neo4j.org/">Neo4j</a> world, we consider large datasets to be those which are substantially larger than main memory. With such large datasets, it&#8217;s impossible for a Neo4j instance to cache the whole database in RAM and therefore provide extremely rapid traversals of the graph, since we&#8217;ll have to hit the disk eventually. In those situations we&#8217;ve previously recommended scaling vertically by opting for solid-state storage to provide constant, low seek times for data on disk (avoiding the high seek penalty incurred by spinning disks). While the SSD approach provides a substantial performance boost in most scenarios, even the fastest SSD isn&#8217;t a replacement for RAM.</p>
<p>I <a href="http://jim.webber.name/2011/02/16/3b8f4b3d-c884-4fba-ae6b-7b75a191fa22.aspx">wrote previously</a> that partitioning graphs across physical instances is a notoriously difficult way to scale graph data. Yet we still want the ability to service large workloads and host large data sets in Neo4j. Until the recent 1.2 release, I think this was a weak point for the database, but with the advent of Neo4j High Availability (HA) I don&#8217;t think it is anymore. In fact, Neo4j HA gives us considerably more options for designing solutions for availability, scalability and very large data sets.</p>
<p>One pattern that I&#8217;ve seen using Neo4j HA for large deployments is what we&#8217;re calling &#8220;Cache Sharding&#8221;, used to maintain high performance with a dataset that far exceeds main memory space. Cache sharding isn&#8217;t sharding in the traditional sense, since we expect a full data set to be present on each database instance. To implement cache sharding, we partition the workload undertaken by each database instance to increase the likelihood of hitting a warm cache for a given request &#8211; and warm caches in Neo4j are ridiculously high performance.</p>
<p>The solution architecture for this setup is shown below. We move from the hard problem of graph sharding, to the simpler problem of consistent routing, something which high volume Web farms have been doing for ages.</p>
<p><a href="http://jimwebber.org/2011/02/scaling-neo4j-with-cache-sharding-and-neo4j-ha/1-2/" rel="attachment wp-att-617"><img class="alignnone size-medium wp-image-617" alt="1" src="http://jimwebber.org/wp-content/uploads/2011/02/1-300x191.png" width="300" height="191" /></a></p>
<p>The strategy we use to implement consistent routing will vary by domain. Sometimes it&#8217;s good enough to have sticky sessions; other times we&#8217;ll want to route based on the characteristics of the data set. A simple strategy is to have the instance which first serves requests for a particular user serve subsequent requests for that user, ensuring a good chance that the request will be processed by a warm cache. Other domain-specific approaches will also work; for example, in a geographical data system we can route requests about particular locations to specific database instances which will be warm for that location. Either way, we&#8217;re increasing the likelihood of the required nodes and relationships already being cached in RAM, and therefore quick to access and process.</p>
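<p>Here is a minimal Python sketch of both routing styles. The instance names, the key prefixes and the location table are hypothetical. The only real requirement is that the same key always maps to the same instance. That is why the sketch uses a SHA-256 digest rather than Python&#8217;s built-in <code>hash()</code>, whose string hashing is randomised per process.</p>

```python
import hashlib

INSTANCES = ["neo4j-a", "neo4j-b", "neo4j-c"]  # hypothetical HA cluster members


def stable_pick(key, instances=INSTANCES):
    """Map a key to an instance the same way on every router and every run."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return instances[int(digest, 16) % len(instances)]


def route_by_user(user_id, instances=INSTANCES):
    # Sticky routing: every request for this user lands on the same warm cache.
    return stable_pick("user:" + user_id, instances)


def route_by_location(location, home_instance, instances=INSTANCES):
    # Domain-specific routing: known locations go to their 'home' instance;
    # anything else falls back to a stable hash of the location name.
    if location in home_instance:
        return home_instance[location]
    return stable_pick("location:" + location, instances)


print(route_by_user("alice") == route_by_user("alice"))    # True
print(route_by_location("london", {"london": "neo4j-b"}))  # neo4j-b
```

<p>One design caveat: taking the hash modulo the number of instances reshuffles most keys whenever an instance is added or removed, leaving many caches cold. If the cluster size changes often, a consistent-hashing ring keeps most keys on the same instance. That is a different technique from the sticky routing described above.</p>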
<p>Of course reading from the database is only half of the story. If we&#8217;re going to run a number of servers to exploit their large aggregate caching capability, we need to keep those servers in sync. This is where the Neo4j HA software becomes particularly useful. A Neo4j HA deployment effectively creates a multi-master cluster.</p>
<p>Writes to any instance will result in all other instances eventually receiving that write through the HA protocol. Writing to the elected master in the cluster causes the data to be persisted (strictly ACID, always), and then changes are propagated to the slaves through the HA protocol for eventual consistency.</p>
<p><a href="http://jimwebber.org/2011/02/scaling-neo4j-with-cache-sharding-and-neo4j-ha/2-2/" rel="attachment wp-att-618"><img alt="2" src="http://jimwebber.org/wp-content/uploads/2011/02/2-300x221.png" width="300" height="221" /></a></p>
<p>If a write operation is processed by a slave, it enrols the elected master in a transaction and both instances persist the results (again strictly ACID). Other slaves then catch up through the HA protocol.</p>
<p><a href="http://jimwebber.org/2011/02/scaling-neo4j-with-cache-sharding-and-neo4j-ha/3-2/" rel="attachment wp-att-619"><img class="alignnone size-medium wp-image-619" alt="3" src="http://jimwebber.org/wp-content/uploads/2011/02/3-300x255.png" width="300" height="255" /></a></p>
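<p>The write paths described above can be modelled with a deliberately simplified toy: a single master holding an ordered transaction log, and slaves that apply any entries they have not yet seen. This is not Neo4j&#8217;s HA protocol (there is no election, network or durability here), and the class names are hypothetical. It only shows why a write through one slave is immediately visible on the master and on that slave, while the other slaves are briefly stale until they catch up.</p>

```python
class Instance:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of log entries this instance has applied


class ToyCluster:
    """Toy master/slave replication model; not Neo4j's real HA protocol."""

    def __init__(self, names):
        self.master = Instance(names[0])
        self.slaves = [Instance(n) for n in names[1:]]
        self.log = []  # committed transactions, in commit order

    def write(self, instance, key, value):
        # Every write is committed on the master first...
        self.log.append((key, value))
        self.master.data[key] = value
        self.master.applied = len(self.log)
        # ...and a slave that accepted the write applies it too (in log order).
        if instance is not self.master:
            self.catch_up(instance)

    def catch_up(self, slave):
        # Pull and apply any committed transactions the slave has not seen yet.
        for key, value in self.log[slave.applied:]:
            slave.data[key] = value
        slave.applied = len(self.log)


cluster = ToyCluster(["master", "slave-1", "slave-2"])
slave_1, slave_2 = cluster.slaves
cluster.write(slave_1, "x", 1)
print(cluster.master.data, slave_1.data, slave_2.data)  # {'x': 1} {'x': 1} {}
cluster.catch_up(slave_2)  # e.g. on the slave's next pull
print(slave_2.data)  # {'x': 1}
```
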
<p>By using this pattern, we can treat a Neo4j HA cluster as a performant database for reads and writes, knowing that with a good routing strategy we&#8217;re going to be keeping traversals in memory and extremely fast &#8211; fast enough for very demanding applications.</p>
]]></content:encoded>
			<wfw:commentRss>http://jim.webber.name/2011/02/scaling-neo4j-with-cache-sharding-and-neo4j-ha/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
