
@timblair
Created November 23, 2012 12:56
Notes from All Your Base 2012, 2012-11-23

AYB12

Alvin: MongoDB

  • Trade-off: scale vs. functionality. MongoDB tries to have good functionality and good scalability.
  • Auto-sharding to maintain equilibrium between shards
  • Scalable datastore != scalable application: use of datastore may still be non-scalable (e.g. many queries across all shards)
  • Get low latency by ensuring shard data is always in memory: datastore then becomes a cache with persistence
  • Replica sets: auto-election of new primary node on failure, plus automatic recovery once failed node is back online
  • Async replication between nodes in a replica set (eventual consistency)
  • Auto TTL for messages, and can update on read operations
  • Tunable write concern, i.e. how much must happen before a write counts as "complete": from "none" (fire and forget: assume it will get there eventually) to "full" (includes remote replication to other geographies)
  • An RDBMS enforces the relational data model, which can limit the ability to scale the system: data locality ("which server is my record on?") becomes an issue
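The point that a scalable datastore does not guarantee a scalable application can be illustrated with a toy router. Queries that include the shard key go to exactly one shard; queries on any other field must be sent to every shard (scatter-gather), so their cost grows with the cluster. This is a minimal sketch under stated assumptions: `Shard` and `Router` are hypothetical names, and routing is plain hash-mod rather than MongoDB's chunk ranges, which a balancer migrates between shards.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical in-memory shard holding documents keyed by _id."""
    name: str
    docs: dict = field(default_factory=dict)


class Router:
    """Toy router that hashes the shard key to pick a shard.

    Assumption: plain hash-mod routing. MongoDB itself splits data into
    chunks that a balancer moves between shards; that is not modelled here.
    """

    def __init__(self, shards, shard_key):
        self.shards = shards
        self.shard_key = shard_key
        self.shards_contacted = 0  # shards touched by the most recent find()

    def _shard_for(self, key_value):
        # Stable hash, so the same key value always lands on the same shard.
        return self.shards[zlib.crc32(str(key_value).encode()) % len(self.shards)]

    def insert(self, doc):
        self._shard_for(doc[self.shard_key]).docs[doc["_id"]] = doc

    def find(self, field_name, value):
        if field_name == self.shard_key:
            # Targeted query: only one shard can hold matching documents.
            targets = [self._shard_for(value)]
        else:
            # Scatter-gather: every shard has to be asked.
            targets = self.shards
        self.shards_contacted = len(targets)
        return [d for s in targets for d in s.docs.values() if d.get(field_name) == value]


router = Router([Shard(f"s{i}") for i in range(4)], shard_key="user_id")
for i in range(100):
    router.insert({"_id": i, "user_id": i % 10, "country": "UK" if i % 2 else "FR"})

router.find("user_id", 3)     # targeted: touches 1 shard
router.find("country", "UK")  # scatter-gather: touches all 4 shards
```

Adding shards makes the first query no more expensive, but the second one has to contact every new shard as well.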

Luca Garulli: OrientDB

  • Biggest issue with switching from RDBMS: what about the data model?
  • KV, column-based, document DBs ... and graph DBs
  • Property graph model: vertices and edges can have properties, edges are directional, edges connect vertices, vertices can have one or more incoming + outgoing edges
  • In RDBMS, every time you traverse a relationship, you perform an expensive JOIN. Indexes can speed up reads, but slow down writes
  • Index lookups are generally based on balanced trees. More entries == more lookup steps == slower JOIN
  • "A graph DB is any storage system that provides index-free adjacency"
  • A graph DB treats relationships as physical links assigned to the record when the edge is created; RDBMS computes the same relationship every time you perform a JOIN
  • Lookup time moves from O(log N) to O(1), and does not increase with DB size
  • NuvolaBase.com: REST-based graph DB service
  • Difficult to create distributed graph DBs. Scaling is basically a case of using client-side hashing.
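The O(log N) vs O(1) contrast can be made concrete with a toy sketch. The "relational" side runs a binary search over a sorted edge index on every hop. The "graph" side stores direct references on the vertex when the edge is created and simply follows them. All class and function names are hypothetical, and a sorted list stands in for a balanced-tree index; this is not OrientDB's storage format.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Graph-style record: outgoing edges are direct references to other vertices."""
    name: str
    out: list = field(default_factory=list)


def add_edge(src, dst):
    # The link is stored on the record once, when the edge is created.
    src.out.append(dst)


def neighbours_graph(vertex):
    # Index-free adjacency: follow the stored references, no index lookup,
    # so the cost of a hop does not grow with the total size of the DB.
    return [n.name for n in vertex.out]


class EdgeIndex:
    """RDBMS-style: an edges table with a sorted index on the source column.

    The sorted list is a stand-in for a balanced-tree (B-tree) index.
    """

    def __init__(self, edges):
        self.rows = sorted(edges)  # (src, dst) pairs
        self.keys = [src for src, _ in self.rows]

    def neighbours(self, name):
        # Every hop pays a binary search: O(log N) in the number of edges.
        lo = bisect.bisect_left(self.keys, name)
        hi = bisect.bisect_right(self.keys, name)
        return [dst for _, dst in self.rows[lo:hi]]


pairs = [("alice", "bob"), ("alice", "carol"), ("bob", "dave"), ("carol", "dave")]
vertices = {name: Vertex(name) for pair in pairs for name in pair}
for src, dst in pairs:
    add_edge(vertices[src], vertices[dst])
index = EdgeIndex(pairs)

neighbours_graph(vertices["alice"])  # follows two stored links
index.neighbours("alice")            # two binary searches over the index
```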

Dale Harvey: PouchDB

  • CouchDB for JavaScript environments, mainly for browsers (but also works in Node.js)
  • Multi-master replication, supports disconnected sync
  • "Ground computing" -- like cloud computing, but provides offline behaviour with on-demand sync
  • Designed for building applications that need to work well offline and that need to sync data
  • Could this simplify something like a multi-app, Simplenote-style system?
  • Offline is a fact: the more mobile devices, the more people are offline. No reception, data limits, slow or unstable connections, etc.
  • Sync is hard: Things (the to-do app) took two years to develop sync
  • Problems include bad connections and retries, transfer overhead and moving deltas (mobile clients might not want a full sync), master-master scenarios, and conflict resolution
  • [CP]ouchDB has good, simple conflict resolution, but sometimes you need to tell it what to resolve (based on your app usage)
  • Requires CouchDB on the server for sync
  • Safari + Opera support in progress, so not production-ready yet
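The "moving deltas" and "bad connections + retries" problems are usually handled with a checkpointed changes feed. The sync asks only for changes since the last saved sequence number and saves progress after each batch, so a dropped connection resumes instead of restarting. This is a minimal one-way sketch, loosely modelled on CouchDB's `_changes` feed and replication checkpoints. `Database`, `replicate` and the `fail_after` parameter are hypothetical, not PouchDB's API.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical in-memory store with a sequence-numbered changes log."""
    docs: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # append-only (seq, doc_id) entries

    def put(self, doc_id, body):
        self.docs[doc_id] = body
        self.log.append((len(self.log) + 1, doc_id))

    def changes(self, since):
        # Only entries written after the given sequence number: the delta.
        return [(seq, doc_id) for seq, doc_id in self.log if seq > since]


def replicate(source, target, checkpoint, batch_size=2, fail_after=None):
    """One-way sync from source to target, resuming from `checkpoint`.

    Returns the new checkpoint. `fail_after` simulates a dropped connection
    after that many batches; progress made before the drop is kept.
    """
    batches = 0
    pending = source.changes(checkpoint)
    while pending:
        if fail_after is not None and batches == fail_after:
            return checkpoint  # connection lost: caller retries from here
        batch, pending = pending[:batch_size], pending[batch_size:]
        for seq, doc_id in batch:
            target.put(doc_id, source.docs[doc_id])
        checkpoint = batch[-1][0]  # record progress after each batch
        batches += 1
    return checkpoint


phone, server = Database(), Database()
for i in range(5):
    phone.put(f"note{i}", {"text": f"note {i}"})

checkpoint = replicate(phone, server, checkpoint=0, fail_after=1)  # drops mid-sync
checkpoint = replicate(phone, server, checkpoint)                 # resumes from there
```

Conflict handling is deliberately left out. With multi-master sync, both sides would also need revision histories to detect concurrent edits.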

Matt: Eventual Consistency

  • Brewer's Conjecture (2000), a.k.a. CAP: of Consistency, Availability and Partition tolerance, you can only have two
  • "Life is full of tradeoffs" as is engineering
  • Amazon's Dynamo paper: tradeoff between C & A -- they chose A
  • Financial systems already deal with eventual consistency: trading banks closing and reconciling, network partitions between a cash point and the centralised bank, etc.
  • Riak uses vnodes in a ring topology (ketama-style consistent hashing)
  • Writes go to hashed node + the next two (i.e. three copies on separate nodes)
  • Read repair: out-of-date copies of data on vnodes are detected automatically on read, and the stale replicas are updated to the logical descendant (e.g. v1 -> v2)
  • Read repair means that internally three copies of an object are requested and checked for consistency. This can be tuned, e.g. via a quorum, or a single read for speed
  • There can be divergent object versions, a.k.a. siblings: after a network partition, two operations may have altered the object's state at the same time. Riak returns both versions
  • Each application can define a "conflict resolver" as part of the Riak client, specifying how siblings are resolved
  • Common use-cases are: pick one based on some property, or perform a set union of the data
  • Probabilistically Bounded Staleness (PBS): quantifying how stale eventually consistent reads are likely to be
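The placement and sibling-resolution ideas above can be sketched together. A key hashes to one vnode on the ring, and its replicas go to that vnode plus the next two. R + W > N is the usual quorum overlap rule. The two resolvers mirror the "pick one by some property" and "set union" strategies. This is a toy model under stated assumptions: the SHA-1 keyspace is split into equal partitions owned round-robin by physical nodes. `Ring`, `read_overlaps_write` and the resolver functions are hypothetical, not the Riak client API.

```python
import hashlib


class Ring:
    """Toy consistent-hash ring loosely modelled on Riak.

    Assumptions: the SHA-1 keyspace is split into equal partitions (vnodes),
    and vnodes are assigned round-robin to physical nodes.
    """

    RING_SIZE = 2 ** 160  # size of the SHA-1 output space

    def __init__(self, nodes, num_vnodes=8):
        self.num_vnodes = num_vnodes
        self.owners = [nodes[i % len(nodes)] for i in range(num_vnodes)]

    def vnode_for(self, key):
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return h * self.num_vnodes // self.RING_SIZE

    def preference_list(self, key, n_val=3):
        # The hashed vnode plus the next n_val - 1 vnodes around the ring.
        start = self.vnode_for(key)
        vnodes = [(start + i) % self.num_vnodes for i in range(n_val)]
        return [(v, self.owners[v]) for v in vnodes]


def read_overlaps_write(n_val, r, w):
    # R + W > N: every read quorum shares at least one replica with every
    # write quorum, so a read reaches a replica holding the latest acknowledged write.
    return r + w > n_val


def resolve_union(siblings):
    # Resolver strategy: merge siblings by set union (e.g. shopping carts).
    merged = set()
    for sibling in siblings:
        merged |= sibling
    return merged


def resolve_latest(siblings):
    # Resolver strategy: pick one sibling based on a property (a timestamp here).
    return max(siblings, key=lambda s: s["updated_at"])


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
ring.preference_list("users/42")  # three consecutive vnodes on three distinct nodes
resolve_union([{"milk", "eggs"}, {"milk", "bread"}])
```

A set union never loses an addition, but it can bring back an item that one side deleted. The right resolver therefore depends on how the application uses the data.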

Monty Widenius: MySQL-MariaDB

  • MySQL named after Monty's daughter, My (MaxDB released later, named after his son, Max)
  • Original MySQL devs started focussing on MariaDB in 2009 with the impending purchase of Sun by Oracle
  • Chose dual licensing so he could work full-time on MySQL: it took two months to become profitable
  • Don't go to investors when you need their money. Wait for them to come to you when you don't need their money, and you won't have to give up so much of your company
  • Monty Program Ab: new company (using Hacker Business Model) to focus on MariaDB, with most of the original MySQL developers
  • Aim to keep MySQL dev talent together, always have an open-source version of MySQL. More important after Oracle purchase of Sun
  • MariaDB is a drop-in replacement for MySQL. "No reason to use MySQL anymore: MariaDB is better in all cases"
  • Performance of big JOINs and subqueries is an order of magnitude (or more) better than in MySQL
  • "SQL doesn't solve all common problems", e.g. arbitrary attributes (shop item sizes, colours etc). Dynamic columns introduced in MariaDB 5.3. As a proof of concept, created a Cassandra storage engine for MariaDB 10
  • Any closed-source features that Oracle has added to MySQL have been added to MariaDB as open-source features
  • MariaDB 5.5 introduces a new thread pool (instead of thread-per-connection)
  • A full merge of MySQL 5.6 into MariaDB 5.6 is a year-long project due to broken features and new bugs, over-complicated code, lack of understanding of the existing code, etc.
  • The MySQL team did such a good job of getting the name out there that changing everyone over to MariaDB is going to be tough!
  • Though creating a dev community is easier, as Oracle is not working with the community
  • Aim of MariaDB: make MySQL obsolete
  • Free MariaDB + MySQL knowledgebase available at askmonty.org