ArangoDB: Is Multi-Model the Future of NoSQL?
Boston Software Engineers User Group
4 November 2014
Max Neunhöffer
www.arangodb.com

Max Neunhöffer

- I am a mathematician
- “Earlier life”: research in computer algebra (computational group theory)
- Always juggled with big data
- Now: working in database development, NoSQL, ArangoDB
- I like: research, hacking, teaching, tickling the highest performance out of computer systems

Polyglot Persistence

Idea: use the right data model for each part of a system.

For an application, persist
- an object or structured data as a JSON document,
- a hash table in a key/value store,
- relations between objects in a graph database,
- a homogeneous array in a relational DBMS.
- If the table has many empty cells or inhomogeneous rows, use a column-based database.

Take scalability needs into account!

Document and Key/Value Stores

Document store
A document store stores sets of documents, usually JSON data; these sets are called collections. The database has access to the contents of the documents.
- each document in the collection has a unique key
- secondary indexes are possible, leading to more powerful queries
- different documents in the same collection: structure can vary
- no schema is required for a collection
- database normalisation can be relaxed

Key/value store
Opaque values, only key lookup without secondary indexes
⇒ high performance and perfect scalability

Graph Databases

Graph database
A graph database stores a labelled graph. Vertices and edges are documents. Graphs are good for modelling relations.
- graphs often describe data very naturally (e.g.
the Facebook friendship graph)
- graphs can be stored using tables; however, graph queries notoriously lead to expensive joins
- there are interesting and useful graph algorithms like “shortest path” or “neighbourhood”
- need a good query language to reap the benefits
- horizontal scalability is troublesome
- graph databases vary widely in scope and usage, no standard

A Typical Use Case: an Online Shop

We need to hold
- customer data: usually homogeneous, but still variations
  ⇒ use a document store
- product data: even for a specialised business quite inhomogeneous
  ⇒ use a document store
- shopping carts: need very fast lookup by session key
  ⇒ use a key/value store
- order and sales data: relate customers and products
  ⇒ use a document store
- recommendation engine data: links between different entities
  ⇒ use a graph database

Polyglot Persistence is nice, but ...

Consequence: one needs multiple database systems in the persistence layer of a single project!

Polyglot persistence introduces some friction through
- data synchronisation,
- data conversion,
- increased installation and administration effort,
- more training needs.

Wouldn’t it be nice ... to enjoy the benefits without the disadvantages?

The Multi-Model Approach

Multi-model database
A multi-model database combines a document store with a graph database and a key/value store. Vertices are documents in a vertex collection, edges are documents in an edge collection.
- a single, common query language for all three data models
- is able to compete with specialised products on their turf
- allows for polyglot persistence using a single database
- queries can mix the different data models
- can replace an RDBMS in many cases

A Map of the NoSQL Landscape

[Figure: map of the NoSQL landscape. Labels: Operational DBs, Analytic DBs, Complex queries, Map/reduce, Extensibility, Massively distributed, Column Stores, Documents, Structured Data, Graphs, Key/Value]

ArangoDB
- is a multi-model database (document store & graph database),
- is open source and free (Apache 2 license),
- offers convenient queries (via HTTP/REST and AQL), including joins between different collections,
- is memory-efficient thanks to shape detection,
- uses JavaScript throughout (Google’s V8 built into the server),
- has an API that is extensible by JavaScript code in the Foxx framework,
- offers many drivers for a wide range of languages,
- is easy to use with a web front end and good documentation, and
- enjoys good community as well as professional support.

The ArangoDB Territory

[Figure: the NoSQL landscape map, same labels as above]

Strong Consistency

ArangoDB offers
- atomic and isolated CRUD operations for single documents,
- transactions spanning multiple documents and multiple collections,
- snapshot semantics for complex queries,
- very secure, durable storage that is append-only and keeps multiple revisions,
- all this for documents as well as for graphs.

In the (near) future, ArangoDB will
- offer the same ACID semantics even with sharding,
- implement complete MVCC semantics to allow for lock-free concurrent transactions.
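
The combination of append-only storage, multiple revisions and snapshot reads named on the Strong Consistency slide can be illustrated with a toy model. The following is a minimal in-memory sketch under that assumption, not ArangoDB's storage engine; the class `AppendOnlyStore`, its methods and the key `cart/42` are hypothetical names made up for this illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

Document = Dict[str, Any]


@dataclass
class AppendOnlyStore:
    """Toy append-only store (hypothetical): nothing is overwritten in place."""

    # Each log entry is (revision, key, document); document is None for a removal.
    log: List[Tuple[int, str, Optional[Document]]] = field(default_factory=list)

    def current_revision(self) -> int:
        return len(self.log)

    def write(self, key: str, doc: Document) -> int:
        """Append a new revision of the document and return its revision number."""
        rev = self.current_revision() + 1
        self.log.append((rev, key, copy.deepcopy(doc)))
        return rev

    def remove(self, key: str) -> int:
        """Append a removal marker; older revisions stay readable."""
        rev = self.current_revision() + 1
        self.log.append((rev, key, None))
        return rev

    def read(self, key: str, at_revision: Optional[int] = None) -> Optional[Document]:
        """Return the document as of `at_revision` (default: latest revision)."""
        limit = self.current_revision() if at_revision is None else at_revision
        result: Optional[Document] = None
        for rev, k, doc in self.log:
            if rev > limit:
                break
            if k == key:
                result = doc
        return copy.deepcopy(result)


store = AppendOnlyStore()
store.write("cart/42", {"items": ["book"]})
snapshot = store.current_revision()  # a long-running query would pin this revision
store.write("cart/42", {"items": ["book", "pen"]})

old_view = store.read("cart/42", at_revision=snapshot)  # unaffected by the later write
new_view = store.read("cart/42")
```

A reader that pins a revision number keeps seeing a consistent state while writers keep appending, which is the intuition behind snapshot semantics and, more generally, MVCC. A real engine would add indexes, durability and compaction of old revisions.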
Replication and Sharding: Horizontal Scalability

Right now, ArangoDB provides
- easy setup of (asynchronous) replication, which allows read access parallelisation (master/slaves setup),
- sharding with automatic data distribution to multiple servers.

Very soon, ArangoDB will feature
- fault tolerance by automatic failover and synchronous replication in cluster mode,
- zero administration by a self-repairing and self-balancing cluster architecture.

Powerful Query Language: AQL

The built-in Arango Query Language AQL allows
- complex, powerful and convenient queries,
- with transaction semantics,
- supporting joins,
- with user-definable functions (in JavaScript).

AQL is independent of the driver used and offers protection against injections by design.

For version 2.3, we are reengineering the AQL query engine:
- use a C++ implementation for high performance,
- optimise distributed queries in the cluster.

Extensible through JavaScript and Foxx

The HTTP API of ArangoDB can be extended by user-defined JavaScript code that is executed in the DB server for high performance.

This is formalised by the Foxx framework, which allows implementing complex, user-defined APIs with direct access to the DB engine.

Very flexible and secure authentication schemes can be implemented conveniently by the user in JavaScript.

Because JavaScript runs everywhere (in the DB server as well as in the browser), one can use the same libraries in the back-end and in the front-end.

⇒ implement your own microservices
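
The AQL slide's points about joins, driver independence and protection against injections can be made concrete with a small sketch. It builds the JSON body for a POST to ArangoDB's `/_api/cursor` HTTP endpoint, passing user input as a bind parameter (`@name` in the query, `bindVars` in the request) instead of splicing it into the query text. The collections `orders` and `customers`, the attribute `customerKey` and the helper `build_cursor_request` are assumptions made up for this example; no server is contacted.

```python
import json
from typing import Any, Dict

# Hypothetical collections and attributes; the join pairs each order with its customer.
JOIN_QUERY = """
FOR o IN orders
  FOR c IN customers
    FILTER o.customerKey == c._key AND c.name == @name
    RETURN { order: o, customer: c }
"""


def build_cursor_request(query: str, bind_vars: Dict[str, Any]) -> str:
    """Return the JSON body for a POST to /_api/cursor (helper name is hypothetical)."""
    return json.dumps({"query": query, "bindVars": bind_vars})


# Hostile-looking input stays a plain string value in bindVars;
# the query text itself is never modified.
user_input = "x' || true || '"
body = build_cursor_request(JOIN_QUERY, {"name": user_input})
payload = json.loads(body)
```

Because the request is plain JSON over HTTP, the same query works from any driver or language, and the query text stays constant no matter what the user types.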