Not Only SQL
Table of Contents
Background and history
Used Applications
What is Cassandra? – Overview
Replication & Consistency
Writing, Reading, Querying and Sorting
APIs & Installation
World Database in Cassandra
Using Hector API
Administration tools
Background
Influential Technologies:
Dynamo – Fully distributed design - infrastructure
BigTable – Sparse data model
Other NoSql databases
NoSql
Big Data NoSql
MongoDB
Hypertable
Neo4J
Cassandra
HyperGraphDB
Riak
Memcach
Voldemort
Tokyo Cabinet
HBase
Redis
CouchDB
Bigtable / Dynamo
Bigtable
Dynamo
HBase
Hypertable
Riak
Voldemort
Cassandra – Combination of Both
CAP Theorem
Consistency
Availability
Partition Tolerance
Applications
Facebook
Google Code
Apache
Digg
Twitter
Rackspace
Others…
What Is Cassandra?
O(1) node lookup
Key – Value Store
Column based data store
Highly Distributed – decentralized (no master/slave)
Elasticity
Durable, Fault-tolerant - Replications
Sparse
ACID NoSQL!
Overview – Data Model
Keyspace
Uppermost namespace
Typically one per application
Column
Basic unit of storage – Name, Value and timestamp
ColumnFamily
Associates records of a similar kind
Record-level Atomicity
Indexed
SuperColumn
Columns whose values are columns
Array of columns
SuperColumnFamily
ColumnFamily whose values are only SuperColumns
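One way to picture the model above is as nested sorted maps. The following is a minimal Java sketch of that picture, not Cassandra's actual implementation; the names `DataModelSketch`, `Column`, `put` and `get` are hypothetical and exist only for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: a keyspace pictured as nested sorted maps. */
final class DataModelSketch {

    /** A column: name, value and timestamp (hypothetical helper type). */
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // column family name -> row key -> column name -> column
    private final Map<String, Map<String, Map<String, Column>>> keyspace = new TreeMap<>();

    /** Stores a column under the given column family and row key. */
    void put(String columnFamily, String rowKey, Column column) {
        keyspace.computeIfAbsent(columnFamily, cf -> new TreeMap<>())
                .computeIfAbsent(rowKey, key -> new TreeMap<>())
                .put(column.name, column);
    }

    /** Returns the column, or null when the row or column is absent (rows are sparse). */
    Column get(String columnFamily, String rowKey, String columnName) {
        Map<String, Map<String, Column>> rows = keyspace.get(columnFamily);
        if (rows == null) {
            return null;
        }
        Map<String, Column> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(columnName);
    }

    public static void main(String[] args) {
        DataModelSketch world = new DataModelSketch();
        world.put("Cities", "ORANJESTAD", new Column("population", "33000", 1L));
        System.out.println(world.get("Cities", "ORANJESTAD", "population").value);
    }
}
```

In this picture a SuperColumnFamily would simply add one more map level between the row key and the columns.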
Examples
Column - City:
ORANJESTAD {"id": 1,
"name": "ORANJESTAD",
"population": 33000,
"capital": true}
SuperColumns – Country:
Aruba {"id": "aa",
"name": "Aruba",
"fullName": "Aruba",
"location": "Caribbean, island in the Caribbean Sea, north of Venezuela",
"coordinates": {
"latitudeType": "N",
"latitude": 12.5,
"longitudeType": "W",
"longitude": 69.96667},
….
Replication & Consistency
Consistency Level is based on the Replication Factor (N), not
the number of nodes in the system.
There are a few options to set how many replicas must
respond to declare success
Query all replicas on every read
Every Column has a value and a timestamp – latest
timestamp wins
Read repair – read one replica and check the
checksum/timestamp to verify
R (number of replicas to read from) + W (number of replicas to
write to) > N (replication factor) – every read then overlaps the latest write
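The rules on this slide can be sketched in a few lines of plain Java. This is a simplified illustration, assuming each replica is just an in-memory map; `ConsistencySketch`, `Versioned`, `overlaps`, `latest` and `readRepair` are hypothetical names, not part of Cassandra or Hector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: R + W > N, latest timestamp wins, and read repair. */
final class ConsistencySketch {

    /** A replica's answer: a value plus the timestamp it was written with. */
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** True when every read set must overlap every write set: R + W > N. */
    static boolean overlaps(int r, int w, int n) {
        return r + w > n;
    }

    /** Resolves replica responses by keeping the one with the latest timestamp. */
    static Versioned latest(List<Versioned> responses) {
        Versioned winner = null;
        for (Versioned v : responses) {
            if (winner == null || v.timestamp > winner.timestamp) {
                winner = v;
            }
        }
        return winner;
    }

    /** Read repair: pushes the winning version to stale replicas; returns how many were repaired. */
    static int readRepair(String key, List<Map<String, Versioned>> replicas) {
        List<Versioned> responses = new ArrayList<>();
        for (Map<String, Versioned> replica : replicas) {
            Versioned v = replica.get(key);
            if (v != null) {
                responses.add(v);
            }
        }
        Versioned winner = latest(responses);
        if (winner == null) {
            return 0; // no replica knows the key, nothing to repair
        }
        int repaired = 0;
        for (Map<String, Versioned> replica : replicas) {
            Versioned v = replica.get(key);
            if (v == null || v.timestamp < winner.timestamp) {
                replica.put(key, winner);
                repaired++;
            }
        }
        return repaired;
    }
}
```

With N = 3, choosing R = 2 and W = 2 satisfies the inequality, while R = 1 and W = 1 does not, so a read may miss the most recent write until repair catches up.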
The Ring - Partitioning
Each NODE has a single, unique TOKEN
Each NODE claims a RANGE of its neighbors in the
ring
Partitioning – Map from Key Space to Token – Can be
random or Order Preserving
Snitching – Map from Nodes to Physical Location
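The token ring can be sketched with a sorted map from token to node. This is a minimal illustration under stated assumptions: tokens are small `long` values, a node owns the range ending at its own token, and `tokenFor` is a hypothetical stand-in for a random partitioner; `RingSketch` and its methods are not Cassandra APIs.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: nodes placed on a ring by token, keys routed to the owning node. */
final class RingSketch {

    // token -> node name, kept sorted so the ring can be walked in order
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Places a node on the ring at its single, unique token. */
    void addNode(String node, long token) {
        ring.put(token, node);
    }

    /** The owner is the first node whose token is >= the key's token, wrapping around the ring. */
    String ownerOf(long keyToken) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes in the ring");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(keyToken);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    /** Hypothetical partitioner: maps a key to a token in [0, 100). */
    static long tokenFor(String key) {
        return Math.floorMod(key.hashCode(), 100);
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch();
        ring.addNode("A", 25L);
        ring.addNode("B", 50L);
        ring.addNode("C", 75L);
        System.out.println(ring.ownerOf(tokenFor("ORANJESTAD")));
    }
}
```

An order-preserving partitioner would replace `tokenFor` with a mapping that keeps keys in their natural order, which allows key-range scans at the cost of less even load.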
Writing
No Locks
Append support without read ahead
Atomicity guarantee for a key (in a ColumnFamily)
Always Writable!!!
SSTables – Key/data – SSTable file for each column
family
Fast
Reading
Wait for R responses
Wait for N – R responses in the background and
perform read repair
Read multiple SSTables
Slower than writes (but still fast)
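Why writes are cheaper than reads can be seen in a heavily simplified sketch of the storage path. It assumes an in-memory buffer (a memtable) that is flushed into immutable SSTables, omits the commit log, and resolves versions by file age rather than by timestamp; `StorageSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: append-style writes, reads that consult several SSTables. */
final class StorageSketch {

    private final int flushThreshold;
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final List<Map<String, String>> sstables = new ArrayList<>(); // oldest first

    StorageSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Write: no lock and no read of existing data, just record the newest value. */
    void write(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= flushThreshold) {
            flush();
        }
    }

    /** Freezes the current memtable into a new immutable SSTable. */
    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        sstables.add(Collections.unmodifiableMap(memtable));
        memtable = new TreeMap<>();
    }

    /** Read: check the memtable first, then SSTables from newest to oldest. */
    String read(String key) {
        String value = memtable.get(key);
        if (value != null) {
            return value;
        }
        for (int i = sstables.size() - 1; i >= 0; i--) {
            value = sstables.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    int sstableCount() {
        return sstables.size();
    }
}
```

A write touches one structure, while a read may have to look through every SSTable before it finds the key or gives up.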
Compare with MySQL (RDBMS)
Compare a 50GB Database:
MySQL
~300ms write
~350ms read
Cassandra
~0.12ms write
~15ms read
Queries
Single column
Slice
Set of names / range of names
Simple slice -> columns
Super slice -> supercolumns
Key range
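Because columns are kept sorted by name, a slice amounts to a range or name lookup on a sorted map. The sketch below illustrates the two slice styles in plain Java, assuming a row is a `NavigableMap` of column name to value; `SliceSketch`, `sliceRange` and `sliceNames` are hypothetical names, not client API calls.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/** Illustrative only: slice queries over a row whose columns are sorted by name. */
final class SliceSketch {

    /** Range slice: all columns whose names fall in [start, end], both inclusive. */
    static NavigableMap<String, String> sliceRange(NavigableMap<String, String> row,
                                                   String start, String end) {
        return row.subMap(start, true, end, true);
    }

    /** Name slice: only the requested columns that actually exist in the row. */
    static Map<String, String> sliceNames(Map<String, String> row, List<String> names) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : names) {
            String value = row.get(name);
            if (value != null) {
                result.put(name, value);
            }
        }
        return result;
    }
}
```

A key range query applies the same range idea to row keys instead of column names.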
Sorting
Sorting is set on writing
Sorting is set by the type of the Column/Supercolumn
keys
Sorting/keys Types
Bytes
UTF8
Ascii
LexicalUUID
TimeUUID
Drawbacks
No joins (for speed)
Not able to sort at query time
No real SQL support (although some APIs support
a very small subset of it)
APIs
Many APIs for a large number of languages, including C++,
Java, Python, PHP, Ruby, Erlang, Haskell, C#,
JavaScript and more…
Thrift interface – Driver level interface – hard to use.
Hector – a Java Cassandra client – simple Column
based client – does what Cassandra is intended to do.
Kundera – a Java client with JPA support – tries to translate
JPA classes and attributes to Cassandra – good on
inserts, still hard and problematic with queries.
Cassandra Installation
Install the prerequisite – basically the latest Java SE release
Extract the Cassandra zip file to your chosen path
Run bin/cassandra.bat -f
Cassandra node is up and running
World Database in Cassandra
World - Keyspace
Countries – SuperColumn Family
CountryDetails – SuperColumn
Border – SuperColumns
Coordinates – SuperColumn
GDP – SuperColumn
Language – SuperColumns
Cities – Column Family
Using Hector API - definitions
Creating a Cassandra Cluster :
Cluster cluster = HFactory.getOrCreateCluster("WorldCluster", "localhost:9160");
Adding a keyspace:
columnFamilyDefinition.setKeyspaceName(WORLD_KEYSPACE);
Adding a Column Family:
// columnDefinition is assumed to be a ColumnDefinition built earlier
BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition();
columnFamilyDefinition.setKeyspaceName(WORLD_KEYSPACE);
columnFamilyDefinition.setName(CITY_CF); // ColumnFamily Name
columnFamilyDefinition.addColumnDefinition(columnDefinition);
Using Hector API - definitions
Adding a SuperColumn:
BasicColumnFamilyDefinition superCfDefinition = new BasicColumnFamilyDefinition();
superCfDefinition.setKeyspaceName(WORLD_KEYSPACE);
superCfDefinition.setName(COUNTRY_SUPER);
superCfDefinition.setColumnType(ColumnType.SUPER);
Adding all definition to cluster:
ColumnFamilyDefinition cfDefStandard = new ThriftCfDef(columnFamilyDefinition);
ColumnFamilyDefinition cfDefSuper = new ThriftCfDef(superCfDefinition);
KeyspaceDefinition keyspaceDefinition =
HFactory.createKeyspaceDefinition(WORLD_KEYSPACE,
"org.apache.cassandra.locator.SimpleStrategy",
1, Arrays.asList(cfDefStandard, cfDefSuper));
cluster.addKeyspace(keyspaceDefinition);
Using Hector API - inserting
Creating a Column Template
ColumnFamilyTemplate<String, String> template =
new ThriftColumnFamilyTemplate<String, String>(keyspaceOperator,
columnFamilyName,
stringSerializer,
stringSerializer);
Adding a Row into a Column Family
ColumnFamilyUpdater<String, String> updater = template.createUpdater("a key");
updater.setString("key", "value");
try {
    template.update(updater);
} catch (HectorException e) {
    // do something ...
}
Using Hector API - inserting
Creating a Super Column Template
SuperCfTemplate<String,String, String> template =
new ThriftSuperCfTemplate<String, String, String>(keyspaceOperator,
columnFamilyName,
stringSerializer,
stringSerializer,
stringSerializer);
Adding a Row into a SuperColumn Family
SuperCfUpdater<String, String, String> updater = template.createUpdater("a key");
HSuperColumn<String, String, ByteBuffer> superColumn = updater.addSuperColumn("sc name");
superColumn.setString("column name", value);
superColumn.update();
try {
    template.update(updater);
} catch (HectorException e) {
    // do something ...
}
Using Hector API - reading
Reading all Rows and their columns from a Column
Family (Using CQL)
CqlQuery<String,String,String> cqlQuery = new
CqlQuery<String,String,String>(factory.getKeyspaceOperator(), stringSerializer, stringSerializer,
stringSerializer);
cqlQuery.setQuery("select * from City");
QueryResult<CqlRows<String,String,String>> result = cqlQuery.execute();
Reading all columns from a Row in a SuperColumn
Family
SuperCfTemplate<String,String,String> superColumn =
HectorFactory.getFactory().getSuperColumnFamilyTemplate("SuperColumnFamily");
SuperCfResult<String, String, String> superRes = superColumn.querySuperColumns("key");
Collection<String> columnNames = superRes.getSuperColumns();
Using Hector API - reading
Reading a SuperColumn from a Row in a SuperColumn
Family
SuperColumnQuery<String, String, String, String> query = HFactory.createSuperColumnQuery(keyspaceOperator,
stringSerializer,
stringSerializer, stringSerializer, stringSerializer);
query.setColumnFamily("SuperColumnFamily");
query.setKey("key");
query.setSuperName("SuperColumnName");
QueryResult<HSuperColumn<String, String, String>> result = query.execute();
for (HColumn<String, String> col : result.get().getColumns()) {
String name = col.getName();
String value = col.getValue();
}
Every query has options to get part of the rows – by
setting a start value and an end value (the rows are sorted
on insertion), and part of the columns by setting the
column names explicitly
Administration tools
Cassandra – node activator
Nodetool – bootstrapping and monitoring
Cassandra-cli – Application Console
Sstable2json - Export
Json2sstable - Import