Cassandra

Cassandra is, by default, an AP (available and partition-tolerant) database in CAP terms, which is why it is described as “always on”. However, consistency is tunable: you can configure it on a per-query basis.
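
A minimal cqlsh sketch of per-query consistency. It reuses the store.shopping_cart table created under Query Structure. Note that CONSISTENCY is a cqlsh shell command rather than a CQL statement; client drivers set the level on each statement instead.

```cql
-- cqlsh session; CONSISTENCY applies to the requests that follow it
CONSISTENCY;          -- print the current level (cqlsh starts at ONE)
CONSISTENCY QUORUM;   -- require a majority of replicas to answer
SELECT item_count FROM store.shopping_cart WHERE userid = '9876';
CONSISTENCY ONE;      -- back to favoring availability and latency
```

Raising the level trades some availability and latency for stronger consistency. The cluster itself stays AP.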

my questions

challenges

  • full multi-primary database replication
  • global availability at low latency
  • scaling out on commodity hardware
  • linear throughput increase with each additional processor
  • online load balancing and cluster growth
  • partitioned key-oriented queries
  • flexible schema

basic knowledge

  • keyspace (database)
  • table
  • partition (identified by the partition key, the mandatory first part of the primary key)
  • row
  • column
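
Below is a sketch that maps these terms onto CQL. The keyspace demo, the table sensor_readings and the sample values are all hypothetical. The partition key decides which nodes store a row. The clustering column orders rows inside that partition.

```cql
-- keyspace: top-level container (roughly a "database"); replication is configured here
CREATE KEYSPACE IF NOT EXISTS demo
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

-- table: a set of rows sharing a column definition
CREATE TABLE IF NOT EXISTS demo.sensor_readings (
  sensor_id    text,       -- partition key: selects the replicas that own the row
  reading_time timestamp,  -- clustering column: sorts rows within a partition
  temperature  double,     -- regular column
  PRIMARY KEY ((sensor_id), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);

-- two rows in partition 's1', one row in partition 's2'
INSERT INTO demo.sensor_readings (sensor_id, reading_time, temperature)
  VALUES ('s1', '2024-01-01 10:00:00+0000', 21.5);
INSERT INTO demo.sensor_readings (sensor_id, reading_time, temperature)
  VALUES ('s1', '2024-01-01 10:05:00+0000', 21.7);
INSERT INTO demo.sensor_readings (sensor_id, reading_time, temperature)
  VALUES ('s2', '2024-01-01 10:00:00+0000', 19.0);

-- single-partition read: rows come back newest first
SELECT reading_time, temperature FROM demo.sensor_readings WHERE sensor_id = 's1';
```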

storage engine

  • logging data in the commit log
  • writing data to the memtable
  • flushing data from memtable
  • storing data on disk in SSTables
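
CQL does not expose the write path directly, but a couple of schema options map onto its steps. The sketch below uses the hypothetical names scratch and scratch.events. With durable_writes = false, writes skip the commit log and go straight to the memtable, so data that has not been flushed is lost if the node crashes. memtable_flush_period_in_ms adds a time-based flush on top of the usual memory-driven flushing.

```cql
-- step 1 bypassed: writes to this keyspace are not recorded in the commit log
CREATE KEYSPACE IF NOT EXISTS scratch
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 }
  AND durable_writes = false;

-- step 3: flush this table's memtable to a new SSTable at least once per hour
-- (the default of 0 disables the periodic flush)
CREATE TABLE IF NOT EXISTS scratch.events (
  id   uuid PRIMARY KEY,
  body text
) WITH memtable_flush_period_in_ms = 3600000;
```

A flush can also be triggered by hand with nodetool flush scratch events. This writes the current memtable out as an SSTable.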

commit log

memtable

SSTables

SSTables are the immutable data files that Cassandra uses for persisting data on disk. SSTables are maintained per table.

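  • Data.db - the contents of rows
  • Partitions.db - partition index (BTI format)
  • Rows.db - row index (BTI format)
  • Index.db - partition index (BIG format)
  • Summary.db - sampled summary of Index.db (BIG format)
  • Filter.db - bloom filter over the partition keys
  • CompressionInfo.db - offsets and lengths of the compressed chunks in Data.db
  • Statistics.db - SSTable metadata (timestamps, tombstones, clustering keys, TTLs, etc.)
  • Digest.crc32 - CRC-32 digest of Data.db
  • TOC.txt - list of the component files of the SSTable
  • SAI*.db - storage-attached index (SAI) files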
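
Several table options shape specific components in this list. Here is a sketch using a hypothetical table store.orders, assuming the store keyspace from Query Structure exists. The SAI statement needs Cassandra 5.0 or later.

```cql
CREATE TABLE IF NOT EXISTS store.orders (
  order_id text PRIMARY KEY,
  status   text,
  total    decimal
) WITH bloom_filter_fp_chance = 0.01   -- Filter.db: a lower value means a larger filter
   AND compression = { 'class' : 'LZ4Compressor', 'chunk_length_in_kb' : 16 };  -- CompressionInfo.db

-- storage-attached index: each SSTable gets its own SAI*.db components
CREATE INDEX IF NOT EXISTS orders_status_idx ON store.orders (status) USING 'sai';

-- served by the SAI index, no ALLOW FILTERING needed
SELECT order_id, total FROM store.orders WHERE status = 'shipped';
```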

summit keynote

messaging summary

  • each node starts a gossip round every second
  • 1-3 peers per round
  • 3 messages passed per exchange (SYN, ACK, ACK2)
  • constant amount of network traffic

practical implications

  • who is in the cluster?
    • gossip with a seed on startup
    • learn about all peers
    • gossip
    • lather, rinse, repeat
  • how are peers judged UP or DOWN?
    • what does UP/DOWN mean?
      • it is local to each node (each node keeps its own view)
      • it is determined via heartbeats
    • failure detector (phi accrual)
      • a glorified heartbeat listener
      • records a timestamp whenever a heartbeat update is received from each peer
      • keeps a backlog of the intervals between updates
      • periodically checks all peers to make sure we’ve heard from them recently
    • what UP/DOWN affects when a peer is DOWN
      • writes are no longer sent to it (hints are stored instead)
      • reads are no longer sent to it
      • gossip target selection
      • repair/stream sessions with it are terminated
  • when does a node stop sending traffic to a peer?
  • when is one peer preferred over another?
    • the dynamic snitch ranks all peers by observed latency
  • when does a node leave the cluster?
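
You can answer "who is in the cluster?" from CQL by reading the system tables on the node you are connected to. Cassandra fills these in from what that node has learned, gossip included. The peers_v2 table assumes Cassandra 4.0 or later. UP/DOWN state is not stored in these tables; nodetool status reports it.

```cql
-- this node's own view of itself
SELECT cluster_name, data_center, rack, release_version FROM system.local;

-- peers this node knows about
SELECT peer, data_center, rack, host_id, release_version FROM system.peers;

-- Cassandra 4.0+: also includes each peer's port
SELECT peer, peer_port, data_center, rack FROM system.peers_v2;
```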

Query Structure


```cql
-- Create a keyspace (SimpleStrategy is meant for single-datacenter or test setups)
CREATE KEYSPACE IF NOT EXISTS store WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : '1' };

-- Create a table; userid is the partition key
CREATE TABLE IF NOT EXISTS store.shopping_cart (
  userid text PRIMARY KEY,
  item_count int,
  last_update_timestamp timestamp
);

-- Insert some data
INSERT INTO store.shopping_cart
(userid, item_count, last_update_timestamp)
VALUES ('9876', 2, toTimeStamp(now()));
INSERT INTO store.shopping_cart
(userid, item_count, last_update_timestamp)
VALUES ('1234', 5, toTimeStamp(now()));
```
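
The following queries run against the shopping_cart table above and show how the primary key shapes queries. The values are illustrative only.

```cql
-- fast path: the partition key routes the read straight to the replicas owning '9876'
SELECT item_count, last_update_timestamp FROM store.shopping_cart WHERE userid = '9876';

-- writes are upserts: UPDATE creates the row if it does not exist yet
UPDATE store.shopping_cart SET item_count = 3, last_update_timestamp = toTimestamp(now())
  WHERE userid = '9876';

-- filtering on a non-key column is rejected without ALLOW FILTERING,
-- which may scan every partition; prefer a table or index designed for the query
SELECT userid FROM store.shopping_cart WHERE item_count > 4 ALLOW FILTERING;
```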

misc

configuration

cassandra.yaml
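
Here is a sketch that assumes Cassandra 4.0 or later, which exposes the effective cassandra.yaml values through the system_views.settings virtual table. The exact setting names can differ between versions, so list the table first.

```cql
-- every setting the node exposes, as name/value text pairs
SELECT name, value FROM system_views.settings;

-- a single setting, looked up by its partition key
SELECT name, value FROM system_views.settings WHERE name = 'commitlog_sync';
```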

source code

  • to hack on the source code locally, we had to use Ant to build the project and run ant generate-idea-files to generate the IntelliJ IDEA project files; after that, IDEA could resolve the project's libraries.

references

Licensed under CC BY-NC-SA 4.0
Get Things Done