Posts

Showing posts from August, 2021

Cassandra's Write Path

Cassandra is designed for write-heavy workloads, so writing to it is a piece of cake. Cassandra treats every mutation as a write: Insert, Update, Delete, and Alter operations are all considered writes.
Components of the Write Path: The write path has three main components: the commit log, the memtable, and the SSTable.
Commit Log: This is a disk component. It is append-only storage on disk. During a write operation, the data reaches the commit log first and is appended to it.
Memtable: This is the memory component. Once the write has been appended to the commit log, the data is immediately written to the memtable in sorted order.
SSTable: This is also a disk component. SSTable stands for "Sorted String Table". SSTables are immutable.
WRITE PATH: The best thing about Cassandra is that any node in the cluster can respond to a client's request. That node is called the coordinator node. Coordinator nod...
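To make the order of the three components concrete, here is a minimal sketch of a single node's write path. It is a toy model, not Cassandra's implementation: the class name `ToyNode` and the `memtable_limit` threshold are hypothetical, and a real node flushes based on memory thresholds and other triggers, not on a row count.

```python
class ToyNode:
    """Toy model of one node's write path (illustrative only)."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # stands in for the append-only log on disk
        self.memtable = {}     # in-memory component
        self.sstables = []     # immutable, sorted snapshots "on disk"
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        # 1. Append to the commit log first, for durability.
        self.commit_log.append((key, value))
        # 2. Then apply the write to the memtable.
        self.memtable[key] = value
        # 3. When the memtable is "full", flush it to a new SSTable.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # An SSTable is written once, sorted by key, and never modified,
        # so it is modelled here as a tuple.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}


node = ToyNode(memtable_limit=2)
node.write("b", 1)
node.write("a", 2)  # second write reaches the limit and triggers a flush
node.write("c", 3)  # lands in a fresh memtable
```

After these three writes the commit log holds all three entries, one sorted SSTable holds `a` and `b`, and `c` is still only in memory (and in the commit log).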

Cassandra Reaper Configuration

Reaper is an open source tool that schedules and orchestrates repairs of Apache Cassandra clusters. It improves on the existing nodetool repair process by splitting repair jobs into smaller, tunable segments; handling back-pressure by monitoring running repairs and pending compactions; and adding the ability to pause or cancel repairs and track progress precisely. It also gives us a simple web interface to schedule, run, pause, or stop the repair process.
Cassandra Reaper Configuration
>> The main prerequisite for configuring Reaper is a backend system to store Reaper's data.
>> The options are: In-Memory, Cassandra, PostgreSQL, H2, Astra
>> I'm choosing Cassandra as my backend, so I must have Cassandra running on my machine.
>> Visit http://cassandra-reaper.io/docs/ for the detailed documentation.
>> I downloaded the rpm from the link below:
>> Reaper download     ...

Secondary Indexes in Cassandra

In Cassandra, stored data can be retrieved by using the partition key or the entire primary key. Cassandra is not designed to retrieve data by columns that are not part of the primary key. If we query it that way, it throws the following error.
Example:
CREATE TABLE ratings_by_movie ( email TEXT, title TEXT, year INT STATIC, rating INT, PRIMARY KEY((title),email) );
select * from ratings_by_movie where rating=8;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
This is where "allow filtering" comes in. It returns the requested data, but the query has to hit each and every node because Cassandra does not know where the matching rows are located. Hence it is not good to use allow filtering.
select * from ratings_by_movie where rating=8 Allow fi...

Cassandra's Tunable Consistency

Cassandra is an AP (available and partition-tolerant) system, but it also provides tunable consistency. Consistency means that the same data is available on all the replica nodes, even when concurrent updates are made. Higher consistency gives lower availability. For read queries, the consistency level is the number of replica nodes that must respond to that read. For write queries, it is the number of replica nodes that must acknowledge the write. Cassandra has several consistency levels. They are:
One: One of the replica nodes must respond to the query.
Two: Two of the replica nodes must respond to the query.
Quorum: "(RF/2)+1" replica nodes across the cluster must respond, where RF is the replication factor and the division is rounded down. For example, if we have RF = 3 then two nodes must respond to the query.
Each_Quorum: A quorum of replica nodes in each data center must respond.
Local_Quorum: Must read from the maximum numbe...
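The quorum arithmetic above can be checked with a few lines of Python. This is only a sketch of the formula; the function names `quorum` and `tolerated_failures` are made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while a QUORUM request still succeeds."""
    return replication_factor - quorum(replication_factor)


for rf in (3, 5, 6):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

With RF = 3 the quorum is 2, so one replica can be unavailable. Going from RF = 5 to RF = 6 raises the quorum from 3 to 4 without tolerating any additional failures, which is one reason odd replication factors are common.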

Cassandra's Replication

Replication Factor: The replication factor in Cassandra is the number of copies of the data stored on different nodes in the cluster. It is the main factor behind availability in Cassandra. Replication is set at the KEYSPACE level.
Example: If we have a 3-node Cassandra cluster and set the replication factor to 3, the data will be available on all 3 nodes. In this case the data is still available even if we lose 2 nodes, as long as requests use a consistency level that a single replica can satisfy. Hence the replication factor is what gives us high availability in Cassandra.
Data is stored in the cluster based on the hash value of the partition key. If the hash value falls within a particular token range, the data is sent to the node that owns that range. This node is the primary replica for the token range. How the remaining replicas are placed among the nodes is described by "replication strategies".
Replication Strategies: There are two types of replication ...
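The token-range idea can be sketched with a toy ring. Everything here is an assumption made for illustration: the ring has only 100 tokens, the four node names and their tokens are invented, MD5 stands in for Cassandra's real partitioner, and the extra replicas are simply the next nodes clockwise on the ring.

```python
import hashlib
from bisect import bisect_left
from typing import List

RING_SIZE = 100  # hypothetical tiny token space

# Each (hypothetical) node owns the token range that ends at its token.
NODE_TOKENS = {"node1": 25, "node2": 50, "node3": 75, "node4": 99}


def token_for(partition_key: str) -> int:
    """Hash the partition key onto the toy ring (MD5 is a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for(token: int, replication_factor: int) -> List[str]:
    """Primary owner of the token, then the next nodes clockwise."""
    ring = sorted(NODE_TOKENS.items(), key=lambda item: item[1])
    tokens = [node_token for _, node_token in ring]
    start = bisect_left(tokens, token) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][0] for i in range(count)]


print(replicas_for(30, 3))  # token 30 falls in node2's range
print(replicas_for(80, 3))  # token 80 falls in node4's range and wraps around
print(replicas_for(token_for("some-partition-key"), 3))
```

Token 30 is stored on node2, node3, and node4; token 80 is stored on node4 and then wraps around to node1 and node2.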

Cassandra's-Gossip Protocol

Cassandra uses the gossip protocol to learn the state information of nodes. The protocol is named after the concept of human gossiping. Gossip runs every second in the Cassandra cluster. The nodetool command "nodetool gossipinfo" gives the complete gossip information about all the nodes in the cluster: state information, schema version, DC, rack, load, etc. Nodes can gossip with any of their peers. If for some reason a node cannot gossip with its peers, it goes directly to the seed node listed in that node's cassandra.yaml (the main configuration file in Cassandra) and asks the seed node to share the gossip.
Gossip protocols are also called epidemic protocols. The main use of the gossip protocol in Cassandra is "failure detection". Once per second, the gossiper chooses a random node in the cluster and initializes a gossip session with it. Every round of gossip requires three messages. 1. The gossip...
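The spreading of state through random peer exchanges can be sketched as below. This is a simplified model under stated assumptions: the class `GossipNode` and the functions `exchange` and `gossip_round` are hypothetical, each node's state is reduced to a single heartbeat counter, and the multi-message handshake of a real gossip session is collapsed into one merge step.

```python
import random


class GossipNode:
    """Toy node that tracks the newest heartbeat it has seen per endpoint."""

    def __init__(self, name):
        self.name = name
        self.heartbeats = {name: 0}  # endpoint name -> highest version seen

    def beat(self):
        # A node only ever increments its own heartbeat.
        self.heartbeats[self.name] += 1


def exchange(a, b):
    """Merge the two views so both keep the newest version per endpoint."""
    merged = dict(a.heartbeats)
    for endpoint, version in b.heartbeats.items():
        merged[endpoint] = max(version, merged.get(endpoint, -1))
    a.heartbeats = dict(merged)
    b.heartbeats = dict(merged)


def gossip_round(nodes, rng):
    """One round ("one second"): every node beats and gossips with a random peer."""
    for node in nodes:
        node.beat()
        peer = rng.choice([other for other in nodes if other is not node])
        exchange(node, peer)


cluster = [GossipNode(f"node{i}") for i in range(1, 5)]
rng = random.Random(42)  # seeded only so that runs are repeatable
for _ in range(5):
    gossip_round(cluster, rng)
for node in cluster:
    print(node.name, node.heartbeats)
```

A failure detector can be built on top of this view: if the heartbeat recorded for an endpoint stops increasing for long enough, that endpoint is suspected to be down.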

Cassandra CAP Theorem

The CAP theorem states that consistency, availability, and partition tolerance cannot all be provided by one distributed system at the same time. Distributed databases have to tolerate network partitions, so in practice they must choose between consistency and availability. Cassandra is an AP database. It gives high availability and partition tolerance, and it also offers tunable consistency, so we can tune the consistency we require. But in Cassandra, higher consistency gives lower availability, so we must choose our database based on our requirements.
The CAP theorem is also called Brewer’s theorem. It is named after its author, Eric Brewer.
Consistency: All users read the same data for the same query, even when concurrent updates are made.
Availability: All users are always able to read and write data.
Partition Tolerance: The database can be spread across multiple machines, and it continues functioning when the network between them is partitioned. Cassandra ...

Cassandra Keywords

The main keywords used in Cassandra are Cluster, Datacenter, and Node.
Cluster: The outermost layer of the Cassandra architecture is the cluster. A cluster may contain one or more datacenters. For example, one datacenter in the US and another in India can both be part of the same Cassandra cluster. Because Cassandra is a highly available system, if we maintain two datacenters in two different places, we can still get our data when one of the datacenters is down.
Datacenter: A datacenter is the part of a Cassandra cluster that contains nodes. A datacenter can have a single node or multiple nodes. The datacenter contains racks in which the nodes are placed. By using replication strategies we can also place nodes of one datacenter into different racks. So that...

CASSANDRA

Basics of Cassandra: Cassandra is a NoSQL database that is distributed, highly available, fault-tolerant, tunably consistent, decentralized, and linearly scalable. Cassandra is a write-heavy database, so it is a good fit if your application is write-heavy.
Cassandra's design draws on Amazon's Dynamo (for its distribution design) and Google's Bigtable (for its data model), with a query language that is similar to SQL.
A basic understanding of Cassandra's characteristics:
Distributed: The data stored in Cassandra is distributed among all the nodes of the cluster. Hence it is a distributed system.
Highly Available: The replication factor causes data to be replicated to other nodes, which makes Cassandra highly available.
Fault-Tolerant: In Cassandra the process of storing and retrieving of data is unstoppable also if there is a problem in one o...