Posts

Cassandra's Read Path

Cassandra is a NoSQL database designed for heavy write workloads, so its read path is more involved than its write path. Every node stores its data in immutable SSTables, so the data of a single table can be spread across many SSTables. To read something from Cassandra, we may have to check every SSTable that could contain that data. Several components help optimize this SSTable search: 1. Bloom Filter: the Bloom filter says "Hey! The data you're looking for doesn't exist here!" In other words, it lets Cassandra skip SSTables that cannot contain the data. A Bloom filter can give false positives (it says the data may be in this SSTable, but we end up not finding it there), but it never gives false negatives (if it says the data is not in a particular SSTable, it really is not there). 2. Key Cache: the key cache stores the byte offset (position in the SSTable data file) of partitions that have been read recently, so a repeat read can seek straight to them. 3. Partition Summary: It is used when the partition ind...
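The Bloom filter behaviour described above (no false negatives, occasional false positives) can be sketched as a small Python toy. The SSTable names and keys are made up, and the bit-array size, number of hash functions and salted SHA-256 hashing are illustrative assumptions; Cassandra sizes its real filters from the table's bloom_filter_fp_chance setting and uses its own hashing.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: one per SSTable, answering 'might this key be here?'."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key: str):
        # Assumption: derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key: str) -> bool:
        # False -> the key is definitely absent (no false negatives).
        # True  -> the key may be present (false positives are possible).
        return all(self.bits[pos] for pos in self._positions(key))


# Hypothetical SSTables and the partition keys they hold.
sstables = {
    "sstable-1": ["alice", "bob"],
    "sstable-2": ["carol"],
    "sstable-3": ["dave", "erin"],
}

filters = {}
for name, keys in sstables.items():
    bf = BloomFilter()
    for k in keys:
        bf.add(k)
    filters[name] = bf


def candidate_sstables(key: str) -> list:
    """SSTables the read still has to look into for this partition key."""
    return [name for name, f in filters.items() if f.might_contain(key)]


print(candidate_sstables("carol"))
```

Only SSTables whose filter answers "maybe" are read; a false positive costs an unnecessary lookup, but a key that is present is never skipped.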

Authorization in Cassandra

Authorization in Cassandra is disabled by default, which grants all permissions to all roles. Authorization should not be left disabled in production deployments. Cassandra provides role-based access control, which we can use to configure proper access profiles and restrict schema access. To enable authorization, we must change the authorizer setting in the cassandra.yaml file. By default: authorizer: AllowAllAuthorizer After enabling authorization: authorizer: CassandraAuthorizer Now restart the node using the following command: nodetool drain; nodetool stopdaemon; cassandra Once authorization is enabled, we can start creating roles. Let's create a dba role that has all permissions on all keyspaces, a sales_admin role that has all permissions on one particular keyspace, and a read_only role that has only SELECT access on all keyspaces. First, create high-level roles as placeholders: create role 'dba_role' with login=false; create role 'sales_admin' with login=false; crea...

Authentication in Cassandra

Authentication in Cassandra is disabled by default, which allows anyone on your network to connect to the database. To enable authentication, we must change the authenticator setting in Cassandra's main configuration file, cassandra.yaml. The default setting: authenticator: AllowAllAuthenticator Change it to the following to enable authentication: authenticator: PasswordAuthenticator Save the file. For the change to take effect, we must restart the node: nodetool drain; nodetool stopdaemon; cassandra If we now try to connect to the database without credentials, the connection fails with an authentication error and cqlsh asks for a username and password. The default username and password after enabling authentication are username = cassandra and password = cassandra: cqlsh -u cassandra -p cassandra Everyone who uses Cassandra knows this default username and password, so if we keep using them, there is no point in enabling authe...

Cassandra's Write Path

As Cassandra is designed for heavy writes, writing in Cassandra is a piece of cake. In Cassandra, every data modification is treated as a write: INSERT, UPDATE, DELETE and ALTER are all considered writes. Components of the Write Path There are three main components in the write path: Commit Log, Memtable and SSTable. Commit Log: a disk component, an append-only log on disk. When a write arrives, it is first appended to the commit log. Memtable: a memory component. Right after the write is appended to the commit log, it is written to the memtable, which keeps the data in sorted order. SSTable: also a disk component. SSTable stands for "Sorted String Table". SSTables are immutable. WRITE PATH: The best thing about Cassandra is that any node in the cluster can respond to a client's request. That node is called the coordinator node. Coordinator nod...
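The three write-path components can be modeled in a few lines of Python. This is a simplified sketch, not Cassandra's implementation: the Node class, the flush threshold of three partitions, and clearing the whole commit log after a flush are assumptions for illustration. Real memtables flush based on memory thresholds, commit log segments are recycled once their data has been flushed, and this toy keeps the memtable in a plain dict that is sorted only at flush time.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    flush_threshold: int = 3                        # hypothetical: flush after 3 partitions
    commit_log: list = field(default_factory=list)  # stands in for the append-only disk log
    memtable: dict = field(default_factory=dict)    # in-memory: key -> value
    sstables: list = field(default_factory=list)    # immutable, sorted snapshots

    def write(self, key, value):
        # 1. Append to the commit log first, for durability.
        self.commit_log.append((key, value))
        # 2. Then apply the write to the memtable.
        self.memtable[key] = value
        # 3. When the memtable is "full", flush it to a new SSTable.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sorted by key and stored as a tuple, so it cannot be modified in place.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable.clear()
        # Simplification: the flushed data no longer needs its commit log entries.
        self.commit_log.clear()


node = Node()
for k, v in [("c", 3), ("a", 1), ("b", 2), ("d", 4)]:
    node.write(k, v)

print(node.sstables)    # [(('a', 1), ('b', 2), ('c', 3))]
print(node.memtable)    # {'d': 4}
print(node.commit_log)  # [('d', 4)]
```

The first three writes fill the memtable and are flushed as one sorted, immutable SSTable; the fourth write waits in the commit log and memtable until the next flush.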

Cassandra Reaper Configuration

Reaper is an open source tool that schedules and orchestrates repairs of Apache Cassandra clusters. It improves on the existing nodetool repair process by splitting repair jobs into smaller, tunable segments; handling back-pressure by monitoring running repairs and pending compactions; and adding the ability to pause or cancel repairs and track their progress precisely. It also provides a simple web interface to schedule, run, pause or stop repairs. Cassandra Reaper Configuration >> The main prerequisite for configuring Reaper is a backend to store Reaper's data. >> The supported backends are: in-memory, Cassandra, PostgreSQL, H2 and Astra. >> I'm choosing Cassandra as my backend, so I must have Cassandra running on my machine. >> See http://cassandra-reaper.io/docs/ for the detailed documentation. >> I downloaded the RPM from the link below: >> Reaper download     ...
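Splitting a repair into segments means dividing the token ring into smaller token ranges that can each be repaired on their own. Below is a minimal sketch, assuming the Murmur3Partitioner token range and equal-width segments; split_ring is a hypothetical helper, and Reaper's actual segment calculation (which works per node and per token range) is more involved.

```python
MIN_TOKEN = -(2 ** 63)   # Murmur3Partitioner lower bound
MAX_TOKEN = 2 ** 63 - 1  # Murmur3Partitioner upper bound


def split_ring(num_segments: int, start: int = MIN_TOKEN, end: int = MAX_TOKEN):
    """Split [start, end] into num_segments contiguous (start, end) token ranges."""
    if num_segments < 1:
        raise ValueError("num_segments must be >= 1")
    span = end - start
    # Evenly spaced boundaries; Python ints do not overflow on the 64-bit ring.
    bounds = [start + span * i // num_segments for i in range(num_segments)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


for seg_start, seg_end in split_ring(4):
    print(seg_start, seg_end)
```

Each (start, end) pair is the kind of range that nodetool repair accepts through its -st/-et (start token/end token) options, so smaller segments mean shorter, more controllable repair sessions.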

Secondary Indexes in Cassandra

In Cassandra, stored data is retrieved by the partition key or the full primary key. Cassandra is not designed to retrieve data by columns that are not part of the primary key; if we query that way, it throws the following error. Example: CREATE TABLE ratings_by_title ( email TEXT, title TEXT, year INT STATIC, rating INT, PRIMARY KEY((title),email) ); select * from ratings_by_title where rating=8; InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING" This is where ALLOW FILTERING comes in. It returns the requested data, but because Cassandra does not know where the matching rows are, the query hits each and every node. Hence using ALLOW FILTERING is not a good idea. select * from ratings_by_title where rating=8 Allow fi...
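Why a query by partition key touches one node while ALLOW FILTERING touches every node can be illustrated with a toy in-memory model of ratings_by_title. The three-node cluster, the byte-sum placement function and the sample rows are hypothetical; real Cassandra places partitions by hashing the partition key onto the token ring, and its built-in secondary indexes are local to each node, which the local_index dictionary imitates.

```python
from collections import defaultdict

NUM_NODES = 3  # hypothetical cluster size


def node_for(title: str) -> int:
    # Hypothetical placement; Cassandra hashes the partition key onto the ring.
    return sum(title.encode()) % NUM_NODES


# Hypothetical rows of ratings_by_title: (title, email, rating).
rows = [
    ("Alien", "a@example.com", 8),
    ("Alien", "b@example.com", 6),
    ("Heat", "a@example.com", 8),
    ("Up", "c@example.com", 7),
]
nodes = defaultdict(list)
for row in rows:
    nodes[node_for(row[0])].append(row)


def by_partition_key(title):
    # Partition key known: only the owning node is read.
    return [r for r in nodes[node_for(title)] if r[0] == title]


def allow_filtering(rating):
    # No partition key: every node scans every row it holds.
    hits, scanned = [], 0
    for node_rows in nodes.values():
        for r in node_rows:
            scanned += 1
            if r[2] == rating:
                hits.append(r)
    return hits, scanned


# Local secondary index: each node indexes only its own rows (rating -> rows).
local_index = {n: defaultdict(list) for n in nodes}
for n, node_rows in nodes.items():
    for r in node_rows:
        local_index[n][r[2]].append(r)


def indexed_lookup(rating):
    # May still contact every node, but each node does an index lookup, not a scan.
    return [r for idx in local_index.values() for r in idx.get(rating, [])]
```

With ALLOW FILTERING, scanned always equals the total number of rows in the cluster, which is why its cost grows with the data size rather than with the result size.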

Cassandra's Tunable Consistency

Cassandra is an AP (available and partition-tolerant) system, but it also provides tunable consistency. Consistency means that all replica nodes hold the same data, even when concurrent updates are made. Higher consistency comes at the cost of lower availability. For reads, the consistency level is the number of replica nodes that must respond to that read. For writes, it is the number of replica nodes that must acknowledge the write. Cassandra has several consistency levels: ONE: one replica node must respond to the query. TWO: two replica nodes must respond to the query. QUORUM: floor(RF/2) + 1 replica nodes must respond, where RF is the replication factor. For example, with RF = 3, two nodes must respond to the query. EACH_QUORUM: a quorum of replica nodes must respond in each data center. LOCAL_QUORUM: Must read from the maximum numbe...
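The QUORUM arithmetic above can be checked with a small Python sketch. The helper names are hypothetical. The multi-datacenter variants follow Cassandra's rule that QUORUM counts replicas across all data centers (the sum of the per-DC replication factors) while EACH_QUORUM needs a quorum in every data center, and is_strongly_consistent encodes the common R + W > RF overlap rule.

```python
def quorum(rf: int) -> int:
    """Replicas needed for QUORUM: floor(RF / 2) + 1."""
    return rf // 2 + 1


def global_quorum(dc_rfs: dict) -> int:
    """QUORUM across data centers uses the sum of all replication factors."""
    return quorum(sum(dc_rfs.values()))


def each_quorum(dc_rfs: dict) -> dict:
    """EACH_QUORUM: a separate quorum is required in every data center."""
    return {dc: quorum(rf) for dc, rf in dc_rfs.items()}


def is_strongly_consistent(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """A read is guaranteed to see the latest write when R + W > RF."""
    return read_replicas + write_replicas > rf


print(quorum(3))                                     # 2
print(is_strongly_consistent(quorum(3), quorum(3), 3))  # True  (2 + 2 > 3)
print(is_strongly_consistent(1, 1, 3))               # False (ONE/ONE can miss a write)
print(each_quorum({"dc1": 3, "dc2": 3}))             # {'dc1': 2, 'dc2': 2}
print(global_quorum({"dc1": 3, "dc2": 3}))           # 4
```

Reading and writing at QUORUM makes the read and write replica sets overlap, which is why QUORUM/QUORUM is the usual choice when a read must see the latest acknowledged write.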