Cassandra's Replication

Replication Factor: The replication factor in Cassandra is the number of copies of each piece of data stored across the nodes of the cluster. It is the main factor that determines availability in Cassandra. Replication is set at the keyspace level.
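Since replication is configured per keyspace, it is expressed in the CQL statement that creates the keyspace. The following is a minimal sketch of a helper that builds such a statement; the function name create_keyspace_cql and the keyspace name demo_ks are made up for illustration and are not part of any Cassandra driver.

```python
def create_keyspace_cql(keyspace, replication):
    """Build a CREATE KEYSPACE statement from a dict of replication options.

    Hypothetical helper for illustration only: it just formats text and
    does not talk to a cluster.
    """
    parts = []
    for key, value in replication.items():
        # String options (such as the strategy class) are quoted in CQL,
        # numeric options (such as the replication factor) are not.
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        parts.append(f"'{key}': {rendered}")
    options = ", ".join(parts)
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
        f"WITH replication = {{{options}}};"
    )


if __name__ == "__main__":
    print(create_keyspace_cql(
        "demo_ks",
        {"class": "SimpleStrategy", "replication_factor": 3},
    ))
```

The printed statement can be run in cqlsh. Every table created inside that keyspace then shares the same replication settings.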

Example: If we have a 3-node Cassandra cluster and set the replication factor to 3, every node holds a copy of the data. In this case the data can still be read even if we lose 2 nodes, provided the request needs a response from only one replica (for example, consistency level ONE). Hence the replication factor is what gives us high availability in Cassandra.
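How many lost nodes a request can survive depends not only on the replication factor but also on how many replicas must answer, which Cassandra calls the consistency level. The arithmetic can be sketched as follows; the function names are illustrative, and only the ONE, QUORUM and ALL levels are covered.

```python
def replicas_required(consistency_level, replication_factor):
    """Number of replicas that must respond for a request to succeed."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        # A quorum is a majority of the replicas.
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {consistency_level}")


def tolerated_replica_failures(consistency_level, replication_factor):
    """Number of replicas that can be down while requests still succeed."""
    return replication_factor - replicas_required(
        consistency_level, replication_factor
    )


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, tolerated_replica_failures(level, 3))
```

With a replication factor of 3, losing 2 nodes is survivable at ONE, while QUORUM tolerates the loss of 1 node and ALL tolerates none.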

Data is placed in the cluster based on the hash value (the token) of the partition key. Each node owns a token range; if a row's token falls within a node's range, the row is sent to that node. This node holds the primary replica for that token range. How the remaining replicas are placed among the other nodes is determined by the "replication strategy".
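The lookup from partition key to owning node can be sketched as a search on a sorted ring of tokens. This is a toy model under stated assumptions: the node names and tokens are made up, each node has a single token, and MD5 from the Python standard library stands in for the Murmur3 hash that Cassandra's default partitioner uses.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Map a partition key to a toy 32-bit token (stand-in hash, not Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class TokenRing:
    """Toy ring: each node owns the range ending at its own token."""

    def __init__(self, node_tokens):
        # node_tokens maps a node name to the token that ends its range.
        self._entries = sorted(
            (token, node) for node, token in node_tokens.items()
        )
        self._tokens = [token for token, _ in self._entries]

    def node_for_token(self, token):
        # The owner is the first node whose token is >= the data token.
        # Past the highest token, the ring wraps around to the first node.
        index = bisect.bisect_left(self._tokens, token) % len(self._tokens)
        return self._entries[index][1]

    def primary_node(self, partition_key):
        return self.node_for_token(token_for(partition_key))


if __name__ == "__main__":
    ring = TokenRing({"node1": 2**30, "node2": 2**31, "node3": 2**32 - 1})
    print(ring.primary_node("user42"))
```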

Replication Strategies:

There are two main replication strategies. They are:

SimpleStrategy
NetworkTopologyStrategy

SimpleStrategy: This strategy stores the replicas on consecutive nodes of the ring, starting with the node that owns the primary token range. It does not take racks or datacenters into account, so nothing ensures that the replicas land in different racks or different datacenters. If all replicas end up in the same rack and that rack fails, the data becomes inaccessible. Hence this strategy is not recommended for production deployments.
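The placement rule of SimpleStrategy can be sketched as a walk around the ring. This is a simplified model, not Cassandra's implementation: the ring is a made-up list of (token, node) pairs with one token per node.

```python
import bisect


def simple_strategy_replicas(ring, data_token, replication_factor):
    """Pick replicas on consecutive ring nodes, ignoring racks and datacenters.

    ring is a list of (token, node) pairs sorted by token.
    """
    tokens = [token for token, _ in ring]
    # Start at the node that owns the primary token range for this token.
    start = bisect.bisect_left(tokens, data_token) % len(ring)
    # There cannot be more replicas than nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + step) % len(ring)][1] for step in range(count)]


if __name__ == "__main__":
    ring = [(100, "n1"), (200, "n2"), (300, "n3"), (400, "n4")]
    print(simple_strategy_replicas(ring, 250, 3))
```

For a token of 250 the walk starts at n3 and continues to n4, then wraps around to n1, regardless of which racks those nodes are in.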

NetworkTopologyStrategy: This strategy allows us to set a different replication factor for each datacenter. Within a datacenter, it also tries to place the replicas on different racks. So the failure of a single rack, or even of a whole datacenter, need not affect availability. Hence, for production deployments NetworkTopologyStrategy is preferred.
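The rack-aware placement can be sketched in the same toy model. The code below is a simplified approximation under stated assumptions, not Cassandra's actual algorithm: the ring, node names, datacenter names and rack names are all made up, and each node has a single token. For each datacenter it walks the ring, prefers nodes in racks not used yet, and falls back to skipped nodes only when the racks run out.

```python
import bisect


def network_topology_replicas(ring, data_token, rf_per_dc):
    """Pick replicas per datacenter, spreading them across racks when possible.

    ring is a list of (token, node, datacenter, rack) tuples sorted by token.
    rf_per_dc maps a datacenter name to its replication factor.
    """
    tokens = [entry[0] for entry in ring]
    start = bisect.bisect_left(tokens, data_token) % len(ring)
    # Ring order, beginning at the owner of the primary token range.
    walk = [ring[(start + step) % len(ring)] for step in range(len(ring))]

    replicas = {}
    for datacenter, rf in rf_per_dc.items():
        chosen, used_racks, skipped = [], set(), []
        for _, node, node_dc, rack in walk:
            if node_dc != datacenter:
                continue
            if len(chosen) == rf:
                break
            if rack in used_racks:
                # Same rack as an earlier replica: keep only as a fallback.
                skipped.append(node)
            else:
                chosen.append(node)
                used_racks.add(rack)
        # Fewer racks than replicas: fill up from skipped nodes in ring order.
        chosen.extend(skipped[: rf - len(chosen)])
        replicas[datacenter] = chosen
    return replicas


if __name__ == "__main__":
    ring = [
        (100, "a1", "dc1", "r1"), (200, "b1", "dc2", "r1"),
        (300, "a2", "dc1", "r1"), (400, "b2", "dc2", "r2"),
        (500, "a3", "dc1", "r2"), (600, "a4", "dc1", "r3"),
    ]
    print(network_topology_replicas(ring, 50, {"dc1": 2, "dc2": 2}))
```

In the usage example, a1 is chosen first for dc1; a2 is the next dc1 node on the ring but shares rack r1 with a1, so the second replica goes to a3 in rack r2 instead.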

KKR's Tutorial