Cassandra's Read Path
Cassandra is a NoSQL database designed for write-heavy workloads, which makes its read path a little complex.
Every node stores its data in immutable SSTables, so the data for one table can be spread across multiple SSTables.
To read something from Cassandra, we have to search every SSTable that might hold that data.
Several components help optimize the SSTable search. They are:
1. Bloom Filter:
The Bloom filter says, "Hey! The data you are looking for doesn't exist here!" In other words, it lets us skip SSTables that cannot contain the data. A Bloom filter (BF) may give false positives (it says the data is in this SSTable, but we end up not finding it), but it never gives false negatives (if the BF says the data is not in a particular SSTable, it is not there).
2. Key Cache: The key cache stores the byte offsets of partitions that have already been read, so a repeated read can go straight to the data.
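3. Partition Summary: The partition summary is an in-memory sample of the partition index, which matters most when the index is very large. It narrows the search to a small range of index entries so that we don't have to scan the whole index.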
4. Partition Index: The partition index maps each partition key stored in the SSTable to the byte offset of its data.
Once we know the byte offset, we can jump straight to the data instead of scanning the SSTable.
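To make the "false positives but never false negatives" behaviour concrete, here is a minimal Bloom filter sketch in Python. It is a toy for illustration, not Cassandra's implementation: the class name, the bit-array size, the number of hashes, and the use of SHA-256 are all assumptions.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus k salted hash functions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive k bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        # False means "definitely absent"; True only means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
for key in ("user:1", "user:2", "user:3"):
    bf.add(key)

print(bf.might_contain("user:1"))  # True: an added key is never reported absent
```

A key that was never added usually returns False, but it can return True if other keys happened to set all of its bit positions. That is the false positive case, and it is why a "yes" from the filter still has to be confirmed against the SSTable itself.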
When a coordinator node receives a read request, it forwards the request to the replicas that hold the data.
The coordinator requests the full data from one replica and digests (hash values of the data) from the other replicas. It then compares the hash of the data it received with the digests from the other replicas.
>> If the hash values match, it sends the data to the client.
>> If the hash values do not match, it repairs the inconsistency based on the write timestamps (the most recent write wins), updates the out-of-date replicas, and sends the data to the client. This is called read repair. This takes place only when the read uses a consistency level higher than ONE.
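The digest comparison and read repair described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not Cassandra's code: `Row`, `digest`, and `coordinator_read` are hypothetical names, replicas are modelled as a plain dictionary, and SHA-256 stands in for whatever digest the server actually uses.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Row:
    value: str
    timestamp: int  # write timestamp; a higher value means a newer write


def digest(row):
    # Hypothetical digest: a hash over the value and its timestamp.
    return hashlib.sha256(f"{row.value}:{row.timestamp}".encode()).hexdigest()


def coordinator_read(replicas):
    """replicas maps a replica name to the Row it currently holds."""
    names = list(replicas)
    data = replicas[names[0]]                           # full data from one replica
    digests = [digest(replicas[n]) for n in names[1:]]  # digests from the others

    if all(d == digest(data) for d in digests):
        return data, []                                 # replicas agree, nothing to repair

    # Mismatch: compare full rows, keep the newest, and repair stale replicas.
    newest = max(replicas.values(), key=lambda r: r.timestamp)
    repaired = [n for n in names if replicas[n].timestamp < newest.timestamp]
    for n in repaired:
        replicas[n] = newest
    return newest, repaired


replicas = {"r1": Row("v1", 100), "r2": Row("v2", 200), "r3": Row("v2", 200)}
row, repaired = coordinator_read(replicas)
print(row.value, repaired)  # v2 ['r1']
```

Sending digests instead of full rows keeps the common case cheap: when the replicas agree, only one of them has to ship the actual data.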
So, let's see what happens when a read request hits a replica node.
>> It starts with the Bloom filter, which tells us: don't check that SSTable, the partition you are looking for doesn't exist there.
>> The BF never gives false negatives, but it may give false positives (meaning it tells you the partition exists in an SSTable when it may not).
>> If the SSTable passes the BF, the node checks the key cache to see whether it contains the offset of the partition key in that SSTable.
>> If it finds the offset, it skips the two index checks and reads the data from the SSTable directly.
>> If it doesn't find it, it checks the partition summary and then the partition index to get the offset, and reads the data from the SSTable.
>> After that, it updates the key cache with the partition key and offset.
>> Finally, it returns the data to the coordinator node.
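The replica-side steps above can be put together in a short Python sketch. Everything here is a simplification made for illustration: `SSTable`, `find_offset`, and `read_partition` are hypothetical names, list positions stand in for byte offsets, a plain set stands in for the Bloom filter (so it never produces false positives), and the values found in different SSTables are simply collected rather than merged.

```python
import bisect


class SSTable:
    """Toy SSTable: sorted (key, value) rows; list positions act as byte offsets."""

    def __init__(self, rows, summary_interval=2):
        self.rows = sorted(rows.items())
        self.keys = [k for k, _ in self.rows]      # partition index: key at each offset
        self.bloom = set(self.keys)                # stand-in for a real Bloom filter
        self.interval = summary_interval
        self.summary = self.keys[::summary_interval]  # partition summary: every Nth key

    def find_offset(self, key):
        # The summary picks a small slice of the index; only that slice is scanned.
        slot = bisect.bisect_right(self.summary, key) - 1
        if slot < 0:
            return None
        start = slot * self.interval
        for offset in range(start, min(start + self.interval, len(self.keys))):
            if self.keys[offset] == key:
                return offset
        return None


def read_partition(key, sstables, key_cache):
    results = []
    for table_id, table in enumerate(sstables):
        if key not in table.bloom:                 # Bloom filter says: skip this SSTable
            continue
        offset = key_cache.get((table_id, key))    # key cache hit skips both indexes
        if offset is None:
            offset = table.find_offset(key)        # partition summary, then partition index
            if offset is None:
                continue                           # the filter gave a false positive
            key_cache[(table_id, key)] = offset    # remember the offset for next time
        results.append(table.rows[offset][1])      # read the data at that offset
    return results


sstables = [SSTable({"a": 1, "c": 3, "e": 5}), SSTable({"b": 2, "c": 30})]
key_cache = {}
print(read_partition("c", sstables, key_cache))  # [3, 30]
```

After the first call the key cache holds the offset of "c" in both SSTables, so a second read of the same key skips the summary and index lookups entirely.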