Secondary Indexes in Cassandra
In Cassandra, stored data can be retrieved by the partition key or by the entire primary key. Cassandra is not designed to retrieve data by columns that are not part of the primary key. A query that filters on such a column is rejected with the error shown in the example below.
Example:
CREATE TABLE ratings_by_movie (
email TEXT,
title TEXT,
year INT STATIC,
rating INT,
PRIMARY KEY((title),email)
);
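For contrast, here is a minimal sketch of the two access patterns this table is designed for. The literal values ('Example Movie', 'user@example.com') are placeholders chosen for illustration, not data from the original post.

```cql
-- Restrict the partition key only: returns every rating stored for one title.
SELECT * FROM ratings_by_movie
WHERE title = 'Example Movie';

-- Restrict the full primary key (partition key plus clustering column):
-- returns at most one row.
SELECT * FROM ratings_by_movie
WHERE title = 'Example Movie' AND email = 'user@example.com';
```

Both queries tell Cassandra which partition to read. Restricting only a non-key column such as rating is a different matter, as the next query shows.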
select * from ratings_by_movie where rating=8;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
This is where ALLOW FILTERING comes in. It returns the requested data, but because Cassandra does not know where the matching rows are located, the query has to hit each and every node. For that reason, ALLOW FILTERING is not a good choice for regular queries.
select * from ratings_by_movie where rating=8 Allow filtering;
Creating secondary indexes is a common practice in a traditional RDBMS, but it is not generally recommended in Cassandra. When an index is created, Cassandra builds a hidden table in a background process. For a secondary-index query to be efficient, it should include both the partition key and the indexed column, so that only the nodes holding that partition need to be queried. Without the partition key, the query has to be sent to every node.
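As a rough sketch of what this looks like in CQL, the statements below create an index on rating and query it with and without the partition key. The index name rating_idx and the literal 'Example Movie' are hypothetical; only the table and column names come from the example above.

```cql
-- Create the index; Cassandra builds it in the background.
-- The name rating_idx is an illustrative choice.
CREATE INDEX IF NOT EXISTS rating_idx ON ratings_by_movie (rating);

-- Partition key plus indexed column: the lookup is confined to the
-- nodes that hold the 'Example Movie' partition.
SELECT * FROM ratings_by_movie
WHERE title = 'Example Movie' AND rating = 8;

-- Indexed column only: accepted without ALLOW FILTERING once the index
-- exists, but the coordinator may still have to contact many nodes.
SELECT * FROM ratings_by_movie
WHERE rating = 8;
```

The first SELECT is the shape to prefer. The second one works, but it gives up the locality that makes the index cheap to use.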
Secondary indexes should not be used on tables that are frequently updated, because each change to an indexed value leaves a tombstone in the index. By default, Cassandra aborts a read that scans more than 100,000 tombstones, so once that limit is reached a query using the indexed value will fail. Secondary indexes should also be avoided when looking for values contained in a large partition, unless the query is very narrow.
Secondary indexes do not support range queries (WHERE rating > 8); they can only be used for equality queries. Also, because indexes are maintained as hidden tables, they go through a separate compaction process. Independently compacting SSTables and indexes means the location of the data and the index information are completely decoupled. If the data is compacted, a new SSTable is written, and an index that pointed at the old location would now be incorrect. This means an index can’t simply point to a location on disk, because the location of the data can change.
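The equality-only restriction can be sketched as follows, assuming the built-in index implementation discussed in this post and an index on rating (the name rating_idx is hypothetical). Newer index implementations may behave differently, so treat this as an illustration rather than a guarantee for every Cassandra version.

```cql
-- Make sure an index on rating exists (hypothetical name).
CREATE INDEX IF NOT EXISTS rating_idx ON ratings_by_movie (rating);

-- Equality on the indexed column: can be served from the index.
SELECT * FROM ratings_by_movie
WHERE rating = 8;

-- Range on the indexed column: the index cannot answer this, so the
-- query is expected to be rejected unless ALLOW FILTERING is appended.
SELECT * FROM ratings_by_movie
WHERE rating > 8;
```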