Tombstone (data store)
A tombstone is a record that marks data as deleted in a replica of a distributed data store.[1] Tombstones are necessary because such data stores use eventual consistency, in which only a subset of the nodes that store the data must respond before an operation is considered successful.
Motivation
If information is deleted in an eventually consistent distributed data store, the deletion propagates only gradually through the nodes, some of which may be unavailable at the time of deletion. This creates a problem: a node that was unavailable at that time still holds the entry and, once it is reachable again, will try to "update" the other nodes that no longer have it, assuming that they missed an insert. Therefore, instead of removing the information outright, the distributed data store turns it into a tombstone, which is no longer visible to the user.[1]
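The problem and its fix can be illustrated with a minimal sketch. It assumes a simplified model in which each replica is a dictionary mapping a key to a (timestamp, value) pair and replicas are repaired with a last-write-wins merge; the names `TOMBSTONE`, `merge`, and `read` are hypothetical and are not part of any real data store's API.

```python
# Hypothetical model: each replica maps key -> (timestamp, value).
# TOMBSTONE is a marker value standing in for "deleted".
TOMBSTONE = object()

def merge(a, b):
    """Last-write-wins repair: both replicas end up with the newest entry per key."""
    for key in set(a) | set(b):
        candidates = [replica[key] for replica in (a, b) if key in replica]
        newest = max(candidates, key=lambda entry: entry[0])
        a[key] = newest
        b[key] = newest

def read(replica, key):
    """Tombstoned keys look absent to the user."""
    entry = replica.get(key)
    if entry is None or entry[1] is TOMBSTONE:
        return None
    return entry[1]

# Case 1: hard delete. replica_a removed the row outright, while replica_b
# was unavailable and still holds it. The repair brings the row back.
replica_a = {}
replica_b = {"user:1": (1, "alice")}
merge(replica_a, replica_b)
resurrected = read(replica_a, "user:1")

# Case 2: tombstone. replica_c recorded the delete as a newer entry,
# so the repair spreads the delete instead of undoing it.
replica_c = {"user:1": (2, TOMBSTONE)}
replica_d = {"user:1": (1, "alice")}
merge(replica_c, replica_d)
after_tombstone = read(replica_d, "user:1")
```

In the first case the deleted row reappears on the replica that had removed it; in the second, the tombstone carries a newer timestamp than the stale row and therefore wins the merge.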
Removal of tombstones
To avoid filling the data store with obsolete information, tombstones are themselves removed completely after some time. Each node tracks the age of its tombstones and removes a tombstone once a prescribed time has elapsed. In Apache Cassandra, this time is set with the GCGraceSeconds parameter.[1]
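A minimal sketch of such an age-based policy follows. It assumes the same simplified model of a replica as a dictionary of (timestamp, value) pairs, with timestamps in seconds; the function `compact` and its parameter `gc_grace_seconds` are hypothetical names chosen to echo GCGraceSeconds and do not reproduce Cassandra's actual implementation.

```python
# Hypothetical marker value standing in for "deleted".
TOMBSTONE = object()

def compact(replica, now, gc_grace_seconds):
    """Return a copy of the replica without tombstones older than the grace period."""
    return {
        key: (ts, value)
        for key, (ts, value) in replica.items()
        if not (value is TOMBSTONE and now - ts > gc_grace_seconds)
    }

replica = {
    "a": (100, "live"),      # live data is always kept
    "b": (100, TOMBSTONE),   # tombstone aged 900 s: past the grace period
    "c": (950, TOMBSTONE),   # tombstone aged 50 s: still within the grace period
}
compacted = compact(replica, now=1000, gc_grace_seconds=500)
```

Only the old tombstone `"b"` is dropped. The recent tombstone `"c"` is kept so that a node that was unavailable at the time of deletion still has a chance to learn about the delete.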
Consequences
Because tombstones are removed only after a delay, deleted information still shows up in query results, but as empty: when the content of some columns of a number of records has been deleted, those columns are returned empty until a compaction has taken place. After a compaction, the unused columns are removed from these records.[2]
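The effect can be sketched with a toy model of ten rows of three columns each, in which all columns of five rows are deleted. The helpers `range_slice` and `compact` are hypothetical stand-ins for a range query and a compaction, not real client calls, and this `compact` ignores the grace period for brevity.

```python
# Hypothetical marker value standing in for "deleted".
TOMBSTONE = object()

# Ten rows with three columns each: row key -> {column name: value}.
rows = {f"row{i}": {"c1": i, "c2": i, "c3": i} for i in range(10)}

# Delete every column of the first five rows; only tombstones remain there.
for i in range(5):
    for col in rows[f"row{i}"]:
        rows[f"row{i}"][col] = TOMBSTONE

def range_slice(rows):
    """Return every row key with its visible (non-tombstoned) columns."""
    return {
        key: {c: v for c, v in cols.items() if v is not TOMBSTONE}
        for key, cols in rows.items()
    }

def compact(rows):
    """Drop tombstoned columns, and rows that are left without any column."""
    kept = {}
    for key, cols in rows.items():
        live = {c: v for c, v in cols.items() if v is not TOMBSTONE}
        if live:
            kept[key] = live
    return kept

before = range_slice(rows)                       # all ten keys are returned
empty = [key for key, cols in before.items() if not cols]
after = range_slice(compact(rows))               # only rows with values remain
```

Before compaction the query returns all ten row keys, five of them without values; after compaction only the five rows that still hold values are returned.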
References
- [1] "DistributedDeletes". http://wiki.apache.org/cassandra/FrontPage: CassandraWiki. Retrieved 2011-04-13.
Thus, the "eventual" in eventual consistency: if a client reads from a replica that did not get the update with a low enough ConsistencyLevel, it will potentially see old data. [...] There's one more piece to the problem: how do we know when it's safe to remove tombstones? [...] [It] defined a constant, GCGraceSeconds, and had each node track tombstone age locally. Once it has aged past the constant, it can be GC'd during compaction (see MemtableSSTable).
- [2] "User Guide: Dealing with Tombstones". https://github.com/: GitHub. Retrieved 2011-04-13.
To put this in the context of an example, say we have just created 10 rows of data with three columns each. If half the columns are later deleted, and a compaction has not yet occurred, these columns will show up in get_range_slices queries as empty. Using RangeSlicesQuery as described in the previous section, we would have 10 results returned, but only five of them will have values. More importantly, calls to get (via ColumnQuery) by design assume the Column you are retrieving exists in the store. Therefore if you call get on tombstoned data, null is returned (note: this is different than previous versions of Hector where the underlying NotFoundException was propagated up the stack).