Automatic data cleanup¶
This document explains the automatic data cleanup feature in Scylla Operator.
Overview¶
Scylla Operator automates the execution of node cleanup procedures following cluster scaling operations. This feature ensures that your ScyllaDB cluster maintains storage efficiency and data integrity by removing stale data and preventing data resurrection.
Note
While ScyllaDB performs automatic data cleanup for tablet-enabled keyspaces, it does not cover system or standard vnode-based keyspaces. To bridge this gap, Scylla Operator implements automatic cleanup for these remaining keyspaces.
Reasons for running cleanup¶
When you scale a ScyllaDB cluster horizontally (adding or removing nodes), the ownership of data tokens changes across the cluster (topology changes). Data ownership is determined by the token ranges assigned to each node in the cluster. Cleanups address the issues that may arise from the redistribution of token ownership:
Stale data: When a node loses ownership of certain tokens, the data associated with those tokens remains on the node’s disk.
Data resurrection: If you do not remove this stale data, it can lead to data resurrection later.
Cleanup triggering mechanism¶
Cleanup is triggered after scaling operations (adding or removing nodes/racks) are complete. It runs on all nodes that have changed token ownership as a result of the scaling operation. This also includes cluster bootstrap, as nodes are added one by one and the token ring changes with each addition.
During scale-out, cleanup runs on all nodes except the last one added, as it does not lose any tokens.
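For example, a scale-out that changes token ownership can be performed by increasing a rack's member count. A minimal sketch, assuming a ScyllaCluster named scylla in the scylla namespace with a single rack at index 0 (adjust the names and the target member count for your deployment):
kubectl -n scylla patch scyllacluster scylla --type=json -p='[{"op": "replace", "path": "/spec/datacenter/racks/0/members", "value": 4}]'
Once the new node has joined and the cluster settles, cleanup jobs are scheduled for the nodes that lost token ownership.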
How cleanup is triggered¶
Scylla Operator tracks changes in the token ring of the cluster. When a mismatch is detected between the current state
of the ring and the last cleaned-up state, it triggers a cleanup job for the affected nodes. Before triggering
cleanup jobs, Scylla Operator ensures the cluster is stable (conditions: StatefulSetControllerProgressing=False,
Available=True, and Degraded=False).
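To verify that the cluster meets these conditions, you can inspect them the same way as any other status condition. A sketch for the Available condition (substitute any of the condition types listed above):
kubectl get scyllacluster <sc-name> -o jsonpath='{.status.conditions[?(@.type=="Available")]}' | jq
Where <sc-name> is the name of your ScyllaCluster.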
Note
Because Scylla Operator relies strictly on token ring changes, there are some limitations, which are described in the Known limitations section.
Inspecting cleanup jobs¶
Scylla Operator creates a Job for each node that requires cleanup. If any job is still running, the ScyllaCluster
status reports the JobControllerProgressing condition set to True, and the condition’s message contains the name(s) of the running job(s). When a job completes successfully, Scylla Operator removes it from the cluster.
When no cleanup jobs are running, the JobControllerProgressing condition is set to False.
You can inspect the condition by running:
kubectl get scyllacluster <sc-name> -o jsonpath='{.status.conditions[?(@.type=="JobControllerProgressing")]}' | jq
Where <sc-name> is the name of your ScyllaCluster.
You should see output similar to this (when no jobs are running):
{
  "lastTransitionTime": "2025-12-12T13:51:52Z",
  "message": "",
  "observedGeneration": 2,
  "reason": "AsExpected",
  "status": "False",
  "type": "JobControllerProgressing"
}
Caution
You may not see the cleanup jobs in the cluster if they complete quickly (e.g., on small datasets or cluster bootstrap) and are removed before you can inspect them.
To ensure they were created and completed, you can inspect Kubernetes events:
kubectl get events | grep job
The output should contain entries similar to:
30m Normal JobCreated job/cleanup-scylla-us-east-1-us-east-1a-0 Job default/cleanup-scylla-us-east-1-us-east-1a-0 created
30m Normal SuccessfulCreate job/cleanup-scylla-us-east-1-us-east-1a-0 Created pod: cleanup-scylla-us-east-1-us-east-1a-0-tpd7x
30m Normal Completed job/cleanup-scylla-us-east-1-us-east-1a-0 Job completed
30m Normal JobDeleted job/cleanup-scylla-us-east-1-us-east-1a-0 Job default/cleanup-scylla-us-east-1-us-east-1a-0 deleted
You can see that for each cleanup job, there are events for job creation, pod creation, job completion, and job deletion.
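If a cleanup job runs long enough, you can also list the Jobs directly while they exist. A simple sketch that filters by name prefix, based on the cleanup-<cluster>-<datacenter>-<rack>-<ordinal> naming visible in the events above (the exact labels attached to the jobs may vary between Scylla Operator versions):
kubectl get jobs -n <namespace> | grep '^cleanup-'
Where <namespace> is the namespace of your ScyllaCluster.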
Known limitations¶
Scylla Operator triggers cleanup based on token ring changes. While this approach is safe, it has side effects: cleanup may be triggered unnecessarily, or it may not be triggered when it is actually needed.
Unnecessary cleanups on node decommission¶
When you decommission a node, tokens are redistributed to the remaining nodes. These nodes do not require cleanup since they are not losing any tokens. However, Scylla Operator still triggers cleanup for these nodes due to the token ring change. This is a safe operation, but it may lead to an unnecessary spike in I/O load.
Missed cleanup on RF changes¶
When you decrease the replication factor (RF) of a keyspace, the token ring remains unchanged. Thus, Scylla Operator does not detect the need for cleanup. You should trigger cluster cleanup manually in such cases (run on any node):
kubectl exec -it service/<sc-name>-client -c scylla -- nodetool cluster cleanup
Where <sc-name> corresponds to the name of your ScyllaCluster.
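For reference, an RF decrease that would not be detected could look like the following. This is a hedged sketch: the keyspace name my_keyspace and the datacenter name us-east-1 are placeholders, and it assumes the keyspace uses NetworkTopologyStrategy and that cqlsh can connect without additional authentication options:
kubectl exec -it service/<sc-name>-client -c scylla -- cqlsh -e "ALTER KEYSPACE my_keyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east-1': 2};"
After applying such a change, run the manual cleanup command shown above.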