
Automatic data cleanup

This document explains the automatic data cleanup feature in Scylla Operator.

Overview

Scylla Operator automates the execution of node cleanup procedures following cluster scaling operations. This feature ensures that your ScyllaDB cluster maintains storage efficiency and data integrity by removing stale data and preventing data resurrection.

Note

While ScyllaDB performs automatic data cleanup for tablet-enabled keyspaces, it does not cover system or standard vnode-based keyspaces. To bridge this gap, Scylla Operator implements automatic cleanup for these remaining keyspaces.

Reasons for running cleanup

When you scale a ScyllaDB cluster horizontally (adding or removing nodes), the ownership of data tokens changes across the cluster (topology changes). Data ownership is determined by the token ranges assigned to each node in the cluster. Cleanups address the issues that may arise from the redistribution of token ownership:

  1. Stale data: When a node loses ownership of certain tokens, the data associated with those tokens remains on the node’s disk.

  2. Data resurrection: If you do not remove this stale data, deleted or overwritten rows preserved in it can reappear later (data resurrection), for example if the node regains ownership of those tokens after the corresponding tombstones have expired elsewhere.
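The effect of a topology change on data ownership can be illustrated with a toy token ring (a deliberately simplified model for illustration; ScyllaDB's actual vnode-based partitioning is more involved):

```python
from bisect import bisect_left

def owner(token, ring):
    # ring: sorted list of (token, node); a key belongs to the first
    # node whose token is >= the key's token, wrapping around the ring.
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, token) % len(ring)
    return ring[i][1]

before = [(100, "node-a"), (200, "node-b"), (300, "node-c")]
after = sorted(before + [(150, "node-d")])  # scale-out: node-d joins

keys = [120, 140, 180, 250]
moved = [k for k in keys if owner(k, before) != owner(k, after)]
# Keys 120 and 140 now belong to node-d, but their copies remain on
# node-b's disk until cleanup removes them -- that is the stale data.
print(moved)  # [120, 140]
```

In this toy model, node-b lost ownership of the range (100, 150] to the newly added node-d, so cleanup must run on node-b to delete the data it no longer owns.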

Cleanup triggering mechanism

Cleanup is triggered after scaling operations (adding or removing nodes/racks) are complete. It runs on all nodes whose token ownership changed as a result of the scaling operation. This also includes cluster bootstrap, as nodes are added one by one and the token ring changes with each addition.

During scale-out, cleanup runs on all nodes except the last one added, as it does not lose any tokens.

How cleanup is triggered

Scylla Operator tracks changes in the token ring of the cluster. When a mismatch is detected between the current state of the ring and the last cleaned-up state, it triggers a cleanup job for the affected nodes. Before triggering cleanup jobs, Scylla Operator ensures the cluster is stable (conditions: StatefulSetControllerProgressing=False, Available=True, and Degraded=False).
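The stability check described above can be sketched as a small predicate over the status conditions (a sketch based on the conditions named in this document, not Scylla Operator's internal implementation):

```python
def cluster_is_stable(conditions):
    """Return True when the listed status conditions indicate a stable
    cluster: StatefulSetControllerProgressing=False, Available=True,
    and Degraded=False."""
    want = {
        "StatefulSetControllerProgressing": "False",
        "Available": "True",
        "Degraded": "False",
    }
    got = {c["type"]: c["status"] for c in conditions}
    return all(got.get(cond) == status for cond, status in want.items())

conditions = [
    {"type": "StatefulSetControllerProgressing", "status": "False"},
    {"type": "Available", "status": "True"},
    {"type": "Degraded", "status": "False"},
]
print(cluster_is_stable(conditions))  # True
```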

Note

Because Scylla Operator relies strictly on token ring changes, there are some limitations, which are described in the Known limitations section.

Inspecting cleanup jobs

Scylla Operator creates a Job for each node that requires cleanup. While any cleanup job is still running, the ScyllaCluster status reports the JobControllerProgressing condition as True, and the condition's message contains the names of the running jobs. When a job completes successfully, Scylla Operator removes it from the cluster.

When no cleanup jobs are running, the JobControllerProgressing condition is set to False.

You can inspect the condition by running:

kubectl get scyllacluster <sc-name> -o jsonpath='{.status.conditions[?(@.type=="JobControllerProgressing")]}' | jq

Where <sc-name> is the name of your ScyllaCluster.

You should see output similar to this (when no jobs are running):

{
  "lastTransitionTime": "2025-12-12T13:51:52Z",
  "message": "",
  "observedGeneration": 2,
  "reason": "AsExpected",
  "status": "False",
  "type": "JobControllerProgressing"
}
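If you automate around cleanup, the condition can also be parsed programmatically. A minimal sketch, assuming the JSON printed by the kubectl command above:

```python
import json

def cleanup_in_progress(condition_json):
    """Parse a JobControllerProgressing condition and report whether
    cleanup jobs are running, plus the condition message (which names
    the running jobs, if any)."""
    cond = json.loads(condition_json)
    return cond["status"] == "True", cond.get("message", "")

sample = '''{
  "lastTransitionTime": "2025-12-12T13:51:52Z",
  "message": "",
  "observedGeneration": 2,
  "reason": "AsExpected",
  "status": "False",
  "type": "JobControllerProgressing"
}'''
running, message = cleanup_in_progress(sample)
print(running)  # False
```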

Caution

You may not see the cleanup jobs in the cluster if they complete quickly (e.g., on small datasets or cluster bootstrap) and are removed before you can inspect them.

To ensure they were created and completed, you can inspect Kubernetes events:

kubectl get events | grep job

The output should contain entries similar to:

30m  Normal  JobCreated        job/cleanup-scylla-us-east-1-us-east-1a-0  Job default/cleanup-scylla-us-east-1-us-east-1a-0 created
30m  Normal  SuccessfulCreate  job/cleanup-scylla-us-east-1-us-east-1a-0  Created pod: cleanup-scylla-us-east-1-us-east-1a-0-tpd7x
30m  Normal  Completed         job/cleanup-scylla-us-east-1-us-east-1a-0  Job completed
30m  Normal  JobDeleted        job/cleanup-scylla-us-east-1-us-east-1a-0  Job default/cleanup-scylla-us-east-1-us-east-1a-0 deleted

You can see that for each cleanup job, there are events for job creation, pod creation, job completion, and job deletion.

Known limitations

Scylla Operator triggers cleanup based on token ring changes. While this approach is safe, it has limitations: cleanup may be triggered unnecessarily, or not triggered when it is actually needed.

Unnecessary cleanups on node decommission

When you decommission a node, its tokens are redistributed to the remaining nodes. These nodes do not require cleanup, since they only gain tokens and do not lose any. However, Scylla Operator still triggers cleanup for them because the token ring changed. This is safe, but it may cause an unnecessary spike in I/O load.

Missed cleanup on RF changes

When you decrease the replication factor (RF) of a keyspace, the token ring remains unchanged. Thus, Scylla Operator does not detect the need for cleanup. You should trigger cluster cleanup manually in such cases (run on any node):

kubectl exec -it service/<sc-name>-client -c scylla -- nodetool cluster cleanup

Where <sc-name> corresponds to your ScyllaCluster name.

Last updated on 29 December 2025.