
Caution

You're viewing documentation for an unstable version of ScyllaDB Operator. Switch to the latest stable version.

Automatic data cleanup

This page explains why ScyllaDB Operator runs automatic data cleanup after scaling operations and how the mechanism works.

Why cleanup is needed

When a ScyllaDB cluster scales horizontally (nodes are added or removed), the ownership of data tokens changes. Nodes that lose ownership of certain token ranges still hold the corresponding data on disk. This stale data must be removed to:

  1. Reclaim storage — stale data wastes disk space unnecessarily.

  2. Prevent data resurrection — if stale data is not removed, it can reappear during repair or read operations, overriding newer deletions.

ScyllaDB handles cleanup automatically for keyspaces that use tablets. However, system keyspaces and standard vnode-based keyspaces are not covered by this automatic mechanism. ScyllaDB Operator fills the gap by triggering cleanup on all keyspaces — the cleanup of tablet-based keyspaces is a no-op on the server side.

Trigger mechanism

The Operator tracks the token ring of each ScyllaDB cluster. When the ring changes — because a node was added, removed, or replaced — the Operator compares the current ring state against the last state for which cleanup was completed. If they differ, cleanup Jobs are created for all nodes that were affected by the token redistribution.
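The comparison described above can be sketched in a few lines of Python. This is an illustration only, not the Operator's implementation: the hashing scheme (SHA-256 over a sorted token-to-host mapping), the function names, the node names, and the token values are all assumptions made for the example.

```python
import hashlib
import json


def token_ring_hash(ring: dict[int, str]) -> str:
    """Return a deterministic digest of a token -> host mapping (hypothetical scheme)."""
    canonical = json.dumps(sorted(ring.items()), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def nodes_needing_cleanup(current_hash: str, last_cleaned: dict[str, str]) -> list[str]:
    """List nodes whose last cleaned-up ring hash differs from the current one."""
    return sorted(node for node, h in last_cleaned.items() if h != current_hash)


# Ring before and after a third node joins (tokens are made up).
old_ring = {100: "node-0", 200: "node-1"}
new_ring = {100: "node-0", 150: "node-2", 200: "node-1"}

old_hash = token_ring_hash(old_ring)
new_hash = token_ring_hash(new_ring)

# node-2 is new, so its recorded hash already matches the current ring.
last_cleaned = {"node-0": old_hash, "node-1": old_hash, "node-2": new_hash}
print(nodes_needing_cleanup(new_hash, last_cleaned))  # ['node-0', 'node-1']
```

Comparing a single digest per node keeps the check cheap: the ring itself does not need to be stored, only the hash of the last ring for which cleanup finished.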

Before creating any Jobs, the Operator waits for the cluster to reach a stable state:

  • StatefulSetControllerProgressing is False.

  • Available is True.

  • Degraded is False.

This ensures that cleanup runs only after the scaling operation has fully completed and the cluster is healthy.
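The stability gate above amounts to checking three condition statuses. The sketch below assumes the conditions are available as a list of dictionaries with `type` and `status` string fields, as in a Kubernetes `status.conditions` array; the function name and the choice to treat a missing condition as "not stable" are assumptions for illustration, not the Operator's code.

```python
# Condition type -> status required before cleanup Jobs may be created.
REQUIRED = {
    "StatefulSetControllerProgressing": "False",
    "Available": "True",
    "Degraded": "False",
}


def is_stable(conditions: list[dict[str, str]]) -> bool:
    """Return True only if all three conditions have the required status."""
    observed = {c["type"]: c["status"] for c in conditions}
    # A missing condition counts as not stable (assumption).
    return all(observed.get(t) == s for t, s in REQUIRED.items())


healthy = [
    {"type": "StatefulSetControllerProgressing", "status": "False"},
    {"type": "Available", "status": "True"},
    {"type": "Degraded", "status": "False"},
]
still_rolling = [
    {"type": "StatefulSetControllerProgressing", "status": "True"},
    {"type": "Available", "status": "True"},
    {"type": "Degraded", "status": "False"},
]
print(is_stable(healthy))        # True
print(is_stable(still_rolling))  # False
```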

What triggers cleanup

  • Scale-out — after a new node finishes bootstrapping. Cleanup runs on the pre-existing nodes whose token ring hash changed. The newly added node is not cleaned up because its member Service is initialized with matching hashes.

  • Scale-in (decommission) — after a node is removed. The remaining nodes inherit its tokens but technically do not need cleanup (they did not lose tokens). The Operator still triggers cleanup because the token ring changed. This is safe but may cause a brief I/O spike.

  • Initial cluster bootstrap — when a node’s member Service is first created, the Operator initializes last-cleaned-up-token-ring-hash to the current token ring hash. Because the hashes start equal, no cleanup is triggered during initial bootstrap.
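To see why scale-out leaves stale data behind while scale-in does not, consider a toy ring. The model below is a deliberate simplification and rests on stated assumptions: one token per node, a replication factor of 1, integer keys from 0 to 99, and a key owned by the node holding the first token greater than or equal to it. Real clusters use many tokens per node and replication, so the numbers are illustrative only.

```python
from bisect import bisect_left


def owner(ring: dict[int, str], key: int) -> str:
    """Return the node owning `key`: first token >= key, wrapping around."""
    tokens = sorted(ring)
    idx = bisect_left(tokens, key) % len(tokens)
    return ring[tokens[idx]]


def lost_keys(before: dict[int, str], after: dict[int, str], key_range: range) -> dict[str, int]:
    """Count keys each surviving node owned before but no longer owns after."""
    lost: dict[str, int] = {}
    surviving = set(after.values())
    for key in key_range:
        old, new = owner(before, key), owner(after, key)
        if old != new and old in surviving:
            lost[old] = lost.get(old, 0) + 1
    return lost


keys = range(100)
two_nodes = {49: "node-0", 99: "node-1"}
three_nodes = {24: "node-2", 49: "node-0", 99: "node-1"}

print(lost_keys(two_nodes, three_nodes, keys))  # scale-out: {'node-0': 25}
print(lost_keys(three_nodes, two_nodes, keys))  # scale-in:  {}
```

In the scale-out case node-0 still holds 25 keys it no longer owns, which is exactly the data cleanup removes. In the scale-in case no surviving node loses ownership, which matches the observation that cleanup after decommission is safe but not strictly required.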

Cleanup Job details

The Operator creates one Kubernetes Job per affected node. Each Job runs the scylla-operator cleanup-job subcommand, which connects to the ScyllaDB REST API on the target node through the Manager Agent proxy (port 10001) and runs cleanup on every keyspace. The Job pod authenticates using a Manager Agent auth token mounted from a Secret.

When a cleanup Job completes successfully, the Operator deletes it. If a Job is still running, the ScyllaCluster status shows the JobControllerProgressing condition set to True with a message listing the active Job names.
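The Job lifecycle described above can be modelled roughly as follows. This is a sketch under assumptions, not the Operator's reconciliation code: the `Job` type, the Job names, and the message wording are hypothetical, and a Job that has not succeeded (including a failed one) is simply treated as still active.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    succeeded: bool  # True once the Job has completed successfully


def reconcile_jobs(jobs: list[Job]) -> tuple[list[str], dict[str, str]]:
    """Return (names of Jobs to delete, JobControllerProgressing condition)."""
    to_delete = [j.name for j in jobs if j.succeeded]
    active = [j.name for j in jobs if not j.succeeded]
    condition = {
        "type": "JobControllerProgressing",
        "status": "True" if active else "False",
    }
    if active:
        # Message wording is made up for the example.
        condition["message"] = "Waiting for Jobs: " + ", ".join(active)
    else:
        condition["reason"] = "AsExpected"
    return to_delete, condition


jobs = [Job("cleanup-node-0", True), Job("cleanup-node-1", False)]
to_delete, condition = reconcile_jobs(jobs)
print(to_delete)            # ['cleanup-node-0']
print(condition["status"])  # True
```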

Inspecting cleanup status

Check whether cleanup is in progress:

kubectl get scyllacluster <name> -o jsonpath='{.status.conditions[?(@.type=="JobControllerProgressing")]}' | jq

When no cleanup Jobs are running:

{
  "status": "False",
  "type": "JobControllerProgressing",
  "reason": "AsExpected"
}
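If you want to script this check, the condition JSON printed by the command above can be interpreted with a small helper. The sketch below assumes the JSON has the shape shown on this page; the function name is hypothetical, and treating empty output as an error is a design choice made for the example.

```python
import json


def cleanup_in_progress(condition_json: str) -> bool:
    """Return True if the JobControllerProgressing condition has status "True"."""
    if not condition_json.strip():
        # kubectl printed nothing: the condition is not present on the resource.
        raise ValueError("JobControllerProgressing condition not found")
    condition = json.loads(condition_json)
    if condition.get("type") != "JobControllerProgressing":
        raise ValueError(f"unexpected condition type: {condition.get('type')!r}")
    return condition.get("status") == "True"


sample = """
{
  "status": "False",
  "type": "JobControllerProgressing",
  "reason": "AsExpected"
}
"""
print(cleanup_in_progress(sample))  # False
```

Raising on empty or unexpected input, rather than returning False, avoids reporting "no cleanup running" when the resource name was mistyped or the condition has not been set yet.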

Cleanup Jobs may complete and be deleted before you can observe them. To verify they ran, check Kubernetes events on the ScyllaDBDatacenter resource:

kubectl get events --field-selector involvedObject.name=<name>

Events are emitted by the resource apply framework when Jobs are created, updated, or deleted.

Known limitations

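Replication factor changes are not detected

Decreasing the replication factor of a keyspace does not change the token ring: the same nodes own the same token ranges, but fewer replicas are needed. The Operator does not detect this and does not trigger cleanup, so you must run cleanup manually. The command below runs against a single pod selected through the client Service. Because nodetool cleanup only cleans the node it connects to, repeat it against each ScyllaDB pod in the cluster: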

kubectl exec -it service/<cluster-name>-client -c scylla -- nodetool cleanup

Unnecessary cleanup on decommission

When a node is decommissioned, the remaining nodes inherit its tokens. They do not lose any tokens and therefore do not strictly need cleanup. The Operator triggers cleanup anyway because the token ring changed. The operation is safe but adds temporary I/O load.

Related pages

  • Understand — component diagram and reconciliation model.

  • Sidecar — the sidecar that reports node status used for stability checks.

© 2026, ScyllaDB. All rights reserved. | Terms of Service | Privacy Policy | ScyllaDB, and ScyllaDB Cloud, are registered trademarks of ScyllaDB, Inc.
Last updated on 22 May 2026.