Investigate pod restarts¶
Determine why a ScyllaDB pod or container restarted and collect the evidence needed for diagnosis or a support ticket.
Identify that a restart occurred¶
Check the restart count:
kubectl -n scylla get pods -l scylla-operator.scylladb.com/pod-type=scylladb-node
A non-zero RESTARTS column indicates that one or more containers in the pod have restarted.
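On a larger cluster it can help to print only the pods that have restarted. The following is a minimal sketch, not part of ScyllaDB Operator: the function name pods_with_restarts is hypothetical, and it assumes the default kubectl column order (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# "<pod-name> <restart-count>" for every pod with a non-zero restart count.
# Assumes the default column order: NAME READY STATUS RESTARTS AGE.
pods_with_restarts() {
  # NR > 1 skips the header row; "$4 + 0" forces a numeric comparison.
  awk 'NR > 1 && $4 + 0 > 0 { print $1, $4 }'
}

# Usage against a live cluster (not run here):
#   kubectl -n scylla get pods -l scylla-operator.scylladb.com/pod-type=scylladb-node | pods_with_restarts
```

If the output is customized (for example with -o wide or custom columns), the column index in the awk expression needs to be adjusted.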
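You can also compare the container start time against the pod creation time. If the container started significantly later than the pod was created, the container has probably restarted, although a slow image pull or long-running init containers can also delay the first start: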
kubectl -n scylla get pod <pod-name> -o jsonpath='Pod created: {.metadata.creationTimestamp}{"\n"}Container started: {.status.containerStatuses[?(@.name=="scylla")].state.running.startedAt}{"\n"}'
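To put a number on the gap between the two timestamps, they can be converted to epoch seconds and subtracted. This is a sketch under stated assumptions: the helper name seconds_between is hypothetical, and it relies on GNU date (the -d option), which is not available in the same form on BSD or macOS.

```shell
# Hypothetical helper: prints the number of seconds between two RFC 3339
# timestamps, such as a pod's creationTimestamp and a container's startedAt.
# Assumes GNU date; BSD/macOS date does not accept -d in this form.
seconds_between() {
  start="$(date -u -d "$1" +%s)" || return 1
  end="$(date -u -d "$2" +%s)" || return 1
  echo $((end - start))
}

# Usage with the two values printed by the command above (not run here):
#   seconds_between <pod-created-timestamp> <container-started-timestamp>
```

What counts as a significant gap depends on the environment, so the result is best compared against the startup time you normally observe for this cluster.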
Determine the restart reason¶
Container status¶
kubectl -n scylla get pod <pod-name> -o jsonpath='{.status.containerStatuses}' | jq .
Key fields:
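| Field | Description |
|---|---|
| restartCount | Total number of restarts for this container |
| lastState.terminated.reason | Why the container stopped (for example, OOMKilled) |
| lastState.terminated.exitCode | Process exit code (for example, 137 for an OOM kill) |
| lastState.terminated.finishedAt | Timestamp of the last termination |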
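The exit code reported in the container status can be decoded with a small helper. This is an illustrative sketch, and the function name describe_exit_code is hypothetical. It relies on the common convention that an exit code of 128 + N means the process was terminated by signal N, which is why an OOM kill (SIGKILL, signal 9) shows up as 137.

```shell
# Hypothetical helper: interprets a container exit code.
# Convention: 128 + N means the process was terminated by signal N,
# so 137 corresponds to signal 9 (SIGKILL).
describe_exit_code() {
  code="$1"
  if [ "$code" -eq 0 ]; then
    echo "exited cleanly"
  elif [ "$code" -gt 128 ]; then
    echo "terminated by signal $((code - 128))"
  else
    echo "exited with error code $code"
  fi
}

# Usage against a live cluster (not run here):
#   describe_exit_code "$(kubectl -n scylla get pod <pod-name> -o jsonpath='{.status.containerStatuses[?(@.name=="scylla")].lastState.terminated.exitCode}')"
```

A signal number alone does not say who sent the signal, so it should be read together with the termination reason and the pod events.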
Pod events¶
kubectl -n scylla describe pod <pod-name>
Look for these events in the Events section:
| Event | Meaning |
|---|---|
| Killing | Container was killed (by kubelet or OOM killer) |
| BackOff | Container is in CrashLoopBackOff |
| | Container exceeded its memory limit |
| Unhealthy | Liveness probe failed; kubelet killed the container |
| FailedScheduling | Pod cannot be placed on any node |
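When the event list is long, it can be narrowed down to the reasons that matter for restarts. The sketch below is hypothetical (the function name filter_restart_events is not part of any tool) and assumes the event reasons Killing, BackOff, Unhealthy, FailedScheduling, and Evicted appear as whitespace-separated words in the output.

```shell
# Hypothetical helper: reads event lines on stdin and keeps only those whose
# text contains one of the restart-related event reasons as a whole word.
filter_restart_events() {
  grep -E '(^|[[:space:]])(Killing|BackOff|Unhealthy|FailedScheduling|Evicted)([[:space:]]|$)'
}

# Usage against a live cluster (not run here):
#   kubectl -n scylla get events --field-selector involvedObject.name=<pod-name> | filter_restart_events
```

Matching on whole words avoids false positives from messages that merely contain one of the reasons as a substring.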
Distinguish restart causes¶
OOMKilled¶
Indicators:
lastState.terminated.reason: OOMKilled
lastState.terminated.exitCode: 137
Common causes:
Memory limit too low for the workload.
ScyllaDB memory allocation exceeds the container limit.
Resolution:
Increase the memory limit in the ScyllaCluster spec.
Review ScyllaDB memory usage via monitoring dashboards.
Liveness probe failure¶
Indicators:
Event: Unhealthy with Liveness probe failed
Container restarted without OOMKilled reason.
Common causes:
ScyllaDB unresponsive due to long GC pauses or compaction stalls.
Node overloaded — too many concurrent operations.
Resolution:
Check ScyllaDB logs for compaction or GC warnings.
Review resource allocation (CPU, memory).
Check for large partition warnings in logs.
CrashLoopBackOff¶
Indicators:
Pod status: CrashLoopBackOff
Event: BackOff
Common causes:
ScyllaDB fails to start — corrupt SSTables, invalid configuration, wrong seeds.
Disk permission issues.
Missing or invalid io_properties.yaml.
Resolution:
Check previous container logs:
kubectl -n scylla logs <pod-name> -c scylla --previous
Verify configuration with:
kubectl -n scylla describe scyllacluster <cluster-name>
Node eviction¶
Indicators:
Pod event: Evicted
Node conditions show MemoryPressure or DiskPressure.
Cause: The Kubernetes node is under resource pressure and the kubelet evicted the pod.
Resolution:
Check node conditions:
kubectl describe node <node-name>
Ensure dedicated node pools with appropriate taints prevent co-scheduling with other workloads.
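The indicators for the four causes above can be combined into a rough first-pass classification. This is only a sketch: the function classify_restart and its argument order are hypothetical, the order of the checks is a judgment call, and it assumes that an evicted pod reports Evicted in its .status.reason field.

```shell
# Hypothetical helper: maps the indicators described above to a likely cause.
# Arguments (all collected by the caller; empty strings are allowed):
#   $1  lastState.terminated.reason of the scylla container
#   $2  state.waiting.reason of the scylla container
#   $3  status.reason of the pod (assumed to be "Evicted" after an eviction)
#   $4  "yes" if an Unhealthy event with "Liveness probe failed" was seen
classify_restart() {
  terminated_reason="$1"
  waiting_reason="$2"
  pod_reason="$3"
  saw_liveness_failure="$4"
  if [ "$pod_reason" = "Evicted" ]; then
    echo "node eviction"
  elif [ "$terminated_reason" = "OOMKilled" ]; then
    echo "OOMKilled"
  elif [ "$saw_liveness_failure" = "yes" ]; then
    echo "liveness probe failure"
  elif [ "$waiting_reason" = "CrashLoopBackOff" ]; then
    echo "CrashLoopBackOff"
  else
    echo "undetermined"
  fi
}
```

CrashLoopBackOff is checked last because repeated OOM kills or liveness failures can also put a container into that state, and the more specific cause is usually the more useful one to report.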
Collect evidence¶
When filing a support ticket or investigating further, collect a must-gather archive. It includes previous container logs, full pod status, and events needed to diagnose restarts.
See Collect debugging information for instructions.