EKS suddenly failing with disk pressure


We have an EKS cluster with two t3.small nodes, each with 20Gi of ephemeral storage. The cluster runs only two small Node.js (node:12-alpine) applications for now.

This worked perfectly for a few weeks, and now suddenly we're getting disk pressure errors.

$ kubectl describe nodes
Name:               ip-192-168-101-158.ap-southeast-1.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=t3.small
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=ap-southeast-1
                    failure-domain.beta.kubernetes.io/zone=ap-southeast-1a
                    kubernetes.io/hostname=ip-192-168-101-158.ap-southeast-1.compute.internal
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 31 Mar 2019 17:14:58 +0800
Taints:             node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Sun, 12 May 2019 12:22:47 +0800   Sun, 31 Mar 2019 17:14:58 +0800   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Sun, 12 May 2019 12:22:47 +0800   Sun, 31 Mar 2019 17:14:58 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     True    Sun, 12 May 2019 12:22:47 +0800   Sun, 12 May 2019 06:51:38 +0800   KubeletHasDiskPressure       kubelet has disk pressure
  PIDPressure      False   Sun, 12 May 2019 12:22:47 +0800   Sun, 31 Mar 2019 17:14:58 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sun, 12 May 2019 12:22:47 +0800   Sun, 31 Mar 2019 17:15:31 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   192.168.101.158
  ExternalIP:   54.169.250.255
  InternalDNS:  ip-192-168-101-158.ap-southeast-1.compute.internal
  ExternalDNS:  ec2-54-169-250-255.ap-southeast-1.compute.amazonaws.com
  Hostname:     ip-192-168-101-158.ap-southeast-1.compute.internal
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           20959212Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      2002320Ki
  pods:                        11
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           19316009748
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      1899920Ki
  pods:                        11
System Info:
  Machine ID:                 ec2aa2ecfbbbdd798e2da086fc04afb6
  System UUID:                EC2AA2EC-FBBB-DD79-8E2D-A086FC04AFB6
  Boot ID:                    62c5eb9d-5f19-4558-8883-2da48ab1969c
  Kernel Version:             4.14.106-97.85.amzn2.x86_64
  OS Image:                   Amazon Linux 2
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://18.6.1
  Kubelet Version:            v1.12.7
  Kube-Proxy Version:         v1.12.7
ProviderID:                   aws:///ap-southeast-1a/i-0a38342b60238d83e
Non-terminated Pods:          (0 in total)
  Namespace  Name  CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------  ----  ------------  ----------  ---------------  -------------  ---
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests  Limits
  --------                    --------  ------
  cpu                         0 (0%)    0 (0%)
  memory                      0 (0%)    0 (0%)
  ephemeral-storage           0 (0%)    0 (0%)
  attachable-volumes-aws-ebs  0         0
Events:
  Type     Reason                Age                    From                                                         Message
  ----     ------                ----                   ----                                                         -------
  Warning  ImageGCFailed         5m15s (x333 over 40h)  kubelet, ip-192-168-101-158.ap-southeast-1.compute.internal  (combined from similar events): failed to garbage collect required amount of images. Wanted to free 1423169945 bytes, but freed 0 bytes
  Warning  EvictionThresholdMet  17s (x2809 over 3d4h)  kubelet, ip-192-168-101-158.ap-southeast-1.compute.internal  Attempting to reclaim ephemeral-storage


Name:               ip-192-168-197-198.ap-southeast-1.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=t3.small
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=ap-southeast-1
                    failure-domain.beta.kubernetes.io/zone=ap-southeast-1c
                    kubernetes.io/hostname=ip-192-168-197-198.ap-southeast-1.compute.internal
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 31 Mar 2019 17:15:02 +0800
Taints:             node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Sun, 12 May 2019 12:22:42 +0800   Thu, 09 May 2019 06:50:56 +0800   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Sun, 12 May 2019 12:22:42 +0800   Thu, 09 May 2019 06:50:56 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     True    Sun, 12 May 2019 12:22:42 +0800   Sat, 11 May 2019 21:53:44 +0800   KubeletHasDiskPressure       kubelet has disk pressure
  PIDPressure      False   Sun, 12 May 2019 12:22:42 +0800   Sun, 31 Mar 2019 17:15:02 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sun, 12 May 2019 12:22:42 +0800   Thu, 09 May 2019 06:50:56 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   192.168.197.198
  ExternalIP:   13.229.138.38
  InternalDNS:  ip-192-168-197-198.ap-southeast-1.compute.internal
  ExternalDNS:  ec2-13-229-138-38.ap-southeast-1.compute.amazonaws.com
  Hostname:     ip-192-168-197-198.ap-southeast-1.compute.internal
Capacity:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           20959212Ki
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      2002320Ki
  pods:                        11
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         2
  ephemeral-storage:           19316009748
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      1899920Ki
  pods:                        11
System Info:
  Machine ID:                 ec27ee0765e86a14ed63d771073e63fb
  System UUID:                EC27EE07-65E8-6A14-ED63-D771073E63FB
  Boot ID:                    7869a0ee-dc2f-4082-ae3f-42c5231ab0e3
  Kernel Version:             4.14.106-97.85.amzn2.x86_64
  OS Image:                   Amazon Linux 2
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://18.6.1
  Kubelet Version:            v1.12.7
  Kube-Proxy Version:         v1.12.7
ProviderID:                   aws:///ap-southeast-1c/i-0bd4038f4dade284e
Non-terminated Pods:          (0 in total)
  Namespace  Name  CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------  ----  ------------  ----------  ---------------  -------------  ---
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests  Limits
  --------                    --------  ------
  cpu                         0 (0%)    0 (0%)
  memory                      0 (0%)    0 (0%)
  ephemeral-storage           0 (0%)    0 (0%)
  attachable-volumes-aws-ebs  0         0
Events:
  Type     Reason                Age                      From                                                         Message
  ----     ------                ----                     ----                                                         -------
  Warning  EvictionThresholdMet  5m40s (x4865 over 3d5h)  kubelet, ip-192-168-197-198.ap-southeast-1.compute.internal  Attempting to reclaim ephemeral-storage
  Warning  ImageGCFailed         31s (x451 over 45h)      kubelet, ip-192-168-197-198.ap-southeast-1.compute.internal  (combined from similar events): failed to garbage collect required amount of images. Wanted to free 4006422937 bytes, but freed 0 bytes


I'm not entirely sure how to debug this issue, but it feels like K8s is not able to delete old unused Docker images on the nodes. Is there any way to verify this assumption? Any other thoughts?
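
One way to get a rough reading from the ImageGCFailed events alone, as a minimal sketch rather than a confirmed diagnosis: as far as I understand the kubelet's image garbage collection, it starts collecting when disk usage passes `--image-gc-high-threshold` (default 85) and then tries to free enough to get back down to `--image-gc-low-threshold` (default 80), i.e. it wants to free `capacity * (100 - low) / 100 - available` bytes. Assuming both defaults are in effect on these nodes and that the image filesystem has the same capacity as the reported ephemeral-storage (neither is confirmed by the output above), the free space at the time of each event can be back-calculated. The function names below are made up for illustration.

```python
# Back-calculate free disk space from the "Wanted to free N bytes" figure
# in an ImageGCFailed event.
# ASSUMPTION: the kubelet uses its default --image-gc-low-threshold of 80.
# ASSUMPTION: image filesystem capacity == node ephemeral-storage capacity.

LOW_THRESHOLD_PERCENT = 80  # assumed kubelet default, not read from the cluster


def estimate_available_bytes(capacity_bytes: int, wanted_to_free_bytes: int,
                             low_threshold_percent: int = LOW_THRESHOLD_PERCENT) -> int:
    """Invert wanted = capacity * (100 - low) / 100 - available."""
    target_free = capacity_bytes * (100 - low_threshold_percent) // 100
    return target_free - wanted_to_free_bytes


def usage_percent(capacity_bytes: int, available_bytes: int) -> int:
    """Disk usage as a whole-number percentage of capacity."""
    return 100 - (available_bytes * 100 // capacity_bytes)


# Capacity from the node description: ephemeral-storage: 20959212Ki
CAPACITY_BYTES = 20959212 * 1024

# "Wanted to free" values copied from the two events in the question.
WANTED_TO_FREE = {
    "ip-192-168-101-158": 1423169945,
    "ip-192-168-197-198": 4006422937,
}

if __name__ == "__main__":
    for node, wanted in WANTED_TO_FREE.items():
        available = estimate_available_bytes(CAPACITY_BYTES, wanted)
        used = usage_percent(CAPACITY_BYTES, available)
        print(f"{node}: about {available / 2**30:.2f} GiB free, about {used}% used")
```

If those assumptions hold, both nodes were well past the 85% mark while garbage collection freed 0 bytes, which would suggest there were no unused images left to delete. That still needs to be checked on the nodes themselves.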

  • I can't help with your question sorry, but I'm curious how you got ephemeral storage on t3.small instances? AWS says they're EBS only.

    – Tim
    May 12 at 6:15

  • Have you tried using SSH to get on the worker nodes and drilling in to find what's taking up the disk space?

    – Belmin Fernandez
    May 12 at 16:15

  • @Tim I'm actually not sure... I might have gotten the terminology mixed up. I have created standard instances using the CloudFormation template for EKS worker nodes.

    – chrisvdb
    May 13 at 4:47

  • @BelminFernandez good suggestion... will do that if the issue reoccurs. In the meantime I have terminated the nodes and recreated them using CloudFormation.

    – chrisvdb
    May 13 at 4:48
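
Following up on the suggestion to look at the disk from the worker node itself, here is a rough sketch of a `du`-style summary that lists the largest immediate subdirectories of a path. `/var/lib/docker` is Docker's default data directory and is only an assumption about where the space has gone; the function names are hypothetical. It sums apparent file sizes, so the totals can differ from what `df` reports, and on a node it would generally need to run as root.

```python
import os
import sys


def tree_size(path: str) -> int:
    """Total apparent size in bytes of all files under path (symlinks not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def largest_subdirs(path: str, top: int = 10):
    """Return (size_bytes, subdir_path) pairs for the biggest immediate subdirectories."""
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((tree_size(entry.path), entry.path))
    return sorted(sizes, reverse=True)[:top]


if __name__ == "__main__":
    # ASSUMPTION: /var/lib/docker is the Docker data root on the node.
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/docker"
    if not os.path.isdir(target):
        print(f"{target} is not a directory on this machine")
    else:
        for size, subdir in largest_subdirs(target):
            print(f"{size / 2**20:10.1f} MiB  {subdir}")
```

Pointing it at `/` or `/var` first and then descending into whichever directory is largest narrows things down quickly.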

1















We have an EKS cluster with two t3.small nodes with 20Gi of ephemeral storage. The cluster runs only two small Nodejs (node:12-alpine) applications for now.



This worked perfectly for a few weeks, and now suddenly we're getting disk pressure errors.



$ kubectl describe nodes
Name: ip-192-168-101-158.ap-southeast-1.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.small
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=ap-southeast-1
failure-domain.beta.kubernetes.io/zone=ap-southeast-1a
kubernetes.io/hostname=ip-192-168-101-158.ap-southeast-1.compute.internal
Annotations: node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 31 Mar 2019 17:14:58 +0800
Taints: node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Sun, 12 May 2019 12:22:47 +0800 Sun, 31 Mar 2019 17:14:58 +0800 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Sun, 12 May 2019 12:22:47 +0800 Sun, 31 Mar 2019 17:14:58 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Sun, 12 May 2019 12:22:47 +0800 Sun, 12 May 2019 06:51:38 +0800 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Sun, 12 May 2019 12:22:47 +0800 Sun, 31 Mar 2019 17:14:58 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sun, 12 May 2019 12:22:47 +0800 Sun, 31 Mar 2019 17:15:31 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.101.158
ExternalIP: 54.169.250.255
InternalDNS: ip-192-168-101-158.ap-southeast-1.compute.internal
ExternalDNS: ec2-54-169-250-255.ap-southeast-1.compute.amazonaws.com
Hostname: ip-192-168-101-158.ap-southeast-1.compute.internal
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 20959212Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 2002320Ki
pods: 11
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 19316009748
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1899920Ki
pods: 11
System Info:
Machine ID: ec2aa2ecfbbbdd798e2da086fc04afb6
System UUID: EC2AA2EC-FBBB-DD79-8E2D-A086FC04AFB6
Boot ID: 62c5eb9d-5f19-4558-8883-2da48ab1969c
Kernel Version: 4.14.106-97.85.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.6.1
Kubelet Version: v1.12.7
Kube-Proxy Version: v1.12.7
ProviderID: aws:///ap-southeast-1a/i-0a38342b60238d83e
Non-terminated Pods: (0 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 0 (0%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ImageGCFailed 5m15s (x333 over 40h) kubelet, ip-192-168-101-158.ap-southeast-1.compute.internal (combined from similar events): failed to garbage collect required amount of images. Wanted to free 1423169945 bytes, but freed 0 bytes
Warning EvictionThresholdMet 17s (x2809 over 3d4h) kubelet, ip-192-168-101-158.ap-southeast-1.compute.internal Attempting to reclaim ephemeral-storage


Name: ip-192-168-197-198.ap-southeast-1.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.small
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=ap-southeast-1
failure-domain.beta.kubernetes.io/zone=ap-southeast-1c
kubernetes.io/hostname=ip-192-168-197-198.ap-southeast-1.compute.internal
Annotations: node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 31 Mar 2019 17:15:02 +0800
Taints: node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Sun, 12 May 2019 12:22:42 +0800 Thu, 09 May 2019 06:50:56 +0800 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Sun, 12 May 2019 12:22:42 +0800 Thu, 09 May 2019 06:50:56 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Sun, 12 May 2019 12:22:42 +0800 Sat, 11 May 2019 21:53:44 +0800 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Sun, 12 May 2019 12:22:42 +0800 Sun, 31 Mar 2019 17:15:02 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sun, 12 May 2019 12:22:42 +0800 Thu, 09 May 2019 06:50:56 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.197.198
ExternalIP: 13.229.138.38
InternalDNS: ip-192-168-197-198.ap-southeast-1.compute.internal
ExternalDNS: ec2-13-229-138-38.ap-southeast-1.compute.amazonaws.com
Hostname: ip-192-168-197-198.ap-southeast-1.compute.internal
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 20959212Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 2002320Ki
pods: 11
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 19316009748
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1899920Ki
pods: 11
System Info:
Machine ID: ec27ee0765e86a14ed63d771073e63fb
System UUID: EC27EE07-65E8-6A14-ED63-D771073E63FB
Boot ID: 7869a0ee-dc2f-4082-ae3f-42c5231ab0e3
Kernel Version: 4.14.106-97.85.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.6.1
Kubelet Version: v1.12.7
Kube-Proxy Version: v1.12.7
ProviderID: aws:///ap-southeast-1c/i-0bd4038f4dade284e
Non-terminated Pods: (0 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 0 (0%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning EvictionThresholdMet 5m40s (x4865 over 3d5h) kubelet, ip-192-168-197-198.ap-southeast-1.compute.internal Attempting to reclaim ephemeral-storage
Warning ImageGCFailed 31s (x451 over 45h) kubelet, ip-192-168-197-198.ap-southeast-1.compute.internal (combined from similar events): failed to garbage collect required amount of images. Wanted to free 4006422937 bytes, but freed 0 bytes


I'm not entirely sure how to debug this issue, but it feels like K8s is not able to delete old unused Docker images on the nodes. Anyway to verify this assumption? Any other thoughts?










share|improve this question



















  • 1





    I can't help with your question sorry, but I'm curious how you got ephemeral storage on t3.small instances? AWS says they're EBS only.

    – Tim
    May 12 at 6:15











  • Have you tried using SSH to get on the worker nodes and drilling in to find what's taking up the disk space?

    – Belmin Fernandez
    May 12 at 16:15











  • @Tim I'm actually not sure... I might have gotten the terminology mixed up. I have created standard instances using the CloudFormation template for EKS worker nodes.

    – chrisvdb
    May 13 at 4:47












  • @BelminFernandez good suggestion... will do that if the issue reoccurs. In the meantime I have terminated the nodes and recreated them using CloudFormation.

    – chrisvdb
    May 13 at 4:48













1












1








1








We have an EKS cluster with two t3.small nodes with 20Gi of ephemeral storage. The cluster runs only two small Nodejs (node:12-alpine) applications for now.



This worked perfectly for a few weeks, and now suddenly we're getting disk pressure errors.



$ kubectl describe nodes
Name: ip-192-168-101-158.ap-southeast-1.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.small
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=ap-southeast-1
failure-domain.beta.kubernetes.io/zone=ap-southeast-1a
kubernetes.io/hostname=ip-192-168-101-158.ap-southeast-1.compute.internal
Annotations: node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 31 Mar 2019 17:14:58 +0800
Taints: node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Sun, 12 May 2019 12:22:47 +0800 Sun, 31 Mar 2019 17:14:58 +0800 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Sun, 12 May 2019 12:22:47 +0800 Sun, 31 Mar 2019 17:14:58 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Sun, 12 May 2019 12:22:47 +0800 Sun, 12 May 2019 06:51:38 +0800 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Sun, 12 May 2019 12:22:47 +0800 Sun, 31 Mar 2019 17:14:58 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sun, 12 May 2019 12:22:47 +0800 Sun, 31 Mar 2019 17:15:31 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.101.158
ExternalIP: 54.169.250.255
InternalDNS: ip-192-168-101-158.ap-southeast-1.compute.internal
ExternalDNS: ec2-54-169-250-255.ap-southeast-1.compute.amazonaws.com
Hostname: ip-192-168-101-158.ap-southeast-1.compute.internal
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 20959212Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 2002320Ki
pods: 11
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 19316009748
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1899920Ki
pods: 11
System Info:
Machine ID: ec2aa2ecfbbbdd798e2da086fc04afb6
System UUID: EC2AA2EC-FBBB-DD79-8E2D-A086FC04AFB6
Boot ID: 62c5eb9d-5f19-4558-8883-2da48ab1969c
Kernel Version: 4.14.106-97.85.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.6.1
Kubelet Version: v1.12.7
Kube-Proxy Version: v1.12.7
ProviderID: aws:///ap-southeast-1a/i-0a38342b60238d83e
Non-terminated Pods: (0 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 0 (0%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ImageGCFailed 5m15s (x333 over 40h) kubelet, ip-192-168-101-158.ap-southeast-1.compute.internal (combined from similar events): failed to garbage collect required amount of images. Wanted to free 1423169945 bytes, but freed 0 bytes
Warning EvictionThresholdMet 17s (x2809 over 3d4h) kubelet, ip-192-168-101-158.ap-southeast-1.compute.internal Attempting to reclaim ephemeral-storage


Name: ip-192-168-197-198.ap-southeast-1.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.small
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=ap-southeast-1
failure-domain.beta.kubernetes.io/zone=ap-southeast-1c
kubernetes.io/hostname=ip-192-168-197-198.ap-southeast-1.compute.internal
Annotations: node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 31 Mar 2019 17:15:02 +0800
Taints: node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Sun, 12 May 2019 12:22:42 +0800 Thu, 09 May 2019 06:50:56 +0800 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Sun, 12 May 2019 12:22:42 +0800 Thu, 09 May 2019 06:50:56 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Sun, 12 May 2019 12:22:42 +0800 Sat, 11 May 2019 21:53:44 +0800 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Sun, 12 May 2019 12:22:42 +0800 Sun, 31 Mar 2019 17:15:02 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sun, 12 May 2019 12:22:42 +0800 Thu, 09 May 2019 06:50:56 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.197.198
ExternalIP: 13.229.138.38
InternalDNS: ip-192-168-197-198.ap-southeast-1.compute.internal
ExternalDNS: ec2-13-229-138-38.ap-southeast-1.compute.amazonaws.com
Hostname: ip-192-168-197-198.ap-southeast-1.compute.internal
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 20959212Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 2002320Ki
pods: 11
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 19316009748
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1899920Ki
pods: 11
System Info:
Machine ID: ec27ee0765e86a14ed63d771073e63fb
System UUID: EC27EE07-65E8-6A14-ED63-D771073E63FB
Boot ID: 7869a0ee-dc2f-4082-ae3f-42c5231ab0e3
Kernel Version: 4.14.106-97.85.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.6.1
Kubelet Version: v1.12.7
Kube-Proxy Version: v1.12.7
ProviderID: aws:///ap-southeast-1c/i-0bd4038f4dade284e
Non-terminated Pods: (0 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 0 (0%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning EvictionThresholdMet 5m40s (x4865 over 3d5h) kubelet, ip-192-168-197-198.ap-southeast-1.compute.internal Attempting to reclaim ephemeral-storage
Warning ImageGCFailed 31s (x451 over 45h) kubelet, ip-192-168-197-198.ap-southeast-1.compute.internal (combined from similar events): failed to garbage collect required amount of images. Wanted to free 4006422937 bytes, but freed 0 bytes


I'm not entirely sure how to debug this issue, but it feels like K8s is not able to delete old unused Docker images on the nodes. Anyway to verify this assumption? Any other thoughts?










share|improve this question
















We have an EKS cluster with two t3.small nodes with 20Gi of ephemeral storage. The cluster runs only two small Nodejs (node:12-alpine) applications for now.



This worked perfectly for a few weeks, and now suddenly we're getting disk pressure errors.



$ kubectl describe nodes
Name: ip-192-168-101-158.ap-southeast-1.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.small
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=ap-southeast-1
failure-domain.beta.kubernetes.io/zone=ap-southeast-1a
kubernetes.io/hostname=ip-192-168-101-158.ap-southeast-1.compute.internal
Annotations: node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 31 Mar 2019 17:14:58 +0800
Taints: node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Sun, 12 May 2019 12:22:47 +0800 Sun, 31 Mar 2019 17:14:58 +0800 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Sun, 12 May 2019 12:22:47 +0800 Sun, 31 Mar 2019 17:14:58 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Sun, 12 May 2019 12:22:47 +0800 Sun, 12 May 2019 06:51:38 +0800 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Sun, 12 May 2019 12:22:47 +0800 Sun, 31 Mar 2019 17:14:58 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sun, 12 May 2019 12:22:47 +0800 Sun, 31 Mar 2019 17:15:31 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.101.158
ExternalIP: 54.169.250.255
InternalDNS: ip-192-168-101-158.ap-southeast-1.compute.internal
ExternalDNS: ec2-54-169-250-255.ap-southeast-1.compute.amazonaws.com
Hostname: ip-192-168-101-158.ap-southeast-1.compute.internal
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 20959212Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 2002320Ki
pods: 11
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 19316009748
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1899920Ki
pods: 11
System Info:
Machine ID: ec2aa2ecfbbbdd798e2da086fc04afb6
System UUID: EC2AA2EC-FBBB-DD79-8E2D-A086FC04AFB6
Boot ID: 62c5eb9d-5f19-4558-8883-2da48ab1969c
Kernel Version: 4.14.106-97.85.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.6.1
Kubelet Version: v1.12.7
Kube-Proxy Version: v1.12.7
ProviderID: aws:///ap-southeast-1a/i-0a38342b60238d83e
Non-terminated Pods: (0 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 0 (0%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ImageGCFailed 5m15s (x333 over 40h) kubelet, ip-192-168-101-158.ap-southeast-1.compute.internal (combined from similar events): failed to garbage collect required amount of images. Wanted to free 1423169945 bytes, but freed 0 bytes
Warning EvictionThresholdMet 17s (x2809 over 3d4h) kubelet, ip-192-168-101-158.ap-southeast-1.compute.internal Attempting to reclaim ephemeral-storage


Name: ip-192-168-197-198.ap-southeast-1.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.small
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=ap-southeast-1
failure-domain.beta.kubernetes.io/zone=ap-southeast-1c
kubernetes.io/hostname=ip-192-168-197-198.ap-southeast-1.compute.internal
Annotations: node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 31 Mar 2019 17:15:02 +0800
Taints: node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Sun, 12 May 2019 12:22:42 +0800 Thu, 09 May 2019 06:50:56 +0800 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Sun, 12 May 2019 12:22:42 +0800 Thu, 09 May 2019 06:50:56 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure True Sun, 12 May 2019 12:22:42 +0800 Sat, 11 May 2019 21:53:44 +0800 KubeletHasDiskPressure kubelet has disk pressure
PIDPressure False Sun, 12 May 2019 12:22:42 +0800 Sun, 31 Mar 2019 17:15:02 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sun, 12 May 2019 12:22:42 +0800 Thu, 09 May 2019 06:50:56 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.197.198
ExternalIP: 13.229.138.38
InternalDNS: ip-192-168-197-198.ap-southeast-1.compute.internal
ExternalDNS: ec2-13-229-138-38.ap-southeast-1.compute.amazonaws.com
Hostname: ip-192-168-197-198.ap-southeast-1.compute.internal
Capacity:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 20959212Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 2002320Ki
pods: 11
Allocatable:
attachable-volumes-aws-ebs: 25
cpu: 2
ephemeral-storage: 19316009748
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1899920Ki
pods: 11
System Info:
Machine ID: ec27ee0765e86a14ed63d771073e63fb
System UUID: EC27EE07-65E8-6A14-ED63-D771073E63FB
Boot ID: 7869a0ee-dc2f-4082-ae3f-42c5231ab0e3
Kernel Version: 4.14.106-97.85.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.6.1
Kubelet Version: v1.12.7
Kube-Proxy Version: v1.12.7
ProviderID: aws:///ap-southeast-1c/i-0bd4038f4dade284e
Non-terminated Pods: (0 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 0 (0%) 0 (0%)
memory 0 (0%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning EvictionThresholdMet 5m40s (x4865 over 3d5h) kubelet, ip-192-168-197-198.ap-southeast-1.compute.internal Attempting to reclaim ephemeral-storage
Warning ImageGCFailed 31s (x451 over 45h) kubelet, ip-192-168-197-198.ap-southeast-1.compute.internal (combined from similar events): failed to garbage collect required amount of images. Wanted to free 4006422937 bytes, but freed 0 bytes


I'm not entirely sure how to debug this issue, but it feels like K8s is not able to delete old, unused Docker images on the nodes. Is there any way to verify this assumption? Any other thoughts?
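
As a first step towards checking this, the numbers the kubelet reported can be put in context. The following is a minimal sketch, assuming only the `ImageGCFailed` message format and the `ephemeral-storage` capacity shown in the node description above. The helper names `parse_image_gc_message` and `kibibytes_to_bytes` are hypothetical and not part of any Kubernetes tooling.

```python
import re

# Pattern for the kubelet message shown in the node events above.
GC_PATTERN = re.compile(r"Wanted to free (\d+) bytes, but freed (\d+) bytes")


def parse_image_gc_message(message):
    """Return (wanted_bytes, freed_bytes) from an ImageGCFailed message, or None."""
    match = GC_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def kibibytes_to_bytes(quantity):
    """Convert a quantity such as '20959212Ki' to bytes (only the Ki suffix is handled)."""
    if not quantity.endswith("Ki"):
        raise ValueError("only Ki quantities are handled in this sketch")
    return int(quantity[:-2]) * 1024


if __name__ == "__main__":
    message = ("failed to garbage collect required amount of images. "
               "Wanted to free 4006422937 bytes, but freed 0 bytes")
    wanted, freed = parse_image_gc_message(message)
    capacity = kibibytes_to_bytes("20959212Ki")  # Capacity: ephemeral-storage
    print(f"wanted to free {wanted / 2**30:.2f} GiB, freed {freed} bytes")
    print(f"that is {wanted / capacity:.1%} of the node's ephemeral-storage capacity")
```

Because the kubelet reports that it freed 0 bytes, it may be that it found no images it was able to remove, in which case something other than unused images could be filling the disk. That is only a hypothesis until the node's disk has been inspected directly.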







amazon-web-services kubernetes amazon-eks














asked May 12 at 4:31 by chrisvdb (edited May 20 at 10:29)







  • I can't help with your question, sorry, but I'm curious how you got ephemeral storage on t3.small instances? AWS says they're EBS only.

    – Tim
    May 12 at 6:15











  • Have you tried using SSH to get on the worker nodes and drilling in to find what's taking up the disk space?

    – Belmin Fernandez
    May 12 at 16:15











  • @Tim I'm actually not sure... I might have gotten the terminology mixed up. I have created standard instances using the CloudFormation template for EKS worker nodes.

    – chrisvdb
    May 13 at 4:47












  • @BelminFernandez good suggestion... will do that if the issue reoccurs. In the meantime I have terminated the nodes and recreated them using CloudFormation.

    – chrisvdb
    May 13 at 4:48
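
The suggestion above to get on the worker node and find what is taking up the disk space could be sketched as follows. This is a minimal sketch rather than a definitive tool: it assumes Python 3 is available on the node, that the script has permission to read the directories, and that Docker uses its default data root `/var/lib/docker` (the question does not state the path). The helper names `directory_size` and `largest_subdirectories` are hypothetical.

```python
import os
import shutil
import sys


def directory_size(path):
    """Sum the sizes of regular files under path, skipping unreadable entries."""
    total = 0
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if not os.path.islink(file_path):
                    total += os.lstat(file_path).st_size
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def largest_subdirectories(parent, limit=10):
    """Return up to `limit` (size_bytes, path) pairs for subdirectories, largest first."""
    sizes = []
    with os.scandir(parent) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                sizes.append((directory_size(entry.path), entry.path))
    sizes.sort(reverse=True)
    return sizes[:limit]


if __name__ == "__main__":
    # Assumption: /var/lib/docker is the Docker data root; pass another path to override.
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/docker"
    if os.path.isdir(target):
        try:
            usage = shutil.disk_usage(target)
            print(f"filesystem: {usage.used / 2**30:.1f} GiB used "
                  f"of {usage.total / 2**30:.1f} GiB")
            for size, path in largest_subdirectories(target):
                print(f"{size / 2**30:8.2f} GiB  {path}")
        except OSError as err:
            print(f"could not inspect {target}: {err}")
    else:
        print(f"{target} is not a directory on this machine")
```

The sizes are apparent file sizes, so they may not match the blocks actually used on disk, and layered storage can make some data appear more than once. The output is meant as a rough pointer to where to look next, for example whether images, container data, or logs dominate.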






















