Many services run on an EKS node group backed by spot instances.
When a large number of pods are scheduled onto the same few spot instances, there is a risk that AWS reclaims those instances. When this happens, the following scenario is likely:
Pods on the reclaimed instances don't have enough time to react to the instance termination, which causes hiccups, latency, or minor downtime until they are rescheduled onto other nodes.
A couple of measures have been applied to counter this: