Controller Statistics controller
2020-07-18 09:30:00 UTC
Investigating: we are investigating the issue.
2020-07-18 09:45:00 UTC
Confirmed: the issue was traced to a misconfiguration. ETA: a few hours.
2020-07-18 10:00:00 UTC
Resolved: all systems are back to normal.
Around 2020-07-17 23:02:00 UTC the SeismoCloud controller pod crashed. Kubernetes normally handles this by restarting the pod. However, due to a misconfiguration, the container image pull policy was set to Always, meaning that the node always tries to pull the container image from the registry before starting the container. Normally this is not an issue. However, on 2020-07-16 the previous provider (where some non-critical services, including the container registry, are still hosted) experienced a power issue and has been unreachable since (the issue is ongoing at the time of this report). As a result, Kubernetes could not pull the image and was not able to restart the container.
Around 2020-07-18 09:45:00 UTC we identified the issue and changed the policy to IfNotPresent, which means that Kubernetes pulls the container image only if it is not already present locally. The pod was rescheduled shortly afterwards, and the system was up and running again within a few minutes.
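Kubernetes accepts three values for imagePullPolicy: Always, IfNotPresent and Never. The difference that mattered here can be sketched as a small decision function. This is a simplified model of the documented behavior, not kubelet code; the function name and its arguments are hypothetical, and the example assumes the controller image was already cached on the node.

```python
def must_contact_registry(policy: str, image_cached: bool) -> bool:
    """Return True if starting a container requires the registry to be reachable."""
    if policy == "Always":
        # The registry is queried on every container start, even if the image is cached.
        return True
    if policy == "IfNotPresent":
        # The registry is needed only when the node has no local copy of the image.
        return not image_cached
    if policy == "Never":
        # The node never pulls; the container starts only from a local copy.
        return False
    raise ValueError(f"unknown imagePullPolicy: {policy!r}")


# Assumed situation during the incident: image cached on the node, registry unreachable.
print(must_contact_registry("Always", image_cached=True))        # True
print(must_contact_registry("IfNotPresent", image_cached=True))  # False
```

With Always, a restart depends on the registry even though the node already holds the image, which is why an outage of a non-critical service could block a critical pod.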
The issue was due to the misconfiguration of the imagePullPolicy field in the container spec, which is normally left at IfNotPresent. This misconfiguration was probably left over from the development environment.
We have scheduled a re-check of all container configurations to look for similar issues. The registry service is not meant to be online with the same SLA, so it will be migrated according to the previously defined roadmap.
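A configuration re-check of this kind could be partly automated. The following is a minimal sketch, not the procedure actually used: it scans Kubernetes objects in JSON form and reports containers whose imagePullPolicy is Always. The function name, the sample object names and the registry.example.com image references are all hypothetical.

```python
import json


def find_always_pull(manifest: dict) -> list[str]:
    """Return 'Kind/name: container' entries whose imagePullPolicy is 'Always'."""
    # Accept either a List object (with "items") or a single object.
    items = manifest.get("items", [manifest])
    findings = []
    for item in items:
        kind = item.get("kind", "?")
        name = item.get("metadata", {}).get("name", "?")
        spec = item.get("spec", {})
        # Workloads such as Deployments nest the pod spec under spec.template.spec;
        # bare Pods carry it directly in spec. CronJobs nest deeper and are not covered.
        pod_spec = spec.get("template", {}).get("spec", spec)
        containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
        for container in containers:
            if container.get("imagePullPolicy") == "Always":
                findings.append(f"{kind}/{name}: {container.get('name', '?')}")
    return findings


# Hypothetical sample data, for illustration only.
sample = json.loads("""
{"kind": "List", "items": [
  {"kind": "Deployment", "metadata": {"name": "controller"},
   "spec": {"template": {"spec": {"containers": [
     {"name": "controller", "image": "registry.example.com/controller:1.0",
      "imagePullPolicy": "Always"}]}}}},
  {"kind": "Pod", "metadata": {"name": "stats"},
   "spec": {"containers": [
     {"name": "stats", "image": "registry.example.com/stats:1.0",
      "imagePullPolicy": "IfNotPresent"}]}}
]}
""")
print(find_always_pull(sample))  # ['Deployment/controller: controller']
```

The input could come, for example, from the JSON output of kubectl get with the -o json flag. The sketch only flags an explicit Always value, so manifest files that omit the field would need a separate check.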