Resolved -
No further issues have been found in the last 18 hours. The issue was resolved by rolling back a minor GKE version.
Oct 31, 12:03 UTC
Monitoring -
The issue should now be resolved. We are continuing to investigate and monitor for further issues.
Oct 30, 18:00 UTC
Update -
A hotfix is being applied. We are currently working on a long-term fix.
Oct 30, 16:16 UTC
Identified -
A patch update that GKE rolled out automatically has affected our custom containerd configurations. This has brought down critical services on some nodes and has affected some deployments. We have identified a fix and are currently implementing it.
Oct 30, 14:32 UTC
Investigating -
We are investigating a new GKE update to containerd that is affecting the scale-up of new GPUs. This applies to GCP deployments when autoscaling is triggered for GPU-based services.
Oct 30, 12:45 UTC