API, MQTT broker, Controller, Statistics controller, My dashboard, Database, Database replica 2, Legacy HTTP backend
2020-07-01 16:41:07 UTC
Investigating: we’re investigating the issue that is making all services unreachable.
2020-07-01 17:43:20 UTC
Confirmed: the issue is at the server colocation facility. The ETA for resolution is 2020-07-02 07:00:00 UTC
2020-07-02 08:00:00 UTC
Update: unfortunately, the damage is deeper than we previously thought. We’re going on-site to manually export all data and move it to another provider. The new ETA is 2020-07-02 13:00:00 UTC
2020-07-02 15:00:00 UTC
Resolved: all systems are back to normal. Further maintenance is required and will be scheduled. Future improvements will be published in a few hours.
Around 2020-07-01 14:00:00 UTC the power source failed again, and the UPS kept all systems running for a while. However, main power was not properly restored, and within a few hours the whole system was down again. Unfortunately, no one was available to assist us until the following day.
Around 2020-07-02 08:00:00 UTC we made contact with the security staff, but they were unable to restore power despite multiple attempts. At 2020-07-02 13:30:00 UTC one of our staff members was able to enter the building after obtaining the necessary permissions (required because of COVID-19). At 2020-07-02 15:00:00 UTC all services were back to normal.
Clearly, we can no longer rely on this facility. All of our services except the database were prepared to be moved quickly to another server or deployed in clusters. In fact, the only thing that prevented us from spinning up new services in another facility was the lack of data.
We decided to set up a geographically separate read-only cluster, with a documented and scripted procedure for manual takeover (e.g. for activating hot-standby databases), and then to plan a long-term solution.
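As a rough illustration of what a scripted manual takeover could look like, the sketch below builds the checklist an operator would confirm by hand before promoting a hot standby. It is not our actual procedure: the `Standby` model, the replica names, the lag threshold and the individual steps are all assumptions made for the example, and no real database commands are issued.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Standby:
    """A hot-standby database in another facility (hypothetical model)."""
    name: str
    region: str
    replication_lag_s: float  # seconds behind the primary when last seen
    read_only: bool = True


def pick_standby(standbys: List[Standby], max_lag_s: float) -> Optional[Standby]:
    """Return the read-only standby with the least lag, or None if none is fresh enough."""
    candidates = [
        s for s in standbys
        if s.read_only and s.replication_lag_s <= max_lag_s
    ]
    return min(candidates, key=lambda s: s.replication_lag_s, default=None)


def takeover_plan(primary_reachable: bool, standbys: List[Standby],
                  max_lag_s: float = 60.0) -> List[str]:
    """Build the ordered checklist for a manual takeover (assumed steps)."""
    if primary_reachable:
        return ["Primary is reachable: do not take over."]
    target = pick_standby(standbys, max_lag_s)
    if target is None:
        return ["No standby within the lag limit: escalate before promoting."]
    return [
        "Confirm the primary is fenced off so it cannot accept writes.",
        f"Promote {target.name} ({target.region}) to read-write.",
        f"Point services at {target.name}.",
        f"Re-seed the remaining standbys from {target.name}.",
    ]
```

Calling `takeover_plan(False, [Standby("db-replica-a", "eu-west", 12.0), Standby("db-replica-b", "eu-north", 3.5)])` would yield a four-step plan that promotes `db-replica-b`, the replica with the least lag. Keeping the output as a checklist rather than executing anything matches the intent of a manual takeover: a person still decides when to promote.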