Summary
During the night of Tuesday, May 10 to Wednesday, May 11, 2022, we deployed a new logging tool on the infrastructure. This tool had passed our staging and pre-production validations, but turned out to be misconfigured for the volume of data in production. The developers collected as much data as possible on the cause of the problem and on the configuration needed to solve it.
Cause
Implementation of a tool
We deployed a new logging tool that lets us track errors, failed actions, and unexpected problems on the platform. This tool analyzes everything that happens on the portal and processes an extremely large amount of data. This data flow was tested on our two test environments and passed validation. In production, however, it slowed the platform down and even caused some intermittent service interruptions.
Solution
Rollback
We rolled back production to undo the release of the tool. This resolved the problem immediately.
Corrective measures
The developers are correcting the configuration of the tool and its integration into the infrastructure so that it no longer interferes with the platform's data traffic.
Conclusion
The Dokeos team sincerely apologizes for the inconvenience caused on Wednesday, May 11, 2022. The corrective actions described above are already being implemented as you read this.