Outage Overview on April 23rd
MikeMills
On April 23rd, 2024, GroveStreams experienced a major outage lasting 37 hours.

What happened? We are in the process of updating our server infrastructure. Last week we started increasing bandwidth between servers by utilizing dual network cards combined with a faster network. The distributed store software had a bug that exposed itself after several days of uptime with the dual NICs. This bug caused about 1/4 of all stored data to be taken offline. The whole system had to be taken offline; otherwise there was a risk that new data flowing in could corrupt existing data.

The 3rd party store software we use has a lot of resiliency, but none of its tools could get the offline data back online. We needed to get their code, edit it, and test the solution on our test cluster. This took time, but we needed to ensure no data would be compromised.

A lot of our customers use GroveStreams for critical real-time operations. We know that, we apologize for the disruption, and we will improve our processes. That being said, we are moving forward with a lot of upgrades over the next two months, which also include features that have been worked on over the last two years.

Thank you for your patience.