Data Loss and Consequences
Yesterday I had a severe data loss. What happened?
I had an idea and ran a quick-and-dirty test directly in my “production” environment. A second instance of my Lacrosse scraper started and claimed the RaspyRFM module. This disconnected the “production” scraper from the module. I checked whether the “prod” instance was still running, which it was. But it no longer received any data.
I work in an IT-savvy environment, so I should have known better. I ignored some simple and self-evident rules. So what are the consequences? Simply adopt the rules I would have followed in a serious scenario.
- Never ever invoke critical parts (especially components that generate or manipulate data) manually. Make sure that this is not even possible. This applies to test and development too!
- Ensure that those parts never run concurrently or in any other way they were not designed for.
- Monitor as much as you can. With proper monitoring this incident would have been detected within minutes, not hours.
- Alerting!
I wrote “in a serious scenario”. This IS a serious scenario. It is my spare time, I love spending it on this page, and I don’t want to waste it cleaning up a self-made mess.
Next steps are
- Guard all scripts with a mutex to prevent accidentally starting more than one instance.
- Monitor data age.
- Alert if the data is too old.
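A minimal sketch of how these steps might look in Python on Linux. It is an illustration, not my actual scraper code. The function names are made up for this example, and it assumes that the data ends up in a file whose modification time tells how fresh the data is. The mutex is an advisory lock on a lock file using `fcntl.flock`, which the operating system releases automatically when the process exits or crashes.

```python
import fcntl
import os
import time


def acquire_single_instance_lock(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file object while the lock is held (keep a reference
    to it for the lifetime of the process), or None if another instance
    already holds the lock.
    """
    lock_file = open(lock_path, "w")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Somebody else holds the lock: do not start a second instance.
        lock_file.close()
        return None
    return lock_file


def data_age_seconds(data_path, now=None):
    """Age of the data in seconds, based on the file's modification time.

    Assumption: the scraper writes to data_path whenever it receives data.
    """
    if now is None:
        now = time.time()
    return now - os.path.getmtime(data_path)


def is_stale(data_path, max_age_seconds, now=None):
    """True if the data is older than the allowed maximum age."""
    return data_age_seconds(data_path, now) > max_age_seconds
```

A script would call `acquire_single_instance_lock` with some lock file path as its very first step and exit immediately if it gets `None` back. A separate job, for example one started by cron every few minutes, could call `is_stale` and send a mail or push message when it returns `True`. The path and the threshold are placeholders that depend on the setup.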
LESSON LEARNED!