20 Jun, 2020

Data Loss and Consequences

Dirk Fuchs

2020-06-20 08:24 +0200

Yesterday I had a severe data loss. What happened?

I had an idea and ran a quick-and-dirty test directly in my “production” environment. This started a second instance of my Lacrosse scraper, which claimed the RaspyRFM module and thereby disconnected the “production” scraper. I checked whether the “prod” instance was still running, and it was, but it no longer received any data.

Data loss

I work in an IT-savvy environment, so I should have known better. I ignored some simple, self-evident rules. So what are the consequences? Simply apply what I would have done in a serious scenario:

Never invoke critical parts manually, especially components that generate or manipulate data. Make sure this is not even possible, in test and development environments too.
Ensure that those parts never run concurrently or in any other way they were not designed for.
Monitor as much as you can. With proper monitoring, this incident would have been detected within minutes, not hours.
Set up alerting!
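The "never run concurrently" rule can be enforced by the script itself. The following is a minimal sketch, assuming the scraper is a Python script running on Linux (for example on a Raspberry Pi). It uses an advisory file lock from the standard `fcntl` module; the lock path and the function names are made up for illustration and are not taken from the actual scraper.

```python
import fcntl
import sys

# Hypothetical lock file location; any path writable by the scraper works.
LOCK_PATH = "/tmp/lacrosse_scraper.lock"


def acquire_single_instance_lock(path=LOCK_PATH):
    """Try to take an exclusive, non-blocking lock on `path`.

    Returns the open file object on success, or None if another
    open file (usually another process) already holds the lock.
    The lock is released when the file is closed or the process exits.
    """
    lock_file = open(path, "a")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # The lock is held elsewhere: do not start a second instance.
        lock_file.close()
        return None
    return lock_file


def main():
    lock = acquire_single_instance_lock()
    if lock is None:
        sys.exit("Another scraper instance is already running, refusing to start.")
    # The real scraper loop would run here. Keep `lock` referenced
    # for the whole lifetime of the process so the lock stays held.
```

Calling `main()` from the script's entry point means a second, accidentally started instance exits immediately instead of grabbing the radio module. Because the operating system releases the lock when the process dies, a crash does not leave a stale lock behind.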

I wrote “in a serious scenario”. This IS a serious scenario. It is my spare time, I love spending it on this page, and I don’t want to waste it cleaning up a self-made mess.

The next steps are:

Guard all scripts with a mutex to prevent accidentally starting more than one instance.
Monitor the age of the most recent data.
Alert if the data is too old.
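The data-age check could look roughly like the sketch below. It assumes that the scraper writes its readings to a file whose modification time changes with every new reading. That storage layout, the threshold, and all names are assumptions made for illustration; a database-backed setup would compare the timestamp of the newest row instead.

```python
import os
import time

# Assumed threshold: treat data older than 15 minutes as stale.
MAX_AGE_SECONDS = 15 * 60


def data_age_seconds(path, now=None):
    """Return the number of seconds since `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def check_freshness(path, max_age=MAX_AGE_SECONDS, now=None):
    """Return (ok, message) describing whether the data is fresh enough."""
    try:
        age = data_age_seconds(path, now)
    except FileNotFoundError:
        return False, f"ALERT: {path} does not exist"
    if age > max_age:
        return False, f"ALERT: last data is {age:.0f} s old (limit {max_age} s)"
    return True, f"OK: last data is {age:.0f} s old"
```

Run from cron every few minutes, a small wrapper can print the message and exit with a non-zero status whenever `ok` is false, so that the scheduler or any mail or messaging hook attached to it raises the alert.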

LESSON LEARNED!