Data Loss and Consequences


Yesterday I had a severe data loss. What happened?

I had an idea and ran a quick-and-dirty test directly in my “production” environment. A second instance of my Lacrosse scraper started and claimed the RaspyRFM module. This disconnected the “production” scraper. I checked whether the “prod” instance was still running, which it was. But it no longer received any data.

Data loss

I work in an IT-oriented environment, so I should have known better. I ignored some simple and self-evident rules. So what are the consequences? I simply adopt what I would have done in a serious scenario:

  • Never invoke critical parts manually, especially components that generate or manipulate data. Make sure this is not even possible, including in test and development environments!
  • Ensure that those parts never run concurrently or in any other way they were not meant to run.
  • Monitor as much as you can. With proper monitoring, this incident would have been detected within minutes, not hours.
  • Alerting!
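
The rule about concurrent runs can be enforced by the script itself. The following is only a minimal sketch, assuming the scraper is a Python script running on Linux; the lock file path and the function name are made up for illustration and are not taken from the actual scraper. The idea is that a second instance fails to get the lock and refuses to start before it ever touches the radio module.

```python
import fcntl
import os

# Hypothetical lock file location; any path writable by the scraper works.
LOCK_PATH = "/tmp/lacrosse-scraper.lock"


def acquire_single_instance_lock(path=LOCK_PATH):
    """Try to take an exclusive, non-blocking lock on `path`.

    Returns the open file object on success (keep it referenced for as long
    as the process runs), or None if another instance already holds the lock.
    """
    # "a+" creates the file if needed without truncating it, so a failed
    # attempt does not wipe the PID written by the running instance.
    lock_file = open(path, "a+")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    # We own the lock: record our PID to make debugging easier.
    lock_file.seek(0)
    lock_file.truncate()
    lock_file.write(str(os.getpid()))
    lock_file.flush()
    return lock_file


# Intended use at the very top of the scraper, before the module is opened:
#
#     lock = acquire_single_instance_lock()
#     if lock is None:
#         raise SystemExit("Another instance is already running.")
```

The operating system releases the lock when the process exits, even after a crash, so a stale lock file does not block the next start.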

I wrote “in a serious scenario”. This IS a serious scenario. This is my spare time, I love spending it on this page, and I don’t want to waste it cleaning up a self-made mess.

The next steps are:

  • Guard all scripts with a mutex to prevent accidentally starting more than one instance.
  • Monitor data age.
  • Alert if the data is too old.
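
The last two points could look roughly like this. It is a sketch in Python under several assumptions: the timestamp of the newest reading is available as a Unix timestamp (where it comes from, for example a database query or a file's modification time, is left open), the 15 minute threshold is an arbitrary example value, and `alert` is a hypothetical callable that delivers the message by whatever channel is used.

```python
import time

# Example threshold only; pick a value that fits the sensors' send interval.
MAX_AGE_SECONDS = 15 * 60


def data_age_seconds(last_reading_ts, now=None):
    """Return how many seconds have passed since the newest reading."""
    if now is None:
        now = time.time()
    # Clamp at zero so a slightly skewed clock never yields a negative age.
    return max(0.0, now - last_reading_ts)


def check_and_alert(last_reading_ts, alert, max_age=MAX_AGE_SECONDS, now=None):
    """Call `alert(message)` if the newest reading is older than `max_age`.

    Returns True if an alert was raised, False otherwise.
    """
    age = data_age_seconds(last_reading_ts, now)
    if age > max_age:
        alert(
            f"No new data for {age / 60:.0f} minutes "
            f"(limit: {max_age / 60:.0f} minutes)."
        )
        return True
    return False
```

Run periodically, a check like this reports a silent scraper no matter why it went silent, which is the case a plain “is the process running?” check missed here.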

LESSON LEARNED!