Integrating Slack alerts and Apache Superset for better data observability

Slack notification for Superset Alerts.

Data Engineering at Affinity Answers handles data from close to 10 heterogeneous data sources and processes upwards of 2 billion records per day. An example of a Data Pipeline at Affinity Answers:

The data we process is challenging because inconsistency is expected, and we had an inbuilt mechanism to absorb it; after all, it is said that ‘inconsistency is the consistency’, and sometimes we are even suspicious when there is no inconsistency. The data varied both qualitatively and quantitatively, sometimes suddenly and sometimes gradually: a sudden change was a panic attack, and a gradual one was a slow poison. See the data-change reaction matrix here:

For aggregate data visualization, we used Apache Superset, but there were times when we missed data variations and noticed them only after some damage had been done. Data damage usually cannot be rolled back the way software can: by the time it is noticed, it has percolated into too many insights and can only be managed. We tried these approaches to fix this and “catch” an issue the instant it cropped up:

  • In-house custom alert mechanisms, which became untenable after a while
  • Tools like Great Expectations, which solved some things
  • Slack integration with Superset for alerts

Apache Superset is where we already tracked and visualized all our metrics, and we discovered that Superset can also raise data alerts and send them to Slack. This is how we integrated the two tools: we configure automated alerts and reports in Superset, and it notifies us when a SQL condition is met.
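Before alerts can be created, the feature has to be switched on in the Superset configuration. The snippet below is a minimal sketch of a `superset_config.py`, not our exact setup: the environment variable names and the host name are assumptions, and a running Celery worker and beat scheduler are also needed for the alerts to be evaluated on schedule.

```python
# superset_config.py (illustrative sketch; adapt to your deployment)
import os

# Show the "Alerts & Reports" entry in the Superset UI.
FEATURE_FLAGS = {"ALERT_REPORTS": True}

# Bot token of a Slack app that has been added to the target channel.
# The environment variable name is an assumption; reading the token from
# the environment keeps the secret out of version control.
SLACK_API_TOKEN = os.environ.get("SLACK_API_TOKEN", "")

# When True, alerts are evaluated but no notification is actually sent.
ALERT_REPORTS_NOTIFICATION_DRY_RUN = False

# Base URL the headless browser uses to render chart screenshots.
# "http://superset:8088/" is a placeholder host name.
WEBDRIVER_BASEURL = os.environ.get(
    "SUPERSET_WEBDRIVER_BASEURL", "http://superset:8088/"
)
```

Setting `ALERT_REPORTS_NOTIFICATION_DRY_RUN` to `True` can be handy while testing a new alert, since nothing is posted to the channel.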

Here is one scenario: we process data from one of our partners, and its volume keeps growing. A sudden spike in data size occupies a large amount of storage and can leave the currently running process short of disk space, causing it to fail; re-running that process then increases storage and processing costs, and so on. Superset data alerts help us here. With them, we can send notifications and reports, as dashboards or charts, to an e-mail address or Slack channel when a value crosses its threshold. This lets us take timely action when the data reaches a critical level.
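To make the idea of "a SQL condition with a threshold" concrete, here is a small stand-alone sketch. It does not use Superset; it only mimics the check an alert performs: run a query that returns a single number, then compare it with a threshold. The table `partner_ingest_log`, its columns, and the numbers are hypothetical.

```python
import operator
import sqlite3

# Hypothetical alert query: total bytes ingested from the partner on one day.
ALERT_SQL = """
SELECT SUM(size_bytes) AS ingested_bytes
FROM partner_ingest_log
WHERE ingest_date = ?
"""

# Comparison operators of the kind an alert condition lets you choose from.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def alert_triggered(conn, day, op, threshold):
    """Run the alert query and compare its single value with the threshold."""
    row = conn.execute(ALERT_SQL, (day,)).fetchone()
    value = row[0] if row and row[0] is not None else 0
    return OPERATORS[op](value, threshold), value


# In-memory demo data: a normal day followed by a sudden spike.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE partner_ingest_log (ingest_date TEXT, size_bytes INTEGER)"
)
conn.executemany(
    "INSERT INTO partner_ingest_log VALUES (?, ?)",
    [
        ("2024-01-01", 40),
        ("2024-01-01", 50),
        ("2024-01-02", 400),
        ("2024-01-02", 300),
    ],
)

print(alert_triggered(conn, "2024-01-01", ">", 500))  # normal day
print(alert_triggered(conn, "2024-01-02", ">", 500))  # spike day
```

In Superset itself, only the query, the operator, and the threshold are entered; the scheduling and the comparison are handled by the alerting feature.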

How to set the alert on Superset?

It’s just a 2-step process.

Step 1: Click on Settings and choose Alerts & Reports from the menu

Step 2: Set an alert by providing the details below.

  • Alert Name: a relevant name that describes the alert
  • Alert Condition with threshold: the SQL query, comparison operator, and threshold value that define the alert
  • Alert Schedule: when and how often the condition is checked
  • Dashboard or Chart: an item that already exists in Superset and is relevant to the condition
  • Notification Method: Email, Slack, or both

Superset Add Alert window
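The same fields can also be written down as data. The sketch below builds a request body modelled on Superset's report-schedule REST resource (`/api/v1/report/`); the field names reflect our understanding of that API and should be checked against the API documentation of your Superset version. The helper `build_alert_payload`, the IDs, and the channel name are hypothetical, and nothing is sent over the network.

```python
import json


def build_alert_payload(name, sql, op, threshold, crontab,
                        database_id, chart_id, slack_channel):
    """Assemble the alert form fields into one dictionary."""
    return {
        "type": "Alert",                      # as opposed to "Report"
        "name": name,                         # Alert Name
        "active": True,
        "crontab": crontab,                   # Alert Schedule
        "database": database_id,              # database the SQL runs against
        "sql": sql,                           # Alert Condition: the query
        "validator_type": "operator",
        "validator_config_json": {            # Alert Condition: the threshold
            "op": op,
            "threshold": threshold,
        },
        "chart": chart_id,                    # chart attached to the message
        "recipients": [                       # Notification Method
            {
                "type": "Slack",
                "recipient_config_json": {"target": slack_channel},
            }
        ],
    }


payload = build_alert_payload(
    name="Partner data size spike",
    sql="SELECT SUM(size_bytes) FROM partner_ingest_log",  # hypothetical table
    op=">",
    threshold=500,
    crontab="0 * * * *",      # check at the start of every hour
    database_id=1,            # placeholder ID
    chart_id=42,              # placeholder ID
    slack_channel="#data-alerts",
)
print(json.dumps(payload, indent=2))
```

Keeping alert definitions in this form makes them easy to review and version alongside the pipeline code, even if they are still created through the UI.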

Now we are good to go! The alert arrives in the Slack channel and by email, as shown below.

Slack alert with the chart
Email alert with the chart

Conclusion

Observability is essential for handling data variations. An integrated tool such as Apache Superset, which serves as both a data visualisation and a data alerting system, is simple to maintain, reduces the number of separate sub-systems needed for observation, and helps maintain data quality.
