DataFlint  
Mitigate your big data performance bottlenecks

Install in minutes from an open source library that works on top of your existing infrastructure, and start solving big data performance issues and debugging failures!

Your Apache Spark dev experience can be completely changed within minutes

DataFlint makes Apache Spark human-readable, with real-time query updates, alerts on potential performance issues and suggestions for fixes

Install with 2 lines of code

A DataFlint tab will be added to the Spark UI; clicking it opens a real-time web app

  • Supports installation with Scala, PySpark and no-code
  • Supports installation on Spark History Server
  • Supports Databricks

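from pyspark.sql import SparkSession

# Add the two DataFlint settings to your existing session builder.
builder = (
    SparkSession.builder
    # ... your existing builder options ...
    .config("spark.jars.packages", "io.dataflint:spark_2.12:0.2.3")
    .config("spark.plugins", "io.dataflint.spark.SparkDataflintPlugin")
    # ... any further builder options ...
)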
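
For the no-code route, the same two settings can in principle be passed on the command line instead of in the session builder. The sketch below is an illustration under assumptions, not DataFlint's documented procedure: it assumes the job is launched with `spark-submit` and uses its standard `--packages` and `--conf` flags. The helper name `dataflint_submit_args` and the script name `my_job.py` are hypothetical; the package coordinate and plugin class are the ones shown above.

```python
import shlex

# Values taken from the install snippet above.
DATAFLINT_PACKAGE = "io.dataflint:spark_2.12:0.2.3"
DATAFLINT_PLUGIN = "io.dataflint.spark.SparkDataflintPlugin"


def dataflint_submit_args(app_path, package=DATAFLINT_PACKAGE, plugin=DATAFLINT_PLUGIN):
    """Build a spark-submit argument list that loads the DataFlint plugin.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return [
        "spark-submit",
        "--packages", package,                  # fetch the DataFlint jar
        "--conf", f"spark.plugins={plugin}",    # register the plugin
        app_path,                               # your unchanged application
    ]


# "my_job.py" is a placeholder for your own application script.
command = dataflint_submit_args("my_job.py")
print(shlex.join(command))
```

If your job already sets `spark.plugins`, this value would replace it, so the existing plugins would need to be merged into a single comma-separated list.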

Access powerful features with DataFlint SaaS (in closed beta)

DataFlint Open Source gives you visibility into a single Spark job, but with DataFlint SaaS you get full observability across all the jobs in all your Spark applications, in one place!

Monitor your jobs

See all your application jobs in one place. Manage versions and see alerts

Get Alerted

Alerts on performance issues and query failures

Resource Management

Tune your resource usage

Control all your applications

Manage all your Spark applications in one place