Transforming the Data Ecosystem
The essence of our mission at Revefi is best expressed by the experiences of our customers with the Revefi Data Operations Cloud.
"With Revefi, our data team now sleeps peacefully! We’ve had zero escalations from business users in the past six months. During the time when our data adoption surged by 35%, insights from Revefi helped us reduce our Snowflake spend by 30%, saving close to six figures in USD. The best part, this year we won’t exhaust our annual credits in just 10 months."
This validation is magical and embodies why we created Revefi in the first place. Designed from scratch, Revefi stands as a trusted ally for data teams, empowering them to transition their companies mindfully from merely using data for convenience to becoming genuinely data-driven entities.
Why an All-in-One Data Solution Is Critical to Business Success
Picture this: You’re working in a company with advanced technology and services at your disposal, yet when it comes to data, the very lifeblood of modern organizations, you find yourself waiting for days just to get your hands on it. And even when you finally receive it, doubts linger about its reliability, thanks to potential issues that may have seeped in during the waiting period. At the same time, you're confronting the concern that your cloud infrastructure spending is spiraling out of control.
But why do we find ourselves in this situation? Shouldn’t data, this vital lifeline, be accessible around the clock? Moreover, shouldn’t it serve as a trustworthy foundation, akin to an insurance policy?
But trust must be earned. If businesses consistently encounter data quality issues, it's unrealistic to expect them to prioritize data over their own instincts. Data issues are often detected late, which contrasts sharply with how we handle software releases, where problems are caught early in the process.
This stark disparity in treatment between data, the cornerstone of many businesses today, and well-established processes like the Software Development Life Cycle (SDLC) gnawed at us relentlessly.
Driven by this frustration, we set out to create something more holistic than just a data observability platform. We imagined a solution that instills confidence in data teams to trust their data without reservations. By ensuring consistent data quality and performance while continuously monitoring spend, we aim to give hardworking data teams the peace of mind that their challenges are being addressed proactively.
This is our story and the start of Revefi. It began out of necessity. We came together to forge a common vision for data excellence.
How It All Started
Back in 2018, Shashank was part of the data infrastructure team at Meta. With multiple exabytes of data, over 1M tables, and ~100K data pipeline and transformation jobs run daily, data quality and consistency were the constant Achilles' heel for all data teams. They spent about 40% of their time detecting, root-causing, and fixing potential data issues. Some of these issues, classified as severity 1 problems in the data infrastructure team, required a large investment of time and resources to recover from. Often, a full recovery remained a distant and unattainable goal.
Meanwhile, at ThoughtSpot, Sanjay was pioneering a system capable of executing ad hoc SQL queries over billions of rows in a matter of seconds. The technical intricacies of scaling such a system were substantial, but one critical question loomed large: when should alerts be triggered? The proliferation of alerts without clear prioritization had started to overwhelm teams, and the resulting cloud costs were a growing concern.
It was at this juncture that both of us realized a common thread running through our experiences— data teams struggled to consistently provide businesses with reliable data. Simultaneously, enterprises everywhere were grappling with the challenge of managing soaring cloud data costs.
From this initial hypothesis, we talked to hundreds of data practitioners and observed the same recurring issues:
● Data teams were caught in a continuous cycle of issue-firefighting
● They faced relentless pressure to deliver results
● Business stakeholders were left feeling frustrated and underserved
● Heads of data were under constant stress
A data leader from a tech company of about 6,000 employees described to us how they had a mere five-person team responsible for maintaining a library of 20,000 manual data quality checks. While this might seem like an outlier, we heard enough similar stories to propel us to address the mounting chaos faced by data teams. We wanted to build not just any solution but a transformative one that could play the role of a supportive copilot in the data journey of an organization and provide a 100x improvement.
Automation: the Bridge Between Data and Decisions
In today’s world, both high-quality data and spending are top of mind for every organization, whether it's for analytics or exploring Generative AI. It seems safe to assume that in this business landscape, data usage would reign supreme. Ideally, every business decision should be underpinned by data-driven insights. Yet, the stark reality today is far removed from this ideal scenario.
It comes back to trust and the cost of this trust. Data teams continue to tackle the challenge of delivering consistent, reliable data to aid critical business decisions. Simultaneously, enterprises, both large and small, find themselves in a battle to control cloud costs, a task made all the more challenging by the unpredictability of expenses that can spiral out of control, particularly when no one is closely monitoring them—think weekends and holidays.
Going back to the Meta days, the data engineering teams operated in a challenging environment, and Shashank and his team received repeated requests from them for better tools and infrastructure to enhance data quality. In response, they developed a system capable of vigilantly monitoring data metrics and alerting teams when anomalies or issues surfaced.
Automation can play a pivotal role in data monitoring and management. Data is unique and different from code, and its characteristics can be observed and harnessed to drive greater efficiency and reliability in the data ecosystem.
This idea powers the automation engine at Revefi today.
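As a rough illustration of the principle that a metric's own observed history can replace a hand-written threshold, here is a minimal sketch. It is not Revefi's implementation: the function name `is_anomalous`, the sample row counts, and the cutoff of 3.5 are all assumptions made for this example, and it uses a simple median-based robust z-score rather than any AI model.

```python
from statistics import median

def is_anomalous(history, latest, cutoff=3.5):
    """Flag `latest` if it deviates strongly from the metric's own history.

    Uses a robust z-score built on the median absolute deviation (MAD),
    so no per-table threshold has to be written by hand.
    `cutoff` is an assumed default, not a value from the article.
    """
    if len(history) < 5:
        return False  # too little history to form a baseline
    center = median(history)
    mad = median(abs(x - center) for x in history)
    if mad == 0:
        # History is perfectly flat: any change at all is unexpected.
        return latest != center
    # 0.6745 rescales MAD so the score is comparable to a standard z-score.
    score = 0.6745 * abs(latest - center) / mad
    return score > cutoff

# Hypothetical daily row counts for one table.
row_counts = [10_120, 9_980, 10_050, 10_200, 9_900, 10_075, 10_010]

print(is_anomalous(row_counts, 10_090))  # a typical day
print(is_anomalous(row_counts, 2_300))   # a sudden drop in volume
```

The same check can run against any metric that can be observed per table, such as freshness lag or query cost, which is what makes it possible to cover far more tables than hand-written checks would.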
Why Is Automated Data Observability So Critical?
All our conversations with real-world practitioners were pointing in the same direction. The challenges we heard from teams were consistent and repetitive, and they proved to us that the need for a dependable and automated solution was immediate.
● Poor adoption rate of manual data quality checks: Traditional manual SQL-based data quality checks have struggled to gain traction. Fewer than 10% of the tables within the data warehouse had the essential checks in place, leaving a vast majority of data unchecked and vulnerable.
● Alert fatigue: The sheer volume of alerts created a significant challenge for data teams, with alerts at risk of being either ignored or disabled, both of which could have adverse consequences.
● Elusive root cause analysis: Identifying the root cause of data issues is time-consuming. The absence of uniform quality standards across the data ecosystem, reliance on manual processes, and fragmented responsibilities across different teams further add to the challenge.
● Escalating costs: Even with only a limited number of data quality checks in place, business-oriented cloud data is dynamic and actively used by various applications and employees, which exacerbates cost concerns.
● Unreliable data quality and delivery: Data teams face persistent trust issues around data quality and delivery. Unlike software development, where issues are caught early in the process, data-related problems often emerge belatedly.
These issues make it crystal clear: automated data observability is not just a choice; it's an immediate imperative for all organizations. In today's data-centric world, where data is the driving force behind a business, we find ourselves at a pivotal juncture, ready to transform how we supervise, handle, and safeguard the integrity of our invaluable data resources.
If you are a data practitioner facing one or more of the challenges we’ve called out above and are in need of a solution, you’ve come to the right place. Revefi has got your back!
Our Vision: Pioneering the Future of Data Operations
At our core, we aim to elevate data teams by empowering them with the tools to swiftly address pressing challenges and to effectively drive business outcomes using data. Put informally, our vision is: "Delivering precision (right data, right time, optimal spend) with delight."
While our initial focus was on tackling issues like cost efficiency, data freshness, and reliability, we quickly understood that the scope of the challenge extended far beyond these facets. We began to ask deeper questions: is data being utilized to its fullest potential? What about its performance? This led us to identify what we now refer to as the data operations cloud quadrant: data quality, spend, usage, and performance.
The Revefi Data Operations Cloud automatically connects data quality, performance, spend, and usage, and creates a baseline using AI without requiring any manual thresholds or configuration. Revefi is constantly on the lookout and alerts users if and where there is unexpected behavior related to their data. The system also automatically ranks issues based on magnitude and usage and generates details about the root cause, to help users understand situations and take immediate action.
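To illustrate what ranking issues by magnitude and usage could look like, here is a minimal sketch. The scoring rule (deviation size multiplied by query volume), the `Issue` fields, and the table names are assumptions chosen for this example; the actual ranking logic inside Revefi is not described here.

```python
from dataclasses import dataclass

@dataclass
class Issue:
    table: str
    magnitude: float      # relative deviation from baseline, e.g. 0.9 = 90% off
    weekly_queries: int   # how often the affected table is read

def priority(issue):
    # Assumed scoring rule: weight the size of the deviation
    # by how heavily the affected table is used.
    return issue.magnitude * issue.weekly_queries

def rank_issues(issues):
    """Return issues ordered from most to least urgent."""
    return sorted(issues, key=priority, reverse=True)

# Hypothetical open issues.
issues = [
    Issue("staging.tmp_events", magnitude=0.90, weekly_queries=3),
    Issue("finance.revenue_daily", magnitude=0.25, weekly_queries=1_200),
    Issue("marketing.leads", magnitude=0.40, weekly_queries=150),
]

for issue in rank_issues(issues):
    print(issue.table, round(priority(issue), 1))
```

In this sketch a moderate deviation on a heavily queried revenue table outranks a large deviation on a rarely read staging table, which is the kind of prioritization that helps reduce alert fatigue.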
With access to the right data, at the right time, and at the right cost, Revefi helps you make critical business decisions faster and reduce operational disruptions. It provides real-time insights that empower organizations to make data-driven decisions with confidence.
The Path Forward
In an age where data reigns supreme, the move from manual verifications to automated checks, and from mere cost control to unparalleled data reliability, opens a horizon rich in possibilities. We built Revefi purposefully to transform data operations; it is designed to break down the silos of observability, quality, performance, usage, and spend for data teams. As we embark on this journey, our vision is to be at the forefront of this transformation. Revefi Data Operations Cloud is a powerful tool that will enable businesses to advance not only their data but also their AI and generative AI initiatives to unprecedented levels of success.
Sanjay Agrawal and Shashank Gupta