Confluent S-1 & IPO Teardown
Confluent $CFLT is one of the most successful enterprise-focused open-source companies, following the path blazed in recent years by Elastic and MongoDB. We summarize the S-1 filing and walk through Confluent’s story, key metrics, product, market, competition, and valuation.
- Confluent is the leader in event streaming, a space that allows organizations to process changes in data immediately. The company was founded by the creators of Apache Kafka, the open-source framework that allows organizations to ingest real-time data.
- Kafka itself is a large part of Confluent’s moat as the framework is one of the most successful open-source projects with a large and devoted developer community. Confluent cites more than 60,000 meet-up members across over 200 global meetup groups.
- Expert calls indicate that DevOps teams have an inherent bias towards Confluent, due to Confluent employees being many of the top contributors to the Kafka community.
- $272M implied ARR in Q1 ‘21 with 55% YoY growth. Confluent is one of the fastest-growing developer-oriented companies, reaching $100M ARR in under 5 years. Confluent is on a similar growth trajectory to other open-source companies like Elastic and MongoDB when normalizing for time elapsed since crossing the $100M ARR threshold.
- Confluent Cloud is the main growth driver compared to Confluent Platform. Confluent Cloud is growing 124% YoY while Confluent Platform is growing at a much slower 43% YoY.
- 2,540 customers with ~$107k average customer spend; approximately 27% of Fortune 500 companies use Confluent, while an estimated 70%+ use Kafka.
- Net revenue retention is decreasing and is lower than that of other enterprise software vendors with usage-based pricing like Snowflake. Dollar-based net retention was 125% in 2020, substantially lower than Confluent’s industry-leading 177% in 2018 and 134% in 2019.
“Today the data architecture of a company is as important in the company’s operations as the physical real estate, org chart, or any other blueprint for the business. This is the underpinning of a modern digital customer experience, and the key to harnessing software to drive intelligent, efficient operations. Companies that get this right will be the leaders in their industries in the decades ahead. We know that there is a foundational missing layer at the heart of data infrastructure that allows companies to harness data as it occurs—data in motion—and that this is critical in the next evolution of the architecture of companies. We think this new stack will evolve to be the central nervous system of every company and will be the single most strategic layer in the modern data world.” - Co-founder & CEO Jay Kreps
In the late 2000s, while working at LinkedIn, Jay Kreps noticed that there were hundreds of different technologies for storing data but no efficient way to unify different applications and data stores into a coherent system powerful enough to ingest data in real time. He searched thoroughly for an existing solution, assuming that digital-first enterprises had already created a product that addressed the needs of the ongoing data boom. Realizing that no solution existed, Jay Kreps, Jun Rao, and Neha Narkhede, the trio who later went on to found Confluent, built an internal event streaming platform called Kafka to revamp LinkedIn’s existing data infrastructure.
After scaling Kafka to handle data streams with billions of messages at LinkedIn, the founders extended the software to handle thousands of use cases and eventually open-sourced it to the Apache Software Foundation in 2011. The founders later recognized they could build a business around Kafka and launched Confluent in 2014 as a fully managed Kafka service and enterprise-ready stream processing platform. The trio received strong support from the developer community as well as from higher-ups at LinkedIn, who ultimately invested in the startup. Kafka and Confluent have been closely linked ever since, with Confluent developers being the main contributors to the Kafka open-source project.
Confluent quickly received backing from Benchmark, Index Ventures, and other venture capital firms, which cited the success of other enterprise-grade open-source businesses like Red Hat, MongoDB, and MuleSoft.
Today, Confluent is the leader in, and has become synonymous with, the event streaming and stream processing space, with the stated mission to “set data in motion.” The company positions itself as “the central nervous system of an organization, allowing data to be captured and processed as it is generated around the whole organization, enabling organizations to react intelligently in real-time.” Confluent believes that this unification of data in motion and data at rest represents the natural progression of IT spend from batch processing on traditional databases to real-time data infrastructure.
What is Apache Kafka?
To best understand the significance and utility of Apache Kafka, we need to first understand the old-school API-based and monolithic database architecture used in the past.
In traditional applications, communication between services that produce and consume data operates on an “ask, don’t tell” basis. Data sources (producers) and applications (consumers) are directly linked with custom APIs (Application Programming Interfaces) to read and write messages (single units of data) on demand. With this architecture, Service A contacts other relevant services for their current state every time Service A is needed. This is straightforward and easy to visualize for applications with few producers and consumers, but for organizations that are rapidly building new services, it is tedious and complex to directly connect each data stream with each consumer. For example, if there are 5 data sources and 8 applications utilizing that data, then 40 API integrations need to be made to link each producer with each consumer. Doing so requires coordination with various teams and a deep understanding of the idiosyncrasies of each access API. The problems of this API-based architecture are further exacerbated by many producers and consumers having different implicitly defined schemas: the names and formats of fields for one producer can differ from the same fields for another producer. Tackling this requires data normalization between producers and consumers, which is time-consuming and difficult when dealing with legacy producer or consumer systems.
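The integration math above is worth making concrete. A minimal sketch (the function names here are illustrative, not from any real system) shows why point-to-point wiring grows multiplicatively while a broker in the middle grows additively:

```python
def point_to_point_integrations(producers: int, consumers: int) -> int:
    # Every producer needs a custom API link to every consumer: N x M.
    return producers * consumers

def brokered_integrations(producers: int, consumers: int) -> int:
    # With an intermediary layer, each service needs only one pipeline: N + M.
    return producers + consumers

# The example from the text: 5 data sources feeding 8 applications.
print(point_to_point_integrations(5, 8))  # 40 custom API integrations
print(brokered_integrations(5, 8))        # 13 pipelines via a broker
```

Adding a ninth application to the point-to-point system means five new integrations; with a broker it means one.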
Historically, databases that store changes in the state of a service or object have been monolithic stores of truth for many applications and systems. Databases are essential components of any application stack, but as they become more populated with data they become difficult to work with. Deeply altering the schema of a database is challenging, and databases aren’t designed for streaming events (changes in the state of an object). Developers create snapshots of data in its current state within a database, but this data is immediately outdated as a new event occurs. This is because databases as the single source of truth rely on batch processes that collect, then save and transform, and finally query the data. Batch processing creates problems for time-sensitive applications like financial transactions that need to integrate many sources of data. With databases, accessing a snapshot of data and ingesting live data from feeds are two separate modes of operation.
Log-based architectures solve this problem by serving as an event-driven source of truth that removes the distinction between taking a snapshot of data and ingesting live data. A log stores an ordered list of the events themselves, whereas a database stores only the results of events. This is an important distinction with powerful effects. With a log, you can create an unlimited number of custom data stores, allowing each service to have its own materialized view: a database that contains only the fields required for that particular service. The log-based architecture also makes schema changes easy, as a new data store can be created simply by consuming events from the log from any point in time up to another specified point or to real time. In effect, utilizing a log massively simplifies the role of databases in a systems architecture by removing the need for a monolithic database and allowing every service to create its own custom data store.
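A toy sketch of that idea, with illustrative event and field names (not Kafka’s actual API): two services replay the same append-only log, and each derives only the materialized view it needs.

```python
# An append-only log of events; services derive their own views from it.
log = [
    {"event": "order_created", "order_id": 1, "amount": 50, "email": "a@x.com"},
    {"event": "order_created", "order_id": 2, "amount": 75, "email": "b@x.com"},
    {"event": "order_shipped", "order_id": 1},
]

# Billing service: a materialized view of revenue per order.
billing_view = {
    e["order_id"]: e["amount"] for e in log if e["event"] == "order_created"
}

# Notification service: a different view built by replaying the same log.
emails_to_notify = [e["email"] for e in log if e["event"] == "order_created"]

print(billing_view)      # {1: 50, 2: 75}
print(emails_to_notify)  # ['a@x.com', 'b@x.com']
```

A schema change for one service means replaying the log into a new view; no shared monolithic database has to be altered.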
This log-based event-driven architecture makes Apache Kafka a very powerful open-source event streaming platform that at a high level decouples data consumers and producers. Rather than dealing with API idiosyncrasies and building dozens of data pipelines when a new producer or consumer is introduced, Kafka acts as an intermediary layer that puts “data in motion”. With Kafka, each producer and consumer isn’t directly connected, which means the only integration required with a new consumer or producer is a single pipeline to Kafka. This enables Kafka to act as the “unified data highway” for modern organizations with hundreds of applications and data sources that require real-time data processing for analytics or other real-time services.
Services from AWS and Google, along with open-source tools like RabbitMQ, are sometimes seen as analogous to Kafka, as all are often used as message brokers: software that acts as an intermediary between consumers and producers within a system. While Kafka is often used as a very high-throughput message broker for big data applications, it also enables new workflows where a log-based architecture is pivotal. These workflows must use logs, as they require the intermediary layer to retain all events indefinitely and keep every event ordered sequentially.
This can be summarized with Kafka's 3 Pillars of Functionality:
- Publish & Subscribe: Ability to write and read streams of events
- Storage: Store streams of events durably and redundantly
- Process: Process streams of events in real-time or retrospectively (requires the use of logs as a central feature)
Confluent Product & Value Proposition
At a high level, if Apache Kafka is the central data highway in an infrastructure stack, then Confluent acts as the on-ramp and off-ramp to the highway. When an organization wants to “set data in motion” it may download the Apache Kafka source code or the Confluent Community License and try to get started immediately. It soon realizes that in order to build a comprehensive event streaming platform it needs to build custom solutions that connect its data producers and consumers to Kafka while simultaneously building tools that provide a birds-eye view of the whole infrastructure stack. While this may not be a hurdle for large engineering-centric companies like Twitter, which uses Kafka, financial or industrial firms will struggle immensely with the operational and programming complexities of creating such a system.
Confluent’s products catalyze Kafka’s extremely powerful architecture and provide tremendous value for organizations without extensive technical expertise. Confluent’s products abstract the operational and programming complexities of building proprietary solutions by simplifying connecting data sources to Kafka, building streaming applications, as well as securing, monitoring, and managing Kafka infrastructure. This allows organizations to focus on creating applications that benefit from data in motion rather than spending capital on Kafka operations.
Confluent has two main product offerings for enterprises:
- Confluent Cloud: fully-managed cloud-native SaaS offering for Kafka
- Confluent Platform: self-managed software offering that is cloud-agnostic and multicloud, and can be deployed on-premise, private cloud, or in the public cloud
Both products offer solutions that abstract the complexities of managing and deploying Kafka in your infrastructure stack. Confluent's products build upon the Community License, offering greater simplicity and security. These solutions include:
- Confluent Control Center: GUI-based dashboard for managing and monitoring Kafka that provides a no-code simple way of building production-ready data pipelines. It allows simple management of Kafka Connect, the tool for scalably and reliably streaming data between Kafka and other data systems. The system monitors end-to-end efficiency of message delivery between producers and consumers.
- Confluent Connectors: Leverage the Kafka Connect API to connect Kafka to other systems such as databases, key-value stores, search indexes, and file systems without writing any code. Connectors are available to both community-licensed and commercial customers: community-licensed users have access to 99 connectors, while commercial customers have access to an additional 108 connectors that are more complex than the open-source ones.
- Confluent Replicator: Manages replication of data and configuration between Kafka maintained in multiple data centers. Provides low-latency architecture optimization, centralized analytics, and on-prem and cloud data synchronization.
- ksqlDB: Easy-to-use and powerful interactive SQL interface for stream processing on Kafka, without the need to write code in Java or another programming language. The SQL engine allows for scalable and fault-tolerant data filtering, transformations, aggregations, joins, windowing, and sessionization in real time. ksqlDB is available to community users, but commercial users gain added enterprise-grade functionality.
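To make the windowing idea concrete, here is a rough conceptual analogue in plain Python (not ksqlDB syntax, and the event fields are invented for illustration): counting events per key in fixed 60-second tumbling windows, which ksqlDB expresses declaratively in a single SQL statement.

```python
from collections import Counter

# Each event carries a timestamp (in seconds) and a key to group by.
events = [
    {"ts": 5,  "user": "alice"},
    {"ts": 30, "user": "alice"},
    {"ts": 70, "user": "bob"},
    {"ts": 95, "user": "alice"},
]

# Tumbling windows: bucket each event into [0, 60), [60, 120), ...
windows = Counter()
for e in events:
    window_start = (e["ts"] // 60) * 60
    windows[(window_start, e["user"])] += 1

print(dict(windows))
# {(0, 'alice'): 2, (60, 'bob'): 1, (60, 'alice'): 1}
```

The appeal of ksqlDB is that this kind of stateful stream logic runs continuously against Kafka topics without hand-rolling the bucketing and state management shown here.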
Other auxiliary solutions include Confluent Security Plugins, Confluent for Kubernetes, Self-Balancing Clusters, Confluent Cluster Linking, Confluent Auto Data Balancer, Tiered Storage, Confluent JMS Client, and Confluent MQTT Proxy. These solutions and many others provide greater flexibility, efficiency, and security when running Kafka.
You’re the Chief Information Security Officer of Intel, one of the largest and most recognizable companies, whose semiconductor technology powers hundreds of millions of devices globally. In this role, you oversee Intel’s cybersecurity systems that detect and respond to increasingly advanced cyber threats that threaten computing environments as well as a business’s ability to grow.
Your team is tasked with keeping Intel secure while managing legal compliance worldwide. This requires a large number of people, applications, databases, and analytics capabilities that too often are not well integrated, leading to silos between teams that need specific information. When unaddressed, these silos lead to poor data integration and data inconsistencies that cause IT teams to spend as much time maintaining legacy systems as combating cyber threats.
To reduce siloing, you invest in a centralized, scalable cybersecurity platform (Cyber Intelligence Platform) that utilizes Kafka to simplify connecting data sources while actively filtering data. With Kafka and the Confluent Platform, you are now able to ingest multiple terabytes of data per day from many sources and perform advanced analytics, all in real time. To read more about how Intel utilizes Confluent to protect its enterprise, see their white paper.
“Kafka helps us produce contextually rich data for both IT and our business units. Kafka also enables us to deploy more advanced techniques in-stream, such as machine learning models that analyze data and produce new insights. This helps us reduce mean time to detect and respond; it also helps decrease the need for human touch. Kafka technology, combined with Confluent’s enterprise features and high-performance Intel architecture, support our mission to help make it safe for Intel to go fast.” - Chief Information Security Officer Brent Conran
Why Confluent wins
Best-in-class and category-defining product created by the founders of Apache Kafka that enables effective TAM penetration and immediate ROI
- Confluent developers remain the main contributors to the Kafka open-source project with the Confluent Community License being a massively popular distribution for developers. Because of this, CFLT is seen as a thought leader and at the forefront of technological innovation which makes Confluent almost synonymous with the underlying Kafka technology. This means for users interested in deploying Kafka, Confluent is the first commercial and enterprise-ready product that comes to mind. Additionally, Confluent’s technical expertise draws in an extensive developer community that other managed Kafka services or platform competitors lack. Various expert calls indicate that developers and DevOps teams have an inherent bias towards Confluent, due to Confluent employees being many of the top contributors to the Kafka community. This itself creates a flywheel effect as increased developer adoption of Confluent influences other developers to download Confluent’s products and further expand the developer community. Developers are attracted to frameworks with large developer communities and extensive community support systems and Confluent’s Community License distribution of Kafka has one of the largest backings of any open-source tool.
- Expert calls note the ease of adopting Confluent products. Developers are able to get the latest version of Kafka up and running extremely quickly, not worry about version control and updating, and benefit from Confluent's expertise in fixing bugs for all of its customers, with very little coding required. The Confluent Platform allows customers to plug in data streams, consumers, and producers without writing their own connectors and without building their own dashboard view. This saves substantial development time and allows customers to get their “data in motion” weeks faster than building their own infrastructure would. This drives immediate ROI for developers and allows organizations to dramatically shorten their product development schedule.
Confluent benefits from large secular tailwinds that create structural changes within a company’s data infrastructure stack
- Kafka represents the natural progression away from batch processing to real-time data ingestion and analytics. This makes Kafka itself a large part of Confluent’s moat as the framework is one of the most successful open-source projects with a large and devoted developer community. Confluent cites more than 60,000 meet-up members across over 200 global meetup groups. Additionally, Kafka is estimated to have been used by over 70%+ of Fortune 500 companies.
- With the Confluent Platform or Confluent Cloud, the technical hurdles of deploying Kafka continue to be reduced allowing companies without deep technical expertise to leverage real-time data to power their digital revolution. This secular theme of “data in motion” is becoming more and more valuable for organizations as the rapid increase in IoT devices continues to catalyze the rapid increase in data collection. Confluent references that “according to IDC, by 2025 there will be 55.7 billion connected devices worldwide, 75% of which will be connected to an IoT platform. Data generated from connected IoT devices are projected to be 73.1 zettabytes by 2025, almost four times the 18.3 zettabytes generated in 2019. To capture this massive volume of real-time data and build solutions that deliver transformative impact, enterprises need a new foundational data infrastructure designed for data in motion.”
- Management notes that Confluent will continue to reduce barriers to entry and use its positioning to create use-case-specific platform adjacencies that move up the infrastructure stack. Management is particularly bullish on ksqlDB and other initiatives as large expansion opportunities. Expanding the querying and analytics abilities of ksqlDB can be expected, as it is currently not as powerful as traditional relational databases in its ability to manipulate data for advanced analytics. A Business Intelligence offering provides another expansion opportunity for the Confluent platform, as current BI offerings for real-time event streaming are not nearly as robust or established as BI tools for data warehouses. Another key opportunity highlighted by expert calls is Confluent Cloud’s potential as a middleware service for ingesting data between different vendors and companies. Similar to how Snowflake allows other parties to plug into the Snowflake platform and pay their own compute costs when accessing a subset of data, Confluent can offer a service that allows different parties to work with Kafka without needing to set up data replication. Overall, there is significant potential beyond the core data infrastructure, and Confluent, as the central nervous system of an organization’s data, is best positioned to build it.
Product stickiness, data flywheel, and multi-cloud/on-prem flexibility
- Confluent’s offerings sit at the center of an organization's data flow, making it an extremely sticky enterprise product. As a company increases its use of commercial connectors, it becomes locked into the system because the time and resources required to build custom connectors are prohibitive. Likewise, the effort required to replicate the Confluent Control Center and other enterprise-focused features is immense, making Confluent a particularly sticky product for non-tech companies.
- Confluent benefits from a data flywheel that results in Confluent’s platform acting as the central nervous system for the entire organization over time. Management notes that customers start with an initial limited use case and then expand to other lines of business, divisions, and geographies. As more uses within an organization are adopted, more applications and systems become connected, which leads to more data being processed on the Confluent platform. These data streams then attract more applications which in turn results in more data thus fueling the data flywheel. This flywheel effect allows Confluent to grow with its customers and land deals that are much bigger than the initial deal.
- Additionally, customers increasingly want their event streaming platform to handle messages for a multitude of data sources housed in various cloud and on-prem environments. Large cloud competitors like AWS and GCP offer message broker services similar to Kafka, but their services require ecosystem lock-in, while Confluent is the only provider that is completely agnostic of where data is held and where data is going. In the event streaming space, Confluent will be the only “Switzerland-like” company, as there are no other data-agnostic competitors with the Kafka expertise to gain significant market share. A report from Bain suggests that two out of three CIOs plan to use multiple public cloud infrastructure providers to avoid vendor lock-in and maintain control. Gartner further indicates in its latest cloud adoption survey that more than 75% of organizations are using a multi-cloud strategy. Confluent is poised to benefit the most from this trend.
Subscription (88% of revenue for Q1 ’21): Subscription revenue consists of revenue from term-based licenses for Confluent Platform including PCS (post-contract support, maintenance, and upgrades) and the SaaS-based Confluent Cloud. The company notes that revenue from PCS represents a substantial majority of revenue from term-based license subscriptions with Confluent.
Confluent Platform prices are based on the number of nodes (number of machines that are running the platform) and the level of service that is provided. Levels of service include silver, gold, and platinum with silver for development level support, gold for 24/7 operational support, and platinum for a dedicated technical account manager that provides architecture guidance and strategy.
Confluent Cloud is priced based on the cloud provider used (AWS, Azure, GCP), the server region, the cluster type, the amount of storage, and the total quantity of partitions (divisions in each log). Confluent offers 3 cluster types: basic, standard, and dedicated with differences outlined below.
Services (12% of revenue for Q1 ’21): Services revenue consists of revenue from professional services and education services, which are generally sold on a time-and-materials basis. Confluent takes advantage of its vast technical expertise in Kafka to host professional services that help customers accelerate the development of data-in-motion use cases and educational services that include instructor-led courses, self-paced course subscriptions, and enterprise course subscriptions.
Confluent utilizes two main GTM channels: direct sales and channel partners.
- Direct Sales & SMB Focus: In Q1 ’21, 78% of Confluent's revenue came from smaller customers with <$100k ARR, while 20% of revenue came from customers with >$100k ARR. Merely 2% of revenue came from customers with >$1M in ARR. The percentage of revenue from smaller customers with <$100k ARR has decreased continuously since FY19 due to new customer lands. Confluent’s sales team is laser-focused on the end-to-end customer journey, specifically the data-in-motion adoption process. The sales team helps customers progress to production use cases and then continues to drive expansion by aiding in a complete platform transition, moving customers from individual disconnected projects to a cross-enterprise connected data platform.
- Channel partners: Confluent relies on a plethora of technology partners including cloud platform providers, database vendors, technology integrators, and consultants. Confluent has strong partnerships with firms like Accenture, Altair, IBM, Infosys, etc.
Additional observations on Confluent's GTM:
- Developer-centric bottom-up sales & product-led growth: Confluent was built to be the central nervous system for modern organizations. The Confluent Community License spreads virally among developers because it is free to use. Developers interested in adopting event streaming can head to the Confluent website, download the Confluent Community License, and start integrating their existing producers and consumers. Once these developers are ready to scale up their infrastructure for enterprise applications, they can seamlessly upgrade their Confluent support level and deploy their applications. As developers increasingly become decision-makers in companies, this developer-centric model will continue to drive customer lands in the future. Large enterprise software companies like Elastic, Docker, Databricks, and MongoDB have proven that this developer-focused, open-source model works.
- Land and expand strategy is a large part of GTM: As noted in a previous section, Confluent implements a land and expand strategy that relies on a single team within an organization adopting event streaming, followed by the entire organization transitioning to it. As more data is processed and more applications are built, the quantity of Kafka nodes increases, which in turn accelerates revenue. The effectiveness of this strategy is borne out by the cohort and contribution analysis done for the 2018 cohort. Confluent defines contribution margin as the subscription revenue from the customer cohort less the cost of subscription revenue and associated S&M expenses; contribution margin percentage is the contribution margin divided by the subscription revenue associated with that cohort in a given period. The graphic shows that Confluent’s base of ARR for each cohort becomes very profitable over time, which is expected given it’s a SaaS/software company.
- Massive spend on S&M relative to revenue: S&M spend represents 77% of revenue in FY18, 70% of revenue in FY19, and 76% of revenue in Q1 ’21. Confluent remains in its rapid growth phase, with the company hiring hundreds of sales development representatives.
- Confluent customers include 136 Fortune 500 companies that contributed approximately 35% of revenue for Q1 ‘21. In comparison, around 80% of Fortune 100 companies and an estimated 70% of Fortune 500 companies use Kafka. Confluent offers professional and educational services for Kafka, and the gap between Kafka adoption and paid Confluent adoption means the company still has a long runway for customer adoption in Fortune 500 organizations.
- Confluent has a healthy mix of domestic and international revenue. International revenue as a percentage of total revenue has gradually increased as management expands international channel partners.
Confluent’s market spans on-premise, cloud, and multi-cloud environments, unlike competitors that are mostly focused on the AWS, Azure, and GCP clouds. The company targets four core Gartner-defined market segments: Application Infrastructure & Middleware; Database Management Systems; Data Integration Tools and Data Quality Tools; and Analytics and Business Intelligence. Confluent notes that the aggregate of these four market segments is approximately $149B, with $50B directly addressable by Confluent.
The breakdown of this $50B Confluent can service is below:
- $31B in Application Infrastructure & Middleware (excluding Full Life Cycle API Management, BPM Suites, TPM, RPA, and DXPs)
- $7B in Database Management Systems (excluding Pre-relational-era DBMS)
- $7B in Analytics and Business Intelligence (excluding Traditional BI Platforms)
- $4B in Data Integration Tools and Data Quality Tools (excluding other Data Integration Software)
Gartner expects the market to grow at a 22% CAGR through 2024 to reach $91B.
Confluent Cloud also enjoys the decades-long secular trend of increased IT spend on cloud services. IDC notes that the global public cloud services market is expected to grow from $292B in 2020 to $628B in 2024 and that only 17% of infrastructure spend has been in public cloud services.
Global hyperscale cloud providers: AWS, Azure, and GCP are some of Confluent’s biggest competitors. These providers offer their own event streaming services, with AWS even going the unusual route of offering a product called Amazon MSK (Managed Streaming for Apache Kafka). Compared to Confluent Cloud, MSK is rather primitive, and its existence validates developer demand for Apache Kafka over AWS’s own proprietary event streaming service, Amazon Kinesis. MSK does not have the breadth of proprietary connectors Confluent offers, which forces developers to rely on open-source connectors that aren’t enterprise-grade. Expert calls even suggest that connectors to other AWS services like S3 and DynamoDB aren’t as developed. Like Confluent Cloud, MSK makes it easy to get data into the Kafka pipeline, but it doesn’t offer the auxiliary connectors and analytics tools like ksqlDB that enterprises need to fully set their data in motion. Companies interested in managed Kafka on AWS should use Confluent Cloud, as it is a more comprehensive offering.
For shops that are heavily reliant on AWS, Kinesis is a compelling offering, but Confluent offers greater operational flexibility and the ability to work on-prem or across cloud providers. Kinesis includes auto-scaling capabilities as well as Kinesis Analytics (a ksqlDB competitor) but doesn’t offer the same performance in message throughput and write speeds. Azure and GCP don’t offer a complete log-based event streaming solution with the same functionality as Confluent: Azure has Event Hubs and GCP has Dataflow, but these offerings are built on different architectures that are not log-based.
Azure and GCP also offer their own message broker systems, but developers that want to implement event streaming should utilize Confluent Cloud on Azure or GCP, as that service is well integrated with the rest of the Azure and GCP platforms.
Queue-based message brokers: Azure Service Bus and GCP Pub/Sub are examples of queue-based message brokers that are sometimes used instead of Kafka depending on the organization’s workflow. Other examples of queue-based message brokers include the popular open-source RabbitMQ as well as ActiveMQ, MSMQ, AWS SQS, JMQ. Queue-based systems are fundamentally different from log-based systems and thus have different use cases at scale.
Queues are transient in nature: the very act of reading a message removes it from the queue. Different applications that need the same data therefore can’t share a queue, as they would compete to consume its messages. Serving multiple applications from queues requires a publish-subscribe (pub/sub) pattern, in which a routing and decoupling mechanism separates publishing from immediate consumption. Pub/sub systems like the message brokers mentioned above give each application its own queue, making queues serviceable for multiple applications.
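The destructive-read problem, and the pub/sub fix, can be sketched in a few lines of plain Python. This is a toy simulation, not any real broker's API; the app names and `publish` helper are illustrative only:

```python
from collections import deque

# Destructive reads: two apps sharing one queue compete for messages.
shared_queue = deque(["order-1", "order-2", "order-3"])

billing_got = shared_queue.popleft()   # billing consumes "order-1"
shipping_got = shared_queue.popleft()  # shipping gets "order-2" and never sees "order-1"

# The pub/sub fix: the broker fans each message out to a dedicated
# queue per subscriber, so every application sees the full stream.
subscribers = {"billing": deque(), "shipping": deque()}

def publish(message):
    for q in subscribers.values():
        q.append(message)

for m in ["order-1", "order-2"]:
    publish(m)

# Both apps now receive every message independently.
assert list(subscribers["billing"]) == list(subscribers["shipping"]) == ["order-1", "order-2"]
```

Real brokers like RabbitMQ implement this decoupling with exchanges and per-consumer queues rather than in-process data structures, but the semantics are the same.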
As explained before, logs are persistent (reading doesn’t remove data) and are a shared resource for the entire system, so multiple consumers can read from the same log at the same time from different positions. Logs are also less centralized in nature: you can’t mix different events according to the needs of each application, so log-based infrastructure uses many logs, each defined for a particular type of event. Queues are generally perceived as more customizable and easier to evolve for each application. Log-based systems require more development time but benefit greatly from persistent data, which makes event sourcing, data replication for other applications, and schema changes far easier. In general, using logs greatly simplifies the role of databases and enables the paradigm shift to microservice architectures.
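The log semantics described above can also be sketched in plain Python. Again, this is a toy model rather than Kafka's actual API (`append`, `poll`, and the `Consumer` class are made up for illustration); the point is that reads never remove data and each consumer tracks its own offset:

```python
# Append-only log shared by all consumers; nothing is ever removed by reading.
log = []

def append(event):
    log.append(event)

class Consumer:
    """Each consumer keeps its own position (offset) in the shared log."""
    def __init__(self):
        self.offset = 0

    def poll(self):
        events = log[self.offset:]   # read everything since our last position
        self.offset = len(log)       # advance our own offset only
        return events

append("page_view")
append("purchase")

analytics, audit = Consumer(), Consumer()
assert analytics.poll() == ["page_view", "purchase"]

append("refund")
# A late-starting reader still sees the full history from offset 0,
# because consuming never deleted anything.
assert audit.poll() == ["page_view", "purchase", "refund"]
assert analytics.poll() == ["refund"]
```

This independence of offsets is what lets a new application replay the entire event history for event sourcing or replication, which a queue cannot offer once its messages have been consumed.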
Additionally, log-based architectures like Kafka scale much better than queue-based systems when routing large amounts of data. This makes Kafka a better choice for many forward-looking enterprises, as the amount of data collected by applications will continue to skyrocket. To put lots of data in motion, Kafka is the better solution.
Managed Kafka services: Besides AWS, providers like IBM, Aiven, XenonStack, and Instaclustr offer managed Kafka services. These services aren’t as complete as Confluent Cloud, and as mentioned earlier, DevOps teams naturally gravitate toward Confluent because of its sole focus on, and intertwined history with, Kafka. For companies that want managed Kafka, Confluent Cloud has the most complete offering and is a no-brainer if it fits their IT budgets.
Alternative event streaming frameworks: Confluent’s largest competitive advantage is that it is synonymous with Kafka and its team has the most experience managing and creating architectures around Kafka. This expertise in Kafka also makes Confluent’s business entirely reliant on Kafka's enduring popularity. However, Kafka is not the only open-source framework that handles event streaming and high throughput data ingestion.
- Apache Pulsar: Pulsar is an event streaming framework open-sourced in 2016 that offers many of the same features and benefits as Kafka and is also Kafka-compatible. Pulsar acts as a hybrid between a traditional queuing system and a high-throughput event streaming platform like Kafka. It is gaining popularity and is the biggest challenger to Kafka’s dominance in event streaming. StreamNative positions itself as a Confluent-like company for Pulsar, stating that it was founded by the original developers of Apache Pulsar.
- Older frameworks like Apache Flink and Apache Spark have also been adapted for event streaming. Databricks offers a Spark-based streaming platform with capabilities similar to Kafka’s.
Other competitors: Vectorized, a startup backed by Lightspeed Venture Partners and Google Ventures, has started to gain traction as a Kafka replacement that is backward compatible with Kafka. Vectorized’s product Redpanda claims to be more efficient and better performing with less operational complexity than Kafka.
Business Performance & Benchmarks
Financial Metrics Summary:
- $272M implied ARR in Q1 ‘21 with 55% YoY growth. 88% subscription revenue and 12% service revenue in Q1 ‘21.
- $237M FY20 revenue with 58% YoY growth
- 70% non-GAAP gross margin with 78% non-GAAP subscription margin and 17% non-GAAP service margin
- -28% FCF margin in Q1 ‘21 up from -64% in Q1 ‘20
- 2,540 customers with ~107k average customer spend
- 117% NDR in Q1 ‘21 down from 130% in Q1 ‘20
- 0.63 LTM Magic Number
- 27.59-month LTM Payback Period
Benchmarks & Key Takeaways:
- Confluent is one of the fastest-growing developer-oriented companies. The company surpassed the $100M ARR mark in under 5 years and went from $65M in revenue in 2018 to $237M in 2020. While currently smaller than other open-source tools like Elastic and MongoDB on an ARR basis, Confluent is on a similar growth trajectory when normalizing for time elapsed since crossing the $100M ARR threshold.
- Net new ARR appears to be accelerating but is cyclical. Confluent reported negative net new ARR in Q1 of both 2020 and 2021, with the Q1 ’21 decline significantly smaller than the Q1 ’20 drop.
- Customer retention is decreasing and is lower than that of other enterprise software vendors. Dollar-based net retention was 125% in 2020, substantially lower than Confluent’s industry-leading 177% in 2018 and 134% in 2019. This is slightly worrisome because Confluent prices similarly to Snowflake, whose usage-based model leads the industry with 168% NDR. Confluent’s retention also trails Elastic and MongoDB, which have greater ARR. The decline can be attributed to existing customers becoming a larger portion of overall ARR, large initial deal sizes, and the impact of customers transitioning to Confluent Cloud.
- Confluent Cloud is the main growth driver compared to Confluent Platform. Confluent Cloud is growing 124% YoY while Confluent Platform is growing at a much slower 43% YoY. Confluent Cloud’s share of subscription revenue has increased each year and currently stands at 20%. Management is clearly hoping that Confluent Cloud will experience the same meteoric rise as MongoDB’s Atlas database service, which was first released in 2016 and accounts for 46% of MongoDB’s revenue in 2021.
- It is important to note that Confluent is entirely reliant on Kafka while other open-source companies have expanded their offerings to multiple open source products. For example, Databricks became popular due to Apache Spark but has gone on to launch multiple other open source products like Delta Lake and MLflow.
Confluent raised a $250M series E in April 2020 at a $4.5B post-money valuation. Accounting for cash on the balance sheet, the EV/Run Rate multiple is currently at 32.2x which makes Confluent one of the priciest software/SaaS names in the developer and data analytics space. Comparing Confluent to the greater high-growth SaaS landscape shows a fair valuation when accounting for ARR growth in relation to EV/Run Rate multiples.
Confluent and its founding team have rapidly commercialized event streaming in the last 7 years and have created a company so synonymous with its underlying architecture, Kafka, that Confluent has become the definitive leader in setting a company's data in motion. The company’s success comes from its 1) ubiquity and breadth of offerings in abstracting Kafka’s complexity; 2) ability to land and expand within an organization; 3) native multi-cloud offering. This allows Confluent to successfully penetrate the massive event streaming TAM and offer better products compared to cloud providers like AWS, Azure, and GCP. Looking forward, I am excited to see how Confluent moves up the data infrastructure stack and addresses the needs of modern applications and development teams.
Congratulations to Jay Kreps, Jun Rao, Neha Narkhede, and the rest of the Confluent team!
If anyone has any questions/feedback or would like to talk more about Confluent feel free to reach out!
Aneesh Tekulapally, Public Comps Team