85: Arun Thulasidharan: Warehouse-native martech and an alternative pricing model

What’s up, everyone? Today we have the pleasure of being joined by Arun Thulasidharan, CEO & Co-founder at Castled.io.

Summary: Arun clarifies ‘warehouse-native’, positioning Castled.io as a flexible solution that caters to specific customer needs. He addresses challenges in traditional martech, such as the disparity between customer base size and value derived, and presents Castled.io’s unique solutions like an alternative pricing model and immediate data access. Arun navigates the issues of a warehouse-native approach, providing strategies for handling real-time data and minimizing compute charges. He cautions against seeing warehouse-native adoption as merely an escape from reverse ETL, emphasizing its potential to resolve existing martech problems and enhance functionalities. Arun encourages an open mind towards new, complex technologies, recognizing their transformative potential.

About Arun

  • Arun is a data engineer by trade with over a decade of experience building and scaling systems in the startup ecosystem
  • He started his career in software engineering roles at Applied Materials, a semiconductor equipment manufacturer, and later at MiQ, a programmatic advertising media partner
  • Arun then joined Flipkart, known today as India’s largest e-commerce marketplace with a whopping 150 million customers 
  • He then moved to the startup world, joining Hevo Data, a no-code ETL data pipeline platform that enables companies to consolidate data from multiple software sources, as one of its first tech hires
  • In 2021, Arun moved to San Francisco to co-found his first startup, Castled Data, a warehouse-native customer engagement platform that sits directly on top of cloud data warehouses
  • Along with his co-founders, Arun was selected for Y Combinator’s Winter 2022 (W22) batch

From Open Source Reverse ETL Tool to Warehouse-Native CEP

When asked about Castled.io’s journey, Arun shed light on the genesis of the company’s vision. At the time, businesses wanted to move their data from warehouses into various tools, yet the market lacked an efficient way to do it. Recognizing this gap, Arun set out to build an open source reverse ETL solution, founded on the idea that no one-size-fits-all tool could cater to the diverse requirements of different companies.

This venture brought Castled.io a fair amount of traction, with many companies running the open source version in-house and a growing number of customers using the cloud-based offering. Around this time, however, a critical analysis of the martech landscape prompted a pivot. Arun came to question the long-term sustainability of reverse ETL solutions, especially with the emerging concept of warehouse-native apps. Other companies were also beginning to develop their own reverse ETL tools.

Arun observed that these reverse ETL solutions were not truly designed for data teams but rather for marketing and growth teams, which limited their scope. The need to constantly copy data into platforms like Intercom was dwindling as more efficient alternatives emerged in the martech ecosystem. In fact, he believed the popularity of reverse ETL solutions might begin to wane within a year.

The most crucial feedback that inspired the transformation of Castled.io came directly from its target audience – the marketers. They indicated that a reverse ETL solution did not fully resolve their challenges, especially in scenarios where handling large amounts of data became a bottleneck for their existing tools. It became clear that simply copying data from warehouses to another tool wasn’t an effective solution.

Prompted by these revelations and the rising acceptance of the warehouse-native concept, Arun and his team decided to pivot. They transitioned from being an open-source reverse ETL tool provider to building Castled.io as a solution directly layered on top of data warehouses. This move allowed them to bypass data migration issues and directly cater to the marketers’ needs.

Takeaway: The journey of Castled.io highlights the importance of remaining adaptable and receptive to market changes and customer feedback. This awareness allowed the company to evolve from being an open-source reverse ETL tool to a robust, warehouse-native solution, directly addressing marketers’ challenges. The company’s pivot is a testament to strategic foresight and innovation in the martech space.

Building a Composable vs Packaged CDP

In the fiery debate around composable versus packaged CDPs, Arun weighed in with his own viewpoint. He likened the contrast between the two approaches to the difference between open source and closed source systems.

From Arun’s perspective, the appeal of composable CDPs lies in the flexibility they offer. A composable setup enables innovation on top of the data warehouse, free of the constraints a packaged system may impose. If something isn’t quite right, you can add more tables, create more transformations, and even integrate external tools.

Arun cited examples like Truelty and Zingg.ai, tools that perform identity resolution on top of the data warehouse. Rather than relying only on deterministic matching on exact keys, these systems use fuzzy resolution: they identify rows that likely refer to the same entity and link them together, a process executed directly within the data warehouse.
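To make the idea of fuzzy resolution concrete, here is a minimal Python sketch. It is not how Truelty or Zingg.ai work internally (dedicated tools use far more sophisticated matching and run against warehouse tables); it simply scores pairs of hypothetical customer rows with the standard library’s `difflib.SequenceMatcher` and groups likely matches with a union-find. The sample rows and the 0.85 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical customer rows pulled from two source systems.
rows = [
    {"id": 1, "name": "Priya Sharma", "email": "priya.sharma@example.com"},
    {"id": 2, "name": "Priya  Sharma", "email": "priyasharma@example.com"},
    {"id": 3, "name": "Rahul Mehta", "email": "rahul@example.org"},
]

def normalize(text: str) -> str:
    """Lowercase and drop whitespace and punctuation before comparing."""
    return "".join(ch for ch in text.lower() if ch.isalnum())

def similarity(a: dict, b: dict) -> float:
    """Average string similarity across name and email (0.0 to 1.0)."""
    name = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    email = SequenceMatcher(None, normalize(a["email"]), normalize(b["email"])).ratio()
    return (name + email) / 2

def resolve(rows: list[dict], threshold: float = 0.85) -> dict[int, int]:
    """Map each row id to a cluster id using union-find over fuzzy matches."""
    parent = {r["id"]: r["id"] for r in rows}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(rows, 2):
        if similarity(a, b) >= threshold:
            parent[find(b["id"])] = find(a["id"])  # merge the two clusters
    return {r["id"]: find(r["id"]) for r in rows}

print(resolve(rows))  # {1: 1, 2: 1, 3: 3}
```

Rows 1 and 2 differ only in spacing and punctuation, so they land in the same cluster. In a warehouse-native setup, the resulting cluster IDs would typically be written back as a table that downstream models join on.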

Such possibilities underscore the value of composable CDPs. Being locked into a closed system makes it hard to bring these innovations into one’s data warehouse, which Arun sees as a real drawback. Though there are countless other arguments in this debate, Arun emphasizes this angle as one often overlooked in the broader conversation.

Takeaway: In the composable vs. packaged CDP debate, Arun highlights the flexibility and potential for innovation offered by composable CDPs. By likening them to open-source systems, he underscores the opportunities to customize and integrate additional tools directly on top of the data warehouse, an often overlooked yet crucial consideration in the martech space.

Unpacking the Definition of Warehouse-Native Martech

When asked about the varying definitions in the martech space, particularly ‘warehouse-native’ and ‘connected’, Arun addressed these terms with a refreshingly pragmatic viewpoint. He observed that while the industry is caught up in different terminologies, often what doesn’t fit into these boxes is what the customer actually wants.

Arun described his understanding of warehouse-native as akin to the framework offered by Snowflake, where everything runs atop the data. A connected app, in his view, is one that separates compute and data – the data resides in a warehouse, not in the SaaS app, providing the flexibility we’ve discussed before. The actual computations happen on internal clusters, streamlining operations by removing the need for API integrations, enhancing consistency and security, and reducing data movement.

Yet, for Arun, the appeal of warehouse-native martech extends beyond these definitions. The true advantage lies in its potential to turn data into a goldmine of information that can fuel powerful reporting and analytics. The ability to write data back to the data warehouse creates a wealth of opportunities for customers, a feature he considers a significant benefit of connected apps and warehouse-native tech.

Despite these perspectives, Arun chooses not to classify Castled.io strictly as a warehouse-native or connected app. Instead, he emphasizes meeting customer needs. For some enterprise customers, the security of not moving data to an external system like Braze or Iterable is paramount. Here, he sees value in solving problems directly on top of the data warehouse, using its compute capabilities. However, he recognizes that this isn’t a universal requirement: many mid-market companies are comfortable moving data to different tools.

Emphasizing flexibility, Arun noted that solving problems solely within the data warehouse could limit real-time or transactional use cases. To balance this, Castled.io lets users create segments directly on top of the data warehouse. Once a segment is created, it’s cached in a separate segment layer designed to power real-time use cases.
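A minimal sketch of the pattern described above, assuming an in-memory SQLite database as a stand-in for the warehouse and a hypothetical `users` table and segment definition: the segment query runs once using the warehouse’s compute, and the real-time path only checks a cached set. This illustrates the general idea; it is not Castled.io’s implementation.

```python
import sqlite3

# An in-memory SQLite database stands in for the cloud data warehouse here.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript("""
    CREATE TABLE users (user_id TEXT PRIMARY KEY, country TEXT, lifetime_orders INTEGER);
    INSERT INTO users VALUES ('u1', 'IN', 5), ('u2', 'US', 0), ('u3', 'IN', 12);
""")

# Hypothetical segment definition, evaluated by the warehouse itself.
SEGMENT_SQL = "SELECT user_id FROM users WHERE country = ? AND lifetime_orders >= ?"

def refresh_segment_cache(conn: sqlite3.Connection) -> set[str]:
    """Run the segment query once and cache the membership as a set."""
    rows = conn.execute(SEGMENT_SQL, ("IN", 3)).fetchall()
    return {user_id for (user_id,) in rows}

segment_cache = refresh_segment_cache(warehouse)

def in_segment(user_id: str) -> bool:
    """Real-time path: an in-memory set lookup, no warehouse query per event."""
    return user_id in segment_cache

print(in_segment("u3"), in_segment("u2"))  # True False
```

In practice the cache would be refreshed on a schedule or after upstream loads, trading a little staleness for lookups that never touch the warehouse.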

Takeaway: While the martech industry grapples with terms like warehouse-native and connected apps, Arun emphasizes the need to focus on customer requirements. He sees the benefits of these concepts but underlines the importance of flexibility, using warehouse-native capabilities where they add value and enabling other options where they serve specific use cases better. This customer-centric perspective holds the potential to influence the way we think about the future of martech solutions.

The Problems with Traditional Martech and Inaccessible Data

When asked about the challenges plaguing traditional martech and how Castled.io addresses them, Arun had a wealth of insights to share. There’s a significant disconnect, he explained, between the number of customers a company has and the value it actually derives from them. This mismatch is especially visible among smaller B2C companies with millions of subscribers but limited revenue.

Traditional martech vendors claim their pricing aligns with the value the customer extracts, but in Arun’s view, this assertion doesn’t hold water. Castled.io, he says, offers a more sensible alternative. Its pricing isn’t based on the volume of data or compute capacity used; instead, it correlates with the number of team members using the tool, an arrangement Arun believes more accurately reflects the value delivered to the customer.
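The difference between the two pricing models is easiest to see with some arithmetic. The sketch below uses made-up rates (not Castled.io’s or any other vendor’s actual prices) for a hypothetical B2C company with a large subscriber base and a small marketing team.

```python
def volume_based_cost(contacts: int, price_per_1k_contacts: float) -> float:
    """Monthly cost when the vendor charges by the size of the contact base."""
    return contacts / 1000 * price_per_1k_contacts

def seat_based_cost(seats: int, price_per_seat: float) -> float:
    """Monthly cost when the vendor charges per team member using the tool."""
    return seats * price_per_seat

# Hypothetical B2C company: a large audience, a small marketing team.
contacts, seats = 2_000_000, 4
print(volume_based_cost(contacts, price_per_1k_contacts=10.0))  # 20000.0
print(seat_based_cost(seats, price_per_seat=250.0))             # 1000.0
```

Under volume-based pricing, doubling the subscriber base doubles the bill even if those subscribers generate little revenue; under seat-based pricing, the bill only moves when the team grows.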

He also unpacked the issue of data look-back periods. For certain B2C companies, such as travel companies whose customers plan vacations infrequently, a six-month data retention limit makes re-engagement nearly impossible. Traditional ecommerce companies could similarly benefit from access to data from past holiday seasons, enabling them to plan more effectively.
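Here is a sketch of the kind of re-engagement query a longer look-back enables. It uses SQLite as a stand-in warehouse and a hypothetical `orders` table with ISO-formatted dates, and finds customers who bought during last year’s holiday season but haven’t ordered since.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (user_id TEXT, order_date TEXT);
    INSERT INTO orders VALUES
        ('u1', '2023-12-10'),
        ('u2', '2023-11-28'), ('u2', '2024-06-02'),
        ('u3', '2024-03-15');
""")

def lapsed_holiday_buyers(conn: sqlite3.Connection, season_start: date, season_end: date) -> list[str]:
    """Users who bought in a past holiday window but have not ordered since it ended."""
    sql = """
        SELECT user_id FROM orders
        GROUP BY user_id
        HAVING SUM(order_date BETWEEN ? AND ?) > 0  -- bought during the season
           AND SUM(order_date > ?) = 0              -- nothing after the season
        ORDER BY user_id
    """
    params = (season_start.isoformat(), season_end.isoformat(), season_end.isoformat())
    return [row[0] for row in conn.execute(sql, params)]

# This only works with roughly a year of history: a six-month look-back
# would already have dropped the 2023 holiday orders by late 2024.
print(lapsed_holiday_buyers(conn, date(2023, 11, 1), date(2023, 12, 31)))  # ['u1']
```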

Arun also criticized the inefficiency of marketers waiting for data engineers to fulfil their requests, a process that can take months. In contrast, Castled.io offers immediate access to data stored in the warehouse, overcoming this time-consuming hurdle.

Takeaway: Traditional martech has its shortcomings, notably the disconnection between customer numbers and value derived, and restrictive data look-back periods. However, Arun emphasizes that Castled.io is leading the charge in overcoming these hurdles, with its unique pricing model based on team member usage and its efficient, immediate access to data.

The Role of Clean Data in Warehouse Native Martech

When Arun was asked about the significance of clean data warehouses for operating any warehouse-native martech tools, he agreed with the concept, but offered an intriguing observation. According to Arun, the common notion that only enterprises and mid-market companies maintain clean data warehouses doesn’t hold water.

He revealed that surprisingly, many small to medium-sized businesses (SMBs) also maintain high-quality data warehouses. Arun explained that this trend could be attributed to the increasing awareness among businesses about the importance of data quality. They understand that shoddy data quality could pose issues down the line, and therefore, they focus on data quality from day one.

Arun mentioned an interesting phenomenon among SMBs. These companies, although smaller in size, often have a higher data quality than mid-market and enterprise businesses. Arun attributes this to the hands-on approach of the CTOs at these smaller companies who directly oversee these matters, implementing best practices such as data contracts.
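Data contracts take many forms, often schema definitions checked in CI or enforced at ingestion. As a hypothetical minimal illustration, the sketch below validates an incoming event against an agreed field list before it is loaded into the warehouse; the event name, fields, and `violations` helper are assumptions, not a specific tool’s format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldSpec:
    type_: type
    required: bool = True

# Hypothetical contract for an "order_shipped" event agreed between producers and the data team.
ORDER_SHIPPED_CONTRACT = {
    "user_id": FieldSpec(str),
    "order_id": FieldSpec(str),
    "shipped_at": FieldSpec(str),
    "carrier": FieldSpec(str, required=False),
}

def violations(event: dict, contract: dict[str, FieldSpec]) -> list[str]:
    """Return contract violations; an empty list means the event conforms."""
    problems = []
    for name, spec in contract.items():
        if name not in event or event[name] is None:
            if spec.required:
                problems.append(f"missing required field '{name}'")
            continue
        if not isinstance(event[name], spec.type_):
            problems.append(f"'{name}' should be {spec.type_.__name__}")
    for name in event.keys() - contract.keys():  # fields nobody agreed to
        problems.append(f"unexpected field '{name}'")
    return problems

event = {"user_id": "u1", "order_id": 42, "shipped_at": "2024-05-01"}
print(violations(event, ORDER_SHIPPED_CONTRACT))  # ["'order_id' should be str"]
```

Rejecting or quarantining events like this at the point of entry is what keeps the warehouse clean enough for marketers to build on directly.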

Arun also highlighted that warehouse-native tech isn’t just for large B2C companies with loads of customers. Castled.io, while it primarily targets B2C customers, is also used by B2B customers for tasks like drip and email marketing.

Arun concluded by emphasizing that a clean data warehouse is indeed a prerequisite for any warehouse-native tech. He pointed to the popularity of reverse ETL tools, which send data from the warehouse out to downstream tools, as evidence that many companies already treat their warehouse as a trustworthy source of customer data.
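To illustrate why clean data matters for reverse ETL, here is a minimal sketch of a diff step that reverse ETL syncs commonly perform, written as an assumption about the general approach rather than any specific tool’s implementation: compare the last synced snapshot with current warehouse rows and push only what changed. If keys are duplicated or values are unstable, every run produces spurious updates downstream.

```python
def diff_for_sync(previous: dict[str, dict], current: dict[str, dict]) -> tuple[dict[str, dict], list[str]]:
    """Compare the last synced snapshot with the current warehouse rows.

    Returns (upserts, deletes): rows that are new or changed, and keys that
    disappeared. Only these need to be sent to the destination tool's API.
    """
    upserts = {key: row for key, row in current.items() if previous.get(key) != row}
    deletes = sorted(previous.keys() - current.keys())
    return upserts, deletes

last_sync = {"u1": {"plan": "free"}, "u2": {"plan": "pro"}}
warehouse_now = {"u1": {"plan": "pro"}, "u3": {"plan": "free"}}
upserts, deletes = diff_for_sync(last_sync, warehouse_now)
print(upserts)  # {'u1': {'plan': 'pro'}, 'u3': {'plan': 'free'}}
print(deletes)  # ['u2']
```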

Takeaway: Clean data warehouses are critical for the effective operation of warehouse-native martech tools. Interestingly, SMBs are often leading the pack in maintaining clean data warehouses due to a proactive approach towards data quality right from the get-go. It highlights the broader applicability of warehouse-native tech beyond just large B2C companies, as businesses of all sizes strive to achieve better marketing results.

📫 Never miss an episode or key takeaway 💡

By subscribing to our newsletter we’ll only send you an email when we drop a new episode, usually on Tuesday mornings ☕️ and we’ll give you a summary and key takeaways.

According to other industry experts, the key challenges faced by companies in adopting a warehouse-native approach include: 

  • Real-time data: There’s often a delay in getting data into the warehouse. Many marketers are stuck with a nightly sync, which makes real-time campaigns tricky
  • Compute charges: Even teams lucky enough to have real-time data face the added cost of compute charges and the load placed on the DWH (e.g., Snowflake), which can add up quickly
  • Schema ownership: The data team usually owns the schema, and it isn’t always in a format marketing can work with. Example: creating a table for every product event vs. creating a single product event table with the events as columns
  • DWH access: Many companies block access to the data warehouse because it holds sensitive information; even if you aren’t reading that data, they don’t want another access point
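To make the schema ownership example concrete, here is a sketch using SQLite and hypothetical table names. Shape A keeps one table per product event; a `UNION ALL` view reshapes it into one reading of the single-table option, a `product_events` view with the event type in an `event_name` column, which is easier for marketers to build audiences on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Shape A (hypothetical): one table per product event.
    CREATE TABLE signed_up     (user_id TEXT, occurred_at TEXT);
    CREATE TABLE added_to_cart (user_id TEXT, occurred_at TEXT);
    CREATE TABLE checked_out   (user_id TEXT, occurred_at TEXT);
    INSERT INTO signed_up     VALUES ('u1', '2024-05-01');
    INSERT INTO added_to_cart VALUES ('u1', '2024-05-02'), ('u2', '2024-05-03');
    INSERT INTO checked_out   VALUES ('u1', '2024-05-04');

    -- Shape B: one view, with the event type stored in a column, so a marketer
    -- filters on event_name instead of needing to know every table.
    CREATE VIEW product_events AS
        SELECT user_id, 'signed_up' AS event_name, occurred_at FROM signed_up
        UNION ALL
        SELECT user_id, 'added_to_cart', occurred_at FROM added_to_cart
        UNION ALL
        SELECT user_id, 'checked_out', occurred_at FROM checked_out;
""")

# Example marketing question: who added to cart but never checked out?
abandoned = conn.execute("""
    SELECT DISTINCT user_id FROM product_events WHERE event_name = 'added_to_cart'
    EXCEPT
    SELECT user_id FROM product_events WHERE event_name = 'checked_out'
""").fetchall()
print(abandoned)  # [('u2',)]
```

A view like this can be owned by the data team, so the underlying schema stays theirs while marketing gets a shape it can use.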

Arun offered his in-depth insight and experiences to provide a well-rounded view of these complex issues.

Real time data challenges

Arun explained that the real-time data challenges of a warehouse-native approach can be addressed with APIs and strategic data management. He split real-time data into two distinct use cases:

  1. reacting to events in real time
  2. handling real-time data itself

The first scenario, reacting to real-time events, is already addressed by Castled.io. Arun provided an example of an e-commerce company sending a push notification or email when an order is shipped. By integrating with the Castled.io API, they’re able to send these notifications within milliseconds, utilizing data directly from the data warehouse. This speed is enabled by caching user segment data on their end, catering to a majority of real-time use cases they encounter.

The second scenario, real-time data management, involves cases like newsletter sign-ups, where data may not instantly exist in the data warehouse. Here, Arun shared how customers resolve this: when making the API call, they pass all the contextual data of the event. Since the data associated with a new user sign-up isn’t scattered across multiple sources, this method provides a timely and efficient solution. The collected data can then be used for personalization and other activities.
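The two scenarios can be sketched as a single event handler. Everything here is hypothetical: the event names, the `profile_cache` dict standing in for data cached from the warehouse, and the `context` parameter. It is not Castled.io’s API, only an illustration of merging cached warehouse attributes with context passed inline in the API call.

```python
from typing import Optional

# Profiles cached from the warehouse on a schedule (stand-in for a profile cache).
profile_cache: dict[str, dict] = {
    "u1": {"first_name": "Asha", "city": "Bengaluru"},
}

def handle_trigger(event: str, user_id: str, context: Optional[dict] = None) -> str:
    """Build a message for an incoming API event.

    Scenario 1: the user already exists in the warehouse, so cached attributes are used.
    Scenario 2: the user is brand new (e.g. just signed up) and not yet synced, so the
    caller passes the needed attributes inline as `context`; inline values take precedence.
    """
    profile = {**profile_cache.get(user_id, {}), **(context or {})}
    name = profile.get("first_name", "there")
    if event == "order_shipped":
        return f"Hi {name}, your order is on its way!"
    if event == "newsletter_signup":
        return f"Welcome aboard, {name}!"
    raise ValueError(f"unknown event: {event}")

print(handle_trigger("order_shipped", "u1"))
print(handle_trigger("newsletter_signup", "u9", context={"first_name": "Ravi"}))
```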

Compute Charges: The Hidden Pain Point of the Data Warehouse

Arun acknowledged compute charges as a common problem with existing cloud data warehouses. He emphasized the harm these charges could do to the data warehousing ecosystem if not properly managed. With optimized SQL queries and warehouse-native applications, however, these costs can be minimized.

Despite the associated costs, Arun maintained that many would still favor a data warehouse due to the immense value it brings. Particularly when it comes to marketing, where substantial budgets are often deployed, utilizing a data warehouse to its fullest extent becomes crucial.

Arun explained that once a company decides to leverage its warehouse for marketing, it must choose the most efficient way to operate. Either data engineers write the queries that create segments and handle the data, or applications running on top of the data warehouse execute these queries. Arun pointed out that a key driver of escalating compute charges is inexperienced analytics engineers, whose unoptimized SQL queries add to these costs.

Arun explained that his team pays careful attention to the cost of even minor filter additions in their audience builder query tool, and runs thorough load testing to ensure customers don’t pay unnecessary costs. For him, this approach scales far better than having each company hand-write and tune its own segment queries. Compute charges are inevitable, but a well-built warehouse-native application keeps them minimized and optimized.
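As an illustration of the kind of care Arun describes, here is a sketch of an audience builder that compiles every filter into one parameterized query selecting only the column it needs. The `users` table, the allowlists, and `build_audience_query` are hypothetical, and SQLite stands in for the warehouse; in columnar warehouses, reading fewer columns and filtering in a single pass generally reduces the data scanned.

```python
import sqlite3

# Hypothetical audience-builder filters are (column, operator, value) tuples.
ALLOWED_COLUMNS = {"country", "lifetime_orders", "last_seen_at"}
ALLOWED_OPERATORS = {"=", ">=", "<=", ">", "<"}

def build_audience_query(filters: list[tuple[str, str, object]]) -> tuple[str, list]:
    """Compile every filter into ONE parameterized query.

    Selecting only user_id and applying all filters in a single WHERE clause avoids
    reading extra columns or building an intermediate table per filter.
    """
    clauses, params = [], []
    for column, op, value in filters:
        if column not in ALLOWED_COLUMNS or op not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported filter: {column} {op}")
        clauses.append(f"{column} {op} ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT user_id FROM users WHERE {where}", params

sql, params = build_audience_query([("country", "=", "IN"), ("lifetime_orders", ">=", 3)])
print(sql)  # SELECT user_id FROM users WHERE country = ? AND lifetime_orders >= ?

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id TEXT, country TEXT, lifetime_orders INTEGER, last_seen_at TEXT);
    INSERT INTO users VALUES ('u1', 'IN', 5, '2024-05-01'), ('u2', 'US', 9, '2024-05-02');
""")
print(conn.execute(sql, params).fetchall())  # [('u1',)]
```

The allowlists also keep user-chosen column names and operators out of raw SQL, since only values are passed as parameters.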

Understanding the Intricacies of Schema Ownership and Data Warehouse Access 

Arun was frank about the limits of what a vendor can do about schema ownership and data warehouse access. He acknowledged that these hurdles are mostly about building understanding among the people involved, and emphasized the need to educate them about the value being left on the table. However, if a company insists on restricting marketers’ access to its data warehouse, the solution may be out of reach.

Arun indicated that relatively few companies are unwilling to grant marketers access to their data warehouse; even among enterprise customers, such restrictions are uncommon.

Beyond educating the people involved about the benefits of giving marketers warehouse access, he admitted, there isn’t much else Castled.io can do. The company can demonstrate that its queries against the warehouse are better optimized than those data engineers typically write, but if a company is still hesitant to give marketers access, the situation is beyond its control.

Takeaway: Addressing these complex issues requires a deep understanding of the technical landscape and the value that real-time data, well-managed compute charges, and sensible schema ownership can bring to an organization. Arun’s perspective illuminates the path to navigating these challenges, ensuring data is used optimally and efficiently.

The Dynamic Weather of Overlapping Martech Data Tools

Navigating the current martech landscape can feel like a high-stakes game of chess, with tools shifting and overlapping in functions almost as quickly as one can keep track. 

  • ETL tools are adding rETL features, while rETL tools and CDIs are becoming composable CDPs
  • CDPs are adding product analytics and AI features, while product analytics tools are adding CDP and AI features
  • CDPs are adding marketing automation features, while MAPs are adding CDP features
  • CDPs are also adding “warehouse connectors” or “warehouse sync”, essentially acknowledging that the warehouse is essential and that they need to catch up on ETL and reverse ETL

Arun presented a comprehensive view of the current state and predicted trajectory of the industry considering the labyrinthine tangle of martech data tools. Recognizing that consolidation is a looming certainty, he detailed the implications of such a shift in the landscape.

Arun highlighted the inherent overlap between various data technologies and explained how this redundancy is fueling an inevitable unification of tools. He recalled a conversation with a colleague who, after a detailed analysis of their tech stack, found they could potentially cut their toolset in half because of existing redundancies.

Moving on to the complex interplay of transformation tools and Extract, Load, Transform (ELT) tools, Arun shed light on their evolution. He noted that transformation capabilities like those of dbt are now being included in data warehouse offerings such as Snowflake, Google’s Dataform, and many others. At the same time, ELT solutions are expanding their repertoire to include reverse ETL capabilities.

In Arun’s view, this overlap is driving a constant pruning of tools from the data stack. Every tool needs to adapt, evolve and carve its own unique position in the modern data stack to stay relevant. He noted that survival in this space might not always be a given.

Expanding on the idea of consolidation, Arun brought up the evolution of event-collection tools such as CDPs, which he sees potentially offering product analytics and further diversifying their functionality. This continuous absorption of capabilities will shrink data stacks and trigger a battle between composable and all-in-one platforms.

In Arun’s vision of the future, the role of data warehouses could undergo a fundamental transformation: they could evolve into powerful databases capable of storing vast amounts of data. As this happens, vendors may pivot to building packaged CDPs around these warehouses, heralding a shift in the industry’s architecture.

Takeaway: Martech is in a state of continuous flux, with data tool overlap and feature consolidation driving a relentless contraction of the data stack. This evolution may culminate in data warehouses morphing into mega-databases, around which packaged CDPs are built. In this scenario, every tool will need to adapt and find its unique niche within the modern data stack to remain relevant. The competitive landscape promises to be intriguing, with potentially transformative effects on the industry.

Martech’s Shift to Warehouse-Native: A Threat to Reverse ETL?

One question looms large: will the move to a warehouse-native approach render reverse ETL and data pipelines redundant? Arun was asked about this potential change and how it could reshape the landscape for data-driven marketers. 

In a scenario where marketing and customer engagement tools sit atop the data warehouse, absorbing information directly from the source, the need for a third-party intermediary like reverse ETL might seem diminished. Arun’s answer to this is as complex as it is insightful.

While he acknowledged that a shift to warehouse-native solutions could, in theory, eliminate the need for reverse ETL, he urged caution. The impetus behind adopting a warehouse-native solution should not solely be to sidestep reverse ETL, he argued. Instead, the focus should be on solving existing issues in the martech stack, enhancing functionalities, and driving more value for the end-users.

Arun referenced companies like Customer.io that have already started incorporating reverse ETL into their platforms. If the motivation behind warehouse-native tools is solely to eliminate reverse ETL, he believes their lifespan in the industry may be short. The way forward, according to Arun, lies in leveraging the power of warehouse-native approaches to overcome limitations of existing tools and to drive innovation in the martech space.

Takeaway: Warehouse-native solutions might theoretically replace the need for reverse ETL, but a successful pivot relies on problem-solving and enhancing existing functionalities. In the evolving martech space, the focus should be on creating value and meeting user needs, rather than simply seeking to disrupt existing processes.

Optimism, Apprehension, and The Pace of AI in Marketing

Arun was asked about one of the hottest topics in marketing today: artificial intelligence (AI). We inquired specifically about the challenges AI and machine learning (ML) pose in potentially transforming or even replacing current marketing roles. This is a topic that resonates with many early-stage marketers who find themselves caught between fear and optimism.

Arun’s viewpoint on this topic provides clarity, drawing from his understanding of the field. He referenced a past podcast episode where the host mentioned the threat of rapid AI innovation. Indeed, one significant worry is the potential obsolescence of innovations built on older AI models like GPT-2 as newer versions, such as GPT-5 or GPT-6, are developed.

From Arun’s perspective, this highlights a fundamental limitation of AI. While current AI models, such as transformer-based language models, can solve natural language processing problems, the road to artificial general intelligence (AGI) is longer and more complex. AGI refers to a type of AI that can understand, learn, and apply knowledge across a wide range of tasks, closely mirroring human intelligence.

When it comes to significant innovation in AI, Arun believes it’s largely confined to big tech giants like OpenAI, Google, and Facebook. These entities have enormous resources at their disposal, making it challenging for smaller players to compete. The journey to disruptive AI innovations is arduous, and in Arun’s view, it’s likely to take more time than most people anticipate.

While acknowledging ongoing AI innovations within many companies’ internal systems, including marketing tools, Arun expressed skepticism about their sustainability. He pointed out that once the tech giants develop AGI, smaller innovations might be overshadowed or rendered obsolete. Therein lies a key issue: who should be innovating in AI, and why, given the rapid pace of change and the dominance of major players?

Arun concluded by reflecting on the future of AI, considering it entirely possible that many roles, even podcast hosts, could be replaced by AI. But the crux of the matter, he stressed, is determining who will lead these innovations and when.

Takeaway: While the advent of AI brings with it both challenges and opportunities, its path to replacing current marketing roles isn’t straightforward. The question isn’t merely what AI can achieve but who’s driving its evolution and at what pace. Future changes will depend on advances in AGI and the balancing act between smaller innovations and the dominance of tech giants in the field.

Striking Balance in Success and Happiness: Insights from a Founder’s Journey

When asked about how he strikes a balance between all his roles – a CEO, co-founder, cricket fan, movie buff, and dad – and still manages to stay happy and successful in his career, Arun had some insightful thoughts to share.

Arun recognized early on that a significant portion of the entrepreneurial journey involves learning from failures. He recalled a moment when a series of potential customers did not convert, sending him into a state of disappointment. However, he learned to adjust his mindset, understanding that things won’t always go as planned, particularly for first-time founders.

Today, he approaches his work with a different attitude, seeing each customer as an opportunity. If one doesn’t convert, he simply moves on to the next, a sign of maturity and growth that has evolved over time.

Yet, the life of an entrepreneur can be stressful and hectic. Arun has found ways to alleviate the stress and maintain balance. He spends quality time with his young daughter and friends. He recognizes that while his work is an important part of his life and career, it’s crucial to separate personal life from professional pursuits. This, he believes, helps him to stay productive and manage his stress better.

Takeaway: Success in one’s career, especially in the high-stakes world of startups, isn’t just about achieving business milestones. It’s also about maintaining a healthy balance between work and personal life, learning from failures, and evolving as an individual. Arun’s story underlines the importance of these facets in fostering both happiness and success.

Embracing New Tech Even if It’s Complicated

When asked if there was anything he wanted to share with the audience, Arun urged everyone to be more open to the world around them, implying that not everything is as intimidating as it might seem at first glance.

Arun acknowledged the common misconceptions about tech, specifically around emerging fields like warehouse-native technology. However, he strongly encourages a more open-minded exploration of these advancements. He believes that understanding their potential and pushing past the initial fear and doubt is key.

Highlighting his own faith in these technologies, Arun mentioned how he would not have left his high-paying job to pursue this venture if he did not believe in its feasibility. His faith in the technology’s potential isn’t based on blind optimism but a firm belief in its transformative possibilities.

Takeaway: The world of tech, particularly emerging fields, can seem daunting. But Arun’s experience underscores the importance of staying open-minded and pushing past initial misconceptions. Embracing change, after all, is often the first step to leveraging the opportunities that accompany it.

Episode Recap

Arun provides a straightforward understanding of ‘warehouse-native’ and ‘connected’ in the context of martech. He explains ‘warehouse-native’ as akin to Snowflake’s model, where all processes run on the data, highlighting its potential to turn data into valuable insights for powerful reporting and analytics. The ‘connected’ approach is where data resides in a warehouse, with computations happening on internal clusters, enhancing security and reducing data movement.

However, Arun doesn’t pigeonhole Castled.io as either; instead, he underlines the importance of catering to specific customer needs and balancing flexibility to allow for real-time and transactional use cases.

Addressing traditional martech’s challenges, Arun speaks of a significant mismatch between a company’s customer base and the value derived. Castled.io proposes a solution through its unique pricing model that correlates with the number of team members using the tool rather than data volume or compute capacity. Additionally, he tackles the issue of data look-back periods, emphasizing the benefits of having access to past data for strategic planning. In contrast to traditional martech that makes marketers dependent on data engineers, Castled.io offers immediate access to data stored in the warehouse.

Acknowledging the challenges of a warehouse-native approach, Arun discusses real-time data, compute charges, schema ownership, and DWH access. He offers solutions like leveraging APIs and strategic data management to address real-time data challenges, and optimized SQL queries and warehouse-native applications to minimize and manage compute charges. Arun believes that despite the costs, the value a data warehouse brings, especially for marketing, is immense.

The shift towards warehouse-native martech raises questions about the future of reverse ETL and data pipelines. While acknowledging the potential redundancy of reverse ETL in a warehouse-native scenario, Arun stresses that the adoption of warehouse-native solutions should focus on solving existing martech issues, enhancing functionalities, and driving value for end-users. The key to the future, according to Arun, lies in leveraging warehouse-native approaches to overcome existing tool limitations and drive innovation in the martech space.

Finally, Arun encourages openness towards new technology, despite its complexity. He dispels misconceptions about emerging tech like warehouse-native solutions and encourages pushing past initial fears and doubts to understand their potential. His faith in these technologies is based not on blind optimism but a conviction in their transformative possibilities.

Listen below for a comprehensive, nuanced, and accessible journey through the world of warehouse-native martech, composable CDPs and reverse ETL. 🎧


Intro music by Wowa via Unminus
Cover art created with Midjourney

Future-proofing the humans behind the tech
