212: Tobias Konitzer: The Causal AI revolution and the boomerang effect in marketing decision science


What’s up everyone, today we have the pleasure of chatting with Tobias Konitzer, VP of AI at GrowthLoop.

Summary: Tobi challenged marketing’s fixation on prediction. He has built highly accurate LTV models, but accuracy alone does not move revenue. Marketing is intervention. Correlation shows patterns; causality tells you what happens when you pull a lever. That shift reshapes experimentation, explains why dynamic allocation can outperform static A/B tests, and highlights how self-learning systems can backfire or get stuck in local maxima. It also fuels his skepticism of unleashing agentic AI on historical data without a causal layer. If you want to change outcomes instead of forecasting them, your systems need to understand levers and log decisions you can actually audit.

In this Episode…

Recommended Martech Tools + Services 🛠️

We only partner with products and agencies that are chosen and vetted by us. If you’re interested in partnering, reach out here.

🔌GrowthBench: Twilio’s top-tier consulting partner, turning your Twilio investment into a customer engagement engine

🦸 RevenueHero: Automates lead qualification, routing, and scheduling to connect prospects with the right rep faster and easier, without back-and-forth.

🎨 Knak: Go from idea to on-brand email and landing pages in minutes, using AI where it actually matters.

📧 MoEngage: Customer engagement platform that executes cross-channel campaigns and automates personalized experiences based on behavior.

About Tobi


Tobias Konitzer, PhD is VP of AI at GrowthLoop, where he’s chasing closed-loop marketing powered by reinforcement learning, causality, and agentic systems. He’s spent the past decade focused on one core problem: moving beyond prediction to actually influencing outcomes.

Previously, Tobi was Chief Innovation Officer at Fenix Commerce, helping major eCommerce brands modernize checkout and delivery with machine learning. He also founded Ocurate, a venture-backed startup that predicted customer lifetime value to optimize ad bidding in real time, raising $5.5M and scaling to $500K+ ARR before its acquisition. Earlier, he co-founded PredictWise, building psychographic and behavioral targeting models that drove over $2M in revenue.

Tobi earned his PhD in Computational Social Science from Stanford and worked at Facebook Research on large-scale ML and bias correction. Originally from Germany and based in the Bay Area since 2013, he writes frequently about causal thinking, machine decisioning, and the future of marketing.

Why Predictive Models Fail Without Causal Inference


Prediction dominates most marketing roadmaps. Teams invest months refining churn models, tightening confidence intervals, and debating which threshold deserves a campaign. Tobi built an entire company on that logic. His team produced highly accurate lifetime value predictions using deep learning and granular event data. The forecasts were sharp. The lift curves were clean. Buyers were impressed.

Then lifecycle marketers asked a more uncomfortable question: what action should follow the score?

A predictive model encodes the current trajectory of a customer under existing policies. It describes what will likely happen if nothing changes. Marketing changes things constantly. The moment you intervene, you alter the system that generated the prediction. The forecast reflects yesterday’s conditions, not tomorrow’s strategy.

> “Prediction tells you the future if you do nothing. Causation tells you how to change it.”

Consider the Prediction Trap.

Infographic comparing two approaches to sales prediction: 'The Prediction Trap' focuses on observation and identifies churn risk, while 'The Causal Unlock' emphasizes intervention to influence buyer behavior.

Let’s say your propensity model asks whether a user will buy if you do nothing else. Its function is observation. The outcome is a description of what happens if you leave the system untouched. The new standard pulls the lever instead. Its function is intervention. The outcome is directional change: what can we do to make them buy?

Prediction thinking centers on segmentation:

  • Who is likely to churn?
  • Who is likely to buy?
  • Who looks like high LTV?

Causal thinking centers on levers:

  • Which incentive reduces churn?
  • Which sequence increases repeat purchase?
  • Which offer raises lifetime value incrementally?

Tobi often uses an LTV example to expose the trap. Suppose high LTV customers frequently viewed a specific product early in their journey. A team might redesign the onboarding flow to feature that product more aggressively. The correlation looks persuasive. The causal effect remains unknown.

Several alternative explanations could drive the pattern:

  • The product may correlate with a specific acquisition channel.
  • The product may have been highlighted during a limited campaign.
  • The product view may signal prior brand familiarity.

Only an intervention test can estimate incremental impact. Correlation can guide hypothesis generation, but it cannot validate the lever itself.
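A minimal sketch of what that intervention test estimates: the difference in mean outcome between randomized groups, with a rough standard error. The outcome data here is synthetic, standing in for real campaign results.

```python
import random
import statistics

def estimate_lift(treated, control):
    """Estimate incremental impact of an intervention as the difference
    in mean outcome between randomized groups, with a rough standard
    error for that difference."""
    lift = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.variance(treated) / len(treated)
          + statistics.variance(control) / len(control)) ** 0.5
    return lift, se

# Synthetic data: outcome value per user under each random assignment.
random.seed(0)
control = [random.gauss(10.0, 3.0) for _ in range(5000)]
treated = [random.gauss(10.5, 3.0) for _ in range(5000)]  # true lift = 0.5

lift, se = estimate_lift(treated, control)
print(f"estimated lift: {lift:.2f} +/- {1.96 * se:.2f}")
```

Correlation alone never produces the `control` column; only randomization does, which is why it is the step you cannot skip.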

Tobi also highlights a deeper issue. Acting on predictions introduces compounding uncertainty across multiple layers:

  1. The predictive model carries statistical variance.
  2. The translation from model features to campaign strategy introduces interpretation bias.
  3. The experiment introduces sampling error.
  4. Execution introduces operational noise.

Each layer adds variability. When teams treat prediction accuracy as the goal, they lose visibility into where uncertainty enters the system. When teams focus on intervention impact, they concentrate measurement on the lever that drives revenue.
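To see why the stacked layers matter, here is an illustrative simulation (all noise magnitudes are hypothetical) showing that independent error layers add in quadrature, so total variability grows with every layer between prediction and execution:

```python
import random
import statistics

random.seed(1)

# Hypothetical standard deviations for each layer of the pipeline,
# expressed as noise on measured lift (all numbers illustrative).
layers = {
    "model variance": 0.03,
    "interpretation bias": 0.05,
    "sampling error": 0.04,
    "operational noise": 0.02,
}

def simulate_measured_lift(true_lift=0.10, n=100_000):
    """Draw the measured lift after each layer adds independent noise."""
    draws = []
    for _ in range(n):
        noise = sum(random.gauss(0, sd) for sd in layers.values())
        draws.append(true_lift + noise)
    return draws

draws = simulate_measured_lift()
spread = statistics.pstdev(draws)
# Independent noise adds in quadrature: sqrt of the summed variances.
expected = sum(sd ** 2 for sd in layers.values()) ** 0.5
print(f"simulated spread: {spread:.3f}, analytic: {expected:.3f}")
```

Measuring the lever directly collapses most of those layers into one, which is the practical argument for intervention-first measurement.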

Boardrooms already operate in causal language. Incremental ROI is causal. Budget allocation is causal. Executives care about what caused growth, not which segment looked promising in a dashboard. Prediction can inform prioritization. Causal inference determines what to scale.

If you want to move in that direction, adjust your operating model:

  1. Start every initiative with a controllable lever.
    Define the action before defining the segment.
  2. Design experiments that isolate the incremental effect of that lever.
    Randomized or adaptive allocation both estimate causal lift.
  3. Report impact in revenue, retention, or contribution margin.
    Tie every experiment to a business outcome.
  4. Document assumptions and uncertainty.
    Build institutional memory around what caused change.

Prediction remains useful. Intervention drives growth. Teams that understand that distinction build systems that learn through action instead of watching the future unfold from the sidelines.

Key takeaway: Anchor your marketing engine in causal experiments. For every predictive score, define the specific action it informs, test that action against a control, and quantify incremental lift tied directly to revenue or retention. Replace segment rankings with lever performance dashboards that show effect size, confidence, and business impact. When every campaign answers the question “What did this intervention cause?” your team shifts from observing trajectories to shaping them.

Back to the top ⬆️

How to Validate Causal Impact on Customer Lifetime Value


Most teams treat high LTV segments as proof of where to spend. The model ranks customers. The top decile looks profitable. Budget flows upward. Tobi described asking the head of CRM at a billion-dollar outdoor brand what he does when a model predicts someone will be high LTV. The answer came instantly: “Spend more on them, no?”

That instinct feels responsible, but it confuses observation with intervention. Introducing the High LTV Fallacy:

Graph depicting the 'High LTV' Fallacy, showing the relationship between marketing spend and revenue. Highlights a segment of high LTV customers, illustrating that correlation does not imply causation and discussing marketing strategies.

On the right side of the chart, you see a dense cluster labeled high LTV customers. Revenue increases with marketing spend. The correlation line slopes upward. It looks clean and convincing. But those customers were going to buy anyway. That cluster may represent customers with higher income, stronger brand affinity, or deeper preexisting intent. Increasing spend toward them can inflate reported revenue while adding little incremental value.

Tobi shared a simple example that makes the risk concrete. Suppose most high LTV customers viewed a specific pair of jeans early in their journey. You decide to feature that product prominently in onboarding. You increase paid spend that drives traffic to that item. Revenue from that segment holds steady or even rises. The story writes itself. The jeans must drive lifetime value.

> “If you tune that lever, will you get a causal outcome? The answer is you don’t know.”

High LTV customers might have entered through a wealthier audience pool. They might have converted regardless of which product they saw. When you push spend toward people who were already going to buy, you shift budget without shifting behavior. The chart’s arrow pointing upward masks that reality. The revenue line moves. The counterfactual remains invisible.

If you want LTV to function as a decision tool instead of a narrative device, your workflow needs to focus on levers. That means disciplined experimentation:

  1. Select one controllable intervention tied to the segment, such as product exposure, incentive size, or message framing.
  2. Randomly assign comparable users to receive or not receive that intervention.
  3. Measure incremental lifetime value against a holdout over a defined period.
  4. Promote only the interventions that generate statistically credible lift.

You build a library of proven actions. You retire interventions that produce movement in dashboards without movement in behavior. Over time, your organization learns which levers actually create value and which simply correlate with it.
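The promotion rule in step 4 can be sketched as a simple decision function: promote an intervention only when the lower bound of the confidence interval on incremental LTV is above zero. The LTV numbers and thresholds below are hypothetical.

```python
import random
import statistics

def should_promote(treated_ltv, holdout_ltv, z=1.96):
    """Promote only when the lower confidence bound on incremental
    LTV is above zero; returns (decision, estimated lift)."""
    lift = statistics.mean(treated_ltv) - statistics.mean(holdout_ltv)
    se = (statistics.variance(treated_ltv) / len(treated_ltv)
          + statistics.variance(holdout_ltv) / len(holdout_ltv)) ** 0.5
    return lift - z * se > 0, lift

# Synthetic 90-day LTV per user, in dollars.
random.seed(7)
holdout = [random.gauss(100, 25) for _ in range(4000)]
real_win = [random.gauss(104, 25) for _ in range(4000)]   # true +$4 lift
no_effect = [random.gauss(100, 25) for _ in range(4000)]  # pure noise

print(should_promote(real_win, holdout))
print(should_promote(no_effect, holdout))
```

The second call is the point: an intervention with no true effect produces dashboard movement in `lift`, but the decision rule refuses to scale it.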

Boards increasingly ask causal questions framed as ROI. They want to know what changed because of your action. Segment labels answer who looks valuable. Controlled interventions answer what creates value.

Key takeaway: When your model flags a high LTV segment, treat it as a starting hypothesis. Design a single randomized test that isolates one lever for that group, measure incremental lifetime value against a holdout, and document the effect size. Scale only the interventions that produce measurable lift. Repeat this cycle consistently. You will replace budget reallocation based on correlation with a growing portfolio of verified revenue levers that compound over time.

Back to the top ⬆️

Reducing Uncertainty Around Causal Effects by Optimizing Levers, Not Labels


Marketing teams often pour their energy into tightening prediction intervals around user labels. They want to say, with statistical confidence, that a specific user will churn or upgrade. They debate model accuracy and celebrate incremental gains in precision. The organization feels progress because the numbers look sharper.

Tobi redirects the spotlight to the lever. Marketing changes outcomes by acting. The central uncertainty sits in the effect of that action, not in the purity of the label.

> “What happens to customer X with that intervention, and what happens to customer X if you had not exposed them to that intervention? That is an unanswerable question at the end of the day.”

You observe one trajectory. The counterfactual remains hidden. That reality defines causal inference, and it defines marketing. A churn score forecasts the future under the current system. A treatment effect estimate quantifies how the future shifts when you pull a lever.

Infographic comparing two approaches in marketing: 'Levers' (what we do) versus 'Labels' (who they are), highlighting the importance of focusing on intervention outcomes instead of user predictions.

On the left side, uncertainty clusters around labels: who the user is. The old goal tightens prediction probability, for example becoming 90 percent confident that a user will churn. On the right side, uncertainty concentrates around levers: what you do. The new goal tightens the intervention outcome, for example estimating the impact of Message A versus Message B. The arrow pushes uncertainty toward the action.

Most organizations still operate with a layered pipeline:

  1. Train a predictive model that scores users.
  2. Extract correlated traits from high scoring segments.
  3. Translate those correlations into campaign ideas.
  4. Run experiments to validate the ideas.

Each layer introduces variance. The predictive model carries statistical uncertainty. The mapping from correlation to intervention introduces interpretation risk. The experiment introduces sampling noise. When these layers stack, uncertainty compounds across steps. The final decision rests on a chain of assumptions.

Tobi argues for a top-down discipline. Start with the outcome metric that matters, such as incremental LTV over ninety days. Frame every model around estimating the causal lift of specific interventions on that metric. Treat predictions as intermediate signals. Treat estimated treatment effects as the primary object of optimization.

When you operate with that mindset, your questions change. You ask how much incremental margin comes from a discount for mid tier customers. You estimate how churn shifts when you adjust message timing. You quantify the distribution of effect sizes for each intervention. You update those estimates continuously as new data arrives. Your uncertainty lives in the spread of expected lift, not in the label assigned to a user.

This structure aligns with how executives evaluate marketing performance. Finance teams care about incremental revenue. Boards care about what changed because marketing acted. Effect estimation connects your modeling work directly to those conversations.

If you run lifecycle or growth today, you can implement this shift with discipline:

  • Define one primary outcome metric, such as incremental contribution margin per user.
  • Enumerate the interventions you can deploy over the next quarter.
  • Design experiments that estimate incremental lift per intervention.
  • Track confidence intervals around each effect size.
  • Reallocate traffic and budget toward interventions with the strongest expected lift while maintaining measured exploration.
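A lightweight sketch of the tracking step above: a running estimate per lever, ranked by expected lift with a confidence interval around each effect size. Lever names and lift observations are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class LeverEstimate:
    """Running causal-lift estimate for one intervention (illustrative)."""
    name: str
    total_lift: float = 0.0
    total_sq: float = 0.0
    n: int = 0

    def update(self, observed_lift: float) -> None:
        self.n += 1
        self.total_lift += observed_lift
        self.total_sq += observed_lift ** 2

    @property
    def mean(self) -> float:
        return self.total_lift / self.n

    @property
    def ci_halfwidth(self) -> float:
        var = self.total_sq / self.n - self.mean ** 2
        return 1.96 * math.sqrt(max(var, 0.0) / self.n)

levers = {name: LeverEstimate(name) for name in ["discount", "timing", "bundle"]}
# Hypothetical per-experiment lift observations (margin per user).
for obs in [0.8, 1.1, 0.9]:
    levers["discount"].update(obs)
for obs in [0.1, -0.2, 0.3]:
    levers["timing"].update(obs)
for obs in [1.5, 1.9]:
    levers["bundle"].update(obs)

ranked = sorted(levers.values(), key=lambda l: l.mean, reverse=True)
for l in ranked:
    print(f"{l.name}: {l.mean:+.2f} +/- {l.ci_halfwidth:.2f}")
```

Every new experiment updates the same objects, so uncertainty accumulates around levers instead of scattering across disconnected models.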

This operating model feels more demanding at first because it forces you to connect action and outcome explicitly. Over time, it builds a coherent system where every experiment updates the same objective. Your uncertainty becomes structured and cumulative rather than scattered across disconnected models.

Prediction accuracy produces clean dashboards. Treatment effect estimation produces revenue movement.

Key takeaway: Shift your analytics from labeling users to estimating the causal impact of your actions. Define a single revenue outcome, list the levers you control, measure incremental lift for each lever, and allocate resources based on expected effect size and confidence intervals. Concentrating uncertainty around interventions creates a direct, measurable path from experiment design to financial results.

Back to the top ⬆️

Why Dynamic Allocation Works Better Than Fixed Horizon A/B Testing


Dynamic allocation changes the economics of experimentation. In a fixed horizon A/B test, you split traffic 50/50, wait for significance, then scale the winner. The structure feels clean. You get a tidy chart. Variant A beat Variant B. Everyone nods.

Meanwhile, half your users keep seeing the weaker experience while you wait.

Tobi describes that as an efficiency problem hiding in plain sight. If your job is to maximize LTV, revenue per user, or contribution margin, every unnecessary exposure to the weaker variant has a cost. You feel it in the weekly numbers, even if the slide deck looks disciplined.

Dynamic allocation, often implemented through contextual multi armed bandits, behaves differently:

  1. It starts with random assignment because you need unbiased signal.
  2. It reallocates traffic toward the better performing variant as data accumulates.
  3. It preserves controlled randomness so learning continues while optimization accelerates.

That system optimizes while it learns. It does not wait for a ceremonial end date.

> “You can learn and you can optimize, but these things trade off each other.”

Academic work, including research by Garivier and Kaufmann presented at NeurIPS, demonstrates that fixed time experimentation followed by scaling underperforms dynamic allocation. The compounding effect of reallocating traffic early drives higher cumulative reward. For a lifecycle team chasing LTV, that difference is material.
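A toy Thompson-sampling simulation makes the three properties above concrete: uniform priors mean assignment starts effectively random, traffic shifts toward the stronger arm as evidence accumulates, and posterior sampling preserves controlled randomness throughout. Conversion rates and traffic volume are hypothetical.

```python
import random

random.seed(42)

# Hypothetical true conversion rates for two variants.
TRUE_RATES = {"A": 0.05, "B": 0.07}

# Beta(1, 1) priors: assignment starts effectively random.
state = {arm: {"alpha": 1, "beta": 1} for arm in TRUE_RATES}
pulls = {arm: 0 for arm in TRUE_RATES}

for _ in range(20_000):
    # Thompson sampling: draw from each arm's posterior, play the max.
    sampled = {arm: random.betavariate(s["alpha"], s["beta"])
               for arm, s in state.items()}
    arm = max(sampled, key=sampled.get)
    pulls[arm] += 1
    # Simulated user response updates that arm's posterior.
    if random.random() < TRUE_RATES[arm]:
        state[arm]["alpha"] += 1
    else:
        state[arm]["beta"] += 1

share_b = pulls["B"] / sum(pulls.values())
print(f"traffic share to better variant B: {share_b:.0%}")
```

Compare that to a 50/50 fixed horizon test: every user held in variant A past the point where B is clearly stronger is cumulative reward left on the table.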

But dynamic allocation is not as simple to explain.

A comparison between A/B testing and Bandit algorithms, highlighting the trade-off between explainability and optimization in marketing strategies. The A/B testing section shows a simple bar chart indicating that option A is better than B, while the Bandit section features a dynamic graph representing higher potential returns but increased complexity.

On the left, you see the old world. A bar chart. A beat B. Easy to explain. The narrative fits neatly into a CFO update.

On the right, you see flows bending over time. Traffic shifts. Variants rise and fall dynamically. Harder to narrate. Higher total reward.

That tradeoff is psychological.

Large, sophisticated companies understand the math. They still choose fixed horizon tests because the story is easier to defend in a boardroom. A temporally separated “test phase” and “rollout phase” maps to how humans reason about cause and effect. Dynamic allocation requires comfort with probability distributions that update continuously. Many executives prefer a clean headline over a shifting curve.

Tobi sees this as the real boundary of decisioning science. Your model can be optimal. Your organization still needs to adopt it. If a bandit reallocates traffic in real time but leadership shuts it down because it feels opaque, the revenue gain never materializes. Optimization in code must pair with explainability in narrative.

If you own experimentation, your mandate includes both.

Key takeaway: Dynamic allocation works because it reallocates traffic toward higher performing variants as evidence accumulates, which increases cumulative revenue instead of freezing half your audience in a weaker experience. When you shift the conversation from “Did A beat B?” to “Did we maximize total reward while learning?”, you align optimization with executive decision making and unlock compounding gains.

Back to the top ⬆️

Agentic AI Risks

The Boomerang Effect and Why Uninformed AI Sabotages Early Results


Reinforcement learning systems in marketing often begin with random initialization. Teams define a small set of treatments, activate dynamic allocation, and expect performance to improve as data accumulates. Tobi focuses on the earlier step that most teams gloss over. He asks who defined the treatment universe and what causal evidence supports those choices.

In many organizations, that initial set comes from instinct and pattern recognition. A marketer proposes a handwritten birthday note for dog owners. Another suggests a grooming coupon. Someone else recommends a dog walker voucher because high value customers often engage in pet related activities. Each idea correlates with strong spenders. None of those correlations guarantee a positive causal effect.

When those treatments enter a reinforcement learning loop without causal priors, the system explores them at scale. Some ideas will backfire. A dog walker voucher can feel intrusive. A customer can interpret it as an assumption about their personal life. That customer may reduce spending or leave entirely. The algorithm detects the negative signal over time and shifts traffic away, but revenue absorbs the damage during the learning phase.

A graph titled 'The Boomerang Effect: Why Autonomous Systems Fail' showing ROI/Lift over time. The curve illustrates initial drop in ROI due to uninformed learning, followed by a recovery phase. Key points highlight random initialization, cost of learning, and the guardrail paradox.

The pattern shown above reflects a common trajectory:

  1. The system starts blind and distributes traffic across treatments without informed priors.
  2. ROI drops below baseline as uninformed decisions generate negative lift.
  3. The model reallocates traffic as feedback accumulates and performance recovers.

Tobi has observed this dynamic repeatedly in lifetime value programs. Early decisions depress LTV because the system lacks structured knowledge about which interventions are directionally sound. He describes this as the boomerang effect, where uninformed exploration produces a measurable dip before the model converges.

> “Your initial idea backfires causally. The system will learn, but in the meantime you have a lowering of LTV.”

The cold start problem compounds this risk. Reinforcement learning optimizes within the set of treatments you provide. If that set is narrow or poorly grounded, the system converges on the best option available inside that limited space. It cannot identify a superior intervention that was never proposed. Optimization amplifies the quality of your starting assumptions.

Tobi proposes a structural solution in the form of a Causal Customer Context Graph, something his colleague Anthony previously explored. This graph logs prior experiments with randomized assignment and explicit counterfactual outcomes. Each treatment is stored alongside the outcome of comparable customers who did not receive that intervention. Creative context, channel, timing, and other attributes are embedded so the system understands similarity across experiments. The graph encodes causal memory rather than raw correlation.

When you initialize reinforcement learning with that structured history, early allocation reflects informed priors instead of random guesses. Exploration still occurs. The system still balances learning and revenue. However, the starting point incorporates proven causal effects from related contexts. Revenue volatility decreases because the model no longer pays for ignorance with your top line.

Operationally, you can implement this in three concrete steps:

  1. Log every experiment with randomized control and treated groups, and store the counterfactual outcome alongside the observed outcome.
  2. Capture treatment metadata such as creative elements, offer type, channel, and timing so similarity between interventions is measurable.
  3. Initialize new decisioning systems with weighted priors derived from prior causal lift rather than equal traffic splits across all arms.

These steps compress the depth and duration of the negative ROI dip. You still explore. You still converge. You do so with a memory layer that encodes what has previously moved the needle.
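Step 3 can be sketched as seeding a bandit's Beta priors from the causal memory layer instead of starting every arm at Beta(1, 1). The ledger entries, treatment names, and the equivalent-sample-size weighting are all illustrative assumptions, not GrowthLoop's implementation.

```python
def priors_from_ledger(ledger, prior_strength=50):
    """Convert past measured conversion rates into Beta priors.

    prior_strength controls how many pseudo-observations each prior is
    worth; exploration still happens, just from an informed start."""
    priors = {}
    for arm, past_rate in ledger.items():
        if past_rate is None:
            # No causal history: fall back to an uninformed prior.
            priors[arm] = {"alpha": 1, "beta": 1}
        else:
            priors[arm] = {
                "alpha": 1 + past_rate * prior_strength,
                "beta": 1 + (1 - past_rate) * prior_strength,
            }
    return priors

# Hypothetical ledger: measured rates from prior randomized experiments.
ledger = {
    "grooming_coupon": 0.06,
    "birthday_note": 0.04,
    "dog_walker_voucher": None,  # never tested: stays fully exploratory
}
print(priors_from_ledger(ledger))
```

Arms with demonstrated lift start the learning phase ahead, while untested arms keep wide priors, which is exactly how informed initialization compresses the boomerang dip without shutting off exploration.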

Reinforcement learning systems can generate durable gains in LTV. They can also degrade performance during early learning if they start without causal structure. The boomerang effect reflects design choices, not randomness.

Key takeaway: Random initialization in reinforcement learning introduces a predictable window of negative ROI because early decisions lack causal grounding. Before activating dynamic allocation, build a causal memory layer that stores randomized experiment results with counterfactual comparisons and rich treatment context. Initialize traffic using priors derived from demonstrated causal lift. When early exploration is informed by structured causal history, you reduce revenue drawdowns and accelerate convergence toward interventions that meaningfully increase lifetime value.

Back to the top ⬆️

Escaping Local Maxima and The Failure of Randomly Initialized Decisioning


Reinforcement learning systems optimize exactly what you tell them to optimize. They begin with random initialization, distribute traffic across a fixed set of interventions, and then reallocate traffic toward the arm that shows the highest return. The mechanism works efficiently. The constraint sits in the boundaries you define.

> “It ultimately allocates the traffic to the intervention that maximizes returns, which is good, but by definition, that is a local optimum.”

The system evaluates a closed universe of treatments. It does not imagine new ones. If you define three arms such as a grooming discount, a handwritten birthday card, and a dog walker gift certificate, the algorithm will converge on the best performer among those three. It cannot evaluate a subscription bundle, a dynamic pricing model, or a loyalty tier unless you explicitly introduce them. Optimization always occurs inside the search space you construct.

We had a fantastic episode with Simon Lejeune, Chief Growth Officer at Wealthsimple, where he explains local maxima and calls understanding them one of the most important skills for a growth marketer.

3D graph depicting the concept of local maxima versus global maxima in the context of vanity metrics, with labeled issues and results related to optimization and system learning.

The small hill represents a local maximum driven by clicks and opens. On the right, the larger peak represents a global maximum tied to revenue and lifetime value. A reinforcement learning system often converges on the smaller hill because shallow metrics move quickly and provide fast feedback. Revenue and lifetime value evolve more slowly and require longer measurement windows. When you optimize for speed of signal rather than depth of value, the algorithm stabilizes on the closest measurable lift.

For our lifecycle friends: let’s say a team launches a contextual bandit to optimize email creative. Early data shows that subject line A lifts open rate by four percent. Traffic shifts heavily toward that variant. The dashboard stabilizes and variance drops. Internal stakeholders see consistency and assume progress. Meanwhile, the business metric that matters, incremental revenue per user, moves marginally. The system converged efficiently on engagement rather than durable value because engagement was the defined reward.

The local maximum problem rests on three inputs that you control:

  1. The candidate interventions you propose.
  2. The reward metric you optimize.
  3. The exploration budget you tolerate before concentrating traffic.

Each input narrows or expands the mountain range. A narrow intervention set produces shallow peaks. A vanity metric compresses the landscape into small hills. Limited exploration accelerates convergence before meaningful alternatives receive evaluation.

Tobi frames this as a structural trade-off between exploration and exploitation. Every reinforcement learning system must decide how aggressively to scale a winning arm while still reserving traffic to test alternatives. Full exploration across all possible interventions is computationally and operationally infeasible. Full exploitation on the first positive signal locks the system into a constrained optimum. Marketing leaders must decide where to sit on that spectrum.

You can diagnose whether your system is stuck by running a disciplined review:

  • List every active arm in production and assess whether each intervention meaningfully impacts revenue or lifetime value.
  • Trace the reward function to confirm whether the algorithm optimizes incremental business outcomes rather than proxy engagement metrics.
  • Measure the percentage of traffic reserved for exploration and evaluate whether new interventions enter the system on a regular cadence.

These steps expose whether your model climbs the tallest mountain available or stabilizes on the nearest measurable bump.

Key takeaway: Reinforcement learning converges on local maxima when you restrict the intervention set and optimize shallow metrics such as clicks or opens. Define your reward as incremental revenue or lifetime value, introduce materially new interventions on a fixed schedule, and protect exploration traffic before concentrating on a winner. When you expand the search space and anchor the objective to business value, the algorithm can pursue higher peaks rather than settling for the closest hill.

Back to the top ⬆️

Why Agentic AI Trained on Data Warehouse Correlations Reinforces Bias


Many teams now connect agentic AI directly to their data warehouse and expect intelligent decisions to follow. The reasoning feels efficient. The warehouse holds centralized data. The schemas look clean. The dashboards update in real time. The assumption is that an agent trained on this foundation will discover growth opportunities on its own.

Tobi views that assumption as risky because a warehouse reflects historical patterns, not verified causes. Your warehouse captures what happened under prior targeting rules, budget allocations, and creative strategies. It encodes associations that formed under those conditions. An agent trained on that dataset will internalize those associations and optimize around them.

Infographic explaining the 'Agentic' Trap in AI, highlighting the dangers of biased data in automated decision-making processes.

The warehouse feeds the agent. The agent drives marketing actions. Those actions generate new data. That data flows back into the warehouse, shaped by the agent’s previous decisions. The loop reinforces whatever bias existed in the starting data.

Tobi often uses a simple LTV example to make the risk concrete. Imagine your data shows that customers who view “Style X” tend to have higher lifetime value. An agent trained purely on correlation will promote Style X more aggressively. It will allocate impressions, email placements, and homepage real estate toward that product. The system sees a pattern and increases exposure.

The model does not evaluate whether viewing Style X caused higher lifetime value. It sees co-occurrence and treats it as signal. If high-intent buyers simply happened to browse that product first, the agent will still amplify it.

> “Agentic can accelerate the things that are good. It will also accelerate the things that are bad. And it doesn’t distinguish between the two if you don’t.”

That statement carries operational weight. Acceleration multiplies both productive and destructive decisions. A flawed correlational belief becomes a scaled policy. Over time, your warehouse fills with new data shaped by those amplified actions. The signal becomes self reinforcing. Revenue may rise in narrow pockets while customer quality erodes elsewhere.

Auditability adds another layer of concern. Reinforcement learning systems provide traceable probability distributions over actions and observable reward updates. You can inspect why a specific allocation occurred at a specific time. A free roaming agent trained on warehouse correlations offers far less structured transparency. When performance shifts, the reasoning path is harder to reconstruct. Leadership will ask for a defensible explanation.

Tobi argues that you need a causal layer between the warehouse and the agent. That layer records interventions as interventions and links them to measured effects. It distinguishes between:

  • Observed behavioral patterns.
  • Experimentally validated treatment effects.
  • Contextual variables that modify those effects.

You can build this in stages.

  1. Create a dedicated experiment ledger that logs treatment definitions, control definitions, and measured lift.
  2. Store estimated treatment effects in a structured table accessible to your decisioning systems.
  3. Connect your agent to this curated causal dataset before exposing it to the broader warehouse.
  4. Update the ledger continuously as new experiments conclude.
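The staged build above can be sketched as a small, queryable ledger. This is a minimal illustration using Python's stdlib `sqlite3`; the table name, columns, and the sample experiment are assumptions for the sketch, not a GrowthLoop product feature.

```python
import sqlite3

# Hypothetical schema for a causal experiment ledger: every row records an
# intervention as an intervention, linked to its control and measured effect.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE experiment_ledger (
        experiment_id TEXT PRIMARY KEY,
        treatment_def TEXT NOT NULL,   -- what the intervention was
        control_def   TEXT NOT NULL,   -- what the holdout received
        metric        TEXT NOT NULL,   -- outcome being measured
        lift          REAL NOT NULL,   -- estimated treatment effect
        ci_low        REAL,            -- confidence interval bounds
        ci_high       REAL,
        concluded_at  TEXT NOT NULL
    )
""")

# Step 4 in practice: append each concluded experiment to the ledger.
conn.execute(
    "INSERT INTO experiment_ledger VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("exp_001", "Style X hero placement", "default homepage",
     "90d_ltv", 0.04, -0.01, 0.09, "2024-06-01"),
)

# A decisioning system queries validated effects before acting. Here the
# interval crosses zero, so the agent has no licensed lever to scale.
validated = conn.execute(
    "SELECT experiment_id, lift FROM experiment_ledger WHERE ci_low > 0"
).fetchall()
print(validated)  # -> [] : no experimentally validated lift yet
```

The point of the structure is the query at the end: the agent consumes only effects that cleared experimental validation, rather than every correlation the warehouse happens to contain.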

That sequence grounds the agent in evidence rather than historical coincidence. The agent can still explore and adapt. It does so within boundaries shaped by causal validation instead of inherited bias.

Key takeaway: Build a causal experiment ledger before connecting agentic AI to your warehouse. Log every intervention, its control condition, and its measured effect in a structured, queryable table. Feed that curated causal dataset into your decisioning system first. Then expand to broader behavioral data. This ordering keeps your agent anchored to validated lift, preserves auditability, and reduces the risk of scaling self-reinforcing bias across your revenue engine.

Back to the top ⬆️

The Power of Composable Decisioning

An illustration featuring a central black computer tower surrounded by various vintage and modern computer components, including monitors, keyboards, and processors, set against a vibrant yellow background.

Most decisioning systems still operate on a batch rhythm. Data leaves the warehouse. A model trains in a separate environment. Scores return on a schedule. Marketing teams execute journeys in parallel and hope the model still reflects current behavior. That architecture creates temporal drift and organizational friction.

Tobi argues for collapsing that separation. He frames journeys as causal language. Each step in a journey changes the probability of the next outcome. Learning therefore belongs inside the same environment where journeys execute. When learning and execution share infrastructure, the system updates in the same moment a customer moves.

An infographic illustrating composable causal decisioning, featuring a data warehouse, a cycle labeled 'Learning & Decisioning', and key points about no off-prem training, no black box, and real-time learning.

A data warehouse forms the outer container. Inside it sits a circular loop of learning and decisioning:

  • No off-prem training.
  • No separate model brain detached from the data.
  • Real-time learning as customers experience the journey.

Below that sits the legacy pattern, where data exports to a model training environment and results flow back. Two separate systems coordinate through handoffs. Each handoff introduces latency and misalignment.

Composable decisioning embeds the policy directly inside warehouse infrastructure and existing orchestration tools. Training and traffic allocation run in parallel. Every exposure feeds immediate feedback into the model. Each decision becomes training data for the next decision. The system learns in motion.

Tobi’s perspective carries weight because he has built AI products in isolation before. He has raised capital, hired strong teams, and constructed standalone tools. He now favors retrofitting intelligence into products marketers already use. Marketers think in journeys and orchestration flows. They manage real customers under time pressure. Embedding reinforcement learning into those flows respects how work actually happens.

The architectural shift produces concrete effects:

  • The model updates as soon as outcomes register.
  • Decision logic remains co located with raw behavioral data.
  • Governance improves because data stays inside enterprise boundaries.
  • Context remains intact because features and actions share the same runtime.

That structure addresses the decoupled learning problem at its root. In a traditional setup, the learning pipeline optimizes on historical snapshots while the execution engine operates in the present. In a composable setup, the same loop governs both. The model observes, allocates, measures, and updates within a single environment.

> “Let’s try to get to this parallelism of training and traffic allocation all happens seamlessly in the same breath. Essentially every person who is exposed to this decisioning system will lend intelligence to the system in real time.”

Parallelism changes how you think about optimization. Instead of waiting for retraining cycles, you design decision points inside your journey builder. The reinforcement learning policy allocates traffic dynamically. The warehouse captures outcomes instantly. The policy updates continuously. Marketing becomes a living experiment rather than a sequence of static tests.

Teams running lifecycle or CRM programs can apply this logic immediately. Identify high leverage decision points inside your journeys. Instrument those decisions inside the warehouse. Embed a reinforcement learning policy that updates with each exposure. Measure outcomes in real time and feed them back into the same loop. Continuous learning replaces scheduled retraining.
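The loop described above, allocate, observe, update with each exposure, is the shape of a Thompson-sampling bandit over journey variants. The sketch below is a self-contained simulation; the variant names and their "true" conversion rates are invented for illustration, and in a composable setup this policy state would live in warehouse tables rather than Python dicts.

```python
import random

random.seed(0)

# Hypothetical journey decision point with two message variants. The true
# rates are unknown to the policy; they exist only to simulate outcomes.
TRUE_RATES = {"discount_email": 0.05, "loyalty_email": 0.08}

# One Beta(alpha, beta) posterior per variant: starts uniform, updates
# with every single exposure rather than on a retraining schedule.
posterior = {v: [1, 1] for v in TRUE_RATES}

def choose_variant() -> str:
    # Thompson sampling: draw from each posterior, act on the best draw.
    # This explores and exploits in the same step.
    samples = {v: random.betavariate(a, b) for v, (a, b) in posterior.items()}
    return max(samples, key=samples.get)

for _ in range(5_000):
    variant = choose_variant()                         # decide
    converted = random.random() < TRUE_RATES[variant]  # observe outcome
    a, b = posterior[variant]                          # update in-loop
    posterior[variant] = [a + converted, b + (not converted)]

# Traffic concentrates on the stronger variant while exploration continues.
exposures = {v: a + b - 2 for v, (a, b) in posterior.items()}
print(exposures)
```

Note how training and traffic allocation are literally the same loop: each exposure is both a decision and a posterior update, which is the parallelism Tobi describes.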

Key takeaway: Keep training and decisioning in the same runtime as your data and journeys. Embed reinforcement learning inside your warehouse infrastructure so that every customer interaction updates the policy immediately. Co location of data, decisions, and feedback reduces drift, improves governance, and compounds performance over time.

Back to the top ⬆️

How Machine Decisioning Transcends Marketing

Illustration of a robotic head displaying intricate wiring and a visible human brain against a yellow background.

Human decision making carries structural bias because cognition operates through shortcuts, emotion, and context. Tobias anchors this in behavioral economics and the work of Daniel Kahneman, who demonstrated that humans deviate from rational models in predictable ways. Mood influences judgment. Timing influences evaluation. Social cues influence perceived competence. A hiring manager in a positive emotional state may rate a marginal candidate higher. A judge under cognitive fatigue may assign harsher penalties. These patterns replicate across thousands of micro decisions and accumulate into systemic effects.

Tobias frames the core issue as auditability. Human reasoning leaves no durable ledger. When a decision is made, the brain does not record the weighting of variables, the confidence interval around the choice, or the contextual signals that shaped the conclusion. Memory stores a narrative. Narrative rarely captures variance. Organizations then inherit decisions without a clear record of how they were produced.

The visual metaphor Tobias uses captures this structural divide. 

Infographic titled 'Beyond Marketing: Bias, Fairness, and Auditability' illustrating the contrast between human intuition and machine decision-making in optimizing outcomes, with key points about biases and accountability.

One side of a scale holds the dark cloud of human intuition, optimizing for who. The other side holds machine decisioning, optimizing for what. The distinction maps to real behavior. Humans frequently optimize around identity, pedigree, similarity, and narrative coherence. Machines can be designed to optimize around defined outcomes and logged variables. When the objective function is explicit and the data inputs are stored, the reasoning process becomes inspectable.

> “You can go to these systems and say, what did you know at the point you made the decision, and why did you make it, with numbers.”

That capability changes governance. When a model recommends a treatment plan, you can retrieve the features it used and the probabilities it assigned. When a hiring algorithm ranks candidates, you can examine score distributions across demographic groups. When a sentencing model produces risk estimates, you can measure disparate impact across race and income. Bias becomes observable through data rather than inferred from anecdote.

High stakes domains amplify the importance of this structure:

  • In healthcare, treatment allocation models can log inputs, predicted risk, and outcome targets. That record supports periodic fairness audits.
  • In hiring, candidate scoring systems can store feature weights and evaluation thresholds. That record supports bias analysis and model recalibration.
  • In criminal justice, risk assessment tools can archive probability distributions and decision rules. That record supports external review and public accountability.

Delayed feedback complicates design. Recidivism rates and long term health outcomes unfold over years. Reinforcement learning in these environments must account for sparse and unstable reward signals. Tobias views this complexity as a reason to strengthen audit mechanisms rather than weaken them. Logged objectives and stored decision states create institutional memory that persists beyond individual actors.

You can operationalize this mindset in your own systems. Start with governance, not hype. Define explicit objectives. Log model state at inference time. Review outcomes on a fixed cadence.

A practical sequence looks like this:

  1. Define the measurable outcome that guides decisions, such as retention, survival rate, promotion rate, or recidivism.
  2. Log every automated decision with its input features, model parameters, and predicted probabilities at the moment of inference.
  3. Conduct periodic bias audits across protected attributes and compare error rates and outcome distributions.
  4. Adjust objective functions and constraints when disparities exceed defined thresholds.
  5. Document changes so that governance becomes part of organizational infrastructure.
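The five steps above can be sketched as a minimal decision log plus a scheduled audit. Everything here, the record fields, the feature names, the two groups, and the 0.5 threshold, is a hypothetical illustration of the pattern, not a reference implementation for any real domain.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical decision record: everything the system knew at inference time.
@dataclass
class DecisionRecord:
    features: dict       # model inputs at the moment of inference
    model_version: str   # which parameters produced the decision
    probability: float   # predicted probability behind the action
    action: str          # what the system actually did
    group: str           # protected attribute, used only for auditing
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

log: list[DecisionRecord] = []

def decide(features: dict, probability: float, group: str) -> str:
    # Step 2: log the automated decision with its inputs and probability.
    action = "approve" if probability >= 0.5 else "decline"
    log.append(DecisionRecord(features, "v1.2", probability, action, group))
    return action

# Simulated decisions across two demographic groups.
decide({"tenure": 4}, 0.72, "group_a")
decide({"tenure": 1}, 0.41, "group_a")
decide({"tenure": 3}, 0.44, "group_b")
decide({"tenure": 5}, 0.46, "group_b")

# Step 3: periodic audit comparing outcome distributions across groups.
def approval_rate(group: str) -> float:
    records = [r for r in log if r.group == group]
    return sum(r.action == "approve" for r in records) / len(records)

gap = abs(approval_rate("group_a") - approval_rate("group_b"))
print(f"approval-rate gap: {gap:.2f}")
```

A gap above a defined threshold would trigger steps 4 and 5: adjusting the objective or constraints and documenting the change. The key property is that the audit runs on logged decision state, not on reconstructed narratives.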

Machine decisioning introduces accountability through structure. Logged data, explicit objectives, and inspectable probabilities create a framework where bias can be measured and managed. Human intuition remains valuable in framing objectives and interpreting results, but system level decisions benefit from auditable logic.

Key takeaway: Define explicit outcome targets, log model inputs and probabilities at every decision point, and review performance across demographic groups on a scheduled cadence; auditable machine decisioning converts bias from an invisible cognitive habit into a measurable system variable that can be governed and improved over time.

Back to the top ⬆️

Why Clear Priority Hierarchies Improve Executive Decision Making

A family of four standing in a field, looking up at a large, bright yellow clock amidst a colorful sky, with a small house in the background.

Energy is a finite resource, and senior operators burn through it faster than they realize. Tobi spends his days building reinforcement learning systems, debating causal inference, and designing machine decisioning frameworks that optimize outcomes at scale. When the conversation turns to his personal system for deciding what deserves his energy, the answer is immediate.

> “My girls always deserve my attention. If there’s any question there, it always goes in favor of the girls.”

That statement functions as a hard constraint in his life. It sits above product roadmaps, board meetings, and AI research. It eliminates ambiguity before it appears. Many leaders talk about balance, yet they treat priorities as fluid. Tobi treats them as ordered. When two commitments collide, the hierarchy resolves the conflict without drama.

He frames the rest of his work through a pragmatic lens. He thinks deeply about causal models, marketing effectiveness, and civilizational progress. He also recognizes the scope of his domain. “We do marketing for the internet,” he says. That sentence reduces ego and sharpens perspective. Marketing can create value, fund teams, and improve customer experience. It carries bounded risk and bounded consequence. That framing regulates stress and clarifies trade-offs.

His professional focus on bias and noise shows up at home in subtle ways. He studies how human decision making drifts from rationality. He designs systems that log decisions, audit assumptions, and reduce variance. Then he watches his children develop preferences, reactions, and early heuristics in real time. He sees how quickly patterns form. He sees how easily adults cement them. That observation reinforces his belief that humans carry structural bias. It also grounds him. Family becomes the context that keeps his ambition proportional.

If you examine his system, you can break it into practical components:

  1. Establish a non negotiable priority at the top of your hierarchy.
  2. Define a moral baseline for your work, such as do no harm and net positive contribution.
  3. Assess the real stakes of your domain before amplifying stress.
  4. Accept that optimization at work exists within a broader life context.

These rules create psychological margin. When your hierarchy is explicit, you stop renegotiating it every week. When your moral baseline is defined, you stop chasing every high status opportunity. When you calibrate stakes accurately, you reduce artificial urgency.

Many high achievers live in a state of constant escalation. Every message feels critical. Every missed opportunity feels irreversible. A clear hierarchy lowers that internal noise. It turns energy allocation into a structured decision rather than an emotional reaction.

Key takeaway: Write down your personal priority stack in explicit order and treat the top item as a fixed constraint. Define a simple moral filter for your work and evaluate projects against it before committing. Calibrate your stress to the true stakes of your domain instead of the perceived urgency around you. Clear hierarchies reduce decision fatigue, protect your energy, and keep ambition aligned with what matters most.

Back to the top ⬆️

Episode Recap

Cover of 'Humans of MarTech' featuring a man with glasses and a blue shirt, surrounded by vibrant, colorful illustrations of nature and lightning.

Tobi came in swinging at something most of us have quietly accepted. Marketing teams obsess over prediction. Who will churn. Who will become high LTV. We celebrate model accuracy, then sit there unsure what to actually do with it. He has built those systems himself. They were precise. They were impressive. They still left marketers staring at a dashboard asking what now.

Marketing should be about intervention. You are not paid to forecast the status quo. You are paid to change it. Correlation tells you what high value customers tend to do. Causality tells you what happens when you push a lever. If your best customers all looked at a certain product, forcing that product on everyone else might tank performance. Patterns are not prescriptions.

That logic carried into experimentation. Fixed A/B tests feel safe and clean. You split traffic, wait, declare a winner. Dynamic allocation does something different. It shifts traffic toward what works in real time while still exploring. You learn and optimize at the same time. The math favors it. The politics often do not. Executives understand 50/50 splits. Fewer understand bandits.

He also warned about the boomerang effect. Self learning systems start blind. If your initial ideas are weak or intrusive, revenue drops before the model corrects itself. Reinforcement learning can also get stuck in local maxima, doubling down on something decent while a better option sits undiscovered. Exploration and optimization always fight each other.

That tension explains his skepticism about unleashing agentic AI on a warehouse full of correlations. An autonomous system will happily scale whatever looks strong in historical data, even if it is causally wrong. He wants a causal memory layer, a structured record of what interventions truly moved outcomes, so systems start informed instead of guessing.

Underneath all of it sits a bigger belief. Machine decisioning can be audited. Human decisioning cannot. If you can inspect the probabilities and reward estimates behind a choice, you can challenge it. You can improve it. Prediction describes the world. Causal systems try to move it. That is the line he keeps drawing.

Listen to the full episode ⬇️ or Back to the top ⬆️


Follow Tobi 👇

✌️


Intro music by Wowa via Unminus
Cover art created with Midjourney (check out how)
