
Your ML telemetry should live in your data stack, not ours

Every ML observability tool eventually ships an export feature. That is a sign the architecture was wrong from the start.


Your classifier's accuracy dropped last Tuesday. To figure out why, you need data from:

  • Inference telemetry (confidence, latency, drift) in your observability tool
  • Ground truth labels in your label store or data warehouse
  • Training dataset metadata in your feature store or data version control
  • Business impact in your CRM or analytics database

To answer "why did accuracy drop" you need a JOIN:

```sql
-- Is model degradation affecting conversion?
SELECT
    date_trunc('day', e.event_timestamp) AS day,
    e.model_name,
    avg(e.confidence) AS avg_confidence,
    percentile_cont(0.95) WITHIN GROUP (
        ORDER BY e.duration_ms) AS p95_latency_ms,
    avg(o.converted::int) AS conversion_rate,
    count(*) AS inference_count
FROM wildedge.raw_events e
JOIN warehouse.user_outcomes o
    ON e.session_id = o.session_id
WHERE e.event_type = 'inference'
  AND e.event_day >= current_date - interval '30 days'
GROUP BY 1, 2
ORDER BY 1 DESC, 2;
```

No ML observability vendor offers this today. Their storage architecture prevents it.

So you build a nightly export job: pull telemetry via REST API, dump to object storage, load into Snowflake. Now you have a fragile pipeline that runs at midnight, lags 24 hours, breaks on schema changes, and has to be rebuilt every time you switch tools. This is the export trap.

The right architecture is not an export

Your ML telemetry should be queryable by your existing data stack, with access controls as the only barrier.

That means storing it in a format that Snowflake, Spark, Trino, dbt, and Athena all speak natively. In 2026, that format is Apache Iceberg.

This is what WildEdge does. Every event flows through Raindrop, our ingestion and query engine built on open standards, and lands in a flat Iceberg table on object storage. One row per event. No proprietary encoding.

Partition design

In a minimal setup the table is partitioned by two fields:

project_id  (identity transform)
event_day   (day transform on event_timestamp)

The identity partition on project_id means each customer's data lives at a deterministic object store prefix. Isolation is structural, not a WHERE clause. No query can accidentally touch another tenant's files.

The day partition on event_timestamp is where query performance comes from. Time-range queries skip entire days of Parquet files at the metadata level, before any data is read. Compaction keeps files well-sized within each partition, so a query scoped to a specific model run spanning a few hours reads only a small fraction of that partition.
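To make the pruning idea concrete, here is a minimal Python sketch of how an engine can select files from partition metadata alone. It is an illustration, not how Raindrop or any particular query engine implements planning: the `DataFile` class, the `prune` function, the tenant names, and the Hive-style path layout are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataFile:
    """One Parquet file as recorded in table metadata (simplified, hypothetical)."""
    path: str
    project_id: str  # identity partition value
    event_day: date  # day partition value derived from event_timestamp


def prune(files, project_id, start, end):
    """Keep only files whose partition values can match the filter.

    Only metadata is inspected here; no Parquet file is opened.
    """
    return [
        f for f in files
        if f.project_id == project_id and start <= f.event_day <= end
    ]


# Hypothetical manifest: two tenants, three days each, one file per partition.
manifest = [
    DataFile(f"data/project_id={p}/event_day={d.isoformat()}/part-0.parquet", p, d)
    for p in ("acme", "globex")
    for d in (date(2026, 1, 1), date(2026, 1, 2), date(2026, 1, 3))
]

selected = prune(manifest, "acme", date(2026, 1, 2), date(2026, 1, 3))
for f in selected:
    print(f.path)
print(f"{len(selected)} of {len(manifest)} files scanned")
```

The tenant filter and the date filter are both answered from partition values, so files belonging to other projects or other days are never opened.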

In practice

Once you have access, the integration is whatever your team already knows:

dbt source:

```yaml
sources:
  - name: wildedge
    database: your_warehouse
    schema: wildedge
    tables:
      - name: raw_events
```

Then join it with your labels table, your business metrics, your feature store snapshots. No export job. No pipeline to maintain. No 24-hour lag.

Snowflake, Spark, Trino, and Athena read Iceberg natively. The catalog metadata tells them which files are relevant, so they never touch files outside your partition.

For DataFrame-style access in Python, Daft reads Iceberg natively.

On ClickHouse

ClickHouse is the standard recommendation for high-volume telemetry, and it earns that reputation. It handles millions of rows per second on ingest, delivers sub-100ms queries on billion-row event streams, and the OpenTelemetry community treats it as first-class.

The problem is data gravity. Your ground truth labels, business outcomes, and feature distributions live in your warehouse: Snowflake, BigQuery, Redshift. ClickHouse is a separate system. Running the join above requires ETL in one direction or the other. You have recreated the export trap with a faster database at the center.

The cost gap is real. Rough figures for a few hundred million inferences per month, compressing to around 15–20 GB on S3:

| Option | Monthly cost | Native join with your warehouse? |
| --- | --- | --- |
| ClickHouse Cloud (prod tier) | ~$225/mo (dedicated compute + storage) | No, ETL required |
| Iceberg on S3 | ~$0.50–15/mo storage + existing warehouse compute | Yes |

The ClickHouse figure is a floor, not a per-query estimate. Production tier requires always-on compute: you pay whether you run one query a day or a thousand. The Iceberg figure is almost entirely S3 storage at $0.023/GB/month. Query cost depends on what you already run: managed engines like Athena or BigQuery charge per TB scanned, but partition pruning means most queries touch 1–5% of total data; self-hosted Trino or Spark add nothing on top of existing compute.
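The storage arithmetic can be checked in a few lines of Python. This is a rough sketch using only the figures quoted above (15–20 GB, $0.023/GB/month, 1–5% of data scanned per query); the function names are made up for the example, and request charges and per-TB query pricing are deliberately not modeled.

```python
S3_PRICE_PER_GB_MONTH = 0.023  # USD, the storage price quoted above


def monthly_storage_cost(size_gb):
    """Storage-only cost in USD per month (no request or query charges)."""
    return size_gb * S3_PRICE_PER_GB_MONTH


def scanned_gb(size_gb, fraction):
    """Data read by a query that touches the given fraction of the table."""
    return size_gb * fraction


low_gb, high_gb = 15, 20
storage_low = monthly_storage_cost(low_gb)
storage_high = monthly_storage_cost(high_gb)
print(f"storage: ${storage_low:.3f} to ${storage_high:.3f} per month")

scan_low = scanned_gb(high_gb, 0.01)
scan_high = scanned_gb(high_gb, 0.05)
print(f"a pruned query scans {scan_low:.1f} to {scan_high:.1f} GB of {high_gb} GB")
```

Storage alone comes to roughly $0.35 to $0.46 per month, at the low end of the range in the table. A pruned query over the 20 GB table reads about 0.2 to 1 GB.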

Iceberg files are S3 objects. Snowflake, Athena, Spark, and Trino read them without moving data. The join happens in your warehouse, alongside the labels and outcomes it needs.

For a standalone telemetry dashboard, ClickHouse is faster. For an ML development loop where inference telemetry needs to meet labels, outcomes, and feature distributions in one query, the bottleneck is never query speed. It is the pipeline you need to build before the join is even possible.

The observe-to-retrain gap

The ML development loop is: train, deploy, observe, retrain. The observe-to-retrain step is where most teams lose days or weeks. You see a confidence drop in your observability tool, then manually extract the problematic examples, then figure out which training slice they came from, then rebuild a dataset.

When your telemetry is in your data stack, that loop collapses. Your data team can write a dbt model that joins WildEdge telemetry with your label store and surfaces retraining candidates automatically. Your feature store team can track input distribution shift against the features that were used at training time. Your ML platform team can trigger retraining pipelines directly off the Iceberg table without any intermediate API.
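As a rough illustration of the "surface retraining candidates" step, the sketch below joins inference events to labels on session_id and flags rows that were misclassified or low-confidence. In practice this would more likely be a dbt model in SQL. The `prediction` and `label` columns, the 0.6 threshold, the sample rows, and the function name are all assumptions made for the example, not part of the WildEdge schema.

```python
def retraining_candidates(events, labels, min_confidence=0.6):
    """Join events to labels on session_id; keep wrong or low-confidence rows."""
    label_by_session = {row["session_id"]: row["label"] for row in labels}
    candidates = []
    for e in events:
        label = label_by_session.get(e["session_id"])
        if label is None:
            continue  # no ground truth yet, so nothing to compare against
        wrong = e["prediction"] != label
        unsure = e["confidence"] < min_confidence
        if wrong or unsure:
            candidates.append({**e, "label": label, "wrong": wrong})
    return candidates


# Hypothetical sample data standing in for the telemetry and label tables.
events = [
    {"session_id": "s1", "model_name": "clf-v3", "prediction": "spam", "confidence": 0.97},
    {"session_id": "s2", "model_name": "clf-v3", "prediction": "ham", "confidence": 0.52},
    {"session_id": "s3", "model_name": "clf-v3", "prediction": "spam", "confidence": 0.88},
    {"session_id": "s4", "model_name": "clf-v3", "prediction": "ham", "confidence": 0.91},
]
labels = [
    {"session_id": "s1", "label": "spam"},
    {"session_id": "s2", "label": "ham"},
    {"session_id": "s3", "label": "ham"},
]

for row in retraining_candidates(events, labels):
    print(row["session_id"], "wrong" if row["wrong"] else "low confidence")
```

Events without a label are skipped rather than guessed at, which keeps the candidate set limited to rows where ground truth exists.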

None of that is possible when your telemetry is behind a vendor API with rate limits and a nightly export.

The tradeoff is setup, not operations

Iceberg on object storage takes more work up front than pointing a pipeline at a managed database: you need an object store bucket and a catalog. You do that once.

What you get permanently: inference telemetry versioned alongside your data, queryable by tools your team already runs, joinable with everything else you own. No export jobs. No 24-hour lag. No "contact us for data access."


More posts on the way covering how Raindrop works under the hood: partition design, compaction, schema evolution. Follow along or start a discussion on GitHub.
