    Data Architecture · AI Strategy

    Data Architecture for AI Readiness: What Most Roadmaps Get Wrong

    10 March 2026 · 8–10 min read

    What AI Actually Needs From Your Data

    AI models — whether large language models, predictive systems, or automation agents — are only as reliable as the data they are given access to. But the failure mode is almost never a lack of data. It is the wrong kind of data, in the wrong form, with insufficient clarity about what it represents.

    The four properties that matter most for AI-ready data architecture are:

    1. Lineage — can you trace where a data point came from, what transformations it passed through, and when it was last validated?
    2. Consistency — does the same concept mean the same thing across systems? Is "customer" defined identically in your CRM, your billing platform, and your data warehouse?
    3. Timeliness — does the data arrive when the AI system needs it, at the latency the use case requires?
    4. Access control — can the right systems read the right data, and only that data, without manual intervention each time?

    Most organisations score reasonably on timeliness and access control once they have a modern data platform. They almost universally underestimate the difficulty of lineage and consistency.
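
    The lineage property above can be made concrete as metadata carried alongside the data itself. The sketch below is illustrative, not a standard schema: the field names and the staleness rule are assumptions, chosen to show the shape of the idea.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class LineageRecord:
    """Provenance metadata attached to a dataset: where it came from,
    what happened to it, and when it was last validated."""
    source_system: str                                    # originating system
    transformations: list = field(default_factory=list)   # ordered pipeline steps
    last_validated: datetime = None                       # when quality checks last passed

    def is_stale(self, max_age: timedelta) -> bool:
        """True if the data has not been validated within max_age."""
        if self.last_validated is None:
            return True
        return datetime.now(timezone.utc) - self.last_validated > max_age

# A record validated two days ago fails a one-day freshness budget.
rec = LineageRecord(
    source_system="crm",
    transformations=["dedupe", "currency_normalise"],
    last_validated=datetime.now(timezone.utc) - timedelta(days=2),
)
print(rec.is_stale(timedelta(days=1)))
```

    The point of the sketch is that lineage is answerable by machine only if it is recorded at write time; it cannot be reconstructed reliably after the fact.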

    The Semantic Layer Problem

    The hardest architectural challenge in enterprise AI is not storage or compute. It is semantics.

    When a business analyst asks a question in natural language — "show me our top customers by lifetime value" — there are at least half a dozen implicit assumptions embedded in that question. What counts as a customer? Is a parent company one customer or many? What time window defines lifetime? Is value revenue, margin, or something else?

    A human analyst knows to ask these questions. An AI system will answer them silently, using whatever the underlying data model implies. If the data model is inconsistent across domains — and in most large organisations, it is — the AI will produce answers that are internally coherent but practically misleading.

    Resolving this requires a semantic layer: a governed vocabulary that maps business concepts to data definitions, maintained by people who understand both the business and the data. This is not a technology project. It is an ongoing organisational practice.
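
    In its simplest form, a semantic layer is a governed lookup from business concept to definition. The sketch below is a toy illustration, assuming a dictionary-backed registry; the concept names, SQL fragments, and owner labels are all invented for the example. The important behaviour is the failure mode: an undefined concept raises an error rather than being answered silently.

```python
# A minimal semantic layer: business concepts mapped to governed definitions.
SEMANTIC_LAYER = {
    "customer": {
        "definition": "A billing account with at least one paid invoice",
        "sql": "SELECT account_id FROM billing.accounts WHERE paid_invoices > 0",
        "owner": "finance-data-team",
    },
    "lifetime_value": {
        "definition": "Total recognised revenue per customer, all time, ex-VAT",
        "sql": "SUM(invoice_net_amount)",
        "owner": "finance-data-team",
    },
}

def resolve(concept: str) -> dict:
    """Look up a governed definition; fail loudly rather than guess."""
    try:
        return SEMANTIC_LAYER[concept]
    except KeyError:
        raise KeyError(
            f"No governed definition for {concept!r}; "
            "escalate to the concept's governance owner"
        ) from None

print(resolve("lifetime_value")["definition"])
```

    An AI system routed through such a layer inherits the organisation's agreed definitions instead of improvising its own.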

    Architectural Patterns That Enable AI

    Several architectural patterns disproportionately support AI use cases:

    Event streaming over batch ETL

    AI applications increasingly require near-real-time data. Organisations still running nightly batch pipelines will find a growing class of use cases inaccessible. Moving towards event-driven architecture is a multi-year investment with compounding returns.
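
    The latency difference is easiest to see side by side. This is a deliberately simplified in-memory sketch, not a streaming framework: the event shape is invented, and real systems would add durability, ordering, and replay. The contrast is the point — the batch function reprocesses the full history on a schedule, while the consumer's state is current the moment each event lands.

```python
def batch_total(events: list[dict]) -> float:
    """Nightly batch: recompute from the full history every run."""
    return sum(e["amount"] for e in events)

class StreamingTotal:
    """Event-driven: state updates incrementally per event, so the
    figure is available immediately rather than the next morning."""
    def __init__(self):
        self.total = 0.0

    def on_event(self, event: dict) -> None:
        self.total += event["amount"]

events = [{"amount": 120.0}, {"amount": 80.0}, {"amount": 50.0}]
stream = StreamingTotal()
for e in events:
    stream.on_event(e)

# Same answer, different latency characteristics.
print(stream.total, batch_total(events))
```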

    Feature stores

    A feature store is a centralised repository for the derived data features that power ML models — things like rolling averages, entity embeddings, and aggregated signals. Without a feature store, every team building a model recreates the same transformations independently, with no consistency guarantees and significant duplication of effort.
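
    The consistency guarantee can be sketched with a toy registry: a transformation is defined once, under a name, and every team retrieves the same logic rather than re-deriving it. The feature name and window size here are illustrative; production feature stores also handle versioning, backfills, and online/offline serving, which this sketch omits.

```python
from statistics import mean

_REGISTRY: dict = {}

def feature(name: str):
    """Decorator that registers a feature transformation under a name."""
    def register(fn):
        _REGISTRY[name] = fn
        return fn
    return register

@feature("rolling_avg_3")
def rolling_avg_3(values: list[float]) -> float:
    """Mean of the last three observations."""
    return mean(values[-3:])

def get_feature(name: str):
    """Every consumer resolves the same registered definition."""
    return _REGISTRY[name]

# Two teams asking for "rolling_avg_3" are guaranteed identical logic.
print(get_feature("rolling_avg_3")([10.0, 20.0, 30.0, 40.0]))
```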

    Data contracts

    A data contract is a formal agreement between a data producer and a data consumer about the structure, semantics, and quality standards of a data product. Contracts make the implicit explicit, and create a basis for automated validation. They are one of the most practical interventions available for improving data quality at scale.
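
    Automated validation against a contract can be as simple as checking each record before it reaches consumers. The sketch below assumes a hand-rolled contract format — the field names, types, and rules are invented for illustration; real deployments typically use a schema language such as JSON Schema or Avro rather than this toy dictionary.

```python
# A minimal data contract: the producer publishes the rules, and a
# validation step enforces them before data reaches consumers.
CONTRACT = {
    "order_id": {"type": str, "required": True},
    "amount":   {"type": float, "required": True, "min": 0.0},
    "currency": {"type": str, "required": True, "allowed": {"GBP", "EUR", "USD"}},
}

def validate(record: dict, contract: dict) -> list[str]:
    """Return a list of contract violations; an empty list means the record passes."""
    errors = []
    for name, rules in contract.items():
        if name not in record:
            if rules.get("required"):
                errors.append(f"missing required field: {name}")
            continue
        value = record[name]
        if not isinstance(value, rules["type"]):
            errors.append(f"{name}: expected {rules['type'].__name__}")
            continue
        if "min" in rules and value < rules["min"]:
            errors.append(f"{name}: below minimum {rules['min']}")
        if "allowed" in rules and value not in rules["allowed"]:
            errors.append(f"{name}: not in allowed set")
    return errors

print(validate({"order_id": "A-1", "amount": 19.99, "currency": "GBP"}, CONTRACT))  # []
```

    Because violations are returned as data rather than raised immediately, the same check can gate a pipeline, feed a quality dashboard, or notify the producing team.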

    The Roadmap Mistake

    Most data strategy roadmaps sequence work in the wrong order: migrate to the cloud, consolidate on a single platform, then enable AI. The AI capability is treated as the reward for getting the data house in order.

    The problem is that "getting the data house in order" is an infinite project. There is always another system to integrate, another data quality issue to resolve, another governance framework to implement.

    A more effective approach is to select two or three AI use cases that are genuinely valuable and work backwards from their data requirements. This forces concrete decisions — which systems need to be integrated, which concepts need to be defined, which pipelines need to run at what latency — and creates a tractable scope of work.

    It also creates something a generic data strategy rarely produces: a business outcome against which progress can be measured.