Data Strategy & AI Readiness.
Thursday, 11 June 2026

Data Strategy, Not Models, Is Driving Today’s AI Winners

A slew of new reports and industry moves this week all point to one conclusion: the key to unlocking AI’s value lies in data. From infrastructure and quality to governance, unique data assets and new tools, organizations that get their data strategy right are pulling ahead in the AI race.

Data Infrastructure: The AI Leader’s Edge

Recent findings underscore how critical a strong data foundation is for AI success. A new survey found that only **6% of enterprise AI leaders feel their data infrastructure is fully ready for AI** ([1]). This readiness gap – now seen as one of the biggest constraints on AI progress – highlights a simple truth: the companies reaping real benefits from AI are not necessarily those with the most advanced algorithms, but those with the best data foundations. As one data CEO put it, "today, AI is constrained by data" ([2]) rather than by model ingenuity.

The divide between AI leaders and laggards increasingly comes down to how they manage data. High-performing organizations almost always invest early in modern data architectures – from cloud data lakehouses to unified data warehouses – to ensure data is accessible, integrated and trustworthy. The **contrast is striking: 60% of companies at the highest level of AI maturity have built advanced data infrastructure, while 53% of those struggling with AI cite immature, fragmented data systems as a major hindrance** ([3]). This gap is costing businesses time, money and competitive advantage ([4]), as AI initiatives stall without the fuel of reliable data.

Accordingly, we’re seeing a strategic shift in IT investment. Organizations are recognizing that improving data pipelines and platforms yields greater returns than obsessing over the latest machine-learning model. **Only 9% of companies now rank new AI model development as their top priority, while 83% are investing in centralized, consistent data access layers** ([5]). By building "connected, contextual, semantically consistent" data environments ([6]), these leaders create an AI-ready springboard – one that turns data into a durable asset rather than a bottleneck.

When Data Falls Short, So Does AI

The flip side of this story is that poor data quality and silos are sabotaging AI projects at an alarming rate. Analysts have long warned that most AI initiatives fail not because of algorithms, but because of subpar data. That warning now comes with a stark statistic: **Gartner predicts that through 2026, 60% of AI projects lacking “AI-ready” data will be abandoned** ([1]). Many companies enthusiastically pour resources into AI only to hit a wall when confronted with messy, unstructured, or inaccessible data. In banking, for example, an AI model is only as good as the customer and risk data it can learn from; if that data is erroneous or locked in departmental silos, the model’s outputs won’t be reliable.

New industry surveys illustrate these data pain points in detail. One report found that 74% of enterprises expect to integrate over 500 data sources to fuel their AI ambitions, creating significant complexity ([2]). Even heavily "centralized" organizations struggle: **67% of such companies still spend over 80% of their data engineering time just maintaining pipelines** – time siphoned away from innovation ([3]). And it’s not just about the volume of data sources; timeliness matters too. **41% of organizations say the lack of real-time data access is preventing AI models from delivering timely insights**, while nearly a third point to data trapped in silos as a major blocker to AI success ([4]). In short, many teams are mired in assembling and cleaning data when they should be focusing on deploying AI solutions.

All this underscores that data quality is not a back-office IT concern – it’s a fundamental business issue in the AI era. Every extra hour that data scientists spend on "data plumbing" is an hour not spent on value-adding analysis or innovation, a frustration echoed by 71% of AI teams who waste over a quarter of their time on data wrangling tasks ([5]). Poor data also introduces risk: AI systems trained on flawed or biased data can produce erroneous or even unethical outcomes, exposing organizations to financial and reputational damage. The encouraging news is that improving data quality pays off. In fact, **61% of data leaders report that better data quality has already helped their generative AI pilots succeed** ([6]), validating the push to invest in data preparation, master data management, and real-time integration. Leaders are learning that high-quality, well-governed data isn’t a "nice to have" – it’s the make-or-break factor for AI-driven ROI.

The AI Governance & Compliance Gap

As AI expands across the enterprise, a new challenge has emerged: **how to govern and control AI deployments at scale**. A report from IBM’s Institute for Business Value this week highlights a worrying accountability gap. After surveying 2,000 technology executives, IBM found that **two-thirds of CIOs and CTOs are being held accountable for AI systems they do not fully control** ([1]). Moreover, 70% of these tech leaders say that employees are now deploying tech (including AI and automation) faster than the IT department can keep track of it ([2]). It’s no surprise, then, that **77% of organizations admit AI adoption is outpacing their governance and oversight capabilities** ([3]). In essence, AI is spreading into products and workflows faster than companies can implement policies to manage it responsibly.

Analysts are calling this mismatch “shadow AI” – a proliferation of AI systems operating outside the purview of central IT governance ([4]). AI-powered features are embedded in cloud applications, low-code tools, and departmental projects, which means innovation often happens at the fringes while oversight remains centralized. **CIOs retain responsibility for risk and compliance “but lack control due to federated models and hyperscaler-led abstraction,”** as one Forrester analyst put it ([5]). Another expert noted that enterprises have *“decentralised experimentation far faster than they have decentralised accountability”*, leaving leadership with responsibility for AI outcomes without direct line of sight into every system leveraging sensitive data ([6]).

Closing this governance gap is now a top priority for C-level data leaders. Many are establishing enterprise-wide AI governance frameworks and cross-functional AI councils to regain visibility. Experts – and regulators – advise building **controls and compliance into AI systems from the start** rather than bolting them on later ([7]). This approach pays dividends: organizations that bake automated governance and security checks into their AI platforms see *25% fewer incidents* when scaling up, compared to those relying on manual oversight ([8]). And with **59% of enterprises citing regulatory compliance as the number-one challenge in managing data for AI** ([9]), robust governance isn’t just about preventing accidents – it’s essential for meeting data privacy laws and maintaining customer trust.

Data as a Competitive Moat

With AI models and tools increasingly accessible, businesses are discovering that **their proprietary data is becoming the key competitive differentiator**. A popular LinkedIn essay shared this week argues that traditional software advantages are eroding – features can be copied in days, UIs cloned in a weekend – due to accelerated AI-driven development ([1]). For example, tasks that used to require five years of coding and millions of dollars can now be accomplished almost overnight with modern AI-assisted software generation ([2]). This rapid commoditization of technology means **the defensibility of pure software IP is declining**, pushing companies to find alternative ways to stand out.

The consensus? *The only lasting AI advantage may lie in unique data*. As the LinkedIn author pointed out, *“the one thing AI can’t fake is years of real, verified, proprietary data”* ([3]). No matter how sophisticated an AI model is, it can’t simply conjure up a rich trove of authentic customer interactions, transaction records, or domain-specific knowledge that a business has accumulated over time. You “can’t prompt your way to 300 million professional profiles” or generate fully accurate, up-to-date product data out of thin air ([4]). In other words, competitors might access similar models, but they cannot easily recreate your data assets.

Companies that have invested in building these data “moats” are already reaping the benefits. A striking example comes from financial services: Bloomberg’s GPT model, BloombergGPT, was trained on **40 years of proprietary financial data, news, and research** that no rival can easily replicate ([5]). Analysts note that *“the data moat is not the model architecture – any competitor could build a similar model – but the dataset that powers it”* ([6]). This exclusive data foundation gives Bloomberg a persistent edge. Likewise, across industries, leading firms are treating high-quality data as a form of intellectual property – safeguarding it, enriching it, and leveraging it to fuel AI solutions that competitors can’t match.

Building AI-Ready Data Architectures

The tech industry is responding to these challenges with new platforms and architectural approaches that put data at the center. Last week, at its Build 2026 conference, Microsoft unveiled **Project Rayfin**, an open-source toolkit designed to help developers deploy AI-powered applications on its unified Microsoft Fabric data platform ([1]). Rayfin aims to make it far easier to launch AI apps by combining data storage, databases, APIs, and machine learning runtime in one managed environment ([2]). By eliminating much of the "data plumbing" and integration work typically needed to stand up an AI solution, Rayfin not only speeds development but also keeps **application data within a governed, analytics-ready estate** ([3]) ([4]). This "governance by default" approach – with security, compliance, and access controls automatically enforced from day one – serves as what one expert called a "governed on-ramp" for enterprise AI, helping prevent the sprawl of unapproved, siloed AI projects ([5]) ([6]).

Data platform leaders are also doubling down on delivering **built-in business context and governance**. At its annual user conference, Snowflake introduced *Horizon Context*, a semantic metadata layer that curates and connects business definitions, data lineage, and access rules across the company’s Data Cloud ([7]). By providing a **centralized, governed map of the entire data estate**, this tool gives AI systems and analytics teams a consistent understanding of key metrics and terms (for instance, what counts as a "customer" or "revenue") when they access data ([8]). In the past, many enterprises created patchworks of data catalogs, BI definitions, and governance tools – resulting in different teams using conflicting data definitions ([9]). With a unified context layer, Snowflake aims to ensure an AI agent answering a customer query or generating a report uses the same trusted data and rules that a human analyst would – reducing errors and confusion.
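The idea of a shared context layer can be made concrete with a small sketch. The code below is a hypothetical illustration in plain Python, not Snowflake's API: the `MetricDefinition` and `ContextLayer` names, the role labels, and the SQL expression are all assumptions made for the example. It shows one registry holding a single definition of "revenue" that both a human analyst and an AI agent resolve, with an access rule checked at lookup time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One governed business definition (hypothetical structure)."""
    name: str
    description: str
    sql_expression: str        # illustrative canonical expression, not a real schema
    allowed_roles: frozenset   # roles permitted to read this metric


class ContextLayer:
    """Toy registry: a single source of truth for business terms."""

    def __init__(self):
        self._definitions = {}

    def register(self, definition):
        # Refuse a second, possibly conflicting definition of the same term.
        if definition.name in self._definitions:
            raise ValueError(f"conflicting definition for {definition.name!r}")
        self._definitions[definition.name] = definition

    def resolve(self, name, role):
        # Every consumer, human or AI, goes through the same access check.
        definition = self._definitions[name]
        if role not in definition.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {name!r}")
        return definition


layer = ContextLayer()
layer.register(MetricDefinition(
    name="revenue",
    description="Recognized revenue net of refunds",
    sql_expression="SUM(amount) - SUM(refund_amount)",
    allowed_roles=frozenset({"analyst", "ai_agent"}),
))

# The analyst and the AI agent receive the identical definition.
analyst_view = layer.resolve("revenue", role="analyst")
agent_view = layer.resolve("revenue", role="ai_agent")
```

Because both callers resolve the term through the same registry, a report written by an analyst and an answer generated by an agent rest on the same expression, and registering a second "revenue" fails loudly instead of silently diverging.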

Beyond these specific products, the broader trend is toward **AI-ready data infrastructure** that seamlessly blends flexibility, speed, and control. One fast-emerging technology is the vector database, which stores data in a format that generative AI models can query to retrieve facts, documents, and context in real time. Gartner forecasts that by 2026, **over 30% of enterprises will have adopted vector databases to ground their AI models with proprietary business data – up from less than 2% in 2023** ([10]). This explosion in *retrieval-augmented generation (RAG)* techniques shows how quickly companies are moving to connect their AI systems with the **latest, most relevant internal data**. Likewise, modern "data lakehouse" architectures – which combine the scale of data lakes with the reliability and governance of data warehouses – are becoming the default for large enterprises. In an open lakehouse, a **unified metadata catalog acts as the central brain**, enforcing consistent security policies and data definitions across all analytics and AI workloads ([11]). The takeaway: to enable AI at scale, organizations are redesigning their data ecosystems for agility (through cloud and streaming data pipelines), for trust (through governance and data quality controls), and for strategic value (through proprietary data assets).
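To make the retrieval step behind RAG less abstract, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not how any particular vector database works: `TinyVectorStore` is a hypothetical name, the three-dimensional embeddings are hand-written stand-ins for vectors a real embedding model would produce, and the search is a brute-force cosine-similarity scan rather than an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Toy in-memory store: keeps (text, embedding) pairs and scans them all."""

    def __init__(self):
        self._items = []

    def add(self, text, embedding):
        self._items.append((text, embedding))

    def search(self, query_embedding, top_k=2):
        # Score every stored document against the query, highest first.
        scored = [
            (cosine_similarity(query_embedding, emb), text)
            for text, emb in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top_k]]


store = TinyVectorStore()
# Hand-made toy embeddings; a real system would compute these with a model.
store.add("Refund policy: 30 days from delivery", [0.9, 0.1, 0.0])
store.add("Q2 revenue summary", [0.1, 0.9, 0.1])
store.add("Office parking rules", [0.0, 0.1, 0.9])

# Assumed embedding of a question about refunds.
query = [0.8, 0.2, 0.0]
context = store.search(query, top_k=1)

# The retrieved text is placed in the prompt so the model answers from it.
prompt = "Answer using only this context:\n" + "\n".join(context)
```

The point of the pattern is that the proprietary document, not the model's training data, supplies the facts at answer time. Production systems typically replace the linear scan with an approximate nearest-neighbor index so that lookups stay fast as the collection grows.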

Key Takeaway
The case for AI is now a case for data. New research and real-world examples show that robust data foundations, governance, and unique proprietary information are what separate successful AI initiatives from costly failures and compliance nightmares.

Key Statistics

Only 6% of enterprise AI leaders say their data infrastructure is fully AI-ready (www.prnewswire.com).
Gartner predicts that through 2026, 60% of AI projects lacking “AI-ready” data will be abandoned (www.gartner.com).
67% of CIOs & CTOs are being held accountable for AI systems they do not fully control (newsroom.ibm.com).
Over 30% of enterprises will use vector databases by 2026 (vs <2% in 2023) to integrate proprietary data into AI models (aibuzz.blog).
83% of companies are investing in centralized data access layers for AI, while only 9% rank new model development as their top priority (www.cdata.com).

Sources

CData Study Finds Only 6% of AI Leaders Believe Their Data Infrastructure Is Ready for AI
https://www.prnewswire.com/news-releases/cdata-study-finds-only-6-of-ai-leaders-believe-their-data-infrastructure-is-ready-for-ai-302629216.html
New IBM Study Finds CIOs and CTOs Face Growing AI Control Gap as Enterprise Deployment Scales
https://newsroom.ibm.com/2026-06-08-new-ibm-study-finds-cios-and-ctos-face-growing-ai-control-gap-as-enterprise-deployment-scales
Rayfin signals Microsoft’s push to make Fabric an AI app runtime
https://www.infoworld.com/article/4181166/rayfin-signals-microsofts-push-to-make-fabric-an-ai-app-runtime.html
The Data Moat: Why Proprietary Information is the New Competitive Edge, According to Federicodonatone…
https://enterprisezone.cc/the-data-moat-why-proprietary-information-is-the-new-competitive-edge-according-to-federicodona/
Lack of AI-Ready Data Puts AI Projects at Risk
https://www.gartner.com/en/newsroom/press-releases/2025-02-26-lack-of-ai-ready-data-puts-ai-projects-at-risk
Fivetran Report Finds Nearly Half of Enterprise AI Projects Fail Due to Poor Data Readiness
https://www.fivetran.com/press/fivetran-report-finds-nearly-half-of-enterprise-ai-projects-fail-due-to-poor-data-readiness
CDO Insights 2026 – AI Adoption Accelerates, But Trust and Governance Lag Behind
https://www.informatica.com/blogs/cdo-insights-2026-ai-adoption-accelerates-but-trust-and-governance-lag-behind.html
Snowflake’s Horizon Context aims to give AI agents a common understanding of the business
https://www.cio.com/article/4180170/snowflakes-horizon-context-aims-to-give-ai-agents-a-common-understanding-of-the-business.html
generated by lumo insights.