Enterprise AI conversations often revolve around model performance, copilots, and automation breakthroughs, but a more consequential reality is quietly emerging: AI progress is increasingly constrained by how well enterprises can discover, understand, govern, and prepare their unstructured data.
The latest industry research on unstructured data management shows that organizations are not just facing a data growth problem; they are facing a capability barrier. The next divide in enterprise AI will be between those that build AI data management expertise and those that do not.
Unstructured Data Is the Foundation of Enterprise AI
Unstructured data has become central to how organizations operate and drive AI innovation. Banks rely on emails and chat transcripts to detect fraud patterns that structured systems miss. Healthcare providers depend on physician notes and medical images to improve early detection and care decisions. Retailers and e-commerce platforms analyze customer reviews, social content, and support interactions to automate and enhance customer experience.
At the same time, data volumes are exploding year after year. 74% of enterprises now store more than 5 petabytes of unstructured data (a 57% increase since 2024), and 40% store more than 10PB. To put that in perspective, 10PB is roughly equivalent to two billion songs or 10 billion books (at about 5MB per song and 1MB per book). Even 5PB is expensive to manage: over five years, storage alone can cost roughly $6.55 million on-premises or nearly $8.94 million in a public cloud environment. Total cost climbs higher still once the operational overhead of governance, security, and data preparation is factored in.
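To put the five-year storage figures on a common footing, they can be normalized to a cost per terabyte per year. This is a back-of-the-envelope restatement of the figures above, not an independent estimate, and it assumes decimal units (1PB = 1,000TB) and cost spread evenly across the five years:

```latex
\[
\text{cost per TB-year} = \frac{\text{five-year storage cost}}{5\ \text{years} \times 5{,}000\ \text{TB}}
\]
\[
\text{on-premises: } \frac{\$6.55\text{M}}{25{,}000\ \text{TB-years}} = \$262
\qquad
\text{public cloud: } \frac{\$8.94\text{M}}{25{,}000\ \text{TB-years}} \approx \$358
\]
```

On these assumptions, the public-cloud figure works out to roughly 36% more per terabyte-year than on-premises.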
Yet the issue extends well beyond cost, to control, visibility, and AI readiness.
AI Is Both Consuming and Creating Unstructured Data
The explosive growth in unstructured data is being driven by structural shifts:
- AI workloads generate massive training datasets, embeddings, logs, and model artifacts
- Application modernization produces continuous telemetry, machine logs, and event data
- Rich media (video, imaging, design files) is proliferating across industries
- Regulatory retention requirements extend how long data must be stored
- Cloud sprawl leads to duplicate copies, backups, and snapshots
AI is now using unstructured data to create even more unstructured data. Model outputs, AI-generated documents, synthetic data, transcripts, summaries, and interaction logs all feed back into the enterprise data estate. The result is a compounding cycle: more data fuels AI, and AI produces more data, accelerating both storage demands and governance exposure.
The Skills Behind the AI Readiness Gap
While data volumes surge, organizational capability is struggling to keep pace. 62% of organizations report gaps in AI data management skills, up sharply from previous years. This is now the top skills shortfall, ahead of cloud storage strategy and even traditional data security and compliance expertise.
AI data management requires the ability to:
- Discover and classify data across silos
- Analyze metadata and automate data lifecycle decisions
- Apply policy-driven governance for PII, retention, and access controls
- Integrate storage and data environments with AI and analytics pipelines
- Evaluate data quality and suitability for AI use
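The metadata and lifecycle items in this list can be illustrated with a minimal Python sketch. It assumes a single file-system silo and a made-up policy: files carrying the hypothetical `legal-hold` or `regulated` tags are retained, files unmodified for more than a year become archive candidates, and everything else stays active. The `FileRecord` type, the thresholds, and the action names are illustrative assumptions, not any product's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from pathlib import Path


@dataclass(frozen=True)
class FileRecord:
    """Basic metadata for one file (hypothetical record type)."""
    path: str
    size_bytes: int
    modified: datetime
    tags: frozenset[str] = frozenset()


def scan(root: str) -> list[FileRecord]:
    """Collect size and modification time for every file under one silo."""
    records = []
    for p in Path(root).rglob("*"):
        if p.is_file():
            st = p.stat()
            records.append(FileRecord(
                path=str(p),
                size_bytes=st.st_size,
                modified=datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
            ))
    return records


# Illustrative policy values; real ones would come from retention rules.
COLD_AFTER = timedelta(days=365)
RETAIN_TAGS = frozenset({"legal-hold", "regulated"})


def lifecycle_action(rec: FileRecord, now: datetime) -> str:
    """Decide what to do with one file based only on its metadata."""
    if rec.tags & RETAIN_TAGS:
        return "retain"  # governance tags override age-based tiering
    # Modification time stands in for last access here, since access
    # times are often not tracked reliably.
    if now - rec.modified > COLD_AFTER:
        return "tier-to-archive"
    return "keep-active"


def plan(records: list[FileRecord], now: datetime) -> dict[str, int]:
    """Total bytes per proposed action, to size a tiering decision."""
    totals: dict[str, int] = {}
    for rec in records:
        action = lifecycle_action(rec, now)
        totals[action] = totals.get(action, 0) + rec.size_bytes
    return totals
```

A call such as `plan(scan("/mnt/share"), datetime.now(timezone.utc))`, where `/mnt/share` is a hypothetical mount point, would report how many bytes each action affects before any data actually moves. Running the same logic across NAS, object storage, and SaaS exports is where the cross-silo discovery skill comes in.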
In other words, enterprises must move beyond managing storage systems to managing data estates.
Classification: The Make-or-Break Capability
Organizations recognize unstructured data classification as a cornerstone of AI readiness. It ranks as a leading strategy for governance, security, ransomware defense, and AI data curation, and when organizations describe their future requirements, they consistently point to classification and tagging, analytics and reporting, and sensitive data detection as foundational capabilities.
Yet classification is also the top challenge in preparing data for AI. Unstructured data is spread across NAS, cloud object stores, SaaS platforms, backups, and archives; files often lack consistent metadata; and tools remain fragmented across platforms. At petabyte scale, manual and siloed approaches are simply not feasible.
This makes classification the make-or-break capability in AI readiness: enterprises understand what must be done, but lack the skills, tools, and operating models to do it at scale.
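Sensitive data detection, one of the foundational classification capabilities, can be sketched as rule-based tagging. The example assumes the file's text has already been extracted and uses two deliberately simple, hypothetical regex rules plus an illustrative extension-to-type map; the tag names are made up for this sketch. Production classifiers add validation, context, and often ML-based detection, and must cope with the metadata gaps and tool fragmentation described above.

```python
import os
import re

# Hypothetical, deliberately simple detection rules. Real classifiers add
# validation, surrounding context, and often ML-based entity detection.
PII_PATTERNS = {
    "pii:email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "pii:us-ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical mapping from file extension to a content-type tag.
EXTENSION_TAGS = {
    ".eml": "type:email",
    ".dcm": "type:medical-image",
    ".wav": "type:audio",
}


def classify(path: str, text: str) -> set[str]:
    """Return tags for one file, given its path and extracted text."""
    tags: set[str] = set()
    ext = os.path.splitext(path)[1].lower()
    if ext in EXTENSION_TAGS:
        tags.add(EXTENSION_TAGS[ext])
    for tag, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            tags.add(tag)
    # Any PII hit raises the file's sensitivity level for governance policies.
    if any(t.startswith("pii:") for t in tags):
        tags.add("sensitivity:restricted")
    return tags
```

For example, `classify("inbox/claim.eml", "Contact jane.doe@example.com")` yields `type:email`, `pii:email`, and `sensitivity:restricted`. Tags like these are what downstream access controls, retention policies, and AI data curation would key on.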
A Shift in IT’s Role
The AI era is reshaping the role of IT and infrastructure teams. Historically, success was measured by system uptime and storage capacity. Now, IT is expected to provide visibility into what data exists, where it resides, how sensitive it is, and whether it is ready and safe for AI use.
This is why many organizations are investing in centralized data management platforms, forming cross-functional AI task forces that include IT, security, and legal, and hiring leaders focused on building the AI data foundation. These changes reflect a broader shift: AI success depends on disciplined, consistent data management as much as on model innovation.
The Next AI Imperative
Enterprises are rich in data, but often poor in usable intelligence. As AI initiatives scale, that gap becomes a limiting factor. Models can only be as effective, trustworthy, and compliant as the data behind them.
The organizations that take the lead in AI innovation will have built internal capability and expertise in discovering, classifying, governing, and mobilizing unstructured data at scale. In the AI era, the most important infrastructure may be neither GPUs nor cloud deployments, but the practices that turn vast, unmanaged data estates into governed, AI-ready assets.