From off-limits to AI-Ready: Preparing unstructured data directly in Microsoft Fabric with Tonic Textual
2026-3-27 13:16:24 Author: securityboulevard.com

AI teams need to move quickly, but the reality is that most of the data they need simply isn't ready for AI. In fact, Gartner predicts that through 2026, organizations will abandon 60% of AI projects unsupported by AI-ready data.

Across the enterprise, the most valuable inputs for AI live in unstructured text: notes, support tickets, contracts, call transcripts, internal documentation, and customer communications all contain rich signals that models need to learn from. Yet this data is frequently off-limits because it contains sensitive information whose exposure could jeopardize compliance or violate internal privacy policies.

Today, we are excited to announce the general availability of Tonic Textual on Microsoft Fabric, giving teams a secure, Fabric-native way to detect, de-identify, and prepare unstructured text data for AI development.

With this release, organizations can finally move large backlogs of sensitive text from “off-limits” to “AI-ready” without moving data out of their governed Fabric environment.

Unstructured data is the bottleneck for AI development

The signals that matter most for AI live in unstructured text: how customers speak, how clinicians document care, how work actually gets done. This data holds enormous potential, but its complexity and sensitivity make it the primary constraint on AI progress.

In practice, this type of data is laborious to inspect, difficult to govern, and risky to share with models. Sensitive entities such as names, identifiers, locations, and domain-specific values are deeply embedded in free text, PDFs, and scanned documents. Manual redaction does not scale, and one-off scripts can be brittle and incomplete.

As a result, teams either delay AI projects or accept unnecessary privacy risk by exposing raw text to models during training, evaluation, or inference.

AI success depends on having data that is not only high quality but also safe to use.

Preparing unstructured data inside Fabric

Tonic Textual brings AI-native data preparation directly into Microsoft Fabric.

Running inside the Fabric ecosystem, Textual automatically detects sensitive entities across unstructured files stored in OneLake and applies consistent transformations that make text safe for AI development. Teams can redact values, replace them with high-fidelity synthetic data, or apply custom handling based on their use case.
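To make the detect-then-transform flow concrete, here is a minimal Python sketch. It is not Tonic Textual's API: the regex detectors, the `detect`/`transform` helpers, and the fake replacement values are all hypothetical stand-ins (production tools typically rely on trained entity-recognition models rather than regexes). It only shows how the same detected spans can either be redacted to typed placeholders or swapped for synthetic values.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical toy detectors. Real tools typically use trained NER models;
# these regexes only illustrate the detect-then-transform flow.
DETECTORS: Dict[str, re.Pattern] = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

# Hypothetical synthetic values, one per entity type.
FAKE_VALUES = {"EMAIL": "jane.doe@example.com", "PHONE": "555-555-0100"}


def detect(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) for every detected entity, ordered by position."""
    spans = [
        (m.start(), m.end(), label)
        for label, pattern in DETECTORS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(spans)


def redact(label: str, value: str) -> str:
    """Replace the value with a typed placeholder such as [EMAIL]."""
    return f"[{label}]"


def synthesize(label: str, value: str) -> str:
    """Replace the value with a realistic-looking fake of the same type."""
    return FAKE_VALUES[label]


def transform(text: str, handler: Callable[[str, str], str]) -> str:
    """Rebuild the text, passing each detected span through handler."""
    pieces, cursor = [], 0
    for start, end, label in detect(text):
        if start < cursor:  # skip spans that overlap one already handled
            continue
        pieces.append(text[cursor:start])
        pieces.append(handler(label, text[start:end]))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


note = "Customer bob@corp.com called from 212-555-0147 about a refund."
print(transform(note, redact))      # -> Customer [EMAIL] called from [PHONE] about a refund.
print(transform(note, synthesize))  # -> Customer jane.doe@example.com called from 555-555-0100 about a refund.
```

Typed placeholders such as `[EMAIL]` remove the value while keeping the sentence readable; synthetic values keep the text realistic, which can matter when a model is trained or evaluated on it.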

This allows organizations to prepare text data for a wide range of AI workflows, including:

  • Model training and fine-tuning
  • Retrieval-augmented generation pipelines
  • Evaluation and testing environments
  • Secure prompt and inference workflows

All this is now possible without exporting data to external systems or breaking existing governance controls.

From backlog to usable data 

With Tonic Textual on Fabric, AI teams can automatically detect sensitive information across large volumes of unstructured text and apply de-identification or high-fidelity synthetic replacement at scale. Document structure and contextual integrity are preserved, ensuring that prepared data remains useful for downstream AI workflows rather than stripped of meaning.
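One way to keep contextual integrity when replacing values is to make the mapping consistent: the same real name always becomes the same surrogate, so co-references within and across documents still line up. The sketch below assumes that approach, using a keyed HMAC over a small, hypothetical surrogate pool (`SURROGATE_NAMES`, `SECRET_KEY`). It is an illustration, not a description of how Textual generates synthetic data, and it assumes entity detection has already produced the list of names.

```python
import hashlib
import hmac
import re
from typing import List

# Hypothetical surrogate pool and key; a real deployment would manage the key as a secret.
SURROGATE_NAMES = ["Jordan Lee", "Priya Shah", "Mateo Cruz", "Hana Kim", "Omar Diaz"]
SECRET_KEY = b"example-key-rotate-me"


def surrogate_for(value: str) -> str:
    """Deterministically map a real value to a surrogate via a keyed HMAC-SHA256."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:4], "big") % len(SURROGATE_NAMES)
    return SURROGATE_NAMES[index]


def pseudonymize(text: str, names: List[str]) -> str:
    """Swap each detected name for its surrogate in one pass; all other text is left as-is."""
    if not names:
        return text
    # Longest names first so a full name wins over a shorter overlapping entry.
    pattern = re.compile("|".join(re.escape(n) for n in sorted(names, key=len, reverse=True)))
    return pattern.sub(lambda m: surrogate_for(m.group(0)), text)


ticket = "Maria Garcia reported the outage. Follow-up: Maria Garcia confirmed the fix."
handoff = "Escalated by Maria Garcia to Tom Reed."
print(pseudonymize(ticket, ["Maria Garcia"]))
print(pseudonymize(handoff, ["Maria Garcia", "Tom Reed"]))
```

With a pool this small, two different real names can collide on the same surrogate; a real system would need a much larger pool or explicit collision handling.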

By enforcing consistent privacy policies across training, evaluation, and inference, teams can move beyond isolated experiments and confidently operationalize AI in production environments. What was once a massive backlog of unusable text becomes a trusted, governed input for AI development.
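Enforcing one policy across stages can be as simple as routing training, evaluation, and inference text through the same function with the same policy object. In this minimal sketch, `PrivacyPolicy`, `apply_policy`, and `prepare` are hypothetical names, detection is assumed to have happened upstream, and stamping each record with a policy version is our own design choice for auditability, not something the announcement describes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PrivacyPolicy:
    """Hypothetical policy object: maps entity labels to "redact" or "keep"."""
    version: str
    actions: Dict[str, str]


def apply_policy(policy: PrivacyPolicy, text: str, entities: Dict[str, str]) -> str:
    """Apply the policy; entities maps each detected value to its label (detection assumed upstream)."""
    for value, label in entities.items():
        # Fail closed: a label the policy does not mention is redacted.
        if policy.actions.get(label, "redact") == "redact":
            text = text.replace(value, f"[{label}]")
    return text


def prepare(stage: str, text: str, entities: Dict[str, str], policy: PrivacyPolicy) -> Dict[str, str]:
    """One code path for every stage; each record carries the policy version for auditing."""
    return {"stage": stage, "policy_version": policy.version, "text": apply_policy(policy, text, entities)}


POLICY = PrivacyPolicy(version="v1", actions={"NAME": "redact", "EMAIL": "redact", "PRODUCT": "keep"})

sample = "Ana Ruiz (ana@corp.com) asked about the Fabric workload."
entities = {"Ana Ruiz": "NAME", "ana@corp.com": "EMAIL", "Fabric": "PRODUCT"}

records: List[Dict[str, str]] = [
    prepare(stage, sample, entities, POLICY) for stage in ("training", "evaluation", "inference")
]
for record in records:
    print(record["stage"], "->", record["text"])
```

Because no stage has its own branch, changing the policy changes all three stages at once, and the version stamp shows which policy produced a given record.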

Ready for production workloads 

General availability of Tonic Textual on Microsoft Fabric marks an important milestone for teams building AI in production environments.

Textual on Fabric is built for scale, reliability, and enterprise governance. It integrates directly with Fabric workloads and OneLake storage, enabling privacy-first AI data preparation as a native part of the AI lifecycle rather than a downstream control.

By preparing unstructured data before it ever reaches a model, organizations reduce risk, accelerate development, and create a foundation for responsible AI at scale.

Available now

Tonic Textual for Microsoft Fabric is generally available today and can be added directly from the Fabric Workload Hub.

If you are building AI with unstructured data and need a secure way to prepare sensitive text for production, you can start a project with Textual on Fabric today.

This is a Security Bloggers Network syndicated blog from Expert Insights on Synthetic Data from the Tonic.ai Blog. Read the original post at: https://www.tonic.ai/blog/tonic-textual-microsoft-fabric-general-availability


Article source: https://securityboulevard.com/2026/03/from-off-limits-to-ai-ready-preparing-unstructured-data-directly-in-microsoft-fabric-with-tonic-textual/