📄

Document Intelligence and Search

Data extraction from documents and intelligent search services

⏱️ Estimated reading time: 15 minutes

Azure AI Document Intelligence

Azure AI Document Intelligence (formerly Form Recognizer) extracts structured data from documents using AI.

Pre-trained Models:
- Invoices: Extracts vendors, amounts, dates, items
- Receipts: Purchase information, totals, payment methods
- ID Cards: Personal data from IDs, passports, licenses
- Tax Forms: Fields from US tax forms such as W-2
- Custom Documents: Not pre-trained; you train a custom model on your own labeled documents

Capabilities:
- Structured Extraction: Converts documents to JSON with named fields
- Advanced OCR: Reads printed and handwritten text with high accuracy
- Layout Analysis: Understands tables, forms, and complex structures
- Confidence and Validation: Provides confidence scores per field
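As a rough illustration of how per-field confidence scores can drive validation, the sketch below splits extracted fields into accepted values and values flagged for human review. The `sample_result` dictionary is a hand-written, trimmed stand-in for an analysis result (not real service output), and the `split_by_confidence` helper and the 0.8 threshold are assumptions chosen for the example.

```python
# Hand-written sample shaped like a trimmed analysis result (illustrative only).
sample_result = {
    "analyzeResult": {
        "documents": [
            {
                "docType": "invoice",
                "fields": {
                    "VendorName": {"type": "string", "content": "Contoso", "confidence": 0.97},
                    "InvoiceTotal": {"type": "currency", "content": "$110.00", "confidence": 0.62},
                    "InvoiceDate": {"type": "date", "content": "2024-01-15", "confidence": 0.91},
                },
            }
        ]
    }
}


def split_by_confidence(result, threshold=0.8):
    """Hypothetical helper: separate fields at or above the threshold from the rest."""
    accepted, needs_review = {}, {}
    for doc in result["analyzeResult"]["documents"]:
        for name, field in doc["fields"].items():
            # A field with no confidence value is treated as 0.0 and sent to review.
            is_trusted = field.get("confidence", 0.0) >= threshold
            target = accepted if is_trusted else needs_review
            target[name] = field.get("content")
    return accepted, needs_review


accepted, needs_review = split_by_confidence(sample_result)
print("accepted:", accepted)
print("needs review:", needs_review)
```

The right threshold depends on the document type and on how costly a wrong value is, so it is usually tuned per field and not fixed globally.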

Usage Process:
1. Select Model: Choose a pre-trained model or train a custom one
2. Send Document: Upload a document image or PDF
3. Process: AI analyzes and extracts information
4. Receive Results: Get structured data in JSON format
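The first two steps above can be sketched as a single REST call. The example below only builds the request and does not send it, so it runs without an Azure resource. The endpoint, key, and document URL are placeholders, `build_analyze_request` is a hypothetical helper, and the API version string and URL path are assumptions that should be checked against the version your resource supports.

```python
import json
import urllib.request

# Assumption: adjust to the API version available on your resource.
API_VERSION = "2024-11-30"


def build_analyze_request(endpoint, key, model_id, document_url):
    """Hypothetical helper: build (but do not send) the POST that starts an analysis."""
    url = (
        f"{endpoint.rstrip('/')}/documentintelligence/documentModels/"
        f"{model_id}:analyze?api-version={API_VERSION}"
    )
    # The document is passed by URL here; the body is plain JSON.
    body = json.dumps({"urlSource": document_url}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )


# Step 1: select a model ("prebuilt-invoice"). Step 2: point at the document.
req = build_analyze_request(
    "https://example-resource.cognitiveservices.azure.com/",  # placeholder endpoint
    "<your-key>",                                              # placeholder key
    "prebuilt-invoice",
    "https://example.com/sample-invoice.pdf",                  # placeholder document
)
print(req.get_method(), req.full_url)
```

Analysis is asynchronous: after the request is accepted, the client polls the URL returned in the `Operation-Location` response header until the JSON result is ready, which corresponds to steps 3 and 4.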

🎯 Key Points

  • ✓ Extracts structured data from unstructured documents
  • ✓ Pre-trained models for invoices, receipts, and IDs
  • ✓ Supports training of custom models
  • ✓ Converts documents to structured JSON format
  • ✓ High accuracy with OCR and layout analysis
  • ✓ Provides confidence scores

Knowledge Store

Knowledge Store is a feature of Azure AI Search that allows saving enriched information (extracted by AI) independently of the search index.

What is it for?
Normally, enriched data lives only inside the search index. Knowledge Store lets you 'project' (export) this data to Azure Storage (Tables or Blobs) so it can be used outside of search.

Key Exam Use Cases:
- Analytics and Reporting: Connecting Power BI directly to enriched data to visualize trends.
- Data Science: Using clean, structured data to train new Machine Learning models.
- Audit: Saving a permanent copy of extracted information.

Types of Projections:
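- Table Projections: Saves structured data as rows in Azure Table Storage for relational analysis.
- Object Projections: Saves complex JSON structures as documents in Azure Blob Storage.
- File Projections: Saves images extracted from source documents as files in Azure Blob Storage.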
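The three projection types can be seen side by side in the `knowledgeStore` section of a skillset definition. The sketch below builds that section as a Python dictionary and prints it as JSON. The connection string, table name, container names, and the `/document/tableProjection` and `/document/objectProjection` source paths are placeholders (in practice those paths would typically come from a Shaper skill), and `build_knowledge_store` is a hypothetical helper.

```python
import json


def build_knowledge_store(connection_string):
    """Hypothetical helper: return a knowledgeStore definition with one projection group."""
    return {
        "storageConnectionString": connection_string,
        "projections": [
            {
                # Table projection: rows in Azure Table Storage.
                "tables": [
                    {
                        "tableName": "Documents",
                        "generatedKeyName": "DocumentId",
                        "source": "/document/tableProjection",  # placeholder path
                    }
                ],
                # Object projection: JSON documents in a blob container.
                "objects": [
                    {
                        "storageContainer": "enriched-json",
                        "source": "/document/objectProjection",  # placeholder path
                    }
                ],
                # File projection: extracted images in a blob container.
                "files": [
                    {
                        "storageContainer": "extracted-images",
                        "source": "/document/normalized_images/*",
                    }
                ],
            }
        ],
    }


knowledge_store = build_knowledge_store("<storage-connection-string>")
print(json.dumps(knowledge_store, indent=2))
```

Tables suit tools such as Power BI that expect relational data, while object and file projections keep nested JSON and images intact for data science or audit purposes.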

🎯 Key Points

  • ✓ Saves enriched data outside the search index
  • ✓ Allows using AI data in Power BI or Machine Learning
  • ✓ Uses Azure Storage (Tables and Blobs)
  • ✓ Key concept: Projections
  • ✓ Essential for reusing extracted knowledge