
Computer Vision

Azure AI Vision services for image and video processing


Azure AI Vision (Image Analysis)

Azure AI Vision provides services to analyze and extract information from images and videos.

Main Capabilities:
- Object Detection: Identifies and locates objects in images.
- Image Tagging: Assigns descriptive tags to entire images.
- Caption Generation: Creates a short, human-readable description of an image.
- Inappropriate Content Detection: Identifies adult, violent, or sensitive content.
- Face Detection: Locates human faces in images. Attributes that infer emotion, age, or gender have been retired under Microsoft's Responsible AI policies.
- Optical Character Recognition (OCR): Extracts text from images.
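
To make these capabilities concrete, here is a minimal Python sketch of how an application might post-process an analysis result. The `sample_result` dictionary is a hypothetical, simplified stand-in for a service response; its field names and the 0.5 threshold are assumptions for illustration, not the actual API schema.

```python
# Hypothetical, simplified analysis result. The real service response is
# richer and its field names may differ.
sample_result = {
    "caption": {"text": "a dog sitting on a sofa", "confidence": 0.91},
    "tags": [
        {"name": "dog", "confidence": 0.99},
        {"name": "indoor", "confidence": 0.87},
        {"name": "toy", "confidence": 0.42},
    ],
    "objects": [
        {"name": "dog", "confidence": 0.95,
         "box": {"x": 120, "y": 80, "w": 200, "h": 180}},
        {"name": "cup", "confidence": 0.35,
         "box": {"x": 10, "y": 20, "w": 40, "h": 50}},
    ],
}


def confident_items(items, threshold=0.5):
    """Keep only the items whose confidence meets the threshold."""
    return [item for item in items if item["confidence"] >= threshold]


def summarize(result, threshold=0.5):
    """Collect the caption, tag names, and located objects above the threshold."""
    return {
        "caption": result["caption"]["text"],
        "tags": [t["name"] for t in confident_items(result["tags"], threshold)],
        "objects": [
            (o["name"], o["box"])
            for o in confident_items(result["objects"], threshold)
        ],
    }


summary = summarize(sample_result)
print(summary["caption"])   # whole-image description
print(summary["tags"])      # tags describe the entire image
print(summary["objects"])   # objects also carry a bounding box
```

Note the difference the sketch highlights: tags apply to the whole image, while each detected object comes with a location.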

Use Cases:
- Content moderation on social media
- Medical image analysis
- Inventory automation
- Accessibility improvement for visually impaired people

Key Points

✓ Object detection identifies and locates elements in images
✓ Automatic tagging assigns descriptive keywords
✓ Caption generation creates explanatory text
✓ Inappropriate content detection for moderation
✓ OCR extracts printed and handwritten text from images
✓ Face detection locates human faces in images

Azure AI Face

Azure AI Face is a specialized service for facial analysis with advanced capabilities.

Face Detection:
- Precise face location in images
- Accessory detection (glasses, masks)
- Blur analysis and image quality
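- Head pose and occlusion detection (attributes that infer age, gender, or emotion have been retired under Microsoft's Responsible AI policies)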

Face Recognition/Identification:
- Face verification (1:1): Compares one face with another to verify identity
- Face identification (1:N): Searches for a face among a group of known people
- Facial similarity detection
- Grouping of similar faces
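
The difference between 1:1 verification and 1:N identification can be sketched in a few lines of Python. This is a conceptual illustration only: it assumes each face has already been turned into a numeric vector and compares vectors with cosine similarity. The vectors, the 0.8 threshold, and the function names are all hypothetical, and this is not a description of how Azure AI Face works internally.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two equal-length vectors, from -1.0 to 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(face_a, face_b, threshold=0.8):
    """1:1 check: do the two face vectors appear to match?"""
    return cosine_similarity(face_a, face_b) >= threshold


def identify(probe, known_people, threshold=0.8):
    """1:N search: return the best-matching known person, or None."""
    best_name, best_score = None, threshold
    for name, vector in known_people.items():
        score = cosine_similarity(probe, vector)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy vectors standing in for faces (assumed values, for illustration only).
known_people = {
    "alice": [0.9, 0.1, 0.2],
    "bob": [0.1, 0.9, 0.3],
}
probe = [0.85, 0.15, 0.25]
stranger = [0.1, 0.2, 0.95]

print(verify(probe, known_people["alice"]))   # one comparison
print(identify(probe, known_people))          # search across the group
print(identify(stranger, known_people))       # no one is close enough
```

Verification answers "are these two the same person?" with a single comparison, while identification repeats that comparison across every known person and keeps the best match above the threshold.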

Ethical Considerations:
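- Obtain explicit consent from the individuals whose faces are processed
- Comply with applicable privacy regulations
- Face verification and identification are Limited Access features that require approval from Microsoft, a safeguard against abuse
- Be transparent about how biometric data is used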

Key Points

✓ Detection identifies face location and features
✓ Verification compares two faces to confirm identity
✓ Identification searches for a face in a database
✓ Requires consent and compliance with privacy regulations
✓ Detects attributes such as glasses, masks, blur, and head pose
✓ Includes security measures against abuse

Optical Character Recognition (OCR)

OCR is the technology that converts images of text into editable and searchable text.

Azure AI Vision Read API:
- Extracts printed and handwritten text
- Preserves document structure by returning text organized into pages, lines, and words, each with its position
- Supports multiple languages and fonts
- High accuracy on scanned documents and photos

Advanced Features:
- Layout Detection: Identifies paragraphs, tables, columns
- Structured Extraction: Converts documents to structured JSON formats
- Batch Processing: Handles multiple pages simultaneously
- Auto-correction: Improves accuracy with AI models
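
As a rough illustration of working with OCR output, the Python sketch below rebuilds plain text from a page/line/word structure and flags words the model was unsure about. The `sample_read_result` dictionary is a hypothetical, simplified shape (not the exact Read API schema), and the 0.8 review threshold is an assumption.

```python
# Hypothetical, simplified OCR result: pages contain lines, lines contain words.
sample_read_result = {
    "pages": [
        {"number": 1, "lines": [
            {"words": [{"text": "Invoice", "confidence": 0.99},
                       {"text": "#1042", "confidence": 0.97}]},
            {"words": [{"text": "Total:", "confidence": 0.98},
                       {"text": "$18O.00", "confidence": 0.55}]},
        ]},
        {"number": 2, "lines": [
            {"words": [{"text": "Thank", "confidence": 0.96},
                       {"text": "you", "confidence": 0.94}]},
        ]},
    ]
}


def extract_text(result):
    """Join words into lines and lines into pages, separated by blank lines."""
    page_texts = []
    for page in result["pages"]:
        lines = [" ".join(word["text"] for word in line["words"])
                 for line in page["lines"]]
        page_texts.append("\n".join(lines))
    return "\n\n".join(page_texts)


def low_confidence_words(result, threshold=0.8):
    """List (page number, word) pairs that may need human review."""
    flagged = []
    for page in result["pages"]:
        for line in page["lines"]:
            for word in line["words"]:
                if word["confidence"] < threshold:
                    flagged.append((page["number"], word["text"]))
    return flagged


print(extract_text(sample_read_result))
print(low_confidence_words(sample_read_result))
```

In a data-entry workflow, the flagged words (here a total where the letter O was read in place of a zero) would typically be routed to a person for checking.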

Common Applications:
- Document digitization
- Data entry automation
- Contract and form analysis
- Accessibility for visually impaired people

Key Points

✓ Converts images of text to editable text
✓ Supports printed and handwritten text
✓ Preserves page, line, and word structure
✓ Batch processing for efficiency
✓ High accuracy with advanced AI models
✓ Useful for digitization and automation

Azure AI Custom Vision (Custom Models)

Unlike Azure AI Vision (which uses models pre-trained by Microsoft), Custom Vision allows you to train your own image models using your own photos.

Two main tasks:
1. Image Classification: Predicts a label for the entire image.
   - Example: Is this image a 'Dog' or a 'Cat'?
2. Object Detection: Finds the location (coordinates/bounding box) of specific objects within an image.
   - Example: Where are the safety helmets in this construction site photo?
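
Because object detection returns a bounding box rather than a single label, a common way to judge a prediction is to measure how much it overlaps the manually labelled box. The Python sketch below computes intersection over union (IoU), a general computer-vision technique; the `(left, top, width, height)` box format and the sample coordinates are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (left, top, width, height) boxes."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh

    # Width and height of the overlapping rectangle (zero if boxes are apart).
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical helmet boxes in pixel coordinates.
labelled_helmet = (20, 20, 40, 40)
predicted_helmet = (10, 10, 40, 40)

print(round(iou(labelled_helmet, predicted_helmet), 3))
```

An IoU of 1.0 means the predicted box matches the label exactly, and 0.0 means the boxes do not overlap at all.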

Process:
- Upload your own images.
- Tag them manually.
- Train the model.
- Evaluate and publish.
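
The evaluation step usually comes down to comparing the model's predictions with the tags you assigned. As a minimal sketch, the Python below computes precision and recall for one tag from two lists of labels. The label lists are made-up test data and the function name is hypothetical.

```python
def precision_recall(true_labels, predicted_labels, tag):
    """Return (precision, recall) for one tag over paired label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == tag and p == tag)
    false_pos = sum(1 for t, p in pairs if t != tag and p == tag)
    false_neg = sum(1 for t, p in pairs if t == tag and p != tag)

    # Precision: of the images predicted as `tag`, how many were right?
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: of the images that really are `tag`, how many were found?
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Made-up evaluation data: manual tags vs. model predictions.
manual_tags = ["dog", "dog", "dog", "dog", "cat", "cat"]
predictions = ["dog", "dog", "dog", "cat", "cat", "cat"]

print(precision_recall(manual_tags, predictions, "dog"))
print(precision_recall(manual_tags, predictions, "cat"))
```

Here every 'dog' prediction is correct (high precision) but one real dog is missed (lower recall). If either number is too low, the usual remedy is to add more tagged images and retrain before publishing.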

Key Points

✓ Trains models with the user's own data
✓ Classification: Labels the entire image
✓ Object Detection: Locates object coordinates
✓ Requires a labeling and training process
✓ Ideal for very specific use cases (e.g., recognizing your own brand's products)

Azure AI Video Indexer (Video Analysis)

A specialized service for extracting deep insights from video and audio files.

Key Capabilities:
- Face recognition in video: Identifies people and when they appear.
- Video OCR: Reads text that appears on screen (e.g., signs, slides).
- Audio transcription: Converts speech to text and generates captions.
- Topic and sentiment detection: Analyzes what is being discussed and the emotional tone.
- Speaker identification: Distinguishes who is speaking at any given moment.
- Indexing: Makes video content 'searchable'.
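
To show what 'searchable' means in practice, the Python sketch below searches a small list of time-stamped insights for a keyword and reports where in the video it occurs. The `sample_index` structure is a hypothetical simplification (not the actual Video Indexer output format), and the function names are illustrative.

```python
# Hypothetical, simplified index: each insight has a kind, a time range in
# seconds, and some text. Real Video Indexer output is far more detailed.
sample_index = [
    {"kind": "ocr", "start": 12.0, "end": 20.0, "text": "Budget 2024"},
    {"kind": "transcript", "start": 5.0, "end": 9.5,
     "text": "Welcome to the quarterly budget review"},
    {"kind": "transcript", "start": 21.0, "end": 27.0,
     "text": "Marketing spend went up"},
    {"kind": "face", "start": 21.0, "end": 40.0, "text": "Person A"},
]


def search(index, keyword):
    """Return insights whose text contains the keyword, earliest first."""
    keyword = keyword.lower()
    hits = [entry for entry in index if keyword in entry["text"].lower()]
    return sorted(hits, key=lambda entry: entry["start"])


def format_timestamp(seconds):
    """Render a number of seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


for hit in search(sample_index, "budget"):
    print(format_timestamp(hit["start"]), hit["kind"], hit["text"])
```

The same keyword is found both in speech (the transcript) and on screen (video OCR), which is the benefit of combining vision and audio insights in one index.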

Tool: Typically used via the Video Indexer web portal or via API.

Key Points

✓ Extracts metadata and insights from videos
✓ Combines vision (faces, text) and audio (speech)
✓ Generates automatic transcriptions and captions
✓ Allows searching within video content
✓ Identifies speakers and key topics
