👁️

Computer Vision

Azure AI Vision services for image and video processing

⏱️ Estimated reading time: 18 minutes

Azure AI Vision (Image Analysis)

Azure AI Vision provides services to analyze and extract information from images and videos.

Main Capabilities:
- Object Detection: Identifies and locates objects in images.
- Image Tagging: Assigns descriptive tags to entire images.
- Caption Generation: Creates a short, human-readable description of an image.
- Inappropriate Content Detection: Identifies adult, violent, or sensitive content.
- Face Detection: Detects and locates human faces in images. Microsoft has retired the attributes that inferred emotion, age, and gender.
- Optical Character Recognition (OCR): Extracts text from images.
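
As a rough illustration of how an application might consume tagging and object detection output, the sketch below filters results by confidence. The dictionary is a hand-written sample, not real service output, and its field names (`tagsResult`, `objectsResult`, `boundingBox`) are assumptions modeled on the Image Analysis REST response; check the current API reference before relying on them.

```python
# Hand-written sample for illustration only; field names are assumed.
sample_result = {
    "captionResult": {"text": "a dog sitting on a sofa", "confidence": 0.91},
    "tagsResult": {"values": [
        {"name": "dog", "confidence": 0.98},
        {"name": "indoor", "confidence": 0.87},
        {"name": "toy", "confidence": 0.42},
    ]},
    "objectsResult": {"values": [
        {"boundingBox": {"x": 40, "y": 60, "w": 200, "h": 180},
         "tags": [{"name": "dog", "confidence": 0.93}]},
    ]},
}


def confident_tags(result, threshold=0.8):
    """Return tag names whose confidence is at least `threshold`."""
    values = result.get("tagsResult", {}).get("values", [])
    return [tag["name"] for tag in values if tag["confidence"] >= threshold]


def detected_objects(result, threshold=0.5):
    """Return (label, bounding_box) pairs for confident detections."""
    found = []
    for obj in result.get("objectsResult", {}).get("values", []):
        # Each detected object can carry several candidate labels; keep the best.
        best = max(obj["tags"], key=lambda tag: tag["confidence"])
        if best["confidence"] >= threshold:
            found.append((best["name"], obj["boundingBox"]))
    return found


print(confident_tags(sample_result))    # ['dog', 'indoor']
print(detected_objects(sample_result))
```

Tagging describes the image as a whole, while object detection also returns a bounding box for each item it finds.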

Use Cases:
- Content moderation on social media
- Medical image analysis
- Inventory automation
- Accessibility improvement for visually impaired people

🎯 Key Points

  • ✓ Object detection identifies and locates elements in images
  • ✓ Automatic tagging assigns descriptive keywords
  • ✓ Caption generation creates explanatory text
  • ✓ Inappropriate content detection supports moderation
  • ✓ OCR extracts printed and handwritten text from images
  • ✓ Face detection locates human faces in images

Azure AI Face

Azure AI Face is a specialized service for detecting, analyzing, and recognizing human faces.

Face Detection:
- Precise face location in images
- Accessory detection (glasses, masks)
- Blur analysis and image quality
- Age, gender, and emotion estimation (retired by Microsoft under its Responsible AI Standard; no longer offered)

Face Recognition/Identification:
- Face verification (1:1): Compares one face with another to verify identity
- Face identification (1:N): Searches for a face among a group of known people
- Facial similarity detection
- Grouping of similar faces
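
The difference between verification (1:1) and identification (1:N) can be sketched conceptually as follows. This is not how Azure AI Face is called: the three-number "face vectors", the names, and the 0.9 threshold are all made up for illustration, and the real service performs the comparison internally.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(face_a, face_b, threshold=0.9):
    """1:1 check: do these two face vectors likely belong to one person?"""
    return cosine_similarity(face_a, face_b) >= threshold


def identify(probe, gallery, threshold=0.9):
    """1:N search: return the best-matching known name, or None."""
    best_name, best_score = None, threshold
    for name, vector in gallery.items():
        score = cosine_similarity(probe, vector)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical enrolled people and a new face to check.
gallery = {"alice": [0.9, 0.1, 0.3], "bob": [0.1, 0.8, 0.5]}
probe = [0.88, 0.12, 0.31]

print(verify(probe, gallery["alice"]))      # True
print(identify(probe, gallery))             # alice
print(identify([0.5, 0.5, 0.5], gallery))   # None
```

Verification answers a yes/no question about two faces, whereas identification must compare one face against every enrolled person and may find no match at all.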

Ethical Considerations:
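- Obtain explicit consent from individuals before processing their facial data
- Comply with applicable privacy and biometric data regulations
- Microsoft limits access to recognition features to approved customers to prevent abuse
- Be transparent about how biometric data is collected, used, and stored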

🎯 Key Points

  • ✓ Detection identifies face location and features
  • ✓ Verification compares two faces to confirm identity
  • ✓ Identification searches for a face in a database
  • ✓ Requires consent and compliance with privacy regulations
  • ✓ Emotion, age, and gender estimation has been retired
  • ✓ Includes security measures against abuse

Optical Character Recognition (OCR)

OCR is the technology that converts images of text into editable and searchable text.

Azure AI Vision Read API:
- Extracts printed and handwritten text
- Maintains document format and structure
- Supports multiple languages and fonts
- High accuracy on scanned documents and photos

Advanced Features:
- Layout Detection: Identifies paragraphs, tables, columns
- Structured Extraction: Converts documents to structured JSON formats
- Batch Processing: Handles multiple pages simultaneously
- Auto-correction: Improves accuracy with AI models
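
To make layout handling more concrete, here is a minimal sketch of rebuilding reading order from OCR line results. The input shape (a list of dictionaries with `text`, `top`, and `left` in pixels) is a simplified assumption for illustration, not the actual Read API response format.

```python
def to_plain_text(lines, row_tolerance=10):
    """Join OCR lines in reading order: top to bottom, then left to right."""
    ordered = sorted(lines, key=lambda line: (line["top"], line["left"]))
    rows, current = [], []
    for line in ordered:
        # Start a new row when the vertical gap exceeds the tolerance.
        if current and line["top"] - current[0]["top"] > row_tolerance:
            rows.append(current)
            current = []
        current.append(line)
    if current:
        rows.append(current)
    return "\n".join(
        " ".join(item["text"] for item in sorted(row, key=lambda i: i["left"]))
        for row in rows
    )


# Hypothetical OCR output for a small receipt, deliberately out of order.
ocr_lines = [
    {"text": "Total:", "top": 102, "left": 20},
    {"text": "Invoice 1042", "top": 10, "left": 20},
    {"text": "$58.00", "top": 100, "left": 300},
]

print(to_plain_text(ocr_lines))
# Invoice 1042
# Total: $58.00
```

The row tolerance lets text fragments that sit a few pixels apart vertically be treated as one line, which is a common issue with photographed documents.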

Common Applications:
- Document digitization
- Data entry automation
- Contract and form analysis
- Accessibility for visually impaired people

🎯 Key Points

  • ✓ Converts text images to editable text
  • ✓ Supports printed and handwritten text
  • ✓ Maintains document format and structure
  • ✓ Batch processing for efficiency
  • ✓ High accuracy with advanced AI models
  • ✓ Useful for digitization and automation

Azure AI Custom Vision (Custom Models)

Unlike Azure AI Vision (which uses models pre-trained by Microsoft), Custom Vision allows you to train your own image models using your own photos.

Two main tasks:
1. Image Classification: Predicts a label for the entire image.
   - Example: Is this image a 'Dog' or a 'Cat'?
2. Object Detection: Finds the location (coordinates/bounding box) of specific objects within an image.
   - Example: Where are the safety helmets in this construction site photo?
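
For object detection, a common way to judge whether a predicted bounding box matches a hand-labeled one is intersection over union (IoU). The sketch below assumes boxes are given as normalized `left`, `top`, `width`, and `height` values between 0 and 1; that format and the sample boxes are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two left/top/width/height boxes."""
    left = max(box_a["left"], box_b["left"])
    top = max(box_a["top"], box_b["top"])
    right = min(box_a["left"] + box_a["width"], box_b["left"] + box_b["width"])
    bottom = min(box_a["top"] + box_a["height"], box_b["top"] + box_b["height"])

    # Width and height of the overlap are clamped at zero for disjoint boxes.
    intersection = max(0.0, right - left) * max(0.0, bottom - top)
    area_a = box_a["width"] * box_a["height"]
    area_b = box_b["width"] * box_b["height"]
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical labeled helmet box and a prediction shifted to the right.
labeled = {"left": 0.0, "top": 0.0, "width": 0.4, "height": 0.4}
predicted = {"left": 0.2, "top": 0.0, "width": 0.4, "height": 0.4}

print(round(iou(labeled, predicted), 3))  # 0.333
```

An IoU of 1.0 means the boxes coincide exactly and 0.0 means they do not overlap; evaluations typically count a detection as correct only above some chosen IoU threshold.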

Process:
- Upload your own images.
- Tag them manually.
- Train the model.
- Evaluate and publish.
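
The evaluation step usually comes down to metrics such as precision and recall. The sketch below computes both for one label from a small, made-up set of predictions; the tuple layout and the 0.5 probability threshold are assumptions for illustration, not Custom Vision's own data format.

```python
def precision_recall(examples, label, threshold=0.5):
    """examples: list of (true_label, predicted_label, probability)."""
    tp = fp = fn = 0
    for true_label, predicted, probability in examples:
        # A prediction only counts if it clears the probability threshold.
        predicted_positive = predicted == label and probability >= threshold
        if predicted_positive and true_label == label:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif true_label == label:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical test results for a dog/cat classifier.
results = [
    ("dog", "dog", 0.95),  # correct and confident
    ("dog", "cat", 0.70),  # a dog that was missed
    ("cat", "dog", 0.60),  # a cat wrongly called a dog
    ("cat", "cat", 0.88),
    ("dog", "dog", 0.40),  # correct label but below the threshold
]

precision, recall = precision_recall(results, "dog")
print(precision, round(recall, 3))  # 0.5 0.333
```

Precision shows how often a 'dog' prediction was right, and recall shows how many of the actual dogs were found. Raising the threshold generally trades recall for precision.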

🎯 Key Points

  • ✓ Trains models with user's own data
  • ✓ Classification: Labels the entire image
  • ✓ Object Detection: Locates object coordinates
  • ✓ Requires a labeling and training process
  • ✓ Ideal for very specific use cases (e.g., own brands)

Azure AI Video Indexer (Video Analysis)

A specialized service for extracting deep insights from video and audio files.

Key Capabilities:
- Face recognition in video: Identifies people and when they appear.
- Video OCR: Reads text that appears on screen (e.g., signs, slides).
- Audio transcription: Converts speech to text and generates captions.
- Topic and sentiment detection: Analyzes what is being discussed and the emotional tone.
- Speaker identification: Distinguishes who is speaking at any given moment.
- Indexing: Makes video content 'searchable'.

Tool: Typically used via the Video Indexer web portal or via API.
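
To show what 'searchable' video means in practice, the sketch below searches a small set of time-stamped insights that mixes speech and on-screen text. The segment structure (`source`, `start`, `text`) is a simplified, hypothetical shape, not the actual Video Indexer insights JSON.

```python
def search_video(segments, keyword):
    """Return (timestamp, source, text) for segments mentioning `keyword`."""
    needle = keyword.lower()
    return [
        (format_timestamp(seg["start"]), seg["source"], seg["text"])
        for seg in segments
        if needle in seg["text"].lower()
    ]


def format_timestamp(seconds):
    """Convert a number of seconds to H:MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Hypothetical insights combining audio transcription and video OCR.
segments = [
    {"source": "transcript", "start": 12, "text": "Welcome to the quarterly review."},
    {"source": "ocr", "start": 75, "text": "Budget 2024"},
    {"source": "transcript", "start": 3725, "text": "Let's return to the budget question."},
]

for hit in search_video(segments, "budget"):
    print(hit)
# ('0:01:15', 'ocr', 'Budget 2024')
# ('1:02:05', 'transcript', "Let's return to the budget question.")
```

Because every insight carries a timestamp, a text search can take a viewer directly to the moment a word was spoken or shown on screen.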

🎯 Key Points

  • ✓ Extracts metadata and insights from videos
  • ✓ Combines vision (faces, text) and audio (speech)
  • ✓ Generates automatic transcriptions and captions
  • ✓ Allows searching within video content
  • ✓ Identifies speakers and key topics