Azure Computer Vision: Analyzing Images with AI

Azure Computer Vision is a cloud-based AI service that enables applications to extract rich information from images and videos. Part of Azure AI Services (formerly Cognitive Services), it provides pre-built machine learning models so you can add vision capabilities to your apps without deep ML expertise.

What is Azure Computer Vision?

Azure Computer Vision offers two main image APIs (video is covered under Spatial Analysis below):

Image Analysis: Extract tags, objects, text, faces, colors, and descriptions from images.
OCR (Optical Character Recognition): Extract printed and handwritten text from images and documents.

The latest generation, Azure AI Vision v4.0, introduces the unified Image Analysis 4.0 API, which is built on Microsoft's Florence foundation model and which Microsoft reports to be more accurate than earlier versions.

Key Capabilities

1. Image Analysis

Caption generation: Auto-generate human-readable descriptions of images.
Dense captions: Generate captions for multiple regions within an image.
Object detection: Identify and locate objects with bounding boxes.
Tagging: Return tags drawn from thousands of recognizable objects, living beings, scenery, and actions.
Smart cropping: Suggest crop coordinates based on the region of interest.
People detection: Detect people and their locations in an image.
Background removal: Segment the foreground subject from the background.
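
Object detection, people detection, and smart cropping all report regions as rectangles. The sketch below shows one way to turn such a rectangle into corner coordinates for a cropping library. The `Box` class is a hypothetical stand-in, assuming boxes are given as a pixel origin plus width and height; it is not the SDK's own type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a detected region, in pixels."""
    x: int
    y: int
    width: int
    height: int


def to_corners(box, image_width, image_height):
    """Return (left, top, right, bottom), clamped to the image bounds."""
    left = max(0, box.x)
    top = max(0, box.y)
    right = min(image_width, box.x + box.width)
    bottom = min(image_height, box.y + box.height)
    return left, top, right, bottom


# A box that overhangs the right edge of a 640x480 image is clamped to it.
corners = to_corners(Box(x=590, y=20, width=100, height=80), 640, 480)
print(corners)  # (590, 20, 640, 100)
```

Clamping is a design choice here: it keeps a later crop call from receiving coordinates outside the image.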

2. Optical Character Recognition (OCR)

Read printed text in 164 languages; handwritten text is supported in a smaller set of languages.
Extract text from receipts, invoices, forms, and ID documents.
Return structured results with word-level bounding boxes and confidence scores.

3. Spatial Analysis (Video)

Analyze live video streams to detect people, count occupancy, and measure distances.
Useful for retail analytics, safety compliance, and smart buildings.

Setting Up Azure Computer Vision

Step 1: Create a Computer Vision Resource

Go to the Azure Portal.
Search for Computer Vision and click Create.
Fill in:
- Subscription and resource group.
- Region (choose one close to your users).
- Pricing tier (Free F0 for testing, S1 for production).
Click Review + Create.
After deployment, note the Endpoint and Key from the Keys and Endpoint tab.
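
Once you have the Endpoint and Key, one option is to keep them out of source files by reading them from environment variables. The sketch below assumes the variable names `VISION_ENDPOINT` and `VISION_KEY`; both names and the `load_vision_settings` helper are illustrative choices, not something the service requires.

```python
import os


def load_vision_settings(env=os.environ):
    """Return (endpoint, key) from environment variables, or raise if unset."""
    endpoint = env.get("VISION_ENDPOINT", "").strip()
    key = env.get("VISION_KEY", "").strip()
    missing = [
        name
        for name, value in (("VISION_ENDPOINT", endpoint), ("VISION_KEY", key))
        if not value
    ]
    if missing:
        # Fail early with a clear message instead of a confusing auth error later.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return endpoint, key


# Typical use (assumes both variables are set in your shell):
#   endpoint, key = load_vision_settings()
```

The returned pair can then be passed to the client constructor in place of the hard-coded placeholders used in the examples in this article.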

Step 2: Install the SDK

# Python
pip install azure-ai-vision-imageanalysis

# .NET
dotnet add package Azure.AI.Vision.ImageAnalysis

# JavaScript / Node.js
npm install @azure-rest/ai-vision-image-analysis

Image Analysis with Code Examples

Python — Analyze an Image from URL

from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential

client = ImageAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>")
)

result = client.analyze_from_url(
    image_url="https://example.com/sample.jpg",
    visual_features=[
        VisualFeatures.CAPTION,
        VisualFeatures.OBJECTS,
        VisualFeatures.TAGS,
        VisualFeatures.READ
    ]
)

# Print caption
print(f"Caption: {result.caption.text} (confidence: {result.caption.confidence:.2f})")

# Print tags
for tag in result.tags.list:
    print(f"Tag: {tag.name} ({tag.confidence:.2f})")

# Print detected objects
for obj in result.objects.list:
    print(f"Object: {obj.tags[0].name} at {obj.bounding_box}")

# Print OCR text
# Iterate over all blocks; indexing blocks[0] fails when no text is detected
for block in result.read.blocks:
    for line in block.lines:
        print(f"Text: {line.text}")
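
In practice you will often want to drop low-confidence tags before using them. The sketch below is one way to do that, using the 0.7 threshold suggested under Best Practices. It runs on stand-in objects that only mimic the `name` and `confidence` attributes used above, so `sample_tags` and `filter_by_confidence` are illustrative names rather than part of the SDK.

```python
from types import SimpleNamespace


def filter_by_confidence(items, threshold=0.7):
    """Keep items whose .confidence is at least `threshold`, highest first."""
    kept = [item for item in items if item.confidence >= threshold]
    return sorted(kept, key=lambda item: item.confidence, reverse=True)


# Stand-in data shaped like the tag entries printed above (name + confidence).
sample_tags = [
    SimpleNamespace(name="outdoor", confidence=0.99),
    SimpleNamespace(name="bench", confidence=0.42),
    SimpleNamespace(name="tree", confidence=0.87),
]

for tag in filter_by_confidence(sample_tags):
    print(f"{tag.name}: {tag.confidence:.2f}")
```

With a real response, the same function could be applied to `result.tags.list`. The right threshold depends on how costly a wrong tag is in your application.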

C# — Analyze a Local Image

using System;
using System.IO;
using Azure;
using Azure.AI.Vision.ImageAnalysis;

var client = new ImageAnalysisClient(
    new Uri("https://<your-resource>.cognitiveservices.azure.com/"),
    new AzureKeyCredential("<your-key>")
);

using var imageStream = File.OpenRead("sample.jpg");
var result = await client.AnalyzeAsync(
    BinaryData.FromStream(imageStream),
    VisualFeatures.Caption | VisualFeatures.Tags | VisualFeatures.Objects
);

Console.WriteLine($"Caption: {result.Value.Caption.Text}");
foreach (var tag in result.Value.Tags.Values)
    Console.WriteLine($"Tag: {tag.Name} - {tag.Confidence:P}");

Optical Character Recognition (OCR) Deep Dive

Azure's Read API is the recommended OCR solution for high-accuracy text extraction.

# Reuses the client and imports from the earlier Python example
result = client.analyze_from_url(
    image_url="https://example.com/receipt.jpg",
    visual_features=[VisualFeatures.READ]
)

for block in result.read.blocks:
    for line in block.lines:
        print(f"{line.text}")
        for word in line.words:
            print(f"  Word: '{word.text}' confidence={word.confidence:.2f}")
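
Word-level confidence is useful for flagging text that may need human review. The sketch below walks the same block, line, and word structure and collects words below a cutoff. The 0.8 cutoff, the `summarize_read_result` helper, and the sample data are assumptions made for illustration.

```python
from types import SimpleNamespace as NS


def summarize_read_result(blocks, min_confidence=0.8):
    """Return (full_text, uncertain_words) from OCR blocks."""
    lines_text = []
    uncertain = []
    for block in blocks:
        for line in block.lines:
            lines_text.append(line.text)
            for word in line.words:
                if word.confidence < min_confidence:
                    uncertain.append((word.text, word.confidence))
    return "\n".join(lines_text), uncertain


# Stand-in data mimicking the attributes used in the loop above.
sample_blocks = [
    NS(lines=[
        NS(text="TOTAL 12.50", words=[
            NS(text="TOTAL", confidence=0.99),
            NS(text="12.50", confidence=0.61),
        ]),
        NS(text="Thank you", words=[
            NS(text="Thank", confidence=0.97),
            NS(text="you", confidence=0.98),
        ]),
    ])
]

text, uncertain = summarize_read_result(sample_blocks)
print(text)
print(uncertain)  # [('12.50', 0.61)]
```

With a real response you would pass `result.read.blocks` instead of `sample_blocks`.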

Background Removal

Image Analysis 4.0 also adds background removal, which this example calls through a preview version of the REST API:

import requests

url = "https://<your-resource>.cognitiveservices.azure.com/computervision/imageanalysis:segment"
params = {"api-version": "2023-02-01-preview", "mode": "backgroundRemoval"}
headers = {"Ocp-Apim-Subscription-Key": "<your-key>", "Content-Type": "application/json"}
body = {"url": "https://example.com/person.jpg"}

response = requests.post(url, params=params, headers=headers, json=body)
# Stop on HTTP errors so an error payload is never saved as a PNG
response.raise_for_status()
with open("output_foreground.png", "wb") as f:
    f.write(response.content)

Use Cases

E-commerce: Auto-tag product images, generate alt text, detect defects.
Healthcare: Extract data from medical forms and patient records.
Retail: Count customers, analyze shelf occupancy, ensure planogram compliance.
Document processing: Parse invoices, receipts, and contracts.
Accessibility: Generate image descriptions for visually impaired users.
Security: Detect unauthorized access in restricted areas.

Pricing Overview

Feature	Free Tier (F0)	Standard (S1)
Image Analysis	5,000 transactions/month	$1.00 per 1,000 transactions
OCR (Read)	5,000 pages/month	$1.50 per 1,000 pages
Background Removal	Not included	$0.008 per image
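
To see how these rates add up, here is a rough monthly estimate. It uses only the S1 figures from the table above, converted to a per-1,000 basis, and ignores tiered discounts and regional differences; treat the numbers as illustrative, and note that `estimate_monthly_cost` is a hypothetical helper.

```python
# Per-1,000 rates in USD, copied from the S1 column of the table above.
# Background removal is listed per image: $0.008 * 1,000 = $8.00.
S1_RATE_PER_1000 = {
    "image_analysis": 1.00,
    "ocr_read": 1.50,
    "background_removal": 8.00,
}


def estimate_monthly_cost(volumes):
    """Sum (count / 1000) * rate over each feature in `volumes`."""
    total = 0.0
    for feature, count in volumes.items():
        total += count / 1000 * S1_RATE_PER_1000[feature]
    return round(total, 2)


monthly = estimate_monthly_cost({
    "image_analysis": 50_000,     # 50 * $1.00 = $50
    "ocr_read": 20_000,           # 20 * $1.50 = $30
    "background_removal": 1_000,  # 1 * $8.00 = $8
})
print(monthly)  # 88.0
```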

Best Practices

Cache results: If the same image is analyzed repeatedly, store results to avoid redundant API calls.
Choose the right features: Only request the visual features you need to minimize latency and cost.
Handle confidence scores: Set appropriate thresholds (e.g., ignore tags with confidence < 0.7).
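Use async operations: For large batches, use the SDK's async client (in Python, ImageAnalysisClient from azure.ai.vision.imageanalysis.aio) so that one slow request does not block the rest.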
Secure your keys: Store API keys in Azure Key Vault or environment variables, never in code.
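
The caching advice above can be sketched as a small in-memory wrapper keyed by a hash of the image bytes plus the requested features. `AnalysisCache` and `fake_analyze` are hypothetical names; in a real application the wrapped callable would make the API request, and you might prefer a persistent store over a dictionary.

```python
import hashlib


class AnalysisCache:
    """In-memory cache keyed by image digest and requested features."""

    def __init__(self, analyze):
        self._analyze = analyze  # callable(image_bytes, features) -> result
        self._store = {}
        self.misses = 0  # number of times the wrapped callable was invoked

    def get(self, image_bytes, features):
        digest = hashlib.sha256(image_bytes).hexdigest()
        # Sort features so ["tags", "caption"] and ["caption", "tags"] share a key.
        key = (digest, tuple(sorted(features)))
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._analyze(image_bytes, features)
        return self._store[key]


def fake_analyze(image_bytes, features):
    """Stand-in for a real API call; returns a small dict."""
    return {"size": len(image_bytes), "features": sorted(features)}


cache = AnalysisCache(fake_analyze)
first = cache.get(b"image-bytes", ["tags", "caption"])
second = cache.get(b"image-bytes", ["caption", "tags"])
print(cache.misses)  # 1
```

Including the feature list in the key matters: a result cached for tags alone should not be returned when a caption is also requested.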

Azure Computer Vision vs. Custom Vision

Aspect	Computer Vision	Custom Vision
Model type	Pre-built, general purpose	Custom-trained for your domain
Setup time	Immediate	Requires training data
Best for	General image understanding	Domain-specific classification
Customization	Limited	Full control over labels

Conclusion

Azure Computer Vision empowers developers to add sophisticated image understanding to any application with minimal effort. From OCR and object detection to background removal and spatial video analysis, it covers a wide range of AI vision scenarios. By combining it with other Azure AI services, you can build intelligent, vision-driven solutions that operate at cloud scale.

For more information, visit Azure Computer Vision Documentation.