Azure Computer Vision is a cloud-based AI service that enables applications to extract rich information from images and videos. Part of Azure AI Services (formerly Cognitive Services), it provides pre-built machine learning models so you can add vision capabilities to your apps without deep ML expertise.
What is Azure Computer Vision?
Azure Computer Vision offers two main APIs:
- Image Analysis: Extract tags, objects, text, faces, colors, and descriptions from images.
- OCR (Optical Character Recognition): Extract printed and handwritten text from images and documents.
The latest generation, Azure AI Vision v4.0, introduces the unified Image Analysis 4.0 API with a Florence-based foundation model, offering dramatically improved accuracy.
Key Capabilities
1. Image Analysis
- Caption generation: Auto-generate human-readable descriptions of images.
- Dense captions: Generate captions for multiple regions within an image.
- Object detection: Identify and locate objects with bounding boxes.
- Tagging: Return thousands of recognizable objects, living beings, scenery, and actions.
- Smart cropping: Suggest crop coordinates based on the region of interest.
- People detection: Detect people and their locations in an image.
- Background removal: Segment the foreground subject from the background.
2. Optical Character Recognition (OCR)
- Read printed and handwritten text in 164 languages.
- Extract text from receipts, invoices, forms, and ID documents.
- Return structured results with word-level bounding boxes and confidence scores.
3. Spatial Analysis (Video)
- Analyze live video streams to detect people, count occupancy, and measure distances.
- Useful for retail analytics, safety compliance, and smart buildings.
Setting Up Azure Computer Vision
Step 1: Create a Computer Vision Resource
- Go to the Azure Portal.
- Search for Computer Vision and click Create.
- Fill in:
- Subscription and resource group.
- Region (choose one close to your users).
- Pricing tier (Free F0 for testing, S1 for production).
- Click Review + Create.
- After deployment, note the Endpoint and Key from the Keys and Endpoint tab.
Step 2: Install the SDK
# Python
pip install azure-ai-vision-imageanalysis
# .NET
dotnet add package Azure.AI.Vision.ImageAnalysis
# JavaScript / Node.js
npm install @azure-rest/ai-vision-image-analysis
Image Analysis with Code Examples
Python — Analyze an Image from URL
from azure.ai.vision.imageanalysis import ImageAnalysisClient
from azure.ai.vision.imageanalysis.models import VisualFeatures
from azure.core.credentials import AzureKeyCredential
client = ImageAnalysisClient(
endpoint="https://<your-resource>.cognitiveservices.azure.com/",
credential=AzureKeyCredential("<your-key>")
)
result = client.analyze_from_url(
image_url="https://example.com/sample.jpg",
visual_features=[
VisualFeatures.CAPTION,
VisualFeatures.OBJECTS,
VisualFeatures.TAGS,
VisualFeatures.READ
]
)
# Print caption
print(f"Caption: {result.caption.text} (confidence: {result.caption.confidence:.2f})")
# Print tags
for tag in result.tags.list:
print(f"Tag: {tag.name} ({tag.confidence:.2f})")
# Print detected objects
for obj in result.objects.list:
print(f"Object: {obj.tags[0].name} at {obj.bounding_box}")
# Print OCR text
for line in result.read.blocks[0].lines:
print(f"Text: {line.text}")
C# — Analyze a Local Image
using Azure;
using Azure.AI.Vision.ImageAnalysis;
var client = new ImageAnalysisClient(
new Uri("https://<your-resource>.cognitiveservices.azure.com/"),
new AzureKeyCredential("<your-key>")
);
using var imageStream = File.OpenRead("sample.jpg");
var result = await client.AnalyzeAsync(
BinaryData.FromStream(imageStream),
VisualFeatures.Caption | VisualFeatures.Tags | VisualFeatures.Objects
);
Console.WriteLine($"Caption: {result.Value.Caption.Text}");
foreach (var tag in result.Value.Tags.Values)
Console.WriteLine($"Tag: {tag.Name} - {tag.Confidence:P}");
Optical Character Recognition (OCR) Deep Dive
Azure's Read API is the recommended OCR solution for high-accuracy text extraction.
result = client.analyze_from_url(
image_url="https://example.com/receipt.jpg",
visual_features=[VisualFeatures.READ]
)
for block in result.read.blocks:
for line in block.lines:
print(f"{line.text}")
for word in line.words:
print(f" Word: '{word.text}' confidence={word.confidence:.2f}")
Background Removal
One of the powerful new features in v4.0 is background removal:
import requests
url = "https://<your-resource>.cognitiveservices.azure.com/computervision/imageanalysis:segment"
params = {"api-version": "2023-02-01-preview", "mode": "backgroundRemoval"}
headers = {"Ocp-Apim-Subscription-Key": "<your-key>", "Content-Type": "application/json"}
body = {"url": "https://example.com/person.jpg"}
response = requests.post(url, params=params, headers=headers, json=body)
with open("output_foreground.png", "wb") as f:
f.write(response.content)
Use Cases
- E-commerce: Auto-tag product images, generate alt text, detect defects.
- Healthcare: Extract data from medical forms and patient records.
- Retail: Count customers, analyze shelf occupancy, ensure planogram compliance.
- Document processing: Parse invoices, receipts, and contracts.
- Accessibility: Generate image descriptions for visually impaired users.
- Security: Detect unauthorized access in restricted areas.
Pricing Overview
| Feature | Free Tier (F0) | Standard (S1) |
|---|---|---|
| Image Analysis | 5,000 transactions/month | $1.00 per 1,000 transactions |
| OCR (Read) | 5,000 pages/month | $1.50 per 1,000 pages |
| Background Removal | Not included | $0.008 per image |
Best Practices
- Cache results: If the same image is analyzed repeatedly, store results to avoid redundant API calls.
- Choose the right features: Only request the visual features you need to minimize latency and cost.
- Handle confidence scores: Set appropriate thresholds (e.g., ignore tags with confidence < 0.7).
- Use async operations: For large batches, use async analyze calls to avoid blocking.
- Secure your keys: Store API keys in Azure Key Vault or environment variables, never in code.
Azure Computer Vision vs. Custom Vision
| Aspect | Computer Vision | Custom Vision |
|---|---|---|
| Model type | Pre-built, general purpose | Custom-trained for your domain |
| Setup time | Immediate | Requires training data |
| Best for | General image understanding | Domain-specific classification |
| Customization | Limited | Full control over labels |
Conclusion
Azure Computer Vision empowers developers to add sophisticated image understanding to any application with minimal effort. From OCR and object detection to background removal and spatial video analysis, it covers a wide range of AI vision scenarios. By combining it with other Azure AI services, you can build intelligent, vision-driven solutions that operate at cloud scale.
For more information, visit Azure Computer Vision Documentation.


