Getting Started with Azure OpenAI Service

Azure OpenAI Service brings the power of OpenAI's foundation models—GPT-4o, GPT-4, GPT-3.5-Turbo, Embeddings, DALL·E, and Whisper—to the Azure cloud. It combines OpenAI's industry-leading AI with Azure's enterprise-grade security, compliance, regional availability, and integration with the broader Azure ecosystem.

What is Azure OpenAI Service?

Azure OpenAI Service is a managed API service that gives you access to OpenAI's large language models (LLMs) and multimodal models through a secure, enterprise-ready Azure endpoint. Unlike the public OpenAI API, Azure OpenAI offers:

Data privacy: Your prompts and completions are not used to train OpenAI or Microsoft models.
Private networking: Connect via Azure Virtual Networks and Private Endpoints.
Compliance: Meets SOC 2, ISO 27001, HIPAA, and other standards.
RBAC and Microsoft Entra ID: Fine-grained access control using Microsoft Entra ID (formerly Azure Active Directory).

Available Models

Model	Use Case
GPT-4o	Multimodal chat, complex reasoning, vision tasks
GPT-4 Turbo	Long-context completion, coding, analysis
GPT-3.5-Turbo	Fast, cost-efficient chat and text tasks
text-embedding-ada-002	Semantic search, clustering, classification
DALL·E 3	Image generation from text prompts
Whisper	Speech-to-text transcription

Setting Up Azure OpenAI

Step 1: Request Access and Create a Resource

Apply for access at Azure OpenAI Access Request.
Once approved, go to the Azure Portal > Create a resource and search for Azure OpenAI.
Configure:
- Subscription, resource group, and region.
- Pricing tier: Standard S0.
Click Review + Create.

Step 2: Deploy a Model

Open your Azure OpenAI resource.
Click Go to Azure OpenAI Studio (or oai.azure.com).
Navigate to Deployments > Create new deployment.
Select a model (e.g., gpt-4o) and provide a deployment name.
Set the tokens-per-minute (TPM) rate limit.
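
To get a feel for what a TPM setting means in practice, here is a rough back-of-the-envelope sketch. It assumes each request counts its prompt tokens plus its max_tokens against the quota, which is a simplification of how the service actually meters usage. The function name max_requests_per_minute is hypothetical.

```python
def max_requests_per_minute(tpm_limit: int, prompt_tokens: int, max_tokens: int) -> int:
    """Estimate how many requests of a given shape fit in one minute of quota.

    Assumption: every request is charged prompt_tokens + max_tokens.
    """
    per_request = prompt_tokens + max_tokens
    if per_request <= 0:
        raise ValueError("prompt_tokens + max_tokens must be positive")
    # Integer division: a partial request does not fit in the budget.
    return tpm_limit // per_request


print(max_requests_per_minute(30_000, prompt_tokens=500, max_tokens=500))  # prints 30
```

If the estimate is lower than your expected traffic, raising the TPM limit or lowering max_tokens are the two obvious levers.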

Making Your First API Call

Python SDK

pip install openai

from openai import AzureOpenAI

client = AzureOpenAI(
    api_key="<your-azure-openai-key>",
    api_version="2024-05-01-preview",
    azure_endpoint="https://<your-resource>.openai.azure.com/"
)

response = client.chat.completions.create(
    model="gpt-4o",  # Your deployment name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Azure OpenAI in simple terms."}
    ],
    temperature=0.7,
    max_tokens=500
)

print(response.choices[0].message.content)
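
Under the hood, the SDK call is an HTTPS request to your resource. The sketch below shows how the URL and headers for that request could be assembled, falling back to the AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY environment variables (the names the openai package reads by default) so the key stays out of source code. The helper name build_chat_request is hypothetical and only illustrates the request shape; it does not send anything.

```python
import os
from typing import Optional


def build_chat_request(
    deployment: str,
    api_version: str,
    endpoint: Optional[str] = None,
    api_key: Optional[str] = None,
) -> tuple[str, dict[str, str]]:
    """Return the (url, headers) pair for a chat completions REST call."""
    # Fall back to environment variables so secrets are not hard-coded.
    endpoint = endpoint or os.environ["AZURE_OPENAI_ENDPOINT"]
    api_key = api_key or os.environ["AZURE_OPENAI_API_KEY"]
    url = (
        f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
        f"/chat/completions?api-version={api_version}"
    )
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    return url, headers


url, headers = build_chat_request(
    "gpt-4o",
    "2024-05-01-preview",
    endpoint="https://example-resource.openai.azure.com/",  # placeholder
    api_key="placeholder-key",
)
print(url)
```

Note that the deployment name, not the underlying model name, appears in the URL path.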

Streaming Responses

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short poem about the cloud."}],
    stream=True
)

for chunk in stream:
    # Some chunks can arrive with an empty choices list, so check before indexing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

C# / .NET

dotnet add package Azure.AI.OpenAI

using Azure;
using Azure.AI.OpenAI;
using OpenAI.Chat; // Provides UserChatMessage and the chat client types

var client = new AzureOpenAIClient(
    new Uri("https://<your-resource>.openai.azure.com/"),
    new AzureKeyCredential("<your-key>")
);

var chatClient = client.GetChatClient("gpt-4o");
var response = await chatClient.CompleteChatAsync(
    new UserChatMessage("What is Azure OpenAI Service?")
);

Console.WriteLine(response.Value.Content[0].Text);

Working with Embeddings

Embeddings convert text into high-dimensional vectors, enabling semantic search and similarity comparisons.

response = client.embeddings.create(
    input="Azure OpenAI Service provides access to OpenAI models.",
    model="text-embedding-ada-002"  # Your embedding deployment name
)

vector = response.data[0].embedding
print(f"Embedding dimensions: {len(vector)}")

Common use cases for embeddings:

Semantic search: Find documents similar in meaning, not just keyword match.
Recommendation systems: Suggest items based on content similarity.
Clustering: Group documents by topic automatically.
RAG (Retrieval-Augmented Generation): Fetch relevant context before querying the LLM.
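
Most of these use cases come down to comparing vectors. A common choice is cosine similarity, sketched here in plain Python with toy three-dimensional vectors standing in for real embeddings. The function names and the sample data are illustrative only; a production system would typically use a vector index rather than a linear scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors; real ones would come from client.embeddings.create(...).
docs = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [1.0, 1.0, 0.0]}
print(top_k([1.0, 0.0, 0.0], docs, k=2))
```

Here document "a" ranks first (identical direction), followed by "c", while the orthogonal "b" is dropped.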

Image Generation with DALL·E 3

response = client.images.generate(
    model="dall-e-3",  # Your DALL·E 3 deployment name
    prompt="A futuristic city skyline at sunset with flying cars, digital art",
    size="1024x1024",
    quality="standard",
    n=1
)

print(response.data[0].url)  # URL to the generated image

Building RAG Applications

Retrieval-Augmented Generation (RAG) is a pattern where you retrieve relevant documents and pass them as context to the LLM:

def rag_query(user_question: str, retrieved_docs: list[str]) -> str:
    context = "\n".join(retrieved_docs)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"Answer questions using only the context below:\n{context}"
            },
            {"role": "user", "content": user_question}
        ],
        temperature=0
    )
    return response.choices[0].message.content
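
The function above joins every retrieved document into the prompt, which can overflow the context window when retrieval returns a lot of text. One possible refinement, sketched below, numbers each document and stops adding once a size budget is reached. The name build_context is hypothetical, and the character count is only a crude stand-in for a real token count.

```python
def build_context(docs: list[str], max_chars: int = 4000) -> str:
    """Number each document and stop adding once the character budget is hit."""
    pieces: list[str] = []
    used = 0
    for index, doc in enumerate(docs, start=1):
        entry = f"[{index}] {doc.strip()}"
        # Every entry after the first also costs one newline separator.
        cost = len(entry) + (1 if pieces else 0)
        if used + cost > max_chars:
            break
        pieces.append(entry)
        used += cost
    return "\n".join(pieces)


print(build_context(["alpha", "beta", "gamma"], max_chars=20))
```

With a budget of 20 characters, only the first two entries fit. Numbered entries also make it easier to ask the model to cite which source it used.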

For a fully managed RAG experience, use Azure AI Search together with Azure OpenAI.

Function Calling

Function calling lets the model request a call to an external tool by returning the function name and JSON arguments; your application code then executes it:

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Hanoi?"}],
    tools=tools,
    tool_choice="auto"
)
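
The request above only tells you which tool the model wants to call; running it is up to your code. The sketch below shows one way to dispatch the requested calls and build the follow-up messages. The get_weather body, TOOL_REGISTRY, and run_tool_calls are hypothetical, and fake_calls merely imitates the shape of response.choices[0].message.tool_calls so the example runs offline.

```python
import json
from types import SimpleNamespace


def get_weather(city: str) -> str:
    """Hypothetical local implementation; a real one would query a weather API."""
    return f"Sunny in {city}"


# Maps tool names declared in `tools` to local Python callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls) -> list[dict]:
    """Execute each requested tool and return 'tool' role messages."""
    messages = []
    for call in tool_calls or []:
        func = TOOL_REGISTRY.get(call.function.name)
        if func is None:
            result = f"Unknown tool: {call.function.name}"
        else:
            # Arguments arrive as a JSON-encoded string, not a dict.
            args = json.loads(call.function.arguments)
            result = func(**args)
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": str(result)}
        )
    return messages


# Stand-in for response.choices[0].message.tool_calls (local testing only).
fake_calls = [
    SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(name="get_weather", arguments='{"city": "Hanoi"}'),
    )
]
print(run_tool_calls(fake_calls))
```

In a full round trip you would append the assistant message containing the tool calls to the conversation, then these tool messages, and call client.chat.completions.create again so the model can produce its final answer.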

Content Filtering

Azure OpenAI includes built-in content filters for harmful content categories (hate, violence, sexual, self-harm). You can configure filter strictness per category in Azure OpenAI Studio under Content filters.
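
On the client side, a filtered completion can be detected from the finish_reason field, which is set to "content_filter" when output was withheld. The sketch below handles only that case; a prompt that is blocked outright typically surfaces as an HTTP 400 error instead and would need separate exception handling. The helper name extract_text and the fallback string are hypothetical, and the SimpleNamespace objects stand in for real responses.

```python
from types import SimpleNamespace


def extract_text(response, fallback: str = "[response withheld by content filter]") -> str:
    """Return the completion text, or a fallback if the output was filtered."""
    choice = response.choices[0]
    if choice.finish_reason == "content_filter":
        return fallback
    return choice.message.content or ""


# Stand-ins that mimic only the attributes used above.
ok = SimpleNamespace(
    choices=[SimpleNamespace(finish_reason="stop", message=SimpleNamespace(content="Hi"))]
)
blocked = SimpleNamespace(
    choices=[SimpleNamespace(finish_reason="content_filter", message=SimpleNamespace(content=None))]
)

print(extract_text(ok))
print(extract_text(blocked))
```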

Monitoring and Cost Management

Use Azure Monitor to track token consumption, latency, and error rates.
Set up alerts for TPM quota usage.
Set max_tokens deliberately to cap completion length and cost. Temperature changes output randomness, not the number of tokens billed.
Enable token streaming to improve perceived latency.
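
For a quick per-request cost check, the token counts reported in response.usage can be multiplied by your per-token rates. This is a minimal sketch: estimate_cost is a hypothetical helper, and the prices in the example call are placeholders, not actual Azure rates. Look up the current pricing for your model and region.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Estimate the cost of one request from its token usage."""
    input_cost = prompt_tokens / 1000 * input_price_per_1k
    output_cost = completion_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# In real use: response.usage.prompt_tokens and response.usage.completion_tokens.
cost = estimate_cost(
    prompt_tokens=1000,
    completion_tokens=500,
    input_price_per_1k=0.005,   # placeholder price
    output_price_per_1k=0.015,  # placeholder price
)
print(f"Estimated cost: {cost:.4f}")
```

Logging this value alongside each request makes it easier to spot which prompts or features drive spending.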

Best Practices

System prompts: Craft clear, specific system messages to guide model behavior.
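Temperature: Use 0 for tasks that need consistent output (Q&A, classification) and 0.7–1.0 for creative tasks. Output at 0 is far more repeatable, though not guaranteed to be identical across calls.
Token management: Estimate prompt tokens with the tiktoken library to avoid exceeding context limits.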
Retry logic: Implement exponential backoff for 429 (rate limit) errors.
Prompt injection defense: Treat user input as untrusted. Sanitize it, keep it out of the system message, and validate model output before acting on it.
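
The retry advice can be sketched as a small wrapper that applies exponential backoff with jitter. RateLimited is a local stand-in for the SDK's 429 exception (openai.RateLimitError in the openai package), and with_backoff is a hypothetical helper. The sleep function is injectable so the logic can be tested without waiting.

```python
import random
import time


class RateLimited(Exception):
    """Stand-in for the SDK's 429 error, used so this example runs offline."""


def with_backoff(func, max_retries=5, base_delay=1.0, max_delay=30.0,
                 retry_on=(RateLimited,), sleep=time.sleep):
    """Call func(), retrying on rate-limit errors with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # Out of retries: surface the error to the caller.
            # Delay ceiling doubles each attempt: 1s, 2s, 4s, ... up to max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter spreads out retries from concurrent clients.
            sleep(random.uniform(0, ceiling))


calls = {"n": 0}


def flaky():
    """Fails twice with a rate-limit error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return "ok"


delays = []
print(with_backoff(flaky, sleep=delays.append))  # prints "ok"
```

With the real SDK you would pass retry_on=(openai.RateLimitError,) and wrap the API call in a lambda. The client also accepts a max_retries argument that provides built-in retries, which may be enough for simple cases.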

Conclusion

Azure OpenAI Service brings enterprise-grade reliability, security, and compliance to OpenAI's most powerful models. Whether you're building chatbots, search engines, document processors, or creative tools, Azure OpenAI provides the infrastructure and SDKs to deploy AI solutions at scale with confidence.

For full documentation, visit Azure OpenAI Service Documentation.