Claude API Tutorial: Build Python Integration in 15 Minutes

I call the Claude API dozens of times a day. For client projects, internal tools, content workflows, quick prototypes. What took me a full afternoon to figure out the first time should take you about 15 minutes.

This tutorial gets you to working code fast: setup, first call, streaming, and the cost controls I wish someone had shown me on day one. No theory. Let’s build.

Set Up in Under 5 Minutes

Get Your API Key

Head to console.anthropic.com, create an account, and navigate to API Keys. Generate a new key. Copy it immediately – you won’t see it again.

Install and Configure

Two packages. That’s it.

pip install anthropic python-dotenv

Create a .env file in your project root:

ANTHROPIC_API_KEY=sk-ant-your-key-here

I’ve seen too many API keys accidentally committed to GitHub. Use a .env file from day one and add it to your .gitignore. Future you will be grateful.

from dotenv import load_dotenv
import anthropic

load_dotenv()
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY automatically

That’s your setup. The Anthropic SDK picks up the environment variable without you passing it explicitly.

Your First API Call

Here’s the minimum code that actually works:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain what an API is in two sentences."}
    ]
)

print(message.content[0].text)

That’s a complete, working Python script. Run it and you’ll get a response in about a second.

A few things worth noting about what’s happening here:

Model choice matters. I’m using claude-sonnet-4-6 because it’s the best balance of speed and quality for most tasks. More on model selection below.

max_tokens controls output length. This caps the response at 1,024 tokens (roughly 750 words). The parameter is required, so set it intentionally – most examples reach for 4096, and most tasks don’t need that much. Lower means cheaper and faster.

The messages format uses role-based conversation. You can also add a system parameter for instructions that shape every response (for a deeper dive on writing effective system prompts, see the prompt engineering guide):

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a senior Python developer. Give concise, practical answers.",
    messages=[
        {"role": "user", "content": "How do I handle rate limiting in a REST API?"}
    ]
)

Check message.usage in every response. It tells you exactly how many input and output tokens you used. I log this in production – it adds up faster than you’d expect.
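A tiny helper I’d sketch for that logging (a hypothetical function of my own, not part of the SDK; it takes the two counts you’d pull from message.usage):

```python
def log_usage(input_tokens: int, output_tokens: int) -> str:
    """Format the token counts from a response's usage object for logging."""
    return f"in={input_tokens} out={output_tokens} total={input_tokens + output_tokens}"

# In real code: log_usage(message.usage.input_tokens, message.usage.output_tokens)
```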

Add Streaming for Real-Time Output

When you’re building anything user-facing, streaming makes a big difference. Instead of waiting 3-5 seconds for a complete response, words appear as Claude generates them.

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a short product description for a task management app."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

That’s only a few lines more than the non-streaming call. The stream.text_stream iterator yields text chunks as they arrive. For batch processing or backend tasks where nobody’s watching, skip streaming – it adds complexity without benefit.
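If you also want the assembled response at the end, the loop above generalizes to a small consumer. This is my own sketch, decoupled from the SDK: chunks can be any iterator of strings, such as stream.text_stream.

```python
def consume_stream(chunks, write=lambda t: print(t, end="", flush=True)):
    """Echo each text chunk as it arrives and return the full response string."""
    parts = []
    for text in chunks:
        write(text)         # show the chunk immediately
        parts.append(text)  # keep it for the final string
    return "".join(parts)
```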

Choose the Right Model and Control Costs

This is where daily experience pays off. Here’s my honest breakdown:

| Model      | Cost (input / output per 1M tokens) | Best For                                    |
|------------|-------------------------------------|---------------------------------------------|
| Haiku 4.5  | $1 / $5                             | Chat interfaces, simple tasks, high volume  |
| Sonnet 4.6 | $3 / $15                            | Most work – writing, analysis, coding       |
| Opus 4.6   | $5 / $25                            | Complex reasoning, multi-step problems      |

I use Sonnet for 80% of my API calls. Haiku handles the quick stuff where speed matters more than depth. Opus comes out for tasks where I’d otherwise need multiple Sonnet calls to get the right answer.
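To make those numbers concrete, here’s a back-of-the-envelope cost estimator built on the table above. It’s my own helper, not an SDK function, and prices change – check the current pricing page before relying on it.

```python
# $ per 1M tokens (input, output), taken from the table above
PRICING = {"haiku": (1.0, 5.0), "sonnet": (3.0, 15.0), "opus": (5.0, 25.0)}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough dollar cost of one call at the listed per-1M-token rates."""
    inp, out = PRICING[model]
    return input_tokens / 1e6 * inp + output_tokens / 1e6 * out
```

A 2,000-token prompt with a 500-token answer on Sonnet is estimate_cost("sonnet", 2000, 500), roughly $0.0135 – cheap per call, but it compounds across thousands of calls.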

Two Cost Controls I Use Every Day

1. Set max_tokens intentionally. If you need a one-paragraph summary, set max_tokens=300, not a copy-pasted 4096. I’ve cut my API bills roughly in half just by being specific about output length.

2. Use prompt caching for repeated system prompts. If you’re building a chatbot that sends the same system prompt with every request, caching saves up to 90% on those input tokens:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a customer support agent for Acme Corp...",
            "cache_control": {"type": "ephemeral"}
        }
    ],
    messages=[{"role": "user", "content": "How do I reset my password?"}]
)

That cache_control flag tells the API to cache your system prompt. On subsequent calls, you pay a fraction of the input token cost. I’ve accidentally run up $50 bills before discovering this. Don’t be me.
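To see why that flag matters, here’s a rough savings calculation. This is my own sketch, and the multipliers are assumptions based on published cache pricing (about 1.25x the base input price for the cache write, 0.1x for cache reads) – verify them against the current docs.

```python
def cached_input_cost(system_tokens: int, calls: int, price_per_m: float = 3.0,
                      write_mult: float = 1.25, read_mult: float = 0.10) -> float:
    """Input cost for a cached system prompt: one cache write, then cheap reads."""
    per_call = system_tokens / 1e6 * price_per_m
    return per_call * write_mult + (calls - 1) * per_call * read_mult

def uncached_input_cost(system_tokens: int, calls: int, price_per_m: float = 3.0) -> float:
    """Same system prompt resent in full on every call."""
    return calls * system_tokens / 1e6 * price_per_m
```

For a 2,000-token system prompt over 1,000 calls on Sonnet, that works out to roughly $0.61 cached versus $6.00 uncached – close to the 90% savings mentioned above.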

What’s Next

You’ve got a working integration. From here, the three things worth learning next:

  • Conversation memory: Pass previous messages in the messages array to maintain context across turns
  • Tool use: Let Claude call functions you define — this is how you build agents and AI automations that run without manual intervention
  • Error handling: Wrap calls in try/except for anthropic.APIError and implement exponential backoff for rate limits
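The error-handling bullet can be sketched as a generic retry wrapper. This is my own helper, not part of the SDK; in real code you’d pass anthropic.RateLimitError as retry_on and wrap your client.messages.create(...) call in a closure.

```python
import time

def with_retries(call, retry_on, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on retry_on exceptions with exponential
    backoff: base_delay, 2x, 4x, ... seconds between attempts."""
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            sleep(base_delay * (2 ** attempt))
```

The sleep parameter is injectable so you can test the backoff schedule without actually waiting.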

The official Anthropic docs are genuinely well-written (rare for API docs). But you don’t need to read them cover to cover. You’re running. Now build something.