Friday, July 3, 2026

Getting Started with the Claude API in Python

 

Introduction

 
You want to add Claude to a Python application. Creating an account and making your first API call is easy. The official documentation can get you from zero to a working request in a couple of minutes. The next questions are usually more practical:

  • What does the response object contain?
  • How do you stream responses so users can see output as it's generated?
  • How do you structure prompts and handle responses in a production application?

The Claude Python SDK takes care of much of the underlying API interaction. It provides typed response objects, built-in retry handling, and a simple interface for working with the Messages API.

This article walks you through setup, your first API call, reading the response, system prompts, and streaming. By the end, you'll have a working foundation.

 

Prerequisites and Installation

You need Python 3.9 or higher, a free Claude Console account, and an API key from the Console's Settings > API Keys page. You can add $5 in credits and work through everything in this article.

With these in place, install the SDK:

pip install anthropic


Never hardcode your API key in source files. Store it as an environment variable instead:

export ANTHROPIC_API_KEY="YOUR-API-KEY-HERE"

 

Or add it to a .env file at the project root if you're using python-dotenv. The SDK reads ANTHROPIC_API_KEY from your environment, so you don't need to pass it anywhere in your code.
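
Because the key is read implicitly, a missing variable can surface later than you'd like. Here is a minimal sketch of an early check. The helper name require_api_key is hypothetical and not part of the SDK, and it accepts any mapping so it can be tried without touching the real environment.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the SDK.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ANTHROPIC_API_KEY is not set; export it or add it to your .env file."
        )
    return key
```

If you use python-dotenv, call load_dotenv() before this check so values from the .env file are already in the environment.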

 

Making Your First API Call

The entry point for every interaction is client.messages.create(). Let's ask Claude to explain what a context window is, something you'll actually need to understand as you use the API.

You pass three things: the model ID, a max_tokens limit, and a messages list. The messages list is always a list of dicts, each with a "role" and "content" key.

import anthropic

# The client reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=256,
    messages=[
        {
            "role": "user",
            "content": "In one sentence, what is a context window?"
        }
    ]
)

# The reply text lives in the first content block.
print(response.content[0].text)

 

The model field takes the exact model ID string. max_tokens is a hard ceiling on how many output tokens Claude will produce; the response stops there even if the thought isn't complete, so set it high enough for open-ended requests. The messages list must always start with a "user" turn.
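
The rules above can be checked locally before a request goes out. The following is a sketch only: build_request is a hypothetical helper, not an SDK function, and it assumes every message's content is a plain string.

```python
from typing import Any, Dict, List


def build_request(
    model: str, max_tokens: int, messages: List[Dict[str, str]]
) -> Dict[str, Any]:
    """Validate the three required arguments and return them as kwargs.

    Hypothetical helper; the checks mirror the rules described above.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be a positive integer")
    if not messages:
        raise ValueError("messages must contain at least one turn")
    for message in messages:
        if set(message) != {"role", "content"}:
            raise ValueError("each message needs exactly 'role' and 'content' keys")
        if message["role"] not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {message['role']!r}")
    if messages[0]["role"] != "user":
        raise ValueError("the first message must be a 'user' turn")
    return {"model": model, "max_tokens": max_tokens, "messages": messages}
```

The returned dict can be unpacked into the call, for example client.messages.create(**build_request(...)).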

Sample output:

A context window is the maximum amount of text (measured in tokens) that a language
model can process and consider at one time, encompassing both your input and its output.

 

Understanding the Response Object

 
The response from messages.create() is a typed Message object. It's worth inspecting the full structure before building anything on top of it.

Replace the print line in the previous example with:

print(response)


Running that gives you the full object:

Message(
  id='msg_01XFDUDYJgAACzvnptvVoYEL',
  type="message",
  role="assistant",
  content=[TextBlock(text="A context window is...", type="text")],
  model="claude-sonnet-5",
  stop_reason='end_turn',
  stop_sequence=None,
  usage=Usage(input_tokens=19, output_tokens=42)
)

 

A few fields here matter more than they first appear. stop_reason tells you why Claude stopped generating. end_turn means Claude finished on its own terms. If you see max_tokens, the response was cut off by your limit, and you may need to raise it or rethink the prompt.

The usage field tracks both input and output tokens for the request. This is how Anthropic calculates billing, and it's also how you detect when a prompt is creeping too close to the model's context limit. content is a list; in standard text responses it has one item, a TextBlock, so response.content[0].text is the idiomatic way to pull the text out.
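
As a rough sketch of how these fields might be used together, the helpers below read text, truncation status, and token counts from any object with the attribute names shown above. The function names are hypothetical, and the SimpleNamespace object is a stand-in so the code can be tried without an API call.

```python
from types import SimpleNamespace


def extract_text(response) -> str:
    """Join the text of every text block in a Message-like object."""
    return "".join(
        block.text
        for block in response.content
        if getattr(block, "type", None) == "text"
    )


def was_truncated(response) -> bool:
    """True when generation stopped because it hit the max_tokens ceiling."""
    return response.stop_reason == "max_tokens"


def total_tokens(response) -> int:
    """Input plus output tokens reported for this request."""
    return response.usage.input_tokens + response.usage.output_tokens


# Stand-in with the same attribute names as the Message shown above.
fake = SimpleNamespace(
    content=[SimpleNamespace(type="text", text="A context window is...")],
    stop_reason="end_turn",
    usage=SimpleNamespace(input_tokens=19, output_tokens=42),
)
print(extract_text(fake), was_truncated(fake), total_tokens(fake))
```

Checking was_truncated() before displaying a reply is one simple way to notice when max_tokens is set too low.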

 

Using System Prompts

A system prompt lets you give Claude a persistent role, set constraints, or provide context that should apply across the entire conversation. You pass it as a top-level system parameter, separate from the messages list, not as a message itself.

Here we configure Claude to act as a code reviewer who only responds in Python and avoids general explanations:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=512,
    # The system prompt is a top-level parameter, not a message.
    system=(
        "You are a Python code reviewer. "
        "Respond only with corrected or improved Python code. "
        "Do not explain changes unless the user explicitly asks."
    ),
    messages=[
        {
            "role": "user",
            "content": (
                "def get_user(id):\n"
                "    db = connect()\n"
                "    return db.query('SELECT * FROM users WHERE id=' + id)"
            )
        }
    ]
)

print(response.content[0].text)

 

The system prompt sits above the conversation in Claude's context. It carries the same authority throughout all turns, so role instructions, formatting rules, and domain constraints you set here persist without you repeating them in every message.
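
Since the system prompt is a request parameter rather than a message, one way to keep it consistent is to attach it in a single place. The sketch below is illustrative only: reviewer_request and REVIEWER_SYSTEM are hypothetical names, and the model ID is the one used in this article's examples.

```python
from typing import Any, Dict, List

REVIEWER_SYSTEM = (
    "You are a Python code reviewer. "
    "Respond only with corrected or improved Python code."
)


def reviewer_request(
    history: List[Dict[str, str]], max_tokens: int = 512
) -> Dict[str, Any]:
    """Build kwargs for one request, attaching the same system prompt each time.

    Hypothetical helper for illustration; not part of the SDK.
    """
    if any(message["role"] == "system" for message in history):
        raise ValueError("pass the system prompt via 'system', not as a message")
    return {
        "model": "claude-sonnet-5",  # model ID taken from the article's examples
        "max_tokens": max_tokens,
        "system": REVIEWER_SYSTEM,
        "messages": list(history),
    }


first = reviewer_request([{"role": "user", "content": "def f(x): return x+1"}])
```

The result can be unpacked into client.messages.create(**first), and the same helper reused for every later turn.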

 

Streaming Responses

 
For requests where Claude may take a few seconds to respond, streaming lets you display text as it arrives instead of waiting for the full response. The SDK exposes this through client.messages.stream(), used as a context manager.

The text_stream iterator yields individual text chunks in real time. Each chunk is a string fragment, not a full sentence. You pass end="" and flush=True to print() so output appears continuously rather than buffering:

import anthropic

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-sonnet-5",
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": "Walk me through what happens when a Python list grows beyond its initial capacity."
        }
    ]
) as stream:
    # Print each fragment immediately, without a trailing newline.
    for chunk in stream.text_stream:
        print(chunk, end="", flush=True)

print()  # newline after stream ends

 

The context manager ensures the HTTP connection is closed cleanly when the block exits, even if an exception is raised mid-stream. If you need the complete Message object after streaming, including token usage counts, call stream.get_final_message() before the block closes.
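
If you only need the full text afterwards (to log or store it, say), you can also collect the fragments as you print them. This is a minimal sketch: print_and_collect is a hypothetical helper that works on any iterable of strings, and the list passed to it here stands in for stream.text_stream.

```python
import sys
from typing import Iterable, TextIO


def print_and_collect(chunks: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Write each chunk as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # show the fragment immediately instead of buffering
        parts.append(chunk)
    out.write("\n")
    return "".join(parts)


# Stand-in for stream.text_stream: any iterable of string fragments works.
full_text = print_and_collect(["Python lists ", "are dynamic ", "arrays."])
```

Inside the with block from the example above, the call would be print_and_collect(stream.text_stream).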

Sample output:

Python lists are dynamic arrays. When you append an element and the list has no
room, Python allocates a new, larger block of memory (typically 1.125x the current
size), copies all existing elements into it, and releases the old block. This
operation is O(n) in the worst case, but because it happens infrequently relative to
the number of appends, the amortized cost per append stays O(1). You can pre-allocate
capacity with a list comprehension or by passing an iterable to the list constructor
if you know the final size upfront.

 

Next Steps

You now have the core building blocks: requests, structured responses, system prompts, and streaming.

Next, you can learn error handling, token usage, and multi-turn conversations. Because the API is stateless, you need to send the conversation history with each request. The SDK documentation shows the recommended approach.
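
To make the stateless point concrete, here is one possible way to accumulate history between requests. It is a sketch, not necessarily the approach the SDK documentation recommends, and the Conversation class is a hypothetical helper.

```python
from typing import Dict, List


class Conversation:
    """Hypothetical helper that accumulates turns for a stateless API."""

    def __init__(self) -> None:
        self.messages: List[Dict[str, str]] = []

    def add_user(self, text: str) -> None:
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text: str) -> None:
        self.messages.append({"role": "assistant", "content": text})


chat = Conversation()
chat.add_user("In one sentence, what is a context window?")
# In a real program this text would come from response.content[0].text.
chat.add_assistant("A context window is...")
chat.add_user("How is it measured?")
```

Each request would then pass messages=chat.messages, so Claude sees every earlier turn along with the new question.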

The API reference also includes features like structured outputs and tool use. Happy exploring!
 
 

Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she's working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.

