LLM Function Calling in Python — Core Concepts
Function calling (also called tool use) is an API feature where an LLM can request that your application execute specific functions on its behalf. The model never runs code — it outputs a structured JSON request, your Python code executes it, and you feed the result back.
The execution loop
The standard flow has four steps:
- Define tools — you provide a list of function descriptions with JSON Schema parameters to the API.
- Model decides — based on the user’s message and tool descriptions, the model either answers directly or emits a tool-call request.
- Your code executes — Python receives the tool-call JSON, dispatches to the real function, and captures the return value.
- Feed result back — you append the function result to the conversation and call the model again so it can incorporate the data into its response.
This loop may repeat if the model needs multiple tools or wants to chain calls.
How tool definitions work
Each tool is described with a name, a natural-language description, and a JSON Schema for parameters. The quality of descriptions directly affects whether the model picks the right tool. Vague descriptions lead to wrong calls.
Good descriptions are specific about what the function does, what each parameter means, and what the return value contains. They read like concise API documentation, not marketing copy.
Parallel and sequential tool calls
OpenAI, Anthropic, and Google models support parallel tool calls — the model can request multiple functions in a single response. Your code should handle executing them concurrently when they are independent and sequentially when one depends on another’s result.
Mandatory vs. auto tool choice
APIs let you control tool behavior:
- auto — model decides whether to use tools (default).
- required — model must call at least one tool.
- none — tools are disabled for this call.
- specific — force a particular tool (useful for structured extraction).
Common misconception
People often assume function calling is just “prompt engineering with extra steps.” It is architecturally different. The model outputs structured JSON conforming to your schema, not free-form text you have to parse. This makes the interface between AI and code reliable and type-safe.
When to use function calling
Use it when the model needs real-time data (weather, prices, database queries), when it needs to perform actions (send email, create records), or when you want structured extraction from unstructured text. Do not use it for tasks the model can handle from its training data alone — it adds latency and cost.
The one thing to remember: Function calling creates a structured contract between the LLM and your Python code — the model requests actions via JSON, your code executes and returns results, and neither side crosses its boundary.
See Also
- Python Agent Frameworks An agent framework gives AI the ability to plan, use tools, and work through problems step by step — like upgrading a calculator into a research assistant.
- Python Embedding Pipelines An embedding pipeline turns words into numbers that capture meaning — like translating every sentence into coordinates on a giant map of ideas.
- Python Guardrails Ai Guardrails are safety bumpers for AI — they check what the model says before it reaches users, like a spellchecker but for facts, tone, and dangerous content.
- Python Llm Evaluation Harness An LLM evaluation harness is like a report card for AI — it runs tests and grades how well the model answers questions so you know if it is actually improving.
- Python Prompt Chaining Think of prompt chaining as a relay race where each runner hands a baton to the next — except the runners are AI prompts building on each other's work.