[EAAPL-WRK006] Tool Call Orchestration

Category: Agentic Workflows
Sub-category: Tool Execution Architecture
Version: 1.0
Maturity: Industry Standard
Tags: tool-calling, function-calling, tool-use, parameter-extraction, result-injection, tool-budget
Regulatory Relevance: ISO 42001 §8.4, APRA CPS 234, EU AI Act (Art. 9)

1. Executive Summary

The Tool Call Orchestration Pattern defines the execution mechanics for structured tool use within an agent's reasoning loop: how tools are selected from a registry, how parameters are extracted and validated, how tool results are injected back into context, how errors are handled, and how tool call budgets are enforced. While the Agent Tool Registry (EAAPL-AGT003) defines the registration and discovery contract for tools, this pattern covers the runtime orchestration of tool execution — the moment-to-moment mechanics of safely and reliably executing tool calls within an agentic workflow.

For CIO/CTO audiences: tools are what give AI agents their power to interact with the real world — querying databases, calling APIs, reading documents, writing records. But that power creates risk. An agent that can call any tool, with any parameters, without limits is an operational liability. This pattern defines the guardrails: parameter validation before execution, permission checking per tool call, result sanitisation before injection, error handling that does not silently corrupt the agent's reasoning, and hard limits on the number and cost of tool calls per task. These are not optional — they are the operational controls that make tool-using agents safe enough to deploy in regulated enterprise environments.

2. Problem Statement

Business Problem

Tool-using agents interact with live business systems — databases, APIs, file systems, communication services. Uncontrolled tool invocation creates operational risk: incorrect parameters corrupt data, excessive calls exhaust API quotas, and unhandled errors produce silent failures that the agent treats as successful tool calls.

Technical Problem

The raw output of an LLM tool-call inference step is a JSON object specifying a tool name and parameters. This raw output may contain invalid parameter types, parameters that exceed allowed value ranges, calls to tools the current user is not authorised to use, and hallucinated parameter values that would produce runtime errors. Forwarding this raw output directly to tool execution is therefore unsafe.

Symptoms of Absence

Tool call errors are silently swallowed and misrepresented as successful observations in the agent scratchpad
No parameter validation: hallucinated parameter values cause downstream data corruption
No per-task tool call budget: runaway agents exhaust API quotas, incurring unexpected costs
Tool calls execute with the agent's full permissions regardless of the sensitivity of the specific tool

Cost of Inaction

Data Integrity: Unvalidated parameters passed to write-capable tools can corrupt production data
Cost Control: Unlimited tool calls create unpredictable cost exposure
Security: Unscoped tool permissions create lateral movement risk if an agent is compromised

3. Context

When to Apply

Agents execute tool calls within any reasoning loop (ReAct, Plan-and-Execute, Sequential Chain)
Tools interact with external systems, databases, or APIs
Per-task tool call budgets are required
Tool results contain potentially untrusted content that must be sanitised before context injection

When NOT to Apply

Pure LLM workflows with no external tool use
Fully sandboxed environments where the execution environment provides tool isolation (parameter validation still applies, but the permission model may be simplified)

Prerequisites

EAAPL-AGT003 (Agent Tool Registry) for tool discovery and permission definitions
Tool call budget policy (max calls per task, max calls per tool type)
Parameter schema definitions for all registered tools
Result sanitisation policy per tool

Industry Applicability

Industry	Tool Types Used	Key Orchestration Requirement
Financial Services	Database query, API calls, calculation engine	Parameter validation to reduce SQL injection risk; result sanitisation
Legal	Document search, court record lookup, drafting API	Permission scoping per matter; budget control for search tools
Healthcare	Clinical database, drug interaction API, EHR write	Strict parameter validation for write tools; safety checks
Government	Records system, geospatial API, regulatory database	Audit every tool call; permission scoping per officer role
Technology	Code execution, test runner, version control	Sandbox enforcement; budget control for compute tools

4. Architecture Overview

The Tool Call Orchestration layer sits between the agent's LLM inference step and the actual tool execution infrastructure. It is a mandatory gate through which every tool call must pass.

Parameter Extraction and Validation

The LLM produces a tool call specification in its inference output, either as native function-calling output that conforms to a JSON schema or as action text parsed from the scratchpad. The Parameter Extractor parses this into a structured tool call object (tool_name, parameters dict). The Parameter Validator then validates every parameter against the tool's registered schema: type checking, value range validation, required field presence, and pattern matching. Invalid parameters are not forwarded to execution; instead, they generate a correction observation that is injected back into the agent's context.
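
The validation step can be sketched as follows. This is a minimal, standard-library illustration rather than a reference implementation: `ParamSpec`, `SEARCH_SCHEMA`, `validate_params`, and `correction_observation` are hypothetical names, and the schema values are assumptions modelled on the `search_regulatory_db` example used in this pattern. A production deployment would more likely use Pydantic or a JSON Schema validator, as listed in the Components table.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ParamSpec:
    """Hypothetical schema entry for a single tool parameter."""
    type_: type
    required: bool = True
    min_value: Optional[int] = None
    max_value: Optional[int] = None
    max_length: Optional[int] = None


# Hypothetical registered schema for the search_regulatory_db example.
SEARCH_SCHEMA: Dict[str, ParamSpec] = {
    "query": ParamSpec(str, max_length=500),
    "limit": ParamSpec(int, min_value=1, max_value=50),
}


def validate_params(schema: Dict[str, ParamSpec], params: Dict[str, Any]) -> List[str]:
    """Return validation errors; an empty list means the call may proceed."""
    errors = [f"unknown parameter '{name}'" for name in params if name not in schema]
    for name, spec in schema.items():
        if name not in params:
            if spec.required:
                errors.append(f"missing required parameter '{name}'")
            continue
        value = params[name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, spec.type_) or (
            spec.type_ is int and isinstance(value, bool)
        )
        if wrong_type:
            errors.append(
                f"parameter '{name}' must be {spec.type_.__name__}, "
                f"got {type(value).__name__} {value!r}"
            )
            continue
        if spec.min_value is not None and value < spec.min_value:
            errors.append(f"parameter '{name}' must be >= {spec.min_value}, got {value}")
        if spec.max_value is not None and value > spec.max_value:
            errors.append(f"parameter '{name}' must be <= {spec.max_value}, got {value}")
        if spec.max_length is not None and len(value) > spec.max_length:
            errors.append(f"parameter '{name}' exceeds {spec.max_length} characters")
    return errors


def correction_observation(tool_name: str, errors: List[str]) -> str:
    """Build the observation injected back into the agent's context."""
    return f"Observation: Tool call '{tool_name}' failed: {'; '.join(errors)}. Correct and retry."
```

Returning a list of errors, rather than raising on the first one, lets the correction observation report every problem in a single turn, which may reduce the number of retries the agent needs.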

Permission Gate

Before execution, the Permission Gate checks that the current task's permission scope includes the requested tool and the specific operation (read vs. write vs. admin). The permission scope is established at task initialisation time from the user's identity and the task type's permission policy. Tool calls outside the permission scope are rejected with a structured permission error. The agent can reason about this rejection and choose an alternative approach.
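
A minimal sketch of the scope check, assuming scopes are plain strings of the form `operation:resource` (as in `read:regulatory_db`) and that the task's scope set was fixed at initialisation. `PermissionDecision` and `check_permission` are hypothetical names; a real deployment might delegate the decision to OPA or an IAM policy engine instead.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class PermissionDecision:
    """Hypothetical structured result returned by the gate."""
    granted: bool
    observation: str  # empty when granted; error text for the agent otherwise


def check_permission(
    tool_name: str,
    required_scope: str,
    task_scopes: FrozenSet[str],
    permitted_tools: Iterable[str],
) -> PermissionDecision:
    """Grant the call only if the task scope contains the tool's required scope."""
    if required_scope in task_scopes:
        return PermissionDecision(True, "")
    # Listing the permitted tools gives the agent something to reason about.
    available = ", ".join(sorted(permitted_tools))
    return PermissionDecision(
        False,
        f"Observation: Tool '{tool_name}' is not permitted for this task. "
        f"Available tools: [{available}]",
    )
```

Using a frozen set for the task scope is a deliberate choice in this sketch: the scope cannot be widened by accident after task initialisation.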

Tool Execution with Timeout

Validated, permitted tool calls are forwarded to the tool execution layer (EAAPL-AGT003). Each call is wrapped in a timeout enforced by the orchestrator, so a tool that hangs does not block the agent indefinitely. The timeout is configurable per tool type (for example, fast API calls: 5s; slow database queries: 30s).
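
One way to sketch the timeout wrapper uses a thread pool from the standard library. The names `execute_with_timeout`, `TIMEOUTS_S`, and `slow_tool` are hypothetical, and the approach carries a known limitation: Python cannot forcibly stop a running thread, so a timed-out tool keeps running in the background until it returns. Tools that must be killed on timeout would need process-level isolation or a cancellable client.

```python
import concurrent.futures
import time
from typing import Any, Callable, Dict, Tuple

# Per-tool-type timeouts in seconds, taken from the examples in this pattern.
TIMEOUTS_S = {"api": 5.0, "database": 30.0}

# Shared worker pool (size is an arbitrary assumption for this sketch).
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def execute_with_timeout(
    tool_fn: Callable[..., Any],
    params: Dict[str, Any],
    tool_name: str,
    timeout_s: float,
) -> Tuple[bool, Any]:
    """Run tool_fn(**params); return (True, result) or (False, timeout observation)."""
    future = _POOL.submit(tool_fn, **params)
    try:
        # Exceptions raised by the tool itself propagate to the caller here.
        return True, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        future.cancel()  # Only effective if the call has not started yet.
        return False, (
            f"Observation: Tool '{tool_name}' timed out after {timeout_s:g}s. "
            "Consider an alternative approach or a simpler query."
        )


def slow_tool(delay_s: float) -> str:
    """Hypothetical tool that simulates a slow upstream dependency."""
    time.sleep(delay_s)
    return "done"
```

For example, `execute_with_timeout(slow_tool, {"delay_s": 0.3}, "query_legacy_db", 0.05)` returns a failure flag and a timeout observation, while the caller regains control after roughly 0.05 seconds.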

Result Sanitisation

Tool results are processed through the Result Sanitiser before injection into the agent's context. Sanitisation (a) enforces a maximum result length (truncating with a summary marker if exceeded), (b) strips potentially injected instruction patterns from string results (prompt injection defence), and (c) validates the result against the tool's declared output schema.
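
The sketch below covers steps (a) and (b) for string results only; output schema validation (c) is omitted for brevity. The regular expressions are illustrative assumptions, not a vetted injection filter, and pattern matching of this kind should be treated as one layer of defence rather than a complete one. `sanitise_result`, `INJECTION_PATTERNS`, and `MAX_RESULT_CHARS` are hypothetical names; the 2000-character limit mirrors the example used in the Data Flow section.

```python
import re
from typing import List, Tuple

MAX_RESULT_CHARS = 2000

# Illustrative patterns only; a real filter needs a maintained, tested rule set.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"you are now\b", re.IGNORECASE),
    re.compile(r"^\s*system\s*:", re.IGNORECASE | re.MULTILINE),
]


def sanitise_result(text: str, max_chars: int = MAX_RESULT_CHARS) -> Tuple[str, List[str]]:
    """Return (sanitised_text, flags); flags feed audit logging and anomaly alerts."""
    flags: List[str] = []
    for pattern in INJECTION_PATTERNS:
        text, count = pattern.subn("[REMOVED]", text)
        if count and "injection_pattern" not in flags:
            flags.append("injection_pattern")
    # Truncate after stripping so the length limit applies to what is injected.
    if len(text) > max_chars:
        text = f"[TRUNCATED at {max_chars} chars] " + text[:max_chars]
        flags.append("truncated")
    return text, flags
```

Returning flags alongside the text keeps the sanitiser free of side effects while still letting the caller raise an alert when an injection pattern is detected.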

Tool Call Budget

Every tool call decrements the task's tool call budget. When the budget is exhausted, the orchestrator rejects further tool calls and injects a budget-exhausted observation into the agent's context, triggering the agent to synthesise its final answer from available information. The budget is tracked per tool type to enable fine-grained control (e.g., maximum 3 write operations, unlimited read operations).
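
A per-tool-type budget can be sketched with simple counters held in task state. `BudgetController` is a hypothetical class; in this sketch a limit of `None` stands for "unlimited", and operation types with no configured limit are denied by default, which is an assumption rather than something this pattern prescribes.

```python
from typing import Dict, Optional


class BudgetController:
    """Tracks calls per operation type for a single task."""

    def __init__(self, limits: Dict[str, Optional[int]]) -> None:
        self._limits = dict(limits)          # None means unlimited
        self._used = {op: 0 for op in limits}

    def try_consume(self, op_type: str) -> bool:
        """Consume one call if the budget allows it; return False otherwise."""
        if op_type not in self._limits:
            return False  # Unknown operation types are denied by default.
        limit = self._limits[op_type]
        if limit is not None and self._used[op_type] >= limit:
            return False
        self._used[op_type] += 1
        return True

    def exhausted_observation(self, op_type: str) -> str:
        """Observation injected when the budget for op_type is spent."""
        used = self._used.get(op_type, 0)
        limit = self._limits.get(op_type)
        return (
            f"Observation: Tool call budget exhausted ({used}/{limit} {op_type} calls used). "
            "Synthesise answer from available information."
        )
```

For instance, `BudgetController({"read": None, "write": 3})` allows any number of read calls but rejects the fourth write. A multi-worker deployment would need an atomic counter in shared storage instead of an in-process dictionary.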

Audit Record

Every tool call, including rejected calls, is written to the task audit record: tool name, parameters (sanitised of secrets), result summary, permission outcome, timestamp, and budget state.
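
As an illustration, an audit entry with secret redaction might be built as below. The field names follow the list above; `SECRET_KEYS`, `redact`, and `audit_entry` are hypothetical, and the key-name match is a deliberately crude assumption (it will not catch secrets embedded in other values such as URLs).

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict

# Assumed parameter names that carry secrets; extend per tool.
SECRET_KEYS = {"password", "api_key", "token", "secret", "authorization"}


def redact(params: Dict[str, Any]) -> Dict[str, Any]:
    """Replace values of secret-bearing parameters before they are logged."""
    return {k: "[REDACTED]" if k.lower() in SECRET_KEYS else v for k, v in params.items()}


def audit_entry(
    tool_name: str,
    params: Dict[str, Any],
    permission_outcome: str,
    result_summary: str,
    budget_state: Dict[str, str],
) -> str:
    """Serialise one tool call attempt (executed or rejected) as a JSON line."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool_name,
        "params": redact(params),
        "permission_outcome": permission_outcome,
        "result_summary": result_summary,
        "budget_state": budget_state,
    }
    return json.dumps(entry, sort_keys=True)
```

Emitting one JSON object per line keeps the record easy to ship to any of the log stores named in the Components table.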

5. Architecture Diagram

flowchart TD
    subgraph Agent["Agent Reasoning Loop"]
        A[LLM Tool Call]
    end
    subgraph Orchestration["Tool Call Orchestration Layer"]
        B[Parameter Extractor]
        C{Parameter Validation}
        D{Permission Gate}
        E{Budget Check}
        F[Tool Executor]
        G[Result Sanitiser]
    end
    subgraph Tools["Tool Registry"]
        H[Tool A: Database]
        I[Tool B: External API]
        J[Tool C: Write Op]
    end
    subgraph Feedback["Context Injection"]
        K[Valid Observation]
        L[Error Observation]
    end
    subgraph Audit["Audit"]
        M[(Tool Call Audit Log)]
    end
    A --> B
    B --> C
    C -->|invalid params| L
    C -->|valid| D
    D -->|denied| L
    D -->|permitted| E
    E -->|budget exhausted| L
    E -->|budget available| F
    F --> H & I & J
    H & I & J --> G
    G --> K
    K --> M
    L --> M

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Parameter Extractor	Logic Component	Parses LLM tool call output into structured tool call object	Native function calling parser; custom JSON/regex parser	Critical
Parameter Validator	Logic Component	Validates parameters against tool schema	Pydantic v2; JSON Schema validator; custom type checks	Critical
Permission Gate	Security	Checks tool + operation against task permission scope	Custom RBAC; OPA (Open Policy Agent); IAM policy evaluation	Critical
Budget Controller	Safety	Tracks and enforces per-task, per-tool-type call budgets	Counter in task state; configurable limits per task type	Critical
Tool Executor	Integration	Invokes the registered tool with validated parameters; enforces timeout	EAAPL-AGT003 tool invocation layer	Critical
Result Sanitiser	Security + Logic	Truncates, validates, and sanitises tool results before context injection	Custom Python; LangChain output parser; regex content filter	Critical
Error Observation Generator	Logic	Produces structured correction observations for validation/permission/budget failures	Custom; prompt templates per error type	High
Tool Call Audit Logger	Governance	Records every tool call attempt with full metadata	PostgreSQL; CloudWatch Logs; Splunk	High

7. Data Flow

Step	Actor	Action	Output
1	LLM	Produces tool call in inference output	`{"tool": "search_regulatory_db", "params": {"query": "CPS 234", "limit": 10}}`
2	Parameter Extractor	Parses tool call object	Structured: `{tool_name: "search_regulatory_db", params: {query: str, limit: int}}`
3	Parameter Validator	Validates against registered schema: query (string ≤ 500 chars ✓), limit (int 1–50 ✓)	PASS
4	Permission Gate	Task permission scope includes "read:regulatory_db" — tool requires "read:regulatory_db"	GRANTED
5	Budget Controller	Task budget: 8/10 calls remaining for "read" operations	PROCEED; budget decremented to 7/10
6	Tool Executor	Invokes search_regulatory_db with validated params; 2s timeout	Raw result: `[{doc_id: "CPS234-§3.4", content: "...500 chars..."}]`
7	Result Sanitiser	Content length OK (800 chars < 2000 char limit); no injection patterns; schema valid	Sanitised result
8	Context Injector	Injects as Observation in agent scratchpad	`Observation: 3 documents found: [...]`
9	Audit Logger	Records: timestamp, tool, params, result_summary, budget_state, permission	Audit entry persisted

Error Flow

Error	Detection	Recovery
Invalid parameter type (e.g. string passed for int field)	Parameter Validator	Inject: `Observation: Tool call failed: parameter 'limit' must be integer, got string '10'. Correct and retry.`
Tool call denied (permission not in scope)	Permission Gate	Inject: `Observation: Tool 'write_record' is not permitted for this task. Available tools: [list of permitted tools]`
Tool timeout	Executor timeout wrapper	Inject: `Observation: Tool 'query_legacy_db' timed out after 30s. Consider an alternative approach or a simpler query.`
Result exceeds max length	Result Sanitiser	Truncate; inject with truncation marker: `Observation: [TRUNCATED at 2000 chars] First 2000 chars of result: [...]`
Budget exhausted	Budget Controller	Inject: `Observation: Tool call budget exhausted (10/10 calls used). Synthesise answer from available information.`
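
The recovery messages in the table above lend themselves to one template per error type. The following is a small sketch of such an Error Observation Generator; `ToolError`, `TEMPLATES`, and `error_observation` are hypothetical names, and the single free-form `detail` argument is a simplification chosen to keep the example short.

```python
from enum import Enum


class ToolError(Enum):
    INVALID_PARAMS = "invalid_params"
    PERMISSION_DENIED = "permission_denied"
    TIMEOUT = "timeout"
    BUDGET_EXHAUSTED = "budget_exhausted"


# One template per error type, worded after the Error Flow table.
TEMPLATES = {
    ToolError.INVALID_PARAMS: "Tool call failed: {detail}. Correct and retry.",
    ToolError.PERMISSION_DENIED: (
        "Tool '{tool}' is not permitted for this task. Available tools: [{detail}]"
    ),
    ToolError.TIMEOUT: (
        "Tool '{tool}' timed out after {detail}. "
        "Consider an alternative approach or a simpler query."
    ),
    ToolError.BUDGET_EXHAUSTED: (
        "Tool call budget exhausted ({detail} calls used). "
        "Synthesise answer from available information."
    ),
}


def error_observation(error: ToolError, tool: str, detail: str) -> str:
    """Render the observation that is injected into the agent's context."""
    return "Observation: " + TEMPLATES[error].format(tool=tool, detail=detail)
```

Keeping the wording in one table makes the messages easy to review and tune, since the phrasing of an error observation affects how well the agent recovers.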

8. Security Considerations

Parameter Injection into Tool Calls

An LLM may generate parameter values that exploit the tools they are passed to (e.g., SQL injection payloads in a database query parameter)
Mitigation: Tool implementations must use parameterised queries and never string-interpolate LLM-provided values; the Parameter Validator enforces type safety but does not substitute for safe tool implementation
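
To illustrate the mitigation, the sketch below uses `sqlite3` placeholders so that the LLM-provided value is bound as data and never spliced into the SQL text. The `documents` table, `make_demo_db`, and `search_documents` are hypothetical and exist only for this example.

```python
import sqlite3
from typing import List


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (doc_id TEXT PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)",
        [("D1", "CPS 234 overview"), ("D2", "Privacy guidance")],
    )
    return conn


def search_documents(conn: sqlite3.Connection, query: str, limit: int) -> List[str]:
    """Search titles using bound parameters; no string interpolation into SQL."""
    sql = "SELECT doc_id FROM documents WHERE title LIKE ? ORDER BY doc_id LIMIT ?"
    rows = conn.execute(sql, (f"%{query}%", limit)).fetchall()
    return [row[0] for row in rows]
```

With this shape, a payload such as `x'; DROP TABLE documents; --` is simply searched for as literal text and matches nothing. Note that `%` and `_` inside the value are still interpreted as `LIKE` wildcards, so a tool that needs exact matching would have to escape them.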

OWASP LLM Top 10 (2023 edition identifiers)

OWASP LLM Risk	Tool Call Orchestration Applicability	Mitigation
LLM01 Prompt Injection	Tool results injected into context may contain instructions	Result sanitisation; content delimiters around all observations
LLM07 Insecure Plugin Design	Tool parameters pass LLM output to external systems	Parameter validation; tool implementations use parameterised APIs; no string interpolation
LLM08 Excessive Agency	Write-capable tools can cause irreversible side effects	Write-tool budget limits; human approval gate before write calls; permission scoping
LLM04 Model Denial of Service	Unlimited tool calls exhaust API quotas	Per-task, per-tool-type call budgets enforced before execution
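
The "content delimiters around all observations" mitigation could look like the sketch below. The delimiter format and the function name `wrap_observation` are assumptions made for illustration. The idea is that each observation carries a fresh random boundary token, so text inside a tool result cannot predict the closing delimiter and fake an early end to the observation. It only helps if the agent's system prompt also instructs the model to treat delimited content as data.

```python
import secrets


def wrap_observation(tool_name: str, sanitised_result: str) -> str:
    """Wrap a sanitised tool result in delimiters carrying a one-time token."""
    # A fresh token per observation: content inside the result cannot guess it,
    # so it cannot forge the closing delimiter.
    boundary = secrets.token_hex(8)
    return (
        f"<<<TOOL_RESULT tool={tool_name} boundary={boundary}>>>\n"
        f"{sanitised_result}\n"
        f"<<<END_TOOL_RESULT boundary={boundary}>>>"
    )
```

Delimiting complements result sanitisation rather than replacing it: sanitisation removes known patterns, while delimiters mark everything else as untrusted content.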

9. Governance Considerations

Write Tool Governance

Tools that write, update, or delete data in production systems must have separate governance from read tools
Write tool calls should require explicit human approval for irreversible operations (EAAPL-HITL001)
Write tool calls must be individually logged with the full parameter set (for audit and rollback)

Governance Artefacts

Artefact	Owner	Frequency	Purpose
Tool Permission Policy	Security + AI Governance	On change; quarterly review	Documents which tools are permitted per task type and user role
Tool Call Budget Policy	FinOps + AI Governance	Quarterly	Documents budget limits per task type and tool category
Tool Call Audit Archive	Compliance	Per call; retained per policy	Full record of every tool invocation for audit and investigation
Parameter Validation Schema Register	AI Platform	On tool registration or change	Version-controlled schemas for all registered tools

10. Operational Considerations

SLOs

SLO	Target	Window	Alert
Tool call success rate (validated + permitted + executed)	≥ 97%	1-hour rolling	< 93% triggers P2
Parameter validation pass rate	≥ 98%	24-hour rolling	< 95% triggers P3; review LLM tool call quality
Tool execution p95 latency	≤ tool-specific SLA (e.g., 5s for API, 30s for DB)	1-hour rolling	Exceeds 2× SLA triggers P2
Budget exhaustion rate	≤ 3% of tasks	24-hour rolling	> 8% triggers P3; review budget policy

Monitoring

Validation failure by tool and parameter field: identifies systematic LLM misuse of specific tool APIs
Permission denial rate trending: increasing denials may indicate agents attempting out-of-scope operations
Tool latency distribution: performance degradation in upstream tool dependencies

11. Cost Considerations

Cost Factor	Driver	Control
LLM inference for tool call generation	Number of tool calls per task	Per-task budget; efficient tool design to reduce required calls
External API call costs	API pricing × call volume	Per-task, per-tool budget; caching identical calls
Compute for parameter validation	Negligible vs. LLM cost	Not a significant optimisation target
Write tool risk cost	Data corruption, API abuse, quota exhaustion	Budget limits; permission scoping; monitoring
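
The "caching identical calls" control in the table above can be sketched as a task-scoped cache keyed on the tool name and a canonical serialisation of the parameters. `ToolCallCache` is a hypothetical class. The sketch assumes parameters are JSON-serialisable and that only read-only tools are cached; caching a write tool would silently skip the write.

```python
import json
from typing import Any, Callable, Dict


class ToolCallCache:
    """Task-scoped cache for identical read-only tool calls."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0

    @staticmethod
    def key(tool_name: str, params: Dict[str, Any]) -> str:
        # sort_keys makes the key independent of parameter ordering.
        return tool_name + ":" + json.dumps(params, sort_keys=True)

    def get_or_call(
        self, tool_name: str, params: Dict[str, Any], tool_fn: Callable[..., Any]
    ) -> Any:
        """Return the cached result, or invoke the tool once and remember it."""
        k = self.key(tool_name, params)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        result = tool_fn(**params)
        self._store[k] = result
        return result
```

Whether a cache hit should still decrement the call budget is a policy decision this pattern leaves open; the sketch does not touch the budget at all.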

Budget Configuration Guidelines

Task Type	Recommended Read Budget	Recommended Write Budget
Information retrieval	10–20 read calls	0 write calls
Research and analysis	15–30 read calls	0–2 write calls (e.g., save result)
Automated processing	5–15 read calls	3–10 write calls (with approval gate)
Code generation + test	10 read calls	5 code execution calls

12. Trade-Off Analysis

Option	Safety	Flexibility	Latency Overhead	Complexity	Best For
A: Full orchestration layer (Recommended)	Very High	High	Low (< 10ms overhead)	Medium	Production agentic systems
B: Validation only (no budget/permission)	Medium	Very High	Very Low	Low	Development/prototyping
C: Permission + budget only (no validation)	Medium	High	Minimal	Low	Internal tools with trusted inputs
D: Direct tool invocation (no orchestration)	Low	Very High	None	None	Sandboxed research only

Architectural Tensions

Tension	Left Pole	Right Pole	Balance
Strict validation vs. Agent flexibility	Reject any deviation from schema	Accept anything; let tool handle errors	Strict type validation; permissive on optional fields
Budget tightness vs. Task completion	Very low budget (cost controlled)	High budget (high completion rate)	Set budget to p95 observed usage + 20% buffer
Result verbosity vs. Context efficiency	Full tool result in context	Summarised result only	Full result up to limit; summarise on truncation
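
The "p95 observed usage + 20% buffer" balance can be worked through in a few lines. This sketch assumes a nearest-rank percentile and rounds the result up to a whole number of calls; `recommended_budget` is a hypothetical helper, and integer arithmetic is used throughout to avoid floating-point rounding surprises.

```python
from typing import Sequence


def recommended_budget(observed_calls: Sequence[int], buffer_pct: int = 20) -> int:
    """Budget = nearest-rank p95 of observed per-task call counts, plus a buffer."""
    if not observed_calls:
        raise ValueError("need at least one observation")
    ordered = sorted(observed_calls)
    # Nearest-rank p95: the smallest value with at least 95% of samples at or below it.
    rank = (95 * len(ordered) + 99) // 100        # ceil(0.95 * n)
    p95 = ordered[rank - 1]
    return (p95 * (100 + buffer_pct) + 99) // 100  # ceil(p95 * (1 + buffer))
```

For example, with observed per-task counts `[3, 4, 4, 5, 5, 6, 6, 7, 8, 12]` the nearest-rank p95 is 12, and a 20% buffer gives 14.4, which rounds up to a budget of 15 calls.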

13. Failure Modes

Failure Mode	Likelihood	Impact	Detection	Recovery
Parameter hallucination (LLM generates wrong param values)	Medium	Medium — tool call fails; agent retries	Validation failure rate per tool	Validation error observation; agent self-corrects on retry
Tool result prompt injection	Low	High — agent hijacked	Result sanitisation catches patterns	Sanitise; delimit; anomaly alert if injection pattern detected
Budget exhausted too early (budget set too low)	Medium	Medium — task completes with partial information	Budget exhaustion rate monitoring	Tune budget policy based on observed p95 usage
Write tool called with stale data (race condition)	Low	High — data corruption	Idempotency key; optimistic locking at tool level	Idempotency key per write call (EAAPL idempotency guidance)
Timeout cascade (slow tool blocks entire task)	Low–Medium	Medium — task latency spike	Per-tool timeout monitoring	Per-tool timeout; error observation injected; agent uses alternative approach
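
For the stale-write failure mode in the table above, the two mechanisms named there can be sketched together. Optimistic locking rejects a write that was planned against an outdated version of a record, and the idempotency key makes a retried call safe to replay. `RecordStore` and `VersionConflict` are hypothetical names, and the in-memory dictionaries stand in for whatever the real write tool persists to.

```python
from typing import Any, Dict, Tuple


class VersionConflict(Exception):
    """Raised when a write is based on a stale version of the record."""


class RecordStore:
    """Hypothetical in-memory store standing in for a write-capable tool."""

    def __init__(self) -> None:
        self._records: Dict[str, Tuple[int, Any]] = {}  # record_id -> (version, value)
        self._applied: Dict[str, int] = {}              # idempotency key -> version written

    def read(self, record_id: str) -> Tuple[int, Any]:
        """Return (version, value); version 0 means the record does not exist yet."""
        return self._records.get(record_id, (0, None))

    def write(self, record_id: str, value: Any, expected_version: int, idempotency_key: str) -> int:
        """Apply the write once; return the version it produced."""
        # Replay of a call that already succeeded: return the earlier outcome.
        if idempotency_key in self._applied:
            return self._applied[idempotency_key]
        current_version, _ = self.read(record_id)
        # Optimistic lock: refuse the write if the record changed since it was read.
        if current_version != expected_version:
            raise VersionConflict(
                f"record '{record_id}' is at version {current_version}, "
                f"expected {expected_version}"
            )
        new_version = current_version + 1
        self._records[record_id] = (new_version, value)
        self._applied[idempotency_key] = new_version
        return new_version
```

A `VersionConflict` would typically be turned into an error observation so the agent can re-read the record and decide whether the write is still appropriate.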

14. Regulatory Considerations

EU AI Act

Art. 9 (Risk Management): Tool call orchestration controls (parameter validation, permission gate, budget) are risk management measures for agentic AI systems interacting with live business systems.

APRA CPS 234

Every tool call that accesses or modifies information assets must be logged (tool call audit log) and access must be scoped to minimum necessary permissions (permission gate).

ISO 42001

§8.4: The tool permission policy and budget policy are operational controls that must be documented, version-controlled, and regularly reviewed.

Australian Context

For AFS-licensed entities, write tool calls that affect customer records must be individually auditable and the full parameter set must be retained for dispute resolution.
OAIC: Tool calls that access personal information must be scoped to minimum necessary; the permission gate implements this control.

15. Reference Implementations

AWS

Component	Service
Parameter Extraction + Validation	Lambda function with Pydantic validation layer
Permission Gate	AWS IAM policy evaluation per tool ARN; custom RBAC via DynamoDB
Tool Execution	AWS Lambda per tool (invoked via SDK)
Budget Tracking	DynamoDB counter per task; atomic decrement
Result Sanitisation	Lambda function with content filtering
Audit Logging	CloudWatch Logs → Kinesis → S3

Azure

Component	Service
Orchestration Layer	Azure Functions middleware chain
Permission Gate	Azure AD + custom RBAC claims
Tool Execution	Azure Functions per tool
Budget Tracking	Azure Cosmos DB counter
Audit Logging	Azure Monitor → Event Hubs → Blob Storage

On-Premises

Component	Technology
Full Orchestration	Custom Python orchestration layer; FastAPI middleware
Parameter Validation	Pydantic v2 with tool schema registry
Permission Gate	OPA (Open Policy Agent) with tool permission policies
Audit Log	PostgreSQL append-only table

16. Related Patterns

Pattern	ID	Relationship Type	Notes
Agent Tool Registry	EAAPL-AGT003	Depends On	Registry provides tool schemas and permission definitions; orchestration enforces them at runtime
ReAct Agent Loop	EAAPL-WRK001	Integrates With	Every Action phase in ReAct passes through the tool call orchestration layer
Human Escalation	EAAPL-HITL001	Integrates With	Write tool calls may trigger human approval via escalation pattern
Workflow Tracing and Replay	EAAPL-WRK013	Integrates With	Tool call audit log is a primary input to the workflow trace
Iterative Constraint Satisfaction	EAAPL-WRK015	Complementary	Constraint checker can evaluate tool call plans before execution

17. Maturity Assessment

Overall Maturity: Industry Standard

Dimension	Score (1–5)	Evidence
Research Foundation	4	Function calling widely studied; tool use safety emerging literature
Production Deployment	5	Tool calling deployed at scale in OpenAI, Anthropic, Google APIs and all major frameworks
Framework Support	5	Native function calling in all major LLM APIs; LangChain tools; LlamaIndex tools
Parameter Validation Tooling	4	Pydantic + Instructor widely adopted; OpenAI structured output GA
Permission + Budget Tooling	3	Custom implementations common; standardised tooling emerging

18. Revision History

Version	Date	Author	Changes
1.0	2025-06-13	Architecture Board	Initial publication in Agentic Workflows category
