[EAAPL-WRK006] Tool Call Orchestration
Category: Agentic Workflows
Sub-category: Tool Execution Architecture
Version: 1.0
Maturity: Industry Standard
Tags: tool-calling, function-calling, tool-use, parameter-extraction, result-injection, tool-budget
Regulatory Relevance: ISO 42001 §8.4, APRA CPS 234, EU AI Act (Art. 9)
1. Executive Summary
The Tool Call Orchestration Pattern defines the execution mechanics for structured tool use within an agent's reasoning loop: how tools are selected from a registry, how parameters are extracted and validated, how tool results are injected back into context, how errors are handled, and how tool call budgets are enforced. While the Agent Tool Registry (EAAPL-AGT003) defines the registration and discovery contract for tools, this pattern covers the runtime orchestration of tool execution — the moment-to-moment mechanics of safely and reliably executing tool calls within an agentic workflow.
For CIO/CTO audiences: tools are what give AI agents their power to interact with the real world — querying databases, calling APIs, reading documents, writing records. But that power creates risk. An agent that can call any tool, with any parameters, without limits is an operational liability. This pattern defines the guardrails: parameter validation before execution, permission checking per tool call, result sanitisation before injection, error handling that does not silently corrupt the agent's reasoning, and hard limits on the number and cost of tool calls per task. These are not optional — they are the operational controls that make tool-using agents safe enough to deploy in regulated enterprise environments.
2. Problem Statement
Business Problem
Tool-using agents interact with live business systems — databases, APIs, file systems, communication services. Uncontrolled tool invocation creates operational risk: incorrect parameters corrupt data, excessive calls exhaust API quotas, and unhandled errors produce silent failures that the agent treats as successful tool calls.
Technical Problem
The raw output of an LLM tool-call inference step is a JSON object specifying a tool name and parameters. This raw output may contain invalid parameter types, parameters that exceed allowed value ranges, calls to tools the current user is not authorised to use, or hallucinated parameter values that would produce runtime errors. Forwarding it to tool execution without validation and authorisation checks is therefore unsafe.
Symptoms of Absence
- Tool call errors are silently swallowed and misrepresented as successful observations in the agent scratchpad
- No parameter validation: hallucinated parameter values cause downstream data corruption
- No per-task tool call budget: runaway agents exhaust API quotas, incurring unexpected costs
- Tool calls execute with the agent's full permissions regardless of the sensitivity of the specific tool
Cost of Inaction
- Data Integrity: Unvalidated parameters passed to write-capable tools can corrupt production data
- Cost Control: Unlimited tool calls create unpredictable cost exposure
- Security: Unscoped tool permissions create lateral movement risk if an agent is compromised
3. Context
When to Apply
- Agents execute tool calls within any reasoning loop (ReAct, Plan-and-Execute, Sequential Chain)
- Tools interact with external systems, databases, or APIs
- Per-task tool call budgets are required
- Tool results contain potentially untrusted content that must be sanitised before context injection
When NOT to Apply
- Pure LLM workflows with no external tool use
- Fully sandboxed environments where tool isolation is provided by the execution environment (still apply parameter validation, but permission model may be simplified)
Prerequisites
- EAAPL-AGT003 (Agent Tool Registry) for tool discovery and permission definitions
- Tool call budget policy (max calls per task, max calls per tool type)
- Parameter schema definitions for all registered tools
- Result sanitisation policy per tool
Industry Applicability
| Industry | Tool Types Used | Key Orchestration Requirement |
| Financial Services | Database query, API calls, calculation engine | Parameter validation to prevent SQL injection; result sanitisation |
| Legal | Document search, court record lookup, drafting API | Permission scoping per matter; budget control for search tools |
| Healthcare | Clinical database, drug interaction API, EHR write | Strict parameter validation for write tools; safety checks |
| Government | Records system, geospatial API, regulatory database | Audit every tool call; permission scoping per officer role |
| Technology | Code execution, test runner, version control | Sandbox enforcement; budget control for compute tools |
4. Architecture Overview
The Tool Call Orchestration layer sits between the agent's LLM inference step and the actual tool execution infrastructure. It is a mandatory gate through which every tool call must pass.
Parameter Extraction and Validation
The LLM produces a tool call specification in its inference output (either via native function calling JSON schema output or via parsed scratchpad action text). The Parameter Extractor parses this into a structured tool call object (tool_name, parameters dict). The Parameter Validator then validates every parameter against the tool's registered schema: type checking, value range validation, required field presence, and pattern matching. Invalid parameters are not forwarded to execution — they generate a correction observation that is injected back into the agent's context.
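The validation step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the pattern's prescribed implementation: `SEARCH_SCHEMA`, `validate_params`, and `correction_observation` are hypothetical names, and the schema simply mirrors the `search_regulatory_db` example used in the Data Flow section. A production system would more likely use Pydantic or a JSON Schema validator, as listed in the Components table.

```python
import re
from typing import Any

# Hypothetical registered schema for the search_regulatory_db tool; the field
# names and limits mirror the worked example in the Data Flow section.
SEARCH_SCHEMA: dict[str, dict[str, Any]] = {
    "query": {"type": str, "required": True, "max_length": 500},
    "limit": {"type": int, "required": True, "min": 1, "max": 50},
}


def validate_params(params: dict[str, Any], schema: dict[str, dict[str, Any]]) -> list[str]:
    """Return validation errors for a tool call; an empty list means valid."""
    errors: list[str] = []
    for name, rule in schema.items():
        if name not in params:
            if rule.get("required", False):
                errors.append(f"missing required parameter '{name}'")
            continue
        value, expected = params[name], rule["type"]
        # bool is a subclass of int in Python, so exclude it from int fields.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            errors.append(
                f"parameter '{name}' must be {expected.__name__}, "
                f"got {type(value).__name__} {value!r}"
            )
            continue
        if "max_length" in rule and len(value) > rule["max_length"]:
            errors.append(f"parameter '{name}' exceeds {rule['max_length']} characters")
        if "min" in rule and value < rule["min"]:
            errors.append(f"parameter '{name}' must be >= {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"parameter '{name}' must be <= {rule['max']}")
        if "pattern" in rule and not re.fullmatch(rule["pattern"], value):
            errors.append(f"parameter '{name}' does not match the required pattern")
    # Reject parameters the schema does not declare (often hallucinated).
    errors.extend(f"unknown parameter '{name}'" for name in params if name not in schema)
    return errors


def correction_observation(tool_name: str, errors: list[str]) -> str:
    """Format validation errors as an observation the agent can act on."""
    return f"Observation: Tool call to '{tool_name}' failed: {'; '.join(errors)}. Correct and retry."
```

Returning every error at once, rather than stopping at the first, gives the agent a chance to correct the whole call in a single retry.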
Permission Gate
Before execution, the Permission Gate checks that the current task's permission scope includes the requested tool and the specific operation (read vs. write vs. admin). The permission scope is established at task initialisation time from the user's identity and the task type's permission policy. Tool calls outside the permission scope are rejected with a structured permission error — the agent can reason about this rejection and choose an alternative approach.
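A minimal sketch of such a gate, assuming permissions are plain strings of the form `operation:resource` and that the task scope is an immutable set fixed at task initialisation. `TOOL_PERMISSIONS`, `PermissionDecision`, and `check_permission` are hypothetical names introduced for illustration; a real deployment might delegate this decision to OPA or an IAM policy engine instead.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry view: tool name -> permission the tool requires.
TOOL_PERMISSIONS = {
    "search_regulatory_db": "read:regulatory_db",
    "write_record": "write:records",
}


@dataclass(frozen=True)
class PermissionDecision:
    granted: bool
    observation: Optional[str] = None  # populated only when the call is denied


def check_permission(tool_name: str, task_scope: frozenset) -> PermissionDecision:
    """Grant only if the tool is registered and its permission is in scope."""
    required = TOOL_PERMISSIONS.get(tool_name)
    if required is not None and required in task_scope:
        return PermissionDecision(granted=True)
    # List the tools the task may use so the agent can choose an alternative.
    permitted = sorted(t for t, p in TOOL_PERMISSIONS.items() if p in task_scope)
    return PermissionDecision(
        granted=False,
        observation=(
            f"Observation: Tool '{tool_name}' is not permitted for this task. "
            f"Available tools: {permitted}"
        ),
    )
```

The sketch fails closed: a tool that is not registered is denied in the same way as a tool that is out of scope.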
Tool Execution with Timeout
Validated, permitted tool calls that pass the budget check are forwarded to the tool execution layer (EAAPL-AGT003). The orchestrator wraps each call in a timeout so that a tool that hangs cannot block the agent indefinitely. The timeout is configurable per tool type (for example, 5s for fast API calls and 30s for slow database queries).
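One way to sketch the timeout wrapper in Python is with a worker thread from `concurrent.futures`. The function names here are hypothetical, and the approach carries a known limitation: Python cannot forcibly stop a running thread, so a timed-out tool keeps running in the background until it returns. Tools that must be cancellable would need process isolation or cooperative cancellation instead.

```python
import concurrent.futures
import time
from typing import Any, Callable


def slow_tool(delay_s: float) -> str:
    """Stand-in tool used only to demonstrate the wrapper."""
    time.sleep(delay_s)
    return "done"


def run_with_timeout(
    tool_fn: Callable[..., Any], params: dict, tool_name: str, timeout_s: float
) -> tuple:
    """Run a tool call; return (True, result) or (False, error observation)."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(tool_fn, **params)
    try:
        return True, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return False, (
            f"Observation: Tool '{tool_name}' timed out after {timeout_s:g}s. "
            "Consider an alternative approach or a simpler query."
        )
    except Exception as exc:
        # Surface tool failures explicitly rather than swallowing them.
        return False, f"Observation: Tool '{tool_name}' failed: {type(exc).__name__}: {exc}"
    finally:
        # Do not wait for a hung worker; the orchestrator must stay responsive.
        executor.shutdown(wait=False)
```

Both failure paths return an explicit error observation, which addresses the "silently swallowed" symptom listed in the Problem Statement.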
Result Sanitisation
Tool results are processed through the Result Sanitiser before injection into the agent's context. Sanitisation: (a) enforces a maximum result length (truncates with a summary marker if exceeded), (b) strips potentially injected instruction patterns from string results (prompt injection defence), (c) validates the result schema against the tool's declared output schema.
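Steps (a) and (b) can be sketched as below; step (c), output schema validation, is omitted for brevity. The pattern list is an assumption made for illustration and is deliberately small: regex filtering catches only known phrasings and should be treated as one layer of defence, not a complete prompt injection control. The 2000-character limit is the one used in this document's Data Flow example.

```python
import re

MAX_RESULT_CHARS = 2000  # limit used in this document's Data Flow example

# Illustrative patterns only; a real deployment would maintain and test its own list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"you are now\b", re.IGNORECASE),
    re.compile(r"^\s*system\s*:", re.IGNORECASE | re.MULTILINE),
]


def sanitise_result(text: str, max_chars: int = MAX_RESULT_CHARS) -> tuple:
    """Return (sanitised_text, injection_flagged) for a string tool result."""
    flagged = False
    for pattern in INJECTION_PATTERNS:
        text, count = pattern.subn("[REMOVED]", text)
        flagged = flagged or count > 0
    if len(text) > max_chars:
        # Mark truncation so the agent knows the result is incomplete.
        text = f"[TRUNCATED at {max_chars} chars] " + text[:max_chars]
    return text, flagged
```

The `injection_flagged` value lets the caller raise the anomaly alert described under Failure Modes without re-scanning the text.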
Tool Call Budget
Every tool call that passes validation and the Permission Gate decrements the task's tool call budget before it executes. When the budget is exhausted, the orchestrator rejects further tool calls and injects a budget-exhausted observation into the agent's context, which prompts the agent to synthesise its final answer from available information. The budget is tracked per tool type to enable fine-grained control (e.g., maximum 3 write operations, unlimited read operations).
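A per-type budget can be sketched as a small counter held in task state. `ToolBudget` and `try_consume` are hypothetical names; the sketch assumes a single-threaded task, so a distributed deployment would need an atomic counter in a shared store instead.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolBudget:
    # Operation type -> maximum calls; None means unlimited.
    limits: dict
    used: dict = field(default_factory=dict)

    def try_consume(self, op_type: str) -> tuple:
        """Consume one call of op_type; return (allowed, error_observation)."""
        # Operation types with no configured limit get zero budget (fail closed).
        limit: Optional[int] = self.limits.get(op_type, 0)
        used = self.used.get(op_type, 0)
        if limit is not None and used >= limit:
            return False, (
                f"Observation: Tool call budget exhausted ({used}/{limit} {op_type} "
                "calls used). Synthesise answer from available information."
            )
        self.used[op_type] = used + 1
        return True, None
```

For example, `ToolBudget(limits={"write": 3, "read": None})` allows three write calls and unlimited reads, matching the example in the paragraph above.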
Audit Record
Every tool call — including rejected calls — is written to the task audit record: tool name, parameters (sanitised of secrets), result summary, permission outcome, timestamp, and budget state.
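The audit record fields listed above could be assembled as in this sketch. The field names, the `SECRET_KEYS` set, and the JSON serialisation are assumptions made for illustration; the redaction shown handles only top-level parameter names and would need extending for nested parameters.

```python
import json
from datetime import datetime, timezone
from typing import Any

# Hypothetical parameter names treated as secrets; adjust per tool schema.
SECRET_KEYS = {"api_key", "password", "token", "secret"}


def redact_secrets(params: dict) -> dict:
    """Replace values of secret-bearing top-level parameters."""
    return {k: "[REDACTED]" if k.lower() in SECRET_KEYS else v for k, v in params.items()}


def build_audit_entry(
    tool_name: str,
    params: dict,
    permission_outcome: str,
    result_summary: str,
    budget_state: str,
) -> str:
    """Serialise one tool call attempt, including rejected calls, as a JSON line."""
    record: dict[str, Any] = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool_name": tool_name,
        "parameters": redact_secrets(params),
        "permission_outcome": permission_outcome,
        "result_summary": result_summary,
        "budget_state": budget_state,
    }
    return json.dumps(record, sort_keys=True)
```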
5. Architecture Diagram
flowchart TD
subgraph Agent["Agent Reasoning Loop"]
A[LLM Tool Call]
end
subgraph Orchestration["Tool Call Orchestration Layer"]
B[Parameter Extractor]
C{Parameter Validation}
D{Permission Gate}
E{Budget Check}
F[Tool Executor]
G[Result Sanitiser]
end
subgraph Tools["Tool Registry"]
H[Tool A: Database]
I[Tool B: External API]
J[Tool C: Write Op]
end
subgraph Feedback["Context Injection"]
K[Valid Observation]
L[Error Observation]
end
subgraph Audit["Audit"]
M[(Tool Call Audit Log)]
end
A --> B
B --> C
C -->|invalid params| L
C -->|valid| D
D -->|denied| L
D -->|permitted| E
E -->|budget exhausted| L
E -->|budget available| F
F --> H & I & J
H & I & J --> G
G --> K
K --> M
L --> M
6. Components
| Component | Type | Responsibility | Technology Options | Criticality |
| Parameter Extractor | Logic Component | Parses LLM tool call output into structured tool call object | Native function calling parser; custom JSON/regex parser | Critical |
| Parameter Validator | Logic Component | Validates parameters against tool schema | Pydantic v2; JSON Schema validator; custom type checks | Critical |
| Permission Gate | Security | Checks tool + operation against task permission scope | Custom RBAC; OPA (Open Policy Agent); IAM policy evaluation | Critical |
| Budget Controller | Safety | Tracks and enforces per-task, per-tool-type call budgets | Counter in task state; configurable limits per task type | Critical |
| Tool Executor | Integration | Invokes the registered tool with validated parameters; enforces timeout | EAAPL-AGT003 tool invocation layer | Critical |
| Result Sanitiser | Security + Logic | Truncates, validates, and sanitises tool results before context injection | Custom Python; LangChain output parser; regex content filter | Critical |
| Error Observation Generator | Logic | Produces structured correction observations for validation/permission/budget failures | Custom; prompt templates per error type | High |
| Tool Call Audit Logger | Governance | Records every tool call attempt with full metadata | PostgreSQL; CloudWatch Logs; Splunk | High |
7. Data Flow
| Step | Actor | Action | Output |
| 1 | LLM | Produces tool call in inference output | {"tool": "search_regulatory_db", "params": {"query": "CPS 234", "limit": 10}} |
| 2 | Parameter Extractor | Parses tool call object | Structured: {tool_name: "search_regulatory_db", params: {query: str, limit: int}} |
| 3 | Parameter Validator | Validates against registered schema: query (string ≤ 500 chars ✓), limit (int 1–50 ✓) | PASS |
| 4 | Permission Gate | Task permission scope includes "read:regulatory_db"; tool requires "read:regulatory_db" | GRANTED |
| 5 | Budget Controller | Task budget: 8/10 calls remaining for "read" operations | PROCEED; budget decremented to 7/10 |
| 6 | Tool Executor | Invokes search_regulatory_db with validated params; 2s timeout | Raw result: [{doc_id: "CPS234-§3.4", content: "...500 chars..."}] |
| 7 | Result Sanitiser | Content length OK (800 chars < 2000 char limit); no injection patterns; schema valid | Sanitised result |
| 8 | Context Injector | Injects as Observation in agent scratchpad | Observation: 3 documents found: [...] |
| 9 | Audit Logger | Records: timestamp, tool, params, result_summary, budget_state, permission | Audit entry persisted |
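The ordering of the steps in the table above can be sketched as a short pipeline in which each gate either passes the call or returns an error observation. This is an illustrative skeleton only: `Gate`, `orchestrate`, and the callables passed to it are hypothetical, and the audit log is modelled as an in-memory list rather than a durable store.

```python
from typing import Any, Callable, Optional

# A gate inspects the call and returns None to pass, or an error observation to reject.
Gate = Callable[[str, dict], Optional[str]]


def orchestrate(
    tool_name: str,
    params: dict,
    gates: list,
    execute: Callable[[str, dict], Any],
    sanitise: Callable[[Any], str],
    audit_log: list,
) -> str:
    """Run gates in order; execute the tool only if every gate passes."""
    for gate in gates:
        error = gate(tool_name, params)
        if error is not None:
            # Rejected calls are audited too, and the tool is never invoked.
            audit_log.append({"tool": tool_name, "outcome": "rejected", "detail": error})
            return error
    raw = execute(tool_name, params)
    observation = "Observation: " + sanitise(raw)
    audit_log.append({"tool": tool_name, "outcome": "executed", "detail": observation})
    return observation
```

Passing the gates as an ordered list (for example validation, then permission, then budget) keeps the sequence explicit and makes it hard to add an execution path that bypasses a gate.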
Error Flow
| Error | Detection | Recovery |
| Invalid parameter type (e.g. string passed for int field) | Parameter Validator | Inject: Observation: Tool call failed: parameter 'limit' must be integer, got string '10'. Correct and retry. |
| Tool call denied (permission not in scope) | Permission Gate | Inject: Observation: Tool 'write_record' is not permitted for this task. Available tools: [list of permitted tools] |
| Tool timeout | Executor timeout wrapper | Inject: Observation: Tool 'query_legacy_db' timed out after 30s. Consider an alternative approach or a simpler query. |
| Result exceeds max length | Result Sanitiser | Truncate; inject with truncation marker: Observation: [TRUNCATED at 2000 chars] First 2000 chars of result: [...] |
| Budget exhausted | Budget Controller | Inject: Observation: Tool call budget exhausted (10/10 calls used). Synthesise answer from available information. |
8. Security Considerations
Parameter Injection into Tool Calls
- The LLM may emit parameter values that exploit tools, whether hallucinated or induced by injected content (e.g., a SQL injection payload in a database query parameter)
- Mitigation: Tool implementations must use parameterised queries and never string-interpolate LLM-provided values; the Parameter Validator enforces type safety but does not substitute for safe tool implementation
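The mitigation above can be illustrated with Python's built-in `sqlite3` module, whose `?` placeholders bind values as data rather than as SQL text. The table, the column names, and the `search_documents` tool are hypothetical and exist only for this sketch.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (doc_id TEXT PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?)",
        [
            ("CPS234", "CPS 234 Information Security"),
            ("CPS230", "CPS 230 Operational Risk"),
        ],
    )
    return conn


def search_documents(conn: sqlite3.Connection, query: str, limit: int) -> list:
    """Search titles using bound parameters; LLM-provided values never become SQL."""
    cursor = conn.execute(
        "SELECT doc_id FROM documents WHERE title LIKE ? ORDER BY doc_id LIMIT ?",
        (f"%{query}%", limit),
    )
    return [row[0] for row in cursor.fetchall()]
```

With this approach a value such as `x'; DROP TABLE documents; --` is treated as a literal search string and matches nothing. Note that `%` and `_` inside `query` still act as `LIKE` wildcards, so a tool that needs exact substring matching would have to escape them.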
OWASP Top 10 for LLM Applications (2023 edition numbering)
| OWASP LLM Risk | Tool Call Orchestration Applicability | Mitigation |
| LLM01 Prompt Injection | Tool results injected into context may contain instructions | Result sanitisation; content delimiters around all observations |
| LLM07 Insecure Plugin Design | Tool parameters pass LLM output to external systems | Parameter validation; tool implementations use parameterised APIs; no string interpolation |
| LLM08 Excessive Agency | Write-capable tools can cause irreversible side effects | Write-tool budget limits; human approval gate before write calls; permission scoping |
| LLM04 Model DoS | Unlimited tool calls exhaust API quotas | Per-task, per-tool-type call budgets enforced before execution |
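The "content delimiters" mitigation for LLM01 might look like the following sketch. The delimiter format is an assumption invented for this example, not a standard: it uses a random per-observation nonce so that text returned by a tool cannot forge the closing marker. The agent's system prompt would also need to instruct the model to treat delimited content as data, which is outside the scope of this snippet.

```python
import secrets
from typing import Optional


def delimit_observation(tool_name: str, content: str, nonce: Optional[str] = None) -> str:
    """Wrap a sanitised tool result in nonce-tagged delimiters."""
    # A fresh random nonce per observation; injectable here only to ease testing.
    nonce = nonce or secrets.token_hex(8)
    return (
        f"<<OBSERVATION {nonce} tool={tool_name}>>\n"
        f"{content}\n"
        f"<<END OBSERVATION {nonce}>>"
    )
```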
9. Governance Considerations
Write Tool Governance
- Tools that write, update, or delete data in production systems must have separate governance from read tools
- Write tool calls should require explicit human approval for irreversible operations (EAAPL-HITL001)
- Write tool calls must be individually logged with the full parameter set (for audit and rollback)
Governance Artefacts
| Artefact | Owner | Frequency | Purpose |
| Tool Permission Policy | Security + AI Governance | On change; quarterly review | Documents which tools are permitted per task type and user role |
| Tool Call Budget Policy | FinOps + AI Governance | Quarterly | Documents budget limits per task type and tool category |
| Tool Call Audit Archive | Compliance | Per call; retained per policy | Full record of every tool invocation for audit and investigation |
| Parameter Validation Schema Register | AI Platform | On tool registration or change | Version-controlled schemas for all registered tools |
10. Operational Considerations
SLOs
| SLO | Target | Window | Alert |
| Tool call success rate (validated + permitted + executed) | ≥ 97% | 1-hour rolling | < 93% triggers P2 |
| Parameter validation pass rate | ≥ 98% | 24-hour rolling | < 95% triggers P3; review LLM tool call quality |
| Tool execution p95 latency | ≤ tool-specific SLA (e.g., 5s for API, 30s for DB) | 1-hour rolling | Exceeds 2× SLA triggers P2 |
| Budget exhaustion rate | ≤ 3% of tasks | 24-hour rolling | > 8% triggers P3; review budget policy |
Monitoring
- Validation failure by tool and parameter field: identifies systematic LLM misuse of specific tool APIs
- Permission denial rate trending: increasing denials may indicate agents attempting out-of-scope operations
- Tool latency distribution: performance degradation in upstream tool dependencies
11. Cost Considerations
| Cost Factor | Driver | Control |
| LLM inference for tool call generation | Number of tool calls per task | Per-task budget; efficient tool design to reduce required calls |
| External API call costs | API pricing × call volume | Per-task, per-tool budget; caching identical calls |
| Compute for parameter validation | Negligible vs. LLM cost | Not a significant optimisation target |
| Write tool risk cost | Data corruption, API abuse, quota exhaustion | Budget limits; permission scoping; monitoring |
Budget Configuration Guidelines
| Task Type | Recommended Read Budget | Recommended Write Budget |
| Information retrieval | 10–20 read calls | 0 write calls |
| Research and analysis | 15–30 read calls | 0–2 write calls (e.g., save result) |
| Automated processing | 5–15 read calls | 3–10 write calls (with approval gate) |
| Code generation + test | 10 read calls | 5 code execution calls |
12. Trade-Off Analysis
| Option | Safety | Flexibility | Latency Overhead | Complexity | Best For |
| A: Full orchestration layer (Recommended) | Very High | High | Low (< 10ms overhead) | Medium | Production agentic systems |
| B: Validation only (no budget/permission) | Medium | Very High | Very Low | Low | Development/prototyping |
| C: Permission + budget only (no validation) | Medium | High | Minimal | Low | Internal tools with trusted inputs |
| D: Direct tool invocation (no orchestration) | Low | Very High | None | None | Sandboxed research only |
Architectural Tensions
| Tension | Left Pole | Right Pole | Balance |
| Strict validation vs. Agent flexibility | Reject any deviation from schema | Accept anything; let tool handle errors | Strict type validation; permissive on optional fields |
| Budget tightness vs. Task completion | Very low budget (cost controlled) | High budget (high completion rate) | Set budget to p95 observed usage + 20% buffer |
| Result verbosity vs. Context efficiency | Full tool result in context | Summarised result only | Full result up to limit; summarise on truncation |
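The "p95 observed usage + 20% buffer" rule in the table above can be worked through as follows. The sketch assumes the nearest-rank definition of the 95th percentile and rounds the buffered value up to a whole call; `recommended_budget` is a hypothetical helper, and other percentile definitions would give slightly different results on small samples.

```python
def recommended_budget(observed_calls: list, buffer_pct: int = 20) -> int:
    """Return the nearest-rank p95 of observed per-task calls plus a buffer, rounded up."""
    if not observed_calls:
        raise ValueError("at least one observation is required")
    ordered = sorted(observed_calls)
    # Nearest-rank p95: smallest value with at least 95% of observations at or below it.
    # Integer arithmetic avoids floating-point rounding at the boundary.
    rank = (95 * len(ordered) + 99) // 100
    p95 = ordered[rank - 1]
    # Ceiling of p95 * (1 + buffer_pct / 100), again using integers only.
    return (p95 * (100 + buffer_pct) + 99) // 100
```

For 20 tasks that used 1 to 20 calls respectively, the nearest-rank p95 is 19 calls, and 19 × 1.2 = 22.8, so the recommended budget is 23 calls.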
13. Failure Modes
| Failure Mode | Likelihood | Impact | Detection | Recovery |
| Parameter hallucination (LLM generates wrong param values) | Medium | Medium: tool call fails; agent retries | Validation failure rate per tool | Validation error observation; agent self-corrects on retry |
| Tool result prompt injection | Low | High: agent hijacked | Result sanitisation catches patterns | Sanitise; delimit; anomaly alert if injection pattern detected |
| Budget exhausted too early (budget set too low) | Medium | Medium: task completes with partial information | Budget exhaustion rate monitoring | Tune budget policy based on observed p95 usage |
| Write tool called with stale data (race condition) | Low | High: data corruption | Idempotency key; optimistic locking at tool level | Idempotency key per write call (EAAPL idempotency guidance) |
| Timeout cascade (slow tool blocks entire task) | Low–Medium | Medium: task latency spike | Per-tool timeout monitoring | Per-tool timeout; error observation injected; agent uses alternative approach |
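The idempotency key mentioned in the table above could be derived as in this sketch, which hashes the task, tool, and canonicalised parameters so that a retried write maps to the same key. `idempotency_key` and `IdempotentWriter` are hypothetical names, and the in-memory store stands in for whatever durable store the write tool would use. This guards against duplicate writes on retry; detecting stale data still requires optimistic locking at the tool level.

```python
import hashlib
import json
from typing import Any, Callable


def idempotency_key(task_id: str, tool_name: str, params: dict) -> str:
    """Derive a stable key: identical calls within a task share one key."""
    canonical = json.dumps(
        {"task": task_id, "tool": tool_name, "params": params},
        sort_keys=True,  # parameter order must not change the key
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotentWriter:
    """Executes a write at most once per key and replays the stored result."""

    def __init__(self) -> None:
        self._results: dict[str, Any] = {}

    def write(self, key: str, do_write: Callable[[], Any]) -> Any:
        if key not in self._results:
            self._results[key] = do_write()
        return self._results[key]
```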
14. Regulatory Considerations
EU AI Act
- Art. 9 (Risk Management): Tool call orchestration controls (parameter validation, permission gate, budget) are risk management measures for agentic AI systems interacting with live business systems.
APRA CPS 234
- Every tool call that accesses or modifies information assets must be logged (tool call audit log) and access must be scoped to minimum necessary permissions (permission gate).
ISO 42001
- §8.4: The tool permission policy and budget policy are operational controls that must be documented, version-controlled, and regularly reviewed.
Australian Context
- For AFS-licensed entities, write tool calls that affect customer records must be individually auditable and the full parameter set must be retained for dispute resolution.
- OAIC: Tool calls that access personal information must be scoped to minimum necessary; the permission gate implements this control.
15. Reference Implementations
AWS
| Component | Service |
| Parameter Extraction + Validation | Lambda function with Pydantic validation layer |
| Permission Gate | AWS IAM policy evaluation per tool ARN; custom RBAC via DynamoDB |
| Tool Execution | AWS Lambda per tool (invoked via SDK) |
| Budget Tracking | DynamoDB counter per task; atomic decrement |
| Result Sanitisation | Lambda function with content filtering |
| Audit Logging | CloudWatch Logs → Kinesis → S3 |
Azure
| Component | Service |
| Orchestration Layer | Azure Functions middleware chain |
| Permission Gate | Microsoft Entra ID (formerly Azure AD) + custom RBAC claims |
| Tool Execution | Azure Functions per tool |
| Budget Tracking | Azure Cosmos DB counter |
| Audit Logging | Azure Monitor → Event Hubs → Blob Storage |
On-Premises
| Component | Technology |
| Full Orchestration | Custom Python orchestration layer; FastAPI middleware |
| Parameter Validation | Pydantic v2 with tool schema registry |
| Permission Gate | OPA (Open Policy Agent) with tool permission policies |
| Audit Log | PostgreSQL append-only table |
16. Related Patterns
| Pattern | ID | Relationship Type | Notes |
| Agent Tool Registry | EAAPL-AGT003 | Depends On | Registry provides tool schemas and permission definitions; orchestration enforces them at runtime |
| ReAct Agent Loop | EAAPL-WRK001 | Integrates With | Every Action phase in ReAct passes through the tool call orchestration layer |
| Human Escalation | EAAPL-HITL001 | Integrates With | Write tool calls may trigger human approval via escalation pattern |
| Workflow Tracing and Replay | EAAPL-WRK013 | Integrates With | Tool call audit log is a primary input to the workflow trace |
| Iterative Constraint Satisfaction | EAAPL-WRK015 | Complementary | Constraint checker can evaluate tool call plans before execution |
17. Maturity Assessment
Overall Maturity: Industry Standard
| Dimension | Score (1–5) | Evidence |
| Research Foundation | 4 | Function calling widely studied; tool use safety emerging literature |
| Production Deployment | 5 | Tool calling deployed at scale in OpenAI, Anthropic, Google APIs and all major frameworks |
| Framework Support | 5 | Native function calling in all major LLM APIs; LangChain tools; LlamaIndex tools |
| Parameter Validation Tooling | 4 | Pydantic + Instructor widely adopted; OpenAI structured output GA |
| Permission + Budget Tooling | 3 | Custom implementations common; standardised tooling emerging |
18. Revision History
| Version | Date | Author | Changes |
| 1.0 | 2025-06-13 | Architecture Board | Initial publication in Agentic Workflows category |