JSON Similarity Evaluator¶
The JSON Similarity Evaluator performs flexible structural comparison of JSON-like outputs using a tree-based matching algorithm. It recursively traverses the JSON structure as a tree and compares leaf nodes (actual values) with type-specific similarity measures.
Overview¶
Evaluator ID: json-similarity
Use Cases:
- Compare complex nested JSON structures
- Validate API responses with tolerance for minor differences
- Test structured outputs where exact matches are too strict
- Measure similarity when numeric values may vary slightly
Returns: Continuous score from 0.0 to 1.0 (0-100% similarity)
Configuration¶
Agent Output Structure
`agent_output` must always be a dictionary. The evaluator compares dictionary structures recursively, making it ideal for complex nested JSON-like outputs.
JsonSimilarityEvaluatorConfig¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| `name` | `str` | `"JsonSimilarityEvaluator"` | The evaluator's name |
| `target_output_key` | `str` | `"*"` | Specific key to extract from the output (use `"*"` for the entire output) |
| `default_evaluation_criteria` | `OutputEvaluationCriteria` or `None` | `None` | Default criteria used when a test specifies none |
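None of the examples on this page set `default_evaluation_criteria`. As a hedged sketch, it could be supplied at construction time so that per-test criteria become optional; whether the plain-dict form shown here is accepted is an assumption based on the table above:

```python
from uipath.eval.evaluators import JsonSimilarityEvaluator

evaluator = JsonSimilarityEvaluator(
    id="json-sim-defaults",
    config={
        "name": "JsonSimilarityEvaluator",
        # Assumption: applied whenever a test omits its own criteria
        "default_evaluation_criteria": {
            "expected_output": {"status": "ok"}
        },
    }
)
```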
Evaluation Criteria¶
OutputEvaluationCriteria¶
| Parameter | Type | Description |
|---|---|---|
| `expected_output` | `dict[str, Any]` or `str` | The expected JSON structure to compare against |
How It Works¶
The evaluator uses a tree-based matching algorithm:
- Tree Structure: The JSON/dictionary structure is treated as a tree, with nested objects and arrays forming branches
- Leaf Comparison: Only leaf nodes (actual values) are compared, using type-specific similarity measures:
    - Strings: Levenshtein distance (edit distance) to measure textual similarity
    - Numbers: Absolute difference with tolerance (values within 1% are considered similar)
    - Booleans: Exact match required (binary comparison)
- Structural Recursion:
    - Objects: Recursively traverses and compares all expected keys
    - Arrays: Compares elements by position (index-based matching)
- Score Calculation: `matched_leaves / total_leaves`
The final score represents the percentage of matching leaf nodes in the tree structure.
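To make the algorithm concrete, here is a minimal, illustrative sketch of how such a tree-matching score can be computed. This is not the library's internal implementation; the helper names (`leaf_similarity`, `match`, `score`) and the graded number similarity are assumptions chosen to be consistent with the examples on this page.

```python
from typing import Any


def _levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def leaf_similarity(expected: Any, actual: Any) -> float:
    """Type-specific similarity for a single leaf node (0.0-1.0)."""
    if isinstance(expected, bool) or isinstance(actual, bool):
        return 1.0 if expected == actual else 0.0  # booleans: exact match
    if isinstance(expected, (int, float)) and isinstance(actual, (int, float)):
        # Assumption: graded credit that decays with relative difference,
        # so values within ~1% of each other score close to 1.0
        denom = max(abs(expected), abs(actual), 1e-9)
        return max(0.0, 1.0 - abs(expected - actual) / denom)
    if isinstance(expected, str) and isinstance(actual, str):
        longest = max(len(expected), len(actual)) or 1
        return 1.0 - _levenshtein(expected, actual) / longest
    return 0.0  # missing value or type mismatch


def match(expected: Any, actual: Any) -> tuple[float, int]:
    """Return (matched_leaves, total_leaves) for a subtree."""
    if isinstance(expected, dict):  # objects: recurse over expected keys only
        matched, total = 0.0, 0
        for key, value in expected.items():
            sub = actual.get(key) if isinstance(actual, dict) else None
            m, t = match(value, sub)
            matched, total = matched + m, total + t
        return matched, total
    if isinstance(expected, list):  # arrays: index-based matching
        matched, total = 0.0, 0
        for i, value in enumerate(expected):
            peer = actual[i] if isinstance(actual, list) and i < len(actual) else None
            m, t = match(value, peer)
            matched, total = matched + m, total + t
        return matched, total
    return leaf_similarity(expected, actual), 1  # leaf node


def score(expected: dict, actual: dict) -> float:
    matched, total = match(expected, actual)
    return matched / total if total else 1.0  # matched_leaves / total_leaves
```

With this sketch, `score({"status": "completed successfully"}, {"status": "completed sucessfully"})` comes out near 0.95, in line with the string-similarity example below.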
Examples¶
Basic JSON Comparison¶
```python
from uipath.eval.evaluators import JsonSimilarityEvaluator
from uipath.eval.models import AgentExecution

agent_execution = AgentExecution(
    agent_input={},
    agent_output={"name": "John Doe", "age": 30, "city": "New York"},
    agent_trace=[]
)

evaluator = JsonSimilarityEvaluator(
    id="json-sim-1",
    config={
        "name": "JsonSimilarityEvaluator"
        # target_output_key defaults to "*" - compares entire output dict
    }
)

result = await evaluator.validate_and_evaluate_criteria(
    agent_execution=agent_execution,
    evaluation_criteria={
        "expected_output": {"name": "John Doe", "age": 30, "city": "New York"}
    }
)

print(f"Score: {result.score}")      # Output: 1.0 (perfect match)
print(f"Details: {result.details}")  # Matched leaves: 3, Total leaves: 3
```
Numeric Tolerance¶
```python
agent_execution = AgentExecution(
    agent_input={},
    agent_output={"temperature": 20.5, "humidity": 65},
    agent_trace=[]
)

evaluator = JsonSimilarityEvaluator(
    id="json-sim-numeric",
    config={"name": "JsonSimilarityEvaluator"}
)

# Slightly different numbers
result = await evaluator.validate_and_evaluate_criteria(
    agent_execution=agent_execution,
    evaluation_criteria={
        "expected_output": {"temperature": 20.3, "humidity": 65}
    }
)

# High similarity despite the numeric difference
print(f"Score: {result.score}")  # ~0.99 (very high similarity)
```
String Similarity¶
```python
agent_execution = AgentExecution(
    agent_input={},
    agent_output={"status": "completed successfully"},
    agent_trace=[]
)

evaluator = JsonSimilarityEvaluator(
    id="json-sim-string",
    config={"name": "JsonSimilarityEvaluator"}
)

# Similar but not exact string
result = await evaluator.validate_and_evaluate_criteria(
    agent_execution=agent_execution,
    evaluation_criteria={
        "expected_output": {"status": "completed sucessfully"}  # typo
    }
)

# High but not perfect similarity
print(f"Score: {result.score}")  # ~0.95 (high similarity despite typo)
```
Nested Structures¶
```python
agent_execution = AgentExecution(
    agent_input={},
    agent_output={
        "user": {
            "name": "Alice",
            "profile": {
                "age": 25,
                "location": "Paris"
            }
        },
        "status": "active"
    },
    agent_trace=[]
)

evaluator = JsonSimilarityEvaluator(
    id="json-sim-nested",
    config={"name": "JsonSimilarityEvaluator"}
)

result = await evaluator.validate_and_evaluate_criteria(
    agent_execution=agent_execution,
    evaluation_criteria={
        "expected_output": {
            "user": {
                "name": "Alice",
                "profile": {
                    "age": 25,
                    "location": "Paris"
                }
            },
            "status": "active"
        }
    }
)

print(f"Score: {result.score}")  # Output: 1.0
```
Array Comparison¶
```python
agent_execution = AgentExecution(
    agent_input={},
    agent_output={"items": ["apple", "banana", "orange"]},
    agent_trace=[]
)

evaluator = JsonSimilarityEvaluator(
    id="json-sim-array",
    config={"name": "JsonSimilarityEvaluator"}
)

# Partial match (2 out of 3 correct)
result = await evaluator.validate_and_evaluate_criteria(
    agent_execution=agent_execution,
    evaluation_criteria={
        "expected_output": {"items": ["apple", "banana", "grape"]}
    }
)

print(f"Score: {result.score}")  # ~0.67 (2/3 correct)
```
Handling Extra Keys in Actual Output¶
```python
agent_execution = AgentExecution(
    agent_input={},
    agent_output={
        "name": "Bob",
        "age": 30,
        "extra_field": "ignored"  # Extra field in actual output
    },
    agent_trace=[]
)

evaluator = JsonSimilarityEvaluator(
    id="json-sim-extra",
    config={"name": "JsonSimilarityEvaluator"}
)

# Only expected keys are evaluated
result = await evaluator.validate_and_evaluate_criteria(
    agent_execution=agent_execution,
    evaluation_criteria={
        "expected_output": {
            "name": "Bob",
            "age": 30
        }
    }
)

print(f"Score: {result.score}")  # Output: 1.0 (extra fields ignored)
```
Target Specific Field¶
```python
agent_execution = AgentExecution(
    agent_input={},
    agent_output={
        "result": {"score": 95, "passed": True},
        "metadata": {"timestamp": "2024-01-01"}
    },
    agent_trace=[]
)

evaluator = JsonSimilarityEvaluator(
    id="json-sim-targeted",
    config={
        "name": "JsonSimilarityEvaluator",
        "target_output_key": "result"
    }
)

# Only compares the "result" field
result = await evaluator.validate_and_evaluate_criteria(
    agent_execution=agent_execution,
    evaluation_criteria={
        "expected_output": {"result": {"score": 95, "passed": True}}
    }
)

print(f"Score: {result.score}")  # Output: 1.0
```
Scoring Details¶
The evaluator returns a `NumericEvaluationResult` with:
- `score` (`float`): Value between 0.0 and 1.0
- `details` (`str`): An explanation like "Matched leaves: 8, Total leaves: 10"
Score Interpretation¶
| Score Range | Interpretation |
|---|---|
| 1.0 | Perfect match |
| 0.9 - 0.99 | Very high similarity (minor differences) |
| 0.7 - 0.89 | Good similarity (some differences) |
| 0.5 - 0.69 | Moderate similarity (significant differences) |
| 0.0 - 0.49 | Low similarity (major differences) |
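If you want these bands in code, a small helper could map scores to labels. This is purely illustrative; the function name is made up and the thresholds simply mirror the table above:

```python
def interpret_score(score: float) -> str:
    """Map a similarity score to the interpretation bands in the table above."""
    if score == 1.0:
        return "Perfect match"
    if score >= 0.9:
        return "Very high similarity (minor differences)"
    if score >= 0.7:
        return "Good similarity (some differences)"
    if score >= 0.5:
        return "Moderate similarity (significant differences)"
    return "Low similarity (major differences)"
```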
Best Practices¶
- Use for structured data like JSON, dictionaries, or objects
- Set score thresholds based on your tolerance for differences (e.g., require score ≥ 0.9; see the sketch after this list)
- Combine with exact match for critical fields that must match exactly
- Only expected keys matter - extra keys in actual output are ignored
- Consider array order - elements are compared by position
- Useful for API testing where responses may have minor variations
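As referenced in the thresholds item above, a pass/fail gate on top of the continuous score could look like this. The threshold value and the `expected` variable are placeholders to adapt to your tests:

```python
THRESHOLD = 0.9  # tune to your tolerance for differences

result = await evaluator.validate_and_evaluate_criteria(
    agent_execution=agent_execution,
    evaluation_criteria={"expected_output": expected},  # placeholder expected dict
)
if result.score < THRESHOLD:
    raise AssertionError(f"Similarity {result.score:.2f} is below threshold {THRESHOLD}")
```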
When to Use vs Other Evaluators¶
Use JSON Similarity when:

- Comparing complex nested structures
- Minor numeric differences are acceptable
- String typos shouldn't cause complete failure
- You need a granular similarity score

Use Exact Match when:

- Output must match precisely
- There is no tolerance for any differences
- A simple string comparison is enough

Use LLM Judge when:

- Semantic meaning matters more than structure
- Natural language comparison is needed
- Context and intent should be considered
Related Evaluators¶
- Exact Match Evaluator: For strict matching
- Contains Evaluator: For substring matching
- LLM Judge Output Evaluator: For semantic comparison