AI Sandbox
Prompt scenarios & test suites for PII leaks and policy breaks
Inside the product, AI Sandbox runs a library of prompt scenarios (normal use, adversarial wording, PII extraction attempts, prompt injection, and more) against your configured system prompt and a preview of synthetic dataset context—similar to how a grounded agent sees data in production. Each reply is evaluated automatically for safety and compliance signals—not a manual checklist.
What you run
- Prompt scenarios — Each scenario supplies the user message (benign, attack, or edge case). You can start from domain template packs or generate batches with AI assistance on paid plans.
- System prompt — Your company or agent rules, applied per project, shape how the model should behave.
- Synthetic context — A JSON preview of generated data is passed with the call so tests reflect grounded, RAG-style usage—not isolated chat with no context.
- Test runner — Executes scenarios against your chosen model provider configuration (including cloud LLMs where enabled) and aggregates results into runs you can inspect.
What the suite checks (outputs)
Scoring focuses on the assistant response, including:
- PII-like leakage — Patterns suggesting emails, phones, PAN/Aadhaar-like identifiers, salary/bank cues, and similar sensitive surface area in the answer text.
- Data overexposure — Signs the model echoed restricted fields from the synthetic context when it should not have.
- Prompt injection susceptibility — Behavior when scenarios mimic instruction override or extraction of hidden rules.
- Expected policy outcome — Mismatch when a scenario expects refuse or redact but the model answered substantively or exposed restricted content.
- Grounding / hallucination signals — Claims that do not appear supported by the provided preview context.
- Role boundary — Indicators of disclosures beyond the intended role for impersonation-style scenarios.
Automated checks complement—not replace—your legal and security review. Tune system prompts, scenarios, and policies based on findings.
Evidence for compliance conversations
Runs produce findings and reports you can share internally: what was exercised, what failed, and severity—useful when demonstrating that prompt and data governance were tested before rollout.