EU AI Act Compliance Report
Per Annex IV, Regulation (EU) 2024/1689 — Technical Documentation for High-Risk AI Systems (Article 11(1))
Table of Contents
- 1 General Description of the AI System — Annex IV §1
- 2 Development Process & Design Specifications — Annex IV §2
- 3 Monitoring, Functioning & Control — Annex IV §3
- 4 Appropriateness of Performance Metrics — Annex IV §4
- 5 Risk Management System — Annex IV §5, Art. 9
- 6 Pre-determined Changes & Lifecycle — Annex IV §6
- 7 Applied Standards & Specifications — Annex IV §7
- 8 EU Declaration of Conformity — Annex IV §8, Art. 47
- 9 Post-Market Monitoring System — Annex IV §9, Art. 72
- A Compliance Summary Checklist
1. General Description of the AI System ANNEX IV §1
Intended purpose, provider identity, system version, interaction with hardware/software, and deployment form — per Annex IV §1(a)-(g).
1.1 System Identity & Intended Purpose — §1(a)
Provider: Sentinel Project
Version: Policy vbd84462bf88a
Intended purpose: AI governance sidecar proxy that enforces safety, security, and compliance policy on LLM requests. Sits between AI agents and LLM providers, applying configurable gate pipelines (PII detection, prompt injection scanning, tool approval, budget limits, model allowlisting) before forwarding allowed requests.
1.2 Registered AI Systems Under Governance — §1(a)
| System Name | Owner / Deployer | Risk Level | Status | Intended Purpose | Version |
|---|---|---|---|---|---|
| CodeBot | Engineering | MINIMAL | ACTIVE | AI coding assistant for software development tasks including code generation, review, and debugging | v1.0.0 |
| EnterpriseBot | Business Operations | LIMITED | ACTIVE | Enterprise workflow automation agent with email, Slack, and database tool access | v1.0.0 |
| FinanceBot | Treasury Operations | HIGH | ACTIVE | Financial analysis and transaction processing agent handling PCI-DSS regulated data | v1.0.0 |
| HealthBot | Clinical Systems | HIGH | ACTIVE | Clinical decision support agent processing HIPAA-regulated patient data | v1.0.0 |
| SecurityBot | Security Team | LIMITED | ACTIVE | Security testing agent for model allowlist and output PII leak validation | v1.0.0 |
1.3 Risk Distribution — Art. 6
1.4 Hardware, Software & Deployment Form — §1(b)-(f)
Software dependencies: Python 3.11+, FastAPI, LiteLLM (multi-provider routing), Microsoft Presidio (PII), LLM Guard (injection detection), spaCy (NER), SQLite (audit store), Prometheus (metrics), Langfuse (tracing) — §1(b)
Hardware requirements: Minimum 2 CPU cores, 4GB RAM; no GPU required for governance layer — §1(d)
User interface: Admin web dashboard (Jinja2 + HTMX), REST API for programmatic integration — §1(f)
Instructions for use: Inline dashboard guidance, API documentation, and this compliance report — §1(g)
2. Development Process & Design Specifications ANNEX IV §2
Design methodology, system architecture, algorithms, key design choices, data requirements, and human oversight measures — per Annex IV §2(a)-(g).
2.1 Design Specifications & General Logic — §2(a)
Key design choice: Deterministic policy evaluation (no ML in the governance layer itself). Gates use rule-based + NLP pattern matching, not generative AI, to avoid recursive trust problems.
Rationale: A governance layer that itself uses LLMs to make safety decisions introduces circular dependency. Deterministic gates provide auditable, reproducible decisions.
Trade-offs: Pattern-based detection may miss novel attack vectors; mitigated by layered defense (multiple gates) and human oversight escalation.
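The deterministic-pipeline design above can be sketched as follows. This is a minimal illustration only: `Decision`, `run_pipeline`, and the gate call signature are hypothetical names, not Sentinel's actual API, and the allowlist value is taken from Section 5.2 purely as sample data.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Decision:
    status: str                        # "ALLOW", "BLOCK", or "REVIEW"
    codes: list[str] = field(default_factory=list)

Gate = Callable[[dict], Decision]

def model_allowlist_gate(request: dict) -> Decision:
    # Exact string match against a policy-defined allowlist -- rule-based,
    # so the same input always yields the same decision.
    allowed = {"openrouter/openai/gpt-4o"}
    if request.get("model") in allowed:
        return Decision("ALLOW")
    return Decision("BLOCK", ["MODEL_NOT_ALLOWED"])

def run_pipeline(request: dict, gates: list[Gate]) -> Decision:
    # Gates run in order; the first non-ALLOW decision short-circuits,
    # which keeps every outcome auditable and reproducible.
    for gate in gates:
        decision = gate(request)
        if decision.status != "ALLOW":
            return decision
    return Decision("ALLOW")
```

Because no gate consults a model, replaying a logged request against the same policy version reproduces the original decision exactly.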
2.2 System Architecture — §2(a)
Components:
- Proxy service (port 8000) — OpenAI-compatible API, request routing, audit logging
- Policy engine — Gate pipeline orchestration, YAML policy loader with hot-reload
- Admin UI (port 8001) — Approval console, monitoring dashboard, this report
- LiteLLM (port 4000) — Multi-provider routing to OpenAI, Anthropic, Groq, etc.
- SQLite — Audit log, agent registry, approval queue (WAL mode for concurrent access)
2.3 Gate Pipeline — Algorithmic Components — §2(a)
| Gate | Algorithm / Method | Phase | Evaluations | Block Rate |
|---|---|---|---|---|
| ToolCheck | Policy-defined tool allowlist/blocklist matching | Input | 8 | 0% |
| PII Detection | Presidio NER (en_core_web_lg), configurable entity types & score thresholds | Input | 6 | 100% |
| Output PII Detection | Presidio NER (en_core_web_lg), configurable entity types & score thresholds | Output | 3 | 100% |
| Injection Detection | LLM Guard pattern scanner, heuristic + regex ensemble | Input | 2 | 100% |
| ModelAllowlist | Exact string match against policy-defined model allowlist | Input | 1 | 100% |
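The score-threshold mechanism shared by the PII gates can be illustrated with a toy detector. Note this is a stand-in for illustration only: the real gates use Presidio's analyzer, which returns entity findings with confidence scores; the regexes and threshold value below are assumptions.

```python
import re

# Toy stand-in for a PII detector. Production gates use Microsoft Presidio,
# which similarly returns (entity_type, score) findings for the gate to filter.
def detect_pii(text: str) -> list[tuple[str, float]]:
    findings = []
    if re.search(r"\b\d{3}-\d{2}-\d{4}\b", text):
        findings.append(("US_SSN", 0.85))
    if re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text):
        findings.append(("EMAIL_ADDRESS", 0.95))
    return findings

def pii_gate(text: str, score_threshold: float = 0.5) -> str:
    # Only findings at or above the configurable threshold trigger a block;
    # raising the threshold trades recall for fewer false positives.
    hits = [f for f in detect_pii(text) if f[1] >= score_threshold]
    return "BLOCK" if hits else "ALLOW"
```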
2.4 Data Requirements — §2(b)-(c)
Input data specifications: OpenAI-compatible chat completion requests (model, messages[], optional tools[]). All input is text-based.
Expected output: Proxied LLM response (200), policy block with reason code (403), or human review ticket (202).
Pre-trained model provenance: spaCy en_core_web_lg (MIT license, trained on OntoNotes 5); LLM Guard (Apache 2.0, deberta-v3-base fine-tuned on injection datasets).
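A caller must handle all three documented outcomes. The sketch below shows one way to do so; the response field names (`codes`, `ticket_id`) are assumptions for illustration, while the status codes (200 / 403 / 202) come from this report.

```python
# Hedged sketch of client-side handling for the three documented outcomes.
def handle_response(status_code: int, body: dict) -> str:
    if status_code == 200:                       # proxied LLM response
        return body["choices"][0]["message"]["content"]
    if status_code == 403:                       # policy block with reason code
        raise PermissionError(f"Blocked by policy: {body.get('codes')}")
    if status_code == 202:                       # human review ticket
        return f"Pending human review (ticket {body.get('ticket_id')})"
    raise RuntimeError(f"Unexpected status {status_code}")
```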
2.5 Human Oversight Measures — §2(e), Art. 14
Art. 14(4)(d) — Override/reverse: Approval queue (approve/reject pending requests); individual agent suspend/revoke
Art. 14(4)(e) — Stop button: POST /v1/emergency/suspend-all — immediately suspends all agents
Per-agent overseers: FinanceBot (cfo@acme.com, risk@acme.com), HealthBot (hipaa-officer@hospital.org)
| Active Overseer | Actions Taken |
|---|---|
| api | 2 |
2.6 Validation & Testing — §2(f)
Test metrics: Pass/fail per scenario, decision correctness (expected vs actual status code and decision), latency (ms), cost attribution.
Test evidence: Scenario results exported as JSON (scenario-results.json) with timestamps, signed by automated runner.
2.7 Cybersecurity Measures — §2(g), Art. 15
Prompt injection detection: LLM Guard scanner active on all routes
PII detection: Microsoft Presidio with configurable entity types and score thresholds
Input validation: Pydantic models on all API endpoints; no raw SQL string interpolation
Transport: HTTPS recommended for production; Docker internal networking for inter-service communication
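The Pydantic input validation mentioned above can be sketched as follows. The schema mirrors the OpenAI chat-completions shape described in Section 2.4, but these exact model classes are illustrative, not Sentinel's actual code.

```python
from pydantic import BaseModel, ValidationError

class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    model: str
    messages: list[Message]

def validate_request(payload: dict) -> bool:
    # Malformed payloads fail typed validation before reaching any gate,
    # so downstream code never sees raw, unvalidated input.
    try:
        ChatRequest(**payload)
        return True
    except ValidationError:
        return False
```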
3. Monitoring, Functioning & Control ANNEX IV §3 • ART. 12, 13
Capabilities and limitations in performance, foreseeable unintended outcomes, human oversight specifications, and input data requirements — per Annex IV §3(a)-(d).
3.1 Capabilities & Limitations — §3(a)
Capabilities:
- Real-time policy enforcement on LLM requests (avg 913ms added latency)
- PII detection across 5 gate types with configurable sensitivity
- Human-in-the-loop escalation with sub-second average response time
- Per-agent budget tracking with automatic enforcement
- Multi-provider LLM routing (OpenAI, Anthropic, Groq, OpenRouter, Gemini, Mistral)
Limitations:
- PII detection accuracy varies by entity type and language — Presidio is English-optimized
- Prompt injection detection uses pattern-based heuristics — novel attack vectors may bypass
- Cost tracking depends on provider-reported token counts — not independently verified
- Output gate scanning adds latency to allowed requests (dual-pass evaluation)
- Single-node SQLite deployment — not horizontally scalable without migration to PostgreSQL
3.2 Foreseeable Unintended Outcomes — §3(a)
| Risk | Impact | Mitigation |
|---|---|---|
| False positive PII detection | Legitimate requests blocked | Configurable score thresholds; human review escalation |
| False negative injection detection | Prompt injection bypasses gate | Layered defense (multiple gates); output gate as second barrier |
| Governance proxy downtime | All agent requests fail | Prometheus alerting (SentinelProxyDown); health endpoint monitoring |
| Budget tracking inaccuracy | Cost overruns or premature blocking | Conservative default limits; per-request cost logging for audit |
3.3 Automatic Logging — Art. 12
Each audit log entry records the following fields:
- run_id — Unique request correlation ID
- timestamp — ISO 8601 UTC
- agent_id — Requesting AI system
- decision — ALLOW / BLOCK / REVIEW
- gate_name — Gate that triggered decision
- codes — Decision reason codes
- model — LLM model requested
- provider — LLM provider used
- latency_ms — End-to-end request time
- token_count — Total tokens consumed
- cost_usd — Estimated cost
- policy_version — Policy hash at evaluation time
- actioned_by — Human overseer identity
- route — Policy route applied
3.4 Transparency — Decision Code Glossary — Art. 13
| Code | Meaning | Gate |
|---|---|---|
| PII_DETECTED | Personal data found in request | Presidio PII Detection |
| OUTPUT_PII_DETECTED | Personal data found in LLM response | Output Presidio Gate |
| INJECTION_DETECTED | Prompt injection pattern detected | LLM Guard Injection Detection |
| TOOL_REQUIRES_APPROVAL | High-risk tool invocation flagged | Tool Check / Action Monitor |
| MODEL_NOT_ALLOWED | Requested model not in allowlist | Model Allowlist |
| BUDGET_EXCEEDED | Agent cost budget limit reached | Budget Gate |
| AGENT_SUSPENDED | Agent suspended or revoked | Agent Registry |
4. Appropriateness of Performance Metrics ANNEX IV §4 • ART. 15
Metrics used to measure accuracy, robustness, and compliance, including per-agent and per-provider breakdowns — per Annex IV §4.
4.1 Gate Effectiveness Metrics
| Gate | Evaluations | Block Rate |
|---|---|---|
| ToolCheck | 8 | 0% |
| PII Detection | 6 | 100% |
| Output PII Detection | 3 | 100% |
| Injection Detection | 2 | 100% |
| ModelAllowlist | 1 | 100% |
4.2 Per-Agent Decision Breakdown
| Agent | Risk Level | Requests | Allowed | Blocked | Review | Cost |
|---|---|---|---|---|---|---|
| HealthBot | HIGH | 6 | 1 | 5 | 0 | $0.0252 |
| EnterpriseBot | LIMITED | 8 | 2 | 0 | 6 | $0.0088 |
| FinanceBot | HIGH | 8 | 3 | 5 | 0 | $10.0063 |
| CodeBot | MINIMAL | 4 | 0 | 2 | 2 | $0.0000 |
| SecurityBot | LIMITED | 1 | 1 | 0 | 0 | $0.0068 |
4.3 Per-Provider Performance
| Provider | Requests | Tokens | Cost |
|---|---|---|---|
| openrouter | 7 | 4,030 | $10.0306 |
5. Risk Management System ANNEX IV §5 • ART. 9
Description of the risk management system established per Art. 9 — risk identification, evaluation, mitigation measures, and testing evidence.
5.1 Risk Identification & Mitigation Measures — Art. 9(2)
| Identified Risk | Category | Mitigation (Gate) | Evidence |
|---|---|---|---|
| Unauthorized tool execution | Safety | ToolCheck | 0 blocked / 8 evaluated (0%) |
| Personal data exposure | Fundamental Rights | PII Detection | 6 blocked / 6 evaluated (100%) |
| Personal data exposure | Fundamental Rights | Output PII Detection | 3 blocked / 3 evaluated (100%) |
| Prompt injection / manipulation | Cybersecurity | Injection Detection | 2 blocked / 2 evaluated (100%) |
| Unauthorized model access | Compliance | ModelAllowlist | 1 blocked / 1 evaluated (100%) |
5.2 Policy Configuration — Art. 9(3)
Pending approval TTL: 30 minutes
Model allowlist: openrouter/openai/gpt-4o, anthropic/claude-sonnet-4-20250514, groq/llama-3.3-70b-versatile
Policy format: YAML with hot-reload (watchfiles) — no restart required for policy changes
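A policy file covering the settings above might look like the following. The field names and structure are hypothetical, not the actual Sentinel schema; the values mirror this section.

```yaml
# Hypothetical policy layout -- illustrative only.
approval:
  pending_ttl_minutes: 30
model_allowlist:
  - openrouter/openai/gpt-4o
  - anthropic/claude-sonnet-4-20250514
  - groq/llama-3.3-70b-versatile
gates:
  - name: pii_detection
    phase: input
    score_threshold: 0.5   # assumed default; thresholds are configurable
```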
5.3 Testing Evidence — Art. 9(7)
Test domains: Finance (PII, budget), Healthcare (PII, output gates), Enterprise (tool approval), Security (injection, model allowlist), Governance (kill switch, agent suspension), Stress (budget limits), Regression (canary tests).
Result: All gate decisions validated against expected outcomes with automated assertion checking.
6. Pre-determined Changes & Lifecycle ANNEX IV §6
Description of pre-determined changes to the AI system and its performance, together with information on continuous compliance — per Annex IV §6.
6.1 Policy Versioning & Change Management
Change mechanism: Policy changes via YAML configuration files only (no hardcoded rules). Policy hash computed per change for audit trail. Hot-reload via watchfiles — no service restart required.
Audit trail: Each audit log entry records the policy_version hash active at the time of evaluation, enabling retrospective analysis of how policy changes affected decision outcomes.
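Deriving such a short version identifier (e.g. bd84462bf88a) can be sketched as below. The actual algorithm, canonicalization, and truncation length used by Sentinel are assumptions; this report only shows a 12-hex-character hash.

```python
import hashlib

# Sketch: stable short identifier for the active policy text.
def policy_version(policy_text: str) -> str:
    return hashlib.sha256(policy_text.encode("utf-8")).hexdigest()[:12]
```

Hashing the policy text (rather than numbering versions manually) means any edit, however small, yields a new identifier in the audit trail.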
6.2 Pre-determined System Changes
- Gate addition/removal: New gates can be added to the pipeline via policy YAML without code changes
- Threshold tuning: PII score thresholds, budget limits, and model allowlists are configuration-driven
- Agent registration: New AI systems can be registered with risk classification without service changes
- Provider expansion: New LLM providers added via LiteLLM configuration (no proxy changes required)
6.3 Expected Lifetime & Maintenance
Maintenance: Dependency updates (Python packages, Docker images), policy tuning based on operational metrics, gate model updates (spaCy, LLM Guard)
Deprecation: Agent revocation via API; policy gates can be disabled without removal
7. Applied Standards & Technical Specifications ANNEX IV §7
Harmonised standards, technical specifications, and frameworks applied — per Annex IV §7.
| Standard / Framework | Scope | Application in Sentinel |
|---|---|---|
| EU AI Act (2024/1689) | AI governance regulation | Primary compliance target; this document |
| NIST AI RMF 1.0 | AI risk management | Gate pipeline maps to GOVERN, MAP, MEASURE, MANAGE functions |
| ISO/IEC 42001:2023 | AI management systems | Audit logging, risk assessment, human oversight processes |
| ISO/IEC DIS 24970:2025 | AI system logging | Structured logging with 14+ fields per request (Section 3.3) |
| OWASP LLM Top 10 | LLM security | Prompt injection (LLM01), sensitive info disclosure (LLM06), excessive agency (LLM08) |
7.1 Governance Stack Components
LiteLLM: Multi-provider routing with retry and rate limiting
Presidio: Microsoft PII detection engine (Apache 2.0)
LLM Guard: Prompt injection and content safety scanner (Apache 2.0)
Langfuse: LLM observability and tracing (MIT)
Prometheus: Metrics collection and alerting (Apache 2.0)
8. EU Declaration of Conformity ANNEX IV §8 • ART. 47
Copy of the EU declaration of conformity referred to in Article 47, or reference to it — per Annex IV §8.
Formal EU declaration of conformity per Art. 47 is pending completion. The following compliance evidence is available:
- Policy version tracking: bd84462bf88a — verifiable hash of active governance policy
- Technical documentation: This report (auto-generated from operational data)
- Risk classification: 2 HIGH-risk, 2 LIMITED-risk, 1 MINIMAL-risk systems registered
- Compliance score: 8/10 articles actively supported, 2 partial
Note: Art. 47 declaration requires formal assessment and signature by a responsible natural person. This section will be updated upon completion of the conformity assessment process.
9. Post-Market Monitoring System ANNEX IV §9 • ART. 72
System in place to evaluate AI system performance in the post-market phase, including the monitoring plan per Art. 72(3) — per Annex IV §9.
9.1 Monitoring Infrastructure
| Tool | Purpose (Art. 72) | Metrics / Signals |
|---|---|---|
| Prometheus | Quantitative metrics & threshold alerting | sentinel_requests_total, sentinel_tokens_total, sentinel_gate_duration_seconds, sentinel_approval_queue_depth |
| Langfuse | Request tracing & observability | Per-request traces with policy spans, LLM generation tracking, cost attribution |
| Sentinel Rollup | Hourly aggregation for trend analysis | Gate pass/fail rates, p95 latency, token consumption, cost breakdown |
| Webhooks | Real-time event escalation (Art. 73) | BLOCK, REVIEW, APPROVAL, REJECTION events to configurable endpoints |
9.2 Alert Rules — Incident Detection
SentinelGateErrorRate — High: Gate errors exceed 5% over 5 minutes
SentinelBlockRateSpike — High: Block rate 2x rolling 1h average
SentinelApprovalQueueBacklog — High: >50 items pending for >10 minutes
SentinelWorkerStalled — High: Rollup worker heartbeat missing for >10 minutes
SentinelLatencyDegraded — Warning: P95 latency exceeds 30 seconds
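One of the alerts above could be expressed as a Prometheus rule roughly as follows. Only the alert and metric names come from this report; the expression, histogram suffix, and durations are assumptions.

```yaml
# Hypothetical Prometheus alerting rule -- illustrative expression only.
groups:
  - name: sentinel
    rules:
      - alert: SentinelLatencyDegraded
        expr: histogram_quantile(0.95, sum(rate(sentinel_gate_duration_seconds_bucket[5m])) by (le)) > 30
        for: 5m
        labels:
          severity: warning
```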
9.3 Serious Incident Reporting — Art. 73
Real-time webhook escalation of BLOCK and REVIEW events is active (Section 9.1); a formal serious-incident reporting template per Art. 73 is pending (see Appendix A).
Appendix A: Compliance Summary Checklist
Article-by-article compliance status with evidence references and Annex IV section mapping.
| Article | Requirement | Annex IV | Status | Evidence |
|---|---|---|---|---|
| Art. 6 | Classification Rules for High-Risk AI | §1 | ACTIVE | 5 agents classified by risk level |
| Art. 9 | Risk Management System | §5 | ACTIVE | 5 active gates enforcing risk mitigations |
| Art. 11 | Technical Documentation | §1-9 | ACTIVE | This Annex IV compliance report (auto-generated from operational data) |
| Art. 12 | Record-Keeping / Automatic Logging | §3 | ACTIVE | 29 audit log entries with 14 fields, oldest: 2026-03-10 |
| Art. 13 | Transparency & Provision of Information | §3 | ACTIVE | Decision codes documented; 5 agents with purpose, risk level, overseers |
| Art. 14 | Human Oversight | §2(e) | ACTIVE | 2 oversight decisions; kill switch; 1 active overseer |
| Art. 15 | Accuracy, Robustness & Cybersecurity | §2(g), §4 | ACTIVE | Avg 913ms latency; 27 validated requests; API key hashing, injection detection, PII scanning |
| Art. 47 | EU Declaration of Conformity | §8 | PARTIAL | Policy version tracked; formal Art. 47 declaration pending conformity assessment |
| Art. 72 | Post-Market Monitoring | §9 | ACTIVE | Prometheus metrics, Langfuse tracing, hourly rollup, webhook alerts |
| Art. 73 | Reporting of Serious Incidents | §9 | PARTIAL | Webhook escalation for BLOCK/REVIEW events; formal incident reporting template pending |