High-Level System Architecture
EDGE LAYER: Nginx Reverse Proxy · HTTPS/TLS 1.3
  • Let's Encrypt SSL
  • HSTS + Security Headers
  • HTTP → HTTPS Redirect
  • Password Auth Gate
  ↓ Proxy
PRESENTATION LAYER: Next.js 16 · Port 3000
  • Dashboard: KPIs, charts, costs
  • Upload Form: Multi-file drag+drop
  • Review Panel: Approve / Reject
  • Analytics: Token usage, trends
  • Compare: Side-by-side diff
  • Auth Gate: HMAC-SHA256 tokens
  ↓ REST API
API & ORCHESTRATION: FastAPI · Port 8000 · Session Auth Middleware
  • REST Router: CRUD, Swagger docs
  • Pipeline Engine: 6-step orchestration
  • Gmail API Watcher: OAuth2 polling
  • Auth Controller: Login + rate limiting
  • Review API: Approve / Override
  ↓ Pipeline call
PROCESSING LAYER: AI + Rules + Conflict Detection
  • Text Extractor: PyMuPDF, openpyxl
  • Classifier: Content-only (no names)
  • Data Extractor: Regex + Vision hybrid
  • AI Analyzer: 4 parallel Gemini calls
  • Rules Engine: 7 categories · 26–29/run
  • Conflict Detect: Cross-doc validation
  ↓ Decision output
  • ACCEPT: 0 failures, 0 warnings
  • REFER: 0 failures, 1+ warnings
  • DECLINE: 1+ critical failures
EXTERNAL SERVICES
  • Gemini 2.5 Flash
  • Gmail API (OAuth2)
DATA LAYER
  • SQLite / PostgreSQL
  • Local File Storage
https://knight.outreachbenefits.online
6-Step Processing Pipeline
End-to-end ~45 seconds with parallel processing
Step 1 · Text Extraction

Parallel image processing via ThreadPoolExecutor

PDF — PyMuPDF
Excel — openpyxl
CSV — pandas
Images — Gemini Vision OCR
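
A minimal sketch of how per-format extraction could be dispatched through a ThreadPoolExecutor. The extractor functions here are hypothetical stand-ins (the real ones would call PyMuPDF, openpyxl, pandas, and Gemini Vision), and this sketch sends every file through the pool, whereas the pipeline above only states that image processing runs in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


# Hypothetical stand-ins for the real extractors named above.
def extract_pdf(path: str) -> str:
    return f"[pdf text from {path}]"


def extract_excel(path: str) -> str:
    return f"[sheet rows from {path}]"


def extract_csv(path: str) -> str:
    return f"[csv rows from {path}]"


def extract_image(path: str) -> str:
    return f"[ocr text from {path}]"


EXTRACTORS = {
    ".pdf": extract_pdf,
    ".xlsx": extract_excel,
    ".csv": extract_csv,
    ".jpg": extract_image,
    ".jpeg": extract_image,
    ".png": extract_image,
}


def extract_one(path: str) -> tuple[str, str]:
    """Pick an extractor from the file extension and return (path, text)."""
    suffix = Path(path).suffix.lower()
    extractor = EXTRACTORS.get(suffix)
    if extractor is None:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    return path, extractor(path)


def extract_all(paths: list[str], max_workers: int = 4) -> dict[str, str]:
    """Run extraction concurrently; pool.map keeps results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(extract_one, paths))
```

Threads suit this step because the slow work (file reads and OCR API calls) is I/O-bound rather than CPU-bound.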
Step 2 · Document Classification

Single batched Gemini call — content-based only (never filenames)

insurance_application
driver_list
equipment_list
loss_run
ifta_report
drivers_license
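
One way the batched, content-only classification could be framed, as a sketch only. The prompt wording, the JSON reply shape, and the "unknown" fallback label are all assumptions; no Gemini call is made here. Documents are identified by index so that filenames never reach the model.

```python
import json

DOC_TYPES = {
    "insurance_application",
    "driver_list",
    "equipment_list",
    "loss_run",
    "ifta_report",
    "drivers_license",
}


def build_batch_prompt(texts: list[str], max_chars: int = 2000) -> str:
    """Build one prompt covering every document, keyed by index (no filenames)."""
    parts = [
        f"Classify each document as one of: {', '.join(sorted(DOC_TYPES))}.",
        'Reply with JSON like {"0": "loss_run"}.',
    ]
    for i, text in enumerate(texts):
        parts.append(f"--- document {i} ---\n{text[:max_chars]}")
    return "\n".join(parts)


def parse_batch_response(raw: str, n_docs: int) -> list[str]:
    """Validate the model's reply; anything outside DOC_TYPES becomes 'unknown'."""
    data = json.loads(raw)
    labels = []
    for i in range(n_docs):
        label = data.get(str(i))
        labels.append(label if label in DOC_TYPES else "unknown")
    return labels
```

Validating the reply against a fixed label set keeps a malformed or invented label from flowing into the later steps.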
Step 3 · Data Extraction

Hybrid: Python regex ($0 cost) + Gemini Vision (images only)

Excel/CSV — $0
PDF text — $0
CDL images — Vision
Name deduplication
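
A sketch of the zero-cost regex path and of name deduplication. The patterns and the normalization rule (lowercase, strip punctuation, sort name tokens) are illustrative assumptions, not the system's actual expressions.

```python
import re
from typing import Optional

# Assumed formats: FEIN as NN-NNNNNNN, DOT number as 5 to 8 digits after a "DOT" label.
FEIN_RE = re.compile(r"\b\d{2}-\d{7}\b")
DOT_RE = re.compile(
    r"\b(?:US)?DOT\s*(?:#|No\.?|Number)?\s*:?\s*(\d{5,8})\b", re.IGNORECASE
)


def extract_identifiers(text: str) -> dict[str, Optional[str]]:
    """Pull FEIN and DOT numbers out of already-extracted text."""
    fein = FEIN_RE.search(text)
    dot = DOT_RE.search(text)
    return {
        "fein": fein.group(0) if fein else None,
        "dot": dot.group(1) if dot else None,
    }


def normalize_name(name: str) -> str:
    """Map 'DOE, John' and 'john  doe' to the same key: 'doe john'."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return " ".join(sorted(tokens))


def dedupe_names(names: list[str]) -> list[str]:
    """Keep the first spelling seen for each normalized name."""
    seen: set[str] = set()
    unique: list[str] = []
    for name in names:
        key = normalize_name(name)
        if key and key not in seen:
            seen.add(key)
            unique.append(name.strip())
    return unique
```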
Step 4 · AI Risk Analysis

4 parallel Gemini calls for comprehensive risk assessment

Company risk
Driver risk
Fleet risk
Financial risk
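
The four-way fan-out could look like the following sketch. It uses asyncio.gather, which is an assumption (the system may use threads instead), and analyze_area is a hypothetical stub in place of a real Gemini call.

```python
import asyncio

RISK_AREAS = ("company", "driver", "fleet", "financial")


async def analyze_area(area: str, submission: dict) -> dict:
    """Hypothetical stand-in for one Gemini risk-analysis call."""
    await asyncio.sleep(0.01)  # simulates network latency
    return {"area": area, "summary": f"{area} risk reviewed for {submission['company']}"}


async def analyze_all(submission: dict) -> dict[str, dict]:
    """Start all four analyses at once and wait for every result."""
    results = await asyncio.gather(
        *(analyze_area(area, submission) for area in RISK_AREAS)
    )
    return {result["area"]: result for result in results}
```

With concurrent calls, total latency is close to that of the slowest call rather than the sum of all four.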
Step 5 · Rules Engine + Conflict Detection

Rules across 7 categories + cross-document validation. Conditional rules fire only when relevant data is present (26–29 per submission).

Eligibility · 7
Driver · 7
Exposure · 6 (+1 roadmap)
Submission · 9
IFTA · 4
Selective · 3
Venture · 2
Conflict · 6
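
A sketch of how conditional rules might be skipped when their data is absent, which is why fewer rules evaluate than the registry holds. The Rule shape, the two example predicates, and the submission field names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    rule_id: str
    applies: Callable[[dict], bool]  # is the data this rule needs present?
    check: Callable[[dict], bool]    # True means the rule passes


# Two illustrative entries; the field names "drivers" and "ifta_mpg" are hypothetical.
RULES = [
    Rule(
        "DRV-003",
        applies=lambda s: bool(s.get("drivers")),
        check=lambda s: all(d["age"] >= 23 for d in s["drivers"]),
    ),
    Rule(
        "IFTA-001",
        applies=lambda s: "ifta_mpg" in s,
        check=lambda s: 4.0 <= s["ifta_mpg"] <= 9.0,
    ),
]


def evaluate(submission: dict) -> dict[str, str]:
    """Evaluate only the rules whose input data is present."""
    results = {}
    for rule in RULES:
        if not rule.applies(submission):
            continue  # conditional rule skipped: it does not count as evaluated
        results[rule.rule_id] = "PASS" if rule.check(submission) else "FAIL"
    return results
```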
Step 6 · Decision + Team Routing

Automated accept/refer/decline with intelligent team assignment based on triggered rules

ACCEPT

Standard Review

REFER

Specialty / Driver / Ops

DECLINE

Senior Underwriting
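
The decision thresholds from the architecture overview (ACCEPT: 0 failures, 0 warnings; REFER: 0 failures, 1+ warnings; DECLINE: 1+ critical failures) reduce to a small pure function. This sketch assumes every failure counts as critical, since the overview does not describe a non-critical failure.

```python
from enum import Enum


class Decision(str, Enum):
    ACCEPT = "ACCEPT"
    REFER = "REFER"
    DECLINE = "DECLINE"


def decide(critical_failures: int, warnings: int) -> Decision:
    """Deterministic outcome: failures dominate warnings, warnings dominate clean."""
    if critical_failures < 0 or warnings < 0:
        raise ValueError("counts must be non-negative")
    if critical_failures > 0:
        return Decision.DECLINE
    if warnings > 0:
        return Decision.REFER
    return Decision.ACCEPT
```

Because the function depends only on its two counts, the same rule results always produce the same decision.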

Business Rules · 7 Categories
Derived from Knight Specialty Insurance underwriting guidelines. The registry holds 44 implemented rules; conditional rules fire only when relevant data is present, so 26–29 evaluate per submission.
Eligibility
7 rules
ELIG-001 · Target risk: semi-trucks only (dump, straight, and tow trucks are ineligible)
ELIG-002 · Ineligible vehicle types: explicit prohibited list
ELIG-003 · Available states check: 13 approved states only → DECLINE if non-covered
ELIG-004 · Texas: north of I-10 only. Checks 12 known cities south of I-10; falls back to WARNING for unlisted cities
ELIG-005 · Illinois: selective basis only → REFER, never auto-DECLINE
ELIG-006 · Auto liability deductibles not allowed
ELIG-007 · Auto physical damage not available
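
ELIG-004's list-plus-fallback behaviour could be sketched as below. The four cities shown are an illustrative subset chosen for this example, not the system's actual list of 12, and the NOT_APPLICABLE result for non-Texas submissions is an assumption.

```python
# Illustrative subset only; the real rule checks 12 known cities south of I-10.
SOUTH_OF_I10 = frozenset({"laredo", "mcallen", "brownsville", "corpus christi"})


def check_texas_i10(state: str, city: str) -> str:
    """Return FAIL for a listed southern city, WARNING when the city is unlisted."""
    if state.strip().upper() != "TX":
        return "NOT_APPLICABLE"  # conditional rule: only Texas submissions are checked
    if city.strip().lower() in SOUTH_OF_I10:
        return "FAIL"
    # Unlisted city: the system cannot confirm its location, so a human verifies.
    return "WARNING"
```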
Driver
7 rules
DRV-001 · Valid CDL required for all drivers
DRV-002 · Minimum 2 years CDL experience
DRV-003 · Minimum age 23
DRV-004 · DOT medical exam for age 65+
DRV-005 · Max 6 points on MVR in 3 years
DRV-006 · Max 4 points on MVR in 12 months
DRV-100 · Unacceptable history: DUI, reckless driving, hit-and-run, felony = auto-decline
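
A sketch of three of the driver checks (DRV-002, DRV-003, DRV-004). The driver field names are hypothetical, and treating a missing medical exam for a 65+ driver as a WARNING rather than a failure is an assumption.

```python
from datetime import date


def age_on(birth_date: date, today: date) -> int:
    """Whole years between birth_date and today."""
    years = today.year - birth_date.year
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1  # birthday has not happened yet this year
    return years


def check_driver(driver: dict, today: date) -> dict[str, str]:
    """Evaluate experience, minimum age, and the senior medical-exam rule."""
    age = age_on(driver["birth_date"], today)
    results = {
        "DRV-002": "PASS" if driver["cdl_years"] >= 2 else "FAIL",
        "DRV-003": "PASS" if age >= 23 else "FAIL",
    }
    if age >= 65:  # DRV-004 only applies to drivers aged 65 or older
        results["DRV-004"] = "PASS" if driver.get("has_dot_medical_exam") else "WARNING"
    return results
```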
Exposure
6 implemented + 1 roadmap
EXP-001 · Hazardous materials hauling prohibited
EXP-002 · Lithium battery cargo prohibited
EXP-003 · Mexico border: 50-mile restriction (keyword scan on email body + application text)
EXP-004 · SAFER violations: not yet implemented (requires FMCSA API integration)
EXP-005 · Towing/recovery operations prohibited
EXP-006 · Intermodal/container hauling prohibited
EXP-007 · Waste disposal operations prohibited
Submission
9 rules
SUB-001 · FEIN/SSN required
SUB-002 · MC/DOT number required
SUB-003 · Current loss runs required (within 60 days)
SUB-004 · 3 prior years of loss history
SUB-005 · 4 IFTA quarters required
SUB-006 · Driver list document required
SUB-007 · Equipment schedule document required
SUB-008 · Driver license (CDL) images required
SUB-009 · Filename-content consistency check
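
SUB-003's 60-day window is simple date arithmetic. In this sketch a missing or unreadable valuation date counts as not current, and a valuation date in the future is also rejected; both behaviours are assumptions, as is the function name.

```python
from datetime import date
from typing import Optional


def loss_run_is_current(
    valuation_date: Optional[date], today: date, max_age_days: int = 60
) -> bool:
    """True only when a valuation date exists and is at most max_age_days old."""
    if valuation_date is None:
        return False  # missing or unreadable is never treated as clean
    age_days = (today - valuation_date).days
    return 0 <= age_days <= max_age_days
```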
IFTA
4 rules
IFTA-001 · Fleet MPG validation (4.0–9.0 range)
IFTA-002 · Company name consistency across quarters
IFTA-003 · IFTA name matches application
IFTA-004 · Non-covered states flagging
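
IFTA-001 can be illustrated as total miles divided by total gallons across the reported quarters, compared with the 4.0–9.0 range. Aggregating across quarters before dividing, and the "miles"/"gallons" field names, are assumptions of this sketch.

```python
from typing import Optional


def fleet_mpg(quarters: list[dict]) -> Optional[float]:
    """Aggregate MPG over all quarters; None when no fuel was reported."""
    miles = sum(q["miles"] for q in quarters)
    gallons = sum(q["gallons"] for q in quarters)
    if gallons <= 0:
        return None
    return miles / gallons


def mpg_in_range(quarters: list[dict], low: float = 4.0, high: float = 9.0) -> bool:
    """False when MPG is outside the plausible range or cannot be computed."""
    mpg = fleet_mpg(quarters)
    return mpg is not None and low <= mpg <= high
```

For example, 126,000 miles on 21,000 gallons gives 6.0 MPG, which is in range, while 120,000 miles on 8,000 gallons gives 15.0 MPG, which is not.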
Selective + Venture
5 rules
SEL-001 · Box truck/van minimum premium $250K
SEL-002 · Box trucks/vans: minimum premium validation
SEL-003 · Power unit minimum: 20 units if premium is under $13K per unit
VENT-001 · New venture: 2 years CDL experience required
VENT-002 · Corporation: underwriter review required
Conflict Detection
6 rules
CON-001 · Vehicle count mismatch across documents
CON-002 · Driver count mismatch across documents
CON-003 · Company name inconsistency
CON-004 · FEIN/DOT number conflict
CON-005 · Duplicate CDL numbers detected
CON-006 · Duplicate VINs detected
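
Two of the conflict checks, sketched under assumptions: duplicates are found after trimming and upper-casing identifiers, and a count mismatch returns every document's value so that the conflict is surfaced instead of resolved. Function names and document keys are hypothetical.

```python
from collections import Counter
from typing import Optional


def find_duplicates(values: list[str]) -> list[str]:
    """Identifiers (VINs or CDL numbers) that appear more than once."""
    counts = Counter(v.strip().upper() for v in values if v and v.strip())
    return sorted(value for value, n in counts.items() if n > 1)


def count_mismatch(counts_by_doc: dict[str, int]) -> Optional[dict[str, int]]:
    """Return all per-document counts when they disagree, else None."""
    if len(set(counts_by_doc.values())) > 1:
        return dict(counts_by_doc)
    return None
```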
Team Routing

Standard Review

Clean submissions

Specialty Risk

Hazmat, border, excess

Driver Review

Violations, age, CDL

Operations

Missing docs, conflicts

Senior UW

Declined submissions
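
Routing from triggered rules to teams could be sketched as a prefix lookup. The mapping below is inferred from the team descriptions above and is an assumption, as is the Specialty Risk fallback for prefixes the descriptions do not mention.

```python
# Assumed mapping from rule-ID prefix to the REFER teams described above.
PREFIX_TO_TEAM = {
    "EXP": "Specialty Risk",
    "DRV": "Driver Review",
    "SUB": "Operations",
    "CON": "Operations",
}


def route(decision: str, triggered_rule_ids: list[str]) -> list[str]:
    """Pick review teams from the decision and the rules that fired."""
    if decision == "ACCEPT":
        return ["Standard Review"]
    if decision == "DECLINE":
        return ["Senior UW"]
    teams: list[str] = []
    for rule_id in triggered_rule_ids:
        prefix = rule_id.split("-", 1)[0]
        team = PREFIX_TO_TEAM.get(prefix, "Specialty Risk")  # assumed fallback
        if team not in teams:
            teams.append(team)
    return teams or ["Specialty Risk"]
```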

Data Flow Diagram
INPUT DOCUMENTS
  • Insurance Application (PDF)
  • Driver Roster (Excel)
  • Equipment Schedule (Excel)
  • Loss Runs (PDF)
  • IFTA Reports (PDF)
  • CDL Images (JPG/PNG)
EXTRACTED DATA
  • Company Profile
  • Driver Records
  • Vehicle Fleet
  • Claims History
  • Mileage Data
AI OUTPUT
  • Risk Assessment + Score
  • Data Conflicts
  • Recommendations
44 Rules: Pass / Fail / Warn
Decision + Team Routing
Deployment Architecture
AWS EC2 · Docker · Nginx · Let's Encrypt · Gmail API

Compute

EC2 t2.medium · 2 vCPU, 4GB RAM · Ubuntu 24.04 · Docker Compose

AWS

Reverse Proxy

Nginx · HTTPS/TLS 1.3 · HTTP/2 · Let's Encrypt auto-renewing cert · HSTS headers

Port 443

Frontend Container

Next.js 16 · Node 18 Alpine · Port 3000 · Auth Gate with HMAC tokens

app_frontend

Backend Container

FastAPI · Uvicorn ASGI · Port 8000 · Session auth middleware · Rate limiting

app_backend
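
The backend's login rate limiting (5 attempts per minute per IP, per the Security section) could be implemented as a sliding window. This is a sketch with a hypothetical class name and in-memory state; a multi-process deployment would need shared storage.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class LoginRateLimiter:
    """Sliding-window limiter: at most max_attempts per window per IP."""

    def __init__(self, max_attempts: int = 5, window_seconds: float = 60.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts: dict[str, deque] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        attempts = self._attempts[ip]
        # Drop attempts that have aged out of the window.
        while attempts and current - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(current)
        return True
```

The optional now argument exists only to make the behaviour testable without waiting for real time to pass.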

Email Intake

Gmail API with OAuth2 · Polls every 5s · Auto-extract attachments · Creates submissions

Integrated

Database

SQLAlchemy ORM · SQLite (dev) · 6 tables · Docker bind mount for persistence

Postgres-Ready

File Storage

Local filesystem with Docker volumes · Uploaded documents and extracted data

Persistent

AI Service

Gemini 2.5 Flash · Vision + Text · ~$0.05–0.09 per submission

Pay Per Use

Domain & DNS

knight.outreachbenefits.online · Namecheap DNS · A record → EC2

HTTPS 🔒
Design Decisions
Key architectural choices and the reasoning behind them

AI Extracts, Deterministic Code Decides

Gemini classifies documents and extracts data; a 100% deterministic Python rules engine makes the accept/refer/decline call — no model in the decision path. Every outcome is auditable, reproducible, and explainable to a regulator or a disputing broker.

Content-Based Classification

Documents are typed by content, never filename — so a roster saved as "LossRuns.pdf" or a scan named "scan001.pdf" is still classified correctly.

Hybrid Extraction Strategy

Structured docs (Excel, CSV, text PDFs) parse with Python at $0 API cost; only images like CDLs use Gemini Vision — roughly 70% lower AI spend than an all-LLM approach.

Rules Mapped to the Appetite Guide

A canonical registry of 45 rules (44 implemented, 1 on the roadmap) across 7 categories, each mapped to a specific line in Knight's guidelines. Conditional rules fire only when relevant data is present, so 26–29 evaluate per submission.

Human-in-the-Loop by Design

Nothing auto-approves. An accept means "eligible pending final approval." An underwriter approves, rejects, or overrides every decision — logged with notes and a timestamp.

Live-Validated Decisions

Tested with 8 end-to-end live submissions. Seven are verified against fresh submission IDs: border decline, towing decline, under-23 decline, duplicate-CDL refer, Illinois refer, senior-driver refer, and clean accept. One remains open because of a document-extraction gap, not a rule error.

Cross-Document Conflict Detection

Validates vehicle and driver counts, company names, FEIN/DOT numbers, and duplicate CDLs and VINs across every document — surfacing the conflicting values rather than silently picking one.

Deliberate Human-Routing Where Automation Isn't Reliable

Two checks intentionally flag for underwriter review instead of faking automation: SAFER violations (needs an FMCSA integration) and Texas I-10 geofencing beyond known border cities. Better to say "a human must verify" than to auto-decide on data the system can't confirm.

Dual Intake Channels

Web upload form plus Gmail API over OAuth2 — agents can email submissions straight from their inbox, with no stored mail passwords.

Security & Governance
Server-side authentication · encrypted transport · audit trail

HTTPS/TLS 1.3 Encryption

  • Certificate: Let's Encrypt CA-signed, 90-day validity, auto-renewed before expiry
  • Protocol: TLS 1.2 + 1.3, strong cipher suite (ECDHE+AES-GCM, CHACHA20)
  • Headers: HSTS (1 year), X-Frame-Options DENY, X-Content-Type-Options nosniff
  • Domain: knight.outreachbenefits.online

Server-Side Password Authentication

  • Token: HMAC-SHA256 signed session tokens (stateless, no database)
  • Comparison: Constant-time password comparison (prevents timing attacks)
  • Rate Limiting: 5 login attempts per minute per IP address
  • Error Messages: Generic "Invalid credentials" (no information leakage)
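
A minimal sketch of stateless HMAC-SHA256 session tokens and constant-time password comparison using Python's standard library. The token layout (base64 payload, a dot, base64 signature) and the expiry field are assumptions; the actual token format is not specified here.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def issue_token(secret: bytes, ttl_seconds: int = 3600, now: Optional[float] = None) -> str:
    """Sign an expiry timestamp; no server-side session store is needed."""
    issued = time.time() if now is None else now
    payload = json.dumps({"exp": int(issued) + ttl_seconds}).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(signature)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> bool:
    """Check the signature in constant time, then check expiry."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return False
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return False
    current = time.time() if now is None else now
    return json.loads(payload)["exp"] > current


def password_matches(supplied: str, expected: str) -> bool:
    """Constant-time comparison, so timing does not reveal how many characters match."""
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

The payload is parsed only after the signature check succeeds, so unsigned input never reaches the JSON decoder.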

Network Security

  • Nginx Proxy: Only ports 80/443 exposed — frontend (3000) and backend (8000) are internal only
  • Proxy Headers: X-Real-IP, X-Forwarded-For, X-Forwarded-Proto passed to backend
  • EC2 Security Group: Ports 22 (SSH), 80, 443 only — no direct access to 3000/8000

Complete Audit Trail

Every action logged to audit_logs table: uploads, classification, extraction, AI calls, rules evaluation, decisions, and reviews.

Data Encryption

  • In transit: HTTPS (TLS 1.2 or 1.3) for all traffic
  • Email: Gmail API over OAuth2 (no stored passwords)
  • Credentials: Docker environment variables, never in source control

Human-in-the-Loop Review

No auto-approval of any submission. Underwriter must explicitly Approve, Reject, or Override with notes and timestamp.

AI Cost Transparency

System tracks input tokens, output tokens, API cost, and call count for every submission processed.
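
Per-submission usage tracking could be as small as the accumulator below. The class name is hypothetical and the per-million-token prices are constructor parameters on purpose: actual Gemini pricing is not stated here and should be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates token counts, call count, and cost for one submission."""

    input_price_per_million: float   # USD per 1M input tokens (caller-supplied)
    output_price_per_million: float  # USD per 1M output tokens (caller-supplied)
    input_tokens: int = 0
    output_tokens: int = 0
    call_count: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one API call's token usage."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.call_count += 1

    @property
    def cost_usd(self) -> float:
        return (
            self.input_tokens * self.input_price_per_million
            + self.output_tokens * self.output_price_per_million
        ) / 1_000_000
```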

Production Scalability Path

  • Database: SQLite → PostgreSQL (RDS)
  • Compute: EC2 → ECS/Fargate horizontal scaling
  • AI: Gemini API auto-scales with built-in rate limiting
  • Queue: Add SQS or Redis for async processing at scale
Assumptions
Grounded in the appetite guide and underwriting guidelines — the two source documents

Scope: No External Lookups

The system evaluates only what's in the submission package. SAFER violations are in the guidelines, but SAFER data isn't in the attachments — so I assumed no external lookups, and that check routes to a human instead.

Authority: Recommends, Never Binds

The system recommends; it never binds. Every decision, including declines, is a recommendation an underwriter confirms. Nothing auto-approves.

Illinois: Selective, Not Declined

IL isn't in the 13-state list but the footnote says "selective basis," so I assumed it refers to an underwriter rather than auto-declining as a non-listed state.

Non-Covered States: Domicile vs. Transit

I read "not eligible" to mean that a company based or domiciled in a non-covered state is a critical decline, while incidental through-mileage in one is a flag for review, not an automatic decline.

"Examples Include…" = Non-Exhaustive

The unacceptable-driver-history and prohibited-exposure lists are explicitly examples, so I treated them as non-exhaustive — named items are hard rules, novel cases route to a human rather than passing silently.

Conflicting Documents: Surface, Don't Pick

When the application, MVR, and license disagree, I assumed the system surfaces the conflict with every conflicting value and lowers confidence to force review, rather than auto-picking a winner.

Texas I-10: City-Based With Human Fallback

I assumed the garaging/operating city is the basis for the north-of-I-10 check, with unconfirmed locations routed to a human pending full geocoding.

Power Unit = Tractors Only

For the per-unit premium and 20-unit-minimum rules, I assumed power units mean tractors only, not trailers.

Box Truck Contradiction: Surface for Judgment

Straight trucks are ineligible, but box trucks appear under selective exposures — and a box truck can be a straight truck. I assumed that tension surfaces for underwriter judgment rather than the system silently resolving it.

Completeness: Missing = Flagged, Never Clean

"Valued within 60 days" and "4 most recent IFTA quarters" are checks the system enforces, and a missing or unreadable document makes a submission incomplete and flagged — never silently treated as clean.