Knight Insurance — System Architecture

6-Step Processing Pipeline

End-to-end ~45 seconds with parallel processing

Text Extraction

Parallel image processing via ThreadPoolExecutor

PDF — PyMuPDF

Excel — openpyxl

CSV — pandas

Images — Gemini Vision OCR
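
The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the production code: the four extractor functions are hypothetical stand-ins for PyMuPDF, openpyxl, pandas, and Gemini Vision OCR, and the suffix sets and max_workers value are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical stand-ins for the real extractors; each takes a path, returns text.
def extract_pdf(path: str) -> str:
    return f"[pdf text from {path}]"

def extract_excel(path: str) -> str:
    return f"[excel text from {path}]"

def extract_csv(path: str) -> str:
    return f"[csv text from {path}]"

def ocr_image(path: str) -> str:
    return f"[ocr text from {path}]"

LOCAL_EXTRACTORS = {".pdf": extract_pdf, ".xlsx": extract_excel, ".csv": extract_csv}
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumed image types

def extract_all(paths: list[str], max_workers: int = 4) -> dict[str, str]:
    results: dict[str, str] = {}
    image_paths: list[str] = []
    for path in paths:
        suffix = Path(path).suffix.lower()
        if suffix in IMAGE_SUFFIXES:
            image_paths.append(path)
        elif suffix in LOCAL_EXTRACTORS:
            results[path] = LOCAL_EXTRACTORS[suffix](path)
        else:
            results[path] = ""  # unsupported type: left empty so it can be flagged later
    # OCR calls are network-bound, so threads overlap the waiting time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for path, text in zip(image_paths, pool.map(ocr_image, image_paths)):
            results[path] = text
    return results
```

Only the image branch goes through the thread pool here; local parsing is assumed to be fast enough to run inline.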

Document Classification

Single batched Gemini call — content-based only (never filenames)

insurance_application

driver_list

equipment_list

loss_run

ifta_report

drivers_license
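
One way to keep a single batched call content-based is to key documents by an opaque ID and validate the reply against the six labels above. The sketch below makes no model call; the prompt wording, the JSON reply shape, the "unknown" fallback, and the function names are all assumptions for illustration.

```python
import json

DOC_TYPES = {
    "insurance_application", "driver_list", "equipment_list",
    "loss_run", "ifta_report", "drivers_license",
}

def build_batch_prompt(docs: dict[str, str], max_chars: int = 2000) -> str:
    """Build one prompt covering every document; filenames never appear in it."""
    parts = [
        "Classify each document by its content. Reply with a JSON object "
        "mapping document ID to one of: " + ", ".join(sorted(DOC_TYPES)) + "."
    ]
    for doc_id, text in docs.items():
        parts.append(f"--- {doc_id} ---\n{text[:max_chars]}")
    return "\n\n".join(parts)

def parse_labels(raw_reply: str, doc_ids: list[str]) -> dict[str, str]:
    """Accept only known labels; anything else becomes 'unknown' for review."""
    try:
        parsed = json.loads(raw_reply)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    labels = {}
    for doc_id in doc_ids:
        label = parsed.get(doc_id)
        valid = isinstance(label, str) and label in DOC_TYPES
        labels[doc_id] = label if valid else "unknown"
    return labels
```

Validating the reply means a malformed or off-list answer degrades to a flag rather than a silent misclassification.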

Data Extraction

Hybrid: Python regex ($0 cost) + Gemini Vision (images only)

Excel/CSV — $0

PDF text — $0

CDL images — Vision

Name deduplication
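
A rough sketch of the zero-cost path: regex extraction from text plus name deduplication. The patterns assume the standard XX-XXXXXXX FEIN layout and a "DOT" label followed by 5 to 8 digits; the normalization rules ("Last, First" reordering, case and punctuation folding) are illustrative assumptions, not the system's exact logic.

```python
import re

FEIN_RE = re.compile(r"\b\d{2}-\d{7}\b")
# Assumed label formats: "DOT 123456", "USDOT# 1234567", "DOT Number: 1234567"
DOT_RE = re.compile(
    r"\b(?:US\s*)?DOT\s*(?:#|No\.?|Number)?\s*:?\s*(\d{5,8})\b", re.IGNORECASE
)

def extract_ids(text: str) -> dict[str, list[str]]:
    """Pull FEIN and DOT candidates out of already-extracted text."""
    return {"fein": FEIN_RE.findall(text), "dot": DOT_RE.findall(text)}

def normalize_name(name: str) -> str:
    name = name.strip()
    if "," in name:  # "Doe, John" -> "John Doe"
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    # Lowercase, drop punctuation and digits, collapse whitespace.
    return " ".join(re.sub(r"[^a-z\s]", "", name.lower()).split())

def dedupe_names(names: list[str]) -> list[str]:
    """Keep the first spelling seen for each normalized name."""
    seen: dict[str, str] = {}
    for name in names:
        key = normalize_name(name)
        if key and key not in seen:
            seen[key] = name.strip()
    return list(seen.values())
```

For example, "John Doe", "DOE, JOHN", and " john  doe " collapse to a single driver under these rules.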

AI Risk Analysis

4 parallel Gemini calls for comprehensive risk assessment

Company risk

Driver risk

Fleet risk

Financial risk

Rules Engine + Conflict Detection

44 implemented rules across 7 categories, including cross-document conflict detection. Conditional rules fire only when relevant data is present (26–29 per submission).

Eligibility · 7

Driver · 7

Exposure · 6 (+1 roadmap)

Submission · 9

IFTA · 4

Selective · 3

Venture · 2

Conflict · 6
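
The "conditional rules fire only when relevant data is present" behavior could be modeled with a registry in which each rule declares the data it needs. This is a sketch under assumptions: the Rule shape, the submission dictionary keys, and the two check functions are hypothetical; only the rule IDs come from the registry.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass(frozen=True)
class Rule:
    rule_id: str
    requires: tuple[str, ...]               # keys that must be present to fire
    check: Callable[[dict], Optional[str]]  # finding message, or None if passed

def check_dot(sub: dict) -> Optional[str]:
    return None if sub.get("dot_number") else "MC/DOT number missing"

def check_min_age(sub: dict) -> Optional[str]:
    under = [d["name"] for d in sub["drivers"] if d["age"] < 23]
    return f"Drivers under 23: {', '.join(under)}" if under else None

REGISTRY = [
    Rule("SUB-002", (), check_dot),                 # always evaluated
    Rule("DRV-003", ("drivers",), check_min_age),   # needs driver data
]

def evaluate(submission: dict) -> tuple[list[str], dict[str, str]]:
    evaluated: list[str] = []
    findings: dict[str, str] = {}
    for rule in REGISTRY:
        if any(submission.get(key) is None for key in rule.requires):
            continue  # conditional rule: skipped when its data is absent
        evaluated.append(rule.rule_id)
        message = rule.check(submission)
        if message:
            findings[rule.rule_id] = message
    return evaluated, findings
```

Skipping a rule is not the same as passing it: in this sketch a missing driver list would still need a separate submission rule (such as SUB-006) to flag the gap.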

Decision + Team Routing

Automated accept/refer/decline with intelligent team assignment based on triggered rules

ACCEPT

Standard Review

REFER

Specialty / Driver / Ops

DECLINE

Senior Underwriting
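
A deterministic decision step can be as small as "the most severe triggered outcome wins". The sketch below assumes that precedence (DECLINE over REFER over ACCEPT) and assumes each triggered rule already carries its own outcome; neither detail is confirmed by the source.

```python
from enum import IntEnum

class Outcome(IntEnum):
    ACCEPT = 0
    REFER = 1
    DECLINE = 2

def decide(triggered: dict[str, Outcome]) -> Outcome:
    """Return the most severe outcome among triggered rules; ACCEPT if none fired."""
    return max(triggered.values(), default=Outcome.ACCEPT)
```

Because the function is pure and has no model in the loop, the same set of triggered rules always produces the same decision.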

Business Rules · 7 Categories

Derived from Knight Specialty Insurance underwriting guidelines. The registry holds 44 implemented rules; conditional rules fire only when relevant data is present, so 26–29 evaluate per submission.

Eligibility

7 rules

ELIG-001	Target risk: semi-trucks only — dump/straight/tow trucks ineligible
ELIG-002	Ineligible vehicle types — explicit prohibited list
ELIG-003	Available states check — 13 approved states only → DECLINE if non-covered
ELIG-004	Texas: north of I-10 — checks 12 known cities south of I-10; falls back to WARNING for unlisted cities
ELIG-005	Illinois: selective basis only — REFER, never auto-DECLINE
ELIG-006	Auto liability deductibles not allowed
ELIG-007	Auto physical damage not available
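
ELIG-004's list-plus-fallback behavior might look like the sketch below. The four cities are an illustrative subset that does lie south of I-10, not the system's actual list of 12, and the outcome strings are placeholder names.

```python
# Illustrative subset only; the real list of 12 cities is not reproduced here.
SOUTH_OF_I10 = {"laredo", "mcallen", "brownsville", "corpus christi"}

def check_texas_i10(state: str, city: str) -> str:
    """Return 'FAIL' for a known southern city, 'WARNING' for any unlisted
    Texas city (a human must verify), or 'NOT_APPLICABLE' outside Texas."""
    if state.strip().upper() != "TX":
        return "NOT_APPLICABLE"
    if city.strip().lower() in SOUTH_OF_I10:
        return "FAIL"
    return "WARNING"  # location cannot be confirmed without geocoding
```

Note that even a clearly northern city such as Dallas returns WARNING here, which matches the stated design of routing unconfirmed locations to a human rather than auto-passing them.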

Driver

7 rules

DRV-001	Valid CDL required for all drivers
DRV-002	Minimum 2 years CDL experience
DRV-003	Minimum age 23
DRV-004	DOT medical exam for age 65+
DRV-005	Max 6 points on MVR in 3 years
DRV-006	Max 4 points on MVR in 12 months
DRV-100	Unacceptable history: DUI, reckless, hit-run, felony = auto-decline
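
The two MVR point limits (DRV-005 and DRV-006) are windowed sums. A minimal sketch, assuming violations arrive as (date, points) pairs and that "3 years" is approximated as 3 × 365 days:

```python
from datetime import date, timedelta

def points_within(violations: list[tuple[date, int]], as_of: date, days: int) -> int:
    """Sum points for violations dated inside the look-back window."""
    cutoff = as_of - timedelta(days=days)
    return sum(points for when, points in violations if cutoff <= when <= as_of)

def check_mvr(violations: list[tuple[date, int]], as_of: date) -> list[str]:
    findings = []
    if points_within(violations, as_of, 3 * 365) > 6:   # DRV-005: max 6 in 3 years
        findings.append("DRV-005")
    if points_within(violations, as_of, 365) > 4:       # DRV-006: max 4 in 12 months
        findings.append("DRV-006")
    return findings
```

"Max" is read as inclusive in this sketch, so exactly 6 points in 3 years (or 4 in 12 months) still passes.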

Exposure

6 implemented + 1 roadmap

EXP-001	Hazardous materials hauling prohibited
EXP-002	Lithium battery cargo prohibited
EXP-003	Mexico border: 50-mile restriction — keyword scan on email body + application text
EXP-004	SAFER violations — not yet implemented (requires FMCSA API integration)
EXP-005	Towing/recovery operations prohibited
EXP-006	Intermodal/container hauling prohibited
EXP-007	Waste disposal operations prohibited

Submission

9 rules

SUB-001	FEIN/SSN required
SUB-002	MC/DOT number required
SUB-003	Current loss runs required (within 60 days)
SUB-004	3 prior years of loss history
SUB-005	4 IFTA quarters required
SUB-006	Driver list document required
SUB-007	Equipment schedule document required
SUB-008	Driver license (CDL) images required
SUB-009	Filename-content consistency check
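
SUB-003 and SUB-005 both reduce to date arithmetic. The sketch below assumes "4 IFTA quarters" means the four completed quarters before the evaluation date; that reading, and the function names, are assumptions.

```python
from datetime import date

def is_current(valued_on: date, as_of: date, max_age_days: int = 60) -> bool:
    """SUB-003: loss runs must be valued within the last 60 days."""
    return 0 <= (as_of - valued_on).days <= max_age_days

def last_four_quarters(as_of: date) -> list[tuple[int, int]]:
    """Return (year, quarter) for the four quarters before as_of's quarter."""
    year, quarter = as_of.year, (as_of.month - 1) // 3 + 1
    result = []
    for _ in range(4):
        quarter -= 1
        if quarter == 0:
            year, quarter = year - 1, 4
        result.append((year, quarter))
    return result

def missing_quarters(provided: set[tuple[int, int]], as_of: date) -> list[tuple[int, int]]:
    """SUB-005: list any expected quarter with no IFTA report."""
    return [q for q in last_four_quarters(as_of) if q not in provided]
```

Returning the specific missing quarters, rather than a bare pass/fail, gives the underwriter something concrete to request from the agent.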

IFTA

4 rules

IFTA-001	Fleet MPG validation (4.0–9.0 range)
IFTA-002	Company name consistency across quarters
IFTA-003	IFTA name matches application
IFTA-004	Non-covered states flagging
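
IFTA-001 is a simple ratio check: fleet MPG is total miles divided by total gallons, and must fall within 4.0 to 9.0. A minimal sketch follows; the function names and message wording are assumptions, and treating zero gallons as a finding is a choice made here, not a confirmed behavior.

```python
from typing import Optional

def fleet_mpg(total_miles: float, total_gallons: float) -> Optional[float]:
    if total_gallons <= 0:
        return None  # cannot compute; caller should flag rather than pass
    return total_miles / total_gallons

def check_ifta_001(miles: float, gallons: float,
                   low: float = 4.0, high: float = 9.0) -> Optional[str]:
    mpg = fleet_mpg(miles, gallons)
    if mpg is None:
        return "IFTA-001: gallons missing or zero, cannot compute MPG"
    if not low <= mpg <= high:
        return f"IFTA-001: fleet MPG {mpg:.2f} outside {low}-{high}"
    return None
```

For instance, 120,000 miles on 18,500 gallons is about 6.49 MPG and passes, while 50,000 miles on 4,000 gallons is 12.5 MPG and is flagged.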

Selective + Venture

5 rules

SEL-001	Box truck/van minimum premium $250K
SEL-002	Box trucks/vans: minimum premium validation
SEL-003	Power unit minimum: 20 units if premium is under $13K per unit
VENT-001	New venture: 2 years CDL experience required
VENT-002	Corporation: underwriter review required

Conflict Detection

6 rules

CON-001	Vehicle count mismatch across documents
CON-002	Driver count mismatch across documents
CON-003	Company name inconsistency
CON-004	FEIN/DOT number conflict
CON-005	Duplicate CDL numbers detected
CON-006	Duplicate VINs detected
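
Duplicate detection (CON-005, CON-006) and count mismatches (CON-001, CON-002) can be sketched with a counter and a set comparison. The identifiers in the usage note are made-up placeholders, and the normalization (trim, uppercase) is an assumption.

```python
from collections import Counter
from typing import Optional

def duplicates(values: list[str]) -> list[str]:
    """Return identifiers (CDL numbers, VINs) that appear more than once."""
    counts = Counter(v.strip().upper() for v in values if v and v.strip())
    return sorted(v for v, n in counts.items() if n > 1)

def count_mismatch(counts_by_doc: dict[str, int]) -> Optional[dict[str, int]]:
    """Return every document's count when they disagree, else None.
    The conflicting values are surfaced as-is; no winner is picked."""
    if len(set(counts_by_doc.values())) > 1:
        return dict(counts_by_doc)
    return None
```

Calling duplicates(["VIN-A", "vin-a ", "VIN-B"]) reports "VIN-A", and count_mismatch({"insurance_application": 12, "equipment_list": 10}) returns both counts for the underwriter to reconcile.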

Team Routing

Standard Review

Clean submissions

Specialty Risk

Hazmat, border, excess

Driver Review

Violations, age, CDL

Operations

Missing docs, conflicts

Senior UW

Declined submissions
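
Routing on triggered rules could be a lookup by rule-ID prefix. The mapping below is inferred from the team descriptions above and is an assumption, as are the default team for unmapped prefixes and the fallback when a referral has no triggered rules.

```python
# Assumed mapping from rule-ID prefix to reviewing team.
TEAM_BY_PREFIX = {
    "EXP": "Specialty Risk",   # hazmat, border, excess
    "DRV": "Driver Review",    # violations, age, CDL
    "SUB": "Operations",       # missing documents
    "CON": "Operations",       # cross-document conflicts
}
DEFAULT_REFER_TEAM = "Specialty Risk"  # hypothetical default for other prefixes

def route(decision: str, triggered_rule_ids: list[str]) -> list[str]:
    if decision == "DECLINE":
        return ["Senior UW"]
    if decision == "ACCEPT":
        return ["Standard Review"]
    teams: list[str] = []
    for rule_id in triggered_rule_ids:
        prefix = rule_id.split("-", 1)[0]
        team = TEAM_BY_PREFIX.get(prefix, DEFAULT_REFER_TEAM)
        if team not in teams:  # preserve first-seen order, no repeats
            teams.append(team)
    return teams or ["Standard Review"]
```

A referral that trips both a driver rule and a conflict rule would land with two teams under this sketch; whether the real system assigns one team or several is not stated.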

Deployment Architecture

AWS EC2 · Docker · Nginx · Let's Encrypt · Gmail API

Compute

EC2 t2.medium · 2 vCPU, 4GB RAM · Ubuntu 24.04 · Docker Compose

AWS

Reverse Proxy

Nginx · HTTPS/TLS 1.3 · HTTP/2 · Let's Encrypt auto-renewing cert · HSTS headers

Port 443

Frontend Container

Next.js 16 · Node 18 Alpine · Port 3000 · Auth Gate with HMAC tokens

app_frontend

Backend Container

FastAPI · Uvicorn ASGI · Port 8000 · Session auth middleware · Rate limiting

app_backend

Email Intake

Gmail API with OAuth2 · Polls every 5s · Auto-extract attachments · Creates submissions

Integrated

Database

SQLAlchemy ORM · SQLite (dev) · 6 tables · Docker bind mount for persistence

Postgres-Ready

File Storage

Local filesystem with Docker volumes · Uploaded documents and extracted data

Persistent

AI Service

Gemini 2.5 Flash · Vision + Text · ~$0.05–0.09 per submission

Pay Per Use

Domain & DNS

knight.outreachbenefits.online · Namecheap DNS · A record → EC2

HTTPS 🔒

Design Decisions

Key architectural choices and the reasoning behind them

AI Extracts, Deterministic Code Decides

Gemini classifies documents and extracts data; a 100% deterministic Python rules engine makes the accept/refer/decline call — no model in the decision path. Every outcome is auditable, reproducible, and explainable to a regulator or a disputing broker.

Content-Based Classification

Documents are typed by content, never filename — so a roster saved as "LossRuns.pdf" or a scan named "scan001.pdf" is still classified correctly.

Hybrid Extraction Strategy

Structured docs (Excel, CSV, text PDFs) parse with Python at $0 API cost; only images like CDLs use Gemini Vision — roughly 70% lower AI spend than an all-LLM approach.

Rules Mapped to the Appetite Guide

A canonical registry of 45 rules across 7 categories (44 implemented, 1 on the roadmap), each mapped to a specific line in Knight's guidelines. Conditional rules fire only when relevant data is present, so 26–29 evaluate per submission.

Human-in-the-Loop by Design

Nothing auto-approves. An accept means "eligible pending final approval." An underwriter approves, rejects, or overrides every decision — logged with notes and a timestamp.

Live-Validated Decisions

Tested with 8 end-to-end live submissions. 7 were verified against fresh submission IDs: border decline, towing decline, under-23 decline, duplicate-CDL refer, Illinois refer, senior-driver refer, and clean accept. 1 remains open because of a document-extraction gap, not a rule error.

Cross-Document Conflict Detection

Validates vehicle and driver counts, company names, FEIN/DOT numbers, and duplicate CDLs and VINs across every document — surfacing the conflicting values rather than silently picking one.

Deliberate Human-Routing Where Automation Isn't Reliable

Two checks intentionally flag for underwriter review instead of faking automation: SAFER violations (needs an FMCSA integration) and Texas I-10 geofencing beyond known border cities. Better to say "a human must verify" than to auto-decide on data the system can't confirm.

Dual Intake Channels

Web upload form plus Gmail API over OAuth2 — agents can email submissions straight from their inbox, with no stored mail passwords.

Security & Governance

Server-side authentication · encrypted transport · audit trail

HTTPS/TLS 1.3 Encryption

Certificate: Let's Encrypt CA-signed, 90-day certificates renewed automatically before expiry
Protocol: TLS 1.2 + 1.3, strong cipher suite (ECDHE+AES-GCM, CHACHA20)
Headers: HSTS (1 year), X-Frame-Options DENY, X-Content-Type-Options nosniff
Domain: knight.outreachbenefits.online

Server-Side Password Authentication

Token: HMAC-SHA256 signed session tokens (stateless, no database)
Comparison: Constant-time password comparison (prevents timing attacks)
Rate Limiting: 5 login attempts per minute per IP address
Error Messages: Generic "Invalid credentials" (no information leakage)
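
A stateless HMAC-SHA256 session token with constant-time checks can be sketched with the standard library alone. The token layout (base64 payload, dot, base64 signature), the claim names, and the function names are assumptions; the real token format is not documented here.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")

def issue_token(secret: bytes, subject: str, ttl_seconds: int = 3600,
                now: Optional[float] = None) -> str:
    issued = time.time() if now is None else now
    payload = json.dumps({"sub": subject, "exp": int(issued) + ttl_seconds}).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(signature)}"

def verify_token(secret: bytes, token: str, now: Optional[float] = None) -> Optional[str]:
    """Return the subject if the signature is valid and unexpired, else None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong part count or invalid base64
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        return None
    claims = json.loads(payload)  # parsed only after the signature checks out
    current = time.time() if now is None else now
    return claims["sub"] if current < claims["exp"] else None

def password_matches(supplied: str, expected: str) -> bool:
    """Constant-time comparison to avoid leaking match length via timing."""
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

Because the expiry travels inside the signed payload, the server needs no session table; the trade-off is that a token cannot be revoked before it expires without rotating the secret.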

Network Security

Nginx Proxy: Only ports 80/443 exposed — frontend (3000) and backend (8000) are internal only
Proxy Headers: X-Real-IP, X-Forwarded-For, X-Forwarded-Proto passed to backend
EC2 Security Group: Ports 22 (SSH), 80, 443 only — no direct access to 3000/8000

Complete Audit Trail

Every action logged to audit_logs table: uploads, classification, extraction, AI calls, rules evaluation, decisions, and reviews.

Data Encryption

In transit: HTTPS (TLS 1.2 or 1.3) for all traffic
Email: Gmail API over OAuth2 (no stored passwords)
Credentials: Docker environment variables, never in source control

Human-in-the-Loop Review

No auto-approval of any submission. Underwriter must explicitly Approve, Reject, or Override with notes and timestamp.

AI Cost Transparency

System tracks input tokens, output tokens, API cost, and call count for every submission processed.
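
Per-submission tracking of these four figures might be accumulated as below. The class name and the per-million-token prices are placeholders supplied by the caller; no actual Gemini pricing is assumed.

```python
from dataclasses import dataclass

@dataclass
class UsageTracker:
    input_price_per_million: float    # USD per 1M input tokens (caller-supplied)
    output_price_per_million: float   # USD per 1M output tokens (caller-supplied)
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one API call's token usage to the running totals."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        self.calls += 1

    @property
    def cost_usd(self) -> float:
        return (self.input_tokens * self.input_price_per_million
                + self.output_tokens * self.output_price_per_million) / 1_000_000
```

One tracker instance per submission keeps the totals separable, so cost can be reported alongside the decision in the audit trail.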

Production Scalability Path

Database: SQLite → PostgreSQL (RDS)
Compute: EC2 → ECS/Fargate horizontal scaling
AI: Gemini API auto-scales with built-in rate limiting
Queue: Add SQS or Redis for async processing at scale

Assumptions

Grounded in the appetite guide and underwriting guidelines — the two source documents

Scope: No External Lookups

The system evaluates only what's in the submission package. SAFER violations are in the guidelines, but SAFER data isn't in the attachments — so I assumed no external lookups, and that check routes to a human instead.

Authority: Recommends, Never Binds

The system recommends; it never binds. Every decision, including declines, is a recommendation an underwriter confirms. Nothing auto-approves.

Illinois: Selective, Not Declined

IL isn't in the 13-state list but the footnote says "selective basis," so I assumed it refers to an underwriter rather than auto-declining as a non-listed state.

Non-Covered States: Domicile vs. Transit

"Not eligible": I assumed a company based or domiciled in a non-covered state is a critical decline, while incidental through-mileage in one is flagged for review rather than automatically declined.

"Examples Include…" = Non-Exhaustive

The unacceptable-driver-history and prohibited-exposure lists are explicitly examples, so I treated them as non-exhaustive — named items are hard rules, novel cases route to a human rather than passing silently.

Conflicting Documents: Surface, Don't Pick

When the application, MVR, and license disagree, I assumed the system surfaces the conflict with all of the conflicting values and lowers confidence to force review, rather than auto-picking a winner.

Texas I-10: City-Based With Human Fallback

I assumed the garaging/operating city is the basis for the north-of-I-10 check, with unconfirmed locations routed to a human pending full geocoding.

Power Unit = Tractors Only

For the per-unit premium and 20-unit-minimum rules, I assumed power units mean tractors only, not trailers.

Box Truck Contradiction: Surface for Judgment

Straight trucks are ineligible, but box trucks appear under selective exposures — and a box truck can be a straight truck. I assumed that tension surfaces for underwriter judgment rather than the system silently resolving it.

Completeness: Missing = Flagged, Never Clean

"Valued within 60 days" and "4 most recent IFTA quarters" are checks the system enforces, and a missing or unreadable document makes a submission incomplete and flagged — never silently treated as clean.