February 28, 2026

AI in Production: Lessons from Building AI Products for Regulated Industries

By Raiden Stack

Building AI features for a demo is easy. Building AI products that operate in production for regulated industries, where a wrong answer can have legal consequences, is a fundamentally different problem. We've been doing it for over a year now, and the lessons we've learned are not the ones the AI hype cycle talks about.

The most important lesson: AI without grounding is a liability. When we built RuleWize, our AI-powered employee handbook generator, we quickly learned that sending a prompt like "write an anti-discrimination policy for a New Mexico business" produces content that sounds professional but may be legally inaccurate. The model might reference federal law correctly but miss New Mexico's broader protected classes. It might generate a drug testing policy that ignores state-specific employee protections. It might cite a regulation that was amended or repealed.

The solution was building a structured legal knowledge base: a curated database of employment law organized by state, topic, and applicability. Every entry includes the actual statute citation, a plain-English summary of what the law requires, the effective date, and the employee count thresholds that determine whether the law applies. When the AI generates a section, it receives the relevant knowledge base entries as context. It's not relying on what it learned during training. It's working from verified, current legal facts.
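To make this concrete, here's a minimal sketch of what a knowledge base entry and the grounding step might look like. The field names, `LegalEntry` type, and the example citations are our own illustrative assumptions, not the actual RuleWize schema:

```python
from dataclasses import dataclass

@dataclass
class LegalEntry:
    citation: str        # statute citation (values below are invented examples)
    summary: str         # plain-English summary of the requirement
    effective_date: str  # date the law took effect
    min_employees: int   # employee-count threshold for applicability
    state: str
    topic: str

def relevant_entries(kb, state, topic, employees):
    """Select only the entries that actually apply to this business."""
    return [e for e in kb
            if e.state == state and e.topic == topic and employees >= e.min_employees]

def grounded_prompt(entries, topic):
    """Build a prompt where the model works from verified facts, not training data."""
    facts = "\n".join(
        f"- {e.citation} (effective {e.effective_date}): {e.summary}" for e in entries)
    return (f"Write the '{topic}' handbook section. "
            f"Base it only on these verified legal facts:\n{facts}")
```

The key property is that the model never has to recall a statute from training: everything legally load-bearing arrives in the prompt from the curated database.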

This is what we mean by AI-as-architecture versus AI-as-feature. AI-as-feature is bolting a chatbot onto an existing product. AI-as-architecture is designing the entire system around AI's strengths (natural language generation, classification, analysis) while compensating for its weaknesses (hallucination, outdated training data, lack of domain expertise) through structured inputs and verification layers.

The section-by-section generation approach was another critical design decision. Early prototypes tried to generate entire documents in a single prompt. The output was coherent but shallow. Specific legal requirements were glossed over. Cross-references between sections were inconsistent. By generating each section independently with a tailored prompt that includes only the relevant legal context, we got dramatically better output. Each section is essentially a specialist prompt that knows exactly what legal requirements apply to that specific topic for that specific business in that specific state.
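A sketch of that loop, under assumed names: `call_model` stands in for the actual API call, and the dict keys on the knowledge base entries are illustrative, not the real schema:

```python
def generate_handbook(topics, state, employees, kb, call_model):
    """Generate each section independently, with only its relevant legal context.

    `kb` is a list of dicts with hypothetical keys (state, topic,
    min_employees, summary); `call_model` is a stand-in for the model API.
    """
    handbook = {}
    for topic in topics:
        # Filter the knowledge base down to entries for this one topic.
        entries = [e for e in kb
                   if e["state"] == state and e["topic"] == topic
                   and employees >= e["min_employees"]]
        context = "\n".join(f"- {e['summary']}" for e in entries)
        prompt = (f"Write the '{topic}' section for a {state} business "
                  f"with {employees} employees.\nRelevant law:\n{context}")
        handbook[topic] = call_model(prompt)  # one tailored prompt per section
    return handbook
```

Each iteration is the "specialist prompt" described above: narrow scope, narrow context, one topic at a time.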

Cost optimization in production is a real concern that rarely gets discussed. Our generation pipeline makes dozens of AI calls per handbook. At scale, costs add up. We implemented a tiered model strategy: Anthropic's Haiku for classification and simple tasks, Sonnet for complex generation. A single handbook generation costs less than $1 in API fees. That's the difference between a sustainable SaaS product and one that bleeds money with every customer.
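The routing itself can be trivially simple. A sketch, with placeholder model identifiers (the real Anthropic model IDs differ) and an assumed task taxonomy:

```python
# Placeholder model identifiers -- not the actual Anthropic API model names.
HAIKU = "haiku"    # cheap, fast: classification and simple tasks
SONNET = "sonnet"  # more capable: complex generation

SIMPLE_TASKS = {"classify", "extract", "route"}

def pick_model(task_kind: str) -> str:
    """Route each pipeline step to the cheapest model that can handle it."""
    return HAIKU if task_kind in SIMPLE_TASKS else SONNET
```

The leverage comes from volume: in a pipeline that makes dozens of calls per handbook, sending the high-frequency simple calls to the cheap tier is what keeps the per-handbook cost under a dollar.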

The automated monitoring pipeline taught us the value of AI for classification, not just generation. The pipeline scrapes legal sources daily and uses AI to classify whether a detected change affects employment law, determine which handbook sections are impacted, and assess the urgency of the change. This classification step is where smaller, faster models excel. You don't need the most capable model to answer "does this legislative update affect anti-discrimination policy?" You need speed, accuracy, and low cost at scale.
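A sketch of that classification step: the prompt shape and JSON schema are assumptions, and `call_model` again stands in for the small-model API call:

```python
import json

def classify_change(change_text: str, call_model) -> dict:
    """Ask a small, fast model the three classification questions:
    does it affect employment law, which sections, how urgent.

    Assumes `call_model` returns a JSON string in the requested shape.
    """
    prompt = (
        "Classify this detected legal change. Respond with JSON only:\n"
        '{"affects_employment_law": true|false,\n'
        ' "impacted_sections": ["..."],\n'
        ' "urgency": "low"|"medium"|"high"}\n\n'
        f"Change:\n{change_text}"
    )
    return json.loads(call_model(prompt))
```

In production you'd also validate the parsed JSON against the expected schema, since even structured-output prompts occasionally drift.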

Error handling in AI products requires a different mindset. Traditional software either works or throws an error. AI outputs exist on a spectrum of quality. We built confidence scoring into the generation pipeline. If the AI generates a section and the confidence is below a threshold (based on the specificity of the legal knowledge base entries available), the section is flagged for human review rather than automatically published. This prevents the worst-case scenario: a customer receiving a policy section that sounds authoritative but doesn't accurately reflect current law.
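A minimal sketch of that routing decision. The scoring heuristic here (fraction of available entries carrying a concrete citation) is our own stand-in for "specificity of the legal knowledge base entries," not the actual scoring formula:

```python
def route_section(section_text, kb_entries, threshold=0.7):
    """Score confidence from the specificity of the supporting KB entries
    and flag low-confidence sections for human review instead of publishing.

    Heuristic (an assumption): the fraction of entries with a concrete citation.
    """
    if not kb_entries:
        confidence = 0.0  # no grounding at all -> never auto-publish
    else:
        cited = sum(1 for e in kb_entries if e.get("citation"))
        confidence = cited / len(kb_entries)
    status = "publish" if confidence >= threshold else "human_review"
    return confidence, status
```

The important design choice is the failure direction: when grounding is thin, the system degrades to a human-review queue, never to silently publishing an ungrounded section.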

For anyone building AI products in regulated industries, here's the short version: ground everything in verified data, generate in focused segments rather than monolithic outputs, implement confidence scoring and human review triggers, optimize costs with model tiering, and never forget that your customers are relying on your output for legal compliance. The bar is not "does it sound right?" The bar is "is it actually right?"