AI support prompt consistency issues
We’ve been building an AI support assistant for our SaaS product and recently started noticing a weird issue: small changes to the system prompt or tool instructions completely change the quality of responses. Sometimes the bot becomes overly verbose, sometimes it starts missing obvious intents, and we can no longer reliably reproduce “good” vs “bad” behavior. Right now we’re just manually tweaking prompts and hoping for the best, but it’s getting hard to track which change actually improved or broke things over time. What are teams using to properly test, compare, and optimize prompts in production systems without losing track of what changed between iterations?