🎉 Experiments is here!

We are thrilled to announce that Experiments is out of beta.

Experiments helps you tune your LLM prompts, test them on production data, and verify each iteration with quantitative results.

Analyze production edge cases to refine your application’s performance.

Benchmark new releases rigorously before rolling out to production environments.

Implement LLM-as-a-judge or custom evaluation metrics, then compare prompt variations side by side with a quick, actionable feedback loop.

Run evaluators on each candidate prompt to catch performance regressions, then choose the best prompt for production.
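
To make the workflow above concrete, here is a minimal sketch in plain Python of scoring two prompt variants with a custom evaluation metric and flagging regressions against a baseline. It does not use the Experiments API: every name in it (`exact_match`, `score_variant`, `compare`, the stubbed answers) is hypothetical, and the "model" is a lookup table standing in for a real LLM call.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (input, expected answer)

def exact_match(output: str, expected: str) -> float:
    """Custom metric (illustrative): 1.0 when the normalized strings match."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0

def score_variant(generate: Callable[[str], str],
                  dataset: List[Example],
                  evaluator: Callable[[str, str], float]) -> float:
    """Average evaluator score of one prompt variant over the dataset."""
    return mean(evaluator(generate(inp), exp) for inp, exp in dataset)

def compare(variants: Dict[str, Callable[[str], str]],
            dataset: List[Example],
            evaluator: Callable[[str, str], float],
            baseline: str):
    """Score every variant, pick the best, and list variants below the baseline."""
    scores = {name: score_variant(gen, dataset, evaluator)
              for name, gen in variants.items()}
    best = max(scores, key=scores.get)
    regressed = [name for name, s in scores.items() if s < scores[baseline]]
    return scores, best, regressed

# Hypothetical dataset standing in for sampled production inputs.
dataset = [("2+2", "4"), ("capital of France", "Paris"), ("3*3", "9")]

# Stubbed outputs standing in for two prompt variants calling an LLM.
baseline_answers = {"2+2": "4", "capital of France": "paris", "3*3": "6"}
candidate_answers = {"2+2": "4", "capital of France": "Paris", "3*3": "9"}

variants = {
    "baseline": lambda inp: baseline_answers[inp],
    "candidate": lambda inp: candidate_answers[inp],
}

scores, best, regressed = compare(variants, dataset, exact_match, baseline="baseline")
print(scores, best, regressed)
```

An LLM-as-a-judge evaluator would fit the same `evaluator(output, expected)` shape, with the function body calling a judge model instead of comparing strings.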

For detailed documentation, refer to our updated docs.