New updates and improvements
multi_turn_simulation() method
Separate score columns in the UI.
runAndEvaluate (JS/TS) and run_and_evaluate (Python), making it easier to integrate evaluations into your workflow.
We've made the SDK more flexible: testset_id is now optional in our helper methods, so you can run evaluations with custom inputs without a pre-defined testset. Additional improvements include:
runAndEvaluate in the JS/TS SDK and run_and_evaluate in the Python SDK let you evaluate systems against testcases and metrics.
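
To illustrate what an optional testset_id means for a helper of this kind, here is a minimal, self-contained Python sketch. It does not call the SDK: run_and_evaluate_sketch, STORED_TESTSETS, the testcases parameter, the testcase dictionary shape, and the exact_match metric are all hypothetical stand-ins chosen for illustration, so check the SDK reference for the real signature before relying on any of these names.

```python
from typing import Any, Callable, Optional

# Hypothetical in-memory stand-in for testsets stored on a server.
# This is NOT part of any SDK; it only makes the sketch runnable.
STORED_TESTSETS: dict[str, list[dict[str, Any]]] = {
    "demo-testset": [{"inputs": {"question": "2+2?"}, "expected": "4"}],
}


def run_and_evaluate_sketch(
    system: Callable[[dict[str, Any]], str],
    metrics: dict[str, Callable[[str, str], float]],
    testset_id: Optional[str] = None,
    testcases: Optional[list[dict[str, Any]]] = None,
) -> list[dict[str, Any]]:
    """Run `system` on each testcase and score the output with every metric.

    Assumed behaviour (not the SDK's documented contract): use the stored
    testset when `testset_id` is given, otherwise use the inline `testcases`.
    """
    if testset_id is not None:
        cases = STORED_TESTSETS[testset_id]
    elif testcases is not None:
        cases = testcases
    else:
        raise ValueError("provide either testset_id or testcases")

    results = []
    for case in cases:
        output = system(case["inputs"])
        scores = {
            name: metric(output, case["expected"])
            for name, metric in metrics.items()
        }
        results.append(
            {"inputs": case["inputs"], "output": output, "scores": scores}
        )
    return results


def exact_match(output: str, expected: str) -> float:
    """Toy metric: 1.0 when the output equals the expected answer."""
    return 1.0 if output.strip() == expected.strip() else 0.0


def toy_system(inputs: dict[str, Any]) -> str:
    """Toy system under test: only knows how to answer one question."""
    return "4" if "2+2" in inputs["question"] else "unknown"


# Custom inputs, no pre-defined testset required.
custom_results = run_and_evaluate_sketch(
    system=toy_system,
    metrics={"exact_match": exact_match},
    testcases=[
        {"inputs": {"question": "What is 2+2?"}, "expected": "4"},
        {"inputs": {"question": "Capital of France?"}, "expected": "Paris"},
    ],
)
```

Passing testset_id="demo-testset" instead of testcases runs the same loop over the stored cases, which is the pattern an optional testset_id makes possible: one helper, two sources of testcases.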