I was wondering which tool you use internally to manage your prompts and run them against test samples, and also how you manage the whole prompt lifecycle.
More specifically, since we expect certain answers from the agents based on the chat history / memory, how do you test prompts for the outputs expected after various conversation scenarios?
Thank you and have a great day.