A/B testing models, prompts, performance and costs within the Pickaxe interface?

It would be cool to have a split interface within Pickaxe to A/B test models and prompts, comparing their performance and costs. See the wireframe below.

This is something you should look at, @inck! Very cool idea.