By using two agents to break down a prompt, generate a chain of thought, and verify the output of each subproblem, we saw accuracy and safety improvements over a one-shot approach. The design was inspired by OpenAI's recent o1 model and by research on multi-agent LLMs. The project currently uses two instances of Google's Gemini 1.5 Flash model, but the approach generalizes to any number and type of models.
Learn more in our presentation.
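
The two-agent loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It assumes one agent both decomposes the prompt and solves each subproblem while the second agent only verifies; the real division of roles may differ. The `Agent` type, `StepResult`, `decompose`, `solve_with_verification`, the prompt wording, and the ACCEPT/REJECT convention are all hypothetical. Agents are modeled as plain callables so that any model client (Gemini or otherwise) could be wrapped behind them.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: an agent maps prompt text to reply text.
# A real agent would wrap a call to a model API behind this signature.
Agent = Callable[[str], str]


@dataclass
class StepResult:
    subproblem: str
    answer: str
    verified: bool  # False if the verifier never accepted an answer


def decompose(solver: Agent, prompt: str) -> List[str]:
    """Ask the solver agent to split the task into one subproblem per line."""
    reply = solver("Break this task into subproblems, one per line:\n" + prompt)
    return [line.strip() for line in reply.splitlines() if line.strip()]


def solve_with_verification(
    prompt: str, solver: Agent, verifier: Agent, max_retries: int = 2
) -> List[StepResult]:
    """Build a chain of verified steps; retry a step when the verifier rejects it."""
    chain: List[StepResult] = []
    for sub in decompose(solver, prompt):
        # Earlier results are passed along so each step can build on them.
        context = "\n".join(f"{r.subproblem}: {r.answer}" for r in chain)
        answer, verified = "", False
        for _ in range(max_retries + 1):
            answer = solver(
                f"Earlier steps:\n{context}\n\nSolve this subproblem:\n{sub}"
            )
            verdict = verifier(
                f"Subproblem:\n{sub}\n\nProposed answer:\n{answer}\n\n"
                "Reply ACCEPT or REJECT."
            )
            verified = verdict.strip().upper().startswith("ACCEPT")
            if verified:
                break
        # Keep the last attempt even if unverified, but record that fact.
        chain.append(StepResult(sub, answer, verified))
    return chain
```

Keeping the agents as callables means the same loop works with two instances of one model or with two different models, and more verifiers could be added by combining their verdicts before the `verified` check.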