Quantum-Software-Development/Q-Star

Q-Star (Q*) in Reinforcement Learning

Author: Miquel Noguer i Alonso - Founder at AI Finance Institute
Date: November 23, 2023

Q* is the currently accepted notation for the optimal action-value function in reinforcement learning (RL).
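As a concrete illustration of how Q* is approximated in practice, below is a minimal tabular Q-learning sketch on a toy 1-D chain environment. The environment, hyperparameters, and reward scheme are illustrative assumptions, not part of the original text.

```python
# Minimal tabular Q-learning sketch approximating Q* on a toy chain MDP.
# Environment and hyperparameters are assumptions for illustration only.
import random

N_STATES = 5          # states 0..4; reaching state 4 ends the episode
GAMMA = 0.9           # discount factor
ALPHA = 0.5           # learning rate
ACTIONS = [-1, +1]    # move left / move right

def step(s, a):
    """Deterministic transition; reward 1 for reaching the rightmost state."""
    s2 = max(0, min(N_STATES - 1, s + a))
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    done = s2 == N_STATES - 1
    return s2, reward, done

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

random.seed(0)
for _ in range(500):                       # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < 0.1:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward the Bellman target
        target = r + (0.0 if done else GAMMA * max(Q[(s2, x)] for x in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# Greedy policy with respect to the learned Q
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # every non-terminal state should prefer +1 (move right)
```

The update rule drives each Q(s, a) toward the right-hand side of the Bellman optimality equation, so the table converges to Q* for this small MDP.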

The rumored Q* RL algorithm might use AI-generated data (logic and mathematics) to teach an LLM to solve multi-step logic problems. If applied to GPT-5, Q* could give the model strong reasoning and retrieval skills.

Reasoning

The biggest gains on reasoning come from strong reward models, as opposed to more supervised fine-tuning (SFT) data or tools.

Much of the (unpublished) research is now focused on finding a general planning algorithm for LLMs, i.e. some equivalent of the dorsolateral prefrontal cortex (dlPFC). So PLANNING is the name of the game.

Maths

In the literature, we have seen different approaches to teaching mathematics to AI models: Transformers combined with beam search, and large language models that solve tasks requiring complex multi-step reasoning by generating solutions in a step-by-step chain-of-thought format.
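The beam-search approach above can be sketched as follows. This is a toy illustration: the candidate generator and the scoring function are stand-ins (assumptions) for a Transformer's next-step proposals and a learned reward model.

```python
# Illustrative beam search over sequences of reasoning steps.
# candidates() and score() are hypothetical stand-ins for a model's
# next-step proposals and a reward model's scoring, respectively.

def candidates(partial):
    """Hypothetical next-step proposals for a partial solution."""
    return [partial + [tok] for tok in ("a", "b", "c")]

def score(partial):
    """Hypothetical reward-model score; here it simply counts 'a' steps."""
    return partial.count("a")

def beam_search(n_steps, beam_width=2):
    beams = [[]]
    for _ in range(n_steps):
        expanded = [c for b in beams for c in candidates(b)]
        # keep only the highest-scoring partial solutions
        beams = sorted(expanded, key=score, reverse=True)[:beam_width]
    return beams[0]

print(beam_search(3))  # with this toy scorer: ['a', 'a', 'a']
```

The key design choice is that pruning happens at every step, so the search cost stays linear in the number of steps rather than exponential.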

One effective method in the second involves training reward models to discriminate between desirable and undesirable outputs.

Abstract

This document gives a comprehensive overview of the Q-Star (Q*) concept in reinforcement learning: its mathematical formulation, its significance, and the methods employed to approximate it in learning algorithms.

Q* Bellman Equality
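The Bellman optimality equality that this section refers to states that the optimal action value equals the expected immediate reward plus the discounted value of acting optimally thereafter:

```latex
Q^*(s, a) = \mathbb{E}\left[\, r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \;\middle|\; s_t = s,\ a_t = a \,\right]
```

Here gamma is the discount factor, and the expectation is over the environment's transition dynamics.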

In the literature we see two distinct methods for training reward models: outcome supervision, which rewards only the final answer, and process supervision, which rewards each intermediate reasoning step.
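The contrast between the two supervision signals can be sketched as below. The step-correctness labels are assumed to come from a hypothetical verifier or human annotator; nothing here is a real reward-model implementation.

```python
# Sketch contrasting outcome supervision and process supervision for a
# chain-of-thought solution. Step labels are assumed to come from a
# hypothetical verifier or human annotator.

def outcome_reward(final_answer_correct):
    """Outcome supervision: one scalar reward for the final answer only."""
    return 1.0 if final_answer_correct else 0.0

def process_rewards(step_labels):
    """Process supervision: one reward per reasoning step."""
    return [1.0 if ok else 0.0 for ok in step_labels]

steps = ["expand the product", "collect terms", "divide both sides"]
step_labels = [True, False, True]   # assume the second step contains an error

print(outcome_reward(final_answer_correct=False))  # 0.0
print(process_rewards(step_labels))                # [1.0, 0.0, 1.0]
```

Process supervision gives the learner a localized signal: it identifies which step went wrong, rather than only that the final answer was wrong.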

Hodge-Riemann Cohomology Classes
