Q-Star (Q*) in Reinforcement Learning
Author: Miquel Noguer i Alonso - Founder at AI Finance Institute
Date: November 23, 2023
Q* is the standard notation for the optimal action-value function in RL.
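To make the definition concrete, here is a minimal sketch of how Q* can be computed for a toy problem. It runs Q-value iteration, i.e. repeated application of the Bellman optimality operator Q(s,a) ← R(s,a) + γ · max_a' Q(s',a'), on a tiny hand-made MDP; the MDP itself is an illustrative assumption, not anything from the rumored algorithm.

```python
import numpy as np

# Tiny deterministic 2-state MDP, chosen purely for illustration:
# state 0: action 0 stays in 0 (reward 0), action 1 moves to 1 (reward 1);
# state 1 is absorbing with reward 0 for both actions.
P = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}          # next state
R = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.0, (1, 1): 0.0}  # reward
gamma = 0.9

# Q-value iteration: apply the Bellman optimality operator
# Q(s,a) <- R(s,a) + gamma * max_a' Q(s',a') until convergence to Q*.
Q = np.zeros((2, 2))
for _ in range(200):
    Q_new = np.array([[R[s, a] + gamma * Q[P[s, a]].max()
                       for a in (0, 1)] for s in (0, 1)])
    if np.abs(Q_new - Q).max() < 1e-10:
        break
    Q = Q_new

print(Q)  # Q*(0,1) = 1.0: moving to state 1 is optimal in state 0
```

At the fixed point, Q*(0,1) = 1.0 and Q*(0,0) = γ · 1.0 = 0.9, so the greedy policy with respect to Q* picks the transition to state 1.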
The rumored Q* RL algorithm might use AI-generated data (logic and maths) to teach an LLM to solve multi-step logic problems. Q* might be applied to GPT-5, giving it excellent reasoning and retrieval skills.
The biggest gains in reasoning appear to come from strong reward models, rather than from more SFT data or tools.
Much of the (unpublished) research is now focused on finding a general planning algorithm for LLMs, i.e. some equivalent of the dorsolateral prefrontal cortex (dlPFC). So PLANNING is the name of the game.
In the literature, we have seen different approaches to teaching math to AI models: Transformers combined with beam search, and large language models that solve tasks requiring complex multi-step reasoning by generating solutions in a step-by-step chain-of-thought format.
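The first approach can be sketched as a generic beam search over partial step-by-step solutions. In the sketch below, `expand` and `score` are hypothetical stand-ins for a model's next-step proposals and a reward model's score of a partial solution; the toy usage at the end just builds digit strings so the search is easy to follow.

```python
import heapq

def beam_search(start, expand, score, beam_width=2, max_steps=3):
    """Keep the `beam_width` best partial solutions at each step."""
    beam = [(score(start), start)]
    for _ in range(max_steps):
        candidates = []
        for _, partial in beam:
            for nxt in expand(partial):
                candidates.append((score(nxt), nxt))
        if not candidates:
            break
        # prune to the top-scoring partial solutions
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return max(beam, key=lambda c: c[0])[1]

# Toy usage: grow digit strings; "score" is the numeric value so far.
best = beam_search(
    "",
    expand=lambda s: [s + d for d in "0123456789"] if len(s) < 3 else [],
    score=lambda s: int(s) if s else 0,
)
print(best)  # "999"
```

In the math-reasoning setting, `expand` would sample candidate next reasoning steps from the model and `score` would come from a trained reward model, but the pruning logic is the same.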
One effective method with the second approach involves training reward models to discriminate between desirable and undesirable outputs.
This document gives a comprehensive overview of the Q-Star (Q*) concept in reinforcement learning: its mathematical formulation, its significance, and the methods used to approximate it in learning algorithms.
In the literature, we see two distinct methods for training reward models: outcome supervision and process supervision.
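The difference between the two labeling schemes can be sketched as follows. This is an illustrative assumption about how training targets are built, not any published pipeline: outcome supervision assigns one label to the whole solution based on the final answer, while process supervision assigns a label to every reasoning step.

```python
def outcome_labels(steps, final_answer, correct_answer):
    # Outcome supervision: a single label for the whole solution,
    # based only on whether the final answer is correct.
    return [1.0 if final_answer == correct_answer else 0.0]

def process_labels(steps, step_is_valid):
    # Process supervision: one label per reasoning step, typically
    # obtained from annotators judging each step's validity.
    return [1.0 if step_is_valid(s) else 0.0 for s in steps]

def check_arithmetic(step):
    # Toy validity check: evaluate the left-hand side of "lhs = rhs".
    lhs, rhs = step.split("=")
    return eval(lhs) == float(rhs)

steps = ["2 + 3 = 5", "5 * 4 = 21"]
print(outcome_labels(steps, "21", "20"))        # [0.0]
print(process_labels(steps, check_arithmetic))  # [1.0, 0.0]
```

Note how process supervision localizes the error: the outcome label only says the solution failed, while the per-step labels show the first step was fine and the second introduced the mistake.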