Reflexion
by dokuDoku

Reflexion is a framework that reinforces language agents through linguistic feedback. It stores reflective text in an episodic memory buffer to induce better decision making in subsequent trials, and it has proven to be a simple, effective way to incorporate feedback and reasoning into an LLM-based agent.

The issue with traditional reinforcement learning is that it requires massive amounts of training data and expensive fine-tuning of model weights. Reflexion instead does **not** update any model weights. The key idea is that when the agent produces a new trajectory, it can see its past reflections in the context window and use them to improve its output. Feedback can be either free-form text or a scalar/binary score. The approach works with different types of agents and is very simple to implement.

The core algorithm from the paper is as follows:

**Note:** A trajectory is an arbitrary number of actions produced by the agent.

1. The actor LLM generates a complete trajectory for the current trial.
2. The environment returns final feedback after the trajectory ends.
3. The self-reflection LLM receives the full completed trajectory and the final feedback.
4. The self-reflection LLM appends a concise verbal critique to the episodic buffer:
   - what went wrong
   - what to improve
5. For the next trial, the actor LLM receives the task prompt and the stored reflections, which guide its subsequent actions.
6. The actor generates a new trajectory; repeat from step 1.

In practice this works well and takes little code to implement (a minimal sketch follows at the end of this post). Reflexion boosted performance to 91% pass@1 on HumanEval (vs. GPT-4's 80% at the time). The actor, self-reflection, and evaluator language models can all be different, each optimized for its specific role in both model choice and training. Additionally, no fine-tuning is needed, since Reflexion only changes what is placed into the actor model's context window.
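To make the trial/reflect cycle concrete, here is a minimal Python sketch of the loop described above. This is not the paper's reference implementation: `call_actor`, `call_self_reflection`, and `evaluate` are placeholder callables for whatever LLM client and task environment you use, and the prompt wording and trial limit are my own assumptions.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    task_prompt: str,
    call_actor: Callable[[str], str],             # LLM that produces a trajectory from a prompt
    call_self_reflection: Callable[[str], str],   # LLM that turns trajectory + feedback into a critique
    evaluate: Callable[[str], Tuple[bool, str]],  # environment/evaluator: returns (success, feedback)
    max_trials: int = 5,
) -> str:
    """Run the Reflexion trial/reflect cycle until success or max_trials is reached."""
    memory: List[str] = []  # episodic memory buffer of verbal reflections
    trajectory = ""

    for trial in range(max_trials):
        # Build the actor's context: task prompt plus all stored reflections.
        reflections = "\n".join(f"- {r}" for r in memory)
        actor_prompt = (
            f"{task_prompt}\n\n"
            + (f"Reflections from previous attempts:\n{reflections}\n\n" if memory else "")
            + "Produce your next attempt."
        )

        # Step 1: the actor generates a complete trajectory for this trial.
        trajectory = call_actor(actor_prompt)

        # Step 2: the environment/evaluator returns final feedback after the trajectory ends.
        success, feedback = evaluate(trajectory)
        if success:
            return trajectory  # no further trials needed

        # Steps 3-4: the self-reflection LLM writes a concise critique
        # (what went wrong, what to improve) and it is appended to the buffer.
        critique = call_self_reflection(
            f"Task:\n{task_prompt}\n\n"
            f"Trajectory:\n{trajectory}\n\n"
            f"Feedback:\n{feedback}\n\n"
            "In two or three sentences, state what went wrong and what to improve next time."
        )
        memory.append(critique)

    # Out of trials: return the last attempt (no model weights were ever updated).
    return trajectory
```

Note that the model weights are never touched: all of the "learning" lives in the `memory` list that is re-inserted into the actor's prompt each trial (step 5). Using different models for the actor and the self-reflection step only requires passing different callables.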