
Microsoft’s Automatic Prompt Optimization Improves Prompts to Boost LLM Performance

The recent rise of powerful large language models (LLMs) has revolutionized the field of natural language processing (NLP). The performance of these generative models is largely dependent on users’ prompts, which are becoming increasingly detailed and complex. A Google Trends search reveals a hundredfold increase in popularity for the term “prompt engineering” over the last six months, and social media is teeming with novel prompting tips and templates. Is trial-and-error really the best approach, or could there be a more efficient way to develop effective prompts?

In the new paper Automatic Prompt Optimization with “Gradient Descent” and Beam Search, a Microsoft research team presents Automatic Prompt Optimization (APO), a simple and general prompt optimization algorithm that automatically improves prompts for LLMs, significantly reducing the time and energy spent on manual prompting approaches.

APO is a general, nonparametric prompt optimization algorithm inspired by numerical gradient descent. It connects two existing automated approaches for helping humans write better prompts: 1) training auxiliary models or differentiable representations of the prompt, and 2) applying discrete manipulations to prompts through reinforcement learning (RL) or LLM-based feedback.

APO differs from these approaches in that it addresses the discrete optimization barrier by applying gradient descent within a text-based Socratic dialogue, replacing differentiation with LLM feedback and backpropagation with LLM editing.
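For intuition, a minimal sketch of one such step is shown below in Python. The llm() helper and the wording of the meta-prompts are illustrative assumptions, not the paper's actual implementation; llm() stands in for any call that sends text to an LLM and returns its completion.

    # Stand-in for an LLM call: send a meta-prompt, return the completion.
    # (Assumed interface; wire this to a real LLM API in practice.)
    def llm(meta_prompt: str) -> str:
        raise NotImplementedError("connect to an LLM API here")

    def textual_gradient_step(prompt: str, failures: list) -> str:
        """One APO-style update: criticize the prompt, then edit it."""
        # "Differentiation" becomes LLM feedback: describe the prompt's
        # flaws on minibatch examples it got wrong (the textual gradient).
        gradient = llm(
            f"Prompt:\n{prompt}\n\n"
            f"This prompt gave wrong answers on these examples:\n{failures}\n\n"
            "Describe what is wrong with the prompt."
        )
        # "Backpropagation" becomes LLM editing: rewrite the prompt to fix
        # the described flaws, i.e. move against the gradient.
        return llm(
            f"Prompt:\n{prompt}\n\n"
            f"Identified flaws:\n{gradient}\n\n"
            "Rewrite the prompt to fix these flaws."
        )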

The proposed approach first uses minibatches of training data to obtain the “gradients” in natural language (descriptions of a given prompt’s flaws), then edits the prompt in the opposite semantic direction of the gradient. These steps serve as the expansion component of a broader beam search over the space of prompts, so that the overall task becomes a beam candidate selection problem, which improves algorithmic efficiency.
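Wrapped around the textual_gradient_step sketched earlier, the outer loop might look like the following sketch. sample_minibatches(), is_correct(), and evaluate() are hypothetical helpers introduced here for illustration, and plain top-k selection by dev-set score is a simplification of the selection step the paper optimizes for efficiency.

    import random

    def sample_minibatches(data, k=3, size=8):
        # Assumed helper: draw k random minibatches from the training set.
        return [random.sample(data, min(size, len(data))) for _ in range(k)]

    def is_correct(prompt, example):
        # Assumed helper: run the task LLM with `prompt` on one example
        # and check its answer against the label.
        raise NotImplementedError

    def evaluate(prompt, dev_data):
        # Assumed helper: fraction of dev examples the prompt gets right.
        return sum(is_correct(prompt, ex) for ex in dev_data) / len(dev_data)

    def apo_beam_search(init_prompt, train_data, dev_data,
                        depth=3, beam_width=4):
        """Sketch of APO's outer loop: expand each beam prompt via
        textual-gradient edits, then keep the best candidates."""
        beam = [init_prompt]
        for _ in range(depth):
            candidates = list(beam)  # parents stay in the running
            for prompt in beam:
                # Expansion: one textual-gradient edit per minibatch
                # on which the current prompt makes errors.
                for minibatch in sample_minibatches(train_data):
                    failures = [ex for ex in minibatch
                                if not is_correct(prompt, ex)]
                    if failures:
                        candidates.append(
                            textual_gradient_step(prompt, failures))
            # Selection: keep the beam_width highest-scoring prompts.
            beam = sorted(candidates,
                          key=lambda p: evaluate(p, dev_data),
                          reverse=True)[:beam_width]
        return max(beam, key=lambda p: evaluate(p, dev_data))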

In their empirical study, the team compared their APO framework with three state-of-the-art prompt learning baselines — Monte-Carlo (MC; Zhou et al., 2022), RL, and AutoGPT — across four NLP tasks: jailbreak detection, Ethos (hate speech detection; Mollas et al., 2020), Liar (fake news detection; Wang, 2017), and sarcasm detection (Farha and Magdy, 2020).

APO surpassed the baselines on all four tasks, improving over MC and RL by a significant 3.9 percent and 8.2 percent, respectively. Moreover, the improvements were achieved without extra hyperparameter tuning or model training, validating APO’s ability to efficiently and significantly improve prompts.





