Personalized Daily Arxiv Papers 05/17/2024

Total relevant papers: 3

The paper selection prompt and criteria are at the bottom.

Table of contents with paper titles:

0. DEBATE: Devil's Advocate-Based Assessment and Text Evaluation (Authors: Alex Kim, Keonwoo Kim, Sangwon Yoon)
1. Spectral Editing of Activations for Large Language Model Alignment (Authors: Yifu Qiu, Zheng Zhao, Yftah Ziser, Anna Korhonen, Edoardo M. Ponti, Shay B. Cohen)
2. Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers (Authors: Tuo Zhang, Jinyue Yuan, Salman Avestimehr)

0. DEBATE: Devil's Advocate-Based Assessment and Text Evaluation

ArXiv ID: 2405.09935
Authors: Alex Kim, Keonwoo Kim, Sangwon Yoon

Abstract: As natural language generation (NLG) models have become prevalent, systematically assessing the quality of machine-generated texts has become increasingly important. Recent studies introduce LLM-based evaluators that operate as reference-free metrics, demonstrating their capability to adeptly handle novel tasks. However, these models generally rely on a single-agent approach, which, we argue, introduces an inherent limit to their performance. This is because there exist biases in LLM agent's responses, including preferences for certain text structure or content. In this work, we propose DEBATE, an NLG evaluation framework based on multi-agent scoring system augmented with a concept of Devil's Advocate. Within the framework, one agent is instructed to criticize other agents' arguments, potentially resolving the bias in LLM agent's answers. DEBATE substantially outperforms the previous state-of-the-art methods in two meta-evaluation benchmarks in NLG evaluation, SummEval and TopicalChat. We also show that the extensiveness of debates among agents and the persona of an agent can influence the performance of evaluators.
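The abstract describes the Devil's Advocate loop only at a high level. A minimal sketch of one possible control flow, assuming a single scorer agent and a single critic agent, a fixed cap on debate rounds (a stand-in for the "extensiveness of debates" the authors vary), and a critic that signals agreement by returning no critique. `debate_score`, `toy_scorer`, and `toy_critic` are hypothetical names, not the paper's implementation, and the toy agents replace real LLM calls.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical agent signatures; in practice each would wrap an LLM call.
# scorer(text, critiques) -> (score, rationale)
Scorer = Callable[[str, List[str]], Tuple[int, str]]
# critic(text, score, rationale) -> a critique, or None when it has no objection
Critic = Callable[[str, int, str], Optional[str]]


def debate_score(text: str, scorer: Scorer, critic: Critic,
                 max_rounds: int = 3) -> Tuple[int, List[str]]:
    """Score `text`, letting a devil's-advocate critic challenge each score."""
    critiques: List[str] = []
    score, rationale = scorer(text, critiques)
    for _ in range(max_rounds):
        critique = critic(text, score, rationale)
        if critique is None:  # the critic accepts the current score
            break
        critiques.append(critique)
        # The scorer revises its judgment given every critique so far.
        score, rationale = scorer(text, critiques)
    return score, critiques


# Toy stand-ins for LLM agents, for illustration only.
def toy_scorer(text: str, critiques: List[str]) -> Tuple[int, str]:
    score = 5 - len(critiques)  # lowers its score once per critique
    return score, f"score {score} after {len(critiques)} critiques"


def toy_critic(text: str, score: int, rationale: str) -> Optional[str]:
    return "The summary omits key facts." if score > 3 else None


print(debate_score("a candidate summary", toy_scorer, toy_critic))
```

With these toy agents the critic objects twice, the score drops from 5 to 3, and the loop stops once the critic has nothing left to challenge; lowering `max_rounds` cuts the debate short.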

Comment: Matches criterion 3 (new paradigms for evaluating open-ended text generation): it proposes a multi-agent scoring system in which a Devil's Advocate agent critiques the other agents' arguments, an adversarial approach to reducing bias in LLM-based evaluation. Relevance: 8 Novelty: 7

1. Spectral Editing of Activations for Large Language Model Alignment

ArXiv ID: 2405.09719
Authors: Yifu Qiu, Zheng Zhao, Yftah Ziser, Anna Korhonen, Edoardo M. Ponti, Shay B. Cohen

Abstract: Large language models (LLMs) often exhibit undesirable behaviours, such as generating untruthful or biased content. Editing their internal representations has been shown to be effective in mitigating such behaviours on top of the existing alignment methods. We propose a novel inference-time editing method, namely spectral editing of activations (SEA), to project the input representations into directions with maximal covariance with the positive demonstrations (e.g., truthful) while minimising covariance with the negative demonstrations (e.g., hallucinated). We also extend our method to non-linear editing using feature functions. We run extensive experiments on benchmarks concerning truthfulness and bias with six open-source LLMs of different sizes and model families. The results demonstrate the superiority of SEA in effectiveness, generalisation to similar tasks, as well as inference and data efficiency. We also show that SEA editing only has a limited negative impact on other model capabilities.
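The abstract says SEA projects representations toward directions of maximal covariance with positive demonstrations while minimising covariance with negative ones. A minimal linear sketch of that idea with NumPy, assuming paired activation matrices are already available and assuming the two projected views are simply averaged. The function names are hypothetical, and the paper's exact procedure (how it picks the number of directions, any normalisation, and the non-linear variant) may differ.

```python
import numpy as np


def top_k_basis(cross_cov: np.ndarray, k: int) -> np.ndarray:
    """Left singular vectors of `cross_cov` for its k largest singular values."""
    u, _, _ = np.linalg.svd(cross_cov)  # NumPy returns singular values in descending order
    return u[:, :k]


def center(x: np.ndarray) -> np.ndarray:
    """Subtract the per-dimension mean over rows."""
    return x - x.mean(axis=0)


def sea_projections(h_neutral, h_pos, h_neg, k_pos, k_neg):
    """Build two d x d projections from paired (m x d) activation matrices.

    Row i of each matrix is the activation for prompt i under a neutral,
    a positive (e.g. truthful), or a negative (e.g. hallucinated) demonstration.
    """
    m, d = h_neutral.shape
    omega_pos = center(h_neutral).T @ center(h_pos) / m  # cross-covariance with positives
    omega_neg = center(h_neutral).T @ center(h_neg) / m  # cross-covariance with negatives
    u_pos = top_k_basis(omega_pos, k_pos)
    u_neg = top_k_basis(omega_neg, k_neg)
    keep_pos = u_pos @ u_pos.T              # keep directions that covary most with positives
    drop_neg = np.eye(d) - u_neg @ u_neg.T  # remove directions that covary most with negatives
    return keep_pos, drop_neg


def apply_edit(h, keep_pos, drop_neg):
    """Edit row-vector activations h (n x d); averaging the two views is an assumption."""
    return 0.5 * (h @ keep_pos + h @ drop_neg)


# Synthetic activations, for illustration only.
rng = np.random.default_rng(0)
h_neutral = rng.normal(size=(64, 8))
h_pos = h_neutral + 0.1 * rng.normal(size=(64, 8))
h_neg = rng.normal(size=(64, 8))
keep_pos, drop_neg = sea_projections(h_neutral, h_pos, h_neg, k_pos=4, k_neg=2)
edited = apply_edit(rng.normal(size=(3, 8)), keep_pos, drop_neg)
```

In this sketch the projections depend only on the demonstrations, so they can be computed once and applied at inference time as a single matrix multiply per activation.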

Comment: This paper does not directly match any of the specified criteria, but it proposes a novel inference-time method for steering large language models away from untruthful or biased outputs, which may interest someone looking for clever practical tricks. Relevance: 3 Novelty: 7

2. Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers

ArXiv ID: 2405.10276
Authors: Tuo Zhang, Jinyue Yuan, Salman Avestimehr

Abstract: Numerous recent works aim to enhance the efficacy of Large Language Models (LLMs) through strategic prompting. In particular, the Optimization by PROmpting (OPRO) approach provides state-of-the-art performance by leveraging LLMs as optimizers where the optimization task is to find instructions that maximize the task accuracy. In this paper, we revisit OPRO for automated prompting with relatively small-scale LLMs, such as LLaMa-2 family and Mistral 7B. Our investigation reveals that OPRO shows limited effectiveness in small-scale LLMs, with limited inference capabilities constraining optimization ability. We suggest future automatic prompting engineering to consider both model capabilities and computational costs. Additionally, for small-scale LLMs, we recommend direct instructions that clearly outline objectives and methodologies as robust prompt baselines, ensuring efficient and effective prompt engineering in ongoing research.
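OPRO treats an LLM as an optimizer that proposes new instructions after seeing earlier instructions and their task accuracy. A simplified sketch of that loop, assuming one candidate per step, a history shown in ascending score order, and two hypothetical callables: `score` (task accuracy of an instruction, normally measured with a scorer LLM on training examples) and `propose` (the optimizer LLM). The toy versions below only exercise the control flow.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, float]  # (instruction, score)


def opro_search(seed_instructions: List[str],
                score: Callable[[str], float],
                propose: Callable[[List[Pair]], str],
                steps: int = 5,
                top_n: int = 3) -> Pair:
    """Repeatedly ask an optimizer (`propose`) for a better instruction."""
    history: List[Pair] = [(inst, score(inst)) for inst in seed_instructions]
    for _ in range(steps):
        # Show the optimizer the top_n instructions so far, best one last.
        shown = sorted(history, key=lambda pair: pair[1])[-top_n:]
        candidate = propose(shown)
        history.append((candidate, score(candidate)))
    return max(history, key=lambda pair: pair[1])


# Toy stand-ins: longer instructions "score" higher, and the "optimizer"
# extends the current best instruction.
best = opro_search(["a", "bb"],
                   score=lambda s: float(len(s)),
                   propose=lambda shown: shown[-1][0] + "!")
print(best)
```

In this framing, the paper's finding concerns `propose`: a small model with limited inference ability may fail to turn the scored history into better instructions, so the loop gains little over its seed instructions.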

Comment: This paper does not closely match any of the specified criteria. It examines the limitations of small-scale LLMs as optimizers in Optimization by PROmpting (OPRO), which relates to improving LLM performance but does not address new methodological improvements to RLHF or instruction-following, test set contamination, evaluation of open-ended text generation, real-world usage and safety properties, or scaling laws. Relevance: 3 Novelty: 5

Paper selection prompt

New methodological improvements to RLHF or instruction-following, i.e., specific fine-tuning steps taken to make language models better at following user instructions across a range of tasks.
- Relevant: papers that discuss specific methods like RLHF or instruction-tuning datasets, that improve these methods, or that analyze them. Usually these papers will explicitly mention RLHF, instruction-following, or instruction-tuning.
- Not relevant: papers about adaptation to some task. Simply following instructions or inputs is not sufficient.
Shows powerful new test set contamination or membership inference methods for language models. Test set contamination is the phenomenon where a language model observes a benchmark dataset during pretraining.
- Relevant: test statistics that can detect contamination of benchmarks in language models. Statistics that can provide guarantees are more interesting. Membership inference methods that are general enough to apply to language models are also relevant.
- Not relevant: any papers that do not consider language models, or that do not consider test set contamination.
Describes new paradigms for evaluating open-ended text generation. Evaluating the outputs of language models is hard, especially in open-ended settings such as chatbots.
- Relevant: papers that fundamentally rethink language model evaluation, especially by accounting for subjectivity or using adversaries.
- Not relevant: specific evaluations for specific tasks, identifying new properties or flaws of language models, or simply collecting new data.
Conducts surveys or provides data on real-world usage and safety properties of language models.
- Relevant: papers that create new datasets or surveys on real-world usage of language models.
- Not relevant: papers that apply language models to new real-world tasks.
Studies 'scaling laws' in the context of neural networks. Scaling laws refer to the very clear power-law relationship between the size of a model (or the compute used to train it) and its performance.
- Relevant: theoretical or conceptual explanations behind scaling laws for language models.
- Not relevant: papers that have experiments at different model scales but do not explicitly fit a scaling law, or papers that mention scaling laws without making them the central subject.
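The contamination criterion asks for test statistics that can detect benchmark contamination, preferably with guarantees. As an illustration of what such a statistic can look like (not taken from any paper in this digest), here is a sketch of a permutation test on example order: a model that memorized a benchmark may assign higher likelihood to the examples in their published order than to random shuffles, and the rank of the published order among the shuffles gives a p-value that is valid when the examples are exchangeable. `log_likelihood` is a hypothetical callable standing in for scoring the concatenated examples with a language model; the toy models only exercise the code.

```python
import random
from typing import Callable, List, Sequence


def permutation_contamination_pvalue(examples: Sequence[str],
                                     log_likelihood: Callable[[List[str]], float],
                                     num_shuffles: int = 99,
                                     seed: int = 0) -> float:
    """p-value for 'the model prefers the benchmark's published example order'."""
    rng = random.Random(seed)
    canonical = log_likelihood(list(examples))
    at_least_as_likely = 0
    for _ in range(num_shuffles):
        shuffled = list(examples)
        rng.shuffle(shuffled)
        if log_likelihood(shuffled) >= canonical:
            at_least_as_likely += 1
    # The +1 terms count the published order itself, which keeps the test valid.
    return (at_least_as_likely + 1) / (num_shuffles + 1)


def memorized_model(seq: List[str]) -> float:
    """Toy 'contaminated' model: penalizes every pair out of published order."""
    return -float(sum(a > b for i, a in enumerate(seq) for b in seq[i + 1:]))


examples = [f"q{i}" for i in range(10)]  # published order is sorted order here
print(permutation_contamination_pvalue(examples, memorized_model))
print(permutation_contamination_pvalue(examples, lambda seq: 0.0))
```

A model indifferent to order (the constant model) yields a p-value of 1.0, while the toy memorizing model yields a small p-value.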

When suggesting papers to your friend, remember that he likes learning about surprising empirical results in language models, as well as clever practical tricks. He does not want to read papers that are primarily about applying methods to the medical or legal domains.
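The scaling-law criterion describes a power-law relationship between model size (or training compute) and performance. Under the simplifying assumption of a pure power law L(N) = a * N^(-b), with no irreducible-loss term, taking logs gives a straight line, log L = log a - b log N, so an ordinary least-squares fit in log-log space recovers a and b. The data below is synthetic and chosen to follow an exact power law; `fit_power_law` is a hypothetical helper.

```python
import numpy as np


def fit_power_law(sizes, losses):
    """Fit loss = a * size**(-b) by least squares in log-log space; returns (a, b)."""
    # np.polyfit returns coefficients highest degree first: [slope, intercept].
    slope, intercept = np.polyfit(np.log(sizes), np.log(losses), deg=1)
    return float(np.exp(intercept)), float(-slope)


sizes = np.array([1e6, 1e7, 1e8, 1e9])
losses = 10.0 * sizes ** -0.5  # synthetic data that follows an exact power law
a, b = fit_power_law(sizes, losses)
print(a, b)
```

Real loss curves usually flatten toward a nonzero floor, so published fits often add a constant offset, which a plain log-log line cannot capture.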