Published on February 8, 2024
Active Preference Learning for Large Language Models
ICML 2024
LLM, Fine-Tuning, Alignment
We propose a practical acquisition function for prompt/completion pairs based on the predictive entropy of the language model and a measure of certainty of the implicit preference model optimized by DPO.
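
The two quantities named in the summary can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that sequence log-probabilities under the policy and the reference model are already available. The function names, the example numbers, and the final ranking rule (highest entropy first) are hypothetical choices made for illustration. The implicit reward follows the standard DPO form, beta times the policy-to-reference log-ratio.

```python
from statistics import fmean


def predictive_entropy(sample_logprobs):
    """Monte Carlo entropy estimate for one prompt.

    sample_logprobs: log p(y | x) for completions y sampled from the model.
    The estimate is the negative mean of those log-probabilities.
    """
    return -fmean(sample_logprobs)


def implicit_reward(logp_policy, logp_ref, beta=0.1):
    """DPO implicit reward: beta * (log pi(y|x) - log pi_ref(y|x))."""
    return beta * (logp_policy - logp_ref)


def preference_certainty(reward_a, reward_b):
    """Absolute gap between the implicit rewards of two completions.

    A large gap means the implicit preference model strongly favors one side.
    """
    return abs(reward_a - reward_b)


# Hypothetical candidates: each holds sampled log-probs for the prompt and
# (policy, reference) log-probs for the two completions in the pair.
candidates = [
    {"prompt": "p1", "sample_logprobs": [-1.0, -3.0],
     "pair": ((-2.0, -3.0), (-4.0, -3.5))},
    {"prompt": "p2", "sample_logprobs": [-5.0, -7.0],
     "pair": ((-6.0, -6.5), (-6.2, -6.1))},
]

scored = []
for c in candidates:
    (pol_a, ref_a), (pol_b, ref_b) = c["pair"]
    scored.append({
        "prompt": c["prompt"],
        "entropy": predictive_entropy(c["sample_logprobs"]),
        "certainty": preference_certainty(
            implicit_reward(pol_a, ref_a), implicit_reward(pol_b, ref_b)
        ),
    })

# Assumption: rank by entropy only; how to combine the two scores is left open.
ranked = sorted(scored, key=lambda s: s["entropy"], reverse=True)
```

How the entropy and certainty scores should be combined into a single acquisition rule is not stated in the summary above, so the sketch keeps them separate and sorts on entropy alone.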