We would like to maintain an up-to-date list of progress (papers, blogs, code, etc.) in human feedback for AI (LLMs, text-to-image, and other tasks), and to provide a guide to some of the papers that have received wide interest. Please feel free to open an issue to add papers.
- Human Feedback for LLM
- Human Feedback for Text-Image
- Human Feedback for Robot Control
- About Reinforcement Learning
- Deep reinforcement learning from human preferences, nips'17. [paper]
- Recursively Summarizing Books with Human Feedback, arxiv'22. [paper]
- InstructGPT: Training Language Models to Follow Instructions With Human Feedback, nips'22. [paper] [video]
- Fine-tuning language models to find agreement among humans with diverse preferences, nips'22. [paper]
- Constitutional AI: Harmlessness from AI Feedback, arxiv'22. [paper]
- Training a helpful and harmless assistant with reinforcement learning from human feedback, arxiv'22. [paper]
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model, arxiv'23. [paper] [code] [blogs]
- RRHF: Rank responses to align language models with human feedback without tears, arxiv'23. [paper] [code] [blogs]
- RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment, arxiv'23. [paper] [code] [blogs]
- Fine-Grained Human Feedback Gives Better Rewards for Language Model Training, arxiv'23. [paper] [code] [blogs]
- Fine-Tuning Language Models with Advantage-Induced Policy Alignment, arxiv'23. [paper]
- Scaling Laws for Reward Model Overoptimization, icml'23. [paper]
- Reward Collapse in Aligning Large Language Models, arxiv'23. [paper] [blogs]
- Chain of Hindsight Aligns Language Models with Feedback, arxiv'23. [paper]
- Principled Reinforcement Learning with Human Feedback from Pairwise or K, arxiv'23. [paper]
- Reinforcement Learning from Diverse Human Preferences, arxiv'23. [paper]
- Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback, arxiv'23. [paper]
- Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization, iclr'23. [paper] [code]
- How to Query Human Feedback Efficiently in RL? arxiv'23. [paper]
- Pretraining Language Models with Human Preferences, icml'23. [paper]
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback, arxiv'23. [paper]
- Aligning text-to-image models using human feedback, arxiv'23. [paper] [blogs]
- Better aligning text-to-image models with human preference, arxiv'23. [paper] [code]
- DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models, arxiv'23. [paper]
- ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation, arxiv'23. [paper] [code]
- AGIQA-3K: An Open Database for AI-Generated Image Quality Assessment, arxiv'23. [paper] [code]
- AIGCIQA2023: A Large-scale Image Quality Assessment Database for AI Generated Images: from the Perspectives of Quality, Authenticity and Correspondence, arxiv'23. [paper] [code]
- Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation, arxiv'23. [paper]
- Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis, arxiv'23. [paper] [code]
- Aligning human preferences with baseline objectives in reinforcement learning, icra'23. [paper]
- Feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training, icml'21. [paper]
- Augmented Proximal Policy Optimization for Safe Reinforcement Learning, aaai'23. [paper]