LLM-Table-Survey

📄 Paper List

Large Language Model

  • GPT-3, Language Models are Few-Shot Learners. NeurIPS 20. [Paper]
  • T5, Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. [Paper]
  • FLAN, Finetuned Language Models Are Zero-Shot Learners. ICLR 22. [Paper] [Code]
  • DPO, Direct Preference Optimization: Your Language Model is Secretly a Reward Model. NeurIPS 23. [Paper]
  • PEFT, The Power of Scale for Parameter-Efficient Prompt Tuning. EMNLP 21. [Paper]
  • LoRA, LoRA: Low-Rank Adaptation of Large Language Models. ICLR 22. [Paper]
  • Chain-of-thought Prompting, Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. NeurIPS 22. [Paper]
  • Least-to-most Prompting, Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. ICLR 23. [Paper]
  • Self-consistency Prompting, Self-Consistency Improves Chain of Thought Reasoning in Language Models. ICLR 23. [Paper]
  • ReAct, ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 23. [Paper] [Code]

Pre-LLM Era Table Training

  • TaBERT, TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data. ACL 20 Main. [Paper] [Code]
  • TaPEx, TAPEX: Table Pre-training via Learning a Neural SQL Executor. ICLR 22. [Paper] [Code] [Models]
  • TABBIE, TABBIE: Pretrained Representations of Tabular Data. NAACL 21 Main. [Paper] [Code]
  • TURL, TURL: Table Understanding through Representation Learning. VLDB 21. [Paper] [Code]
  • RESDSQL, RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL. AAAI 23. [Paper] [Code]
  • UnifiedSKG, UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models. EMNLP 22 Main. [Paper] [Code]
  • SpreadsheetCoder, SpreadsheetCoder: Formula Prediction from Semi-structured Context. ICML 21. [Paper] [Code]

Table Instruction-Tuning

Code LLM

Hybrid of Table & Code

Parameter-Efficient Fine-Tuning

Direct Preference Optimization

  • SENSE, Synthesizing Text-to-SQL Data from Weak and Strong LLMs. ACL 24. [Paper]

Small Language Model + Large Language Model

  • ZeroNL2SQL, Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL. VLDB 24. [Paper]

Multimodal Table Understanding & Extraction

  • LayoutLM, LayoutLM: Pre-training of Text and Layout for Document Image Understanding. KDD 20. [Paper]
  • PubTabNet, Image-Based Table Recognition: Data, Model, and Evaluation. ECCV 20. [Paper] [Code & Data]
  • Table-LLaVA, Multimodal Table Understanding. ACL 24. [Paper] [Code] [Model]
  • TableVLM, TableVLM: Multi-modal Pre-training for Table Structure Recognition. ACL 23. [Paper]
  • PixT3, PixT3: Pixel-based Table-To-Text Generation. ACL 24. [Paper]

Representation

  • Tabular Representation, Noisy Operators, and Impacts on Table Structure Understanding Tasks in LLMs. NeurIPS 23 Second Table Representation Learning Workshop. [Paper]
  • SpreadsheetLLM, SpreadsheetLLM: Encoding Spreadsheets for Large Language Models. arXiv 24. [Paper]
  • Enhancing Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies. EMNLP 23. [Paper] [Code]
  • Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs. arXiv 24. [Paper]

Prompting

NL2SQL

  • The Dawn of Natural Language to SQL: Are We Fully Ready? VLDB 24. [Paper] [Code]
  • MCS-SQL, MCS-SQL: Leveraging Multiple Prompts and Multiple-Choice Selection For Text-to-SQL Generation. [Paper]
  • DIN-SQL, DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction. NeurIPS 23. [Paper] [Code]
  • DAIL-SQL, Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation. VLDB 24. [Paper] [Code]
  • C3, C3: Zero-shot Text-to-SQL with ChatGPT. arXiv 23. [Paper] [Code]

Table QA

  • Dater, Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning. SIGIR 23. [Paper] [Code]
  • Binder, Binding Language Models in Symbolic Languages. ICLR 23. [Paper] [Code]
  • ReAcTable, ReAcTable: Enhancing ReAct for Table Question Answering. VLDB 24. [Paper] [Code]
  • E5, E5: Zero-shot Hierarchical Table Analysis using Augmented LLMs via Explain, Extract, Execute, Exhibit and Extrapolate. NAACL 24. [Paper] [Code]
  • Chain-of-Table, Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding. ICLR 24. [Paper]
  • ITR, An Inner Table Retriever for Robust Table Question Answering. ACL 23. [Paper]
  • LI-RAGE, LI-RAGE: Late Interaction Retrieval Augmented Generation with Explicit Signals for Open-Domain Table Question Answering. ACL 23. [Paper]

Spreadsheet

  • SheetCopilot, SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models Agent. NeurIPS 23. [Paper] [Code]
  • SheetAgent, SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models. arXiv 24. [Paper]
  • Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities. arXiv 24. [Paper]

Multi-task Framework

  • StructGPT, StructGPT: A General Framework for Large Language Model to Reason over Structured Data. EMNLP 23 Main. [Paper] [Code]
  • TAP4LLM, TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning. arXiv 23. [Paper]
  • UniDM, UniDM: A Unified Framework for Data Manipulation with Large Language Models. MLSys 24. [Paper]
  • Data-Copilot, Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow. arXiv 23. [Paper] [Code]

Tools

  • LlamaIndex
  • PandasAI
  • Vanna
  • DB-GPT, DB-GPT: Empowering Database Interactions with Private Large Language Models. [Paper] [Code]
  • RetClean, RetClean: Retrieval-Based Data Cleaning Using Foundation Models and Data Lakes. [Paper] [Code]

Survey

  • A Survey of Large Language Models. [Paper]
  • A Survey on Large Language Model Based Autonomous Agents. [Paper]
  • Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks. [Paper]
  • Transformers for tabular data representation: A survey of models and applications. [Paper]
  • A Survey of Table Reasoning with Large Language Models. [Paper]
  • A Survey on Table Question Answering: Recent Advances. [Paper]
  • Large Language Models (LLMs) on Tabular Data: A Survey. [Paper]
  • A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions. [Paper]

📊 Datasets & Benchmarks

Benchmarks

| Name | Keywords | Artifact | Paper |
| --- | --- | --- | --- |
| MBPP | Code | link | arXiv 21 |
| HumanEval | Code | link | arXiv 21 |
| Dr.Spider | NL2SQL, Robustness | link | ICLR 23 |
| WikiTableQuestions | Table QA | link | ACL 15 |
| WikiSQL | Table QA, NL2SQL | link | arXiv 17 |
| TabFact | Table Fact Verification | link | ICLR 20 |
| HybridQA | Table QA | link | EMNLP 20 |
| FeTaQA | Table QA | link | TACL 22 |
| RobuT | Table QA | link | ACL 23 |
| AnaMeta | Table Metadata | link | ACL 23 |
| GPT4Table | Table QA, Table-to-text | link | WSDM 24 |
| ToTTo | Table-to-text | link | EMNLP 20 |
| SpreadsheetBench | Spreadsheet Manipulation | link | NeurIPS 24 |
| BIRD | NL2SQL | link | NeurIPS 23 |
| Spider | NL2SQL | link | EMNLP 18 |
| ScienceBenchmark | NL2SQL | link | VLDB 24 |
| DS-1000 | Data Analysis | link | ICML 23 |
| InfiAgent-DABench | Data Analysis | link | ICML 24 |
| TableBank | Table Detection | link | LREC 20 |
| PubTabNet | Table Extraction | link | ECCV 20 |
| ComTQA | Visual Table QA, Table Detection, Table Extraction | link | arXiv 24 |

Datasets

| Name | Keywords | Artifact | Paper |
| --- | --- | --- | --- |
| TableInstruct | Table Instruction Tuning | link | arXiv 23 |
| WDC | Web Table | link | WWW 16 |
| GitTables | GitHub CSVs | link | SIGMOD 23 |
| DART | Table-to-text | link | NAACL 21 |
| MMTab | Multimodal Table Understanding | link | ACL 24 |
| SchemaPile | Database Schemas | link | SIGMOD 24 |