AIGC Model Summary
Updated: 2023-07-10 16:56
Models
Model | Release Date | Description |
---|---|---|
BERT | 2018 | Bidirectional Encoder Representations from Transformers |
GPT | 2018 | Improving Language Understanding by Generative Pre-Training |
RoBERTa | 2019 | A Robustly Optimized BERT Pretraining Approach |
GPT-2 | 2019 | Language Models are Unsupervised Multitask Learners |
T5 | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
BART | 2019 | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
ALBERT | 2019 | A Lite BERT for Self-supervised Learning of Language Representations |
XLNet | 2019 | Generalized Autoregressive Pretraining for Language Understanding and Generation |
CTRL | 2019 | CTRL: A Conditional Transformer Language Model for Controllable Generation |
ERNIE | 2019 | ERNIE: Enhanced Representation through Knowledge Integration |
GShard | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
GPT-3 | 2020 | Language Models are Few-Shot Learners |
LaMDA | 2021 | LaMDA: Language Models for Dialog Applications |
PanGu-α | 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
mT5 | 2021 | mT5: A massively multilingual pre-trained text-to-text transformer |
CPM-2 | 2021 | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
T0 | 2021 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
HyperCLOVA | 2021 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers |
Codex | 2021 | Evaluating Large Language Models Trained on Code |
ERNIE 3.0 | 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
Jurassic-1 | 2021 | Jurassic-1: Technical Details and Evaluation |
FLAN | 2021 | Finetuned Language Models Are Zero-Shot Learners |
MT-NLG | 2021 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
Yuan 1.0 | 2021 | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning |
WebGPT | 2021 | WebGPT: Browser-assisted question-answering with human feedback |
Gopher | 2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
ERNIE 3.0 Titan | 2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
GLaM | 2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
InstructGPT | 2022 | Training language models to follow instructions with human feedback |
GPT-NeoX-20B | 2022 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
AlphaCode | 2022 | Competition-Level Code Generation with AlphaCode |
CodeGen | 2022 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
Chinchilla | 2022 | Shows that, for a given compute budget, the best performance is achieved not by the largest models but by smaller models trained on more data. |
Tk-Instruct | 2022 | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks |
UL2 | 2022 | UL2: Unifying Language Learning Paradigms |
PaLM | 2022 | PaLM: Scaling Language Modeling with Pathways |
OPT | 2022 | OPT: Open Pre-trained Transformer Language Models |
BLOOM | 2022 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
GLM-130B | 2022 | GLM-130B: An Open Bilingual Pre-trained Model |
AlexaTM | 2022 | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |
Flan-T5 | 2022 | Scaling Instruction-Finetuned Language Models |
Sparrow | 2022 | Improving alignment of dialogue agents via targeted human judgements |
U-PaLM | 2022 | Transcending Scaling Laws with 0.1% Extra Compute |
mT0 | 2022 | Crosslingual Generalization through Multitask Finetuning |
Galactica | 2022 | Galactica: A Large Language Model for Science |
OPT-IML | 2022 | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
LLaMA | 2023 | LLaMA: Open and Efficient Foundation Language Models |
GPT-4 | 2023 | GPT-4 Technical Report |
PanGu-Σ | 2023 | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
BloombergGPT | 2023 | BloombergGPT: A Large Language Model for Finance |
PaLM 2 | 2023 | A language model with better multilingual and reasoning capabilities that is more compute-efficient than its predecessor, PaLM. |
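
Several of the openly released checkpoints in this table (for example BERT, GPT-2, T5, OPT, and BLOOM) can be loaded with the Hugging Face `transformers` library. The sketch below is only an illustration of that workflow, not part of the original list; the model ID `gpt2`, the prompt, and the generation settings are assumptions chosen for the example.

```python
# A minimal sketch: load one of the open models listed above (here GPT-2)
# and generate a short continuation with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # illustrative; swap in another open checkpoint as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Encode a prompt and generate up to 20 new tokens.
inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```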