BERTs Are Generative In-Context Learners
(arxiv.org) | 104 points by fzliu 18 hours ago | 26 comments

Comments

We found that Google's T5 models, released in 2019 (pre-GPT-3), were "secretly" capable of in-context learning with a simple inference technique.
Given that they use a bidirectional MLM (masked language modeling) objective, it wasn't obvious how to do this, but MLM objectives are known to produce better language representations than causal (next-token prediction) objectives. We were able to outperform much larger GPT-3 models, or come very close to their performance, with far smaller T5 models.
The "embarrassingly simple inference technique" is to put a bunch of [MASK] tokens at the end of the prompt.
I'm having trouble understanding whether this paper is saying anything new. The original BERT paper already compared it favourably to causal models including GPT. Was there any doubt that BERT-style models could be in-context learners?
From what I gather as a non-expert, the problem with BERT is scaling/training efficiency: GPT gets C-1 training examples out of a training input of length C (one next-token prediction per position), but BERT only gets about 0.15*C, since only the ~15% of tokens that are masked contribute to the loss. Indeed, the author points out that DeBERTa required 3x more compute than GPT-3 to achieve the level of performance reported, which makes sense.
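To make that gap concrete, here is the back-of-the-envelope arithmetic the comment is gesturing at; the 15% masking rate is the standard BERT setting and the sequence length is an arbitrary illustrative choice.

    C = 1024                     # tokens per training sequence (arbitrary choice)
    causal_targets = C - 1       # a causal LM predicts every next token
    mlm_targets = int(0.15 * C)  # an MLM only gets a loss on the ~15% masked positions
    print(causal_targets / mlm_targets)  # roughly 6.7x more supervision per sequence for the causal LM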
As someone who has very limited understanding but has tried to use BERT for classification: is BERT still relevant when compared to LLMs? Asking because I hardly see any mention of BERT anymore.