Empirical Results: GPT-2 Analysis of Transformer Memorization & Loss

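The study validates the radius hypothesis of the theoretical framework through experiments with a GPT-2 medium model on OpenWebText, analyzing next-token prediction by measuring activation distances in the last layer. It also trains several GPT-2 small models and vanilla Transformer models to analyze their cross-entropy losses; the results support the validity of the theoretical framework.

2025-06-21 17:00:03, Author: hackernoon.com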

by Reinforcement Technology Advancements, June 21st, 2025

Too Long; Didn't Read

These experiments with GPT-2 medium on OpenWebText validate the radius hypothesis from our theoretical framework, measuring activation distances in the last layer for next-token prediction.

Table of Links

Abstract and 1 Introduction

2 Related Work

3 Model and 3.1 Associative memories

3.2 Transformer blocks

4 A New Energy Function

4.1 The layered structure

5 Cross-Entropy Loss

6 Empirical Results and 6.1 Empirical evaluation of the radius

6.2 Training GPT-2

6.3 Training Vanilla Transformers

7 Conclusion and Acknowledgments

Appendix A. Deferred Tables

Appendix B. Some Properties of the Energy Functions

Appendix C. Deferred Proofs from Section 5

Appendix D. Transformer Details: Using GPT-2 as an Example

References

6 Empirical Results

We test the hypothesis about the radius r introduced in Section 5 using a pre-trained GPT-2 medium model. In addition, we train several GPT-2 small models and vanilla Transformer models and analyze their cross-entropy losses.
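Section 6.1 rests on measuring distances between last-layer activations. The exact measurement procedure is not reproduced in this excerpt, so the following is only a minimal Python sketch of summarizing such distances. It assumes the activation vectors have already been extracted (for example, one final-layer hidden state per token position) and are given as plain lists of floats. The function names are hypothetical, and `half_min` (half the smallest pairwise distance, i.e. the largest radius at which open balls around the vectors stay disjoint) is an illustrative quantity, not necessarily the r of Section 5.

```python
import math
from itertools import combinations
from statistics import median


def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_summary(activations):
    """Summarize pairwise L2 distances among activation vectors.

    `activations` is a hypothetical input: a list of vectors, e.g. one
    last-layer hidden state per token position, already extracted.
    """
    dists = [euclidean(u, v) for u, v in combinations(activations, 2)]
    if not dists:
        raise ValueError("need at least two activation vectors")
    smallest = min(dists)
    return {
        "min": smallest,
        "median": median(dists),
        "max": max(dists),
        # Illustrative only: the largest radius at which open balls centred
        # on the vectors are pairwise disjoint (not necessarily the paper's r).
        "half_min": smallest / 2,
    }


if __name__ == "__main__":
    toy = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
    print(distance_summary(toy))
```

On the toy input the three points are collinear, so the pairwise distances are 5, 5 and 10 and the summary reports min 5.0, median 5.0, max 10.0 and half_min 2.5. Because the number of pairs grows quadratically with the number of vectors, a real measurement over many token positions would likely subsample.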

6.1 Empirical evaluation of the radius

Figure 3: Cross-entropy loss of GPT-2 small model trained on (left) 100%, (middle) 1%, and (right) 0.1% of OpenWebText-9B dataset with a typical training time.
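Figure 3 reports next-token cross-entropy loss. As a reminder of what that quantity is, here is a minimal Python sketch that computes the mean cross-entropy (in nats) from raw logits, using the log-sum-exp trick for numerical stability. It is not the paper's training code; the function names are hypothetical, and it assumes logits are supplied as plain lists, one per token position, with the index of the true next token as the target.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax using the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cross_entropy(batch_logits, targets):
    """Mean next-token cross-entropy in nats.

    batch_logits: one list of vocabulary logits per token position.
    targets: index of the true next token at each position.
    """
    if not targets or len(batch_logits) != len(targets):
        raise ValueError("need exactly one target per logit vector")
    nll = [-log_softmax(row)[t] for row, t in zip(batch_logits, targets)]
    return sum(nll) / len(nll)


if __name__ == "__main__":
    # A uniform prediction over 4 tokens costs ln 4 nats per position.
    print(cross_entropy([[0.0, 0.0, 0.0, 0.0]], [2]))
    # A confident, correct prediction costs close to 0 nats.
    print(cross_entropy([[10.0, 0.0]], [0]))
```

For a uniform prediction over GPT-2's 50,257-token vocabulary the loss would be ln 50257 ≈ 10.82 nats, a useful reference point when reading loss curves. Assuming "OpenWebText-9B" denotes roughly 9 billion tokens, the 1% and 0.1% subsets would hold about 90 million and 9 million tokens respectively.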

Authors:

(1) Xueyan Niu, Theory Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd.;

(2) Bo Bai ([email protected]);

(3) Lei Deng ([email protected]);

(4) Wei Han ([email protected]).

This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.

1. Available at https://github.com/openai/gpt-2

Source: https://hackernoon.com/empirical-results-gpt-2-analysis-of-transformer-memorization-and-loss?source=rss