by Reinforcement Technology AdvancementsJune 21st, 2025

![]()



‘a futuristic empire’ Image created by HackerNoon AI Image Generator


3 Model and 3.1 Associative memories
6 Empirical Results and 6.1 Empirical evaluation of the radius
6.3 Training Vanilla Transformers
7 Conclusion and Acknowledgments
Appendix B. Some Properties of the Energy Functions
Appendix C. Deferred Proofs from Section 5
Appendix D. Transformer Details: Using GPT-2 as an Example
We explore the hypothesis regarding the radius r in Section 5 using a pre-trained GPT-2 medium model. Additionally, we train various GPT-2 small models and vanilla Transformer models to analyze their cross-entropy losses.


Authors:
(1) Xueyan Niu, Theory Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd.;
(2) Bo Bai baibo ([email protected]);
(3) Lei Deng ([email protected]);
(4) Wei Han ([email protected]).
This paper is
1. available at https://github.com/openai/gpt-2