What Is Learned by DreamLLM? Dream Query Attention

Abstract and 1 Introduction

2 Background & Problem Statement

2.1 How Can We Use MLLMs for Diffusion Synthesis That Synergizes Both Sides?

3 DreamLLM

3.1 End-to-End Interleaved Generative Pretraining (I-GPT)

3.2 Model Training

4 Experiments and 4.1 Multimodal Comprehension

4.2 Text-Conditional Image Synthesis

4.3 Multimodal Joint Creation & Comprehension

5 Discussions

5.1 Synergy between Creation & Comprehension?

5.2 What Is Learned by DreamLLM?

6 Related Works

7 Conclusions and References

A Additional Experiments

B Additional Qualitative Examples

C Implementation Details

D Additional Related Works

E Limitations, Failure Cases & Future Works

5.2 WHAT IS LEARNED BY DREAMLLM?

Dream Query Attention In DreamLLM, the conditional embedding for diffusion synthesis is derived from the MLLM through a set of learned dream queries. Fig. 6 visualizes the learned cross-attention between these queries and the diffusion latent. Following Hertz et al. (2023), we show the attention maps averaged across all denoising timesteps. Two observations stand out: i) The query attention is structured, disentangled, and semantically oriented: distinct queries capture different subject and background semantics. ii) Despite varying prompts, the attention patterns are remarkably similar, as shown in Fig. 6 (a) and (b). This contrasts with the token attentions of the original Stable Diffusion, which are typically text-token dependent. We postulate that this arises from the model's causal nature, which induces a consistent ordering of semantic structure.

Figure 6: Cross-attention between dream queries and the diffusion U-Net latent. Following Hertz et al. (2023), the 64 queries can be viewed as 64 “words”. Each attention map is the cross-attention between one query and the latent features in the U-Net, averaged across all denoising timesteps; the 64 maps are arranged sequentially in an 8×8 grid.
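For readers who want a visualization in the spirit of Fig. 6, the sketch below shows one way per-query attention maps could be computed and averaged over timesteps. This is a minimal illustration, not the authors' code: the tensor shapes, the 64-query count taken from the figure, and the function names (query_attention_maps, averaged_maps) are assumptions; in practice these maps would be read out of the U-Net's cross-attention layers during sampling rather than recomputed standalone.

```python
import torch

# Hypothetical sketch (not the DreamLLM implementation): cross-attention maps
# between learned dream-query embeddings and a U-Net latent feature map.

def query_attention_maps(latent, query_embeds, scale=None):
    """
    latent:       (B, C, H, W) U-Net latent features at one denoising timestep
    query_embeds: (B, Q, C)    conditional embeddings from the dream queries (Q = 64 here)
    returns:      (B, Q, H, W) one spatial attention map per query
    """
    B, C, H, W = latent.shape
    keys = latent.flatten(2).transpose(1, 2)                 # (B, H*W, C)
    scale = scale or C ** -0.5                               # standard dot-product scaling
    logits = query_embeds @ keys.transpose(1, 2) * scale     # (B, Q, H*W)
    attn = torch.softmax(logits, dim=-1)                     # normalize over spatial positions
    return attn.reshape(B, query_embeds.shape[1], H, W)

def averaged_maps(latents_per_timestep, query_embeds):
    """Average the per-query maps across all denoising timesteps, as in the figure."""
    maps = [query_attention_maps(z, query_embeds) for z in latents_per_timestep]
    return torch.stack(maps, dim=0).mean(dim=0)              # (B, Q, H, W)

# Example with dummy tensors (16x16 latent, 64 queries, channel dim 320 assumed):
latents = [torch.randn(1, 320, 16, 16) for _ in range(50)]   # one latent per timestep
queries = torch.randn(1, 64, 320)
maps = averaged_maps(latents, queries)                       # (1, 64, 16, 16)
```

The 64 resulting maps can then be tiled into an 8×8 grid to reproduce the layout described in the caption.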

Authors:

(1) Runpei Dong, Xi’an Jiaotong University and Internship at MEGVII;

(2) Chunrui Han, MEGVII Technology;

(3) Yuang Peng, Tsinghua University and Internship at MEGVII;

(4) Zekun Qi, Xi’an Jiaotong University and Internship at MEGVII;

(5) Zheng Ge, MEGVII Technology;

(6) Jinrong Yang, HUST and Internship at MEGVII;

(7) Liang Zhao, MEGVII Technology;

(8) Jianjian Sun, MEGVII Technology;

(9) Hongyu Zhou, MEGVII Technology;

(10) Haoran Wei, MEGVII Technology;

(11) Xiangwen Kong, MEGVII Technology;

(12) Xiangyu Zhang, MEGVII Technology and Project Leader;

(13) Kaisheng Ma, Tsinghua University and Corresponding Author;

(14) Li Yi, Tsinghua University, Corresponding Author and Project Leader.

