DoubleStreamBlock is a Transformer block used in modern diffusion models, in which multiple token sequences (for example, image and text tokens) are processed as separate streams. The only step the streams share is a self-attention over the concatenated sequences; everything else is independent per stream, so the streams could potentially be processed more efficiently in parallel (given enough compute).
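To make the structure concrete, here is a minimal NumPy sketch of such a block, not the implementation from any particular model. It assumes two streams with separate weights, and it leaves out the normalization and conditioning/modulation layers that real blocks typically include. The names `init_stream` and `double_stream_block` and the weight layout are hypothetical, chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_stream(rng, dim):
    # Hypothetical per-stream weights: QKV projection, output projection, MLP.
    s = 1.0 / np.sqrt(dim)
    return {
        "qkv": rng.normal(0.0, s, (dim, 3 * dim)),
        "proj": rng.normal(0.0, s, (dim, dim)),
        "mlp_in": rng.normal(0.0, s, (dim, 4 * dim)),
        "mlp_out": rng.normal(0.0, s, (4 * dim, dim)),
    }


def double_stream_block(img, txt, w_img, w_txt, num_heads):
    # img: (n_img, dim), txt: (n_txt, dim). Simplified: no norm, no modulation.
    n_txt, dim = txt.shape
    head_dim = dim // num_heads

    def qkv_heads(x, w):
        # Per-stream work: project to Q, K, V and split into heads.
        n = x.shape[0]
        qkv = (x @ w["qkv"]).reshape(n, 3, num_heads, head_dim)
        return qkv.transpose(1, 2, 0, 3)  # (3, heads, n, head_dim)

    q_i, k_i, v_i = qkv_heads(img, w_img)
    q_t, k_t, v_t = qkv_heads(txt, w_txt)

    # Shared step: one self-attention over the concatenated token sequences.
    q = np.concatenate([q_t, q_i], axis=1)
    k = np.concatenate([k_t, k_i], axis=1)
    v = np.concatenate([v_t, v_i], axis=1)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    attn = softmax(scores, axis=-1) @ v  # (heads, n_total, head_dim)
    attn = attn.transpose(1, 0, 2).reshape(-1, dim)
    txt_attn, img_attn = attn[:n_txt], attn[n_txt:]

    def finish(x, a, w):
        # Per-stream work: output projection, residual, MLP (ReLU for brevity).
        x = x + a @ w["proj"]
        h = np.maximum(x @ w["mlp_in"], 0.0)
        return x + h @ w["mlp_out"]

    # These two calls do not depend on each other.
    return finish(img, img_attn, w_img), finish(txt, txt_attn, w_txt)
```

In this sketch, `qkv_heads` and `finish` only ever touch one stream's tokens and weights, so those are the parts a fused or concurrent kernel could overlap. The concatenated attention in the middle is the one point where the streams have to synchronize.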
#Efficient DoubleStreamBlock kernel
Hey @minor palm, do you recommend any resources on this that I could look into?
is there already a team on this? would love to join efforts