Titan: New Architecture by Google | PauseAI

tulip grail Jan 15, 2025, 5:57 PM

https://arxiv.org/abs/2501.00663
This seems like a big step at first glance!

"We observe that our Titan architecture outperforms all modern recurrent models as well as their hybrid variants (combining with sliding-window attention) across a comprehensive set of benchmark"

arXiv.org

Titans: Learning to Memorize at Test Time

Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accur...

gulp!

nocturne relic Jan 15, 2025, 7:25 PM

Sounds like this could be big. It seems to perform better with larger models. But we've also seen some earlier architectures (like Samba) that didn't really scale that well in the end. I wonder how this performs on truly large models.

nocturne relic Jan 15, 2025, 10:24 PM

Who thinks this is part of the magic of Gemini's insane context length?

reef egret Jan 15, 2025, 10:51 PM

Quite possible

Man, we need good news

real bison Jan 15, 2025, 11:05 PM

nocturne relic Who thinks this is part of the magic of Gemini's insane context length?

Nearly impossible that it is.

tulip grail Jan 16, 2025, 10:05 AM

nocturne relic Who thinks this is part of the magic of Gemini's insane context length?

Yeah, the old record of 2M tokens mentioned in the paper belongs to the Gemini family. This is definitely a new approach.

nocturne relic Jan 17, 2025, 4:37 PM

tulip grail Yeah the old record of 2M tokens mentioned in the paper is the Gemini family. Th...

Kind of wild that they'd publish this research if it is potentially so consequential for performance.

tulip grail Jan 17, 2025, 6:01 PM

Yeah, that's been bugging me too. But then again, they also released the Transformer paper. There you could argue they weren't aware of its importance, which shouldn't apply now.

tough radish Jan 17, 2025, 8:39 PM

Google isn't a single coherent organization. Teams can operate with various degrees of autonomy, so unless there is an order from above stating that specific types of research cannot be published, research teams will want to publish.
