RWKV-papers | EleutherAI | Page 11

willow condor Apr 12, 2025, 3:33 PM

#

Is there a new RWKV paper being written? If so, is it possible for me to contribute?

rose mango Apr 12, 2025, 5:34 PM

#

RWKV 7 paper is already complete

willow condor Apr 13, 2025, 8:06 AM

#

rose mango RWKV 7 paper is already complete

i mean what about rwkv 8? is that going to happen in the next year?

obsidian quest Apr 13, 2025, 10:50 AM

#

yeah rwkv8 paper = early next year

fresh mulch Apr 13, 2025, 3:07 PM

#

1 year release cycle sounds chill

gusty condor Apr 14, 2025, 1:53 AM

#

11-month release cycle:
RWKV-4: 2305.xxxx
RWKV-5/6: 2404.xxxx
RWKV-7: 2503.xxxx
RWKV-8: 2602.xxxx?

rose mango Apr 14, 2025, 2:32 AM

#

Personally I like this almost-a-year-but-not-quite release cycle

iron parrot Apr 15, 2025, 4:03 AM

#

Something big is coming

acoustic knoll Apr 15, 2025, 5:50 AM

#

iron parrot Something big is coming

From openai?

gusty condor Apr 15, 2025, 6:16 AM

#

From RWKV for sure!

void quartz Apr 15, 2025, 7:26 AM

#

rose mango Personally I like this almost-a-year-but-not-quite release cycle

honestly it feels like less than 11 months

#

oh, because 5/6 were merged together in 1 paper... definitely felt like less than 11 months per version haha

obsidian quest Apr 15, 2025, 9:12 AM

#

are they using hybrid / rnn 😂

willow condor Apr 16, 2025, 12:46 AM

#

It's possible that this just helps focus attention: maybe the model expects the instructions to be at either position, and this reduces the chance that the model will "miss" them.

#

Also the "above" context makes sense not just in the case of RNNs: causal attention can only attend to previous tokens, so having instructions at the beginning makes the difference between the model holding a superposition of possible instructions vs. instruction information propagating from the start.

#

crazy if true, hope they open source 4.1

gusty condor Apr 16, 2025, 3:21 PM

#

I believe they are using hybrid

obsidian quest Apr 17, 2025, 5:32 AM

#

obsidian quest are they using hybrid / rnn 😂

gpt4o likely already hybrid

young sparrow Apr 17, 2025, 7:03 AM

#

gusty condor I believe they are using hybrid

What evidence do you have for this?

gusty condor Apr 17, 2025, 7:37 AM

#

young sparrow What evidence do you have for this?

See Blink's evidence

quaint quiver Apr 17, 2025, 9:03 AM

#

obsidian quest gpt4o likely already hybrid

if anything, it's most likely a local-global hybrid

obsidian quest Apr 17, 2025, 9:17 AM

#

quaint quiver if anything, it's most likely a local-global hybrid

true

wraith heron Apr 17, 2025, 4:29 PM

#

so just something like sliding window attention?
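For context on the terms: a "local-global hybrid" usually means interleaving sliding-window (local) causal attention layers with full (global) causal attention layers. A minimal sketch of the two masks and a layer plan, assuming an illustrative window size and a hypothetical 1-in-N global schedule; nothing here reflects how GPT-4o is actually built.

```python
def causal_mask(n):
    """Global causal mask: position i may attend to every j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def sliding_window_mask(n, window):
    """Local causal mask: position i may attend to the `window` positions j in (i - window, i]."""
    return [[i - window < j <= i for j in range(n)] for i in range(n)]


def hybrid_schedule(num_layers, global_every):
    """Hypothetical layer plan: every `global_every`-th layer is global, the rest are local."""
    return ["global" if (layer + 1) % global_every == 0 else "local"
            for layer in range(num_layers)]


if __name__ == "__main__":
    for row in sliding_window_mask(5, window=2):
        print("".join("x" if allowed else "." for allowed in row))
    print(hybrid_schedule(6, global_every=3))
```

Local layers keep per-token cost bounded by the window, while the occasional global layer still lets information travel across the whole context.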

young sparrow Apr 17, 2025, 5:25 PM

#

gusty condor See Blink's evidence

This is not meaningful evidence.

gusty condor Apr 18, 2025, 1:40 AM

#

Just some speculation.
We are not writing an article, just chatting, so no need to ask for evidence for every claim.

obsidian quest Apr 19, 2025, 1:36 PM

#

https://x.com/BlinkDL_AI/status/1913587335087964262

BlinkDL (@BlinkDL_AI) on X

RWKV-7 inference by hand in Excel: https://t.co/7Iw2jiP85F🙂similar to @ProfTomYeh style

obsidian quest Apr 29, 2025, 5:16 PM

#

lets test #off-topic message

young sparrow May 11, 2025, 1:30 AM

#

@obsidian quest @void quartz I'm trying to track down Guanyu Song, do you have his email address / contact details?

void quartz May 11, 2025, 2:31 AM

#

young sparrow @obsidian quest @void quartz I'm trying to track down Guanyu Song...

Do you happen to know the discord handle? is it @steady ether ?

void quartz May 11, 2025, 2:31 AM

#

young sparrow @obsidian quest @void quartz I'm trying to track down Guanyu Song...

@misty igloo

misty igloo May 11, 2025, 2:32 AM

#

young sparrow @obsidian quest @void quartz I'm trying to track down Guanyu Song...

He is @steady ether

gusty condor May 11, 2025, 3:28 AM

#

He is Guangyu, not Guanyu

misty igloo May 11, 2025, 4:15 AM

#

(I already DM'd Stella the contact info)

gusty condor May 20, 2025, 2:45 AM

#

https://colmweb.org/dates.html
RWKV-7 paper rebuttal will start next week. Stay tuned

misty igloo May 20, 2025, 3:18 AM

#

ughhh not fun timing for me

gusty condor May 20, 2025, 5:10 AM

#

misty igloo ughhh not fun timing for me

I will do.

misty igloo May 20, 2025, 10:47 AM

#

gusty condor I will do.

thanks! I will be able to work on it too, I just have to move (physically) and do the rebuttal for RADLADS at the same time 🙂

obsidian quest May 26, 2025, 1:54 PM

#

https://x.com/BlinkDL_AI/status/1926941496684519805

BlinkDL (@BlinkDL_AI)

RWKV-8 "Heron" preview (1) - DeepEmbed. Seems Gemma3n is trying similar tricks (Per-Layer Embedding), so I will discuss it first 🪶 It's essentially free performance - lots of params, but can be offloaded to RAM/SSD, and simple to train and deploy🚀

gusty condor May 27, 2025, 9:34 AM

#

The reviews will be out in a few hours.
I strongly believe that our paper will get high scores
(If I were the reviewer, I would give a score between "good accept" and "highlight")

obsidian quest May 27, 2025, 11:32 AM

#

https://github.com/Benjamin-Walker/structured-linear-cdes let's fix their rwkv7 result

gusty condor May 27, 2025, 1:56 PM

#

Reviews are out!
Score: 6,8,8,8

#

That probably means 100% accept and top 2% of all papers and top 5% of the accepted papers

fresh mulch May 27, 2025, 2:24 PM

#

rebuttals seem pretty easy too, mostly clarifications and a few ablations

gusty condor May 27, 2025, 2:25 PM

#

fresh mulch rebuttals seem pretty easy too, mostly clarifications and a few ablations

We have them in the appendix

tropic minnow May 27, 2025, 3:48 PM

#

gusty condor Reviews are out! Score: 6,8,8,8

good job guys 🥂

misty igloo May 27, 2025, 9:03 PM

#

Yeah I think we're solid here and mostly need to point out the existing ablation studies in the appendix to reviewer UGzf to address their concern #1
Fantastic job, everybody! I knew the paper was really solid, but it's still nice to see it confirmed via double blind review 🤣

dawn pewter May 27, 2025, 9:13 PM

#

good job!

fresh mulch May 27, 2025, 9:15 PM

#

misty igloo Yeah I think we're solid here and mostly need to point out the existing ablation...

likewise for reviewer QvXt concern #2, but it seems like they may want even more

misty igloo May 27, 2025, 9:22 PM

#

~~I feel like there's no way to address #2~~ sorry, I didn't see you meant the other reviewer

misty igloo May 27, 2025, 9:23 PM

#

fresh mulch likewise for reviewer QvXt concern #2, but it seems like they may want even more

yeah but QvXt gave us an 8 already

#

the only score we really need to improve is UGzf, and even if they keep it we're still fine

fresh mulch May 27, 2025, 9:25 PM

#

oh fair enough

gusty condor May 28, 2025, 2:25 AM

#

misty igloo the only score we really need to improve is UGzf, and even if they keep it we're...

The reviewers seemed not to have read the paper carefully

misty igloo May 28, 2025, 2:25 AM

#

gusty condor The reviewers seemed not to have read the paper carefully

it's a long paper with a million appendices

#

I agree

gusty condor May 28, 2025, 2:29 AM

#

I think reviewer GjhW's questions are solid and should be addressed first

misty igloo May 28, 2025, 2:52 AM

#

gusty condor I think reviewer GjhW's questions are solid and should be addressed first

I've created a google doc where we can work on the responses https://docs.google.com/document/d/17BCQcG5gH28fqmipBxjM2Z-vlf37I6YlJfE6776gSqk/edit?usp=sharing

Google Docs

RWKV-7 COLM Response

gusty condor May 28, 2025, 3:02 AM

#

@bronze frost I think we need some clarification on kernel speed benchmarks. Could you please test kernels on specific settings encountered in training, such as (12,64,4096)?

tropic minnow May 28, 2025, 12:05 PM

#

i updated figure 3 for both the arxiv and colm papers, as requested by 1 reviewer

#

with or without the black rectangle ?

fresh mulch May 28, 2025, 12:21 PM

#

i prefer without unless you can add "see insert" text or the like

#

"see left" i guess

tropic minnow May 28, 2025, 12:33 PM

#

then without, more similar to the original and addresses the points from reviewer

obsidian quest May 29, 2025, 11:06 AM

#

https://github.com/HazyResearch/zoology
certainly a wrong implementation of rwkv7. lets fix it 🙂

GitHub

GitHub - HazyResearch/zoology: Understand and test language model a...

Understand and test language model architectures on synthetic tasks. - HazyResearch/zoology

#

@gusty condor @steady ether

fresh mulch May 29, 2025, 7:27 PM

#

gusty condor That probably means 100% accept and top 2% of all papers and top 5% of the accep...

corroborated by data 🎉 https://bsky.app/profile/colmweb.org/post/3lq6acxagpk2w

Conference on Language Modeling (@colmweb.org)

Here are some graphs

obsidian quest May 30, 2025, 7:55 PM

#

obsidian quest https://github.com/HazyResearch/zoology certainly a wrong implementation of rwkv...

https://github.com/HazyResearch/zoology/issues/34

Firstly, RWKV-7 state_size is exaggerated to 4x and 16x for d_model=128 and d_model=256

GitHub

Incorrect RWKV-7 state_size computation · Issue #34 · HazyResearc...

Here in state_size function: https://github.com/HazyResearch/zoology/blob/main/zoology/mixers/rwkv7.py should be self.num_heads * self.head_dim* self.head_dim So RWKV-7 state_size is exaggerated to...
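The fix in the issue can be illustrated with a short sketch. RWKV-7 keeps one head_dim × head_dim matrix state per head, so the per-layer matrix state is num_heads * head_dim * head_dim numbers, as the issue says. The head_dim of 64 and the 0.1b configuration used below (12 layers, d_model=768) are assumptions for illustration; this is not zoology's actual code, and smaller per-layer vectors such as the token-shift state are ignored.

```python
def rwkv7_state_size(d_model, head_dim=64, num_layers=1):
    """Numbers in the RWKV-7 matrix state: one head_dim x head_dim matrix per head per layer.

    head_dim=64 is an assumed default; token-shift vectors are not counted.
    """
    if d_model % head_dim != 0:
        raise ValueError("d_model must be divisible by head_dim")
    num_heads = d_model // head_dim
    return num_layers * num_heads * head_dim * head_dim


if __name__ == "__main__":
    print(rwkv7_state_size(128))                 # 2 heads -> 8192 per layer
    print(rwkv7_state_size(256))                 # 4 heads -> 16384 per layer
    print(rwkv7_state_size(768, num_layers=12))  # assumed 0.1b config -> 12 * 12 * 64 * 64
```

Under those assumptions the 0.1b total is 589824 = 768 * 768, the same constant that appears in the state-size line of the 0.1b parameter counts posted later in this channel.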

gusty condor Jun 1, 2025, 7:07 AM

#

https://arxiv.org/pdf/2505.23735
They got the RWKV-7 formula wrong.

obsidian quest Jun 1, 2025, 8:16 AM

#

gusty condor https://arxiv.org/pdf/2505.23735 They got the RWKV-7 formula wrong.

and they are using the wrong RWKV-7 state_size from zoology in figure 7

gusty condor Jun 1, 2025, 12:40 PM

#

All done. I think it's time to reply to the reviewers.

misty igloo Jul 8, 2025, 3:45 AM

#

The RWKV-7 paper got accepted, and so did RADLADS 🙂

#

Great job, everyone!

young sparrow Jul 8, 2025, 5:58 AM

#

Amazing work!

misty igloo Jul 8, 2025, 1:49 PM

#

The other funny thing is that it seems that the lowest score review was disregarded due to review quality, so we ended up with maybe the equivalent of an amazing 8,8,8 for scores!

fresh mulch Jul 8, 2025, 1:59 PM

#

disregarded due to review quality? huh

#

damn

void quartz Jul 13, 2025, 5:04 AM

#

Was it an ai generated review?

#

berk

fresh mulch Jul 20, 2025, 10:21 PM

#

so how is the camera ready coming along? (is it at all?)

gusty condor Jul 21, 2025, 12:45 AM

#

Not hurrying, deadline is August 7, 2025.

tropic minnow Jul 29, 2025, 1:26 PM

#

@iron parrot do you have the pg19 code used to run the models and produce the plots in the eagle and finch paper?

iron parrot Jul 29, 2025, 4:10 PM

#

tropic minnow @iron parrot do you have the pg19 code used to run the models and produ...

I believe Smerky ran the pg19 test for the Eagle and Finch paper, while I handled the one for the Goose paper.
I have the code to reproduce the results from the Goose paper. Do you need it?

tropic minnow Jul 29, 2025, 4:40 PM

#

yup thx

iron parrot Jul 29, 2025, 6:42 PM

#

use eval.py to test the model and plot.py to visualize the test results

📎 loss.zip

obsidian quest Aug 1, 2025, 5:08 AM

#

we can work on a rwkv7s paper

seamless upgrading rwkv7-g1a 0.1b
+de (orange) vs +de+dea (blue, a bit difficult for a trained model to utilize in the beginning, then works)

tropic minnow Aug 1, 2025, 11:39 AM

#

obsidian quest we can work on a rwkv7s paper seamless upgrading rwkv7-g1a 0.1b +de (orange) vs...

x axis is billion tokens?

obsidian quest Aug 1, 2025, 11:41 AM

#

yes

misty igloo Aug 1, 2025, 3:35 PM

#

obsidian quest we can work on a rwkv7s paper seamless upgrading rwkv7-g1a 0.1b +de (orange) vs...

DE is basically a form of hash-routing MoE, and I think we can make it a lot more efficient; see my comment in the RWKV Discord #1109810049607532555 message
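To make the hash-routing MoE view concrete: a per-layer embedding table indexed by token id behaves like a mixture of experts whose router is a fixed hash (here simply the token id), so each token reads exactly one row per layer however large the table grows. A minimal sketch with tiny hypothetical sizes; the elementwise gating of the FFN output is an assumption, since how RWKV-8 DeepEmbed actually injects the looked-up vector is not spelled out in this thread.

```python
import random


class DeepEmbedSketch:
    """Hypothetical per-layer token-indexed tables, viewed as hash-routed experts."""

    def __init__(self, vocab_size, d_model, num_layers, seed=0):
        rng = random.Random(seed)
        # tables[layer][token_id] is one d_model-sized "expert" vector.
        self.tables = [
            [[1.0 + 0.01 * rng.uniform(-1, 1) for _ in range(d_model)]
             for _ in range(vocab_size)]
            for _ in range(num_layers)
        ]
        self.rows_read = 0

    def route(self, token_id):
        """Router = fixed hash of the token id (identity here); nothing is learned."""
        return token_id

    def apply(self, layer, token_id, ffn_out):
        """Gate an FFN output elementwise by the looked-up row (an assumed design)."""
        row = self.tables[layer][self.route(token_id)]
        self.rows_read += 1
        return [h * e for h, e in zip(ffn_out, row)]

    def total_params(self):
        return sum(len(table) * len(table[0]) for table in self.tables)

    def params_read_per_token(self):
        return len(self.tables) * len(self.tables[0][0])


if __name__ == "__main__":
    de = DeepEmbedSketch(vocab_size=100, d_model=8, num_layers=3)
    print(de.total_params(), de.params_read_per_token())  # 2400 24
```

Total parameters grow with vocab_size, while the numbers read per token depend only on num_layers * d_model, which is why such tables can sit in RAM/SSD as the DeepEmbed preview tweet above suggests.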

gusty condor Aug 3, 2025, 1:33 AM

#

obsidian quest we can work on a rwkv7s paper seamless upgrading rwkv7-g1a 0.1b +de (orange) vs...

You have not unveiled the newest design of DE and DEA

gusty condor Aug 4, 2025, 2:16 AM

#

gusty condor Not hurrying, deadline is August 7, 2025.

4 days to go!

misty igloo Aug 4, 2025, 5:53 PM

#

gusty condor 4 days to go!

I'm editing the radlads paper atm but I should be done with that and can start on the rwkv paper soon too

#

We need to make some changes we promised the reviewers we'd make, and reorg a bit for the authors and the new 10-page limit

misty igloo Aug 5, 2025, 12:54 AM

#

did a first pass on all that...

obsidian quest Aug 5, 2025, 7:10 PM

#

```
total 191.034624 M
activated 140.702976 M
read 768 numbers per token (embed)

rwkv7+DE 0.1b
total 997.852416 M
activated 142.226688 M
read 13056 numbers per token (embed+DE)

rwkv7+DE+DEA 0.1b
total 1806.753024 M
activated 145.833216 M
read 25344 numbers per token (embed+DE+DEA)
state size = 589824 + 768 * ctxlen = 768 * (768 + ctxlen)
```

misty igloo Aug 5, 2025, 10:08 PM

#

@here Anyone know if there is anything preventing us from submitting the camera ready to COLM? If there is, please let me know so we can fix it!

rose mango Aug 5, 2025, 10:12 PM

#

don't think @here works, gonna have to ping individually

misty igloo Aug 5, 2025, 11:29 PM

#

Okay, well I have submitted the current version as our camera ready. I'll attach the pdf here (but if you're an author you should be able to obtain it from openreview, too!). If people think there are issues please bring them to my attention immediately!

#

📎 RWKV_7_CoLM_cameraready_v1.pdf

#

@fresh mulch @gusty condor @iron parrot @obsidian quest @rose mango @tropic minnow @brisk bronze @paper dove @sonic horizon @steady ether @crystal hull @hushed orchid @bronze frost @keen tartan sorry, not sure what other author handles I'm missing - please make sure everything is cool with the pdf above, or look at the current version via your openreview account. Final camera ready submission ends Thursday.

fresh mulch Aug 5, 2025, 11:47 PM

#

what's with the gap here? i assume this is the superscript b but it's way more noticeable in the third section than the fourth

#

seeing as we have some space left i wonder whether we can bring figure 7 back up next to figure 4: they were once next to each other and we moved it to the appendix on space concerns

#

minor nits though generally lgtm

gusty condor Aug 6, 2025, 1:52 AM

#

#

I would like to use the extra space left to state the theorems in the main pages.

fresh mulch Aug 6, 2025, 1:57 AM

#

This is probably a better use of the space actually ^

gusty condor Aug 6, 2025, 2:10 AM

#

fresh mulch This is probably a better use of the space actually ^

Two lines, actually

misty igloo Aug 6, 2025, 2:41 AM

#

gusty condor

good catch!

tropic minnow Aug 6, 2025, 7:53 AM

#

do we want to include references to G0 here?

misty igloo Aug 6, 2025, 2:54 PM

#

tropic minnow do we want to include references to `G0` here?

not sure... I guess the problem is we can't really include all that stuff in the paper
and yet, it probably won't go in any other paper until RWKV-8

obsidian quest Aug 6, 2025, 3:23 PM

#

there could be a paper on using state tuning for RLHF and RLVR

gusty condor Aug 7, 2025, 6:51 AM

#

obsidian quest there could be a paper on using state tuning for RLHF and RLVR

Then there should be RWKV fast batch inference and HF-compatible API 🙂 (No FLA)

mint merlin Aug 8, 2025, 6:27 AM

#

are there any thoughts on using RWKV for vision? There is a recent Vision-RWKV paper that got an ICLR spotlight, but i assume it was an independent endeavor by its authors and not from here. Any thoughts on improving it, or on a novel direction in the vision domain?

obsidian quest Aug 8, 2025, 7:28 AM

#

mint merlin are there any thoughts on using RWKV for vision?, there is a recent paper that g...

please check https://rwkv.com for 100+ papers 🙂

RWKV Language Model

The RWKV Language Model

misty igloo Sep 11, 2025, 2:00 AM

#

Hmm RADLADS and RWKV-7 are both in poster session 4. @tropic minnow and I could probably split those up since we're both on both papers, but who else is going to COLM who was an author on RWKV-7?

fresh mulch Sep 11, 2025, 3:11 AM

#

I am, but am not sure how much I would be able to contribute

tropic minnow Sep 11, 2025, 6:34 AM

#

misty igloo Hmm RADLADS and RWKV-7 are both in poster session 4. @tropic minnow and I...

Maybe we could negotiate with the org to have both of them placed next to each other? lol

misty igloo Sep 11, 2025, 3:46 PM

#

tropic minnow Maybe we could negotiate with the org to have both of them placed next to each o...

Sounds good - could you reach out to them to ask?

#

I don't want either of us to have to miss out on either poster!

fresh mulch Sep 11, 2025, 5:39 PM

#

Could even ask to switch poster sessions at that rate

gusty condor Sep 12, 2025, 7:09 AM

#

Good!
Sadly, I cannot go to COLM. It does not make sense for me to travel 10,000 km and skip two classes for an NLP conference, given that I am a math student. My advisor doesn't think it makes sense either.

misty igloo Sep 12, 2025, 4:40 PM

#

gusty condor Good! Sadly, I cannot go to COLM. It does not make sense for me to travel 10,000 ...

Yeah it's a lot of travel. I'm sad that you won't be there though - it would have been fun to meet up in person!

obsidian quest Sep 15, 2025, 2:09 PM

#

https://x.com/BlinkDL_AI/status/1967573927468917012

BlinkDL (@BlinkDL_AI)

As RNNs start to gain momentum, I will share a framework to improve RNNs (RWKV-8 and beyond), or, how to write 100 architecture papers 🙂 Hope it could be useful for researchers interested in the field: https://t.co/UdmOSudvu0

willow condor Sep 17, 2025, 10:27 AM

#

obsidian quest https://x.com/BlinkDL_AI/status/1967573927468917012

Thanks for sharing such a comprehensive framework with the community!

However, I don't understand why a larger state is an improvement, as it's just moving along the Pareto frontier.

obsidian quest Sep 17, 2025, 10:35 AM

#

willow condor Thanks for sharing such a comprehensive framework with the community! However, ...

yes i won't do this, but most researchers seem to like it, such as headsz=256

misty igloo Sep 17, 2025, 1:29 PM

#

Larger state is cool if it's dynamically allocable

obsidian quest Sep 17, 2025, 6:52 PM

#

misty igloo Larger state is cool if it's dynamically allocable

and can combine 1 + 2

fresh mulch Oct 1, 2025, 1:29 PM

#

have we got a poster ready?

misty igloo Oct 1, 2025, 1:52 PM

#

fresh mulch have we got a poster ready?

Not finished yet, I'll send you a link if you'd like to help work on it!

misty igloo Oct 2, 2025, 10:21 PM

#

@obsidian quest I'm not sure if we can fit it, but if we show any newer RWKV-7 results or DE/DEA preview on the COLM poster, what would you like those to be?

obsidian quest Oct 3, 2025, 3:19 AM

#

misty igloo @obsidian quest I'm not sure if we can fit it, but if we show any newer RW...

lets show G1a results first?

misty igloo Oct 8, 2025, 12:52 PM

#

RWKV-7 Poster presentation at COLM went great last night. A bunch of people were excited and told me it was the best thing they'd seen at the conference!

#

The RADLADS poster session is tonight.

young sparrow Oct 8, 2025, 3:11 PM

#

misty igloo RWKV-7 Poster presentation at COLM went great last night. A bunch of people were...

That's awesome feedback! I'm quite curious what the interest level in the complexity theory stuff was

misty igloo Oct 8, 2025, 3:11 PM

#

Mixed... People I met initially didn't believe it and then got convinced

#

I put a short proof sketch on the poster

young sparrow Oct 8, 2025, 3:12 PM

#

That's curious because I was thinking mostly "did people care"

#

Did you get more theory-oriented people showing up?

misty igloo Oct 8, 2025, 3:12 PM

#

Some people said well obviously transformers are more expressive

#

So I got to retort with "we prove the opposite"

misty igloo Oct 8, 2025, 3:12 PM

#

young sparrow Did you get more theory-oriented people showing up?

Not really no

#

There was one group who had some stuff on the topic in a new paper but I didn't catch which it is

young sparrow Oct 8, 2025, 3:16 PM

#

misty igloo Some people said well obviously transformers are more expressive

Ah that's more what I expected

obsidian quest Oct 8, 2025, 3:45 PM

#

misty igloo RWKV-7 Poster presentation at COLM went great last night. A bunch of people were...

and RWKV-8 is the genuine transformer killer 🙂 https://x.com/BlinkDL_AI/status/1975922536492716103

BlinkDL (@BlinkDL_AI)

The new mechanism in RWKV-8 "Heron" 🪶 is named ROSA (acronym, note SA ≠ Self-Attention here) 🌹 ROSA is compromise-free: we get efficient, scalable, genuine infinite ctx, by applying some beautiful algorithms.

#

please mention this too https://x.com/BlinkDL_AI/status/1975946959715471656 RNN magic 🙂

BlinkDL (@BlinkDL_AI)

RWKV7 7.2B fp16 decoding on 5090 can reach 10000+ token/s now 🚀 (bsz960, and 9000+ token/s for bsz320). Always const speed & VRAM because it's RNN. Try https://t.co/E8cfZH64nO in https://t.co/oMxIrwVVEN

young sparrow Oct 8, 2025, 3:55 PM

#

obsidian quest and RWKV-8 is the genuine transformer killer 🙂 https://x.com/BlinkDL_AI/status/...

Let's do a real scaling laws suite for this architecture

willow condor Oct 8, 2025, 11:15 PM

#

obsidian quest and RWKV-8 is the genuine transformer killer 🙂 https://x.com/BlinkDL_AI/status/...

Retrieval Oriented/Optimized State/Slot/Sparse Attention

#

given the timing likely related to DeepSeek V3.2 Attn

obsidian quest Oct 9, 2025, 2:20 PM

#

and https://x.com/BlinkDL_AI/status/1976290168035512765

BlinkDL (@BlinkDL_AI)

RWKV-7 G1a 2.9B more evals: https://t.co/X2R2f6EeRB MMLU Pro 42% (+CoT), GSM8K 77%, MATH 50%. Note this is a base model, no mid-training, no post-training. I just add everything to pretraining dataset.

nova frost Oct 9, 2025, 2:57 PM

#

obsidian quest and https://x.com/BlinkDL_AI/status/1976290168035512765

what do you mean by adding everything to the pre-training dataset 😅

obsidian quest Oct 9, 2025, 3:00 PM

#

nova frost what do you mean by adding everything to the pre-training dataset 😅

reasoning/instruction/chat data, not test set 🙂

nova frost Oct 9, 2025, 3:09 PM

#

obsidian quest reasoning/instruction/chat data, not test set 🙂

ah makes sense

obsidian quest Oct 10, 2025, 8:03 AM

#

obsidian quest and https://x.com/BlinkDL_AI/status/1976290168035512765

obsidian quest Oct 10, 2025, 10:26 AM

#

willow condor Retrieval Oriented/Optimized State/Slot/Sparse Attention

all 4 words wrong 😂

weak urchin Oct 10, 2025, 6:07 PM

#

is anyone from this group who went to COLM gonna write a blog post afterwards?

willow condor Oct 11, 2025, 1:43 AM

#

obsidian quest all 4 words wrong 😂

now if i had to guess I’d say it's just Python simulating an induction head.

#

but i don't know the acronym

obsidian quest Oct 11, 2025, 7:30 AM

#

https://x.com/BlinkDL_AI/status/1976912771985146184

BlinkDL (@BlinkDL_AI)

RWKV-8 ROSA 🌹 mechanism: neurosymbolic infinite-range lossless information propagator beyond attention, enabling LLMs to invent their own inner monologue languages. First step towards scalable post-neural methods, for a new era in AI 🌌

obsidian quest Oct 11, 2025, 7:30 AM

#

willow condor now if i had to guess I’d say it's just Python simulating an induction head.

you are correct 🙂 the key is to make it work for more scenarios
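As a reference point for the induction-head reading: a minimal sketch, assuming the discrete behaviour is "for each position, find the earlier position whose context shares the longest suffix with the current context, and emit the token that followed it". This is a naive quadratic scan written for clarity, not BlinkDL's ROSA code, and breaking ties toward the most recent match is also an assumption.

```python
def match_len(x, j, t):
    """Length of the longest common suffix of x[:j+1] and x[:t+1] (requires j < t)."""
    k = 0
    while j - k >= 0 and x[j - k] == x[t - k]:
        k += 1
    return k


def induction_lookup(x, empty=-1):
    """For each t, copy the token after the earlier position with the longest suffix match."""
    out = []
    for t in range(len(x)):
        best_len, best_j = 0, None
        for j in range(t):
            length = match_len(x, j, t)
            if length >= best_len and length > 0:  # >= keeps the most recent j on ties
                best_len, best_j = length, j
        out.append(x[best_j + 1] if best_j is not None else empty)
    return out


if __name__ == "__main__":
    print(induction_lookup([1, 2, 3, 1, 2]))  # [-1, -1, -1, 2, 3]
```

At the last position the context "1 2" was seen before, followed by 3, so 3 is propagated; positions with no earlier match emit the sentinel.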

tropic minnow Oct 12, 2025, 11:12 AM

#

obsidian quest https://x.com/BlinkDL_AI/status/1976912771985146184

Hi guys, we were just thinking: in the general case, even a soft version could be built, which could be the partition function where prefix length becomes the energy level, so it is a weighted average instead of a single option, and it would recover the current discrete formulation at temperature=0

#

This shows there is a nice connection between attention and this variant; where attention would do inner product as the similarity function (and one would hope this becomes contextual over the layers) whereas we would do directly on token identities and subsequences

#

Does anyone want to write a short paper on this connection and maybe try a few experiments (language modelling or synthetic tasks)?
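The soft variant proposed above can be sketched as a Boltzmann weighting over earlier positions, with match length playing the role of (negative) energy: each earlier position j with a nonzero suffix match contributes the token that followed it, weighted by exp(len_j / T). As T → 0 the mass concentrates on the longest matches; unlike a hard lookup with a tie-breaking rule, exact ties stay split. The naive quadratic scan, the temperature parameterisation and ignoring zero-length matches are all illustrative assumptions, not a proposed design.

```python
import math
from collections import defaultdict


def match_len(x, j, t):
    """Length of the longest common suffix of x[:j+1] and x[:t+1] (requires j < t)."""
    k = 0
    while j - k >= 0 and x[j - k] == x[t - k]:
        k += 1
    return k


def soft_lookup(x, t, temperature):
    """Distribution over propagated tokens at position t, weighted by exp(match_len / T)."""
    lengths = {j: match_len(x, j, t) for j in range(t)}
    lengths = {j: n for j, n in lengths.items() if n > 0}
    if not lengths:
        return {}
    top = max(lengths.values())  # subtract the max match length for numerical stability
    probs = defaultdict(float)
    for j, n in lengths.items():
        probs[x[j + 1]] += math.exp((n - top) / temperature)
    z = sum(probs.values())
    return {tok: w / z for tok, w in probs.items()}


if __name__ == "__main__":
    seq = [5, 1, 7, 2, 1, 8, 2, 1]
    print(soft_lookup(seq, 7, temperature=1.0))   # {7: ~0.269, 8: ~0.731}
    print(soft_lookup(seq, 7, temperature=1e-3))  # nearly all mass on 8
```

In the example, "2 1" (match length 2) was followed by 8 and "1" alone (match length 1) was followed by 7, so at T=1 token 8 gets weight 1 and token 7 gets weight e^-1 before normalisation.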

obsidian quest Oct 12, 2025, 1:20 PM

#

another Q is the expressivity of RWKV7+ROSA => will it be practically (not limited by float precision) Turing complete, if we allow some CoT 🙂

misty igloo Oct 12, 2025, 6:50 PM

#

I had also drawn the comparison to a discretized version of linear attention here, maybe also helps think about how these discrete methods relate to continuous ones

#

@tropic minnow

rose mango Oct 12, 2025, 6:51 PM

#

tropic minnow Does anyone want to write a short paper on this connection and maybe try a few e...

I'm definitely up to do some experiments

obsidian quest Oct 12, 2025, 9:44 PM

#

https://discord.com/channels/992359628979568762/1426889957221466153

obsidian quest Oct 13, 2025, 5:54 PM

#

https://x.com/BlinkDL_AI/status/1977795092321812880

BlinkDL (@BlinkDL_AI)

RWKV-8 ROSA: How to Train It? (Oct 13, 2025)

obsidian quest Oct 15, 2025, 8:10 AM

#

https://x.com/BlinkDL_AI/status/1978372669847347432

BlinkDL (@BlinkDL_AI)

RWKV8 ROSA training demo - the first serious neurosymbolic LM? for a new era in AI 🌌 Code: https://t.co/j0eFQDISvu

strange gazelle Oct 16, 2025, 7:48 PM

#

misty igloo I had also drawn the comparison to a discretized version of linear attention her...

Would the associative operator just be using the values from later tokens, and defaulting to earlier ones if empty?

misty igloo Oct 16, 2025, 7:50 PM

#

strange gazelle Would the associative operator just be using the values from later tokens, and d...

not sure what you mean about associative operator or earlier or later tokens... this just writes to and retrieves from individual slots in an array

#

pseudocode for recurrent version was shown, prefill/training would be implemented slightly differently

strange gazelle Oct 16, 2025, 7:57 PM

#

misty igloo not sure what you mean about associative operator or earlier or later tokens... ...

In the simplest case of linear attention, the associative operator would be adding the earlier and later matrices together. It lets you parallelize along the sequence dimension for training and prefill

#

This kind of thing is what I’m referring to: https://docs.jax.dev/en/latest/_autosummary/jax.lax.associative_scan.html
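For the linear-attention case described above, the state is a running sum of outer products, S_t = S_{t-1} + k_t v_t^T, and because matrix addition is associative the prefix states can be computed with a log-depth parallel scan instead of a sequential loop. A minimal pure-Python sketch using a Hillis-Steele inclusive scan, chosen for brevity (library implementations such as jax.lax.associative_scan may use a different schedule), with tiny hypothetical key/value vectors:

```python
def outer(k, v):
    """Outer product k v^T as a nested list."""
    return [[ki * vj for vj in v] for ki in k]


def mat_add(a, b):
    """Associative combine for linear attention: elementwise matrix sum."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def inclusive_scan(items, op):
    """Hillis-Steele inclusive scan: log2(n) rounds, each round parallelisable."""
    out = list(items)
    offset = 1
    while offset < len(out):
        # Each element combines with the one `offset` steps earlier (earlier operand on the left).
        out = [op(out[i - offset], out[i]) if i >= offset else out[i]
               for i in range(len(out))]
        offset *= 2
    return out


def sequential_states(keys, values):
    """Reference recurrence S_t = S_{t-1} + k_t v_t^T, starting from a zero matrix."""
    states, s = [], [[0.0] * len(values[0]) for _ in keys[0]]
    for k, v in zip(keys, values):
        s = mat_add(s, outer(k, v))
        states.append(s)
    return states


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    values = [[3.0], [1.0], [2.0]]
    parallel = inclusive_scan([outer(k, v) for k, v in zip(keys, values)], mat_add)
    print(parallel[-1])                                 # [[5.0], [4.0]]
    print(parallel == sequential_states(keys, values))  # True
```

Only associativity is needed for the scan to be correct, which is why any operator with that property (not just addition) can be parallelised the same way for training and prefill.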

misty igloo Oct 16, 2025, 8:26 PM

#

strange gazelle In the simplest case of linear attention, the associative operator would be addi...

sure, this is associative as well. Instead of addition, you could consider the operator set(a,b) = b if b>0 else a, applied to each int in a vector of ints, where zero is a special sentinel value meaning that argmax did not choose that slot
I think that's what you meant in your original message too
(but the idea was not to construct a real parallelizable machine, but rather to create a theoretical stepping stone to ROSA based on linear attention)
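The overwrite operator described above can be checked directly: it is associative, so the same kind of prefix scan used for linear attention applies to it. A minimal sketch, assuming non-negative integer slot values with 0 as the "slot not written" sentinel, as in the message; the Hillis-Steele scan schedule is chosen for brevity.

```python
from itertools import product


def overwrite(a, b):
    """Per-slot combine: take the later write b_i unless it is the empty sentinel 0."""
    return [bi if bi > 0 else ai for ai, bi in zip(a, b)]


def inclusive_scan(items, op):
    """Hillis-Steele inclusive scan (earlier operand on the left)."""
    out = list(items)
    offset = 1
    while offset < len(out):
        out = [op(out[i - offset], out[i]) if i >= offset else out[i]
               for i in range(len(out))]
        offset *= 2
    return out


def sequential_scan(items, op):
    """Reference left-to-right fold for comparison."""
    out = [items[0]]
    for item in items[1:]:
        out.append(op(out[-1], item))
    return out


if __name__ == "__main__":
    # Exhaustive associativity check on single-slot vectors with values 0..2.
    vals = [[v] for v in range(3)]
    assert all(overwrite(overwrite(a, b), c) == overwrite(a, overwrite(b, c))
               for a, b, c in product(vals, repeat=3))
    writes = [[0, 3, 0], [5, 0, 0], [0, 0, 7], [2, 0, 0]]
    print(inclusive_scan(writes, overwrite))  # [[0, 3, 0], [5, 3, 0], [5, 3, 7], [2, 3, 7]]
```

Each prefix holds the most recent nonzero write per slot, matching the recurrent "write to and retrieve from individual slots" picture.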

obsidian quest Oct 17, 2025, 5:27 PM

#

https://x.com/BlinkDL_AI/status/1979237513043791932

BlinkDL (@BlinkDL_AI)

RWKV8 ROSA 🌹 simply scales, producing mysterious new languages. Training small LMs soon 🙂 Code: https://t.co/j0eFQDJql2