#RWKV-papers

10179 messages · Page 11 of 11 (latest)

willow condor
#

Is there a new RWKV paper being written? If so, is it possible for me to contribute?

rose mango
#

RWKV 7 paper is already complete

willow condor
obsidian quest
#

yeah rwkv8 paper = early next year

fresh mulch
#

1 year release cycle sounds chill

gusty condor
#

11-month release cycle:
RWKV-4: 2305.xxxx
RWKV-5/6: 2404.xxxx
RWKV-7: 2503.xxxx
RWKV-8: 2602.xxxx?

rose mango
#

Personally I like this almost-a-year-but-not-quite release cycle

iron parrot
#

Something big is coming

acoustic knoll
gusty condor
#

From RWKV for sure!

void quartz
#

oh - because 5/6 was merged together in 1 paper.... definitely felt like less than 11 months per version haha

obsidian quest
#

are they using hybrid / rnn 😂

willow condor
#

It's possible that this just helps focus attention, because maybe the model expects the instructions to be at either position and this reduces the chance that the model will "miss"

#

Also the "above" context makes sense not just in the case of RNNs, because causal attention can only attend to previous tokens, so having instructions at the beginning will be the difference between the model having a superposition of possible instructions vs. instruction information propagating from the start.
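
A toy sketch of the causal-attention point, in case it helps (pure Python, no real model; `causal_mask` and `visible_positions` are made-up helper names): with a causal mask, an instruction at position 0 is visible to every later token, while an instruction at the end is only visible to the last position.

```python
def causal_mask(n: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    return [[j <= i for j in range(n)] for i in range(n)]


def visible_positions(mask: list[list[bool]], i: int) -> list[int]:
    """Key positions that query position i is allowed to look at."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


n = 6
mask = causal_mask(n)

# Instruction placed first (position 0): every position can attend to it.
instr_first_seen_by_all = all(mask[i][0] for i in range(n))

# Instruction placed last (position n-1): only the final position sees it.
instr_last_seen_by = [i for i in range(n) if mask[i][n - 1]]

print(instr_first_seen_by_all, instr_last_seen_by, visible_positions(mask, 2))
```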

#

crazy if true, hope they open source 4.1

gusty condor
#

I believe they are using hybrid

obsidian quest
young sparrow
gusty condor
quaint quiver
wraith heron
#

so just something like sliding window attention?

young sparrow
gusty condor
#

Just some speculation.
We are not writing an article, just chatting, so no need to ask for evidence for every claim.

obsidian quest
obsidian quest
#

lets test #off-topic message

young sparrow
#

@obsidian quest @void quartz I'm trying to track down Guanyu Song, do you have his email address / contact details?

void quartz
gusty condor
#

He is Guangyu, not Guanyu

misty igloo
#

(I already DM'd Stella contact info)

gusty condor
misty igloo
#

ughhh not fun timing for me

gusty condor
misty igloo
# gusty condor I will do.

thanks! I will be able to work on it too, I just have to move (physically) and do the rebuttal for RADLADS at the same time 🙂

obsidian quest
gusty condor
#

The reviews will be out in a few hours.
I strongly believe that our paper will get high scores
(If I were the reviewer, I would give a score between "good accept" and "highlight")

obsidian quest
gusty condor
#

Reviews are out!
Score: 6,8,8,8

#

That probably means 100% accept and top 2% of all papers and top 5% of the accepted papers

fresh mulch
#

rebuttals seem pretty easy too, mostly clarifications and a few ablations

gusty condor
tropic minnow
misty igloo
#

Yeah I think we're solid here and mostly need to point out the existing ablation studies in the appendix to reviewer UGzf to address their concern #1
Fantastic job, everybody! I knew the paper was really solid, but it's still nice to see it confirmed via double blind review 🤣

dawn pewter
#

good job!

fresh mulch
misty igloo
#

I feel like there's no way to address #2. Sorry, I didn't see you meant the other reviewer.

misty igloo
#

the only score we really need to improve is UGzf, and even if they keep it we're still fine

fresh mulch
#

oh fair enough

gusty condor
misty igloo
#

I agree

gusty condor
#

I think reviewer GjhW's questions are solid and should be addressed first

gusty condor
#

@bronze frost I think we need some clarification on kernel speed benchmarks. Could you please test kernels on specific settings encountered in training, such as (12,64,4096)?

tropic minnow
#

i updated figure3 for both arxiv and colm paper, as requested by 1 reviewer

#

with or without the black rectangle ?

fresh mulch
#

i prefer without unless you can add "see insert" text or the like

#

"see left" i guess

tropic minnow
#

then without, more similar to the original and addresses the points from reviewer

obsidian quest
#

@gusty condor @steady ether

obsidian quest
# obsidian quest https://github.com/HazyResearch/zoology certainly a wrong implementation of rwkv...

https://github.com/HazyResearch/zoology/issues/34

Firstly, RWKV-7 state_size is exaggerated to 4x and 16x for d_model=128 and d_model=256

GitHub

Here in state_size function: https://github.com/HazyResearch/zoology/blob/main/zoology/mixers/rwkv7.py should be self.num_heads * self.head_dim* self.head_dim So RWKV-7 state_size is exaggerated to...
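
A quick arithmetic sketch of the 4x / 16x claim. The correct formula is the one quoted in the issue (`num_heads * head_dim * head_dim`). Everything else here is an assumption for illustration: `HEAD_DIM = 64`, `num_heads = d_model // HEAD_DIM`, and the `hypothetical_overcount` formula, which is just one expression that reproduces those factors and is not taken from the zoology source.

```python
HEAD_DIM = 64  # ASSUMPTION: head size, not stated in the issue preview


def correct_state_size(num_heads: int, head_dim: int) -> int:
    """Per-layer state size as given in the issue: heads * head_dim * head_dim."""
    return num_heads * head_dim * head_dim


def hypothetical_overcount(num_heads: int, d_model: int) -> int:
    """ASSUMPTION: an illustrative wrong formula that uses d_model instead of head_dim."""
    return num_heads * d_model * d_model


ratios = {}
for d_model in (128, 256):
    num_heads = d_model // HEAD_DIM  # ASSUMPTION: heads derived from d_model
    good = correct_state_size(num_heads, HEAD_DIM)
    bad = hypothetical_overcount(num_heads, d_model)
    ratios[d_model] = bad // good
    print(f"d_model={d_model}: correct={good}, overcount={bad}, ratio={bad // good}x")
```

Under these assumptions the ratio is `(d_model / head_dim) ** 2`, which matches the 4x and 16x figures.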

gusty condor
obsidian quest
gusty condor
#

All done. I think it's time to reply to the reviewers.

misty igloo
#

The RWKV-7 paper got accepted, and so did RADLADS 🙂

#

Great job, everyone!

young sparrow
#

Amazing work!

misty igloo
#

The other funny thing is that it seems that the lowest score review was disregarded due to review quality, so we ended up with maybe the equivalent of an amazing 8,8,8 for scores!

fresh mulch
#

disregarded due to review quality? huh

#

damn

void quartz
#

Was it an ai generated review?

fresh mulch
#

so how is the camera ready coming along? (is it at all?)

gusty condor
#

Not hurrying, deadline is August 7, 2025.

tropic minnow
#

@iron parrot do you have the pg19 code used to run the models and produce the plots in eagle and finch paper

iron parrot
tropic minnow
#

yup thx

iron parrot
obsidian quest
#

we can work on a rwkv7s paper

seamless upgrading rwkv7-g1a 0.1b
+de (orange) vs +de+dea (blue, a bit difficult for a trained model to utilize in the beginning, then works)

obsidian quest
misty igloo
gusty condor
gusty condor
misty igloo
#

We need to make some changes to satisfy things we promised the reviewers we'd do, and reorg a bit for the authors and new 10p limit

misty igloo
#

did a first pass on all that...

obsidian quest
#

```
total 191.034624 M
activated 140.702976 M
read 768 numbers per token (embed)

rwkv7+DE 0.1b
total 997.852416 M
activated 142.226688 M
read 13056 numbers per token (embed+DE)

rwkv7+DE+DEA 0.1b
total 1806.753024 M
activated 145.833216 M
read 25344 numbers per token (embed+DE+DEA)
state size = 589824 + 768 * ctxlen = 768 * (768 + ctxlen)
```

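
A small sanity check of the numbers above, as a sketch only. It assumes `D_MODEL = 768` (inferred from "read 768 numbers per token (embed)"); the first entry's label is not shown in the log, so it is left unnamed here. `state_size` just evaluates the formula as written.

```python
D_MODEL = 768  # ASSUMPTION: inferred from "read 768 numbers per token (embed)"


def state_size(ctxlen: int, d_model: int = D_MODEL) -> int:
    """State size formula as posted: d_model * d_model + d_model * ctxlen."""
    return d_model * d_model + d_model * ctxlen


# name -> (total params in M, activated params in M, numbers read per token)
rows = {
    "(first entry, label not shown in the log)": (191.034624, 140.702976, 768),
    "rwkv7+DE 0.1b": (997.852416, 142.226688, 13056),
    "rwkv7+DE+DEA 0.1b": (1806.753024, 145.833216, 25344),
}

for name, (total_m, activated_m, read) in rows.items():
    share = activated_m / total_m          # fraction of parameters touched per token
    multiples = read // D_MODEL            # reads expressed in units of d_model
    print(f"{name}: activated/total = {share:.1%}, read = {multiples} x {D_MODEL}")

print(state_size(0), state_size(4096))
```

The fixed term 589824 is exactly 768 * 768, and the per-token reads come out as whole multiples of 768 (1x, 17x, 33x).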
misty igloo
#

@here Anyone know if there is anything preventing us from submitting the camera ready to COLM? If there is, please let me know so we can fix it!

rose mango
#

don't think @here works, gonna have to ping individually

misty igloo
#

Okay, well I have submitted the current version as our camera ready. I'll attach the pdf here (but if you're an author you should be able to obtain it from openreview, too!). If people think there are issues please bring them to my attention immediately!

#

@fresh mulch @gusty condor @iron parrot @obsidian quest @rose mango @tropic minnow @brisk bronze @paper dove @sonic horizon @steady ether @crystal hull @hushed orchid @bronze frost @keen tartan sorry, not sure what other author handles I'm missing - please make sure everything is cool with the pdf above, or look at the current version via your openreview account. Final camera ready submission ends Thursday.

fresh mulch
#

what's with the gap here? i assume this is the superscript b but it's way more noticeable in the third section than the fourth

#

seeing as we have some space left i wonder whether we can bring figure 7 back up next to figure 4: they were once next to each other and we moved it to the appendix on space concerns

#

minor nits though generally lgtm

gusty condor
#

I would like to state the theorems in the main pages, for the extra space left.

fresh mulch
#

This is probably a better use of the space actually ^

gusty condor
misty igloo
tropic minnow
#

do we want to include references to G0 here?

misty igloo
obsidian quest
#

there could be a paper on using state tuning for RLHF and RLVR

gusty condor
mint merlin
#

Are there any thoughts on using RWKV for vision? There is a recent paper that got an ICLR spotlight for Vision-RWKV, but I assume it was an independent endeavor by the authors and not from here. Any thoughts on improving it or on a novel direction in the vision domain?

obsidian quest
misty igloo
#

Hmm RADLADS and RWKV-7 are both in poster session 4. @tropic minnow and I could probably split those up since we're both on both papers, but who else is going to COLM who was an author on RWKV-7?

fresh mulch
#

I am, but am not sure how much I would be able to contribute

tropic minnow
misty igloo
#

I don't want either of us to have to miss out on either poster!

fresh mulch
#

Could even ask to switch poster sessions at that rate

gusty condor
#

Good!
Sadly, I cannot go to COLM. It does not make sense for me to travel 10,000 km and skip two classes for an NLP conference, given that I am a math student. My advisor doesn't agree with that either.

misty igloo
obsidian quest
willow condor
obsidian quest
misty igloo
#

Larger state is cool if it's dynamically allocable

obsidian quest
fresh mulch
#

have we got a poster ready?

misty igloo
misty igloo
#

@obsidian quest I'm not sure if we can fit it, but if we show any newer RWKV-7 results or DE/DEA preview on the COLM poster, what would you like those to be?

obsidian quest
misty igloo
#

RWKV-7 Poster presentation at COLM went great last night. A bunch of people were excited and told me it was the best thing they'd seen at the conference!

#

The RADLADS poster session is tonight.

young sparrow
misty igloo
#

Mixed... People I met initially didn't believe it and then got convinced

#

I put a short proof sketch on the poster

young sparrow
#

That's curious because I was thinking mostly "did people care"

#

Did you get more theory-oriented people showing up?

misty igloo
#

Some people said well obviously transformers are more expressive

#

So I got to retort with "we prove the opposite"

misty igloo
#

There was one group who had some stuff on the topic in a new paper but I didn't catch which it is

young sparrow
obsidian quest
young sparrow
willow condor
#

given the timing likely related to DeepSeek V3.2 Attn

obsidian quest
nova frost
obsidian quest
nova frost
obsidian quest
weak urchin
#

anyone from this group who went to COLM gonna write a blog post afterwards?

willow condor
#

but i don't know the acronym

obsidian quest
obsidian quest
tropic minnow
# obsidian quest https://x.com/BlinkDL_AI/status/1976912771985146184

Hi guys, we were just thinking: in the general case, even a soft version could be built, using a partition function where prefix length becomes the energy level, so the result is a weighted average instead of a single option. This would recover the current discrete formulation at temperature=0.
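
A minimal sketch of what such a soft version could look like, under stated assumptions: each candidate has a match length and a scalar value (a stand-in for whatever would actually be retrieved), and the weights are a Boltzmann distribution with energy equal to minus the match length. `soft_select` is a hypothetical name; this is not the actual ROSA code.

```python
import math


def soft_select(match_lengths: list[int], values: list[float], temperature: float) -> float:
    """Weighted average of values with weights proportional to exp(length / temperature).

    temperature == 0 is treated as the hard limit: return the value of the
    longest match (first one on ties), i.e. the discrete formulation.
    """
    if temperature == 0:
        best = max(range(len(match_lengths)), key=lambda i: match_lengths[i])
        return values[best]
    longest = max(match_lengths)
    # Subtract the max before exponentiating so exp never overflows.
    weights = [math.exp((length - longest) / temperature) for length in match_lengths]
    partition = sum(weights)  # the partition function Z
    return sum(w * v for w, v in zip(weights, values)) / partition


lengths = [1, 4, 2]
values = [10.0, 50.0, 30.0]

hard = soft_select(lengths, values, 0)          # discrete choice
nearly_hard = soft_select(lengths, values, 0.01)  # low temperature approaches it
very_soft = soft_select(lengths, values, 1e9)     # high temperature approaches the plain mean
print(hard, nearly_hard, very_soft)
```

At low temperature the weight concentrates on the longest match, and at high temperature it flattens toward a uniform average.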

#

This shows there is a nice connection between attention and this variant: attention would use the inner product as the similarity function (and one would hope this becomes contextual over the layers), whereas we would match directly on token identities and subsequences

#

Does anyone want to write a short paper on this connection and maybe try a few experiments (language modelling or synthetic tasks)?

obsidian quest
#

another Q is the expressivity of RWKV7+ROSA => will it be practically (not limited by float precision) Turing complete, if we allow some CoT 🙂

misty igloo
#

I had also drawn the comparison to a discretized version of linear attention here, maybe also helps think about how these discrete methods relate to continuous ones

#

@tropic minnow

rose mango
obsidian quest
obsidian quest
strange gazelle
misty igloo
#

pseudocode for recurrent version was shown, prefill/training would be implemented slightly differently

strange gazelle
misty igloo
# strange gazelle In the simplest case of linear attention, the associative operator would be addi...

sure, this is associative as well. Instead of addition you could consider the operator set(a,b) = b if b>0 else a, applied to each int in a vector of ints, where zero is a special sentinel value meaning that argmax did not choose that slot
I think that's what you meant in your original message too
(but the idea was not to construct a real parallelizable machine, but rather to create a theoretical stepping stone to ROSA based on linear attention)
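
A quick check of the associativity claim, as a sketch: `set_op` applies the rule above elementwise to tuples of ints (0 = sentinel for "slot not chosen"), and `tree_reduce` is a made-up helper that combines in a different grouping than a left-to-right fold. Being associative is what would let a scan regroup the steps, though as said the point is theoretical.

```python
from functools import reduce
from itertools import product


def set_op(a: tuple[int, ...], b: tuple[int, ...]) -> tuple[int, ...]:
    """Elementwise: take b's entry if it is > 0, otherwise keep a's entry."""
    return tuple(y if y > 0 else x for x, y in zip(a, b))


# Exhaustive associativity check over all length-2 vectors with entries in {0, 1, 2}.
small = list(product(range(3), repeat=2))
is_associative = all(
    set_op(set_op(a, b), c) == set_op(a, set_op(b, c))
    for a in small for b in small for c in small
)


def tree_reduce(xs: list[tuple[int, ...]]) -> tuple[int, ...]:
    """Combine as a balanced tree instead of left to right."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return set_op(tree_reduce(xs[:mid]), tree_reduce(xs[mid:]))


steps = [(0, 5), (3, 0), (0, 0), (7, 0)]  # per-step writes; 0 means "no write"
sequential = reduce(set_op, steps)         # recurrent, left-to-right order
regrouped = tree_reduce(steps)             # different grouping, same answer
print(is_associative, sequential, regrouped)
```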

obsidian quest
obsidian quest
obsidian quest
gusty condor