#general

pliant oar
small gazelle
tawny crescent
pliant oar
#

This is cool! Part of me wants to write up the long list of things that kind-of worked but didn’t move the needle on the contest 🔥

#

I had this odd architecture that was inspired by fast weights PKM, but with more of a hash/ngram prior

#

I’m curious if anyone else has some favorite failures 😅

#

I’ll do a write-up a bit later on and post it to git. A lot of the stuff I worked on improved BPC early on, but didn’t necessarily gel with the more optimized baselines. I really liked watching all the leaderboard approaches stream in 🔥

golden basalt
whole slate
inland sundial
inland sundial
pliant oar
civic kernel
civic kernel
hallow shell
fervent cobalt
hallow shell
blissful quest
brittle robin
rancid pasture
#

Yo

blissful quest
warm stump
#

heya

stable kindle
#

yo

pliant oar
#

Tiny readme on one of the failing approaches that I liked working on: fast weight hash memory. I took Instant NGP + fast weight PKM as priors, built something out with Claude. I hoped fast weights with a multiscale ngram prior would augment context + provide better local language modeling. Early results showed modest BPB improvement, but nothing on par with stronger ngram approaches or later optimized contest submissions. I think the n-gram approach might have limited the benefits of adaptive/online memory — could be cool to re-explore the design sometime.

https://github.com/cloud-777-boy/fwhash.git

GitHub

experimental hash-addressed memory for language models - cloud-777-boy/fwhash
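
For anyone skimming, here is a rough sketch of the general idea described above: hashed tables at several n-gram scales that get written online as the text streams by, then mixed into one next-byte prediction. This is not the code from the fwhash repo. It swaps the learned fast weights for plain counts, and every name in it (`NgramHashMemory`, `bits_per_byte`, the mixing rule) is a hypothetical choice made for illustration.

```python
import math
from collections import defaultdict


class NgramHashMemory:
    """Toy multiscale n-gram memory: one hashed count table per n-gram order."""

    def __init__(self, orders=(1, 2, 3), table_size=1 << 16, vocab_size=256):
        self.orders = orders
        self.table_size = table_size
        self.vocab_size = vocab_size
        # One sparse table per order: slot -> {next_token: count}.
        self.tables = [defaultdict(dict) for _ in orders]

    def _slot(self, context, order):
        # Hash the last `order` tokens into a fixed-size table (collisions allowed).
        key = tuple(context[-order:])
        return hash((order, key)) % self.table_size

    def predict(self, context):
        """Return a next-token distribution mixed across all orders."""
        probs = [1.0 / self.vocab_size] * self.vocab_size  # uniform fallback
        # Orders are applied shortest to longest, so longer matches blend in last.
        for order, table in zip(self.orders, self.tables):
            if len(context) < order:
                continue
            counts = table.get(self._slot(context, order))
            if not counts:
                continue
            total = sum(counts.values())
            lam = total / (total + 1.0)  # more evidence -> more trust in this table
            probs = [(1.0 - lam) * p for p in probs]
            for tok, c in counts.items():
                probs[tok] += lam * c / total
        return probs

    def update(self, context, next_token):
        """Online write: bump the count for the observed token at every scale."""
        for order, table in zip(self.orders, self.tables):
            if len(context) >= order:
                counts = table[self._slot(context, order)]
                counts[next_token] = counts.get(next_token, 0) + 1


def bits_per_byte(data, memory):
    """Predict each byte, score it, then write it to memory (predict-then-update)."""
    total_bits = 0.0
    for i, tok in enumerate(data):
        context = data[:i]
        total_bits -= math.log2(memory.predict(context)[tok])
        memory.update(context, tok)
    return total_bits / len(data)
```

Calling `bits_per_byte(some_bytes, NgramHashMemory())` gives a quick adaptive baseline. With no matches the prediction is uniform over 256 byte values, which costs 8 bits per byte, so anything below that comes from the memory.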

#

curious what other approaches got tried and left out of submission, it was a lot of fun doing this

novel topaz
inland sundial
#

https://youtube.com/shorts/LEEucshktFg?si=Wm_M3FaULDDB42jU

this reminds me of what some of these approaches like PP and deed and gram were trying to do: use a cheaper model wherever the next token is likely to be easier to predict

Speculative decoding makes LLM inference 2-3× faster with identical output — the trick has been public since 2022, and Google just baked it into Gemma 4 as architecture (MTP drafter). Here is how two language models running side by side beat one running alone.


inland sundial
#

Maybe n-gram is useful after all

Can llama.cpp do speculative decoding?

Yes. llama.cpp supports speculative decoding, including a small draft model that predicts ahead of the main model, plus no-extra-model n-gram speculative methods. Its docs explicitly describe llama-server speculative decoding and note that n-gram pattern matching can be useful for code rewriting because repeated patterns often appear in generated text.

For your use case, llama.cpp is good for:

  • local GGUF models
  • quantized models
  • simple OpenAI-compatible serving
  • speculative decoding with a smaller draft model
  • coding-agent clients like Cline, Roo, Continue, VS Code chat, etc.

It is weaker for:

  • high-throughput multi-user serving
  • advanced tool-calling fidelity
  • long-context scheduling
  • MTP/EAGLE research workflows
  • diffusion-model integration
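
To make the n-gram speculation idea concrete, here is a small sketch of the draft-then-verify loop with greedy decoding. It is not llama.cpp's implementation: `ngram_draft`, `speculative_decode`, and the toy `toy_target` model are hypothetical names, and the "target model" is just a function that returns the next token. The drafter proposes tokens by copying what followed the most recent earlier occurrence of the last `n` tokens, and the target only keeps the ones it agrees with, so the output matches plain greedy decoding.

```python
def ngram_draft(tokens, n=2, k=4):
    """Propose up to k tokens by copying what followed the last n tokens earlier on."""
    if len(tokens) < n:
        return []
    tail = tokens[-n:]
    # Scan backwards for the most recent earlier match that has a continuation.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == tail:
            return tokens[start + n:start + n + k]
    return []


def greedy_decode(target_next, prompt, max_new):
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target_next(tokens))
    return tokens


def speculative_decode(target_next, prompt, max_new, n=2, k=4):
    """Draft with n-gram lookup, verify with the target, keep the agreeing prefix.

    Each outer iteration stands in for one verification step. This sketch calls
    target_next once per position for clarity; a real system scores all draft
    positions in one batched forward pass, which is where the speedup comes from.
    """
    tokens = list(prompt)
    steps = 0
    while len(tokens) - len(prompt) < max_new:
        steps += 1
        # The trailing None is a sentinel: it never matches, so every step
        # ends by emitting one token chosen by the target itself.
        for proposed in ngram_draft(tokens, n, k) + [None]:
            expected = target_next(tokens)
            tokens.append(expected)  # always the target's token, so output is unchanged
            if expected != proposed or len(tokens) - len(prompt) >= max_new:
                break
    return tokens, steps


# Toy deterministic "target model": repeats a fixed pattern based on position.
PATTERN = list("abcab")


def toy_target(tokens):
    return PATTERN[len(tokens) % len(PATTERN)]
```

On repetitive text the drafter gets several tokens accepted per step, so `steps` comes out lower than the number of generated tokens. On text with no repeats it degrades to one token per step, the same as plain greedy decoding.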
inland sundial
pliant oar
#

honestly cool stuff

#

given the trend toward hybrid models, I imagine optimized convolution-heavy or state-space-style models producing even more savings for the fast path