#general | Parameter Golf Post-Contest | Page 1

pliant oar May 3, 2026, 6:07 PM

#

small gazelle May 3, 2026, 6:07 PM

#

tawny crescent May 3, 2026, 6:08 PM

#

pliant oar May 3, 2026, 6:09 PM

#

This is cool! Part of me wants to write up the long list of things that kind-of worked but didn’t move the needle on the contest 🔥

#

I had this odd architecture that was inspired by fast weights PKM, but with more of a hash/ngram prior

#

I’m curious if anyone else has some favorite failures 😅

#

I’ll do a write up a bit later on and post it to git. A lot of the stuff I worked on improved BPC early on, but didn’t necessarily gel with the more optimized baselines. I really liked watching all the leaderboard approaches stream in 🔥

golden basalt May 3, 2026, 6:40 PM

#

whole slate May 3, 2026, 6:53 PM

#

inland sundial May 3, 2026, 8:10 PM

#

inland sundial May 3, 2026, 8:10 PM

#

inland sundial

pliant oar May 3, 2026, 8:19 PM

#

tawny crescent

civic kernel May 3, 2026, 8:46 PM

#

civic kernel May 3, 2026, 8:47 PM

#

civic kernel

hallow shell May 3, 2026, 9:06 PM

#

fervent cobalt May 3, 2026, 9:57 PM

#

hallow shell May 3, 2026, 10:22 PM

#

fervent cobalt

blissful quest May 4, 2026, 12:01 AM

#

brittle robin May 4, 2026, 2:32 AM

#

rancid pasture May 4, 2026, 5:53 AM

#

Yo

blissful quest May 4, 2026, 5:53 AM

#

rancid pasture

warm stump May 4, 2026, 11:12 AM

#

heya

stable kindle May 4, 2026, 11:34 PM

#

yo

pliant oar May 6, 2026, 2:01 AM

#

Tiny readme on one of the failing approaches that I liked working on: fast weight hash memory. I took Instant NGP + fast weight PKM as priors, built something out with Claude. I hoped fast weights with a multiscale ngram prior would augment context + provide better local language modeling. Early results showed modest BPB improvement, but nothing on par with stronger ngram approaches or later optimized contest submissions. I think the n-gram approach might have limited the benefits of adaptive/online memory — could be cool to re-explore the design sometime.

https://github.com/cloud-777-boy/fwhash.git

GitHub

GitHub - cloud-777-boy/fwhash: experimental hash-addressed memory f...

experimental hash-addressed memory for language models - cloud-777-boy/fwhash

#

curious what other approaches got tried and left out of submission, it was a lot of fun doing this

novel topaz May 11, 2026, 4:34 AM

#

inland sundial May 11, 2026, 11:30 AM

#

https://youtube.com/shorts/LEEucshktFg?si=Wm_M3FaULDDB42jU

this reminds me of what some of these approaches like PP and deed and gram we’re trying to do use a cheaper model where the next token is more likely easier to predict

YouTube

DIY Smart Code

Hidden inside Gemma 4 — the inference trick from 2022 #AI #GoogleAI

Speculative decoding makes LLM inference 2-3× faster with identical output — the trick has been public since 2022, and Google just baked it into Gemma 4 as architecture (MTP drafter). Here is how two language models running side by side beat one running alone.

🚀 Want to learn agentic coding with live daily events and workshops?
Check...

▶ Play video

inland sundial May 12, 2026, 12:33 PM

#

Maybe N gram is useful after all

Can llama.cpp do speculative decoding?

Yes. llama.cpp supports speculative decoding, including a small draft model that predicts ahead of the main model, plus no-extra-model n-gram speculative methods. Its docs explicitly describe llama-server speculative decoding and note that n-gram pattern matching can be useful for code rewriting because repeated patterns often appear in generated text.

For your use case, llama.cpp is good for:

local GGUF models
quantized models
simple OpenAI-compatible serving
speculative decoding with a smaller draft model
coding-agent clients like Cline, Roo, Continue, VS Code chat, etc.

It is weaker for:

high-throughput multi-user serving
advanced tool-calling fidelity
long-context scheduling
MTP/EAGLE research workflows
diffusion-model integration

inland sundial May 12, 2026, 10:30 PM

#

Now byte level models are in the News , using speculative decoding

https://www.marktechpost.com/2026/05/11/meta-and-stanford-researchers-propose-fast-byte-latent-transformer-that-reduces-inference-memory-bandwidth-by-over-50-without-tokenization/

MarkTechPost

Meta and Stanford Researchers Propose Fast Byte Latent Transformer ...

Meta and Stanford introduce BLT Diffusion, BLT-S, and BLT-DV to cut byte-level inference memory bandwidth by over 50%.

pliant oar May 19, 2026, 9:00 PM

#

honestly cool stuff

#

given the trend toward hybrid models, I imagine optimized, convolution heavy or state space style models producing even more savings for the fast path