I have a (decoding) function which I've been trying to optimize the best I can for a while.
I finally managed to get it to what I believe is the best possible, at ~4.1 GiB/s.
I then went to actually clean up my crate, splitting things into modules and such.
However, I noticed that performance had effectively halved after doing the organization, despite no actual code changing.
I tracked this down to inlining, which I had suspected since moving away from the 'everything in one module' approach. After adding #[inline] to certain methods, I got back to my original performance.
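To illustrate the kind of change I mean, here is a minimal sketch. It is not my actual code: the module names (`bits`, `decode`) and the nibble-splitting "decoder" are made up for illustration. The point is only that small hot helpers living in another module carry #[inline], which makes their bodies available for inlining at call sites outside their own codegen unit (it is a hint, not a guarantee).

```rust
// Hypothetical layout: in a real crate these would be separate files
// (for example src/bits.rs and src/decode.rs). All names are illustrative.
mod bits {
    /// Small hot helper, called once per input byte.
    /// `#[inline]` makes the body available to callers in other
    /// codegen units and crates; the compiler may still decline to inline.
    #[inline]
    pub fn high_nibble(byte: u8) -> u8 {
        byte >> 4
    }

    /// Companion helper for the low four bits.
    #[inline]
    pub fn low_nibble(byte: u8) -> u8 {
        byte & 0x0F
    }
}

pub mod decode {
    use super::bits;

    /// Toy "decoder": splits every input byte into two nibbles.
    /// The hot loop only stays tight if the helpers get inlined.
    pub fn decode(input: &[u8], out: &mut Vec<u8>) {
        out.reserve(input.len() * 2);
        for &b in input {
            out.push(bits::high_nibble(b));
            out.push(bits::low_nibble(b));
        }
    }
}
```

Without the attribute, whether such helpers get inlined can depend on how the compiler happens to split the crate into codegen units, which is why moving code between modules can matter even when the code itself is unchanged.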
I thought that would be the end of it, but then I added encoding to my crate, and although nothing in it even remotely interacts with the decoder function, decoding performance halved again.
At this point I'm a bit lost, since nothing actually interacts with my decoder function, so I don't even have a starting point to work with.
If I comment out the encoding module in my lib.rs (leaving just the decoding module), performance is back to full.
Uncomment it, and performance is halved.
Why on earth could performance be so fragile?
I genuinely don't understand how methods that don't even interact with the decoder function could impact performance so much.
I even tried using a single codegen unit, thinking the change in module layout may have resulted in an unfortunate assignment of functions to codegen units.
I also tried LTO, thinking extra inlining might help.
However, neither helped (in fact, they made things worse, see #1470786138179371124, but that is not what this issue is about).
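For reference, both experiments are release-profile settings in Cargo.toml. The fragment below is a sketch of that kind of configuration rather than a copy of my exact file; in particular, `lto = "fat"` is an assumption, and `"thin"` is the other common choice.

```toml
# Cargo.toml (sketch; the lto value is an assumption)
[profile.release]
codegen-units = 1   # compile the whole crate as a single codegen unit
lto = "fat"         # whole-program link-time optimization
```

Benchmarks run with `cargo bench` use the bench profile, which inherits from release by default, so these settings apply there too.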
