I'm using flamegraph to try to optimize my PlayStation emulator. Assuming I'm reading it right, the right-shift operators appear to take up a fairly large chunk of the time spent in my instruction decode function. This surprises me, since shifts should be fast. Is there a faster way to extract a subset of bits than shift and mask?
This is compiled with --release, with the following in my Cargo.toml:
[profile.release]
debug = true
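
For anyone unfamiliar with the pattern being asked about, here is a minimal sketch of shift-and-mask field extraction, assuming a 32-bit MIPS R-type instruction word (the PlayStation CPU is a MIPS design). The names `bits`, `Fields`, and `decode_r_type` are hypothetical and only for illustration; this is not the actual decode function from the emulator.

```rust
/// Extract `width` bits of `word`, starting at bit position `lo`.
/// Hypothetical helper; `width` must be less than 32 so the mask shift is valid.
#[inline]
fn bits(word: u32, lo: u32, width: u32) -> u32 {
    (word >> lo) & ((1u32 << width) - 1)
}

/// Fields of a MIPS R-type instruction (illustrative struct).
#[derive(Debug, PartialEq)]
struct Fields {
    opcode: u32,
    rs: u32,
    rt: u32,
    rd: u32,
    shamt: u32,
    funct: u32,
}

/// Decode an R-type word using one shift and one mask per field.
fn decode_r_type(word: u32) -> Fields {
    Fields {
        opcode: bits(word, 26, 6),
        rs: bits(word, 21, 5),
        rt: bits(word, 16, 5),
        rd: bits(word, 11, 5),
        shamt: bits(word, 6, 5),
        funct: bits(word, 0, 6),
    }
}

fn main() {
    // 0x0128_5020 encodes `add $t2, $t1, $t0` (rs = 9, rt = 8, rd = 10, funct = 0x20).
    let fields = decode_r_type(0x0128_5020);
    println!("{:?}", fields);
}
```

With constant shift amounts and widths like these, each field is one shift and one AND at the source level, which is the baseline the question is comparing against.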