#Right shift ops taking up a large portion of function time


half valve
#

I'm using flamegraph to try to optimize my PlayStation emulator. Assuming I'm reading this right, it looks like the right shift operators are taking up a pretty large chunk of the time in my instruction decode function. This is surprising to me, since shifts should be pretty fast. Is there a faster way to grab a subset of bits than shift and mask?

This is compiled with --release and

[profile.release]
debug = true

in my Cargo.toml

#

For example, Rt is implemented like so

impl InstructionArgs for u32 {
    #[inline(always)]
    fn rt(&self) -> u8 {
        ((self >> 16) & 0x1F) as u8
    }

  ...
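
For reference, here is a minimal self-contained sketch of the shift-and-mask pattern in the snippet above. Only `rt` appears in the original; the trait definition, the by-value `self`, the `rs`/`rd` methods, and the `bits` helper are assumptions added for illustration, using the standard MIPS R-type field positions.

```rust
/// Hypothetical helper (not from the original code): extract `width` bits
/// of `word`, starting at bit position `lo` (bit 0 is the least significant).
#[inline]
fn bits(word: u32, lo: u32, width: u32) -> u32 {
    debug_assert!(width >= 1 && width < 32 && lo + width <= 32);
    (word >> lo) & ((1u32 << width) - 1)
}

/// Assumed shape of the trait; the original only shows the `rt` method.
trait InstructionArgs {
    fn rs(self) -> u8;
    fn rt(self) -> u8;
    fn rd(self) -> u8;
}

impl InstructionArgs for u32 {
    // MIPS R-type layout: rs = bits 25..=21, rt = bits 20..=16, rd = bits 15..=11.
    #[inline]
    fn rs(self) -> u8 {
        bits(self, 21, 5) as u8
    }

    #[inline]
    fn rt(self) -> u8 {
        bits(self, 16, 5) as u8
    }

    #[inline]
    fn rd(self) -> u8 {
        bits(self, 11, 5) as u8
    }
}

fn main() {
    // Encoding of `add $t2, $t0, $t1`: rs = 8, rt = 9, rd = 10.
    let word: u32 = 0x0109_5020;
    println!("rs={} rt={} rd={}", word.rs(), word.rt(), word.rd());
}
```

With constant shift amounts and masks, each accessor is typically only a couple of cheap ALU instructions after inlining, so a profile that blames the shift itself is worth double-checking at the assembly level.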
honest needleBOT
#

Addresses memory by bits, for packed collections and bitfields

Version

1.0.1

Downloads

18 120 251

crisp delta
#

^ you might consider using this

hidden badge
#

Under high optimization levels, profiling samples can sometimes get attributed to the wrong instruction or source line.

#

Can you look at the assembly of the hot part in perf?

half valve
#

Although I might be missing something. This is the first time I have used perf directly

hidden badge
#

So yeah, it's not the shift taking up time, it's the move right before it.

#

Perhaps the move stalls because its input, %edi, is still waiting for an earlier operation to complete?

#

(Alas, I'm no expert in processors. My main knowledge is that doing more is generally slower than doing less, so you should find places where you can do less.)