#Optimization tips for my implementation of Brainf*** in Rust

136 messages · Page 1 of 1 (latest)

tranquil nacelle
#

Hi, I am an intermediate in Rust and I am trying to write a fast brainf*** interpreter in Rust.

A problem which I've noticed is that the interpreter becomes slow when running mandelbrot.bf from the example/ of my project (took 2m 07 seconds to finish executing on my machine)

Code -> https://github.com/vs-123/bf-rs

I would appreciate any tips to improve the code

GitHub

Contribute to vs-123/bf-rs development by creating an account on GitHub.

crystal flare
#

are you compiling in --release mode?

tranquil nacelle
crystal flare
#

that's expected

tranquil nacelle
#

Can it go faster than that?

crystal flare
#

it finishes in 15 seconds on my pc

#

the code is fine

tranquil nacelle
#

Oh okay

tranquil nacelle
crystal flare
#

I guess

dreamy condor
#

If you're asking about microoptimization, note that your Command struct is 16 bytes, assuming you're on 64-bit.

#

Consider if it'd make sense to split it up and store the loop offsets separately so your could use memory more efficiently

#

For example, maybe just have a HashSet<usize, usize> to look up the extra loop data for a program counter position, and thus keep Command down to 1 byte.

#

Looping over chars is also costing you in complexity. Replace it with source.as_bytes().iter().copied().enumerate(), and change the match arms to things like b'+' => output.push(Command::Increment), instead, and your parse should go much faster.

#

Instead of let mut memory = [0u8; 30_000];, I'd suggest let mut memory = [0u8; u16::MAX as usize];, and change your pointer to u16. Then you can use memory[pointer as usize] and the compiler will remove the bounds checks.

#

(Your branch predictor is probably making those bounds checks pretty cheap anyway, but you'll save code size which might help)

#

Other than that, you should make Command be Copy. Anything two usizes or fewer is definitely better copied than worrying about references (and that'll be even more true if you make it just a byte, like I mentioned above).

#

Other than that, if you want to go faster in something like this you'll need to make the code more complicated to better take advantage of common patterns in the input

#

For example, >>>>>>>>>> is common, so instead of

            Command::MoveRight => {
                pointer = pointer.saturating_add(1);
            }

you might do

            Command::MoveRight => loop {
                pointer = pointer.saturating_add(1);
                if commands[command_index + 1] != Command::MoveRight { break }
                command_index += 1;
            }

so you only go back to the big match when changing to a different kind of command

molten hedge
#
/*u-macro BF*/macro_rules!b{(f$($c:tt)*)=>
{{let mut x=([0u8;1<<15],0,stdout(),stdin(
));use std::io::*;$(b!{x$c};)*}};($f:tt->)
=>{b!($f-);b!($f>)};($f:tt<-)=>{b!($f<);b!
($f-)};($f:tt<<)=>{$f.1-=2};($f:tt>>)=>{$f
.1+=2};($f:tt..)=>{b!($f.);b!($f.)};($f:tt
>)=>{$f.1+=1};($f:tt<)=>{$f.1-=1};($f:tt+)
=>{$f.0[$f.1]+=1};($f:tt-)=>{$f.0[$f.1]-=1
};($f:tt.)=>{$f.2.write(&[$f.0[$f.1]]).and
($f.2.flush()).ok()};($f:tt[$($c:tt)*])=>{
while$f.0[$f.1]>0{$(b!{$f$c};)*}};($f:tt,)
=>{$f.3.read(&mut$f.0[$f.1..=$f.1]).ok()};
($f:tt$x:tt)=>{};}

pub fn main() {b!(f
    ++++++++[>++++[>++>+++>+++>+<<<<-]>+>->+>>+[<]<-]>>.>
    >---.+++++++..+++.>.<<-.>.+++.------.--------.>+.>++.
)}

Could someone run it and see ferrisPlead . I don't have a computer to test on right now

unique vale
fossil steppe
#

JIT compiler when hyperdaax

#

I'd be quite interested to see a cranelift implementation ^^

unique vale
# fossil steppe JIT compiler when <:hyperdaax:492801350736805897>

I do have a JIT-like thing for a simple stack-based RPN-like math language with BF-inspired loops; it probably wouldn't be too much work to make actual BF out of it. Well, the . and , would be a bit hard to JIT, but could just call out to prewritten functions for that.
https://github.com/zachs18/Simple-RPN-Math-Compiler-Rust

GitHub

Converts an RPN-like math language to amd64/i686 (SysV calling convention) or armv7 machine code in Rust - GitHub - zachs18/Simple-RPN-Math-Compiler-Rust: Converts an RPN-like math language to amd6...

dreamy condor
#

Or go easy mode and translate to WASM, and use one of those runtimes to JIT it

fossil steppe
#

That's too easy lol

tranquil nacelle
dreamy condor
tranquil nacelle
#

It takes now 47 seconds to execute the mandelbrot

dreamy condor
#

(You're running in --release, right?)

tranquil nacelle
#

Yes with release it is 47 seconds

dreamy condor
#

(Oh, you already said yes above)

tranquil nacelle
#

Let me try it real quick

molten hedge
#

From previous testing I know that llvm has a hard time optimizing the Rust code the macro makes

tranquil nacelle
#

It fails to run the mandelbrot brainfuck program

tranquil nacelle
#

Or saturating addition/subtraction

molten hedge
tranquil nacelle
molten hedge
#

It may need a bigger memory strip

tranquil nacelle
#

I'll try it again

molten hedge
#

Oh also you may have to offset the starting index into the buffer

#

Some bf programs use some space before the starting index

tranquil nacelle
#

Ngl tho that's a really cool macro lol

#

It reminds me the donut C code

#

Oh now it works

#

I had overridden the release profile

molten hedge
#

Its the 3rd version, the idea was to make a bf interpreter that fits in a discord message and works with the ?play command

tranquil nacelle
#

Cool

molten hedge
tranquil nacelle
#

Do you know a way to improve the compilation speed?

molten hedge
#

Not really

#

This macro eats rustc and llvm alive with a big enough bf program

tranquil nacelle
molten hedge
#

Macros are very fun

tranquil nacelle
#

I updated the source code, can anyone review it please?

worthy venture
#

but the slowness is in interpret() and not in parse()

#

try to write/read to a buffer, once you are done. you display it in one go (instead of displaying/printing line by line)

tranquil nacelle
#

Guys I optimized it alot

#

It got a huge performance boost

#

It takes 15 seconds for me to do the mandelbrot

tranquil nacelle
#

I added support for consecutive commands

#

which improved the performance a lot

tranquil nacelle
#

Also I pushed the updated source

tranquil nacelle
dreamy condor
#

Ah, I see, doing the consecutive folding in the parsing instead of the interpreting. Makes sense.

tranquil nacelle
dreamy condor
#

Hmm, you know what that means? You could actually unify the two states.

 #[derive(Debug, Clone, Copy)]
 pub enum Command {
-    Increment(usize),
-    Decrement(usize),
+    Increment(u8),
     MoveLeft(usize),
     MoveRight(usize),
#

Because Decrement(1) and Increment(u8::MAX) do exactly the same thing, so there's no need for both.

dreamy condor
#

Because you're using wrapping_add anyway, so having the same type as you're using for a memory cell will make your life easier.

#

i8::wrapping_add and u8::wrapping_add are exactly the same operation. LLVM doesn't even have different instructions for them.

#

I'm personally opposed to signed values unless absolutely forced, but do whatever.

tranquil nacelle
#

Hmmm ok

#

I'll implement it rn

tranquil nacelle
#

I updated the source code

#

Can anyone review it please?

#

The time dropped to 12 seconds for the mandelbrot

tranquil nacelle
#

I used unsafe code to optimize it even more

#

Guys now it takes less than 10 seconds for me!

vivid carbon
tranquil nacelle
#

I merged the MoveLeft and MoveRight to a single variant

molten hedge
tranquil nacelle
vivid carbon
#

you could try output buffering like recommended before, but trying to preallocate part of the buffer, since you know how many writes at least you have to do

dusky mural
vivid carbon
#

you could also try compressing consecutive adds and subtracts into one

dreamy condor
dusky mural
#

@tranquil nacelle would recommend checking out nom as a parser for this. I've played around it for a bf parser and the code ends up being v elegant where you can parse your input into basically a Vec<Command> and then run it after.
This could make it easier to optimise multiple operations of + - > amd < as mentioned above ^

GitHub

Rust parser combinator framework. Contribute to rust-bakery/nom development by creating an account on GitHub.

dreamy condor
#

(Assuming on x64, so that usize is 8 bytes.)

vivid carbon
dusky mural
#

I might also recommend making the interpreter a struct, this could allow you for stepping through instructions and debugging the memory during execution

dusky mural
vivid carbon
#

there are also some bf patterns which can be optimized

#

for example [-] which sets a cell to zero

#

if you can detect them you will be able to optimize a lot

dreamy condor
dreamy condor
#

Well collapsing consecutive ones had already turned that into Increment(3)+Decrement(4), so I figured that when it turned to Increment(4)+Increment(252) that had already happened to have it be Increment(255).

#

Though actually, the only reason for code to have that would be aesthetics, right?

#

That mandlebrot demo has no +- or -+ sequences at all.

dusky mural
vivid carbon
dusky mural
vivid carbon
#

ig we are at that point where we need to detect such complex structures, which may be nested, that we need a parser like nom

dreamy condor
#

This is just a classic "have an IR and optimize it" compiler problem.

#

Heck, could just turn the whole thing into a classic CFG if you wanted.

vivid carbon
dusky mural
#

during runtime convert bf code to rust then compile and run it for optimal performance

vivid carbon
#

this:

temp0[-]
temp1[-]
x[temp0+temp1+x-]temp0[x+temp0-]
temp1[
 code
temp1[-]]

could be just optimized into a simple if like that

Command::If(usize) // if cell is 0 skip usize commands
vivid carbon
vivid carbon
tranquil nacelle
tranquil nacelle
tranquil nacelle
#

I updated the source code