#Code Review to improve the speed of this snippet

44 messages · Page 1 of 1 (latest)

sturdy fractal
#

This code is heavily used in my crate (more than 55% of the time is used in this two functions, according to cargo flamegraph)

The function is basically :
For every row & columns :

  • Check if we have [true, false, true, true, true, false, true]
  • Check if we have 5 or more consecutive values (true x6 for example, or false x5)

It is really basic, and I've work a lot on it but I'm out of ideas, if you have some I'd gladly get help !

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c04d3562c77125f0d8718b61c4834ff5

I'm not using Booleans and skipping some unwanted data (not everything is to be taken into account), but having new ideas on how to improve it further more would be very nice !
https://github.com/erwanvivien/fast_qr/blob/master/src/score.rs#L99-L191
You could see how I'm using it on Github

buoyant ingot
#

How about instead of count, you check if buffer & 0b111111 == 0b111110 and buffer & 0b111111 == 0b000001.

I don't think this will help with speed but

for i in 0..N {
    let line = &mat[i];
```should be
```rust
for (i, line) in mat.iter().enumerate()
```You did this for the inner loop anyway so not sure what happened here.
sturdy fractal
#

True, I tried something I guess, I forgot to revert

buoyant ingot
#

You could also store another 6-bit buffer for the purpose of tracking consecutives.

#

Then you don't have to & each time

sturdy fractal
buoyant ingot
#

Do you think it would be faster to go back and count how many there are when you find a consecutive sequence?

sturdy fractal
#

I'm providing the flamegraph :

sturdy fractal
sturdy fractal
#

But for score_line, I don't see much information, I'm kinda going in blind for now :/

#

I tried GodBolt, but the function are quite long and produces too much ASM

#

And I'm not that good reading ASM

buoyant ingot
#

Maybe break it apart into two loops, one for buffer and one for sequences, and then see how long each one takes? This will be slower but should help in identifying what's taking the most time.

#

Also put these in different functions, of course

sturdy fractal
#

Good idea

#

I've thought about doing MThreading also

#

It is kinda the perfect use case

#

But I want to target wasm, I heard it was not that good using MThreading

sturdy fractal
#

Updated Flamegraph using two functions

#

Goes from 55% to 60.8% (obviously)

dusk scroll
#

Here's an improved version: https://play.rust-lang.org/?edition=2021&gist=6a864e724da913658b981c5a4fa94b26 [better version below]

Includes a nice Display implementation for the matrix, which you can see in the output when you run it.

Stores rows and columns as unsigned integers for speed, but provides an API that takes and gives bools. The 5-or-more-consecutive part is done with .trailing_zeros() to very quickly get a count of trailing zero bits. The pattern is gotten with a (line & 0b1111111) == 0b1011101 and a shift right by 5 if it succeeds or 1 if not, then it tries again.

The Matrix::from_array function can probably be sped up somewhat.

buoyant ingot
#

FYI if you use two block characters per point it's almost exactly square

dusk scroll
dusk scroll
dusk scroll
#

Ahh, just saw ModuleType. Let's see how to adjust.

sturdy fractal
#

Hello Chain ! I just woke up 30 mins ago (GMT+2) I'm looking at the links bottom up right now :)

#

Thanks for your inputs !

#

I've thought about having a special uXXX to store each row / columns
But first I had a problem because max size is 177, which is more than u128, but anyway, I should have made something custom like (u128, u64) which goes to 192 and is greated than 177

#

I encoded ModuleType in my arrays because bool takes 8 bits, and using bool, I was literally wasting 7 of them so I decided to encoded inside of u8 the module type

#

However, this is not 100% necessary, because for a given matrix size, we can easily computed the module type

#

I will try something from all this input, thanks a lot !

#

Doing like you said, precomputing from the start the transpose matrix (if I have a small type to store rows & columns)

#

Let's get working

#

This I like very much !

#

My first time using trailing zeroes, I had a boolean to use trailing_zeroes then trailing_ones

#

Which was inefficient and ugly

sturdy fractal
#

Update: I've precomputed the matrix transpose instead of doing it each time, which granted an overall 20% boost 🚀

#

I'm trying to integrate the trailing zero usage and checking module type while doing so

sturdy fractal
#

Update2:
I could not make the trailing bit count to work because of some cases:
Image1: an empty QRCode (we should not take into account those zones)
Image2: QRCode zone we should not take into account to compute a score

Image3: A QRCode with a dark bar having 0 score (since it does through the timing line)

#

But anyway, I'm happy to where I got

#

Thanks everyone !

#

Now I can work on having a very good API