Code Review to improve the speed of this snippet | Rust Programming Language Community | Page 1

sturdy fractal Jul 26, 2022, 8:57 PM

#

This code is heavily used in my crate (more than 55% of the time is spent in these two functions, according to cargo flamegraph).

The function basically does the following, for every row and column:

Check if we have [true, false, true, true, true, false, true]
Check if we have 5 or more consecutive values (true x6 for example, or false x5)
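
For reference, the two checks listed above can be written naively over a slice of bools. This is only a minimal baseline sketch, not the code from the playground link: the names `PATTERN`, `count_pattern` and `long_runs` are made up for illustration, and no penalty weights are applied, since the thread does not give them.

```rust
/// The 7-module pattern from the post: true, false, true, true, true, false, true.
const PATTERN: [bool; 7] = [true, false, true, true, true, false, true];

/// Counts how many 7-wide windows of `line` are equal to `PATTERN`.
/// (Hypothetical helper name, for illustration only.)
fn count_pattern(line: &[bool]) -> usize {
    line.windows(PATTERN.len())
        .filter(|w| *w == &PATTERN[..])
        .count()
}

/// Returns the length of every run of 5 or more equal values in `line`.
/// (Hypothetical helper name, for illustration only.)
fn long_runs(line: &[bool]) -> Vec<usize> {
    let mut runs = Vec::new();
    let mut i = 0;
    while i < line.len() {
        // Advance `j` to the first index whose value differs from line[i].
        let mut j = i + 1;
        while j < line.len() && line[j] == line[i] {
            j += 1;
        }
        if j - i >= 5 {
            runs.push(j - i);
        }
        i = j;
    }
    runs
}
```

A baseline like this is mostly useful as a correctness reference when comparing faster bit-based variants against it.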

It is really basic, and I've worked a lot on it, but I'm out of ideas. If you have some, I'd gladly take the help!

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c04d3562c77125f0d8718b61c4834ff5

In the real code I'm not using booleans, and I skip some unwanted data (not everything is taken into account), but new ideas on how to improve it further would be very welcome!
https://github.com/erwanvivien/fast_qr/blob/master/src/score.rs#L99-L191
You can see how I'm using it on GitHub.

buoyant ingot Jul 26, 2022, 9:13 PM

#

How about instead of `count`, you check if `buffer & 0b111111 == 0b111110` and `buffer & 0b111111 == 0b000001`.

I don't think this will help with speed, but
```rust
for i in 0..N {
    let line = &mat[i];
```
should be
```rust
for (i, line) in mat.iter().enumerate()
```
You did this for the inner loop anyway, so not sure what happened here.

sturdy fractal Jul 26, 2022, 9:14 PM

#

True, I guess I tried something and forgot to revert it.

buoyant ingot Jul 26, 2022, 9:14 PM

#

You could also store another 6-bit buffer for the purpose of tracking consecutives.

#

Then you don't have to & each time
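
A rough sketch of the mask check suggested above, assuming the newest module is shifted into the lowest bit of a rolling buffer. The function name `count_long_runs` is hypothetical, and this version only detects that a run of at least five ended; it does not measure how long the run was.

```rust
/// Counts runs of at least five equal modules in `line` using a rolling bit buffer.
/// Assumption: the newest module always sits in the lowest bit of `buffer`.
fn count_long_runs(line: &[bool]) -> usize {
    let mut buffer: u8 = 0;
    let mut hits = 0;
    for (i, &module) in line.iter().enumerate() {
        buffer = (buffer << 1) | module as u8;
        // Only trust the low 6 bits once 6 modules have been shifted in,
        // otherwise the initial zeros would look like a run of light modules.
        if i >= 5 {
            let low = buffer & 0b11_1111;
            // Five equal bits followed by a different one: a long run just ended.
            if low == 0b11_1110 || low == 0b00_0001 {
                hits += 1;
            }
        }
    }
    // A long run that touches the end of the line is never "ended" by a flip,
    // so check the last five modules separately.
    if line.len() >= 5 {
        let low = buffer & 0b1_1111;
        if low == 0b1_1111 || low == 0 {
            hits += 1;
        }
    }
    hits
}
```

Each run is reported exactly once, either when the value flips or at the end of the line, so a separate counter is still needed if the exact run length matters.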

sturdy fractal Jul 26, 2022, 9:15 PM

#

buoyant ingot How about instead of `count`, you check if `buffer & 0b111111 == 0b111110` and `...

The count is for when we have, say, 10 consecutive values: I need to count all 10.

buoyant ingot Jul 26, 2022, 9:17 PM

#

Do you think it would be faster to go back and count how many there are when you find a consecutive sequence?

sturdy fractal Jul 26, 2022, 9:18 PM

#

I'm providing the flamegraph :

sturdy fractal Jul 26, 2022, 9:19 PM

#

buoyant ingot Do you think it would be faster to go back and count how many there are when you...

To be honest, I don't know. It would probably remove the initial count, but add a new one and some more indexing.

sturdy fractal Jul 26, 2022, 9:20 PM

#

sturdy fractal I'm providing the flamegraph :

There's a lot of time in IndexMut and iterators etc

#

But for score_line, I don't see much information, I'm kinda going in blind for now :/

#

I tried Godbolt, but the functions are quite long and produce too much ASM

#

And I'm not that good at reading ASM

buoyant ingot Jul 26, 2022, 9:22 PM

#

Maybe break it apart into two loops, one for buffer and one for sequences, and then see how long each one takes? This will be slower but should help in identifying what's taking the most time.

#

Also put these in different functions, of course

sturdy fractal Jul 26, 2022, 9:22 PM

#

Good idea

#

I've also thought about using multithreading

#

It is kinda the perfect use case

#

But I want to target wasm, and I heard multithreading does not work that well there

sturdy fractal Jul 26, 2022, 9:40 PM

#

Updated Flamegraph using two functions

#

Goes from 55% to 60.8% (obviously)

dusk scroll Jul 27, 2022, 2:30 AM

#

Here's an improved version: ~~https://play.rust-lang.org/?edition=2021&gist=6a864e724da913658b981c5a4fa94b26~~ [better version below]

Includes a nice Display implementation for the matrix, which you can see in the output when you run it.

Stores rows and columns as unsigned integers for speed, but provides an API that takes and gives bools. The 5-or-more-consecutive part is done with `.trailing_zeros()`, which very quickly counts the trailing zero bits. The pattern is found with `(line & 0b1111111) == 0b1011101`, followed by a right shift by 5 if it matches or by 1 if not, and then it tries again.
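
As a rough illustration of the idea (not the code from the playground link), here is a sketch that assumes a line is stored in the low `width` bits of a `u64`. The names `run_lengths` and `count_pattern_bits` are invented for this example. Instead of toggling between `trailing_zeros` and `trailing_ones`, it flips the word whenever the lowest bit is 1, so every run is measured as zeros. For simplicity the pattern scan always shifts by 1 rather than skipping ahead after a match.

```rust
/// Lengths of all runs of equal bits in the low `width` bits of `line`,
/// starting from the lowest bit. (Hypothetical helper, for illustration.)
fn run_lengths(mut line: u64, width: u32) -> Vec<u32> {
    assert!(width <= 64);
    let mut runs = Vec::new();
    let mut remaining = width;
    while remaining > 0 {
        // Flip the word if the current run is made of ones,
        // so that the run can always be measured with trailing_zeros.
        if (line & 1) == 1 {
            line = !line;
        }
        // Clamp to the bits that are still inside the line.
        let run = line.trailing_zeros().min(remaining);
        runs.push(run);
        // Shifting a u64 by 64 would overflow, so fall back to 0 in that case.
        line = line.checked_shr(run).unwrap_or(0);
        remaining -= run;
    }
    runs
}

/// Counts occurrences of the 7-bit pattern 1011101 in the low `width` bits.
fn count_pattern_bits(mut line: u64, width: u32) -> u32 {
    let mut count = 0;
    let mut remaining = width;
    while remaining >= 7 {
        if (line & 0b111_1111) == 0b101_1101 {
            count += 1;
        }
        line >>= 1;
        remaining -= 1;
    }
    count
}
```

Filtering the result of `run_lengths` for values of 5 or more gives the long runs, with the exact length available for each one.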

The Matrix::from_array function can probably be sped up somewhat.

buoyant ingot Jul 27, 2022, 2:34 AM

#

FYI if you use two block characters per point it's almost exactly square

dusk scroll Jul 27, 2022, 3:16 AM

#

Now the from_array is faster: https://play.rust-lang.org/?edition=2021&gist=a54d4c1c6bbc09b7586563f85335217b.

dusk scroll Jul 27, 2022, 3:49 AM

#

buoyant ingot FYI if you use two block characters per point it's almost exactly square

Now with half-height blocks: https://play.rust-lang.org/?edition=2021&gist=448edeaeb8814803fdf645452debd1e2

dusk scroll Jul 27, 2022, 4:24 AM

#

Now, with fast light and dark count calculations: https://play.rust-lang.org/?edition=2021&gist=cf1d158e8354d169c5f9cf67caa2ab72
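
One way such a count could be done when rows are stored as integers is with `count_ones`. This is only a guess at the approach, not the linked code: it assumes a set bit means a dark module and that bits above `width` are zero, and `dark_light_counts` is a made-up name.

```rust
/// Returns (dark, light) module counts for a matrix stored as one u64 per row.
/// Assumptions: a set bit is a dark module, and bits above `width` are zero.
fn dark_light_counts(rows: &[u64], width: u32) -> (u32, u32) {
    let dark: u32 = rows.iter().map(|row| row.count_ones()).sum();
    let total = width * rows.len() as u32;
    (dark, total - dark)
}
```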

dusk scroll Jul 27, 2022, 5:29 AM

#

Ahh, just saw ModuleType. Let's see how to adjust.

sturdy fractal Jul 27, 2022, 7:05 AM

#

Hello Chain ! I just woke up 30 mins ago (GMT+2) I'm looking at the links bottom up right now :)

#

Thanks for your inputs !

#

I've thought about having a special uXXX to store each row / column
But first I had a problem because the max size is 177, which is more than a u128 can hold. Anyway, I should have made something custom like (u128, u64), which goes up to 192 bits and is greater than 177
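
A minimal sketch of what such a custom (u128, u64) row could look like. The type name `Row192` and its methods are hypothetical; bit `i` lives in `lo` for `i < 128` and in `hi` otherwise.

```rust
/// A 192-bit row made of a u128 and a u64, enough for the 177-module maximum.
/// (Hypothetical type, for illustration only.)
#[derive(Clone, Copy, Default, Debug, PartialEq)]
struct Row192 {
    lo: u128, // bits 0..128
    hi: u64,  // bits 128..192
}

impl Row192 {
    /// Reads bit `i`.
    fn get(&self, i: u32) -> bool {
        assert!(i < 192);
        if i < 128 {
            (self.lo >> i) & 1 == 1
        } else {
            (self.hi >> (i - 128)) & 1 == 1
        }
    }

    /// Writes bit `i`: clears it first, then ORs in the new value.
    fn set(&mut self, i: u32, value: bool) {
        assert!(i < 192);
        if i < 128 {
            self.lo = (self.lo & !(1u128 << i)) | ((value as u128) << i);
        } else {
            let j = i - 128;
            self.hi = (self.hi & !(1u64 << j)) | ((value as u64) << j);
        }
    }
}
```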

#

I encoded ModuleType in my arrays because bool takes 8 bits, and using bool, I was literally wasting 7 of them, so I decided to encode the module type inside the u8
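
To illustrate the idea of packing a module type next to the color in one byte, here is a sketch with an invented layout: bit 0 holds the color and the remaining bits hold a kind. The `Kind` variants, `pack`, `is_dark` and `kind_bits` are all hypothetical and are not the actual `ModuleType` encoding used in the crate.

```rust
/// Bit 0 of the packed byte: 1 = dark, 0 = light. (Assumed layout.)
const DARK: u8 = 0b0000_0001;
/// The kind is stored in the bits above the color bit. (Assumed layout.)
const KIND_SHIFT: u8 = 1;

/// Hypothetical module kinds, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Data = 0,
    Finder = 1,
    Timing = 2,
}

/// Packs a kind and a color into one byte.
fn pack(kind: Kind, dark: bool) -> u8 {
    ((kind as u8) << KIND_SHIFT) | dark as u8
}

/// Extracts the color bit.
fn is_dark(module: u8) -> bool {
    (module & DARK) != 0
}

/// Extracts the raw kind bits.
fn kind_bits(module: u8) -> u8 {
    module >> KIND_SHIFT
}
```

Since `std::mem::size_of::<bool>()` is 1 byte, a packed `u8` like this takes the same space as a `bool` while carrying the extra information.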

#

However, this is not 100% necessary, because for a given matrix size, we can easily compute the module type

#

I will try something from all this input, thanks a lot !

#

I'll do as you said and precompute the transposed matrix from the start (if I have a small type to store rows and columns)

#

Let's get working

#

This I like very much !

#

The first time I used trailing zeros, I had a boolean to switch between `trailing_zeros` and `trailing_ones`

#

Which was inefficient and ugly

sturdy fractal Jul 28, 2022, 8:28 AM

#

Update: I've precomputed the matrix transpose instead of doing it each time, which granted an overall 20% boost 🚀
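
A sketch of what precomputing the transpose can look like, assuming a square matrix of bools with a compile-time size `N` (the real crate stores `u8` modules, and `transpose` is a made-up name). Once the transposed copy exists, columns can be scored with the same row-scoring function, reading contiguous memory.

```rust
/// Builds the transposed copy of a square matrix once, so that column `j`
/// of `mat` can later be read as row `j` of the result.
fn transpose<const N: usize>(mat: &[[bool; N]; N]) -> [[bool; N]; N] {
    let mut out = [[false; N]; N];
    for (i, row) in mat.iter().enumerate() {
        for (j, &cell) in row.iter().enumerate() {
            out[j][i] = cell;
        }
    }
    out
}
```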

#

I'm trying to integrate the trailing zero usage and checking module type while doing so

sturdy fractal Jul 29, 2022, 3:08 PM

#

Update 2:
I could not make the trailing bit count work because of some cases:
Image1: an empty QRCode (we should not take those zones into account)
Image2: QRCode zone we should not take into account to compute a score

Image3: A QRCode with a dark bar having 0 score (since it goes through the timing line)

#

But anyway, I'm happy with where I got

#

Thanks everyone !

#

Now I can work on having a very good API
