#Too many allocations

1 messages · Page 1 of 1 (latest)

tight jewel
#

I have an abomination of a function which is the bottleneck of my code. I think it's due to the many massive allocations that are done here:

pub(crate) fn construct_cdf_interpolator<const K: usize>(neighbors: Vec<Neighbours<f64, P3>>) -> DashMap<KNN, Interp1d<Cdf, Distance>> {


    // Get distance and sort them
    let mut dists: [Vec<f64>; K] = {
        
        let dists: Vec<[f64; K]> = neighbors
            .into_par_iter()
            .map(|neighbors| {

                // Disard indices, neighbor positoins for this query point
                let knn_dists: [f64; K] = neighbors
                    .iter()
                    .map(|neighbor| neighbor.dist2.sqrt())
                    .collect_vec()
                    .try_into()
                    .unwrap();

                knn_dists
            }).collect();

        (0..K).map(|k|{
            let mut knn_dists = dists
                .iter()
                .cloned()
                .map(|x| x[k])
                .collect_vec();
            knn_dists.sort_by(|a, b| a.partial_cmp(&b).expect("all distances should be finite"));
            knn_dists
        })
        .collect_vec()
        .try_into()
        .unwrap()
    };
    // Calculate cdf
    (0..K).map(|k| {

        let size = dists[k].len();
        let cdf = (1..=size)
            .map(|i| i as f64 / size as f64)
            .collect_vec();
        (k as u16 + 1, Interp1d::new_sorted(cdf, std::mem::take(&mut dists[k])).unwrap())
    })
    .collect()
}
proven sparrow
#

the first thing I would try is writing a collect_into_array() function to replace all those short vectors and seeing what happens

toxic crystal
#

you can "collect" into an array with some fanciness with array

tight jewel
#

Things you may need to know:

  1. Neighbours is a struct which contains K sorted Neighbours with a dist2 which I .sqrt()
  2. I have a Vec<Vec<Neighbor>> that I want to essentially transpose and then turn into a DashMap
#

is that in std?

toxic crystal
#

?eval ```rs
let mut iter = vec![1,2,3,4].into_iter();

[(); 4].map(|()| iter.next().unwrap())

tired marlinBOT
#
[1, 2, 3, 4]
tight jewel
#

brilliant

toxic crystal
#

if you can get this into a playground i could help more

tight jewel
#

i have the repo if u wnat that

toxic crystal
#

thats fine as well

tight jewel
#

if that's too much I can try a playgorund

toxic crystal
#

jesus

tight jewel
#

lol

toxic crystal
#

a playground would be nice

#

i dont like to download code from a random person on the internet and just run it without reviewing it

#

and this would certainly take a while

tight jewel
#

I understand

#

this repo does indeed download an 800 mb dataset on which this code operates, lol

toxic crystal
#

is there a reason you use .collect_vec and not .collect::<Vec<_>>()

#

and what even is collect_vec from

tight jewel
#

it's from itertools

#

it's the same thing psure

toxic crystal
#

i see

#

i got something compiling

#

with much stupid

tight jewel
#

@toxic crystal

#

didn't mess with any of the code inside the fn except for declaring neighbors inside instead of taking in an arg

#

there are some dummy types to make it work

toxic crystal
#

ty

tight jewel
#

gdi meant to link to playground

toxic crystal
#

gdi?

#

it doesnt really matter

#

im still doing it on my local computer

#

its just something i can read all of

#
use rayon::prelude::*;
use std::collections::HashMap as DashMap;

struct Interp1d;
impl Interp1d {
    fn new_sorted(x: Vec<f64>, y: Vec<f64>) -> Option<Interp1d> {
        Some(Interp1d)
    }
}

struct Neighbor {
    dist2: f64,
}

fn main() {
    let neighbors: Vec<[Neighbor; 4]> = vec![
        [
            Neighbor { dist2: 1.0 },
            Neighbor { dist2: 2.0 },
            Neighbor { dist2: 3.0 },
            Neighbor { dist2: 4.5 },
        ],
        [
            Neighbor { dist2: 1.5 },
            Neighbor { dist2: 2.5 },
            Neighbor { dist2: 3.5 },
            Neighbor { dist2: 4.0 },
        ],
    ];
    let _ = thing(neighbors);
}

fn thing<const K: usize>(neighbors: Vec<[Neighbor; K]>) -> DashMap<u16, Interp1d> {
    // Get distance and sort them
    let dists: Vec<[f64; K]> = neighbors
        .into_par_iter()
        .map(|neighbors| {
            // Disard indices, neighbor positoins for this query point
            let mut knn_dists_iter = neighbors.iter().map(|neighbor| neighbor.dist2.sqrt());

            [(); K].map(|_| knn_dists_iter.next().unwrap())
        })
        .collect();

    (0..K)
        .into_par_iter()
        .map(|k| {
            let mut knn_dists = dists.iter().copied().map(|x| x[k]).collect::<Vec<_>>();
            knn_dists.sort_by(|a, b| a.partial_cmp(b).expect("all distances should be finite"));
            knn_dists
        })
        .enumerate()
        .map(|(k, dist)| {
            let size = dist.len();
            let cdf = (1..=size)
                .map(|i| i as f64 / size as f64)
                .collect::<Vec<_>>();
            (k as u16 + 1, Interp1d::new_sorted(cdf, dist).unwrap())
        })
        .collect()
}
#

@tight jewel

#

thats probably better

tight jewel
#

I tried something in parallel as well and got

pub(crate) fn construct_cdf_interpolator<const K: usize>(neighbors: Vec<Neighbours<f64, P3>>) -> DashMap<KNN, Interp1d<Cdf, Distance>> {


    let size = neighbors.len();
    let cdf = (1..=size)
            .map(|i| i as f64 / size as f64)
            .collect_vec();

    // Get distance and sort them
    {
        
        let mut iterator = {
        
            let dists: Vec<[f64; K]> = neighbors
                .into_par_iter()
                .map(|neighbors| {

                    // Disard indices, neighbor positoins for this query point
                    let mut knn_dists_iter = neighbors
                        .iter()
                        .map(|neighbor| neighbor.dist2.sqrt());

                    [(); K].map(|_| knn_dists_iter.next().unwrap())
                }).collect();

            (0..K).map(move |k| {
                dists
                    .iter()
                    .map(|x| x[k])
                    .sorted_by(|a, b| a.partial_cmp(&b).expect("all distances should be finite"))
                    .collect_vec()
                })
        };

        (0..K as u16)
            .map(|k| (k+1, Interp1d::new_sorted(cdf.clone(), std::mem::take(&mut iterator.next().unwrap())).unwrap()))
            .collect()
    }
}

I think we both have 4 collects

toxic crystal
#

why are you casting to u16?

tight jewel
#

you have that map, enumerate, map tho

#

you do it as well, just later

#

(k as u16 + 1, Interp1d::new_sorted(cdf, dist).unwrap())

toxic crystal
#

ah

#

its kinda hard to optimize this

tight jewel
#

yea..

#

it's not that bad but it is like 60% of my run time rn

toxic crystal
#

its the .map(|x| x[k]) thats really the trouble

tight jewel
#

yea i'm really just trying to transpose

#

maybe I should use array2

toxic crystal
#

thats the line thats trashing perf because of cache problems

#

I made it a Vec<[T; K]>

tight jewel
#

when i extract the dist2 from the structs -- it's at that point that I can just have an array2

toxic crystal
#

so that is going to help a lot

tight jewel
#

not sure if that would help

toxic crystal
#

not anymore than making it an array

#

because you seem to know the size

toxic crystal
#

is 4 a typical value for you?

#

or similarly low numbers

tight jewel
#

for this pipeline, I expect K to always be 4 but it could be like 4-16 or so

toxic crystal
#

yeah thats fine

#

i would just change it to Vec<[Neighbor; K]> and roll with that