#Huge look-up matrices

1 messages ยท Page 1 of 1 (latest)

steel current
#

Hi all ๐Ÿ˜„
I know that joining a server and then immediately asking a question is unpopular, but I've been trying to figure this out for a few weeks now..

  • I am looking at a travelling salesman problem with ~200k cities, Euclid2 distance( of floats).
    Is there any way to save the entire distance matrix( triangular matrix dim ~ 200k^2 )? If not, is there a good way for me to save the distance matrices when I divide the grid into n parts( that way the matrices would be much smaller sized- files)? If both are bad ideas, what would be the best practices for doing something like this?

[ I've tried to work with mmap, but im not sure if it'd work with more than 1 matrix saved in it ]
ps. if you need more info, please ask, I will respond as soon as I see it ๐Ÿ™‚
Thanks

trail hare
#

I'm mostly curious, rather than being able to help with this particular problem. Does spatial partitioning not work with what you're trying to accomplish?

julia> n = 20_000
20000

julia> @time zeros(Float32, n, n);
  0.494382 seconds (2 allocations: 1.490 GiB, 24.26% gc time)

julia> 200_000 ^ 2 / 20_000 ^ 2
100.0

It looks like you'd need about 150 GiB to store this, though I'm sure you're well aware of that

#

oh i guess 75 GiB since they're triangular

steel current
# trail hare I'm mostly curious, rather than being able to help with this particular problem....

Im not sure in what sense do you mean spatial partitioning?
If you mean divide the grid of cities into many sub-grids, then that is exactly what I meant by" divide the grid into n parts"( that way my total file size would be much smaller, because I would always only have to compute a (200_000/n)^2 sized matrices)
If you mean spatial as in computer memory, then please tell me more.
I am aware that it'd take too many GiBs to even consider doing it at once on RAM, though it isn't a problem for hard-drive storage for me( Ill be deleting the file after Im done playing with the project anyway).

steel current
trail hare
#

I misunderstand and thought that you meant storing the 200k^2 matrix just in pieces

#

Would cities on borders of the grid want to peek into neighboring cells?

steel current
trail hare
gentle falcon
#

hmm, for a problem that big, it'd be nicer if you didn't save off the distances

steel current
#

Im sure that it would work if I just downloaded it and used it in C, even though very very slowly. It's a side project for fun so I'd really like to know whats going on inside the algorithm, and I the idea of going through 4000 lines of someone elses code scares me.

steel current
trail hare
#

how many times will you need the same distance?

gentle falcon
steel current
#

At the moment, I used the greedy alg( find a nearest possible city, and go to it), so each city would be checked many times until it's been used. But I dont know how many times I would need for LKH

trail hare
#

even though a city would be checked many times, it seems like you're only calculating the distance between any two cities a single time

#

so there shouldn't be a need to cache the distance

#

like devin said

steel current
#

๐Ÿคฆโ€โ™‚๏ธ ๐Ÿคฆโ€โ™‚๏ธ ๐Ÿคฆโ€โ™‚๏ธ I just checked, it takes 0.04 seconds to check 100_000 distances, I guess I completely overthought this one.. Sorryy

#

and thank you both for the help ๐Ÿ˜„

trail hare
#

I think spatial partitioning would still be helpful during the optimization phase, because it seems like you will be checking multiple times and it would be nice to not check far away cities

#

my brief look at the algorithm seems to indicate that but i may misunderstand

steel current
# trail hare I think spatial partitioning would still be helpful during the optimization phas...

I will try to implement spatial checking in the sense of epsilon neighborhood tomorrow. ( If at city (x,y), checks the \x+-epsilon,y+-epsilon\ box for cities, if there's cities inside, find the nearest one; if there's no cities, increase epsilon). I hope this works as well as I imagine it to, since I think Ill only need 2 arrays of ints( to keep the cityIDs sorted over X and Y axis', and a matrix to hold the info for all cities)

#

[again, Im not expecting greedy to give incredible performance or anything, it's just the only alg I know how to implement yet]

steel current
#

#update1:
I rewrote the code, implemented threads, no more saving matrices ๐Ÿ˜„