#Argamak - A Gleam library for tensor maths.


warm wolf
honest epoch
#

Glorious ✨

warm wolf
#

Finally remembered (after releasing it hmm) that I wanted the new concat fn to err if the given find fn is false for every axis. That'll be in the next release.

warm wolf
#

Argamak v1.0.0 published 🐎
Argamak's been updated for Gleam v0.33+ and also includes the concat change mentioned above.

warm wolf
#

Argamak is ready for Gleam v1.0.0 nx

silk sleet
#

i think "vectors" are a type of tensor is that right?

warm wolf
#

Vectors are basically 1-dimensional tensors, afaiu.

silk sleet
#

so i could sensibly do vector things with argamak?

warm wolf
#

You should be able to, yeah.

silk sleet
#

any idea how to implement vector similarity 😄

#

oh shit gleam_community_maths have some functions for this

silk sleet
#

and i could use argamak for this?

#

to do the Maths magic_sparkles

warm wolf
#

I didn't implement any sine, cosine, etc functions yet.

long hearth
#

What do you mean by “vector similarity”

#

Same direction / magnitude ?

warm wolf
#

Yeah, that's a good question.
It looks like the simplest Euclidean comparison would be point for point.

silk sleet
# warm wolf Currently, you could do the Euclidean distance, I think, and it might be fairly ...

ok actually a more formulated question is this.

I'll have a bunch of List(Float) that represents some vector(s). I see gleam_community_maths already has a function for euclidean distance (and some others). im not sure if argamak gets me anything if i go through the hassle of converting the data repr to tensors etc or if i could just coast by with just a list of floats and that gleam_community_maths function

warm wolf
#

I guess it depends on how big the lists are.

#

I'm not sure when it becomes slow without tensors, but argamak is certainly fast at computation, by my standards.

silk sleet
#

thats a good question, i dont know how big they will be right now but do you have a rough idea on when those gains might be felt? 10 elements, 50, 100, ..?

warm wolf
#

I do not 😁

long hearth
#

It depends on what you need done. Fastest would be to store vectors as chunked bit arrays.

#

But lists are more convenient

honest epoch
#

Don't try and guess perf

#

Just implement it the easy way and then optimise later

warm wolf
#

It should just be a handful of operations to implement it with Argamak, if you want to try comparing.

#

subtract one from the other, power(2), sum, square_root.
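For comparison, here is a minimal sketch of those four steps on plain lists of floats. It is written in Python rather than with Argamak, and the function name `euclidean_distance` is made up for illustration, so treat it as a reference for the arithmetic and not as the library's API.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length lists of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    diffs = [x - y for x, y in zip(a, b)]   # subtract one from the other
    squares = [d ** 2 for d in diffs]       # power(2)
    total = sum(squares)                    # sum
    return math.sqrt(total)                 # square_root

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))
```

A tensor version would run the same four steps over every stored row at once instead of one pair at a time.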

warm wolf
silk sleet
#

ohh now we're getting somewhere

#

lets say i had 100 vectors "at rest" and then 1 in as "input", are you saying argamak would be suitable to find out which out of the 100 that input is most similar to?

warm wolf
#

I should think so, yeah.

silk sleet
#

each vector would be a dimension in the tensor?

warm wolf
#

Just a row. It'd be 2d.

warm wolf
# silk sleet each vector would be a dimension in the tensor?

It's not pretty here, but it should work and you can translate into Gleam.

t = :argamak@tensor
s = :argamak@space
axis = :argamak@axis
{:ok, d2} = s.d2({:infer, "Vector"}, {:axis, "Point", 3})
o_xs = [[1,0,3],[4,4,2],[-1,8,2]]
{:ok, xs} = t.from_floats(:gleam@list.flatten(o_xs), d2)
{:ok, input} = t.from_floats([3, -5, 4], d2)
input |> t.debug
xs |> t.debug
{:ok, step1} = input |> t.subtract(xs)
step1 |> t.debug
{:ok, step2} = step1 |> t.power(t.from_float(2))
step2 |> t.debug
{:ok, step3} = step2 |> t.sum(fn a -> axis.name(a) == "Point" end) |> t.square_root
{:ok, closest_i} = step3 |> t.debug |> t.arg_min(fn _ -> true end) |> t.to_int
o_xs |> :gleam@list.at(closest_i)

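The same nearest-row search can be sketched on plain lists for comparison. This is Python, not Argamak; the helper name `closest_row` is hypothetical, and the data is the three rows and the input from the snippet above.

```python
import math

def closest_row(rows, query):
    """Return (index, distance) of the row nearest to `query` (Euclidean)."""
    distances = [math.dist(row, query) for row in rows]
    best = min(range(len(rows)), key=distances.__getitem__)
    return best, distances[best]

rows = [[1, 0, 3], [4, 4, 2], [-1, 8, 2]]  # stored vectors, one per row
query = [3, -5, 4]                          # the input vector
index, distance = closest_row(rows, query)
print(rows[index], distance)
```

The squared distances here are 30, 86 and 189, so the first row is the closest one.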
silk sleet
#

i think in this space folks use "vector" to mean n-dimensional rather than specifically 2-dimensional

#

The length or dimensionality of the vector depends on the specific embedding technique you are using and how you want the data to be represented. For example, if you are creating word embeddings, they will often have dimensions ranging from a few hundred to a few thousand — something that is much too complex for humans to visually diagram. Sentence or document embeddings may have higher dimensions because they capture even more complex semantic information.

#

not that it matters?

warm wolf
#

The nice thing here should be that the same building blocks apply to any-dimensional data.

silk sleet
#

yeah thats what i figured

#

thank you i will have a play this evening

#

for anyone curious i want to do some local LLM stuff where i can query my notes. you can use the models to generate vector embeddings of any text, they're meaningless on their own but their use is in finding similar text. if i create vector embeddings of my notes i can have a flow thats like:

  • ask a question
  • compare question vector to notes vector and extract any that are relevant
  • pass the question to LLM with relevant notes as additional context
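
A rough sketch of that flow, in Python, assuming the embeddings already exist. The names `most_relevant` and `build_prompt`, the prompt layout, and the toy 3-element vectors are all made up for illustration. Generating the embeddings and calling the LLM are left out, and Euclidean distance is used only because it is the metric discussed so far.

```python
import math

def most_relevant(question_vec, notes, k=2):
    """notes is a list of (text, embedding) pairs; return the k nearest texts."""
    ranked = sorted(notes, key=lambda note: math.dist(question_vec, note[1]))
    return [text for text, _ in ranked[:k]]

def build_prompt(question, relevant_notes):
    """Put the relevant notes in front of the question as extra context."""
    context = "\n".join(f"- {note}" for note in relevant_notes)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Toy embeddings; real ones would come from the model.
notes = [
    ("note about gardening", [0.9, 0.1, 0.0]),
    ("note about tensors", [0.0, 0.8, 0.6]),
    ("note about cooking", [0.7, 0.0, 0.3]),
]
question_vec = [0.1, 0.9, 0.4]

relevant = most_relevant(question_vec, notes, k=1)
print(build_prompt("what did I write about tensors?", relevant))
```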
warm wolf
silk sleet
#

woooah wowfrog

warm wolf
#

Kind of a pointless try fn there, I guess, but I find it's really helpful to see the state of the tensor at every step in a long calculation.

silk sleet
#

Axis("Relevancy", size: 3) how did you land on a size of 3 here?

warm wolf
#

The setup is always like those word problems you might know from school. You need to figure out how to represent some characteristics/variables in terms of numbers.
Argamak has a bit more ceremony in creating a space compared with Nx or TensorFlow, but once that part is done it's pretty OK, I think.
With Nx (or Elixir in general), before I came to Gleam, I'd often write one really long pipeline, and I should probably try less hard to do it that way. After the initial setup, the main difference is that you sometimes have to deal with Result.

warm wolf
#

As an example.

silk sleet
#

ah okay

#

that makes sense, i think i can do it the other way around, always know the number of notes but infer the number of datapoints

#

thank you i really appreciate this

warm wolf
#

You can, but I think for the computation to work, you'll need to fill in empty spaces with something that makes sense.

#

Or just to be able to make a matrix in general too.

silk sleet
#

are you saying my assumption that every vector will be the same size is too optimistic? 😄

warm wolf
#

No, just that if they aren't the same size, you'll need to resolve that issue somehow.

silk sleet
#

seems like all the embeddings are 4096 elements

warm wolf
#

You don't need to infer any dimension's size, unless you want to.

silk sleet
#

oh wait shit argamak needs elixir?

warm wolf
#

Elixir seems faster tho, for the unit tests, at least, iirc.

silk sleet
#

😭

warm wolf
#

For 100 comparisons of 4096-element vectors, I'd expect tensors to be more performant than lists, but I'm interested to hear real results if you try both.

warm wolf
# silk sleet 😭

Installing Elixir ain't no thang. It's almost just like a big Erlang package you can't get through Hex.

silk sleet
#

i really dislike installing languages i dont want to use lolsob this better be worth it!

warm wolf
#

No promises.

neon thunder
#

What's the motivation for Euclidean distance over cosine similarity?

warm wolf
neon thunder
#

Yeah I saw, I was wondering what @silk sleet wanted to use

#

i was thinking of making a data cleaning lib in gleam when i saw it has pipelines

#

maybe for a less busy time loll

neon thunder
#

Maybe I'm wrong

raven crater
#

Which distance/similarity metric is most suitable in a specific case essentially comes down to what type of data you're working with.

#

There's always the example of Manhattan vs Euclidean distance. If you want to measure the distance between two locations in, e.g., a city with a lot of apartment blocks, the Manhattan distance can be the most suitable, as it inherently takes the grid structure of the blocks into account. At sea, where there are no obstructions, the Euclidean distance is more suitable. Similar analogies can be made for cosine similarity.
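
To make the difference concrete, here is a small Python sketch of the three metrics on plain lists. The function names are made up for illustration and are not taken from Argamak or gleam_community_maths. Note that `cosine_similarity` as written divides by zero for an all-zero vector.

```python
import math

def manhattan(a, b):
    """Sum of absolute differences: distance along a grid."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

a = [1.0, 2.0]
b = [4.0, 6.0]
c = [2.0, 4.0]  # same direction as a, twice the magnitude

print(manhattan(a, b), euclidean(a, b))
print(euclidean(a, c), cosine_similarity(a, c))
```

Here `a` and `c` point the same way, so their cosine similarity is (up to rounding) 1 even though their Euclidean distance is not 0. That is the "same direction / magnitude" distinction raised earlier in the thread.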

fervent stump
#

@warm wolf how fast is this on the JavaScript target? I'm gonna dabble in some ML stuff and I'd normally use C which is of course faster but if this is like numpy and the JavaScript target is "fast enough" then I'd so much rather use this and Gleam

warm wolf
# fervent stump @warm wolf how fast is this on the Javascript target? I'm gonna dabbl...
hyperfine 'gleam test -t erl' 'gleam test -t js'
Benchmark 1: gleam test -t erl
  Time (mean ± σ):      1.003 s ±  0.017 s    [User: 0.892 s, System: 0.350 s]
  Range (min … max):    0.973 s …  1.025 s    10 runs

Benchmark 2: gleam test -t js
  Time (mean ± σ):      1.211 s ±  0.020 s    [User: 1.352 s, System: 0.217 s]
  Range (min … max):    1.184 s …  1.256 s    10 runs

Summary
  gleam test -t erl ran
    1.21 ± 0.03 times faster than gleam test -t js
#

That's for 85 tests. Seems pretty snappy. Not sure I'm using the optimal backend with Node.js. IIRC, there should be a way to run it on the GPU.

fervent stump
#

Cool! I'll play around with this and see how it goes. Thanks for doing this!!

warm wolf