Argamak - A Gleam library for tensor maths. | The Gleam Programming Language

warm wolf Sep 12, 2023, 2:20 AM

2023-09-11 - Argamak now requires Gleam v0.30+; added the concat Tensor slicing/joining function

✨ In the future, more slicing/joining functions are planned~

https://github.com/tynanbe/argamak
https://hex.pm/packages/argamak
https://hexdocs.pm/argamak/

honest epoch Sep 12, 2023, 8:50 AM

Glorious ✨

warm wolf Sep 14, 2023, 2:10 PM

Finally remembered (after releasing it, hmm) that I wanted the new concat fn to err if the given find fn is false for every axis. That'll be in the next release.
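
A rough sketch of the behavior described above, in plain Python rather than Argamak: `concat` joins 2-D tensors along the first axis whose name satisfies a `find` predicate, and fails when the predicate is false for every axis. The tuple-of-axis-names-plus-nested-lists representation and this signature are hypothetical, made up for illustration; where Argamak would return an error, the sketch raises `ValueError`.

```python
from typing import Callable

# Hypothetical 2-D "tensor": (axis names, nested-list data).
# This is NOT Argamak's Tensor type, just enough structure to show the idea.
Tensor = tuple[tuple[str, str], list[list[float]]]


def concat(tensors: list[Tensor], find: Callable[[str], bool]) -> Tensor:
    """Join 2-D tensors along the first axis whose name satisfies `find`.

    Raises ValueError where Argamak would return an error, including when
    `find` is false for every axis.
    """
    if not tensors:
        raise ValueError("nothing to concatenate")
    axes = tensors[0][0]
    if any(t[0] != axes for t in tensors):
        raise ValueError("all tensors must share the same axes")
    matches = [i for i, name in enumerate(axes) if find(name)]
    if not matches:
        raise ValueError("find returned False for every axis")
    if matches[0] == 0:
        # Join along the first axis: stack rows; column counts must agree.
        if len({len(row) for _, data in tensors for row in data}) > 1:
            raise ValueError("column counts differ")
        data = [row for _, d in tensors for row in d]
    else:
        # Join along the second axis: extend each row; row counts must agree.
        if len({len(d) for _, d in tensors}) > 1:
            raise ValueError("row counts differ")
        data = [
            [x for _, d in tensors for x in d[r]]
            for r in range(len(tensors[0][1]))
        ]
    return axes, data


axis_names = ("Row", "Col")
a = (axis_names, [[1.0, 2.0]])
b = (axis_names, [[3.0, 4.0]])
print(concat([a, b], lambda name: name == "Row"))  # (('Row', 'Col'), [[1.0, 2.0], [3.0, 4.0]])
print(concat([a, b], lambda name: name == "Col"))  # (('Row', 'Col'), [[1.0, 2.0, 3.0, 4.0]])
```

Passing `lambda name: False` raises instead of silently picking an axis, which is the change described in the message.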

warm wolf Dec 22, 2023, 11:16 PM

Argamak v1.0.0 published 🐎
Argamak's been updated for Gleam v0.33+ and also includes the concat change mentioned above.

warm wolf Feb 18, 2024, 2:24 AM

Argamak is ready for Gleam v1.0.0

silk sleet Feb 19, 2024, 4:06 PM

i think "vectors" are a type of tensor is that right?

warm wolf Feb 19, 2024, 4:06 PM

Vectors are basically 1-dimensional tensors, afaiu.

silk sleet Feb 19, 2024, 4:07 PM

so i could sensibly do vector things with argamak?

warm wolf Feb 19, 2024, 4:09 PM

You should be able to, yeah.

silk sleet Feb 19, 2024, 4:09 PM

any idea how to implement vector similarity 😄

oh shit gleam_community_maths have some functions for this

warm wolf Feb 19, 2024, 4:13 PM

silk sleet any idea how to implement vector similarity 😄

OK. Here's what I found on the web.

silk sleet Feb 19, 2024, 4:13 PM

and i could use argamak for this?

to do the Maths :magic_sparkles:

warm wolf Feb 19, 2024, 4:17 PM

silk sleet and i could use argamak for this?

Currently, you could do the Euclidean distance, I think, and it might be fairly trivial.

I haven't implemented any sine, cosine, etc. functions yet.

long hearth Feb 19, 2024, 4:17 PM

What do you mean by “vector similarity”?

Same direction/magnitude?

warm wolf Feb 19, 2024, 4:19 PM

Yeah, that's a good question.
It looks like the simplest Euclidean comparison would be point for point.

silk sleet Feb 19, 2024, 4:21 PM

warm wolf Currently, you could do the Euclidean distance, I think, and it might be fairly ...

ok actually a more formulated question is this.

I'll have a bunch of List(Float) that represents some vector(s). I see gleam_community_maths already has a function for euclidean distance (and some others). im not sure if argamak gets me anything if i go through the hassle of converting the data repr to tensors etc or if i could just coast by with just a list of floats and that gleam_community_maths function

warm wolf Feb 19, 2024, 4:21 PM

I guess it depends on how big the lists are.

I'm not sure when it becomes slow without tensors, but argamak is certainly fast at computation, by my standards.

silk sleet Feb 19, 2024, 4:23 PM

thats a good question, i dont know how big they will be right now but do you have a rough idea on when those gains might be felt? 10 elements, 50, 100, ..?

warm wolf Feb 19, 2024, 4:23 PM

I do not 😁

long hearth Feb 19, 2024, 4:23 PM

It depends on what you need done. Fastest would be to store vectors as chunked bit arrays.

But lists are more convenient

honest epoch Feb 19, 2024, 4:23 PM

Don't try and guess perf

Just implement it the easy way and then optimise later

warm wolf Feb 19, 2024, 4:24 PM

It should just be a handful of operations to implement it with Argamak, if you want to try comparing.

subtract one from the other, power(2), sum, square_root.
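
The four steps above, written out over plain Python lists as a minimal sketch (this is not Argamak code; with Argamak each step would be a tensor operation instead of a list comprehension):

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance via subtract, power(2), sum, square_root."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    diffs = [x - y for x, y in zip(a, b)]  # 1. subtract one from the other
    squares = [d ** 2 for d in diffs]      # 2. power(2)
    total = sum(squares)                   # 3. sum
    return math.sqrt(total)                # 4. square_root


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```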

warm wolf Feb 19, 2024, 4:52 PM

silk sleet thats a good question, i dont know how big they will be right now but do you hav...

Probably, tensors start to shine when you throw all your vectors into a matrix and do the math to find out which are the most similar, without explicitly iterating or recursing.
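
A plain-Python sketch of the comparison described above: given one input vector and a list of stored vectors (one per row), return the index of the closest row. It loops explicitly, which is the part a tensor library would replace with whole-matrix operations; the name `closest_index` is made up for illustration.

```python
def closest_index(input_vec: list[float], rows: list[list[float]]) -> int:
    """Return the index of the row in `rows` nearest to `input_vec`.

    Square roots are skipped: sqrt is monotonic, so ranking by squared
    distance picks the same row as ranking by distance.
    """
    if not rows:
        raise ValueError("no rows to compare against")
    squared = [
        sum((x - y) ** 2 for x, y in zip(input_vec, row))
        for row in rows
    ]
    # Equivalent of an arg-min over the vector of distances.
    return min(range(len(squared)), key=squared.__getitem__)


stored = [[1.0, 0.0, 3.0], [4.0, 4.0, 2.0], [-1.0, 8.0, 2.0]]
print(closest_index([3.0, -5.0, 4.0], stored))  # 0
```

Dropping the square root is a small optimization that only works for ranking; if you need the actual distances, keep it.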

silk sleet Feb 19, 2024, 4:53 PM

ohh now we're getting somewhere

lets say i had 100 vectors "at rest" and then 1 in as "input", are you saying argamak would be suitable to find out which out of the 100 that input is most similar to?

warm wolf Feb 19, 2024, 4:54 PM

I should think so, yeah.

silk sleet Feb 19, 2024, 5:05 PM

each vector would be a dimension in the tensor?

warm wolf Feb 19, 2024, 5:06 PM

Just a row. It'd be 2d.
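
To make "one vector per row" concrete: a sketch in plain Python (the helper names `to_row_major` and `row` are hypothetical) that flattens equal-length vectors into a single row-major buffer with shape (number of vectors, vector length). The Elixir session later in the thread does the same flattening with `:gleam@list.flatten` before calling `from_floats`.

```python
def to_row_major(vectors: list[list[float]]) -> tuple[list[float], tuple[int, int]]:
    """Flatten equal-length vectors into one row-major buffer plus its shape."""
    if not vectors:
        return [], (0, 0)
    width = len(vectors[0])
    if any(len(v) != width for v in vectors):
        raise ValueError("every row must have the same length")
    flat = [x for v in vectors for x in v]
    return flat, (len(vectors), width)


def row(flat: list[float], shape: tuple[int, int], i: int) -> list[float]:
    """Read row i back out of the flat buffer."""
    _, width = shape
    return flat[i * width:(i + 1) * width]


flat, shape = to_row_major([[1.0, 0.0, 3.0], [4.0, 4.0, 2.0]])
print(shape)                # (2, 3)
print(row(flat, shape, 1))  # [4.0, 4.0, 2.0]
```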

warm wolf Feb 19, 2024, 5:29 PM

silk sleet each vector would be a dimension in the tensor?

It's not pretty here, but it should work, and you can translate it into Gleam.

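```elixir
t = :argamak@tensor
s = :argamak@space
axis = :argamak@axis

# A 2-D space: an inferred number of "Vector" rows, each with 3 "Point" values
{:ok, d2} = s.d2({:infer, "Vector"}, {:axis, "Point", 3})

o_xs = [[1.0, 0.0, 3.0], [4.0, 4.0, 2.0], [-1.0, 8.0, 2.0]]
{:ok, xs} = t.from_floats(:gleam@list.flatten(o_xs), d2)
{:ok, input} = t.from_floats([3.0, -5.0, 4.0], d2)
input |> t.debug
xs |> t.debug

# Difference between the input and each stored vector
{:ok, step1} = input |> t.subtract(xs)
step1 |> t.debug

# Square every difference
{:ok, step2} = step1 |> t.power(t.from_float(2.0))
step2 |> t.debug

# Sum over the "Point" axis, then square root: one distance per stored vector
{:ok, step3} = step2 |> t.sum(fn a -> axis.name(a) == "Point" end) |> t.square_root

# Index of the smallest distance, i.e. the closest stored vector
{:ok, closest_i} = step3 |> t.debug |> t.arg_min(fn _ -> true end) |> t.to_int
o_xs |> :gleam@list.at(closest_i)
```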
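
Working this example by hand with the same steps (subtract, square, sum, square root), the distances from the input (3, -5, 4) to the three stored vectors are:

$$
\begin{aligned}
d_0 &= \sqrt{(3-1)^2 + (-5-0)^2 + (4-3)^2} = \sqrt{30} \approx 5.48 \\
d_1 &= \sqrt{(3-4)^2 + (-5-4)^2 + (4-2)^2} = \sqrt{86} \approx 9.27 \\
d_2 &= \sqrt{(3+1)^2 + (-5-8)^2 + (4-2)^2} = \sqrt{189} \approx 13.75
\end{aligned}
$$

The smallest is $d_0$, so if the session behaves as intended, `closest_i` should be 0 and the last line should return the first stored vector, [1, 0, 3].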

silk sleet Feb 19, 2024, 5:32 PM

i think in this space folks use "vector" to mean n-dimensional rather than specifically 2-dimensional

The length or dimensionality of the vector depends on the specific embedding technique you are using and how you want the data to be represented. For example, if you are creating word embeddings, they will often have dimensions ranging from a few hundred to a few thousand — something that is much too complex for humans to visually diagram. Sentence or document embeddings may have higher dimensions because they capture even more complex semantic information.

not that it matters?

warm wolf Feb 19, 2024, 5:34 PM

The nice thing here should be that the same building blocks apply to any-dimensional data.

silk sleet Feb 19, 2024, 5:35 PM

yeah thats what i figured

thank you i will have a play this evening

for anyone curious i want to do some local LLM stuff where i can query my notes. you can use the models to generate vector embeddings of any text, they're meaningless on their own but their use is in finding similar text. if i create vector embeddings of my notes i can have a flow thats like:

1. ask a question
2. compare the question vector to the note vectors and extract any that are relevant
3. pass the question to the LLM with the relevant notes as additional context
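
A minimal sketch of steps 2 and 3 of that flow in plain Python, assuming the embeddings have already been produced by the model (that part isn't shown). It ranks notes by Euclidean distance, as discussed in this thread; `relevant_notes`, `build_prompt`, the default `k`, and the prompt wording are all hypothetical choices.

```python
import math


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def relevant_notes(
    question_vec: list[float],
    note_vecs: list[tuple[str, list[float]]],
    k: int = 2,
) -> list[str]:
    """Step 2: keep the k notes whose embeddings are nearest to the question's."""
    ranked = sorted(note_vecs, key=lambda item: euclidean(question_vec, item[1]))
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, notes: list[str]) -> str:
    """Step 3: put the relevant notes in front of the question as extra context."""
    context = "\n".join(f"- {note}" for note in notes)
    return f"Use these notes as context:\n{context}\n\nQuestion: {question}"


# Toy 2-element "embeddings"; real ones come from the model (4096 floats in this thread).
notes = [
    ("gleam notes", [1.0, 0.0]),
    ("cooking notes", [0.0, 1.0]),
    ("erlang notes", [0.9, 0.1]),
]
print(build_prompt("How do I use Gleam?", relevant_notes([1.0, 0.0], notes)))
```

In a real setup the note embeddings would be computed once and stored, so only the question needs embedding at query time.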

warm wolf Feb 19, 2024, 8:18 PM

silk sleet thank you i wil have a play this evening

https://gist.github.com/tynanbe/403f9e5e35ccdf1bf2d0de225ce4ef65

GitHub Gist: Euclidean distance with Argamak

silk sleet Feb 19, 2024, 8:21 PM

woooah :wowfrog:

warm wolf Feb 19, 2024, 8:23 PM

Kind of a pointless try fn there, I guess, but I find it's really helpful to see the state of the tensor at every step in a long calculation.

silk sleet Feb 19, 2024, 8:27 PM

`Axis("Relevancy", size: 3)` how did you land on a size of 3 here?

warm wolf Feb 19, 2024, 8:27 PM

The setup is always like those word problems you might know from school. You need to figure out how to represent some characteristics/variables in terms of numbers.
Argamak has a bit more ceremony in creating a space compared with Nx or TensorFlow, but once that part is done it's pretty OK, I think.
Before I came to Gleam, with Nx (or Elixir in general) I'd often write one really long pipeline, and I probably should make less effort to do it that way. Once the initial setup is done, the main difference is that you sometimes have to deal with Result.

warm wolf Feb 19, 2024, 8:29 PM

silk sleet `Axis("Relevancy", size: 3)` how did you land on a size of 3 here?

You can only infer one dimension's size. In this case, we want to use the same space for both the input and the dataset, so I just inferred the number of "notes" and assumed it would always have 3 data points per note.

As an example.
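
A sketch of why a single inferred axis lets one space fit both the input and the dataset: with the "Point" size fixed at 3 (as in the Elixir example earlier in the thread), the number of rows can be worked out from the element count. This is a generic shape-inference helper written for illustration (`infer_shape` is a made-up name), not Argamak's implementation.

```python
from typing import Optional


def infer_shape(total: int, dims: list[Optional[int]]) -> list[int]:
    """Fill in at most one unknown (None) dimension from the element count."""
    unknown = [i for i, d in enumerate(dims) if d is None]
    if len(unknown) > 1:
        raise ValueError("only one dimension's size can be inferred")
    known = 1
    for d in dims:
        if d is not None:
            known *= d
    if not unknown:
        if known != total:
            raise ValueError("sizes do not match the number of elements")
        return [d for d in dims if d is not None]
    if known == 0 or total % known != 0:
        raise ValueError("element count is not a multiple of the known sizes")
    return [d if d is not None else total // known for d in dims]


# One "Vector" axis inferred, "Point" axis fixed at 3:
print(infer_shape(3, [None, 3]))  # [1, 3]  (the single input vector)
print(infer_shape(9, [None, 3]))  # [3, 3]  (three stored vectors)
```

With two unknowns there is no unique answer (12 elements could be 3 x 4, 2 x 6, and so on), which is why only one size can be inferred.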

silk sleet Feb 19, 2024, 8:29 PM

ah okay

that makes sense, i think i can do it the other way around, always know the number of notes but infer the number of datapoints

thank you i really appreciate this

warm wolf Feb 19, 2024, 8:30 PM

You can, but I think for the computation to work, you'll need to fill in empty spaces with something that makes sense.

Or just to be able to make a matrix in general too.

silk sleet Feb 19, 2024, 8:31 PM

are you saying my assumption that every vector will be the same size is too optimistic? 😄

warm wolf Feb 19, 2024, 8:31 PM

No, just that if they aren't the same size, you'll need to resolve that issue somehow.
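
One way to resolve unequal lengths is to pad shorter vectors with a fill value. A minimal sketch (`pad_rows` is a hypothetical helper), assuming zero is an acceptable filler for your data; it isn't always, since whether a fill value is meaningful depends on what the numbers represent.

```python
def pad_rows(vectors: list[list[float]], fill: float = 0.0) -> list[list[float]]:
    """Right-pad each vector with `fill` so every row has the longest length."""
    width = max((len(v) for v in vectors), default=0)
    return [v + [fill] * (width - len(v)) for v in vectors]


print(pad_rows([[1.0], [1.0, 2.0, 3.0]]))  # [[1.0, 0.0, 0.0], [1.0, 2.0, 3.0]]
```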

silk sleet Feb 19, 2024, 11:38 PM

seems like all the embeddings are 4096 elements

warm wolf Feb 19, 2024, 11:48 PM

silk sleet seems like all the embeddings are 4096 elements

Should be good to go then.

You don't need to infer any dimension's size, unless you want to.

silk sleet Feb 20, 2024, 12:15 AM

oh wait shit argamak needs elixir?

warm wolf Feb 20, 2024, 12:21 AM

silk sleet oh wait shit argamak needs elixir?

Or Node.js.

Elixir seems faster tho, for the unit tests, at least, iirc.

silk sleet Feb 20, 2024, 12:24 AM

😭

warm wolf Feb 20, 2024, 12:25 AM

For 100, 4096-element comparisons, I'd expect tensors to be more performant than lists, but interested to hear real results if you try both.

warm wolf Feb 20, 2024, 12:26 AM

silk sleet 😭

Installing Elixir ain't no thang. It's almost just like a big Erlang package you can't get through Hex.

silk sleet Feb 20, 2024, 12:27 AM

i really dislike installing languages i dont want to use lol 😭 this better be worth it!

warm wolf Feb 20, 2024, 12:28 AM

No promises.

neon thunder Mar 11, 2024, 7:28 PM

What's the motivation for Euclidean distance over cosine similarity?

warm wolf Mar 11, 2024, 7:43 PM

neon thunder What's the motivation for Euclidean distance over cosine similarity?

I simply haven't implemented any trigonometric functions in Argamak yet.
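
For reference, computing cosine similarity doesn't actually require trigonometric functions: it is the dot product divided by the product of the two vectors' lengths, cos θ = (a · b) / (|a| |b|), so elementwise multiplication, sums, and square roots are enough. A plain-Python sketch (not Argamak code):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| |b|), computed without any trig calls."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0 (perpendicular)
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0 (same direction)
```

Because it only compares direction, [1, 0] and [2, 0] score 1.0 even though their lengths differ, which is the direction-versus-magnitude distinction asked about earlier in the thread.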

neon thunder Mar 11, 2024, 7:45 PM

Yeah I saw, I was wondering what @silk sleet wanted to use

i was thinking of making a data cleaning lib in gleam when i saw it has pipelines

maybe for a less busy time loll

neon thunder Mar 12, 2024, 12:41 AM

https://twitter.com/_reachsumit/status/1767045820384477575?t=PJDV-HPYSrjvlHsEj2cT0w&s=19

Sumit (@_reachsumit) on X

Is Cosine-Similarity of Embeddings Really About Similarity?

Netflix cautions against blindly using cosine similarity as a measure of semantic similarity between learned embeddings, as it can yield arbitrary and meaningless results.

📝https://t.co/rbtsmXQ19s

Maybe I'm wrong

raven crater Mar 13, 2024, 8:26 AM

Which distance/similarity metric is most suitable in a specific case essentially comes down to what type of data you're working with.

There's always the example of Manhattan vs Euclidean distance. If you want to measure the distance between two locations in, e.g., a city with a lot of apartment blocks, the Manhattan distance can be the most suitable, as it inherently takes the grid structure of the blocks into account. At sea, where there are no obstructions, the Euclidean distance is more suitable. Similar analogies can be made with the cosine similarity.
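
The city-versus-sea example in numbers, as a small Python sketch: between (0, 0) and (3, 4), walking the grid covers 3 + 4 = 7 blocks, while the straight line is 5.

```python
import math


def manhattan(a: list[float], b: list[float]) -> float:
    """City-block distance: sum of absolute per-axis differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: list[float], b: list[float]) -> float:
    """Straight-line ("at sea") distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(manhattan([0.0, 0.0], [3.0, 4.0]))  # 7.0
print(euclidean([0.0, 0.0], [3.0, 4.0]))  # 5.0
```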

fervent stump Apr 8, 2024, 5:54 PM

@warm wolf how fast is this on the JavaScript target? I'm gonna dabble in some ML stuff, and I'd normally use C, which is of course faster, but if this is like NumPy and the JavaScript target is "fast enough", then I'd so much rather use this and Gleam

warm wolf Apr 8, 2024, 5:57 PM

fervent stump @warm wolf how fast is this on the JavaScript target? I'm gonna dabbl...

```
hyperfine 'gleam test -t erl' 'gleam test -t js'
Benchmark 1: gleam test -t erl
  Time (mean ± σ):      1.003 s ±  0.017 s    [User: 0.892 s, System: 0.350 s]
  Range (min … max):    0.973 s …  1.025 s    10 runs

Benchmark 2: gleam test -t js
  Time (mean ± σ):      1.211 s ±  0.020 s    [User: 1.352 s, System: 0.217 s]
  Range (min … max):    1.184 s …  1.256 s    10 runs

Summary
  gleam test -t erl ran
    1.21 ± 0.03 times faster than gleam test -t js
```

That's for 85 tests, so it seems pretty snappy. I'm not sure I'm using the most suitable backend with Node.js. IIRC, there should be a way to run it on the GPU.

fervent stump Apr 8, 2024, 6:01 PM

Cool! I'll play around with this and see how it goes. Thanks for doing this!!

warm wolf Apr 8, 2024, 6:04 PM

fervent stump Cool! I'll play around with this and see how it goes. Thanks for doing this!!

No problem. Argamak doesn't have many of the more advanced functions implemented yet, just FYI.
