#Argamak - A Gleam library for tensor maths.
Glorious ✨
Finally remembered (after releasing it) that I wanted the new concat fn to err if the given find fn is false for every axis. That'll be in the next release.
Argamak v1.0.0 published 🐎
Argamak's been updated for Gleam v0.33+ and also includes the concat change mentioned above.
Argamak is ready for Gleam v1.0.0 
i think "vectors" are a type of tensor is that right?
Vectors are basically 1-dimensional tensors, afaiu.
so i could sensibly do vector things with argamak?
You should be able to, yeah.
any idea how to implement vector similarity 😄
oh shit gleam_community_maths have some functions for this
Currently, you could do the Euclidean distance, I think, and it might be fairly trivial.
I didn't implement any sine, cosine, etc functions yet.
Yeah, that's a good question.
It looks like the simplest Euclidean comparison would be point for point.
ok actually a more formulated question is this.
I'll have a bunch of List(Float) values that represent some vector(s). I see gleam_community_maths already has a function for Euclidean distance (and some others). i'm not sure if argamak gets me anything if i go through the hassle of converting the data repr to tensors etc, or if i could just coast by with a list of floats and that gleam_community_maths function
I guess it depends on how big the lists are.
I'm not sure when it becomes slow without tensors, but argamak is certainly fast at computation, by my standards.
thats a good question, i dont know how big they will be right now but do you have a rough idea on when those gains might be felt? 10 elements, 50, 100, ..?
I do not 😁
It depends on what you need done. Fastest would be to store vectors as chunked bit arrays.
But lists are more convenient
It should just be a handful of operations to implement it with Argamak, if you want to try comparing.
subtract one from the other, power(2), sum, square_root.
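For comparison, here is a minimal sketch of those four steps on plain lists of floats. It is written in Python purely as a neutral illustration; it does not use Argamak or gleam_community_maths, and the function name `euclidean_distance` is made up for this example.

```python
import math

def euclidean_distance(a, b):
    """Distance between two equal-length vectors given as lists of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    diffs = [x - y for x, y in zip(a, b)]   # subtract one from the other
    squares = [d ** 2 for d in diffs]       # power(2)
    total = sum(squares)                    # sum
    return math.sqrt(total)                 # square_root

print(euclidean_distance([3.0, -5.0, 4.0], [1.0, 0.0, 3.0]))
```

The tensor version should have the same shape, with each list comprehension replaced by one whole-tensor operation.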
Probably, tensors start to shine when you throw all your vectors into a matrix and do the math to find out which are the most similar, without explicitly iterating or recursing.
ohh now we're getting somewhere
lets say i had 100 vectors "at rest" and then 1 in as "input", are you saying argamak would be suitable to find out which out of the 100 that input is most similar to?
I should think so, yeah.
each vector would be a dimension in the tensor?
Just a row. It'd be 2d.
It's not pretty here, but it should work and you can translate into Gleam.
t = :argamak@tensor
s = :argamak@space
axis = :argamak@axis
# 2-d space: an inferred number of "Vector" rows, 3 "Point" values per row
{:ok, d2} = s.d2({:infer, "Vector"}, {:axis, "Point", 3})
o_xs = [[1,0,3],[4,4,2],[-1,8,2]]
{:ok, xs} = t.from_floats(:gleam@list.flatten(o_xs), d2)
{:ok, input} = t.from_floats([3, -5, 4], d2)
input |> t.debug
xs |> t.debug
# Euclidean distance from the input to every row: subtract, square, sum over "Point", square root
{:ok, step1} = input |> t.subtract(xs)
step1 |> t.debug
{:ok, step2} = step1 |> t.power(t.from_float(2))
step2 |> t.debug
{:ok, step3} = step2 |> t.sum(fn a -> axis.name(a) == "Point" end) |> t.square_root
# Index of the smallest distance, then look up the original row
{:ok, closest_i} = step3 |> t.debug |> t.arg_min(fn _ -> true end) |> t.to_int
o_xs |> :gleam@list.at(closest_i)
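To sanity-check the result without a tensor library, the same lookup can be sketched on plain lists. This is an illustrative Python version using the sample data from the snippet above; `nearest_index` is a hypothetical helper name, and the explicit loop here is exactly what the tensor version avoids.

```python
import math

def nearest_index(rows, query):
    """Return (index, distance) of the row closest to query by Euclidean distance."""
    distances = [
        math.sqrt(sum((q - x) ** 2 for q, x in zip(query, row)))
        for row in rows
    ]
    # Equivalent of arg_min: the position of the smallest distance.
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]

rows = [[1, 0, 3], [4, 4, 2], [-1, 8, 2]]   # same sample rows as above
query = [3, -5, 4]                          # same input vector as above
index, distance = nearest_index(rows, query)
print(index, rows[index], distance)
```

With this data the squared distances are 30, 86 and 189, so the first row is the closest.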
i think in this space folks use "vector" to mean n-dimensional rather than specifically 2-dimensional
The length or dimensionality of the vector depends on the specific embedding technique you are using and how you want the data to be represented. For example, if you are creating word embeddings, they will often have dimensions ranging from a few hundred to a few thousand — something that is much too complex for humans to visually diagram. Sentence or document embeddings may have higher dimensions because they capture even more complex semantic information.
not that it matters?
The nice thing here should be that the same building blocks apply to any-dimensional data.
yeah thats what i figured
thank you i will have a play this evening
for anyone curious i want to do some local LLM stuff where i can query my notes. you can use the models to generate vector embeddings of any text, they're meaningless on their own but their use is in finding similar text. if i create vector embeddings of my notes i can have a flow thats like:
- ask a question
- compare question vector to notes vector and extract any that are relevant
- pass the question to LLM with relevant notes as additional context
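A rough sketch of that flow, assuming the embeddings already exist as lists of floats. Everything here is hypothetical: the toy 3-element vectors stand in for real embeddings, `relevant_notes` and `build_prompt` are made-up names, and the call to the embedding model and the LLM is left out.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length lists of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def relevant_notes(question_vec, notes, k=2):
    """notes is a list of (text, vector) pairs; return the k closest texts."""
    ranked = sorted(notes, key=lambda note: distance(question_vec, note[1]))
    return [text for text, _ in ranked[:k]]

def build_prompt(question, context_notes):
    """Prepend the selected notes to the question as extra context."""
    context = "\n".join(f"- {note}" for note in context_notes)
    return f"Context:\n{context}\n\nQuestion: {question}"

# Toy data: real embeddings would come from the model, not be typed by hand.
notes = [
    ("Gleam compiles to Erlang and JavaScript", [0.9, 0.1, 0.0]),
    ("Sourdough needs a long proof", [0.0, 0.8, 0.6]),
    ("Argamak is a tensor library for Gleam", [0.8, 0.2, 0.1]),
]
question = "Which targets does Gleam support?"
question_vec = [1.0, 0.0, 0.0]  # placeholder for the question's embedding

print(build_prompt(question, relevant_notes(question_vec, notes)))
```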
woooah 
Kind of a pointless try fn there, I guess, but I find it's really helpful to see the state of the tensor at every step in a long calculation.
Axis("Relevancy", size: 3) how did you land on a size of 3 here?
The setup is always like those word problems you might know from school. You need to figure out how to represent some characteristics/variables in terms of numbers.
Argamak has a bit more ceremony in creating a space compared with Nx or TensorFlow, but once that part is done it's pretty OK, I think.
With Nx, or Elixir in general, before I came to Gleam, I'd often write one really long pipeline (and probably should make less effort to do it that way). After that initial setup, the main difference is that you have to deal with Result sometimes.
You can only infer one dimension's size. In this case, we want to use the same space for both the input and the dataset, so I just inferred the number of "notes" and assumed it would always have 3 data points per note.
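The idea of inferring one dimension can be sketched on a flat list: fix the row size, and the number of rows falls out of the data length. This is a plain Python illustration, not how Argamak does it internally; `infer_rows` is a made-up name.

```python
def infer_rows(flat, row_size):
    """Split a flat list into rows of row_size, inferring the number of rows."""
    if row_size <= 0 or len(flat) % row_size != 0:
        raise ValueError("data does not divide evenly into rows of that size")
    return [flat[i:i + row_size] for i in range(0, len(flat), row_size)]

# The same "shape rule" fits both the dataset and a single input vector.
dataset = infer_rows([1, 0, 3, 4, 4, 2, -1, 8, 2], 3)   # 3 rows inferred
single = infer_rows([3, -5, 4], 3)                       # 1 row inferred
print(len(dataset), len(single))
```

Only one size can be left open this way: with two unknown sizes, the data length alone would not pin down the shape.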
As an example.
ah okay
that makes sense, i think i can do it the other way around, always know the number of notes but infer the number of datapoints
thank you i really appreciate this
You can, but I think for the computation to work, you'll need to fill in empty spaces with something that makes sense.
Or just to be able to make a matrix in general too.
are you saying my assumption that every vector will be the same size is too optimistic? 😄
No, just that if they aren't the same size, you'll need to resolve that issue somehow.
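One way to resolve it, sketched in plain Python, is to pad shorter vectors up to the longest length. Padding with zeros is only an assumption here; whether zero is "something that makes sense" depends on what the values mean. `pad_to_same_length` is a hypothetical helper.

```python
def pad_to_same_length(vectors, fill=0.0):
    """Pad every vector with `fill` so all rows match the longest one."""
    width = max((len(v) for v in vectors), default=0)
    return [list(v) + [fill] * (width - len(v)) for v in vectors]

ragged = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
print(pad_to_same_length(ragged))
```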
seems like all the embeddings are 4096 elements
Should be good to go then.
You don't need to infer any dimension's size, unless you want to.
oh wait shit argamak needs elixir?
Or Node.js.
Elixir seems faster tho, for the unit tests, at least, iirc.
😭
For 100 comparisons of 4096-element vectors, I'd expect tensors to be more performant than lists, but I'm interested to hear real results if you try both.
Installing Elixir ain't no thang. It's almost just like a big Erlang package you can't get through Hex.
i really dislike installing languages i dont want to use
this better be worth it!
No promises.
What's the motivation for Euclidean distance over cosine similarity?
I simply haven't implemented any trigonometric functions in Argamak yet.
Yeah I saw, I was wondering what @silk sleet wanted to use
i was thinking of making a data cleaning lib in gleam when i saw it has pipelines
maybe for a less busy time loll
Maybe I'm wrong
Which distance/similarity metric is most suitable in a specific case essentially comes down to what type of data you're working with.
There's always the example with the Manhattan vs Euclidean distance. If you want to measure the distance between two locations in e.g. a city with a lot of apartment blocks, the Manhattan distance can be the most suitable, as it inherently takes into account the grid structure of the apartment blocks. At sea, where there are no obstructions, the Euclidean distance is more suitable. Similar analogies can be made with the cosine similarity.
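To make the comparison concrete, here is a small Python sketch of the three metrics on plain lists. The function names are made up for this example. Note that cosine similarity, as written here, only needs a dot product and square roots rather than a trigonometric function.

```python
import math

def manhattan_distance(a, b):
    """Sum of absolute per-coordinate differences (grid-style distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    """Straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

a, b = [0.0, 0.0], [3.0, 4.0]
print(manhattan_distance(a, b))   # 7.0: 3 blocks one way, then 4 the other
print(euclidean_distance(a, b))   # 5.0: the straight line
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0: same direction, different length
```

The last line shows the main practical difference: cosine similarity ignores vector length and only compares direction, while the two distances do not.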
@warm wolf how fast is this on the JavaScript target? I'm gonna dabble in some ML stuff and I'd normally use C, which is of course faster, but if this is like numpy and the JavaScript target is "fast enough" then I'd so much rather use this and Gleam
hyperfine 'gleam test -t erl' 'gleam test -t js'
Benchmark 1: gleam test -t erl
  Time (mean ± σ):     1.003 s ± 0.017 s    [User: 0.892 s, System: 0.350 s]
  Range (min … max):   0.973 s … 1.025 s    10 runs

Benchmark 2: gleam test -t js
  Time (mean ± σ):     1.211 s ± 0.020 s    [User: 1.352 s, System: 0.217 s]
  Range (min … max):   1.184 s … 1.256 s    10 runs

Summary
  gleam test -t erl ran
    1.21 ± 0.03 times faster than gleam test -t js
For 85 tests. Seems pretty snappy. Not sure I'm using the most optimal backend with Node.js. IIRC, there should be a way to run it on the GPU.
Cool! I'll play around with this and see how it goes. Thanks for doing this!!
No problem. Argamak doesn't have many of the more advanced functions implemented yet, just FYI.
