#data-science-and-ml
pretty nice summary:
SIMD is the 'concept', SSE/AVX are implementations of the concept. All SIMD instruction sets are just that, a set of instructions that the CPU can execute on multiple data points. As long as the CPU supports executing the instructions, then it is feasible for multiple SIMD instruction sets to coexist, regardless of data size.
In my current role people are just happy if it's running
My boss keeps paying absolutely insane cloud bills without knowing why they're so high
I guess it depends what market you're in
I am in adtech so cost per user/site/operation has a big impact
Right now I'm in R&D with many people that aren't really concerned about going to prod
The worst thing is that one guy decided to put all our services in 1 azure resource group
Drives me crazy
How is the whole Azure AI experience? I could never understand their naming schemes or pricing of services
To figure out the cost of a project you need to find/filter each individual resource and tally up their costs instead of just taking the entire RG
Hmm, in all honesty. We do all ML/AI stuff on prem
We have 2 Quadro cards, it's very comfy
Fair enough
The place I'm moving to does everything in databricks so that's that
I'm not sure how I feel about being bound to Spark for everything
I guess streamlining your stack saves engineering time
Tbh imo, ML/AI wise, as long as whatever you are doing can go into ONNX, your entire deploy setup and infra side is just a breeze
I don't want to get philosophical but
The bigger issue is that the people working on the models are typically not engineers and have never heard of ONNX
But my sample size are just people I know irl of course
So I agree, but I argue that it is not hard to introduce them to ONNX. Especially if you're using something like PyTorch, really all it becomes is a "Hey just add these 10 lines of python" or you can do it via CI or some CLI tool for example
Sometimes a magic script is the best kind of script
I see what you mean
I'm projecting a bit, most of my colleagues deliberately scoped things to never have to worry about prod
which is entirely possible in R&D, just call it something like "proof-of-value" and then it's done
Yeah, I mean I think that is probably a good thing, but maybe the missing step there is "we create this model, send it to the blackbox in CI that does the training and then spits back results"
Which is unironically what I was trying to make
Effectively what I ended up doing for our web classifier model (and I actually don't understand why it isn't a mainstream library) is a framework that lets multiple people work on separate models within the repo and then run them via just run_model("my.path.to.python_file:MyClass"). Everything else is handled for them, which also lets me keep the system generic and abstract enough to automatically convert the models, load datasets, etc...
Setting that up was probably the best bit of time (and 2k LOC) I invested, because it just made life so much better
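A minimal sketch of how a `run_model("module.path:ClassName")` entry point like the one described above could be resolved, using only the standard library. The `load_class` helper and the `train()` / `evaluate()` method names are assumptions made for illustration; the actual framework mentioned in the chat is not shown anywhere in this conversation.

```python
import importlib


def load_class(spec):
    """Resolve a 'package.module:ClassName' string to the class object."""
    module_path, sep, class_name = spec.partition(":")
    if not sep or not module_path or not class_name:
        raise ValueError(f"expected 'module.path:ClassName', got {spec!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


def run_model(spec, **kwargs):
    """Instantiate the model class and drive it through shared entry points.

    The train()/evaluate() method names are an assumed convention for this
    sketch: every model in the repo would have to implement them.
    """
    model = load_class(spec)(**kwargs)
    model.train()
    return model.evaluate()
```

Because every model goes through the same two calls, the training, testing and export tooling only has to be written once.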
What was your reasoning of not just having them give you the trained model
Is it something they couldn't do from their machines?
Or machines they have access to
They depend on CI/remote servers for training, since those have multiple GPUs and 24GB VRAM
and they are all working on the same repo
the biggest issue when you have multiple people working on different models that do the same thing in order to compare them is that you end up with them writing lots of bits of random code all over the place and different entrypoints to train and test the models
Specifically CI for model training is interesting because I can think of many times I'd want a new model in a way that is decoupled from my commits
Like, new data arriving
But I suspect your use case is vastly different from the ones I have in mind
this approach meant the entry points for training, testing and exporting all are the same across models, and the testing system didn't have to be specialized for each model
I think so yeah, in our case we are targeting one end goal with the models, and it is more just a way of comparing different models with rapid development
the only painpoint is the system never really ended up setting up checkpoints
so if it failed right at the end you'd be 🫡 Waiting another 24 hours
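For illustration, here is a minimal sketch of the kind of checkpointing that would avoid losing a whole run to a late failure. It is a toy loop using only the standard library: the `total` field stands in for model weights and optimizer state, and the function names are hypothetical, not taken from the system discussed above.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state atomically so a crash mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # atomic rename over the old checkpoint


def load_checkpoint(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"epoch": 0, "total": 0.0}
    with open(path) as handle:
        return json.load(handle)


def train(path, epochs, fail_at=None):
    """Toy training loop that resumes from the last completed epoch."""
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], epochs):
        if fail_at is not None and epoch == fail_at:
            raise RuntimeError("simulated crash")
        state["total"] += epoch  # placeholder for one epoch of real training
        state["epoch"] = epoch + 1
        save_checkpoint(path, state)
    return state
```

Calling `train` again with the same path after a crash picks up at the first epoch that did not finish, instead of starting from zero.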
pheeeww
hi, from sklearn.preprocessing import MinMaxScaler,StandardScaler sc=StandardScaler()
X_train= sc.fit_transform(X_train)
X_test= sc.fit_transform(X_test)
for these codes I got this error: TypeError: Property names are only supported if all input properties have string names, but your input contains ['str', 'tuple'] as property name/column name types. If you want the property names to be stored and validated, you should convert them all to strings, for example using X.columns = X.columns.astype(str). Otherwise, you can remove the property/column names from your input data or convert them all to a non-string data type. do u know?how can I fix this error?
How do you suggest I do this?
Put it in 3 backticks like so ``` your code ```
Also, you can put python between the 3 backticks, like so:
```python
X_train= sc.fit_transform(X_train)
X_test= sc.fit_transform(X_test)
```
that gives you
X_train= sc.fit_transform(X_train)
X_test= sc.fit_transform(X_test)
No, unfortunately I got this error this time: SyntaxError: invalid syntax
can you show the whole exact code and the whole error message?
!paste
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the Paste! button in the bottom left, or by pressing CTRL + S. After doing that, you will be navigated to the new paste's page. Copy the URL and post it here so others can see it.
I was just talking about the formatting
I solved it now. It worked after disabling name verification, like X_train= sc.fit_transform(X_train.values). Thanks a lot
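Separately from the column-name error, the snippet above calls `fit_transform` on the test set as well, which refits the scaler on test data. The usual pattern is to fit on the training set only and reuse those statistics for the test set. A minimal sketch, with small made-up arrays standing in for `X_train.values` and `X_test.values`:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data standing in for X_train.values / X_test.values.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 20.0], [4.0, 40.0]])

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # learn mean/std from train only
X_test_scaled = sc.transform(X_test)        # reuse the train statistics
```

Passing plain arrays drops the column names, which is why the name check no longer fires. Converting the names with `X.columns = X.columns.astype(str)`, as the error message suggests, is the alternative if the names should be kept.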
I'm trying to create an ai which helps in traffic congestion. How can i do that
I don't recommend this as a first project as it will be exceptionally challenging and you probably won't make any progress before losing motivation.
You'd need a traffic simulation where the model can control things like the color of each traffic light. and you can probably use a reinforcement learning approach where the model is trying to maximize the speed of each vehicle.
what does this "notions of dimensional modelling, ETL and the basics of data engineering go a long way" part mean?
It's not my first project. I had already created one to detect vehicles, but I want to add a unique feature. This seems just like Google Maps. If you have any ideas then pls share!
ETL is extract, transform, load: the first step of a data project, where you prepare the data. I recommend looking at it in detail. In some teams it will be the entire job of a data engineer, but you still need to know about it because sometimes you have to do it yourself on smaller projects
also there are different data architectures, note that ETL isn't the only way to go
Yeah, the answer that @spare forum gave is spot on, exactly what I meant
so i have a bunch of images that look like this, along with the same image with a red outline around the number, and a black/white mask image. i have an iot device that takes pictures of the water meter, always centered / in a similar place. how could i center the data in the dataset?
Hi, are there also Discord channels for scientific computing around or Slack communities? How does one find such things?
Can I ask a question about ETL? Is this not similar to data processing from HDF5 to either another HDF5 or something else?
I import data, clean it, process it, and save it somewhere?
@dreamy topaz "object oriented programming" means different things to different people. If you want to do machine learning in python, you should know how to make a class in python, and know how to use classes that other people have created.
I have a dataframe column with one trend where the null values lay between 2 known rows
and then sudden blocks of NaN values (50 rows)
i want to fill only the first trend of NaNs
and not the second
how would I achieve this
@nova matrix what method do you want to use to fill nans in the first trend
mean ideally
mean of the value before and value after
I tried different methods but my dataset has 50 million rows so I was wondering if I could do this with a package
@nova matrix you can use isna and shift
I'm standing on a train or I'd show you
@nova matrix these are the key
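The `isna` + `shift` hint can be sketched roughly like this, assuming "the first trend" means a single NaN sitting directly between two known values. The function name and sample data are made up for illustration. Everything is vectorized, so it should scale to tens of millions of rows.

```python
import numpy as np
import pandas as pd


def fill_isolated_nans(s):
    """Fill a NaN with the mean of its neighbours, but only when both
    direct neighbours are known. Longer NaN blocks are left untouched."""
    prev_val = s.shift(1)   # value one row above
    next_val = s.shift(-1)  # value one row below
    isolated = s.isna() & prev_val.notna() & next_val.notna()
    # where() keeps s where the condition is True and uses the mean elsewhere
    return s.where(~isolated, (prev_val + next_val) / 2)


s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 7.0, np.nan, 9.0])
filled = fill_isolated_nans(s)
```

Here the lone NaNs between 1 and 3 and between 7 and 9 get filled, while the block of three NaNs in the middle stays as it is.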
What's a nice solvable problem for LSTMs
that saves sooo much time thankss man
Did you solve it
About to implement But think it will work
will lyk
Oh okay
I'm new to python world.
I wanna ask about data science's prerequisites, roadmap, curriculum.
Do ML, DL, DA and Stats combine together to become Data Science?
There is an ocean named "AI" and all rivers end in that ocean!
I'm so fucking happy, finally I'm understanding how to do exploratory data analysis properly, I'm literally getting it now.
The world ways of understanding is so fake and stupid, I came up with my way to understand which is crazy
My solution to solve a freaking problem is "don't ask people for help"
What did you do
i would generally be wary of this, especially when doing something math-heavy
THAT YOU DON'T QUESTION THE FREAKING DATA AT THE BEGINNING ITSELF
What? Why are you shouting?
People keep saying this thing that first you understand your problem statement bla bla bla bullshit
Because I was so stupid back then
The first step is actually to accept that you haven't done any data science.
The first step is actually to accept the fact that even a data scientist/analyst can't help you because he/she would only tell you what everyone would tell you
you might consider that if everyone tells you something, including experts in the field, maybe you're wrong
I'm not questioning any different solution coming from an expert, it's just that now I have found "MY OWN" way to solve shit
That's all which is 100000% better and it's working for me
and are you well equipped to show and explain why the method works?
because it gives you jumping off points for (interesting) places to look at in your data
Otherwise you're looking for a needle in a haystack. Look at it this way, how much harder would it be to do data analysis if I took a dataset with a meaning and renamed all the columns into A, B, C, D etc.
I've seen domain experts make assumptions and hypotheses that were incorrect but that in and of itself is very important. It lets you find out other interesting things like maybe your data quality is bad, the measurements are incorrect, you made a coding error and if you can exclude all of this their assumptions may have been wrong, which leads to additional questions
I read it, I get it, but I'm sorry I'm trying to find my way and idc to find a solution from an expert. I wanna make mistakes and learn from it than simply getting help.
Thank you
Wait
It's not about getting help, do you know what a domain expert is? I'm not asking this to sound demeaning, I'm just trying to make sure we're on the same wavelength
Oh okay
I have a bad habit to not trust myself when it comes to having a solution in my head and then I would just simply ask people if I'm correct or not which has kinda made me feel like I don't understand data analysis, but idk today morning I woke up feeling something else like I was just studying and suddenly things start to make sense because I was legit studying. I tried to have lil confidence in my way of understanding a data and it kinda worked for the first time and I just wanna be this person everyday now.
Somebody who trusts herself
I have a few tips about this later but before we go there, can you still try and explain with your own words what you think I meant with domain expert?
Okay we're on the same wavelength then
Oh okay idk the professional words but I do understand the professional words okay
Zester
So, I've done data projects (only talking about work ones) in additive manufacturing, health, sociology, finance, ... I'm absolutely not an expert in any of those. The only way I could get those done is by asking experts where to look when I get my dataset and to validate my findings
It's not about not trusting yourself
Often times data professionals are an expert in working with data and they throw you in the deep end in places you have no clue about
Ik but I wanna have my own journey with data.
But you want to do this for a living?
Yeah for a while not forever
Honestly, the more involved datasets I've worked with were not ones where you could "have your own journey with data". Simply because it would be like having all of your columns be named A, B, C, D, ...
😂
Then you need someone at your side. Because the subject matter is just the way it is
- 9 times out of 10 I'd not know where to look and that's absolutely normal. All the things I'd find were obvious and that makes sense.
Ofc but I love to play a role of some "smart" person who is finding things which already exists but i just love that "wow i just discover something new" feeling
For my current project I do find stuff that matters but when we meet the doctor we work with he always says "but could you look at X, Y and Z in conjunction with ..."
Well for those times I have this server, people are beautiful here
And that's where the gold is
Well, we're not domain experts in this server
I'm talking specifically about getting assistance from someone working in the field
And how that shouldn't make you feel like you're not doing a good job
I'm just a girl who wants to be happy thinking she's smart even if that's like basic for you all expert people
I wish I had someone but I don't
You're misunderstanding what I mean
I just have this server and Google
My point is simple
No matter how smart you are, if you get an average dataset you'd find in real life you'd struggle because you're missing a lot of context. This applies to me, you and anyone else
Yeah I know, i don't have any professional around actually IRL
Oh
Then you go for different sources, news articles, books, papers, blogs, ...
Yeah I do that, I do that a lot
I always try and immerse myself (even if there's people I can ask questions to)
Oh that's nice
Just be an expert in every field.
So yeah, to circle back. I'd say for an EDA the critical thing is to have background knowledge of the problem you're trying to solve. When you do this for a living there will always be someone you can ask
I love this server
For now all you can do is probably just read info about the topic online etc
Yeah ik that
And try to make sense out of that simple data
If you want to practice what worked for me is doing Kaggle competitions. Specifically those called "tabular playground". They're ML competitions but people always do an EDA, you can stop there if you want. The gist is, you make your own notebook, your own solution and then you read others
I'd always make it 100 % before looking at others
It's a mix of learning technical stuff "wow, is that how to make that kind of plot with Matplotlib?" and data analysis/ML intuitions "Oh that's how they look at stuff, interesting conclusions they drew from ..."
Reading this Wikipedia article simulates the experience of seeing all those new strange column names for a specific field of work: https://en.wikipedia.org/wiki/Glossary_of_baseball_terms
What you don't know what a "cement mixer" is?
Omg thank you so much, you guys are all helping Angels
Thank you I'll look through that
I'm stealing this, this is such a nice way to describe it
I like to call all jargon in every field its "baseball terminology," or the "baseball barrier of entry."
Most extreme in things like business and law.
(And math, what you don't know the "hairy ball theorem?")
I like this one because even the explanation has references to things you wouldn't know. Torii Hunter, grand slam, walk-off homer, ...
(especially for us Europeans lol)
Yeah, like trying to understand a legal document in another language.
When I did the job in (additive) manufacturing I spent my first week on the shop floor with the technicians and reading ISO standards in my spare time
Yeah, reading standards is a great way to get somewhere in stuff like that.
Also like just seeing the production steps
if I saw "wire EDM, 2 seconds"
You have to see the production process to gauge the plausibility of that
I suppose you could also make a histogram/boxplot of all lengths and come to the same conclusion
This was some part time gig I did as a student. By far the worst data job I've ever done.
I have noticed the pattern of part time student gigs for data science often making no sense.
I had others that were nice though
This one just had a terrible company culture. To do anything I needed authorization from Denver (-7h) and India (+7h) needed to execute it
The "database" was Excel files on a network drive with a SQL view on top, queries could kill the entire plant
Did you have to fax them?
Luckily not, but I worked 2 days a week which meant if I needed to do the denver India cycle the week was over
Denver tells me I have a go end of day on the first day I ask or if they're busy on the next. By the time I can ask India it's their midnight
This is way too common, it's a strange property of accessibility, and being able to linearly add to something very easily, but it does not scale.
Ah and the other thing that I didn't like was people couldn't agree on basic terminology. For instance, what "how much % of parts were without defects this month" meant
For one group it meant parts manufactured this month / parts returned this month, while for the other group (... me) it's parts manufactured in month A / parts from month A that were returned (irrespective of when they were returned)
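To make the difference between the two definitions concrete, here is a small sketch on made-up records. The data and function names are hypothetical, and the "share without defects" is computed as 1 minus returned / manufactured, which is one reading of the ratios described above.

```python
# Hypothetical records: (month manufactured, month returned or None).
parts = [
    ("2024-01", None),
    ("2024-01", "2024-02"),
    ("2024-01", None),
    ("2024-01", None),
    ("2024-02", "2024-02"),
    ("2024-02", None),
]


def defect_free_by_calendar_month(parts, month):
    """Group 1: parts made in `month` vs. returns received in `month`."""
    made = sum(1 for m, _ in parts if m == month)
    returned = sum(1 for _, r in parts if r == month)
    return 1 - returned / made


def defect_free_by_cohort(parts, month):
    """Group 2: follow the parts made in `month`, whenever they come back."""
    cohort = [r for m, r in parts if m == month]
    returned = sum(1 for r in cohort if r is not None)
    return 1 - returned / len(cohort)


january = (
    defect_free_by_calendar_month(parts, "2024-01"),
    defect_free_by_cohort(parts, "2024-01"),
)
february = (
    defect_free_by_calendar_month(parts, "2024-02"),
    defect_free_by_cohort(parts, "2024-02"),
)
```

On the same six parts the two definitions disagree for both months, which is why getting the definition written down matters before anyone reports a number.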
So instead of cool stuff you can imagine I spent a lot of time in meetings getting people to agree (and write down) definitions
(It's funny because this happens to be part of the 1944 CIA organization sabotage guidelines, arguing on basic terminology (it's a really good way to get nothing done))
Hmm
(Not implying that the CIA is involved, just that it's a pretty bad thing for an organization to do if even the CIA recommends it as a method to disrupt)
I don't understand what you are trying to say
I actually often argue about basic terminology...
Self-reflection time...
bro wants to be a fed 
I wish I had permission to say "shut up" in this server
you do my friend and so much more
but i digress I don't wanna fill ds/ai channel with nonsense
Yeah
PyTorch running on amd gpu (more https://discuss.pytorch.org/t/how-to-run-torch-with-amd-gpu):
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1
JAX does not support it.
TF has some instructions here (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/tensorflow-install.html).
also has anyone used MLX ? https://github.com/ml-explore/mlx
ig it's not ready for prod, but being added to Keras (well, queued.)
noticed that lscpu in linux will tell you whether your cpu has avx,avx2 etc support (if i understand correctly)
easiest is to do cat /proc/cpuinfo
And under flags it will tell you, the supported CPU flags
On most x86 hardware, you will have AVX2 and the generations that came before it, i.e. SSE4, AVX, SSE3, etc...
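As a rough sketch, the flags line can also be read from Python. This assumes an x86 Linux machine, where `/proc/cpuinfo` has a `flags` line (ARM kernels call it `Features` instead). The helper names are made up for this example.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def simd_support(flags, wanted=("sse4_2", "avx", "avx2", "avx512f")):
    """Map each SIMD flag of interest to whether the CPU reports it."""
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print(simd_support(parse_cpu_flags(handle.read())))
    except FileNotFoundError:
        print("/proc/cpuinfo not available (probably not Linux)")
```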
thanks, yes
Only very new consumer hardware like AMD's Zen4 CPUs have AVX512 features.
Some Xeon Golds had AVX512 several years ago, but one problem they can have is that it causes the chip to overheat and thermal throttle
interesting, i've got those on intel, not a new laptop
AVX512?
i was noticing that colab has xeon processors
Most ARM server chips will ship with NEON, notably Ampere, although they don't mark it in their CPU info on VMs.
Some newer gen ARM chips like AWS Graviton have SVE1 & 2 as well, which provide more powerful SIMD operations on the cores
avx512f avx512dq avx512ifma avx512cd sha_ni avx512bw avx512vl avx512vbmi umip avx512_vbmi2 avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2in...
Fair enough, what chip is it?
Intel just with the worst possible naming schemes on their chips 😔 And then AMD is now doing the same
Right it's Rocket lake, so post-icelake
not via torch
via tf?
no just onnxruntime
we dont use TF, which I think is a general pattern in the industry
i see, for inference or training as well?
TF is just not a very fun framework to try to use commercially and then deploy
Mostly inference
interesting, i've used tf for training, onnxruntime for web deployments
Can do training, but normally it is easier to use our PT framework so we just use nvidia
What on earth is ratio of stars to issues supposed to suggest lol
well, i'd expect that there aren't that many users, but there are many issues
if it works 4u, that's fine though...
🤨
could also mean programmers aren't very concerned about fixing issues. it's not far from tf's n of issues
that is... 😅 Definitely not how you want to judge that lol
to me the response to user queries is quite a good proxy for success of a library
tf has both way more stars and fewer issues than numpy, and pytorch has way more issues and fewer stars than tf
imho it's a good library though
They are both massive libraries
Commercially I would generally lean towards pytorch over TF though
Theres quite a few beginner tutorials for TF
but generally in our experience PyTorch's frameworks are just more intuitive and normally faster on things like train time and memory
Keras metrics... With pytorch ?
i meant as a quick glance of a repository, low n of stars but high of issues does not seem great, was a minor observation..
Yes, why?
why on earth would you use that xD
keras did split back out of tf
I'll let you guess..
Keras is effectively built for TF
no
It is
keras started out and is now again separate from tf
no, just read about it
In reality, it is
Keras 3 is multibackend, and runs well with Jax and Torch as well
Like, I hate to break it to you, but using Keras and PyTorch is just a weird choice
In fact, it is on the road to supporting MLX
it's not because you have the flexibility of testing the speed of many models
in != frameworks
(also for completeness, jax and tf use the same backend)
If you're using torch, it is infinitely easier to use something like PT lightning & friends than Keras
for gpu calculations? their support matrix is very different
meh
they both use XLA as default backend
yeah
I think it is a bit odd keras chose to abstract themselves out of just TF
ultimately it is probably going to still be solely used with TF 😅 Because that is what all the tutorials will use
probably for a while, yeah. it did start out separately from tf though so maybe they can pull it off ok
it's actually nice conceptually
i don't remember when keras was acquired
looks like keras was merged into tensorflow (as tf.keras) in 2017 rather than acquired
keras has tutorials, they run on any backend
Maybe, but they are generally, if looking at PT, competing with PyTorch + PyTorch Lightning which is already generally very well supported
and does some things like Multi-GPU and multi-machine training without a lot of the footguns
i have to admit i've never heard of pytorch lightning before
I'd describe it as the thing you end up finding when you start training large scale models or multi-gpu models
because it manages all the checkpoints and device mounting for you
They also did https://lightning.ai/torchmetrics which is separate from lightning but a great metric lib
same here, i've yet to work with more than one a100
usually in the team "if it doesn't fit there, go back to your notebook and do better math"
I generally like it for that reason
but more so in the context of when working with teams
it just helps prevent people writing some cursed loops or unreadable blocks
i don't know about xla apart from the name tbh
If you're in a team where everyone is good at Python and best practices, then it matters less imo, other than the multi-machine stuff
i admit i think pytorch will win to some extent
i just like fchollet general thought process i think
From the work I have done, I think it has already largely won the commercial space
they work very differently, it actually makes sense to try out different backends for some problems
he is so smart lol
the way jax/xla and pytorch build the computational graph is very different, and this affects the speed and memory usage of some operations
I think a lot of people start with TF and Keras doing tutorials and learning
maybe, i did the opposite, started with pytorch
But it seems to be the de facto choice for commercial projects: where we go "okay we need a classifier or something", it is pytorch time
I am guessing because PyTorch seems to be the choice for academic related things
that is the case, though i do see (and push for) more jax lately
Flash attention comes to mind with the recent LLM stuff
are you guys trying to help the world with your models
no
no
what i work on is either to publish papers or to make money, or both
(guess I'll ask here as well)
sry to interrupt, but
If you were getting people, who are completely new to programming, into python (mainly for statistical analysis), would you recommend installing from python.org, or use something like anaconda, or even just colab?
personally I've not used anaconda much at all so idrk what's good/bad about it
do you recycle
I can to some extent understand this, because it lets you do a more high level implementation
Personally I am not a huge jax fan, but I can understand why it exists
the biggest seller for me is really just that it looks like numpy, and most people already know that as background
Depends, I think colab is a good initial place to start, but then setting up conda (not anaconda) can make the next steps easier
well, i say that, but the next thing right up there is the ease of writing the code to look just like the math on paper does
oh i think i've seen you in the math forums now
i would say conda is still my preferred way of using python, it brings most of the stuff i need bundled together. also requires no sudo perms so it's easier to use it on compute clusters with restricted perms
i think you explained a bit about the condition number to me
could be it wasnt you
that sounds like something i would mention, but i can't say i remember
then there's no way i remember
Not python related, but one of the ML frameworks I am watching is Burn https://github.com/tracel-ai/burn
The Wgpu backend is just
So fucking nice for when you dont want to deal with nvidia drivers
ig you use more Julia than Python for ML? Or maybe it's not the right question
i've been meaning to learn julia, but don't know it. i mainly use python. used to use matlab before
I've only heard bad things about matlab
i see. it seems quite a beautiful language, at least from examples
in a very real sense, it's better than numpy for many numerical problems
good ratio
I think Julia is cool, but it suffers a bit from the issue of being very niche.
At work it is always a weigh up of:
- Does it need good performance?
- No -> Use Python
- Yes -> Use Rust
It is hard to justify Julia just for the data and ML side of things and getting everyone in the team to learn another language
it jits everything by default and has faster run times. it's also industry standard in many applications. sure burns a hole through your pocket though
Oh really? Interesting
Julia as a language is great, and makes awesome use of LLVMjit
but I think it is very niche
that being said, wish we used it instead of Python at one of my old jobs
i've no knowledge about that, but i think it's precise numerically (for linear algebra operations)
would have been amazing there
The hard thing is always getting devs to learn it though
unless all you do is numeric computing or ~adjacent stuff
yeah, i guess we are on a tangent on a tangent
which ML is
but wrapping it as part of a bigger product, not so much
do you have favourite ml researcher?
True, but with ML specifically I think it can also be hard to weigh up vs PyTorch/TF/ONNX
(can be a list)
there are a lot of devs that know python, and a lot that know Python and an ML framework
would be interesting to see node js hop in
Also RIP my data engine, 32 cores and 64GB of memory and it still OOMs when aggregating 😔
do you guys open problems to freelancers anywhere?
a gpu w 32 cores you mean?
No I mean CPU
you do onnx inference in cpu?
This isn't the inference server
this is the system that just logs the data being spat out by the inference cluster
the inference system itself is on a GPU machine(s)
i see
Although ngl, I am kinda excited for the AI CPU chips
they have enough FLOPS to make it realistically pretty cost effective to run on them instead of GPUs
which library are you using to aggregate?
I am biased because I used to work there, but https://quickwit.io/
Oooh, working on a search engine must have been interesting
Was good, I got a bit burnt out though
Learnt a relatively important life lesson that on those sorts of projects and systems, it really makes a difference to actively use the tool so you understand why things are done in certain ways and can see it from the end user's POV
otherwise it can seem a bit like a lot of stuff not quite lining up
interesting page, thx
basically compiles computation graph to machine code and has fusion operations which i hear are useful...
but idk what it means
the best part about jax, aside from looking just like numpy (and therefore looking very similar to matlab and julia as well), is that it exposes XLA's features directly
you can choose to grad, jit, and map however you like
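A small sketch of what "grad, jit, and map however you like" looks like in practice. The `loss` function and the input values are made up for illustration; `jax.grad`, `jax.jit` and `jax.vmap` are the standard JAX transformations and they compose freely.

```python
import jax
import jax.numpy as jnp


def loss(w, x):
    """Sum of squares of w * x; reads just like the NumPy version would."""
    return jnp.sum((w * x) ** 2)


grad_loss = jax.grad(loss)       # derivative with respect to w
fast_grad = jax.jit(grad_loss)   # the same function, compiled through XLA
# Map the gradient over the rows of x while holding w fixed.
batched_grad = jax.vmap(grad_loss, in_axes=(None, 0))

w = jnp.array([1.0, 2.0])
x = jnp.array([3.0, 4.0])
xs = jnp.stack([x, 2 * x])

g = fast_grad(w, x)        # analytically 2 * w * x**2
gs = batched_grad(w, xs)   # one gradient per row of xs
```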
As a part of the OpenXLA project, XLA is built collaboratively by industry-leading ML hardware and software companies, including Alibaba, Amazon Web Services, AMD, Apple, Arm, Google, Intel, Meta, and NVIDIA.
's got some backers
I started out with TF, it was what we used in uni for neural nets (that and matlab)
i'm unsure that tensorflow uses XLA compilation by default, where is that information?
I believe it is the default when you enable the JIT, going off the XLA docs
Also 👏 AMD please give me a desktop chip of this
Otherwise I am going to be building a really cursed laptop server
yeah, so it's not the default since that's false by default
you probably have more tops on a usual server processor
ah, tf doesn't have jit enabled by default?
you can always try to cause a weird error and check the traceback
I don't think so
jax spits out hundreds of lines of xla calls whenever you look at it wrong
If their measurements are in any way close to what they say they are
that chip is several times faster than what you could do with a 64 core Epyc
i somehow strongly doubt that from a 28w processor
im checking
Honestly same, but then again, maybe not since having dedicated hardware makes a significant difference
In reality it is a 54W chip, which I suspect is what they are actually doing to get those numbers
they are doing Max boost frequency to get those tops
the cache sizes would also have to be compared because that neuters the real life performance
time to fish up some benchmarks
For reference a 16-core AMD Ryzen on Zen 4 will do about 1 TOPS from my testing
although I think Openblas tends to struggle past 800 GFLOPs because of memory bandwidth, also unsure if OMP is pinning the cores
tf.config.optimizer.get_jit()
returns ''
some old numbers from intel xeons from 2022 say 419 tflops with float32, and upwards of 1500 with int8
That sounds far too high
the h100 lists 26 to 3000 tflops depending on the data type
an A10G GPU is rated at 250 TOPS
for which datatype?
int8
i need to look up the a10g cuz i've never worked with it
It is a nice GPU tbh, it is what the AWS G5 instances use
I am using it here just because it was the first one I know that has the specs clearly listed on nvidia's page
Sometimes getting spec sheets with numbers is a pain 😔
I suspect AMD's numbers there are int8 operations
complete guess, but any sort of floating point op seems far too high
yeah i would also think so
let's see here
if techpowerup is to be trusted, an a10g
and an a100
I think those numbers seem relatively OK, but the FP16 seems off. At work there is a noticeable perf diff between FP16 ops and FP32 ops on the A10g, so them being the same seems a bit off
that probably about lines up with what i'd expect
and the h100 with 3000 tflops on int8 seems reasonable then
Maybe that 50 TOPS number isn't the most unrealistic thing in the world then
1/5th the compute of the A10g, but 1/3rd the TDP
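The ratios above can be checked with quick arithmetic. The 50 TOPS, 54 W and 250 TOPS figures are the ones quoted in the chat; the 150 W A10G TDP is an assumption added here (roughly consistent with "1/3rd the TDP"), so treat the per-watt result as illustrative only.

```python
# Figures quoted in the chat, except A10G_TDP_W which is an assumed value
NPU_TOPS, NPU_TDP_W = 50, 54
A10G_TOPS, A10G_TDP_W = 250, 150  # 150 W is an assumption, not from the chat

compute_ratio = NPU_TOPS / A10G_TOPS
power_ratio = NPU_TDP_W / A10G_TDP_W
npu_tops_per_watt = NPU_TOPS / NPU_TDP_W
a10g_tops_per_watt = A10G_TOPS / A10G_TDP_W

print(f"compute ratio: {compute_ratio:.2f}")
print(f"power ratio:   {power_ratio:.2f}")
print(f"TOPS/W: chip {npu_tops_per_watt:.2f} vs A10G {a10g_tops_per_watt:.2f}")
```

Under these assumptions the GPU still comes out ahead per watt, and the comparison only means something if both TOPS figures use the same data type.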
indeed, if they disclose the data type 😛 otherwise this will become like another "nanometer" thing
Yeah, well once they release I'm sure we'll see some more numbers
otherwise 😅 I am doing some shopping and we'll do a comparison
TOPS with 1bit ints
Man that would really suck to find out xD
Although I am sure it could be utilised for some interesting bits of compute
like data filtering
We can actually probably work out
since I suspect, AMD's TOPS will be MS' TOPS
because of the whole AI PC shit
mhm
Least scuffed MS webpage
what i did find in an arstechnica link is that current amd 7000 and 8000 chips offer "12 to 16 TOPS"
that is what the AMD website lists
I think that is using the internal GPU
because they don't have an NPU
but do all have the Radeon graphics
so might be that what they are doing via their Ryzen AI™️ drivers, is effectively proxying the operations as graphics ops
Idk why they are so focused on them being in laptops tho
Like... Guys do you not see if you're getting 50TOPS of int8 performance how big that potential is for the server and desktop space?
just for inference and numeric computing
Ignore the whole MS AI PC stuff that no one actually gives a shit about
they're already selling it that way on the server lines for 2 or 3 years though
Qualcomm are making ARM based laptops with excellent battery lives, but don't put that as their biggest feature
Everyone trying to push for AI but 99% of applications dont support it
I don't think so, at least not in the same way
pretty sure yes, the xeon i mentioned from 2022 explicitly says neural something or another
Like the best we have gotten really is better SIMD support and larger caches which mean more per-core performance
They dont actually have any dedicated hardware IIRC
but they do have a page detailing why and how at least.
what Intel have done is taken their perf cores and efficiency cores from the desktop market, and put it on Xeons
with a bigger cache
Going off their 'fact sheet' it seems it is just the E-cores and P-cores
i still have to find out what exactly that accelerator is
ah, intel AMX they call it
Take advantage of Intel® Advanced Matrix Extensions (Intel® AMX) accelerator capabilities to improve the performance of deep learning workloads.
dedicated matmul hardware on the P cores
for 3 gens already
they've got some new stuff for gpus also
the trend i've seen in the last 10 years is that the server scene is like "upstream" of desktop and laptop hardware, just like this
and the neat features trickle down... with arguable salesmanship like shoving AI down your throat
Reading their performance sheet tho
it just looks weird
wait nvm
In their graph they did (old)FP32 vs (new) BF16
and then in the next did (old)int8 vs (new)int8
for gpus they got some charts https://intel.github.io/intel-extension-for-tensorflow/latest/docs/guide/performance.html
and here i think https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/performance.html
Is there any documentation on using arrow keys to move sliders in plotly?
I just wanna confirm this can only be done in dash not standard plotly GO
I think a few years in the future "AI on chip" on desktop will actually be quite useful and the chip manufacturers don't want to fall behind the competition on that front
i think that's still a handful of generations away. the gap between AI tasks you can run on a desktop vs realistic tasks to be run on compute clusters is still comparable to the distance between heaven and hell
unless you put server grade hardware in your desktop, which requires you to shell out several tens of thousands of moneyz
TBF this looks like it is very recent
All the AMX stuff and Xeon 6th gen stuff is 2024 from what i can see
amx was introduced in 2020, but only implemented in 2023 with xeon 4 (According to google)
but that means they were selling the AI on cpu idea for a while already
I wonder if that is because of all the chip and arch issues intel were having previously
I remember they were having some serious issues with their new arch in the chip making processes
like self-destructing chips :x
Mostly a desktop thing
But one thing is going to be interesting is looking at the price of the 6th gens
Going off their press release sheet, they make it seem like the 6th gens are the ones with actually good performance with AMX
the rest are just AVX512 VNNI
so what is the price of that chip
and how does it weigh up against the Epycs?
Looks like AMD are rapidly catching up as well https://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9754s.html
128 physical cores
fucking monster of a CPU
So the top line 6th gen Xeon is 128 P- cores and 288 E cores
so it is no slouch either
Will be interesting to see what the TDP and cost of that chip is
Competition is going to be 2x 9754S @ ~$11k a piece
I bet I could heat my house water on one of those.
what are good statistics to measure the strength and direction of association between a continuous variable and multi-class categorical output?
Can someone help me? I tried to make my own neural network without ai libraries but it does not learn properly. https://pastebin.com/K8HLiZB7
Console:
Epoch 1/10 - Accuracy: 19.78%
Epoch 2/10 - Accuracy: 15.88%
it goes down 🙄
it updates sometimes but 50% of it has no change
huh
🥺
idk I just started learning deep learning and all I had is errors. Be glad you made it this far.
It's not that hard actually
the code im using
I know
but idk where error
Keep learning !
yep
where did you learn
i already know programming for years i just had to learn the algorithm behind ai so a few tutorials were enough
tutorial about math behind the ai
by the way can't you do something like back feeding the info like telling what the answer should be and what it could be also?
i do train ai using backpropagation
ohh
Which checks the result and compares with true one
if its right good 👍 if not change the weights and bias
By delta
derivative of sigmoid or ReLU
depends on what you are using
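The update described above (compare the output with the target, then change weights and bias by a delta that includes the activation's derivative) can be sketched for a single sigmoid neuron. This is a minimal illustration with made-up inputs and a squared-error loss, not the code from the pastebin.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(w, b, x, target, lr=0.5):
    """One gradient-descent step for a single sigmoid neuron with squared error."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    out = sigmoid(z)
    # delta = dLoss/dz = (out - target) * sigmoid'(z), with sigmoid'(z) = out * (1 - out)
    delta = (out - target) * out * (1.0 - out)
    new_w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
    new_b = b - lr * delta
    return new_w, new_b, 0.5 * (out - target) ** 2

w, b = [0.1, -0.2], 0.0
losses = []
for _ in range(200):
    w, b, loss = train_step(w, b, x=[1.0, 0.5], target=1.0)
    losses.append(loss)
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

With ReLU the only change to the delta would be the derivative term: 1 where the pre-activation is positive, 0 otherwise.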
You can compute the mutual information score. Unlike the correlation coefficient, I think one drawback of the mutual information score is the explainability of its result.
For a more detailed analysis, you'll need a statistical test. The kind of statistical test you need depends on the kind of categorical feature you're dealing with.
Do you have an ordinal or nominal categorical feature?
Since you've already mentioned that the categorical feature is not dichotomous, we can rule out the point-biserial test.
I think you should just test it more with more data.
i already have over 1k data
ohh my
ohh
all of them has 100 hand written numbers
ordinal, the categories represent various "levels"
there's a 25 hour long video teaching it
Does it use libraries like pytorch & tensorflow or by scratch?
could you help me please? 🙂
pytorch but there's a video by another creator about doing it from scratch.
can you give link
ok
https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3 https://www.youtube.com/watch?v=w8yWXqWQYmU https://www.youtube.com/watch?v=cAkMcPfY_Ns
Oh i watched the last 2 videos already but thanks ill watch 2 other 🙂
Samson's tutorial looks good ngl
Thanks again 🙂
you can also get a copy of his code
In that case you need a non-parametric test.
You can use Kendall's tau or Spearman's rank correlation instead of Pearson correlation.
Another alternative is to carry out a Chi-Square test of independence to determine the association between the two variables (this needs the continuous variable binned into categories first).
You can even use ANOVA as well to test for differences in means of the continuous variable across the different categories (or Kruskal-Wallis if you want to stay non-parametric).
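For the ordinal case, Spearman's rho is just Pearson's r computed on ranks, which gives both strength and direction. Below is a dependency-free sketch with made-up data; the variable names and values are hypothetical, and in practice `scipy.stats.spearmanr` or `scipy.stats.kendalltau` would also give a p-value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: a continuous measurement and an ordinal level coded 0 < 1 < 2
measurement = [1.2, 2.9, 2.1, 4.8, 5.5, 4.1, 7.3, 8.0, 6.9]
level = [0, 0, 0, 1, 1, 1, 2, 2, 2]
rho = spearman(measurement, level)
print(f"Spearman rho = {rho:.3f}")
```

A value near +1 or -1 indicates a strong monotonic association; the sign gives the direction.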
The code is here https://www.kaggle.com/code/wwsalmon/simple-mnist-nn-from-scratch-numpy-no-tf-keras
Too many definitions and terms that i don't know but thanks ill research them
i prefer watching tutorials
Oops sorry I quoted wrongly. My bad
Oh
can anyone help me create a countdown bot that stays in a vc 24/7 and counts down like this: 3 minutes till countdown
2 minutes till countdown
1 minute till countdown
30 seconds till countdown
10
9
8
7
6
5
4
3
2
1
Go
Please help me with this
What do you need help with?
This
can y help me
import time

def countdown(minutes):
    # Announce every remaining full minute, down to 2 minutes
    for m in range(minutes, 1, -1):
        print(f"{m} minutes till countdown")
        time.sleep(60)
    print("1 minute till countdown")
    time.sleep(30)
    print("30 seconds till countdown")
    time.sleep(20)
    # Final ten seconds, one number per second
    for i in range(10, 0, -1):
        print(i)
        time.sleep(1)
    print("Go!")

# Start countdown for 3 minutes
countdown(3)
dms and ty
bro how is this releated to artificial intelligence
wrong channel xd
It's ok lol
@proven inlet I'm currently not free at the moment but I will take a look at it once I'm a bit free.
It's okay thanks 👍
i ran it with 1 hidden layer
Loading Dataset...
Loaded.
Training Epoch: 0
Training Epoch 0 finished with %18.366666666666667 accuracy
Training Epoch: 1
Training Epoch 1 finished with %31.266666666666666 accuracy
Training Epoch: 2
Training Epoch 2 finished with %34.766666666666666 accuracy
Training Epoch: 3
Training Epoch 3 finished with %37.4 accuracy
Training Epoch: 4
seems fine, what's the issue @proven inlet
accuracy goes down with every epoch
i don't remember if the remaining steps of backprop are correct, but if you are using MSE the derivative is correct
i'm showing you it does not with 1 hidden layer at least
is the problem using more than 1 hidden layer
if i print new - old weights
most of them are 0
i commented some stuff
and changed the delta, and learning rate, i can't test much more now, just run it and check
its not learning 😦
thats not my code
https://colab.research.google.com/drive/16GCl7IZ3ZBwc3Vp3pCSpF8tfhqcDdki_?usp=sharing
is that one? @proven inlet
does seem to increase with extra hidden layers as well.
i said i modified a few things
in any case, im unsure if it'd get very far, since the surface can have many pockets
Uh you are using ai libraries
i ran a similar model with keras to compare
the ai library is completely irrelevant, you can remove the cell.
how do you want me to test the code otherwise?
that data is the mnist dataset, a dataset of handwritten digits.
it's just like that
idk what is the issue with your data, i can't access it, but the network seems not totally wrong
ok, maybe someone else can help
try it, please, this settings:
input_size = 28*28
hidden_layers = [10,10]
output_size = 10
print("Loading Dataset...")
images, labels = load_dataset()
labels = onehot(labels, output_size)
print("Loaded.")
nn = Brain(input_size, hidden_layers, output_size)
nn.train(x_test, y_test, 10, .2)
what happens with hidden_layers=[] ?
they're being added to self.layers which is list of Layer objects
no, if you use it empty
it would not work
why not?
then it must be the input data-labels pairs the problem
https://pastebin.com/K8HLiZB7 is something wrong??
you can load it into a grid of images and paste the labels
with the image
just for a subset
i do that like this:
import os
import numpy as np
from PIL import Image

def load_dataset():
    images = []
    labels = []
    # One sub-folder per digit: Dataset/0 ... Dataset/9
    for label in range(10):
        path = os.path.join("Dataset", str(label))
        for filename in os.listdir(path):
            if filename.endswith(".png"):
                image_path = os.path.join(path, filename)
                # Greyscale, flattened, scaled to [0, 1]
                image = Image.open(image_path).convert("L")
                img_array = np.array(image).flatten() / 255
                images.append(img_array)
                labels.append(label)
    return np.array(images), np.array(labels)
imho there isn't, just errors.reverse() is not used
to me it seems correct, but we won't know until you plot image-label pair
The returned values from here passed to train function (labels are converted to onehot) and they are being used in for loop with zip method
plot it and check
input is image, expected_output is label. atleast it should be,
no, it's not what i am saying
plot the images with the label, like the image of 0 with the label, then you know it's matching
Oh
and the image isn't wrong etc
its just in array format tho
between 1 and 0
wait lemme append their path instead of their pixels so we can check
i changed load_data function to put img path instead of pixel array and it gave me these results:
it seems to work
because its 0
0*
also, if you have [] you still have 10 neurons with activations, and corresponding weights
so with empty list, it should do something
empty list to which param
hidden layers
hiddenlayers didn't work for empty layers
this one is with empty hiddenlayer array
i've tried it without trouble, i'm just saying that it should have been a simple test to try first
could it be possible that you are only loading one number?
i can't guess if you don't plot random data points.
no i printed onehot array and it showed me all numbers
i've tested that before running ai
and how many are u using? in total
10 numbers, all of them has 100 photos
1000 images total
Alr
Thx
it's not shuffled @proven inlet , maybe try that
you are passing first all 0s, then all 1s, ...
Yes
is that critical problem for ai ?
yeah, you are forcing the net to a local pocket probably
and it can't get out of it
mine are shuffled ;-)
but you update the weights as you go don't you
Yep
so try, you can create a random array of 1000 numbers and pick from there
and then labels[i], image[i]
or i can just shuffle the order of images and labels
thats better i think
as long as you shuffle them in the same order yes
Yep
otherwise they don't match
Yeah ima try it rn
it's literally unlearning 😭
permutation = np.random.permutation(len(images))
images = images[permutation]
labels = labels[permutation]
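A common extension of the snippet above is to draw a fresh permutation every epoch rather than shuffling once. The sketch below uses a stand-in dataset (not the real images) where each fake "image" stores its own label, so it is easy to confirm that the pairs stay matched after shuffling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the real dataset: 10 classes x 100 samples, sorted by label.
# Each fake "image" stores its own label so pairs can be checked after shuffling.
labels = np.repeat(np.arange(10), 100)
images = labels[:, None] * np.ones((1, 4))

for epoch in range(3):
    perm = rng.permutation(len(images))  # fresh order every epoch
    epoch_images, epoch_labels = images[perm], labels[perm]
    # ... the training loop would iterate over zip(epoch_images, epoch_labels) here ...
    assert np.all(epoch_images[:, 0] == epoch_labels)  # pairs still match

print(epoch_labels[:10])
```

Indexing both arrays with the same permutation is what keeps each image attached to its label.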
chollet likes amd apparently
can you try with no hidden layers, and lr=.2
and hidden_layers=[]
yup
been sayin
alright, gotta sleep
learning rate changed the game here i guess
Goodnight 🙂 thanks for helping
Yeah i forgot about that part
rl batch size how much?
depends on the task but 16-32 is usually a good starting range
how know if need more?
if your model struggles to converge
it's a hard variable to isolate especially in rl where there might not always be a clear metric for convergence
might be safer to start high like 64 and work your way down
im at 512
ideally it should be the minimum number of samples required to get a consistent estimation of the true gradient
if your samples are balanced well then that's probably super overkill, otherwise it's probably fine you just lose a bit of training speed from going so high
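The "consistent estimation of the true gradient" idea can be illustrated with a toy simulation: the spread of a mini-batch mean shrinks roughly with the square root of the batch size. The per-sample "gradients" here are synthetic numbers, an assumption for illustration rather than anything measured on an RL task.

```python
import random
import statistics

random.seed(0)

# Hypothetical per-sample gradients: true gradient 1.0 plus unit-variance noise
per_sample_grads = [random.gauss(1.0, 1.0) for _ in range(20_000)]

def batch_estimate_spread(batch_size, n_batches=200):
    """Std of the mini-batch mean gradient across many random batches."""
    means = [
        statistics.fmean(random.sample(per_sample_grads, batch_size))
        for _ in range(n_batches)
    ]
    return statistics.stdev(means)

spread = {b: batch_estimate_spread(b) for b in (16, 64, 512)}
for b, s in spread.items():
    print(f"batch {b:>3}: spread of gradient estimate ~ {s:.3f}")
```

Going from 64 to 512 buys less than a 3x reduction in noise for 8x the samples, which is the "lose a bit of training speed" trade-off mentioned above.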
thank you
Data science is a vast field.
I am confused about AI/ML/DS/DA.
About what to choose first.
there's a lot of terms that aren't mutually exclusive.
machine learning is pretty much a subset of AI.
"data science" is mostly a buzzword.
DS kinda means "using data techniques to solve problems". It's broad: it can involve ML, AI, etc., but doesn't have to. DA is analytics, the name speaks for itself: you do analytics to track business things (KPIs). DL is a subset of ML, which is a subset of AI
this seems a nice article https://medium.com/decisionforce/understanding-mathematics-behind-floating-point-precisions-24c7aac535e3
"Understanding Mathematics behind floating-point precisions"
wait...
results are right though...but..
yes..!article reads nicely though, ig it's just a typo
Recently Microsoft released a 1-bit LLM variant namely BitNet b1.58 which uses ternary {-1, 0, 1} for every single parameter. Surprisingly it matches the FP16 or BF16 precision transformer model.
I don't understand what this means
it seems just to be about the values that the weights can take
the paper is here: https://arxiv.org/pdf/2402.17764
you can see that on the first matrix
(it may be a lot more complex)
paper ends with:
Recent work like Groq5 has demonstrated promising results and great potential for building specific hardware (e.g., LPUs) for LLMs. Going one step further, we envision and call for actions to design new hardware and system specifically optimized for 1-bit LLMs, given the new computation paradigm enabled in BitNet [...
so i guess that's the underlying idea / goal
Interesting, so it's simplifying the values in the matrices for simpler calculations and that (in theory) gives the same results for less compute?
From my skimming that's what I got
exactly
this happens in quantisation (converting weights from FP to Int) as well, but up to a byte (int8 numbers.)
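For comparison, here is what FP-to-int8 quantisation can look like in its simplest symmetric "absmax" form. This is a generic sketch with made-up weights, not the scheme from the BitNet paper.

```python
def quantize_int8(weights):
    """Symmetric absmax quantisation of a list of floats to int8 values."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # fall back to 1.0 if all weights are zero
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.42, -1.30, 0.07, 0.91, -0.55]  # made-up FP weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"scale={scale:.5f}", f"max error={max_error:.5f}")
```

Each weight is stored as one byte plus a shared scale, and the rounding error per weight is at most half the scale.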
very interesting
idk why it's called bit though, probably just being ignorant
8 bits is one byte right?
I haven't
this is a random link but...https://techcrunch.com/2024/07/30/friend-is-an-ai-companion-backed-by-founders-of-solana-perplexity-and-zfellows/
if they would be able to run a 1 bit llm in small devices, it'd be a big money thing
(that device must use wifi)
hmm I don't know if AI is good enough yet for a device like that
maybe, i do enjoy talking to llms a lot. but never buy trendy tech
just mentioned cuz of the 1 bit llm
oh yeah, if LLMs get much smaller that would be very good for this market
do you find dull talking to llms?
not really, I'm just kind of too lazy to open up a tab in a browser and use an LLM lol
and I don't have a real use case for it besides having fun
I like the idea of something like AI dungeon/novel AI more than a chatbot, they basically let you write a story together with an LLM. that can be more fun for me
nice, haven't heard ab it
that's what got me interested in AI in the first place, although I didn't take it seriously until recently
AI dungeon used to use GPT2 in the beginning because that was the most advanced model
just realising log2(3) is 1.58, the number in the llm
not sure what it means
basically having -1,1,0 reduces matrix muls to additions
there are 3 options for the numbers, -1, 0, and 1, maybe it has something to do with that
yeah, the measure of the entropy/information of each bit i think, but still idk
so for the alphabet it's log2(26), more possibilities, more entropy (i mean, not for the alphabet, cause it's not random in english text.)
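Both points can be checked in a few lines: three equally likely values carry log2(3) ≈ 1.58 bits each, and a matrix with entries in {-1, 0, 1} can be applied to a vector with additions and subtractions only. The matrix and vector are toy values chosen for illustration.

```python
import math

# Three equally likely values carry log2(3) bits each
bits_per_weight = math.log2(3)
print(f"{bits_per_weight:.2f} bits per ternary weight")

def ternary_matvec(W, x):
    """Multiply a {-1, 0, 1} matrix by a vector using only adds and subtracts."""
    out = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0 contributes nothing, so the input is skipped
        out.append(acc)
    return out

W = [[1, 0, -1], [-1, 1, 1]]
x = [0.5, 2.0, -1.5]
y = ternary_matvec(W, x)
reference = [sum(w * xi for w, xi in zip(row, x)) for row in W]
print(y, reference)
```

The reference line does the same product with ordinary multiplications, so the two results should agree.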
hello has anyone worked with TABLEAU DESKTOP before for data analysis ?
Hi there, as a rule of thumb it's better to ask your question directly instead of looking for someone that may answer your question https://dontasktoask.com/
Many of us quickly look at the questions in the chat and prefer quickly answering something concrete
hey, I need some help, does anyone know how can I count the number of screws if they're overlapping or touching each other in this image, I tried dilation but didn't work
i'd guess SAM could be useful there
maybe something like https://sites.google.com/view/f-vlm/home ?
might be overkill, not sure
I've been asked to do this by classical image segmentation methods and not use any AI💀💀
that's why I'm scratching my head over this problem
Do you need the exact number of screws or is there an acceptable error margin?
95% accuracy it says
are the pics always at the same distance and using same sized screws?
without AI? really?
no the dataset contains 3 types of screws/nuts in total. these are the other 2
ikr, sucks
then opencv will be best ( wait,.. is there any other options?)
A bit
i would try opening instead of dilation
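To see why opening (erosion followed by dilation) helps where dilation alone does not, here is a toy sketch in plain NumPy: two blobs joined by a thin bridge count as one component before opening and two after. The mask is synthetic and the 3x3 structuring element is an assumption; with OpenCV, `cv2.morphologyEx` with `cv2.MORPH_OPEN` does the same operation.

```python
import numpy as np

def erode(img):
    """3x3 binary erosion: a pixel survives only if its whole neighbourhood is set."""
    p = np.pad(img, 1)
    out = np.ones_like(img)
    for dy in range(3):
        for dx in range(3):
            out &= p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def dilate(img):
    """3x3 binary dilation: a pixel is set if any neighbour is set."""
    p = np.pad(img, 1)
    out = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def count_blobs(img):
    """Count 4-connected components with an iterative flood fill."""
    seen = np.zeros_like(img, dtype=bool)
    count = 0
    for y, x in zip(*np.nonzero(img)):
        if seen[y, x]:
            continue
        count += 1
        stack = [(y, x)]
        while stack:
            cy, cx = stack.pop()
            if not (0 <= cy < img.shape[0] and 0 <= cx < img.shape[1]):
                continue
            if seen[cy, cx] or not img[cy, cx]:
                continue
            seen[cy, cx] = True
            stack += [(cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)]
    return count

# Toy mask: two 5x5 squares joined by a one-pixel-wide bridge
mask = np.zeros((9, 15), dtype=bool)
mask[2:7, 1:6] = True
mask[2:7, 9:14] = True
mask[4, 6:9] = True

opened = dilate(erode(mask))
print(count_blobs(mask), count_blobs(opened))
```

This only separates objects whose contact region is thinner than the structuring element; heavily overlapping screws would likely need something more, such as a distance transform followed by watershed.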
Not every task requires AI
If you're going to only use classical computer vision (no deep nets) you'll need an entire pipeline of steps
also your class looks super cool, idk what you mean with it sucks
if you change the original image and maybe apply some filter , with identifying contour for each pixel , it is possible
The ones that are facing up could probably be found with a hough transform (circle detection)
The ones that are flat ... maybe with template matching
I'd try it iteratively, first use basic template matching to see how many "hits" you have
the nuts are super simple, that's something you can 100 % do with template matching and post processing
i know of people doing it with ROI algorithms
it can not be that hard..(not saying it's easy either.)
Does that handle rotation?
yes
Or rather, there's extensions of the basic template matching that can handle rotation and scale
Like SIFT
gotta check this out
wow, thanks @past meteor
woah this is cool, I should look into cv
I had only been trying to do this just by OTSU and contouring. Definitely learning new things here
Anyway, you'll have to keep googling etc. that's what I'd do. A lot of googling and trial and error, but you should be on the right path now 🙂
okay, I really got nerdsniped by this one
Maybe just template matching but rotating your template and changing the scale can help
So basically, doing half of SIFTs algo
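The "rotate the template and keep the best hit" idea can be sketched with a brute-force sum-of-squared-differences search. This toy version only tries 90-degree steps on a tiny synthetic image; arbitrary angles and scales would need interpolation, and on real images `cv2.matchTemplate` would replace the inner loops.

```python
import numpy as np

def match_template_ssd(image, template):
    """Slide the template over the image; return (best_ssd, (row, col))."""
    th, tw = template.shape
    best = (float("inf"), (0, 0))
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            ssd = float(np.sum((image[r:r + th, c:c + tw] - template) ** 2))
            if ssd < best[0]:
                best = (ssd, (r, c))
    return best

def match_with_rotations(image, template):
    """Try the template at 0, 90, 180 and 270 degrees; keep the best hit."""
    results = []
    for quarter_turns in range(4):
        rotated = np.rot90(template, quarter_turns)
        ssd, pos = match_template_ssd(image, rotated)
        results.append((ssd, pos, quarter_turns * 90))
    return min(results)

# Toy example: an L-shaped template hidden in the image, rotated by 90 degrees
template = np.array([[1, 0, 0],
                     [1, 0, 0],
                     [1, 1, 1]], dtype=float)
image = np.zeros((8, 8))
image[2:5, 3:6] = np.rot90(template, 1)

ssd, position, angle = match_with_rotations(image, template)
print(ssd, position, angle)
```

For counting, every location below an SSD threshold would be kept (with non-maximum suppression) instead of only the single best one.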
at that point I might use sift as well, I'll sit to code once again after my dinner let's see how this pans out
ignore my rants, I do find it interesting actually
Altho some of those nails look like they're facing up. 😢
Yup, you'd need 2 templates
Or template + hough transform like I mentioned
Hard to say what'll work without trying it 😄
Don't dm people, thx
hi friends, I ran this for several hours https://github.com/pytorch/examples/blob/main/reinforcement_learning/reinforce.py and i am wondering why it does not look as expected. it's made by pytorch so i would expect it to converge on such a simple model but it clearly has not converged. The red is the running average and the blue are individual episode scores
it sounds like it's been broken for a while?
that example hasn't been updated since 2022 too, and there is an issue claiming it does not work at all https://github.com/pytorch/examples/issues/1213
- edit; could be that this guy is trying to run an older version of gym or maybe gym vs Gymnasium
just guessing though, I haven't tried running it myself though
do you know of any working RL examples? i have searched for hours and haven't found one
it runs for me just the statistics it uses dont make sense
I mean, https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html should work, iirc they actually run the Notebooks and the time/output displayed should be the real time/output from their CI/CD
look at the graph on the bottom of that
its not convergent
...........oh
do you guys know how to get to formula (1) ? from the previous one
it's from this paper https://arxiv.org/abs/1412.0233
We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. These assumptions enable us to explain t...
it's explained in the text just underneath
you'll have to take out a pen and paper and try out their construction with 2 layers to see it
it has to be derivable algebraically i assume
i did test it, but that's not my doubt
what's the question?
how to get from formula on top, to the one in the bottom, algebraically
or if it's got a name or smth
for example, sigma isn't there
matrix multiplication can be defined as a double sum over the individual elements of the vectors and matrices involved
that's about it
that's not just mmul, you've got an activation
they replace the relus with a binary matrix A
that seems an approximation
why?
sigma doesn't just output 1 or 0
it does if you use a relu
oh, it
if you take an input x, a relu outputs either x or 0
it's not a sigmoid
the paper says everything it's using
i still don't think the product of weights is obvious
which is why i asked if there was some derivation
they explain it explicitly in words right under the equation
we are talking ab different things
you don't just build the formula reading the paragraph, do you?
yes you do
ok, sorry im not like you, so i asked to try to understand
using the paragraph and the definition of matrix multiplication, plus the info they gave above (using only dense layers and relu)
that's why i suggested you do it explicitly for 2 layers on paper
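The two-layer construction can also be checked numerically. The sketch below assumes the target formula is the usual sum-over-paths form (input value times a 0/1 path indicator times the product of weights along the path); the paper's exact notation, normalisation constant and output layer may differ, and the sizes and weights here are arbitrary.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4
x = rng.standard_normal(n_in)
W1 = rng.standard_normal((n_in, n_hidden))   # input -> hidden
w2 = rng.standard_normal(n_hidden)           # hidden -> single output

# Ordinary forward pass with one ReLU hidden layer
hidden_pre = W1.T @ x
y_forward = w2 @ np.maximum(hidden_pre, 0.0)

# Same value as a sum over input-to-output paths: a path (i, j) goes from
# input i through hidden unit j, and is "active" (A = 1) if unit j's ReLU fired
active = (hidden_pre > 0).astype(float)
y_paths = sum(
    x[i] * active[j] * W1[i, j] * w2[j]
    for i, j in itertools.product(range(n_in), range(n_hidden))
)

print(y_forward, y_paths)
```

The step that makes this work is relu(z) = a·z with a = 1 if z > 0 else 0, so each ReLU becomes a data-dependent 0/1 factor, and expanding the matrix products as sums leaves one product of weights per path.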
i already did that
did you really?
yes, why?
if you show pics of your work on paper, others will be able to help more easily
since you'll have more concise questions about the parts that seem to be off
ill ask elsewhere, thanks, you must feel good now.
guys... has anyone here worked with sentiment analysis pipeline of transformers library? for some reason my longer inputs are taking up to 10x less time than my one line inputs
i am using my gpu to process... atleast that is what i have specified in my pipeline method
I could use some help. I'm trying to learn neural networks; unfortunately I need to hear somebody saying it for me to truly learn, because I always lose my place when reading. Sorry
good thing youtube videos exist
I tried and I still don't understand it I'm sorry
Hi! I'm trying to build a recommendation system as an ML noob, is there some kind of good resource that could help?
I believe the process is the same for all types, just training data could be different
I'm given 1hr30mins to solve this. For an ML noob, is this feasible?
Eh kind of
maybe not for a complete noob tho
A basic recommendation system, you want some vectors which represent user interests
and the easiest method is to mean pool those vectors of the users lists of interests
to get an average vector that hopefully encapsulates the context of all rolled up vectors
and then KNN search over some dataset of all the vectors to find similar things
How do I convert to vectors.
I understand I need to convert the input datasets to pandas dataframes.
Ehhhhhhh it kind of depends
I also understand I could use the surprise package for the algorithm and model evaluation (I could be wrong)
easiest honestly is spit out a bunch of keywords for what ever content
and then feed it into some LLM encoder to generate the embeddings that can be used for KNN
Have a look at the sentence transformers library for that
hey , i was having some doubt regarding tensor conversion from text in pytorch and nlp
if my tokens are words, how should i convert them to tensor?
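One common approach when the tokens are words is to build a vocabulary that maps each word to an integer id, then pad the id sequences to a common length. The sketch below is plain Python with a made-up corpus; the special-token names are conventions assumed here, not anything required by PyTorch.

```python
def build_vocab(tokenized_sentences, specials=("<pad>", "<unk>")):
    """Map each distinct word to an integer id, reserving ids for special tokens."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for sentence in tokenized_sentences:
        for word in sentence:
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(sentence, vocab, max_len):
    """Turn words into ids, mapping unknown words to <unk> and padding to max_len."""
    ids = [vocab.get(word, vocab["<unk>"]) for word in sentence[:max_len]]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))

corpus = [["the", "cat", "sat"], ["the", "dog", "barked", "loudly"]]
vocab = build_vocab(corpus)
batch = [encode(s, vocab, max_len=4) for s in corpus]
print(batch)
# With PyTorch installed, torch.tensor(batch) would give an integer tensor of shape (2, 4)
# that can be fed to torch.nn.Embedding(len(vocab), embedding_dim)
```

The embedding layer then turns each id into a learned vector, which is what the rest of the model actually consumes.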
Pretty lost. How do I use the embeddings with KNNBasic class from surprise package for example?
Also by transformers lib, do you mean hugging face's?
I think what I need to do is content-based filtering not collaborative filtering so the KNN class won't be needed I guess
@buoyant vine let me know what you think pls
I believe I could get the cosine similarity after getting the embeddings with sentence transformers
Yes
I am about to go to sleep, but what you want
Is say a user wants some recommendations relating to star wars, you would encode that text/keywords into an embedding with sentence transformers
And then do knn search for close/similar results in your database
Where the keywords / records in your database have also been encoded with the same sentence transformer model
If you want recommendation by something like, "given a user's watch history recommend some other shows the user might like" then you take the average of all the embeddings the user has watched and use that averaged embedding to do the knn search
Or if you want "user watched X video, recommend them some other similar videos" you'd take the embedding of video X (be that generated from keywords or what not) and use that for the knn
Basically your whole goal is to get a index of various embeddings for the dataset of content you want to be able to select and return
Then it is just a case of generating query embeddings to suit what sort of application you want
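The whole flow (index of item embeddings, mean-pooled history as the query, top-K nearest items) fits in a few lines with brute-force cosine similarity. The 3-dimensional vectors and catalog names below are invented for illustration; real embeddings would come from an encoder such as a sentence-transformers model, and a library index would replace the brute-force search on larger datasets.

```python
import numpy as np

def top_k_similar(query, item_vectors, k=3):
    """Brute-force cosine-similarity search; returns indices of the k closest items."""
    items = item_vectors / np.linalg.norm(item_vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = items @ q
    return np.argsort(-scores)[:k], scores

# Hypothetical 3-d embeddings; real ones would come from an encoder model
catalog = ["space opera", "sci-fi series", "cooking show", "baking contest", "space docu"]
vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.3, 0.1],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
    [0.7, 0.6, 0.0],
])

# "Watch history" recommendation: mean-pool the embeddings of watched items
watched = [0, 1]
profile = vectors[watched].mean(axis=0)
ranked, scores = top_k_similar(profile, vectors, k=len(catalog))
recommended = [catalog[i] for i in ranked if i not in watched][:2]
print(recommended)
```

Items already in the history are filtered out after ranking, otherwise they would always come back as their own nearest neighbours.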
Make some sense?
It's basically an ecommerce platform
To provide products for users based on purchase history and whatnot
And browser activity
Doesn't knn search compare with preferences of other users??
No?
It is completely arbitrary
All it does is calculate the distance between two points in a graph effectively
A vector is like a set of coordinates or a postcode
When you do KNN search, you effectively are asking "what are the closest other data points available to me from this point?"
Cos the KNN class from surprise lib works with some kind of Reader format
I would suggest Sentence transformers to generate the embeddings, and PyNNDescent as the index
Index how?
You give it a bunch of embeddings, generated by what ever content you have
And it will make a index that can be searched, i.e. you give it a query vector/embedding, it gives you the top K back
Quickly* rather than brute forcing checking every point
I wouldn't worry about the algo ATM.
It doesn't really matter to your use case
Just that it is faster than brute force
And provides a convenient way of going "hey here are all my embeddings, give me the closest points to X"
Start with something basic, i.e. don't worry about taking in the user history or what ever
so what I'm imagining is having to pass a user with necessary context to my function, and it returns a list of products based on my products dataset
Just take some input text to begin with, encode it then do the knn search
Once you have that it should start to make more sense
And then you can start looking at doing it off of the user history
Do I need a db for this?
Not unless your dataset is huge, and I suspect that is not really an issue here
Yeah
I'm wondering if a possible result can return multiple similarities
Instead of just one
That is what the K is
so say with nndescent you can tell it "take the top 10" etc...
That's dope
Thanks. I'll run with this
Would be my first actual intro to ML if I'm successful with the engine heh
Hello
I'm trying to make a text-speech AI and I don't know exactly where to start, any tips?
anyone? please?
well pick up python if you don't know it yet, and start exploring the hugging face transformers library and look into its pipelines. if the pipelines are not suitable for your application you can fine-tune one into your own nlp model using the pytorch library
Okay thank you so much
tried to use sift with flann but it still isn't that accurate
Hey - quick question - can you/what is the sanest way to load safetensors into keras ?
less ?
like the longer string is faster ?
wut O.O
Yeah
Yes one was a 5 letter sentence the other was a whole paragraph, I will send a screenshot once I get back from classes
should I open up a help channel ?
hi, im in high school and have a free choice data analysis project (using powerbi)
any interesting dataset/analysis ideas?
can I use feature matching algorithms like sift/surf to find matches that are close enough but not perfect matches?
reference
Code:
!pip install -q transformers==4.31.0
from transformers import DistilBertTokenizerFast
from transformers import TFDistilBertForSequenceClassification
sentiment_model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased',num_labels=2)
Error:
tokenizer_config.json: 0%| | 0.00/48.0 [00:00<?, ?B/s]
vocab.txt: 0%| | 0.00/232k [00:00<?, ?B/s]
tokenizer.json: 0%| | 0.00/466k [00:00<?, ?B/s]
config.json: 0%| | 0.00/483 [00:00<?, ?B/s]
model_training/kinesiologie_tape_new/sentiment/tf_model.h5 exists on GCP = False
Sentiment analysis stage (1/2) - est. time is 2 minutes
model.safetensors: 0%| | 0.00/268M [00:00<?, ?B/s]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-31-7a684e7281c2> in <cell line: 2>()
30 ))
31
---> 32 sentiment_model = TFDistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased',num_labels=2)
33 optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5, epsilon=1e-08)
34 losss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
2 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_tf_utils.py in build(self, input_shape)
1129 def build(self, input_shape=None):
1130 call_context = get_call_context_function()
-> 1131 if self.built or call_context().in_call:
1132 self.built = True
1133 else:
TypeError: 'NoneType' object is not callable
Any help?
I followed their official docs:
https://huggingface.co/docs/transformers/model_doc/distilbert
Hello I am trying to make a small AI model which is for my bot used for moderation,any idea where to start?
what are your intended goals?
does it need to process images/videos or just for text?
if text only look into nltk for sentiment analysis and keywords.
Or you can use langchain
Literally everyone is just saying TOPS but not the datatype