#data-science-and-ml
can someone please assist me in finding the correct area for area detection
Just tell the details and people will help you when they see it.
will a video help?
Sure why not.
Is that yours?
Yes
How many lines of code?
It's around 200-300 lines
nice man
Failed to load cause lots of lines ig
Can you help with the issue
Can you try in desktop or external browser
why it shows 2300 lines then?
It includes the output lines
Oh fair!
Code is only roughly around 200 lines and pretty much understandable
very big data though ngl
man
you are web scraping right?
i have a doubt
that main func
shows every single data you scraped
can you make it not show it?
the notebook would look clean and better if you did that
Sure
hello all , i have a query
I have 3 datasets that stores hotels info [name,address,city,state etc]-
A[c1,c2,c3] ,B[c3,c4,c5] ,C[c5,c6,c7]
[c1,c2,c3 etc are columns]
Problem Statement
1] Find the hotels that are common in at least 2 of these datasets.
2] Find hotels that are common in all datasets
my logic was -
1st we do inner join between A&B on column c3 , we get table AB[c1,c2,c3,c4,c5]
then inner join B&C on column c5 , we get BC[c3,c4,c5,c6,c7]
then inner join C&A on column left_on = c5 & right_on = c1 , we get CA[c5/c1,c6,c7,c2,c3]
now i decided to do outer join on all 3 pairs
this is where i am confused , as to which columns i should choose to join these pairs, so that it will include hotels that are common in all tables.
the final data should be in this format [as image][yellow blocks means the hotel is common in those 2 tables]
Hey guys I am looking for ideas which I can use for my final year project.
Would be great if you could send some ideas
Any idea which can help in day-to-day life
I'd union the datasets, then groupby -> count to find how many times they appear. If the values can occur multiple times within each dataset, I'd distinct them first or add a descriptor column to identify where they came from.
Or, to generate that table, you could groupby the hotel name, and use a count to determine if it's in A, then another count for B, then another count for C
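A minimal sketch of the union-then-count idea, in case it helps. It assumes the three tables can be reduced to one shared key column; the column name `name` and the sample hotels are made up for illustration, and real data would likely need the key normalised first (case, whitespace, address variants).

```python
import pandas as pd

# Hypothetical sample data; "name" stands in for whatever key identifies a hotel.
a = pd.DataFrame({"name": ["Ritz", "Plaza", "Ritz"]})
b = pd.DataFrame({"name": ["Plaza", "Savoy"]})
c = pd.DataFrame({"name": ["Plaza", "Ritz", "Astoria"]})

# Tag each row with its source, stack everything, and drop repeats within a dataset.
stacked = pd.concat(
    [df.assign(source=label) for label, df in {"A": a, "B": b, "C": c}.items()],
    ignore_index=True,
).drop_duplicates(["name", "source"])

# One row per hotel, one Boolean column per dataset.
presence = pd.crosstab(stacked["name"], stacked["source"]).astype(bool)
n_sources = presence.sum(axis=1)

in_at_least_two = sorted(n_sources[n_sources >= 2].index)
in_all_three = sorted(n_sources[n_sources == presence.shape[1]].index)

print(presence)
print(in_at_least_two, in_all_three)
```

The `presence` table is roughly the "yellow blocks" layout from the question: each row is a hotel and each column says whether it appears in that dataset.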
Can u dm links for ai communities
How do I fix the fbgemm.dll error
which os? which instructions are you following? (link)
hmmm probably this https://github.com/pytorch/pytorch/issues/131662
!pip uninstall torch !pip install torch --index-url https://download.pytorch.org/whl/cu121 Installing collected packages: torch Successfully installed torch-2.4.0+cu121 import torch {"name...
hold on i'll try this
is it for CPU?
I would recommend against manually installing the dll to system32 btw - just avoid 2.4.0, either install an older version or a more recent version
alright thank you
looks like 2.4.1 is scheduled for September 4?
maybe just use 2.3 until then
I think the torch version was the issue because I ran my script on another device and it worked fine
how do social media apps like instagram and tiktok recognize content in images/videos..? if someone likes a lot of posts with plants they are going to get more plants and maybe even pictures about outdoor stuff and DIY videos?
how can i train a model like this on a small scale just for my portfolio?
how to get proxies? and give me best site where to buy http/https proxies
category tags much of the time
https://www.kaggle.com/code/rounakbanik/movie-recommender-systems here's something along those lines (dataset and all)
or anyone got proxy scraper tool?
!rule 5
5. Do not provide or request help on projects that may violate terms of service, or that may be deemed inappropriate, malicious, or illegal.
hello guys, i made a neural network and a tokenizer and all of that, however i need training data. the goal of the project is to make an ai that could solve math problems, ones that are described in paragraphs or just a plain equation
day 5 kaggle report: busy today no progress
i started coding when i was 8 but well im still a highschool student 9th grade
ohhh well, actually that is a good idea, since neural networks aren't good at giving a straight answer when it comes to numbers
ok thanks i will try to do that
ok and again thank you a lot
if you want inspiration, AlphaGeometry took a similar approach and achieved very good results, maybe check that out ('creative' AI + 'logical' system that solves stuff)
though that is backed by google, so they got a lotta money
alr thx man
thanks so much will look into this
hey just one question on this the movies already have data such as taglines, genre, ect. but i have an image, hashtags, and the poster
i need the model to at least look into the image and description to kind of put that post into a "genre" that a user might enjoy
I have deployed my pickled model on Hugging Face. How can I set it up to always pull the 'latest' version in my application? I want to ensure that my app automatically uses the most recent model version, even if it changes in the future, without needing to redeploy the app each time. Thanks 😄
there is a _hub library for that I guess
is it possible within web ui
what do you mean by that? are you considering on web stuff?
like you said "app" is that "webapp"
I am trying to overfit my bounding box regression problem on only 10 datapoints (one iteration is a full epoch)
I notice that my coordinate and no_object_loss (anchors with no object) is decreasing yet my object_loss (active anchors) is increasing, is this normal?
Also I don't have any classes so there is no class loss
nvm, I figured out why: my no_obj_loss was too high, so my model was just predicting zero for all the presence scores, which in turn makes the object_loss (the active box loss) go up
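A toy illustration of that effect, not the actual loss from the project. It assumes a squared-error objectness loss, 100 anchors of which 2 contain an object, and a hypothetical weight `lambda_noobj` on the empty-anchor term; all of those are made up for the sketch.

```python
import numpy as np

# Hypothetical setup: 100 anchors, only 2 of which contain an object.
target = np.zeros(100)
target[:2] = 1.0
obj_mask = target == 1.0


def total_loss(pred, lambda_noobj):
    """Squared-error objectness loss with a separate weight for empty anchors."""
    obj_loss = np.sum((pred[obj_mask] - 1.0) ** 2)
    noobj_loss = np.sum(pred[~obj_mask] ** 2)
    return obj_loss + lambda_noobj * noobj_loss


all_zero = np.zeros(100)       # "nothing is ever present"
uncertain = np.full(100, 0.5)  # hedges on every anchor

for lam in (1.0, 0.01):
    print(lam, total_loss(all_zero, lam), total_loss(uncertain, lam))
```

With the empty-anchor term at full weight, the all-zero prediction is the cheaper of the two, even though it gets every active anchor wrong; with the term down-weighted, that is no longer the case.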
My 3D grayscale images are (32, 256, 256) (D, H, W) - skimage but monai reads them as (256, 256, 32) is this correct?
I think order depends on library generally, here not sure
for example like pytorch channel-first, tensorflow channel-last
but monai is built upon pytorch and gives this
so shouldnt it also be (32, 256, 256)?
I wanted to filter data according to their genre
filtered_genre = df1[df1['genre'].str.contains('Comedy' and 'Horror')]
filtered_genre
When I generate it, it gives me all data with sometimes only Comedy showing up and a bunch of other genres.
I only want those when the two of them appear together. Is there a workaround here?
hi
>>> 'Comedy' and 'Horror'
'Horror'
`and` doesn't work the way you intend here; the expression evaluates to "Horror" before pandas even sees it
a remedy is to perform 2 .contains searches and "and" them together (with the infix `&`), i.e.,
```py
df1.loc[df1["genre"].str.contains("Comedy") & df1["genre"].str.contains("Horror")]
```
(for a one-pass solution, you can craft an unnecessarily complex regex but I doubt it will be better (in many metrics) than this two-pass solution)
important note: try `and` here instead of `&` and see it fail; that will then be a pandas-specific `and` "issue"
I didn't know this
in Python, `and` is a binary operator that yields its first argument if it is falsy; otherwise it yields the second argument, whatever it is
similarly, `or` is a binary operator that yields its first argument if it is truthy, otherwise the second argument
these have the so-called "short-circuiting" behaviour -- if the first argument can be returned according to their criteria, they don't even evaluate the second one
!E
print(0 and undefined_name_here_but_nobody_cares(here, too))
:white_check_mark: Your 3.12 eval job has completed with return code 0.
0
Can you elaborate on this pandas-specific and issue? I'm not at my computer so I can't try right now
so the `and` operator will query its first argument's truthiness (so it can employ its short-circuiting behaviour)
but the truthiness of pandas objects (Series and DataFrames) is deemed to be "ambiguous"
the "normal" Python objects such as 0, "", [] are all falsy, and -55, "ok", (5, "f") are all truthy
but what about, e.g., pd.Series([ False ])?
a) it is truthy because it has 1 element
b) it is falsy because all it has is falsy element(s)
hence, ambiguity; so bool(...) on them (which `and` and `or` implicitly query) will error
now the other side of the problem: we want to combine two (or more) Boolean arrays (so-called "mask"s) to achieve the disjunction/conjunction we want
like the example above had:
contains("comedy") and contains("horror")
now the individual parts are all okay -- they are Boolean arrays of length N, same as the column
contains("comedy") => pd.Series([True, False, False, ...])
contains("horror") => pd.Series([False, True, False, ...])
now we want both comedy and horror
so use `and`? no; as mentioned, it's "forbidden"
so the next best thing is the infix `&` operator, which in pure Python is used for the bitwise-and operation (and also set intersection)
and unlike `and`, the `&` operator is overridable in a custom class via the `__and__` dunder method, so all is fine
Wow, cool
Thanks for the explanation!
sure thing
example to show what the rambling has been about:
In [55]: a = pd.Series([True, False, False])
In [56]: b = pd.Series([True, True, False])
In [57]: a and b
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-57-61df3bd186ad> in <module>
----> 1 a and b
~\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1536 def __nonzero__(self):
1537 raise ValueError(
-> 1538 f"The truth value of a {type(self).__name__} is ambiguous. "
1539 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
1540 )
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
In [58]: a & b
Out[58]:
0 True
1 False
2 False
dtype: bool
Yeah that's what I would expect based on the explanation! Nice
You are amazing. This fixed the issue for me! 🫶 I didn't know the difference between "and" and "&" was that vast. You helped explaining that to us, too. I had no idea it only recognized the first argument if it came out as true and the rest didn't matter.
EDIT: I'm a beginner and I mess up my fundamentals a lot 
yeah don't worry, in numpy and pandas world this "and" vs "&" is a source of confusion at the beginning, but after you see the difference, it will be okay from then on
similarly, the "and" issue in pure Python is also a source of unintended behaviour: Python is in many ways natural to write (people say it's almost pseudocode), so people rightfully expect "and" to work that way too, but in reality Python is not that natural
it's so common actually that this server has a tag (or whatever it's called) explaining this:
!or
When checking if something is equal to one thing or another, you might think that this is possible:
# Incorrect...
if favorite_fruit == 'grapefruit' or 'lemon':
    print("That's a weird favorite fruit to have.")
While this makes sense in English, it may not behave the way you would expect. In Python, you should have complete instructions on both sides of the logical operator.
So, if you want to check if something is equal to one thing or another, there are two common ways:
# Like this...
if favorite_fruit == 'grapefruit' or favorite_fruit == 'lemon':
    print("That's a weird favorite fruit to have.")
# ...or like this.
if favorite_fruit in ('grapefruit', 'lemon'):
    print("That's a weird favorite fruit to have.")
(it's "and"s cousin "or" but the underlying misapprehension of the workings is the same)
I have a large binary file containing a flat array of records. I now want to parse them into a dataframe. Is there a better way of doing this?
import struct
import pandas as pd
record_struct = struct.Struct("<....")
def read_bin_file(filename):
    with open(filename, "rb") as f:
        # read one fixed-size record at a time; a short trailing chunk is skipped
        while chunk := f.read(record_struct.size):
            if len(chunk) == record_struct.size:
                yield record_struct.unpack(chunk)
columns = [...]
df = pd.DataFrame(read_bin_file(filename), columns=columns)
The struct has 41 fields, and a total of 385 bytes.
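One option that may be worth trying is a NumPy structured dtype, which can decode all records in one call instead of unpacking them one by one in Python. This is only a sketch: the 3-field layout, the field names and the helper `records_to_frame` are hypothetical stand-ins, since the real 41-field format isn't shown.

```python
import struct

import numpy as np
import pandas as pd

# Hypothetical 3-field record layout standing in for the real 41-field struct.
record_struct = struct.Struct("<idH")
record_dtype = np.dtype([("id", "<i4"), ("value", "<f8"), ("flags", "<u2")])
assert record_dtype.itemsize == record_struct.size  # both are packed, no padding


def records_to_frame(raw: bytes) -> pd.DataFrame:
    """Decode every complete record in `raw`; a trailing partial record is ignored."""
    n_records = len(raw) // record_dtype.itemsize
    arr = np.frombuffer(raw, dtype=record_dtype, count=n_records)
    return pd.DataFrame(arr)


# Two complete records followed by a stray byte.
raw = record_struct.pack(1, 0.5, 7) + record_struct.pack(2, 1.5, 9) + b"\x00"
df = records_to_frame(raw)
print(df)
```

For a file on disk, `np.fromfile(filename, dtype=record_dtype)` reads straight into the same kind of array. The dtype has to mirror the struct format field by field, including byte order and any padding.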
Nahita already gave a great explanation for why this doesn't work
I'll add that the .str. methods usually accept regexes as well, so in this case you could also do
df1[df1['genre'].str.contains('Comedy|Horror')]
is mxnet still used? I think pytorch is used more, right?
I see one book based on mxnet, dive into dl, but it also covers pytorch
I've never heard of mxnet
all my coworkers use pytorch, but some are starting to use JAX.
from google, it seems the last release of mxnet was 2 years ago?
and pytorch is popular in the industry I hear
so is mxnet just for learning purposes?
I don't think it's for "learning purposes". I think it's just a platform that failed to catch on.
I mean, why not just learn directly with pytorch?
keras was designed as a way of teaching neural networks. you should either pick a platform that was designed for learning, or one that's used in "real situations".
and I've never had a single coworker or university colleague produce tensorflow code (unless it was keras).
honestly idk wth is going on with keras
like it merged with tf, but then it became its own thing and now supports jax tf and pytorch?
but that's ORing yet they need AND, and as mentioned, the regex gets unnecessarily complex when you need ANDing behaviour (with all the lookarounds and stuff)
oh whoops, right 😅
well, it'd then look something like (?=.*Comedy)(?=.*Horror) ig
actually that'd not account for stuff with Horror in front... yeah just do the 2 .contains, is prob better
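A quick check of both approaches on made-up genre strings, assuming the default regex mode of `str.contains`. On this sample the lookahead pattern seems to handle either order, since both lookaheads are evaluated from the same position, but the two-mask version is still the easier one to read.

```python
import pandas as pd

# Made-up genre strings covering both orders and the single-genre cases.
genre = pd.Series(
    ["Comedy, Horror", "Horror, Comedy", "Comedy", "Horror, Drama", "Drama"]
)

# Two passes: one Boolean mask per term, combined element-wise with &.
two_pass = genre.str.contains("Comedy") & genre.str.contains("Horror")

# One pass: two lookaheads, both evaluated from the same starting position.
one_pass = genre.str.contains(r"(?=.*Comedy)(?=.*Horror)")

print(two_pass.tolist())
print(one_pass.tolist())
```

Both masks select only the first two rows here, i.e. the ones that mention both genres.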
Mxnet is Amazon's preferred framework. They make a lot of time series stuff so if you're going in deep there you may need mxnet
As for the overdone Pytorch vs Tensorflow discussion. Just use Torch because that's where all the work is done nowadays
TF remains better in specific situations (TF lite comes to mind) but you can use ONNX with torch
Plus how likely is it that you have those use cases?
Not really.
I'm doing timeseries stuff, and kinda deep-ish
I purposely avoid mxnet like the plague
I use pytorch
Same and same but you can't deny that when Amazon releases sota stuff it's in mxnet
If you're willing to port it to Torch or wait until someone else does sure
Can you give examples?
What ts sota that is currently on mxnet?
Like, maybe that was true 5 years ago. But on that list, the latest mxnet-only stuff is from 2021.
After finishing basics, what should I learn next?
(I want to work with ML, I will like it if someone replies with a complete roadmap, cuz I’m lost)
C:\Users\muham\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default.
This error shows up when I try to load this model
https://huggingface.co/Charangan/MedBERT/tree/main
This is my roadmap.
I linked to some roadmap at the bottom as well
https://github.com/aprbw/ArianDLPrimer
that is a warning, not an error
You can try using whichever version they used in the original paper if it is documented, or just ignore it if it works
Honestly, I use Torch as well. I was just giving a domain mxnet might be handy. It's very niche either way
I read the docs and it's similar enough to torch/TF
They are all similar. The bigger problem is just duplicating code because it's not just the models, it's also the data access you need to duplicate. I want just 1 framework per project and if you pick one it's going to be torch 9/10
isn't Mxnet completely dead tho
It always surprised me that AWS supports MXNet despite it being dead
yet most of their inf systems dont support onnx
Can someone share me the code for the hand detector thingy
you'll have to be more specific
https://github.com/topics/computer-vision has a bunch of repositories that might be relevant, e.g. https://github.com/CMU-Perceptual-Computing-Lab/openpose has some hand pose estimation
Sounds like there's a bunch of legacy code no one wants to refactor
day 6 kaggle report: Learned deeper about convolutions and their math
a good video i found
How I Learned This: https://nnfs.io/ (by the awesome @sentdex )
That's a good video I watched it
Could someone write me a Google Colab notebook that upscales both images and videos using ESRGAN and RealESRGAN, in which I could load my media and model? Kindly respond.
This server isn't a place to request whole programs be written for you.
It's just a Google Colab, that's all
Not a whole 10gb software or something
yes, a Colab notebook is a program.
Refer me to a person
Who has full command over python and ESRGAN
We don't have referrals for that.
I've told you before that this server is a place for learning how to program. If you require code that does something, you need to be willing to learn how to do what it would take to create that code. You keep asking for people to do entire tasks for you, without any effort on your part.
At least, do you know some user or community?
I don't.
Hello do any of u guys know where I could get a dataset of text for training an ai chatbot or if not what I should train it on??
Can I ask yall for an opinion on something ?
I’ve been working on a machine-learning-based trading software for the longest time, with the ability to train models on different stocks with different specifications. I planned on licensing it out with 3 different plans, each with different add-ons, specifically more trading threads and training threads. I’ve been stressing because after I implemented progress bars for training, if I trained two models at once, one of the threads kept crashing around 76%. So I spent legit 6 hours going through every line of code associated with the training process and anything that updated the UI. I made it more efficient and used a queue so different operations wouldn’t clash with one another; after that it would get to around 96% for each model, then one thread would crash. Then I noticed that the number of windows and other applications I had running was a little insane, so I closed at least half, and finally it trained fine all the way for both models. The progress bars made the calls to the UI much more frequent and, in sum, a lot more computationally heavy. I know that 99% of people are going to want to track their progress and see how much time is left until a model is done training, so I’m at a crossroads, because I planned on having up to 3 training threads available at a time, and it was working perfectly fine with three models training at once until I added the progress bars. So I’m trying to decide whether I should just have up to two available and make it an add-on, or keep it at one thread?
There are freely available LLMs installable as pip packages; they aren’t incredibly advanced, as they are older and free lmao. But it would be a good foundation.
Why would you prefer a Google Colab over a .exe program to do this?
The easiest way to learn is to actually build. It’ll serve you better than going through tutorial hell. When you get stuck on a problem and work nonstop on fixing it for hours on end, maybe days, then once you actually fix it you’ll never run into that same issue again. I started with a simple model that predicted the outcome of a soccer game. Downloaded a dataset online, and made it. Wasn’t too complex, but it was a start. Then voice recognition for an assistant and so on from there
Learning the different model architectures is a must when it comes to deciding which to implement based off a specific goal in mind
How are the progress bars computationally expensive? A single machine with an integrated GPU can render millions of them at 120 fps.
It’s not the progress bars themselves. It’s the constant calls to update the UI, with other logic integrated into the Queues
It’s the fact that tkinter is being used as the UI framework
Ah, don't use tkinter.
What else would u suggest
I switched from tkinter to custom tkinter
Just for the actual looks of it
Anything else, tkinter is for small little widgets. It was meant for making micro UIs for stuff that would normally be shell commands.
But compared to other frameworks it seems like Ctk is the nicest looking one
Try any other UI library: https://github.com/hoffstadt/DearPyGui
I mean I’ve used pygui in the past, it just wasn’t as aesthetically appealing. As is my logic right now, it’s functioning how it needs to. So I may just first launch it with ctk, and later on switch over to something like pygui if need be. But the only issue I was having was the multi threading, calling on updating progress bars, on top of the actual computationally heavy task of training models , so it may not be a huge significant difference switching over
I’ve got a dev tab right now that has print statements pasted onto it, which also use the same queue, so I know the production one will still be more efficient
The computation should all be in the training, UI should be instantaneous.
Unless you are doing something really wrong.
Make sure your threading is actually doing threading, not like Python GIL stuff.
Wym by that
In Python you don't get performance benefits from threading due to the GIL.
(CPython)
But it can’t be instantaneous because tkinter isn’t thread-safe, so I gotta use a queue to update it, and I’m using ThreadPoolExecutor
I'm a noob, so sorry if this is stupid but my first instinct is to ask: how often do you actually need to make those calls to update the UI?
Like, if the window is out of focus/minimized there's no reason to make those calls right?
Should still be instantaneous. You have like 3 threads all locking to push onto the Queue and 1 locking to read off of it. You do need to make sure that the UI thread can chug through all of that faster than they are entered into the Queue.
Putting aside tkinter specifically for a second, it should take about a few nanoseconds (move this up to microseconds for Python) to update all the progress bars.
(Although there are a few milliseconds delay to see the change visually on screen)
If your Queue intake is overwhelmed, the Queue will keep growing until you run out of memory.
And the progress bar updates will be behind what they actually are.
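A rough sketch of one way to keep the UI thread ahead of the producers: drain everything that is pending on each tick and keep only the newest value per job, so the work per tick doesn't grow with the backlog. The `(job_id, percent)` message shape and the helper name `drain_latest` are assumptions made for the example, not taken from the project above.

```python
import queue


def drain_latest(q: queue.Queue) -> dict:
    """Empty the queue without blocking, keeping only the newest value per job."""
    latest = {}
    while True:
        try:
            job_id, percent = q.get_nowait()
        except queue.Empty:
            return latest
        latest[job_id] = percent


# Worker threads would call progress_q.put((job_id, percent)) as they train.
progress_q = queue.Queue()
for update in [("model_a", 10), ("model_b", 5), ("model_a", 40), ("model_a", 76)]:
    progress_q.put(update)

snapshot = drain_latest(progress_q)
print(snapshot)
```

In a tkinter app, something like this could run from a callback rescheduled with `after(...)` on the main thread, so the widgets are only ever touched from one place and each bar is redrawn at most once per tick.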
Then what else would be causing it to say something something async handler deleted by wrong thread , and doing this changed it
Then honestly the issue may have been memory
Even after deleting most of the things I had open, I checked my memory usage and it was at 70%
I think it's clearly some bug, which can be expected since this is threading.
I’ve got 4 things updating the ui every 100, 200, and 1000 milliseconds, on top of the training threads
I’m just not sure what else it would be
You may also want to use multiprocessing instead of threading.
This will isolate each in its own process to insulate from crashes during training, and also gives you true threading performance benefits due to no GIL.
U think creating separate environments for each “thread” would resolve issues
Yeah might just do that
Yeah, that's what a process is.
It’s just going to be thousands and thousands of lines of more code 💀
Heavy thread that does not share the memory space with the parent.
Multiprocessing is straightforward in Python.
I’m sorry I’m not very familiar with that
We take multithreaded code for granted, but what's needed to make it work properly? We need two Dr Steve Bagleys to illustrate this!
with the exception of not having shared memory by default (i.e. attributes inside the self object of a class won't update across Processes), the difference is literally
import threading
threading.Thread(target=foo, args=(bar,)).start()
vs:
import multiprocessing
multiprocessing.Process(target=foo, args=(bar,)).start()
just read the documentation if you need to understand it better
Hey folks, I'm trying to implement some ml model in a lang other than Python, cos of real-time processing constraints on a cloud server accessible via API (most likely WebSocket). I'm considering C or Rust, but it does seem C++ has better ml ecosystem vs C. Or do I proceed with Python? I'd love opinions.
You are going to have to be way more specific about what you are making and what the constraints are.
(and if you're sure that you won't be using Python, then it's out-of-scope for this channel.)
It's a chrome extension basically that utilises this ML model. However, due to resource constraints on browser, I'm offloading the ML processing to a cloud server. The input is audio streams.
Need to apply the model to the streams and send a refined output back to the client
what kind of model?
How many clients?
Thing is, Im not sure yet
It's an open question
Stuck between choices
But I'm concerned about perf
if you're not sure what the model is going to be, then it's too early to say that Python won't be fast enough.
this is a textbook case of premature optimization.
Could be CNN, RNN or autoencoders
Then the answer is 100% use Python, everything else is massive waste of time.
One.
Yeah, just Python.
I mean could be multiple tbh
C, for ML?
that would be insane
It's my daily work.
(And Python though, bindings)
also, models generally run on the GPU, so, if you really would want to optimize (some of the stuff), you'd write shaders in a whole different language
when people say "python is slow", what they mean is "python does not scale well for tasks that are entirely CPU-bound", which is actually rarely the case.
Yeah, I see people use C on twitter lool
Yeah, unless it actually runs on the CPU. If GPU you can't really choose your language (not without making your own, GL).
(And you can execute / send those programs to the GPU from Python without leaving Python)
I mean, there are like a couple options
at that point it just doesn't matter whether you're using Python or C++ as much
So, if you are using existing solutions (that are often written in C/C++), Python is the ultimate language for that. If you want to raw implement CPU stuff, C (or C++ (there are more)). And GPU, whatever options you have, you have a few, but not many (and you can use any CPU language to then send those programs and data to the GPU to be executed, C, Python, etc).
For GPU, I guess you mean CUDA
That is a Nvidia specific option.
I'm thinking of using Azure. So do I use their machine learning platform or what?
No idea, highly recommend against touching Azure with a 100m stick.
Also, given the models, do they require GPUs?
So where do I deploy? AWS?
Not Azure.
It will not be cheap, and in the case of AWS, if you get DDOSed or something, surprise 6 figure bill in the morning.
Ratelimiting?
you could check out Paperspace
Throttling?
You need to configure things correctly on AWS and it's notoriously confusing.
There is an entire industry of what amounts to basically AWS frontends for this.
Its called together.ai cloud now?
Hmm
I'll find my way around Azure lol
Or paperspace
Ive heard negative things about aws bills
Even though I think most cloud platforms are the same
Accelerate AI training, power complex simulations, and render faster with NVIDIA H100 GPUs on Paperspace. Easy setup, cost-effective cloud compute.
Paperspace is DigitalOcean, which I have not really heard issue about / had issues with.
Dope
yeah, I was delighted to find out DO acquired them when I found out about them first
also
DO appears to be slowly rolling out GPU droplets as well, but that's probably gonna take a while
That would be dope. So it takes it away from serverless I guess?
The other reason is when people make something in Python and naturally end up with a lot of dependencies due to how convenient pip is. But the slowness from this happens in every language, e.g. Photoshop with its startup times. Keep the dependency count (and how many dependencies those have) in check.
All software is imperfect, and so when you build on top of other software that carries over and adds up.
I've never used photoshop. but remember when photoshop was synonymous with falsified media?
now it's "deepfake".
Yeah, I found it funny when everyone said that you can't tell if an image is real anymore and immediately thought: "Did Photoshop stop existing? Are we ignoring the obviously fake slop everyone already falls for?"
These days you can just take a random clip, give it a caption as a made up story, and put it online and everyone starts getting mad.
Don't even need to really edit or generate anything.
tangentially related: today I was walking down the street, and a young black woman said "I think about their relationship all the time--it's my roman empire"
and I had first heard that use of "[someone's] roman empire" from my mother, an old white woman.
my mom exists in a very different media and social ecosystem than non-white young people, so apparently it's getting around.
Memetics are powerful, eventually they trickle into the silent majority that is usually disconnected.
I need a little help with matplotlib
im using matplotlib to create a graph and send that via a discord bot, but on my mac whenever i run the command that creates the graph i get this icon continuing to bounce until i end the python instance. Is there any way to fix this? https://cdn.discordapp.com/attachments/244238578400362498/1279261141360574466/Export-1725070007980.gif?ex=66d3cc47&is=66d27ac7&hm=273f2fcbcb0bb9308b6168e3e70f6a4517b08d4695b07d89613825c31396b0c5&
my code:
y_values = [snapshot["score"] for snapshot in user_snapshots]
x_values = [datetime.strptime(snapshot["date"], "%Y-%m-%d") for snapshot in user_snapshots]
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x_values, y_values)
# Format the x-axis
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
# Rotate and align the tick labels so they look better
plt.gcf().autofmt_xdate()
# Add labels and title
ax.set_xlabel("Date")
ax.set_ylabel("Score")
ax.set_title(f"{user.name}'s Score Over Time")
# Adjust layout and save
plt.tight_layout()
plt.savefig("graph.png")
fig.clf()
plt.cla()
plt.clf()
plt.close('all')
plt.close(fig)
del fig
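One thing that might be worth trying, assuming the bouncing icon comes from matplotlib selecting a GUI backend on macOS: switch to the non-interactive Agg backend before importing pyplot, so no window toolkit is started at all. The helper name `render_scores` and the sample data below are made up for the sketch.

```python
import io
from datetime import datetime

import matplotlib

matplotlib.use("Agg")  # select the non-GUI backend before importing pyplot
import matplotlib.pyplot as plt


def render_scores(dates, scores) -> bytes:
    """Draw the score line chart off-screen and return it as PNG bytes."""
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(dates, scores)
    ax.set_xlabel("Date")
    ax.set_ylabel("Score")
    fig.autofmt_xdate()
    fig.tight_layout()
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # a single close is enough to release the figure
    return buf.getvalue()


png = render_scores([datetime(2024, 8, 29), datetime(2024, 8, 30)], [3, 5])
print(len(png), "bytes")
```

Returning bytes avoids writing graph.png to disk; the result can be wrapped in a file-like object for upload. With Agg selected, the chain of `clf`/`cla`/`close('all')` calls should not be needed.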
i think you can simply kill the process
with kill 'pid number here'
on the terminal
What library are you using for this
idk this is just some dude ran an alibaba model
Convenience and pip in same sentence is a new one. I usually read people deride Python's package management system.
Personally, not sure I've had much problems with Pip.
There is this list on wikipedia https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research#Dialogues
These datasets are used in machine learning (ML) research and have been cited in peer-reviewed academic journals. Datasets are an integral part of the field of machine learning. Major advances in this field can result from advances in learning algorithms (such as deep learning), computer hardware, and, less-intuitively, the availability of high-...
Thanks
and also this list https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research#Internet
No worries.
IDK what your capabilities and requirements are, but unless this is a learning exercise, I'm not sure why you'd want to train your own chatbot on publicly available data instead of just using an existing one.
Ok thanks
IDK what your exact implementation is, and I'm not familiar with multiprocessing, but can't you pool the info so it only pushes an update to the UI once every second or something like that?
Not mine, but a quick google search gave me this. https://github.com/open-mmlab/mmpose
you might find more on this list https://paperswithcode.com/task/3d-hand-pose-estimation
hey guys, i'm trying to learn NLP. i'm practically a beginner. can anyone recommend me textbooks about NLP (beginner-friendly)?
this is a course you can watch: https://www.youtube.com/playlist?list=PLoROMvodv4rOSH4v6133s9LFPRHjEmbmJ
what is your goal for learning NLP? the NLP community has done a hard shift towards interactive LLMs since ChatGPT was released, but there are other research areas, too.
There is also this https://github.com/EleutherAI/cookbook this is geared towards llm in mind
school
thoughts on this? https://github.com/google-research/tuning_playbook
Pretty nice I guess. I won't follow it to the letter, but that's a good read.
Hi guys.
I'm playing around with the code in monai.transforms and would like to know when it's not necessary to use Orientationd?
Sorry to cut your message
Well hello,
I was looking for some guidance
I completed python basics (playlist 7 hrs) I have done all the basics of PIL and tkinter by myself
Now I do not know where to go from here...
Some people said pygame some said DSA
But I personally wanted to do research on AGI and AI , what should I do.... Well the AGI thing is def optimistic but I want to know where do I start
Just a stupid question. So I've done a couple of ML tutorials before but not given much thought to it beyond following the guide "it works" and then moving on to something else. But 3blue1brown's series on neural networks popped up in my feed and I've been listening to it while doing other work. Was thinking of hunting down a tutorial to set something up like this for the hell of it later. However.
The first couple videos they did on the topic use a NN made for recognizing handwritten numbers as an example for discussion. it's 784 inputs (28x28 pixels), 2 layers in the middle with 16 neurons each, and 10 outputs, 1 for each digit.
In the second video they talk about how due to how the weights and activations add up through the network, you can give the NN a borderline random image of noise and it will "confidently" categorize it as a 5 or a 7 or something.
So I was looking at a couple datasets around for training on this kind of problem, and as expected they're all handwritten numbers.
Would there be any benefit in adding an extra output for "Not a Number" in this kind of scenario and adding a few sets of randomized noisy images, or also handwritten letters, other languages characters, and etc. Or is it just the kind of thing that one wouldn't bother with cause it could ruin the NN's ability to do the main job of recognizing numbers
Hey guys! Currently I am working on an ai application which works on the webcam on my desktop.
I also installed Camo Studio and it connects to my phone. However, I cant show that data on my phone itself.
I saw on stackoverflow (which I would really appreciate if you could find that article - along the lines of external camera opencv) that I would need to learn how to make a smartphone app to do this. Is this true? And if so, how and where should I start?
You could certainly train a model to try and distinguish noise from real numbers, but you're increasing the complexity of the task you're trying to learn. And that in itself is a difficult task to learn because you need a massive dataset to represent all the cases that aren't a real number
This idea is used in some places though. For example that's along the lines of how you train a discriminator in a GAN. But instead of having a solid dataset of fake images, you have another model try to generate a convincing image and use the discriminator to guess which one is real. This lets you train on patterns learned from the generator instead of needing those patterns represented in your dataset.
interesting. thank you, I'll go read about those next
Hello guys,
I am working on this project which generates optimal ship routing and I am very confused which algorithm and model i should use
and where should i get the datasets from
Anyone know how many images I would need to have an 80% accuracy when detecting Roblox Limiteds in an image? There are around 200 limiteds I want to be able to differentiate
as many and as varied as you can find
say it takes 100 samples per limited (class), then with 200 limiteds you would need 20000 minimum
This is my personal list https://github.com/aprbw/ArianDLPrimer
I guess for you, you can start from the 3rd item on the **basic** list
> Would there be any benefit in adding an extra output for "Not a Number" in this kind of scenario
The answer is, yes definitely.
The big question is: how?
This is an actively researched open question
> adding a few sets of randomized noisy images, or also handwritten letters, other languages characters, and etc
That is certainly one valid way, and people have tried it with good-ish results. The problem is, the space of "not a number" in pixel space is huge.
It is discussed here in Figure 11: https://openreview.net/pdf?id=BZ5a1r-kVsf read the energy based model
The travelling salesman problem, also known as the travelling salesperson problem (TSP), asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?" It is an NP-hard problem in combinatorial optimization...
a thousand for each class is a starting guesstimate.
100 feels a bit optimistic haha, I'll go for a thousand. But yea, you can start with a hundred each and go from there.
you'll never know until you try
yeah
I was giving an optimistic estimate 🙂
like, I was wondering how would you collect even a hundred
wow that is a lot
@quaint mulch actually um, we can artificially generate a dataset of 1000s of images now that I think about it, but would that not mean we can make do with just 1 of each kind?
I don't think I'm familiar enough with Roblox to comment more
ok, right, I have realized that I am not either 😳
haha
They are probably not complaining about the convenience.
What else could it be?
Myriad of options / lack of standardisation ?
Cos I think even the js ecosystem is not absolved
Which imo still boils down to convenience?
Thanks for the reply...
I have this confusion whether I should learn and struggle by myself or get myself a book like Hands on ml by Aurélien Géron (most famous)
I checked out the list by you and now I know the basic hierarchy of learning AIML , but I got overwhelmed by the amount of resources available
Should I get this book or learn online by the Andrew course
For a long time there was not a standard virtual environment system, there was stuff like Conda floating around, versioning issues / breaking binary compatibility, all the distutils, setuptools, etc, we now have new stuff for that and it's still ongoing.
A lot of complaints will also be out of date. Like how everyone still thinks PHP is as slow and bad as it used to be 10+ years ago (last time they used it).
We still have people asking about Conda here due to all the out of date information. Just use regular Python, Conda existed due to lack of various things that now exist.
(A lot of people tried to get into Python ML during peak Conda, and the conflicts between it and regular Python and various other Conda issues was one factor in this perception of Python package management)
I can imagine heh.
I'm upskilling in ML/AI and I'm considering learning linear algebra first since everything is based on that. Is this a waste of time? The end goal is to enable me to work on my pet project which does some real-time audio processing/transformations.
I mean before touching libraries
Like Pytorch and Keras
Linear algebra is the least waste of time you could be doing relative to many things.
If you want to learn ML, don't learn it in terms of libraries.
I see...
I think thats how most of programming works or at least has been for me
Concepts-based learning
Pretty much everything interesting that involves number crunching on a computer involves linear algebra.
(And it's why the hardware is designed for it these days)
Altho, if you do want to learn some data skills while learning concepts, https://www.kaggle.com/learn is a good place to start
And a little calculus?
To clarify a bit on this, learn some libraries to apply some concepts, but don't just learn how to apply a library.
Learning specific libraries is not deep knowledge, but it's needed to do anything in the end.
Linear algebra, calculus (including multivariate), probability, statistics.
I have to pick a starting point between keras, tensorflow and pytorch I think. Which would you suggest?
Pytorch.
For this, do I need ML-specific focus or just a study of any random textbook should work
Either. #data-science-and-ml message
There is infinite math to learn so your knowledge will be somewhat ML focused either way.
if you remember, here's how the trained model performs on real data- not 100% accurate, but would do
Anyone here have experience with qlora
Is there anyone that can point me towards some resources around NLP? I am trying to work on something that has my original data and needs to get data from an api in order of similarity to the original data - I can sortby relevancy in the api call but i am combining responses from different api's so i need them ordered by similarity once i have combined them. However, once i have done this i want to be able to check whether the most similar responses support my original data or go against it? I was looking at NLI with huggingface but tbh im not really sure which direction i need to go with this that makes the most sense
What does it mean for one text to "support your data"?
Can you give an example of two texts that "support" one another?
yea, this may be a bad example but lets say I have a claim that a certain team won the world cup in a certain year for example, if the data i get back surrounding the event is similar, i.e it found results that match words in the claim based on similarity such as world cup, specific year etc, but maybe a teamname wasnt correct or maybe the score is different, then i would say it doesnt support the original claim. Or maybe it matches the claim to a certain degree which i can calculate so that would be the likelihood that the claim happened as a % or a clarifying message that expresses that the teamname was wrong
hey, where can I find some live stock data
So transformers are basically the best for audio processing/transformations??
or CNN-based U-net??
One inefficiency here would be dealing with long-range deps
hmm there is unsupervised regression and unn(unsupervised nearest neighbors)
Isn't "unn" what you get in item-based collaborative filtering?
Hello, looking for someone with knowledge in transformers and "how people talk about it". Question: https://discord.com/channels/267624335836053506/1279739938652426283 would love to talk a bit about it, I need to look at the weights of the transformer part but I'm not sure if they mean the whole thing, encoder decoder etc etc
What are some of the best-written Python codes you know in term of quality?
I actually watch Corey Schafer's videos
But he doesnt have any on NumPy, and I am not finding other channels helpful. Can someone give me a good YT channel's NumPy vid THAT CAN ACTUALLY TEACH IT TO MY TINY BRAIN?
yt idk
if you don't mind reading, you can't go wrong with numpy's user guide
Start here: https://numpy.org/doc/stable/user/basics.html
Pay attention to the part about broadcasting
The answer is: doesn't really matter, there are many roads to Rome.
Just pick one and try to finish it. If you get stuck, try something else.
My preference is:
- Just know barely enough about math, so you know WHEN to pick it up properly. You can do this just by watching 3blue1brown or something.
- Go head on into your pet project.
- Eventually you get stuck, but by then, you know which basic math you were missing, and then you can learn those. Usually, maths are wayy easier to learn once you have a concrete understanding of stuff the math is about.
Access real-time stock data with Marketstacks free or premium plans. Our market API supports volume requests, commercial use, and more. Try it now!
transformers are the best for literally everything if you have sufficient compute and data.
But I think it is better if you think about it in terms of geometry
https://arxiv.org/pdf/2104.13478
People are going very fast and loose with nomenclature. That's just what it is since the field is new. Sometimes people also come from different backgrounds: math, electrical and electronics engineering, signal processing, software development, computer science, physics, etc. So they will bring their own nomenclature. You just have to live with it.
That's why every paper usually has a section called "definition" or "problem definition", where these concepts are laid out exactly using the language of math. These tend to vary even between papers with the same authors.
So, just call it whatever, or use chatgpt when talking about it normally. When you need to go precise, read the math definitions, read the code, and write the math definition.
In the realm of deep learning, https://pypi.org/project/pytorch-lightning/
Maybe not exactly the best, but a good place to start to learn best practices.
Hmm... I had picked up this book: https://mml-book.github.io/book/mml-book.pdf
Plan was to read it as I work on the project iteratively
Apparently it's not time for model development since I have to write the frontend (chrome extension).
I also know of 3blue1brown lol...
Personally, I just want to learn enough linear algebra to not be confused with its application
can anyone suggest an algorithm that draws straight lines based on this picture, giving the starting and ending points of the lines?
I found an algorithm that draws lines, but the lines are not straight enough and do not give start and end points
day 7 kaggle report: created handwritten digits image generation model
image generation? so you tell it to produce 7, and it gives you a bunch of semi-randomly generated images of a hand-written 7?
as a starter to ai, i haven't made it that detailed, I plan to
it generates just any random number from noise
I'll look into modifying the neural network to improve it cuz some of them are just scribble
I had saved bunch of outputs every epoch and turned it to a gif
slow version I made for self satisfaction of seeing my neural network improve :)
did you look at autoencoders and vae?
I mean did you use vae for this?
if yes now you can do it with gan
I mean I see in some book about generation of digits with vae and then gan
probably it was deep learning for computer vision from packt, and maybe it was in pytorch if I recall correctly
@gilded belfry hmm maybe you need more iterations?
to get more straight lines or not 😅
i skipped to gan, I have no idea about vae lol
ah ok maybe better, more realistic results
gan is more suitable for these?
I have to figure out if i need to add more layers or modify existing one
but for learning I think its ok
I think the results will be better if i pass the labels along with noise as input
also you can see gan vs vae there is comparison on internet between two
hmm so dlss is gan 🤔
https://vitalflux.com/gan-vs-vae-differences-similarities-examples/
better showed here I think
how does vae generate images?
they say gans use noise but nothing about vae
vae uses randomness
probability distributions unlike in autoencoders
ok so your question is whether vae uses noise or something different
do they use noise to generate random images or they have a different way?
they dont use noise
sampling latent space
oh sorry there is noise distribution
I meant there is no noise, but there is noise distribution
more specifically prior and noise distributions
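and there is a kl-divergence term between 2 distributions (the encoder's approximate posterior and the prior)
so roughly a distance between 2 distributions (not a true metric though, it's asymmetric)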
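A rough sketch of the KL-divergence idea mentioned above, in plain Python. The function names are made up for illustration, and the second function assumes the common VAE setup where the prior is a standard normal and the encoder outputs a mean and standard deviation per latent dimension; other priors need a different formula.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as probability lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_gaussian_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ) for one latent dimension."""
    return 0.5 * (sigma ** 2 + mu ** 2 - 1.0) - math.log(sigma)

p = [0.5, 0.5]
q = [0.9, 0.1]
# The two directions give different values, which is why KL is not a true distance.
print(kl_divergence(p, q), kl_divergence(q, p))
# An encoder output that already matches the prior costs nothing.
print(kl_gaussian_to_standard_normal(0.0, 1.0))
```

In a VAE loss this KL term is added to the reconstruction error, which pulls the encoder's output distribution towards the prior so that sampling from the prior at generation time gives sensible images.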
any references to understand it better?
(context: generating handwritten digits)
to pass on the labels what would be most effective?
converting digits to a decimal between 0-1? converting to a (i think it's called one hot encoded) array - ex: [0,0,1,...,0] for 2? or passing them as they are?
I think passing them as they are is the best choice to keep clear distinction for each digit and since noises are values between 0-1
If you represent the label with integers (0, 1, 2, ...) then that means the distance between 0 and 1 is smaller than the distance between 1 and 7, for example. If you think 1 should be represented as closer to 0 than 7, then keep it as an integer (or normalize to values between 0 and 1)
But what would make more sense is to give each possible value its own dimension
I.e. 1-hot encoding. And then each digit is equally distanced from all other digits
And you can use other tricks like "softmax" to turn this vector into a list of probabilities for each digit.
This is the most logical approach for classification
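A minimal sketch of the integer vs one-hot point, in plain Python. The helper names are illustrative, and the last two lines show one hypothetical way to feed a label into a generator (appending the one-hot vector to the noise vector); that part is an assumption about the setup, not something from the notebook being discussed.

```python
import math
import random

def one_hot(digit, num_classes=10):
    """Return a list with 1.0 at index `digit` and 0.0 everywhere else."""
    if not 0 <= digit < num_classes:
        raise ValueError("digit out of range")
    vec = [0.0] * num_classes
    vec[digit] = 1.0
    return vec

def softmax(scores):
    """Turn arbitrary real scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Integer labels: 0 and 1 look "closer" than 1 and 7.
print(abs(0 - 1), abs(1 - 7))
# One-hot labels: every pair of different digits is the same distance apart.
print(euclidean(one_hot(0), one_hot(1)), euclidean(one_hot(1), one_hot(7)))

# Hypothetical conditioning: noise vector followed by the one-hot label.
noise = [random.random() for _ in range(4)]
generator_input = noise + one_hot(7)
```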
that is just 100 requests per month free
I'll go with that one
um it's about image generation not classification
is probability density estimation basically the same thing as energy based models? Just one is normalized and the other is not?
well yea, I'm not sure what you need and what's your budget?
free budget/budget free
vi is necessary because the posterior cant be directly estimated bc of the normalizer
and ebms work with the unnormalized density to avoid that
Hello today Jupyter notebook decided to be a bit rude and I cant import tensorflow:
ImportError Traceback (most recent call last)
Cell In[1], line 1
----> 1 import tensorflow as tf
File ~\anaconda3\lib\site-packages\tensorflow\__init__.py:40
38 from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow # pylint: disable=unused-import
39 from tensorflow.python.tools import module_util as _module_util
---> 40 from tensorflow.python.util.lazy_loader import KerasLazyLoader as _KerasLazyLoader
42 # Make sure code inside the TensorFlow codebase can use tf2.enabled() at import.
43 _os.environ["TF2_BEHAVIOR"] = "1"
ImportError: cannot import name 'KerasLazyLoader' from 'tensorflow.python.util.lazy_loader' (C:\Users\David\anaconda3\lib\site-packages\tensorflow\python\util\lazy_loader.py)
can anyone help
this is noise
I am training a bounding box regression model, and the model is really good at predicting where the objects are but the height and widths are always super small, does anyone know why this could be?
Hello
What is this
a cheating program for some multiplayer shooter
not really a cheat just trying to train a object detector using the training sim
I see, so vi, ebm, and mcmc are like 3 different ways to do the same thing?
Why isn't there a channel for Data engineering?
im planning to create a model that is fed transfer rate data, which is a continuous value, so i need a regression model. then it passes that info to a classification model
is it practical and is achievable?
so that then the classification model will predict the anomaly of the networks traffic
remember that to train a supervised model you would need many labelled examples for each kind of output you want, including both the features you'll feed to the model and its expected output
anomaly detection is a bit of a separate topic, there are some techniques that let you try and detect outliers without having to label the data
what else concept i hv to learn ? like to increase model's accuracy?
idk what else to learn for that
im currently trying kernel tricks
but look GAN is supervised so its classification in image generation
in sense they use supervised training
but GAN are unsupervised
hmm ok not sure about it
generally generative models are unsupervised
so maybe in the sense of unsupervised classification ( so clustering)
please correct me I think I near to explain, dont know how to formulate it
maybe I could say suppose if GAN is supervised so its classification
hmm so thinking sorta of like classification is regression, but no other way around
hmm so gan is rather like regression, because generate each pixel value
but hmm its not starting from empty thing, but noise so its like replace noise pixel value with generated pixel value?
is there sth to see this on pixel level (low level)?
for example like image segmentation is classification of each pixel
now I'm confused little about gan
the GAN algorithm is unsupervised, as you start without any labels for your data. It is composed of two pieces: a supervised task learned by the discriminator, where the generator makes a fake image and the discriminator must pick the correct choice between that and a real image (classification), and an unsupervised task learned by the generator, which calculates its loss from the accuracy of the discriminator.
so its semi supervised?
the algorithm itself is just unsupervised
the classification task learned there is supervised, but it's set up without providing any labels
so as a researcher the model is unsupervised, because it doesn't require us to map an input to a correct output
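To make the "supervised inside, unsupervised outside" point concrete, here is a small sketch of the two losses on a single real/fake pair. It assumes the discriminator outputs a probability in (0, 1) that an image is real and uses the common non-saturating generator loss; function names are illustrative. Note the "labels" (1 for real, 0 for fake) are created by the training loop itself, nobody annotates the data.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy with automatic labels: real -> 1, generated -> 0.

    d_real: discriminator output for a real image
    d_fake: discriminator output for a generated image
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Low when the discriminator is fooled into scoring the fake as real."""
    return -math.log(d_fake)

# A discriminator that separates real from fake well has a low loss...
print(discriminator_loss(d_real=0.9, d_fake=0.1))
# ...and in that situation the generator's loss is high.
print(generator_loss(d_fake=0.1))
```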
Hello, anyone interested in https://www.kaggle.com/competitions/playground-series-s4e9 and want a newbie study buddy?
Playground Series - Season 4, Episode 9
Midnight: convert to one-hot encoding, someone does that too in a book I skimmed
conditional GANs are a different thing
Hi,
I have a basic bert model from hf's transformer:
(Pdb) model
BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(20, 32, padding_idx=0)
      (position_embeddings): Embedding(128, 32)
      (token_type_embeddings): Embedding(2, 32)
      (LayerNorm): LayerNorm((32,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=32, out_features=32, bias=True)
              (key): Linear(in_features=32, out_features=32, bias=True)
              (value): Linear(in_features=32, out_features=32, bias=True)
              (dropout): Dropout(p=0, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=32, out_features=32, bias=True)
              (LayerNorm): LayerNorm((32,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=32, out_features=32, bias=True)
            (intermediate_act_fn): GELUActivation()
          )
          (output): BertOutput(
            (dense): Linear(in_features=32, out_features=32, bias=True)
            (LayerNorm): LayerNorm((32,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0, inplace=False)
          )
        )
      )
    )
    (pooler): BertPooler(
      (dense): Linear(in_features=32, out_features=32, bias=True)
      (activation): Tanh()
    )
  )
  (dropout): Dropout(p=0, inplace=False)
  (classifier): Linear(in_features=32, out_features=2, bias=True)
)
(Pdb) model.bert.embeddings.word_embeddings
Embedding(20, 32, padding_idx=0)
You can also see at the end I have this Embedding object, does anyone know the docs to this? I'd like to print the actual values.
Ah I think it's just nn.Embedding
It is not practical
Just learn about timeseries anomaly detection
From https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html I have this example
>>> # an Embedding module containing 10 tensors of size 3
>>> embedding = nn.Embedding(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
>>> embedding(input)
tensor([[[-0.0251, -1.6902, 0.7172],
[-0.6431, 0.0748, 0.6969],
[ 1.4970, 1.3448, -0.9685],
[-0.3677, -2.7265, -0.1685]],
[[ 1.4970, 1.3448, -0.9685],
[ 0.4362, -0.4004, 0.9400],
[-0.6431, 0.0748, 0.6969],
[ 0.9124, -2.3616, 1.1151]]])
So that's nice and all but how can I check the values stored in embedding after assigning the input to it?
you wanna see the trainable parameters of the embedding layer?
you didn't assign input to the embeddings, you just passed it through the layer
embedding.weight should show you the weights
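A toy stand-in, in plain Python, for what an embedding layer does: it owns a table of rows (the trainable weights) and calling it just looks rows up by index. `TinyEmbedding` is a made-up class for illustration, not the PyTorch implementation; with the real layer the table is the `embedding.weight` attribute mentioned above.

```python
import random

class TinyEmbedding:
    """Toy embedding layer: a weight table plus index lookup."""

    def __init__(self, num_embeddings, embedding_dim, seed=0):
        rng = random.Random(seed)
        # One random row per index; training would adjust these values.
        self.weight = [[rng.gauss(0.0, 1.0) for _ in range(embedding_dim)]
                       for _ in range(num_embeddings)]

    def __call__(self, indices):
        # Passing indices through the layer does not store them anywhere,
        # it only returns the matching rows of the table.
        return [self.weight[i] for i in indices]

embedding = TinyEmbedding(10, 3)
out = embedding([1, 2, 4, 5])
print(out[0] == embedding.weight[1])  # the output row is just row 1 of the table
for row in embedding.weight:          # the "actual values" live here
    print(row)
```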
I mean yh thats much more practical for the project but i mean like what else method to enhance model's accuracy other than kernel trick, attention,etc
https://github.com/Karanchaudhary350/DiagnoSys
This error appears when I do 'flask run'
ValueError: Kernel shape must have the same length as input, but received kernel of shape (3, 3, 3, 32) and input of shape (None, None, 224, 224, 3).
I thought of nn.Embedding as a data structure holding my embeddings, not creating them. But that makes more sense now. I also see it inherits from nn.Module.
thanks - god that makes more sense now
you can check these libraries https://github.com/unit8co/darts https://github.com/thuml/Time-Series-Library
can i have 2 input layers in my neural network? if so how do i do it in TensorFlow?
I don't get how the flow of layers would be for multiple input layers
the multiple input layers merge into one after functions or processing?
there are a ton of different ways to do this so it depends
there are ensemble learning algorithms like stacking, bagging, and boosting which focus on combining the outputs of multiple models into one to get a good output. then there are ways to directly combine the output features, like concatenation, where you join the outputs of separate layers end to end before feeding them into a single layer
https://medium.com/@AnasBrital98/googlenet-cnn-architecture-explained-inception-v1-225ae02513fd here's an example of concatenation being used in the google inception model
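A framework-free sketch of the concatenation idea for a two-input network: each input gets its own small branch, the branch outputs are joined end to end, and one shared layer reads the merged vector. The layer sizes, parameter names, and input names are all made up for illustration. In Keras the same shape of model is usually built with two `tf.keras.Input` tensors merged by `tf.keras.layers.Concatenate`.

```python
def dense(inputs, weights, biases):
    """Fully connected layer with ReLU; weights[j] is the row for output unit j."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def two_input_forward(input_a, input_b, params):
    branch_a = dense(input_a, params["a_w"], params["a_b"])  # 3 features -> 2 units
    branch_b = dense(input_b, params["b_w"], params["b_b"])  # 2 features -> 1 unit
    merged = branch_a + branch_b  # list concatenation, NOT elementwise addition
    return dense(merged, params["head_w"], params["head_b"])  # 3 units -> 1 output

params = {
    "a_w": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], "a_b": [0.0, 0.0],
    "b_w": [[1.0, 1.0]], "b_b": [0.0],
    "head_w": [[1.0, 1.0, 1.0]], "head_b": [0.0],
}
output = two_input_forward([1.0, 2.0, 3.0], [0.5, 0.5], params)
print(output)
```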
thanks
@wooden sail Do you consider evaluating against the same test set multiple times multiple testing?
Meaning, if you keep going on and on and on you may end up with a final result / architecture that is good by chance but not because it really is intrinsically
sounds like cross validation to me?
It's different
Say you split 60 20 20. This means you use 80% to find your architecture
You may end up with something that fits 20 well more or less on accident
You have your remaining 20 to figure it out. Let's say there's a sizeable gap, what then? Option 1 is leaving it as is but then your final results will be off what it probably should be in reality and option 2 is going again on your test set with slightly different hyperparams but then you risk reporting numbers that are good on accident (which already happened, that's why you have this problem in the first place)
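A small simulation of the "good by accident" effect described above, using only the standard library. The labels are coin flips and every candidate "model" is also a coin flipper, so the true accuracy of anything here is 50%. The sizes and the seed are arbitrary choices for the sketch.

```python
import random

def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def simulate_holdout_reuse(num_candidates=200, holdout_size=100,
                           fresh_size=2000, seed=0):
    rng = random.Random(seed)
    # Labels are random, so no model can truly beat 50% accuracy.
    holdout_labels = [rng.randint(0, 1) for _ in range(holdout_size)]
    fresh_labels = [rng.randint(0, 1) for _ in range(fresh_size)]

    # "Tuning": score many skill-free candidates on the SAME holdout, keep the best.
    best_holdout_acc = 0.0
    for _ in range(num_candidates):
        preds = [rng.randint(0, 1) for _ in range(holdout_size)]
        best_holdout_acc = max(best_holdout_acc, accuracy(preds, holdout_labels))

    # The winner is still a coin flipper, so on fresh data it just guesses again.
    fresh_preds = [rng.randint(0, 1) for _ in range(fresh_size)]
    return best_holdout_acc, accuracy(fresh_preds, fresh_labels)

holdout_acc, fresh_acc = simulate_holdout_reuse()
print(f"best accuracy on the reused holdout: {holdout_acc:.2f}")
print(f"same kind of model on fresh data:    {fresh_acc:.2f}")
```

The gap between the two printed numbers is pure selection effect: the more candidates you score against the same holdout, the more optimistic the best score gets, even with zero real skill.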
I don't know if you get my drift @agile cobalt
so pretty much over-fitting the architecture, and how to avoid that?
I think this is a fundamental problem in ML and one that most data people don't know/care about + you can cheat on it and go 100 % unnoticed. https://papers.nips.cc/paper_files/paper/2015/hash/bad5f33780c42f2588878a9d07405083-Abstract.html
The abstract explains it pretty good
Yup
I don't think you can prevent this except by having a really really large dataset
i would expect there to be ways of correcting for it by making the tests more strict/ have higher thresholds
i have to admit i'm not super familiar with this, i can't really comment
It's a really nasty problem
The error on my test set is 2-3x that on the validation
Mostly because the tuning algorithm itself overfit
Nice paper... bookmarked , this holdout reuse is a particular problem in finance where overfitting is (for the most part) an inevitability.
Ideally, collect a new test. Don't reuse the test.
Hey guys
The model YOLO, does it come with pretrained weights?
and we just use it like that?
Dont we need to build like how you build a CNN?
You can view it as yourself being part of the training algorithm, you are in the feedback loop. Information is leaking through you (from test) into the model by your decisions. It's still testing, but eventually not, depending on how often you do it, and other hard to measure factors.
But doing it more than once makes it not pure in the scientific sense.
The scientific method and the statistics built around it are explicitly designed to avoid stuff like this.
(Which is what diminishes its practical use a lot (it's a high bar))
(It's also why a lot of papers out there claiming to be scientific, or to be using statistics properly are not)
This is a cool paper, the algorithms given kind of give you (the human) early stopping on queries to prevent overfitting (to the degree that you want chosen by parameters).
anyone here have experience building a RAG?
Don't ask if someone has experience with something, just ask the question directly
Trying to build a RAG in Python for LMStudio using HF transformers/FAISS. Having this issue where the retriever is exiting script at 0% without throwing an exception - brand new to RAGs and not sure where to start troubleshooting (or honestly if I'm even approaching it correctly)
Could post a code snippet, which would probably be better for python-help, but really just want resources or a lead into what could possibly be going wrong with a retriever, i.e. bad indexing?
if your program is unexpectedly ending without an error message, that's a general debugging issue, not a RAG issue. Go ahead and show the code.
Your analysis is spot on and how I look at it
For a serious study I don't like touching the test set more than once (or I'd have 2)
It's a handicap because you undoubtedly benchmark yourself against data scientists that don't know or don't care and you'll have worse results. That being said I don't think the job is getting good results, it's about getting high fidelity estimates of performance
Yeah, "grad student descent" is a real issue.
AY I LOVE THIS ITS LEARNING TO WRITE OMG MY FIRST MODEL THAT'S ACTUALLY WRITING SO NICELY
this is after 3 epochs, there's more left
little baby's first try so cute
I thought it'll take a while to learn and made it 500 epochs 💀
can I stop it? It's already good enough from 10th epoch
or will the progress be lost if I cancel? Im using kaggle notebook
its free gpu I'll let it be
I want to become a Data Analyst or get an entry-level job. I have barely any knowledge of SQL and some knowledge of Excel. I took an SQL course in college two years ago and Finished the Statistics Course a few weeks ago. Statistics was good with a calculator and barely any Excel homework. I took Introduction to Data and Business Analysis. I haven't taken data visualization or machine learning. I'm just asking if I watch the Boot Camp of Data Analyst Video or what other way to get started strong to find a Data Analyst Job since I'm planning to graduate in May of 2025
SAP is a major player in cloud development. Companies like Microsoft, Google and IBM are great as well. They all have entry level jobs.
Portfolio, python with pandas, knowing about etl (at least a bit), maybe use gcp to see how it works you have free credits to learn at first
If you are okay doing VBA and sas you can practise them, but you better aim python/R for your own mental sanity
What's etl? Also: R and sanity in the same sentence? 😮
Extract transform load (data engineering concept)
R is fine, SAS is in the top 3 worst syntax I've ever used, but it's still in some data analyst positions
ngl I'm kinda digging the ggplot syntax
haven't created that complicated of graphs tho
ayy grats
is this a GAN? or is it mnist classification?
you might want to consider saving checkpoints every epoch or so (or every N batches if your epochs were too large) if you have enough disk space for it
https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html
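A framework-agnostic sketch of the checkpoint-and-resume pattern, using `pickle` and a toy training loop. Everything here (`train`, the `weights` placeholder, the file name) is hypothetical; with PyTorch you would save `model.state_dict()` and the optimizer state with `torch.save` instead, as in the linked recipe.

```python
import os
import pickle
import tempfile

def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)

def load_checkpoint(path):
    """Return the saved state, or None if no checkpoint exists yet."""
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)

def train(path, total_epochs, stop_after=None):
    """Toy loop: `weights` stands in for real model and optimizer state."""
    state = load_checkpoint(path) or {"epoch": 0, "weights": 0.0}
    for epoch in range(state["epoch"], total_epochs):
        if stop_after is not None and epoch >= stop_after:
            break  # simulate the notebook being cancelled
        state["weights"] += 0.1  # placeholder for one epoch of training
        state["epoch"] = epoch + 1
        save_checkpoint(path, state)
    return state

with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "checkpoint.pkl")
    interrupted = train(ckpt, total_epochs=10, stop_after=3)
    resumed = train(ckpt, total_epochs=10)  # picks up where the first run stopped
    print(interrupted["epoch"], resumed["epoch"])
```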
my first attempt at fully making an a.i.
https://paste.pythondiscord.com/TTLA
It's easier to make plots with ggplot than any python library tbh
Going from ggplot2 to matplotlib feels like going back to the stone age
seaborn is simple af
I'm taking this Spring. Should I spend time learning from a Data Analyst boot camp on YouTube?
idk about bootcamps tbh I learned from master degree + projects
My Python is horrible. I did the course 3 years ago.
For practicing and understanding it.
key concepts, pandas, numpy, matplotlib/seaborn is fine
This is what I found on Youtube
for python maybe just grab some knowledge then do some notebooks on kaggle
It would be more efficient to grab key concepts and hop on projects
I was also going to suggest Kaggle, I think it's hard to learn from giant video courses like that
Also if your Python is rusty you might want to read A Byte of Python, it's a free book available online and it should be a good refresher
like for sql you setup your own example postgres db and you fool around
I would say that being able to do just a bit of Tableau and/or Power BI is plenty, especially before the first internship as an absolute beginner
The most internships are in the Summer and I plan on graduating in the Spring of 2025
I'm asking too much of you. Could you give me a list of things to learn from scratch? Like starting from nothing. @spare forum
are you solid in stats ?
Just the basic @spare forum just took the course last semester
spent the majority of my time learning the calculator
stats: basic descriptive + tests (t-test, ANOVA, chi2, correlation); SQL: data modelling (entity-relationship model), obviously joins, views, common table expressions, being good at queries; Python: pandas, numpy, matplotlib/seaborn; Excel basics, Power BI basics, Tableau basics; added value: R, SAS
On stats, I should probably use Excel more, since I'm mostly familiar with doing t-tests and correlation on a TI-84 calculator.
the tool is one thing, but you need to know how to explain the choice of such a tool in a scenario
is there a python api for GPU compute without having the user to write cuda code/c code? speed is not paramount but id prefer if theres some python lib that can work with amd gpus too.
im in the process of building an autograd engine
What is the best portfolio? And how long will take to get basic down?
there is no absolute best response, just find something that can use more than one tech at once
udemy
or just if you have a clear idea of project go for it and check the docs ( plus gpt)
need help with this
I don't have a clear project. I'll look into Udemy while I'm working.
thank you
I'll let you know if I have questions or issues along the way
it's a gan
i don't get the idea of checkpoints, why would you need to save the lower versions of the model? Wouldn't you choose the best one?
I'm using TensorFlow, can I see the outputs layer by layer? I want to see how it draws them
- don't lose all progress if something goes wrong and you lose progress before you can save the final model
- run experiments on earlier versions of the model to compare with the final model
you can manually pass the inputs/outputs through each layer to see them
this I think? https://github.com/tensorflow/tensorflow/issues/33478
or just call each layer manually yourself
oh nice I can pass them through each layer
Also from what I understand, "the version that was trained the longest" isn't necessarily the best version, you might overtrain it
This is true.
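A minimal, framework-agnostic sketch of the two checkpoint points above: save every epoch so an interrupted run loses at most one epoch, and pick the best epoch by validation loss instead of assuming the last one is best. `train_one_epoch`, the loss values and the file names are made up for illustration; with a real model you would save the weights with your framework's own save call rather than a JSON dict:

```py
import json
import tempfile
from pathlib import Path


def train_one_epoch(epoch):
    """Hypothetical stand-in for real training: returns fake weights and a validation loss."""
    fake_val_losses = [0.90, 0.55, 0.40, 0.47, 0.61]  # gets worse after epoch 2 (overtraining)
    return {"w": epoch * 0.1}, fake_val_losses[epoch]


def train_with_checkpoints(ckpt_dir, num_epochs=5):
    ckpt_dir = Path(ckpt_dir)
    best_epoch, best_loss = None, float("inf")
    for epoch in range(num_epochs):
        weights, val_loss = train_one_epoch(epoch)
        # Save every epoch, so an interrupted run loses at most one epoch of work.
        path = ckpt_dir / f"epoch_{epoch}.json"
        path.write_text(json.dumps({"epoch": epoch, "weights": weights, "val_loss": val_loss}))
        # Track the best epoch separately; the last epoch is not necessarily the best.
        if val_loss < best_loss:
            best_epoch, best_loss = epoch, val_loss
    return best_epoch, best_loss


with tempfile.TemporaryDirectory() as tmp:
    best_epoch, best_loss = train_with_checkpoints(tmp)
    saved = sorted(p.name for p in Path(tmp).iterdir())
    # Reload the best checkpoint, not the most recent one.
    restored = json.loads((Path(tmp) / f"epoch_{best_epoch}.json").read_text())
```

If disk space is tight, the same loop can delete older files and keep only the latest and the best checkpoint.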
How do you pass input to the layers? I tried this:
```py
def visualize_steps(number = np.random.randint(10)):
    noise = tf.random.normal([1, 100])
    labels = tf.keras.utils.to_categorical([[number]], 10)
    outputs = [[labels, noise]]
    for layer in model.layers:
        output = layer(outputs[-1])
        outputs.append(output)
```
and received this error:
```
output = layer(*outputs[-1])
         ^^^^^^^^^^^^^^^^^^^
raise TypeError('too many positional arguments') from None
TypeError: too many positional arguments
```
I might have missed but I didn't find anything from the docs:
https://www.tensorflow.org/api_docs/python/tf/keras/Layer
is your code using layer(*outputs[-1]) or layer(outputs[-1])?
also number = np.random.randint(10) will suffer from the same problem as mutable default arguments, it'll call once when creating the function then re-use the same result every single time
it's a problem we can't individually pass the input to the layers. I have to make a separate model for each layer and track the inputs/outputs
I found the reason why my models misbehaved
It's a very exotic case of data leakage
Which is in and of itself worth publishing because I didn't do anything different than papers I read
I have a BERT model, a tiny one. If you look at the architecture you see that before the attention it branches into query, key, values, skip. How can I find the node where the branching happens?
@serene scaffold Hello these are some terminologies that were discussed today in the intro and I was all blank..
```
AutoML (Automated Machine Learning) Toolbox
Neural Architecture Search
Algorithm Selection
Metalearning
Automated Reinforcement Learning
Learning Curves
Recommender Systems
Fairness
Green AML
Hyperparameter Optimization
CASH (Combined Algorithm Selection and Hyperparameter Optimization)
```
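Going back to the remark that `number = np.random.randint(10)` in a function signature is only evaluated once: here is a small sketch of that behaviour and the usual `None` sentinel workaround. It uses the standard library `random` in place of NumPy so it runs on its own, and the function names are made up for illustration:

```py
import random


def pick_eager(number=random.randint(0, 10**9)):
    # The default is computed once, when `def` runs, and then reused on every call.
    return number


def pick_lazy(number=None):
    # Use None as a sentinel and draw a fresh value inside the body instead.
    if number is None:
        number = random.randint(0, 10**9)
    return number


eager_values = {pick_eager() for _ in range(5)}  # always a single repeated value
lazy_values = {pick_lazy() for _ in range(5)}    # a fresh draw on each call
```

An explicitly passed argument still wins in both versions, so `pick_lazy(7)` returns 7.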
Oh, this is cool. I've heard that this "gap" is a fundamental problem, it's nice that there are people doing something to make it better
It's worse than that. If I published it as is and got the same reviewers as the papers I read, it'd look amazing
But it's not, (part of) the evaluation was done incorrectly
Then it sounds like that's your most important finding
The first part of the evaluation I copied from papers, and for the second I rolled my own. It was on the handcrafted holdout set that I found that the results were BS
So I think many numbers out there are inflated, because if you don't do due diligence and the reviewer doesn't catch you, you're good lol
It's an easy fix for me, I just need a tuning set that is similar to my holdout set, then the exotic leakage that I encountered will be punished fairly
It comes with pretrained weight. https://github.com/ultralytics/ultralytics
Is this a question?
Since this is an intro, I suppose that you will get familiar with them at the end?
How can I find the node where the branching happens?
I'm not sure what is the question?
Does it make sense for an image dataset (e.g. the handwritten digits 16x16) to face multicollinearity?
I don't see how that makes sense for images as multicollinearity wouldn't really apply? You wouldn't remove columns of image data even if they were correlated?
fun idea, if you play chess on chesscom, you can get all the data of your games and create a postgres db out of it and do data analysis on it
Yes. People do use PCA whitening on images
Interesting. I think that's outside the scope of this course though & applying linear regression.
good day guys, please does anyone know how, or know a good site where, I can use a pre-trained vision transformer for my dataset
is there any more context to the question? at a glance, i would rather assume that the images are vectorized and the question is asking about the vector space spanned by the set of images. that vector space is often of a lower dimension than the number of images that generate it
This course isn't that complex. The goal was to run simple linear regression and KNN on the handwritten image dataset (16x16). In one of the solutions, it states (image). Additionally, some students solutions generated a correlation matrix as part of their EDA.
I didn't create one since I didn't think it's necessary for image data. I wanted to confirm if it was indeed needed or students were running through the motions as they tend to do in these homework reports.
what kind of regression?
Worst of all is the prof created a list of "things to include, if applicable" but students tend to ignore the "if applicable" and add everything.
it really does make sense here. if you wanna do linear regression on the images, you can generate a basis for the image space and just weigh the basis vectors
It's a linear regression for classification.
Classifying* between the numbers 2 and 7
still makes sense though
the covariance approach is probably the simplest thing to try first
it's the same as asking if your images live in a low dimensional vector space and are linearly separable
the "features" are the spanning set of the vector space, and this is lower dimensional than the number of images
You can check the official ViT repo: https://github.com/google-research/vision_transformer?tab=readme-ov-file#available-vit-models
you vectorize the images, make a correlation matrix, then use an EVD (which here is the same as an SVD, and if you center the correlation to get a covariance matrix, it's also the same as PCA). if you don't use a basis but rather an overcomplete spanning set, the regression coefficients are not unique and you'll have a really bad time with classification (not that linear classification is great without using other methods along with it)
thank you
It makes sense. The pixels in the bottom right are quite correlated with each other.
Hmm, good to know in the future. Although homework doesn't cover PCA nor does this course cover anything that complex. Code wise, it was only a few lines.
it's difficult to tell without knowing what any of the functions do there
It's equivalent to:
lr = LinearRegression()
lr.fit(X_train, y_train)
In python, but provides additional auto generated metrics like R & R squared, F statistics, etc that the sklearn doesn't (automatically)
yeah but "linear regression" isn't enough to know what is happening there
what's the spanning set being used?
you can do linear regression in infinitely many ways
what are x_train and y_train here
data is ziptrain, V1 ~ means y is the V1 column of this dataset if I remember correctly
X is all other columns data[:, 1:], Y is the first column. data[:, 0]
yeah then it's exactly as i described before
you're making y out of a linear combination of all the other images
images of a similar type are usually in a low dimensional space, so if you collect several images of the same kind, they're linearly dependent (what you call multicollinearity)
you can get around that by finding a basis, e.g. through evd or svd or pca
it's the same way the old "eigenfaces" algorithm works
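A rough NumPy sketch of the idea above, on synthetic data and not the actual zip-code digits: vectorized images built from a few patterns are linearly dependent, and an SVD of the centered data (equivalently an EVD of the covariance matrix, i.e. PCA) gives a small basis to regress on instead. The sizes, the 5 underlying patterns and the variable names are assumptions made for the example:

```py
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for vectorized images: 200 "images" of 16x16 = 256 pixels
# that are all mixtures of only 5 underlying patterns (so the data is low-rank).
patterns = rng.normal(size=(5, 256))
X = rng.normal(size=(200, 5)) @ patterns

# Pixel columns are linearly dependent: the rank is far below 256.
rank = np.linalg.matrix_rank(X)

# PCA via SVD of the centered data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# EVD of the covariance matrix gives the same spectrum: eigenvalues = s**2 / (n - 1).
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # sorted descending

# Keep a basis of the first k principal directions and regress on those
# k coordinates instead of on all 256 correlated pixels.
k = 5
Z = Xc @ Vt[:k].T
```

Here `Z` has 5 columns instead of 256, and because the toy data really is rank 5, `Z @ Vt[:k]` reproduces `Xc`. Real digit images are only approximately low-rank, so `k` becomes a tuning choice.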
That's right, should have brought this up.
Hey, does anyone know how to convert a tmx file into an obstacle matrix?
for a pathfinding
Yea, gotcha. I'll check for next time.
okay ?
It seemed that I need to know those...
```
RangeIndex: 2249698 entries, 0 to 2249697
Data columns (total 6 columns):
 #   Column           Dtype
---  ------           -----
 0   PRODUCT_ID       int64
 1   TITLE            object
 2   BULLET_POINTS    object
 3   DESCRIPTION      object
 4   PRODUCT_TYPE_ID  int64
 5   PRODUCT_LENGTH   float64
dtypes: float64(1), int64(2), object(3)
memory usage: 103.0+ MB
```
and it has this level of missing values
```
TITLE                   13
BULLET_POINTS       837366
DESCRIPTION        1157382
PRODUCT_TYPE_ID          0
PRODUCT_LENGTH           0
dtype: int64
```
so dropping is bad idea , because bullet_points and descriptions are our features which I will put in model
so what can I do?
none of which i can read
Dropping is not cool but there is no other column so...
Can't do that much magic here it seems
so I added like 'missing' in each null column
for training, can't do anything other than drop here I think
but then my model performance will go down?
what does this error says?
or do I need to show full traceback?
you can't invent data where there are missing values, especially when there aren't a ton of columns to do an imputation technique. where would you see another option here?
okay so dropping the whole row is the best option here?
Why is a dictionary I set using multiprocessor manager, not being acknowledged within the child process ? It was working fine when processor.join() was being used, but I can’t use that because then I can’t run it multiple times at once ( it’s a training process for ML models), which is precisely why I’m using multiprocessing, to be able to simultaneously run it. What’s the work around? Even if it’s being passed directly to the child, it can’t access it properly.
with inplace= True or without?
idk, you either do df = df.drop ... or df.drop(inplace=True), but personally I have no idea if there is an ultimate best
General rule: don't use inplace=True ever
yeah, I just read about that
Why
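A small pandas sketch of the options discussed above, using the column names from the `info()` output but made-up rows. It compares dropping rows where every text feature is missing with filling the gaps with a `'missing'` placeholder, and it reassigns results instead of using `inplace=True`. Whether either option helps the model is an open question here, so treat this as illustration only:

```py
import pandas as pd

df = pd.DataFrame(
    {
        "TITLE": ["mug", "lamp", None, "desk"],
        "BULLET_POINTS": ["ceramic", None, None, None],
        "DESCRIPTION": [None, "warm light", None, "oak"],
        "PRODUCT_LENGTH": [10.0, 30.0, 5.0, 120.0],
    }
)

text_cols = ["TITLE", "BULLET_POINTS", "DESCRIPTION"]

# Option 1: drop only the rows where *all* text features are missing.
dropped = df.dropna(subset=text_cols, how="all")

# Option 2: keep every row and mark the gaps with an explicit placeholder.
filled = df.copy()
filled[text_cols] = filled[text_cols].fillna("missing")

# Both options reassign a new object, so `df` itself is left untouched
# and the calls can be chained with other methods.
```

Reassigning also makes it easy to keep the raw frame around and compare model performance with and without the incomplete rows.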
I’ll look into it and also ask GPT for the Udemy courses that you mentioned.
hey are you available?
good day everyone, please is there anyone who has worked with vision transformer online?
It's better to just ask your question. Saves a lot of time.
omg billyyy
sure sure 😭
(but I don't do vision / CV stuff)
omg, no please don't ask me not to
? I'm saying: Ask your question. If someone can answer it, they will.
so my school gave us a task to use vision transformer to train a model on a stroke dataset. but then its actually so large and heavy for my PC. the training time is mind blowing also. i tried using the pretrained but its giving me a zip error.
is there any way to get this done without killing my laptop and with maybe a little less time?
Hello. A note about terminology: you do not train a dataset. you train a model on a dataset.
any time you refer to an error ("zip error", in this case), always show the whole error message starting from Traceback, and the code that caused it.
thanks about the terminology
alright, when I get back to my pc I will share the whole error
thanks
Hey Im pretty new to this server and I am trying to find some software to help us. Im representing someone who owns a couple clubs (foods/drinks) and the owner wanted some sort of AI that is able to detect products being sold and be graphed (for exp. USER1 - 2 Waffles Sold) be able to detect whos selling them and how many, be able to see what kind of product it is and tell us about it from CCTV footage.
!recruiting
Could you tell me whats the right channel about that stuff? Someone told me I should ask here.
there's no recruiting allowed at all anywhere in the server
Oh so sorry, I didn't know that. Ill delete.
Will an image recognition project in my portfolio land me an internship?
What should I make next to land an internship?
An internship as what, may I ask?
Ml engineer or data scientist
Image recognition is really useful, we are currently in-need of some software that is able to look at hours of cctv footage and give us information about sold products
Im sure, you could easily land as a intern if you are really good as a data scientist
if this is just the start of the course, then just skimming the wikipedia entries for those terms seems like a good start. And then you will know them in better detail at the end.
how about not doing anything? Just proceed as normal?
No, I'm already taken.
I have not worked in ViT either.
maybe start with some numbers?
how big is the dataset? how many images? what's the resolution of each images?
How long is the training? How long per epoch? What's the batch size?
which vit model?
guys I made the layer by layer visualization
Woah that's cool
anything is highly valuable for an internship, you are not expected to do a senior project or something (especially if the internship is free lol)
Wait there's a paid one
thank you 
the dense layer has like 12k data points so u just see those old tv static screen like stuff 🤣
I did paid internship, and the interviewer just asked simple questions if I knew supervised unsupervised ml etc... the more you do the better, but to answer yes this project is very valuable for an internship
from this visualization I am thinking, I wouldn't need the dense layer with 12k data points do I? cuz they get reduced to the 7x7 image from where the drawing starts.
I should directly start from creating the image with no dense layer and just convolutions is what I inferred from this.
So u pay to get a work
I was paid
But I'm guessing ure allowed to put the project on ur resume right
depending on the country internship can be paid
Ohh even better then
I see, did u search for the internship online? what website did u use to get it?
ofc, any project, what i'm saying is what they expect from you is not to be expert, so the idea of project you said (image recognition) is valuable for sure
Ohh I see thanks for the info
you do projects , and add a "project" section in resume with the github link and description
I see,by resume u mean like a CV image or website based or just GitHub cuz all I have is github
you should make your resume on canvas or word
Ohh I see I can do word thanks for the help
then the model will be hallucinating!
that paragraph?
3.2 Dataset Availability
The SEN1-2 dataset is shared under the open access license CC-BY and available for download at a persistent link provided by the library of the Technical University of Munich: ... This paper must be cited when the dataset is used for research purposes
ohh thanks
this is literally 50 GB
pretty sure that's still on the small side for datasets
ohh, so how much you have handled, like highest?
I haven't really worked with any large datasets myself, but some can reach petabyte-scale (especially things crawled from the web)
@spare forum were you able to see what GPT recommended me on to look into at Udemy?
Yes but tbh pick something they will all be fine
I think that link is not working!
Here’s the other option
I can manually browse that topic that you mentioned
Check the site there are evaluations and syllabus the top courses are always good
I’ll look into it. I should probably start with stats
For stats, should I get into it with Excel or not?
@spare forum
Has anybody here been using marimo https://marimo.io/?
What does Excel have to do with this?
You should do it with pen and paper lol
Idk, I had a course in uni excel assignment in statistics
For solving problems that might require a calculator
arent batch normalization layers the same as the previous layers here?
do they not play a role here or?
In the lecture he said "This is an automated machine learning course so if you don't know the basics of machine learning it's better to do that first"
is it too late to drop and get a refund?
and if you can't get a refund, remember the sunk cost fallacy
though I suppose in this case the cost may not have sunk all the way to the bottom, so if you pick up the basics, you may be able to continue with it 🤷♂️
So in a transformer, the queries are "questions" the model asks about surrounding context words, and the keys are the answers "information/content" about a word. But how are these values determined? What mathematics/linear transformation causes queries to become "questions" and keys to become "the information the queries are looking for". I understand the after-effect of dot-product similarity - I'm asking how the values get determined to begin with (why the attention mechanism works)?
I understand that once you have questions, and answers, represented in vectors, if you get dot-product similarity you get attention, but I'm asking how the questions and answers come to be in the first place. Through "weight-updates" is not a good enough answer lol
what exactly are you looking for?
that's really pretty much it
not really
the transformation for Q, K and V are the same mathematically, but they represent different things
I'm asking how
you're thinking too hard about it tbh
you start with vectors in 3 different vector spaces, but you want to be able to compare their similarity
Q, K, and V are matrices that project vectors in those original vector spaces to the same low dimensional vector space where they can now be compared against each other with dot products
since the original vector spaces are generally distinct, the way you project them into the "comparison space" is different
as to how exactly to do it, well that's learned by showing examples of which vectors should be similar
can you expand on this
that is how all neural networks learn
that is also the black box part that is not well understood
it's just a nasty non-convex optimization problem with no general guarantees
The original idea behind Q, K, V was indeed to act like a search system, but what they actually do is hard to tell. The current intuition seems to point in the direction that they basically sufficiently mix things (the inputs). And that the attention part that does this can be replaced with other things that also sufficiently mix things (but are more efficient, also maybe not even learned).
(Then the feed forward layers extract from this giant mixed soup)
(Bordering on something like a reservoir computer (lottery ticket hypothesis comes into play))
you have measurable data y that lives in some space Y. you have good reason to believe that the process that generates this data has parameters x in some space X. the process that generates the data is some function f, so that y = f(x), but you don't know f. so you replace it with a neural network N that has its own parameters w. so now we want y = N(w, x), but the w are unknown. if you have several examples of x and y, you can learn w. for the particular case of attention, a self attention block has w = (Q, K, V). now you present tons of labelled text to learn Q, K, and V
(measurable in that you measure it from somewhere, not in the measure theory sense btw)
and as squiggle says, and someone else has mentioned in this channel over the past few months, you can replace Q, K, and V with other kinds of transformations or force them to have special structure and they still work, because the whole idea is not really well motivated
The main thing that also makes it work is that it parallelizes well (throwing more compute at the problem).
okay, just forget the concept of a query and a key, just in terms of vector spaces, how does Q become Q and K become K (through math)
both start with the same word embeddings yeah? I'm asking how you go from word embedding * WQ -> "queries"
i'm not sure i understand your question
like mathematically/in terms of embeddings what does query mean here?
"query" is not a mathematical term
no but it is math + vector embeddings
what you tell the network to do is a weighted dot product
say you wanna compare a vector v to a vector u, but they're different lengths. we can multiply each by a properly sized rectangular matrix so that they now have the same lengths and we can take the dot product. let's call those Q and K, for example, so that we can do (Qu)^T (Kv)
and now we come along with labelled data and explicitly say "for all of these examples, (Qu)^T (Kv) should be large" (you can write this as an optimization problem)
you can now differentiate the cost function of your optimization problem w.r.t. Q and K and do some gradient based optimization so that they yield good results for all of the examples of u and v you chose
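A toy NumPy sketch of that recipe for a single labelled pair: project `u` and `v` with rectangular matrices `Q` and `K`, score them with (Qu)^T (Kv), and follow the gradient so the score grows. The dimensions, learning rate and step count are arbitrary choices for illustration. Real training uses many pairs and a bounded loss (e.g. softmax cross-entropy) rather than maximizing a raw score:

```py
import numpy as np

rng = np.random.default_rng(0)

u = rng.normal(size=5)  # a vector from one space
v = rng.normal(size=7)  # a vector from a different space (different length)

d = 3  # size of the shared "comparison space"
Q = rng.normal(size=(d, 5)) * 0.1
K = rng.normal(size=(d, 7)) * 0.1


def score(Q, K):
    # (Qu)^T (Kv): dot product after projecting both vectors to the same length.
    return (Q @ u) @ (K @ v)


before = score(Q, K)

lr = 0.01
for _ in range(100):
    # Gradients of the score w.r.t. Q and K (outer products).
    grad_Q = np.outer(K @ v, u)
    grad_K = np.outer(Q @ u, v)
    # Gradient ascent, because for this labelled pair we want the score to be large.
    Q, K = Q + lr * grad_Q, K + lr * grad_K

after = score(Q, K)
```

Nothing in the update tells individual rows or columns of `Q` and `K` what to mean. They simply end up wherever the score for the given examples improves.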
yeah I get that
so which part is troubling you?
I'll tell you but I need to do it through like 3 questions
okay word embeddings * linear query weights. What does each row/column in the query weight matrix represent?
this will lead to my other questions^
nothing in the real world
so the query weights represent nothing?
parameters in neural networks don't usually represent anything useful
the name "query" is also made up
Initially, random projections.
But after training, too complex to assign human meaning to.
yeah mathematically the whole matrix is a projection onto a lower dimensional vector space
okay and why are we projecting to a lower space for queries specifically?
and you find a "nice" projection that works well for the data you showed
it's just even more condensed vector representations (the lower space)?
usually for 2 reasons
one is simplicity. the other is that we often expect that useful data is useful precisely because it is "structured"
and structure often comes in the form of "low dimensionality"
if you explicitly account for this in your model, you end up with fewer parameters and also force the network to find out this structure... hopefully, at least
you've seen bottlenecks in other architectures like u-nets with CNNs, for example. natural images are "low dimensional", for a proper definition of "low dimensional"
Also with too many dimensions combinatorics explode, and that's not something we can compute even with super computers in a reasonable amount of time.
okay, so what does each row + column represent in the context of queries?
nothing
dude that's not possible
you can think of the rows as projection vectors if you like
this is the case for all deep learning, which is why it is referred to as "black box" and interpretability is an open, ongoing research field
it's a heuristic that works very well but is not well understood
well for starters each row is each word as a query right? and each column is one "feature" of the query space?
Imagine you are sitting in front of a box with a bunch of dials, and you have a light on top of the box. Your goal is to change the dials such that the light is as bright as possible. So you start moving the dials and they seem to have various non-linear effects on the brightness. But with a ton of trial and error you end up with a bright light. What do the values of the dials represent?
the numbers are weights of linear transformations applied to the input vectors (words or sentence vectors, for example)
their positions
That is their values. So the values represent the values.
It's just a mechanical detail of how the values are stored.
okay let me ask another question. I understand that if you take the cosine similarity of 2 vectors and it equals 1, they are the same vector. So a big thing in deep learning is the distributional hypothesis, which says that words that appear in similar contexts mean similar things
yeah?
so you use word2vec to find the embeddings of each word, and once they're in a shared vector space the distance between them is their semantic similarity. yeah?
give me 1 min
parallel, not equal
but semantically speaking would be equal
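A quick check of the "parallel, not equal" point, in plain Python with made-up vectors: a cosine similarity of 1 only says two vectors point in the same direction, and their lengths can differ.

```py
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


u = [1.0, 2.0, 3.0]
scaled = [10.0, 20.0, 30.0]  # same direction, 10x the length
other = [3.0, -1.0, 0.5]     # a different direction

same_direction = cosine_similarity(u, scaled)  # 1 up to rounding, yet u != scaled
different = cosine_similarity(u, other)        # strictly below 1
```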
if you have enough labelled examples, this is what you hope your embedding achieves, sure
you have to learn the embedding just like Q, K, and V are learned
right, but there is a logic behind that. Word in similar contexts = similar meanings
that is your motivation
and that is what you hope the embedding achieves
it's not guaranteed that that is what it does
you train it in hopes that you achieve that. that motivates how you perform the embedding and which data you show
so queries = what the word is "looking for", keys = "words that can be offered". But what is the logic behind getting "looking for" vectors and "words that can be offered"? how does that happen? I know word2vec is fundamentally black box but the distributional hypothesis makes sense. What is the "logic" of Q, K and V? how do Q, K and V vectors come to be?
it's motivated somewhat by how database queries work
DL, no guarantees. It's "feels like a good idea" + converted into math + "it seems to work for a bunch of data, maybe this is due to my idea" (but could also be due to other things that are side effects of your idea, you are going off of correlation here).
but i have to highlight again that this is motivation only
just as squiggle says. you get inspiration from something and then try to make an architecture that will promote the behavior you want
you have no guarantees it will work
you also have no guarantees that even if it works, it does so because of your choices
if you make any network deep enough, it'll work regardless of architecture
To get down to what actually mattered, you have to do what many are doing now, which is replacing parts like the attention part with something else, and if it still works that was not it (or what they both have in common).
well just in terms of what we think how Q, K and V comes to be. Attention makes sense due to cosine similarity, but before you get Q * K = Attention, you need to first get queries and then keys
it doesn't need to be a perfect explanation, just maybe what the current guess is as to how it works
there isn't a good one
they had motivation to try that architecture
We introduce meta-prompting, an effective scaffolding technique designed to enhance the functionality of language models (LMs). This approach transforms a single LM into a multi-faceted conductor, adept at managing and integrating multiple independent LM queries. By employing high-level instructions, meta-prompting guides the LM to break down co...
whether that is why it worked is a different question, currently under research
It does all the combinations, which does not say a lot, but makes sense in how it can take everything into account when it backprops (information sent everywhere).
But it also seems pretty clear that it's doing way more work than needed, so it's kind of a brute force approach.
can you expand on this real quick
like how databases store key-value pairs and you query the database to fish out the values you want, so the query needs to specify which keys to look for
that's really about it
you're looking for a deeper meaning where there isn't one tbh
I just don't know how someone can say queries (what the model is asking) * keys (what the vectors are offering) = attention, when no one knows how queries or keys even came to be
Because attention existed prior, and this is a different version of that.
I think the problem is I'm actually looking for a super simple explanation (like how word2vec works via "words that are in similar contexts = the same"), but you guys know a lot about DL so you're actually trying to offer a deeper meaning that isn't discovered/found yet
I think I'm going to find what I'm looking for through the computation graph of the forward pass
keys closely matching a query are the ones whose values you should return when you poll a database
and since matrix multiplication can be written in terms of dot products, which when properly scaled are equivalent to cosine similarity, you can write that as query*key = weight on the value
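To make the database analogy concrete, here is a sketch that contrasts an exact key lookup with a "soft" lookup where every value comes back weighted by how well its key matches the query. The softmax and the 1/sqrt(d) scaling follow the standard scaled dot-product attention formulation rather than anything stated in this chat, and all the numbers are made up:

```py
import numpy as np

# Hard lookup: a query must match a key exactly to get its value back.
table = {"cat": 1.0, "dog": 2.0, "car": 3.0}
hard = table["dog"]

# Soft lookup: keys, query and values are vectors; every value is returned,
# weighted by how well its key matches the query.
keys = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
values = np.array([[1.0], [2.0], [3.0]])
query = np.array([0.0, 1.0])

scores = keys @ query / np.sqrt(keys.shape[1])  # scaled dot products
weights = np.exp(scores - scores.max())
weights /= weights.sum()                        # softmax: weights sum to 1
soft = weights @ values                         # weighted mix of all values
```

The third key matches the query best, so it gets the largest weight, but the other two values still leak into `soft`. That is the difference from a real database query.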
there's no reason why it should be cosine similarity that is used
Because computers (in their current form, and especially GPUs) can do that fast.
well doesn't that part make sense because you can compare vector direction
right, other than matmul being fast
sure, but if you want accuracy there are many other choices of similarity you might consider
and more importantly, this motivation does not mean this is what the network is doing either
DL works at all because we conveniently had an entire evolution of GPUs (for games / movies / etc) that became more general over time and happen to work well for this.
oh I see, you're saying what we planned vs what the model does may be unaligned
(But this also restricts what people do in ML (cosine similarity))
but the motivation was kinda like that. get some ideas from how dbs work, and notice that matmul is fast, can encompass a large number of tokens at the same time, and also represents "similarity" in some sense
(Soon GPUs are getting built-in better support for Fourier transforms it seems (new convolution based shaders for VFX want them), which will be fun)
so you're saying the transformer is doing X (what we think it's doing/what the authors designed it to do) but it could really be doing Y
and we just don't know
there are several papers looking at it, replacing different parts of the model, enforcing structure on Q, K, V, etc
At large enough scale they start doing all kinds of things that we have found in them, including other types of networks and even mimicking some biology / evolved stuff.
For example, if you remove the position encoding it will learn to do it itself.
(And it ends up mimicking grid cells to do this (found in biology for positioning systems))
are you guys aware of any OG papers that explain why deep learning works (or could work)
for example the universal approximation theorem
I heard the OG papers talk about why it works more than the current published stuff
there are several papers presenting different forms of universal approximation
you can also check out papers exploring the different optimizers, since they discuss nonconvex optimization and stochastic approximation
these again work as motivation
most of the general results in universal approximation are not constructive
e.g. they say stuff like "as the number of layers goes to infinity", or "there is a number N of layers so that if you have a network with n > N layers, the error falls below epsilon", but they don't say what N is nor how it can be found
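Written out, that kind of non-constructive statement has roughly this shape. This is only a sketch: $f$ is the target function, $f_{n,\theta}$ an $n$-layer network with parameters $\theta$, and the norm is whichever one the particular theorem uses (all of this notation is assumed here, not taken from a specific paper).

```latex
\forall \varepsilon > 0 \;\; \exists N \in \mathbb{N} \;\; \text{such that} \;\; \forall n > N : \quad \inf_{\theta} \, \lVert f - f_{n,\theta} \rVert < \varepsilon
```

The statement only asserts that such an $N$ exists; it gives neither its value nor a procedure for finding $\theta$.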
so the idea is "more layer good"
there are recent papers for specific architectures explaining under which conditions the training error goes to 0 though
iirc for unrolled ADMM and LISTA with relu activations. there might be others
for universal approx you can just follow the references in wikipedia
here's one with LISTA and ADMM https://ieeexplore.ieee.org/abstract/document/9746860
working on it
word you did a masters in AI?
and how good should someone's math be for phd? is an undergrad in math enough?
So, I've got a random data set with seemingly random columns (id <string>, f_0 to f_9 <random? numbers, although there seems to be a certain distribution>)
I've got this data from an interviewer telling me to perform.. something?
basically i havent been given a specific task, just what the result should look like - IDs and 1s or 0s attached to them
any tips on how I should start?
you can link the homework if you want. visual example is better
example data
example target (yes, the IDs do match)
does this help
the task itself just says "figure it out"
distribution in one pic
and this is as far as correlation coefficients go
It's easier to see with seaborn heatmap
hello, does anyone know how to plot a heatmap with matplotlib.plotly
do you mean matplotlib.pyplot?
plotly is an entirely different package
iirc the default way is just using imshow though - pretty sure they have an example for it in the docs and/or gallery
sorry, meant matplotlib.pyplot
I'm trying to learn, understand and practice using libraries like pandas, seaborn, numpy and so on. I can create a heatmap in seaborn, but I was asked for the same map with matplotlib for the 'car_crashes' dataset
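A minimal sketch of the imshow route in plain matplotlib. The data and column names here are made up so it runs on its own; swap in the correlation matrix of your real numeric columns.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4))         # made-up stand-in for the numeric columns
labels = ["a", "b", "c", "d"]           # hypothetical column names
corr = np.corrcoef(data, rowvar=False)  # 4x4 correlation matrix

fig, ax = plt.subplots()
im = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
fig.colorbar(im, ax=ax)

# Label the ticks with the column names.
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)

# Write each value into its cell, similar to seaborn's annot=True.
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax.text(j, i, f"{corr[i, j]:.2f}", ha="center", va="center")

fig.tight_layout()
```

Drop the `matplotlib.use("Agg")` line in a notebook, and call `plt.show()` or `fig.savefig(...)` at the end to actually see the figure.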
Yup that means a PCA can be interesting (also sorry but you can do the argument annot=True with your heatmap if you want but np )
Generally I do like sns.heatmap(round(df.corr(),2), annot=True)
alr did, factorized some non-numerical stats and added the target column
I'd assume the target is achieved by combining columns that have lower corr coeff
also worth mentioning, this is probably not the way to check correlation between categorical and numerical data
Hey guys, can someone help me? I’m having trouble importing a TICKER with yfinance. I want to hide the error if the ticker doesn’t exist in the Yahoo Finance database, but currently, the terminal shows this error:
catch the error
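Roughly what "catch the error" looks like, as a sketch. `load_ticker` and `TickerNotFound` are hypothetical stand-ins, since the actual call and error aren't shown; replace them with the real download call and whatever exception it raises.

```python
import logging

class TickerNotFound(Exception):
    """Hypothetical error type standing in for whatever the real call raises."""

def load_ticker(symbol):
    """Hypothetical stand-in for the real download call."""
    known = {"AAPL": [189.5, 190.1]}  # made-up prices
    if symbol not in known:
        raise TickerNotFound(symbol)
    return known[symbol]

def safe_load(symbol):
    """Return the data for symbol, or None if it cannot be loaded."""
    try:
        return load_ticker(symbol)
    except TickerNotFound:
        return None  # swallow the error; the caller decides what to do

# If the library reports problems through the logging module instead of raising,
# raising that logger's level hides the messages. The logger name "yfinance"
# is an assumption here; check which logger actually emits the message.
logging.getLogger("yfinance").setLevel(logging.CRITICAL)
```

Catching a specific exception type rather than a bare `except:` keeps unrelated bugs visible.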
https://seaborn.pydata.org/generated/seaborn.pairplot.html how about a pair plot?
or now that you have a target, just plot each against the target
yes.
https://github.com/aprbw/ArianDLPrimer/tree/master
read everything under basic concepts and useful concepts.
These are some "theories" on how DL works, coz, as you figured out, nobody knows it yet....
you can use my list.
You can skim the stuff under basic concepts. If you can follow those, I think you got a good minimum to start a PhD.
I don’t think it’s a good idea to reference any heavy-hitting DL stuff after just doing MLP
There is also another list if you want a second opinion https://kidger.site/thoughts/just-know-stuff/
what do you mean?
I think there’s a lot of angles to study deep learning but from your list you went from KA Math -> NYU deep learning course
Well... there is the andrew ng course in the middle hahaha
and yes, there is a "speed run" feel to my list, you are correct
I think for PhD you need a shit ton of math
That's true. My list was not specifically for PhD.
But I think you missed what I meant.
What I meant was for you to read the content of "Basic Concepts" https://github.com/aprbw/ArianDLPrimer/tree/master?tab=readme-ov-file#basic-concepts
and not "Basics (from literal zero)" https://github.com/aprbw/ArianDLPrimer/tree/master?tab=readme-ov-file#basics-from-literal-zero
So, if you can follow along with the math in https://arxiv.org/abs/2104.13478 and https://arxiv.org/pdf/2304.12210 then I think you can be confident to start a PhD
The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Indeed, many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact feasible with appropriate computational scale. Remarkably,...
so that last matrix highlighted cell represents a weighted sum of all the words' features along a single dimension * the attention score of word 1 x word 1. I'm just wondering what getting a weighted sum of all the values * the attention score does?
I skimmed this video, seems like it answers your questions: https://www.youtube.com/watch?v=eMlx5fFNoYc
Demystifying attention, the key mechanism inside transformers and LLMs.
the answer is the highlighted cell is just the final "meaning" of 1 feature of the word yeah? you take the word (attention row 1) * all words' values (col 1), get R1C1 in the output matrix. The weighted sum = the attention scores of that word with all other words * the contribution of the meaning of all other words (value vectors col 1), results in how much 1 feature from all words contributes, both attention-wise and semantically, to word 1?
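A small NumPy sketch of that reading, with made-up sizes and random Q, K, V (indices are 0-based, so "R1C1" is `out[0, 0]`). It checks that one output cell is the attention row of word 1 times the first column of the values.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_words, d = 4, 3                  # made-up sizes
Q = rng.normal(size=(n_words, d))
K = rng.normal(size=(n_words, d))
V = rng.normal(size=(n_words, d))

A = softmax(Q @ K.T / np.sqrt(d))  # row i: how much word i attends to each word
out = A @ V                        # row i: new representation of word i

# The cell in row 0, column 0 is a weighted sum over ALL words j:
# (attention of word 0 to word j) * (feature 0 of word j's value vector).
cell = sum(A[0, j] * V[j, 0] for j in range(n_words))
```

So the cell uses the whole attention row for word 1, not only the word 1 x word 1 score; each row of `A` sums to 1, which makes the result a weighted average of that feature across all words.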
@dusky pagoda hey
heres the code i have for the a.i. i mentioned yesterday
how and where to learn ai
and data science
and machine learning and deep learning
youtube
how old are you?
yes
neither my master nor phd are in ai, i do sigproc. most of the math and methods carries over transparently. an undergrad in math should be more than enough to put you in a great position for all of this stuff. except for a handful of specific things, that's possibly more math than you'd see in engineering phds
Hey, about to download 9,000,000 images to train a model. Where should I store them for easy access? I’m worried that if I put them on S3 I’ll have to redownload it all if I want to train something.
how big is the file?
it might be best to preprocess the images to the scale of the a.i. input so that way you dont have to process them for training and also to save storage size
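Back-of-the-envelope numbers for why resizing first matters. The 224x224 model input and 1024x1024 original size are assumptions for illustration only, and these are uncompressed sizes, so JPEGs on disk will be considerably smaller.

```python
def raw_dataset_bytes(n_images, height, width, channels=3, bytes_per_value=1):
    """Uncompressed size in bytes of n_images stored as uint8 pixel arrays."""
    return n_images * height * width * channels * bytes_per_value

n_images = 9_000_000
small = raw_dataset_bytes(n_images, 224, 224)    # assumed model input size
large = raw_dataset_bytes(n_images, 1024, 1024)  # assumed original image size

print(f"{small / 1e12:.2f} TB at 224x224")
print(f"{large / 1e12:.2f} TB at 1024x1024")
```

Under those assumptions the resized set is roughly 20x smaller than the originals, which can be the difference between fitting on one local disk or not.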
I'm having a hard time calculating the output shapes of conv2d and conv2d transpose layers with padding, strides and kernel size
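A sketch of the per-dimension output size formulas, using the convention from the PyTorch `Conv2d` / `ConvTranspose2d` docs (explicit integer padding on both sides). Frameworks with "same"/"valid" padding modes work out the padding differently, so treat this as one convention, not the only one. Apply it to height and width separately.

```python
def conv2d_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a 2D convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def conv_transpose2d_out(size, kernel, stride=1, padding=0,
                         output_padding=0, dilation=1):
    """Output length along one spatial dimension of a transposed 2D convolution."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)

# 3x3 kernel, stride 1, padding 1 keeps the size.
same = conv2d_out(32, kernel=3, stride=1, padding=1)

# 4x4 kernel, stride 2, padding 1 halves it, and the transpose doubles it back.
down = conv2d_out(32, kernel=4, stride=2, padding=1)
up = conv_transpose2d_out(down, kernel=4, stride=2, padding=1)

# The floor in the conv formula loses information: 7 and 8 both map to 3,
# so output_padding picks which size the transpose should restore.
back_to_7 = conv_transpose2d_out(3, kernel=3, stride=2)
back_to_8 = conv_transpose2d_out(3, kernel=3, stride=2, output_padding=1)
```

A quick sanity check when designing an encoder/decoder pair is to run the sizes through both functions layer by layer and confirm the decoder lands back on the input size.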