#data-science-and-ml

abstract wasp
#

yeah that makes sense

junior iris
#

can someone please point me to the correct area for help with area detection?

pine escarp
junior iris
#

will a video help?

pine escarp
#

Sure why not.

hearty crow
hearty crow
#

Yes

pine escarp
hearty crow
#

It's around 200-300 lines

pine escarp
#

Failed to load cause lots of lines ig

hearty crow
#

Can you help with the issue

hearty crow
pine escarp
#

and its loading

pine escarp
hearty crow
#

It includes the output lines

pine escarp
#

Oh fair!

hearty crow
#

Code is only around 200 lines and pretty much understandable

pine escarp
#

very big data though ngl

pine escarp
#

you are web scraping right?

hearty crow
#

Yes

#

Don’t run the cell; instead just copy the output, then proceed

pine escarp
#

that main func

#

shows every single data you scraped

#

can you make it not show it?

#

the notebook would look clean and better if you did that

hearty crow
#

Sure

uncut grove
#

hello all, I have a query.
I have 3 datasets that store hotel info [name, address, city, state, etc.]:
A[c1,c2,c3], B[c3,c4,c5], C[c5,c6,c7]
[c1, c2, c3, etc. are columns]
Problem statement:
1] Find the hotels that are common to at least 2 of these datasets.
2] Find the hotels that are common to all datasets.

My logic was:
First, inner join A & B on column c3, giving table AB[c1,c2,c3,c4,c5].
Then inner join B & C on column c5, giving BC[c3,c4,c5,c6,c7].
Then inner join C & A with left_on = c5 and right_on = c1, giving CA[c5/c1,c6,c7,c2,c3].

Now I decided to do an outer join on all 3 pairs.
This is where I am confused: which columns should I choose to join these pairs so that the result includes the hotels that are common to all tables?

#

the final data should be in this format [as image][yellow blocks means the hotel is common in those 2 tables]

river cape
#

Hey guys I am looking for ideas which I can use for my final year project.

#

Would be great if you could send some ideas

#

Any idea which can help in day-to-day life

left tartan
#

Or, to generate that table, you could group by the hotel name, and use a count to determine if it's in A, then another count for B, then another count for C
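
A minimal sketch of that groupby/count idea, assuming the three tables can be reduced to one shared hotel key. The `name` column and the toy rows below are invented for illustration; real data may need name plus city, or some normalisation of spelling first.

```python
import pandas as pd

# Hypothetical toy data; "name" stands in for whatever key identifies a hotel.
A = pd.DataFrame({"name": ["Ritz", "Plaza", "Savoy"]})
B = pd.DataFrame({"name": ["Plaza", "Savoy", "Astor"]})
C = pd.DataFrame({"name": ["Savoy", "Astor", "Grand"]})

# Stack the keys from all three tables, remembering where each row came from.
stacked = pd.concat(
    [df[["name"]].drop_duplicates().assign(source=label)
     for label, df in (("A", A), ("B", B), ("C", C))],
    ignore_index=True,
)

# One row per hotel, one Boolean column per dataset.
presence = pd.crosstab(stacked["name"], stacked["source"]).astype(bool)
presence["n_datasets"] = presence[["A", "B", "C"]].sum(axis=1)

in_at_least_two = presence[presence["n_datasets"] >= 2]
in_all_three = presence[presence["n_datasets"] == 3]
print(presence)
```

The `presence` table is roughly the "yellow blocks" layout from the question: each True marks a dataset the hotel appears in, so no pairwise joins are needed.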

errant bison
#

Can u dm links for ai communities

rigid timber
#

How do I fix the fbgemm.dll error

agile cobalt
#

which os? which instructions are you following? (link)

agile cobalt
rigid timber
#

is it for CPU?

agile cobalt
#

I would recommend against manually installing the dll to system32 btw - just avoid 2.4.0, either install an older version or a more recent version

rigid timber
#

alright thank you

agile cobalt
#

looks like 2.4.1 is scheduled for September 4?

maybe just use 2.3 until then

rigid timber
#

I think the torch version was the issue because I ran my script on another device and it worked fine

azure sequoia
#

how do social media apps like instagram and tiktok recognize content in images/videos..? if someone likes a lot of posts with plants they are going to get more plants and maybe even pictures about outdoor stuff and DIY videos?

how can i train a model like this on a small scale just for my portfolio?

wet mist
#

how to get proxies? and give me best site where to buy http/https proxies

small wedge
wet mist
agile cobalt
arctic wedgeBOT
#

5. Do not provide or request help on projects that may violate terms of service, or that may be deemed inappropriate, malicious, or illegal.

vast thorn
#

hello guys, I made a neural network and a tokenizer and all of that, however I need training data. The goal of the project is to make an AI that could solve math problems, ones that are described in paragraphs or just a plain equation

past bramble
#

day 5 kaggle report: busy today no progress

vast thorn
#

i started coding when i was 8 but well im still a highschool student 9th grade

#

ohhh well, actually that is a good idea, since neural networks aren't good at giving a straight answer when it comes to numbers

#

ok thanks i will try to do that

#

ok and again thank you a lot

jaunty helm
azure sequoia
azure sequoia
#

i need the model to at least look into the image and description to kind of put that post into a "genre" that a user might enjoy

toxic mortar
#

I have deployed my pickled model on Hugging Face. How can I set it up to always pull the 'latest' version in my application? I want to ensure that my app automatically uses the most recent model version, even if it changes in the future, without needing to redeploy the app each time. Thanks 😄

unkempt apex
toxic mortar
unkempt apex
#

what do you mean by that? are you considering on web stuff?

#

like you said "app" is that "webapp"

faint quail
#

I am trying to overfit my bounding box regression problem on only 10 datapoints (one iteration is a full epoch)

I notice that my coordinate loss and no_object_loss (anchors with no object) are decreasing, yet my object_loss (active anchors) is increasing. Is this normal?
Also, I don't have any classes, so there is no class loss.

faint quail
#

nvm, I figured out why: my no_obj_loss was too high, so my model was just predicting zero for all the presence scores, which in turn makes the object_loss (active box loss) go up
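
To illustrate the effect described above, here is a small sketch of a presence-score loss split into object and no-object terms. It is not the poster's loss function; the `lambda_noobj` weight and the per-group averaging are assumptions borrowed from common YOLO-style setups.

```python
import numpy as np

def objectness_loss(pred, target, lambda_noobj=0.5, eps=1e-7):
    """Binary cross-entropy on presence scores, split by anchor type.

    pred   : predicted presence probabilities in (0, 1), shape (N,)
    target : 1.0 for anchors that contain an object, 0.0 otherwise, shape (N,)
    lambda_noobj is a hypothetical weight that scales the no-object term down.
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    obj_mask = target == 1.0
    noobj_mask = ~obj_mask

    # Average each term over its own anchors so the many empty anchors
    # do not swamp the few active ones.
    obj_loss = -np.log(pred[obj_mask]).mean() if obj_mask.any() else 0.0
    noobj_loss = -np.log(1.0 - pred[noobj_mask]).mean() if noobj_mask.any() else 0.0
    return obj_loss + lambda_noobj * noobj_loss, obj_loss, noobj_loss

# 2 active anchors out of 10; the model predicts "no object" everywhere.
target = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=float)
all_zero_pred = np.full(10, 0.01)
total, obj_loss, noobj_loss = objectness_loss(all_zero_pred, target)
print(obj_loss, noobj_loss)
```

With near-zero predictions everywhere, the no-object term is tiny while the object term is large, which matches the pattern of one loss falling as the other rises.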

versed gulch
#

My 3D grayscale images are (32, 256, 256) (D, H, W) - skimage but monai reads them as (256, 256, 32) is this correct?

verbal oar
#

I think order depends on library generally, here not sure

#

for example like pytorch channel-first, tensorflow channel-last

versed gulch
#

but monai is built upon pytorch and gives this

#

so shouldnt it also be (32, 256, 256)?
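
If the goal is only to get back to (D, H, W), a sketch along these lines may help. It assumes the loaded array really is (H, W, D); with H equal to W the shape alone cannot distinguish that from (W, H, D), so checking with a non-square test volume or the image metadata would be safer. The small array sizes are made up.

```python
import numpy as np

# Stand-in for a volume that was loaded with the depth axis last: (H, W, D).
# Hypothetical sizes are kept small and distinct so the axes are easy to follow.
vol_hwd = np.arange(4 * 5 * 3).reshape(4, 5, 3)

# Move the last axis to the front to get (D, H, W); this returns a view.
vol_dhw = np.moveaxis(vol_hwd, -1, 0)

print(vol_hwd.shape)  # (4, 5, 3)
print(vol_dhw.shape)  # (3, 4, 5)
```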

ember sluice
#

I wanted to filter data according to their genre

filtered_genre = df1[df1['genre'].str.contains('Comedy' and 'Horror')]
filtered_genre

When I generate it, it gives me all data with sometimes only Comedy showing up and a bunch of other genres.
I only want those when the two of them appear together. Is there a workaround here?

untold bloom
#

(for a one-pass solution, you can craft a unnecessarily complex regex but I doubt it will be better (in many metrics) than this two-pass solution)

#

important note: try "and" here instead of "&" and see it fail; it will then be a pandas-specific "and" issue

untold bloom
#

in Python, "and" is a binary operator that yields its first argument if it is "falseful"; otherwise it yields the second argument, whatever it is

#

similarly, "or" is a binary operator that yields its first argument if it is "truthful", otherwise the second argument

#

these have the so called "short-circuiting" behaviour -- if the first argument can be returned according to their criteria, they don't even look at the second guy

#

!E

print(0 and undefined_name_here_but_nobody_cares(here, too))
arctic wedgeBOT
serene grail
untold bloom
#

so the "and" operator will query the first argument's "truthful"ness (so it can employ its short-circuiting behaviour)

#

but the truthfulness of pandas objects (Series and DataFrames) is deemed to be "ambiguous"

#

like the "normal" Python objects such as 0, "", [] are all falseful, and -55, "ok", (5, "f") are all truthful

#

but what about, e.g., pd.Series([ False ])?

#

a) it is truthful because it has 1 element
b) it is falseful because all it has is falseful element(s)

#

hence, ambiguity; so bool(...) on them (which "and" and "or" implicitly query) will error

#

now the other side of the problem: we want to combine two (or more) Boolean arrays (so-called "mask"s) to achieve the disjunction/conjunction we want

#

like the example above had:

#

contains("comedy") and contains("horror")

#

now the individual parts are all okay -- they are Boolean arrays of length N, same as the column

#

contains("comedy") => pd.Series([True, False, False, ...])
contains("horror") => pd.Series([False, True, False, ...])

#

now we want both comedy and horror

#

so use "and"? no; as mentioned, it's "forbidden"

#

so the next best thing was the infix & operator, which is, in pure Python, used for bitwise-and operation (and also set intersection)

#

and unlike "and", the "&" operator is overridable in a custom class via the __and__ dunder method, so all is fine
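
Putting the explanation together for the genre question, a two-pass version might look like the sketch below. The `df1` contents are a made-up stand-in for the real data.

```python
import pandas as pd

# Hypothetical stand-in for df1 from the question.
df1 = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "genre": ["Comedy, Horror", "Comedy", "Horror, Drama", "Horror, Comedy"],
})

# 'Comedy' and 'Horror' is evaluated by Python first and yields just 'Horror',
# so str.contains never sees 'Comedy' at all.
print("Comedy" and "Horror")  # Horror

# Build one Boolean mask per genre, then combine them element-wise with &.
is_comedy = df1["genre"].str.contains("Comedy")
is_horror = df1["genre"].str.contains("Horror")
filtered_genre = df1[is_comedy & is_horror]
print(filtered_genre["title"].tolist())  # ['a', 'd']
```

When the masks are written inline, each comparison needs its own parentheses, because "&" binds more tightly than comparison operators.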

serene grail
#

Wow, cool
Thanks for the explanation!

untold bloom
#

sure thing

#

example to show what the rambling has been about:

In [55]: a = pd.Series([True, False, False])

In [56]: b = pd.Series([True, True, False])

In [57]: a and b
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-57-61df3bd186ad> in <module>
----> 1 a and b

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1536     def __nonzero__(self):
   1537         raise ValueError(
-> 1538             f"The truth value of a {type(self).__name__} is ambiguous. "
   1539             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1540         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

In [58]: a & b
Out[58]:
0     True
1    False
2    False
dtype: bool
serene grail
#

Yeah that's what I would expect based on the explanation! Nice

ember sluice
untold bloom
#

yeah don't worry, in numpy and pandas world this "and" vs "&" is a source of confusion at the beginning, but after you see the difference, it will be okay from then on

similarly, the "and" issue in pure Python is also a source of unintended behaviour: Python is in many ways natural to write (they say it's almost pseudocode), so people rightfully expect "and" to work that way too, but the reality is that Python is not that natural

#

it's so common actually that this server has a tag (or whatever it's called) explaining this:

#

!or

arctic wedgeBOT
#
The or-gotcha

When checking if something is equal to one thing or another, you might think that this is possible:

# Incorrect...
if favorite_fruit == 'grapefruit' or 'lemon':
    print("That's a weird favorite fruit to have.")

While this makes sense in English, it may not behave the way you would expect. In Python, you should have complete instructions on both sides of the logical operator.

So, if you want to check if something is equal to one thing or another, there are two common ways:

# Like this...
if favorite_fruit == 'grapefruit' or favorite_fruit == 'lemon':
    print("That's a weird favorite fruit to have.")

# ...or like this.
if favorite_fruit in ('grapefruit', 'lemon'):
    print("That's a weird favorite fruit to have.")
untold bloom
#

(it's "and"s cousin "or" but the underlying misapprehension of the workings is the same)

heavy crow
#

I have a large binary file containing a flat array of records. I now want to parse them into a dataframe. Is there a better way of doing this?

import struct
import pandas as pd

record_struct = struct.Struct("<....")

def read_bin_file(filename):
    # Yield one unpacked tuple per fixed-size record.
    with open(filename, "rb") as f:
        while chunk := f.read(record_struct.size):
            # A short final chunk means a truncated record; skip it.
            if len(chunk) == record_struct.size:
                yield record_struct.unpack(chunk)

columns = [...]
df = pd.DataFrame(read_bin_file(filename), columns=columns)
#

The struct has 41 fields, and a total of 385 bytes.
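
One option that might be faster for a flat array of fixed-size records is a NumPy structured dtype, which reads the whole file in one call instead of unpacking record by record. The sketch below uses a hypothetical 3-field layout (`id`, `price`, `qty`) in place of the real 41-field struct, which is not shown in the question; the real dtype would need one entry per field, and the size check helps catch a mismatch.

```python
import os
import struct
import tempfile

import numpy as np
import pandas as pd

# Hypothetical 3-field record standing in for the real 41-field layout.
record_struct = struct.Struct("<Ifd")  # uint32, float32, float64, no padding
record_dtype = np.dtype([("id", "<u4"), ("price", "<f4"), ("qty", "<f8")])
assert record_dtype.itemsize == record_struct.size  # layouts must agree

def read_bin_file_numpy(filename):
    # Read every complete record in one call; a trailing partial record is ignored.
    n_records = os.path.getsize(filename) // record_dtype.itemsize
    records = np.fromfile(filename, dtype=record_dtype, count=n_records)
    return pd.DataFrame.from_records(records)

# Small round trip: write two records with struct, read them back with NumPy.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.bin")
    with open(path, "wb") as f:
        f.write(record_struct.pack(1, 1.5, 10.0))
        f.write(record_struct.pack(2, 2.5, 20.0))
    df = read_bin_file_numpy(path)

print(df)
```

The field names double as the DataFrame columns, so the separate `columns` list is not needed in this variant.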

jaunty helm
verbal oar
#

is mxnet used, I think rather pytorch is more used right?

#

I see one book based on mxnet, Dive into Deep Learning, which has an mxnet version but also a pytorch one

serene scaffold
jaunty helm
verbal oar
#

so its like for learning purposes is mxnet?

serene scaffold
#

I don't think it's for "learning purposes". I think it's just a platform that failed to catch on.

jaunty helm
serene scaffold
#

keras was designed as a way of teaching neural networks. you should either pick a platform that was designed for learning, or one that's used in "real situations".

#

and I've never had a single coworker or university colleague produce tensorflow code (unless it was keras).

jaunty helm
#

honestly idk wth is going on with keras
like it merged with tf, but then it became its own thing and now supports jax tf and pytorch?

untold bloom
jaunty helm
#

well, it'd then look something like (?=.*Comedy)(?=.*Horror) ig
actually that'd not account for stuff with Horror in front... yeah just do the 2 .contains, is prob better
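
A quick check of that lookahead pattern, in case the one-pass route is still of interest. In this small test it also matches when Horror comes first, since both lookaheads are evaluated from the same start position; text containing newlines would likely need `re.DOTALL`. The sample strings are made up.

```python
import re

import pandas as pd

pattern = r"(?=.*Comedy)(?=.*Horror)"

# Both lookaheads start from the same position, so order should not matter.
print(bool(re.search(pattern, "Comedy, Horror")))  # True
print(bool(re.search(pattern, "Horror, Comedy")))  # True
print(bool(re.search(pattern, "Comedy, Drama")))   # False

genres = pd.Series(["Comedy, Horror", "Horror, Comedy", "Comedy, Drama"])
one_pass = genres.str.contains(pattern)
two_pass = genres.str.contains("Comedy") & genres.str.contains("Horror")
print(one_pass.equals(two_pass))  # True
```

The two .contains calls remain easier to read, which is a fair reason to prefer them.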

past meteor
#

Mxnet is Amazon's preferred framework. They make a lot of time series stuff so if you're going in deep there you may need mxnet

#

As for the overdone Pytorch vs Tensorflow discussion. Just use Torch because that's where all the work is done nowadays

#

TF remains better in specific situations (TF lite comes to mind) but you can use ONNX with torch

#

Plus how likely is it that you have those use cases?

fiery bane
#

I use pytorch

past meteor
#

Same and same but you can't deny that when Amazon releases sota stuff it's in mxnet

#

If you're willing to port it to Torch or wait until someone else does sure

fiery bane
#

Can you give examples?
What is sota that is currently on mxnet?

past meteor
#

Honestly, a good example is DeepAR when it got released

#

It's ported now

fiery bane
#

Like, maybe that was true 5 years ago. But on that list, the latest mxnet-only stuff is from 2021.

pearl parrot
#

After finishing basics, what should I learn next?
(I want to work with ML, I will like it if someone replies with a complete roadmap, cuz I’m lost)

rigid timber
#

C:\Users\muham\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default.
This error shows up when I try to load this model
https://huggingface.co/Charangan/MedBERT/tree/main

quaint mulch
agile cobalt
past meteor
#

I read the docs and it's similar enough to torch/TF

#

They are all similar. The bigger problem is just duplicating code because it's not just the models, it's also the data access you need to duplicate. I want just 1 framework per project and if you pick one it's going to be torch 9/10

buoyant vine
#

This was something that always surprised me that AWS support MXnet despite it being dead

#

yet most of their inf systems dont support onnx

pearl parrot
#

Can someone share me the code for the hand detector thingy

agile cobalt
# pearl parrot Can someone share me the code for the hand detector thingy

you'll have to be more specific
https://github.com/topics/computer-vision has a bunch of repositories that might be relevant, e.g. https://github.com/CMU-Perceptual-Computing-Lab/openpose has some hand pose estimation

GitHub

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation - CMU-Perceptual-Computing-Lab/openpose

main fox
past bramble
#

day 6 kaggle report: Learned deeper about convolutions and their math

pine escarp
#

a good video i found

strong cove
warm mortar
#

Could someone write me a Google Colab notebook which upscales both images and videos using ESRGAN and RealESRGAN, in which I could load my media and model? Kindly do respond.

serene scaffold
warm mortar
#

Not a whole 10gb software or something

serene scaffold
warm mortar
#

Who has full command over python and ESRGAN

serene scaffold
# warm mortar Refer me to a person

We don't have referrals for that.

I've told you before that this server is a place for learning how to program. If you require code that does something, you need to be willing to learn how to do what it would take to create that code. You keep asking for people to do entire tasks for you, without any effort on your part.

warm mortar
serene scaffold
strong cove
#

Hello do any of u guys know where I could get a dataset of text for training an ai chatbot or if not what I should train it on??

versed bough
#

Can I ask yall for an opinion on something ?

#

I’ve been working on a machine-learning-based trading application for a long time, with the ability to train models on different stocks with different specifications. I planned on licensing it out under 3 different plans, each with different add-ons, specifically more trading threads and training threads.

I’ve been stressing because after I implemented progress bars for training, whenever I trained two models at once, one of the threads kept crashing around 76%. So I spent a solid 6 hours going through every line of code associated with the training process and anything that updated the UI. I made it more efficient and used a queue so different operations wouldn’t clash with one another. After that it would get to around 96% for each model, then one thread would crash. Then I noticed that the number of windows and other applications I had running was a little insane, so I closed at least half, and finally both models trained fine all the way.

The progress bars make the calls on the UI much more frequent and, in sum, a lot more computationally heavy, and I know that 99% of people are going to want to track their progress and see how much time is left until a model is done training. So I’m at a fork in the road, because I planned on having up to 3 training threads available at a time, and it was working perfectly fine with three models training at once up until I added the progress bars. I’m trying to decide whether I should offer up to two, with the second as an add-on, or keep it at one thread.

versed bough
versed bough
versed bough
# pearl parrot After finishing basics, what should I learn next? (I want to work with ML, I wi...

The easiest way to learn is to actually build. It’ll serve you better than going through tutorial hell. When you get stuck on a problem and work nonstop on fixing it for hours on end, maybe days, then once you actually fix it you’ll never run into that same issue again. I started with a simple model that predicted the outcome of a soccer game. Downloaded a dataset online, and made it. Wasn’t too complex, but it was a start. Then voice recognition for an assistant, and so on from there

#

Learning the different model architectures is a must when it comes to deciding which to implement based off a specific goal in mind

iron basalt
versed bough
#

It’s the fact that tkinter is being used as the UI framework

iron basalt
versed bough
#

What else would u suggest

#

I switched from tkinter to custom tkinter

#

Just for the actual looks of it

iron basalt
#

Anything else, tkinter is for small little widgets. It was meant for making micro UIs for stuff that would normally be shell commands.

versed bough
#

But compared to other frameworks it seems like Ctk is the nicest looking one

iron basalt
versed bough
#

I mean I’ve used pygui in the past, it just wasn’t as aesthetically appealing. As is my logic right now, it’s functioning how it needs to. So I may just first launch it with ctk, and later on switch over to something like pygui if need be. But the only issue I was having was the multi threading, calling on updating progress bars, on top of the actual computationally heavy task of training models , so it may not be a huge significant difference switching over

#

I’ve got a dev tab right now that has print statements pasted onto there, which also use the same queue, so I know the production one will still be more efficient

iron basalt
#

The computation should all be in the training, UI should be instantaneous.

#

Unless you are doing something really wrong.

#

Make sure your threading is actually doing threading, not like Python GIL stuff.

iron basalt
#

(CPython)

versed bough
serene grail
iron basalt
#

Putting aside tkinter specifically for a second, it should take about a few nanoseconds (move this up to microseconds for Python) to update all the progress bars.

#

(Although there are a few milliseconds delay to see the change visually on screen)

#

If your Queue intake is overwhelmed, the Queue will keep growing until you run out of memory.

#

And the progress bar updates will be behind what they actually are.
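
One way to keep the queue from backing up, sketched under assumptions: workers push (model, fraction) updates, and the UI side empties the queue on a timer and keeps only the newest value per model. The `train` function is a hypothetical stand-in for a training loop, and the names are made up.

```python
import queue
import threading

progress_q = queue.Queue()

def train(model_id, steps):
    # Hypothetical stand-in for a training loop; it only reports progress.
    for step in range(1, steps + 1):
        progress_q.put((model_id, step / steps))

def drain_latest(q):
    """Empty the queue and keep only the newest progress value per model."""
    latest = {}
    while True:
        try:
            model_id, fraction = q.get_nowait()
        except queue.Empty:
            return latest
        latest[model_id] = fraction

workers = [threading.Thread(target=train, args=(name, 1000)) for name in ("m1", "m2")]
for w in workers:
    w.start()
for w in workers:
    w.join()

# In a real UI this would run periodically on the UI thread, not once at the end.
latest = drain_latest(progress_q)
print(latest)
```

In tkinter the periodic call could be scheduled with the widget `after` method, so only the UI thread ever touches the progress bars and each redraw handles a whole batch of updates.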

versed bough
#

Then what else would be causing it to say something something async handler deleted by wrong thread , and doing this changed it

#

Then honestly the issue may have been memory

#

Even after deleting most of the things I had open, I checked my memory usage and it was at 70%

iron basalt
#

I think it's clearly some bug, which can be expected since this is threading.

versed bough
#

I’m just not sure what else it would be

iron basalt
#

You may also want to use multiprocessing instead of threading.

#

This will isolate each in its own process to insulate from crashes during training, and also gives you true parallelism, since each process has its own interpreter and GIL.

versed bough
#

U think creating separate environments for each “thread” would resolve issues

#

Yeah might just do that

iron basalt
#

Yeah, that's what a process is.

versed bough
#

It’s just going to be thousands and thousands of lines of more code 💀

iron basalt
#

Heavy thread that does not share the memory space with the parent.

#

Multiprocessing is straight forward in Python.

versed bough
iron basalt
# versed bough I’m sorry I’m not very familiar with that

We take multithreaded code for granted, but what's needed to make it work properly? We need two Dr Steve Bagleys to illustrate this!

faint quail
# iron basalt Multiprocessing is straight forward in Python.

with the exception of not having shared memory by default (i.e. attributes on a class instance's self won't update across Processes), the difference is literally

import threading
threading.Thread(target=foo, args=(bar,)).start()

vs:

import multiprocessing
multiprocessing.Process(target=foo, args=(bar,)).start()
#

just read the documentation if you need to understand it better

thorn flame
#

Hey folks, I'm trying to implement some ml model in a lang other than Python, cos of real-time processing constraints on a cloud server accessible via API (most likely WebSocket). I'm considering C or Rust, but it does seem C++ has better ml ecosystem vs C. Or do I proceed with Python? I'd love opinions.

iron basalt
serene scaffold
#

(and if you're sure that you won't be using Python, then it's out-of-scope for this channel.)

thorn flame
#

Need to apply the model to the streams and send a refined output back to the client

serene scaffold
#

what kind of model?

iron basalt
#

How many clients?

thorn flame
#

It's an open question

#

Stuck between choices

#

But I'm concerned about perf

serene scaffold
serene scaffold
thorn flame
iron basalt
thorn flame
#

Or what do you think?

#

ML noob here

thorn flame
iron basalt
thorn flame
#

I mean could be multiple tbh

spring field
thorn flame
#

Cos it's an extension

#

Could have concurrent users

iron basalt
#

(And Python though, bindings)

spring field
#

also, models generally run on the GPU, so, if you really would want to optimize (some of the stuff), you'd write shaders in a whole different language

serene scaffold
#

when people say "python is slow", what they mean is "python does not scale well for tasks that are entirely CPU-bound", which is actually rarely the case.

thorn flame
iron basalt
#

(And you can execute / send those programs to the GPU from Python without leaving Python)

spring field
iron basalt
# thorn flame Yeah, I see people use C on twitter lool

So, if you are using existing solutions (that are often written in C/C++), Python is the ultimate language for that. If you want to raw implement CPU stuff, C (or C++ (there are more)). And GPU, whatever options you have, you have a few, but not many (and you can use any CPU language to then send those programs and data to the GPU to be executed, C, Python, etc).

thorn flame
#

For GPU, I guess you mean CUDA

iron basalt
#

That is a Nvidia specific option.

thorn flame
#

I'm thinking of using Azure. So do I use their machine learning platform or what?

iron basalt
#

No idea, highly recommend against touching Azure with a 100m stick.

thorn flame
thorn flame
iron basalt
thorn flame
#

Also, cant be on barebones vm since it demands GPU I guess

#

Dunno lol

iron basalt
thorn flame
#

Ratelimiting?

spring field
thorn flame
#

Throttling?

iron basalt
#

You need to configure things correctly on AWS and it's notoriously confusing.

#

There is an entire industry of what amounts to basically AWS frontends for this.

thorn flame
thorn flame
#

I'll find my way around Azure lol

#

Or paperspace

#

Ive heard negative things about aws bills

#

Even though I think most cloud platforms are the same

spring field
iron basalt
#

Paperspace is DigitalOcean, which I have not really heard issue about / had issues with.

thorn flame
spring field
#

yeah, I was delighted to find out DO acquired them when I found out about them first

thorn flame
#

Oh wow

#

That's cooool

#

I love DO

spring field
#

also

#

DO appears to be slowly rolling out GPU droplets as well, but that's probably gonna take a while

thorn flame
#

That would be dope. So it takes it away from serverless I guess?

iron basalt
#

All software is imperfect, and so when you build on top of other software that carries over and adds up.

serene scaffold
iron basalt
#

These days you can just take a random clip, give it a caption as a made up story, and put it online and everyone starts getting mad.

#

Don't even need to really edit or generate anything.

serene scaffold
#

tangentially related: today I was walking down the street, and a young black woman said "I think about their relationship all the time--it's my roman empire"
and I had first heard that use of "[someone's] roman empire" from my mother, an old white woman.
my mom exists in a very different media and social ecosystem than non-white young people, so apparently it's getting around.

iron basalt
floral portal
#

I need a little help with matplotlib

#

my code:

    y_values = [snapshot["score"] for snapshot in user_snapshots]
    x_values = [datetime.strptime(snapshot["date"], "%Y-%m-%d") for snapshot in user_snapshots]
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(x_values, y_values)
    # Format the x-axis
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
    ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))

    # Rotate and align the tick labels so they look better
    plt.gcf().autofmt_xdate()

    # Add labels and title
    ax.set_xlabel("Date")
    ax.set_ylabel("Score")
    ax.set_title(f"{user.name}'s Score Over Time")

    # Adjust layout and save
    plt.tight_layout()
    plt.savefig("graph.png")
    fig.clf()
    plt.cla()
    plt.clf()
    plt.close('all')
    plt.close(fig)
    del fig
proper crag
proper crag
#

with kill 'pid number here'

#

on the terminal

strong cove
proper crag
thorn flame
#

Personally, not sure I've had much problems with Pip.

quaint mulch
# strong cove Hello do any of u guys know where I could get a dataset of text for training an ...

These datasets are used in machine learning (ML) research and have been cited in peer-reviewed academic journals. Datasets are an integral part of the field of machine learning. Major advances in this field can result from advances in learning algorithms (such as deep learning), computer hardware, and, less-intuitively, the availability of high-...

quaint mulch
# strong cove Thanks

No worries.
IDK what your capabilities and requirements are, but unless this is a learning exercise, I'm not sure why you would want to train your own chatbot on publicly available data instead of just using an existing one.

quaint mulch
quaint mulch
radiant rock
#

hey guys, i'm trying to learn NLP. i'm practically a beginner. can anyone recommend me textbooks about NLP (beginner-friendly)?

serene scaffold
# radiant rock hey guys, i'm trying to learn NLP. i'm practically a beginner. can anyone recomm...

this is a course you can watch: https://www.youtube.com/playlist?list=PLoROMvodv4rOSH4v6133s9LFPRHjEmbmJ

what is your goal for learning NLP? the NLP community has done a hard shift towards interactive LLMs since ChatGPT was released, but there are other research areas, too.

quaint mulch
jaunty helm
quaint mulch
versed gulch
#

Hi guys.

I'm playing around with the code in monai.transforms and would like to know when it's not necessary to use Orientationd?

inland mulch
#

Sorry to cut your message

#

Well hello,
I was looking for some guidance
I completed python basics (playlist 7 hrs) I have done all the basics of PIL and tkinter by myself
Now I do not know where to go from here...
Some people said pygame some said DSA
But I personally wanted to do research on AGI and AI , what should I do.... Well the AGI thing is def optimistic but I want to know where do I start

forest tapir
#

Just a stupid question. So I've done a couple of ML tutorials before but not given much thought to it beyond following the guide "it works" and then moving on to something else. But 3blue1brown's series on neural networks popped up in my feed and I've been listening to it while doing other work. Was thinking of hunting down a tutorial to set something up like this for the hell of it later. However.

The first couple of videos they did on the topic use a NN made for recognizing handwritten numbers as an example for discussion. It's 784 inputs (28×28 pixels), 2 layers in the middle with 16 neurons each, and 10 outputs, 1 for each digit.

In the second video they talk about how, due to how the weights and activations add up through the network, you can give the NN a borderline random image of noise and it will "confidently" categorize it as a 5 or a 7 or something.

So I was looking at a couple datasets around for training on this kind of problem, and as expected they're all handwritten numbers.

Would there be any benefit in adding an extra output for "Not a Number" in this kind of scenario and adding a few sets of randomized noisy images, or also handwritten letters, other languages characters, and etc. Or is it just the kind of thing that one wouldn't bother with cause it could ruin the NN's ability to do the main job of recognizing numbers

unique spoke
#

Hey guys! Currently I am working on an ai application which works on the webcam on my desktop.

I also installed Camo Studio and it connects to my phone. However, I cant show that data on my phone itself.

I saw on stackoverflow (which I would really appreciate if you could find that article - along the lines of external camera opencv) that I would need to learn how to make a smartphone app to do this. Is this true? And if so, how and where should I start?

small wedge
#

This idea is used in some places though. For example that's along the lines of how you train a discriminator in a GAN. But instead of having a solid dataset of fake images, you have another model try to generate a convincing image and use the discriminator to guess which one is real. This lets you train on patterns learned from the generator instead of needing those patterns represented in your dataset.

forest tapir
#

interesting. thank you, I'll go read about those next

winter canyon
#

Hello guys,

#

I am working on this project which generates optimal ship routing and I am very confused which algorithm and model i should use

#

and where should i get the datasets from

tender umbra
#

Anyone know how many images I would need to get 80% accuracy when detecting Roblox Limiteds in an image? There are around 200 limiteds I want to be able to differentiate

charred leaf
#

say it takes 100 samples per limited, then with 200 limiteds you would need 20,000 images minimum

quaint mulch
#

> adding a few sets of randomized noisy images, or also handwritten letters, other languages characters, and etc
That is certainly one valid way, and people have tried it with good-ish results. The problem is that the "not a number" region of pixel space is huge.

quaint mulch
# winter canyon I am working on this project which generates optimal ship routing and I am very ...

The travelling salesman problem, also known as the travelling salesperson problem (TSP), asks the following question: "Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city exactly once and returns to the origin city?" It is an NP-hard problem in combinatorial optimization...

quaint mulch
#

you'll never know until you try

charred leaf
#

yeah

#

I was giving an optimistic estimate 🙂

#

like, I was wondering how would you collect even a hundred

charred leaf
#

@quaint mulch actually um, we can artificially generate a dataset of 1000s of images now that I think about it, but would that not mean we can get away with just 1 of each kind?

quaint mulch
#

I don't think I'm familiar enough with Roblox to comment more

charred leaf
#

ok, right, I have realized that I am not either 😳

quaint mulch
#

haha

iron basalt
thorn flame
#

Myriad of options / lack of standardisation ?

#

Cos I think even the js ecosystem is not absolved

#

Which imo still boils down to convenience?

inland mulch
iron basalt
# thorn flame What else could it be?

For a long time there was not a standard virtual environment system, there was stuff like Conda floating around, versioning issues / breaking binary compatibility, all the distutils, setuptools, etc, we now have new stuff for that and it's still ongoing.

#

A lot of complaints will also be out of date. Like how everyone still thinks PHP is as slow and bad as it used to be 10+ years ago (last time they used it).

#

We still have people asking about Conda here due to all the out of date information. Just use regular Python, Conda existed due to lack of various things that now exist.

#

(A lot of people tried to get into Python ML during peak Conda, and the conflicts between it and regular Python, plus various other Conda issues, were one factor in this perception of Python package management)

thorn flame
#

I can imagine heh.

#

I'm upskilling in ML/AI and I'm considering learning linear algebra first since everything is based on that. Is this a waste of time? The end goal is to enable me to work on my pet project, which does some real-time audio processing/transformations.

#

I mean before touching libraries

#

Like Pytorch and Keras

iron basalt
thorn flame
#

I see...

#

I think thats how most of programming works or at least has been for me

#

Concepts-based learning

iron basalt
#

Pretty much everything interesting that involves number crunching on a computer involves linear algebra.

#

(And it's why the hardware is designed for it these days)

left tartan
iron basalt
#

Learning specific libraries is not deep knowledge, but it's needed to do anything in the end.

iron basalt
thorn flame
iron basalt
#

There is infinite math to learn so your knowledge will be somewhat ML focused either way.

thorn flame
#

Yeah, I feel

#

Thank you

dry field
#

if you remember, here's how the trained model performs on real data- not 100% accurate, but would do

true iris
#

Anyone here have experience with qlora

clever hollow
#

Is there anyone who can point me towards some resources on NLP? I am working on something that takes my original data and needs to fetch data from an API in order of similarity to the original data. I can sort by relevancy in the API call, but I am combining responses from different APIs, so I need them ordered by similarity once I have combined them. Once I have done this, I also want to check whether the most similar responses support my original data or go against it. I was looking at NLI with Hugging Face, but tbh I'm not really sure which direction makes the most sense

serene scaffold
#

Can you give an example of two texts that "support" one another?

clever hollow
# serene scaffold What does it mean for one text to "support your data"?

yeah, this may be a bad example, but let's say I have a claim that a certain team won the World Cup in a certain year. If the data I get back about the event is similar, i.e. it found results that match words in the claim such as "world cup", the specific year, etc., but maybe the team name wasn't correct or the score is different, then I would say it doesn't support the original claim. Or maybe it matches the claim to a certain degree, which I can calculate, so that would be the likelihood that the claim happened as a %, or a clarifying message saying that the team name was wrong
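
For the "order the combined responses by similarity" half of this, here is a minimal sketch using only the standard library. It assumes a plain bag-of-words cosine similarity; the function names and example sentences are made up for illustration, and a real system would more likely use sentence embeddings. Note that similarity only measures topical overlap: deciding whether a response supports or contradicts the claim is the separate NLI step mentioned above.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(claim, responses):
    """Return (score, text) pairs, most similar to the claim first."""
    claim_vec = bag_of_words(claim)
    scored = [(cosine_similarity(claim_vec, bag_of_words(r)), r) for r in responses]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical claim and responses merged from several APIs.
claim = "France won the World Cup in 2018"
responses = [
    "Stock markets fell sharply on Monday",
    "Croatia lost the 2018 World Cup final to France",
    "The World Cup is held every four years",
]
ranked = rank_by_similarity(claim, responses)
for score, text in ranked:
    print(f"{score:.2f}  {text}")
```

The top-ranked responses would then be the ones worth passing, together with the claim, to an entailment model.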

charred leaf
#

hey, where can I find some live stock data

thorn flame
#

So transformers are basically the best for audio processing/transformations??

#

or CNN-based U-net??

thorn flame
verbal oar
#

hmm there is unsupervised regression and unn(unsupervised nearest neighbors)

thorn flame
#

Isn't "unn" what you get in item-based collaborative filtering?

upbeat prism
leaden kayak
#

What are some of the best-written Python codebases you know in terms of quality?

spare forum
#

Wdym?

#

It's probably closed source at my company or something idk

pearl parrot
#

I actually watch Corey Schafer's videos

But he doesn't have any on NumPy, and I am not finding other channels helpful. Can someone give me a good YT channel's NumPy vid THAT CAN ACTUALLY TEACH IT TO MY TINY BRAIN?

jaunty helm
quaint mulch
# thorn flame I'm upskilling in ML/AI and I'm considering learning linear algebra first since ...

My preference is:

  1. Know just barely enough about the math that you know WHEN to pick it up properly. You can do this just by watching 3blue1brown or something.
  2. Go head-on into your pet project.
  3. Eventually you get stuck, but by then you know which basic math you were missing, and then you can learn that. Usually, math is wayy easier to learn once you have a concrete understanding of the stuff the math is about.
quaint mulch
# upbeat prism Hello, looking for someone with knowledge in transformers and "how people talk a...

People are playing very fast and loose with nomenclature. That's just how it is since the field is new. People also come from different backgrounds: math, electrical and electronics engineering, signal processing, software development, computer science, physics, etc. So they bring their own nomenclature. You just have to live with it.

That's why every paper usually has a section called "definition" or "problem definition", where these concepts are laid out exactly using the language of math. These tend to vary even between papers with the same authors.

So, just call it whatever, or use chatgpt when talking about it normally. When you need to be precise, read the math definitions, read the code, and write the math definition.

quaint mulch
thorn flame
#

Plan was to read it as I work on the project iteratively

#

Apparently it's not time for model development since I have to write the frontend (chrome extension).

#

I also know of 3blue1brown lol...

#

Personally, I just want to learn enough linear algebra to not be confused with its application

gilded belfry
#

can anyone suggest an algorithm that draws straight lines based on this picture, giving the starting and ending points of the lines?

#

I found an algorithm that draws lines, but the lines are not straight enough and do not give start and end points

past bramble
#

day 7 kaggle report: created handwritten digits image generation model

serene scaffold
past bramble
#

I'll look into modifying the neural network to improve it cuz some of them are just scribble

#

I had saved bunch of outputs every epoch and turned it to a gif

#

slow version I made for self satisfaction of seeing my neural network improve :)

verbal oar
#

did you look at autoencoders and vae?

#

I mean did you use vae for this?

#

if yes now you can do it with gan

#

I mean I saw in some book about generating digits with vae and then gan

#

probably it was deep learning for computer vision from packt, and maybe it was in pytorch if I recall correctly

#

@gilded belfry hmm maybe you need more iterations?

#

to get more straight lines or not 😅

past bramble
verbal oar
#

ah ok maybe better, more realistic results

past bramble
#

gan is more suitable for these?

verbal oar
#

for realistic things

#

for this maybe its overkill

past bramble
#

I have to figure out if i need to add more layers or modify existing one

verbal oar
#

but for learning I think its ok

past bramble
#

I think the results will be better if i pass the labels along with noise as input

verbal oar
#

also you can see gan vs vae there is comparison on internet between two

#

hmm so dlss is gan 🤔

past bramble
#

how does vae generate images?
they say gans use noise but nothing about vae

verbal oar
#

vae uses randomness

#

probability distributions unlike in autoencoders

#

ok, so your question is whether vae uses noise or something different

past bramble
#

do they use noise to generate random images or they have a different way?

verbal oar
#

they dont use noise

verbal oar
#

oh sorry there is noise distribution

#

I meant there is no noise, but there is noise distribution

#

more specifically prior and noise distributions

#

and there is kl-divergence between 2 distributions

#

so (roughly) a distance between the 2 distributions
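
To make the "noise distribution" and "kl-divergence" points above concrete, here is a rough NumPy sketch of the two pieces usually found in a VAE. It assumes a Gaussian encoder output and a standard normal prior; the function names and example values are illustrative. The encoder predicts a mean and log-variance, a latent vector is drawn by adding scaled Gaussian noise (the reparameterization trick), and the KL term has a closed form. Strictly speaking, KL divergence is not symmetric, so it is a divergence rather than a true distance.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var, axis=-1)


# Stand-in encoder outputs for two inputs with a 2-dimensional latent space.
mu = np.array([[0.0, 0.0], [1.0, -1.0]])
log_var = np.zeros_like(mu)  # log(sigma^2) = 0, i.e. sigma = 1

z = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
print(z.shape, kl)
```

At generation time, a trained VAE would skip the encoder, draw `z` directly from the prior, and pass it through the decoder, which is the part that resembles feeding noise to a GAN generator.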

past bramble
#

any references to understand it better?

spare briar
past bramble
mild dirge
#

But what would make more sense is to give each possible value their own dimension

#

I.e. 1-hot encoding. And then each digit is equally distanced from all other digits

#

And you can use other tricks like "softmax" to turn this vector into a list of probabilities for each digit.

#

This is the most logical approach for classification
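
A small NumPy sketch of the two ideas just described, with illustrative helper names: one-hot targets, where every pair of distinct digits is the same distance apart, and softmax, which turns raw scores into probabilities that sum to 1.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Give each class its own dimension: the label index is set to 1, the rest stay 0."""
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded


def softmax(logits):
    """Turn raw scores into probabilities; subtracting the max keeps exp() stable."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


targets = one_hot(np.array([3, 7]), num_classes=10)
distance = np.linalg.norm(targets[0] - targets[1])  # same for any two distinct digits
probs = softmax(np.array([[2.0, 1.0, 0.1]]))
print(distance, probs, probs.sum())
```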

charred leaf
past bramble
quaint mulch
charred leaf
spare briar
#

and ebms work with the unnormalized density to avoid that

digital lark
#

Hello, today Jupyter Notebook decided to be a bit rude and I can't import tensorflow:

#

ImportError Traceback (most recent call last)
Cell In[1], line 1
----> 1 import tensorflow as tf

File ~\anaconda3\lib\site-packages\tensorflow\__init__.py:40
38 from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow # pylint: disable=unused-import
39 from tensorflow.python.tools import module_util as _module_util
---> 40 from tensorflow.python.util.lazy_loader import KerasLazyLoader as _KerasLazyLoader
42 # Make sure code inside the TensorFlow codebase can use tf2.enabled() at import.
43 _os.environ["TF2_BEHAVIOR"] = "1"

ImportError: cannot import name 'KerasLazyLoader' from 'tensorflow.python.util.lazy_loader' (C:\Users\David\anaconda3\lib\site-packages\tensorflow\python\util\lazy_loader.py)

#

can anyone help

spare briar
faint quail
#

I am training a bounding box regression model, and the model is really good at predicting where the objects are, but the heights and widths are always super small. Does anyone know why this could be?

strong cove
#

Hello

delicate apex
#

a cheating program for some multiplayer shooter

faint quail
quaint mulch
valid minnow
#

Why isn't there a channel for Data engineering?

proper crag
#

I'm planning to create a model that is fed transfer-rate data, which is a continuous value, so I need a regression model. It then passes that info to a classification model

#

is it practical and achievable?

#

so that then the classification model will predict the anomaly of the networks traffic

agile cobalt
#

remember that to train a supervised model you would need many labelled examples for each kind of output you want, including both the features you'll feed to the model and its expected output

anomaly detection is a bit of a separate topic, there are some techniques that let you try and detect outliers without having to label the data
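
As one example of detecting outliers without labels, here is a minimal standard-library sketch based on the median absolute deviation (MAD). The transfer-rate numbers are made up, and the 0.6745 scaling with a 3.5 cutoff is a commonly used rule of thumb rather than anything specific to this project.

```python
import statistics


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (based on the median) exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing can be flagged this way
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]


# Hypothetical transfer rates with one obvious spike.
transfer_rates = [10.2, 9.8, 10.5, 10.1, 9.9, 55.0, 10.3, 10.0]
outliers = mad_outliers(transfer_rates)
print(outliers)
```

Using the median instead of the mean keeps the spike itself from distorting the baseline it is compared against.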

proper crag
#

idk what else to learn for that

#

I'm currently trying kernel tricks

verbal oar
#

in sense they use supervised training

#

but GAN are unsupervised

#

hmm ok not sure about it

#

generally generative models are unsupervised

#

so maybe in the sense of unsupervised classification ( so clustering)

#

please correct me; I think I'm close to explaining it, I just don't know how to formulate it

#

maybe I could say suppose if GAN is supervised so its classification

#

hmm so thinking sorta like classification is regression, but not the other way around

#

hmm so gan is rather like regression, because it generates each pixel value

#

but hmm its not starting from empty thing, but noise so its like replace noise pixel value with generated pixel value?

#

is there sth to see this on pixel level (low level)?

#

for example like image segmentation is classification of each pixel

#

now I'm confused little about gan

small wedge
# verbal oar maybe I could say suppose if GAN is supervised so its classification

the GAN algorithm is unsupervised, as you start without any labels for your data. It is composed of two pieces: a supervised task learned by the discriminator, where the generator makes a fake image and the discriminator must pick the correct choice between that and a real image (classification), and an unsupervised task learned by the generator, which calculates its loss from the accuracy of the discriminator.

verbal oar
#

so its semi supervised?

small wedge
#

the algorithm itself is just unsupervised

#

the classification task learned there is supervised, but it's set up without providing any labels

#

so as a researcher the model is unsupervised, because it doesn't require us to map an input to a correct output
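
A rough NumPy sketch of why no human labels are needed: the training loop itself knows which batch came from the dataset and which came from the generator, so it can create the 1/0 targets on the fly. The discriminator outputs below are made-up stand-ins rather than a trained model, and the generator loss shown is the common "make the discriminator say real" variant.

```python
import numpy as np


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 targets."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))


# Stand-in discriminator outputs: probability that each image is real.
d_on_real = np.array([0.9, 0.8, 0.95])  # images taken from the dataset
d_on_fake = np.array([0.2, 0.1, 0.3])   # images produced by the generator

# Targets are created by the training loop, not by an annotator.
real_targets = np.ones_like(d_on_real)
fake_targets = np.zeros_like(d_on_fake)

# Discriminator: ordinary supervised classification on self-made labels.
d_loss = (binary_cross_entropy(d_on_real, real_targets)
          + binary_cross_entropy(d_on_fake, fake_targets))

# Generator: its loss is computed from how the discriminator judges its fakes.
g_loss = binary_cross_entropy(d_on_fake, np.ones_like(d_on_fake))
print(d_loss, g_loss)
```

With these numbers the discriminator is winning, so its loss is low and the generator's is high.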

timid shuttle
verbal oar
#

Midnight: convert to one-hot encoding; someone does that too in a book I skimmed

past bramble
#

conditional GANs are a different thing

upbeat prism
#

Hi,

I have a basic bert model from hf's transformer:

(Pdb) model
BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(20, 32, padding_idx=0)
      (position_embeddings): Embedding(128, 32)
      (token_type_embeddings): Embedding(2, 32)
      (LayerNorm): LayerNorm((32,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0): BertLayer(
          (attention): BertAttention(
            (self): BertSelfAttention(
              (query): Linear(in_features=32, out_features=32, bias=True)
              (key): Linear(in_features=32, out_features=32, bias=True)
              (value): Linear(in_features=32, out_features=32, bias=True)
              (dropout): Dropout(p=0, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=32, out_features=32, bias=True)
              (LayerNorm): LayerNorm((32,), eps=1e-12, elementwise_affine=True)
              (dropout): Dropout(p=0, inplace=False)
            )
          )
          (intermediate): BertIntermediate(
            (dense): Linear(in_features=32, out_features=32, bias=True)
            (intermediate_act_fn): GELUActivation()
          )
          (output): BertOutput(
            (dense): Linear(in_features=32, out_features=32, bias=True)
            (LayerNorm): LayerNorm((32,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0, inplace=False)
          )
        )
      )
    )
    (pooler): BertPooler(
      (dense): Linear(in_features=32, out_features=32, bias=True)
      (activation): Tanh()
    )
  )
  (dropout): Dropout(p=0, inplace=False)
  (classifier): Linear(in_features=32, out_features=2, bias=True)
)

(Pdb) model.bert.embeddings.word_embeddings
Embedding(20, 32, padding_idx=0)
#

You can also see at the end I have this Embedding object, does anyone know the docs for this? I'd like to print the actual values.

#

Ah I think it's just nn.Embedding

quaint mulch
upbeat prism
#

From https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html I have this example

>>> # an Embedding module containing 10 tensors of size 3
>>> embedding = nn.Embedding(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
>>> embedding(input)
tensor([[[-0.0251, -1.6902,  0.7172],
         [-0.6431,  0.0748,  0.6969],
         [ 1.4970,  1.3448, -0.9685],
         [-0.3677, -2.7265, -0.1685]],

        [[ 1.4970,  1.3448, -0.9685],
         [ 0.4362, -0.4004,  0.9400],
         [-0.6431,  0.0748,  0.6969],
         [ 0.9124, -2.3616,  1.1151]]])

So that's nice and all but how can I check the values stored in embedding after assigning the input to it?

small wedge
#

you didn't assign input to the embeddings, you just passed it through the layer

#

embedding.weight should show you the weights
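
A short PyTorch sketch of that point: the stored values live in `embedding.weight`, and calling the layer on indices is just a row lookup into that table. For the BERT model printed earlier, the same attribute would presumably be reached as `model.bert.embeddings.word_embeddings.weight`.

```python
import torch
from torch import nn

torch.manual_seed(0)

# 10 rows (one per index), each a vector of size 3.
embedding = nn.Embedding(10, 3)

# The learnable lookup table itself.
print(embedding.weight.shape)
print(embedding.weight)

# Passing indices through the layer selects rows; nothing is stored or assigned.
indices = torch.tensor([[1, 2, 4, 5], [4, 3, 2, 9]])
output = embedding(indices)
same_as_lookup = torch.equal(output, embedding.weight[indices])
print(same_as_lookup)
```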

proper crag
rigid timber
#

https://github.com/Karanchaudhary350/DiagnoSys
This error appears when I do 'flask run'

ValueError: Kernel shape must have the same length as input, but received kernel of shape (3, 3, 3, 32) and input of shape (None, None, 224, 224, 3).

GitHub

DiagnoSys is a comprehensive web application that provides advanced detection and analysis for various health conditions. This project leverages state-of-the-art machine learning algorithms to dete...

upbeat prism
#

thanks - god that makes more sense now

quaint mulch
# proper crag I mean yh thats much more practical for the project but i mean like what else me...
GitHub

A python library for user-friendly forecasting and anomaly detection on time series. - unit8co/darts

GitHub

A Library for Advanced Deep Time Series Models. Contribute to thuml/Time-Series-Library development by creating an account on GitHub.

past bramble
#

can i have 2 input layers in my neural network? if so how do i do it in TensorFlow?

I don't get how the flow of layers would be for multiple input layers

#

the multiple input layers merge into one after functions or processing?

small wedge
#

there are a ton of different ways to do this so it depends

#

there are ensemble learning algorithms like stacking, bagging, and boosting, which focus on combining the outputs of multiple models into one to get a good output. Then there are ways to directly combine features, like concatenation, where you join the outputs of separate layers into one tensor before feeding them into a single layer
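
For the TensorFlow question above, here is a minimal sketch of the concatenation route using the Keras functional API. The layer sizes and input names are assumptions, chosen to resemble a generator that takes noise plus a one-hot digit label.

```python
import tensorflow as tf

# Two separate input layers.
noise_in = tf.keras.Input(shape=(100,), name="noise")
label_in = tf.keras.Input(shape=(10,), name="label")

# Join them into one (batch, 110) tensor, then continue as a normal stack of layers.
merged = tf.keras.layers.Concatenate()([noise_in, label_in])
hidden = tf.keras.layers.Dense(128, activation="relu")(merged)
image_out = tf.keras.layers.Dense(28 * 28, activation="sigmoid")(hidden)

model = tf.keras.Model(inputs=[noise_in, label_in], outputs=image_out)

# The model is called with a list holding one tensor per input, in the same order.
noise = tf.random.normal([1, 100])
label = tf.one_hot([3], depth=10)
generated = model([noise, label])
print(generated.shape)
```

Each input could also pass through its own layers (for example an embedding for the label) before the merge; concatenation is only one of several merge layers.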

past bramble
#

thanks

past meteor
#

@wooden sail Do you consider evaluating against the same test set multiple times multiple testing?

#

Meaning, if you keep going on and on and on, you may end up with a final result / architecture that is good by chance rather than because it is intrinsically good

agile cobalt
#

sounds like cross validation to me?

past meteor
#

It's different

#

Say you split 60 20 20. This means you use 80% to find your architecture

#

You may end up with something that fits 20 well more or less on accident

#

You have your remaining 20 to figure it out. Let's say there's a sizeable gap, what then? Option 1 is leaving it as is but then your final results will be off what it probably should be in reality and option 2 is going again on your test set with slightly different hyperparams but then you risk reporting numbers that are good on accident (which already happened, that's why you have this problem in the first place)
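
The "good by accident" effect can be simulated with a small NumPy sketch. It assumes purely random labels and candidate models that guess at random, so every candidate has a true accuracy of 50%. Picking the best of many candidates on the same evaluation set still yields a score well above that, and the gap only shows up on fresh data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_candidates = 100, 500

# Random binary labels: no model can truly beat 50% here.
y_eval = rng.integers(0, 2, size=n_samples)
y_fresh = rng.integers(0, 2, size=n_samples)

# Each candidate "architecture" is a random guesser.
preds_eval = rng.integers(0, 2, size=(n_candidates, n_samples))
preds_fresh = rng.integers(0, 2, size=(n_candidates, n_samples))

# Select the candidate that looks best on the evaluation set...
eval_acc = (preds_eval == y_eval).mean(axis=1)
best = int(eval_acc.argmax())

# ...then check that same candidate on data it was never selected on.
fresh_acc = float((preds_fresh[best] == y_fresh).mean())
print(f"selected score: {eval_acc[best]:.2f}, fresh score: {fresh_acc:.2f}")
```

The more candidates are compared against the same set, the larger the optimistic bias of the winning score tends to be.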

#

I don't know if you get my drift @agile cobalt

agile cobalt
#

so pretty much over-fitting the architecture, and how to avoid that?

past meteor
#

The abstract explains it pretty well

past meteor
#

I don't think you can prevent this except by having a really really large dataset

wooden sail
#

i would expect there to be ways of correcting for it by making the tests more strict/ have higher thresholds

wooden sail
past meteor
#

It's a really nasty problem

#

The error on my test set is 2-3x that on the validation

#

Mostly because the tuning algorithm itself overfit

left tartan
iron basalt
river cape
#

Hey guys

#

The model YOLO, does it come with pretrained weights?

#

and we just use it like that?

#

Don't we need to build it the way you build a CNN?

iron basalt
# past meteor Mostly because the tuning algorithm itself overfit

You can view it as yourself being part of the training algorithm, you are in the feedback loop. Information is leaking through you (from test) into the model by your decisions. It's still testing, but eventually not, depending on how often you do it, and other hard to measure factors.

#

But doing it more than once makes it not pure in the scientific sense.

#

The scientific method and the statistics built around it are explicitly designed to avoid stuff like this.

#

(Which is what diminishes its practical use a lot (it's a high bar))

#

(It's also why a lot of papers out there claiming to be scientific, or to be using statistics properly are not)

iron basalt
soft token
#

anyone here have experience building a RAG?

agile cobalt
soft token
#

Trying to build a RAG in Python for LMStudio using HF transformers/FAISS. Having this issue where the retriever is exiting script at 0% without throwing an exception - brand new to RAGs and not sure where to start troubleshooting (or honestly if I'm even approaching it correctly)

#

Could post a code snippet, which would probably be better for python-help, but really just want resources or a lead into what could possibly be going wrong with a retriever, i.e. bad indexing?

serene scaffold
past meteor
#

For a serious study I don't like touching the test set more than once (or I'd have 2)

#

It's a handicap because you undoubtedly benchmark yourself against data scientists that don't know or don't care and you'll have worse results. That being said I don't think the job is getting good results, it's about getting high fidelity estimates of performance

iron basalt
past bramble
#

AY I LOVE THIS ITS LEARNING TO WRITE OMG MY FIRST MODEL THAT'S ACTUALLY WRITING SO NICELY

#

this is after 3 epochs, there's more left

#

little baby's first try so cute

#

I thought it'll take a while to learn and made it 500 epochs 💀
can I stop it? It's already good enough from 10th epoch

#

or will the progress be lost if I cancel? Im using kaggle notebook

#

its free gpu I'll let it be

tepid tartan
#

I want to become a Data Analyst or get an entry-level job. I have barely any knowledge of SQL and some knowledge of Excel. I took an SQL course in college two years ago and finished the Statistics course a few weeks ago. Statistics was mostly done with a calculator, with barely any Excel homework. I took Introduction to Data and Business Analysis. I haven't taken data visualization or machine learning. I'm asking whether I should watch the Data Analyst Boot Camp video, or what other way there is to get off to a strong start toward finding a Data Analyst job, since I'm planning to graduate in May of 2025

glacial yoke
spare forum
#

If you are okay doing VBA and SAS you can practise them, but you'd better aim for Python/R for your own mental sanity

upbeat prism
#

What's etl? Also: R and sanity in the same sentence? 😮

spare forum
#

Extract transform load (data engineering concept)

spare forum
jaunty helm
#

ngl I'm kinda digging the ggplot syntax
haven't created that complicated of graphs tho

small wedge
#

is this a GAN? or is it mnist classification?

agile cobalt
odd stratus
past meteor
#

Going from ggplot2 to matplotlib feels like going back to the stone age

spare forum
#

seaborn is simple af

tepid tartan
spare forum
#

idk about bootcamps tbh I learned from master degree + projects

tepid tartan
#

My Python is horrible. Did the course 3 years ago.

tepid tartan
spare forum
tepid tartan
#

This is what I found on Youtube

spare forum
#

for python maybe just grab some knowledge then do some notebooks on kaggle

#

It would be more efficient to grab key concepts and hop on projects

serene grail
#

I was also going to suggest Kaggle, I think it's hard to learn from giant video courses like that
Also if your Python is rusty you might want to read A Byte of Python, it's a free book available online and it should be a good refresher

spare forum
#

like for sql you setup your own example postgres db and you fool around

#

I would say that being able to do just a bit in Tableau and/or Power BI is plenty, especially before the first internship as an absolute beginner

tepid tartan
#

Most internships are in the summer and I plan on graduating in the spring of 2025

#

I'm asking too much of you. Could you give me a list of things to learn from scratch? Like starting from nothing. @spare forum

spare forum
#

are you solid in stats ?

tepid tartan
#

Just the basic @spare forum just took the course last semester

#

spent the majority of my time learning the calculator

spare forum
#

stats: basic descriptive stats + tests (t-test, ANOVA, chi2, correlation)
SQL: entity-relationship modelling, joins obviously, views, common table expressions, being good at queries
python: pandas, numpy, matplotlib/seaborn
Excel basics, Power BI basics, Tableau basics
added value: R, SAS

tepid tartan
#

On stats, I should probably use Excel more, since I'm very familiar with t-tests and correlation on the TI-84 calculator.

spare forum
#

the tool is one thing, but you need to know how to explain the choice of such a tool in a given scenario

lusty lotus
#

is there a Python API for GPU compute that doesn't require the user to write CUDA/C code? speed is not paramount, but I'd prefer a Python lib that can work with AMD GPUs too.
I'm in the process of building an autograd engine

tepid tartan
spare forum
#

there is no absolute best response, just find something that can use more than one tech at once

tepid tartan
#

True.

#

@spare forum, any recommended Website/Video to learn all that you mentioned?

spare forum
#

udemy

#

or just if you have a clear idea of project go for it and check the docs ( plus gpt)

tepid tartan
past bramble
tepid tartan
#

I'll let you know if I have questions or issues along the way

past bramble
#

I'm using TensorFlow, can I see the outputs layer by layer? I want to see how it draws them

agile cobalt
small wedge
past bramble
#

oh nice I can pass them through each layer

serene grail
past bramble
#

How do you pass input to the layers? I tried this:
```py
def visualize_steps(number = np.random.randint(10)):
    noise = tf.random.normal([1, 100])
    labels = tf.keras.utils.to_categorical([[number]], 10)

    outputs = [[labels, noise]]

    for layer in model.layers:
        output = layer(outputs[-1])
        outputs.append(output)
```
and received this error:
```
output = layer(*outputs[-1])
^^^^^^^^^^^^^^^^^^^

raise TypeError('too many positional arguments') from None

TypeError: too many positional arguments
```

agile cobalt
#

is your code using layer(*outputs[-1]) or layer(outputs[-1])?

also number = np.random.randint(10) will suffer from the same problem as mutable default arguments, it'll call once when creating the function then re-use the same result every single time
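
The default-argument point can be checked with a small standard-library sketch. It uses `random.randint` as a stand-in for `np.random.randint`, and the function names are made up. The default expression runs once, when `def` is executed, so every later call sees the same number. The usual fix is a `None` default resolved inside the function.

```python
import random

random.seed(0)


def pick_digit_bad(number=random.randint(0, 9)):
    """The default was evaluated once, at definition time."""
    return number


def pick_digit_good(number=None):
    """Draw a new random digit on each call unless one is passed in."""
    if number is None:
        number = random.randint(0, 9)
    return number


bad_results = {pick_digit_bad() for _ in range(100)}
good_results = {pick_digit_good() for _ in range(100)}
print(len(bad_results), len(good_results))
```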

past bramble
#

it's a problem we can't individually pass the input to the layers. I have to make a separate model for each layer and track the inputs/outputs
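
One possible alternative to a separate model per layer, sketched under the assumption that the network was built with the Keras functional API: create a single "probe" model that shares the original input and returns every layer's output at once. The tiny network here is a stand-in, not the generator from the thread. Calling layers one after another in a loop only works when the model is a plain linear chain, which a two-input model is not.

```python
import tensorflow as tf

# Stand-in network; the real generator would be used in its place.
inputs = tf.keras.Input(shape=(4,))
hidden = tf.keras.layers.Dense(8, activation="relu", name="hidden")(inputs)
outputs = tf.keras.layers.Dense(2, name="out")(hidden)
model = tf.keras.Model(inputs, outputs)

# One extra model that exposes the output of every non-input layer.
probed_layers = [
    layer for layer in model.layers
    if not isinstance(layer, tf.keras.layers.InputLayer)
]
probe = tf.keras.Model(
    inputs=model.input,
    outputs=[layer.output for layer in probed_layers],
)

activations = probe(tf.zeros((1, 4)))
for layer, activation in zip(probed_layers, activations):
    print(layer.name, activation.shape)
```

The probe shares weights with the original model, so it reflects whatever the model has learned at the moment it is called.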

past meteor
#

It's a very exotic case of data leakage

#

Which is in and of itself worth publishing because I didn't do anything different than papers I read

upbeat prism
#

I have a BERT model, a tiny one. If you look at the architecture, you see that before the attention it branches into query, key, value, and skip. How can I find the node where the branching happens?

coarse nacelle
#

@serene scaffold Hello, these are some terminologies that were discussed today in the intro and I was all blank..

```
AutoML (Automated Machine Learning) Toolbox
Neural Architecture Search
Algorithm Selection
Metalearning
Automated Reinforcement Learning
Learning Curves
Recommender Systems
Fairness
Green AML
Hyperparameter Optimization
CASH (Combined Algorithm Selection and Hyperparameter Optimization)
```
wooden sail
serene grail
past meteor
#

But it's not, (part of) the evaluation was done incorrectly

serene grail
past meteor
#

The first part I found copied from papers and for the second I rolled my own and it's on the handcrafted holdout set I found that the results were BS

#

So I think many numbers out there are inflated, because if you don't do due diligence and the reviewer doesn't catch you, you're good lol

#

It's an easy fix for me, I just need a tuning set that is similar to my holdout set, then the exotic leakage that I encountered will be punished fairly

quaint mulch
#

Is this a question?

Since this is an intro, I suppose that you will get familiar with them at the end?

quaint mulch
charred light
#

Does it make sense for an image dataset (e.g. the handwritten digits 16x16) to face multicollinearity?
I don't see how that makes sense for images as multicollinearity wouldn't really apply? You wouldn't remove columns of image data even if they were correlated?

spare forum
quaint mulch
charred light
shadow viper
#

good day guys, does anyone know how, or a good site where, I can use a pre-trained vision transformer on my dataset?

wooden sail
charred light
# wooden sail is there any more context to the question? at a glance, i would rather assume th...

This course isn't that complex. The goal was to run simple linear regression and KNN on the handwritten image dataset (16x16). In one of the solutions, it states (image). Additionally, some students' solutions generated a correlation matrix as part of their EDA.

I didn't create one since I didn't think it's necessary for image data. I wanted to confirm if it was indeed needed or students were running through the motions as they tend to do in these homework reports.

wooden sail
#

what kind of regression?

charred light
#

Worst of all is the prof created a list of "things to include, if applicable" but students tend to ignore the "if applicable" and add everything.

wooden sail
#

it really does make sense here. if you wanna do linear regression on the images, you can generate a basis for the image space and just weigh the basis vectors

charred light
#

It's a linear regression for classification.

#

Classifying* between the numbers 2 and 7

wooden sail
#

still makes sense though

#

the covariance approach is probably the simplest thing to try first

#

it's the same as asking if your images live in a low dimensional vector space and are linearly separable

#

the "features" are the spanning set of the vector space, and this is lower dimensional than the number of images

quaint mulch
wooden sail
#

you vectorize the images, make a correlation matrix, then use an EVD (which here is the same as an SVD, and if you center the correlation to get a covariance matrix, it's also the same as PCA). if you don't use a basis but rather an overcomplete spanning set, the regression coefficients are not unique and you'll have a really bad time with classification (not that linear classification is great without using other methods along with it)
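
A NumPy sketch of the recipe just described, on synthetic data. It assumes fake 16x16 "images" that lie in a 5-dimensional subspace, which makes the pixel columns strongly collinear. After centering, the eigenvalues of the covariance matrix match the squared singular values of the data (divided by n - 1), and the leading right singular vectors give a basis whose coordinates can be used as regression features instead of the raw pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 vectorized 16x16 "images" that really live in a 5-dimensional subspace.
latent = rng.standard_normal((50, 5))
mixing = rng.standard_normal((5, 256))
images = latent @ mixing

# Center, then build the covariance matrix of the 256 pixel columns.
centered = images - images.mean(axis=0)
cov = centered.T @ centered / (len(images) - 1)

# EVD of the covariance vs. SVD of the centered data: the same spectrum.
eigvals = np.linalg.eigvalsh(cov)[::-1]  # sorted largest first
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)

# Keep only directions with non-negligible variance: a basis for the image space.
rank = int(np.sum(singular_values > 1e-8 * singular_values[0]))
basis = vt[:rank]
features = centered @ basis.T  # low-dimensional, linearly independent features
print(rank, features.shape)
```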

quaint mulch
charred light
wooden sail
#

it's difficult to tell without knowing what any of the functions do there

charred light
#

It's equivalent to:
lr = LinearRegression()
lr.fit(X_train, y_train)
in Python, but it provides additional auto-generated metrics like R and R-squared, F-statistics, etc. that sklearn doesn't provide automatically

wooden sail
#

yeah but "linear regression" isn't enough to know what is happening there

#

what's the spanning set being used?

#

you can do linear regression in infinitely many ways

#

what are x_train and y_train here

spare forum
#

data is ziptrain, V1 ~ means y is the V1 column of this dataset if I remember correctly

charred light
#

X is all other columns data[:, 1:], Y is the first column. data[:, 0]

wooden sail
#

yeah then it's exactly as i described before

#

you're making y out of a linear combination of all the other images

#

images of a similar type are usually in a low dimensional space, so if you collect several images of the same kind, they're linearly dependent (what you call multicollinearity)

#

you can get around that by finding a basis, e.g. through evd or svd or pca

#

it's the same way the old "eigenfaces" algorithm works
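
A minimal sketch of the basis idea described above, assuming NumPy and synthetic data in place of the real digit images (the sizes, the noise level, and the +1/-1 labels are made up for illustration). It centers the vectorized images, takes an SVD to get a basis, and runs least squares on the coordinates in that basis instead of on the raw, linearly dependent pixels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the digit data: 200 "images" of 256 pixels that
# really live in a 5-dimensional subspace, plus a little noise.
n_samples, n_pixels, true_dim = 200, 256, 5
latent = rng.normal(size=(n_samples, true_dim))
mixing = rng.normal(size=(true_dim, n_pixels))
X = latent @ mixing + 0.01 * rng.normal(size=(n_samples, n_pixels))
# Labels +1 / -1 (think "2" vs "7"), determined by the latent coordinates.
y = np.sign(latent[:, 0] + 0.5 * latent[:, 1])

# Center the data, then take the SVD; the rows of Vt are the principal directions.
X_centered = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Keep the k leading directions as the basis.
k = 5
basis = Vt[:k]                      # shape (k, n_pixels)
Z = X_centered @ basis.T            # coordinates in the basis, shape (n_samples, k)

# Ordinary least squares on the k coordinates plus an intercept.
A = np.column_stack([np.ones(n_samples), Z])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Classify by the sign of the regression output.
pred = np.sign(A @ coef)
accuracy = float((pred == y).mean())
print(f"kept {k} of {n_pixels} dimensions, training accuracy {accuracy:.2f}")
```

The sharp drop in the singular values `s` after the fifth one is what "low dimensional" looks like in practice; on real images the drop is more gradual and `k` becomes a judgment call.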

quaint mulch
ivory timber
#

Hey, does anyone know how to convert a tmx file into an obstacle matrix?

#

for a pathfinding

charred light
lapis sequoia
#

top 100 optimizers

#

184*

spare forum
#

okay ?

coarse nacelle
unkempt apex
#
```
RangeIndex: 2249698 entries, 0 to 2249697
Data columns (total 6 columns):
 #   Column           Dtype  
---  ------           -----  
 0   PRODUCT_ID       int64  
 1   TITLE            object 
 2   BULLET_POINTS    object 
 3   DESCRIPTION      object 
 4   PRODUCT_TYPE_ID  int64  
 5   PRODUCT_LENGTH   float64
dtypes: float64(1), int64(2), object(3)
memory usage: 103.0+ MB```
#

and it has this level of missing values

```
TITLE                   13
BULLET_POINTS       837366
DESCRIPTION        1157382
PRODUCT_TYPE_ID          0
PRODUCT_LENGTH           0
dtype: int64```
#

so dropping is a bad idea, because BULLET_POINTS and DESCRIPTION are the features I will feed into the model

#

so what can I do?

past bramble
spare forum
#

Can't do that much magic here it seems

unkempt apex
#

so I filled each null value with the string 'missing'
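
One possible alternative to a literal filler token, sketched with pandas on a tiny made-up frame (the rows and the `*_MISSING` and `TEXT` column names are hypothetical): keep a flag for what was missing, fill with an empty string, and merge the text columns so a row only becomes useless when every text field is empty. Whether this helps the model is something to check empirically.

```python
import pandas as pd

# Tiny hypothetical frame with the same text columns as the dataset above.
df = pd.DataFrame(
    {
        "TITLE": ["Steel bottle", "Desk lamp", None],
        "BULLET_POINTS": ["500 ml, leak proof", None, None],
        "DESCRIPTION": [None, "Warm LED light", "Cotton towel, 70 cm"],
        "PRODUCT_LENGTH": [20.0, 35.5, 70.0],
    }
)

text_cols = ["TITLE", "BULLET_POINTS", "DESCRIPTION"]

# Record which fields were missing before filling them.
for col in text_cols:
    df[f"{col}_MISSING"] = df[col].isna().astype(int)

# Fill with an empty string rather than a literal token, then merge into one field.
df[text_cols] = df[text_cols].fillna("")
df["TEXT"] = df[text_cols].agg(" ".join, axis=1).str.strip()

# Only rows with no text at all are candidates for dropping.
df = df[df["TEXT"] != ""].reset_index(drop=True)
print(df[["TEXT", "BULLET_POINTS_MISSING", "DESCRIPTION_MISSING"]])
```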

spare forum
#

for training, I don't think you can do anything other than drop here

unkempt apex
#

but then my model performance will go down?

#

what does this error say?

#

or do I need to show full traceback?

spare forum
unkempt apex
versed bough
#

Why is a dictionary I set using multiprocessor manager, not being acknowledged within the child process ? It was working fine when processor.join() was being used, but I can’t use that because then I can’t run it multiple times at once ( it’s a training process for ML models), which is precisely why I’m using multiprocessing, to be able to simultaneously run it. What’s the work around? Even if it’s being passed directly to the child, it can’t access it properly.

unkempt apex
spare forum
#

idk, you either do df = df.drop(...) or df.drop(..., inplace=True), but personally I have no idea if there is an ultimate best

left tartan
unkempt apex
pine escarp
tepid tartan
shadow viper
#

good day everyone, please is there anyone who has worked with vision transformer online?

left tartan
shadow viper
#

omg billyyy

shadow viper
left tartan
#

(but I don't do vision / CV stuff)

shadow viper
left tartan
shadow viper
#

so my school gave us a task to use a vision transformer to train a model on a stroke dataset, but it's actually so large and heavy for my PC, and the training time is mind-blowing too. I tried using the pretrained one but it's giving me a zip error.

is there any way to get this done without killing my laptop, and maybe in a little less time?

serene scaffold
shadow viper
#

thanks

verbal oar
#

because inplace mutates?

#

yeah good intuition, I thought different reason

gusty jasper
#

Hey, I'm pretty new to this server and I'm trying to find some software to help us. I'm representing someone who owns a couple of clubs (food/drinks), and the owner wants some sort of AI that can detect products being sold from CCTV footage and graph them (e.g. USER1 - 2 Waffles Sold): who is selling them, how many, and what kind of product it is.

storm valve
#

!recruiting

gusty jasper
#

Could you tell me whats the right channel about that stuff? Someone told me I should ask here.

storm valve
gusty jasper
#

Oh so sorry, I didn't know that. Ill delete.

full furnace
#

Will an image recognition project in my portfolio land me an internship?

#

What should I make next to land an internship?

gusty jasper
full furnace
#

Ml engineer or data scientist

gusty jasper
#

Image recognition is really useful; we are currently in need of some software that can look at hours of CCTV footage and give us information about sold products

#

I'm sure you could easily land an internship if you are really good as a data scientist

full furnace
#

I hope so

#

Thanks for the response

quaint mulch
past bramble
#

guys I made the layer by layer visualization

serene grail
spare forum
past bramble
spare forum
# full furnace Wait there's a paid one

I did a paid internship, and the interviewer just asked simple questions, like whether I knew supervised and unsupervised ML, etc. The more you do the better, but to answer: yes, this project is very valuable for an internship

past bramble
# past bramble guys I made the layer by layer visualization

from this visualization I'm thinking I don't need the dense layer with 12k data points, do I? because they get reduced to the 7x7 image where the drawing starts.

what I inferred from this is that I should start directly from creating the image, with no dense layer and just convolutions.

spare forum
#

I was paid

full furnace
spare forum
#

depending on the country internship can be paid

full furnace
#

I see. Did you search for the internship online? What website did you use to get it?

spare forum
full furnace
spare forum
#

you do projects , and add a "project" section in resume with the github link and description

full furnace
spare forum
#

you should make your resume on canvas or word

full furnace
#

Ohh I see I can do word thanks for the help

unkempt apex
unkempt apex
#

this is paper on SAR images

#

but from where I can download the dataset?

agile cobalt
# unkempt apex but from where I can download the dataset?

that paragraph?

3.2 Dataset Availability
The SEN1-2 dataset is shared under the open access license CC-BY and available for download at a persistent link provided by the library of the Technical University of Munich: ... This paper must be cited when the dataset is used for research purposes

agile cobalt
#

pretty sure that's still on the small side for datasets

unkempt apex
agile cobalt
#

I haven't really worked with any large datasets myself, but some can reach petabyte scale (especially things crawled from the web)

tepid tartan
#

@spare forum were you able to see what GPT recommended me on to look into at Udemy?

spare forum
#

Yes, but tbh just pick something, they will all be fine

unkempt apex
tepid tartan
#

I can manually browse that topic that you mentioned

spare forum
#

Check the site, there are reviews and a syllabus for each course. The top courses are always good

tepid tartan
#

For stats, should I get into it with Excel or not?

#

@spare forum

safe agate
left tartan
spare forum
tepid tartan
#

For solving problems that might require a calculator

past bramble
#

arent batch normalization layers the same as the previous layers here?

#

do they not play a role here or?

coarse nacelle
serene scaffold
spring field
#

and if you can't get a refund, remember the sunk cost fallacy
though I suppose in this case the cost may not have sunk all the way to the bottom, so if you pick up the basics, you may be able to continue with it 🤷‍♂️

verbal venture
#

So in a transformer, the queries are "questions" the model asks about surrounding context words, and the keys are the answers "information/content" about a word. But how are these values determined? What mathematics/linear transformation causes queries to become "questions" and keys to become "the information the queries are looking for". I understand the after-effect of dot-product similarity - I'm asking how the values get determined to begin with (why the attention mechanism works)?

#

I understand that once you have questions, and answers, represented in vectors, if you get dot-product similarity you get attention, but I'm asking how the questions and answers come to be in the first place. Through "weight-updates" is not a good enough answer lol

wooden sail
#

that's really pretty much it

verbal venture
#

not really

#

the transformation for Q, K and V are the same mathematically, but they represent different things

#

I'm asking how

wooden sail
#

you're thinking too hard about it tbh

#

you start with vectors in 3 different vector spaces, but you want to be able to compare their similarity

#

Q, K, and V are matrices that project vectors in those original vector spaces to the same low dimensional vector space where they can now be compared against each other with dot products

#

since the original vector spaces are generally distinct, the way you project them into the "comparison space" is different

#

as to how exactly to do it, well that's learned by showing examples of which vectors should be similar

wooden sail
#

that is how all neural networks learn

#

that is also the black box part that is not well understood

#

it's just a nasty non-convex optimization problem with no general guarantees

iron basalt
#

The original idea behind Q, K, V was indeed to act like a search system, but what they actually do is hard to tell. The current intuition seems to point in the direction that they basically sufficiently mix things (the inputs). And that the attention part that does this can be replaced with other things that also sufficiently mix things (but are more efficient, also maybe not even learned).

#

(Then the feed forward layers extract from this giant mixed soup)

#

(Bordering on something like a reservoir computer (lottery ticket hypothesis comes into play))

wooden sail
#

you have measurable data y that lives in some space Y. you have good reason to believe that the process that generates this data has parameters x in some space X. the process that generates the data is some function f, so that y = f(x), but you don't know f. so you replace it with a neural network N that has its own parameters w. so now we want y = N(w, x), but the w are unknown. if you have several examples of x and y, you can learn w. for the particular case of attention, a self attention block has w = (Q, K, V). now you present tons of labelled text to learn Q, K, and V

#

(measurable in that you measure it from somewhere, not in the measure theory sense btw)

#

and as squiggle says, and someone else has mentioned in this channel over the past few months, you can replace Q, K, and V with other kinds of transformations or force them to have special structure and they still work, because the whole idea is not really well motivated

iron basalt
verbal venture
#

okay, just forget the concept of a query and a key, just in terms of vector spaces, how does Q become Q and K become K (through math)

#

both start with the same word embeddings yeah? I'm asking how you go from word embedding * WQ -> "queries"

wooden sail
#

i'm not sure i understand your question

verbal venture
#

like mathematically/in terms of embeddings what does query mean here?

wooden sail
#

"query" is not a mathematical term

verbal venture
#

no but it is math + vector embeddings

wooden sail
#

what you tell the network to do is a weighted dot product

#

say you wanna compare a vector v to a vector u, but they're different lengths. we can multiply each by a properly sized rectangular matrix so that they now have the same lengths and we can take the dot product. let's call those Q and K, for example, so that we can do (Qu)^T (Kv)

#

and now we come along with labelled data and explicitly say "for all of these examples, (Qu)^T (Kv) should be large" (you can write this as an optimization problem)

#

you can now differentiate the cost function of your optimization problem w.r.t. Q and K and do some gradient based optimization so that they yield good results for all of the examples of u and v you chose
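
A rough sketch of the steps above, assuming NumPy, a squared-error cost, and a handful of made-up labelled pairs (the dimensions, the target value 1.0, and the learning rate are arbitrary choices, not anything from a real transformer). It projects `u` and `v` into a shared space with `Q` and `K`, then runs plain gradient descent on both matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# u and v have different lengths; Q and K map both into a shared 3-dimensional space.
dim_u, dim_v, dim_shared = 5, 7, 3
Q = 0.1 * rng.normal(size=(dim_shared, dim_u))
K = 0.1 * rng.normal(size=(dim_shared, dim_v))

# Hypothetical labelled pairs: target 1.0 means "these two should score as similar".
pairs = [(rng.normal(size=dim_u), rng.normal(size=dim_v), 1.0) for _ in range(4)]


def score(Q, K, u, v):
    """Similarity (Qu)^T (Kv) in the shared space."""
    return float((Q @ u) @ (K @ v))


def loss(Q, K):
    """Mean squared error between the scores and their targets."""
    return sum((score(Q, K, u, v) - t) ** 2 for u, v, t in pairs) / len(pairs)


initial_loss = loss(Q, K)
learning_rate = 0.01
for _ in range(2000):
    grad_Q = np.zeros_like(Q)
    grad_K = np.zeros_like(K)
    for u, v, t in pairs:
        err = score(Q, K, u, v) - t
        # d/dQ of (Qu)^T (Kv) is (Kv) u^T, and d/dK is (Qu) v^T.
        grad_Q += 2 * err * np.outer(K @ v, u)
        grad_K += 2 * err * np.outer(Q @ u, v)
    Q -= learning_rate * grad_Q / len(pairs)
    K -= learning_rate * grad_K / len(pairs)

final_loss = loss(Q, K)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.6f}")
```

Nothing in the trained `Q` or `K` is told to be a "question" or an "answer"; they are simply whatever projections make the scores match the targets.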

verbal venture
#

yeah I get that

wooden sail
#

so which part is troubling you?

verbal venture
#

I'll tell you but I need to do it through like 3 questions

#

okay word embeddings * linear query weights. What does each row/column in the query weight matrix represent?

#

this will lead to my other questions^

wooden sail
#

nothing in the real world

verbal venture
#

so the query weights represent nothing?

wooden sail
#

parameters in neural networks don't usually represent anything useful

#

the name "query" is also made up

iron basalt
#

Initially, random projections.

#

But after training, too complex to assign human meaning to.

wooden sail
#

yeah mathematically the whole matrix is a projection onto a lower dimensional vector space

verbal venture
#

okay and why are we projecting to a lower space for queries specifically?

wooden sail
#

and you find a "nice" projection that works well for the data you showed

verbal venture
#

it's just even more condensed vector representations (the lower space)?

wooden sail
#

usually for 2 reasons

#

one is simplicity. the other is that we often expect that useful data is useful precisely because it is "structured"

#

and structure often comes in the form of "low dimensionality"

#

if you explicitly account for this in your model, you end up with fewer parameters and also force the network to find out this structure... hopefully, at least

#

you've seen bottlenecks in other architectures like u-nets with CNNs, for example. natural images are "low dimensional", for a proper definition of "low dimensional"

iron basalt
#

Also with too many dimensions combinatorics explode, and that's not something we can compute even with super computers in a reasonable amount of time.

verbal venture
#

okay, so what does each row + column represent in the context of queries?

wooden sail
#

nothing

verbal venture
#

dude that's not possible

wooden sail
#

you can think of the rows as projection vectors if you like

wooden sail
#

it's a heuristic that works very well but is not well understood

verbal venture
#

well for starters each row is each word as a query right? and each column is one "feature" of the query space?

wooden sail
#

sure

#

well

#

"correspond to", not "is"

iron basalt
# verbal venture dude that's not possible

Imagine you are sitting in front of a box with a bunch of dials, and you have a light on top of the box. Your goal is to change the dials such that the light is as bright as possible. So you start moving the dials and they seem to have various non-linear effects on the brightness. But with a ton of trial and error you end up with a bright light. What do the values of the dials represent?

wooden sail
#

the numbers are weights of linear transformations applied to the input vectors (words or sentence vectors, for example)

iron basalt
#

It's just a mechanical detail of how the values are stored.

verbal venture
#

okay let me ask another question. I understand that if the cosine similarity of 2 vectors equals 1, they point in the same direction. So a big thing in deep learning is the distributional hypothesis, which is that words that appear in similar contexts mean similar things

#

yeah?

#

so you use word2vec to find the embeddings of each word, and once they're in a shared vector space the distance between them is their semantic similarity. yeah?

#

give me 1 min

verbal venture
#

but semantically speaking would be equal

wooden sail
#

you have to learn the embedding just like Q, K, and V are learned

verbal venture
#

right, but there is a logic behind that. Words in similar contexts = similar meanings

wooden sail
#

that is your motivation

#

and that is what you hope the embedding achieves

#

it's not guaranteed that that is what it does

#

you train it in hopes that you achieve that. that motivates how you perform the embedding and which data you show

verbal venture
#

so queries = what the word is "looking for", keys = "words that can be offered". But what is the logic behind getting "looking for" vectors and "words that can be offered"? how does that happen? I know word2vec is fundamentally black box but the distributional hypothesis makes sense. What is the "logic" of Q, K and V? how do Q, K and V vectors come to be?

wooden sail
iron basalt
#

DL, no guarantees. It's "feels like a good idea" + converted into math + "it seems to work for a bunch of data, maybe this is due to my idea" (but could also be due to other things that are side effects of your idea, you are going off of correlation here).

wooden sail
#

but i have to highlight again that this is motivation only

#

just as squiggle says. you get inspiration from something and then try to make an architecture that will promote the behavior you want

#

you have no guarantees it will work

#

you also have no guarantees that even if it works, it does so because of your choices

#

if you make any network deep enough, it'll work regardless of architecture

iron basalt
#

To get down to what actually mattered, you have to do what many are doing now, which is replacing parts like the attention part with something else, and if it still works that was not it (or what they both have in common).

verbal venture
#

well just in terms of what we think how Q, K and V comes to be. Attention makes sense due to cosine similarity, but before you get Q * K = Attention, you need to first get queries and then keys

#

it doesn't need to be a perfect explanation, just maybe what the current guess is as to how it works

wooden sail
#

they had motivation to try that architecture

agile anvil
wooden sail
#

whether that is why it worked is a different question, currently under research

iron basalt
#

But it also seems pretty clear that it's doing way more work than needed, so it's kind of a brute force approach.

verbal venture
wooden sail
#

like how databases store key-value pairs and you query the database to fish out the values you want, so the query needs to specify which keys to look for
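
To make the database analogy concrete, here is a small sketch assuming NumPy, with made-up keys and values: a hard dictionary lookup next to a "soft" lookup, where every value contributes in proportion to how well its key matches the query.

```python
import numpy as np

# Hard lookup: the query must match a key exactly.
table = {"cat": 1.0, "dog": 2.0, "bird": 3.0}
hard_result = table["dog"]

# Soft lookup: keys, values and the query are vectors; every value contributes,
# weighted by how well its key matches the query.
keys = np.array([[1.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0]])
values = np.array([[1.0], [2.0], [3.0]])


def soft_lookup(query, keys, values):
    scores = keys @ query                      # one dot product per key
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights @ values, weights


# A query pointing strongly along the second key behaves almost like table["dog"].
result, weights = soft_lookup(np.array([0.0, 10.0, 0.0]), keys, values)
print(hard_result, result, weights)
```

A weaker or more ambiguous query would spread the weights across several keys and return a blend of their values, which a plain dictionary cannot do.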

#

that's really about it

#

you're looking for a deeper meaning where there isn't one tbh

verbal venture
#

I just don't know how someone can say queries (what the model is asking) * keys (what the vectors are offering) = attention, when no one knows how queries or keys even came to be

iron basalt
verbal venture
#

I think the problem is I'm actually looking for a super simple explanation (like how word2vec works via "words that are in similar contexts = the same"), but you guys know a lot about DL so you're actually trying to offer a deeper meaning that hasn't been discovered yet

#

I think I'm going to find what I'm looking for through the computation graph of the forward pass

wooden sail
#

and since matrix multiplication can be written in terms of dot products, which when properly scaled are equivalent to cosine similarity, you can write that as query*key = value (weight)
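
A short illustration of the scaling point, assuming NumPy and three arbitrary vectors: the raw dot product grows with vector length, while dividing by the norms leaves only the direction, and one matrix product of row-normalized matrices gives all pairwise similarities at once.

```python
import numpy as np


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


a = np.array([1.0, 2.0, 3.0])
b = 10.0 * a                      # same direction, ten times longer
c = np.array([3.0, -1.5, 0.0])    # a @ c = 3 - 3 + 0 = 0, so orthogonal to a

print(a @ b, cosine_similarity(a, b))
print(a @ c, cosine_similarity(a, c))

# For a whole batch, one matrix product of row-normalized matrices gives every
# pairwise cosine similarity at once.
M = np.stack([a, b, c])
M_unit = M / np.linalg.norm(M, axis=1, keepdims=True)
pairwise = M_unit @ M_unit.T
```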

#

there's no reason why it should be cosine similarity that is used

iron basalt
verbal venture
#

well doesn't that part make sense because you can compare vector direction

wooden sail
#

right, other than matmul being fast

#

sure, but if you want accuracy there are many other choices of similarity you might consider

#

and more importantly, this motivation does not mean this is what the network is doing either

iron basalt
#

DL works at all because we conveniently had an entire evolution of GPUs (for games / movies / etc) that became more general over time and happen to work well for this.

verbal venture
#

oh I see, you're saying what we planned vs what the model does may be unaligned

iron basalt
wooden sail
#

but the motivation was kinda like that. get some ideas from how dbs work, and notice that matmul is fast, can encompass a large number of tokens at the same time, and also represents "similarity" in some sense

iron basalt
verbal venture
#

so you're saying the transformer is doing X (what we think it's doing/what the authors designed it to do) but it could really be doing Y

#

and we just don't know

wooden sail
#

there are several papers looking at it, replacing different parts of the model, enforcing structure on Q, K, V, etc

iron basalt
#

For example, if you remove the position encoding it will learn to do it itself.

#

(And it ends up mimicking grid cells to do this (found in biology for positioning systems))

verbal venture
#

are you guys aware of any OG papers that explain why deep learning works (or could work)

#

example universal function approximation theorem

#

I heard the OG papers talk about why it works more than the current published stuff

wooden sail
#

there are several papers presenting different forms of universal approximation

#

you can also check out papers exploring the different optimizers, since they discuss nonconvex optimization and stochastic approximation

wooden sail
#

most of the general results in universal approximation are not constructive

#

e.g. they say stuff like "as the number of layers goes to infinity", or "there is a number N of layers so that if you have a network with n > N layers, the error falls below epsilon", but they don't say what N is nor how it can be found

#

so the idea is "more layer good"

#

there are recent papers for specific architectures explaining under which conditions the training error goes to 0 though

#

iirc for unrolled ADMM and LISTA with relu activations. there might be others

#

for universal approx you can just follow the references in wikipedia

verbal venture
#

ah ok, ty

#

just wondering are you both phds?

wooden sail
#

working on it

verbal venture
#

word you did a masters in AI?

#

and how good should someone's math be for phd? is an undergrad in math enough?

gaunt wren
#

So, I've got a random data set with seemingly random columns (id <string>, f_0 to f_9 <random? numbers, although there seems to be a certain distribution>)

#

I've got this data from an interviewer telling me to perform.. something?

#

basically i havent been given a specific task, just what the result should look like - IDs and 1s or 0s attached to them

#

any tips on how I should start?

verbal venture
#

you can link the homework if you want. visual example is better

gaunt wren
#

example data

#

example target (yes, the IDs do match)

#

does this help

#

the task itself just says "figure it out"

#

distribution in one pic

#

and this is as far as correlation coefficients go

spare forum
gaunt wren
#

properly adjusted

#

there seems to be a pretty good correlation for everything, though

dire island
#

hello, does anyone know how to plot a heatmap with matplotlib.plotly

agile cobalt
#

do you mean matplotlib.pyplot?
plotly is an entirely different package

#

iirc the default way is just using imshow though - pretty sure they have an example for it in the docs and/or gallery

dire island
#

sorry, meant matplotlib.pyplot

#

I'm trying to learn, understand, and practice using libraries like pandas, seaborn, numpy and so on. I can create a heatmap in seaborn, but I was asked for the same map with matplotlib for the 'car_crashes' dataset

spare forum
#

Generally I do like sns.heatmap(round(df.corr(),2), annot=True)
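
A matplotlib-only counterpart to the seaborn call above could look roughly like this. It is a sketch that assumes NumPy and matplotlib are installed and uses synthetic columns rather than the real dataset; the helper names `correlation_matrix` and `plot_heatmap` are made up here, and the column labels are only stand-ins.

```python
import numpy as np


def correlation_matrix(data):
    """Pearson correlations between the columns of a 2-D array, rounded to 2 places."""
    return np.round(np.corrcoef(data, rowvar=False), 2)


def plot_heatmap(corr, labels, path):
    """Draw an annotated heatmap with plain matplotlib and save it to `path`."""
    import matplotlib
    matplotlib.use("Agg")                     # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(5, 4))
    image = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
    fig.colorbar(image, ax=ax)
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=45, ha="right")
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    # Write each coefficient into its cell, like annot=True in seaborn.
    for i in range(corr.shape[0]):
        for j in range(corr.shape[1]):
            ax.text(j, i, f"{corr[i, j]:.2f}", ha="center", va="center")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Synthetic numeric columns standing in for the real dataset.
rng = np.random.default_rng(0)
speeding = rng.normal(size=50)
alcohol = 0.8 * speeding + 0.2 * rng.normal(size=50)
premium = rng.normal(size=50)
data = np.column_stack([speeding, alcohol, premium])
corr = correlation_matrix(data)
# Uncomment to write the image to disk:
# plot_heatmap(corr, ["speeding", "alcohol", "ins_premium"], "heatmap.png")
```

With a pandas frame, `df.corr(numeric_only=True).to_numpy()` can replace `correlation_matrix`, and the frame's column names can serve as the labels.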

gaunt wren
#

alr did, factorized some non-numerical stats and added the target column

#

I'd assume the target is achieved by combining columns that have lower corr coeff

#

also worth mentioning, this is probably not the way to check correlation between categorical and numerical data

ancient copper
#

Hey guys, can someone help me? I’m having trouble importing a TICKER with yfinance. I want to hide the error if the ticker doesn’t exist in the Yahoo Finance database, but currently, the terminal shows this error:

violet gull
#

catch the error

quaint mulch
#

or now that you have a target, just plot each against the target

quaint mulch
quaint mulch
verbal venture
#

I don’t think it’s a good idea to reference any heavy-hitting DL stuff after just doing MLP

quaint mulch
verbal venture
#

I think there’s a lot of angles to study deep learning but from your list you went from KA Math -> NYU deep learning course

quaint mulch
#

Well... there is the andrew ng course in the middle hahaha

#

and yes, there is a "speed run" feel to my list, you are correct

verbal venture
#

I think for PhD you need a shit ton of math

quaint mulch
#

That's true. My list was not specifically for PhD.
But I think you missed what I meant.
What I meant was for you to look at the content of "Basic Concepts" https://github.com/aprbw/ArianDLPrimer/tree/master?tab=readme-ov-file#basic-concepts

and not "Basics (from literal zero)" https://github.com/aprbw/ArianDLPrimer/tree/master?tab=readme-ov-file#basics-from-literal-zero

So, if you can follow along with the math in https://arxiv.org/abs/2104.13478 and https://arxiv.org/pdf/2304.12210, then I think you can be confident to start a PhD

verbal venture
#

so that last matrix highlighted cell represents a weighted sum of all the words' features along a single dimension * the attention score of word 1 x word 1. I'm just wondering what getting a weighted sum of all the values * the attention score does?

iron basalt
# verbal venture so that last matrix highlighted cell represents a weighted sum of all the words'...

I skimmed this video, seems like it answers your questions: https://www.youtube.com/watch?v=eMlx5fFNoYc

verbal venture
#

the answer is the highlighted cell is just the final "meaning" of 1 feature of the word yeah? you take the word (attention row 1) * all words' values (col 1), get R1C1 in the ouput matrix. The weighted sum = the attention scores of that word with all other words * the contribution of the meaning of all other words (value vectors col 1), results in how much 1 feature from all words contribute both attention + semantically to word 1?
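
The cell being described can be checked numerically. This is a sketch assuming NumPy, with random matrices standing in for learned weights (`W_q`, `W_k`, `W_v` and all sizes are illustrative): it runs one self-attention head and recomputes the top-left output cell by hand as a weighted sum.

```python
import numpy as np

rng = np.random.default_rng(0)

n_words, d_model, d_head = 4, 8, 3
X = rng.normal(size=(n_words, d_model))      # one embedding per row

# Random stand-ins for the learned projection matrices.
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))

Q, K, V = X @ W_q, X @ W_k, X @ W_v          # each (n_words, d_head)

# Scaled dot-product scores, then a softmax over each row.
scores = Q @ K.T / np.sqrt(d_head)
scores -= scores.max(axis=1, keepdims=True)  # for numerical stability
attn = np.exp(scores)
attn /= attn.sum(axis=1, keepdims=True)      # each row sums to 1

output = attn @ V                            # (n_words, d_head)

# The top-left cell: word 1's attention weights times feature 1 of every value vector.
cell_00 = sum(attn[0, j] * V[j, 0] for j in range(n_words))
print(output[0, 0], cell_00)
```

Because each row of `attn` sums to 1, every output row is a weighted average of the value vectors, so word 1's new representation is a blend of all the words' values, mixed according to its attention weights.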

odd stratus
untold fable
#

how and where to learn ai

#

and data science

#

and machine learning and deep learning

violet gull
#

youtube

untold fable
#

on youtube there are a ton of resources

#

which ones teach from the basics?

verbal venture
quaint mulch
wooden sail
lapis sequoia
#

Hey, about to download 9,000,000 images to train a model. Where should I store them for easy access? I’m worried that if I put them on S3 I’ll have to redownload it all if I want to train something.

odd stratus
#

it might be best to preprocess the images down to the model's input size, so you don't have to resize them during training and you also save storage space

past bramble
#

I'm having a hard time calculating the shapes of conv2d and conv2d transpose layers with padding, strides and kernel size
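
For reference, the usual per-axis shape formulas can be written as small helpers. This is a sketch that assumes dilation 1; the first two functions use explicit integer padding (the PyTorch-style convention), and the last two use "same" padding as in Keras. The function names are made up for this example.

```python
import math


def conv2d_out(size, kernel, stride=1, padding=0):
    """Output length along one axis for a convolution with explicit padding."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_transpose_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Output length along one axis for a transposed convolution (explicit padding)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def conv2d_out_same(size, stride=1):
    """'same' padding: the output length depends only on the stride."""
    return math.ceil(size / stride)


def conv2d_transpose_out_same(size, stride=1):
    """'same' padding for a transposed convolution: the length is multiplied by the stride."""
    return size * stride


# Downsampling 28 -> 14 -> 7, then upsampling 7 -> 14 -> 28.
down1 = conv2d_out(28, kernel=3, stride=2, padding=1)
down2 = conv2d_out(down1, kernel=3, stride=2, padding=1)
up1 = conv2d_transpose_out(down2, kernel=4, stride=2, padding=1)
up2 = conv2d_transpose_out(up1, kernel=4, stride=2, padding=1)
print(down1, down2, up1, up2)
```

Height and width are handled independently, so the same function is applied once per axis; the channel count is just the layer's number of filters.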