hasty mountain May 30, 2023, 12:13 AM

#

Hey guys, is it possible to make a Variational AutoEncoder where the Encoder, instead of outputting the mean and the log variance of a distribution (from which the latent vector is sampled), outputs the mean and the standard deviation directly?

I've been studying VAEs and I'm quite troubled by the fact that I can't manage to make a model that works properly, so I'm trying to review some things such as the KL-Divergence for the Encoder loss, the Gaussian Likelihood for the Decoder (not MSE), and my Encoder output...

For now, I've been making the Encoder output the standard deviation (or at least I think that's what it outputs) alongside the mean. I've been using a KL-Divergence loss for the Encoder, though I've seen that the KL Divergence also has a "closed form" which is useful for multivariate Gaussians, and a Gaussian Likelihood for the Decoder instead of MSE (because I find the idea behind Likelihood more mathematically correct and interesting)
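
For reference, the closed-form KL term and a Gaussian log-likelihood fit in a few lines. This is only a minimal sketch in plain Python, assuming a diagonal Gaussian posterior, a standard normal prior, and a decoder with a fixed `sigma`; the function names are made up for illustration.

```python
import math

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, exp(logvar)) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))

def gaussian_nll(x, x_mean, sigma=1.0):
    """Negative log-likelihood of x under N(x_mean, sigma^2), summed over pixels."""
    const = math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
    return sum(const + 0.5 * ((xi - mi) / sigma) ** 2 for xi, mi in zip(x, x_mean))

# Toy numbers, not taken from any real model
mu, logvar = [0.0, 0.5], [0.0, -1.0]
print("KL term:", kl_to_standard_normal(mu, logvar))
print("decoder NLL:", gaussian_nll([0.2, 0.8], [0.25, 0.7]))
```

With a fixed `sigma` the NLL is a scaled squared error plus a constant, so the difference from MSE only really shows up once `sigma` is learned or tuned.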

earnest widget May 30, 2023, 5:30 AM

#

What is the use of global average pooling? Does it make any difference in terms of performance for image classification?

#

I went through this link but I did not understand what the advantage is. https://stackoverflow.com/questions/46036522/defining-model-in-keras-include-top-true

Stack Overflow

Defining model in keras (include_top = True)

Can somebody tell me what include_top= True means when defining a model in keras?

I read the meaning of this line in Keras Documentation. It says include_top: whether to include the fully-connected

dusk tide May 30, 2023, 6:12 AM

#

thanks

past meteor May 30, 2023, 6:41 AM

#

hasty mountain Hey guys, is it possible to make a Variational AutoEncoder where the Encoder, in...

The log is just there so sigma is positive if I remember correctly

#

You can do it but idk how well it would play around with relu

errant lake May 30, 2023, 6:44 AM

#

dusk tide thanks

👌

dense crane May 30, 2023, 8:25 AM

#

is it normal that acc drops and then goes back to growing?

#

#

like the question is more like should i keep going or interrupt the training already?

cold osprey May 30, 2023, 8:32 AM

#

Is that train loss/accuracy or test?

#

I would just leave it to 30 epochs and plot a loss /accuracy graph to have a look
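
One way to turn "leave it running and look at the curve" into a rule is a patience check on the validation accuracy. A minimal sketch, assuming you record one validation accuracy per epoch; the numbers and the `patience` value below are invented for illustration.

```python
def best_epoch_with_patience(val_acc, patience=5):
    """Return (best_epoch, stop_epoch) for a list of per-epoch validation accuracies."""
    best_epoch, best_acc = 0, val_acc[0]
    for epoch, acc in enumerate(val_acc):
        if acc > best_acc:
            best_epoch, best_acc = epoch, acc
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch  # no improvement for `patience` epochs
    return best_epoch, None  # never ran out of patience

# Accuracy dips for a few epochs, then recovers
history = [0.52, 0.61, 0.66, 0.63, 0.60, 0.64, 0.69, 0.72]
print(best_epoch_with_patience(history, patience=5))
print(best_epoch_with_patience(history, patience=2))
```

With a generous patience the temporary dip is ignored, while a very small patience would have stopped the run before the recovery.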

oblique quarry May 30, 2023, 8:39 AM

#

I've got 2 questions. Do i have to update the kernel based on backprop? if yes, how do i do that? And can i add 2 kernel outputs together (I mean yes i could cuz they're the same shape) but would it have any benefits? https://paste.pythondiscord.com/xizuciqego

mint palm May 30, 2023, 9:34 AM

#

this is my model

model = BlipModel.from_pretrained("Salesforce/blip-image-captioning-base")

this model takes input frames of size (3, 224, 224) and outputs an embedding of size 512.
on one gpu it can take 40 frames (40, 3, 224, 224).
I want to use 4 gpus so that it can take 160 frames.
but i am having trouble with the gpu parallelism implementation:

model = BlipModel.from_pretrained("Salesforce/blip-image-captioning-base")
device_ids = [0, 1, 2, 3]
model = nn.DataParallel(model, device_ids=device_ids)

after this do i need to split the input? but splitting would mean processing one part at a time, so we would NOT actually be making it efficient
let us assume video_data is a (160, 3, 224, 224) sized input, how do i use 4 gpus now to get a (160, 512) output
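
On the "do I need to split the input?" part: as far as the PyTorch docs describe it, `nn.DataParallel` splits the batch along dim 0 itself, runs the replicas in parallel, and gathers the outputs on the first device, so the whole batch goes in as one call. A minimal sketch, assuming a hypothetical `TinyEncoder` stand-in instead of the real BLIP model and a small batch so it also runs without GPUs:

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Hypothetical stand-in for the real model: (N, 3, 224, 224) -> (N, 512)."""
    def __init__(self):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(3, 512)

    def forward(self, frames):
        return self.proj(self.pool(frames).flatten(1))

model = TinyEncoder()
if torch.cuda.device_count() > 1:
    # DataParallel scatters the batch along dim 0 across the visible GPUs
    model = nn.DataParallel(model)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device).eval()

video_data = torch.randn(16, 3, 224, 224)  # would be 160 frames on 4 GPUs
with torch.no_grad():
    embeddings = model(video_data.to(device))  # one call, no manual splitting
print(embeddings.shape)
```

One caveat: only calls that go through `forward` are split this way. Calling some other method via `model.module` runs on a single device.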

dense crane May 30, 2023, 9:48 AM

#

cold osprey I would just leave it to 30 epochs and plot a loss /accuracy graph to have a loo...

yea but i realised that maybe it was too little data for each class so now i have 1000 per class instead of 200 so i guess it should improve acc a lot

hasty mountain May 30, 2023, 12:43 PM

#

past meteor You can do it but idk how well it would play around with relu

Yeah, that's exactly what I was doing. The only problem is that I can't use Gaussian initialization for the encoder, since it may generate 0s as outputs.

#

I'll see how using variance goes (if Colab allows me). Maybe using ReLU for the standard deviation restricts the values the model can output a bit (no negatives, for example)...idk...

past meteor May 30, 2023, 1:05 PM

#

hasty mountain Yeah, that's exactly what I was doing. The only problem is that I can't use Gaus...

Any reason why you can't use the default initialization (Glorot or He) and why you're going with gaussian?

hasty mountain May 30, 2023, 1:06 PM

#

past meteor Any reason why you can't use the default initialization (Glorot or He) and why y...

Nope. That's why I wasn't using gaussian init for my encoder at that time

past meteor May 30, 2023, 1:47 PM

#

By the way, I used to think that having negatives in your initialization caused dead relu immediately, which isn't the case. It's perfectly fine for things to be negative or exactly 0.

errant bison May 30, 2023, 1:48 PM

#

what neural nets can i use for ANPR?

#

or any tutorial for the same?

#

can u tell for the same but using neural nets?

potent sky May 30, 2023, 2:28 PM

#

errant bison can u tell for the same but using neural nets?

Everything I've mentioned is implemented using Neural Nets. Except the last part where I've specifically mentioned "traditional image processing"

errant bison May 30, 2023, 2:32 PM

#

potent sky Everything I've mentioned is implemented using Neural Nets. Except the last part...

YOLO is neural nets?

potent sky May 30, 2023, 2:33 PM

#

YOLO is an algorithm, that employs neural nets, yes

errant bison May 30, 2023, 2:34 PM

#

ohh

#

so R-CNN?

hasty mountain May 30, 2023, 3:40 PM

#

past meteor By the way, I just to think that having negatives in your initialization caused ...

Yes but...suppose that I have 3 image samples, and their encoded versions would give a var of, like, -1.2, -1.14 and -0.7.
With the ReLU, those 3 images would all be encoded into 0., all three of them, which I suppose might be problematic
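
That collapse is easy to check with those three numbers. A minimal sketch in plain Python comparing three ways of turning a raw encoder output into a standard deviation; the helper names are made up, and `softplus` is just one common alternative, not something from this conversation.

```python
import math

def std_relu(raw):
    # ReLU clamps every negative output to exactly 0
    return max(raw, 0.0)

def std_softplus(raw):
    # softplus(x) = log(1 + exp(x)), written in a numerically stable form
    return max(raw, 0.0) + math.log1p(math.exp(-abs(raw)))

def std_from_logvar(raw):
    # treat the raw output as log(variance); std = exp(0.5 * logvar)
    return math.exp(0.5 * raw)

raw_outputs = [-1.2, -1.14, -0.7]
for name, fn in [("relu", std_relu),
                 ("softplus", std_softplus),
                 ("exp(0.5*logvar)", std_from_logvar)]:
    print(name, [round(fn(r), 3) for r in raw_outputs])
```

ReLU maps all three samples to the same zero spread, while the other two keep them distinct and strictly positive. That is the practical reason the log-variance output is so common.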

past meteor May 30, 2023, 3:51 PM

#

hasty mountain Yes but...suppose that I have 3 image samples, and the encoded version of they w...

You mean the variance might be encoded to 0? Are we still talking about what happens right before the sample step?

hasty mountain May 30, 2023, 3:52 PM

#

:pithink:

#

My encoder outputs a mean and a variance through a fully connected layer. So, if the FC layer for the variance is followed by a ReLU activation...it could be encoded to 0

Then, I'd have a normal distribution with no variance at all...

#

Now that I'm thinking about it...perhaps this could be catastrophic...🥲

past meteor May 30, 2023, 3:54 PM

#

In all honesty, I'm a bit out of depth here so I don't think I can help you either way. I don't really understand your problem domain. Haven't worked with VAE's specifically, just looked at the math. From that perspective I know that that's the reason why log(variance) is the output

#

I've worked a lot with regular conv autoencoders hence why I bothered wasting your time at all :/

hasty mountain May 30, 2023, 3:55 PM

#

It's exactly because I'm a bit of a stranger to the maths that I thought I could use standard deviation directly...but I didn't think of those things

hasty mountain May 30, 2023, 3:57 PM

#

past meteor I've worked a lot with regular conv autoencoders hence why I bothered wasting yo...

Too bad latent diffusion goes for VAE.
I thought about using a vanilla autoencoder, but I guess it could hurt the generation process

past meteor May 30, 2023, 3:58 PM

#

hasty mountain *Exactly because I'm a bit stranger to the maths that I thought I could use stan...

You can always just watch deepmind's lecture on variational inference. Part of it goes into VAE's

hasty mountain May 30, 2023, 3:59 PM

#

past meteor You can always just watch deepmind's lecture on variational inference. Part of i...

Is it in coursera? Can you send the link?

past meteor May 30, 2023, 4:00 PM

#

Just on YouTube. Look for their channel and their DL series. It's class. Can't send you the link right now because I'm going to start my commute.

hasty mountain May 30, 2023, 4:01 PM

#

I found it. Thanks!

potent sky May 30, 2023, 4:05 PM

#

errant bison so R-CNN?

YOLO can use plain CNNs
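For object detection, YOLO is generally faster at inference than R-CNN, Fast R-CNN and Faster R-CNN, since it predicts boxes and classes in a single pass instead of going through a separate region-proposal stage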

dense crane May 30, 2023, 4:34 PM

#

how can i open an .ipynb file stored on my local machine, with the data also on my local machine (i'm now moving it to google drive), in google colab?

agile cobalt May 30, 2023, 4:36 PM

#

move both the file and the data to google colab

#

the code runs on their servers, and it can access files on your Google Drive

past meteor May 30, 2023, 4:38 PM

#

What I used to do is version control a folder on my local machine and clone my repo on google colab

#

It's a lot of overhead initially because you need to make a token on git etc. but it makes moving between your local machine and colab a lot smoother

placid cedar May 30, 2023, 4:43 PM

#

hi again

#

apparently chatgpt says i should do the numerical variable transformation before splitting the data...

#

is it ok to do so before train test split?

past meteor May 30, 2023, 4:44 PM

#

What do you understand as "numerical variable transformation"

agile cobalt May 30, 2023, 4:45 PM

#

generally speaking you must apply any transformations you make to the data to both the training and the test groups

placid cedar May 30, 2023, 4:45 PM

#

since im doing a linear regression model, i have to ensure that my numerical variables follow the straight reference line on the Q-Q plots

hasty mountain May 30, 2023, 4:45 PM

#

past meteor It's a lot of overhead initially because you need to make a token on git etc. bu...

It would be cool to do something like...hosting your data in your own server and accessing this server directly through your colab file :pithink:

placid cedar May 30, 2023, 4:45 PM

#

as far as what i was taught to do

past meteor May 30, 2023, 4:46 PM

#

So you mean like a log transformation?

placid cedar May 30, 2023, 4:46 PM

#

placid cedar since im doing a linear regression model, i have to ensure that my numerical var...

so i wld do trial and error using different transformation methods like power transformation

past meteor May 30, 2023, 4:46 PM

#

Or box-cox, ...

placid cedar May 30, 2023, 4:46 PM

#

log

past meteor May 30, 2023, 4:46 PM

#

Are you using sklearn?

placid cedar May 30, 2023, 4:47 PM

#

yep

#

im allowed to use any libraries in fact

past meteor May 30, 2023, 4:47 PM

#

So you should make a Pipeline that does a log transform before applying the regression

#

Tbh a log transform doesn't depend on the data in your train or test set but honestly if you're beginning I'd really try to keep them 100 % separate

placid cedar May 30, 2023, 4:48 PM

#

i actually really wanted to see the best mse and r square test results i can get based on my trial and error, testing out all the different methods

#

so i was firm with splitting the data into train and test

#

then i do my transformation

#

to find the best results or smth

past meteor May 30, 2023, 4:49 PM

#

Split -> QQ plot -> log/box cox / ...

dense crane May 30, 2023, 4:49 PM

#

agile cobalt the code runs on their servers, and access files on your Google Drive

ok thx, but how do i use their gpu? i mean do i have to move the data to the gpu somehow or does it all happen automatically?

placid cedar May 30, 2023, 4:49 PM

#

so far i progressed quite a lot actually, i handled my outliers, imputation complete, encoding done as well

past meteor May 30, 2023, 4:50 PM

#

If you use Pipeline you ensure the same transformations are applied on train and test as well, like agile cobalt said
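
A rough idea of what such a Pipeline could look like. This is a minimal sketch on synthetic data, assuming a single positive feature and `np.log1p` as the transform; the data, step names, and split sizes are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Synthetic data where the target is linear in log(1 + x)
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 100.0, size=(200, 1))
y = 3.0 * np.log1p(X[:, 0]) + rng.normal(0.0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = Pipeline([
    ("log", FunctionTransformer(np.log1p)),  # stateless log transform
    ("reg", LinearRegression()),
])
model.fit(X_train, y_train)  # every step is fitted on the training split only
print("test R^2:", model.score(X_test, y_test))
```

For a transform that does learn something from the data, such as `PowerTransformer(method="box-cox")`, the same structure keeps its parameters estimated on the training split and merely applied to the test split.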

placid cedar May 30, 2023, 4:51 PM

#

past meteor Split -> QQ plot -> log/box cox / ...

so to find the best transformation, should i base it on the Q-Q plot or my MSE and R-SQUARE results

agile cobalt May 30, 2023, 4:51 PM

#

dense crane ok thx, but how do i use their gpu i mean am i have to move the data to gpu some...

depends on which library you are using, some have methods like `tensor.to(device)`

#

check the docs
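
In PyTorch, for instance, nothing moves to the GPU on its own: both the model and each batch have to be sent there. A minimal sketch, assuming PyTorch and a toy linear layer in place of a real model:

```python
import torch
import torch.nn as nn

# Use the Colab GPU when one is attached, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)    # parameters now live on `device`
batch = torch.randn(4, 10).to(device)  # inputs must be moved as well
output = model(batch)
print(output.device)
```

On Colab the runtime type also has to be switched to a GPU one, otherwise `torch.cuda.is_available()` stays `False` and the code silently runs on the CPU.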

placid cedar May 30, 2023, 4:51 PM

#

because everyone in my class is competing to get the best mse and r square scores LOL

past meteor May 30, 2023, 4:51 PM

#

placid cedar so to find the best transformation, should i base it on the Q-Q plot or my MSE a...

The way I do it is by making predictions and then visualizing the residuals (the error) I'm making with respect to all variables

dense crane May 30, 2023, 4:52 PM

#

agile cobalt depends on which library are you using, some have methods like `tensor.to(device...

you mean like pytorch or tf?

past meteor May 30, 2023, 4:52 PM

#

If you see that the error is not normally distributed with respect to a variable then you can start thinking about log transformations etc.

agile cobalt May 30, 2023, 4:52 PM

#

dense crane you mean like pytorch or tf?

yes.
again, check the documentation of the library you are using

past meteor May 30, 2023, 4:52 PM

#

It's hard to explain tbh but this is generally a solid method

dense crane May 30, 2023, 4:53 PM

#

ok thx man

placid cedar May 30, 2023, 4:54 PM

#

how wld i know what's the best transformation tho

#

my teacher always tells us abt the q-q plot, in which u shld get the most data points on the linear regression line

past meteor May 30, 2023, 4:54 PM

#

Imagine if your x-axis is time and your y-axis is the error (on the test set), if the error is increasing along time then you can consider a log transform for example

#

https://www.statisticshowto.com/residual-plot/ this explains it better than I'm doing right now. Also look at this link: https://www.sportsci.org/resource/stats/logtrans.html
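
To make the residual idea concrete, here is a small sketch in plain Python. It assumes synthetic data that grows exponentially with about 10% multiplicative noise, fits a straight line to the raw target and to its log, and compares the residuals; all names and numbers are invented.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def residuals(xs, ys):
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Exponential growth with +/-10% multiplicative noise
xs = list(range(30))
ys = [math.exp(0.2 * x) * (1.1 if x % 2 == 0 else 0.9) for x in xs]

raw_resid = residuals(xs, ys)
log_resid = residuals(xs, [math.log(y) for y in ys])

print("largest raw residual:", max(abs(r) for r in raw_resid))
print("largest log residual:", max(abs(r) for r in log_resid))
```

Plotted against `xs`, the raw residuals bend in a clear curve that gets worse toward the end, while the log residuals stay in a narrow band around zero. That change in the residual pattern is the signal to look for when choosing a transformation.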

placid cedar May 30, 2023, 4:57 PM

#

aite will look into it

past meteor May 30, 2023, 4:58 PM

#

For example, I had a demand forecasting dataset once and I noticed that the error exploded in early April. Your residual plot "tells" you you're not modelling the impact of Easter accurately

dense crane May 30, 2023, 5:29 PM

#

is it normal that reading data into google colab takes 7 min while the same operation in vs code was taking like 1 min

spiral inlet May 30, 2023, 6:28 PM

#

Anyone interested in participating in this competition?
https://www.kaggle.com/competitions/playground-series-s3e16/

Regression with a Crab Age Dataset

Playground Series - Season 3, Episode 16

potent sky May 30, 2023, 7:14 PM

#

past meteor What I used to do is version control a folder on my local machine and clone my r...

same. and you can just run the script to set it up, don't have to manually do it every time

potent sky May 30, 2023, 7:17 PM

#

hasty mountain It would be cool to do something like...hosting your data in your own server and...

yeah would be cool but your internet would be used for the upload, instead of github's ;-;

potent sky May 30, 2023, 7:18 PM

#

dense crane is it noraml that reading data to google colab takes 7 min while the same opeara...

wdym? reading data from a csv file into colab? then generally no

dense crane May 30, 2023, 7:18 PM

#

the bunch of .jpeg files

#

@potent sky

potent sky May 30, 2023, 7:20 PM

#

if they're on the colab runtime storage then no, they shouldn't take that much longer

dense crane May 30, 2023, 7:22 PM

#

i have the data on g drive and i did mount the gdrive

potent sky May 30, 2023, 7:22 PM

#

then again, it shouldn't take that much longer. Unless you have too many files in your gdrive

#

tbh that shouldn't make it much slower after mounting either

dense crane May 30, 2023, 7:24 PM

#

i mean the dataset is like 3GB, and i always take only part of it, like 25%. the diff is that when i do the same operation in vs code it doesnt take that much time

#

like at most 2min while in colab it would be around 10-15 min

potent sky May 30, 2023, 7:25 PM

#

Are you just talking about loading in the data or are you performing some processing on it too

dense crane May 30, 2023, 7:26 PM

#

both

#

i m loading the images and then tensoring them

potent sky May 30, 2023, 7:27 PM

#

Compare just loading times for both

#

Also why are you loading the entire dataset into memory at once

dense crane May 30, 2023, 7:27 PM

#

because i m training the model

#

like not the whole

#

but like 5k images

#

to train a model

#

and the whole dataset have like 30k

potent sky May 30, 2023, 7:29 PM

#

You can retrieve on the fly

dense crane May 30, 2023, 7:30 PM

#

not sure if i understand

potent sky May 30, 2023, 7:30 PM

#

potent sky Compare just loading times for both

Still try this.
The key is to identify the bottleneck first.
Then you can figure out how to get rid of it
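
Timing the two stages separately only takes a small helper. A minimal sketch using the standard library, where `load_files` and `preprocess` are hypothetical stand-ins for "read the JPEGs" and "turn them into tensors":

```python
import time

def timed(label, fn, *args):
    """Run fn(*args), print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed

# Hypothetical stand-ins for the real loading and preprocessing steps
def load_files(n):
    return [bytes(1024) for _ in range(n)]

def preprocess(blobs):
    return [sum(b) for b in blobs]

blobs, load_s = timed("load", load_files, 1000)
features, prep_s = timed("preprocess", preprocess, blobs)
```

If the load step dominates on Colab, a common suspect is reading many small files through the mounted Drive. Copying or unzipping the dataset onto the runtime's local disk first is worth trying.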

dense crane May 30, 2023, 7:30 PM

#

make sense

potent sky May 30, 2023, 7:31 PM

#

dense crane not sure if i understand

You can load each instance in mini-batches from the disk, instead of loading them all at once into memory

#

That's what you do when dealing with huge datasets that you just cannot load into memory all at once
But loading into memory should make things faster later
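
Loading in mini-batches can be sketched with a plain generator. This is an illustration only: the file names and `fake_load` are hypothetical, and a real loader would open and decode each JPEG.

```python
from typing import Callable, Iterator, List, Sequence

def iter_batches(paths: Sequence[str], batch_size: int,
                 load_fn: Callable[[str], object]) -> Iterator[List[object]]:
    """Yield lists of loaded items, reading only one batch from disk at a time."""
    for start in range(0, len(paths), batch_size):
        yield [load_fn(p) for p in paths[start:start + batch_size]]

# Hypothetical file names and loader
paths = [f"img_{i:05d}.jpeg" for i in range(10)]

def fake_load(path: str) -> str:
    return path.upper()

for batch in iter_batches(paths, batch_size=4, load_fn=fake_load):
    print(len(batch), batch[0])
```

Only one batch is held in memory at a time, so training can start right away instead of waiting for every file to be read. In PyTorch the same idea is what `Dataset` plus `DataLoader` provide.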

dense crane May 30, 2023, 7:32 PM

#

but in general it would take the same amount of time because i m using all loaded data in training

#

ok i will consider it but still dont know how this would cut the time spent on loading the data

potent sky May 30, 2023, 7:38 PM

#

Because it wouldn't load all the data in at once but only as it's required for training

#

It would take overall more time (over multiple epochs) because loading from disk is slower

#

Ignore this part it's not very relevant rn

potent sky May 30, 2023, 7:41 PM

#

potent sky Also why are you loading the entire dataset into memory at once

This was an off hand question

dense crane May 30, 2023, 7:49 PM

#

ok thx anyway

lapis sequoia May 30, 2023, 9:05 PM

#

Guys, what are some dashboarding tools that can connect to Python environments but can still be modified in an interactive way?

hasty mountain May 30, 2023, 9:43 PM

#

hasty mountain Yeah, that's exactly what I was doing. The only problem is that I can't use Gaus...

I may have discovered the problem... may

The decoder in a VAE has a specific trick to generate images...it generates a location in a distribution...since I'm training it on a Gaussian Distribution, it generates a location in a Normal Distribution...so I can't simply print its output thinking it's an image...I have to de-Normalize it

#

(Yes, I just noticed the normalization thing now and because of that...so this also explains why normalization isn't the same as scaling...)

#

Also, I was using tanh as final activation function for my decoder... :py_guido:

magic dune May 30, 2023, 10:30 PM

#

hasty mountain *Also, I was using tanh as final activation function for my decoder...* <:py_gui...

is tanh same as sigmoid

hasty mountain May 30, 2023, 10:31 PM

#

Yes...I guess that's exactly the problem...they aren't.

junior stone May 30, 2023, 10:48 PM

#

https://youtu.be/BNBnFKCzG0s?t=88

YouTube

SULLY

This AI Tool will Change the Way We Watch YOUTUBE Videos FOREVER ! ...

Try SkmAI Beta out on chrome (Youtube): https://chrome.google.com/webstore/detail/skmai-ai-powered-video-se/nkkklchgjghdppjfponpogcfgggchjef

Hope yall enjoy the extension. Youtube is the start, we aim to revolutionize how users and businesses consume and search for information. Search through meaning.

#

try Skm https://chrome.google.com/webstore/detail/skmai-ai-powered-video-se/nkkklchgjghdppjfponpogcfgggchjef

SkmAI: AI-powered video search on Youtube

'Skim' through Youtube videos using our revolutionary AI-powered video search tool

serene scaffold May 30, 2023, 10:52 PM

#

@junior stone please don't "drop and run" links. if you think this link is interesting, say why. if you're just promoting it, please remove it.

junior stone May 30, 2023, 10:53 PM

#

oh ok, this tool I found is interesting coz it literally lets you search through YouTube videos through the power of AI

tidal bough May 30, 2023, 10:54 PM

#

This AI Tool will Change the Way We Watch YOUTUBE Videos FOREVER ! ...
^video titles I'll never watch

junior stone May 30, 2023, 10:54 PM

#

you can index to relevant parts of any video

#

using AI

#

by searchin through meanin

errant bison May 30, 2023, 10:54 PM

#

how can we train the model based on images?

junior stone May 30, 2023, 10:55 PM

#

tidal bough > This AI Tool will Change the Way We Watch YOUTUBE Videos FOREVER ! ... ^video ...

Try it out and I promise you, you'll come back here and tell everyone to try it as well

#

Neural search

#

I answered u AI is vague ur right. Neural search

night kernel May 30, 2023, 11:22 PM

#

does anyone know about commercial licensing with open sourced llm's like llama and alpaca?

#

looking to make my own chatbot based on a small amount of text and use it commerically, ideally without using cgpt

serene scaffold May 30, 2023, 11:52 PM

#

night kernel does anyone know about commercial licensing with open sourced llm's like llama a...

you should look for a license somewhere in connection to the file for the model. for example, this is a "LICENSE" file that comes with the original BERT model: https://huggingface.co/bert-base-uncased/blob/main/LICENSE

LICENSE · bert-base-uncased at main

merry stone May 31, 2023, 4:03 AM

#

im planning on implementing some sort of food classification in my app, now I dont wanna manually label the dataset and train my models right, there should be existing ones that perform really well? i tried object detection using keras inception v3 but it doesnt work for all sorts of food only simple stuff like banana. Theres the food101 dataset thats split into training and testing but do i have to download it to train my own model on my pc or is there an existing one i can use? if so, how? sorry im a newbie

agile cobalt May 31, 2023, 4:08 AM

#

merry stone im planning on implementing some sort of food classification in my app, now I do...

you can try browsing existing models like ones available in https://huggingface.co/models?pipeline_tag=image-classification&sort=downloads&search=food but much of the time you'll have to gather and label a few examples and fine tune, unless you want the exact same categories that were present in the model's training data

merry stone May 31, 2023, 4:15 AM

#

so im a noob at ml in general

#

but basically lets say i have 100 images then i split them into training and testing sets right? then for training i would label the food and each category is basically a food item right?

agile cobalt May 31, 2023, 4:17 AM

#

pretty much yes, but you need to have at least a handful of labelled images of each category

merry stone May 31, 2023, 4:18 AM

#

so thats when the food 101 dataset comes in right? it says it has 100k ish labeled images

agile cobalt May 31, 2023, 4:18 AM

#

the issue is whether the labels they use are fit for your problem or not

merry stone May 31, 2023, 4:19 AM

#

so it should be better than object detection right

#

since this is for food specifically

agile cobalt May 31, 2023, 4:19 AM

#

using a pre-trained model can bring down the number of images you need from hundreds if not thousands per category to (in nearly a best case scenario) a dozen or so per category, but you still need a bit of data

cold osprey May 31, 2023, 4:19 AM

#

depends on ur problem

merry stone May 31, 2023, 4:20 AM

#

agile cobalt using a pre-trained model can bring down the number of images you need from hund...

but if i were to create a new category or food item lets say then i would need thousands again right

#

also i have a really dumb question but how are the models like stored

agile cobalt May 31, 2023, 4:21 AM

#

if it is entirely new and completely unlike anything the model was pre-trained on, yes
if it is similar enough to the data the model is already familiar with, no

merry stone May 31, 2023, 4:21 AM

#

like when I train a model how is it used like is it a text file locally or something

agile cobalt May 31, 2023, 4:22 AM

#

a model is just a crap ton of weights + some metadata about how to use these weights
they are stored in a custom file format, but they are stored in files just like a text file would be

merry stone May 31, 2023, 4:22 AM

#

hmmm so when does tensorflow come into play for this

agile cobalt May 31, 2023, 4:23 AM

#

it knows how to parse that metadata into instructions your computer can follow, as well as some other stuff useful for creating the model in the first place

merry stone May 31, 2023, 4:23 AM

#

ahh gotcha

#

so for fine tuning

#

how much work is it usually

#

like you have to mess with hyper parameters or something right?

agile cobalt May 31, 2023, 4:23 AM

#

less than creating from scratch, but still a fair bit

#

depends on which model you use as the base

merry stone May 31, 2023, 4:24 AM

#

hmmm okayy

cold osprey May 31, 2023, 4:24 AM

#

model choice will depend on how u plan to deploy ur model

merry stone May 31, 2023, 4:24 AM

#

so if i steal someones model then i wouldn't have to deal with training or using the dataset at all right

cold osprey May 31, 2023, 4:24 AM

#

a model living in a mobile device would generally be smaller/weaker than one living on the cloud e.g.

agile cobalt May 31, 2023, 4:24 AM

#

fine tuning is still training, just on a smaller dataset

#

but if you do not even fine tune, you do not have to worry about training, though you'd be locked to the output of the original model

merry stone May 31, 2023, 4:26 AM

#

cold osprey a model living in a mobile device would generally be smaller/weaker than one li...

if it's a github pages hosted webapp

agile cobalt May 31, 2023, 4:26 AM

#

💀

merry stone May 31, 2023, 4:26 AM

#

can it still use the model file from the repository

agile cobalt May 31, 2023, 4:26 AM

#

you know that github pages cannot run python code at all right? it only serves static pages (html)

merry stone May 31, 2023, 4:26 AM

#

oh really

#

shit

agile cobalt May 31, 2023, 4:27 AM

#

I mean, you can use tensorflow.js or pyscript, but it gets a bit complicated and you'll need to use a small model

merry stone May 31, 2023, 4:27 AM

#

so if i want to host a web app

cold osprey May 31, 2023, 4:27 AM

#

flask app?

#

vercel can host flask apps

merry stone May 31, 2023, 4:27 AM

#

hmmm flask with python backend right

cold osprey May 31, 2023, 4:28 AM

#

ive not hosted one with a ml model yet tho

#

its on my to do list

#

hosted on vercel

merry stone May 31, 2023, 4:28 AM

#

it should work right

#

normally

cold osprey May 31, 2023, 4:28 AM

#

should

merry stone May 31, 2023, 4:28 AM

#

for the model is it just one .bin file thats it

agile cobalt May 31, 2023, 4:29 AM

#

some models are too large to fit in most common hosting providers unless you pay $$$$$, but you should be able to find one small enough for your use case to host for cheap, maybe even fit on free tiers

#

there's also the option of just using an API like Hugging Face's Inference API instead of actually hosting the model, especially if you do not plan to fine tune yourself

merry stone May 31, 2023, 4:32 AM

#

agile cobalt there's also the option of just using an API like Hugging Face's Inference API i...

how does this work sorry im a newbie

cold osprey May 31, 2023, 4:33 AM

#

just imagine normal API

merry stone May 31, 2023, 4:33 AM

#

oh

#

so i wouldnt have to download the model right

agile cobalt May 31, 2023, 4:34 AM

#

yes, if you can find a model whose outputs are fit for your use case

merry stone May 31, 2023, 4:35 AM

#

ohh

#

whats a model class

cold osprey May 31, 2023, 4:36 AM

#

like a normal class

#

but representing a model

merry stone May 31, 2023, 4:36 AM

#

how do i know what that is tho

#

im using chatgpt for help and this is what i have so far is this looking okay:
```py
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from PIL import Image
from keras.utils import img_to_array
import numpy as np
import tensorflow as tf
from keras.applications.inception_v3 import preprocess_input, decode_predictions

# Load the PyTorch model
pytorch_model = YourModelClass()  # Replace with your PyTorch model class
pytorch_model.load_state_dict(torch.load('pytorch_model.bin'))
pytorch_model.eval()

# Define the TensorFlow model
class TFModel(tf.keras.Model):
    def __init__(self, pytorch_model):
        super(TFModel, self).__init__()
        self.pytorch_model = pytorch_model
        self.softmax = nn.Softmax(dim=1)

    def call(self, inputs):
        inputs = tf.convert_to_tensor(inputs)
        inputs = tf.transpose(inputs, [0, 3, 1, 2])  # Transpose image dimensions
        inputs = preprocess_input(inputs.numpy())  # Preprocess image
        inputs = torch.from_numpy(inputs).float()  # Convert to PyTorch tensor
        outputs = self.pytorch_model(inputs)  # Forward pass
        outputs = self.softmax(outputs)  # Apply softmax
        return outputs.detach().numpy()

# Convert PyTorch model to TensorFlow model
tf_model = TFModel(pytorch_model)

# Function to recognize and label food
def recognize_food(image_path):
    img = Image.open(image_path).convert("RGB")
    img = img.resize((299, 299))
    img = img_to_array(img) / 255.0
    img = np.expand_dims(img, axis=0)

    preds = tf_model.predict(img)
    decoded_preds = decode_predictions(preds, top=1)[0]

    food_label = decoded_preds[0][1]
    confidence = decoded_preds[0][2]

    return food_label, confidence

# Example usage
image_path = 'banana.jpg'
food_label, confidence = recognize_food(image_path)

print(f"Food: {food_label}, Confidence: {confidence}")
```

agile cobalt May 31, 2023, 4:38 AM

#

:GVPikachuFacePalm:

#

> # Convert PyTorch model to TensorFlow model
> tf_model = TFModel(pytorch_model)

#

no, just no

merry stone May 31, 2023, 4:39 AM

#

what does it mean

#

ohh

#

pytorch and tensorflow are different right

agile cobalt May 31, 2023, 4:39 AM

#

yes

merry stone May 31, 2023, 4:39 AM

#

i forgot

#

so i need a tensorflow model then?

agile cobalt May 31, 2023, 4:39 AM

#

you need a model that matches the library you are using

#

I recommend taking some time to read the documentation and/or some tutorials before you try anything else, and do not trust chatgpt if you cannot tell whether its output makes sense

merry stone May 31, 2023, 4:41 AM

#

hmmmm

#

so ig read up on pytorch first

#

then try

agile cobalt May 31, 2023, 4:41 AM

#

good luck

merry stone May 31, 2023, 4:41 AM

#

thanks for the help!

potent sky May 31, 2023, 4:54 AM

#

agile cobalt > # Convert PyTorch model to TensorFlow model > tf_model = TFModel(pytorch_model...

omg lmao

#

There should be a chatgpt meme page

merry stone May 31, 2023, 4:55 AM

#

how can i find the class labels of the model

potent sky May 31, 2023, 4:58 AM

#

merry stone im planning on implementing some sort of food classification in my app, now I do...

Whatever dataset you're using will have a set of labels if you're planning to train or fine-tune on that. These will be the labels associated with your model
If you're just using some model out-of-the-box, check out the model documentation, it should describe what each output corresponds to

merry stone May 31, 2023, 5:00 AM

#

i assume these are the labels then? https://huggingface.co/skylord/swin-finetuned-food101/blob/main/config.json are these the only food items it recognizes?

config.json · skylord/swin-finetuned-food101 at main

potent sky May 31, 2023, 5:01 AM

#

merry stone i assume these are the labels then? https://huggingface.co/skylord/swin-finetune...

The id2label part yes, those 101 labels are the outputs associated with this model
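
Going from the model's raw scores to a label is just a lookup in that mapping. A minimal sketch in plain Python; the three-entry config below is a hypothetical, shortened example and not the real file, which lists all 101 classes.

```python
import json

# Hypothetical, shortened config; the real config.json has 101 entries
config_text = '{"id2label": {"0": "apple_pie", "1": "baby_back_ribs", "2": "baklava"}}'
id2label = {int(k): v for k, v in json.loads(config_text)["id2label"].items()}

def top_label(scores):
    """Return (label, score) for the highest-scoring class index."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return id2label[best], scores[best]

print(top_label([0.1, 0.2, 0.7]))
```

Whatever image goes in, the answer is always one of the keys in `id2label`, because the model has one output per listed class and nothing else.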

merry stone May 31, 2023, 5:02 AM

#

shit this is hard then lol

#

it doesnt have banana :(

plush jungle May 31, 2023, 5:02 AM

#

can someone help explain attention to me? I understand that the goal is to get a matrix that maps every value in one vector to every value in another and in each cell of the matrix you get a score. but in the transformer architecture, what does this matrix actually represent?

potent sky May 31, 2023, 5:02 AM

#

Yes, those are the only items it recognises because that's what it's been trained to do.
Of course, if you run it on things it doesn't recognise it'll predict the closest thing it figures

merry stone May 31, 2023, 5:03 AM

#

potent sky Yes, those are the only items it recognises because that's what it's been traine...

but that closest thing will also be one of these 101 right

potent sky May 31, 2023, 5:03 AM

#

Yes

#

Because that's all it "knows" that's all it's been trained on
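
A minimal sketch of why the prediction is always one of the known labels: a classification head outputs one score per class, softmax turns those scores into probabilities, and the top entry is looked up in `id2label`. The three-class `id2label` mapping and the logits below are made up for illustration; the real config lists 101 entries.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(logits, id2label):
    """Return the most probable known label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Hypothetical 3-class head; a real Food-101 head would have 101 entries.
id2label = {0: "apple_pie", 1: "pizza", 2: "sushi"}

# Made-up logits for a banana photo: no score stands out, yet one class still wins.
banana_logits = [0.2, 0.1, 0.4]
label, prob = predict_label(banana_logits, id2label)
print(label, round(prob, 3))
```

A low top probability can serve as a rough hint that the input is outside the known classes, although softmax confidence is not a reliable "unknown" detector on its own.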

merry stone May 31, 2023, 5:04 AM

#

but is there really no universal food detection ml model then

cold osprey May 31, 2023, 5:04 AM

#

u could use these pretrained models to create some sort of feature map

#

and use it in (iirc the attention is all you need model)

potent sky May 31, 2023, 5:04 AM

#

plush jungle can someone help explain attention to me? I understand that the goal is to get ...

There's a video that will probably explain better than on text here, I'll see if I can find it

cold osprey May 31, 2023, 5:05 AM

#

feature embeddings or smth

#

https://machinelearningmastery.com/the-vision-transformer-model/ this

merry stone May 31, 2023, 5:05 AM

#

cold osprey u could use these pretrained models to create some sort of feature map

but is there like an official pretrained model by some big company or something for detecting food

cold osprey May 31, 2023, 5:05 AM

#

no idea, google

merry stone May 31, 2023, 5:05 AM

#

because for object detection theres pretty accurate universal models right

cold osprey May 31, 2023, 5:06 AM

#

doubt there is coz food items is virtually infinite

#

like u can always break it down to more and more specific food items

potent sky May 31, 2023, 5:06 AM

#

merry stone but is there really no universal food detection ml model then

No but you can build one with all the classes you need.
You can use a pre-trained model as a feature extractor.
Remove its classification head (final layer)
Attach your own final layers with 102 classes instead of 101 (including banana).
Freeze the previous feature extractor layers (to prevent something called catastrophic forgetting)
And then train on a small dataset of bananas so it learns to recognise these
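
The steps above can be sketched in PyTorch roughly as follows. This is an illustration under assumptions, not the actual Food-101 model: the tiny `backbone` stands in for a real pre-trained feature extractor, the random tensors stand in for images, and copying the old head's rows into the new head is an optional extra not mentioned in the chat.

```python
import torch
from torch import nn

# Stand-in for a pre-trained network: a small "backbone" plus a 101-class head.
# A real model would come from a library or checkpoint instead.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 32), nn.ReLU())
old_head = nn.Linear(32, 101)

# Freeze the feature extractor so only the new head is updated.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the head: 101 original classes + 1 new class (banana).
new_head = nn.Linear(32, 102)
with torch.no_grad():
    # Optional: reuse the old rows so the 101 known classes keep their weights.
    new_head.weight[:101] = old_head.weight
    new_head.bias[:101] = old_head.bias

model = nn.Sequential(backbone, new_head)
optimizer = torch.optim.Adam(new_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on a fake batch (random tensors standing in for images).
images = torch.randn(4, 3, 8, 8)
labels = torch.tensor([101, 101, 5, 42])  # index 101 is the new "banana" slot
loss = loss_fn(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice the training batches would probably need to mix banana images with examples of the existing classes; a head trained only on bananas would likely drift toward predicting banana for everything.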

cold osprey May 31, 2023, 5:06 AM

#

catastrophic forgetting TIL

merry stone May 31, 2023, 5:07 AM

#

potent sky No but you can build one with all the classes you need. You can use a pre-traine...

but i have to add like hundreds of classes :(

#

isnt that like manual labor

potent sky May 31, 2023, 5:07 AM

#

cold osprey `catastrophic forgetting` TIL

It's one of the most annoying and challenging problems to advancing ML to be more generic ngl

cold osprey May 31, 2023, 5:07 AM

#

if u have the classes u need in a list, shud be done with a few lines of code imo

potent sky May 31, 2023, 5:08 AM

#

merry stone but i have to add like hundreds of classes :(

Wdym. There are hundreds of food items you need to recognise that are not present in this model?

merry stone May 31, 2023, 5:08 AM

#

potent sky Wdym. There are hundreds of food items you need to recognise that are not presen...

probably since banana wasnt even there

potent sky May 31, 2023, 5:08 AM

#

cold osprey if u have the classes u need in a list, shud be done with a few lines of code im...

.

merry stone May 31, 2023, 5:08 AM

#

cold osprey if u have the classes u need in a list, shud be done with a few lines of code im...

but then ill need hundreds of images for each class right

cold osprey May 31, 2023, 5:08 AM

#

ull need images for each class for sure

#

how many is model dependent

merry stone May 31, 2023, 5:09 AM

#

should I just use this model and skip banana then

potent sky May 31, 2023, 5:09 AM

#

Also you don't need to keep these 101 food items
You can completely get rid of them and have your own 50 items or something
But then you will need a suitable dataset of these 50 items to train the model on

merry stone May 31, 2023, 5:09 AM

#

im just trying to add a project in my resume :(

#

idk if this is even a good one

potent sky May 31, 2023, 5:10 AM

#

merry stone but then ill need hundreds of images for each class right

Yep
There are fruit datasets online, have a look. I'd built a food recognition app a long time ago

merry stone May 31, 2023, 5:10 AM

#

i thought a food recognition/nutrition info/calorie app or something

cold osprey May 31, 2023, 5:10 AM

#

potent sky Yep There are fruit datasets online, have a look. I'd built a food recognition a...

funny, the pytorch course im going thru rn uses a food recog model too to teach

potent sky May 31, 2023, 5:11 AM

#

merry stone idk if this is even a good one

It's a step by step thing. Just make sure to learn the most you can from whatever you're doing and you should have a "good" project in due course

merry stone May 31, 2023, 5:11 AM

#

it's not a course tho

#

i just want to replace a cad project thats wasting space in my resume

potent sky May 31, 2023, 5:11 AM

#

cold osprey funny, the pytorch course im going thru rn uses a food recog model too to teach

Interesting. At the time I was learning flutter so I thought it'd be cool to build something combining both ML and flutter

merry stone May 31, 2023, 5:11 AM

#

but when it comes to ml i feel like i never learn

#

like I TAed for a ml course but i still feel so lost

potent sky May 31, 2023, 5:12 AM

#

merry stone but when it comes to ml i feel like i never learn

It's really a process with ML, dw

cold osprey May 31, 2023, 5:12 AM

#

potent sky Interesting. At the time I was learning flutter so I thought it'd be cool to bui...

weird coincidence, ive a friend whos brushing up his flutter too rn hahah

merry stone May 31, 2023, 5:12 AM

#

because Im used to like C programming

potent sky May 31, 2023, 5:12 AM

#

Takes time. It's a vast field. And it lies at the intersection of so many other fields. You have to know so much to be confident

merry stone May 31, 2023, 5:12 AM

#

so when i do these like super shortcuts in python where u can solve like huge problems in 5 lines of code i barely understand whats going on

potent sky May 31, 2023, 5:13 AM

#

merry stone so when i do these like super shortcuts in python where u can solve like huge pr...

Have you studied ML theory?

merry stone May 31, 2023, 5:13 AM

#

i took two classes

#

im a sophomore tho so i dont know shit :(

potent sky May 31, 2023, 5:13 AM

#

Probability, information theory, statistics, AI basics, statistical ML, then deep learning

merry stone May 31, 2023, 5:14 AM

#

nah i only took a data science/ml intro class and a data mining class

#

and linear

potent sky May 31, 2023, 5:14 AM

#

merry stone so when i do these like super shortcuts in python where u can solve like huge pr...

These are mostly to attract people.
ML does take a lot of time and effort to truly understand what's going on. I learn so many new things everyday

merry stone May 31, 2023, 5:14 AM

#

yeah like i want to be able to grasp the whole thing i feel like im missing out there

#

like ive worked on this complicated nlp project with a team that was a success but i didnt understand shit cause i just googled to make the code work

#

but i didnt understand anything overall

potent sky May 31, 2023, 5:15 AM

#

If you urgently need a new project then there are video courses on YT like 4-5hrs long that build a project live so you can follow along.
Otherwise I'd really advise start with the fundamentals, and bite sized simple projects (so that you can see some tangible result)

merry stone May 31, 2023, 5:16 AM

#

hmmm

#

the main goal of this was to really get comfy with the web app part

#

so i wanted to take a shortcut on the ml side

potent sky May 31, 2023, 5:17 AM

#

Wait your primary interest is web dev or ml

#

Ah I see

merry stone May 31, 2023, 5:17 AM

#

idk like a full web app with backend data base etc

#

but every webapp project of mine turns into front end lol

potent sky May 31, 2023, 5:18 AM

#

plush jungle can someone help explain attention to me? I understand that the goal is to get ...

https://youtu.be/4Bdc55j80l8

YouTube

The A.I. Hacker - Michael Phi

Illustrated Guide to Transformers Neural Network: A step by step ex...

Transformers are the rage nowadays, but how do they work? This video demystifies the novel neural network architecture with step by step explanation and illustrations on how transformers work.

CORRECTIONS:
The sine and cosine functions are actually applied to the embedding dimensions and time steps!


#

For the original attention paper (Bahdanau et al. 2015) I really think just reading the paper is your best bet

worthy phoenix May 31, 2023, 5:19 AM

#

any good vids to getting started with pytorch or tensorflow? cuz ig if i learn one framework the weights and biases should not be that hard to transfer to the other one

plush jungle May 31, 2023, 5:19 AM

#

ok I guess here's the main source of my confusion with attention. for self attention for example, you get n layers, and m heads, and each head calculates an attention matrix:

but then those matrices gets passed through a linear layer and then a feed forward layer? am I correct in thinking that self attention learns relationships between tokens, and then the feed forward layer learns which of those relationships are important?

potent sky May 31, 2023, 5:20 AM

#

merry stone idk like a full web app with backend data base etc

It depends on what you're looking to do. If you're just looking to learn and demonstrate how to integrate ML models into a web app, then picking an existing pre-trained model and just limiting to what it does should be sufficient
If you're looking to get a good grasp of how ML works then you will have to put in the work for that

merry stone May 31, 2023, 5:21 AM

#

potent sky It depends on what you're looking to do. If you're just looking to learn and dem...

yeah i just didnt want to use an object detection model for food

worthy phoenix May 31, 2023, 5:21 AM

#

worthy phoenix any good vids to getting started with pytorch or tensorflow? cuz ig if i learn o...

also dont recommend the docs cuz im shit with my attention span

merry stone May 31, 2023, 5:21 AM

#

but then again idk how to use the food model either lol

worthy phoenix May 31, 2023, 5:22 AM

#

i can read that here and there but i first have to get the core understanding of the framework then when i get stuck i can lookup the docs

#

i dont mind that

potent sky May 31, 2023, 5:22 AM

#

plush jungle ok I guess here's the main source of my confusion with attention. for self atte...

For a particular token, the self attention layer also learns the relative importance of relationships to other tokens, iirc

plush jungle May 31, 2023, 5:22 AM

#

potent sky For a particular token, the self attention layer also learns the relative import...

then what's the point of the feed forward layer

potent sky May 31, 2023, 5:23 AM

#

merry stone yeah i just didnt want to use an object detection model for food

Fair enough.
So you could find a good detection model that satisfies your requirements, or adjust your requirements

merry stone May 31, 2023, 5:23 AM

#

potent sky Fair enough. So you could find a good detection model that satisfies your requir...

i mean this would do even tho it doesnt have banana https://huggingface.co/skylord/swin-finetuned-food101/tree/main

skylord/swin-finetuned-food101 at main

potent sky May 31, 2023, 5:24 AM

#

plush jungle then what's the point of the feed forward layer

Partly what you said, learning the relative importance of relationships for different tokens.

merry stone May 31, 2023, 5:24 AM

#

merry stone i mean this would do even tho it doesnt have banana https://huggingface.co/skylo...

but idk if this would work with the api? like can i pass an image and get the label directly

potent sky May 31, 2023, 5:24 AM

#

And to learn a better representation of these relationships

potent sky May 31, 2023, 5:24 AM

#

merry stone i mean this would do even tho it doesnt have banana https://huggingface.co/skylo...

Yep. Just figure out the scope of your project. If you're just looking to plug in an ML model then plug in an ML model

#

Really depends on what you're trying to learn

plush jungle May 31, 2023, 5:27 AM

#

potent sky And to learn a better representation of these relationships

I don't really understand. what do you mean by better representations? like lower dimensionality?

potent sky May 31, 2023, 5:27 AM

#

plush jungle then what's the point of the feed forward layer

http://jalammar.github.io/illustrated-transformer/

This might also help

The Illustrated Transformer


potent sky May 31, 2023, 5:29 AM

#

plush jungle I don't really understand. what do you mean by better representations? like lo...

Not necessarily lower
Also helps transform it into the form acceptable by your next later

#

*layer

#

The encoder's task is simply to give you self attention vectors for each of the tokens

#

The feed forward helps you use these however you'd want to

plush jungle May 31, 2023, 5:32 AM

#

in CNNs, each layer seems to be doing a different task. earlier layers break down lower level features, and later layers break down higher level features. but I don't have a sense of what anything is actually doing in transformers

#

for example, why is there a linear layer at the end here:

#

what does it do

potent sky May 31, 2023, 5:38 AM

#

In a way, learns what to do with all the different outputs from the different attention heads.
You've got the self attention from n attention heads but what do you do with them now?
All of them give different possibly useful representations of your input, but how do you learn what to do with these representations and map it to an output you want?
That's where the feed forward comes in

plush jungle May 31, 2023, 5:39 AM

#

ok what about the linear layer inside the attention block

#

oh!

#

is it to decide which head is important?

magic dune May 31, 2023, 5:44 AM

#

hello

plush jungle May 31, 2023, 5:44 AM

#

since all the heads apparently go into one single linear layer

plush jungle May 31, 2023, 5:44 AM

#

magic dune hello

hi

magic dune May 31, 2023, 5:44 AM

#

plush jungle hi

can you rate a nn code I made

#

it is very simple

#

but I think I really polished it

cold osprey May 31, 2023, 5:45 AM

#

q: why does inference speed differ depending on the image being sent? assuming same model and hardware being used

magic dune May 31, 2023, 5:46 AM

#

cold osprey q: why does inference speed differ depending on the image being sent? assuming ...

if the same model and hardware are being used then it depends on the quality of the img and file size, 90% sure

cold osprey May 31, 2023, 5:47 AM

#

```
0    data\pizza_steak_sushi_20_percent\test\pizza\1...    pizza    0.9987    pizza    0.6317    True
1    data\pizza_steak_sushi_20_percent\test\pizza\1...    pizza    0.9957    pizza    0.3714    True
2    data\pizza_steak_sushi_20_percent\test\pizza\1...    pizza    0.9987    pizza    0.4315    True
3    data\pizza_steak_sushi_20_percent\test\pizza\1...    pizza    0.9869    pizza    0.3576    True
4    data\pizza_steak_sushi_20_percent\test\pizza\1...    pizza    0.9698    pizza    0.3697    True
```

plush jungle May 31, 2023, 5:47 AM

#

magic dune can you rate a nn code I made

feel free to post it if it's short

cold osprey May 31, 2023, 5:47 AM

#

maybe some images are larger? hence taking longer

#

preprocessing time may be longer

magic dune May 31, 2023, 5:47 AM

#

cold osprey maybe some images are larger? hence taking longer

maybe like more dimensions

#

125x125 vs 512x512

cold osprey May 31, 2023, 5:48 AM

#

hmm lemme see

#

ah ok larger images take longer

plush jungle May 31, 2023, 5:54 AM

#

also what do values do in the attention(key, query, value) setup?

#

since Key dot Query = attention score matrix

potent sky May 31, 2023, 5:59 AM

#

plush jungle since all the heads apparently go into one single linear layer

Okay so the first set of linear layers learn to obtain the Q, K, V matrices from the input. Each self attention head ideally learns something different about the input - provides a different representation (maybe you can crudely compare this to how different filters learn to obtain different features from the input in CNNs)
The next linear layer then combines the outputs of all of these different attention heads into something more useful

The final Feed forward along with Add/norm after the multi head attention learns a more meaningful representation of this output (it's a parameterized processing unit, it's going to learn a mapping from the attention output to the output desired, and so learn to transform it into a richer representation space)
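
A rough NumPy sketch of the pieces described above: the first linear layers produce Q, K and V, each head computes its own attention weights, and a final linear layer (`w_o`) mixes the concatenated head outputs. All weights here are random placeholders and the sizes are arbitrary assumptions; a trained model would learn them. The feed-forward and Add/norm steps are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    n_tokens, d_model = x.shape
    d_head = d_model // num_heads

    # First set of linear layers: obtain Q, K, V from the same input.
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    def split_heads(t):
        # (n_tokens, d_model) -> (num_heads, n_tokens, d_head)
        return t.reshape(n_tokens, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)

    # Each head gets its own (n_tokens, n_tokens) attention matrix.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (num_heads, n_tokens, d_head)

    # Concatenate the heads and let the output linear layer combine them.
    concat = heads.transpose(1, 0, 2).reshape(n_tokens, d_model)
    return concat @ w_o, weights

rng = np.random.default_rng(0)
n_tokens, d_model, num_heads = 5, 8, 2
x = rng.normal(size=(n_tokens, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out, weights = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, weights.shape)
```

The output has the same shape as the input, which is what lets these blocks be stacked layer after layer.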

#

Hope that clears things up a bit. I'm in a meeting so can't attend here constantly now (pun intended)

potent sky May 31, 2023, 6:01 AM

#

plush jungle also what do values do in the attention(key, query, value) setup?

It's a little analogous to retrieval systems if you're familiar with those

magic dune May 31, 2023, 6:04 AM

#

!passte

#

!paste

arctic wedgeBOT May 31, 2023, 6:04 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

magic dune May 31, 2023, 6:04 AM

#

https://paste.pythondiscord.com/nusalasoga

plush jungle May 31, 2023, 6:06 AM

#

potent sky It's a little analogous to retrieval systems if you're familiar with those

the video you sent also made that analogy, but even though I'm familiar with retrieval systems I really don't get it. in a retrieval system, a key and a query get multiplied to produce a value, and the key with the highest value gets selected as the top search result. here, in the case of a single head, the key and the query are two vectors we're comparing. dot producting them gets us a scalar, right?

#

and then that scalar gets multiplied elementwise by a Value vector?

past meteor May 31, 2023, 6:14 AM

#

plush jungle the video you sent also made that analogy, but even though I'm familiar with ret...

I'm not sure I like those retrieval system analogies. I think you just need to read a couple of papers to grok the concept. Start by reading the Bahdanau attention mechanism because it's iirc the first attention mechanism so it's explained well. It's with RNN's though

#

There's books like Dive into Deep Learning that have chapters on attention so you could read those as well

plush jungle May 31, 2023, 6:16 AM

#

the frustrating thing is that every resource, paper, and book I read leaves me with more questions

past meteor May 31, 2023, 6:17 AM

#

Then you gotta keep digging. For me going to the earliest works nearly always helps

plush jungle May 31, 2023, 6:20 AM

#

past meteor I'm not sure I like those retrieval system analogies. I think you just need to r...

you mean this paper?
https://arxiv.org/pdf/1409.0473.pdf

past meteor May 31, 2023, 6:20 AM

#

Yup

plush jungle May 31, 2023, 6:21 AM

#

the word attention only appears 3 times in that and "value" doesn't appear at all

#

are you sure that's the right one

past meteor May 31, 2023, 6:21 AM

#

3.1 is what attention is.

#

Alpha i,j is the attention weight

plush jungle May 31, 2023, 6:23 AM

#

how can alpha be a matrix of weights if it's being used here as a function:
a(s_{i-1}, h_j)

#

or is that supposed to be indexing?

#

like this
a[s_{i-1}][h_j]
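
For reference, a summary of how Section 3.1 of the linked paper relates these symbols, written from the paper's notation: `a` is a function (a small feed-forward "alignment model" trained with the rest of the network), while `\alpha_{ij}` is the normalised weight computed from its output.

```latex
e_{ij} = a(s_{i-1}, h_j)
\qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}
\qquad
c_i = \sum_{j=1}^{T_x} \alpha_{ij} \, h_j
```

So `a(...)` is a function call that returns one scalar score per (decoder state, encoder annotation) pair, and collecting the `\alpha_{ij}` over all `i` and `j` is what forms a matrix of weights.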

potent sky May 31, 2023, 6:27 AM

#

plush jungle the video you sent also made that analogy, but even though I'm familiar with ret...

We're not comparing them. The key x q gives us the attention weights for each token. The score of how each token relates to other tokens.
The values are what this scaling is applied to. These scores help persist the relevant tokens and diminish the less relevant ones
Without multiplication with the values it's only a set of scores
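
A small NumPy sketch of the point above, for a single head. The dot product of one query with one key is a scalar, but doing it for every query-key pair yields a matrix of scores, and each output row is a weighted average of the value vectors. The shapes (3 queries, 5 keys, random entries) are arbitrary assumptions.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention for one head."""
    scores = q @ k.T / np.sqrt(k.shape[-1])             # (n_queries, n_keys)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights                          # scores applied to values

rng = np.random.default_rng(0)
q = rng.normal(size=(3, 4))   # 3 query vectors of size 4
k = rng.normal(size=(5, 4))   # 5 key vectors of size 4
v = rng.normal(size=(5, 6))   # 5 value vectors of size 6

single_score = q[0] @ k[1]    # one query with one key: a scalar
out, weights = attention(q, k, v)
print(weights.shape, out.shape)
```

Here `weights` has one row per query and one column per key, and `out[i]` equals the sum over `j` of `weights[i, j] * v[j]`, so the scalar weights scale whole value vectors rather than being multiplied elementwise into a single one.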

potent sky May 31, 2023, 6:28 AM

#

potent sky For the original attention paper (Bahdanau et al. 2015) I really think just rea...

And yeah, reading Bahdanau attention will be a good idea

potent sky May 31, 2023, 6:28 AM

#

plush jungle the word attention only appears 3 times in that and "value" doesn't appear at al...

Yeah their original code doesn't use that terminology either. They rather talk in terms of "alignment"

plush jungle May 31, 2023, 6:28 AM

#

let's say you have a vector of english tokens and a vector of french tokens:

#

we would make the english the keys, and the french the queries, to start, right? using the Wk and Wq times the english tokens and french tokens respectively?

#

then dot producting those vectors would gives us a scalar though, right? so how do we end up with a matrix here

dusk bear May 31, 2023, 7:40 AM

#

hey guys!
myself Uttam. I am new to ML and DL but have a very keen interest in learning it. So, recently i saw a satellite image planes detection dataset and I wanted to do it using CNN only. While finding i found this person used RCNN https://github.com/1297rohit/RCNN/blob/master/RCNN.ipynb
but he converted that into 4D values and sent into CNN layers and trained them. But as i work on google colab i couldn't have that much RAM to convert all the data into 4D and train the model. Guys, so can anyone pls help me out like are there any other ways to directly train the model like directly sending the photo to model or stuff?
Sample image attached..

GitHub

RCNN/RCNN.ipynb at master · 1297rohit/RCNN

Step-By-Step Implementation of R-CNN from scratch in python - RCNN/RCNN.ipynb at master · 1297rohit/RCNN

placid cedar May 31, 2023, 10:17 AM

#

hi, just a quick question

#

im doing discretisation now for my linear regression model, but im unsure of which to use

#

equal width discretisation, equal frequency, equal frequency plus encoding

#

which is better for a linear regression model?

errant bison May 31, 2023, 10:23 AM

#

potent sky YOLO can use plain CNNs For object detection YOLO is faster and more efficient t...

ohh ohkk. And can u pls elaborate on how can i extract from video

placid cedar May 31, 2023, 10:24 AM

#

actually nvm i will js experiment on all methods for 3 of my numerical variables xD

low beacon May 31, 2023, 12:17 PM

#

Hi !
I was wondering how the journey is to become a data scientist?
So far my GitHub portfolio only has my thesis should I add more projects?
I have a bachelor's degree in IT and my thesis was actually about sentiment analysis using Python and its libraries and Jupyter notebook. So far I have been working as a project manager for a year now but working with data has always been on my mind because there are endless possibilities with data and it was so exciting to work with.

merry stone May 31, 2023, 1:05 PM

#

how much work is fine tuning a model for a beginner

#

like if i were to start with some image classification model

#

and tune it for food classification just accurate enough for a college sophomore project

#

how much work would i have to do

cold osprey May 31, 2023, 1:08 PM

#

Not much imo

#

Import model

#

Change fully connected layer

#

Freeze layers

#

Train, wait

#

Done

merry stone May 31, 2023, 1:09 PM

#

whats a good number of images to train and test

cold osprey May 31, 2023, 1:09 PM

#

Depends on ur model, trial n error ig

#

Use less and increase as needed

merry stone May 31, 2023, 1:10 PM

#

so lets say the best image classification pretrained model in the world

#

can i add on the food101 dataset which has 100k food images split into training and testing

#

so i dont have to manually label

#

like can i add it on top to have additional classes

cold osprey May 31, 2023, 1:10 PM

#

U can just use a model that's been trained on food101 instead then

merry stone May 31, 2023, 1:11 PM

#

true

#

but food101 doesnt have banana :(

cold osprey May 31, 2023, 1:11 PM

#

merry stone like can i add it on top to have additional classes

U can, but the weights leading to that class will be random so u would need to fine tune regardless

merry stone May 31, 2023, 1:11 PM

#

it has complicated dishes but not like simple stuff

#

whats the best approach

#

if i want to create a food recognizer

cold osprey May 31, 2023, 1:12 PM

#

Find banana images and train?

merry stone May 31, 2023, 1:13 PM

#

but i need like all sorts of food right

#

but what are the steps usually

#

lets say i want to add banana class

#

where do i train

cold osprey May 31, 2023, 1:19 PM

#

U can just train on banana images, add an extra output class for it

#

Using a pretrained model

#

Not sure if u need to train on the other images again or not

#

I wouldn't think so since it's already trained

merry stone May 31, 2023, 1:25 PM

#

hmmmm

#

okay ill try

#

so i can like use a fruits dataset then right

cold osprey May 31, 2023, 1:26 PM

#

Yeah

potent sky May 31, 2023, 1:49 PM

#

Freeze the layers

potent sky May 31, 2023, 1:50 PM

#

errant bison ohh ohkk. And can u pls elaborate on how can i extract from video

Same, processing each frame successively.
If you want you can include an object tracker to ensure you don't process the same vehicle repeatedly

plain rose May 31, 2023, 2:12 PM

#

Hmm can anyone explain the mathematics behind the gblinear method of xgboost exactly. Like there are many vague ones on the internet but I would like an exact one with an example. And also how can a combination of linear models do better than a single linear model.
Thank you in advance.

past meteor May 31, 2023, 2:42 PM

#

plain rose Hmm can anyone explain the mathematics behind gblinear method of xgboost exactly...

(Gradient) boosting is a more general thing than decision trees

plain rose May 31, 2023, 2:46 PM

#

Yaa but the idea of adding multiple weak learners, which in this case are dt, how can this be extended to linear models? I can't think of a weak linear model or how to combine them, because if I combine them linearly the result is still only a linear model, which can't beat the best one since we have an exact solution for linear regression

past meteor May 31, 2023, 2:47 PM

#

You fit a linear model on the data, you replace y with (y - y_hat) and you continue. There's proofs that each model you add in this way moves you towards the minimum of the loss
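
A toy NumPy version of that residual-fitting loop, assuming squared-error loss, synthetic data, and a shrinkage factor (`learning_rate`) on each stage. This only illustrates the general idea and is not XGBoost's gblinear implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

def fit_ols(X, y):
    """Least-squares weights (no intercept, for brevity)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

learning_rate = 0.3   # shrinkage: each stage only takes a partial step
n_rounds = 50
total_w = np.zeros(X.shape[1])
residual = y.copy()
for _ in range(n_rounds):
    stage_w = fit_ols(X, residual)       # fit a linear model to what is left
    total_w += learning_rate * stage_w   # adding linear models = adding weights
    residual = y - X @ total_w           # replace y with (y - y_hat)

ols_w = fit_ols(X, y)
print(np.round(total_w, 4), np.round(ols_w, 4))
```

Because each stage is linear, the boosted model collapses into the single weight vector `total_w`, and with enough rounds it approaches the ordinary least-squares solution. Stopping early or using a small `learning_rate` leaves the weights shrunk towards zero, which is one way such a procedure can act as regularisation.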

plain rose May 31, 2023, 2:47 PM

#

But how can this be better than a single linear model as the sum of these model will also be linear and we already have an exact solution for linear model.

#

This is not true for dt as sum of dt is not dt

past meteor May 31, 2023, 2:50 PM

#

There's still the bootstrapping and feature subsetting you're doing

#

But this is a good question to be honest 🤔

plain rose May 31, 2023, 2:51 PM

#

But I doubt will it be any better than single model

#

Like I searched whole internet but not a single article for exact details of gb linear

past meteor May 31, 2023, 2:52 PM

#

I wouldn't be able to say it's better or worse than fitting a single model but it's at the very least different

plain rose May 31, 2023, 2:52 PM

#

Hmm but in last it will be linear right?

#

So for sure it can't be better.

past meteor May 31, 2023, 2:53 PM

#

The final model will be a linear model but it may be a better linear model than just doing regular OLS

#

It's just a different way to fit elastic net (mix of L1 and L2 reg)

#

The bootstrapping and feature subsetting make it overfit less than elastic net but yeah, whether or not it's worth the hassle is another discussion (probably not)

plain rose May 31, 2023, 2:56 PM

#

I see thanks and please ping if you find any article or you yourself are able to show the exact implementation of gblinear

past meteor May 31, 2023, 2:56 PM

#

You can just read the source code?

plain rose May 31, 2023, 2:56 PM

#

Because something is going on, as gblinear with 1 round is not the same as a single regression, so I suspect they are doing something different

past meteor May 31, 2023, 2:57 PM

#

plain rose Because something is going as gblinear with 1 round is not same as single regres...

What do you mean?

plain rose May 31, 2023, 2:58 PM

#

past meteor You can just read the source code?

I tried but I am finding it hard as they have many classes and stuff and optimisation things 😦

plain rose May 31, 2023, 2:58 PM

#

past meteor What do you mean?

I thought that in 1 round we predict the target with normal model and then we keep on using these to predict the residual but that is wrong.

past meteor May 31, 2023, 2:59 PM

#

plain rose I thought that in 1 round we predict the target with normal model and then we ke...

Correct but in a normal linear regression you're not bootstrapping and selecting a bunch of features at random are you?

plain rose May 31, 2023, 3:00 PM

#

But for feature selection I set it to 1 i.e. to select all the feature by using the hyper params

past meteor May 31, 2023, 3:01 PM

#

Then the main difference is the fact you're bootstrapping

plain rose May 31, 2023, 3:01 PM

#

No that I also set as 1 that is use the whole sample

#

There are 2 parameter 1 for feature and 1 for samples

past meteor May 31, 2023, 3:02 PM

#

Do you have the same amount of regularization in both cases?

plain rose May 31, 2023, 3:02 PM

#

Yaa i set regularizer params to 0

past meteor May 31, 2023, 3:04 PM

#

And what are you comparing it with in scikit-learn? What specific algo?

plain rose May 31, 2023, 3:06 PM

#

I want to just understand xg boost with gb linear

#

Like for comparision i am comparing it with linear regressor

#

But i got widely different results

past meteor May 31, 2023, 3:11 PM

#

Yes but what implementation of linear regression?

plain rose May 31, 2023, 3:11 PM

#

Scikit learn only

#

Like LinearRegression of scikit learn

past meteor May 31, 2023, 3:23 PM

#

Then I have no idea. Is the code something you can paste so I can have a look?

plain rose May 31, 2023, 3:27 PM

#

Hmm that I can't lemon_pensive but thanks

merry stone May 31, 2023, 3:39 PM

#

can someone guide me through loading a pretrained model from a .bin file in pytorch

#

i just cant figure it out 😭

#

or can i use a model hosted on hugging face or something

rose dagger May 31, 2023, 3:53 PM

#

Anybody got any reading recommendations for modern ML/DL models for image segmentation & image classification? Would be great if it included a little bit about data augmentation and feature engineering as well

mint palm May 31, 2023, 4:55 PM

#

wanted to include AWS, now that i have been learning s3, athena, sagemaker
should i remove "interest"? and add all those aws tools like s3, athena sage etc?
should i add "cloud" section?
should i just let it be like this and add just "aws" in tech section?

merry stone May 31, 2023, 5:51 PM

#

isnt the interest line kinda useless

#

unless you have space

potent sky May 31, 2023, 6:03 PM

#

plain rose But how can this be better than a single linear model as the sum of these model ...

Exactly, yes. Even feature subsetting with "weak" linear models can be argued to finally give a linear combination of those as one single linear model.
And the best solution to the fitting problem could've been directly obtained by the regression equation. So why go for GB linear models at all?
One possible defense of boosting here can be that sometimes you don't want the line of absolute best fit. You don't want it to fit that well (regularisation)
Gradient boosting multiple weak linear models would allow you to control this regularisation not just in quantity but also in which feature subsets are more affected.
But idts it makes much more sense than ridge or elastic

#

Fun question ngl

potent sky May 31, 2023, 6:04 PM

#

rose dagger Anybody got any reading recommendations for modern ML/DL models for image segmen...

Like you want research papers or?

potent sky May 31, 2023, 6:09 PM

#

merry stone can someone guide me through loading a pretrained model from a .bin file in pyto...

Load the bin into an OrderedDict using torch.load
Define the model architecture
Load weights into model from OrderedDict using load_state_dict()
Optionally, load optimizer state from OrderedDict into optimizer object using load_state_dict()
Hopefully that works
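The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive recipe: `TinyNet` and the file name `pytorch_model.bin` are made up for illustration, and the checkpoint is created on the spot so the snippet runs on its own. It also assumes the .bin holds a plain state dict; some files wrap it in a larger dict next to the optimizer state.

```python
import os
import tempfile

import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in; the real class must match the checkpoint's architecture."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


# Create a stand-in checkpoint so the sketch runs end to end.
ckpt_path = os.path.join(tempfile.mkdtemp(), "pytorch_model.bin")
torch.save(TinyNet().state_dict(), ckpt_path)

# 1. Load the .bin into a dict of tensors (mapped to CPU so no GPU is needed).
state_dict = torch.load(ckpt_path, map_location="cpu")

# 2. Define the architecture, then 3. copy the weights into it.
model = TinyNet()
result = model.load_state_dict(state_dict)  # raises if key names or shapes differ
model.eval()  # switch to inference mode before predicting
```

With a real checkpoint you would drop the `torch.save` line and point `ckpt_path` at your file. Printing `state_dict.keys()` is usually the quickest way to see which architecture the file expects. For models hosted on Hugging Face, the `from_pretrained` methods in `transformers` take care of downloading and loading.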

#

You might have a config file as well (config.json), in which case there should be accompanying documentation on how to use it

potent sky May 31, 2023, 6:11 PM

#

plain rose But I got wildly different results

Interesting. Single linear regressor gave better fit I assume?

potent sky May 31, 2023, 6:12 PM

#

past meteor The final model will be a linear model but it may be a better linear model than ...

Interesting, why do you think it may be better?

past meteor May 31, 2023, 6:28 PM

#

potent sky Interesting, why do you think it may be better?

The may I'm using is pretty much no free lunch theorem. It's a different form of regularization that may (or may not) be better

#

Mentioned it a few times but due to the subsetting and bootstrapping you couldn't just fit the same model with OLS, ridge, lasso or elastic net. Whether that matters or not is a different question haha

potent sky May 31, 2023, 6:34 PM

#

past meteor The may I'm using is pretty much no free lunch theorem. It's a different form of...

Hmm fair enough

potent sky May 31, 2023, 6:35 PM

#

past meteor Mentioned it a few times but due to the subsetting and bootstrapping you couldn'...

Mm ig that makes sense.

merry stone May 31, 2023, 6:41 PM

#

potent sky Load the bin into an `OrderedDict` using torch.load Define the model architectur...

i dont understand it's not working :(

potent sky May 31, 2023, 7:05 PM

#

What's the issue/error?

rose dagger May 31, 2023, 7:16 PM

#

potent sky Like you want research papers or?

Sure. I found this: https://arxiv.org/pdf/2001.05566.pdf , will go through it tomorrow. Though it's from 2020, so perhaps there are some new developments that are missing

placid cedar May 31, 2023, 7:52 PM

#

heyyo

#

im finally at my final step for my assignment

#

which is feature engineering

#

i have learnt 5 methods of doing so. should i either, use one best method, and apply it to all my variables. or, find the best method for each variable

placid cedar May 31, 2023, 7:53 PM

#

placid cedar i have learnt 5 methods of doing so. should i either, use one best method, and a...

^ for feature scaling

#

any help and advice wld be nice 🙂

#

and thanks for all the help from everyone here so far eheh

warped leaf May 31, 2023, 9:33 PM

#

def get_keys(obj):
    keys = []
    if isinstance(obj, dict):
        for key in obj.keys():
            keys.append(key)
            keys.extend(get_keys(obj[key]))
    elif isinstance(obj, list):
        for item in obj:
            keys.extend(get_keys(item))
    clean_list = list(dict.fromkeys(keys))
    return clean_list 

print(get_keys(data))

for reading nested json files, i made this function that gets all the keys inside a list to make it more readable, is there a way to edit this code and get more info about if the keys are inside lists or dictionaries?

agile cobalt May 31, 2023, 11:05 PM

#

which kind of info exactly?

#

kinda offtopic for this channel though, just use #python-discussion or #1035199133436354600 for that kind of stuff
also if

clean_list = list(dict.fromkeys(keys))
is just to remove duplicate, use set() instead like list(set(keys)) or sorted(set(keys))
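If the extra info wanted is "where does each key live", one possible variation is to record a path per key instead of the bare name. This is only a sketch: `key_paths`, the `[]` marker for "inside a list", and the `sample` data are all made up for illustration.

```python
def key_paths(obj, path=""):
    """Yield (path, value_type) for every key found in nested JSON-like data."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{path}.{key}" if path else key
            yield child, type(value).__name__
            yield from key_paths(value, child)
    elif isinstance(obj, list):
        for item in obj:
            # "[]" marks that everything below this point sits inside a list.
            yield from key_paths(item, f"{path}[]")


sample = {"user": {"name": "a", "tags": [{"id": 1}, {"id": 2}]}}
# De-duplicate while keeping first-seen order.
paths = list(dict.fromkeys(key_paths(sample)))
for path, value_type in paths:
    print(path, value_type)
```

`dict.fromkeys` is kept here for de-duplication because it preserves the order in which keys were first seen; `set()` also removes duplicates but does not keep that order.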

potent sky Jun 1, 2023, 5:12 AM

#

rose dagger Sure. I found this: https://arxiv.org/pdf/2001.05566.pdf , will go through it to...

Oof yes. Segmentation has seen a crazy amount of advancement since 2020. Multiple sota advancements just this year already

#

But that paper seems like a good background survey for pre-2020

#

Look up Maskformer, Mask2Former, TrackFormer, Swin, Oneformer, Segformer, DETR, Denoising DETR Anchor Boxes, MaskDINO, DINO self supervised, SAM off the top of my head

#

There're many other interesting papers

dusk tide Jun 1, 2023, 5:42 AM

#

Hello, I am practicing data cleaning on the Movies dataset (dataset link here: https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset). There are around 46466 rows and 24 columns ('adult', 'belongs_to_collection', 'budget', 'genres', 'homepage', 'id',
'imdb_id', 'original_language', 'original_title', 'overview',
'popularity', 'poster_path', 'production_companies',
'production_countries', 'release_date', 'revenue', 'runtime',
'spoken_languages', 'status', 'tagline', 'title', 'video',
'vote_average', 'vote_count'). But there are a lot of NaN values in the budget and revenue columns, around 36K. So how should I handle these values? I had one idea: calculate the average budget/revenue of movies by genre over a five-year span. E.g., take the average for Action and Thriller movies from 1990 to 1995 and use it to replace the NaN values occurring in those years. Is this the right way? Can anyone suggest other ideas for how to do this?

The Movies Dataset

Metadata on over 45,000 movies. 26 million ratings from over 270,000 users.
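The genre-and-period idea described above could be sketched in pandas along these lines. It is a sketch under stated assumptions rather than a recommendation: the tiny `movies` frame is invented, and `primary_genre` is a hypothetical single-genre column that would first have to be derived from the dataset's `genres` field. Whether imputing this many missing values is sensible at all is a separate question.

```python
import numpy as np
import pandas as pd

# Invented toy data; "primary_genre" is a hypothetical column, not part of the dataset.
movies = pd.DataFrame({
    "primary_genre": ["Action", "Action", "Action", "Drama", "Drama"],
    "release_date": ["1991-05-01", "1993-07-12", "1994-01-20", "1992-03-03", "1998-09-09"],
    "budget": [10.0, np.nan, 30.0, 5.0, np.nan],
})

# Bucket release years into five-year periods: 1990, 1995, ...
year = pd.to_datetime(movies["release_date"], errors="coerce").dt.year
movies["period"] = (year // 5) * 5

# Mean budget per (genre, period), aligned row by row with the original frame.
group_mean = movies.groupby(["primary_genre", "period"])["budget"].transform("mean")

# Keep a flag so imputed rows can still be told apart later.
movies["budget_was_missing"] = movies["budget"].isna()
movies["budget_filled"] = movies["budget"].fillna(group_mean)
```

Groups with no known budget at all stay NaN, so a fallback (or simply dropping those rows) would still be needed. Swapping `"mean"` for `"median"` is a common choice when a few very large budgets dominate.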

past meteor Jun 1, 2023, 5:47 AM

#

potent sky Lookup Maskformer, Mask2Former, TrackFormer, Swin, Oneformer, Segformer, DETR, D...

How fast (inference time) are these transformers? A lot of segmentation pipelines like the one I worked on last year need to run in real time on (potentially) a mobile device 💀

potent sky Jun 1, 2023, 5:57 AM

#

past meteor How fast (inference time) are these transformers? A lot of segmentation pipeline...

Haha very interesting question. This was a major concern for our work too.
Oneformer is at one end of the spectrum where it's abysmally slow: 2:30 minutes on a T4 GPU, and 3:30 minutes on CPU for a single image ;-;
(Though I was inclined to humor it since the paper is really interesting and it defines a unified segmentation problem...good stuff all in all)
At the other end, for high accuracy models there's MaskDINO, DINO self supervised and SAM giving around 6-7s per image. For slightly lower accuracy we got up to 0.16s per image but that's sorta under NDA
The Segformer etc. lie somewhere in between
I haven't yet experimented with DINO self supervised myself

#

I might be missing smtg since it has been some time

past meteor Jun 1, 2023, 6:03 AM

#

2 minutes? Christ. We ended up using something from mediapipe which worked just fine for us.

I guess in some domains like medical imaging it's perfectly fine

potent sky Jun 1, 2023, 6:04 AM

#

What was the mIOU?

#

For our use case we really needed very high accuracy

dim jungle Jun 1, 2023, 6:06 AM

#

📢 Mark your calendar for our upcoming thrilling workshop on LangChain organized by ADaSci! 🗓️🌟

🚀** Mastering LangChain: A Hands-on Workshop for Building Generative AI Applications** 🚀

🗓️ Date: 17th June 2023
⏰ Time: 10 AM Onwards
📍 Location: Online / Virtual

Unleash your creativity and explore the power of generative AI with LangChain! 🤖✨

🔹 Create LLM-powered applications across various industry domains
🔹 Build and deploy generative AI-powered agents for real-world scenarios
🔹 Develop custom applications like chatbots and web agents using company-specific data
🔹 Learn step-by-step with practical examples and collaborate with fellow enthusiasts

Don't miss this opportunity to revolutionize industries with innovative, intelligent, and personalized applications! Secure your spot now!

Visit for more details and registration: https://adasci.org/product/mastering-langchain-a-hands-on-workshop-for-building-generative-ai-applications/

Association of Data Scientists

Vaibhav Kumar

Mastering LangChain: A Hands-on Workshop for Building Generative AI...

Attend the hands-on workshop on LangChain and learn how to build LLM powered generative AI applications for industries in very simple ways

potent sky Jun 1, 2023, 6:08 AM

#

past meteor 2 minutes? Christ. We ended up using something from mediapipe which worked just ...

Hmm the list of model options seems very specific for people and selfies. The only generic one is DeepLabV3 and in our experiments we found it far outpaced.

past meteor Jun 1, 2023, 6:09 AM

#

Yeah, for us that was fine. It was related to people. High accuracy wasn't a requirement either

potent sky Jun 1, 2023, 6:10 AM

#

True, there's always the trade-off between accuracy and speed. For us high accuracy was very important

#

On the upside, we got to do some really exciting research haha

rose dagger Jun 1, 2023, 6:47 AM

#

potent sky Lookup Maskformer, Mask2Former, TrackFormer, Swin, Oneformer, Segformer, DETR, D...

Thank you so much!

bleak crown Jun 1, 2023, 7:04 AM

#

Does anyone here have experience with the tensorflow data api that could help answer my question?
https://stackoverflow.com/questions/76378415/how-can-i-batch-a-tensorflow-dataset-without-loading-all-the-data-into-memory-si

Generator expressions didn't work either

Stack Overflow

How can I batch a tensorflow dataset without loading all the data i...

I am developing a Speech Recognition Model using TensorFlow, and I have a directory structure for my dataset. Within the ./dataset directory, I have subdirectories for validation, training, and tes...

wooden sail Jun 1, 2023, 7:10 AM

#

you can take a look at tf and keras's dataset objects

#

https://www.tensorflow.org/api_docs/python/tf/data/Dataset

TensorFlow

tf.data.Dataset | TensorFlow v2.12.0

Represents a potentially large set of elements.

#

https://keras.io/api/data_loading/image/

Keras documentation: Image data loading

#

https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator

TensorFlow

tf.keras.preprocessing.image.ImageDataGenerator | TensorFlow v2.1...

Generate batches of tensor image data with real-time data augmentation.

#

with these you can specify a directory to be consumed

bleak crown Jun 1, 2023, 7:16 AM

#

Yeah but the issue is preprocessing the output (text) files. Should i just preprocess every text file beforehand, and write it?

wooden sail Jun 1, 2023, 7:18 AM

#

you can preprocess them as they are loaded

#

as is usually done with images. ideally you don't edit the originals

#

this is of course more expensive, i know, but it's the sanitary approach, let's call it

bleak crown Jun 1, 2023, 7:20 AM

#

I'm just confused on how to preprocess the output (y) data. For the input, I'd obviously just use a preprocessing layer. But how do I put the output through such a layer so that the loss functions and what not work correctly

#

I'm pretty new to tensorflow and just don't quite understand how you can do that

tribal holly Jun 1, 2023, 7:21 AM

#

Well...

wooden sail Jun 1, 2023, 7:22 AM

#

bleak crown I'm just confused on how to preprocess the output (y) data. For the input, I'd o...

i'm not sure i follow here. if you do anything to the output, it's no longer "preprocessing"

#

or you mean y as in the (x,y) pairs shown to the network, not y as in the output of the network?

bleak crown Jun 1, 2023, 7:32 AM

#

Yeah, the y_true, not the y_pred

#

I want to tokenize (or in this case vectorize) the y aspect of the input output pairs shown to the model. Or would I just write a custom loss and do it there?

wooden sail Jun 1, 2023, 7:36 AM

#

you can also apply functions to the data before feeding it into the network

#

notice that things like a "preprocessing layer" are really just functions

#

you can perfectly well apply a preprocessing layer to y without having it be part of the network that processes x

bleak crown Jun 1, 2023, 7:36 AM

#

wooden sail you can also apply functions to the data before feeding it into the network

Yes, but that goes back to the "i can't load all the data into memory at once, only the file names"

cold osprey Jun 1, 2023, 7:36 AM

#

!rule 3

arctic wedgeBOT Jun 1, 2023, 7:36 AM

#

Rules

3. Respect staff members and listen to their instructions.

wooden sail Jun 1, 2023, 7:36 AM

#

cold osprey !rule 3

hmm?

cold osprey Jun 1, 2023, 7:37 AM

#

noisy name rule

wooden sail Jun 1, 2023, 7:37 AM

#

bleak crown Yes, but that goes back to the "i can't load all the data into memory at once, o...

you would do it the same way as with x. it is applied only to the batch that is currently loaded

bleak crown Jun 1, 2023, 7:38 AM

#

I'm curious as to why it is crashing then. Because my batch size is 32, each audio file is about 4mb, and the text files are only a few words. I have 16gb of ram. But the map function i have should only be applied to the current batch?

wooden sail Jun 1, 2023, 7:39 AM

#

that i don't know, i haven't seen your code 😛

#

if you use the keras or tf dataset classes, those load only a small chunk at a time

#

if you manually did something else, you might be running out of mem

bleak crown Jun 1, 2023, 7:39 AM

#

import os
from typing import Any

import tensorflow as tf

def build_dataset(self, path: str = "./dataset/train-clean-100") -> Any:
    # Collect file paths only; no audio or text is read at this point.
    audio_files = []
    text_files = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file == "audio.wav":
                audio_files.append(os.path.join(root, file))
            elif file == "text.txt":
                text_files.append(os.path.join(root, file))
    # map() is lazy: each file is decoded only when the dataset is iterated.
    audio_dataset = tf.data.Dataset.from_tensor_slices(audio_files)
    audio_dataset = audio_dataset.map(
        self.preprocess_audio, num_parallel_calls=tf.data.experimental.AUTOTUNE)

    text_dataset = tf.data.Dataset.from_tensor_slices(text_files)
    text_dataset = text_dataset.map(
        self.preprocess_text, num_parallel_calls=tf.data.experimental.AUTOTUNE)

    # Pair each audio tensor with its token tensor as (x, y).
    dataset = tf.data.Dataset.zip((audio_dataset, text_dataset))
    return dataset

def preprocess_audio(self, audio_path: tf.Tensor) -> tf.Tensor:
    # Read and decode one wav file, padded or cropped to 20000 samples.
    audio = tf.io.read_file(audio_path)
    audio, _ = tf.audio.decode_wav(audio, desired_samples=20000)
    return audio

def preprocess_text(self, text_path: tf.Tensor) -> Any:
    # Read one transcript and turn it into token ids with the pre-fitted vectorizer.
    texts = tf.io.read_file(text_path)
    tokens = self.vectorizer(texts)
    return tokens

#

I'm just calling processor().build_dataset().batch(32) and it crashes. However when i don't batch it runs fine.

#

Obviously without a batch index which is a different issue that could be easily fixed, but it accepts the input up until the lstm layer where it gets upset there isn't a feature index

#

Vectorizer is pre-fitted

wooden sail Jun 1, 2023, 7:42 AM

#

hmm i don't see anything strange at a glance

lapis sequoia Jun 1, 2023, 7:43 AM

#

give me the best tensor flow course link

#

NOW

bleak crown Jun 1, 2023, 7:44 AM

#

💀

#

Well I'm gonna go to sleep it's nearly 2am. I'll try again tomorrow. Thanks for the help :). I may be back tomorrow if i can't get it figured out

cold osprey Jun 1, 2023, 7:49 AM

#

lapis sequoia NOW

NO

lapis sequoia Jun 1, 2023, 7:50 AM

#

cold osprey NO

lemon_grumpy

mint palm Jun 1, 2023, 9:22 AM

#

from transformers import BlipModel
model = BlipModel.from_pretrained("Salesforce/blip-image-captioning-base")
device_ids = [2, 4, 5, 7]
self.blip = DataParallel(model, device_ids = device_ids).to(torch.device('cuda:2'))

video_data.to(torch.device('cuda:2'))
video_features = self.blip.module.get_image_features(video_data)

even after this i am only able to have batch size as big as one gpu supported earlier. It should support 4x batch size cuz i am giving 4 gpus. Where is the error?

earnest widget Jun 1, 2023, 9:43 AM

#

Is it common for validation loss to slow down when decreasing? Could it be the learning rate being too low or high? This is for Mobilenetv3large model.

native umbra Jun 1, 2023, 11:23 AM

#

Are there internships for AI/data science at beginner level? (I cannot find any)

icy anchor Jun 1, 2023, 11:29 AM

#

Hi, does anyone here have experience with Puppeteer and BrightData's scraping browser?

placid cedar Jun 1, 2023, 11:30 AM

#

hi all

#

for feature engineering, is it done on only numerical variables. Or, is it done on categorical variables that have been encoded and numerical variables, basically the whole dataset

#

at the final step for my assignment 🥲

raw compass Jun 1, 2023, 11:49 AM

#

how do I extend an already existed well-known language model with some kind of new information?

serene scaffold Jun 1, 2023, 1:15 PM

#

raw compass how do I extend an already existed well-known language model with some kind of n...

please be more specific

past meteor Jun 1, 2023, 1:24 PM

#

placid cedar for feature engineering, is it done on only numerical variables. Or, is it done ...

Both

cold osprey Jun 1, 2023, 1:30 PM

#

https://pytorch.org/vision/stable/models.html
Looking at the bottom table of all the model weights, what does the GFlops there mean? Afaik, GFlops is a measure of compute power so i dont rly get what it represents there

mild dirge Jun 1, 2023, 1:32 PM

#

Giga floating-point operations per second; it is still a measure of computing power in some form. Namely, some operations can be done quicker than others, because they are heavily parallel for example @cold osprey

#

Or they meant FLOPs, which is not giga floating-point operations per second but just floating-point operations (here counted in billions)

cold osprey Jun 1, 2023, 1:33 PM

#

ye so like Eff Net B2 for e.g. has a GFLOPS value of 1.09, what does that mean compared to Eff Net B1 of 0.69

#

oh

#

so just how much computation is needed?

mild dirge Jun 1, 2023, 1:34 PM

#

But they wrote GFLOPS capital, which is confusing

#

I would guess they actually mean FLOPs, so not per second

#

just the number of operations
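To make "number of operations" concrete, here is a rough sketch of how such a count can be done by hand for a single fully connected layer. The function name and the layer sizes are made up, and counting conventions differ (some tools count a multiply-add as one operation rather than two), so this is not meant to reproduce the numbers in the torchvision table.

```python
def dense_layer_flops(in_features, out_features, bias=True):
    """Approximate FLOPs for one forward pass of a dense layer on one input."""
    macs = in_features * out_features  # one multiply-accumulate per weight
    flops = 2 * macs                   # count the multiply and the add separately
    if bias:
        flops += out_features          # one extra add per output unit
    return flops


# Hypothetical classifier head: 1280 input features, 1000 classes.
gflops = dense_layer_flops(1280, 1000) / 1e9
```

Summing such counts over every layer gives a fixed cost per forward pass, which is why it is a property of the model rather than of the hardware.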

cold osprey Jun 1, 2023, 1:35 PM

#

makes sense if its a measure of how much computation is needed say for model training or inference to gauge how much hardware/time is needed

past meteor Jun 1, 2023, 1:37 PM

#

cold osprey so just how much computation is needed?

Yes

#

Says more than just listing the number of parameters because of what PcCamel said

cold osprey Jun 1, 2023, 1:38 PM

#

ait thanks got it

#

so its gflops and not gflops per sec, thats what confused me

placid cedar Jun 1, 2023, 2:05 PM

#

hi

#

do u think its viable to handle outliers in the target variable, such as winsorisation?

lapis sequoia Jun 1, 2023, 2:12 PM

#

Hello, I have this issue: https://github.com/onnx/onnx/issues/5278
can someone help me? i would be really grateful.

GitHub

Using if node on top of two onnx models then merging with another m...

Ask a Question Question I have a model that outputs masks, then there's a postprocessing part to get the final output from those masks. There is two types of postprocessing depending on a boole...

sterile wyvern Jun 1, 2023, 2:59 PM

#

I ran a method 2 times to train and test it (in and out of sample)
Im wondering do I run the method i built a 3rd time to forward test.

crimson summit Jun 1, 2023, 3:04 PM

#

I am currently following along in the book "Make Your Own Neural Network" by Tariq Rashid. I am coding a simple 3 layer neural network with 3 neurons in each layer. (full code: https://github.com/makeyourownneuralnetwork/makeyourownneuralnetwork/blob/master/part2_neural_network.ipynb)

#

I am confused on how these two lines of code are working ?

    # update the weights for the links between the hidden and output layers
    self.who += self.lr * numpy.dot((output_errors * final_outputs * (1.0 - final_outputs)), numpy.transpose(hidden_outputs))

    # update the weights for the links between the input and hidden layers
    self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), numpy.transpose(inputs))

    pass

My question is this. The weights are being adjusted between the hidden layer and output layer and then they are being adjusted between the input layer and hidden layer. Don't I need to use back propagation to find the error of the hidden layer output and then use that correct hidden layer output to then adjust the weights connecting the hidden and output layer ? Right now the code is using the incorrect hidden layer output to adjust the weights between the hidden and output layers. Once i adjust the weights between the input and hidden layer it will give me a new hidden output and I think I would have to re adjust the weights between the hidden and output layer again ?

Here is a picture on the process im talking about.

#

wooden sail Jun 1, 2023, 3:25 PM

#

crimson summit I am confused on how these two lines of code are working ? # update the weigh...

the nice thing about backprop is that it allows you to update the intermediate "hidden" parameters based only on the final output of the network

#

if you needed to know the ideal output at each layer, no one would use deep neural networks 😛

#

the way to think of it is that the loss function you use depends on two things: the ideal output (the "label", if you will) and the output that the network actually produces

#

let's call the network N, and the loss L, and the label (ideal output) y

#

let's also call the input x, and the parameters of the network, idk, theta

#

this comes out as L(y, N(x, theta))

#

this is just one single function, through function composition, that depends on the labeled data pairs (x,y), and on the parameters theta of the network N

#

you can directly differentiate this with respect to theta and use that to update theta. how? through backprop, since you know how the network is made: through the composition of affine transformations and activation functions
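A tiny worked sketch of that idea, assuming a made-up 1-1-1 network (one input, one hidden unit, one output, sigmoid activations, squared-error loss), so that theta is just the pair (w1, w2). It is not the book's code. It only illustrates that both partial derivatives come from the same forward pass and that both weights are updated together.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(theta, x, y):
    """L(y, N(x, theta)) for a 1-1-1 network with theta = (w1, w2)."""
    w1, w2 = theta
    h = sigmoid(w1 * x)  # hidden output
    o = sigmoid(w2 * h)  # network output N(x, theta)
    return 0.5 * (y - o) ** 2


def gradient(theta, x, y):
    """Backprop: both partial derivatives come from one forward pass."""
    w1, w2 = theta
    h = sigmoid(w1 * x)
    o = sigmoid(w2 * h)
    delta_o = -(y - o) * o * (1.0 - o)      # dL/d(w2 * h)
    delta_h = delta_o * w2 * h * (1.0 - h)  # dL/d(w1 * x), uses the current w2
    return (delta_h * x, delta_o * h)       # (dL/dw1, dL/dw2)


def numeric_gradient(theta, x, y, eps=1e-6):
    """Central finite differences, used only to check the backprop result."""
    grads = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, x, y) - loss(down, x, y)) / (2 * eps))
    return tuple(grads)


theta, x, y, lr = (0.5, -0.3), 1.0, 0.9, 0.5
grad = gradient(theta, x, y)
# One gradient descent step: every parameter moves at the same time.
new_theta = tuple(t - lr * g for t, g in zip(theta, grad))
```

Note that `delta_h` is computed with the value of `w2` from before the update. That is the same reason the book's code can compute `hidden_errors` once and then update both weight matrices.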

mild dirge Jun 1, 2023, 3:29 PM

#

I think the question is that, because you have changed the hidden to output layer, when giving the input to the model again, the hidden error would be different, thus the update to the input to hidden should be based on this new hidden to output.

#

Which would be a more ideal update, but would need to run the model on the batch again for every layer.

placid cedar Jun 1, 2023, 3:29 PM

#

anyone available atm?

#

need some help 🥲

wooden sail Jun 1, 2023, 3:30 PM

#

mild dirge I think the question is that, because you have changed the hidden to output laye...

which "hidden error" do you mean?

mild dirge Jun 1, 2023, 3:30 PM

#

# update the weights for the links between the input and hidden layers
        self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), numpy.transpose(inputs))

#

This one

#

the hidden_errors var

wooden sail Jun 1, 2023, 3:32 PM

#

ah

#

there's a name for that. that would be a flavor of coordinate descent

#

that only converges under special conditions

#

the way to see it is that, although you do it step by step by backpropping through the layers, what is happening is that all of the network parameters get updated at the same time

#

if you change the parameters of one layer only and then compute a new error to update the other parameters, the error also changes again for the parameters you previously updated

#

this is only valid under special conditions

#

here's a nice read https://class.ece.uw.edu/546/2016spr/lectures/coordescent2014.pdf

#

it actually has crazy good performance in special cases, but requires extra properties to reach optima

#

https://arxiv.org/pdf/2012.03503.pdf here's another nice read

wooden sail Jun 1, 2023, 3:41 PM

#

mild dirge I think the question is that, because you have changed the hidden to output laye...

my observation would be that this doesn't really happen, does it?

#

the standard update step goes through every single layer

mild dirge Jun 1, 2023, 3:43 PM

#

Yeah it does, but that is what they are asking, why update all at the same time, once you want to update input to hidden, the derivatives wouldn't be "relevant" anymore because the model has changed after the update of other layers, but they are.

crimson summit Jun 1, 2023, 3:52 PM

#

mild dirge Yeah it does, but that is what they are asking, why update all at the same time,...

@wooden sail was explaining that if I update the weights between the input layer and hidden layer and run the code and then update the weights between the hidden layer and output layer it would not work because when I update the code after adjusting the weights between the input and hidden layer it would give me a entirely new error and as a result the hidden output would still be incorrect and I would just be in this never ending loop

#

still trying to wrap my head around it

#

but I think thats what he means

#

@wooden sail did i get that right or am i still looking at it from the wrong angle

wooden sail Jun 1, 2023, 3:55 PM

#

crimson summit @wooden sail was explaining that if I update the weights between the in...

it MAY work, but it's no longer gradient descent and back propagation as you know it 😛

#

the update happens for all parameters in the network at the same time, normally

#

we think of networks as having "layers" because that makes it easier to describe them, but at the end of the day, a network is a function with a ton of parameters and nothing else

crimson summit Jun 1, 2023, 3:57 PM

#

wooden sail we think of networks as having "layers" because that makes it easier to describe...

thx bro for explaining

#

means alot

serene scaffold Jun 1, 2023, 3:57 PM

#

wooden sail we think of networks as having "layers" because that makes it easier to describe...

do both ogres and onions still have layers?

wooden sail Jun 1, 2023, 3:58 PM

#

serene scaffold do both ogres and onions still have layers?

layers make it easier to describe onions and ogres, but at the end of the day, they are just lumps with a ton of cells and nothing else

native umbra Jun 1, 2023, 3:59 PM

#

Are there any recent internships for AI/data science at beginner level? (I cannot find any on LinkedIn)

serene scaffold Jun 1, 2023, 3:59 PM

#

wooden sail layers make it easier to describe onions and ogres, but at the end of the day, t...

https://tenor.com/view/what-think-hmm-baby-cute-gif-16182376

Tenor

lapis sequoia Jun 1, 2023, 4:04 PM

#

Not really sure where to ask this question so figured I’d try this channel, if I have a 2x2 grid of points spaced evenly apart (let’s say 1000 units apart) how can I calculate the radius needed on each point so they overlap and cover 100% of the area within the 2x2 grid.

I included an image to help understand what I’m trying to say.

mild dirge Jun 1, 2023, 4:05 PM

#

Not really for this channel. But the circles should touch in the middle, so half the distance between diagonal points.

wooden sail Jun 1, 2023, 4:05 PM

#

do you want the region between the points, or also some region beyond the points?

#

exactly as pccamel says. in the image you have, the radius is sqrt(2) times half the distance between 2 adjacent points

#

if the points are in the midpoint of some pixels, the radius may have to be bigger

lapis sequoia Jun 1, 2023, 4:06 PM

#

so in this example the formula would be

sqrt(2) * 500

Since this example is 1000 units between them?

wooden sail Jun 1, 2023, 4:06 PM

#

mhm

lapis sequoia Jun 1, 2023, 4:07 PM

#

wooden sail mhm

Thanks 🙂

wooden sail Jun 1, 2023, 4:07 PM

#

then the circles meet exactly in the middle. you may have some numerical issues with that, so you could make the circles a little bigger
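As a quick sketch of that calculation: the radius is the distance from a corner of one grid cell to the cell's centre, optionally enlarged a little for numerical safety. The function name and the `margin` parameter are made up for illustration.

```python
import math


def coverage_radius(spacing, margin=0.0):
    """Smallest radius so circles on the four corners of a square cell cover it.

    margin is a relative safety factor, e.g. 0.01 makes the circles 1% bigger.
    """
    # Distance from a corner to the centre of the cell: sqrt(2) * spacing / 2.
    return math.hypot(spacing / 2, spacing / 2) * (1 + margin)


radius = coverage_radius(1000)                # about 707.1 for 1000-unit spacing
padded = coverage_radius(1000, margin=0.01)   # slightly larger, to avoid gaps
```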

crimson summit Jun 1, 2023, 4:07 PM

#

@wooden sail sent you a friend request my guy

wooden sail Jun 1, 2023, 4:07 PM

#

i don't accept any

crimson summit Jun 1, 2023, 4:08 PM

#

oh dam

#

guess ill just write messages in here lol

lapis sequoia Jun 1, 2023, 4:11 PM

#

wooden sail then the circles meet exactly in the middle. you may have some numerical issues ...

Wouldn’t I need to divide by 2 since the formula would give ~1414 as the radius but if the distance between each point is only 1000 then we overlap way more than needed? The image I’m showing has a radius of 716 (slightly more than /2 but essentially the same)

wooden sail Jun 1, 2023, 4:13 PM

#

lapis sequoia Wouldn’t I need to divide by 2 since the formula would give ~1414 as the radius ...

that's exactly what i said, isn't it? you also said 500 times sqrt(2) yourself

lapis sequoia Jun 1, 2023, 4:13 PM

#

Nvm I’m dumb ahaha notice that now

crimson summit Jun 1, 2023, 4:51 PM

#

@wooden sail I don’t know if this is super obvious and I’m just being dumb but how does it mathematically work that the summation of the errors of the weights coming from a neuron are the error of the output of that neuron ?

wooden sail Jun 1, 2023, 4:53 PM

#

i'm not sure what "coming from a neuron" means

mild dirge Jun 1, 2023, 4:56 PM

#

Well the effect of the neuron value on the total error is the sum of the effect of the neuron on error of output 1, and the effect of the neuron on the error of output 2.

wooden sail Jun 1, 2023, 5:01 PM

#

the whole thing is being described in very nonstandard terminology, that thing you linked is confusing

#

there is one error function, and the gradient of that error function with respect to the parameters of the network

#

it's probably a better idea to think about it that way

#

"error" evokes the idea that, at the very least, you're subtracting one quantity from another reference value

#

which is not what is happening there at all

crimson summit Jun 1, 2023, 5:07 PM

#

i am just wondering how the summation of w1,1 w1,2 and w1,3 gives you the error for that neuron

#

mathematically

wooden sail Jun 1, 2023, 5:07 PM

#

"error for that neuron" means nothing to me mathematically

#

at best it's an auxiliary quantity that shows up when computing something else, and this person gave it that name

#

idk why

#

if i knew what they wanted to compute with it, i could give you an explanation

#

i would look for a different tutorial that uses standard terms

crimson summit Jun 1, 2023, 5:09 PM

#

Screenshot_2023-06-01_at_12.08.55_PM.png

#

yea the book i am learning from is referring to it as an error

#

my bad for the non standard terms

plain jungle Jun 1, 2023, 5:10 PM

#

crimson summit @wooden sail I don’t know if this is super obvious and I’m just being...

https://github.com/JTexpo/Python_Projects/blob/main/DNN_Math_Selenium/PythonBot/deep_neural_network.py

Here’s code that I made which does a DNN from scratch. I hope that the documentation helps you better understand the math

GitHub

Python_Projects/deep_neural_network.py at main · JTexpo/Python_Proj...

Contribute to JTexpo/Python_Projects development by creating an account on GitHub.

#

Just to give another mathy resource cause I know DNNs is a tough one to find some clean write ups about

hasty mountain Jun 1, 2023, 5:10 PM

#

Error = input for that layer? yert

crimson summit Jun 1, 2023, 5:11 PM

#

plain jungle https://github.com/JTexpo/Python_Projects/blob/main/DNN_Math_Selenium/PythonBot/...

Thx ill check it out bro

plain jungle Jun 1, 2023, 5:11 PM

#

Cheers! If you have any questions feel free to ping

crimson summit Jun 1, 2023, 5:12 PM

#

plain jungle Cheers! If you have any questions feel free to ping

Bet

hasty mountain Jun 1, 2023, 5:12 PM

#

Or...for that... neuron? (though I think a neuron would be better represented as the output of a layer/input for another pithink )

crimson summit Jun 1, 2023, 5:13 PM

#

wooden sail if i knew what they wanted to compute with it, i could give you an explanation

i guess from the diagram I just sent they want to use the "error" to compute how much to adjust the weights by

#

idk what the exact mathematical term is for that

crimson summit Jun 1, 2023, 5:14 PM

#

hasty mountain Error = input for that layer? yert

error as in the error of final number after the activation function has been applied

wooden sail Jun 1, 2023, 5:15 PM

#

they're using that in the computation of the gradients, which are used to update the weights

#

but that's just weird naming

hasty mountain Jun 1, 2023, 5:15 PM

#

Gradients is also a weird name, btw...why not just call it "derivatives"?

crimson summit Jun 1, 2023, 5:16 PM

#

i think its because its a super beginners book it helped me understand it when i first read it

wooden sail Jun 1, 2023, 5:16 PM

#

hasty mountain *Gradients is also a weird name, btw...why not just call it "derivatives"?*

they're not just derivatives either. in multiple dimensions, there are several kinds of derivatives and you have to specify which one you are using

plain jungle Jun 1, 2023, 5:16 PM

#

crimson summit i guess from the diagram I just sent they want to use the "error" to compute how...

The error is the derivative of your loss function. You can find the derivative of the error using activation functions, but a loss function such as

( expected - reality ) ^ 2

has an error of 2(expected-reality)

It’s just a variable to help find other variables needed in back prop, but is not the delta to adjust the weights by

wooden sail Jun 1, 2023, 5:17 PM

#

the gradient is the vector of partial derivatives w.r.t. each variable. there are other kinds

#

e.g. the total derivative, which is the sum of partial derivatives

crimson summit Jun 1, 2023, 5:17 PM

#

hasty mountain *Gradients is also a weird name, btw...why not just call it "derivatives"?*

i think it's because you're descending a gradient to get to the global minimum

wooden sail Jun 1, 2023, 5:17 PM

#

crimson summit i think it's because you're descending a gradient to get to the global minimum

not to the global one in general

#

unless the problem is (strictly) convex
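
To make the local-versus-global point concrete, here is a minimal sketch of plain gradient descent on a one-dimensional non-convex function. The function, learning rate, and starting points are all made up for illustration and were not part of the discussion.

```python
# Illustrative non-convex function with two minima (chosen for this sketch only).
def f(x):
    return x**4 - 3 * x**2 + x


def df(x):
    # Analytic derivative of f.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=2000):
    # Repeatedly step against the derivative, starting from x0.
    x = x0
    for _ in range(steps):
        x -= lr * df(x)
    return x


x_from_right = gradient_descent(2.0)   # settles near x = 1.13 (a local minimum)
x_from_left = gradient_descent(-2.0)   # settles near x = -1.30 (the global minimum)

print(x_from_right, f(x_from_right))
print(x_from_left, f(x_from_left))
```

Which minimum you reach depends only on where you start, which is why descent is guaranteed to find the global minimum only in the convex case.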

hasty mountain Jun 1, 2023, 5:18 PM

#

crimson summit i think its because its a super beginners book it helped me understand it when i...

Well, fair enough then. I just hope they're careful enough to explain that "this method is actually called gradient computation"

#

Because man...how I get lost over crazy terminology folks use...especially in Reinforcement Learning.

crimson summit Jun 1, 2023, 5:19 PM

#

plain jungle The error is the derivative of your loss function. You can find the derivative o...

would it not be -2(expected-reality) cause you also have to do deriv of (expected-reality)

crimson summit Jun 1, 2023, 5:20 PM

#

hasty mountain Because man...how I get lost over crazy terminology folks use...especially in Re...

yea fr bro Im going through it right now lol

plain jungle Jun 1, 2023, 5:21 PM

#

Nope, and I’d encourage you to check the code link cause it explains it with all the variables there, but to get the derivative of your error you use an activation function, not taking the derivative of the 2(e-a)

crimson summit Jun 1, 2023, 5:24 PM

#

plain jungle Nope, and I’d encourage you to check the code link cause it explains it with all...

this was my logic

Screenshot_2023-06-01_at_12.23.40_PM.png

crimson summit Jun 1, 2023, 5:24 PM

#

plain jungle Nope, and I’d encourage you to check the code link cause it explains it with all...

im reading through your code now thx bro

plain jungle Jun 1, 2023, 5:41 PM

#

crimson summit this was my logic

Ah I see why you now said -2, yeah you can use it like that. I’m not too sure how it would turn out. When I’ve built DNNs in the past I used the target as my focus instead of my output. But both should net you the same I think
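
The sign question comes down to which variable you differentiate with respect to. A quick numerical check, as a sketch with made-up values for `expected` and `reality`:

```python
def loss(expected, reality):
    # Squared error, as written in the chat.
    return (expected - reality) ** 2


def numeric_derivative(fn, x, eps=1e-6):
    # Central finite difference approximation of d fn / d x.
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)


expected, reality = 3.0, 1.0  # arbitrary example values

# Derivative with respect to `expected`: 2 * (expected - reality)
d_wrt_expected = numeric_derivative(lambda e: loss(e, reality), expected)

# Derivative with respect to `reality`: -2 * (expected - reality)
d_wrt_reality = numeric_derivative(lambda r: loss(expected, r), reality)

print(d_wrt_expected, d_wrt_reality)
```

Both forms lead to the same weight update as long as the sign is carried consistently through the rest of the backprop chain.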

night kernel Jun 1, 2023, 6:17 PM

#

*trying to build my own small ai chatbot for free

ive asked about this a bit before so please excuse me if you saw my messages and i sound redundant.

wanted a piece of advice on LLMs in general. let's say i can find my own LLM for commercial use. said LLM might have its own syntax - i.e. some ai chatbots can be more rude than others, some have sense of humor, etc

what if i want mine to be simple much like chatgpt? no nonsense, just takes my text and is all business. do you suggest i downloaded source code for an already existing LLM and make that adjustment?

plain jungle Jun 1, 2023, 6:19 PM

#

night kernel *trying to build my own small ai chatbot for free ive asked about this a bit be...

I’d start really small with 2 things. First is an NLP (or a tokenizer) and next is a RNN. With those 2 concepts you can combine ideas to make a chat bot. ChatGPT uses something known as a transformer which is a huge combination of the 2 components listed and much more

night kernel Jun 1, 2023, 6:21 PM

#

plain jungle I’d start really small with 2 things. First is an NLP (or a tokenizer) and next ...

thanks for the response. these NLP's and RNN's - do i need to make them or can i download existing source code?

plain jungle Jun 1, 2023, 6:35 PM

#

NLP is natural language processing. There are a few tokenizers out there you can import; however, depending on the scale, a lot of hobbyists will just encode the letters a-z as 1-28.

An RNN is a Recurrent Neural Network, and it is used for predicting the next character. Once again, keeping it very lightweight with 1-28 for letters, you can get some pretty mid sentences that you'll need to run through a simple spell check
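
A character-level encoding like the one described can be sketched in a few lines. The exact 28-symbol alphabet below (a-z plus space and period) is an assumption made for this example; the chat only gives the range 1-28.

```python
import string

# Assumed alphabet: 26 lowercase letters plus space and period = 28 symbols.
ALPHABET = string.ascii_lowercase + " ."

# Map each symbol to an integer id in 1..28 and back.
char_to_id = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
id_to_char = {i: ch for ch, i in char_to_id.items()}


def encode(text):
    # Lowercase the text and drop any symbol outside the alphabet.
    return [char_to_id[ch] for ch in text.lower() if ch in char_to_id]


def decode(ids):
    return "".join(id_to_char[i] for i in ids)


ids = encode("Hi there.")
print(ids)
print(decode(ids))
```

An RNN trained on such id sequences would then predict the id of the next character given the previous ones.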

#

TensorFlow is my personal recommendation for RNNs

zenith gull Jun 1, 2023, 7:04 PM

#

hey guys i'm wondering if anyone has input on the following.

I have a code for a chatbot that takes an input of multiple pdfs and i then use llm to generate a response. However it seems that my chatbot is strictly limited to what i provide meaning it can't act like chatgpt and access the web.

I'm wondering if theres a way to combine the two and have a program for achatbot that can answer the questions from the document but can also answer general questions.

Thank you in advance for helping

plain jungle Jun 1, 2023, 7:06 PM

#

zenith gull hey guys i'm wondering if anyone has input on the following. I have a code for ...

You’d need to have something interpret what the chatbot says and execute the commands. I’d take a look at AutoGPT cause they do it in a relatively safe way

plain jungle Jun 1, 2023, 10:09 PM

#

For my first crack at an RNN from scratch I’m extremely happy

night kernel Jun 1, 2023, 10:48 PM

#

plain jungle For my first crack at an RNN from scratch I’m extremely happy

nice. how did you make it?

plain jungle Jun 1, 2023, 11:00 PM

#

night kernel nice. how did you make it?

Hopefully soon I’ll have my code in the github, but just took a DNN and used the logic but with a few modifications to make it Recurrent instead of Deep

night kernel Jun 1, 2023, 11:01 PM

#

cool man

plain jungle Jun 1, 2023, 11:01 PM

#

Thanks!

magic dune Jun 1, 2023, 11:19 PM

#

plain jungle For my first crack at an RNN from scratch I’m extremely happy

rnn?

crimson summit Jun 1, 2023, 11:20 PM

#

plain jungle For my first crack at an RNN from scratch I’m extremely happy

yesirrrrrr

plain jungle Jun 2, 2023, 12:44 AM

#

magic dune rnn?

Recurrent neural network

potent sky Jun 2, 2023, 3:44 AM

#

plain jungle For my first crack at an RNN from scratch I’m extremely happy

In numpy?

potent sky Jun 2, 2023, 3:49 AM

#

hasty mountain Because man...how I get lost over crazy terminology folks use...especially in Re...

oof, what problems did you have with RL terminology?

plain jungle Jun 2, 2023, 4:03 AM

#

potent sky In numpy?

Yep only Numpy

potent sky Jun 2, 2023, 4:04 AM

#

Nice! When I'd done that I started with thinking I'd just have to make some modifications to my dnn from scratch code but apparently that wasn't extensible enough xd. Had to write RNNs entirely from scratch again ;-;

#

Nice work!

plain jungle Jun 2, 2023, 4:07 AM

#

Thank you! Yeah there was still some new code, but mostly just updating how the back prop needed to be done

potent sky Jun 2, 2023, 4:17 AM

#

plain jungle Thank you! Yeah there was still some new code, but mostly just updating how the ...

Was it a plain RNN or like a GRU?

plain jungle Jun 2, 2023, 4:33 AM

#

potent sky Was it a plain RNN or like a GRU?

Am still trying to get familiar with the lingo, so forgive me for not having a shorter answer. How it works is it treats its layer as a Nx1. So for the sun graph it was a 10x1, and then for a defined amount of steps it would repeat the logic. So a 3x1 would be :

[ A, B, C ]
[ B, C, P1 ]
[ C, P1, P2 ]
…

Back prop is then done treating the layer as just a Nx1, so while the dot product of the error derivative and the weights may return a NxN array, we only look at NxN[-1][-1] turning it back into a 1x1 for bptt
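
The sliding-window scheme above (feed each prediction back in as the newest input) can be sketched as follows. `mean_predictor` is a hypothetical stand-in for the trained network, used only so the example runs on its own.

```python
def rollout(seed, predict_next, steps):
    # Start from the seed window [A, B, C] and repeatedly append a prediction
    # while dropping the oldest value: [B, C, P1], [C, P1, P2], ...
    window = list(seed)
    windows = [tuple(window)]
    for _ in range(steps):
        next_value = predict_next(window)
        window = window[1:] + [next_value]
        windows.append(tuple(window))
    return windows


def mean_predictor(window):
    # Hypothetical placeholder model: predicts the mean of the window.
    return sum(window) / len(window)


windows = rollout([1.0, 2.0, 3.0], mean_predictor, steps=2)
for w in windows:
    print(w)
```

Because each later window contains earlier predictions, errors compound over the rollout, which is one reason long autoregressive forecasts drift.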

placid cedar Jun 2, 2023, 4:55 AM

#

hi guys, i need some help atm

#

anyone can help me with smth? 😦

potent sky Jun 2, 2023, 4:55 AM

#

plain jungle Am still trying to get familiar with the lingo, so forgive me for not having a s...

What do you mean by N here? Also, an RNN shares its weights across timesteps, so you'd only need the gradients for one network.
Also, my question was more along the lines of what gates you're using internally within an RNN unit, whether or not you're using a cell state, whether you're going both forwards and backwards, etc.
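
For reference, this is roughly what weight sharing looks like in a plain (ungated) RNN forward pass. It is a generic NumPy sketch with made-up shapes and random weights, not the code being discussed.

```python
import numpy as np


def rnn_forward(inputs, W_xh, W_hh, b_h):
    # The same W_xh, W_hh and b_h are reused at every timestep.
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in inputs:
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
inputs = rng.normal(size=(5, 3))       # 5 timesteps, 3 features each
W_xh = rng.normal(size=(4, 3)) * 0.1   # input-to-hidden weights
W_hh = rng.normal(size=(4, 4)) * 0.1   # hidden-to-hidden weights
b_h = np.zeros(4)

states = rnn_forward(inputs, W_xh, W_hh, b_h)
print(states.shape)
```

A GRU or LSTM keeps the same loop structure but replaces the single `tanh` update with gated updates (and, for the LSTM, a separate cell state).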

placid cedar Jun 2, 2023, 4:56 AM

#

basically now im at my final step for my linear regression model

#

and i want to conduct polynomial expansion

#

but i am not sure which are the columns to apply it for

#

i have transformed my X_train and X_test in such a way

#

#

i want to find out how i can put back the target variable into this, X_train, so that i can make something like this

#

#

is it possible to use my X_train alone to get this?

past meteor Jun 2, 2023, 5:13 AM

#

potent sky oof, what problems did you have with RL terminology?

From my experience RL tends to reuse words that other domains of ML use but they mean something different

#

Because I did a bunch of econometrics and regular stats coursework in undergrad words like bias, robust, variance have many different meanings.😫

potent sky Jun 2, 2023, 5:37 AM

#

The bias-variance trade-off in traditional ML corresponds to the exploration-exploitation trade-off in RL. It's a good analogy imo

past meteor Jun 2, 2023, 5:37 AM

#

RL has its own bias variance trade-off

potent sky Jun 2, 2023, 5:38 AM

#

I can't think of many confusing examples, maybe I'm out of touch ;-;
MDPs, MRPs, State, Environment, Reward signals, agent, all are nicely defined in RL literature

potent sky Jun 2, 2023, 5:38 AM

#

past meteor RL has its own bias variance trade-off

Yes but that one is pretty similar to the ML one so I don't get the confusion ;-;

grim crater Jun 2, 2023, 5:38 AM

#

Hello! I'm having some trouble loading some content from a remote repository. I've set up a Jupyter notebook and attempted to get this dataset, but no luck. Is it something with how I'm grabbing DOWNLOAD_ROOT, HOUSING_PATH, or HOUSING_URL?

Here's the code...

`import os
import tarfile
import urllib

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/blob/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()

import pandas as pd

def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)

housing = load_housing_data()
housing.head()`

potent sky Jun 2, 2023, 5:39 AM

#

I do remember being a little confused when I was starting out with it. But I can't think of anything rn

past meteor Jun 2, 2023, 5:39 AM

#

It means something different from the statistical learning definition

#

The bias in bias variance in stat learning is inductive bias

#

The bias in bias variance in RL is statistical bias, like having a biased estimator

potent sky Jun 2, 2023, 5:42 AM

#

grim crater Hello! I'm having some trouble loading some content from a remote repository. ...

I think you're never running fetch_housing_data() so there's nothing at housing_path for load_housing_data() to load?
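
A sketch of how the snippet could look with the download step wired in. Two changes here are assumptions rather than things confirmed in the chat: importing `urllib.request` explicitly (a bare `import urllib` does not always expose the `request` submodule), and dropping `blob/` from the URL, since raw.githubusercontent.com paths normally go straight from the repository name to the branch.

```python
import os
import tarfile
import urllib.request  # explicit submodule import

import pandas as pd

# Assumed corrected root: no "blob/" segment on raw.githubusercontent.com.
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"


def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    # Download the archive and unpack it next to where the CSV is expected.
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)


def load_housing_data(housing_path=HOUSING_PATH):
    # Only works after fetch_housing_data() has created housing.csv.
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)
```

In the notebook, `fetch_housing_data()` would be called once before `housing = load_housing_data()`.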

potent sky Jun 2, 2023, 5:42 AM

#

past meteor It means something else to the statistical learning definition

Hmm interesting. What literature did you consult

past meteor Jun 2, 2023, 5:44 AM

#

Maybe in the limit they're the same but the implications are different because the setting is quite different

past meteor Jun 2, 2023, 5:45 AM

#

potent sky Hmm interesting. What literature did you consult

In uni we did the derivations of bias variance in a few courses. I'll look for what they cited

potent sky Jun 2, 2023, 5:47 AM

#

RL can have its own inductive bias too

#

I don't think RL is limited to statistical bias

past meteor Jun 2, 2023, 5:48 AM

#

Oh yeah, RL uses inductive bias when selecting their representation of the environment and function approximators

potent sky Jun 2, 2023, 5:48 AM

#

And stat learning as well has both inductive bias as well as statistical bias

#

For example, yeah I think

past meteor Jun 2, 2023, 5:49 AM

#

But when RL literature says bias, from my experience it's really related to the bias you see in statistics 101: your estimates cannot converge to the population's parameter value

potent sky Jun 2, 2023, 5:49 AM

#

Besides, in terms of a predictor, doesn't statistical bias already subsume any inductive biases we might have started with?

potent sky Jun 2, 2023, 5:50 AM

#

past meteor But when RL literature says bias from my experience it's really related to the b...

Isee. I mostly referred to Sutton&Barto, didn't find the bias-variance concept confusing then. Maybe I didn't study it closely enough

past meteor Jun 2, 2023, 5:51 AM

#

potent sky Isee. I mostly referred to Sutton&Barto, didn't find the bias-variance concept c...

Yeah I read that in full and implemented most things by hand except model-based algos

#

And I remember going into this rabbit hole

potent sky Jun 2, 2023, 5:52 AM

#

Interesting. Do send if you find any of the conclusions you came to then. I'll have to look into this again when I get some time haha

potent sky Jun 2, 2023, 5:54 AM

#

past meteor Yeah I read that in full and implemented most things by hand except model-based ...

I'm guilty of coding selectively only the methods I wanted to try ;-;
The rest I just did the math lol

grim crater Jun 2, 2023, 5:55 AM

#

potent sky I think you're never running `fetch_housing_data()` so there's nothing at `housi...

I was able to get it working. Looks like maybe some basic libraries weren't installed in the initial code, because even fetch_housing_data() was returning an error...something about the requests not coming through...

past meteor Jun 2, 2023, 5:57 AM

#

This is how we determined bias and variance, I found my slides:

#

Maybe I found the RL definition confusing because I was overthinking it haha

potent sky Jun 2, 2023, 6:04 AM

#

past meteor This is how we determined bias and variance, I found my slides:

For clarity, what's t here?

potent sky Jun 2, 2023, 6:05 AM

#

grim crater I was able to get it working. Looks like maybe some basic libraries weren't ins...

oh hmm

past meteor Jun 2, 2023, 6:07 AM

#

potent sky For clarity, what's t here?

Same as here: https://en.wikipedia.org/wiki/Bias–variance_tradeoff#Derivation

#

8 am is too early for this, but nice chat @potent sky

#

Closing thought is that the implications for both RL and (regular ML) are the same: give a little bias away to drastically reduce variance. For ML this is, imo, closer to reducing model complexity but in RL it's more discussed in the literature in terms of statistics + MC (high variance, low bias) vs TD learning (high bias, low variance).

You were right in saying they're the same thing.
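
For reference, the standard decomposition of expected squared error, assuming data generated as y = f(x) + ε with zero-mean noise of variance σ². The symbols follow the usual textbook convention and may not match the notation on the slides that were shared.

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Here the bias term is statistical bias of the estimator, which is the same notion that shows up when comparing Monte Carlo returns with bootstrapped TD targets.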

potent sky Jun 2, 2023, 6:15 AM

#

past meteor 8 am is too early for this, but nice chat @potent sky

Makes sense

potent sky Jun 2, 2023, 6:16 AM

#

past meteor This is how we determined bias and variance, I found my slides:

Hmm seems like regular statistical bias

potent sky Jun 2, 2023, 6:20 AM

#

past meteor Closing thought is that the implications for both RL and (regular ML) are the sa...

Yes haha I recall the MC vs TD example.
I suppose it's pretty similar then but I'll still have a look later if I get some free time

placid cedar Jun 2, 2023, 6:36 AM

#

hi guys

hasty mountain Jun 2, 2023, 6:36 AM

#

potent sky I can't think of many confusing examples, maybe I'm out of touch ;-; MDPs, MRPs,...

They didn't seem that nicely defined in the texts that I've read. So I took quite some time to get that "policy", for example, can be a neural network.
At the time, I didn't know a neural network could be seen as a function. (I'm not in the field of math sciences). I just learned that when I got to study Diffusion Models. (since every tutorial and article about diffusion models says clearly that "the de-noising function can be a neural network")

placid cedar Jun 2, 2023, 6:36 AM

#

im experiencing difficulties at the moment

#

when i put my target y_train, back to my x_train, there are many NaN values in the target variable

#

which im not sure why as well...

#

it was only after standardisation, and this problem has appeared

#

there was no problem fitting my y_train back to my x_train most of the time

#

it was only after scaling, and this problem occurred

#

what may be the possible reason for this tho?

agile cobalt Jun 2, 2023, 6:46 AM

#

do you have any NaNs in y_train?

placid cedar Jun 2, 2023, 6:46 AM

#

not at all

agile cobalt Jun 2, 2023, 6:47 AM

#

it might be some issue with pandas index alignment then

placid cedar Jun 2, 2023, 6:47 AM

#

anyway to fix this?

agile cobalt Jun 2, 2023, 6:47 AM

#

check any operations you might be doing that could modify the indexes

placid cedar Jun 2, 2023, 6:47 AM

#

im starting to tear my hair 🥲

#

can we take this to private chat btw, may bombard a bit too much

agile cobalt Jun 2, 2023, 6:47 AM

#

also double check that the length/shape of x_train and y_train match

placid cedar Jun 2, 2023, 6:48 AM

#

so sorry if its too inconvenient

agile cobalt Jun 2, 2023, 6:48 AM

#

placid cedar can we take this to private chat btw, may bombard a bit too much

don't worry, not like anyone else is using this channel rn

placid cedar Jun 2, 2023, 6:48 AM

#

ah sure

#

so i went to check the X_train

#

wait lemme find it real quick

#

#

the indexing is like this

#

before scaling

#

but after i did standardisation i mean

#

#

unless there's something wrong with my code or smth

#

scaler = StandardScaler()

scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

X_train_scaled = pd.DataFrame(X_train_scaled, columns=X_train.columns)
X_test_scaled = pd.DataFrame(X_test_scaled, columns=X_test.columns)

placid cedar Jun 2, 2023, 6:53 AM

#

agile cobalt also double check that the length/shape of x_train and y_train match

yep they do match

agile cobalt Jun 2, 2023, 6:55 AM

#

it looks like the indexes have almost definitely been altered, they went from a crazy order to a default sorted 0, 1, 2, 3, ...?

placid cedar Jun 2, 2023, 6:55 AM

#

lol yeah i assume so xD

#

i think it possibly changed as i used the pd.DataFrame

agile cobalt Jun 2, 2023, 7:00 AM

#

yeah, do not generate new dataframes so carelessly

#

https://scikit-learn.org/stable/auto_examples/miscellaneous/plot_set_output.html

scikit-learn

Introducing the set_output API

This example will demonstrate the set_output API to configure transformers to output pandas DataFrames. set_output can be configured per estimator by calling the set_output method or globally by se...
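
A small reproduction of the NaN problem and one way around it, as a sketch with made-up data. Wrapping the scaled array in a new DataFrame without passing `index=` resets the index to 0, 1, 2, ..., so assigning `y_train` afterwards aligns on labels that mostly no longer exist.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data with a shuffled index, as you would get after train_test_split.
X_train = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 20.0, 30.0, 40.0]},
    index=[7, 2, 9, 4],
)
y_train = pd.Series([1.0, 0.0, 1.0, 0.0], index=X_train.index, name="target")

scaled = StandardScaler().fit_transform(X_train)  # plain NumPy array, no index

# Problem: the new frame gets a default 0..n-1 index, so the assignment
# below aligns y_train by label and fills the mismatches with NaN.
bad = pd.DataFrame(scaled, columns=X_train.columns)
bad["target"] = y_train
nan_count_bad = int(bad["target"].isna().sum())

# Fix: carry the original index over to the scaled frame.
good = pd.DataFrame(scaled, columns=X_train.columns, index=X_train.index)
good["target"] = y_train
nan_count_good = int(good["target"].isna().sum())

print(nan_count_bad, nan_count_good)
```

The `set_output(transform="pandas")` API from the linked page is another option in recent scikit-learn versions, since the transformer then returns a DataFrame that keeps the input's index.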

placid cedar Jun 2, 2023, 7:59 AM

#

how shld i interpret this? and which variables shld i use for my polynomial expansion?

#

#

shld i use polynomial expansion on encoded categorical data? or should i only consider numerical variables

past meteor Jun 2, 2023, 8:30 AM

#

placid cedar how shld i interpret this? and which variables shld i use for my polynomial expa...

I've mentioned residual plots a few times

drowsy timber Jun 2, 2023, 10:33 AM

#

Hi!, Anyone here experienced in using python/pyspark in an aws emr instance?

I'm having difficulty installing cartopy on my instance. It keeps reading out an error. I have all the dependencies installed too like geos, shapely, and pysph

torn mulch Jun 2, 2023, 11:57 AM

#

import numpy as np
def gausseidel(x,y,t=15,e=0.022):
    diags=np.diag(np.abs(x)).copy
    np.fill_diagonal(x,0)
    sum=np.sum(x,axis=1)
    if not np.all(diags>sum):
        return True
Xs = [
    [
      [4, 2, -1],
      [1, -5, 2],
      [2, -1, -4]
    ],
    [
      [3, 4, 5],
      [-3, 7, -4],
      [1, -4, -2]
    ],
    [
      [9, -2, 3, 2],
      [2, 8, -2, 3],
      [-3, 2, 11, -4],
      [-2, 3, 2, 10]
    ]
]
Ys = [
    [41, -10, 1],
    [34, -32, 62],
    [55, -14, 12, -21]
]
for i,X in enumerate(Xs):
    x=np.array(X)
    y=np.array(Ys[i])
    if gausseidel(x,y):
        print("Not Diagonally Dominant")

#

why i cannot run this code?

wooden sail Jun 2, 2023, 11:59 AM

#

what error do you get?

#

at a glance i would think that your usage of copy might be the issue. i think it's a function, so you'd have to do np.diag(...).copy() with parentheses
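
Beyond the missing parentheses, a check for strict diagonal dominance can be written without modifying the input matrix. This is a sketch of just that check, not of the full Gauss-Seidel iteration; the function name is made up. Note that it sums absolute values of the off-diagonal entries, which the posted snippet does not do.

```python
import numpy as np


def is_strictly_diagonally_dominant(a):
    # |a_ii| must exceed the sum of |a_ij| over j != i, for every row i.
    a = np.asarray(a, dtype=float)
    diag = np.abs(np.diag(a))
    off_diag_sum = np.sum(np.abs(a), axis=1) - diag
    return bool(np.all(diag > off_diag_sum))


# The three coefficient matrices from the message above.
Xs = [
    [[4, 2, -1], [1, -5, 2], [2, -1, -4]],
    [[3, 4, 5], [-3, 7, -4], [1, -4, -2]],
    [[9, -2, 3, 2], [2, 8, -2, 3], [-3, 2, 11, -4], [-2, 3, 2, 10]],
]

results = [is_strictly_diagonally_dominant(X) for X in Xs]
for X, ok in zip(Xs, results):
    print("Diagonally Dominant" if ok else "Not Diagonally Dominant")
```

Because nothing is changed in place, the same matrix can be passed on to the solver afterwards.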

torn mulch Jun 2, 2023, 12:02 PM

#

wooden sail at a glance i would think that your usage of copy might be the issue. i think it...

Oh i forget that,ty bro

neat stratus Jun 2, 2023, 12:12 PM

#

Hey, does anyone here have experience running graph algorithms on a massive graph? I want to run a community detection algorithm on a huge bipartite graph (~5 million nodes and ~100M edges). Unfortunately for me, I run out of memory using networkx and scipy sparse matrices. I'm wondering if there's a library like networkx but backed by disk instead of being in memory. Any other solutions are welcome as well

celest vine Jun 2, 2023, 12:17 PM

#

Is PySpark just pandas but for big data?

snow fog Jun 2, 2023, 12:53 PM

#

Hi, what do you guys think of a library that tells you exactly where (in which stdlib module) a function is defined in Python?
I have written it, but the problem is: do you guys know of an ML model that can calculate the semantic similarity between names like pow and get_power?
Why am I asking this?
I am writing this library for Python newcomers who may come from a different language. Say their language has a function named literal_eval: as I iterate over all the stdlib functions, I calculate the semantic similarity with that keyword.

past meteor Jun 2, 2023, 1:07 PM

#

celest vine Is PySpark just pandas but for big data?

Yesn't,

The syntax is different.

It's suitable for distributed computing over different nodes.

It can work with larger than memory datasets because it spills to disk.

It has way more overhead than pandas because it runs on the JVM so for small operations it's just wasteful.

coral field Jun 2, 2023, 2:06 PM

#

Why does training a model to classify colored images take much longer and for more epochs than using grayscale? Is it because of the two extra dimensions.

timid grove Jun 2, 2023, 2:25 PM

#

**NotImplementedError: Cannot copy out of meta tensor; no data! **
same issue:
https://github.com/togethercomputer/OpenChatKit/issues/87

I am having the same problem i loaded the model checkpoint shards in both float32 and bfloat16 but it does not work for me i do not know for what reason.

This is my google colab file its a request to have a look in it.
https://drive.google.com/file/d/1-ccrx1Q5tkLUYtZBGi5lNZGjPMyr_X9U/view?usp=sharing

AN OVERVIEW OF MY CODE:
I am using the https://huggingface.co/HuggingFaceH4/starchat-alpha model, fine-tuning it on my own dataset. First, using the meta device, I made a device_map to load the checkpoint shards onto my device. Then I initialized my model from the checkpoints downloaded to my session storage, loaded the weights and tied them, and finally called accelerate's load_checkpoint_and_dispatch with the folder containing the checkpoints and .json files, which gives me this error.

This is the code snip that is giving me error:

The error:

my checkpoint folder that i am passing.

Please correct if i am conceptually wrong or missing some imp step.
I am using colab pro for running this code.

Thank You!
If anyone has worked with the same error please help.
Your inputs will be highly appreciated.
I am struggling with this error from past 5 days but not able to find the solution so** PLEASE HELP !**

mild dirge Jun 2, 2023, 3:06 PM

#

coral field Why does training a model to classify colored images take much longer and for mo...

The first convolutional layer has 3 times as many weights (3 input channels instead of 1), and thus it also takes longer to calculate its convolutions
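
A back-of-the-envelope parameter count for a 2D convolution shows where the factor of 3 appears. The kernel size and filter counts are arbitrary example values; only the layer that reads the image channels directly is affected, and deeper layers depend on the previous layer's filter count instead.

```python
def conv2d_param_count(kernel_size, in_channels, out_channels):
    # One (kernel_size x kernel_size x in_channels) kernel per output filter,
    # plus one bias per output filter.
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


gray_first = conv2d_param_count(3, in_channels=1, out_channels=32)   # grayscale input
rgb_first = conv2d_param_count(3, in_channels=3, out_channels=32)    # RGB input
second_layer = conv2d_param_count(3, in_channels=32, out_channels=64)

print(gray_first, rgb_first, second_layer)
```

Input pipelines also move three times as much data per image for RGB, which can add to the training time independently of the parameter count.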

coral field Jun 2, 2023, 3:10 PM

#

Do you also know why the training accuracy decreases so little from epoch to epoch? What's generally the ideal amount of epochs to train a model like this?

mild dirge Jun 2, 2023, 3:11 PM

#

Depends on way too many things to give a general answer. Your image size, network size, problem difficulty, optimization algorithm, activation etc.

#

Even with that info it would be hard to say

cosmic narwhal Jun 2, 2023, 3:50 PM

#

Hello everyone! I am currently building a machine learning algorithm whose objective is to try to predict the outcome of next year's March Madness basketball tournament. My plan is to feed it about 6 statistical categories that correlate with tournament success from this year, then fine-tune it to make it accurate. Once it works fairly well, I will use that same algorithm for next year. Will this be effective? What advice would you have on making it as accurate as possible?

serene scaffold Jun 2, 2023, 3:52 PM

#

cosmic narwhal Hello everyone! I am currently building a machine learning algorithm whose objec...

when you say "feed it statistical categories", what is "it"? what kind of model is it?

cosmic narwhal Jun 2, 2023, 3:54 PM

#

I have only created an outline for it, but it looks like this.

#

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Loads the CSV data into a pandas DataFrame so it can be manipulated

df = pd.read_csv('March Madness Data/2023 Game Data.csv')

# The 6 categories that we are evaluating

selected_features = ['Kenpom adjusted efficiency', 'BARTTORVIK ADJUSTED EFFICIENCY',
                     'EFG %', 'DEFENSE, EFG', 'POINTS PER POSSESSION OFFENSE',
                     'POINTS PER POSSESSION DEFENSE']

# Now create a new dataframe with the most valuable features

df_X = df[selected_features]

# Assuming 'target' is the name of your target column

if 'target' in df.columns:
    df_y = df['target']
else:
    df_y = None  # Set the target variable to None or any appropriate default value

# Split the data into a training set and a test set. This will likely be a statistic and the output of that statistic

X_train, X_test, y_train, y_test = train_test_split(df_X, df_y, test_size=0.2, random_state=42)

# Creates an instance of the model

lr = LogisticRegression()

# Trains the model to observe accuracy

lr.fit(X_train, y_train)

# Makes predictions on the test set to see how effective it is

predictions = lr.predict(X_test)

# Print the accuracy

print("Accuracy: ", accuracy_score(y_test, predictions))

hasty mountain Jun 2, 2023, 3:55 PM

#

Oh...Discord became a Jupyter notebook Markdown

agile cobalt Jun 2, 2023, 3:55 PM

#

!code

arctic wedgeBOT Jun 2, 2023, 3:55 PM

#

Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

placid cedar Jun 2, 2023, 3:59 PM

#

hi guys

#

for my categorical values, i have encoded them, as well as did standardisation on them. should i put it in my polynomial expansion, since after putting them in it, the mse and r-square test results improved quite dramatically, from 0.57 to 0.62

serene scaffold Jun 2, 2023, 4:01 PM

#

cosmic narwhal import pandas as pd from sklearn.model_selection import train_test_split from sk...

"Set the target variable to None or any appropriate default value" for what you're trying to do, there's nothing you can do if you don't have a y.

cosmic narwhal Jun 2, 2023, 4:03 PM

#

Yes, I just haven’t determined my y yet, so that is a placeholder for now

#

I am very much learning as I’m going haha

serene scaffold Jun 2, 2023, 4:03 PM

#

anyway, suppose you train the logistic regression model. once you have it, it needs all the features in your x data in order to work. so you can't use it for a future situation unless you know what the kenpom adjusted efficiency, efg, points per possession offense, etc. are

#

and I imagine that at that point, you already know who won

cosmic narwhal Jun 2, 2023, 4:09 PM

#

I would access that data right before the tournament starts for next year, and try to utilize the same algorithm. Any ideas on what would be a good Y variable?

potent sky Jun 2, 2023, 4:09 PM

#

Using the same method, can you try a smaller model that should fit entirely into your GPU memory easily and see what happens then?

cosmic narwhal Jun 2, 2023, 4:09 PM

#

What type of model would you suggest for that?

potent sky Jun 2, 2023, 4:10 PM

#

Looking at the GitHub issue it looks like it's because the model doesn't fit into GPU memory entirely and the handling for that isn't all that good. So identifying the source of the problem first will help arrive at a solution

potent sky Jun 2, 2023, 4:10 PM

#

timid grove **NotImplementedError: Cannot copy out of meta tensor; no data! ** same issue: h...

^^

serene scaffold Jun 2, 2023, 4:12 PM

#

@cosmic narwhal Stargazer is not talking to you

potent sky Jun 2, 2023, 4:12 PM

#

Yeah mb, I thought I was replying to the right message but apparently not

cosmic narwhal Jun 2, 2023, 4:12 PM

#

No worries

serene scaffold Jun 2, 2023, 4:13 PM

#

cosmic narwhal I would access that data right before the tournament starts for next year, and t...

I don't know basketball--you're saying that all your X features are information that's available before the game is played?

cosmic narwhal Jun 2, 2023, 4:13 PM

#

serene scaffold anyway, suppose you train the logistic regression model. once you have it, it ne...

In terms of the model itself, does it look like it’ll get the job done? Any other critiques?

cosmic narwhal Jun 2, 2023, 4:14 PM

#

serene scaffold I don't know basketball--you're saying that all your X features are information ...

It’s public information that is provided after each game of the regular season. So right before the tournament, I would have access to all that data from the season leading up to that

serene scaffold Jun 2, 2023, 4:14 PM

#

cosmic narwhal It’s public information that is provided after each game of the regular season. ...

and your goal is to predict what? which single team will win the tournament?

plain jungle Jun 2, 2023, 4:15 PM

#

cosmic narwhal I would access that data right before the tournament starts for next year, and t...

I’m not too sure on how comfortable you are with AI so… If I had to tackle a problem as a beginner, I’d attempt the following.

Collect the outcomes of the previous March Madnesses / season games.

Find the average win % of a team

Build a tree of your next year bracket, and match the win %s and whoever has the higher % on average moves to the next round

There’s a lot of moving pieces to predict who would win such as players, home court adv, etc… this won’t guarantee you the winning answer, but it is a beginner’s step in the right direction for approaching future ai. The more you make models the more advanced your models will become over time
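
A rough sketch of the bracket idea above, assuming a dictionary of season win percentages and a bracket listed in order so that adjacent teams play each other. The team names, the numbers, and the function name `simulate_bracket` are made up for illustration:

```python
def simulate_bracket(teams, win_pct):
    """Advance the team with the higher win % in each pairing until one remains.

    teams: bracket order; length must be a power of two (assumption).
    win_pct: hypothetical mapping of team name -> season win fraction.
    """
    if not teams or len(teams) & (len(teams) - 1):
        raise ValueError("bracket size must be a power of two")
    round_teams = list(teams)
    while len(round_teams) > 1:
        next_round = []
        # Adjacent entries in the list play each other.
        for a, b in zip(round_teams[0::2], round_teams[1::2]):
            # Ties go to the first-listed team (an arbitrary choice).
            next_round.append(a if win_pct[a] >= win_pct[b] else b)
        round_teams = next_round
    return round_teams[0]


# Made-up example data, not real results.
win_pct = {"A": 0.81, "B": 0.64, "C": 0.77, "D": 0.90}
champion = simulate_bracket(["A", "B", "C", "D"], win_pct)
print(champion)
```

This only compares raw win percentages, so it ignores everything else mentioned above (players, home court, and so on).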

cosmic narwhal Jun 2, 2023, 4:15 PM

#

serene scaffold and your goal is to predict what? which single team will win the tournament?

Yes, as best I can

cosmic narwhal Jun 2, 2023, 4:16 PM

#

plain jungle I’m not too sure on how comfortable you are with AI so… If I had to tackle a pro...

I appreciate your recommendation. Thank you

serene scaffold Jun 2, 2023, 4:16 PM

#

cosmic narwhal Yes, as best I can

so for whichever team won the tournament in the training data, you can make their y value 1. and whichever team was eliminated first, you can make theirs 0. and then everyone else can get a number between 0 and 1 based on how close they got to winning
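
One way to turn "how close they got to winning" into a number, as a sketch: divide each team's tournament wins by the number of rounds. The helper name `closeness_labels` and the tiny 4-team example are hypothetical:

```python
def closeness_labels(rounds_won, total_rounds):
    """Map each team's number of tournament wins to a target in [0, 1]."""
    return {team: wins / total_rounds for team, wins in rounds_won.items()}


# Hypothetical 4-team tournament (2 rounds): the champion won both games.
rounds_won = {"A": 2, "B": 1, "C": 0, "D": 0}
y = closeness_labels(rounds_won, total_rounds=2)
print(y)
```

One caveat: scikit-learn's `LogisticRegression` expects class labels, so fractional targets like these would need a regression model (or binned classes) rather than being passed to it directly.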

cosmic narwhal Jun 2, 2023, 4:17 PM

#

I will add that into my code right now. Thank you for everything @serene scaffold I hope that this model turns out to be reasonably accurate haha

torn mulch Jun 2, 2023, 5:11 PM

#

import numpy as np
def f(x):
    x**6+2*x**2-3
def g(x):
    5*x**5+4*x
def newton(x0,error,iteration,max):
    x1= x0-(f(x0)/g(x0))
    print(f"Iteration of {iteration} new root = {x1}")
    if(np.abs(f(x1)) < error):
        return True
    if(iteration == max):
        return False
    newton(x1,error,iteration+1,max)
if not newton(4,0.01,1,15):
    print("cannot find the root")

why i cannot run this code?

#

TypeError Traceback (most recent call last)
Cell In[1], line 14
12 return False
13 newton(x1,error,iteration+1,max)
---> 14 if not newton(4,0.01,1,15):
15 print("cannot find the root")

Cell In[1], line 7, in newton(x0, error, iteration, max)
6 def newton(x0,error,iteration,max):
----> 7 x1= x0-(f(x0)/g(x0))
8 print(f"Iteration of {iteration} new root = {x1}")
9 if(np.abs(f(x1)) < error):

TypeError: unsupported operand type(s) for /: 'NoneType' and 'NoneType'

wooden sail Jun 2, 2023, 5:15 PM

#

torn mulch ```py import numpy as np def f(x): x**6+2*x**2-3 def g(x): 5*x**5+4*x de...

your f, g, and newton functions don't return anything. in python, functions without an explicit return will return None
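
A tiny illustration of that point (the function names are made up for the example):

```python
def without_return(x):
    x ** 2  # the value is computed and then discarded


def with_return(x):
    return x ** 2


print(without_return(3))  # prints None
print(with_return(3))     # prints 9
```

Dividing two such `None` results is what produces the `TypeError` in the traceback.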

#

also recursion in python isn't the best

torn mulch Jun 2, 2023, 5:17 PM

#

wooden sail your f, g, and newton functions don't return anything. in python, functions with...

oh ok ty bro

torn mulch Jun 2, 2023, 5:28 PM

#

wooden sail your f, g, and newton functions don't return anything. in python, functions with...

import numpy as np
def f(x):
    return x**6+2*x**2-3
def g(x):
    return 6*x**5+4*x
def newton(x0,error,iteration,max):
    x1= x0-(f(x0)/g(x0))
    print(f"Iteration of {iteration} new root = {x1}")
    if(np.abs(f(x1)) < error):
        print("A")
        return True
    if(iteration == max):
        return False
    newton(x1,error,iteration+1,max)
if not newton(4,0.01,1,15):
    print("cannot find the root")
else:
    printf("root found!")

when this code return true why not print root found?

wooden sail Jun 2, 2023, 5:29 PM

#

your newton method still doesn't have a proper return

#

your recursive call needs to be in a return as well

#

otherwise the inner returns don't get passed back out to the previous calls

#

imagine an inner newton iteration finds a root and returns true

#

this turns your code into

```py
def newton(x0,error,iteration,max):
    x1= x0-(f(x0)/g(x0))
    print(f"Iteration of {iteration} new root = {x1}")
    if(np.abs(f(x1)) < error):
        print("A")
        return True
    if(iteration == max):
        return False
    True
```
but that last True is not returned anywhere, and the function returns None unless the solution is found at the very first iteration

serene scaffold Jun 2, 2023, 5:36 PM

#

Javapython Sadge

torn mulch Jun 2, 2023, 5:39 PM

#

import numpy as np
def f(x):
    return x**6+2*x**2-3
def g(x):
    return 6*x**5+4*x
def newton(x0,error,iteration,max):
    x1= x0-(f(x0)/g(x0))
    print(f"Iteration of {iteration} new root = {x1}")
    if(np.abs(f(x1)) < error):
        print("A")
        return True
    if(iteration == max):
        return False
    True
    newton(x1,error,iteration+1,max)
if not newton(4,0.01,1,15):
    print("cannot find the root")
else:
    print("root found!")

it still cannot print root found

wooden sail Jun 2, 2023, 5:40 PM

#

i think you misunderstood what i told you 😛

#

what i meant was that you need

```py
def newton(x0,error,iteration,max):
    x1= x0-(f(x0)/g(x0))
    print(f"Iteration of {iteration} new root = {x1}")
    if(np.abs(f(x1)) < error):
        print("A")
        return True
    if(iteration == max):
        return False
    return newton(x1,error,iteration+1,max)
```
and the code i wrote was an explanation why
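
Since recursion was flagged earlier as not ideal in Python, here is a sketch of the same Newton iteration written as a loop. It uses the same function, starting point, tolerance and iteration cap as the code in the thread; the name `newton_iterative` and the `(estimate, converged)` return value are choices made for this example:

```python
def f(x):
    return x**6 + 2 * x**2 - 3


def df(x):
    # Derivative of f.
    return 6 * x**5 + 4 * x


def newton_iterative(x0, tol, max_iter):
    """Return (estimate, converged) after at most max_iter Newton steps."""
    x = x0
    for _ in range(max_iter):
        x = x - f(x) / df(x)
        if abs(f(x)) < tol:
            return x, True
    return x, False


root, converged = newton_iterative(4.0, 0.01, 15)
print(root, converged)
```

Returning the estimate together with the flag avoids having to print inside the function to see the root.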

past meteor Jun 2, 2023, 5:45 PM

#

serene scaffold Javapython Sadge

Do you use Jython?

serene scaffold Jun 2, 2023, 5:45 PM

#

past meteor Do you use Jython?

no. I'm referring to how Edd retained the Java style of the user's code

torn mulch Jun 2, 2023, 5:46 PM

#

wooden sail i think you misunderstood what i told you 😛

oh ok

cold osprey Jun 2, 2023, 5:47 PM

#

whats java about it?

torn mulch Jun 2, 2023, 5:48 PM

#

wooden sail i think you misunderstood what i told you 😛

joe_salute

cobalt sleet Jun 2, 2023, 6:02 PM

#

hello friends I have a question

#

are ConvNets dead? Are transformers the new big boy in image processing?

past meteor Jun 2, 2023, 6:04 PM

#

Transformers have much less bias so that means they need a lot more data but they're a lot more flexible

#

I also think they need more memory than conv nets so they have their time and place

#

Some CNN's can run on edge/mobile/... I'm not sure the run of the mill vision transformer can but ViT's are such active research that I could be so wrong by now 🤣

#

https://arxiv.org/abs/2305.07027 etc etc

arXiv.org

EfficientViT: Memory Efficient Vision Transformer with Cascaded Gro...

Vision transformers have shown great success due to their high model
capabilities. However, their remarkable performance is accompanied by heavy
computation costs, which makes them unsuitable for real-time applications. In
this paper, we propose a family of high-speed vision transformers named
EfficientViT. We find that the speed of existing tra...

coral field Jun 2, 2023, 6:08 PM

#

what's the purpose of adding dense layers to a cnn? why not just only use convolutional layers?

cobalt sleet Jun 2, 2023, 6:08 PM

#

past meteor https://arxiv.org/abs/2305.07027 etc etc

interesting. So it seems to me that they could be in the path of becoming old technology

#

relics of a bygone era

#

crazy I've only heard about them recently

#

I thought cnns like ConvNext were going to remain the big boys for a while

past meteor Jun 2, 2023, 6:09 PM

#

cobalt sleet interesting. So it seems to me that they could be in the path of becoming old te...

I doubt that will be the case any time soon.

cobalt sleet Jun 2, 2023, 6:10 PM

#

yeah but transformer being the efficient and flexible building block as it is will likely take over

past meteor Jun 2, 2023, 6:22 PM

#

Linear regression didn't stop being a thing after transformers either

#

Occam's razor

tacit knot Jun 2, 2023, 6:25 PM

#

I'm building an open source project (hope to release first version late today or tomorrow, finally) built on top of a CNN (ZoeDepth) at the core. None of the other AI models or approaches were even very close to the performance and quality of this one. That said, I'm more of a "user" of AI stuff than a "developer" of AI right now. Also, to say I'm new to AI (and python lol) is a bit of an understatement. Only point is, there are bleeding edge / state of the art AIs still rolling out as CNNs, so I doubt they are going to disappear any time all that soon.

#

I've also decided that the options for dependency management in python all feel oddly complex and convoluted...not a fan lol

past meteor Jun 2, 2023, 7:02 PM

#

tacit knot I've also decided that the options for dependency management in python all feel ...

Poetry is quite simple after you read the docs

severe topaz Jun 2, 2023, 7:25 PM

#

#

Which one of these methods is popular? I’ve reviewed some Gibbs and PCA lately… what is everyone using?

tacit knot Jun 2, 2023, 7:50 PM

#

past meteor Poetry is quite simple after you read the docs

Thanks for the suggestion, I'll take a look at it actually. My point still stands though, I've never even heard of that one lol

past meteor Jun 2, 2023, 7:51 PM

#

Venv (inside stdlib) in conjunction with poetry is a great fit

tacit knot Jun 2, 2023, 7:56 PM

#

ah gotcha, venv/conda are the main things I've been working with, but some of these bleeding-edge AI projects are barely held together it seems. Then again, I'm on linux so that occasionally makes things much easier, but sometimes much much harder. Getting NeRF (instant-ngp specifically) up and running was kinda crazy of a process as one of the more difficult examples recently.

past meteor Jun 2, 2023, 7:58 PM

#

Afaik pip installing can kill a project because there's no real dependency checking that is as stringent as poetry

tacit knot Jun 2, 2023, 7:58 PM

#

I guess my main issue is always being presented with 5 options that are often only halfway explained. Three weeks ago I had never heard of venv/conda/pipx/etc (brand new to python AND ai/ml stuff)

#

Dang, where does the time go, maybe it has been 5-6 weeks now that I think about it.

#

hmm poetry seems interesting, but I'm curious, how would you compare it to npm/yarn? or just a different kind of animal?

wooden sail Jun 2, 2023, 8:04 PM

#

past meteor Afaik pip installing can kill a project because there's no real dependency check...

pip's dependency check has gotten pretty good as of late

#

it didn't use to do that at all, back in the days when conda was originally conceived

potent sky Jun 2, 2023, 8:18 PM

#

cobalt sleet interesting. So it seems to me that they could be in the path of becoming old te...

Convolutional operations and convolutional neural nets by extension have some unique properties that make them especially suitable for image processing. Though ViTs are amazing and a lot of my research involves them, raw ViTs have some fundamental characteristics that don't render them equally suitable in some respects.
There's a lot of research going on in this but so far all successful approaches make use of CNNs with ViT to incorporate these properties (that I know of)
So it'll be some time at least before CNNs are relics of a bygone era haha

potent sky Jun 2, 2023, 8:21 PM

#

tacit knot I've also decided that the options for dependency management in python all feel ...

What about poetry

potent sky Jun 2, 2023, 8:21 PM

#

past meteor Venv (inside stdlib) in conjunction with poetry is a great fit

Exactly, that's what I generally use and have seen being used

potent sky Jun 2, 2023, 8:23 PM

#

wooden sail it didn't use to do that at all, back in the days when conda was originally conc...

I think conda is past its time. I know it's still immensely popular, especially in ML/DS but I honestly don't like it

#

pip + venv I've found to be quite sufficient

Poetry for serious projects

tacit knot Jun 2, 2023, 8:27 PM

#

I guess my issue has been when trying to work with bleeding-edge stuff that isn't even JUST python, for example: https://github.com/NVlabs/instant-ngp#building-instant-ngp-windows--linux

GitHub

GitHub - NVlabs/instant-ngp: Instant neural graphics primitives: li...

Instant neural graphics primitives: lightning fast NeRF and more - GitHub - NVlabs/instant-ngp: Instant neural graphics primitives: lightning fast NeRF and more

#

Getting that up and running was.....challenging lol

#

there is a line "also run pip install -r requirements.txt" as basically an afterthought

potent sky Jun 2, 2023, 8:29 PM

#

Mhm but most of the build tools for this are non Python
Python is optional, just for bindings ig

#

If you do want python bindings then go for pip install requirements

tacit knot Jun 2, 2023, 8:31 PM

#

yea, i mean i got it all working, but I guess i'm associating stuff to python incorrectly, that is just one of the many places things can get complex with these AI/ML things

potent sky Jun 2, 2023, 8:31 PM

#

pip install -r requirements.txt is standard when setting up the dependencies for any python proj (even the filename requirements.txt). That might be why it appears as an afterthought - because for a lot of people, it is

potent sky Jun 2, 2023, 8:31 PM

#

tacit knot yea, i mean i got it all working, but I guess i'm associating stuff to python in...

Mhm possibly, fair enough

tacit knot Jun 2, 2023, 8:33 PM

#

Sure, but just following that can kill a different project when not using venv or whatever. Early on every time I tried to stand up a new project just to try it out I'd end up destroying my environment for several other things.

#

Then again, I'm just venting really, shouldn't be complaining about bleeding-edge crazy AI things being a little tricky lol

wooden sail Jun 2, 2023, 8:36 PM

#

potent sky I think conda is past its time. I know it's still immensely popular, especially...

i still think conda does a good job. easy way of managing both venvs and packages, and it's still arguably the easiest way of getting intel-optimized packages

#

also you don't deal with the safety issues of pypi, since the repos are curated

modest onyx Jun 2, 2023, 8:42 PM

#

potent sky Convolutional operations and convolutional neural nets by extension have some un...

I haven’t read up on ViTs yet (heard of them today only surprisingly). I was quite shocked when I found out they outperform CNNs but that would be quite misleading if the ViT models incorporate CNNs as part of their architecture

errant bison Jun 2, 2023, 9:47 PM

#

potent sky Same, processing each frame successively. If you want you can include an object ...

can u mention which libraries or methodologies can i use for object tracker? And i mean how would it detect objects like cars, and bicycles? what approach u suggest

pseudo spire Jun 2, 2023, 9:54 PM

#

errant bison can u mention which libraries or methodologies can i use for object tracker? And...

YOLO

errant bison Jun 2, 2023, 9:56 PM

#

pseudo spire YOLO

i am unable to find any proper tutorial

#

every tutorial says to clone a git

hasty mountain Jun 2, 2023, 10:14 PM

#

past meteor Transformers have much less bias so that means they need a lot more data *but* t...

Yes, but man...Transformers are so temperamental to train...

#

Warmup steps, adjusting learning rate, the residual blocks unbalancing the gradients and possibly leading to model collapse, the problems of using Teacher Forcing...

#

Besides... One can't make a GAN using a Transformer because it's too efficient as Discriminator and it collapses the adversarial process. No fun!

But maybe someday I'll try something with a GPT as Generator and a BERT as Discriminator brainmon

rain garnet Jun 2, 2023, 10:29 PM

#

Hey, I'm currently working on a data science project and I'd like some advice.

The problem description is that there are multiple stores, and each store has multiple products. What I want to do is based on recent sales data, I want to predict a good price to sell each product.

I'm guessing it wouldn't be a good idea to train a model for each product, and for each store, so in this case, what can I do? I am using linear regression and predicting the demand of a given product based on the price.

plain jungle Jun 2, 2023, 11:11 PM

#

rain garnet Hey, I'm currently working on a data science project and I'd like some advice. ...

Have you thought about taking the center of mass of the data? Take a price such as 1 dollar, and give it a weight of all sold, such as 10 units. Then take another datapoint, such as $2 and 5 units. You can find a center of mass between the two, like $1.33
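
The "center of mass" here is just a sales-weighted average price. For the two data points mentioned it works out to (1·10 + 2·5) / 15 ≈ $1.33. A minimal sketch with NumPy:

```python
import numpy as np

prices = np.array([1.0, 2.0])       # dollars
units_sold = np.array([10.0, 5.0])  # used as weights

# Weighted mean: sum(price * units) / sum(units).
weighted_price = np.average(prices, weights=units_sold)
print(round(weighted_price, 2))
```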

somber pollen Jun 2, 2023, 11:19 PM

#

Do you know what the correct price is? Or are you just trying to provide a forecast?

rain garnet Jun 2, 2023, 11:19 PM

#

plain jungle Have you thought about taking the center of mass of the data? Take a price such ...

I won't have access to the data of the entire market though

rain garnet Jun 2, 2023, 11:20 PM

#

somber pollen Do you know what the correct price is? Or are you just trying to provide a forec...

I am trying to predict the optimal price to sell a product that would give the most profit

somber pollen Jun 2, 2023, 11:20 PM

#

rain garnet I am trying to predict the optimal price to sell a product that would give the m...

If you want to do that you're going to have to incorporate economics math which is a pain

#

You can only find the optimal price for a product if you can measure how much demand decreases for a given increase in price

#

Optimal in the sense of maximizing profit, if you just want a good sense of what a "fair" price is you can just take the average

rain garnet Jun 2, 2023, 11:21 PM

#

I'm basically just using a simple algorithm which seems to work for now, I'm plotting # of orders vs price, and multiplying it with the profit

rain garnet Jun 2, 2023, 11:21 PM

#

somber pollen You can only find the optimal price for a product if you can measure how much de...

yeah I'm using linear regression for that

somber pollen Jun 2, 2023, 11:23 PM

#

Multiplying with the profit? Do you mean like plotting on a graph versus it?

rain garnet Jun 2, 2023, 11:24 PM

#

Demand * profit for all the possible prices, so that would give me the max profit I can get theoretically

somber pollen Jun 2, 2023, 11:24 PM

#

how would you sell the product for multiple prices at a time though?

rain garnet Jun 2, 2023, 11:24 PM

#

I'd have to test it

#

I'd start with the lowest price, then the highest price

#

and then go from there

somber pollen Jun 2, 2023, 11:25 PM

#

Generally the best way is to plot the thing you can control (in this case price) versus the thing you're trying to maximize (profit)

rain garnet Jun 2, 2023, 11:25 PM

#

collect more data

somber pollen Jun 2, 2023, 11:25 PM

#

If you multiply them together then the relationship becomes less clear

rain garnet Jun 2, 2023, 11:25 PM

#

yeah but I won't have profit in the dataset

#

I'll have number of orders for a given price

somber pollen Jun 2, 2023, 11:26 PM

#

Ohhh, I see what you mean. So you're basically comparing the price versus the computed profit for that price, which is done by multiplying the number of orders by the price etc

#

Yeah that sounds like a good approach
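
A minimal sketch of that approach for a single product: fit a linear demand curve to (price, orders) pairs, then scan candidate prices for the highest predicted profit. All numbers are invented, and `unit_cost` is an assumption that was not part of the discussion:

```python
import numpy as np

# Hypothetical observations for one product: price charged and orders received.
prices = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
orders = np.array([60.0, 50.0, 40.0, 30.0, 20.0])
unit_cost = 2.0  # assumed cost per unit

# Fit linear demand: orders ~ slope * price + intercept.
slope, intercept = np.polyfit(prices, orders, deg=1)

# Evaluate predicted profit on a grid of candidate prices.
candidates = np.linspace(2.0, 10.0, 81)
predicted_orders = np.clip(slope * candidates + intercept, 0.0, None)
profit = (candidates - unit_cost) * predicted_orders

best_price = candidates[np.argmax(profit)]
print(best_price)
```

With only 4-5 data points per product the fitted slope will be noisy, so the result is better read as a starting point for price testing than as a final answer.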

rain garnet Jun 2, 2023, 11:27 PM

#

yep

#

it's kind of a brute force approach rn tho

#

this won't scale, cause i'd need a model for each product for each store in this case

somber pollen Jun 2, 2023, 11:39 PM

#

rain garnet this won't scale, cause i'd need a model for each product for each store in this...

a linear regression model will be marginally expensive in terms of time. just a basic neural network is like thousands of times more expensive in terms of compute--you definitely don't need to worry about making a model for each of something that is almost certainly less than 10,000 (unless this project is for like Amazon lol)

plain jungle Jun 3, 2023, 12:18 AM

#

Agreed NNs are good but shouldn’t be seen as the solution to all. This sounds like something that some algos could solve more efficiently

hasty mountain Jun 3, 2023, 12:29 AM

#

I suppose Random Forests would be expensive as well...since they're...a bit like NNs?

#

pithink

rain garnet Jun 3, 2023, 12:33 AM

#

somber pollen a linear regression model will be marginally expensive in terms of time. just a ...

yeah I guess, each product would only have like 4-5 datapoints anyways

somber pollen Jun 3, 2023, 12:34 AM

#

rain garnet yeah I guess, each product would only have like 4-5 datapoints anyways

yeah it's important to get like an order of magnitude estimation of how complex your model is to figure out whether it would fit your performance characteristics

hasty mountain Jun 3, 2023, 12:38 AM

#

Speaking of expensive models... Is there a way to use Genetic Algorithms together with Stochastic Gradient Descent to optimize my model without obliterating my GPU or RAM?

#

I was thinking about trying something like that for a Variational AutoEncoder...or for another model which the loss is decreasing a bit slowly(but still can be trained without overfitting problems)

potent sky Jun 3, 2023, 3:13 AM

#

wooden sail also you don't deal with the safety issues of pypi, since the repos are curated

Okay that is a good point tbh

potent sky Jun 3, 2023, 3:15 AM

#

modest onyx I haven’t read up on ViTs yet (heard of them today only surprisingly). I was qui...

Raw ViTs (without incorporating CNNs) outperform SOTA CNNs on some tasks. However on tasks like dense prediction, the fundamental structure of raw ViTs presents some disadvantages that are currently mitigated by incorporating CNNs somewhere in the process

potent sky Jun 3, 2023, 3:16 AM

#

errant bison can u mention which libraries or methodologies can i use for object tracker? And...

YOLO.
iirc, there are many tutorials on YT specifically to build object trackers, if that's more your thing

potent sky Jun 3, 2023, 3:16 AM

#

hasty mountain Besides... One can't make a GAN using a Transformer because it's too efficient a...

Use a transformer as generator too-
xd

potent sky Jun 3, 2023, 3:19 AM

#

hasty mountain Speaking of expensive models... Is there a way to use Genetic Algorithms togethe...

Both are parameter optimization methods in their own right, how would you interleave them? I suppose you could use some sort of weighted update or serial update (genetic then SGD) but does that even make sense mathematically?

potent sky Jun 3, 2023, 3:20 AM

#

potent sky Okay that is a good point tbh

But for any serious project you should probably conduct an independent screening of all the repos you're planning to use, irrespective of whether you use conda or not

rich sail Jun 3, 2023, 4:33 AM

#

Hey could anyone help me with my problem in python-help, greatly appreciated! 🙏

distant cosmos Jun 3, 2023, 6:23 AM

#

Can someone help me understand LSA and its uses?
Like i am using it as a metric to analyze similarity between two directories which contain 100s of files in them

#

Is this the right use case ?

rose dagger Jun 3, 2023, 8:28 AM

#

A question about Fully Convolutional Networks: Is the "skip connection" literally just adding the layer of the encoder part to the corresponding layer (of equal spatial dimensions) of the decoder part?

past meteor Jun 3, 2023, 8:35 AM

#

hasty mountain Speaking of expensive models... Is there a way to use Genetic Algorithms togethe...

Yes, this is super common

#

This is called "local search" in optimization literature. So you run your genetic algorithm as you would normally and then at specific points you run a local search, if your problem allows it, it could be a few iterations of SGD you run at that moment.

#

Imagine you have a (nearly) continuous function that has many local optima, for example the egg holder function. At this point you may want to consider running a genetic algorithm.

You run it as is. Mu is a hyperparameter of your genetic algo, it's the mean of a poisson distribution that you sample from. Whatever you get from this sample determines how many iterations of SGD you run on each individual in the population.

The plus side is that you're generally going to land exactly at some local optimum faster, but the downside is that if mu is too high your population will converge very fast to one, potentially suboptimal, solution.
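
A small sketch of the scheme described above, under several assumptions: the Rastrigin function stands in for the egg holder function (its gradient is simpler), plain gradient steps stand in for SGD, and the selection and mutation rules are deliberately basic. The function names and all hyperparameter values are made up for the example:

```python
import numpy as np


def rastrigin(x):
    # Multimodal test function with many local optima.
    return 10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x))


def rastrigin_grad(x):
    return 2.0 * x + 20.0 * np.pi * np.sin(2.0 * np.pi * x)


def local_search(x, steps, lr=0.002):
    """Run `steps` gradient-descent updates, keeping only improving moves."""
    for _ in range(steps):
        candidate = x - lr * rastrigin_grad(x)
        if rastrigin(candidate) < rastrigin(x):
            x = candidate
    return x


def memetic_search(pop_size=20, dim=2, generations=30, mu=3.0, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5.12, 5.12, size=(pop_size, dim))
    initial_best = min(rastrigin(ind) for ind in pop)
    for _ in range(generations):
        # Local search: gradient steps per individual are drawn from Poisson(mu).
        steps = rng.poisson(mu, size=pop_size)
        pop = np.array([local_search(ind, k) for ind, k in zip(pop, steps)])
        # Selection: keep the better half (lower is better).
        order = np.argsort([rastrigin(ind) for ind in pop])
        parents = pop[order[: pop_size // 2]]
        # Variation: children are Gaussian-mutated copies of the parents.
        children = parents + rng.normal(0.0, 0.3, size=parents.shape)
        pop = np.vstack([parents, children])
    final_best = min(rastrigin(ind) for ind in pop)
    return initial_best, final_best


initial_best, final_best = memetic_search()
print(initial_best, final_best)
```

Raising `mu` makes every individual slide further into its nearest basin each generation, which is the fast-convergence trade-off mentioned above.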

hasty mountain Jun 3, 2023, 10:28 AM

#

potent sky Both are parameter optimization methods in their own right, how would you interl...

I was thinking about processing them in parallel, and then use whichever provides the lower loss for each iteration.
I suppose that this would mean that most of the time the optimization will be done by SGD, and sometimes the genetic optimization will take over

#

I just hope the genetic optimization wouldn't make an optimization so abrupt that it would break my model yert

hasty mountain Jun 3, 2023, 10:30 AM

#

past meteor Imagine you have a (nearly) continuous function that has many local optima, for ...

Hm...now you're making me consider using it for a GAN... brainmon

#

Maybe for a Reinforcement Learning model would be better, but usually my RL models also must have a really short processing time(I like to use them to play games)

past meteor Jun 3, 2023, 10:30 AM

#

It'll be really expensive. Individual represents a full model. Each step of local search is training a new model

hasty mountain Jun 3, 2023, 10:31 AM

#

Oh... I see... Then nah.
I was thinking more of using a genetic algorithm where each individual is a weight from my model layers.

past meteor Jun 3, 2023, 10:31 AM

#

Depending on what your search is for, I'm thinking of architecture or hyperparameter search

hasty mountain Jun 3, 2023, 10:36 AM

#

rose dagger A question about Fully Convolutional Networks: Is the "skip connection" literall...

Not adding the layer. It's actually taking the output of one encoder layer and passing it to the decoder.

At least in the UNet architecture, which is the image that you've shown. In general, a skip connection is just using the output of a layer in another layer that isn't the next one (which is what is done normally)

#

The ResNet architecture can show you clearly what a residual connection is... it was the model that popularized them, after all.

rose dagger Jun 3, 2023, 10:40 AM

#

hasty mountain Not adding the layer. It's actually taking the output of one encoder layer and p...

I see, thanks.

potent sky Jun 3, 2023, 11:12 AM

#

hasty mountain I was thinking about processing them in parallel, and then use whichever provide...

I think this would either be very susceptible to getting stuck in local minima, or be very expensive

potent sky Jun 3, 2023, 11:18 AM

#

rose dagger A question about Fully Convolutional Networks: Is the "skip connection" literall...

Yes, in some cases it is addition of the outputs of one layer to the inputs of another, generally as in Residual Neural Networks (Here I mean element-wise addition of the feature maps)
In other cases, a skip connection can also represent concatenation of two feature maps, i.e stacking them on top of each other.
U-Net uses concatenation, ResNet-18 uses addition, DenseNets use concatenation, etc.

#

The decision-making for when to use concatenation and when to use addition is subtle

#

Both methods allow you to preserve information from previous layers in the network and propagate gradients more effectively and thus both come under "skip connections"
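
To make the difference concrete, here is a shape-only sketch using NumPy arrays as stand-ins for feature maps. The `(channels, height, width)` layout and the sizes are assumptions for the example, not taken from any particular model:

```python
import numpy as np

# Two feature maps with layout (channels, height, width); random stand-in values.
rng = np.random.default_rng(0)
encoder_features = rng.normal(size=(8, 16, 16))
decoder_features = rng.normal(size=(8, 16, 16))

# ResNet-style skip: element-wise addition, shape is unchanged.
added = encoder_features + decoder_features

# U-Net/DenseNet-style skip: concatenate along the channel axis.
concatenated = np.concatenate([encoder_features, decoder_features], axis=0)

print(added.shape)         # (8, 16, 16)
print(concatenated.shape)  # (16, 16, 16)
```

Addition requires matching shapes, while concatenation only needs the spatial dimensions to match and leaves the next layer with more input channels to process.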

hasty mountain Jun 3, 2023, 3:25 PM

#

potent sky I think this would either be very susceptible to getting stuck in local minima, ...

Local optima...ugh... This is such a killjoy for my experiments... I think one type of Attention Mechanism I made probably isn't as efficient as it could be exactly because of local minima. And I'm in trouble with RL exactly because of that.

hasty mountain Jun 3, 2023, 3:29 PM

#

potent sky The decision-making for when to use concatenation and when to use addition is su...

Generally, what I've seen is: If you want a simple residual connection, maybe to stabilize your model (Transformer) or to help in information preservation or backpropagation, then you should prefer element-wise addition. If you want to condition your output based on some information you got from previous layers, use concatenation.

#

This is what I see people using in generative models, at least.

twilit arch Jun 3, 2023, 3:49 PM

#

How do arguments work in midjourney? Is the model trained to respond to arguments like --ar or --niji or is that something they handle in the middle?

hasty mountain Jun 3, 2023, 4:00 PM

#

twilit arch How do arguments work in midjourney? Is the model trained to respond to argument...

Yes. You execute a script in the command shell, and when you pass the arguments together with the command to run the model, it'll create a dictionary of arguments which will be used for many steps to define the whole process, such as the model, the input, the process...etc.

twilit arch Jun 3, 2023, 4:01 PM

#

so I should train with the arguments then

hasty mountain Jun 3, 2023, 4:01 PM

#

I don't know how the scripts for midjourney works, but most models available to the public are like that. You run a script in the command shell, the script will execute the model and the arguments you've passed will be used to create a dictionary of arguments

twilit arch Jun 3, 2023, 4:01 PM

#

hm alright

#

im planning on training openjourney so

hasty mountain Jun 3, 2023, 4:02 PM

#

twilit arch im planning on training openjourney so

If you want to study their code, a protip: look for an argparse

#

This is the module that will probably be used to create the dictionary of arguments from the command shell
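
For reference, a generic sketch of how `argparse` turns command-line tokens into a dictionary. The flag names and the `parse_cli` helper are hypothetical, and nothing here is meant to describe how Midjourney or openjourney actually handle their parameters:

```python
import argparse


def parse_cli(argv):
    """Parse a list of command-line tokens into a plain dict."""
    parser = argparse.ArgumentParser(description="Hypothetical image-generation script")
    parser.add_argument("--prompt", type=str, required=True)
    parser.add_argument("--aspect-ratio", type=str, default="1:1")
    parser.add_argument("--steps", type=int, default=50)
    args = parser.parse_args(argv)
    return vars(args)  # Namespace -> dict


options = parse_cli(["--prompt", "a red fox", "--steps", "30"])
print(options)
```

In a real script `parse_args()` is usually called without arguments so it reads `sys.argv`; passing a list explicitly just makes the example easy to run.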

agile cobalt Jun 3, 2023, 4:10 PM

#

twilit arch How do arguments work in midjourney? Is the model trained to respond to argument...

for things like niji, it completely changes which model is used
for things like quality, it changes hyperparameters like the noise schedule

#

for ar I'm not sure, could be using a different upscaler

potent sky Jun 3, 2023, 4:17 PM

#

hasty mountain Generally, what I've seen is: If you want a simple residual connection, maybe to ...

Exactly.
The underlying principle is that element-wise addition is akin to an adjustment in the parameter space of that layer, so it should preferably come from a layer with a similar parameter space -> "closer" layer.
Concatenation is akin to collecting more features. These might be at different levels or hierarchies of representation and so addition doesn't make sense. Generally this translates to:
skip connections between "closer" layers -> element-wise addition
"Farther" layers -> concat

The basic multi-head attention unit and encoder unit in transformers have skip connections for almost successive layers, and rightly so addition makes sense here

past meteor Jun 3, 2023, 4:20 PM

#

I need to do something with transformers because just reading about them makes it feel so vague

#

I understand all the pieces intuitively but that's about it

potent sky Jun 3, 2023, 4:22 PM

#

Hmm have you built it from scratch

past meteor Jun 3, 2023, 4:22 PM

#

No but I'm not sure how much that'll help me.

potent sky Jun 3, 2023, 4:23 PM

#

what part do you think you're lacking/uncomfortable in?

past meteor Jun 3, 2023, 4:23 PM

#

The annoying information retrieval analogies

potent sky Jun 3, 2023, 4:23 PM

#

okay yeah I'm not sure how much building it from scratch will help in that, but it might

twilit arch Jun 3, 2023, 4:24 PM

#

agile cobalt for things like niji, it completely changes which model is used for things like ...

ah yeh i knew they have different models but the other ones I wonder how they handle them

past meteor Jun 3, 2023, 4:24 PM

#

the bricks make sense, you have a sequence and you're conditioning on every other element in the sequence. Multi head because each head takes into account a different piece of info
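
A bare-bones NumPy sketch of that building block, for anyone who wants to poke at it rather than only read the math. It assumes random projection matrices, skips the final output projection, masking and dropout, and the function names are made up for the example:

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, num_heads):
    """x: (seq_len, d_model); w_*: (d_model, d_model). Returns (output, weights)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(x @ w_q), split_heads(x @ w_k), split_heads(x @ w_v)
    # Each position scores every position in the sequence, separately per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)  # (num_heads, seq_len, seq_len)
    heads = weights @ v                 # (num_heads, seq_len, d_head)
    # Merge the heads back into (seq_len, d_model).
    output = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return output, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
output, weights = multi_head_self_attention(x, w_q, w_k, w_v, num_heads=2)
print(output.shape, weights.shape)
```

Each row of `weights[h]` shows how much one position draws on every position in the sequence for head `h`, which is the "conditioning on the sequence" part in code form.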

potent sky Jun 3, 2023, 4:24 PM

#

A deep dive into the math ig

#

Mhm

past meteor Jun 3, 2023, 4:25 PM

#

Maybe it's because I haven't actually used them and I've exclusively just looked at the math pithink
