#data-science-and-ml

hasty mountain
#

Hey guys, is it possible to make a Variational AutoEncoder where the Encoder, instead of outputting the mean and the log variance of a distribution (from which the latent vector is sampled), outputs the standard deviation directly?

I've been studying VAEs and I'm quite troubled by the fact that I can't manage to make a model that works properly, so I'm trying to review some things such as the KL-Divergence for the Encoder loss, the Gaussian Likelihood for the Decoder (not MSE), and my Encoder output...

For now, I've been making the Encoder output the standard deviation (or, at least, I guess I'm making it output the standard deviation) alongside the mean. I've been using KL-Divergence loss for the Encoder - though I've seen that KL Divergence also has a "closed form" which is useful for multivariate distributions - and Gaussian Likelihood for the Decoder instead of MSE (because I find the idea behind Likelihood more mathematically correct and interesting)

earnest widget
#

What is the use of global average pooling? Does it make any difference in terms of performance for image classification?

dusk tide
#

thanks

past meteor
#

You can do it but idk how well it would play around with relu

errant lake
dense crane
#

is it normal that acc drops down and then goes back to growing?

#

like the question is more like should i keep it or interrupt the training already?

cold osprey
#

Is that train loss/accuracy or test?

#

I would just leave it running to 30 epochs and plot a loss/accuracy graph to have a look

oblique quarry
#

I've got 2 questions. Do i have to update the kernel based on backprop? if yes, how do i do that? And can i add 2 kernel outputs together (I mean yes i could cuz they're the same shape) but would it have any benefits? https://paste.pythondiscord.com/xizuciqego

mint palm
#

this is my model

model = BlipModel.from_pretrained("Salesforce/blip-image-captioning-base")

this model takes input frames of size 3,224,224 and outputs an embedding of size 512
on one gpu it can take 40 frames (40,3,224,224).
I want to use 4 gpus so that it takes 160 frames.
but i am getting an error in my gpu parallelism implementation:

model = BlipModel.from_pretrained("Salesforce/blip-image-captioning-base")
device_ids = [0, 1, 2, 3]
model = nn.DataParallel(model, device_ids=device_ids)

after this do i need to split the input? but splitting means processing one part at a time, so we are NOT actually making it efficient
let us assume video_data is a 160,3,224,224 sized input: how do i use 4 gpus now to get a 160,512 output
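
A minimal sketch of how `nn.DataParallel` is usually driven, assuming a machine with several CUDA GPUs. `FrameEncoder` is a hypothetical stand-in for the BLIP image encoder, not the real model. The point is only that the wrapper itself splits the batch along dimension 0, runs the chunks on the GPUs in parallel and gathers the results, so the full 160-frame tensor is passed in one call without manual splitting.

```python
import torch
import torch.nn as nn


class FrameEncoder(nn.Module):
    """Hypothetical stand-in: maps (B, 3, H, W) frames to (B, 512) embeddings."""

    def __init__(self, embed_dim=512):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(3, embed_dim)

    def forward(self, frames):
        pooled = self.pool(frames).flatten(1)  # (B, 3)
        return self.proj(pooled)               # (B, embed_dim)


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# With device_ids left unset, all visible GPUs are used; device_ids=[0, 1, 2, 3]
# would pin four. Without any GPU the wrapper simply calls the module directly.
model = nn.DataParallel(FrameEncoder()).to(device)
model.eval()

video_data = torch.randn(160, 3, 224, 224)  # the whole batch, not pre-split
with torch.no_grad():
    embeddings = model(video_data.to(device))  # scattered on dim 0, then gathered

print(embeddings.shape)
```

`nn.DataParallel` only parallelises `forward`, so if the embeddings come from another method (such as a `get_image_features`-style call), that call would likely need to be wrapped in a small module whose `forward` invokes it.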

dense crane
hasty mountain
#

I'll see how using variance goes (if Colab allows me). Maybe using ReLU for the standard deviation may restrict the values the model can output a bit (no negatives, for example)...idk...

past meteor
hasty mountain
past meteor
#

By the way, I used to think that having negatives in your initialization caused dead relu immediately, which isn't the case. It's perfectly fine for things to be negative or exactly 0.

errant bison
#

what neural nets can i use for ANPR?

#

or any tutorial for the same?

#

can u tell for the same but using neural nets?

potent sky
potent sky
#

YOLO is an algorithm, that employs neural nets, yes

errant bison
#

ohh

#

so R-CNN?

hasty mountain
past meteor
hasty mountain
#

My encoder outputs a mean and a variance through a fully connected layer. So, if the FC layer for the variance is followed by a ReLU activation...it could be encoded to 0

Then, I'd have a normal distribution with no variance at all...

#

Now that I'm thinking about it...perhaps this could be catastrophic...🥲

past meteor
#

In all honesty, I'm a bit out of my depth here so I don't think I can help you either way. I don't really understand your problem domain. Haven't worked with VAEs specifically, just looked at the math. From that perspective I know that that's the reason why log(variance) is the output
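
The trade-off discussed above can be sketched with NumPy. This is an illustration under assumptions, not anyone's actual model: the function names and the example numbers are made up, and softplus is just one common way to keep a directly predicted standard deviation positive. It compares an encoder output read as log-variance, a softplus standard deviation, and a ReLU standard deviation, using the closed-form KL divergence to a standard normal.

```python
import numpy as np


def softplus(x):
    # log(1 + exp(x)): smooth and non-negative
    return np.logaddexp(0.0, x)


def kl_to_standard_normal(mu, log_var):
    # Closed form of KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


def reparameterize(mu, log_var, rng):
    # z = mu + sigma * eps with eps ~ N(0, I)
    std = np.exp(0.5 * log_var)
    return mu + std * rng.standard_normal(mu.shape)


rng = np.random.default_rng(0)
raw = np.array([[-3.0, 0.0, 2.0]])  # unconstrained outputs of the last encoder layer
mu = np.zeros_like(raw)

# Option A: read the raw output as log-variance; any real number is valid.
kl_logvar = kl_to_standard_normal(mu, raw)
z = reparameterize(mu, raw, rng)

# Option B: predict the standard deviation directly through softplus plus a small floor.
std_softplus = softplus(raw) + 1e-6
kl_softplus = kl_to_standard_normal(mu, 2.0 * np.log(std_softplus))

# Option C: ReLU can output exactly 0, so log(std) is -inf and the KL term blows up.
std_relu = np.maximum(raw, 0.0)
with np.errstate(divide="ignore"):
    kl_relu = kl_to_standard_normal(mu, 2.0 * np.log(std_relu))

print(kl_logvar, kl_softplus, kl_relu)
```

So predicting the standard deviation directly is possible as long as the activation keeps it strictly positive. The log-variance parameterisation avoids the constraint altogether.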

#

I've worked a lot with regular conv autoencoders hence why I bothered wasting your time at all :/

hasty mountain
#

It's exactly because I'm a bit of a stranger to the maths that I thought I could use standard deviation directly...but I didn't think of those things

hasty mountain
past meteor
hasty mountain
past meteor
#

Just on YouTube. Look for their channel and their DL series. It's class. Can't send you the link right now because I'm going to start my commute.

hasty mountain
#

I found it. Thanks!

potent sky
# errant bison so R-CNN?

YOLO can use plain CNNs
For object detection YOLO is generally faster at inference than R-CNN, Fast R-CNN and Faster R-CNN, since it detects in a single pass instead of proposing regions first

dense crane
#

how can i open an .ipynb file stored on my local machine, with the data also on my local machine (i'm now moving it to google drive), in google colab?

agile cobalt
#

move both the file and the data to google colab

#

the code runs on their servers, and can access files on your Google Drive

past meteor
#

What I used to do is version control a folder on my local machine and clone my repo on google colab

#

It's a lot of overhead initially because you need to make a token on git etc. but it makes moving between your local machine and colab a lot smoother

placid cedar
#

hi again

#

apparently chatgpt says i should do the numerical variable transformation before splitting the data...

#

is it ok to do so before train test split?

past meteor
#

What do you understand as "numerical variable transformation"

agile cobalt
#

generally speaking you must apply any transformations you make to the data to both the training and the test groups

placid cedar
#

since im doing a linear regression model, i have to check with Q-Q plots that my numerical variables follow the straight reference line

hasty mountain
placid cedar
#

as far as what i was taught to do

past meteor
#

So you mean like a log transformation?

placid cedar
past meteor
#

Or box-cox, ...

placid cedar
#

log

past meteor
#

Are you using sklearn?

placid cedar
#

yep

#

im allowed to use any libraries in fact

past meteor
#

So you should make a Pipeline that does a log transform before applying the regression

#

Tbh a log transform doesn't depend on the data in your train or test set, but honestly if you're beginning I'd really try to keep them 100% separate
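
A minimal sketch of that suggestion with scikit-learn, on synthetic data invented for the illustration (the feature range, the noise level and the step names `"log"` and `"reg"` are assumptions). The split happens first, the pipeline is fitted on the training part only, and the same log transform is then applied automatically when scoring on the test part.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Synthetic data: the target is linear in log(1 + x), not in x itself.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 100.0, size=(200, 1))
y = 3.0 * np.log1p(X[:, 0]) + rng.normal(0.0, 0.1, size=200)

# Split first, so nothing about the test set can leak into fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = Pipeline(
    [
        ("log", FunctionTransformer(np.log1p)),  # stateless log transform
        ("reg", LinearRegression()),
    ]
)
model.fit(X_train, y_train)

r2 = model.score(X_test, y_test)  # transform + predict on held-out data
print(f"test R^2: {r2:.3f}")
```

For a transform that does learn parameters from the data (scaling, Box-Cox via `PowerTransformer`, imputation), the same structure keeps those parameters fitted on the training split only.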

placid cedar
#

i actually really wanted to see the best mse and r square test results i can get based on my trial and error, testing out all the different methods

#

so i was firm with splitting the data into train and test

#

then i do my transformation

#

to find the best results or smth

past meteor
#

Split -> Q-Q plot -> log / Box-Cox / ...

dense crane
placid cedar
#

so far i progressed quite a lot actually, i handled my outliers, imputation complete, encoding done as well

past meteor
#

If you use Pipeline you ensure the same transformations are applied on train and test as well, like agile cobalt said

placid cedar
agile cobalt
#

check the docs

placid cedar
#

because everyone in my class is competing to get the best mse and r square scores LOL

past meteor
past meteor
#

If you see that the error is not normally distributed with respect to a variable then you can start thinking about log transformations etc.

agile cobalt
past meteor
#

It's hard to explain tbh but this is generally a solid method

dense crane
#

ok thx man

placid cedar
#

how wld i know what's the best transformation tho

#

my teacher always tells us abt the q-q plot, in which u shld get the most data points on the linear regression line

past meteor
#

Imagine if your x-axis is time and your y-axis is the error (on the test set), if the error is increasing along time then you can consider a log transform for example

placid cedar
#

aite will look into it

past meteor
#

For example, I had a demand forecasting dataset once and I noticed that the error exploded in early April. Your residual plot "tells" you you're not modelling the impact of Easter accurately

dense crane
#

is it normal that reading data to google colab takes 7 min while the same operation in vs code was taking like 1 min

spiral inlet
potent sky
potent sky
potent sky
dense crane
#

the bunch of .jpeg files

#

@potent sky

potent sky
#

if they're on the colab runtime storage then no, they shouldn't take that much longer

dense crane
#

i have the data on g drive and i did mount the gdrive

potent sky
#

then again, it shouldn't take that much longer. Unless you have too many files in your gdrive

#

tbh that shouldn't make it much slower after mounting either

dense crane
#

i mean this has like 3GB, where i always take only some part of it like 25%, and the diff is that when i am doing the same operation with vs code it doesnt take that much time

#

like at most 2min while in colab it would be around 10-15 min

potent sky
#

Are you just talking about loading in the data or are you performing some processing on it too

dense crane
#

both

#

i m loading the images and then converting them to tensors

potent sky
#

Compare just loading times for both

#

Also why are you loading the entire dataset into memory at once

dense crane
#

because i m training the model

#

like not the whole

#

but like 5k images

#

to train a model

#

and the whole dataset has like 30k

potent sky
#

You can retrieve on the fly

dense crane
#

not sure if i understand

potent sky
dense crane
#

makes sense

potent sky
#

That's what you do when dealing with huge datasets that you just cannot load into memory all at once
But loading into memory should make things faster later
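
"Retrieving on the fly" can be sketched with a plain generator. This is a standard-library illustration only: the helper names `iter_batches` and `read_bytes` are hypothetical, and small throwaway files stand in for the .jpeg images. Each file is read only when the batch that contains it is requested.

```python
import os
import tempfile


def iter_batches(paths, batch_size, load_fn):
    """Yield one batch at a time, loading each file only when its batch is needed."""
    for start in range(0, len(paths), batch_size):
        yield [load_fn(p) for p in paths[start:start + batch_size]]


def read_bytes(path):
    with open(path, "rb") as f:
        return f.read()


# Demo: ten tiny files standing in for images on disk.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(10):
        path = os.path.join(tmp, f"img_{i}.jpeg")
        with open(path, "wb") as f:
            f.write(bytes([i]) * 4)
        paths.append(path)

    # Only one batch of file contents is held in memory at any moment.
    batch_sizes = [len(batch) for batch in iter_batches(paths, 4, read_bytes)]

print(batch_sizes)  # [4, 4, 2]
```

In PyTorch the same idea is usually expressed with a `Dataset` whose `__getitem__` opens a single image, wrapped in a `DataLoader` that assembles the batches.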

dense crane
#

but in general it would take the same amount of time because i m using all loaded data in training

#

ok i will consider it but still dont know how this would cut the time spent on loading the data

potent sky
#

Because it wouldn't load all the data in at once but only as it's required for training

#

It would take overall more time (over multiple epochs) because loading from disk is slower

#

Ignore this part it's not very relevant rn

potent sky
dense crane
#

ok thx anyway

lapis sequoia
#

Guys. What are some dashboarding tools that can connect to python environments but be modified in an interactive way

hasty mountain
#

(Yes, I just noticed the normalization thing now and because of that...so this also explains why normalization isn't the same as scaling...)

#

Also, I was using tanh as the final activation function for my decoder...

hasty mountain
#

Yes...I guess that's exactly the problem...they aren't.

junior stone
serene scaffold
#

@junior stone please don't "drop and run" links. if you think this link is interesting, say why. if you're just promoting it, please remove it.

junior stone
#

oh ok this tool I found is interesting coz it literally allows you to search through YouTube videos through the power of AI

tidal bough
#

This AI Tool will Change the Way We Watch YOUTUBE Videos FOREVER ! ...
^video titles I'll never watch

junior stone
#

you can index to relevant parts of any video

#

using AI

#

by searching through meaning

errant bison
#

how can we train the model based on images?

junior stone
#

Neural search

#

I answered u AI is vague ur right. Neural search

night kernel
#

does anyone know about commercial licensing with open-source LLMs like LLaMA and Alpaca?

#

looking to make my own chatbot based on a small amount of text and use it commercially, ideally without using cgpt

serene scaffold
merry stone
#

im planning on implementing some sort of food classification in my app, now I dont wanna manually label the dataset and train my models right, there should be existing ones that perform really well? i tried object detection using keras inception v3 but it doesnt work for all sorts of food only simple stuff like banana. Theres the food101 dataset thats split into training and testing but do i have to download it to train my own model on my pc or is there an existing one i can use? if so, how? sorry im a newbie

agile cobalt
merry stone
#

so im a noob at ml in general

#

but basically lets say i have 100 images then i split them into training and testing sets right? then for training i would label the food and each category is basically a food item right?

agile cobalt
#

pretty much yes, but you need to have at least a handful of labelled images of each category

merry stone
#

so thats when the food 101 dataset comes in right? it says it has 100k ish labeled images

agile cobalt
#

the issue is whether the labels they use are fit for your problem or not

merry stone
#

so it should be better than object detection right

#

since this is for food specifically

agile cobalt
#

using a pre-trained model can bring down the number of images you need from hundreds if not thousands per category to (in nearly a best case scenario) a dozen or so per category, but you still need a bit of data

cold osprey
#

depends on ur problem

merry stone
#

also i have a really dumb question but how are the models like stored

agile cobalt
#

if it is entirely new and completely unlike anything the model was pre-trained on, yes
if it is similar enough to the data the model is already familiar with, no

merry stone
#

like when I train a model how is it used like is it a text file locally or something

agile cobalt
#

a model is just a crap ton of weights + some metadata about how to use these weights
they are stored in a custom file format, but are stored in files just like a text file would be stored

merry stone
#

hmmm so when does tensorflow come into play for this

agile cobalt
#

it knows how to parse that metadata into instructions your computer can follow, as well as some other stuff useful for creating the model in the first place

merry stone
#

ahh gotcha

#

so for fine tuning

#

how much work is it usually

#

like you have to mess with hyper parameters or something right?

agile cobalt
#

less than creating from scratch, but still a fair bit

#

depends on which model you use as the base

merry stone
#

hmmm okayy

cold osprey
#

model choice will depend on how u plan to deploy ur model

merry stone
#

so if i steal someones model then i wouldn't have to deal with training or using the dataset at all right

cold osprey
#

a model living in a mobile device would generally be smaller/weaker than one living on the cloud e.g.

agile cobalt
#

fine tuning is still training, just on a smaller dataset

#

but if you do not even fine tune, you do not have to worry about training, though you'd be locked to the output of the original model

merry stone
agile cobalt
#

💀

merry stone
#

can it still use the model file from the repository

agile cobalt
#

you know that github pages cannot run python code at all right? it only serves static pages (html)

merry stone
#

oh really

#

shit

agile cobalt
#

I mean, you can use tensorflow.js or pyscript, but it gets a bit complicated and you'll need to use a small model

merry stone
#

so if i want to host a web app

cold osprey
#

flask app?

#

vercel can host flask apps

merry stone
#

hmmm flask with python backend right

cold osprey
#

ive not hosted one with a ml model yet tho

#

its on my to do list

#

hosted on vercel

merry stone
#

it should work right

#

normally

cold osprey
#

should

merry stone
#

for the model is it just one .bin file thats it

agile cobalt
#

some models are too large to fit in most common hosting providers unless you pay $$$$$, but you should be able to find one small enough for your use case to host for cheap, maybe even fit on free tiers

#

there's also the option of just using an API like Hugging Face's Inference API instead of actually hosting the model, especially if you do not plan to fine tune yourself

merry stone
cold osprey
#

just imagine normal API

merry stone
#

oh

#

so i wouldnt have to download the model right

agile cobalt
#

yes, if you can find a model whose outputs are fit for your use case

merry stone
#

ohh

#

whats a model class

cold osprey
#

like a normal class

#

but representing a model

merry stone
#

how do i know what that is tho

#

im using chatgpt for help and this is what i have so far, is this looking okay:
```py
import torch
import torch.nn as nn
import torchvision.transforms as transforms
from PIL import Image
from keras.utils import img_to_array
import numpy as np
import tensorflow as tf
from keras.applications.inception_v3 import preprocess_input, decode_predictions

# Load the PyTorch model
pytorch_model = YourModelClass()  # Replace with your PyTorch model class
pytorch_model.load_state_dict(torch.load('pytorch_model.bin'))
pytorch_model.eval()

# Define the TensorFlow model
class TFModel(tf.keras.Model):
    def __init__(self, pytorch_model):
        super(TFModel, self).__init__()
        self.pytorch_model = pytorch_model
        self.softmax = nn.Softmax(dim=1)

    def call(self, inputs):
        inputs = tf.convert_to_tensor(inputs)
        inputs = tf.transpose(inputs, [0, 3, 1, 2])  # Transpose image dimensions
        inputs = preprocess_input(inputs.numpy())  # Preprocess image
        inputs = torch.from_numpy(inputs).float()  # Convert to PyTorch tensor
        outputs = self.pytorch_model(inputs)  # Forward pass
        outputs = self.softmax(outputs)  # Apply softmax
        return outputs.detach().numpy()

# Convert PyTorch model to TensorFlow model
tf_model = TFModel(pytorch_model)

# Function to recognize and label food
def recognize_food(image_path):
    img = Image.open(image_path).convert("RGB")
    img = img.resize((299, 299))
    img = img_to_array(img) / 255.0
    img = np.expand_dims(img, axis=0)

    preds = tf_model.predict(img)
    decoded_preds = decode_predictions(preds, top=1)[0]

    food_label = decoded_preds[0][1]
    confidence = decoded_preds[0][2]

    return food_label, confidence

# Example usage
image_path = 'banana.jpg'
food_label, confidence = recognize_food(image_path)

print(f"Food: {food_label}, Confidence: {confidence}")
```

agile cobalt
#

Convert PyTorch model to TensorFlow model

tf_model = TFModel(pytorch_model)

#

no, just no

merry stone
#

what does it mean

#

ohh

#

pytorch and tensorflow are different right

agile cobalt
#

yes

merry stone
#

i forgot

#

so i need a tensorflow model then?

agile cobalt
#

you need a model that matches the library you are using

#

I recommend taking some time to read the documentation and/or some tutorials before you try anything else, and do not trust chatgpt if you cannot discern whether its output makes sense

merry stone
#

hmmmm

#

so ig read up on pytorch first

#

then try

agile cobalt
#

good luck

merry stone
#

thanks for the help!

potent sky
#

There should be a chatgpt meme page

merry stone
#

how can i find the class labels of the model

potent sky
merry stone
potent sky
merry stone
#

shit this is hard then lol

#

it doesnt have banana :(

plush jungle
#

can someone help explain attention to me? I understand that the goal is to get a matrix that maps every value in one vector to every value in another and in each cell of the matrix you get a score. but in the transformer architecture, what does this matrix actually represent?

potent sky
#

Yes, those are the only items it recognises because that's what it's been trained to do.
Of course if you run it on things it doesn't recognise it'll predict the closest thing it figures

merry stone
potent sky
#

Yes

#

Because that's all it "knows" that's all it's been trained on

merry stone
#

but is there really no universal food detection ml model then

cold osprey
#

u could use these pretrained models to create some sort of feature map

#

and use it in (iirc the attention is all you need model)

potent sky
cold osprey
#

feature embeddings or smth

merry stone
cold osprey
#

no idea, google

merry stone
#

because for object detection theres pretty accurate universal models right

cold osprey
#

doubt there is coz food items are virtually infinite

#

like u can always break it down to more and more specific food items

potent sky
# merry stone but is there really no universal food detection ml model then

No but you can build one with all the classes you need.
You can use a pre-trained model as a feature extractor.
Remove its classification head (final layer)
Attach your own final layers with 102 classes instead of 101 (including banana).
Freeze the previous feature extractor layers (to prevent something called catastrophic forgetting)
And then train on a small dataset of bananas so it learns to recognise these
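
The steps above can be sketched in PyTorch. Everything here is a toy stand-in and an assumption for illustration: `backbone` plays the role of a pretrained Food-101 feature extractor (it has random weights here), and the batch is random noise with made-up labels, where index 101 stands for the new banana class.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained feature extractor with its head removed.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),  # -> (B, 8) feature vectors
)

# Freeze the feature extractor so training cannot overwrite what it learned.
for param in backbone.parameters():
    param.requires_grad = False

# Attach a fresh head: 101 original classes + 1 new class.
head = nn.Linear(8, 102)
model = nn.Sequential(backbone, head)

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on a fake batch (random images, made-up labels).
images = torch.randn(4, 3, 64, 64)
labels = torch.tensor([101, 101, 5, 42])
frozen_before = backbone[0].weight.detach().clone()

logits = model(images)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(logits.shape, float(loss))
```

With a real pretrained network the backbone would be loaded with its weights and only its final layer swapped. The training batches would probably also need to include the original classes alongside the new one, so the fresh head learns all 102 outputs rather than just the new label.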

cold osprey
#

catastrophic forgetting TIL

merry stone
#

isnt that like manual labor

potent sky
cold osprey
#

if u have the classes u need in a list, shud be done with a few lines of code imo

potent sky
merry stone
merry stone
cold osprey
#

ull need images for each class for sure

#

how many is model dependent

merry stone
#

should I just use this model and skip banana then

potent sky
#

Also you don't need to keep these 101 food items
You can completely get rid of them and have your own 50 items or something
But then you will need a suitable dataset of these 50 items to train the model on

merry stone
#

im just trying to add a project in my resume :(

#

idk if this is even a good one

potent sky
merry stone
#

i thought a food recognition/nutrition info/calorie app or something

cold osprey
potent sky
merry stone
#

it's not a course tho

#

i just want to replace a cad project thats wasting space in my resume

potent sky
merry stone
#

but when it comes to ml i feel like i never learn

#

like I TAed for a ml course but i still feel so lost

potent sky
cold osprey
merry stone
#

because Im used to like C programming

potent sky
#

Takes time. It's a vast field. And it lies at the intersection of so many other fields. You have to know so much to be confident

merry stone
#

so when i do these like super shortcuts in python where u can solve like huge problems in 5 lines of code i barely understand whats going on

merry stone
#

i took two classes

#

im a sophomore tho so i dont know shit :(

potent sky
#

Probability, information theory, statistics, AI basics, statistical ML, then deep learning

merry stone
#

nah i only took a data science/ml intro class and a data mining class

#

and linear

potent sky
merry stone
#

yeah like i want to be able to grasp the whole thing i feel like im missing out there

#

like ive worked on this complicated nlp project with a team that was a success but i didnt understand shit cause i just googled to make the code work

#

but i didnt understand anything overall

potent sky
#

If you urgently need a new project then there are video courses on YT like 4-5hrs long that build a project live so you can follow along.
Otherwise I'd really advise start with the fundamentals, and bite sized simple projects (so that you can see some tangible result)

merry stone
#

hmmm

#

the main goal of this was to really get comfy with the web app part

#

so i wanted to take a shortcut on the ml side

potent sky
#

Wait your primary interest is web dev or ml

#

Ah I see

merry stone
#

idk like a full web app with backend data base etc

#

but every webapp project of mine turns into front end lol

potent sky
# plush jungle can someone help explain attention to me? I understand that the goal is to get ...

Transformers are the rage nowadays, but how do they work? This video demystifies the novel neural network architecture with step by step explanation and illustrations on how transformers work.

CORRECTIONS:
The sine and cosine functions are actually applied to the embedding dimensions and time steps!

#

For the original attention paper (Bahdanau et al. 2015) I really think just reading the paper is your best bet

worthy phoenix
#

any good vids for getting started with pytorch or tensorflow? cuz ig if i learn one framework the weights and biases should not be that hard to transfer to the other one

plush jungle
#

ok I guess here's the main source of my confusion with attention. for self attention for example, you get n layers, and m heads, and each head calculates an attention matrix:

but then those matrices get passed through a linear layer and then a feed forward layer? am I correct in thinking that self attention learns relationships between tokens, and then the feed forward layer learns which of those relationships are important?

potent sky
# merry stone idk like a full web app with backend data base etc

It depends on what you're looking to do. If you're just looking to learn and demonstrate how to integrate ML models into a web app, then picking an existing pre-trained model and just limiting to what it does should be sufficient
If you're looking to get a good grasp of how ML works then you will have to put in the work for that

merry stone
worthy phoenix
merry stone
#

but then again idk how to use the food model either lol

worthy phoenix
#

i can read that here and there but i first have to get the core understanding of the framework then when i get stuck i can lookup the docs

#

i dont mind that

potent sky
plush jungle
potent sky
potent sky
merry stone
potent sky
#

And to learn a better representation of these relationships

potent sky
#

Really depends on what you're trying to learn

plush jungle
potent sky
# plush jungle then what's the point of the feed forward layer
potent sky
#

*later

#

The encoder's task is simply to give you self attention vectors for each of the tokens

#

The feed forward helps you use these however you'd want to

plush jungle
#

in CNNs, each layer seems to be doing a different task. earlier layers break down lower level features, and later layers break down higher level features. but I don't have a sense of what anything is actually doing in transformers

#

for example, why is there a linear layer at the end here:

#

what does it do

potent sky
#

In a way, learns what to do with all the different outputs from the different attention heads.
You've got the self attention from n attention heads but what do you do with them now?
All of them give different possibly useful representations of your input, but how do you learn what to do with these representations and map it to an output you want?
That's where the feed forward comes in

plush jungle
#

ok what about the linear layer inside the attention block

#

oh!

#

is it to decide which head is important?

magic dune
#

hello

plush jungle
#

since all the heads apparently go into one single linear layer

plush jungle
magic dune
#

it is very simple

#

but I think I really polished it

cold osprey
#

q: why does inference speed differ depending on the image being sent? assuming same model and hardware being used

magic dune
cold osprey
#
```
0    data\pizza_steak_sushi_20_percent\test\pizza\1...    pizza    0.9987    pizza    0.6317    True
1    data\pizza_steak_sushi_20_percent\test\pizza\1...    pizza    0.9957    pizza    0.3714    True
2    data\pizza_steak_sushi_20_percent\test\pizza\1...    pizza    0.9987    pizza    0.4315    True
3    data\pizza_steak_sushi_20_percent\test\pizza\1...    pizza    0.9869    pizza    0.3576    True
4    data\pizza_steak_sushi_20_percent\test\pizza\1...    pizza    0.9698    pizza    0.3697    True
```
plush jungle
cold osprey
#

maybe some images are larger? hence taking longer

#

preprocessing time may be longer

magic dune
#

125x125 vs 512x512

cold osprey
#

hmm lemme see

#

ah ok larger images take longer

plush jungle
#

also what do values do in the attention(key, query, value) setup?

#

since Key dot Query = attention score matrix

potent sky
# plush jungle since all the heads apparently go into one single linear layer

Okay so the first set of linear layers learn to obtain the Q, K, V matrices from the input. Each self attention head ideally learns something different about the input - provides a different representation (maybe you can crudely compare this to how different filters learn to obtain different features from the input in CNNs)
The next linear layer then combines the outputs of all of these different attention heads into something more useful

The final Feed forward along with Add/norm after the multi head attention learns a more meaningful representation of this output (it's a parameterized processing unit, it's going to learn a mapping from the attention output to the output desired, and so learn to transform it into a richer representation space)
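
The description above can be sketched in NumPy. The dimensions, the random weights and the function name `multi_head_self_attention` are illustrative assumptions, and masking, biases, Add/Norm and the feed-forward block are left out. It shows the first linear maps producing Q, K and V, one attention matrix per head, and the output linear map mixing the concatenated heads.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # First set of linear layers: project the input to queries, keys and values.
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    def split_heads(t):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)

    # One (seq_len x seq_len) attention matrix per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v  # (num_heads, seq_len, d_head)

    # Concatenate the heads and mix them with the output linear layer.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights


rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 5, 8, 2
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) for _ in range(4))

out, weights = multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape, weights.shape)  # (5, 8) (2, 5, 5)
```

Because every output feature of `w_o` is a learned combination of all heads' features, it can weight some heads more than others, which is close to the "decide which head is important" reading.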

#

Hope that clears things up a bit. I'm in a meeting so can't attend here constantly now (pun intended)

potent sky
magic dune
#

!passte

#

!paste

arctic wedgeBOT
#
Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

magic dune
plush jungle
# potent sky It's a little analogous to retrieval systems if you're familiar with those

the video you sent also made that analogy, but even though I'm familiar with retrieval systems I really don't get it. in a retrieval system, a key and a query get multiplied to produce a value, and the key with the highest value gets selected as the top search result. here, in the case of a single head, the key and the query are two vectors we're comparing. dot producting them gets us a scalar, right?

#

and then that scalar gets multiplied elementwise by a Value vector?

past meteor
#

There are books like Dive into Deep Learning that have chapters on attention, so you could read those as well

plush jungle
#

the frustrating thing is that every resource, paper, and book I read leaves me with more questions

past meteor
#

Then you gotta keep digging. For me going to the earliest works nearly always helps

past meteor
#

Yup

plush jungle
#

the word attention only appears 3 times in that and "value" doesn't appear at all

#

are you sure that's the right one

past meteor
#

3.1 is what attention is.

#

Alpha i,j is the attention weight

plush jungle
#

how can alpha be a matrix of weights if it's being used here as a function:
a(s_{i−1}, h_j)

#

or is that supposed to be indexing?

#

like this
a[si-1][hj]
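
For reference, the notation in section 3.1 of that paper can be written out as below (a transcription from memory of the paper's definitions, so worth checking against the original). There are two different symbols: $a$ (Latin a) is the alignment model, a small feed-forward network evaluated as a function of the previous decoder state $s_{i-1}$ and the encoder annotation $h_j$, while $\alpha_{ij}$ (Greek alpha) is the softmax-normalised weight computed from its scores. So $a(\cdot,\cdot)$ is a function call rather than indexing, and collecting $\alpha_{ij}$ over all $i$ and $j$ gives the matrix of attention weights.

```latex
e_{ij} = a(s_{i-1}, h_j), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{T_x} \alpha_{ij} \, h_j
```

Here $T_x$ is the source sentence length and $c_i$ is the context vector handed to the decoder at step $i$.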

potent sky
potent sky
potent sky
plush jungle
#

let's say you have a vector of english tokens and a vector of french tokens:

#

we would make the english the keys, and the french the queries, to start, right? using the Wk and Wq times the english tokens and french tokens respectively?

#

then dot producting those vectors would gives us a scalar though, right? so how do we end up with a matrix here
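
A small NumPy sketch of where the matrix comes from, assuming 4 English tokens, 3 French tokens and 6-dimensional vectors (all made-up numbers, with random vectors standing in for the projected tokens). Each token is itself a vector, so a sentence is a matrix. One query-key dot product gives one scalar, and doing that for every (French, English) pair fills a 3 x 4 score matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_en, n_fr, d = 4, 3, 6

keys = rng.standard_normal((n_en, d))     # one key vector per English token
values = rng.standard_normal((n_en, d))   # one value vector per English token
queries = rng.standard_normal((n_fr, d))  # one query vector per French token

# One scalar per (query, key) pair; written as loops to make that explicit.
scores = np.empty((n_fr, n_en))
for i in range(n_fr):
    for j in range(n_en):
        scores[i, j] = queries[i] @ keys[j] / np.sqrt(d)

# Softmax over each row: how much French token i attends to each English token.
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)

# Each scalar weight scales a whole value vector; the scaled vectors are summed.
context = weights @ values  # (n_fr, d)

print(scores.shape, context.shape)  # (3, 4) (3, 6)
```

The double loop is equivalent to the single matrix product `queries @ keys.T`, which is how it is normally computed.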

dusk bear
#

hey guys!
I'm Uttam. I am new to ML and DL but have a very keen interest in learning it. Recently I saw a satellite image plane detection dataset and I wanted to tackle it using only a CNN. While searching I found this person who used R-CNN https://github.com/1297rohit/RCNN/blob/master/RCNN.ipynb
but he converted the data into 4D arrays, sent them into CNN layers and trained on them. As I work on Google Colab, I don't have enough RAM to convert all the data into 4D arrays and train the model. So can anyone please help me out: are there any other ways to train the model, like sending the photos to the model directly?
Sample image attached..

GitHub

Step-By-Step Implementation of R-CNN from scratch in python - RCNN/RCNN.ipynb at master · 1297rohit/RCNN

placid cedar
#

hi, just a quick question

#

im doing discretisation now for my linear regression model, but im unsure of which to use

#

equal width discretisation, equal frequency, equal frequency plus encoding

#

which is better for a linear regression model?
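
To make the first two options concrete, here is a small NumPy sketch on made-up skewed data (the log-normal sample and the choice of 4 bins are assumptions for illustration). It does not say which option suits a given regression. It only shows how differently the two schemes place their bin edges.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # right-skewed feature
n_bins = 4

# Equal width: edges evenly spaced between the minimum and the maximum.
width_edges = np.linspace(x.min(), x.max(), n_bins + 1)

# Equal frequency: edges at quantiles, so each bin holds about the same count.
freq_edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))

# np.digitize with the interior edges assigns a bin index 0..n_bins-1 to each value.
width_bins = np.digitize(x, width_edges[1:-1])
freq_bins = np.digitize(x, freq_edges[1:-1])

width_counts = np.bincount(width_bins, minlength=n_bins)
freq_counts = np.bincount(freq_bins, minlength=n_bins)

print("equal width counts:    ", width_counts)
print("equal frequency counts:", freq_counts)
```

On skewed data like this, equal-width bins put most points into the first bin, while equal-frequency bins stay balanced but have very unequal widths.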

errant bison
placid cedar
#

actually nvm i will js experiment on all methods for 3 of my numerical variables xD

low beacon
#

Hi !
I was wondering how the journey is to become a data scientist?
So far my GitHub portfolio only has my thesis. Should I add more projects?
I have a bachelor's degree in IT and my thesis was actually about sentiment analysis using Python and its libraries and Jupyter notebook. I have been working as a project manager for a year now, but working with data has always been on my mind because there are endless possibilities with data and it was so exciting to work with.

merry stone
#

how much work is fine tuning a model for a beginner

#

like if i were to start with some image classification model

#

and tune it for food classification just accurate enough for a college sophomore project

#

how much work would i have to do

cold osprey
#

Not much imo

#

Import model

#

Change fully connected layer

#

Freeze layers

#

Train, wait

#

Done

merry stone
#

whats a good number of images to train and test

cold osprey
#

Depends on ur model, trial n error ig

#

Use less and increase as needed

merry stone
#

so lets say the best image classification pretrained model in the world

#

can i add on the food101 dataset which has 100k food images split into training and testing

#

so i dont have to manually label

#

like can i add it on top to have additional classes

cold osprey
#

U can just use a model that's been trained on food101 instead then

merry stone
#

true

#

but food101 doesnt have banana :(

cold osprey
merry stone
#

it has complicated dishes but not like simple stuff

#

whats the best approach

#

if i want to create a food recognizer

cold osprey
#

Find banana images and train?

merry stone
#

but i need like all sorts of food right

#

but what are the steps usually

#

lets say i want to add banana class

#

where do i train

cold osprey
#

U can just train on banana images, add an extra output class for it

#

Using a pretrained model

#

Not sure if u need to train on the other images again or not

#

I wouldn't think so since it's already trained

merry stone
#

hmmmm

#

okay ill try

#

so i can like use a fruits dataset then right

cold osprey
#

Yeah

potent sky
#

Freeze the layers

potent sky
plain rose
#

Hmm can anyone explain the mathematics behind the gblinear method of xgboost exactly? There are many vague explanations on the internet but I would like an exact one with an example. And also, how can a combination of linear models do better than a single linear model?
Thank you in advance.

past meteor
plain rose
#

Yaa but the idea is adding multiple weak learners, which in this case are decision trees. How can this be extended to a linear model? I can't think of a weak linear model or how to combine them: if I combine them linearly, the result is still only a linear model, which surely can't beat the exact solution we already have for linear regression

past meteor
#

You fit a linear model on the data, you replace y with (y - y_hat) and you continue. There's proofs that each model you add in this way moves you towards the minimum of the loss

plain rose
#

But how can this be better than a single linear model as the sum of these model will also be linear and we already have an exact solution for linear model.

#

This is not true for dt as sum of dt is not dt

past meteor
#

There's still the bootstrapping and feature subsetting you're doing

#

But this is a good question to be honest 🤔

plain rose
#

But I doubt will it be any better than single model

#

Like I searched whole internet but not a single article for exact details of gb linear

past meteor
#

I wouldn't be able to say it's better or worse than fitting a single model but it's at the very least different

plain rose
#

Hmm but in last it will be linear right?

#

So for sure it can't be better.

past meteor
#

The final model will be a linear model but it may be a better linear model than just doing regular OLS

#

It's just a different way to fit elastic net (mix of L1 and L2 reg)

#

The bootstrapping and feature subsetting make it overfit less than elastic net but yeah, whether or not it's worth the hassle is another discussion (probably not)

plain rose
#

I see thanks and please ping if you find any article or if you yourself are able to show the exact implementation of gblinear

past meteor
#

You can just read the source code?

plain rose
#

Because something is going on: gblinear with 1 round is not the same as a single regression, so I suspect they are doing something different

plain rose
plain rose
# past meteor What do you mean?

I thought that in 1 round we predict the target with normal model and then we keep on using these to predict the residual but that is wrong.
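
A quick NumPy check of the "sum of linear models is still linear" point, under assumptions that are mine: plain least squares as the weak learner, no subsampling, no regularisation, and a shrinkage factor `eta`. This is not xgboost's gblinear code, just the residual-fitting idea described above.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

def ols(X, y):
    # Least-squares coefficients (no intercept, for brevity).
    return np.linalg.lstsq(X, y, rcond=None)[0]

beta_ols = ols(X, y)

def boost_linear(X, y, rounds, eta):
    beta = np.zeros(X.shape[1])
    residual = y.copy()
    for _ in range(rounds):
        step = ols(X, residual)      # fit a linear model to the residuals
        beta += eta * step           # the sum of linear models is still linear
        residual = y - X @ beta      # replace y with y - y_hat
    return beta

beta_one_round = boost_linear(X, y, rounds=1, eta=0.5)
beta_many_rounds = boost_linear(X, y, rounds=50, eta=0.5)
```

With `eta = 0.5`, one round gives exactly half the OLS coefficients and many rounds converge to OLS. If a learning rate below 1 is in play in the real setup (an assumption, nothing in the thread confirms it), that alone would explain why 1 round does not match a single regression.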

past meteor
plain rose
#

But for feature selection I set it to 1 i.e. to select all the feature by using the hyper params

past meteor
#

Then the main difference is the fact you're bootstrapping

plain rose
#

No that I also set as 1 that is use the whole sample

#

There are 2 parameter 1 for feature and 1 for samples

past meteor
#

Do you have the same amount of regularization in both cases?

plain rose
#

Yaa i set regularizer params to 0

past meteor
#

And with what are you comparing it with sci-kit? What specific algo?

plain rose
#

I want to just understand xg boost with gb linear

#

Like for comparison i am comparing it with linear regressor

#

But i got widely different results

past meteor
#

Yes but what implementation of linear regression?

plain rose
#

Scikit learn only

#

Like LinearRegression of scikit learn

past meteor
#

Then I have no idea. Is the code something you can paste so I can have a look?

plain rose
#

Hmm that I can't lemon_pensive but thanks

merry stone
#

can someone guide me through loading a pretrained model from a .bin file in pytorch

#

i just cant figure it out 😭

#

or can i use a model hosted on hugging face or something

rose dagger
#

Anybody got any reading recommendations for modern ML/DL models for image segmentation & image classification? Would be great if it included a little bit about data augmentation and feature engineering as well

mint palm
#

wanted to include AWS, now that i have been learning s3, athena, sagemaker
should i remove "interest"? and add all those aws tools like s3, athena sage etc?
should i add "cloud" section?
should i just let it be like this and add just "aws" in tech section?

merry stone
#

isnt the interest line kinda useless

#

unless you have space

potent sky
# plain rose But how can this be better than a single linear model as the sum of these model ...

Exactly, yes. Even feature subsetting with "weak" linear models can be argued to finally give a linear combination of those as one single linear model.
And the best solution to the fitting problem could've been directly obtained by the regression equation. So why go for GB linear models at all?
One possible defense of boosting here can be that sometimes you don't want the line of absolute best fit. You don't want it to fit that well (regularisation)
Gradient boosting multiple weak linear models would allow you to control this regularisation not just in quantity but also in which feature subsets are more affected.
But idts it makes much more sense than ridge or elastic

#

Fun question ngl

potent sky
potent sky
#

You might have a config file as well, config.json, in which case there should be accompanying documentation on how to use it

potent sky
potent sky
past meteor
#

Mentioned it a few times but due to the subsetting and bootstrapping you couldn't just fit the same model with OLS, ridge, lasso or elastic net. Whether that matters or not is a different question haha

merry stone
potent sky
#

What's the issue/error?

rose dagger
placid cedar
#

heyyo

#

im finally at my final step for my assignment

#

which is feature engineering

#

i have learnt 5 methods of doing so. should i either, use one best method, and apply it to all my variables. or, find the best method for each variable

placid cedar
#

any help and advice wld be nice 🙂

#

and thanks for all the help from everyone here so far eheh

warped leaf
#
def get_keys(obj):
    keys = []
    if isinstance(obj, dict):
        for key in obj.keys():
            keys.append(key)
            keys.extend(get_keys(obj[key]))
    elif isinstance(obj, list):
        for item in obj:
            keys.extend(get_keys(item))
    clean_list = list(dict.fromkeys(keys))
    return clean_list 

print(get_keys(data))

for reading nested json files, i made this function that gets all the keys inside a list to make it more readable, is there a way to edit this code and get more info about if the keys are inside lists or dictionaries?

agile cobalt
#

which kind of info exactly?

#

kinda offtopic for this channel though, just use #python-discussion or #1035199133436354600 for that kind of stuff
also if

clean_list = list(dict.fromkeys(keys))
is just to remove duplicate, use set() instead like list(set(keys)) or sorted(set(keys))
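
On the original question (knowing whether a key sits inside a list or a dict), one possible sketch is to yield a path plus the type of the value instead of the bare key. The function name `describe_keys`, the `[]` marker, and the sample `data` are made up for illustration.

```python
def describe_keys(obj, path=""):
    """Yield (path, value_type) for every key in nested dicts/lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            key_path = f"{path}.{key}" if path else str(key)
            yield key_path, type(value).__name__
            yield from describe_keys(value, key_path)
    elif isinstance(obj, list):
        for item in obj:
            # "[]" marks that everything below sits inside a list.
            yield from describe_keys(item, f"{path}[]")

data = {"user": {"name": "a", "tags": [{"id": 1}, {"id": 2}]}}
# dict.fromkeys drops duplicates while keeping first-seen order.
summary = list(dict.fromkeys(describe_keys(data)))
```

Note that `list(set(keys))` also removes duplicates but does not keep the original order, so `dict.fromkeys` or `sorted(set(keys))` is the safer pick when order matters.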

potent sky
#

But that paper seems like a good background survey for pre-2020

#

Lookup Maskformer, Mask2Former, TrackFormer, Swin, Oneformer, Segformer, DETR, Denoising DETR Anchor Boxes, MaskDINO, DINO self supervised, SAM off the top of my head

#

There're many other interesting papers

dusk tide
#

Hello, I am practicing data cleaning on Movies dataset (dataset link here https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset . There are around 46466 rows and 24 columns* ('adult', 'belongs_to_collection', 'budget', 'genres', 'homepage', 'id',
'imdb_id', 'original_language', 'original_title', 'overview',
'popularity', 'poster_path', 'production_companies',
'production_countries', 'release_date', 'revenue', 'runtime',
'spoken_languages', 'status', 'tagline', 'title', 'video',
'vote_average', 'vote_count')*. But there are a lot of NaN values in the budget and revenue columns, around 36K. So how do I handle these values? I had one idea: calculate the average budget/revenue of movies per genre over a five-year span. For example, take the average for Action and Thriller movies from 1990 to 1995 and use it to replace the NaN values occurring in those years. Is this the right way? Can anyone suggest other ideas for how to do this?
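
The group-wise idea could be sketched in pandas roughly like this. The toy table, the single-valued `genre` column, and the `year` column are simplifying assumptions (the real dataset stores genres as a list per movie and a full release date), and median is swapped in for mean because budgets tend to be skewed.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the movies table; column names here are simplified.
movies = pd.DataFrame({
    "genre": ["Action", "Action", "Action", "Drama", "Drama", "Drama"],
    "year": [1991, 1993, 1994, 1992, 1998, 1999],
    "budget": [10.0, np.nan, 30.0, 5.0, np.nan, np.nan],
})

# Five-year window, e.g. 1990-1994 -> 1990.
movies["period"] = (movies["year"] // 5) * 5

# Keep a flag so imputed rows can be told apart later.
movies["budget_was_missing"] = movies["budget"].isna()

group_median = movies.groupby(["genre", "period"])["budget"].transform("median")
movies["budget_filled"] = movies["budget"].fillna(group_median)
```

Groups with no known budget at all stay NaN. With roughly 36K of 46K values missing, it may also be worth asking whether filling them in is meaningful, or whether the analysis should just use the rows where the value is known.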

past meteor
potent sky
# past meteor How fast (inference time) are these transformers? A lot of segmentation pipeline...

Haha very interesting question. This was a major concern for our work too.
Oneformer is at one end of the spectrum where it's abysmally slow, 2:30minutes on a T4 GPU, and 3:30minutes on CPU for a single image ;-;
(Though I was inclined to humor it since the paper is really interesting and it defines a unified segmentation problem...good stuff all in all)
At the other end, for high accuracy models there's MaskDINO, DINO self supervised and SAM giving around 6-7s per image. For slightly lower accuracy we got upto 0.16s per image but that's sorta under NDA
The Segformer etc. lie somewhere in between
I haven't yet experimented with DINO self supervised myself

#

I might be missing smtg since it has been sometime

past meteor
#

2 minutes? Christ. We ended up using something from mediapipe which worked just fine for us.

I guess in some domains like medical imaging it's perfectly fine

potent sky
#

What was the mIOU?

#

For our use case we really needed very high accuracy

dim jungle
#

📢 Mark your calendar for our upcoming thrilling workshop on LangChain organized by ADaSci! 🗓️🌟

🚀** Mastering LangChain: A Hands-on Workshop for Building Generative AI Applications** 🚀

🗓️ Date: 17th June 2023
⏰ Time: 10 AM Onwards
📍 Location: Online / Virtual

Unleash your creativity and explore the power of generative AI with LangChain! 🤖✨

🔹 Create LLM-powered applications across various industry domains
🔹 Build and deploy generative AI-powered agents for real-world scenarios
🔹 Develop custom applications like chatbots and web agents using company-specific data
🔹 Learn step-by-step with practical examples and collaborate with fellow enthusiasts

Don't miss this opportunity to revolutionize industries with innovative, intelligent, and personalized applications! Secure your spot now!

Visit for more details and registration: https://adasci.org/product/mastering-langchain-a-hands-on-workshop-for-building-generative-ai-applications/

Attend the hands-on workshop on LangChain and learn how to build LLM powered generative AI applications for industries in very simple ways

potent sky
past meteor
#

Yeah, for us that was fine. It was related to people. High accuracy wasn't a requirement either

potent sky
#

True, there's always the trade-off between accuracy and speed. For us high accuracy was very important

#

On the upside, we got to do some really exciting research haha

bleak crown
#

Does anyone here have experience with the tensorflow data api that could help answer my question?
https://stackoverflow.com/questions/76378415/how-can-i-batch-a-tensorflow-dataset-without-loading-all-the-data-into-memory-si

Generator expressions didn't work either

wooden sail
#

you can take a look at tf and keras's dataset objects

#

with these you can specify a directory to be consumed

bleak crown
#

Yeah but the issue is preprocessing the output (text) files. Should i just preprocess every text file beforehand, and write it?

wooden sail
#

you can preprocess them as they are loaded

#

as is usually done with images. ideally you don't edit the originals

#

this is of course more expensive, i know, but it's the sanitary approach, let's call it

bleak crown
#

I'm just confused on how to preprocess the output (y) data. For the input, I'd obviously just use a preprocessing layer. But how do I put the output through such a layer so that the loss functions and what not work correctly

#

I'm pretty new to tensorflow and just don't quite understand how you can do that

tribal holly
#

Well...

wooden sail
#

or you mean y as in the (x,y) pairs shown to the network, not y as in the output of the network?

bleak crown
#

Yeah, the y_true, not the y_pred

#

I want to tokenize (or in this case vectorize) the y aspect of the input output pairs shown to the model. Or would I just write a custom loss and do it there?

wooden sail
#

you can also apply functions to the data before feeding it into the network

#

notice that things like a "preprocessing layer" are really just functions

#

you can perfectly well apply a preprocessing layer to y without having it be part of the network that processes x

bleak crown
cold osprey
#

!rule 3

arctic wedgeBOT
#

3. Respect staff members and listen to their instructions.

wooden sail
cold osprey
#

noisy name rule

wooden sail
bleak crown
#

I'm curious as to why it is crashing then. Because my batch size is 32, each audio file is about 4mb, and the text files are only a few words. I have 16gb of ram. But the map function i have should only be applied to the current batch?

wooden sail
#

that i don't know, i haven't seen your code 😛

#

if you use the keras or tf dataset classes, those load only a small chunk at a time

#

if you manually did something else, you might be running out of mem

bleak crown
#
def build_dataset(self, path: str = "./dataset/train-clean-100") -> Any:
    audio_files = []
    text_files = []
    for root, dirs, files in os.walk(path):
        for file in files:
            if file == "audio.wav":
                audio_files.append(os.path.join(root, file))
            elif file == "text.txt":
                text_files.append(os.path.join(root, file))
    audio_dataset = tf.data.Dataset.from_tensor_slices(audio_files)
    audio_dataset = audio_dataset.map(
        self.preprocess_audio, num_parallel_calls=tf.data.experimental.AUTOTUNE)

    text_dataset = tf.data.Dataset.from_tensor_slices(text_files)
    text_dataset = text_dataset.map(
        self.preprocess_text, num_parallel_calls=tf.data.experimental.AUTOTUNE)

    dataset = tf.data.Dataset.zip((audio_dataset, text_dataset))
    return dataset

def preprocess_audio(self, audio_path: str) -> tf.Tensor:
    audio = tf.io.read_file(audio_path)
    audio, _ = tf.audio.decode_wav(audio, desired_samples=20000)
    return audio

def preprocess_text(self, text_path: tf.Tensor) -> Any:
    texts = tf.io.read_file(text_path)
    tokens = self.vectorizer(texts)
    return tokens
#

I'm just calling processor().build_dataset().batch(32) and it crashes. However when i don't batch it runs fine.

#

Obviously without a batch index which is a different issue that could be easily fixed, but it accepts the input up until the lstm layer where it gets upset there isn't a feature index

#

Vectorizer is pre-fitted

wooden sail
#

hmm i don't see anything strange at a glance

lapis sequoia
#

give me the best tensor flow course link

#

NOW

bleak crown
#

💀

#

Well I'm gonna go to sleep it's nearly 2am. I'll try again tomorrow. Thanks for the help :). I may be back tomorrow if i can't get it figured out

cold osprey
lapis sequoia
mint palm
#
from transformers import BlipModel
model = BlipModel.from_pretrained("Salesforce/blip-image-captioning-base")
device_ids = [2, 4, 5, 7]
self.blip = DataParallel(model, device_ids = device_ids).to(torch.device('cuda:2'))

video_data.to(torch.device('cuda:2'))
video_features = self.blip.module.get_image_features(video_data)

even after this i am only able to have batch size as big as one gpu supported earlier. It should support 4x batch size cuz i am giving 4 gpus. Where is the error?
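
One likely cause, offered as a guess from the snippet alone: `self.blip.module.get_image_features(...)` calls the underlying model directly, which skips the batch splitting that `DataParallel` does in its own `forward`, so everything runs on one GPU. Also `video_data.to(...)` returns a new tensor and the result is not assigned. A minimal sketch of a workaround is below; `ImageFeatureWrapper` and `TinyStandIn` are hypothetical names, and the stand-in replaces `BlipModel` only so the sketch stays small.

```python
import torch
from torch import nn

class ImageFeatureWrapper(nn.Module):
    """Expose get_image_features through forward() so DataParallel can split the batch."""
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, pixel_values):
        return self.model.get_image_features(pixel_values)

class TinyStandIn(nn.Module):
    """Hypothetical stand-in for BlipModel, only for this sketch."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(3 * 8 * 8, 16)

    def get_image_features(self, pixel_values):
        return self.proj(pixel_values.flatten(1))

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
wrapped = nn.DataParallel(ImageFeatureWrapper(TinyStandIn())).to(device)

video_data = torch.randn(4, 3, 8, 8)
video_data = video_data.to(device)     # .to() returns a new tensor; reassign it
features = wrapped(video_data)         # call the wrapper, not wrapped.module
```

In the original setup the same idea would mean wrapping the BLIP model, passing `device_ids=[2, 4, 5, 7]`, and calling `self.blip(video_data)` instead of going through `.module`.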

earnest widget
#

Is it common for validation loss to slow down when decreasing? Could it be the learning rate being too low or high? This is for Mobilenetv3large model.

native umbra
#

are there internships for AI/Data science at Beginner level? (I can not find any)

icy anchor
#

Hi, does anyone here have experience with Puppeteer and BrightData's scraping browser?

placid cedar
#

hi all

#

for feature engineering, is it done on only numerical variables. Or, is it done on categorical variables that have been encoded and numerical variables, basically the whole dataset

#

at the final step for my assignment 🥲

raw compass
#

how do I extend an already existed well-known language model with some kind of new information?

cold osprey
mild dirge
#

Giga floating-point operations per second, it is still a measure of computing power in some form. Namely some operations can be done quicker than others, because it is heavily parallel for example @cold osprey

#

Or they meant FLOPs, which is not giga floating-point operations per second but just giga floating-point operations

cold osprey
#

ye so like Eff Net B2 for e.g. has a GFLOPS value of 1.09, what does that mean compared to Eff Net B1 of 0.69

#

oh

#

so just how much computation is needed?

mild dirge
#

But they wrote GFLOPS capital, which is confusing

#

I would guess they actually mean FLOPs, so not per second

#

just the number of operations

cold osprey
#

makes sense if its a measure of how much computation is needed say for model training or inference to gauge how much hardware/time is needed

past meteor
#

Says more than just listing the number of parameters because of what PcCamel said

cold osprey
#

ait thanks got it

#

so its gflops and not gflops per sec, thats what confused me
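
To make the "number of operations" reading concrete, here is a back-of-the-envelope count for one convolution layer. The layer sizes are hypothetical, and the helper `conv2d_macs` is a made-up name; summing something like this over every layer is roughly how a per-model figure is obtained.

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k):
    """Multiply-accumulate operations for one k x k convolution layer (no bias)."""
    return h_out * w_out * c_out * (c_in * k * k)

# Hypothetical layer: 3x3 conv, 32 -> 64 channels, 112x112 output map.
macs = conv2d_macs(h_out=112, w_out=112, c_in=32, c_out=64, k=3)
flops = 2 * macs            # one multiply + one add per MAC
gflops = flops / 1e9
```

Some tables report multiply-accumulates under the FLOPs label, so numbers from different sources can differ by a factor of 2.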

placid cedar
#

hi

#

do u think its viable to handle outliers in the target variable, such as with winsorisation?
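
For reference, winsorising a target can be sketched with NumPy as below. The toy data and the 1st/99th percentile limits are assumptions; the limits are a common starting point, not a rule.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy target with two extreme values added by hand.
y = np.concatenate([rng.normal(100, 10, size=98), [500.0, -300.0]])

# Limits are an assumption; compute them on the training split only.
low, high = np.percentile(y, [1, 99])
y_wins = np.clip(y, low, high)
```

Worth keeping in mind that clipping the target changes what the model is trained to predict, so it is a bigger decision than clipping a feature.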

lapis sequoia
sterile wyvern
#

I ran a method 2 times to train and test it (in and out of sample)
Im wondering do I run the method i built a 3rd time to forward test.

crimson summit
#

I am confused on how these two lines of code are working ?

    # update the weights for the links between the hidden and output layers
    self.who += self.lr * numpy.dot((output_errors * final_outputs * (1.0 - final_outputs)), numpy.transpose(hidden_outputs))

    # update the weights for the links between the input and hidden layers
    self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), numpy.transpose(inputs))

    pass

My question is this. The weights are being adjusted between the hidden layer and output layer and then they are being adjusted between the input layer and hidden layer. Don't I need to use back propagation to find the error of the hidden layer output and then use that correct hidden layer output to then adjust the weights connecting the hidden and output layer ? Right now the code is using the incorrect hidden layer output to adjust the weights between the hidden and output layers. Once i adjust the weights between the input and hidden layer it will give me a new hidden output and I think I would have to re adjust the weights between the hidden and output layer again ?

Here is a picture on the process im talking about.

wooden sail
#

if you needed to know the ideal output at each layer, no one would use deep neural networks 😛

#

the way to think of it is that the loss function you use depends on two things: the ideal output (the "label", if you will) and the output that the network actually produces

#

let's call the network N, and the loss L, and the label (ideal output) y

#

let's also call the input x, and the parameters of the network, idk, theta

#

this comes out as L(y, N(x, theta))

#

this is just one single function, through function composition, that depends on the labeled data pairs (x,y), and on the parameters theta of the network N

#

you can directly differentiate this with respect to theta and use that to update theta. how? through backprop, since you know how the network is made: through the composition of affine transformations and activation functions
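
A small NumPy sketch of that idea, assuming a two-layer sigmoid network and a squared-error loss (sizes, names like `delta_out`, and the learning rate are mine): both gradients come from one forward pass with the old weights, and a finite-difference check confirms they are the derivatives of the single function L(y, N(x, theta)).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(wih, who, x):
    hidden = sigmoid(wih @ x)
    final = sigmoid(who @ hidden)
    return hidden, final

def loss(wih, who, x, y):
    _, final = forward(wih, who, x)
    return 0.5 * np.sum((y - final) ** 2)

rng = np.random.default_rng(0)
wih = rng.normal(scale=0.5, size=(4, 3))   # input -> hidden weights
who = rng.normal(scale=0.5, size=(2, 4))   # hidden -> output weights
x = rng.normal(size=(3, 1))
y = np.array([[0.9], [0.1]])

# One forward pass, then BOTH gradients from the same (old) weights.
hidden, final = forward(wih, who, x)
delta_out = (final - y) * final * (1.0 - final)
grad_who = delta_out @ hidden.T
delta_hidden = (who.T @ delta_out) * hidden * (1.0 - hidden)
grad_wih = delta_hidden @ x.T

# Finite-difference check on one entry of each matrix.
eps = 1e-6
wih_shift = wih.copy()
wih_shift[0, 0] += eps
num_wih = (loss(wih_shift, who, x, y) - loss(wih, who, x, y)) / eps
who_shift = who.copy()
who_shift[0, 0] += eps
num_who = (loss(wih, who_shift, x, y) - loss(wih, who, x, y)) / eps

# Only now update every parameter, in one step.
lr = 0.1
who_new = who - lr * grad_who
wih_new = wih - lr * grad_wih
```

Subtracting `lr * grad` with `(final - y)` is the same step as the `+=` with `(target - output)` in the pasted code; only the sign convention differs.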

mild dirge
#

I think the question is that, because you have changed the hidden to output layer, when giving the input to the model again, the hidden error would be different, thus the update to the input to hidden should be based on this new hidden to output.

#

Which would be a more ideal update, but would need to run the model on the batch again for every layer.

placid cedar
#

anyone available atm?

#

need some help 🥲

wooden sail
mild dirge
#
# update the weights for the links between the input and hidden layers
        self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), numpy.transpose(inputs))
#

This one

#

the hidden_errors var

wooden sail
#

ah

#

there's a name for that. that would be a flavor of coordinate descent

#

that only converges under special conditions

#

the way to see it is that, although you do it step by step by backpropping through the layers, what is happening is that all of the network parameters get updated at the same time

#

if you change the parameters of one layer only and then compute a new error to update the other parameters, the error also changes again for the parameters you previously updated

#

this is only valid under special conditions

#

it actually has crazy good performance in special cases, but requires extra properties to reach optima

wooden sail
#

the standard update step goes through every single layer

mild dirge
#

Yeah it does, but that is what they are asking, why update all at the same time, once you want to update input to hidden, the derivatives wouldn't be "relevant" anymore because the model has changed after the update of other layers, but they are.

crimson summit
# mild dirge Yeah it does, but that is what they are asking, why update all at the same time,...

@wooden sail was explaining that if I update the weights between the input layer and hidden layer and run the code and then update the weights between the hidden layer and output layer it would not work because when I update the code after adjusting the weights between the input and hidden layer it would give me a entirely new error and as a result the hidden output would still be incorrect and I would just be in this never ending loop

#

still trying to wrap my head around it

#

but I think thats what he means

#

@wooden sail did i get that right or am i still looking at it from the wrong angle

wooden sail
#

the update happens for all parameters in the network at the same time, normally

#

we think of networks as having "layers" because that makes it easier to describe them, but at the end of the day, a network is a function with a ton of parameters and nothing else

crimson summit
#

means a lot

serene scaffold
wooden sail
native umbra
#

are there any recent internships for AI/Data science at Beginner level? (I can not find any on LinkedIn)

lapis sequoia
#

Not really sure where to ask this question so figured I’d try this channel, if I have a 2x2 grid of points spaced evenly apart (let’s say 1000 units apart) how can I calculate the radius needed on each point so they overlap and cover 100% of the area within the 2x2 grid.

I included an image to help understand what I’m trying to say.

mild dirge
#

Not really for this channel. But the circles should touch in the middle, so half the distance between diagonal points.

wooden sail
#

do you want the region between the points, or also some region beyond the points?

#

exactly as pccamel says. in the image you have, the radius is sqrt(2) times half the distance between 2 adjacent points

#

if the points are in the midpoint of some pixels, the radius may have to be bigger

lapis sequoia
#

so in this example the formula would be

sqrt(2) * 500

Since this example is 1000 units between them?

wooden sail
#

mhm

lapis sequoia
wooden sail
#

then the circles meet exactly in the middle. you may have some numerical issues with that, so you could make the circles a little bigger
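
A tiny check of that formula for the 1000-unit example; the coordinates are just one convenient way to lay out the 2x2 grid.

```python
import math

spacing = 1000.0                      # distance between adjacent points
radius = math.sqrt(2) * spacing / 2   # half the diagonal of one grid cell

corners = [(0.0, 0.0), (spacing, 0.0), (0.0, spacing), (spacing, spacing)]
center = (spacing / 2, spacing / 2)   # the point farthest from all four corners

dist_to_center = min(math.dist(center, c) for c in corners)
```

The radius comes out at about 707.1 units, and the centre of the square sits exactly that far from every corner, which is why a slightly larger radius is the safe choice numerically.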

crimson summit
#

@wooden sail sent you a friend request my guy

wooden sail
#

i don't accept any

crimson summit
#

oh dam

#

guess ill just write messages in here lol

lapis sequoia
wooden sail
lapis sequoia
#

Nvm I’m dumb ahaha notice that now

crimson summit
#

@wooden sail I don’t know if this is super obvious and I’m just being dumb but how does it mathematically work that the summation of the errors of the weights coming from a neuron is the error of the output of that neuron ?

wooden sail
#

i'm not sure what "coming from a neuron" means

mild dirge
#

Well the effect of the neuron value on the total error is the sum of the effect of the neuron on error of output 1, and the effect of the neuron on the error of output 2.
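
Written out with the chain rule, as a sketch under my own notation (loss $L$, hidden value $h_j$, output pre-activations $z_k = \sum_j w_{kj} h_j$, outputs $o_k = \sigma(z_k)$):

```latex
\frac{\partial L}{\partial h_j}
  = \sum_k \frac{\partial L}{\partial z_k}\,\frac{\partial z_k}{\partial h_j}
  = \sum_k \delta_k\, w_{kj},
\qquad
\delta_k = \frac{\partial L}{\partial z_k}
         = \frac{\partial L}{\partial o_k}\,\sigma'(z_k).
```

So the quantity a beginner text might call the "error of a hidden neuron" is the sum, over every output it feeds, of that output's error signal times the connecting weight.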

wooden sail
#

the whole thing is being described in very nonstandard terminology, that thing you linked is confusing

#

there is one error function, and the gradient of that error function with respect to the parameters of the network

#

it's probably a better idea to think about it that way

#

"error" evokes the idea that, at the very least, you're subtracting one quantity from another reference value

#

which is not what is happening there at all

crimson summit
#

i am just wondering how the summation of w1,1 w1,2 and w1,3 gives you the error for that neuron

#

mathematically

wooden sail
#

"error for that neuron" means nothing to me mathematically

#

at best it's an auxiliary quantity that shows up when computing something else, and this person gave it that name

#

idk why

#

if i knew what they wanted to compute with it, i could give you an explanation

#

i would look for a different tutorial that uses standard terms

crimson summit
#

yea the book i am learning from is referring to it as an error

#

my bad for the non standard terms

plain jungle
#

Just to give another mathy resource cause I know DNNs is a tough one to find some clean write ups about

hasty mountain
#

Error = input for that layer? yert

plain jungle
#

Cheers! If you have any questions feel free to ping

hasty mountain
#

Or...for that... neuron? (though I think a neuron would be better represented as the output of a layer/input for another pithink )

crimson summit
#

idk what the exact mathematical term is for that

crimson summit
wooden sail
#

they're using that in the computation of the gradients, which are used to update the weights

#

but that's just weird naming

hasty mountain
#

Gradients is also a weird name, btw...why not just call it "derivatives"?

crimson summit
#

i think its because its a super beginners book it helped me understand it when i first read it

wooden sail
plain jungle
wooden sail
#

the gradient is the vector of partial derivatives w.r.t. each variable. there are other kinds

#

e.g. the total derivative, which is the sum of the partial derivatives, each multiplied by the rate of change of its variable

crimson summit
wooden sail
#

unless the problem is (strictly) convex

hasty mountain
#

Because man...how I get lost over crazy terminology folks use...especially in Reinforcement Learning.

crimson summit
crimson summit
plain jungle
#

Nope, and I’d encourage you to check the code link cause it explains it with all the variables there, but to get the derivative of your error you use an activation function, not taking the derivative of the 2(e-a)

crimson summit
plain jungle
# crimson summit this was my logic

Ah I see why you now said -2, yeah you can use it like that. I’m not too sure how it would turn out. When I’ve built DNNs in the past I used the target as my focus instead of my output. But both should net you the same I think

night kernel
#

*trying to build my own small ai chatbot for free

ive asked about this a bit before so please excuse me if you saw my messages and i sound redundant.

wanted a piece of advice on LLMs in general. let's say i can find my own LLM for commercial use. said LLM might have its own syntax - i.e. some ai chatbots can be more rude than others, some have sense of humor, etc

what if i want mine to be simple much like chatgpt? no nonsense, just takes my text and is all business. do you suggest i downloaded source code for an already existing LLM and make that adjustment?

plain jungle
night kernel
plain jungle
#

NLP is natural language processing. There are a few libraries out there you can import; however, depending on the scale, a lot of hobbyists will just encode the letters a-z as 1-26.

An RNN is a Recurrent Neural Network, and it is used here for predicting the next character. Once again, keeping it very lightweight with letters as 1-26, you can get some pretty mid sentences that you’ll need to run through a simple spell check

#

TensorFlow is my personal recommendation for RNNs

zenith gull
#

hey guys i'm wondering if anyone has input on the following.

I have a code for a chatbot that takes an input of multiple pdfs and i then use llm to generate a response. However it seems that my chatbot is strictly limited to what i provide meaning it can't act like chatgpt and access the web.

I'm wondering if theres a way to combine the two and have a program for achatbot that can answer the questions from the document but can also answer general questions.

Thank you in advance for helping

plain jungle
plain jungle
#

For my first crack at an RNN from scratch I’m extremely happy

night kernel
plain jungle
night kernel
#

cool man

plain jungle
#

Thanks!

plain jungle
potent sky
plain jungle
potent sky
#

Nice! When I'd done that I started with thinking I'd just have to make some modifications to my dnn from scratch code but apparently that wasn't extensible enough xd. Had to write RNNs entirely from scratch again ;-;

#

Nice work!

plain jungle
#

Thank you! Yeah there was still some new code, but mostly just updating how the back prop needed to be done

potent sky
plain jungle
# potent sky Was it a plain RNN or like a GRU?

Am still trying to get familiar with the lingo, so forgive me for not having a shorter answer. How it works is it treats its layer as a Nx1. So for the sun graph it was a 10x1, and then for a defined amount of steps it would repeat the logic. So a 3x1 would be :

[ A, B, C ]
[ B, C, P1 ]
[ C, P1, P2 ]

Back prop is then done treating the layer as just a Nx1, so while the dot product of the error derivative and the weights may return a NxN array, we only look at NxN[-1][-1] turning it back into a 1x1 for bptt
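
The sliding-window part of that description can be sketched in plain Python. The `rollout` helper and the fake `predict` callable are hypothetical stand-ins; the real network and its backprop are not shown here.

```python
def rollout(window, predict, steps):
    """Slide a fixed-size window forward, feeding each prediction back in."""
    window = list(window)
    history = [tuple(window)]
    for _ in range(steps):
        nxt = predict(window)
        window = window[1:] + [nxt]   # drop the oldest value, append the prediction
        history.append(tuple(window))
    return history

# Hypothetical stand-in for the trained network: labels predictions P1, P2, ...
counter = iter(range(1, 100))
history = rollout(["A", "B", "C"], lambda w: f"P{next(counter)}", steps=2)
```

`history` reproduces the three rows shown above: the original window, then the window after each prediction is fed back in.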

placid cedar
#

hi guys, i need some help atm

#

anyone can help me with smth? 😦

potent sky
placid cedar
#

basically now im at my final step for my linear regression model

#

and i want to conduct polynomial expansion

#

but i am not sure which are the columns to apply it for

#

i have transformed my X_train and X_test in such a way

#

i want to find out how i can put back the target variable into this, X_train, so that i can make something like this

#

is it possible to use my X_train alone to get this?

past meteor
#

Because I did a bunch of econometrics and regular stats coursework in undergrad words like bias, robust, variance have many different meanings.😫

potent sky
#

The bias-variance trade-off in traditional ML corresponds to the exploration-exploitation trade-off in RL. It's a good analogy imo

past meteor
#

RL has its own bias variance trade-off

potent sky
#

I can't think of many confusing examples, maybe I'm out of touch ;-;
MDPs, MRPs, State, Environment, Reward signals, agent, all are nicely defined in RL literature

potent sky
grim crater
#

Hello! I'm having some trouble loading some content from a remote repository. I've set up a Jupyter notebook and attempted to get this dataset, but no luck. Is it something with how I'm grabbing DOWNLOAD_ROOT, HOUSING_PATH, or HOUSING_URL?

Here's the code...

`import os
import tarfile
import urllib

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/blob/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()

import pandas as pd

def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)

housing = load_housing_data()
housing.head()`
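
A few likely culprits, sketched under assumptions rather than tested against this notebook: raw.githubusercontent.com paths do not contain a `blob/` segment (that belongs to github.com page URLs), `urllib.request` is safer imported explicitly, and `fetch_housing_data()` is never called before `load_housing_data()`, so the CSV would not exist yet. A version of the download part with those three points addressed might look like:

```python
import os
import tarfile
import urllib.request  # import the submodule explicitly

# Raw-content URLs use <user>/<repo>/<branch>/..., with no "blob/" segment.
DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml2/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"


def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    """Download housing.tgz and extract housing.csv into housing_path."""
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)
```

With this in place, calling `fetch_housing_data()` once before `load_housing_data()` should leave `datasets/housing/housing.csv` on disk for pandas to read.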

potent sky
#

I do remember being a little confused when I was starting out with it. But I can't think of anything rn

past meteor
#

It means something else to the statistical learning definition

#

The bias in bias variance in stat learning is inductive bias

#

The bias in bias variance in RL is statistical bias, like having a biased estimator

potent sky
potent sky
past meteor
#

Maybe in the limit they're the same but the implications are different because the setting is quite different

past meteor
potent sky
#

RL can have its own inductive bias too

#

I don't think RL is limited to statistical bias

past meteor
#

Oh yeah, RL uses inductive bias when selecting their representation of the environment and function approximators

potent sky
#

And stat learning as well has both inductive bias as well as statistical bias

#

For example, yeah I think

past meteor
#

But when RL literature says bias, from my experience it's really related to the bias you see in statistics 101: your estimates cannot converge to the population's parameter value

potent sky
#

Besides, in terms of a predictor, doesn't statistical bias already subsume any inductive biases we might have started with?

potent sky
past meteor
#

And I remember going into this rabbit hole

potent sky
#

Interesting. Do send if you find any of the conclusions you came to then. I'll have to look into this again when I get some time haha

potent sky
grim crater
past meteor
#

This is how we determined bias and variance, I found my slides:

#

Maybe I found the RL definition confusing because I was overthinking it haha

potent sky
past meteor
#

8 am is too early for this, but nice chat @potent sky

#

Closing thought is that the implications for both RL and (regular ML) are the same: give a little bias away to drastically reduce variance. For ML this is, imo, closer to reducing model complexity but in RL it's more discussed in the literature in terms of statistics + MC (high variance, low bias) vs TD learning (high bias, low variance).

You were right in saying they're the same thing.
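
To make the MC vs TD contrast concrete, here is a rough sketch of the two learning targets. The function names are made up for illustration. The Monte Carlo target uses only observed rewards (unbiased, but it accumulates the noise of the whole trajectory), while the TD(0) target leans on the current value estimate of the next state (lower variance, but biased whenever that estimate is off).

```python
def mc_return(rewards, gamma):
    """Monte Carlo target: discounted sum of all rewards until the episode ends."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def td_target(reward, next_value, gamma):
    """TD(0) target: one observed reward plus the bootstrapped estimate V(s')."""
    return reward + gamma * next_value


print(mc_return([1.0, 1.0, 1.0], gamma=0.5))          # uses the full trajectory
print(td_target(1.0, next_value=2.0, gamma=0.5))      # uses one step + an estimate
```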

potent sky
potent sky
placid cedar
#

hi guys

hasty mountain
# potent sky I can't think of many confusing examples, maybe I'm out of touch ;-; MDPs, MRPs,...

They didn't seem that nicely defined in the texts that I've read. So I took quite some time to get that "policy", for example, can be a neural network.
At the time, I didn't know a neural network could be seen as a function. (I'm not in the field of math sciences). I just learned that when I got to study Diffusion Models. (since every tutorial and article about diffusion models says clearly that "the de-noising function can be a neural network")

placid cedar
#

im experiencing difficulties at the moment

#

when i put my target y_train, back to my x_train, there are many NaN values in the target variable

#

which im not sure why as well...

#

it was only after standardisation, and this problem has appeared

#

there was no problem fitting my y_train back to my x_train most of the time

#

it was only after scaling, and this problem occurred

#

what may be the possible reason for this tho?

agile cobalt
#

do you have any NaNs in y_train?

placid cedar
#

not at all

agile cobalt
#

it might be some issue with pandas index alignment then

placid cedar
#

anyway to fix this?

agile cobalt
#

check any operations you might be doing that could modify the indexes

placid cedar
#

im starting to tear my hair 🥲

#

can we take this to private chat btw, may bombard a bit too much

agile cobalt
#

also double check that the length/shape of x_train and y_train match

placid cedar
#

so sorry if its too inconvenient

agile cobalt
placid cedar
#

ah sure

#

so i went to check the X_train

#

wait lemme find it real quick

#

the indexing is like this

#

before scaling

#

but after i did standardisation i mean

#

unless there's something wrong with my code or smth

#

scaler = StandardScaler()

scaler.fit(X_train)

X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

X_train_scaled = pd.DataFrame(X_train_scaled, columns=X_train.columns)
X_test_scaled = pd.DataFrame(X_test_scaled, columns=X_test.columns)

agile cobalt
#

it looks like the indexes have almost definitely been altered, they went from a crazy order to a default sorted 0, 1, 2, 3, ...?

placid cedar
#

lol yeah i assume so xD

#

i think it possibly changed as i used the pd.DataFrame

agile cobalt
#

yeah, do not generate new dataframes so carelessly
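
A small sketch of what is probably going on, with made-up data: pandas aligns on index labels when a Series is assigned as a column, so a rebuilt frame with a fresh 0, 1, 2, ... index only picks up the targets whose labels happen to overlap. Here a plain NumPy array stands in for the output of `scaler.transform`, and passing `index=X_train.index` when rebuilding the frame is one way to keep things aligned.

```python
import pandas as pd

# Hypothetical data with a shuffled index, like the one train_test_split leaves behind.
X_train = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]}, index=[7, 2, 5])
y_train = pd.Series([100.0, 200.0, 300.0], index=[7, 2, 5], name="target")

# Stand-in for scaler.transform(X_train): a bare array that carries no index.
scaled = ((X_train - X_train.mean()) / X_train.std(ddof=0)).to_numpy()

# Default index 0, 1, 2: only label 2 also exists in y_train, the other rows get NaN.
misaligned = pd.DataFrame(scaled, columns=X_train.columns)
misaligned["target"] = y_train

# Reusing the original index keeps every row matched with its target.
aligned = pd.DataFrame(scaled, columns=X_train.columns, index=X_train.index)
aligned["target"] = y_train

print(misaligned["target"].isna().sum(), aligned["target"].isna().sum())
```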

placid cedar
#

how shld i interpret this? and which variables shld i use for my polynomial expansion?

#

shld i use polynomial expansion on encoded categorical data? or should i only consider numerical variables

past meteor
drowsy timber
#

Hi!, Anyone here experienced in using python/pyspark in an aws emr instance?

I'm having difficulty installing cartopy on my instance. It keeps reading out an error. I have all the dependencies installed too like geos, shapely, and pysph

torn mulch
#
import numpy as np
def gausseidel(x,y,t=15,e=0.022):
    diags=np.diag(np.abs(x)).copy
    np.fill_diagonal(x,0)
    sum=np.sum(x,axis=1)
    if not np.all(diags>sum):
        return True
Xs = [
    [
      [4, 2, -1],
      [1, -5, 2],
      [2, -1, -4]
    ],
    [
      [3, 4, 5],
      [-3, 7, -4],
      [1, -4, -2]
    ],
    [
      [9, -2, 3, 2],
      [2, 8, -2, 3],
      [-3, 2, 11, -4],
      [-2, 3, 2, 10]
    ]
]
Ys = [
    [41, -10, 1],
    [34, -32, 62],
    [55, -14, 12, -21]
]
for i,X in enumerate(Xs):
    x=np.array(X)
    y=np.array(Ys[i])
    if gausseidel(x,y):
        print("Not Diagonally Dominant")
        
#

why i cannot run this code?

wooden sail
#

what error do you get?

#

at a glance i would think that your usage of copy might be the issue. i think it's a function, so you'd have to do np.diag(...).copy() with parentheses
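
For reference, a minimal sketch of the dominance check on its own, assuming the goal is strict row diagonal dominance. Besides calling `.copy()`, it takes absolute values of the off-diagonal entries before summing and avoids `np.fill_diagonal`, which would modify the caller's matrix in place. The function name is just illustrative.

```python
import numpy as np


def is_diagonally_dominant(a):
    """True if |a[i, i]| > sum of |a[i, j]| for j != i, in every row."""
    a = np.asarray(a, dtype=float)
    diag = np.abs(np.diag(a))
    off_diag = np.abs(a).sum(axis=1) - diag  # row sums without the diagonal term
    return bool(np.all(diag > off_diag))


print(is_diagonally_dominant([[4, 2, -1], [1, -5, 2], [2, -1, -4]]))
print(is_diagonally_dominant([[3, 4, 5], [-3, 7, -4], [1, -4, -2]]))
```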

neat stratus
#

Hey, does anyone here have experience running graph algorithms on a massive graph? I want to run some community detection algorithm on a huge bipartite graph (~5 million nodes and ~100M edges). Unfortunately for me, I run out of memory using networkx and scipy sparse matrices. I'm wondering if there's a library like networkx but backed by disk instead of being in memory. Any other solutions are welcome as well

celest vine
#

Is PySpark just pandas but for big data?

snow fog
#

Hi, what do you guys think of a library which will tell you where exactly (in which stdlib) a function is defined in python?
I have written it, but the problem is: do you guys know about an ML model which can calculate the semantic similarity between pow and get_power?
Why am I asking this ?
I am writing this Library for python newcomers who may come from a different language and in their language there is a function named literal_eval as i am iterating over all stdlibs function i will calculate semantic similarity with this keyword.

past meteor
# celest vine Is PySpark just pandas but for big data?

Yesn't,

The syntax is different.

It's suitable for distributed computing over different nodes.

It can work with larger than memory datasets because it spills to disk.

It has way more overhead than pandas because it runs on the JVM so for small operations it's just wasteful.

coral field
#

Why does training a model to classify colored images take much longer, and need more epochs, than using grayscale? Is it because of the two extra channels?

timid grove
#

**NotImplementedError: Cannot copy out of meta tensor; no data!**
same issue:
https://github.com/togethercomputer/OpenChatKit/issues/87

I am having the same problem. I loaded the model checkpoint shards in both float32 and bfloat16, but it does not work for me and I do not know why.

This is my google colab file its a request to have a look in it.
https://drive.google.com/file/d/1-ccrx1Q5tkLUYtZBGi5lNZGjPMyr_X9U/view?usp=sharing

AN OVERVIEW OF MY CODE:
I am using the https://huggingface.co/HuggingFaceH4/starchat-alpha model and fine-tuning it on my own dataset. First, using the meta device, I made a device_map to load the checkpoint shards onto my device. Then I initialized my model using the checkpoints downloaded to my session storage, loaded the weights and tied them, and finally used Accelerate's load_checkpoint_and_dispatch, passing the folder containing the checkpoints and .json files, which is giving me this error.

This is the code snip that is giving me error:

The error:

my checkpoint folder that i am passing.

Please correct if i am conceptually wrong or missing some imp step.
I am using colab pro for running this code.

Thank You!
If anyone has worked with the same error please help.
Your inputs will be highly appreciated.
I have been struggling with this error for the past 5 days but have not been able to find the solution, so **PLEASE HELP!**

GitHub

While trying to implement Pythia-Chat-Base-7B I am getting this error on running the very fist command (python inference/bot.py --model togethercomputer/Pythia-Chat-Base-7B) after creating and acti...

mild dirge
coral field
#

Do you also know why the training accuracy decreases so little from epoch to epoch? What's generally the ideal amount of epochs to train a model like this?

mild dirge
#

Depends on way too many things to give a general answer. Your image size, network size, problem difficulty, optimization algorithm, activation etc.

#

Even with that info it would be hard to say

cosmic narwhal
#

Hello everyone! I am currently building a machine learning algorithm whose objective is to try to predict the outcome of next year's March Madness basketball tournament. My plan is to feed it about 6 statistical categories that correlate with tournament success from this year, then fine-tune it to make it accurate. Once it works fairly well, I will use that same algorithm for next year. Will this be effective? What advice would you have on making it as accurate as possible?

serene scaffold
cosmic narwhal
#

I have only created an outline for it, but it looks like this.

#

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

Loads the CSV data into a pandas DataFrame so it can be manipulated

df = pd.read_csv('March Madness Data/2023 Game Data.csv')

The 5 categories that we are evaluating

selected_features = ['Kenpom adjusted efficiency', 'BARTTORVIK ADJUSTED EFFICIENCY',
'EFG %', 'DEFENSE, EFG', 'POINTS PER POSSESSION OFFENSE',
'POINTS PER POSSESSION DEFENSE']

Now create a new dataframe with the most valuable feautures

df_X = df[selected_features]

Assuming 'target' is the name of your target column

if 'target' in df.columns:
df_y = df['target']
else:
df_y = None # Set the target variable to None or any appropriate default value

Split the data into a training set and a test set. This will likely be a statistic and the output of that statistic

X_train, X_test, y_train, y_test = train_test_split(df_X, df_y, test_size=0.2, random_state=42)

Creates an instance of the model

lr = LogisticRegression()

Trains the model to observe accuracy

lr.fit(X_train, y_train)

Makes predictions on the test set to see how effective it is

predictions = lr.predict(X_test)

Print the accuracy

print("Accuracy: ", accuracy_score(y_test, predictions))

hasty mountain
#

Oh...Discord became a Jupyter notebook Markdown

agile cobalt
#

!code

arctic wedgeBOT
#
Formatting code on discord

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

For long code samples, you can use our pastebin.

placid cedar
#

hi guys

#

for my categorical values, i have encoded them, as well as did standardisation on them. should i put it in my polynomial expansion, since after putting them in it, the mse and r-square test results improved quite dramatically, from 0.57 to 0.62

serene scaffold
cosmic narwhal
#

Yes, I just haven’t determined my y yet, so that is a placeholder for now

#

I am very much learning as I’m going haha

serene scaffold
#

anyway, suppose you train the logistic regression model. once you have it, it needs all the features in your x data in order to work. so you can't use it for a future situation unless you know what the kenpom adjusted efficiency, efg, points per possession offense, etc. are

#

and I imagine that at that point, you already know who won

cosmic narwhal
#

I would access that data right before the tournament starts for next year, and try to utilize the same algorithm. Any ideas on what would be a good Y variable?

potent sky
#

Using the same method, can you try a smaller model that should fit entirely into your GPU memory easily and see what happens then?

cosmic narwhal
#

What type of model would you suggest for that?

potent sky
#

Looking at the GitHub issue it looks like it's because the model doesn't fit into GPU memory entirely and the handling for that isn't all that good. So identifying the source of the problem first will help arrive at a solution

serene scaffold
#

@cosmic narwhal Stargazer is not talking to you

potent sky
#

Yeah mb, I thought I was replying to the right message but apparently not

cosmic narwhal
#

No worries

serene scaffold
cosmic narwhal
cosmic narwhal
serene scaffold
plain jungle
# cosmic narwhal I would access that data right before the tournament starts for next year, and t...

I’m not too sure on how comfortable you are with AI so… If I had to tackle a problem as a beginner, I’d attempt the following.

Collect the outcomes of the previous March Madnesses / season games.

Find the average win % of a team

Build a tree of your next year bracket, and match the win %s and whoever has the higher % on average moves to the next round

There’s a lot of moving pieces to predict who would win, such as players, home court advantage, etc. This won’t guarantee you the winning answer, but it is a beginner’s step in the right direction for approaching future AI. The more you make models, the more advanced your models will become over time
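
The bracket idea above could be sketched along these lines. Everything here is illustrative: the team names and win percentages are invented, `simulate_bracket` is a hypothetical helper, and it assumes the field size is a power of two with teams listed in bracket order.

```python
def simulate_bracket(teams, win_pct):
    """Advance whichever team has the higher average win % until one remains."""
    if not teams or len(teams) & (len(teams) - 1) != 0:
        raise ValueError("expected a non-empty field whose size is a power of two")
    current = list(teams)
    while len(current) > 1:
        # Pair up neighbours: (0, 1), (2, 3), ... and keep the stronger team.
        current = [
            a if win_pct[a] >= win_pct[b] else b
            for a, b in zip(current[0::2], current[1::2])
        ]
    return current[0]


win_pct = {"A": 0.80, "B": 0.60, "C": 0.70, "D": 0.90}  # made-up numbers
print(simulate_bracket(["A", "B", "C", "D"], win_pct))
```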

cosmic narwhal
serene scaffold
# cosmic narwhal Yes, as best I can

so for whichever team won the tournament in the training data, you can make their y value 1. and whichever team was eliminated first, you can make theirs 0. and then everyone else can get a number between 0 and 1 based on how close they got to winning
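
One simple way to turn that into numbers, assuming six rounds for the main 64-team bracket and counting how many games each team won (both are my assumptions, and the helper name is made up):

```python
def progress_target(wins, total_rounds=6):
    """Map tournament wins to a score in [0, 1]: 0 = out in the first game, 1 = champion."""
    if not 0 <= wins <= total_rounds:
        raise ValueError("wins must be between 0 and total_rounds")
    return wins / total_rounds


print([progress_target(w) for w in range(7)])
```

Worth keeping in mind that a continuous y like this fits a regression model; as far as I know LogisticRegression expects class labels, so it would need a thresholded version (for example, "reached the final four or not").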

cosmic narwhal
#

I will add that into my code right now. Thank you for everything @serene scaffold I hope that this model turns out to be reasonably accurate haha

torn mulch
#
import numpy as np
def f(x):
    x**6+2*x**2-3
def g(x):
    5*x**5+4*x
def newton(x0,error,iteration,max):
    x1= x0-(f(x0)/g(x0))
    print(f"Iteration of {iteration} new root = {x1}")
    if(np.abs(f(x1)) < error):
        return True
    if(iteration == max):
        return False
    newton(x1,error,iteration+1,max)
if not newton(4,0.01,1,15):
    print("cannot find the root")

why i cannot run this code?

#

TypeError Traceback (most recent call last)
Cell In[1], line 14
12 return False
13 newton(x1,error,iteration+1,max)
---> 14 if not newton(4,0.01,1,15):
15 print("cannot find the root")

Cell In[1], line 7, in newton(x0, error, iteration, max)
6 def newton(x0,error,iteration,max):
----> 7 x1= x0-(f(x0)/g(x0))
8 print(f"Iteration of {iteration} new root = {x1}")
9 if(np.abs(f(x1)) < error):

TypeError: unsupported operand type(s) for /: 'NoneType' and 'NoneType'

wooden sail
#

also recursion in python isn't the best

torn mulch
# wooden sail your f, g, and newton functions don't return anything. in python, functions with...
import numpy as np
def f(x):
    return x**6+2*x**2-3
def g(x):
    return 6*x**5+4*x
def newton(x0,error,iteration,max):
    x1= x0-(f(x0)/g(x0))
    print(f"Iteration of {iteration} new root = {x1}")
    if(np.abs(f(x1)) < error):
        print("A")
        return True
    if(iteration == max):
        return False
    newton(x1,error,iteration+1,max)
if not newton(4,0.01,1,15):
    print("cannot find the root")
else:
    printf("root found!")

when this code return true why not print root found?

wooden sail
#

your newton method still doesn't have a proper return

#

your recursive call needs to be in a return as well

#

otherwise the inner returns don't get passed back out to the previous calls

#

imagine an inner newton iteration finds a root and returns true

#

this turns your code into

```py
def newton(x0,error,iteration,max):
    x1= x0-(f(x0)/g(x0))
    print(f"Iteration of {iteration} new root = {x1}")
    if(np.abs(f(x1)) < error):
        print("A")
        return True
    if(iteration == max):
        return False
    True
```
but that last True is not returned anywhere, and the function returns None unless the solution is found at the very first iteration
serene scaffold
#

Javapython Sadge

torn mulch
#
import numpy as np
def f(x):
    return x**6+2*x**2-3
def g(x):
    return 6*x**5+4*x
def newton(x0,error,iteration,max):
    x1= x0-(f(x0)/g(x0))
    print(f"Iteration of {iteration} new root = {x1}")
    if(np.abs(f(x1)) < error):
        print("A")
        return True
    if(iteration == max):
        return False
    True
    newton(x1,error,iteration+1,max)
if not newton(4,0.01,1,15):
    print("cannot find the root")
else:
    print("root found!")

it still cannot print root found

wooden sail
#

i think you misunderstood what i told you 😛

#

what i meant was that you need

```py
def newton(x0,error,iteration,max):
    x1= x0-(f(x0)/g(x0))
    print(f"Iteration of {iteration} new root = {x1}")
    if(np.abs(f(x1)) < error):
        print("A")
        return True
    if(iteration == max):
        return False
    return newton(x1,error,iteration+1,max)
```
and the code i wrote was an explanation why
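
Since recursion isn't a great fit here anyway, a loop-based version is one alternative. This is just a sketch using the same f and its derivative; the names `df`, `tol` and `max_iter` are my own, and it avoids calling a parameter `max`, which shadows the builtin.

```python
def f(x):
    return x**6 + 2 * x**2 - 3


def df(x):
    # Derivative of f
    return 6 * x**5 + 4 * x


def newton(x0, tol=0.01, max_iter=15):
    """Return (root, iterations) on success, or (None, max_iter) if not converged."""
    x = x0
    for i in range(1, max_iter + 1):
        x = x - f(x) / df(x)      # Newton update
        if abs(f(x)) < tol:
            return x, i
    return None, max_iter


root, iterations = newton(4)
if root is None:
    print("cannot find the root")
else:
    print(f"root found: {root} after {iterations} iterations")
```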
past meteor
serene scaffold
cold osprey
#

whats java about it?

cobalt sleet
#

hello friends I have a question

#

are ConvNets dead? Are transformers the new big boy in image processing?

past meteor
#

Transformers have much less bias so that means they need a lot more data but they're a lot more flexible

#

I also think they need more memory than conv nets so they have their time and place

#

Some CNNs can run on edge/mobile/... I'm not sure the run-of-the-mill vision transformer can, but ViTs are such active research that I could be so wrong by now 🤣

coral field
#

what's the purpose of adding dense layers to a cnn? why not just only use convolutional layers?

cobalt sleet
#

relics of a bygone era

#

crazy I've only heard about them recently

#

I thought cnns like ConvNext were going to remain the big boys for a while

past meteor
cobalt sleet
#

yeah but transformer being the efficient and flexible building block as it is will likely take over

past meteor
#

Linear regression didn't stop being a thing after transformers either

#

Occam's razor

tacit knot
#

I'm building an open source project (hope to release first version late today or tomorrow, finally) built on top of a CNN (ZoeDepth) at the core. None of the other AI models or approaches were even very close to the performance and quality of this one. That said, I'm more of a "user" of AI stuff than a "developer" of AI right now. Also, to say I'm new to AI (and python lol) is a bit of an understatement. Only point is, there are bleeding edge / state of the art AIs still rolling out as CNNs, so I doubt they are going to disappear any time all that soon.

#

I've also decided that the options for dependency management in python all feel oddly complex and convoluted...not a fan lol

past meteor
severe topaz
#

Which one of these methods is popular I’ve reviewed some Gibbs and PCA lately… what is everyone using?

tacit knot
past meteor
#

Venv (inside stdlib) in conjunction with poetry is a great fit

tacit knot
#

ah gotcha, venv/conda are the main things I've been working with, but some of these bleeding-edge AI projects are barely held together it seems. Then again, I'm on linux so that occasionally makes things much easier, but sometimes much much harder. Getting NeRF (instant-ngp specifically) up and running was kinda crazy of a process as one of the more difficult examples recently.

past meteor
#

Afaik pip installing can kill a project because there's no real dependency checking that is as stringent as poetry

tacit knot
#

I guess my main issue is always being presented with 5 options that are often only halfway explained. Three weeks ago I had never heard of venv/conda/pipx/etc (brand new to python AND ai/ml stuff)

#

Dang, where does the time go, maybe it has been 5-6 weeks now that I think about it.

#

hmm poetry seems interesting, but I'm curious, how would you compare it to npm/yarn? or just a different kind of animal?

wooden sail
#

it didn't use to do that at all, back in the days when conda was originally conceived

potent sky
# cobalt sleet interesting. So it seems to me that they could be in the path of becoming old te...

Convolutional operations, and convolutional neural nets by extension, have some unique properties that make them especially suitable for image processing. Though ViTs are amazing and a lot of my research involves them, raw ViTs have some fundamental characteristics that don't render them equally suitable in some respects.
There's a lot of research going on in this, but so far all successful approaches (that I know of) make use of CNNs with ViT to incorporate these properties.
So it'll be some time at least before CNNs are relics of a bygone era haha

potent sky
potent sky
#

pip + venv I've found to be quite sufficient

  • Poetry for serious projects
tacit knot
#

Getting that up and running was.....challenging lol

#

there is a line "also run pip install -r requirements.txt" as basically an afterthought

potent sky
#

Mhm but most of the build tools for this are non Python
Python is optional, just for bindings ig

#

If you do want python bindings then go for pip install requirements

tacit knot
#

yea, i mean i got it all working, but I guess i'm associating stuff to python incorrectly, that is just one of the many places things can get complex with these AI/ML things

potent sky
#

pip install -r requirements.txt is standard when setting up the dependencies for any python proj (even the filename requirements.txt). That might be why it appears as an afterthought - because for a lot of people, it is

tacit knot
#

Sure, but just following that can kill a different project when not using venv or whatever. Early on every time I tried to stand up a new project just to try it out I'd end up destroying my environment for several other things.

#

Then again, I'm just venting really, shouldn't be complaining about bleeding-edge crazy AI things being a little tricky lol

wooden sail
#

also you don't deal with the safety issues of pypi, since the repos are curated

modest onyx
errant bison
errant bison
#

every tutorial says to clone a git

hasty mountain
#

Warmup steps, adjusting learning rate, the residual blocks unbalancing the gradients and possibly leading to model collapse, the problems of using Teacher Forcing...

#

Besides... One can't make a GAN using a Transformer because it's too efficient as Discriminator and it collapses the adversarial process. No fun!

But maybe someday I'll try something with a GPT as Generator and a BERT as Discriminator brainmon

rain garnet
#

Hey, I'm currently working on a data science project and I'd like some advice.

The problem description is that there are multiple stores, and each store has multiple products. What I want to do is based on recent sales data, I want to predict a good price to sell each product.

I'm guessing it wouldn't be a good idea to train a model for each product, and for each store, so in this case, what can I do? I am using linear regression and predicting the demand of a given product based on the price.

plain jungle
somber pollen
#

Do you know what the correct price is? Or are you just trying to provide a forecast?

rain garnet
rain garnet
somber pollen
#

You can only find the optimal price for a product if you can measure how much demand decreases for a given increase in price

#

Optimal in the sense of maximizing profit, if you just want a good sense of what a "fair" price is you can just take the average

rain garnet
#

I'm basically just using a simple algorithm which seems to work for now, I'm plotting # of orders vs price, and multiplying it with the profit

rain garnet
somber pollen
#

Multiplying with the profit? Do you mean like plotting on a graph versus it?

rain garnet
#

Demand * profit for all the possible prices, so that would give me the max profit I can get theoretically

somber pollen
#

how would you sell the product for multiple prices at a time though?

rain garnet
#

I'd have to test it

#

I'd start with the lowest price, then the highest price

#

and then go from there

somber pollen
#

Generally the best way is to plot the thing you can control (in this case price) versus the thing you're trying to maximize (profit)

rain garnet
#

collect more data

somber pollen
#

If you multiply them together then the relationship becomes less clear

rain garnet
#

yeah but I won't have profit in the dataset

#

I'll have number of orders for a given price

somber pollen
#

Ohhh, I see what you mean. So you're basically comparing the price versus the computed profit for that price, which is done by multiplying the number of orders vs the price etc

#

Yeah that sounds like a good approach
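
Roughly, for a single product, that approach could be sketched as below. This assumes demand is close to linear in price and that a per-unit cost is known; `unit_cost`, `candidates` and the function name are placeholders of mine, not something from the dataset described above.

```python
import numpy as np


def best_price(prices, orders, unit_cost, candidates):
    """Fit orders ~ slope * price + intercept, then pick the most profitable candidate."""
    slope, intercept = np.polyfit(prices, orders, deg=1)
    demand = np.clip(slope * candidates + intercept, 0, None)  # no negative demand
    profit = (candidates - unit_cost) * demand                 # per-unit margin * orders
    return float(candidates[np.argmax(profit)])


# Made-up observations: orders fall by 10 for every unit of price.
prices = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
orders = 100 - 10 * prices
print(best_price(prices, orders, unit_cost=2.0, candidates=np.linspace(2, 10, 81)))
```

For many products and stores, the same function could be applied per group rather than hand-building a model for each one, though whether a linear demand curve holds per group is something only the data can tell.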

rain garnet
#

yep

#

it's kind of a brute force approach rn tho

#

this won't scale, cause i'd need a model for each product for each store in this case

somber pollen
plain jungle
#

Agreed NNs are good but shouldn’t be seen as the solution to all. This sounds like something that some algos could solve more efficiently

hasty mountain
#

I suppose Random Forests would be expensive as well...since they're...a bit like NNs?

rain garnet
somber pollen
hasty mountain
#

Speaking of expensive models... Is there a way to use Genetic Algorithms together with Stochastic Gradient Descent to optimize my model without obliterating my GPU or RAM?

#

I was thinking about trying something like that for a Variational AutoEncoder...or for another model which the loss is decreasing a bit slowly(but still can be trained without overfitting problems)

potent sky
potent sky
potent sky
potent sky
potent sky
potent sky
rich sail
#

Hey could anyone help me with my problem in python-help, greatly appreciated! 🙏

distant cosmos
#

Can someone help me understand LSA and its uses?
Like i am using it as a metric to analyze similarity between two directories which contain 100s of files in them

#

Is this the right use case ?

rose dagger
#

A question about Fully Convolutional Networks: Is the "skip connection" literally just adding the layer of the encoder part to the corresponding layer (of equal spatial dimensions) of the decoder part?

past meteor
#

This is called "local search" in optimization literature. So you run your genetic algorithm as you would normally and then at specific points you run a local search, if your problem allows it, it could be a few iterations of SGD you run at that moment.

#

Imagine you have a (nearly) continuous function that has many local optima, for example the egg holder function. At this point you may want to consider running a genetic algorithm.

You run it as is. Mu is a hyperparameter of your genetic algo, it's the mean of a poisson distribution that you sample from. Whatever you get from this sample determines how many iterations of SGD you run on each individual in the population.

The plus side is that you're generally going to land exactly at some local optimum faster, but the downside is that if mu is too high your population will converge very quickly to one, potentially suboptimal, solution.
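
A toy sketch of that scheme, with several simplifications of my own: it minimizes the Rastrigin function instead of the egg holder, uses plain gradient descent in place of SGD, mutation only (no crossover), and keeps the fittest of parents plus children. All hyperparameter values are arbitrary.

```python
import numpy as np


def rastrigin(x):
    """Classic multimodal test function; global minimum 0 at the origin."""
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))


def rastrigin_grad(x):
    return 2 * x + 20 * np.pi * np.sin(2 * np.pi * x)


def memetic_search(dim=2, pop_size=20, generations=30, mu=3.0, lr=0.002, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5.12, 5.12, size=(pop_size, dim))
    for _ in range(generations):
        # Mutation: every parent produces one perturbed child.
        children = pop + rng.normal(0.0, 0.3, size=pop.shape)
        # Local search: each child gets Poisson(mu) gradient steps.
        for child in children:
            for _ in range(rng.poisson(mu)):
                child -= lr * rastrigin_grad(child)  # in-place update of the row view
        # Selection: keep the pop_size fittest among parents and children.
        merged = np.vstack([pop, children])
        fitness = np.array([rastrigin(ind) for ind in merged])
        pop = merged[np.argsort(fitness)[:pop_size]]
    best = pop[0].copy()
    return best, float(rastrigin(best))


best, value = memetic_search()
print(best, value)
```

Raising `mu` makes each generation more expensive and pulls individuals into the nearest basin sooner, which is the fast-convergence risk mentioned above.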

hasty mountain
#

I just hope the genetic optimization wouldn't make an optimization so abrupt that it would break my model yert

hasty mountain
#

Maybe for a Reinforcement Learning model would be better, but usually my RL models also must have a really short processing time(I like to use them to play games)

past meteor
#

It'll be really expensive. Individual represents a full model. Each step of local search is training a new model

hasty mountain
#

Oh... I see... Then nah.
I was thinking more of using a genetic algorithm where each individual is a weight from my model layers.

past meteor
#

Depending on what your search is for, I'm thinking of architecture or hyperparameter search

hasty mountain
#

The ResNet architecture can show you clearly what a residual connection is... it was the model that popularized them, after all.

potent sky
potent sky
# rose dagger A question about Fully Convolutional Networks: Is the "skip connection" literall...

Yes, in some cases it is addition of the outputs of one layer to the inputs of another, generally as in Residual Neural Networks (Here I mean element-wise addition of the feature maps)
In other cases, a skip connection can also represent concatenation of two feature maps, i.e stacking them on top of each other.
U-Net uses concatenation, ResNet-18 uses addition, DenseNets use concatenation, etc.

#

The decision-making for when to use concatenation and when to use addition is subtle

#

Both methods allow you to preserve information from previous layers in the network and propagate gradients more effectively and thus both come under "skip connections"
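
In terms of tensor shapes, the difference can be sketched like this (NumPy arrays standing in for feature maps, with a made-up layout of batch, channels, height, width):

```python
import numpy as np

# Two feature maps with identical shape: (batch, channels, height, width).
encoder_features = np.ones((1, 8, 16, 16))
decoder_features = np.full((1, 8, 16, 16), 2.0)

# Additive skip (ResNet-style): shapes must match, output shape is unchanged.
added = encoder_features + decoder_features

# Concatenating skip (U-Net / DenseNet-style): stack along the channel axis,
# so the following layer sees twice as many input channels.
concatenated = np.concatenate([encoder_features, decoder_features], axis=1)

print(added.shape, concatenated.shape)
```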

hasty mountain
hasty mountain
#

This is what I see people using in generative models, at least.

twilit arch
#

How do arguments work in midjourney? Is the model trained to respond to arguments like --ar or --niji or is that something they handle in the middle?

hasty mountain
twilit arch
#

so I should train with the arguments then

hasty mountain
#

I don't know how the scripts for midjourney works, but most models available to the public are like that. You run a script in the command shell, the script will execute the model and the arguments you've passed will be used to create a dictionary of arguments

twilit arch
#

hm alright

#

im planning on training openjourney so

hasty mountain
#

This is the module that will probably be used to create the dictionary of arguments from the command shell
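
Assuming the module in question is the standard library's argparse (my guess), the "dictionary of arguments" part would look something like this. The flags are hypothetical and only loosely modelled on things like --ar; they are not Midjourney's actual interface.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line flags and return them as a plain dict."""
    parser = argparse.ArgumentParser(description="Hypothetical image-generation script")
    parser.add_argument("--prompt", type=str, required=True, help="text prompt")
    parser.add_argument("--ar", type=str, default="1:1", help="aspect ratio, e.g. 16:9")
    parser.add_argument("--steps", type=int, default=30, help="number of sampling steps")
    return vars(parser.parse_args(argv))  # Namespace -> dict


print(parse_args(["--prompt", "a cat", "--ar", "16:9"]))
```

In this setup the model itself never sees the flags; the script reads them and uses them to configure the generation call.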

agile cobalt
#

for ar I'm not sure, could be using a different upscaler

potent sky
# hasty mountain Generally, what I've seen is: If you want a simple residual conection, maybe to ...

Exactly.
The underlying principle is that element-wise addition is akin to an adjustment in the parameter space of that layer, so it should preferably come from a layer with a similar parameter space -> "closer" layer.
Concatenation is akin to collecting more features. These might be at different levels or hierarchies of representation and so addition doesn't make sense. Generally this translates to:
skip connections between "closer" layers -> element-wise addition
"Farther" layers -> concat

The basic multi-head attention unit and encoder unit in transformers have skip connections for almost successive layers, and rightly so addition makes sense here

past meteor
#

I need to do something with transformers because just reading about them makes it feel so vague

#

I understand all the pieces intuitively but that's about it

potent sky
#

Hmm have you built it from scratch

past meteor
#

No but I'm not sure how much that'll help me.

potent sky
#

what part do you think you're lacking/uncomfortable in?

past meteor
#

The annoying information retrieval analogies

potent sky
#

okay yeah I'm not sure how much building it from scratch will help in that, but it might

twilit arch
past meteor
#

the bricks make sense, you have a sequence and you're conditioning on every other element in the sequence. Multi head because each head takes into account a different piece of info
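
For what it's worth, a single attention head is small enough to write out in NumPy. This is a bare sketch with random weights, no masking and no multi-head splitting, and the variable names are mine:

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # how much each position attends to each other
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                   # 4 tokens, model width 8
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.sum(axis=1))
```

Multi-head attention runs several of these in parallel with separate (smaller) projection matrices and concatenates the outputs.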

potent sky
#

A deep dive into the math ig

#

Mhm

past meteor
#

Maybe it's because I haven't actually used them and I've exclusively just looked at the math pithink