#data-science-and-ml | Python | Page 2

hidden rapids Jul 25, 2022, 2:27 PM

#

#

couldn't send snippet because of character limit

cinder matrix Jul 25, 2022, 2:29 PM

#

guys when a language model gets trained

#

does it learn the probability distribution for the vocabulary

mild dirge Jul 25, 2022, 2:29 PM

#

hidden rapids

I don't have much experience with keras, but could it be that you have to specify the batch size too?

#

It probably thinks there's multiple possible output shapes now or something

serene scaffold Jul 25, 2022, 2:32 PM

#

hidden rapids couldn't send snippet because of character limit

!paste

arctic wedgeBOT Jul 25, 2022, 2:32 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hidden rapids Jul 25, 2022, 2:33 PM

#

serene scaffold !paste

https://paste.pythondiscord.com/qanedoyaju

hidden rapids Jul 25, 2022, 2:33 PM

#

mild dirge I don't have much experience with keras, but could it be that you have to specif...

where exactly should i do that

serene scaffold Jul 25, 2022, 2:34 PM

#

cinder matrix guys when a language model gets trained

"language model" describes what it does, not how its implemented.

cinder matrix Jul 25, 2022, 2:34 PM

#

serene scaffold "language model" describes what it *does*, not how its implemented.

if i train a text generative model

#

how can i describe the state of the model after it has been trained, like would it be correct to say "after training, it has learned the probability distribution for each word in the dataset"

#

is there a better way to put it

cinder matrix Jul 25, 2022, 2:37 PM

#

cinder matrix if i train a text generative model

how would you put it, like what distinguishes before and after training with regard to the learned probabilities

serene scaffold Jul 25, 2022, 2:37 PM

#

cinder matrix how can i describe the state of the model after it has been trained, like would ...

it's more about learning the probability of sequences, and saying "this sentence is more likely than this one"

cinder matrix Jul 25, 2022, 2:38 PM

#

serene scaffold it's more about learning the probability of sequences, and saying "this sentence...

like multiple combination of words right

#

and learning what is more likely to come after

#

am writing a report and can't find the proper wording ;-;

serene scaffold Jul 25, 2022, 2:40 PM

#

cinder matrix and learning what is more likely to come after

a language model can learn that for "the boy was chased by the x", x is more likely to be "dog" than "cat", because "the boy was chased by the dog" is learned to be more likely than "the boy was chased by the cat"

#

though this isn't necessarily because of some learned properties of the words "dog" and "cat".

cinder matrix Jul 25, 2022, 2:41 PM

#

now when it comes to writing what you said in simple terms XD

serene scaffold Jul 25, 2022, 2:42 PM

#

cinder matrix now when it comes to writing what you said in simple terms XD

well, if you're writing a report, you should say "sequence" instead of "sentence", and "token" instead of "word".

cinder matrix Jul 25, 2022, 2:42 PM

#

so i say, "After a language model has been trained, it has learned the probability distribution for every sequence"

serene scaffold Jul 25, 2022, 2:46 PM

#

.wiki language model

strange elbowBOT Jul 25, 2022, 2:46 PM

#

Wikipedia Search Results

Language model
A language model is a probability distribution over sequences of words. Given such a sequence of length m, a language model assigns a probability P (

BERT (language model)
research publications analyzing and improving the model. The original English-language BERT has two models: (1) the BERT_BASE: 12 encoders with 12 bidirectional

serene scaffold Jul 25, 2022, 2:46 PM

#

the first sentence has it 😄

cinder matrix Jul 25, 2022, 2:46 PM

#

is mine not correct tho :<

wooden sail Jul 25, 2022, 2:47 PM

#

not quite in that the distribution is over the sequences, not a separate distribution per sequence

iron basalt Jul 25, 2022, 2:49 PM

#

cinder matrix is mine not correct tho :<

"over"

cinder matrix Jul 25, 2022, 2:50 PM

#

"After a language model has been trained, it has learned the probability distribution over sequences of words from the dataset"

wooden sail Jul 25, 2022, 2:50 PM

#

if you don't like the word "over", you can also use "joint distribution"

cinder matrix Jul 25, 2022, 2:50 PM

#

what about conditional probability

wooden sail Jul 25, 2022, 2:52 PM

#

well, the way you have it atm says nothing about whether events are independent or not, but sequences are considered to be dependent on each other. you could add info regarding that if you'd like
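
To make the "distribution over sequences" wording concrete, here is a minimal sketch of a toy bigram model. It is not how GPT-2 or any particular model is implemented; the tiny corpus and the helper names (`train_bigram`, `sequence_probability`) are made up for illustration. It scores a whole sequence by multiplying conditional next-token probabilities (the chain rule), which is where the joint and conditional views meet.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for this illustration.
corpus = [
    "the boy was chased by the dog",
    "the boy was chased by the dog",
    "the girl was chased by the cat",
]

def train_bigram(sentences):
    """Count how often each token follows each context token."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def sequence_probability(counts, sentence):
    """Chain rule: P(sequence) = product of P(token | previous token)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        total = sum(counts[prev].values())
        if total == 0:
            return 0.0  # unseen context: this unsmoothed toy model gives up
        prob *= counts[prev][nxt] / total
    return prob

counts = train_bigram(corpus)
p_dog = sequence_probability(counts, "the boy was chased by the dog")
p_cat = sequence_probability(counts, "the boy was chased by the cat")
print(p_dog, p_cat)
```

With this corpus the "dog" sentence gets a higher probability than the "cat" sentence simply because "dog" follows "the" more often in the training data.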

cinder matrix Jul 25, 2022, 2:52 PM

#

conditional probability distribution over the set of tokens

wooden sail Jul 25, 2022, 2:52 PM

#

hmmm that's more tricky to say

cinder matrix Jul 25, 2022, 2:53 PM

#

can you please rephrase it for me c:

wooden sail Jul 25, 2022, 2:53 PM

#

i'd leave it out if you can't phrase it yourself. this is the sort of stuff that attracts questions during reviews

iron basalt Jul 25, 2022, 2:54 PM

#

You will need more details on which model specifically. Some generative models don't even really model the joint distribution, they just generate things that "look like" they would be in the data set.

hidden rapids Jul 25, 2022, 2:55 PM

#

hidden rapids https://paste.pythondiscord.com/qanedoyaju

can anyone pls look into this

iron basalt Jul 25, 2022, 2:55 PM

#

(You can sometimes get away with this if you don't care, such as when generating images)

cinder matrix Jul 25, 2022, 2:55 PM

#

iron basalt You will need more details on which model specifically. Some generative models d...

gpt2

lapis sequoia Jul 25, 2022, 3:18 PM

#

iron basalt Jul 25, 2022, 3:25 PM

#

cinder matrix gpt2

Well, that is a bit more complicated. If you are talking about language models in general, then the distribution over sequences is fine. Like Wikipedia (which is why it's also being vague).

#

*Language models are diverse.

cinder matrix Jul 25, 2022, 3:27 PM

#

lapis sequoia

what do you mean, just use it whats the problem?

#

and it does print in order, or do you mean sorted?

wooden sail Jul 25, 2022, 3:27 PM

#

this is stelercus's worst nightmare. not just the code, all of the text is an image

lapis sequoia Jul 25, 2022, 3:27 PM

#

oh it does?

cinder matrix Jul 25, 2022, 3:27 PM

#

lapis sequoia oh it does?

yes

#

unless you're using a dictionary

lapis sequoia Jul 25, 2022, 3:28 PM

#

im not

#

lemme double check rq

cinder matrix Jul 25, 2022, 3:29 PM

#

lol you literally sent your answer

lapis sequoia Jul 25, 2022, 3:29 PM

#

i dont want to print [34234423, 324234 ,34234]

#

i want to print 423423 then 433243 42342

cinder matrix Jul 25, 2022, 3:29 PM

#

you want to access the elements

#

print(list[index])

lapis sequoia Jul 25, 2022, 3:30 PM

#

list = [3242423, 2342342 ,234242]
client.get(list)

#

im actually tryina do that

#

replacing the (list) with each number

#

thats in there

cinder matrix Jul 25, 2022, 3:31 PM

#

lapis sequoia i want to print 423423 then 433243 42342

so you're omitting the '3' in front, correct?

lapis sequoia Jul 25, 2022, 3:31 PM

#

oh no its just random numbers

cinder matrix Jul 25, 2022, 3:31 PM

#

its supposed to be client id?

lapis sequoia Jul 25, 2022, 3:32 PM

#

an id yes

lapis sequoia Jul 25, 2022, 3:32 PM

#

lapis sequoia list = [3242423, 2342342 ,234242] client.get(list)

here

cinder matrix Jul 25, 2022, 3:32 PM

#

so if i do client.get(3) it should give the random numbers with 3 in front?

#

since ids are unique i would use list index as id

lapis sequoia Jul 25, 2022, 3:33 PM

#

no that is just my function what it does it sends messages to the ids there

cinder matrix Jul 25, 2022, 3:33 PM

#

ok just give me a example

#

of input and output that you want

lapis sequoia Jul 25, 2022, 3:33 PM

#

ok so i got the list with the numbers

#

and i have a command

#

that sends a message to those id's

#

client.get(list)

cinder matrix Jul 25, 2022, 3:34 PM

#

client.get(list) will send a message to every item in the list right

#

it accepts a list, iterates through each item and sends them a message

lapis sequoia Jul 25, 2022, 3:35 PM

#

yes

#

and i cant figure out

#

how to send the message to the numbers from my list

#

client.get(342342353) if i do this it will send a message there

cinder matrix Jul 25, 2022, 3:36 PM

#

this?

lapis sequoia Jul 25, 2022, 3:36 PM

#

yes
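
What is being asked for here is a plain loop over the list: call the function once per ID instead of passing the whole list. A minimal sketch, assuming `client.get` takes a single ID; `send_to` below is a hypothetical stand-in for it so the snippet runs on its own, and the name `ids` replaces `list` to avoid shadowing the built-in.

```python
ids = [3242423, 2342342, 234242]

sent = []  # records which IDs were contacted, in order

def send_to(target_id):
    """Hypothetical stand-in for client.get(target_id)."""
    sent.append(target_id)
    print(target_id)

# One call per element, rather than one call with the whole list.
for target_id in ids:
    send_to(target_id)
```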

grand canyon Jul 25, 2022, 3:37 PM

#

guys

#

i had a question with pytorch

#

im building a network that classifies breast cancer

cinder matrix Jul 25, 2022, 3:37 PM

#

ksi bein sus

grand canyon Jul 25, 2022, 3:37 PM

#

i keep getting this error: RuntimeError: expected scalar type Float but found Double

cinder matrix Jul 25, 2022, 3:38 PM

#

have you tried existing tutorials

grand canyon Jul 25, 2022, 3:38 PM

#

here's my code: https://github.com/Dodesimo/BinomialIDCClassifer/blob/master/classifier.ipynb

grand canyon Jul 25, 2022, 3:38 PM

#

cinder matrix have you tried existing tutorials

yes

#

this is a specific question with my code though im not sure why i keep getting this error

#

i tried casting my x, y in the trainloader to float

#

yet im still getting error

#

what could be the problem?

timid kiln Jul 25, 2022, 3:51 PM

#

I have two tables in excel. I'm pulling Table B into pandas, dropping a couple columns, reindexing a few columns, and then pasting that at the end of Table A in excel. There are 60 columns. When I reindex Table B in pandas, I only reindexed the columns I wanted to reorder; however, it appears that the rest of the columns I didn't mention were dropped from the df.

Is there a way to use either column numbers, or to tell pandas "hey, reorder these columns and the rest of them can stay the same, after these reordered columns"?

serene scaffold Jul 25, 2022, 4:09 PM

#

timid kiln I have two tables in excel. I'm pulling Table B into pandas, dropping a couple ...

even though you're conceptualizing it as "only reindexing a few columns", you're changing the order of the columns in general, so you need to provide a list of labels that reflects the order you want at the end for all the columns.

#

taking the label of each column, is there some property that distinguishes between the ones you want to promote to the front, and the ones that you don't?

#

(for example, is it "every column that's divisible by 3" or "every column with an underscore")
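
A common way to get "these columns first, everything else in its existing order" is to build the full label list yourself and index with it. A minimal sketch, assuming pandas is available; the column names and the `front` list are invented for illustration and are not the actual forecast headers.

```python
import pandas as pd

# Hypothetical stand-in for Table B.
df = pd.DataFrame({
    "2022-Jul": [10, 20],
    "company": ["A", "B"],
    "2022-Aug": [11, 21],
    "pipeline": ["P1", "P2"],
})

# Columns to promote to the front, in the desired order.
front = ["company", "pipeline"]

# Everything not named in `front` keeps its current relative order.
rest = [col for col in df.columns if col not in front]
reordered = df[front + rest]

print(list(reordered.columns))
```

Because `rest` is computed from `df.columns`, the monthly headers can change from run to run without touching the code.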

mild dirge Jul 25, 2022, 4:16 PM

#

grand canyon yet im still getting error

Maybe your data is np.float64 instead of np.float32?

#

my_arr = my_arr.astype(np.float32)

#

Did you try that?
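
For context on that suggestion: NumPy creates floating-point arrays as `float64` by default, and `torch.from_numpy` keeps the array's dtype, so such data reaches the model as Double while `nn.Linear` weights are Float (`float32`) by default. A minimal NumPy-only sketch of the cast; `my_arr` is a made-up array, not the data from the notebook.

```python
import numpy as np

# NumPy defaults to 64-bit floats for Python float literals.
my_arr = np.array([0.25, 0.5, 0.75])
print(my_arr.dtype)  # float64

# Cast once, before the data is wrapped in tensors.
my_arr32 = my_arr.astype(np.float32)
print(my_arr32.dtype)  # float32
```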

grand canyon Jul 25, 2022, 4:16 PM

#

mild dirge Maybe your data is np.float64 instead of np.float32?

no i did not try that

#

i found another fix

#

but

mild dirge Jul 25, 2022, 4:17 PM

#

Which is?

timid kiln Jul 25, 2022, 4:17 PM

#

serene scaffold (for example, is it "every column that's divisible by 3" or "every column with a...

It is 100% based on the column header names. The first 11 in Table B will always be the same; the subsequent ~50 columns headers will change every month.

These two tables are production forecast tables. So the first few columns are "this is where the production is coming from, the company, the pipeline it's feeding into..." etc. The last columns are monthly production forecasts, so each column has a header of, for example, "2022-Jul", "2022-Aug", and so forth. When I run the update each month, the column headers will change.

Table B is the forecast for potential production. Meaning, some folks are proposing new production. Table A is the base, or existing production.

I'm merging the tables so that we can get a summary of the production forecast for the next X years.

grand canyon Jul 25, 2022, 4:17 PM

#

mild dirge Which is?

LOSS = []
for epoch in range(100):
    for i, (x, y) in enumerate(trainloader):
        # the next two lines are the added casts (bold in the original message)
        x = torch.from_numpy(np.asarray(x)).float()
        y = torch.from_numpy(np.asarray(y)).float()
        yhat = model(x.view(-1, 50 * 50))
        loss = criterion(yhat.flatten(), y)
        LOSS.append(loss)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

#

those two bold lines

#

but

mild dirge Jul 25, 2022, 4:17 PM

#

ah alright

timid kiln Jul 25, 2022, 4:17 PM

#

serene scaffold (for example, is it "every column that's divisible by 3" or "every column with a...

So after the first 11 columns, the headers in Table B and Table A will always match.

grand canyon Jul 25, 2022, 4:17 PM

#

the new problem is that the output is now a 2d tensor w two elements, but my y is a 1d tensor w a single element

#

also

#

my yhat values are now negative

#

Screen_Shot_2022-07-25_at_12.20.04_PM.png

#

@mild dirge

timid kiln Jul 25, 2022, 4:20 PM

#

Interestingly it seems that reindex has similar capabilities of drop. It's just the inverse.

mild dirge Jul 25, 2022, 4:21 PM

#

grand canyon @mild dirge

You understand the meaning of the two vectors?

grand canyon Jul 25, 2022, 4:21 PM

#

is it because

#

my output (d_out) is 2?

lapis sequoia Jul 25, 2022, 4:21 PM

#

https://www.youtube.com/watch?v=rLY7T23EOtQ

YouTube

Super_ElectroGamer YT

Broke Python!!! ||| Pydroid3

How to break Python? (not really) If you want to, you can do it. I think it's really cool. But actually... EVERYTHING WILL BE IN THE PINNED COMMENT


grand canyon Jul 25, 2022, 4:22 PM

#

mild dirge You understand the meaning of the two vectors?

could you explain?

#

when i set my

#

d_out to 1

#

i just get negative values

#

im not sure why though

serene scaffold Jul 25, 2022, 4:22 PM

#

lapis sequoia https://www.youtube.com/watch?v=rLY7T23EOtQ

is this relevant...?

mild dirge Jul 25, 2022, 4:22 PM

#

don't know what d_out is

grand canyon Jul 25, 2022, 4:23 PM

#

mild dirge don't know what d_out is

number of output nodes

#

Screen_Shot_2022-07-25_at_12.23.18_PM.png

#


class Net(nn.Module):
    def __init__(self, D_in, H, D_out):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(D_in, H)
        self.linear2 = nn.Linear(H, D_out)

    def forward(self, x):
        x = torch.relu(self.linear1(x))
        x = self.linear2(x)
        return x

mild dirge Jul 25, 2022, 4:24 PM

#

The output of the model is probably logits, which you need to put through a softmax to get confidence for each class

#

But without the context of loss, you can just take the argmax, and that will be the model's predicted class

lapis sequoia Jul 25, 2022, 4:25 PM

#

serene scaffold is this relevant...?

this schoolboy broke a python

mild dirge Jul 25, 2022, 4:26 PM

#

So instead of your model just outputting the predictions for each sample like:

[
  0,
  1,
  0,
]

it outputs something like:

[
  [0.8, 0.1],
  [0.2, 0.6],
  [0.95, 0.01]
]

#

And you want the position of the largest element in each row

#

So you can convert from the second format to the first
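
A minimal sketch of that conversion in plain Python, using the rows from the message above as if they were the model's raw outputs. The `softmax` and `argmax` helpers are written out here for illustration; in PyTorch the equivalent calls would be `torch.softmax(yhat, dim=1)` and `torch.argmax(yhat, dim=1)`.

```python
import math

def softmax(row):
    """Turn raw scores into values that are positive and sum to 1."""
    shifted = [v - max(row) for v in row]  # subtract max for numerical stability
    exps = [math.exp(v) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(row):
    """Index of the largest element."""
    return max(range(len(row)), key=row.__getitem__)

outputs = [
    [0.8, 0.1],
    [0.2, 0.6],
    [0.95, 0.01],
]

confidences = [softmax(row) for row in outputs]
predictions = [argmax(row) for row in outputs]
print(predictions)  # [0, 1, 0]
```

Softmax does not change which element is largest, so taking the argmax of the raw outputs or of the confidences gives the same predicted class.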

serene scaffold Jul 25, 2022, 4:27 PM

#

lapis sequoia this schoolboy broke a python

This is the data science channel. Please make sure that all your messages in our server are on-topic.

lapis sequoia Jul 25, 2022, 4:27 PM

#

serene scaffold This is the data science channel. Please make sure that all your messages in our...

sry

thick marlin Jul 25, 2022, 5:23 PM

#

Is this the right place to ask if there is a pretrained model available? and if so where can I search for it?

steep cypress Jul 25, 2022, 5:26 PM

#

thick marlin Is this the right place to ask if there is a pretrained model available? and if ...

you can try tensorflow hub, torchvision models, timm or huggingface

thick marlin Jul 25, 2022, 5:30 PM

#

steep cypress you can try tensorflow hub, torchvision models, timm or huggingface

I was wondering if Imaginaire's SPADE has a pretrained or fine tuned model on bdd dataset.

mint palm Jul 25, 2022, 5:35 PM

#

CVPR authors literally assume the reader has already worked with them on their projects; the way they explain things is garbage, maybe not meant for someone new to the field.

#

how do i overcome this

bold timber Jul 25, 2022, 5:49 PM

#

Hi guys, I want to make sure for my understanding. What is the number of hidden layer? Whether in this case I have 3 hidden layer?

mild dirge Jul 25, 2022, 5:51 PM

#

bold timber Hi guys, I want to make sure for my understanding. What is the number of hidden ...

Try and draw it out like this @bold timber

#

Then you just count the amount of layers that are not the input or the output

bold timber Jul 25, 2022, 5:54 PM

#

mild dirge Then you just count the amount of layers that are not the input or the output

But what do you think about my question? how many hidden layer that I have?

mild dirge Jul 25, 2022, 5:55 PM

#

mild dirge Try and draw it out like this @bold timber

^

#

I'm saying this so you can check it for yourself, that way you know for sure you understand it

#

You start with 4 nodes (the input) and then try draw it out

bold timber Jul 25, 2022, 5:56 PM

#

mild dirge I'm saying this so you can check it for yourself, that way you know for sure you...

That means I have 3 hidden layers, right?

gleaming osprey Jul 25, 2022, 6:12 PM

#

I'm stuck on 70% accuracy, what can I do?

#

```py
model = Sequential()

model.add(Conv2D(8, 2, activation='relu', input_shape=(48, 48, 1)))
model.add(BatchNormalization())
model.add(Dropout(0.2))
model.add(MaxPooling2D(2))

model.add(Conv2D(16, 2, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.2))
model.add(MaxPooling2D(2))

model.add(Conv2D(32, 2, activation='relu', kernel_regularizer = keras.regularizers.l2(0.001)))
model.add(BatchNormalization())
model.add(MaxPooling2D(2))

model.add(Conv2D(64, 2, activation='relu', kernel_regularizer = keras.regularizers.l2(0.001)))
model.add(BatchNormalization())
model.add(Conv2D(128, 2, activation='relu', kernel_regularizer = keras.regularizers.l2(0.0005)))
model.add(Dropout(0.4))
model.add(BatchNormalization())
model.add(Conv2D(128, 2, activation='relu', kernel_regularizer = keras.regularizers.l2(0.0005)))
model.add(Dropout(0.4))
model.add(BatchNormalization())

model.add(Flatten())

model.add(Dense(512, activation='relu', kernel_regularizer = keras.regularizers.l2(0.0005)))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(256, activation='relu', kernel_regularizer = keras.regularizers.l2(0.005)))
model.add(BatchNormalization())
model.add(Dropout(0.4))
model.add(Dense(128, activation='relu', kernel_regularizer = keras.regularizers.l2(0.005)))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu', kernel_regularizer = keras.regularizers.l2(0.0001)))
model.add(BatchNormalization())
model.add(Dropout(0.3))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))

model.add(Dense(7, activation='sigmoid'))
```
Thats my model

steady basalt Jul 25, 2022, 6:32 PM

#

depends wats the task

mild dirge Jul 25, 2022, 6:34 PM

#

It also looks like you only use 2 filters for each convolution

#

Normally the size of the filters decreases the further in the model, and the amount of filters increases

#

@gleaming osprey

gleaming osprey Jul 25, 2022, 6:35 PM

#

mild dirge It also looks like you only use 2 filters for each convolution

ok, so what should it be ideally

#

and should I use pooling layers?

mild dirge Jul 25, 2022, 6:36 PM

#

How big are your input images?

#

Oh actually nvm, the order is different for keras layers from pytorch

#

You do increase the amount of filters

#

It's also weird that your kernel size is 2, while it is rarely even

wooden sail Jul 25, 2022, 6:41 PM

#

i've been suggesting for days to use a larger kernel (and fewer conv layers). maybe they listen to you

mild dirge Jul 25, 2022, 6:41 PM

#

Well? @gleaming osprey

gleaming osprey Jul 25, 2022, 6:42 PM

#

oh sorry

gleaming osprey Jul 25, 2022, 6:42 PM

#

wooden sail i've been suggesting for days to use a larger kernel (and fewer conv layers). ma...

srry

gleaming osprey Jul 25, 2022, 6:42 PM

#

mild dirge How big are your input images?

um 48x48

limber sage Jul 25, 2022, 6:42 PM

#

heyy anyone here with experience in nltk?

#

natural language processing

mild dirge Jul 25, 2022, 6:42 PM

#

mild dirge It's also weird that your kernel size is 2, while it is rarely even

Yeah but this @gleaming osprey

gleaming osprey Jul 25, 2022, 6:43 PM

#

mild dirge Yeah but this @gleaming osprey

i dont know what are good kernel size

#

i just know what they are

wooden sail Jul 25, 2022, 6:45 PM

#

try some nice odd numbers. 3, 5, maybe even 7. 7 is already huge for that image size

gleaming osprey Jul 25, 2022, 6:45 PM

#

wooden sail try some nice odd numbers. 3, 5, maybe even 7. 7 is already huge for that image ...

ok

mild dirge Jul 25, 2022, 6:45 PM

#

Yeah 3 is enough probably

#

Since you maxpool twice as well

gleaming osprey Jul 25, 2022, 6:45 PM

#

mild dirge Yeah 3 is enough probably

should I use 3 for all?

mild dirge Jul 25, 2022, 6:46 PM

#

Well I would use odd numbers for kernels

#

It's a lot simpler to visualize the convolution as well

gleaming osprey Jul 25, 2022, 6:46 PM

#

mild dirge It's a lot simpler to visualize the convolution as well

idk how to

mild dirge Jul 25, 2022, 6:46 PM

#

Knowing that the output of each layer depends on neighbouring pixels in each direction

gleaming osprey Jul 25, 2022, 6:46 PM

#

here is my new model:```py
model = Sequential()

model.add(Conv2D(8, 5, activation='relu', input_shape=(48, 48, 1)))
model.add(BatchNormalization())
model.add(Dropout(0.2))
model.add(MaxPooling2D(2))

model.add(Conv2D(16, 5, activation='relu'))
model.add(Dropout(0.2))

model.add(Conv2D(32, 3, activation='relu', kernel_regularizer = keras.regularizers.l2(0.001)))

model.add(Conv2D(64, 3, activation='relu', kernel_regularizer = keras.regularizers.l2(0.001)))
model.add(Dropout(0.4))
model.add(Conv2D(128, 3, activation='relu', kernel_regularizer = keras.regularizers.l2(0.0005)))
model.add(Dropout(0.4))
model.add(Conv2D(128, 3, activation='relu', kernel_regularizer = keras.regularizers.l2(0.0005)))
model.add(Dropout(0.4))
model.add(MaxPooling2D(2))

model.add(Flatten())

model.add(Dense(512, activation='relu', kernel_regularizer = keras.regularizers.l2(0.0005)))
model.add(Dropout(0.6))
model.add(Dense(256, activation='relu', kernel_regularizer = keras.regularizers.l2(0.005)))
model.add(Dropout(0.4))
model.add(Dense(128, activation='relu', kernel_regularizer = keras.regularizers.l2(0.005)))
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu', kernel_regularizer = keras.regularizers.l2(0.0001)))
model.add(Dropout(0.3))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))

model.add(Dense(7, activation='softmax'))```

mortal warren Jul 25, 2022, 6:47 PM

#

Hi, I'm trying to concatenate multiple dataframes together using final_df = pd.concat([final_df, df], axis=1), but it's creating duplicate indices. How do i get around this?

wooden sail Jul 25, 2022, 6:47 PM

#

those are SO MANY convolution + pooling layers

#

you can probably get away with just 2 or 3

gleaming osprey Jul 25, 2022, 6:47 PM

#

wooden sail you can probably get away with just 2 or 3

ok, how many filters would be ideal?

#

16 and 32?

mild dirge Jul 25, 2022, 6:47 PM

#

That's really hard to say

gleaming osprey Jul 25, 2022, 6:48 PM

#

hm

mild dirge Jul 25, 2022, 6:48 PM

#

Depends on how complex patterns are

gleaming osprey Jul 25, 2022, 6:48 PM

#

they are faces

mild dirge Jul 25, 2022, 6:48 PM

#

How many patterns there are that are useful for predicting the class etc.

wooden sail Jul 25, 2022, 6:48 PM

#

the reason you're needing so much regularization is that you have a humongous amount of parameters for the size of your data set. it's difficult to give a fixed number, but the more params, the more data you need, regardless of how complex the data is

mild dirge Jul 25, 2022, 6:48 PM

#

An important thing to look at is also the "receptive field" of each layer

#

Like for a 3x3 convolution the receptive field is 3x3 pixels

#

But 3x3 convolution followed by 3x3 convolution would give a receptive field of 5x5

#

Followed by maxpool(2x2) would give 6x6 f.e.

#

And if you think the pattern can be found through looking at subsets of 6x6 pixels that is fine

#

If you are interested in very detailed patterns, you could try a lower receptive field (maybe not maxpool f.e.)

#

Make sense? @gleaming osprey
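
The receptive-field bookkeeping can be sketched with the usual recurrence: each layer adds `(kernel - 1) * jump` pixels, where `jump` is the spacing in input pixels between neighbouring outputs and is multiplied by each layer's stride. This is a minimal sketch; `receptive_field` and the layer tuples are illustrative names, and padding and dilation are ignored.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, applied in order.

    Returns the side length, in input pixels, seen by one output unit.
    """
    field, jump = 1, 1
    for kernel, stride in layers:
        field += (kernel - 1) * jump
        jump *= stride
    return field

conv3 = (3, 1)  # 3x3 convolution, stride 1
pool2 = (2, 2)  # 2x2 max pooling, stride 2

print(receptive_field([conv3]))                       # 3
print(receptive_field([conv3, conv3]))                # 5
print(receptive_field([conv3, conv3, pool2]))         # 6
print(receptive_field([conv3, conv3, pool2, conv3]))  # 10
```

The pooling layer itself adds only one pixel, but it doubles the jump, so every convolution placed after it widens the field twice as fast.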

gleaming osprey Jul 25, 2022, 6:52 PM

#

hmm, I need to do something now, I'll try this later

wooden sail Jul 25, 2022, 6:52 PM

#

you'll notice that your convolutions boil down to having single pixel outputs, and then you still keep doing convolutions. from the receptive field standpoint, this means you have a bunch of filters that just take in the entire image. at that point you have made makeshift fully connected layers and you may as well just use that instead

#

like using a fat matrix as a transformation to embed the images in a higher dim space. that's not very useful for the task you're dealing with, but might be useful if you wanted to find "similar images" instead

bold timber Jul 25, 2022, 7:02 PM

#

bold timber Hi guys, I want to make sure for my understanding. What is the number of hidden ...

When I try to draw it out, I think I have two hidden layers and 1 output layer. That is true?

Please correct me if I'm wrong, Sir. @mild dirge

mild dirge Jul 25, 2022, 7:02 PM

#

Show your drawing 😄

mild dirge Jul 25, 2022, 7:03 PM

#

mild dirge Try and draw it out like this @bold timber

This drawing has 3 hidden layers f.e.

bold timber Jul 25, 2022, 7:07 PM

#

mild dirge Show your drawing 😄

like this?

mild dirge Jul 25, 2022, 7:08 PM

#

Jup seems correct

bold timber Jul 25, 2022, 7:09 PM

#

mild dirge Jup seems correct

I assume the last nn.Linear(4,3) is the output layer? Is it correct?

mild dirge Jul 25, 2022, 7:10 PM

#

With PyTorch you just add the connection from one layer to the next

#

So nn.Linear(4,3) Means you have fully connected weights between 4 and 3 nodes
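
The counting rule can be sketched from the layer shapes alone. The original model was shared as an image, so the sizes below are hypothetical, chosen only to match what is stated in the conversation (4 input nodes and a final `nn.Linear(4, 3)`). Each `(in, out)` pair stands for one `nn.Linear(in, out)`; every pair except the last produces a hidden layer.

```python
# Hypothetical stack of nn.Linear(in_features, out_features) shapes.
linear_shapes = [(4, 5), (5, 4), (4, 3)]

# Node counts per layer: the input, then the output size of each Linear.
layer_sizes = [linear_shapes[0][0]] + [out for _, out in linear_shapes]

# Each Linear must consume what the previous one produced.
for (_, prev_out), (next_in, _) in zip(linear_shapes, linear_shapes[1:]):
    assert prev_out == next_in, "layer sizes do not chain"

hidden_layers = layer_sizes[1:-1]  # everything that is not input or output
print(layer_sizes)         # [4, 5, 4, 3]
print(len(hidden_layers))  # 2
```

Under these assumed sizes, three `nn.Linear` modules give two hidden layers plus one output layer.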

bold timber Jul 25, 2022, 7:12 PM

#

bold timber Hi guys, I want to make sure for my understanding. What is the number of hidden ...

Does it means in my case I have 2 hidden layers with 1 output layer, right?

#

@mild dirge

dawn dune Jul 25, 2022, 8:22 PM

#

Hi, is there a way in which I can allow VSCode to acknowledge that CUDA is available to avoid this error? : RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU. My apologies for posting this multiple times but I only just found this channel

mild dirge Jul 25, 2022, 8:40 PM

#

Not much to do with VSCode I'd think

#

Did you install pytorch with cuda according to the website?

#

https://pytorch.org/get-started/locally/

PyTorch

#

@dawn dune
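
Separately from getting CUDA detected, the error message itself points at a fallback: pass `map_location` to `torch.load` so the checkpoint loads on whatever device is available. A minimal sketch, assuming PyTorch is installed; the checkpoint written here is a throwaway file created only so the snippet runs, not the TTS model from the question.

```python
import os
import tempfile

import torch

# Use the GPU when PyTorch can see one, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pt")  # hypothetical file name
    torch.save({"weights": torch.ones(3)}, path)

    # map_location remaps stored tensors onto the chosen device at load time.
    checkpoint = torch.load(path, map_location=device)

print(checkpoint["weights"].device)
```

If `torch.cuda.is_available()` prints `False` inside the interpreter that VSCode has selected, the editor is most likely pointing at a different environment from the one where the CUDA build was installed.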

lapis sequoia Jul 25, 2022, 8:43 PM

#

i have started doing data science recently , i have covered python basics and numpy , pandas , matplotlib ... my laptop lags as its ram get utilized 95% with chrome tabs( youtube from where i learn ) and vs code , i have i5 7 gen , integrated gpu , 8gb ram , 256ssd + 2tb hdd , i am thinking to buy a new laptop , can u recommend me what minimum specification should my new laptop has ... for data science ..... mainly processor and amount of ram and gpu

mild dirge Jul 25, 2022, 8:44 PM

#

If you are planning to buy a laptop you're better off using Google Colab or something

#

If you want to run bigger models a desktop will be more appropriate

dawn dune Jul 25, 2022, 8:44 PM

#

mild dirge @dawn dune

Sorry hi yes I have, I'm running a TTS model in preparation for a final year project

mild dirge Jul 25, 2022, 8:45 PM

#

So you put in a line like pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113

#

You didn't just do `pip install torch`?

lapis sequoia Jul 25, 2022, 8:46 PM

#

mild dirge If you want to run bigger models a desktop will be more appropriate

i don't have any idea about big models right now , i am in college and willing to learn complete data science to get some job , i already have 1 laptop , so according to that situation should i get a desktop instead of laptop

dawn dune Jul 25, 2022, 8:46 PM

#

I run the demo doc on my conda pycharm env but no audio is produced like in the colab demo, however when I made my own script in VSCode it complains about cuda not being available.

mild dirge Jul 25, 2022, 8:47 PM

#

lapis sequoia i don't have any idea about big models right now , i am in college and willing t...

If you want to run it locally yes

#

If you are fine with needing an internet connection, then using Google Colab or other cloud computing services could be fine

dawn dune Jul 25, 2022, 8:48 PM

#

mild dirge You didn't just do `pip install torch`?

I used : conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge

lapis sequoia Jul 25, 2022, 8:48 PM

#

mild dirge If you want to run it locally yes

ohk , thanks

dawn dune Jul 25, 2022, 8:48 PM

#

Would I not be able to run a model(Tacotron 1 & 2) locally on my laptop with a gtx 1060 and 16gb ram?

mild dirge Jul 25, 2022, 8:49 PM

#

Don't know what tacotron is, you can still run some decently sized models, as long as you have enough ram

dawn dune Jul 25, 2022, 8:49 PM

#

It's a TTS model

mild dirge Jul 25, 2022, 8:49 PM

#

But with a laptop with a somewhat oldish gpu, it might take 10 hours instead of 1 or 2 with rtx 3000 series f.e.

dawn dune Jul 25, 2022, 8:50 PM

#

and how long would doing it on colab take?

mild dirge Jul 25, 2022, 8:50 PM

#

Probably less, Colab lends out quite a lot of computing imo

dawn dune Jul 25, 2022, 8:50 PM

#

roughly estimating 🙂

#

I see

#

thanks

mild dirge Jul 25, 2022, 8:51 PM

#

I would just use Colab on your laptop as long as you think it's good enough

#

You can also subscribe for more benefits etc. but I don't know how worth that all is

dawn dune Jul 25, 2022, 8:52 PM

#

If only the uni gave me funding for that 🤣

mild dirge Jul 25, 2022, 8:53 PM

#

Our uni has a big computer you can use for running models

#

But it's quite a hassle I think

dawn dune Jul 25, 2022, 8:53 PM

#

Okay but back to my original point, would you have any idea how to add cuda to VSC

mild dirge Jul 25, 2022, 8:53 PM

#

Doesn't have much to do with your IDE I'd think

#

But normally just doing that line works

#

Maybe you installed it in a virtual environment, or maybe you installed it globally but not in your venv

dawn dune Jul 25, 2022, 8:54 PM

#

okay let me try

#

I installed it in a conda env that I called pytorch

mild dirge Jul 25, 2022, 8:54 PM

#

I would generally never name anything after a package name

dawn dune Jul 25, 2022, 8:55 PM

#

that was just an example but fair

#

Is there a way to simply rename it without starting from scratch?

red timber Jul 26, 2022, 12:47 AM

#

Accountability post:
Today I continued to build the database for a passion project. Using the Reddit API, and continuing to work on text preprocessing techniques.

last peak Jul 26, 2022, 3:11 AM

#

hi, help pandas :'(

#

how to get unique rows in dataframe, not series?

#

full_df.loc[:,['MONTH','YEAR']] , I want rows of unique month and year

#

oh drop duplicates silly me

#

next question, how do I cross join in pandas

#

okay nevermind found it.. but why use merge vs join?

tacit basin Jul 26, 2022, 3:17 AM

#

last peak okay nevermind found it.. but why use merge vs join?

This is a nice overview of merge, join, concatenate https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
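
A minimal sketch of the two operations asked about above, assuming a toy `full_df` with `MONTH` and `YEAR` columns (the data and the `products` frame are made up for illustration). `drop_duplicates` keeps the result a DataFrame, and `merge(how="cross")`, available in pandas 1.2 and later, builds the cross join:

```python
import pandas as pd

# Hypothetical data standing in for full_df from the question.
full_df = pd.DataFrame({
    "MONTH": [1, 1, 2, 2, 2],
    "YEAR": [2021, 2021, 2021, 2022, 2022],
    "SALES": [10, 20, 30, 40, 50],
})

# Unique (MONTH, YEAR) rows, still a DataFrame rather than a Series.
periods = full_df.loc[:, ["MONTH", "YEAR"]].drop_duplicates().reset_index(drop=True)

# Hypothetical second table to pair with every period.
products = pd.DataFrame({"PRODUCT": ["a", "b"]})

# Cross join: every period combined with every product.
grid = periods.merge(products, how="cross")
print(grid)
```

On merge vs join: by default `merge` matches on shared columns while `join` matches on the index, so the choice mostly depends on where the keys live.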

grand canyon Jul 26, 2022, 3:17 AM

#

i had a question about training and validation in pytorch

#

i have the following training function:

last peak Jul 26, 2022, 3:17 AM

#

tacit basin This is nice overview of merge, join, concatenate https://pandas.pydata.org/pand...

ty!

grand canyon Jul 26, 2022, 3:18 AM

#

def train(epochs, trainloader, model, criterion, optimizer, validloader):
    TRAININGLOSS = []
    VALIDLOSS = []
    VALIDACCURACY = []
    correct = 0
    for epoch in range(epochs):
        for x, y in trainloader:
            x = torch.from_numpy(np.asarray(x)).float()
            y = torch.from_numpy(np.asarray(y)).float()
            yhat = model(x.view(-1, 288 * 96))
            predictedvalue = torch.max(yhat, dim=1)
            loss = criterion(predictedvalue[0], y)
            print("Training Loss: ", loss)
            TRAININGLOSS.append(loss)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        for x, y in validloader:
            x = torch.from_numpy(np.asarray(x)).float()
            y = torch.from_numpy(np.asarray(y)).float()
            yhat = model(x.view(-1, 288 * 96))
            predictedvalue = torch.max(yhat, dim=1)
            validloss = criterion(predictedvalue[0], y)
            print("Validation Loss: ", validloss)
            VALIDLOSS.append(validloss)
            label = torch.argmax(yhat)
            correct += (label == y).sum().item()
            accuracy = 100 * (correct / len(valid_dataset))
            print("Percent Accuracy:", accuracy)
            VALIDACCURACY.append(accuracy)

#

sometimes though

#

the accuracy is over a 100 percent which doesn't make sense

#

and for three epochs of training the cost function remains at around 50

#

do you guys have any suggestions on improving the training function so that the cost function goes down and the accuracy is calculated correctly

last peak Jul 26, 2022, 3:28 AM

#

correct += (label == y).sum().item()
accuracy = 100 * (correct / len(valid_dataset))

something wrong with this logic if you got >100%

#

why do you have sum there when you just want a 1 or 0 there for correct

grand canyon Jul 26, 2022, 3:33 AM

#

last peak why do you have sum there when you just want a 1 or 0 there for correct

thats a good point

#

i didn't realize that

grand canyon Jul 26, 2022, 3:34 AM

#

last peak why do you have sum there when you just want a 1 or 0 there for correct

for x, y in validloader:
    x = torch.from_numpy(np.asarray(x)).float()
    y = torch.from_numpy(np.asarray(y)).float()
    yhat = model(x.view(-1, 288 * 96))
    predictedvalue = torch.max(yhat, dim=1)
    validloss = criterion(predictedvalue[0], y)
    print("Validation Loss: ", validloss)
    VALIDLOSS.append(validloss)
    label = torch.argmax(yhat)
    if label == y:
        correct += 1
    accuracy = 100 * (correct / len(valid_dataset))
    print("Percent Accuracy:", accuracy)
    VALIDACCURACY.append(accuracy)

#

i just added an increment to the correct
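
One plausible reading of the >100% accuracy: in the posted `train` function `correct` is set to 0 once, before the epoch loop, so it keeps growing across epochs while the denominator stays fixed. Separately, `torch.argmax(yhat)` without `dim` returns a single index into the flattened batch rather than one class per sample. A hedged sketch of a validation pass that avoids both, assuming integer class labels (the `evaluate` helper and the toy batches are hypothetical):

```python
import torch
from torch import nn

def evaluate(model, loader):
    """Return accuracy in percent over one full pass of `loader`."""
    model.eval()
    correct = 0  # reset on every call, so nothing carries over between epochs
    total = 0
    with torch.no_grad():  # no gradients needed for validation
        for x, y in loader:
            scores = model(x.view(x.size(0), -1))
            predicted = torch.argmax(scores, dim=1)  # one class per sample
            correct += (predicted == y).sum().item()
            total += y.size(0)
    return 100.0 * correct / total

# Toy check: nn.Identity returns its input, so the inputs act as the scores.
batches = [
    (torch.tensor([[1.0, 0.0], [0.0, 1.0]]), torch.tensor([0, 1])),
    (torch.tensor([[1.0, 0.0]]), torch.tensor([1])),
]
accuracy = evaluate(nn.Identity(), batches)
print(f"{accuracy:.1f}%")
```

With batched data, keeping `.sum()` is what counts every correct sample in the batch, and the percentage is computed once after the loop.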

last peak Jul 26, 2022, 3:34 AM

#

ok cool so what kind of loss functions are you currently using?

grand canyon Jul 26, 2022, 3:34 AM

#

last peak ok cool so what kind of loss functions are you currently using?

so currently im using a binary cross entropy

last peak Jul 26, 2022, 3:34 AM

#

what kind of data is in x

grand canyon Jul 26, 2022, 3:35 AM

#

let me just send you

#

a github repository one second

last peak Jul 26, 2022, 3:35 AM

#

ok not sure how much I can help.. as I am new to ML :)

grand canyon Jul 26, 2022, 3:35 AM

#

np any help is appreciated

#

class Net(nn.Module):
    def __init__(self, D_in, H, D_out):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(D_in, H)
        self.linear2 = nn.Linear(H, D_out)

    def forward(self, x):
        x = torch.relu(self.linear1(x))
        x = torch.sigmoid(self.linear2(x))
        return x
#%%
model = Net(27648, 100, 2)
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

#

so i defined my network

#

and im using a bceloss function

#

and im using an adam optimizer

#

i could try incrementing the learning rate

last peak Jul 26, 2022, 3:37 AM

#

if you using bceloss then you only have 2 classes?

grand canyon Jul 26, 2022, 3:37 AM

#

yeah that's why my output

#

is 2

last peak Jul 26, 2022, 3:37 AM

#

ah got it

#

so this is only 1 hidden layer NN?

grand canyon Jul 26, 2022, 3:38 AM

#

yep

last peak Jul 26, 2022, 3:39 AM

#

isnt that going to underfit

#

when u check your bias variance curve, do your test and validation data have similar but high error?

grand canyon Jul 26, 2022, 3:40 AM

#

i didn't have the opportunity to do that

#

because i was thinking of ways to make my training

#

better

last peak Jul 26, 2022, 3:40 AM

#

oh why dont you just try adding some extra layers for now

grand canyon Jul 26, 2022, 3:40 AM

#

last peak oh why dont you just try adding some extra layers for now

if i do that i get a matrix multiplication error

#

if i add another layer

last peak Jul 26, 2022, 3:41 AM

#

oh maybe your D_out is wrong

grand canyon Jul 26, 2022, 3:41 AM

#

how would that be possible then because its binary classification

last peak Jul 26, 2022, 3:42 AM

#

i mean when you are defining your hidden layer 2

grand canyon Jul 26, 2022, 3:42 AM

#

oh ok

#

alright i can tinker with it

last peak Jul 26, 2022, 3:42 AM

#

why would it cause matrix multiplication error, i thought you can just stack as many hidden layers as you want

grand canyon Jul 26, 2022, 3:42 AM

#

let me show you one second

last peak Jul 26, 2022, 3:43 AM

#

also by the way why even have 2 layers for output

#

if its just 2 classes, cant you just have 1 layer

grand canyon Jul 26, 2022, 3:43 AM

#

you mean node?

last peak Jul 26, 2022, 3:43 AM

#

yeah

grand canyon Jul 26, 2022, 3:43 AM

#

yeah and if its beyond a certain threshold

#

its one class

#

but i feel like having two separate nodes is best practice

last peak Jul 26, 2022, 3:44 AM

#

its just that i always see them use softmax if its more than 1 output node

#

is there any difference if you try a softmax and sparse cross entropy loss function instead

grand canyon Jul 26, 2022, 3:45 AM

#

ok

#

i can try that

#

so i can create an additional hidden layer, and on the last part of the forward ill add a softmax
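
A sketch of what that change might look like, assuming integer class labels (0 or 1). The second hidden size `H2` and the value 50 are hypothetical. Each layer's input size has to equal the previous layer's output size, and a mismatch there is a common cause of a matrix multiplication shape error. In PyTorch, `nn.CrossEntropyLoss` expects raw scores and applies log-softmax itself, so this sketch returns logits from `forward` and only applies softmax when probabilities are wanted:

```python
import torch
from torch import nn

class Net(nn.Module):
    def __init__(self, D_in, H1, H2, D_out):
        super().__init__()
        self.linear1 = nn.Linear(D_in, H1)
        self.linear2 = nn.Linear(H1, H2)   # in_features must match H1
        self.linear3 = nn.Linear(H2, D_out)  # in_features must match H2

    def forward(self, x):
        x = torch.relu(self.linear1(x))
        x = torch.relu(self.linear2(x))
        return self.linear3(x)  # raw logits, no sigmoid or softmax here

model = Net(288 * 96, 100, 50, 2)
criterion = nn.CrossEntropyLoss()

# Hypothetical batch of four flattened inputs with integer labels.
x = torch.randn(4, 288 * 96)
y = torch.tensor([0, 1, 1, 0])

logits = model(x)
loss = criterion(logits, y)  # takes logits and class indices directly
probabilities = torch.softmax(logits, dim=1)  # only for inspection
print(loss.item(), probabilities.shape)
```

The loss here receives the full `(batch, 2)` logits, not the result of `torch.max`, so gradients reach both output nodes.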

last peak Jul 26, 2022, 3:45 AM

#

ok cool, I will add you :) lets practice ML together!

lapis sequoia Jul 26, 2022, 4:24 AM

#

Yes support is for cuda but i have rx560 opencl, any other way to make it faster?

tacit basin Jul 26, 2022, 4:28 AM

#

lapis sequoia Yes support is for cuda but i have rx560 opencl, any other way to make it faster?

Opencl is not cuda right?

#

You can try free colab, paperspace, kaggle or sagemaker studio lab for free GPU

lapis sequoia Jul 26, 2022, 4:30 AM

#

tacit basin You can try free colab, paperspace, kaggle or sagemaker studio lab for free GPU

Can I really make it faster with free?

#

@tacit basin gradient or core? In paperspace?

tacit basin Jul 26, 2022, 4:44 AM

#

lapis sequoia @tacit basin gradient or core? In paperspace?

gradient is a notebook-like environment, should be fine.

tacit basin Jul 26, 2022, 4:45 AM

#

lapis sequoia Can I really make it faster with free?

they have free gpu, should be way faster than cpu

#

all have limitations on free tier, like active session time, paperspace is 6 hours i think, then the notebook will stop, but you can start again. not sure how much time it needs....

lapis sequoia Jul 26, 2022, 4:47 AM

#

Never used notebook, wav2lip will work?

tacit basin Jul 26, 2022, 4:47 AM

#

do you need terminal?

#

you can start terminal from jupyter notebook as well

lapis sequoia Jul 26, 2022, 4:48 AM

#

Yes i used wav2lip recently on command prompt

tacit basin Jul 26, 2022, 4:49 AM

#

or you can run command line prompts from within a notebook with !ls for example, but terminal would be better in this case i think

#

once your machine is running select the jupyter icon bottom left of the screenshot

lapis sequoia Jul 26, 2022, 4:50 AM

#

So where should I go?

#

Gradient?

tacit basin Jul 26, 2022, 4:51 AM

#

core is like a proper vm, but not sure about free gpus there, let me check

#

cheapest gpu on core is ~0.50 /hour

lapis sequoia Jul 26, 2022, 4:53 AM

#

Gradient would be better? , Which runtime i should select PyTorch, tensorflow?

tacit basin Jul 26, 2022, 4:54 AM

#

that would depend which framework is used for wav2lip

lapis sequoia Jul 26, 2022, 4:55 AM

#

Error : We are currently out of capacity for the selected VM type. Try again in a few minutes, or select a different instance.

tacit basin Jul 26, 2022, 4:55 AM

#

would colab work? i can see they have a link to colab in the github repo, so should be easier to set up

lapis sequoia Jul 26, 2022, 4:55 AM

#

No, I don't want to use Google, at least not here

tacit basin Jul 26, 2022, 4:59 AM

#

looks like they use tensorflow

noble drum Jul 26, 2022, 11:49 AM

#

In cross_val_score
Why do you only need 2 parameters, xtrain and y train?
Don't you also need xtest and ytest to get the accuracy?

steady basalt Jul 26, 2022, 12:05 PM

#

its cross validation

#

so no
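
To spell that out: `cross_val_score` splits the data it is given into folds itself, fits on all but one fold, and scores on the held-out fold, so no separate test arrays are passed in. A small sketch with scikit-learn, assuming synthetic data from `make_classification` and a logistic regression chosen only for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for xtrain / ytrain.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

model = LogisticRegression(max_iter=1000)

# cv=5: five fits, each scored on the fold that was held out of that fit.
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```

A final held-out test set is still useful afterwards for an estimate that played no part in model selection.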

serene scaffold Jul 26, 2022, 1:25 PM

#

If you assume your whole ecosystem is already in Python, is there any reason to use HDFS over dask?

rancid kelp Jul 26, 2022, 1:30 PM

#

Hey!

```python
>>> torch.__version__
'1.11.0+cpu'
```
what does "cpu" mean here? Does this mean that current version is only cpu compatible?

serene scaffold Jul 26, 2022, 1:30 PM

#

rancid kelp Hey! ```python >>> torch.__version__ '1.11.0+cpu'``` what does "cpu" mean here...

yes

rancid kelp Jul 26, 2022, 1:31 PM

#

serene scaffold yes

okayy thanks! how do i install the gpu version of pytorch, I can't find it.

serene scaffold Jul 26, 2022, 1:31 PM

#

rancid kelp okayy thanks! how do i install the gpu version of pytorch, I can't find it.

what OS

rancid kelp Jul 26, 2022, 1:31 PM

#

serene scaffold what OS

win 11

serene scaffold Jul 26, 2022, 1:31 PM

#

and what CUDA version

rancid kelp Jul 26, 2022, 1:31 PM

#

serene scaffold and what CUDA version

currently have 11.7

#

but it shows False when i run torch.cuda.is_available()

serene scaffold Jul 26, 2022, 1:32 PM

#

try

pip install torch --extra-index-url https://download.pytorch.org/whl/cu117
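
After reinstalling, a quick check like the following can confirm which build is active. This is only a sketch; the printed values depend on the machine and the installed wheel:

```python
import torch

print(torch.__version__)          # a "+cpu" suffix marks a CPU-only build
print(torch.version.cuda)         # None on CPU-only builds
print(torch.cuda.is_available())  # True only with a CUDA build and a usable GPU

# Fall back to the CPU so the same script runs either way.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.ones(2, 2, device=device)
print(x.device)
```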

rancid kelp Jul 26, 2022, 1:33 PM

#

serene scaffold try ```bash pip install torch --extra-index-url https://download.pytorch.org/whl...

thanks! i have the conda env, shouldn't i use conda command?

serene scaffold Jul 26, 2022, 1:34 PM

#

rancid kelp thanks! i have the conda env, shouldn't i use `conda` command?

I don't use conda, but the website says

NOTE: 'conda-forge' channel is required for cudatoolkit 11.6
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge

#

I would encourage you to stop using conda unless you're sure that what you're trying to do can't be done without it.

rancid kelp Jul 26, 2022, 1:35 PM

#

serene scaffold I would encourage you to stop using conda unless you're sure that what you're tr...

Great i'll keep that in mind ^^

median mica Jul 26, 2022, 1:59 PM

#

not sure the best place to ask this, but does anyone have experience with matplotlib

#

it's slow as hell plotting a simple line graph for like 3 points, half a second

dawn dune Jul 26, 2022, 2:29 PM

#

serene scaffold I would encourage you to stop using conda unless you're sure that what you're tr...

Hey, mind elaborating why you say this? Sorry for randomly chiming in

serene scaffold Jul 26, 2022, 2:36 PM

#

dawn dune Hey, mind elaborating why you say this? Sorry for randomly chiming in

Conda became the default assumption for DS/AI people in a time when compiling certain libraries (like numpy) was a potential pain point. but that isn't really the case anymore, so for the most part, it's a needless quirk that makes it harder for DS/AI people to get support from the rest of the Python community.

I work in the AI department of the company, and almost none of us use conda, and the holdouts are being asked to stop using it so that we don't have to deal with licensing. It likely won't even be installed on the next iteration of our high-performance computer.

wooden sail Jul 26, 2022, 2:37 PM

#

not to mention you have to pay for it if you use it at a company

serene scaffold Jul 26, 2022, 2:38 PM

#

only for enterprise use, afaik

wooden sail Jul 26, 2022, 2:38 PM

#

(that being said, i do still use it :P) the commercial versions, yes. for personal use, no

serene scaffold Jul 26, 2022, 2:39 PM

#

also, I'm having an issue with dask.

import dask.bag as db
bag = db.read_text('/home/blah/**/*.json')

there's 5 million JSONs in there. my program keeps getting killed before this statement even finishes

#

and the console just says "Killed", so I don't even get any leads about why

ripe forge Jul 26, 2022, 2:40 PM

#

wooden sail not to mention you have to pay for it if you use it at a company

ish? maybe? no? it's all quite confusing tbh

#

but doesnt it allow you to access other sources like conda forge, and no strings attached?

wooden sail Jul 26, 2022, 2:40 PM

#

where are you submitting the dask jobs to? a compute server/cluster with limited compute time?

serene scaffold Jul 26, 2022, 2:41 PM

#

wooden sail where are you submitting the dask jobs to? a compute server/cluster with limited...

a linux VM where I have sudo. No scheduling.

ripe forge Jul 26, 2022, 2:41 PM

#

it was my understanding the conda itself was free and open source, and the repo access to anaconda was where they wanted to monetize large corps. but frankly, i couldnt wrap my head around it all when i looked at it.

wooden sail Jul 26, 2022, 2:42 PM

#

no sort of hpc scheduler at all?

serene scaffold Jul 26, 2022, 2:42 PM

#

wooden sail no sort of hpc scheduler at all?

nope

ripe forge Jul 26, 2022, 2:43 PM

#

maybe add a loop and read jsons one by one yourself, and i assume you can get print statements to display or something

#

find out which json is erroring out, if any.

serene scaffold Jul 26, 2022, 2:44 PM

#

ripe forge find out which json is erroring out, if any.

there are no Python exceptions

#

just "Killed"

wooden sail Jul 26, 2022, 2:44 PM

#

yeah i was gonna suggest something similar. maybe using a few threads or processes that log their own status and see where they die, and if it's always at the same place

ripe forge Jul 26, 2022, 2:44 PM

#

usual disclaimer, i have no idea how any of this works, just thinking of initial ideas

ripe forge Jul 26, 2022, 2:44 PM

#

serene scaffold there are no Python exceptions

thats fine though, if you control how the jsons are being read yourself, and print as you go, you'd know the last one you managed to read before "killed"

wooden sail Jul 26, 2022, 2:45 PM

#

for starters to see if the behavior is deterministic

#

on multiprocessing using Value and Array, for example, a common issue is that if these synchronous vars get too large, for whatever reason the processes never send the termination signal

#

could be the task even finished successfully but then the dask bag gets killed because it sits idle

ripe forge Jul 26, 2022, 2:48 PM

#

oh that would be funny

#

then even simpler, just add a print after the read line, and see if the print message appears

#

if it does, you know the line worked

wooden sail Jul 26, 2022, 2:50 PM

#

when this happens though, the main becomes deadlocked and gets killed too

#

i think starting by having some periodic logging is a good start

gleaming osprey Jul 26, 2022, 2:53 PM

#

my model validation is stuck at 59% after 3 hrs of training

#

this is my model: ```py
model = Sequential()
model.add(Conv2D(8, 3, padding='same', input_shape=(48, 48, 1), activation='relu'))
model.add(Dropout(0.2))
model.add(BatchNormalization())

model.add(MaxPooling2D(2))

model.add(Conv2D(16, 5, padding='same', activation='relu', kernel_regularizer=keras.regularizers.l2(l=0.1)))
model.add(Dropout(0.25))
model.add(BatchNormalization())

model.add(MaxPooling2D(2))

model.add(Flatten())

model.add(Dense(512, activation='relu', kernel_regularizer=keras.regularizers.l2(l=0.01)))
model.add(Dropout(0.4))
model.add(Dense(256, activation='relu', kernel_regularizer=keras.regularizers.l2(l=0.01)))
model.add(Dropout(0.3))
model.add(Dense(128, activation='relu', kernel_regularizer=keras.regularizers.l2(l=0.1)))
model.add(Dropout(0.2))
model.add(Dense(7, activation='softmax'))
model.summary() ```

mild dirge Jul 26, 2022, 3:28 PM

#

gleaming osprey this is my model: ```py model = Sequential() model.add(Conv2D(8, 3, padding='sam...

Again, only providing your model isn't super helpful here

#

It depends on so much more

gleaming osprey Jul 26, 2022, 3:28 PM

#

mild dirge Again, only providing your model isn't super helpful here

sorry again, the input shape is 48x48

mild dirge Jul 26, 2022, 3:28 PM

#

We can see that from the model 😛

gleaming osprey Jul 26, 2022, 3:28 PM

#

the values can be from 0 - 255

mild dirge Jul 26, 2022, 3:29 PM

#

Your image isn't normalized?

#

Or standardized

gleaming osprey Jul 26, 2022, 3:29 PM

#

no

#

i tried it and the loss just didnt lower

#

tho I think now its because I messed up with the batch normalization

mild dirge Jul 26, 2022, 3:30 PM

#

Right, but that should normally just be step one of processing image data

gleaming osprey Jul 26, 2022, 3:30 PM

#

lemme try it

#

ik

mild dirge Jul 26, 2022, 3:30 PM

#

Batch normalization makes it less of a problem

#

But it should still just be common practice
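
For reference, a minimal sketch of the two options mentioned (normalizing vs standardizing), assuming 48x48 grayscale images stored as `uint8` with values 0 to 255. The random batch is a hypothetical stand-in for the real data:

```python
import numpy as np

# Hypothetical batch of four 48x48 grayscale images.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 48, 48, 1), dtype=np.uint8)

# Normalization: rescale pixel values to the range [0, 1].
normalized = images.astype("float32") / 255.0

# Standardization: zero mean and unit variance.
# In practice the mean and std would come from the training set only.
mean = normalized.mean()
std = normalized.std()
standardized = (normalized - mean) / std

print(normalized.min(), normalized.max())
```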

gleaming osprey Jul 26, 2022, 3:30 PM

#

i forgot

#

sorry

mild dirge Jul 26, 2022, 3:30 PM

#

And you are classifying faces you said before

#

What are you classifying them on?

wooden sail Jul 26, 2022, 3:31 PM

#

iirc the kind of expression they make, like disgust or happiness?

gleaming osprey Jul 26, 2022, 3:31 PM

#

mild dirge And you are classifying faces you said before

yes

gleaming osprey Jul 26, 2022, 3:31 PM

#

mild dirge What are you classifying them on?

28709 input faces

#

i wanted to do data augmentation and was planning to do it in a while

#

here is a sample input

Screenshot_2022-07-17_at_12.20.20_p.m..png

mild dirge Jul 26, 2022, 3:32 PM

#

Seems like plenty of faces to train on, might not need that much data augmentation

mild dirge Jul 26, 2022, 3:32 PM

#

gleaming osprey here is a sample input

Are they all perfectly centered?

gleaming osprey Jul 26, 2022, 3:32 PM

#

i dont know

mild dirge Jul 26, 2022, 3:33 PM

#

Again, looking at your data, very important step of the process

gleaming osprey Jul 26, 2022, 3:33 PM

#

like, if you took a bounding box from opencv, like that

gleaming osprey Jul 26, 2022, 3:33 PM

#

mild dirge Again, looking at your data, very important step of the process

i did

wooden sail Jul 26, 2022, 3:33 PM

#

even if they're centered though, you could mirror them left-right to get more data

mild dirge Jul 26, 2022, 3:33 PM

#

Then you should know if the faces are centered

gleaming osprey Jul 26, 2022, 3:33 PM

#

gleaming osprey like, if you took a bounding box from opencv, like that

they are like that

mild dirge Jul 26, 2022, 3:33 PM

#

Maybe the bounding boxes weren't always correct

gleaming osprey Jul 26, 2022, 3:33 PM

#

not always

mild dirge Jul 26, 2022, 3:33 PM

#

Did you crop the faces yourself (automatically)?

gleaming osprey Jul 26, 2022, 3:34 PM

#

no

#

i am using the fer2013 dataset

wooden sail Jul 26, 2022, 3:35 PM

#

you can start by reading its description and docs, then

gleaming osprey Jul 26, 2022, 3:35 PM

#

https://www.kaggle.com/datasets/msambare/fer2013

mild dirge Jul 26, 2022, 3:35 PM

#

Yeah, it should tell something about the quality of the data

gleaming osprey Jul 26, 2022, 3:35 PM

#

wooden sail you can start by reading its description and docs, then

i did?

wooden sail Jul 26, 2022, 3:35 PM

#

so are they all centered and cropped correctly?

gleaming osprey Jul 26, 2022, 3:35 PM

#

so that the face is more or less centred and occupies about the same amount of space in each image.

mild dirge Jul 26, 2022, 3:36 PM

#

And looking at the data, the immediate thing I see is class imbalance, how did you take care of that?

wooden sail Jul 26, 2022, 3:36 PM

#

aight

gleaming osprey Jul 26, 2022, 3:36 PM

#

mild dirge And looking at the data, the immediate thing I see is class imbalance, how did y...

um i didnt

mild dirge Jul 26, 2022, 3:36 PM

#

How many times does your model guess disgust?

gleaming osprey Jul 26, 2022, 3:36 PM

#

im being honest, I didnt really do much with the data other than to simply get it to work

gleaming osprey Jul 26, 2022, 3:36 PM

#

mild dirge How many times does your model guess disgust?

not that often

mild dirge Jul 26, 2022, 3:37 PM

#

gleaming osprey my model validation is stuck at 59% after 3 hrs of training

And what is this measure? accuracy/f1/precision etc?

gleaming osprey Jul 26, 2022, 3:37 PM

#

mild dirge And what is this measure? accuracy/f1/precision etc?

?

#

its clearly overfitting

wooden sail Jul 26, 2022, 3:37 PM

#

can you give a count of how many times each class occurs in the training data?

mild dirge Jul 26, 2022, 3:37 PM

#

What do you mean with model validation of 59%?

gleaming osprey Jul 26, 2022, 3:37 PM

#

mild dirge What do you mean with model validation of 59%?

the validation/evaluation (cuz they were about the same) are 59%

mild dirge Jul 26, 2022, 3:38 PM

#

validation is not a measure of performance

gleaming osprey Jul 26, 2022, 3:38 PM

#

while the accuracy while training is 96%

#

oh

mild dirge Jul 26, 2022, 3:38 PM

#

so accuracy?

gleaming osprey Jul 26, 2022, 3:38 PM

#

evaluation accuracy is 57-59%

gleaming osprey Jul 26, 2022, 3:38 PM

#

mild dirge validation is not a measure of performance

validation accuracy, sorry

mild dirge Jul 26, 2022, 3:38 PM

#

Alright, that is also quite a big choice, what performance measure you use

gleaming osprey Jul 26, 2022, 3:39 PM

#

['accuracy']

mild dirge Jul 26, 2022, 3:39 PM

#

Accuracy is not the most common way to measure performance with imbalanced data

gleaming osprey Jul 26, 2022, 3:39 PM

#

optimizer is Adam

mild dirge Jul 26, 2022, 3:39 PM

#

Imo the first step would be taking a look at your data processing pipeline, making sure the data is good to use for training, and only then worrying about your model

gleaming osprey Jul 26, 2022, 3:39 PM

#

mild dirge Accuracy is not the most common way to measure performance with imbalanced data

would you suggest that I invest some time into data augmentation and balancing the data

mild dirge Jul 26, 2022, 3:39 PM

#

Because right now, you load the image, instantly feed to model

gleaming osprey Jul 26, 2022, 3:40 PM

#

mild dirge Imo the first step would be taking a look at your data processing pipeline, maki...

yes, i admit, i rushed to the model

mild dirge Jul 26, 2022, 3:40 PM

#

And balancing the data could be done with data augmentation, but you have a lot of imbalance

#

There are 17 times more samples of happy than disgust f.e.

gleaming osprey Jul 26, 2022, 3:40 PM

#

hmm

mild dirge Jul 26, 2022, 3:40 PM

#

So maybe you need some undersampling too

gleaming osprey Jul 26, 2022, 3:40 PM

#

i am not honestly interested in disgust

mild dirge Jul 26, 2022, 3:41 PM

#

But there's multiple ways to tackle that, you should look into it

gleaming osprey Jul 26, 2022, 3:41 PM

#

I am only interested in anger, sad, neutral and happy

mild dirge Jul 26, 2022, 3:41 PM

#

gleaming osprey i am not honestly interested in disgust

Well if you don't care about disgust, remove it

#

Up to you

gleaming osprey Jul 26, 2022, 3:41 PM

#

mild dirge Well if you don't care about disgust, remove it

may I tell you my end goal?

mild dirge Jul 26, 2022, 3:41 PM

#

sure

gleaming osprey Jul 26, 2022, 3:42 PM

#

My end goal is to tell how annoyed/horrible a person is feeling for my program

#

so I can annoy them even more

mild dirge Jul 26, 2022, 3:42 PM

#

Seems like disgust would be the most important here

gleaming osprey Jul 26, 2022, 3:42 PM

#

yeah

mild dirge Jul 26, 2022, 3:42 PM

#

It seems most similar to annoyed I'd think, no?

gleaming osprey Jul 26, 2022, 3:42 PM

#

kinda important

gleaming osprey Jul 26, 2022, 3:42 PM

#

mild dirge It seems most similar to annoyed I'd think, no?

id say a mixture of disgust/anger

mild dirge Jul 26, 2022, 3:43 PM

#

So especially then, you don't just want to look at accuracy

#

Because disgust has way less samples, so you can get an accuracy of 90% without once guessing disgust
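
To make that concrete, here is a small sketch with made-up labels in roughly the 17:1 ratio mentioned earlier (the counts are illustrative, not the real FER-2013 numbers). It compares plain accuracy with per-class recall for a model that never predicts the rare class, and computes inverse-frequency class weights as one common way to compensate:

```python
from collections import Counter

# Illustrative labels only: 17 "happy" samples for every "disgust" sample.
y_true = ["happy"] * 170 + ["disgust"] * 10
y_pred = ["happy"] * len(y_true)  # a model that never predicts "disgust"

def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall_per_class(y_true, y_pred):
    """For each class, the fraction of its samples that were found."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

def balanced_class_weights(y_true):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y_true)
    n_samples, n_classes = len(y_true), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

acc = accuracy(y_true, y_pred)
recalls = recall_per_class(y_true, y_pred)
weights = balanced_class_weights(y_true)
print(acc, recalls, weights)
```

The accuracy looks high while recall for the rare class is zero, which is why per-class metrics (or F1) are worth checking alongside accuracy on imbalanced data.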

gleaming osprey Jul 26, 2022, 3:43 PM

#

yes

#

excuse me, im a bit busy right now, I can talk in 30 mins, is that ok?

mild dirge Jul 26, 2022, 3:44 PM

#

I won't be on in 30 mins, but maybe someone else can help

gleaming osprey Jul 26, 2022, 3:44 PM

#

ok thanks 🙂

unique flame Jul 26, 2022, 4:15 PM

#

gleaming osprey so I can annoy them even more

Hmm...unethical much

gleaming osprey Jul 26, 2022, 4:16 PM

#

unique flame Hmm...unethical much

Its supposed to be you start happy - get annoyed, get little happier, - get annoyed again, rinse and repeat

#

so you dont uninstall cuz the happy points

#

but in the end, you're still annoyed

unique flame Jul 26, 2022, 4:16 PM

#

I didn't know FER-2013 existed, personally I always feel a bit uneasy using data concerning humans. But that's just me.

#

There do seem to be a few papers on it

modest onyx Jul 26, 2022, 5:59 PM

#

isn't fer-2013 state of the art like 70 something%?

#

assuming the dataset is balanced, then 59% is pretty decent of an accuracy I'd say

#

though if you're getting near perfect accuracy in training, then maybe you might wanna do something like early stopping or revise your regularization techniques

unique flame Jul 26, 2022, 6:13 PM

#

dataset is not balanced..at least the version on Kaggle

#

~~there are papers describing models that have reached +90% accuracy~~ nevermind you're right, I was looking at something else

dim palm Jul 26, 2022, 6:21 PM

#

unique flame ~~there are papers describing models that have reached +90% accuracy~~ nevermind...

what means nvm ?

ripe forge Jul 26, 2022, 7:11 PM

#

dim palm what means nvm ?

nvm stands for "nevermind"

limber token Jul 26, 2022, 7:34 PM

#

Decided to implement this, but am getting:

Traceback (most recent call last):
  File "solution.py", line 112, in <module>
    main()
  File "solution.py", line 42, in main
    fileB.apply(skus_A.__contains__)
  File "/home/remotelinux/.local/lib/python3.8/site-packages/pandas/core/frame.py", line 8845, in apply
    return op.apply().__finalize__(self, method="apply")
  File "/home/remotelinux/.local/lib/python3.8/site-packages/pandas/core/apply.py", line 733, in apply
    return self.apply_standard()
  File "/home/remotelinux/.local/lib/python3.8/site-packages/pandas/core/apply.py", line 857, in apply_standard
    results, res_index = self.apply_series_generator()
  File "/home/remotelinux/.local/lib/python3.8/site-packages/pandas/core/apply.py", line 873, in apply_series_generator
    results[i] = self.f(v)
TypeError: unhashable type: 'Series'

On code:

skus_A = set(fileA['sku'].tolist())
fileB.apply(skus_A.__contains__)

serene scaffold Jul 26, 2022, 7:55 PM

#

limber token Decided to implement this, but am getting: ```py Traceback (most recent call la...

you have to apply it to a specific column, not the whole DataFrame.

#

if you want to apply it to every cell in the DF, use applymap instead of apply

#

also @wooden sail @ripe forge the problem was that I was using like 50GB of memory, and someone else was using about 40. so we're now trying to get more RAM.

#

sudo apt-get install ram

wooden sail Jul 26, 2022, 8:09 PM

#

ah fair enough, trying to download some ram

#

and i suppose doing it single threaded, chunks at a time, takes too long
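
For what the single-threaded, one-file-at-a-time route could look like, here is a standard-library sketch. It assumes each file holds a single JSON object; the helper names `iter_json` and `count_by_key` and the `kind` field are hypothetical. Memory stays bounded because only one document is loaded at a time, though as noted it may well be too slow for millions of files:

```python
import json
import tempfile
from pathlib import Path

def iter_json(root):
    """Yield (path, parsed document) lazily, one file at a time."""
    for path in Path(root).rglob("*.json"):
        with path.open(encoding="utf-8") as handle:
            yield path, json.load(handle)

def count_by_key(root, key):
    """Aggregate a small summary instead of keeping every document."""
    counts = {}
    for _path, doc in iter_json(root):
        value = doc.get(key)
        counts[value] = counts.get(value, 0) + 1
    return counts

# Tiny demo tree with five files spread over two subdirectories.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        sub = Path(tmp) / f"part{i % 2}"
        sub.mkdir(exist_ok=True)
        payload = {"kind": "even" if i % 2 == 0 else "odd"}
        (sub / f"{i}.json").write_text(json.dumps(payload), encoding="utf-8")
    result = count_by_key(tmp, "kind")

print(result)
```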

serene scaffold Jul 26, 2022, 8:19 PM

#

wooden sail and i suppose doing it single threaded, chunks at a time, takes too long

I wrote a shitty version of what I was trying to do with multiprocessing.Pool, and it's still taking hours

wooden sail Jul 26, 2022, 8:20 PM

#

wonderful 😛

limber token Jul 26, 2022, 8:37 PM

#

serene scaffold you have to `apply` it to a specific column, not the whole DataFrame.

pandas documentation has me confused sorry, should it be df['sku'].apply() then?

serene scaffold Jul 26, 2022, 8:37 PM

#

limber token pandas documentation has me confused sorry, should it be `df['sku'].apply()` the...

df['sku'] is probably a Series. and DataFrame.apply isn't quite the same as Series.apply.

#

for DataFrame.apply, the input is a Series. for Series.apply, the input is an element.

limber token Jul 26, 2022, 8:40 PM

#

I'm confused then 😓 how to apply it to a specific column then?

serene scaffold Jul 26, 2022, 8:56 PM

#

limber token pandas documentation has me confused sorry, should it be `df['sku'].apply()` the...

this

#

if you do df.apply, you're doing DataFrame.apply

#

if you do df['sku'].apply, then it's Series.apply
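
A small sketch of the difference, using made-up `fileA` and `fileB` frames. `Series.apply` calls the function once per element, which is what `skus_A.__contains__` needs; `Series.isin` is shown as an equivalent built-in alternative:

```python
import pandas as pd

# Hypothetical stand-ins for the two files.
fileA = pd.DataFrame({"sku": ["a1", "b2", "c3"]})
fileB = pd.DataFrame({"sku": ["b2", "z9", "a1"], "qty": [1, 2, 3]})

skus_A = set(fileA["sku"].tolist())

# Series.apply: the function receives one element (a hashable string) at a time.
in_a_apply = fileB["sku"].apply(skus_A.__contains__)

# Same membership test with the vectorized built-in.
in_a_isin = fileB["sku"].isin(skus_A)

# Rows of fileB whose sku also appears in fileA.
matches = fileB[in_a_isin]
print(matches)
```

With `fileB.apply(...)` the function instead receives a whole column as a Series, and a Series cannot be hashed for the set lookup, hence the `TypeError`.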

limber token Jul 26, 2022, 8:57 PM

#

Oooooooh I see

#

Thank you

serene scaffold Jul 26, 2022, 9:01 PM

#

@wooden sail do you know that much about dask, btw? because I don't think it's optimizing my computation graph in a memory-efficient way. though it might also be that it is, and the most memory-efficient solution that it can infer (that doesn't involve writing chunks of it to disk) still exceeds my capacity.

wooden sail Jul 26, 2022, 9:02 PM

#

hmm sadly i have only used dask a couple of times, i ended up going back to multiprocessing

#

how many processes are you spawning at a time and how many cores and threads do you have access to?

serene scaffold Jul 26, 2022, 9:07 PM

#

wooden sail how many processes are you spawning at a time and how many cores and threads do ...

not sure about threads. there's like 40 cores though.

#

(and I just closed my work computer 😅 )

wooden sail Jul 26, 2022, 9:09 PM

#

all right, so at least 40 partitions should be fine, assuming you have enough memory for all of them. this is very much a thing of "salt to taste" where you try to convince the OS scheduler to always have a few of your processes running by having a ton of them, but not so many that the parallelization and memory overhead bog you down. i could only recommend doing some logging for a few minutes and testing out different numbers of partitions and how much of a file each one handles at a time

#

a surprising thing is that these servers and clusters with huge numbers of cores are usually slower per core than a bad laptop. it could be that there's just not enough memory for the server to give you any benefit over a new laptop 😛

urban oriole Jul 26, 2022, 9:38 PM

#

guys I need help with using np.where

#

can someone help me with this?

dq_monthly['MoveOutDate'] = np.where(dq_monthly['TenantStatus']=='Current', pd.to_datetime("2022-07-31"), dq_monthly['MoveOutDate'])

#

I need for tenant status that is not current to just have the original MoveOutDate

steady basalt Jul 26, 2022, 9:49 PM

#

```
for i in range(len(df_concat['value1'].values)):
    if pd.isna(df_concat['value1'][i]) or df_concat['value_1'][i].str.isnumeric():
        if pd.notna(df_concat['value2'][i]):
            df_concat['value1'][i] = df_concat['value2'][i]
```

#

anyone know why this will give me an error on line2

#

in the isna statement

#

for one df/dataset but not for another

#

weird inconsistency

#

omfg it's a typo, that's why

#

fixed typo stil error 0

serene scaffold Jul 26, 2022, 9:57 PM

#

steady basalt ```for i in range(len(df_concat['value1'].values)): if pd.isna(df_concat['va...

I don't think we should even try to pick apart how this code works. what is it intended to do?

#

because it should almost certainly be rewritten.

serene scaffold Jul 26, 2022, 9:59 PM

#

urban oriole can someone help me with this? ```dq_monthly['MoveInDate'] = pd.to_datetime(dq_...

you can do this with a .loc assignment, without using np.where

#

dq_monthly.loc[dq_monthly['TenantStatus'].eq('Current'), 'MoveOutDate'] = pd.to_datetime("2022-07-31")
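
A minimal sketch of that `.loc` assignment on a made-up frame (the rows below are invented for illustration; the real dq_monthly is only assumed to have these two columns). Rows whose status is not 'Current' are never touched, so they keep their original MoveOutDate.

```python
import pandas as pd

# Toy data; the real dq_monthly is assumed to have these two columns.
dq_monthly = pd.DataFrame({
    "TenantStatus": ["Current", "Past", "Current"],
    "MoveOutDate": pd.to_datetime(["2022-01-15", "2021-12-31", None]),
})

# Only the rows matching the mask are overwritten; the rest stay as they were.
is_current = dq_monthly["TenantStatus"].eq("Current")
dq_monthly.loc[is_current, "MoveOutDate"] = pd.to_datetime("2022-07-31")

print(dq_monthly)
```

Because nothing is rebuilt from scratch, the column should also keep its datetime dtype, which is one reason this can be less surprising than mixing a Timestamp and a Series inside np.where.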

steady basalt Jul 26, 2022, 10:07 PM

#

serene scaffold because it should almost certainly be rewritten.

disagree, using .values

#

all my values are strings by default

#

im checking if they, when converted, can be float

#

or if theyre just strings only

#

for example

#

'XFJ28.0' shud check the next value, and if its something like '22.10' then it shud be changed to be 22.10

#

partition = element.partition('.')
if ((partition[0].isdigit() and partition[1] == '.' and partition[2].isdigit())
        or (partition[0] == '' and partition[1] == '.' and partition[2].isdigit())
        or (partition[0].isdigit() and partition[1] == '.' and partition[2] == '')):
    newelement = float(element)

#

something like this maybe

#

def is_float(string):
    try:
        float(string)  # raises ValueError if string is not a number
        return '.' in string  # True if string is a number that contains a dot
    except ValueError:  # String is not a number
        return False

#

@serene scaffold know any better way to check this? string can be converted to float

#

im thinking to use astype perhaps

#

df_concat['value1']= df_concat['value1'].astype(float)

#

this will error if it hits a non convertible

#

so i had to use a loop

#

how else to do it
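
One loop-free option is `pd.to_numeric` with `errors="coerce"`, which turns anything unparseable into NaN instead of raising. The sketch below is on toy data and assumes the goal is "use value1 if it parses as a number, otherwise fall back to value2"; the `value_clean` column name is made up for the example.

```python
import pandas as pd

# Toy data standing in for df_concat; all values start out as strings.
df_concat = pd.DataFrame({
    "value1": ["XFJ28.0", "3.5", None, "abc"],
    "value2": ["22.10", "9.9", "7", None],
})

# errors="coerce" gives NaN for anything that cannot be parsed as a number.
v1 = pd.to_numeric(df_concat["value1"], errors="coerce")
v2 = pd.to_numeric(df_concat["value2"], errors="coerce")

# Where value1 is not numeric, fall back to value2 (which may itself be NaN).
df_concat["value_clean"] = v1.fillna(v2)

print(df_concat)
```

If the actual rule is different (for example, only overwrite when value1 is missing), the same two coerced Series can be combined with a different mask.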

pulsar hull Jul 26, 2022, 10:24 PM

#

Finally managed to get my GAN to work, although in pytorch, not from scratch like i planned

steady basalt Jul 26, 2022, 10:30 PM

#

@serene scaffold i have brute forced this

#

```
new = []
for i in range(len(df_concat['value1'].values)):
    try:
        new.append(float(df_concat['value1'].values[i]))
    except (TypeError, ValueError):
        new.append('string')
```

#

now i just create a new column out of this

steady basalt Jul 26, 2022, 11:27 PM

#

what do you think to do if over half of a column is NaN

#

whats a good cut off to just say 'ok were not using this' rather than impute?

#

approx 65% of one of my key features is missing

urban oriole Jul 27, 2022, 1:30 AM

#

why doesn't dropna work for me

#

dq_monthly['TenantStatus']= dq_monthly[['TenantStatus']].dropna()

#

serene scaffold Jul 27, 2022, 1:31 AM

#

urban oriole ```dq_monthly['TenantStatus']= dq_monthly[['TenantStatus']].dropna()```

you can't have "holes" in a dataframe--NaN is the value that represents missing data. Even though dq_monthly[['TenantStatus']].dropna() removes the NaNs, when you overwrite that data back into the DataFrame, it has to put the NaNs back for all the rows that dropna drops.

urban oriole Jul 27, 2022, 1:32 AM

#

serene scaffold you can't have "holes" in a dataframe--NaN *is* the value that represents missin...

isnt dropna just working like a filter

serene scaffold Jul 27, 2022, 1:33 AM

#

urban oriole isnt dropna just working like a filter

So, the DataFrame has 2458438 rows. If you take one column of that and do dropna, it will have that many rows, or less

#

but if you add that column back to the DataFrame, it absolutely must have a row for every single value in the index
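
A small sketch of that behaviour on invented data: dropna does shorten the column, but assigning it back aligns on the index, so the dropped row label gets NaN again. (This uses a single-bracket Series rather than the double-bracket DataFrame from the question; the alignment works the same way.)

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "TenantStatus": ["Current", np.nan, "Past"],
    "Rent": [100, 200, 300],
})

# dropna really does remove the NaN: this Series has fewer rows than df.
dropped = df["TenantStatus"].dropna()
print(len(df), len(dropped))

# Assigning back aligns on the index, so the missing row label is filled with NaN.
df["TenantStatus"] = dropped
print(df["TenantStatus"].isna().sum())
```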

urban oriole Jul 27, 2022, 1:34 AM

#

i dont understand what the point of dropna is then

serene scaffold Jul 27, 2022, 1:34 AM

#

urban oriole i dont understand what the point of dropna is then

it removes the NaNs. the problem is that when you put that column back in the DataFrame, it has to put them back, to make up for the rows you deleted

urban oriole Jul 27, 2022, 1:35 AM

#

so I don't set it equal to dq_monthly['TenantStatus']

serene scaffold Jul 27, 2022, 1:35 AM

#

because if you still want the column to be part of the dataframe, it has to have a value for every row. and for the missing rows, that's going to be NaN.

urban oriole Jul 27, 2022, 1:35 AM

#

I just do dq_monthly[['TenantStatus']].dropna()

serene scaffold Jul 27, 2022, 1:35 AM

#

urban oriole so I don't set it equal to dq_monthly['TenantStatus']

if a value is NaN in the TenantStatus column, are you okay with completely deleting that row from the whole dataframe

urban oriole Jul 27, 2022, 1:35 AM

#

serene scaffold if a value is NaN in the TenantStatus column, are you okay with *completely deletin...

yes

serene scaffold Jul 27, 2022, 1:35 AM

#

even if there's a non-NaN value in other columns for that row?

urban oriole Jul 27, 2022, 1:35 AM

#

that is specifically what I want to do

#

those NaN's are bad or old data not relevant to the analysis

serene scaffold Jul 27, 2022, 1:36 AM

#

then you should do dropna on the dataframe, not on an individual column.

#

!docs pandas.DataFrame.dropna

arctic wedgeBOT Jul 27, 2022, 1:36 AM

#

pandas.DataFrame.dropna


```
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
```
Remove missing values.

See the [User Guide](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#missing-data) for more on which values are considered missing, and how to work with missing data.

urban oriole Jul 27, 2022, 1:36 AM

#

the problem is there may be NaNs in other columns that are irrelevant that I need

serene scaffold Jul 27, 2022, 1:36 AM

#

note the how='any'. if any of the values in the row are NaN, that whole row goes bye bye.

urban oriole Jul 27, 2022, 1:36 AM

#

need in terms of row

urban oriole Jul 27, 2022, 1:36 AM

#

serene scaffold note the `how='any'`. if any of the values in the row are NaN, that whole row go...

yea I dont want to do that

#

I want to just delete the records with NaNs for TenantStatus

serene scaffold Jul 27, 2022, 1:37 AM

#

urban oriole I want to just delete the records with NaNs for TenantStatus

you can do df[~df['TenantStatus'].isna()]

urban oriole Jul 27, 2022, 1:37 AM

#

ok

serene scaffold Jul 27, 2022, 1:37 AM

#

which means "pick the rows of df where not tenant status is nan"
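
A quick sketch on made-up data showing the mask next to `dropna(subset=...)`, which should give the same rows. Neither call changes df itself, so the result has to be assigned to a name (the `Notes` column is hypothetical, just there to show other NaNs are left alone).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "TenantStatus": ["Current", np.nan, "Past"],
    "Notes": [np.nan, "old record", np.nan],  # NaNs here should not cause a drop
})

# Boolean-mask version: keep rows where TenantStatus is not NaN.
kept_mask = df[~df["TenantStatus"].isna()]

# Same filter expressed through dropna, restricted to one column.
kept_subset = df.dropna(subset=["TenantStatus"])

# df is unchanged; only the new objects have the row removed.
print(len(df), len(kept_mask), len(kept_subset))
```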

urban oriole Jul 27, 2022, 1:38 AM

#

i think thats cleaner

steady basalt Jul 27, 2022, 1:38 AM

#

Df[column].drop na inplace true done

serene scaffold Jul 27, 2022, 1:38 AM

#

steady basalt Df[column].drop na inplace true done

I would just never use in-place anything

#

I think it's even being deprecated

steady basalt Jul 27, 2022, 1:38 AM

#

No way

#

Wtf ?

serene scaffold Jul 27, 2022, 1:38 AM

#

in-place operations have no optimizations

steady basalt Jul 27, 2022, 1:38 AM

#

Saves u having to type df[col] =

serene scaffold Jul 27, 2022, 1:38 AM

#

they just recreate the whole df under the hood

steady basalt Jul 27, 2022, 1:38 AM

#

But that’s the alternative

serene scaffold Jul 27, 2022, 1:39 AM

#

steady basalt Saves u having to type df[col] =

no? in-place operations on dataframes are alternatives to df = df.method()

#

and we're talking about removing rows from the whole df

steady basalt Jul 27, 2022, 1:39 AM

#

That would do so if the na is in the column

urban oriole Jul 27, 2022, 1:39 AM

#

serene scaffold you can do `df[~df['TenantStatus'].isna()]`

still getting NaN's

steady basalt Jul 27, 2022, 1:40 AM

#

So df[col] = df[col].dropna

#

Or just df to start

urban oriole Jul 27, 2022, 1:40 AM

#

steady basalt So df[col] = df[col].dropna

i did that originally and that just replaces original values in there

steady basalt Jul 27, 2022, 1:40 AM

#

Because that will show nans

serene scaffold Jul 27, 2022, 1:40 AM

#

steady basalt So df[col] = df[col].dropna

that's what they did originally, and that doesn't work

steady basalt Jul 27, 2022, 1:40 AM

#

urban oriole i did that originally and that just replaces original values in there

This command doesn’t replace anything

#

It literally drops when there’s a nan

serene scaffold Jul 27, 2022, 1:41 AM

#

if you do dropna on a column, but then write that column back to the df, it has to replace all the rows that got dropped for having nans with nans

#

so it's the most pointless thing you could possibly do.

steady basalt Jul 27, 2022, 1:41 AM

#

But u don’t write it back to the df like that u just declare the df is without the nans in the first place

serene scaffold Jul 27, 2022, 1:42 AM

#

anyway, I already gave them the solution, for better or worse.

steady basalt Jul 27, 2022, 1:43 AM

#

urban oriole i did that originally and that just replaces original values in there

Just try the inplace drop na may work

#

Man I have so much god damn pandas work to do

#

@serene scaffold if you have a key predictor that’s numerical and ur population is NAN for like 60% of the feature what do u do

serene scaffold Jul 27, 2022, 1:44 AM

#

steady basalt @serene scaffold if you have a key predictor that’s numerical and ur popula...

I don't know, I do nlp.

steady basalt Jul 27, 2022, 1:44 AM

#

Think I shud keep them and just make them all medians? Keep other information

#

Or try a linear imputation

#

Perhaps

#

There’s no way it’s worth dropping hundred thousand samples
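
Not an answer to where the cut-off should be, just a sketch of the mechanics on toy numbers: measure the missing fraction, keep a flag for which rows were missing, then median-impute. The column names are hypothetical, and whether median imputation is appropriate at around 60% missing depends on why the values are missing.

```python
import numpy as np
import pandas as pd

# Toy frame; "feature" is a hypothetical numeric predictor with missing values.
df = pd.DataFrame({"feature": [1.0, np.nan, 3.0, np.nan, np.nan, 10.0]})

# Fraction of missing values in the column.
missing_frac = df["feature"].isna().mean()

# Keep a flag so a model can still see which rows were imputed.
df["feature_missing"] = df["feature"].isna()

# Median imputation: the median is computed from the observed values only.
median = df["feature"].median()
df["feature_imputed"] = df["feature"].fillna(median)

print(missing_frac, median)
```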

misty flint Jul 27, 2022, 2:12 AM

#

lel

coral cradle Jul 27, 2022, 2:14 AM

#

Heya guys, I've never coded an AI before but I want to learn how to do so. any tips on how to start?

#

it's a bit overwhelming for me. I know the very basics that's it, but I find it very interesting.

serene scaffold Jul 27, 2022, 2:49 AM

#

@coral cradle so, don't plan to create anything groundbreaking in the foreseeable future. Because this is something that people get PhDs and then spend their entire careers working on. Focus on projects that aren't unique, but which help you develop a sense for what AI is and what the concepts are. What are training and test data? What are features? What is a model, and what makes a model "good" or "bad"?

coral cradle Jul 27, 2022, 3:08 AM

#

serene scaffold @coral cradle so, don't plan to create anything groundbreaking in the fo...

oh ok I'll try to think simple for now 🙂

iron basalt Jul 27, 2022, 3:28 AM

#

coral cradle oh ok I'll try to think simple for now 🙂

You can even start by reading Wikipedia and following the links, like what are "intelligent agents"? This will give you a ton of keywords / concepts to look into (including those mentioned by Lurcus): https://en.wikipedia.org/wiki/Artificial_intelligence

Artificial intelligence

Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to the natural intelligence displayed by animals including humans. AI research has been defined as the field of study of intelligent agents, which refers to any system that perceives its environment and takes actions that maximize its chance of achieving its goals....

#

You will see a lot of the same words come up in a lot of the articles, such as "agent", "environment", "actions", "goals", "maximize", "perception", "statistics", "planning", etc. I would take note of these as it will make it easier to find what you are looking for by understanding and using them in your searches.

#

(And try to re-define what you consider to be AI in terms of those words so that you can find something that matches your goals)

misty flint Jul 27, 2022, 4:00 AM

#

build that mental scaffolding

#

py_sun

lapis sequoia Jul 27, 2022, 4:41 AM

#

hey guys, how does checking probability of one distribution being greater than the other work when the distributions are mixed like this

wooden sail Jul 27, 2022, 4:43 AM

#

probability of being greater? like you draw random samples from all 3 and you look for the probability that you drew a larger number from one of them than from the other 2?

lapis sequoia Jul 27, 2022, 4:44 AM

#

yes probability of being greater

#

#

I am not sure if sampling is done

#

but there is a formula, probability matching, that selects "actions" based on the probability that one distribution is greater than another

#

but if we sample from all 3, then the "select action" is redundant right?

#

cause we sampled already, why select action if we already sampled

wooden sail Jul 27, 2022, 4:48 AM

#

i'm not sure i understand what it means by Q(a) > Q(a') there when those are curves

#

i don't find the explanation clear

lapis sequoia Jul 27, 2022, 4:49 AM

#

lapis sequoia hey guys, how does checking probability of one distribution being greater than t...

Q(a) happens to be the distribution I believe

wooden sail Jul 27, 2022, 4:50 AM

#

indeed

lapis sequoia Jul 27, 2022, 4:50 AM

#

and a and a' are different distributions

wooden sail Jul 27, 2022, 4:50 AM

#

yep

lapis sequoia Jul 27, 2022, 4:50 AM

#

honestly I think its either the mean of the distribution

#

or the distribution itself

wooden sail Jul 27, 2022, 4:50 AM

#

i doubt it's the mean, it probably has more to do with the tails, but that's why i was asking about sampling

lapis sequoia Jul 27, 2022, 4:50 AM

#

the normal distribution

wooden sail Jul 27, 2022, 4:50 AM

#

i can see that.

#

you'd have to scroll back in the slides to where they defined what Q(a) > Q(a') means

lapis sequoia Jul 27, 2022, 4:52 AM

#

alright, ~~probability~~ *probably* have to do that

#

but there is no way to compare them other than sampling them and counting the number of times one distribution is greater than the other right?

#

thats the part I got from your first question/answer
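
Sampling is one way, but not the only one. A rough sketch, assuming two independent Gaussian posteriors with made-up parameters (the slides may define Q(a) differently): the Monte Carlo estimate of P(a > b) can be checked against a closed form, since the difference of two independent normals is again normal.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posteriors over two action values, assumed Gaussian and independent.
mu_a, sigma_a = 1.0, 1.0
mu_b, sigma_b = 0.5, 2.0

# Monte Carlo: draw paired samples and count how often a beats b.
n_samples = 200_000
samples_a = rng.normal(mu_a, sigma_a, n_samples)
samples_b = rng.normal(mu_b, sigma_b, n_samples)
p_mc = np.mean(samples_a > samples_b)

# Closed form: a - b is normal with mean mu_a - mu_b and variance
# sigma_a**2 + sigma_b**2, so P(a > b) = Phi(z).
z = (mu_a - mu_b) / math.sqrt(sigma_a**2 + sigma_b**2)
p_exact = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(p_mc, p_exact)
```

This also bears on the "why select an action if we already sampled" question: drawing one sample per action and taking the largest picks each action with exactly the probability that it is the best one, so the sampling step is the selection step.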

wooden sail Jul 27, 2022, 4:54 AM

#

that was my first impression, which would let you answer this question for specific values of the variable that follows these tentative distributions. like given a value you wish to observe, find which distribution is the most likely. as you'd expect, this receives the name "maximum likelihood". but since they don't call it that, i suspect they mean something different

#

so you'll have to scroll back 😛

#

i do notice they give you an ht quantity. this might be the given value for which you want the probability

#

so maybe something like P(X >= ht) when ht ~ Q(a), for example

lapis sequoia Jul 27, 2022, 4:56 AM

#

ah, ht is the history, it's the given condition in a posterior distribution

#

so I won't worry about it for now

#

might I ask a different question since I feel fuzzy about it. How does two distributions get compared in something like a ratio P[X] / P[Y]

wooden sail Jul 27, 2022, 4:57 AM

#

i think you'll need the ht for this... you really should go back and review

wooden sail Jul 27, 2022, 4:58 AM

#

lapis sequoia might I ask a different question since I feel fuzzy about it. How does two distr...

P[X] is the probability of X = x? discrete distributions? or?

lapis sequoia Jul 27, 2022, 4:59 AM

#

wooden sail P[X] is the probability of X = x? discrete distributions? or?

yes, suppose say we have two different distributions P[x] and P[y] (not sure if this is the correct way of saying this) and if we were to compare them like P[X] / P[Y], do I just divide where ever they have the same input variable?

#

I hope my question makes sense

wooden sail Jul 27, 2022, 4:59 AM

#

not really...

#

well, what you describe there is really just "division"

#

since the domain of probability distributions is the values the variable can take

#

but this implies that the two distributions are over the same domain

#

it would be better if you could find this in your notes too, then report back

lapis sequoia Jul 27, 2022, 5:03 AM

#

oh ok, I thought there was some more things going on with it, probably got confused myself

#

thank you for taking the time to answer Edd

wooden sail Jul 27, 2022, 5:03 AM

#

it could be more stuff is going on, but usually in cross entropy or mutual information or kullback leibler divergence, these terms with ratios of probability distributions show up

#

as long as they're two possible distributions for the same variable(s), it should indeed be just vanilla division

lapis sequoia Jul 27, 2022, 5:05 AM

#

ahhh this helps, thanks again 😄

wooden sail Jul 27, 2022, 5:06 AM

#

it works weird at first, but they usually show up inside logarithms

#

so division of probability distributions translates into like differences of information, since the info is related to log probabilities
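
As a small illustration with two invented discrete distributions over the same outcomes, here is the Kullback-Leibler divergence written both ways: once with the ratio inside the log, and once as a difference of log probabilities.

```python
import numpy as np

# Two made-up discrete distributions over the same three outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# Pointwise ratio: only meaningful because p and q share a domain.
ratio = p / q

# KL(p || q) = sum_x p(x) * log(p(x) / q(x)), in nats.
kl_pq = np.sum(p * np.log(ratio))

# The log of a ratio is a difference of logs, i.e. a difference of information.
kl_pq_alt = np.sum(p * (np.log(p) - np.log(q)))

print(kl_pq, kl_pq_alt)
```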

lapis sequoia Jul 27, 2022, 5:08 AM

#

I see, is there any book you recommend particularly on distributions comparisons

#

like the probability matching, division, etc

#

i currently have The Elements of Statistical Learning by Trevor Hastie but have yet to read it

wooden sail Jul 27, 2022, 5:09 AM

#

books on estimation theory and statistical sig proc. what you mentioned, at least by name, sounds suitable

lapis sequoia Jul 27, 2022, 5:10 AM

#

|estimation theory and statistical sig proc
got it, I will try looking for these

#

thanks again my man

wooden sail Jul 27, 2022, 5:11 AM

#

i like "fundamentals of statistical signal processing" by steven kay

lapis sequoia Jul 27, 2022, 5:11 AM

#

alright, will give that one a shot

#

much much thanks

wooden sail Jul 27, 2022, 5:21 AM

#

ah louis scharf's statistical signal processing: detection, estimation, and time series analysis is another good one

lapis sequoia Jul 27, 2022, 6:13 AM

#

alright, thanks for the rec again 😄

unreal flicker Jul 27, 2022, 6:56 AM

#

hello guys I need some help scraping some messages from my google messages using selenium but I am getting some error with my xpath

#

#

is there any better scraping tools other than selenium that works better for nested divs and custom angular tags

limber token Jul 27, 2022, 11:25 AM

#

When trying to read a large Excel file with pd.read_excel() am getting [1] 129017 killed python3 solution.py

fiery dust Jul 27, 2022, 12:04 PM

#

is someone here into finance? I'd like to learn ML/AI related with it.

fiery dust Jul 27, 2022, 12:05 PM

#

unreal flicker is there any better scraping tools other than selenium that works better for nes...

I doubt someone in here will know the answer for that, its a #data-science-and-ml channel

steady basalt Jul 27, 2022, 12:58 PM

#

unreal flicker

This is data science channel

#

Try dev ops

vast goblet Jul 27, 2022, 1:42 PM

#

hello, how can i get both product name and family name?
the website doesn't have api to request from it

lapis sequoia Jul 27, 2022, 1:57 PM

#

hi, can anyone please help me to write a good summary for my LinkedIn profile as I'm a first-year data science student 🙂

serene scaffold Jul 27, 2022, 1:58 PM

#

limber token When trying to read a large Excel file with `pd.read_excel()` am getting `[1] ...

In what context are you trying to do this

mild dirge Jul 27, 2022, 3:12 PM

#

I don't understand how this helps, because we only calculate f(k, n, p) for every k up to n/2. We never calculate f(k, n, 1-p) so how would this make it any simpler?

#

This is about binomial distributions btw
https://en.wikipedia.org/wiki/Binomial_distribution

Binomial distribution

In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success (with probability p) or failure (with probability q = 1 − p). ...

wooden sail Jul 27, 2022, 3:19 PM

#

mild dirge I don't understand how this helps, because we only calculate `f(k, n, p)` for ev...

why don't we?

mild dirge Jul 27, 2022, 3:19 PM

#

Well what it suggests is that we don't need to calculate the second half, because we can easily calculate it from the first half results

wooden sail Jul 27, 2022, 3:20 PM

#

mhm

mild dirge Jul 27, 2022, 3:20 PM

#

But then it says that we can simply get the result by taking the "complement" f(n - k, n, 1 - p) but we have not calculated that in the first half

#

We have calculated f(n - k, n, p)though

wooden sail Jul 27, 2022, 3:21 PM

#

yes but note that taking p <- 1-p simply swaps the two multiplied terms. you can think of it as swapping the exponent of p with that of 1-p

#

substitute p with 1-p and see what you get

mild dirge Jul 27, 2022, 3:22 PM

#

Yeah, but then we are just calculating it from scratch again

#

I have created the function f, and it works correctly, when I do:

success_arr = [successes(n, k, p) for k in range(n + 1)]

It also works correctly if I do:

first_half = [successes(n, k, p) for k in range((n + 1) // 2 + 1)]
second_half = [successes(n, n - k, 1 - p) for k in range((n + 1) // 2 + 1, n + 1)]
success_arr = first_half + second_half

#

But how do I use first_half to get second_half without having to call successes again?

iron basalt Jul 27, 2022, 3:25 PM

#

mild dirge But how do I use `first_half` to get `second_half` without having to call `suces...

You cache it / make a lookup table.

#

(Which is what first half is)

mild dirge Jul 27, 2022, 3:26 PM

#

Right, so that is creating the first_half list, but then how do I get to second_half ?

iron basalt Jul 27, 2022, 3:27 PM

#

You don't, you have just first half is the idea. Less memory.

elfin tapir Jul 27, 2022, 3:27 PM

#

hello, I have an assignment in which I have to use two pictures take the person from the first picture and the background from second picture and merge them together. I'm a beginner, and wanted to ask what would be the best approach and any tips?

mild dirge Jul 27, 2022, 3:28 PM

#

iron basalt You don't, you have just first half is the idea. Less memory.

Right, but I want to also know the second half, without having to calculate them each from scratch

#

It's not a memory problem

iron basalt Jul 27, 2022, 3:28 PM

#

If you ever want a value that would be in the second half, it can reference into the first half. Avoiding computation.

mild dirge Jul 27, 2022, 3:28 PM

#

iron basalt If you ever want a value that would be in the second half, it can reference into...

But how is my question 😛

#

Let's say I want the amount of successes of k=15 when n=20

iron basalt Jul 27, 2022, 3:29 PM

#

Maybe i'm not sure what the question is, but it seems to me they are just making a lookup table and only computing half because of symmetry.

wooden sail Jul 27, 2022, 3:29 PM

#

let's take a look. we start with binom(n, k) p^k (1-p)^(n-k). let's set p = 1-p.
we then get binom(n, k) (1-p)^k (p)^(n-k). now let's swap k with n-k
binom(n, n-k) (1-p)^(n-k) p^k
but we know there is symmetry for the binom part. we can also see that the power parts look identical to how they originally did
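
The same substitution written out in LaTeX, with f(k, n, p) denoting the binomial pmf (the argument order is an assumption taken from how f is used in this thread):

```latex
\begin{align*}
f(k, n, p) &= \binom{n}{k}\, p^{k} (1-p)^{n-k} \\
f(n-k, n, 1-p) &= \binom{n}{n-k}\, (1-p)^{n-k}\, \bigl(1-(1-p)\bigr)^{n-(n-k)} \\
&= \binom{n}{k}\, p^{k} (1-p)^{n-k} = f(k, n, p)
\end{align*}
```

The last step uses the symmetry of the binomial coefficient, \binom{n}{n-k} = \binom{n}{k}.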

iron basalt Jul 27, 2022, 3:30 PM

#

(Lookup tables are for speed, and having to only store half is for memory)

wooden sail Jul 27, 2022, 3:30 PM

#

lemme see if i can whip up a python MWE

mild dirge Jul 27, 2022, 3:31 PM

#

import matplotlib.pyplot as plt
from math import factorial as fact


def choose(n, k):
    return fact(n) // (fact(k) * fact(n - k))  # integer division keeps the coefficient exact


def successes(n, k, p):
    return choose(n, k) * (p ** k) * ((1 - p) ** (n - k))


n = 100
p = 0.4

fig, ax = plt.subplots()
success_arr1 = [successes(n, k, p) for k in range(n + 1)]

first_half = [successes(n, k, p) for k in range((n + 1) // 2 + 1)]  # include the midpoint so no k is skipped
second_half = [successes(n, n - k, 1 - p) for k in range((n + 1) // 2 + 1, n + 1)]
success_arr2 = first_half + second_half

cum_success_arr = []
ax.scatter(range(len(success_arr1)), success_arr1)
plt.show()

#

This is my code btw

#

@wooden sail

iron basalt Jul 27, 2022, 3:31 PM

#

(Or in the case of by-hand, so you don't need to do as much work filling out the whole table)

mild dirge Jul 27, 2022, 3:32 PM

#

And maybe I am confusing you a bit, because my question isn't why f(n-k, n, 1-p) gives the same answer as f(k, n, p) , but how it actually helps me in any way calculate the second half more easily

iron basalt Jul 27, 2022, 3:32 PM

#

By more easily do you mean speed? Because lookup tables may be faster than doing the actual f(...).

#

(This was way more common on older machines for various functions including stuff like binomial coefficients)

mild dirge Jul 27, 2022, 3:33 PM

#

Yes, but we have not calculated f(n-k, n, 1-p) yet

#

We have only calculated f(n-k, n, p)

#

Because that is part of the first half

wooden sail Jul 27, 2022, 3:34 PM

#

what i would point out is that (1-p)^(n-k) can be dealt with with a binomial expansion, and so this product is also symmetric

#

so the key thing is showing that we can easily swap p with 1-p and it yields the same result thanks to this symmetry

#

gimme a second to get some paper

iron basalt Jul 27, 2022, 3:36 PM

#

n and n-k and p and 1-p

#

#

https://en.wikipedia.org/wiki/Binomial_coefficient

Binomial coefficient

In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem. Commonly, a binomial coefficient is indexed by a pair of integers n ≥ k ≥ 0 and is written

#

(Or if you look at Pascal's triangle you can see that you only need to store half)

#

For the 1-p part. Try putting 1-p instead of p into f.

iron basalt Jul 27, 2022, 3:42 PM

#

mild dirge Yes, but we have not calculated `f(n-k, n, 1-p)` yet

By doing the first half, we basically have. We need to look it up though.

mild dirge Jul 27, 2022, 3:42 PM

#

Could you perhaps show how to get the values for the second half using the first half then?

#

Or an example

wooden sail Jul 27, 2022, 3:47 PM

#

ah, but note they say "tables". if you look at the tables in statistics books, they let all of n, k and p vary. what they mean is that you can get the second half from another distribution with probability 1-p, not from the same one

mild dirge Jul 27, 2022, 3:49 PM

#

Hmm

#

So calculating the first half doesn't help us more easily calculate the second half then?

wooden sail Jul 27, 2022, 3:49 PM

#

does not appear to be the case. i played around with the binomial expansion and you get some quite interesting expressions, but nothing from which you can (easily) look for equivalences

mild dirge Jul 27, 2022, 3:50 PM

#

Hmm right, well maybe you can use the binomial coefficient of the first half

wooden sail Jul 27, 2022, 3:50 PM

#

it's more that you compute pairs of distributions together

mild dirge Jul 27, 2022, 3:50 PM

#

But not the rest

#

Well thx for looking into it @iron basalt @wooden sail ^^

#

Just trying to get a bit more into probability, since I was struggling understanding that chapter in some books about maths for ml

wooden sail Jul 27, 2022, 3:52 PM

#

the following two identities hold:
we already showed that f(k,n,p) = f(n-k,n,1-p), which we check by substitution. now consider
f(k,n,1-p). if we do the same substitution here, we find that f(k,n,1-p) = f(n-k,n,p), which is the bit that, as you noted, was missing. with these two together, it seems we get both f(k,n,p) and f(k,n,1-p) as a 2-for-1

iron basalt Jul 27, 2022, 3:53 PM

#

mild dirge Hmm right, well maybe you can use the binomial coefficient of the first half

You can def. use the symmetry of the binomial coefficient. Note that you can pre-compute p^k from 0 to some k. And (1-p)^k2 from 0 to some k2 (giving two more tables).
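
A rough sketch of the coefficient part of that idea, using `math.comb` from the standard library (Python 3.8+). Only the coefficients for k up to n // 2 are computed and the rest are mirrored; the power terms still have to be evaluated for every k. The function name `binomial_pmf_table` is made up for this example.

```python
from math import comb


def binomial_pmf_table(n, p):
    """Return [f(k, n, p) for k in 0..n], computing each coefficient once per mirrored pair."""
    # Binomial coefficients for the first half only: k = 0 .. n // 2.
    half = [comb(n, k) for k in range(n // 2 + 1)]
    # Mirror via C(n, k) == C(n, n - k) instead of recomputing the second half.
    coeffs = [half[k] if k <= n // 2 else half[n - k] for k in range(n + 1)]
    # The power terms depend on both k and p, so they are computed for every k.
    return [coeffs[k] * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


table = binomial_pmf_table(20, 0.4)
print(table[15])  # the k=15, n=20 case asked about above
```

So the shortcut within a single distribution is limited to the coefficients; the full probabilities only mirror when p is also swapped for 1-p.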

wooden sail Jul 27, 2022, 3:53 PM

#

~~you can also consider f(n-k,n,p) and f(k,n,1-p) this way~~ this was already included above, oops

iron basalt Jul 27, 2022, 3:55 PM

#

You can compute a third table of those two multiplied together.

mild dirge Jul 27, 2022, 3:57 PM

#

wooden sail the following two identities hold: we already showed that f(k,n,p) = f(n-k,n,1-p...

This doesn't help us in calculating the second half right?

wooden sail Jul 27, 2022, 3:58 PM

#

you still have to compute everything for one of the 2. it's more like if you computed one fully, you also know the other

#

the curves are pairwise related to each other

mild dirge Jul 27, 2022, 3:59 PM

#

Right, but none of these help us calculate one half from the other

#

It just helps us calculate some other distribution

wooden sail Jul 27, 2022, 3:59 PM

#

yeah

iron basalt Jul 27, 2022, 4:05 PM

#

mild dirge Right, but none of these help us calculate one half from the other

https://www.sjsu.edu/people/saul.cohn/courses/stats/s0/BinomialProbabTable.pdf

#

You can see how across different p. It's the same but in reverse.

#

.902 .810 .723 .640 .563 .490 .423 .360 .303 .250 .203 .160 .123 .090 .063 .040 .023 .010 .002
...
.002 .010 .023 .040 .063 .090 .123 .160 .203 .250 .303 .360 .423 .490 .563 .640 .723 .810 .902

#

(n = 2, k = 0, k = 2)

#

"Second half" is the second half of that whole table linked (varying n, k, and p).

wooden sail Jul 27, 2022, 4:08 PM

#

here's a MWE:

import numpy as np
from scipy.special import binom
import matplotlib.pyplot as plt

n = 15
p = 0.234523

fnkp = np.zeros(n+1)
for k in range(n+1):
    fnkp[k] = binom(n,k)*p**k*(1-p)**(n-k)
    
fnk1_p = np.zeros(n+1)
for k in range(n+1):
    fnk1_p[k] = binom(n,k)*(1-p)**k*(p)**(n-k)

fnk1_p_synth = fnkp[::-1]

plt.plot(fnkp)
plt.plot(fnk1_p)
plt.plot(fnk1_p_synth,'*')
plt.legend(('f(n,k,p)','f(n,k,1-p)','shortcut for f(n,k,1-p)'))
plt.show()

#

#

as squiggle notes, what they explained in a cursed way as "compute up to k = n/2" is the same as saying, we can compute all of f(n,k,p) and just flip it to get f(n,k,1-p)

#

the caveat being that, depending on your implementation of the binomial coefficient, it is cheaper to stick to low values of k. then the first half of f(n,k,p) gives you the second half of f(n,k,1-p) for cheap, and in the reverse direction as well

#

@mild dirge

iron basalt Jul 27, 2022, 4:11 PM

#

Lookup tables can be slower on modern machines. Simply because you are loading in more memory into the CPU's cache, which kicks out other stuff. CPU is so fast it can do stuff like f(...) really fast. But you should always time it yourself to see, because f(...) may be complex enough to warrant it.

#

(Memory speed is the bottleneck on modern machines (so you won't find lookup tables as common as before (still can show up though, just not as obvious as a decision to make as on old machines where they were required)))

mild dirge Jul 27, 2022, 4:13 PM

#

Hmm

iron basalt Jul 27, 2022, 4:13 PM

#

(Or by-hand (very slow, want to avoid work))

mild dirge Jul 27, 2022, 4:13 PM

#

So you calculate the first half of both f(n, k, p) and f(n, k, 1-p) and then you have both distributions fully

#

is that the idea?

#

Because the first half is cheaper to compute

iron basalt Jul 27, 2022, 4:13 PM

#

Cheaper to compute than the whole.

#

It's just avoiding repeat work.

#

You can even have Python do that caching for you with the cache tools built in.

#

Just need to annotate f.

#

https://docs.python.org/3/library/functools.html?highlight=cache#functools.cache
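
A minimal sketch of that, assuming Python 3.9+ (where `functools.cache` was added) and using f(k, n, p) as a stand-in for the pmf from this thread:

```python
from functools import cache
from math import comb


@cache
def f(k, n, p):
    """Binomial pmf; results are memoized per (k, n, p) argument tuple."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


first = f(15, 20, 0.4)   # computed
second = f(15, 20, 0.4)  # served from the cache, no recomputation

print(f.cache_info())
```

Note the cache only recognises identical arguments. It does not know about the symmetry, so f(5, 20, 0.6) would still be computed separately even though it equals f(15, 20, 0.4) mathematically.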

wooden sail Jul 27, 2022, 4:15 PM

#

mild dirge Because the first half is cheaper to compute

that's about right, yea

mild dirge Jul 27, 2022, 4:16 PM

#

Alright cool

wooden sail Jul 27, 2022, 4:19 PM

#

as a sidenote, scipy has several ways of computing the binomial coefficient. binom is the fastest afaik, but also loses precision very quickly. my guess is it uses the gamma function instead of factorials. you can pick your poison between speed and precision

red timber Jul 27, 2022, 4:25 PM

#

Hey everyone 😊 I had a general question to ask the group.
It seems the AI/machine learning/NLP world is pretty much reserved for those with degrees (not a complaint, just a finding from my humble investigation) so as a person on the self taught road, can anyone point to a small role I could play on the team? I don’t have delusions of grandeurs and I don’t assume I would be able to compete with PhD holders, however there must be like “the boring stuff” you super smarties don’t want to do that someone super entry level on the team could be tasked with… but are even those roles reserved for college graduates? Anywhooo….. just more of a musing…trying to find my place in all this 😊🙏🏾😊🙏🏾😊🙏🏾😊🙏🏾😊

wooden sail Jul 27, 2022, 4:28 PM

#

data entry and cleanup is something everyone needs and no one wants to do, and does not (always) require that much in depth knowledge, but domain-specific knowledge is always involved

steady basalt Jul 27, 2022, 4:37 PM

#

Well gentlemen I choked hard on my interview

#

They had me returning functions which were pandas manipulations and I totally fucked it

#

It was expected output style

#

Ngl it was insanely easy too, just choked

red timber Jul 27, 2022, 4:40 PM

#

wooden sail data entry and cleanup is something everyone needs and no one wants to do, and d...

Thank you 👍🏾 and I appreciate the honest reply

wooden sail Jul 27, 2022, 4:41 PM

#

also, if you're not aiming for a research position, you may be able to find your way with a nice "portfolio", as the kids call it nowadays, but i have no experience with the non-research end of this stuff

spare briar Jul 27, 2022, 4:43 PM

#

red timber Hey everyone 😊 I had a general question to ask the group. It seems the AI/mach...

There is a path in machine learning that is more engineering focused called machine learning engineer or MLE
These are also very competitive but can be more accepting of less formal education than data science or research scientist roles

iron basalt Jul 27, 2022, 4:44 PM

#

wooden sail also, if you're not aiming for a research position, you may be able to find your...

Even for research if you can show that you can do stuff like implement papers and that you can do so quickly, you can out compete PhD holders as they often request higher pay, but may not actually give more value. THIS DEPENDS HIGHLY ON THE COMPANY. If it's a large company it may just do the lazy thing of filtering by degree when hiring.

wooden sail Jul 27, 2022, 4:44 PM

#

i don't know any place that would hire a non phd for a research position in europe

#

the market seems to work very differently in america, so ymmv

iron basalt Jul 27, 2022, 4:46 PM

#

I know of one company in Europe that will hire for research without a degree. If it's an institute then probably not, because they love prestige.

spare briar Jul 27, 2022, 4:47 PM

#

FAANG is willing to hire ML research people without PhDs, but not entry level, usually you look at these peoples resumes and say "yeah well that makes sense"

#

they were 4 years MLE at openai and got on papers and transition into research type of people

wooden sail Jul 27, 2022, 4:48 PM

#

the thing is that institutes work at a lower level, not really at "production" level. what you mentioned of implementing papers, for instance, is like the bare minimum expected of everyone. from what i've seen, they wanna see you write papers, write proposals, lead projects, and so on

#

so more on the "you make the new state of the art" line

iron basalt Jul 27, 2022, 4:48 PM

#

You definitely need to stand out and show that you can provide value without relying on a degree to make up for that.

wooden sail Jul 27, 2022, 4:49 PM

#

the phd title is supposed to show you are currently the state of the art at something. ofc that's not always the case in practice, but that is the intent. having published brand new results with theory, proof, and verification

iron basalt Jul 27, 2022, 4:51 PM

#

In the case of the European company I know of, they were doing SOTA with many non-degree holders. But each member really shows that they can do it, e.g. by having their own papers, blogs, books, videos, etc.

spare briar Jul 27, 2022, 4:51 PM

#

you either need a degree or you need experience that obviously makes up for the lack of a degree (and anyone who looks at your resume would agree that your experience is as valuable as someone's masters)

You can't just skip the learning, you need to actually be as knowledgeable as the PhDs and be able to prove this in your projects or published papers

wooden sail Jul 27, 2022, 4:51 PM

#

that sounds about right

iron basalt Jul 27, 2022, 4:52 PM

#

spare briar you either need a degree or you need experience that obviously makes up for the ...

Yeah, and it's about showing that you provide value in the end, that is why you are hired. So whether that is your math skills or simply being good at data cleanup and willing to do it.

#

(And presentation is key for that, have some github repos / projects, papers, books, videos, blogs)

#

(Or simply be well known in the community)

spare briar Jul 27, 2022, 4:54 PM

#

I have a research scientist role without a PhD - The key was I spent time at a research institute and published papers that would be enough for a PhD thesis + strong engineering and open source projects

iron basalt Jul 27, 2022, 4:55 PM

#

(But even with all the effort, some large companies will still just be lazy and filter by degree, and that is the true advantage of a degree if you are just looking for any work (you don't get filtered out blindly))

red timber Jul 27, 2022, 4:58 PM

#

Fascinating points!!!! 🙏🏾 thank you. If I may…I’d like to summarize a bit of the vibe I’m getting….
I am hearing that I basically need to specialize in something. In doing so, I can create specific applicable experience that would perhaps maybe with some luck outweigh my lack of a degree?
To quote Dumb and Dumber, ironically a favorite movie of mine… “ so you’re saying there’s a chance?”
😊😊😊😊😊

iron basalt Jul 27, 2022, 5:07 PM

#

red timber Fascinating points!!!! 🙏🏾 thank you. If I may…I’d like to summarize a bit of t...

Yes, but there is no easy path. No shortcut. As spare briar wrote, you can't skip the learning, even if you just want to do the simpler tasks for them.

steady basalt Jul 27, 2022, 5:38 PM

#

red timber Hey everyone 😊 I had a general question to ask the group. It seems the AI/mach...

u dont need a phd to do any of those

#

a masters would suffice, but even still not 100% required

#

im sure in america if ur from there theres plenty of opportunity to do this without even a degree at all, but over here 2/3rds of roles are 'masters or phd minimum' in the description for ds/ml

red timber Jul 27, 2022, 5:50 PM

#

steady basalt im sure in america if ur from there theres plenty of oppertunity to do this with...

Thanks for the honesty. 👍🏾

steady basalt Jul 27, 2022, 5:51 PM

#

unless of course u want to be a researcher, and yes thats pretty much phd only

#

id 100% say college is worth it for this field

#

feels bad man

#

it is what it is

#

basically i cud do all that stuff in my sleep, group, sort, make a df, but in their IDE it was weird af, it had an expected output table i had to mirror and i wasnt used to writing pandas inside a function and i just flopped, cudnt read the error output either in the ide

#

its kinda dumb, because on the job youd a) not need to do such manipulations often and b) when you do, you do it your own way + have more time than 30 mins

#

not really indicative of ability imo

serene scaffold Jul 27, 2022, 6:05 PM

#

red timber Hey everyone 😊 I had a general question to ask the group. It seems the AI/mach...

If you don't have a degree, your best bet would be to work for a startup. Established companies will usually immediately ignore applications for NLP positions where the applicant doesn't have a relevant degree.

red timber Jul 27, 2022, 6:14 PM

#

serene scaffold If you don't have a degree, your best bet would be to work for a startup. Establ...

Yea awesome! someone else also mentioned a startup. Thank you for the reply 🙏🏾😊

steady basalt Jul 27, 2022, 6:30 PM

#

if ur rly good and can prove that somehow i doubt anything will stand in ur way if u get xp at a startup and show ur good to large companies

#

the only issue i see is that they sometimes autofilter people out without degrees, but with 5+yoe u shud be ok

tropic matrix Jul 27, 2022, 6:37 PM

#

in pandas is there any way to generate a dataframe from a list of csv lines?
i.e.:

[
  '1,2,3',
  '4,5,6'
]

pd.DataFrame(csv_list, columns=['a', 'b', 'c'])

a | b | c
1   2   3
4   5   6

#

(pseudocode)

steady basalt Jul 27, 2022, 6:40 PM

#

could u write a loop to read them all in as dfs and then concat them if thats what u wanted

#

oh csv lines not files

tropic matrix Jul 27, 2022, 6:40 PM

#

yeah

#

@steady basalt any idea?

steady basalt Jul 27, 2022, 6:46 PM

#

id read the lines u want into python first

tropic matrix Jul 27, 2022, 6:49 PM

#

steady basalt id read the lines u want into python first

can you base it off my example?

#

is there any way to go from a list of strings that are in csv format to a dataframe?

steady basalt Jul 27, 2022, 6:50 PM

#

not sure what that means

#

you want each string to be a row?

#

or a entry

tropic matrix Jul 27, 2022, 6:53 PM

#

steady basalt you want each string to be a row?

that is correct

steady basalt Jul 27, 2022, 6:54 PM

#

yeah u cud use pure python

#

['col1,col2','val1a,val1b']

tropic matrix Jul 27, 2022, 6:55 PM

#

oh wait

steady basalt Jul 27, 2022, 6:55 PM

#

try read csv with a sep

tropic matrix Jul 27, 2022, 6:56 PM

#

no wait

#

read_csv doesn't accept string input

#

it has to be a file or buffer

#

and instead of converting it to a buffer

#

i could just use str.split(',') on each item in the list

steady basalt Jul 27, 2022, 6:56 PM

#

convert it out of string format i guess

tropic matrix Jul 27, 2022, 6:56 PM

#

and read it as a list

steady basalt Jul 27, 2022, 6:56 PM

#

yep

ripe forge Jul 27, 2022, 6:56 PM

#

that sounds like a lot of unnecessary work

tropic matrix Jul 27, 2022, 6:57 PM

#

what do you suggest?

steady basalt Jul 27, 2022, 6:57 PM

#

ripe forge that sounds like a lot of unnecessary work

its literally 0 work its 1 line

ripe forge Jul 27, 2022, 6:57 PM

#

dont try parsing the csv yourself, that can be error prone too. better to let pandas handle it. my suggestion is: join it into a single string and wrap it in a StringIO, which pandas will happily accept

tropic matrix Jul 27, 2022, 6:57 PM

#

^

steady basalt Jul 27, 2022, 6:57 PM

#

i mean use whatever method works for u

ripe forge Jul 27, 2022, 6:58 PM

#

it's a very elegant way to essentially have strings be treated like "files", and then be fed into anything that accepts files
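
A minimal sketch of the StringIO suggestion, using the two example lines and column names from the question. `header=None` is an assumption that the list holds data rows only; drop it (and `names`) if the first string is a header row.

```python
from io import StringIO

import pandas as pd

csv_list = ['1,2,3', '4,5,6']

# Join the lines into one CSV document and wrap it in a file-like buffer.
buffer = StringIO("\n".join(csv_list))

# pandas does the parsing, including quoting rules and dtype inference.
df = pd.read_csv(buffer, header=None, names=['a', 'b', 'c'])
print(df)
```

Compared with `str.split(',')` on each item, this keeps pandas in charge of quoted fields and numeric conversion, so the columns come out as integers rather than strings.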

steady basalt Jul 27, 2022, 6:58 PM

#

but he wants 1 string to be an entire row with columns

ripe forge Jul 27, 2022, 7:00 PM

#

what do you mean? pandas has no issues parsing csvs with column headers if that's the concern

steady basalt Jul 27, 2022, 7:00 PM

#

sounds like his 'csv' looks like a bunch of single string lines

tropic matrix Jul 27, 2022, 7:07 PM

#

ripe forge dont try parsing it into a csv yourself, that can be error prone too. better to ...

this worked, and is probably going to be easier to avoid issues, thank you

ripe forge Jul 27, 2022, 7:08 PM

#

no worries!

brazen oracle Jul 27, 2022, 7:18 PM

#

Can anyone recommend a type of learning roadmap for what steps/courses I should take to self-teach machine learning and artificial intelligence? I'm currently a python beginner. I know the fundamentals and nothing else. Looking for my next step other than just practicing projects on leetcode.

steady basalt Jul 27, 2022, 7:25 PM

#

practise more

#

then learn pandas

#

set a project maybe

thick marlin Jul 27, 2022, 7:29 PM

#

Hello, I'm having trouble getting the right version of pytorch and cuda. The system has 2 versions of cuda 11.1-gcc-9.1.0 and cuda 11.2, that can be loaded as required. I have tried both, however torch.cuda.is_available() is always False.
Details of the system are as follows:

python=3.8.6
pytorch = 1.10.1+cu111
cudnn = 8005
torchvision=0.11.2+cu111 
torchaudio=0.10.1
cudatoolkit=11.3.1

OS Details:
Operating System: Red Hat Enterprise Linux Server 7.9 (Maipo)
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)

I have also tried the latest pytorch version with cudatoolkit=11.3 from the https://pytorch.org/get-started/locally/


serene scaffold Jul 27, 2022, 7:31 PM

#

thick marlin Hello, I'm having trouble getting the right version of pytorch and cuda. The sys...

try also saying what OS

#

also, make sure you're using the same python environment where you have those dependencies installed.

#

if you're using conda, I can't help you beyond that.

thick marlin Jul 27, 2022, 7:34 PM

#

serene scaffold also, make sure you're using the same python environment where you have those de...

Updated the os details, yes used conda to install the dependencies

wooden sail Jul 27, 2022, 7:45 PM

#

how are you doing the loading? something like purge modules and then module load cuda/v.xxx?

#

going by the info you put there, it seems you need cudatoolkit=11.2 or 11.1 though

thick marlin Jul 27, 2022, 8:03 PM

#

wooden sail how are you doing the loading? something like purge modules and then module load...

Yes module load cuda/v.xxx

wooden sail Jul 27, 2022, 8:04 PM

#

since you put cu111, i'm under the impression you'd need the module cuda/v11.1

thick marlin Jul 27, 2022, 8:06 PM

#

there is cuda 11.1-gcc-9.1.0

wooden sail Jul 27, 2022, 8:07 PM

#

i think the cudatoolkit also needs to match that

thick marlin Jul 27, 2022, 8:08 PM

#

pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/torch_stable.html

#

this was the last command I used from pytorch

#

I'll try to see If i can match the toolkit to 11.1

wooden sail Jul 27, 2022, 8:12 PM

#

also, are the nvidia drivers installed?

thick marlin Jul 27, 2022, 8:14 PM

#

I'm assuming so, the machine is a shared hpc machine. nvidia-smi is not available. i'm not sure how else I can check. nvcc returns the following result

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

#

Downgraded the cudatoolkit to 11.1.1, still the same result

wooden sail Jul 27, 2022, 8:17 PM

#

hmm and it's a single hpc machine? but are the computations handled in VMs or possibly other nodes? or it's all done directly on this device, in this same session?

thick marlin Jul 27, 2022, 8:21 PM

#

It's JADE-2, "Applications can only be run on the compute nodes by submitting jobs to the Slurm batch queuing system"

wooden sail Jul 27, 2022, 8:21 PM

#

aha

#

it's very likely that the node you are currently in doesn't allow you to run stuff on gpu. you would have to print (actually log) from inside something you submit to slurm. is that what you're doing, or are you checking this on the log-in node?
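
One way to do that check, as a rough sketch: a small script that logs what PyTorch sees on whichever node runs it, meant to be launched from inside a job submitted to Slurm rather than on the log-in node. The function name `cuda_report` is hypothetical; `SLURM_JOB_ID` is the environment variable Slurm sets inside a job, and the `torch` calls are only made if the package is importable.

```python
import importlib.util
import logging
import os
import socket

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("cuda_check")


def cuda_report():
    """Collect facts about the node this process is running on."""
    report = {
        "host": socket.gethostname(),
        "slurm_job_id": os.environ.get("SLURM_JOB_ID"),  # None outside a Slurm job
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        import torch

        report["torch_version"] = torch.__version__
        report["torch_built_for_cuda"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
        report["gpu_count"] = torch.cuda.device_count()
    return report


report = cuda_report()
for key, value in report.items():
    log.info("%s = %s", key, value)
```

If `cuda_available` is False on the log-in node but True in the job's log, the installation is fine and the log-in node simply has no usable GPU.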

thick marlin Jul 27, 2022, 8:22 PM

#

I see, I'm trying to install NVIDIA's Imaginaire library

#

Using the instructions for Conda here https://github.com/NVlabs/imaginaire/blob/master/INSTALL.md


#

I'm stuck at this step

# install third-party libraries
export CUDA_VERSION=$(nvcc --version| grep -Po "(\d+\.)+\d+" | head -1)
CURRENT=$(pwd)
for p in correlation channelnorm resample2d bias_act upfirdn2d; do
    cd imaginaire/third_party/${p};
    rm -rf build dist *info;
    python setup.py install;
    cd ${CURRENT};
done

#

Before I was getting cuda not found errors and stuff

#

This is the current error

arctic wedgeBOT Jul 27, 2022, 8:25 PM

#

Hey @thick marlin!

You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.

thick marlin Jul 27, 2022, 8:26 PM

#

This is the error now
https://paste.pythondiscord.com/izewitenen

wooden sail Jul 27, 2022, 8:29 PM

#

it still looks like a mismatch of versions

thick marlin Jul 27, 2022, 8:31 PM

#

Is there any way to further debug this?

wooden sail Jul 27, 2022, 8:31 PM

#

what's at the end of the traceback? it still keeps going right?

thick marlin Jul 27, 2022, 8:32 PM

#

TypeError: expected string or bytes-like object This is the last line

#

after what I have in pastebin

wooden sail Jul 27, 2022, 8:32 PM

#

i can't find where setup py is in this repo

thick marlin Jul 27, 2022, 8:33 PM

#

imaginaire/third_party/correlation

wooden sail Jul 27, 2022, 8:36 PM

#

hmm, sadly i haven't the slightest

thick marlin Jul 27, 2022, 8:38 PM

#

Where would be the best place to ask regarding this on this server? available channels or unix?

#

Regardless, Thank you for your help!

wooden sail Jul 27, 2022, 8:38 PM

#

give it a shot down in unix first, i guess

urban oriole Jul 27, 2022, 9:09 PM

#

back to struggling with np.where 😭

#

wooden sail Jul 27, 2022, 9:09 PM

#

what are you trying to do?

urban oriole Jul 27, 2022, 9:09 PM

#

```py
                                           df_delinquent['CurrStatus'] == 'Current', np.where(df_delinquent['Beginning AR Balance'],
                                            df_delinquent['CurrStatus'] == 'Past'))
```

#

if CurrStatus equal current subtract ending and beginning Balance,
if past then keep just the beginning balance

#

@wooden sail

wooden sail Jul 27, 2022, 9:11 PM

#

i don't understand what's going on there, you nested np.wheres?

urban oriole Jul 27, 2022, 9:11 PM

#

yes its a nested np.where

wooden sail Jul 27, 2022, 9:12 PM

#

but the second only has 2 params instead of 3 or 1

urban oriole Jul 27, 2022, 9:12 PM

#

how can I have 3

wooden sail Jul 27, 2022, 9:12 PM

#

what do you want it to return if the condition is not met?

urban oriole Jul 27, 2022, 9:12 PM

#

im thinking of it like a decision tree

#

so like if current - > do subtraction
if past -> stay as is

#

there's only two conditions Current and Past for CurrStatus

wooden sail Jul 27, 2022, 9:13 PM

#

i don't think this is doing what you think it is

#

the first parameter is a condition. you're using as condition ending balance - beginning balance. this will probably return true, since nonzero numbers are treated as true

#

In [15]: if(-1.5):
    ...:     print('boop')
    ...: 
boop

#

then you're saying, if the condition is true (and it probably will be, unless ending balance == beginning balance), then assign the values currstatus == current. this will be a series of booleans

#

if it's false, it will go into another np.where. the condition for this second one is beginning balance, which is almost certainly going to be True as well. if this is true, then the return value will be a series of booleans corresponding to currstatus == past. if it is not true, it should return something else. you didn't say what, though

urban oriole Jul 27, 2022, 9:17 PM

#

so you're saying the reason its not working is because there is no condition if both are false?

#

should i just add like a np.nan at the end of it then

wooden sail Jul 27, 2022, 9:18 PM

#

that is what the error message told you, yes

#

but also the logic is wrong

lavish kraken Jul 27, 2022, 9:18 PM

#

Hi guys

#

Glad to be added here

urban oriole Jul 27, 2022, 9:18 PM

#

wooden sail but also the logic is wrong

bc its spitting out wrong dtypes?

lavish kraken Jul 27, 2022, 9:19 PM

#

I need a quick help on a Small task with Python

wooden sail Jul 27, 2022, 9:19 PM

#

because you mixed up the conditions with the return vals

#

look at this MWE ```py
In [16]: truevals = ['beep','boop','blargh']

In [17]: falsevals = ['cat', 'dog', 'python']

In [18]: condition = [True,False,True]

In [19]: import numpy as np

In [20]: np.where(condition, truevals, falsevals)
Out[20]: array(['beep', 'dog', 'blargh'], dtype='<U6')
```

lavish kraken Jul 27, 2022, 9:19 PM

#

#

Having issues trying to convert this JSON data to a pandas DataFrame

wooden sail Jul 27, 2022, 9:19 PM

#

if condition is true for index k, then we get the kth element of truevals. otherwise, we get the kth element of falsevals.

serene scaffold Jul 27, 2022, 9:20 PM

#

lavish kraken

try opening Discord on that computer, so that you can copy and paste stuff. taking pictures of your screen isn't the way to go.

lavish kraken Jul 27, 2022, 9:20 PM

#

Okay

#

Let me do that

serene scaffold Jul 27, 2022, 9:21 PM

#

there also isn't anything in that photograph to indicate that you're passing the JSON to pandas.

urban oriole Jul 27, 2022, 9:21 PM

#

wooden sail if condition is true for index k, then we get the kth element of truevals. other...

lol at this point im just gonna use .loc

wooden sail Jul 27, 2022, 9:24 PM

#

urban oriole lol at this point im just gonna use .loc

this is more like what you're trying to do, just with numpy arrays instead of dataframe columns

In [25]: status = np.array(['current','boop','blargh'])

In [26]: endmoolah = np.array([500, 100, 300])

In [27]: startbucks = np.array([10,300,20])

In [28]: np.where(status == 'current', startbucks, endmoolah-startbucks)
Out[28]: array([  10, -200,  280])
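
The same pattern on DataFrame columns, as a sketch following the rule stated earlier (Current: ending minus beginning; Past: keep the beginning balance). The column name `Ending AR Balance`, the output column names, and the sample rows are assumptions for illustration; only `CurrStatus` and `Beginning AR Balance` appear in the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real df_delinquent.
df_delinquent = pd.DataFrame({
    "CurrStatus": ["Current", "Past", "Current"],
    "Beginning AR Balance": [10.0, 300.0, 20.0],
    "Ending AR Balance": [500.0, 100.0, 300.0],
})

beginning = df_delinquent["Beginning AR Balance"]
ending = df_delinquent["Ending AR Balance"]
is_current = df_delinquent["CurrStatus"] == "Current"  # boolean condition

# np.where(condition, value_if_true, value_if_false), element by element.
df_delinquent["Balance"] = np.where(is_current, ending - beginning, beginning)

# Equivalent .loc version: start from the "Past" value, overwrite "Current" rows.
df_delinquent["Balance_loc"] = beginning
df_delinquent.loc[is_current, "Balance_loc"] = ending - beginning
```

With only two statuses, a single `np.where` is enough; nesting is only needed when there is a third case to handle.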

urban oriole Jul 27, 2022, 9:24 PM

#

idky but ifelse statements arent supposed to be this complicated

wooden sail Jul 27, 2022, 9:24 PM

#

they aren't, but i'm sad to report you are using np.where entirely wrong

urban oriole Jul 27, 2022, 9:25 PM

#

thanks but I just did it using .loc instead

wooden sail Jul 27, 2022, 9:29 PM

#

i think it's in your best interest to not share your api key

lavish kraken Jul 27, 2022, 9:29 PM

#

I am trying to convert that JSON data to a DataFrame so I can run some analysis, but I am trying to figure out how to get the columns of the product reviews for an Amazon product, like date, name, title, review.

#

import requests

url = "https://amazon-product-reviews-keywords.p.rapidapi.com/product/reviews"

querystring = {"asin": "B091HQNRRD", "country": "GB", "variants": "1", "top": "0"}  # B091HQNRRD

headers = {
    "X-RapidAPI-Key": "api",  # placeholder, keep the real key private
    "X-RapidAPI-Host": "amazon-product-reviews-keywords.p.rapidapi.com",
}

response = requests.request("GET", url, headers=headers, params=querystring)

print(response.text)
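
A hedged sketch of the JSON-to-DataFrame step. The shape of the API response is not shown in the thread, so the `reviews` key and the field names below are assumptions based on the columns asked for (date, name, title, review), and `sample_payload` is made-up stand-in data. With a real response, `response.json()` would replace it.

```python
import pandas as pd

# Hypothetical payload; the real API may nest or name things differently.
sample_payload = {
    "reviews": [
        {"date": "2022-07-01", "name": "A. Reader", "title": "Good", "review": "Works well."},
        {"date": "2022-07-03", "name": "B. Buyer", "title": "Okay", "review": "Arrived late."},
    ]
}

# One row per review; nested dicts (if any) are flattened into dotted column names.
reviews_df = pd.json_normalize(sample_payload["reviews"])
reviews_df = reviews_df[["date", "name", "title", "review"]]
print(reviews_df)
```

Printing `response.json().keys()` first is a quick way to find which key actually holds the list of reviews.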

fiery dust Jul 27, 2022, 9:41 PM

#

what are the different ways to code AI? one is TF, but there are other ways right?

steady basalt Jul 27, 2022, 9:41 PM

#

fiery dust what are the different ways to code AI? one is TF, but there are other ways righ...

pytorch

fiery dust Jul 27, 2022, 9:42 PM

#

and do they have different utilities or they are the same

steady basalt Jul 27, 2022, 9:42 PM

#

its better than tf

#

completely different

fiery dust Jul 27, 2022, 9:42 PM

#

and its better than tf?

steady basalt Jul 27, 2022, 9:42 PM

#

yep, much cleaner and faster

#

its research standard. industry only favours tf cause its easier to code

fiery dust Jul 27, 2022, 9:43 PM

#

tf is more famous though right?

steady basalt Jul 27, 2022, 9:43 PM

#

'easier' is wrong term imo, its harder due to bugs

#

no its not more famous

#

im still learning pytorch. but already feels so much nicer

fiery dust Jul 27, 2022, 9:44 PM

#

so you think I could build an algo with pytorch? specifically about financial markets

steady basalt Jul 27, 2022, 9:44 PM

#

what does algo mean?

#

its not exactly... algorithmic

#

its a black box

fiery dust Jul 27, 2022, 9:46 PM

#

I mean

steady basalt Jul 27, 2022, 9:46 PM

#

u mean lstm to predict stocks?

fiery dust Jul 27, 2022, 9:46 PM

#

I just want to create a bot that with certain financial indicators learns to identify possible moves, to the upside or downside

steady basalt Jul 27, 2022, 9:46 PM

#

ive used keras to do that before

#

not with stocks tho

fiery dust Jul 27, 2022, 9:47 PM

#

steady basalt u mean lstm to predict stocks?

I want to use it as a recommendations

#

but yeah to make it simpler, lets say to "predict"

steady basalt Jul 27, 2022, 9:47 PM

#

yep u build a function to make a lagging window

#

and use that data to predict the next window
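
A rough sketch of such a lagging-window function with NumPy. The name `make_windows` and the `lookback`/`horizon` parameters are hypothetical; each input row holds `lookback` past values and the matching target holds the next `horizon` values.

```python
import numpy as np


def make_windows(series, lookback, horizon=1):
    """Split a 1-D series into (past window, next window) training pairs."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this lookback and horizon")
    # X[i] is the window ending just before position i + lookback.
    X = np.stack([series[i:i + lookback] for i in range(n_samples)])
    # y[i] is the window that immediately follows X[i].
    y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n_samples)])
    return X, y


X, y = make_windows(range(6), lookback=3, horizon=1)
print(X.shape, y.shape)
```

For an LSTM in Keras or PyTorch, `X` would typically be reshaped to add a trailing feature dimension before training.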

fiery dust Jul 27, 2022, 9:48 PM

#

Ummm

lavish kraken Jul 27, 2022, 9:48 PM

#

Hmmmmph seem like I am in a wrong thread..

fiery dust Jul 27, 2022, 9:49 PM

#

As I understood, machine learning is trying different things and dropping those who have bad results, and keeping the one with the best results. then based on that bot with the best results, try new ways and drop the bad ones

#

until you have an accurate bot?

#

right?

steady basalt Jul 27, 2022, 9:50 PM

#

No

fiery dust Jul 27, 2022, 9:50 PM

#

oof

#

then the video I saw was way too inaccurate

steady basalt Jul 27, 2022, 9:50 PM

#

well yeah youre kinda describing ab testing models but theyre not bots

#

i mean, lets say you try to predict some things with one particular type of predictive algorithm, youd keep that if it performs well compared to others, sure