cedar sun Dec 3, 2020, 2:13 AM

#

no rlly? i though dimensions were the same......

#

¬¬

velvet thorn Dec 3, 2020, 2:15 AM

#

@cedar sun you need the number of channels

#

even if there’s only 1

#

so (64, 64, 1)

cedar sun Dec 3, 2020, 2:15 AM

#

yeah, i wrote that

#

and got error too

#

q.q

cedar sun Dec 3, 2020, 2:24 AM

#

velvet thorn <@534837275234795530> you need the number of channels

ValueError: Error when checking input: expected conv2d_1_input to have 4 dimensions, but got array with shape (8656, 64, 64)

#

model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(*dimension, 1)))```

#

Now if i do 1, *dimension, 1 it sais my dimension is 5

#

and expected 4

#

so i dont understand

trim oar Dec 3, 2020, 2:33 AM

#

Have you tried PCA instead?

#

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
If you just need to build a model / your features are many relative to your data points, you could try PCA if you don't need to explain "why". PCA basically finds the direction that explains the most variance for all your dimensions(features), which basically removes the multicollinearity.

#

Oh

#

I mean inherently fitted values / residuals don't explain "missing data" but about inherent patterns that's not explained in your model

#

I mean if the problem is not your dataset, then you probably have to change a model. I'm not that great with statistics, but without knowing what data you're working with and why you chose the model you did, it's hard to say already.

#

A possible reason tho

#

You're fitting a regression into a classification problem

#

So you may hvae a multiclass feature that's not encoded. My wild guess

velvet thorn Dec 3, 2020, 2:51 AM

#

cedar sun Now if i do ``1, *dimension, 1`` it sais my dimension is 5

include channels, but not batch size

cedar sun Dec 3, 2020, 2:51 AM

#

what what what?

#

batch size is... 128

velvet thorn Dec 3, 2020, 2:52 AM

#

yes

#

I'm saying

#

show your input definition

cedar sun Dec 3, 2020, 2:52 AM

#

xD

#

i think u dont want

#

but

#

train_data = np.load('train_data.npy')

velvet thorn Dec 3, 2020, 2:53 AM

#

huh?

#

no, the model input

cedar sun Dec 3, 2020, 2:53 AM

#

train_data.append(new_array / 255)

velvet thorn Dec 3, 2020, 2:53 AM

#

no...

cedar sun Dec 3, 2020, 2:53 AM

#

model summary?

#


model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(*dimension, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_label), activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

history = model.fit(train_data, train_label,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=(valid_data, valid_label),
                    verbose=1)```

#

this?

velvet thorn Dec 3, 2020, 2:53 AM

#

yes

#

what is dimension?

cedar sun Dec 3, 2020, 2:54 AM

#

(64, 64)

velvet thorn Dec 3, 2020, 2:54 AM

#

should be fine

#

what error did you get

cedar sun Dec 3, 2020, 2:54 AM

#

ValueError: Error when checking input: expected conv2d_1_input to have 4 dimensions, but got array with shape (8656, 64, 64)

#

len(train_data) = 8656

velvet thorn Dec 3, 2020, 2:54 AM

#

ah

#

that is

#

a problem with your data

#

use this

#

train_data[..., np.newaxis]

#

your data needs to have a channels axis too

cedar sun Dec 3, 2020, 2:55 AM

#

wait why? i actually have 8656 images of 64x64 pixels each one

#

where is the problem with my data?

velvet thorn Dec 3, 2020, 2:55 AM

#

it needs to have shape (batch_size, height, width, channels)

velvet thorn Dec 3, 2020, 2:55 AM

#

cedar sun where is the problem with my data?

your data does not have a channels axis

#

which it should

#

do the same thing for the validation data

cedar sun Dec 3, 2020, 2:56 AM

#

model.fit(train_data, train_label,

for

model.fit(train_data[..., np.newaxis], train_label[..., np.newaxis],

#

?

velvet thorn Dec 3, 2020, 2:56 AM

#

no

#

you don't modify the labels...

cedar sun Dec 3, 2020, 2:56 AM

#

oh

#

sorry

velvet thorn Dec 3, 2020, 2:56 AM

#

do you understand

cedar sun Dec 3, 2020, 2:56 AM

#

ahhaah

#

ok ok

velvet thorn Dec 3, 2020, 2:56 AM

#

what the problem is?

#

I can explain it again

#

if you don't

cedar sun Dec 3, 2020, 2:56 AM

#

history = model.fit(train_data[..., np.newaxis], train_label,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_data=(valid_data[..., np.newaxis], valid_label),
                    verbose=1)```

#

sorry i dont understand what does the nn expect to recieve

velvet thorn Dec 3, 2020, 2:57 AM

#

okay

#

so

#

each image

#

must be 3D

#

(height, width, channels)

#

right?

cedar sun Dec 3, 2020, 2:57 AM

#

yes

velvet thorn Dec 3, 2020, 2:57 AM

#

so the thing is

#

even if your image is greyscale

#

you must still have a channels axis

#

just that that axis is of size 1

#

if you have RGB, size 3

#

RGBA, size 4

#

etc.

#

got that?

cedar sun Dec 3, 2020, 2:58 AM

#

wait

#

one sec, before i read u

#

print(train_data[0].shape)

#

returns (64,64)

velvet thorn Dec 3, 2020, 2:58 AM

#

yeah

#

so?

cedar sun Dec 3, 2020, 2:59 AM

#

u said i need a channel dimension, which isnt there

velvet thorn Dec 3, 2020, 2:59 AM

#

yes

#

that's why I told you to add [..., np.newaxis]

#

which adds a channel axis

cedar sun Dec 3, 2020, 2:59 AM

#

i though i was adding the channels here

#

model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(*dimension, 1)))```

#

on input_shape

velvet thorn Dec 3, 2020, 2:59 AM

#

no

#

okay

#

you need to distinguish between

#

your model definition and your data shape

velvet thorn Dec 3, 2020, 2:59 AM

#

cedar sun i though i was adding the channels here

this is telling the model what shape to expect

cedar sun Dec 3, 2020, 3:00 AM

#

a

velvet thorn Dec 3, 2020, 3:00 AM

#

cedar sun ```py history = model.fit(train_data[..., np.newaxis], train_label, ...

this is changing the data you pass to the model

#

the two must match.

cedar sun Dec 3, 2020, 3:00 AM

#

okey so right now, since my images are 64x64, i only have 2 dimensions

velvet thorn Dec 3, 2020, 3:00 AM

#

velvet thorn this is telling the model *what shape to expect*

this needs to be known to compile the model (in particular, knowing how many params are needed)

velvet thorn Dec 3, 2020, 3:00 AM

#

cedar sun okey so right now, since my images are 64x64, i only have 2 dimensions

yes, but because you're using 2D convolution, each image needs to have 3 axes

#

so you need to add the missing channels axis

cedar sun Dec 3, 2020, 3:00 AM

#

width and height. But i understand nn needs the number of channels. so i take it as 3 dimensions

velvet thorn Dec 3, 2020, 3:00 AM

#

which makes each image a 3D array

#

so

#

the thing is

#

the model doesn't need to know how many images are passed in at once

#

that is the batch size.

#

correct?

cedar sun Dec 3, 2020, 3:01 AM

#

yes

velvet thorn Dec 3, 2020, 3:01 AM

#

it only needs to know the shape of a single image

#

that is why the input_shape you pass in is 3D.

#

now, going back to the call to .fit

cedar sun Dec 3, 2020, 3:01 AM

#

okey

velvet thorn Dec 3, 2020, 3:01 AM

#

it expects you to pass a 4D array

#

of shape (image_count, height, width, channels)

cedar sun Dec 3, 2020, 3:01 AM

#

an array with number of images and dimensions?

velvet thorn Dec 3, 2020, 3:02 AM

#

but your data is currently 3D of shape (image_count, height, width)

cedar sun Dec 3, 2020, 3:02 AM

#

aaaah

#

okey

velvet thorn Dec 3, 2020, 3:02 AM

#

so you, again, need to add another axis to the end

#

got all that?

cedar sun Dec 3, 2020, 3:02 AM

#

to the end or the the beggining?

velvet thorn Dec 3, 2020, 3:02 AM

#

end

#

because

#

channels are the last axis

midnight trench Dec 3, 2020, 3:02 AM

#

hmm

cedar sun Dec 3, 2020, 3:02 AM

#

(height, width, channels, batch_size)?

velvet thorn Dec 3, 2020, 3:03 AM

#

you want to go (image_count, height, width) -> (image_count, height, width, channels)

#

like

cedar sun Dec 3, 2020, 3:03 AM

#

well now i am lost

velvet thorn Dec 3, 2020, 3:03 AM

#

your training data has shape (8656, 64, 64)

#

you want to turn it into (8656, 64, 64, 1)

cedar sun Dec 3, 2020, 3:03 AM

#

aaah

#

ok ok

#

so i need to modify train_data, and add 1 more dimension?

#

okey

velvet thorn Dec 3, 2020, 3:04 AM

#

validation data too

midnight trench Dec 3, 2020, 3:04 AM

#

📎 unknown.png

cedar sun Dec 3, 2020, 3:04 AM

#

why the [..., np.newaxis] didnt work?

midnight trench Dec 3, 2020, 3:04 AM

#

i think it explains it .-.

velvet thorn Dec 3, 2020, 3:05 AM

#

midnight trench i think it explains it .-.

what?

velvet thorn Dec 3, 2020, 3:05 AM

#

cedar sun why the ``[..., np.newaxis]`` didnt work?

what do you mean didn't work

cedar sun Dec 3, 2020, 3:05 AM

#

now dense is wrong owo

#

ValueError: Error when checking target: expected dense_2 to have shape (8656,) but got array with shape (1,)

#

i guess dense_2 is the last layer

midnight trench Dec 3, 2020, 3:05 AM

#

velvet thorn what?

as u can see in that idk why but it prints what is in list too .-.

cedar sun Dec 3, 2020, 3:05 AM

#

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(*dimension, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(len(train_label), activation='softmax'))```

velvet thorn Dec 3, 2020, 3:05 AM

#

cedar sun i guess dense_2 is the last layer

your dense layer is wrong too

#

think about this:

#

this is a classification problem, right?

cedar sun Dec 3, 2020, 3:06 AM

#

yes

velvet thorn Dec 3, 2020, 3:06 AM

#

so how many neurons should your last layer have?

cedar sun Dec 3, 2020, 3:06 AM

#

the same amount as labels i have

velvet thorn Dec 3, 2020, 3:06 AM

#

midnight trench as u can see in that idk why but it prints what is in list too .-.

what?

#

I do not understand

velvet thorn Dec 3, 2020, 3:06 AM

#

cedar sun the same amount as labels i have

are you sure?

midnight trench Dec 3, 2020, 3:06 AM

#

so

velvet thorn Dec 3, 2020, 3:06 AM

#

so think about this

#

say you have two categorie

#

s

#

dog/cat

cedar sun Dec 3, 2020, 3:07 AM

#

well, if i wanna classify car, truck, cycle, cat, and dogs

velvet thorn Dec 3, 2020, 3:07 AM

#

how many neurons?

cedar sun Dec 3, 2020, 3:07 AM

#

i need 1 neuron for each one

velvet thorn Dec 3, 2020, 3:07 AM

#

cedar sun i need 1 neuron for each one

yes

#

but this is not the same as what you said earlier

#

or your code

#

what is len(train_labels)? @cedar sun

cedar sun Dec 3, 2020, 3:07 AM

#

same as the number of images i have...

#

uggg

#

how can i get the number of labels?

velvet thorn Dec 3, 2020, 3:08 AM

#

cedar sun how can i get the number of labels?

🙂

cedar sun Dec 3, 2020, 3:08 AM

#

without hardcoding it

#

i know i have 151

#

but from where can i take it?

velvet thorn Dec 3, 2020, 3:08 AM

#

cedar sun but from where can i take it?

try to figure it out yourself

#

if you can't get it in like half an hour then ask me again

#

I'll give you a hint

cedar sun Dec 3, 2020, 3:08 AM

#

i would make a set of train_label

#

but meh

velvet thorn Dec 3, 2020, 3:09 AM

#

['dog', 'cat', 'cat', 'dog', 'dog', 'whale', 'cat'] <- how many neurons?

#

and how did you get that number?

cedar sun Dec 3, 2020, 3:09 AM

#

yes but

midnight trench Dec 3, 2020, 3:09 AM

#

list = [['Aashika\n', 'Anjila\n', 'Anushka\n', 'Arahat\n', 'Bian\n', 'Bijay\n', 'Crisma\n', 'Deepika\n', 'Dinesh \n', 'Diya\n', 'Habib\n', 'Hanery\n', 'Hishi\n', 'Kusum\n', 'Milan\n', 'Phurba\n', 'Pramik\n', 'Rijan\n', 'Rikesh\n', 'Riya\n', 'Rohit\n', 'Sangeet\n', 'Sanjita\n', 'Sushant\n', 'Swornim']]

list1 = [['Sushant\n', 'Aashika\n', 'Anushka\n', 'Anjila\n', 'Arahat\n', 'Bian\n', 'Bijay\n', 'Dinesh\n', 'Diya\n', 'Habib\n', 'Hanery\n', 'Milan\n', 'Hishi\n', 'Rikesh\n', 'Kusum\n', 'Phurba\n', 'Pramik\n', 'Rijan\n', 'Riya\n', 'Rohit\n', 'Sangeet\n', 'Sanjita\n', 'Swornim']]

for i in list1:
    print([x for x in i if i not in list])

output - ['Sushant\n', 'Aashika\n', 'Anushka\n', 'Anjila\n', 'Arahat\n', 'Bian\n', 'Bijay\n', 'Dinesh\n', 'Diya\n', 'Habib\n', 'Hanery\n', 'Milan\n', 'Hishi\n', 'Rikesh\n', 'Kusum\n', 'Phurba\n', 'Pramik\n', 'Rijan\n', 'Riya\n', 'Rohit\n', 'Sangeet\n', 'Sanjita\n', 'Swornim']

aspected - Students not present today while comparing to total students

cedar sun Dec 3, 2020, 3:09 AM

#

okey i guess i need to show u how do i create my data

velvet thorn Dec 3, 2020, 3:09 AM

#

cedar sun okey i guess i need to show u how do i create my data

it doesn't matter

velvet thorn Dec 3, 2020, 3:09 AM

#

midnight trench ```py list = [['Aashika\n', 'Anjila\n', 'Anushka\n', 'Arahat\n', 'Bian\n', 'Bija...

your lists are nested.

cedar sun Dec 3, 2020, 3:09 AM

#

first, tell me if this is correct

midnight trench Dec 3, 2020, 3:10 AM

#

how do i fix it tho

velvet thorn Dec 3, 2020, 3:10 AM

#

midnight trench how do i fix it tho

do you understand what I'm saying?

cedar sun Dec 3, 2020, 3:10 AM

#

train_data and train_label must have the same size?

velvet thorn Dec 3, 2020, 3:10 AM

#

cedar sun ``train_data`` and ``train_label`` must have the same size?

yes

midnight trench Dec 3, 2020, 3:10 AM

#

velvet thorn do you understand what I'm saying?

yea

velvet thorn Dec 3, 2020, 3:10 AM

#

midnight trench yea

then the answer should be clear

cedar sun Dec 3, 2020, 3:10 AM

#

and the label i must be the label associated with image i?

velvet thorn Dec 3, 2020, 3:10 AM

#

cedar sun and the label ``i`` must be the label associated with image ``i``?

yes

midnight trench Dec 3, 2020, 3:11 AM

#

so im reading from a file by [open("file", 'r').readlines()] will removing that [] will fix and readlines will be converted into normal list?

velvet thorn Dec 3, 2020, 3:11 AM

#

midnight trench so im reading from a file by [open("file", 'r').readlines()] will removing that ...

try it.

midnight trench Dec 3, 2020, 3:12 AM

#

velvet thorn try it.

idk gm im now getting some random shit

📎 unknown.png

cedar sun Dec 3, 2020, 3:12 AM

#

okey so i tell u what i have without showing code. I downloaded a set of images. Each image is on a folder with the name of the label that image belongs. I loop through all the folders and append the image to train data and the name of the folder to train label. So i dont have the number of classes directly. I should do something like len(os.listdir(path))?

velvet thorn Dec 3, 2020, 3:13 AM

#

cedar sun okey so i tell u what i have without showing code. I downloaded a set of images....

you do because you have the array train_label.

cedar sun Dec 3, 2020, 3:13 AM

#

but the train label has duplicates

velvet thorn Dec 3, 2020, 3:14 AM

#

yes

#

so...

cedar sun Dec 3, 2020, 3:14 AM

#

so the other way is making a set

velvet thorn Dec 3, 2020, 3:14 AM

#

deal with them.

cedar sun Dec 3, 2020, 3:14 AM

#

but is it the way to go?

velvet thorn Dec 3, 2020, 3:14 AM

#

midnight trench idk gm im now getting some random shit

why are you mixing for loops and list comprehensions?

#

I think you might want to start with something a bit simpler

#

probably

midnight trench Dec 3, 2020, 3:14 AM

#

BCZ u are the one who suggested me this method first

#

.-.

velvet thorn Dec 3, 2020, 3:14 AM

#

cedar sun but is it the way to go?

you can use a set, but there is a numpy-only way

velvet thorn Dec 3, 2020, 3:14 AM

#

midnight trench BCZ u are the one who suggested me this method first

I'm p sure I didn't say "mix loops and comprehensions"

midnight trench Dec 3, 2020, 3:15 AM

#

like can u give some example of a bit simplter?

#

simpler

velvet thorn Dec 3, 2020, 3:15 AM

#

midnight trench like can u give some example of a bit simplter?

start with this:

#

for name in names:
    print(name)

#

don't call your list list

#

bad practice

midnight trench Dec 3, 2020, 3:15 AM

#

ok

#

then?

cedar sun Dec 3, 2020, 3:15 AM

#

could u show me the numpy way? XD

velvet thorn Dec 3, 2020, 3:16 AM

#

cedar sun could u show me the numpy way? XD

Google it

velvet thorn Dec 3, 2020, 3:16 AM

#

midnight trench then?

then understand what is happening

#

each iteration of the loop

midnight trench Dec 3, 2020, 3:16 AM

#

velvet thorn then understand what is happening

it's looping and printing the names

cedar sun Dec 3, 2020, 3:16 AM

#

unique?

midnight trench Dec 3, 2020, 3:16 AM

#

all names inside the list

velvet thorn Dec 3, 2020, 3:17 AM

#

midnight trench it's looping and printing the names

so how would you check if each value is in a different list?

#

yeah your residuals are weird

#

show the histogram

cedar sun Dec 3, 2020, 3:17 AM

#

omg

#

ValueError: Error when checking target: expected dense_2 to have shape (151,) but got array with shape (1,)

velvet thorn Dec 3, 2020, 3:17 AM

#

honestly I would suspect your plotting code first

velvet thorn Dec 3, 2020, 3:18 AM

#

cedar sun omg

do you know the difference

cedar sun Dec 3, 2020, 3:18 AM

#

yes i do XD

velvet thorn Dec 3, 2020, 3:18 AM

#

between categorical crossentropy

#

and sparse categorical crossentropy?

cedar sun Dec 3, 2020, 3:18 AM

#

ah

velvet thorn Dec 3, 2020, 3:18 AM

#

go read up on that

cedar sun Dec 3, 2020, 3:18 AM

#

not rlly

midnight trench Dec 3, 2020, 3:18 AM

#

if i is not in anotherlist:
print(i)

velvet thorn Dec 3, 2020, 3:18 AM

#

and you will understand more

cedar sun Dec 3, 2020, 3:18 AM

#

but sparse seems like an average

velvet thorn Dec 3, 2020, 3:18 AM

#

midnight trench if i is not in anotherlist: print(i)

that's not even valid Python

cedar sun Dec 3, 2020, 3:18 AM

#

like sparse / spread

midnight trench Dec 3, 2020, 3:19 AM

#

i mean if i not in anotherlist:

cedar sun Dec 3, 2020, 3:19 AM

#

i will, let me fix this

#

@velvet thorn could u help me with the last thing please? and i wont bother u again hopefully

midnight trench Dec 3, 2020, 3:20 AM

#

hmm lookes like im just dumb doing it more complex way without knowing what's happening thx gm u gave me a good knowledge

cedar sun Dec 3, 2020, 3:20 AM

#

i move to a help channel

#

oxygen, if u wanna help me a bit more

torpid cave Dec 3, 2020, 3:29 AM

#

Residuals might indicate values in your data and how they are limited

#

Which sort of make sense as you are using daily increments.... which are bounded

#

But you are threating ts data as if it was cross-sectional

#

Which will create auto-correlation between your residuals

#

Which will invalidate your model

#

Yes but panel with observations in multiple times

#

You can't assume gains from tomorrow are not dependent on today's gains

#

You should introduce the time-series approach to your analysis

#

Then you would use the Haussman-Wu approach

#

I would suggest using other type of regression though

#

Panel is not ideal for this... try using SVAR/VAR

cedar sun Dec 3, 2020, 3:38 AM

#

velvet thorn and you will understand more

xdxdxd

torpid cave Dec 3, 2020, 3:38 AM

#

Well I am an econometrician, I would tell you that if you want to do dynamic panels you should consider timeseries

cedar sun Dec 3, 2020, 3:38 AM

#

Alternatively, you can use the loss function `sparse_categorical_crossentropy` instead, which does expect integer targets.

torpid cave Dec 3, 2020, 3:38 AM

#

And then approach Haussan Wu methodologies

#

Which get complicated quite fast

#

Regression can't have random gaps so don't worry... you usually look at residuals to test for correlation but as your data is bounded, I would try to focus on the distribution of the residuals rather than theirp lot

#

It should be normal

#

Yes

#

As you are not considering the time-series factor, then having gaps in your data (in terms of time-periods) should not be an issue from a theoretical way

#

Ah nww

#

Even in post-grad

#

teachers don't understand Haussman Wu

#

Haha nww, just keep in mind that for this data you should not be using panel

#

Not ideally

#

There are time-series models designed for this

#

Looks great

#

Panel OLS relies on OLS assumptions

#

Maybe do tests for autocorrelation/homoskedasticity

#

Residuals on vertical, independent on horizontal

#

Bounds are related to how your data is structured (bounds)

#

I understand the lower bound, as you can't have negative returns

#

The top bound is quite weird though, can't think why it is there

#

well, if you stock has a price of 100, the worst that can happen is that it goes to 0

#

And that is a return of 0

#

even 1 you get a return of 0.01

#

yep

#

1/100

#

yep

#

Oh ok

#

There you go

#

Mmmm... not much in Python

cedar sun Dec 3, 2020, 3:52 AM

#

@velvet thorn if u are still here, could u tell me what is this? TypeError: Cannot cast scalar from dtype('<U14') to dtype('<f4') according to the rule 'same_kind'

torpid cave Dec 3, 2020, 3:52 AM

#

Interact 2 variables?

#

Do it when you pass the fitted eq?

#

Just reviewed you formula...

#

Maybe pass it within the Exog-Variables

#

And see if the interactions are significant

#

[var1, var2, var1 * var2]

#

Should haha

#

Oh you are doing it with your categorical variables

#

Actually

#

nvm

#

Maybe do it in the back-end (create another column) and pass it as a parameter...

#

Yeah maybe create new categories and do the interaction

#

If it is between dummies

#

I can't remember doing interactions ever tbh

#

interactions between dummies

#

I do econometrics

#

In business environments

#

Haha

#

But models are simpler than this... or they use some sort of ML

velvet thorn Dec 3, 2020, 4:01 AM

#

cedar sun <@!171929073063297024> if u are still here, could u tell me what is this? ``Type...

stop pinging me please

#

I'm busy

cedar sun Dec 3, 2020, 4:01 AM

#

i only pinged u once :(

torpid cave Dec 3, 2020, 4:25 AM

#

Keep

#

It just affects interpretation

merry wadi Dec 3, 2020, 5:29 AM

#

Are there any good resources to learn some more advanced features of pandas?

paper stone Dec 3, 2020, 5:50 AM

#

hist(turnover_updated$age, labels=TRUE, xlab=“Age”, main=“Random Title”,col=c(2,3,4,5,6,7))
I have a histogram where the number on top of the tallest histogram is cut off. It is for R Studio.

#

📎 image0.jpg

paper stone Dec 3, 2020, 6:44 AM

#

Figured it out. Needed to use ylim

merry wadi Dec 3, 2020, 6:59 AM

#

In pandas is it possible to remove x number of rows based on if it matches a condition?
Lets say I had a store dataframe with a column called "stores" that had 20 different store numbers. Each store has anywhere from 15-200 rows. If i wanted to standardize it and only keep 20 rows for all stores. Does anyone know how to do that?

torpid cave Dec 3, 2020, 7:03 AM

#

@merry wadi define your business rules and use indexes

#

df = df[df['value'] == 'my_condition']

#

smthing among those lines

merry wadi Dec 3, 2020, 7:04 AM

#

Could i keep a specific amount for all stores? @torpid cave

#

if df['store'].count > 20 then Delete extra rows

torpid cave Dec 3, 2020, 7:05 AM

#

You could group by store

#

And then grab the top 20 with head()

#

head(20)

merry wadi Dec 3, 2020, 7:06 AM

#

I mean 20 row values for each store

torpid cave Dec 3, 2020, 7:06 AM

#

Top 20

#

bottom 20

#

which 20?

merry wadi Dec 3, 2020, 7:07 AM

#

Lets say top 20. So instead of each store having a different amount, they would all have 20 rows

torpid cave Dec 3, 2020, 7:07 AM

#

I remember this because that was the first thing I did with Python

#

df.groupby('Store').head(20)

#

Try something among those lines

merry wadi Dec 3, 2020, 7:09 AM

#

Damn that worked @torpid cave cant believe it was that simple. I've been at this for awhile

autumn veldt Dec 3, 2020, 7:10 AM

#

Ask: im still beginner for data science, here I'm trying to make a classification program, using dataset that only show yes(true) or no(false) in every feature column. So i wonder, what methods that fit(best) for classification using datasets that only have index yes(1) or no(0) only?

torpid cave Dec 3, 2020, 7:12 AM

#

@merry wadi haha nww, took me a while to do as well just bc of the syntax

merry wadi Dec 3, 2020, 7:15 AM

#

Would it be possible to add another column as well? @torpid cave

trim oar Dec 3, 2020, 11:15 AM

#

Has anyone worked with deeplearningj4 before? How is it?

lapis sequoia Dec 3, 2020, 12:02 PM

#

so tf.data.Dataset internally one-hot encodes labels and predicts that, how could i convert the predicted vector into a label?

#

is there a way to obtain the internal encoding?

lapis sequoia Dec 3, 2020, 12:39 PM

#

anyone?

torpid cave Dec 3, 2020, 12:48 PM

#

sorry never used that package

lapis sequoia Dec 3, 2020, 12:59 PM

#

nvm, its probably gonna be the same one-hot encoding as any other encoder from keras, sklearn, etc, assuming the labels are in the same order right

torpid cave Dec 3, 2020, 1:05 PM

#

Is there something like factors in python?

lapis sequoia Dec 3, 2020, 1:13 PM

#

wdym

earnest forge Dec 3, 2020, 2:22 PM

#

I have a question related to ML: First we train our model on train_set and then test it on test_set. After we fine-tuned it - what's next? Utilize the model on new data or what?

torpid cave Dec 3, 2020, 2:40 PM

#

Deployment

lapis sequoia Dec 3, 2020, 3:13 PM

#

hello

#

when i use pandas.read_json() the index is weird

#

i have ~150 rows but index goes from 0 to 11

austere swift Dec 3, 2020, 3:15 PM

#

!d pandas.DataFrame.reset_index

lapis sequoia Dec 3, 2020, 3:16 PM

#

alright, but what causes that?

arctic wedgeBOT Dec 3, 2020, 3:16 PM

#

`pandas.DataFrame.reset_index`

DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')```
Reset the index, or a level of it.

Reset the index of the DataFrame, and use the default one instead. If the DataFrame has a MultiIndex, this method can remove one or more levels.

Parameters  **level**int, str, tuple, or list, default NoneOnly remove the given levels from the index. Removes all levels by default.

**drop**bool, default FalseDo not try to insert index into dataframe columns. This resets the index to the default integer index.

**inplace**bool, default FalseModify the DataFrame in place (do not create a new object).

**col\_level**int or str, default 0If the columns have multiple levels, determines which level the labels are inserted into. By default it is inserted into the first level.

**col\_fill**object, default ‘’If the columns have multiple levels, determines how the other levels are named. If None then the index name is repeated.

Returns  DataFrame or NoneDataFrame with the new index or None if `inplace=True`.

See also... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html#pandas.DataFrame.reset_index)

lapis sequoia Dec 3, 2020, 3:16 PM

#

there seems to be no pattern

austere swift Dec 3, 2020, 3:16 PM

#

idk what causes it tbh

#

idk what causes it tbh

lapis sequoia Dec 3, 2020, 3:16 PM

#

is this just a thing that happens?

#

in any case thanks

trim oar Dec 3, 2020, 3:39 PM

#

Don't think I've ever seen it

livid quartz Dec 3, 2020, 3:40 PM

#

Anyone know why the unnamed column is appearing when I read my csv file with pandas?

#

📎 unknown.png

trim oar Dec 3, 2020, 4:14 PM

#

It's usually the index

#

But I don't know why it's at the end

#

It could be an csv exported from a specific program

#

Just read_csv(index = False) should remove it?

#

Maybe the program was exporting index at the last column instead

livid quartz Dec 3, 2020, 4:16 PM

#

It was originally an ARFF file so that could probably be why

rustic dew Dec 3, 2020, 4:17 PM

#

I think there is just comma at the end of each line which pandas reads as empty column, I've seen this on csv export from various programs - they would put comma at the end of a line, not sure why

trim oar Dec 3, 2020, 4:18 PM

#

I see

#

Then it may not be viewed as an index

#

Just drop it I guess

west lava Dec 3, 2020, 4:40 PM

#

Hi - have a quick Pandas question.

What is the difference between these two lines at the bottom? They both work, but I am not sure if one is needed.

df = pd.json_normalize(urls)
df_recon = pd.DataFrame(columns=["server", "port", "full_url"])
df_recon.server = df.rdb_url
df_recon.server = df.rdb_url.to_numpy()

serene scaffold Dec 3, 2020, 4:42 PM

#

def plot(points: list[tuple[int, int, int]]):
    from mpl_toolkits.mplot3d import Axes3D
    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    xs = np.array([p[0] for p in points])
    ys = np.array([p[1] for p in points])
    zs = np.array([p[2] for p in points])
    ax.contour3D(xs, ys, zs)
    plt.show()

# error
>>> ax.contour3D(xs, ys, zs)
TypeError: Input z must be at least a (2, 2) shaped array, but has shape (1, 1000)

why would Z need to be a different shape?

rustic dew Dec 3, 2020, 4:46 PM

#

serene scaffold ```py def plot(points: list[tuple[int, int, int]]): from mpl_toolkits.mplot3...

because Z represents values on Z axis for different Xs and Ys, so 2D array

serene scaffold Dec 3, 2020, 4:47 PM

#

But there's only one value for z for any x, y

rustic dew Dec 3, 2020, 4:51 PM

#

ah so.. is it on a regular grid tho? or irregular?

wild pine Dec 3, 2020, 4:52 PM

#

hey guys, I'm trying to wrap my head around deep Q networks.
when using visual observations (pixel data), the examples i've seen defines the state/observation for each timestep as current_screen - previous_screen
i get that you can't figure out direction or velocity from a single still image, but doesn't the agent lose all information about static elements this way?
let's say i have an environment that is initialized with a random map or configuration.
how would the agent learn to generalize if it is only ever provided with data from moving elements?

rustic dew Dec 3, 2020, 4:52 PM

#

serene scaffold But there's only one value for z for any x, y

if it's regular, you would need to create 2D Z array yourself

rustic dew Dec 3, 2020, 4:52 PM

#

serene scaffold But there's only one value for z for any x, y

if it's irregular, I am nor sure mpl 3D can handle those, you'd need to interpolate

serene scaffold Dec 3, 2020, 4:53 PM

#

so if the array is currently [1, 2, 3, 4, 5], do I need [[1, 2, 3, 4, 5], [0, 0, 0, 0 ,0]]?

#

something like that?

rustic dew Dec 3, 2020, 4:55 PM

#

serene scaffold so if the array is currently `[1, 2, 3, 4, 5]`, do I need `[[1, 2, 3, 4, 5], [0,...

mmm I dont think so.. more like imagine you have 4 points: [0, 0, 4], [0, 1, 12], [1, 0, 1.4], [1, 1, 2.85]

#

each is written as x, y, z

#

you'd need to create Z as Z = np.array([[4, 12], [1.4, 2.85]])

#

and x, y = np.meshgrid([0, 1], [0, 1])

#

that should work

#

but as I was saying, only if x and y makes a regular grid

serene scaffold Dec 3, 2020, 4:56 PM

#

I don't understand how that works. What do 4 and 12 have to do with each other?

rustic dew Dec 3, 2020, 4:56 PM

#

those were just example of 4 points, where each point is written as its x, y, and z - coordinates

#

so first point is [0, 0, 4] meaning its x-coordinate is 0, y-coordinate is 0 and y-value is 4

#

from your copy-pasted code I got the idea that the points you are trying to plot are a list of tuples like this

serene scaffold Dec 3, 2020, 4:58 PM

#

yes

rustic dew Dec 3, 2020, 4:59 PM

#

so, the contour3D only work with regularly sampled points on a grid, meaning you need to have values for all combinations of (x, y) points

#

so e.g. it will work if you have 4 points with x being 0 and 1 and y being 0 and 1

#

but it won't work if you have e.g. 8 random points with arbitrary x, y, z coordinates

#

you can use scatter3d for those

serene scaffold Dec 3, 2020, 5:02 PM

#

Alright, Thanks @rustic dew!

rustic dew Dec 3, 2020, 5:02 PM

#

serene scaffold Alright, Thanks <@!783727604208369754>!

np:), sorry if I was too cryptic

lapis sequoia Dec 3, 2020, 5:04 PM

#

hello

leaden girder Dec 3, 2020, 7:08 PM

#

Who knows and can give me advice with linear programming ?

fallow prism Dec 3, 2020, 7:49 PM

#

livid quartz It was originally an ARFF file so that could probably be why

when you are saving put pd.to_csv(index=False)

hasty orchid Dec 3, 2020, 8:07 PM

#

Is there a way to add interactive gui filters to a jupyter notebook? Can these be easily exported to html? For notebook sharing? Trying to replicate tablaeu basically

woven tundra Dec 3, 2020, 8:10 PM

#

hasty orchid Is there a way to add interactive gui filters to a jupyter notebook? Can these b...

Check out plotly, it renders on a notebook

#

Not sure if it can be exported to html

hasty orchid Dec 3, 2020, 8:12 PM

#

Yeah, definitely looking for something that works with the export html function of jupyter notebooks, I don't know how I can share these with end users realistically otherwise

#

Otherwise I might have to end up putting reports in excel which is a snore lol

fallow prism Dec 3, 2020, 8:27 PM

#

can someone tell me their opinion about vim with python (and data science of course)?

rustic dew Dec 3, 2020, 8:54 PM

#

hasty orchid Is there a way to add interactive gui filters to a jupyter notebook? Can these b...

also bokeh seems to be an option: https://docs.bokeh.org/en/latest/docs/user_guide/jupyter.html

Using with Jupyter — Bokeh 2.2.3 Documentation

Bokeh visualization library, documentation site.

hasty orchid Dec 3, 2020, 8:56 PM

#

@rustic dew do you know if poorly can do the same thing? Is bokeh popular? Never heard of it looks sick

rustic dew Dec 3, 2020, 9:00 PM

#

hasty orchid <@783727604208369754> do you know if poorly can do the same thing? Is bokeh popu...

so they both can do similar things, I was using both of them but that is some time ago and since then both of them grew... from what I remember bokeh was easier to use, but plotly is (or was...) more feature-rich, here I found some nice recent (2020) comparison: https://pauliacomi.com/2020/06/07/plotly-v-bokeh.html I think it boils down with which you'd be more comfortable

Paul Iacomi

Plotly vs. Bokeh: Interactive Python Visualisation Pros and Cons

Over the last year, I’ve worked extensively with large datasets in Python, which meant that I needed a more powerful data visualisation than trusty old Matplotlib. There are essentially only two libraries which provide the high level of interactivity I was looking for, while being mature enough: Plotly (+Dash) and Bokeh. Each has their own stren...

#

also, if you need export to html just because of sharing your notebook, also consider https://nbviewer.jupyter.org/ - you just host your ipynb anywhere (dropbox, drive, your webpage, keybase, anything), render it with nbviewer, and share the nbviewer link - I find it much easier than exporting

nbviewer

hasty orchid Dec 3, 2020, 9:04 PM

#

Haha thanks I appreciate it- oh wow, your explanation of nb viewer makes me much more interested than what I was reading

#

On overflow

rustic dew Dec 3, 2020, 9:05 PM

#

not sure what you've read, but I love it! my typical workflow is to work in jupyterlab, put the ipynb into my public keybase folder and render it with nbviewer. done in 30seconds

waxen birch Dec 3, 2020, 9:19 PM

#

Do you guys know any good tutorial for numpy matrix?

lapis sequoia Dec 3, 2020, 10:25 PM

#

if the train out put is saying acc = 0.008 it sucks right? XD

heady hatch Dec 3, 2020, 10:34 PM

#

Hey guys couple questions about data splits.

So the ML problem is predicting political affiliation based on the metaphors authors used.

one split is

📎 unknown.png

#

While the other one is

📎 unknown.png

#

I'm trying to understand why the first one is bad and why we should go with the second one.

#

From my understanding what's happening with the first split is data from test set is being leaked into the training and validation. Which makes sense.

#

I think I'm stuck on the idea that how do we learn to predict the test set if the data wasn't seen. Because we're assuming that the test set is going to hold the same distribution as the training and validation, even if the authors are different.

velvet thorn Dec 3, 2020, 11:09 PM

#

I'm trying to understand why the first one is bad and why we should go with the second one.
@heady hatch who says the first one is bad

gray wigeon Dec 3, 2020, 11:45 PM

#

velvet thorn > I'm trying to understand why the first one is bad and why we should go with th...

I wanna know too. Wouldn't the second one result to underfitting

heady hatch Dec 4, 2020, 12:00 AM

#

@velvet thorn Google.

Their reasoning on why the second split is more appropriate is because the data was clustered properly while sampling.

Because in the first split, it would be hard to debug when it does badly in production, while you can find that out in the second split.

But caveat, just because it's Google doesn't necessarily mean they're correct, but I definitely see splitting being context appropriate.

gray wigeon Dec 4, 2020, 12:01 AM

#

Oh, so you're making a clustering model?

heady hatch Dec 4, 2020, 12:01 AM

#

Oh no this is a classification problem.

#

I think I unintentionally primed us too.

I might have framed the problem as the first one being objectively bad while the second one is better and etc.

I apologize. I think the problem was both are bad in production but the first split is good during experimentation and showing higher scores than the second.

gray wigeon Dec 4, 2020, 12:03 AM

#

I think for classification models, the first split is more desirable

heady hatch Dec 4, 2020, 12:04 AM

#

I think it's context dependent, because of the reasons stated above.

#

Or I guess I want to know, why do you say so?

gray wigeon Dec 4, 2020, 12:05 AM

#

I mean what are you trying to classify anyway?

lapis sequoia Dec 4, 2020, 12:05 AM

#

once i have my model trained, how can i predict with it?

heady hatch Dec 4, 2020, 12:05 AM

#

@lapis sequoia are you using a library or building it from scratch?

#

@gray wigeon it's a case study, but they were trying to classify author political leaning from their metaphor usage.

lapis sequoia Dec 4, 2020, 12:06 AM

#

keras

gray wigeon Dec 4, 2020, 12:06 AM

#

heady hatch <@!476573611768021015> it's a case study, but they were trying to classify autho...

NLP? Then the first split is still better.

#

How are you doing it? Do you have a set where you have identified metaphors with corresponding authors?

heady hatch Dec 4, 2020, 12:07 AM

#

lapis sequoia keras

You can do something like model.predict unless you're trying to evaluate it and such.

lapis sequoia Dec 4, 2020, 12:07 AM

#

and what to give it?

heady hatch Dec 4, 2020, 12:07 AM

#

Whatever you gave it while training.

lapis sequoia Dec 4, 2020, 12:08 AM

#

so if it trained with black-white

#

i cant give a color img?

#

and dimensions must be the same?

heady hatch Dec 4, 2020, 12:08 AM

#

gray wigeon How are you doing it? Do you have a set where you have identified metaphors with...

Why do you say the first split is still better?

Sorry to clarify, I'm not experimenting or writing the code. It's a case study from Google where they were trying to debug why the model did well in experimentation but not in production.

#

Like in experimentation, the model did 99% or something with the first split

#

but in production, the model did 50%.

#

I'm just making up the numbers, but they saw that there was a problem where in experiment, it didn't match up with production.

gray wigeon Dec 4, 2020, 12:09 AM

#

heady hatch Why do you say the first split is still better? Sorry to clarify, I'm not exper...

Ooh okay. Then there could be a lot of answers to that.

heady hatch Dec 4, 2020, 12:10 AM

#

They did solve it. They found out the first split was not representative.

#

Because the data was leaking across training, validation, and test set.

gray wigeon Dec 4, 2020, 12:10 AM

#

Still so weird to see the second set

heady hatch Dec 4, 2020, 12:10 AM

#

Right?

#

I completely feel you on that, which was why I wanted to ask about it.

#

I was thinking that it would under fit as well.

gray wigeon Dec 4, 2020, 12:12 AM

#

Can I see the link to the study? This is going to keep me up all day now lmao

lapis sequoia Dec 4, 2020, 12:13 AM

#

lapis sequoia and dimensions must be the same?

.

heady hatch Dec 4, 2020, 12:14 AM

#

gray wigeon Can I see the link to the study? This is going to keep me up all day now lmao

The paper
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.103.21

Google's thing
https://developers.google.com/machine-learning/crash-course/18th-century-literature

gray wigeon Dec 4, 2020, 12:14 AM

#

Thanks

heady hatch Dec 4, 2020, 12:14 AM

#

Let me know if I've interpreted it wrong or something.

heady hatch Dec 4, 2020, 12:15 AM

#

lapis sequoia .

Yes, training and inference features need to be the same.

If you want to predict it on colored images while you've trained it on black and white, you can transform the colored images to black and white and then predict.

gray wigeon Dec 4, 2020, 12:16 AM

#

My initial guess before I read is that to account for the leakage, the suggestion might have been to implement a clustering preprocessing to define the set more properly but this is just a guess

lapis sequoia Dec 4, 2020, 12:16 AM

#

if i load an image as np.array(img), can i do something with numpy to convert them to black white?

#

i do it with opencv

#

i would need to to the average per pixel, rigth?

#

image[i][j] = sum etc

#

?

heady hatch Dec 4, 2020, 12:17 AM

#

I would love to give you tips and advice but you might have an easier time googling "transform colored images to black and white opencv" or something.

I mainly work in NLP so I can't say much.

lapis sequoia Dec 4, 2020, 12:18 AM

#

lapis sequoia image[i][j] = sum etc

this is the way

lapis sequoia Dec 4, 2020, 12:18 AM

#

lapis sequoia i do it with opencv

xd

torpid cave Dec 4, 2020, 12:18 AM

#

@heady hatch is there a library I should be aware to implement NLP?

#

I have a project aligned for next year

#

And should start doing the DD in the next weeks

#

If you want to know more about the project, I built scrapers to get product-reviews, now I have to classify these reviews

#

So far I know it will be unsupervised learning

#

But I need to add more metrics like... sentiment on the review

#

if it is critical

heady hatch Dec 4, 2020, 12:22 AM

#

torpid cave <@!542872811245666305> is there a library I should be aware to implement NLP?

Hmm kinda lots of factors to consider to be honest.

I'm assuming you're not talking about implementing algorithms from scratch.

What size do you think your data is going to be?

torpid cave Dec 4, 2020, 12:22 AM

#

Nono, not from scratch

#

It is quite large

#

over 1M rows

heady hatch Dec 4, 2020, 12:23 AM

#

I think I would get familiar with Spark NLP.

#

or the Spark family in general to work with big data.

#

How familiar are you with NLP?

torpid cave Dec 4, 2020, 12:24 AM

#

Tbh I just did data manipulation

#

with strings

#

The project will take around 6 months and I will be working with another guy as well who has created NLP libraries though

#

But I need to catch-up with him as fast as I can before starting

heady hatch Dec 4, 2020, 12:25 AM

#

I'd say get used to working with classical methods first, so maybe start with sklearn and whatnot.

So you understand what kind of features you'll be creating for modeling.

#

Don't touch NN for now.

torpid cave Dec 4, 2020, 12:25 AM

#

I don't think we have used NN for any project in my place tbh

heady hatch Dec 4, 2020, 12:25 AM

#

Like learn the different things about NLP such as ngrams, bow, tfidf, cleaning, tokenization, parsing, lemmatizing.

#

I think SpacCy and NLTK and whatever library out there will give you functions to help you learn it.

#

The library isn't really that important, IMO. It's more about understanding text and what happens when you transform them.

torpid cave Dec 4, 2020, 12:27 AM

#

OK

#

Quick question.... do text gets converted to a number at any point in time?

heady hatch Dec 4, 2020, 12:27 AM

#

Yes! so bow and tfidf are two ways of transforming them. Then you'll come to learn about embeddings to deal with sparse representations.

gray wigeon Dec 4, 2020, 12:27 AM

#

@heady hatch okay i think i got the geist of it now.

heady hatch Dec 4, 2020, 12:28 AM

#

Embeddings are more of what today's industry use because sparse representations isn't efficient at all, space nor compute wise.

gray wigeon Dec 4, 2020, 12:29 AM

#

@heady hatch the miscalculation was that they split the texts by sentences to determine the political affiliation. and the writers wrote in a very particular fashion that in a sentence level split, it's difficult to point out who says who. basically, the second split isn't "better" per se, it's just an alternative approach the study tried.

torpid cave Dec 4, 2020, 12:29 AM

#

Thank you very much @heady hatch

#

I will update on how the project goes

#

I got a better idea now, I imagine I will be cleaning the dataset for about 1~2 months though and working on the data side before moving onto applying anything

heady hatch Dec 4, 2020, 12:30 AM

#

@torpid cave oh learn to use regex to clean too and learn when not to use regex.

torpid cave Dec 4, 2020, 12:30 AM

#

Well I hate regex just like any other normal person

gray wigeon Dec 4, 2020, 12:31 AM

#

torpid cave Well I hate regex just like any other normal person

i'm yet to meet a person who honest to god loves regex

heady hatch Dec 4, 2020, 12:31 AM

#

I'm totally imagining a person in some dark hole just churning out complex regex patterns that captures essence of human language.

torpid cave Dec 4, 2020, 12:32 AM

#

hahahaha

marsh lantern Dec 4, 2020, 2:54 AM

#

Have you tried using one $$ before and after your equations? I don't think I would use all of those $$ to delimit every line

lapis sequoia Dec 4, 2020, 7:49 AM

#

Here is a mini project I worked on today to learn data mining. Feel free to add/change something. https://github.com/murathany7/scraper

GitHub

murathany7/scraper

This script mines data from the biggest forum in Turkey. It gets around 150,000 questions and answers in Turkish. - murathany7/scraper

sick skiff Dec 4, 2020, 9:51 AM

#

hey im new to tensorflow, i made one of those ais that predict mnist numbers, thing is whenever i draw my own number it gets it completely wrong, like nowhere close, a 3 is a 5, a 3 but one pixel to the left is 0

#

anyone got ideas?

#

do i need a larger sample? maybe more epoches? or is it that its just memorising those images and not predicting anything, and when its met with something new it just throws random guesses

halcyon vale Dec 4, 2020, 10:30 AM

#

https://github.com/ThinamXx/66Days__NaturalLanguageProcessing/blob/master/README.md

GitHub

ThinamXx/66Days__NaturalLanguageProcessing

I am sharing my Journey of 66DaysofData in Natural Language Processing. - ThinamXx/66Days__NaturalLanguageProcessing

sick skiff Dec 4, 2020, 11:11 AM

#

?

lapis sequoia Dec 4, 2020, 11:17 AM

#

When working with Keras, should I save the whole model or just the weights if I want to train new data continuously?

fallow prism Dec 4, 2020, 12:49 PM

#

torpid cave Quick question.... do text gets converted to a number at any point in time?

https://youtu.be/ERibwqs9p38

YouTube

Stanford University School of Engineering

Lecture 2 | Word Vector Representations: word2vec

Lecture 2 continues the discussion on the concept of representing words as numeric vectors and popular approaches to designing word vectors.

Key phrases: Natural Language Processing. Word Vectors. Singular Value Decomposition. Skip-gram. Continuous Bag of Words (CBOW). Negative Sampling. Hierarchical Softmax. Word2Vec.

-----------------------...

▶ Play video

prime slate Dec 4, 2020, 1:23 PM

#

What float('inf') equal to in python? is there a limit for it, since the data type float can hold 64-bit?

rustic dew Dec 4, 2020, 1:40 PM

#

prime slate What float('inf') equal to in python? is there a limit for it, since the data ty...

how do you mean limit for it? you can check upper limit of float64 with np.finfo("d").max and it equals to 1.7976931348623157e+308 so np.inf is a bit larger:)

#

funny enough, you can do np.finfo("d").max + 1 but e.g. np.finfo("d").max * 1.1 throws overflow:)

lapis sequoia Dec 4, 2020, 1:42 PM

#

when we're getting the loss of a graph

#

why do we square the distance from y to the slope line/ y = mx + b version of y

#

why do we square that distance and sum them?

rustic dew Dec 4, 2020, 1:49 PM

#

it's just one of the options on how to define "loss function". when you are trying to optimise something (like fit a linear regression = optimising parameters m and b of the model) you need to define some loss / fitness / objective function (it goes by many names). sum of squared distances (mean squared error, usually abbreviated as MSE) is one of the options and it is pretty popular.. you can do absolute error, or root mean square error etc. each of the fitness function has its pros and cons.. so when you're fitting a linear regression, you are trying to minimise the MSE, so that the error between your data and your linear model is minimal

lapis sequoia Dec 4, 2020, 2:02 PM

#

Do you think it will be better if I just learn Linear Algebra

rustic dew Dec 4, 2020, 2:15 PM

#

well, it most certainly will help;)

indigo bronze Dec 4, 2020, 2:30 PM

#

lapis sequoia why do we square that distance and sum them?

We use the mean of the squared errors because it is mathematically simple to optimize, iirc

rustic dew Dec 4, 2020, 2:34 PM

#

some slightly-related piece to error functions: https://towardsdatascience.com/https-medium-com-chayankathuria-regression-why-mean-square-error-a8cad2a1c96f

Medium

Regression — Why Mean Square Error?

Choosing the best Loss Function for regression algorithms

vale vector Dec 4, 2020, 2:37 PM

#

can anyone explain how the structure naming a class with a dot? Like in the datetime module they have a "class datetime.date"
if i try and make a class with a dot, i get an error

rustic dew Dec 4, 2020, 3:01 PM

#

not sure this is a question for data-science channel.. however all properties and attributes of a class are accessible through dot, eg.

class foo:
    def __init__(self, a=5, b=12.4):
        self.a = a
        self.b = b
    @property
    def ab_sum(self):
        return self.a + self.b

then

baz = foo(a=3)
baz.a # prints 3
baz.ab_sum # prints 15.4

rustic dew Dec 4, 2020, 3:02 PM

#

vale vector can anyone explain how the structure naming a class with a dot? Like in the date...

the example you refer to with datetime.date is different, since AFAIK datetime is a module, there is a difference between modules and classes, but both can enjoy dot notation

prime slate Dec 4, 2020, 4:11 PM

#

rustic dew how do you mean limit for it? you can check upper limit of float64 with `np.finf...

I mean by limit, is for the data type itself not the inf in real life, we all know that inf is unlimited, and it's one of the attributes for the inf number, but what I meant is for the computer, by your answer I understood that if we reach to a number after that it will give an error, so the limit for the inf number in the computer is the float data type limit.

#

or at least for python, cuz I know that R use long double to represent the inf number

serene scaffold Dec 4, 2020, 4:22 PM

#

Not urgent, but is there a "numpy" way to do this:

matrix = np.zeros((21, 21))
for x, y in product(range(21), range(21)):
    matrix[x, y] = func(x, y)

rustic dew Dec 4, 2020, 4:29 PM

#

serene scaffold Not urgent, but is there a "numpy" way to do this: ```py matrix = np.zeros((21, ...

if your func is properly vectorized this one should do the trick

x, y = np.meshgrid(range(21), range(21))
matrix = func(x, y)

#

or one liner

matrix = func(*np.meshgrid(range(21), range(21)))

if you like that kind of things

serene scaffold Dec 4, 2020, 4:38 PM

#

rustic dew if your `func` is properly vectorized this one should do the trick ```py x, y = ...

I like easy-to-read one-liners. like list comps. I'm not sure which I'd prefer in this case lemon_glass

sick skiff Dec 4, 2020, 4:45 PM

#

i can make it run tests, save and load a model, but no matter what i do, i just cant test it on a new set of images, it either crashes, or gives me a completely wrong answer, and ive tested it with keras and tenseofrlow, i must be loading the ima4ges wrongly or smth

earnest forge Dec 4, 2020, 5:37 PM

#

how to keep only first character in each value of PClass so I'd further be able to convert it to numeric type of data

📎 unknown.png

rustic dew Dec 4, 2020, 5:42 PM

#

df.PClass = df.PClass.apply(lambda x: int(x[0]))

#

or float instead of int if you need floats...

lapis sequoia Dec 4, 2020, 5:43 PM

#

Is it a good start to try learning data science with Python by trying to analyze random samples by googling almost every single code

earnest forge Dec 4, 2020, 5:43 PM

#

you are late 😄

earnest forge Dec 4, 2020, 5:43 PM

#

rustic dew `df.PClass = df.PClass.apply(lambda x: int(x[0]))`

i've already figured it out

earnest forge Dec 4, 2020, 5:45 PM

#

lapis sequoia Is it a good start to try learning data science with Python by trying to analyze...

I suggest you to read 'Python Data Science Handbook' at first place. It covers essentials and a bit more. Some kind of a fast start

lapis sequoia Dec 4, 2020, 5:48 PM

#

I’m studying with Python for Data Science, but somehow it takes too much time for me, so I was thinking maybe I might wanna skip some part and try real world project by googling stuff?

earnest forge Dec 4, 2020, 5:51 PM

#

do you learn DS on your own?

lapis sequoia Dec 4, 2020, 5:51 PM

#

Yes

#

I mean for now

earnest forge Dec 4, 2020, 5:52 PM

#

how I do: after finishing chapter, go practicing what you've learned

lapis sequoia Dec 4, 2020, 5:52 PM

#

Is “Python Data Science Handbook” some kind of short version of “Python for Data Science”?

#

Yea thats what I do but with this book it takes forever

#

Too many stuff to do

earnest forge Dec 4, 2020, 5:54 PM

#

lapis sequoia Is “Python Data Science Handbook” some kind of short version of “Python for Data...

It comprises of 500 pages (I'd say 300, cuz last 200s are ML and it's better to take more deep course in ML since it's essential for ds)

#

I have finished everything (except ML part, I didn't read it) in ~2 months

fallen prism Dec 4, 2020, 5:55 PM

#

hello buds, i just learned python basics , can you suggest me any books , videos to start learning DS by my own and start doing ML

earnest forge Dec 4, 2020, 5:55 PM

#

btw, one more effective way: whenever you stumble in something unfamiliar go searching how it works and when to use

lapis sequoia Dec 4, 2020, 5:55 PM

#

Hmm okay I gotta try that book with youtube videos then

earnest forge Dec 4, 2020, 5:55 PM

#

sad thing, DS handbook doesn't deal a lot with numpy package, so that googling method helped me a lot

lapis sequoia Dec 4, 2020, 5:56 PM

#

Do real data scientists in the field google a lot too? xD

fallen prism Dec 4, 2020, 5:56 PM

#

@earnest forge @lapis sequoia

earnest forge Dec 4, 2020, 5:57 PM

#

fallen prism hello buds, i just learned python basics , can you suggest me any books , videos...

from my perspective, it's better to spend more time on python too. Cuz when you reach ML you'll have to deal with memory consumption and implemention your own classes

fallen prism Dec 4, 2020, 5:57 PM

#

earnest forge from my perspective, it's better to spend more time on python too. Cuz when you ...

i do some projects but how can i learn data structures

lapis sequoia Dec 4, 2020, 5:58 PM

#

@fallen prism
I started learning Python and DS with the book named “Python for Data Analysis”, but seems like it consumes a lot of time so I’m thinking about switching to “Python Data Science Handbook” @earnest forge recommended

fallen prism Dec 4, 2020, 5:59 PM

#

lapis sequoia <@552147336592359446> I started learning Python and DS with the book named “Pyt...

Thanks is it available everywhere?

earnest forge Dec 4, 2020, 5:59 PM

#

Handbook covers only basics and delves a bit into complicated task. Anyway, it really goes well for the fresh start

fallen prism Dec 4, 2020, 5:59 PM

#

earnest forge from my perspective, it's better to spend more time on python too. Cuz when you ...

thank you <3

earnest forge Dec 4, 2020, 5:59 PM

#

fallen prism i do some projects but how can i learn data structures

google 'python data structures' and also find tasks related to data structures

lapis sequoia Dec 4, 2020, 6:00 PM

#

There are free pdfs that might be piracy though

earnest forge Dec 4, 2020, 6:00 PM

#

tbh, the most frequent for using is list, but real project demand your deep understanding of DataFrames

lapis sequoia Dec 4, 2020, 6:01 PM

#

I know basics like pd.DataFrame, concat, whatsoever but the problem is I barely know other sub functions like inplace=True and stuff like that

#

so I google it all the time xD

earnest forge Dec 4, 2020, 6:04 PM

#

handbook covers it all

lapis sequoia Dec 4, 2020, 6:05 PM

#

Wow really

#

Thats good to know

#

so its all about memorizing xD

#

cant wait to start again

azure cedar Dec 4, 2020, 8:01 PM

#

hey y'all

#

I have a question about the pd.merge() function and its behavior

#

and I opened a help channel at the same time as someone else and they kinda took over

#

do you mind if I left this question here?

tribal wind Dec 4, 2020, 9:10 PM

#

anyone here good with NLTK? (:

wintry olive Dec 4, 2020, 10:04 PM

#

We show data augmentation is equivalent to an averaging operation over the orbits of a certain group that keeps the data distribution approximately invariant.

#

the above statement was taken from a computer vision related study attempting to establish a practical mathematical framework for understanding the effects augmenting or curating data into a datasets prior to building and tuning models with the data. I get the effect of augmentation causinginvariance but not the process which generates it. The word orbits jumped out at me.

#

I learned about orbits of 0 under an iteration from studying the mathematics of the Mandelbrot set

#

that's on the complex plane though. Intuition leads me to think that was a property of the imaginary numbers doing special things. do orbits work similarly in statistics except with both axis containing streams of real numbers? what do those orbits look like?

#

there was an app someone was using to show animations of orbits from hoovering mouse over the complex plane. I'll just look for that..

velvet thorn Dec 4, 2020, 10:30 PM

#

do you mind if I left this question here?
@azure cedar just do it

azure cedar Dec 4, 2020, 10:59 PM

#

So i have two dataframes

#

Three identical columns

#

I'm trying to make sure that the ones that match are verified that match for each row across all of the columns per row

#

as well as collect the ones that don't match and send that to a function

#

right now I'm merging on a single column and it seems to yield a "match_list" and then doing df[df[~does not contain the matchlist]]

#

my question is: when I do the merge, am I getting the result I have just assumed?

#

also is this the best way to do this?

drifting hemlock Dec 4, 2020, 11:45 PM

#

I'm seriously banging my head around this problem which I'm pretty sure has a simple answer, I'm using the Online Retail dataset, you can load it:

customers = pd.read_excel('https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx')

I have a multi-index by doing:

customers.set_index(['CustomerID', 'InvoiceNo'])

What I want to is to select the first invoice for every client. Meaning that for the client 13047 I would only see the invoice 536367 NOT the other invoices for that client.

paper niche Dec 5, 2020, 1:47 AM

#

@drifting hemlock You can 1. first sort customers by the "InvoiceDate" (so the earliest invoice by date would appear at the top), then 2. .groupby('CustomerID').head(1) which will select the first invoice per group.

azure cedar Dec 5, 2020, 1:51 AM

#

any help here?

serene scaffold Dec 5, 2020, 1:54 AM

#

nan makes me so mad.

azure cedar Dec 5, 2020, 1:54 AM

#

I have and it seems okay and the behavior in the function seems to indicate it's able to pick up where it left off

#

is it okay to use merge as a "check for dataframe symmetry"?

#

not in terms of dimensions but in terms of row content as well

#

i don't want to be accidentally using it not knowing it has some under the hood function that might overwrite something I don't want it to overwrite

#

I'm comparing two tables in memory to make sure each Key entry has the proper metadata associated if not the list of Keys not found in both tables need to be sent to a function to remedy that difference

#

I had a version that merges on "Key", I'm wondering if it referenced the other columns as well when considering that intersection

#

and whether bad columns would have created a N/A or NAN field

lapis sequoia Dec 5, 2020, 2:06 AM

#

anyone here have any experience with artificial neural networks and convolution neural networks in python?

azure cedar Dec 5, 2020, 2:07 AM

#

i have experience in pandas but I didn't know how to verbalize my question enough to find a stackoverflow answer for this

#

so i guess to clarify: two lists, both lists have a Product Name (Key), Serial, and Quantity for example

#

DF 1 may have new entries but I'm checking that the serial and quantity align for existing listings in DF2

#

if they don't they need to be sent to a function to amend DF2 with the new entries

#

If merging on one key is destructive, do I merge on multiple keys?

#

So merging on one key will yield N/As if the other columns don't line up?

#

just trying to find a correct way to validate here

#

so if i take that a step further if I merge on more than column

#

it'll merge on the intersection of all 3 indicated columns?

#

yeah I'm also on the "i think" step so trying to confirm it somehow

#

okay will do

#

brb doing that

#

Sum is 0 for both merge on ['Key'] and ['Key', 'serial', 'quantity]

#

however the latter didn't reduce the final DF by much, infact it has the same count as the original D1 list

#

merging on ['Key'] alone yields a reduced "to send to function" list

#

which is what I'm after but back to square 1 wondering if I'm missing some knowledge about the under-the-hood function of merge

somber bane Dec 5, 2020, 2:27 AM

#

I have a question, why do we still need to divide by n(total number of data) even in stochastic gradient descent?
Isn't that we are iterating each individual data

#

📎 unknown.png

azure cedar Dec 5, 2020, 2:42 AM

#

@somber bane is this the answer you're looking for: https://stats.stackexchange.com/questions/251982/stochastic-gradient-descent-for-regularized-logistic-regression

Cross Validated

Stochastic gradient descent for regularized logistic regression

At 8:30 of this video Andrew Ng mentions that the cost function for stochastic gradient descent (for a single observation) for logistic regression is

$-y_i \log h_w(x_i) - (1 - y_i) \log h_w(1 - x...

somber bane Dec 5, 2020, 3:14 AM

#

does anyone have any article or video that shows how to apply mini batch gradient descent on matrix factorization?

rocky maple Dec 5, 2020, 4:09 AM

#

Who has used pretrained Torch models for videos?

#

I want to use a Resnet3d torchvision.models.video.r3d_18(pretrained=True)

#

But the data needs to be preprocessed in a certain way

#

But I'm stapling my own video frames together, I'm not using a torch dataset.

trim oar Dec 5, 2020, 5:43 AM

#

azure cedar Sum is 0 for both merge on ['Key'] and ['Key', 'serial', 'quantity]

To do what you wanna do, I'd do the following:

Make a copy of both DF to play around
Merge on all the columns that exist on both DF, if all the columns are the same, simply on = df1.columns
Drop "Quantity" on MergedDF so you only have ProductName and SerialNumber
MergedDF.duplicated() would yield any duplicated Product/Serial number. Unless you have two products sharing the same serial number, or two serial numbers sharing the same product name, should help you find duplicated values

#

I didn't realy through everything cuz my head still hurts from the hackathon I just did. So if checking duplicated product is not the only task, please point me to it

azure cedar Dec 5, 2020, 5:44 AM

#

ok let me read this and think it over

#

thanks @trim oar!

trim oar Dec 5, 2020, 5:45 AM

#

🙂

azure cedar Dec 5, 2020, 5:46 AM

#

so would this work if I'm trying to find the inverse

#

Having exact pairs is a good sign

#

so saying for anything that has a pair, remove it from the new DF, if it doesn't send that list to my function

#

would I even have to use merge here then?

#

Couldn't I just concat the tables (identical columns) and find and delete anything that has a valid pair?

#

basically in my case: Pair of files with 3 matching columns = good = do nothing

#

if no pair, needs to be sent to function for rework

trim oar Dec 5, 2020, 5:47 AM

#

Oh

azure cedar Dec 5, 2020, 5:48 AM

#

i based my response off of your idea

trim oar Dec 5, 2020, 5:48 AM

#

concat would be better then, and still duplicated would yield it

#

Because merge probably would have merged identical ones

#

but concat doesn't do that

azure cedar Dec 5, 2020, 5:48 AM

#

yeah i want to 100% avoid any deletion of things which is why i caught myself when applying merge

#

no overwriting or deleting i'm looking for a hard match for pairs as validation

trim oar Dec 5, 2020, 5:49 AM

#

so concat axis = 0

azure cedar Dec 5, 2020, 5:49 AM

#

another way that can be done is concating all 3 columns into a string and hashing it

#

and then simply every time you have to analyze a list hash the row and check the other table for the hash

#

i just realized that

#

what does axis = 0 do

trim oar Dec 5, 2020, 5:50 AM

#

concats vertically

azure cedar Dec 5, 2020, 5:50 AM

#

okay so just glues it on the bottom

trim oar Dec 5, 2020, 5:50 AM

#

yes instead of horizontally

#

That's probably by default I just tend to specify it

#

Glad you have a solution!

azure cedar Dec 5, 2020, 5:51 AM

#

i need to get into a habit of doing that more

#

thanks!!!

shadow quiver Dec 5, 2020, 6:55 AM

#

Hey guys. I just have encountered a ML problem. Say there are photos of leafs, and leafs contain disease on them. And there are masked photos for 4 types of disease, for each leaf. There are tabular data about picture height and weight and encoded mask

Basically it's a plant disease recognition. I should train a model and predict the 4 types of diseases for every leaf. Where can I start this? I'm completely new to this masking topic.

livid quartz Dec 5, 2020, 8:41 AM

#

can someone help me interpret this t-sne plot? I don't know if its good... The dataset I used predicts mortality after thirty days from flu patients. Outcome is binary with 0 being no death and 1 being death

📎 unknown.png

lapis sequoia Dec 5, 2020, 9:45 AM

#

Amazing

#

I wish I’d be as good as you

shadow quiver Dec 5, 2020, 11:22 AM

#

How can I predict image from an image in Keras? Like, input shape is 512x512 and output is also a 512x512 array?

sturdy dune Dec 5, 2020, 12:22 PM

#

Hello everyone, I have written a blog on K-means clustering, the unsupervised machine learning algorithm. Give it a try,I hope it helps you. https://datamahadev.com/understanding-k-means-clustering/

datamahadev.com

Samiksha Bhavsar

Understanding K-Means Clustering - datamahadev.com

This article will introduce you to Unsupervised Learning and will help you gain a proper conceptual understanding of K-Means Clustering.

lapis sequoia Dec 5, 2020, 12:29 PM

#

so I just started learning data analysis with Python...
this is some part of my dirty ass scripts.
do you guys think I'm in the right path?

#

📎 1111.png

wintry olive Dec 5, 2020, 1:01 PM

#

good morning ecneics-atad

#

e nics a tad

#

hypothetical question what if you were given the freedom to model a fictional character

limber crow Dec 5, 2020, 1:03 PM

#

can I use panda with django?

wintry olive Dec 5, 2020, 1:04 PM

#

whoa django i haven't heard that in awhile

#

i completely forgot why we were using it probably for this: a lightweight and standalone web server for development and testing

limber crow Dec 5, 2020, 1:09 PM

#

its actually heavy weight

#

:C

lapis sequoia Dec 5, 2020, 1:32 PM

#

: /

spare lotus Dec 5, 2020, 2:38 PM

#

I need to predict age, gender and race from a picture. I looked at many popular solutions, and they all use CNNs, which I'm not allowed to use. Any ideas on what else I can use for this task?

azure stump Dec 5, 2020, 3:16 PM

#

https://asr373.medium.com/these-5-concepts-can-make-your-life-easier-594544304178?sk=c514e9db55b7971e070b239e109b9aef

Medium

These 5 concepts can make your life easier.

For every data scientists.

trim oar Dec 5, 2020, 3:25 PM

#

spare lotus I need to predict age, gender and race from a picture. I looked at many popular ...

RNN or its variations

#

Takes in 3D shape, learns from previous sequence, tho I have not done it on images

trim oar Dec 5, 2020, 3:59 PM

#

lapis sequoia so I just started learning data analysis with Python... this is some part of my ...

I'm not sure what do you mean by the right path, especially when the same things can be done in many ways, and I think simply by taking actions to learn is admirable already, but:

df.groupby(df['Fly Date'])[['passengers','seats']].sum().reset_index should achieve your occupancy_rate already. Where pandas only seems to only take one argument, you can always put the list of columns inside a list to input as a single argument. Same thing with your org
calculation looks good, but it may be more sensible to separate month and year. This way you can do more manipulation later for ad hoc requests, such as calculation by month or by year. You can to_datetime first, and then create new columns with to_period. See here. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.to_period.html
0.425 is not a strong correlation but still a moderate correlation. You can see the cut-off for correlations here. https://www.dummies.com/education/math/statistics/how-to-interpret-a-correlation-coefficient-r/

dummies

How to Interpret a Correlation Coefficient r - dummies

In statistics, the correlation coefficient r measures the strength and direction of a linear relationship between two variables on a scatterplot. The value of r is always between +1 and –1. To interpret its value, see which of the following values your correlation r is closest to: Exactly –1. A perfect downhill (negative) linear relationship […]

waxen birch Dec 5, 2020, 4:12 PM

#

having matrix with columns x1,x2.....x15,y i have to perform linear regression, do you guys know how to do it using python?

#

15 x columns and one y column

#

(i can provide a sample of data if needed)

trim oar Dec 5, 2020, 4:33 PM

#

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html

#

Assume you've normalised the data

worthy scarab Dec 5, 2020, 4:34 PM

#

Any idea why it says sigma has incorrect shape?

📎 unknown.png

trim oar Dec 5, 2020, 4:35 PM

#

It's as simple as model.fit_transform(X_train, ytrain), model.score(X_test, y_test)

bronze skiff Dec 5, 2020, 4:40 PM

#

worthy scarab Any idea why it says sigma has incorrect shape?

print out the shapes of sigma (in this case, Error?)

#

also, welcome to python-- please learn to use camelcase

heady tide Dec 5, 2020, 4:44 PM

#

quick question, Spacy or Nltk for lemmatization?

trim oar Dec 5, 2020, 5:00 PM

#

Personally nltk

lapis sequoia Dec 5, 2020, 5:13 PM

#

@trim oar
I am so much grateful for your sincere advices! It’s been only a month to learn data analysis with python so I barely know few stuff. Thanks again and I’ll look into those links!!

heady tide Dec 5, 2020, 5:37 PM

#

@trim oar I see, went with that since it offers more flexibility

#

Are there any more steps that I need to take before computing the tf-idf ?

#

should I remove all the stop words ? check ngrams ? Or will tf-idf do that for me ?

trim oar Dec 5, 2020, 6:37 PM

#

heady tide <@!680795279758327819> I see, went with that since it offers more flexibility

I have limited experience with NLP but here's my two cents. TF-IDF works like this: the more a word appears in a single corpus, the more the weight it'll have. At the same time, the more the same word occurs in different corpuses, the less the weight it'll have. That sort of balance it out.TF-IDF probably could balance out the stop words for you, but I'd personally just do the usual cleaning such as removing the stop words. ngrams is different iirc, about providing context (grouping of words). I haven't worked much on it yet, but I'd imagine you'd still need them.

#

I welcome correction

quasi crypt Dec 5, 2020, 7:05 PM

#

Hi! is there a more advance library for encrypt PDF's?
I've tried PyPDF2 but the options are limited (i can't set all these options to what i want).

📎 unknown.png

heady tide Dec 5, 2020, 7:11 PM

#

much appreciated @trim oar, I will remove stop words as you said and then lemmatize, then the data will be fed to the tf-idf

lapis sequoia Dec 5, 2020, 7:45 PM

#

I got a question. How can i reload my neural network weights? i wanna recalculate them ONLY if there have been changes on number of epochs, batch_size or anything that if changed, will modify the weights

ocean pumice Dec 5, 2020, 8:00 PM

#

doesn't it depend on your implementation ?

mellow kestrel Dec 5, 2020, 8:38 PM

#

hello , am new to python ,question , is it easier to use Conda for installing tessract for an OCR project

bronze skiff Dec 5, 2020, 9:18 PM

#

why not just use pip

lapis sequoia Dec 5, 2020, 9:23 PM

#

idk, i can change my implementation for sure. I would like to refactor it, but idk how

bronze skiff Dec 5, 2020, 9:24 PM

#

lapis sequoia I got a question. How can i reload my neural network weights? i wanna recalculat...

don't you save your weights after training? just load them back in for inference and don't retrain it

mellow kestrel Dec 5, 2020, 9:29 PM

#

bronze skiff why not just use pip

yeah ill try it thanks 😄

lapis sequoia Dec 5, 2020, 9:31 PM

#

yeah but 1 epoch more will make weights change

#

so if i changed epochs, i need to recalculate weights

#

1 epoch or 1 more image on trainin

#

or validation

#

or w/e

trim oar Dec 5, 2020, 9:32 PM

#

I haven’t tried but

#

ModelCheckpoint has save_weights_only attribute

#

https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint

TensorFlow

tf.keras.callbacks.ModelCheckpoint | TensorFlow Core v2.3.0

Callback to save the Keras model or model weights at some frequency.

lapis sequoia Dec 5, 2020, 9:33 PM

#

save_weights saves weights only

trim oar Dec 5, 2020, 9:34 PM

#

Yeah isn’t that what you wanted? Change however you want and then load the weights?

lapis sequoia Dec 5, 2020, 9:34 PM

#

yeah but how do i know if training data or epochs or anything changed?

trim oar Dec 5, 2020, 9:36 PM

#

What do you mean? Isn’t transfer learning all about having pretrained model and train on new data/problem to save time?

#

Sorry I didn’t read anything else if you stated your origins problem

lapis sequoia Dec 5, 2020, 9:37 PM

#

mmm

#

imagine i have 10 images

trim oar Dec 5, 2020, 9:37 PM

#

Ok

lapis sequoia Dec 5, 2020, 9:37 PM

#

the first time, all the weights are calculated, but for the second time, i just load them. I dont need to recalculate them

#

i only wanna recalc them if i have 11 images, for example

#

or if i have 10 images but different from the first ones

#

or if number of epochs change

trim oar Dec 5, 2020, 9:38 PM

#

Yeah so you just save the model

lapis sequoia Dec 5, 2020, 9:38 PM

#

or anything that will make weights change

trim oar Dec 5, 2020, 9:39 PM

#

There may be better way for it but I would have just saved the model as a file

#

You just want to predict now so it’s model.predict, no?

lapis sequoia Dec 5, 2020, 9:39 PM

#

yes

#

but again, if my training data changes, weights will change, and i want to recalculate them

trim oar Dec 5, 2020, 9:42 PM

#

So yeah, have a folder specific to your project, save the entire model to an h5 file or whichever, document date and probably what did you train it on with a txt. You have that version of the model forever now, and you can keep on training without losing the original one.

lapis sequoia Dec 5, 2020, 9:42 PM

#

i dont mean that

trim oar Dec 5, 2020, 9:42 PM

#

Unless I’m missing something

lapis sequoia Dec 5, 2020, 9:42 PM

#

i just wanna recalculate weights if anything that would have make weights change, has changed

trim oar Dec 5, 2020, 9:44 PM

#

Oh cuz you have it on a running script you mean

lapis sequoia Dec 5, 2020, 9:44 PM

#

yes

trim oar Dec 5, 2020, 9:45 PM

#

Ah.. sorry then. I’m not familiar with the production environment yet

#

Sorry for the misunderstanding

lapis sequoia Dec 5, 2020, 9:46 PM

#

no problems

bronze skiff Dec 5, 2020, 9:55 PM

#

still not understanding

#

you have a model that's you've trained with some hyperparameters-- i assume you pass it in using argparse or something in the command line?

#

then you have to things you want: if you rerun the script again with no changes in hyperparameters, it'll just load the last trained model and run inference

#

otherwise if something changes, you retrain?

#

because it sounds like you want a CI/CD integration

#

with state+hyperparams being managed by, say, mlflow or something

#

@lapis sequoia

lapis sequoia Dec 5, 2020, 10:00 PM

#

mmm it is written on python xd

#

i can paste the code if u really dont understand

#

but right now, i retrain the nn if there isnt any file like weights. But i have to manually delete the file so the nn is trained again. So yeah, i wanna retrain it if anything changed

arctic wedgeBOT Dec 5, 2020, 10:02 PM

#

Hey @lapis sequoia!

It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com

lapis sequoia Dec 5, 2020, 10:03 PM

#

https://pastebin.com/0ejUWw4a

Pastebin

ai - Pastebin.com

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

#

sorry if coding sucks 😄

#

as you see, data is remade if npy files do not exist (but should be remade also if the content has changed) and same for weights. So if i want that to happen, i need to remove the files

bronze skiff Dec 5, 2020, 10:11 PM

#

yeah, but that's nothing you can do within the script

#

if you change something outside of the script and then rerun it, the script doesn't know what changed

#

unless the script queries some persistent datastore that can check the diff of the files

lapis sequoia Dec 5, 2020, 10:13 PM

#

yeah i though something about (for weights only) saving on a txt the epochs and batch_size, and on the script itself, compare the current ones with the ones on the txt

#

but for the images... idk what to do

#

i wanted to put it on al class for me, but i dont have that experience on making classes

#

my plan was:
one class called cnn, with functions such as: load data, train, and predict

#

and save ofc

bronze skiff Dec 5, 2020, 10:15 PM

#

usually for hyperparameters in the script you can write them out into a yaml or txt file as metadata

#

and then compare it to the passed in hyperparameters when running the script to see if an update is needed

lapis sequoia Dec 5, 2020, 10:15 PM

#

yeah but i think everything will be better inside a class

bronze skiff Dec 5, 2020, 10:15 PM

#

data follows similar techniques-- you can perform a hash of the data

#

if the hash of your new data differs from the old one, reupdate

lapis sequoia Dec 5, 2020, 10:16 PM

#

yeah i though about md5 for images. But to create the md5 i need to make the npy file, and this is what takes time 😛

#

is like: creating the npy everytime, and save or not

#

but what takes time is creating it

#

anyway, my neural network thinks bulbasaur is graveler so

#

xDDD

#

i dont think i will make it work

#

cuz i dont have much more images

jaunty scroll Dec 5, 2020, 10:21 PM

#

hello is it possible to convert json to dataframe if the levels of nesting in json data are arbitrary and change from tier to tier?

vague mauve Dec 5, 2020, 11:02 PM

#

guys i need help from someone who really knows what the are doing when it comes to coding especially in the realms of a.i if you do please dm me

hollow gull Dec 5, 2020, 11:16 PM

#

jaunty scroll hello is it possible to convert json to dataframe if the levels of nesting in js...

what does change from tier to tier mean?

hollow gull Dec 5, 2020, 11:16 PM

#

vague mauve guys i need help from someone who really knows what the are doing when it comes ...

Ask your question if you have one.

fossil pecan Dec 5, 2020, 11:16 PM

#

guys is there anyone that knows about keras? I have a simple question.

hollow gull Dec 5, 2020, 11:17 PM

#

Ask your question if you have one.

hollow gull Dec 5, 2020, 11:17 PM

#

fossil pecan guys is there anyone that knows about keras? I have a simple question.

Ask your question if you have one.

jaunty scroll Dec 5, 2020, 11:18 PM

#

hollow gull what does change from tier to tier mean?

I mean that sometimes a nested dictionary will X number of values, and then the next tier down might have 10 more than that for example, and then theres also lists of dictionaries

#

and if im trying to create dataframe, it doesnt seem like it fits that changing number of values in a column-row relationship

fossil pecan Dec 5, 2020, 11:18 PM

#

I see everywhere this, a variable next to a layer. What does it mean? The picture is from a guide on the official keras site

📎 unknown.png

hollow gull Dec 5, 2020, 11:19 PM

#

jaunty scroll I mean that sometimes a nested dictionary will X number of values, and then the ...

I would try converting it to a dict and dropping it into a dataframe. I suspect that pandas is going to search all of the rows to determine what columns need to be present.

jaunty scroll Dec 5, 2020, 11:22 PM

#

hollow gull I would try converting it to a dict and dropping it into a dataframe. I suspect ...

ok i will give that a go thank you

hollow gull Dec 5, 2020, 11:23 PM

#

pandas typically does a good job of converting a dict into a dataframe in my experience with a syntax like

df = pd.DataFrame(dict_data)

hollow gull Dec 5, 2020, 11:58 PM

#

@glad mulch the thing that is going to make this a bit harder is that there are so many interaction terms with SOC. But in general I would say write out the equation for the linear regression. Then you can group the equation based on different variables and see how the predictions are going to depend on that variable.

#

SOC and energy are both slightly negative but SOC Energy is highly positive. Therefore if you hold SOC constant and increase energy what is the impact. What is the impact if you increase both SOC and Energy?

visual rivet Dec 6, 2020, 12:27 AM

#

fossil pecan I see everywhere this, a variable next to a layer. What does it mean? The pictur...

i think it means that theres is 10 possible categories so there needs to be 10 outputs. its weird because i would expect there to be a softmax activation. without looking at the rest of the guide im not 100%

visual rivet Dec 6, 2020, 12:28 AM

#

fossil pecan I see everywhere this, a variable next to a layer. What does it mean? The pictur...

also i think the official keras documentation is different than the tensorflow doc. so the implementation would be different

visual rivet Dec 6, 2020, 12:50 AM

#

fossil pecan I see everywhere this, a variable next to a layer. What does it mean? The pictur...

looking at it again, it seems to be using a functional api so its passing in the first layer "x" to a output layer with 10 nodes.

fossil pecan Dec 6, 2020, 12:51 AM

#

@visual rivet thank you very much. Now I get it.

trim oar Dec 6, 2020, 2:00 AM

#

I'm not super great at statistics yet but from the equity world it's pretty simple. Value stocks in the past 10 years have been underperforming. But when times are bad, money flocks to value instead of growth. And I think it's early this year before March or last year that we saw a super abnormal flock into the energy sector.

deep rock Dec 6, 2020, 4:09 AM

#

Heyo!
So, i have created this dataframe with extra data that i calculated from original data source:

#

📎 unknown.png

#

Here is how I constructed the dataframe:

#

📎 unknown.png

#

The thing is ... I ploted another line with the acceptance_rate_y values on y-axis but it only initiates on value 2 on x-axis. Like this:

#

📎 unknown.png

#

Anyone can help?

trim oar Dec 6, 2020, 5:09 AM

#

deep rock The thing is ... I ploted another line with the acceptance_rate_y values on y-ax...

I only saw countplot with data from tm_df by the way

cunning dust Dec 6, 2020, 5:48 AM

#

Hi sorry i may be asking something really silly but i have a question

#

if i have a huge number of information, wich would be faster to analyse and occupy less space, numbers or letters?

trim oar Dec 6, 2020, 7:08 AM

#

cunning dust if i have a huge number of information, wich would be faster to analyse and occu...

There never is a stupid question. Usually to analyze text they are still encoded.

#

I’m not sure where you’re getting at or what you’re trying to do

lapis sequoia Dec 6, 2020, 8:13 AM

#

Wew, I made it over the ~waves~

north plinth Dec 6, 2020, 10:14 AM

#

hey can anybody help me

#

bout convolution neural network

livid quartz Dec 6, 2020, 10:19 AM

#

"J48 performs better than Random forest because it deals with both categorical and continuous values,
whereas Random forest gets biased in favor of the attributes with categorical values. "

#

I found that statement in a research paper, just wondering if it was true that random forest is biased towards catergorical values?

north plinth Dec 6, 2020, 10:32 AM

#

can anybody tell me how can i run convolution neural network with just numpy ..Weights are saved in a pickle file which was trained with keras

lapis sequoia Dec 6, 2020, 10:43 AM

#

are there any great youtube tutorials on data science ?

#

should i opt in buying a udemy course instead ?

whole vortex Dec 6, 2020, 12:26 PM

#

Guys, I've been given to task to cluster a seed dataset but unsure how to cluster the info as a row represents a different seed but unsure what data from each row to use when clustering the data

regal herald Dec 6, 2020, 12:26 PM

#

lapis sequoia should i opt in buying a udemy course instead ?

That should be your last resort.

whole vortex Dec 6, 2020, 12:28 PM

#

lapis sequoia are there any great youtube tutorials on data science ?

Bruh there's loads of pandas tutorials online

ocean pumice Dec 6, 2020, 12:28 PM

#

google have a lot of ressource about NN from numpy. i was searching for it a few days ago 🙂

regal herald Dec 6, 2020, 12:28 PM

#

lapis sequoia are there any great youtube tutorials on data science ?

There's great websites and Youtube tutorials on it.

#

@lapis sequoia List of resources for data science, I can remember:
Coursera - https://www.coursera.org/learn/python-data-analysis
SimpliLearn - https://www.simplilearn.com/big-data-and-analytics/python-for-data-science-training
DataCamp - https://www.datacamp.com/courses/intro-to-python-for-data-science
FreeCodeCamp video - https://www.youtube.com/watch?v=LHBE6Q9XlzI
Edureka video - https://www.youtube.com/watch?v=-6RqxhNO2yY

Coursera

Introduction to Data Science in Python

Offered by University of Michigan. This course will introduce the learner to the basics of the python programming environment, including fundamental python programming techniques such as lambdas, reading and manipulating csv files, and the numpy library. The course will introduce data manipulation and cleaning techniques using the popular python...

Simplilearn.com

Data Science with Python Course

Enroll now for Data Science with Python course to get expertise in Python libraries such as NumPy, Pandas, SciPy, and Matplotib with ✔️ 6 Real-World Projects.

Introduction to Python

Master the basics of data analysis in Python. Expand your skillset by learning scientific computing with numpy.

YouTube

freeCodeCamp.org

Python for Data Science - Course for Beginners (Learn Python, Panda...

This Python data science course will take you from knowing nothing about Python to coding and analyzing data with Python using tools like Pandas, NumPy, and Matplotlib.

This is a hands-on course and you will practice everything you learn step-by-step.

💻 Code: https://github.com/datapublishings/Course-python-data-science

🎥 Learn more about Dat...

▶ Play video

YouTube

edureka!

Python For Data Science Full Course - 9 Hours | Data Science With P...

🔥Edureka Python Certification Training: https://www.edureka.co/data-science-python-certification-course
This Edureka video on the 'Python For Data Science Full Course' will help you learn Python for Data Science including all the relevant libraries. Following are the topics discussed in this tutorial:
00:00 Agenda
02:37 Introduction To Data Scie...

▶ Play video

lapis sequoia Dec 6, 2020, 1:12 PM

#

regal herald There's great websites and Youtube tutorials on it.

thanks so much man

whole vortex Dec 6, 2020, 3:32 PM

#

Hey guys, I have a csv file with 3 columns. The first representing the row number, the second representing a node and the third representing a second node. I want to store the edge for each row between nodes for all rows... Does anyone know how I can do that with the networkx library?

noble rock Dec 6, 2020, 3:41 PM

#

https://discover-ai-with-microsoft.agorize.com/en/challenges/msazurevirtualhack-2021?lang=en

Microsoft Azure Virtual Hackathon 2021

Provide innovative solutions in advanced data analytics and AI for a number of booming industries!

#

Are u interested in this competition ?

#

We need only a team member because we have grouped with three people already. If you're eligible and would like to join , feel free to let me know.

trim oar Dec 6, 2020, 3:47 PM

#

whole vortex Hey guys, I have a csv file with 3 columns. The first representing the row numbe...

Can you translate that for a non-network person to understand?

whole vortex Dec 6, 2020, 3:47 PM

#

I have 2 columns in a csv file

#

column1 represents node1

#

column2 represents node2

#

node 1 always connects to node2

#

the connection between node1 and node2 is called an edge

#

I need to find a way to create a list of edges and visualise the data from the csv file as a graph

#

@trim oar

trim oar Dec 6, 2020, 3:49 PM

#

so each data point / row is a connection

#

and there are some properties on node2 that could be on node1 representing continuous connection

arctic wedgeBOT Dec 6, 2020, 5:35 PM

#

Hey @solid isle!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

solid isle Dec 6, 2020, 5:37 PM

#

Can anyone help me in installing chatterbot library, I have tried several times but it is showing error , I tried to install it in pycharm, anaconda but at both places, I got error

#

WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/chatterbot/ WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/chatterbot/ WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/chatterbot/ WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/chatterbot/ WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='pypi.org', port=443): Read timed out. (read timeout=15)")': /simple/chatterbot/ ERROR: Could not find a version that satisfies the requirement chatterbot ERROR: No matching distribution found for chatterbot

#

this was the error, can anyone tell what's the issue and how should I install this library

trim oar Dec 6, 2020, 5:53 PM

#

While waiting, and while I don't have experience with chatterbox, I did encounter similar things awhile back. What happened to me was that I installed tensorflow on my machine but Jupyter was not recognizing it somehow. So it was a simple pip install tensorflow on the Jupyter notebook again for me at the end, but basically it may be the environment/path that you're directing them to

#

are different

tribal wind Dec 6, 2020, 6:04 PM

#

hello guys, anyone here good with NLTK? could you pls dm me if possible

heady tide Dec 6, 2020, 6:28 PM

#

Has anyone used a multiprocessing pool for lemmatization ? If so did it considerably improve the execution time ?

solid isle Dec 6, 2020, 6:35 PM

#

trim oar While waiting, and while I don't have experience with chatterbox, I did encounte...

how to change the path then😭 for anaconda

civic fractal Dec 6, 2020, 6:35 PM

#

https://stackoverflow.com/questions/65043975/batch-convert-pandas-dataframe-to-protobuf/65135908?noredirect=1#comment115206320_65135908

Stack Overflow

Batch convert pandas dataframe to protobuf

I have a dataframe with about 1.5 million rows. I want to convert this to a protobuf.
Naive method

generated with protoc

import my_proto

pb = my_proto.Table()
for _, row in big_table.iterrows():...

Stack Overflow

Batch convert pandas dataframe to protobuf

I have a dataframe with about 1.5 million rows. I want to convert this to a protobuf.
Naive method

generated with protoc

import my_proto

pb = my_proto.Table()
for _, row in big_table.iterrows():...

#

Could anyone answer this, would me much appreciated

#

Pandas and protobuf

#

Could anyone answer this, would be much apprecaite

heady tide Dec 6, 2020, 6:40 PM

#

protobuf will replace json

rocky copper Dec 6, 2020, 8:52 PM

#

can anyone reccomend me a good plagiarism checker algorithm for comaparing to two documents.

cobalt elbow Dec 7, 2020, 1:01 AM

#

hello

cobalt elbow Dec 7, 2020, 1:01 AM

#

rocky copper can anyone reccomend me a good plagiarism checker algorithm for comaparing to tw...

bibliography

thin wolf Dec 7, 2020, 1:33 AM

#

Hi y’all! I’m a python beginner trying to learn how to use Jupyter notebooks!

#

Good at sql, but new to notebooks and python

#

Does anyone know where I can find a list of recommended resources to get started?

thin wolf Dec 7, 2020, 1:37 AM

#

regal herald <@456226577798135808> List of resources for data science, I can remember: Course...

Oh just saw this

frosty pine Dec 7, 2020, 4:55 AM

#

Anyone active?

#

I have questions about Machine learning

austere swift Dec 7, 2020, 5:01 AM

#

Just ask your questions

olive lichen Dec 7, 2020, 5:39 AM

#

hey all, does anyone know of how to do regular expression searches which iterate over a list

#

e.g. I have a list of regular expressions r, and I have some list of strings s and I want to search s for each regular expression in r

trim oar Dec 7, 2020, 5:57 AM

#

olive lichen e.g. I have a list of regular expressions *r*, and I have some list of strings *...

List comprehensions?

olive lichen Dec 7, 2020, 6:12 AM

#

trim oar List comprehensions?

having a hard time coming up with one. not sure how to iterate over a list when the iteration is happening inside of re.findall(x,y)

digital charm Dec 7, 2020, 6:52 AM

#

Can i ask some doubts here?

midnight stone Dec 7, 2020, 7:39 AM

#

is this the right place to ask about dirs and paths

lapis sequoia Dec 7, 2020, 8:14 AM

#

guys does anyone knows Data Science with Python. If yes can u contact me

azure stump Dec 7, 2020, 8:19 AM

#

These functions can really help u guys
https://medium.com/analytics-vidhya/knowing-these-can-make-you-better-in-python-26e43afc0fd

Medium

Knowing these can make you better in Python.

Functions you use at very rare occasions

hardy shale Dec 7, 2020, 8:37 AM

#

Might be a Careers question, however do you guys feel like a masters/PHD is required to get a datascience/data engineering position?

whole vortex Dec 7, 2020, 9:00 AM

#

Can someone help me out with getting this info and plotting it into graphs

eager heath Dec 7, 2020, 9:01 AM

#

Is this an assignment question?

whole vortex Dec 7, 2020, 9:53 AM

#

@eager heath yeah

#

But it’s the only damn question I haven’t answered

#

I think it’s because there are 3 pieces of information

#

I feel like I’d need multiple graphs here for each day of the week for each bike type

#

So there are 3 bike types and 7 days in a week

#

So 21 graphs reepresenting the duration of a bike ride for each bike on each day of the week

#

@eager heath would you say I’m on the right track or no

eager heath Dec 7, 2020, 9:56 AM

#

!rule 5

arctic wedgeBOT Dec 7, 2020, 9:56 AM

#

Rules

5. Do not provide or request help on projects that may break laws, breach terms of services, be considered malicious or inappropriate. Do not help with ongoing exams. Do not provide or request solutions for graded assignments, although general guidance is okay.

eager heath Dec 7, 2020, 9:56 AM

#

We can't help with with an assignment, sorry

whole vortex Dec 7, 2020, 10:06 AM

#

But this is outside if that

#

I can reference the help

ripe lion Dec 7, 2020, 10:33 AM

#

Does anybody have a good tutorial on how to use the matplotlib basemap?

hoary breach Dec 7, 2020, 11:29 AM

#

I am trying to learn monte carlos using python. Can anyone help

#

for background this is for modeling molecules

ocean pumice Dec 7, 2020, 12:01 PM

#

what's your problem with montecarlo ?

#

did you try solving pi with it ? there isn't much to understand about montecarlo

hoary breach Dec 7, 2020, 12:04 PM

#

no i didnt try to solve pi with it

#

funny..

ocean pumice Dec 7, 2020, 12:04 PM

#

it should help you understand it, and it's very easy to implement

hoary breach Dec 7, 2020, 12:04 PM

#

i just wanted to know if monte carlo is the only way for probability heuristics

ocean pumice Dec 7, 2020, 12:05 PM

#

ha

#

probably not ? dunno

#

the best video about spiking neuron i found so far : https://www.youtube.com/watch?v=5SrEycIbfRE&list=PL09WqqDbQWHFvM9DFYkM_GfnrVnIdLRhy

YouTube

Neural Reckoning

Sander Bohte (CWI) - Effective and Efficient Computation with Multi...

The emergence of brain-inspired neuromorphic computing as a paradigm for edge AI is motivating the search for high-performance and efficient spiking neural networks to run on this hardware. However, compared to classical neural networks in deep learning, current spiking neural networks lack competitive performance in compelling areas. Here, for ...

▶ Play video

jagged plume Dec 7, 2020, 1:23 PM

#

Hi, I am trying to plot a confusion matrix but I am getting the following error:

Error when checking input: expected sequential_22_input to have shape (50, 99) but got array with shape (1, 50)

This is my code: https://pastebin.com/4hqV7hmi

Any clue what the issue is, I have been trying to resolve this for hours now... 😅

Pastebin

def kerasNeuralNetwork(): # Reference: https://www.youtube.com/w...

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

still delta Dec 7, 2020, 2:14 PM

#

Guys after converting PDF to String, I want to extract a line from a long string text, for example a question "1. *************?" from the string?

#

Anyone could help?

torpid cave Dec 7, 2020, 2:52 PM

#

use regex?

#

Not sure

#

With an example it would be easier to help you

worthy scarab Dec 7, 2020, 2:53 PM

#

📎 unknown.png

#

anyone any idea why this curve fit isnt looking the same as my plot

lapis sequoia Dec 7, 2020, 2:54 PM

#

Hello everyone! I just dropped a new YouTube video on how to interact with APIs in Python to load and work with data. Let me know what you think. https://youtu.be/laOQ3Sfw5yo

YouTube

ThirdEyeCyborg

Python Programming Project - Python Project Examples - Episode 2 | ...

Python Project 2 : How to interact with APIs in Python
Beginner Level Tutorial

See all my content here:
https://linktr.ee/thirdeyecyborg

Medium Article referenced in this video:
https://towardsdatascience.com/how-to-interact-with-apis-in-python-10efece03d2b

To discover more about this Python crash course, PLEASE check out:
https://thirdeyecy...

▶ Play video

worthy scarab Dec 7, 2020, 2:54 PM

#

Looks like my coefficients are completely wrong

still delta Dec 7, 2020, 3:15 PM

#

torpid cave With an example it would be easier to help you

I think it will work, I am on it thank so much

brazen owl Dec 7, 2020, 3:49 PM

#

i

#

Hi

#

Give the mean of the series and its standard deviation. Is it possible to model the lifetime with a
exponential law? Argument based on the values obtained for the mean and the standard deviation.

#

that what i need to do

arctic wedgeBOT Dec 7, 2020, 3:49 PM

#

Hey @brazen owl!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

brazen owl Dec 7, 2020, 3:49 PM

#

import numpy as np
import matplotlib.pyplot as plt
import math

dat=np.loadtxt (fname=r"C:\Users\Amine13\Desktop\COURS 3I\math maintenance\a09.txt")
x=dat[:,0]
y=dat[:,1]


plt.plot(x,y,'.')
plt.show()

final scaffold Dec 7, 2020, 5:30 PM

#

Hi, need a quick help! Consider that I've (10+x) columns {x: 0 to any even number}.

I want to apply a condition where column 6th is not null + column names containing the word "events" is not null (these "events" column will depend on x).
How to do this in pandas?

serene scaffold Dec 7, 2020, 5:34 PM

#

Is there a way to do this without looping?

    letters = ['A', 'B', 'C', 'D', 'E']
    vec = np.ones((5,))
    for i, letter in enumerate(letters):
        vec[i] = np.nanmax(n[LETTER_GRID == letter])

pure sedge Dec 7, 2020, 6:22 PM

#

Hi i have downloaded stock market data from yahoo finance in csv format , i want it to update daily in my csv , how can i do that?

serene scaffold Dec 7, 2020, 6:28 PM

#

pure sedge Hi i have downloaded stock market data from yahoo finance in csv format , i want...

what have you tried so far?

pure sedge Dec 7, 2020, 6:36 PM

#

What i want to do eventually is create flask app with list of listed companies and when user selects date it creates a auto arima model

#

But what is the use if it does not update daily

twilit pilot Dec 7, 2020, 6:38 PM

#

Hello, I am getting this error when I am using the sklearn.linear_model LinearRegression. ```py
from sklearn.linear_model import LinearRegression
import numpy as np

X = np.array([1604275200 1604361600 1604448000 1604534400 1604620800 1604880000
1604966400 1605052800 1605139200 1605225600 1605484800 1605571200
1605657600 1605744000 1605830400 1606089600 1606176000 1606262400
1606435200 1606694400 1606780800])

y = np.array([400.51000977 423.8999939 420.98001099 438.08999634 429.95001221
421.26000977 410.35998535 417.13000488 411.76000977 408.5
408.08999634 441.60998535 486.64001465 499.26998901 489.60998535
521.84997559 555.38000488 574. 585.76000977 567.59997559
584.76000977])

model = LinearRegression()
model.fit(X, y)
The error looks like this
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

fiery fossil Dec 7, 2020, 6:44 PM

#

serene scaffold Is there a way to do this without looping? ```py letters = ['A', 'B', 'C', '...

What's 'n' and 'LETTER_GRID'?

livid quartz Dec 7, 2020, 7:04 PM

#

Hi, . I want to do some exploratory data analysis on some data before I put it into a model, like create a correlation matrix, but with 77 variables it won't look tidy. Is there a way that I can reduce the number of variables by selecting the most important ones? https://github.com/GitEricLin/BMJOpen/blob/master/OGDataset.xlsx <- the data

GitHub

GitEricLin/BMJOpen

BMJ Open - Article “Predicting Mortality in Critically Ill Influenza Patients using an Extreme Gradient Boosting Model: a Multicentre Study in Taiwan" - GitEricLin/BMJOpen

serene scaffold Dec 7, 2020, 7:10 PM

#

fiery fossil What's 'n' and 'LETTER_GRID'?

another matrix with the same shape as LETTER_GRID

#

And LETTER_GRID is a matrix of strings.

#

whereas n is all floats.

trim oar Dec 7, 2020, 7:19 PM

#

twilit pilot Hello, I am getting this error when I am using the sklearn.linear_model LinearRe...

It's exactly like it says. Reshape it to 2D.

twilit pilot Dec 7, 2020, 7:20 PM

#

i did that but then when i print the coef_ and the intercept_, i get 0, 0

#

which makes not sense, because the graph has an upward trend

ionic moss Dec 7, 2020, 7:27 PM

#

Hi, I have a Python Pandas question

I have a dataframe containing categorical columns Country/Region and Province/State.
I want to visualize the state/province wise combine number of confirmed, deaths, recovered, active COVID-19 cases in the USA
Ideally I want a bar chart with each of the states on the x-axis and the numerical values from column 'Confirmed', 'Deaths', 'Recovered' and 'Active' on the y-axis

📎 unknown.png

fiery fossil Dec 7, 2020, 7:27 PM

#

serene scaffold And `LETTER_GRID` is a matrix of strings.

Run this code as it is gives error , as n and LETTER_GRID is not defined. What output do you want from this code?

ionic moss Dec 7, 2020, 7:28 PM

#

Ideally it should look like this, all hints and tips are appericiated

📎 unknown.png

serene scaffold Dec 7, 2020, 7:29 PM

#

fiery fossil Run this code as it is gives error , as `n` and `LETTER_GRID` is not defined. Wh...

that's not the entire program, so it's not expected that it would run on its own.

trim oar Dec 7, 2020, 7:30 PM

#

twilit pilot i did that but then when i print the coef_ and the intercept_, i get 0, 0

I didn't get 0

#

How did you reshape it

fiery fossil Dec 7, 2020, 7:32 PM

#

serene scaffold that's not the entire program, so it's not expected that it would run on its own...

I meant to say what is this block of code doing?

twilit pilot Dec 7, 2020, 7:34 PM

#

trim oar I didn't get 0

oh wait nvm. i was comparing stocks and 0 was what i got for a different stock. the other stock had no linear relationship lol. thanks for the tip tho

trim oar Dec 7, 2020, 7:35 PM

#

🙂

trim oar Dec 7, 2020, 7:50 PM

#

ionic moss Ideally it should look like this, all hints and tips are appericiated

I can't play around with the notebook to find it for you right now but: https://stackoverflow.com/questions/38807895/seaborn-multiple-barplots

Stack Overflow

Seaborn multiple barplots

I have a pandas dataframe that looks like this:

class       men       woman   children

0 first 0.91468 0.667971 0.660562
1 second 0.30012 0.329380 0.882608
2 third 0.11899...

#

Don't know if this helps

final scaffold Dec 7, 2020, 7:51 PM

#

Hi, need a quick help! Consider that I've (10+x) columns {x: 0 to any even number}.

I want to apply a condition where column 6th is not null + column names containing the word "events" is not null (these "events" column will depend on x).
How to do this in pandas?

trim oar Dec 7, 2020, 7:57 PM

#

final scaffold Hi, need a quick help! Consider that I've (10+x) columns {x: 0 to any even numbe...

I don't quite understand what you're trying to convey but check out applymap. You can take alook here: https://stackoverflow.com/questions/19798153/difference-between-map-applymap-and-apply-methods-in-pandas

Stack Overflow

Difference between map, applymap and apply methods in Pandas

Can you tell me when to use these vectorization methods with basic examples?

I see that map is a Series method whereas the rest are DataFrame methods. I got confused about apply and applymap meth...

final scaffold Dec 7, 2020, 7:57 PM

#

Should i share an image of a sample?

trim oar Dec 7, 2020, 8:00 PM

#

Sure but I don't have anything to play with right now

final scaffold Dec 7, 2020, 8:03 PM

#

📎 image.png

#

Please see this ^@trim oar

#

Number of columns Untill column L is fixed, but after that columns will vary, there can be either 0 column or any number of columns

#

I want to filter this table when 'Installs'!=null & columns containing string 'events' are not null as well

trim oar Dec 7, 2020, 8:14 PM

#

This is not optimized but if it’s a one time thing I’d probably just:

Filter with regex to retain only columns with “event”. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html
Concat or merge with installs
Dropna (how=“any”)

#

There’s probably a better way to do it just not on top of my head right now

#

But as always I suggest playing these with copy() only.

final scaffold Dec 7, 2020, 8:19 PM

#

That's what i have been thinking to do, but i was avoiding this.
Well, if you come across something better, please ping me.

#

Thank you.

strong oasis Dec 7, 2020, 8:31 PM

#

Hey has anybody here done the freecodecamp rock paper scissors program?

https://www.freecodecamp.org/learn/machine-learning-with-python/machine-learning-with-python-projects/rock-paper-scissors

It's my first ML program and I could use some help with it this week if anybody is familiar and/or available.

freeCodeCamp.org

Learn to code. Build projects. Earn certifications.Since 2015, 40,000 graduates have gotten jobs at tech companies including Google, Apple, Amazon, and Microsoft.

arctic wedgeBOT Dec 7, 2020, 8:40 PM

#

Hey @true oasis!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .txt attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

lapis sequoia Dec 7, 2020, 10:06 PM

#

What do you want to do with data ?

red roost Dec 7, 2020, 10:09 PM

#

Can anyone here point me to a good guide for advanced machine learning. I'm having trouble finding people on youtube who actually KNOW WHAT THEY ARE TALKING ABOUT. I am looking for some pseudocode for a basic neural net node class

vital patio Dec 7, 2020, 10:10 PM

#

lapis sequoia What do you want to do with data ?

I have no intention to use it. I was just trying to do something to learn scrapy so this was the first idea that popped up in my head.

lapis sequoia Dec 7, 2020, 10:10 PM

#

red roost Can anyone here point me to a good guide for advanced machine learning. I'm hav...

Coursera

lapis sequoia Dec 7, 2020, 10:10 PM

#

vital patio I have no intention to use it. I was just trying to do something to learn scrapy...

Oh but check TOS first before doing scraping

red roost Dec 7, 2020, 10:10 PM

#

im looking for something free

#

if something like that exists

lapis sequoia Dec 7, 2020, 10:11 PM

#

You study zt university'?

red roost Dec 7, 2020, 10:11 PM

#

no

lapis sequoia Dec 7, 2020, 10:11 PM

#

If so, it's free

#

Oh

red roost Dec 7, 2020, 10:12 PM

#

thanks

vital patio Dec 7, 2020, 10:12 PM

#

lapis sequoia Oh but check TOS first before doing scraping

Is there any risk for my github account or anything? Because this is just for educational purposes only.

lapis sequoia Dec 7, 2020, 10:13 PM

#

Im not sure but check TOS

vital patio Dec 7, 2020, 10:13 PM

#

lapis sequoia Im not sure but check TOS

Ok thanks anyway.

molten hamlet Dec 7, 2020, 10:26 PM

#

~~im reading a book, and they say that variance is divided not by n, but n-1~~

#

~~and Im confused, cause wherever I look, formula is for n~~

#

📎 unknown.png

#

https://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation

Unbiased estimation of standard deviation

In statistics and in particular statistical theory, unbiased estimation of a standard deviation is the calculation from a statistical sample of an estimated value of the standard deviation (a measure of statistical dispersion) of a population of values, in such a way that the expected value of the calculation equals the true value. Except in so...

high badge Dec 7, 2020, 10:36 PM

#

yea i learned something in my statistics class that dividing by n-1 for some reason fits the measurement of standard deviation better

frozen gazelle Dec 8, 2020, 1:44 AM

#

Please, if anyone will let me interview them about their job PM me or fill this out: https://docs.google.com/document/d/1oddinBkUCT1DQrFkJDRH6GYjmmWJDRmagcaW0hmQp0o/edit?usp=sharing

Google Docs

Please help answer some

Please help answer some! What is your job title and what does your daily role look like? How long have you been doing this? What would you do in your early career? College? First position? What would you have done differently straight out of college/when you began entry level/first job? Are ...

#

I'll pay if needbe

olive lichen Dec 8, 2020, 2:28 AM

#

hi everyone, the round() function is being a piece of shit. i'm really angry lmfao

#

this is what i have written

    for num in list:
        print(round(num, 3))
        
roundlist(liberalMindZScores)```

#

the goal is to round every number in the list to three decimal places

#

when i run, i get this error

TypeError                                 Traceback (most recent call last)
<ipython-input-66-fcef4bcaa5f3> in <module>
      3         print(round(num, 3))
      4 
----> 5 roundlist(liberalMindZScores)

<ipython-input-66-fcef4bcaa5f3> in roundlist(list)
      1 def roundlist(list):
      2     for num in list:
----> 3         print(round(num, 3))
      4 
      5 roundlist(liberalMindZScores)

TypeError: round() takes 1 positional argument but 2 were given```

#data-science-and-ml

generated with protoc

generated with protoc