plush zenith Nov 12, 2020, 8:37 PM

#

this one

heady hatch Nov 12, 2020, 8:37 PM

#

Also is that sympy? I'm not familiar with sympy.

plush zenith Nov 12, 2020, 8:37 PM

#

it's not supposed to be sympy, i think i passed it as numpy with the lambdify

heady hatch Nov 12, 2020, 8:38 PM

#

Oh I mean I'm not familiar with Sympy syntax.

plush zenith Nov 12, 2020, 8:38 PM

#

ahh

heady hatch Nov 12, 2020, 8:38 PM

#

What's lambdify supposed to do?

plush zenith Nov 12, 2020, 8:38 PM

#

sorry haha

#

it does something like lambda (it converts it to an anonymous function i think) and passes it in a numpy format
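
For context, a minimal sketch of what `sp.lambdify` does, assuming SymPy and NumPy are installed. The expression is only an illustration built from the first equation in this thread: lambdify turns a symbolic expression into an ordinary Python function that evaluates numerically.

```python
import numpy as np
import sympy as sp

x, y = sp.symbols("x y")
expr = 5 * (x + 1) ** 2 + (y + 1) ** 2 - 25   # still symbolic

# lambdify builds a plain Python function backed by NumPy
func = sp.lambdify([x, y], expr, "numpy")

value = func(0, 2)          # 5*1 + 9 - 25
print(type(func).__name__)  # a regular function, not an array
print(value)                # -11
```

The returned object is a function, so it has to be called with numbers before anything like `np.linalg.inv` can use the result.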

heady hatch Nov 12, 2020, 8:40 PM

#

I would double check there if lambdify is returning what you're expecting it to.

#

I'm googling the error right now and someone is having a similar problem too, where they mix sympy with numpy.

#

But to be more specific, I think your error might be coming from np.linalg.inv.

plush zenith Nov 12, 2020, 8:41 PM

#

mmm it could be possible

#

yes the error comes from there

heady hatch Nov 12, 2020, 8:42 PM

#

You should print Jr and fi's dtypes.

#

And check if they're compatible with np.linalg.inv.

plush zenith Nov 12, 2020, 8:43 PM

#

f=
[5*(x + 1)**2 + (y + 1)**2 - 25, 2.71828182845905**x - y]

J=
[[10*x + 10 2*y + 2]
[1.0*2.71828182845905**x -1]]

#

they are both arrays

heady hatch Nov 12, 2020, 8:43 PM

#

What about the dtypes?

#

Like the data types.

plush zenith Nov 12, 2020, 8:44 PM

#

how can i see that?

heady hatch Nov 12, 2020, 8:44 PM

#

You can check for them via f.dtypes or j.dtypes.

#

Either dtype or dtypes.

#

I don't remember the exact syntax.
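
For reference, a NumPy array exposes its element type as `.dtype` (singular); `.dtypes` is the pandas DataFrame attribute. A small sketch with made-up values:

```python
import numpy as np

numeric = np.array([[10.0, 6.0], [1.0, -1.0]])
print(numeric.dtype)            # float64

# an array holding arbitrary Python objects gets dtype=object
mixed = np.array([[10.0, "6"], [None, -1.0]], dtype=object)
print(mixed.dtype)              # object

# a function is not an array, so it has no dtype at all
f = lambda x, y: x + y
print(hasattr(f, "dtype"))      # False
```

Calling `.dtype` on a function (for example one returned by lambdify) raises an AttributeError, so the check only makes sense on the evaluated arrays.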

plush zenith Nov 12, 2020, 8:44 PM

#

let me try!

#

AttributeError: 'function' object has no attribute 'dtype'

heady hatch Nov 12, 2020, 8:46 PM

#

So this is my intuition.

#

np.linalg.inv takes numerical values.

plush zenith Nov 12, 2020, 8:46 PM

#

yes

heady hatch Nov 12, 2020, 8:46 PM

#

and you're passing in a matrix that hasn't been evaluated yet.

#

So it's trying to invert functions instead of a matrix with numerical values.

#

f=
[5*(x + 1)**2 + (y + 1)**2 - 25, 2.71828182845905**x - y]

J=
[[10*x + 10 2*y + 2]
[1.0*2.71828182845905**x -1]]

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
@plush zenith

If this is what f and J are, then I'm assuming the x and y aren't evaluated.
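
One way to see the difference, as a sketch assuming SymPy is available: an array built from unevaluated symbolic expressions has `dtype=object`, and `np.linalg.inv` rejects it, while the evaluated array inverts fine. The matrix entries mirror the J printout above; the evaluation point (0, 2) is the starting value used later in the thread.

```python
import numpy as np
import sympy as sp

x, y = sp.symbols("x y")
J_sym = np.array([[10 * x + 10, 2 * y + 2], [sp.exp(x), sp.Integer(-1)]])
print(J_sym.dtype)  # object: the entries are still SymPy expressions

try:
    np.linalg.inv(J_sym)
    inverted_symbolic = True
except TypeError as err:
    inverted_symbolic = False
    print("inv failed:", err)

# substitute numbers and convert to float before inverting
J_num = np.array([[e.subs({x: 0, y: 2}) for e in row] for row in J_sym], dtype=float)
J_inv = np.linalg.inv(J_num)
print(J_inv)
```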

plush zenith Nov 12, 2020, 8:48 PM

#

yes i understand that

#

but i passed the data

#

and when i write the J outside the loop

#

it shows a numerical result

#

the fi doesnt

#

and dont understand why

heady hatch Nov 12, 2020, 8:50 PM

#

I think this is where your investigation might have to start. hahaha

plush zenith Nov 12, 2020, 8:50 PM

#

sorry fi too appears as a numerical

heady hatch Nov 12, 2020, 8:50 PM

#

Run this in your loop.

plush zenith Nov 12, 2020, 8:50 PM

#

sure!!

heady hatch Nov 12, 2020, 8:50 PM

#

for i in range(20):
    Jr=np.array(J(v[0], v[1]))
    fi=np.array(f(v[0],v[1]))
    print(Jr)
    print(fi)
    break

#

Print it on first iteration and see if it's evaluated.

plush zenith Nov 12, 2020, 8:51 PM

#

yes it does

#

[[10. 6.]
[ 1. -1.]]
[[10. 6.]
[ 1. -1.]]

heady hatch Nov 12, 2020, 8:52 PM

#

Hmm that's interesting.

#

I wonder if it breaks somewhere.

#

I guess add the inverse operation to that?

#

the J_inv

#

run the first iteration where it calculates Jr, fi, and J_inv.

plush zenith Nov 12, 2020, 8:53 PM

#

TypeError: 'module' object is not callable

#

this appears

heady hatch Nov 12, 2020, 8:54 PM

#

Wait what module object is not callable?

plush zenith Nov 12, 2020, 8:54 PM

#

TypeError Traceback (most recent call last)
<ipython-input-36-3485713534db> in <module>
37
38 #print(v0- np.linalg.inv(np.array(J(v0[0],v0[1])) @ np.array(f(v0[0],v0[1]))))
---> 39 print(sistema_newton(funcion1, funcion2,[0,2] ))

<ipython-input-36-3485713534db> in sistema_newton(funcion1, funcion2, v0)
16 Jr=np.array(J(v[0], v[1]))
17 fi=np.array(f(v[0],v[1]))
---> 18 J_inv= np.linalg(Jr)
19 print(Jr)
20 print(fi)

TypeError: 'module' object is not callable

#

i dont understand

#

it was passed as an array and it was evaluated

heady hatch Nov 12, 2020, 8:55 PM

#

isn't it supposed to be

#

np.linalg.inv?

#

I think you're missing .inv.
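
A short sketch of why the traceback says "'module' object is not callable": `np.linalg` is a module (a namespace), and `inv` is the function inside it. The matrix is the Jr printed earlier in the thread.

```python
import numpy as np

Jr = np.array([[10.0, 6.0], [1.0, -1.0]])

print(callable(np.linalg))      # False: np.linalg is a module
print(callable(np.linalg.inv))  # True: inv is the function inside it

J_inv = np.linalg.inv(Jr)
print(J_inv @ Jr)               # approximately the identity matrix
```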

plush zenith Nov 12, 2020, 8:55 PM

#

yes

#

im a stupid

#

hahah

heady hatch Nov 12, 2020, 8:56 PM

#

You might be tired.

plush zenith Nov 12, 2020, 8:56 PM

#

by now it works

heady hatch Nov 12, 2020, 8:56 PM

#

Okay so it works on first iteration?

plush zenith Nov 12, 2020, 8:56 PM

#

hahah i'd been struggling with this since yesterday hahah

#

yes

heady hatch Nov 12, 2020, 8:56 PM

#

If that's the case then do this.

    J=sp.lambdify([x, y],[dp1,dp2], "numpy")
    f=sp.lambdify([x, y],[dp1,dp2], "numpy")
    v = v0
    print(v)
    for i in range(20):
        print(f'on {i} it')
        Jr=np.array(J(v[0], v[1]))
        fi=np.array(f(v[0],v[1]))
        J_inv=np.linalg.inv(Jr)
        #print(J_inv)
        print("")
        v = v - J_inv @ fi
        print("v")
        print(v)
        print("")
    return

#

So you can see on which iteration it breaks.

plush zenith Nov 12, 2020, 8:57 PM

#

on 1 it

#

the first one

#

TypeError: No loop matching the specified signature and casting was found for ufunc inv

heady hatch Nov 12, 2020, 8:59 PM

#

Oh, then by process of elimination, if Jr, fi, and J_inv all process.

#

I'm assuming v is your issue.

#

Is v a matrix?

plush zenith Nov 12, 2020, 9:00 PM

#

yes

#

and i think it should be a vector

#

yes v is a vector

heady hatch Nov 12, 2020, 9:01 PM

#

Could you print what v is?

plush zenith Nov 12, 2020, 9:01 PM

#

v
[[-1. 2.]
[ 0. 1.]]

#

this is v

heady hatch Nov 12, 2020, 9:01 PM

#

Hm interesting.

plush zenith Nov 12, 2020, 9:01 PM

#

[1.0625 2.0625]

#

but this should be

heady hatch Nov 12, 2020, 9:01 PM

#

This should be?

plush zenith Nov 12, 2020, 9:02 PM

#

yes

heady hatch Nov 12, 2020, 9:02 PM

#

You're giving me two values. hahaha I'm assuming one of them is v.

plush zenith Nov 12, 2020, 9:02 PM

#

i'm trying to make things work in an automatic way

#

ohh

#

sorry hahah

#

the matrix is what my programme is returning

#

the vector is what it should be

heady hatch Nov 12, 2020, 9:03 PM

#

Hm and it's able to print v in the loop?

#

v = v - J_inv @ fi

It's able to evaluate this and print this?

plush zenith Nov 12, 2020, 9:04 PM

#

just the first one

#

v
[[-1. 2.]
[ 0. 1.]]

heady hatch Nov 12, 2020, 9:04 PM

#

Ahh okay so
v = v - J_inv @ fi is your issue then.

plush zenith Nov 12, 2020, 9:04 PM

#

[0, 2] these are the values i passed

#

[[-1. 2.]
[ 0. 1.]]

#

this is the matrix it returns

#

but it should be a vector not a matrix

heady hatch Nov 12, 2020, 9:05 PM

#

Quick question is
v = v - J_inv @ fi supposed to be v - (J_inv @ fi) or (v - J_inv) @ fi.

#

I'm really lost. hahaha

#

Because you have multiple v's and I'm only focusing on v = v - J_inv @ fi

#

and I'm not sure what you're referring to now.
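
On the precedence question: in Python, `@` binds as tightly as `*`, so it is applied before binary `-`, and `v - J_inv @ fi` means `v - (J_inv @ fi)`. A sketch using the Jr printed earlier and, as an assumption, f from the printout evaluated by hand at the starting point (0, 2):

```python
import numpy as np

v = np.array([0.0, 2.0])
J_inv = np.linalg.inv(np.array([[10.0, 6.0], [1.0, -1.0]]))
fi = np.array([-11.0, -1.0])   # f evaluated at (0, 2)

implicit = v - J_inv @ fi
explicit = v - (J_inv @ fi)
other = (v - J_inv) @ fi       # a different computation

print(implicit)
print(np.allclose(implicit, explicit))  # True
print(np.allclose(implicit, other))     # False
```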

plush zenith Nov 12, 2020, 9:05 PM

#

hahah

#

okey

heady hatch Nov 12, 2020, 9:06 PM

#

Focus on the loop.

plush zenith Nov 12, 2020, 9:06 PM

#

well

heady hatch Nov 12, 2020, 9:06 PM

#

Unless you're saying v = v0 has a bug

plush zenith Nov 12, 2020, 9:06 PM

#

can i show you the original code

heady hatch Nov 12, 2020, 9:06 PM

#

Sure.

plush zenith Nov 12, 2020, 9:06 PM

#

and the one i'm trying to do?

heady hatch Nov 12, 2020, 9:06 PM

#

Sure?

plush zenith Nov 12, 2020, 9:07 PM

#

def f1(v):
    return(5*(v[0]+1)**2 + (v[1]+1)**2 - 25)

def f2(v):
    return(np.e**(v[0]) - v[1])


def f(v):
    return(np.array([f1(v), f2(v)]))
def J(v):
    M = np.array([[10*v[0]+10, 2*v[1]+2], [np.e**v[0], -1]])
    print(M)
    return(M)
def newtonSistemas(f, J, v0, n):
    v = v0
    for i in range(n):
        print("V")
        v = v - np.linalg.inv(J(v)) @ f(v)  
        print(v)
    return 
print(newtonSistemas(f, J, np.array([0,2]), 20))

#

this is the one i know it works

#

the thing with this is that you have to replace the x and y for v[0] and v[1], manually

#

and this is my poor son who doesnt work correctly hahah

#

funcion1=5*(x+1)**2 +(y+1)**2-25
funcion2=(math.e**x)-y

#

sorry i pasted something else

#

def sistema_newton(funcion1, funcion2, v0):
    x, y, z = sp.symbols('x y z')
    f=[funcion1, funcion2]
    print("f=")
    print(f)
    print("")
    dp1=[sp.diff(funcion1,x),sp.diff(funcion1,y)]#sp.diff(funcion1,z)]
    dp2=[sp.diff(funcion2,x),sp.diff(funcion2,y)]#sp.diff(funcion1,z)]
    print("J=")
    print(np.array([dp1,dp2]))
    print("")
    J=sp.lambdify([x, y],[dp1,dp2], "numpy")
    f=sp.lambdify([x, y],[dp1,dp2], "numpy")
    v = v0
    print(v)
    for i in range(20):
        print(f'on {i} it')
        Jr=np.array(J(v[0], v[1]))
        fi=np.array(f(v[0],v[1]))
        J_inv=np.linalg.inv(Jr)
        print(J_inv)
        print("")
        v = v -(J_inv @ fi)
        print("v")
        print(v)
        print("")
    return 
#print(v0- np.linalg.inv(np.array(J(v0[0],v0[1])) @ np.array(f(v0[0],v0[1]))))
print(sistema_newton(funcion1, funcion2,[0,2] ))

#

this is the one you were helping me with

#

i just tried to make the computer do the replacements of x and y

heady hatch Nov 12, 2020, 9:11 PM

#

Hmm I'm trying to understand what you mean by having to replace the x and y with v[0] and v[1] manually.

#

Because it seems like both are inserting [0, 2] from what I'm seeing.

plush zenith Nov 12, 2020, 9:11 PM

#

ohh

#

yes yes

#

but

#

in the first one

#

funcion1=5*(x+1)**2 +(y+1)**2-25
funcion2=(math.e**x)-y

#

i have to pass

#

the x and y

#

manually

#

i have to copy in the computer

#

typing

#

5*(v[0]+1)**2 +(v[1]+1)**2-25

heady hatch Nov 12, 2020, 9:13 PM

#

Can't you call f1(v) and f2(v)?

plush zenith Nov 12, 2020, 9:14 PM

#

in the first code i cant

#

you have to pass the v

heady hatch Nov 12, 2020, 9:14 PM

#

Oh why not?

plush zenith Nov 12, 2020, 9:14 PM

#

so it can replace it

#

it's like you are passing v as the parameter of a function

#

so unless you call a variable v (in this case a list) it wont work

#

i mean you could pass f1

#

v=[0, 1]

#

and it will replace all v for 0 and 1

heady hatch Nov 12, 2020, 9:18 PM

#

Okay I'm going to try to focus on what you're trying to do now.

plush zenith Nov 12, 2020, 9:18 PM

#

sure

heady hatch Nov 12, 2020, 9:18 PM

#

try to see if J_inv @ fi works.

#

print that

#

see if it evaluates

#

and then add v - to that dot product/matrix multiplication.

#

See where the error might be coming up.

plush zenith Nov 12, 2020, 9:19 PM

#

m
[[1. 0.]
[0. 1.]]

#

it prints this

#

sorry

#

[[-1. 2.]
[ 0. 1.]]

#

if i rest the v

heady hatch Nov 12, 2020, 9:21 PM

#

What do you mean by rest the v?

plush zenith Nov 12, 2020, 9:21 PM

#

sorry

#

i speak spanish haha

#

i meant subtract

heady hatch Nov 12, 2020, 9:22 PM

#

So J_inv @ fi gives you
[[1. 0.]
[0. 1.]]?

plush zenith Nov 12, 2020, 9:22 PM

#

yes

heady hatch Nov 12, 2020, 9:22 PM

#

and v - J_inv @ fi gives you
[[-1. 2.]
[ 0. 1.]]

plush zenith Nov 12, 2020, 9:22 PM

#

yes

heady hatch Nov 12, 2020, 9:23 PM

#

Oh wait I'm forgetting it's 0 index.

#

Your loop is freaking out on the second loop.

plush zenith Nov 12, 2020, 9:23 PM

#

ohh

#

that is good?

heady hatch Nov 12, 2020, 9:23 PM

#

maybe?

#

So on the second loop

#

v = [[-1. 2.]
[ 0. 1.]]

plush zenith Nov 12, 2020, 9:23 PM

#

well that is better than anything hahah

heady hatch Nov 12, 2020, 9:24 PM

#

Where v[0] = [-1, 2] and v[1] = [0, 1]

#

but your x and y are supposed to be single values, right?

plush zenith Nov 12, 2020, 9:24 PM

#

yes

#

no

#

v[0]= 0

#

and

#

v[1]=2

heady hatch Nov 12, 2020, 9:25 PM

#

To clarify you want v[0] to be 0 and v[1] to be 2?

plush zenith Nov 12, 2020, 9:25 PM

#

in the first iteration

heady hatch Nov 12, 2020, 9:26 PM

#

Right but in the second iteration

#

it's turning into a 2 by 2 matrix.

plush zenith Nov 12, 2020, 9:26 PM

#

in the second it should be replaced

#

yes and i dont understand that

heady hatch Nov 12, 2020, 9:26 PM

#

try print(sistema_newton(funcion1, funcion2, np.array([0,2])))

plush zenith Nov 12, 2020, 9:27 PM

#

is the same

heady hatch Nov 12, 2020, 9:27 PM

#

still the same error?

plush zenith Nov 12, 2020, 9:27 PM

#

it must be something about dimensions

heady hatch Nov 12, 2020, 9:27 PM

#

is v supposed to be a 2 by 1 vector?

#

Yea

plush zenith Nov 12, 2020, 9:27 PM

#

yes

heady hatch Nov 12, 2020, 9:27 PM

#

what dimension is Jr and fi supposed to be?

plush zenith Nov 12, 2020, 9:29 PM

#

f is a 2 by 1 vector

#

and J is a matrix

heady hatch Nov 12, 2020, 9:29 PM

#

but jr and fi, what about their dimensions?

plush zenith Nov 12, 2020, 9:29 PM

#

yes

#

sorry

#

fi should be a vector

#

jr the matrix

heady hatch Nov 12, 2020, 9:29 PM

#

what shape is the matrix?

plush zenith Nov 12, 2020, 9:30 PM

#

2x2

heady hatch Nov 12, 2020, 9:30 PM

#

Okay that makes sense.

2x1 - (2x2 @ 2x1) => 2x1
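
The shape arithmetic can be checked directly. This sketch uses the Jr printed earlier and, as an assumption, f evaluated by hand at (0, 2); NumPy's 1-D arrays have shape (2,) rather than 2x1, but the reasoning is the same. If fi is accidentally 2x2, broadcasting silently turns the result into a 2x2 matrix instead of raising an error.

```python
import numpy as np

v = np.array([0.0, 2.0])                      # shape (2,)
Jr = np.array([[10.0, 6.0], [1.0, -1.0]])     # Jacobian at (0, 2), shape (2, 2)
J_inv = np.linalg.inv(Jr)

fi_vector = np.array([-11.0, -1.0])           # f at (0, 2), shape (2,)
fi_matrix = Jr.copy()                         # what happens if fi is really 2x2

good = v - J_inv @ fi_vector                  # (2,) - (2,2)@(2,)   -> (2,)
bad = v - J_inv @ fi_matrix                   # (2,) - (2,2)@(2,2) -> (2,2) via broadcasting

print(good.shape, good)
print(bad.shape)
print(bad)
```

With these inputs `good` comes out as approximately [1.0625 2.0625] and `bad` as approximately [[-1. 2.] [0. 1.]], the two results quoted earlier in the thread.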

plush zenith Nov 12, 2020, 9:31 PM

#

i think i know the error

#

give me my prize for stupidity hahah

heady hatch Nov 12, 2020, 9:32 PM

#

Oh what's the error?

plush zenith Nov 12, 2020, 9:32 PM

#

man

#

im so sorry

#

and embarassed

#

sorry for keeping you with this

#

for so much time

heady hatch Nov 12, 2020, 9:32 PM

#

It's okay. We've learned from this. But what was the error?

plush zenith Nov 12, 2020, 9:32 PM

#

do you remember f?

heady hatch Nov 12, 2020, 9:33 PM

#

Mhm.

#

The 2x1 vector.

#

Was it a 2x2?

plush zenith Nov 12, 2020, 9:33 PM

#

yes

heady hatch Nov 12, 2020, 9:33 PM

#

hahaha

plush zenith Nov 12, 2020, 9:33 PM

#

but i passed it

#

as

#

this

#

J=sp.lambdify([x, y],[dp1,dp2], "numpy")

heady hatch Nov 12, 2020, 9:33 PM

#

Ahh.

plush zenith Nov 12, 2020, 9:33 PM

#

f=sp.lambdify([x, y],[dp1,dp2], "numpy")

#

and it should be as

#

f=sp.lambdify([x, y],f, "numpy")

#

im so sorry

#

i cant believe it

heady hatch Nov 12, 2020, 9:34 PM

#

Congratulations on solving the issue.

plush zenith Nov 12, 2020, 9:34 PM

#

hahahaha

#

of all the things it could

#

be

#

it was the silliest one

#

hahahaahah

heady hatch Nov 12, 2020, 9:34 PM

#

Oftentimes it's the tiny details.

plush zenith Nov 12, 2020, 9:34 PM

#

i think i copy pasted

#

the lambdify

#

and forgot about

#

i didnt replace

#

you helped me to see that the vector wasnt correct haha

#

i have no words

#

thanks a lot

#

and sorry for wasting your time with this

#

thank you very much

heady hatch Nov 12, 2020, 9:36 PM

#

It's not a waste of time if we learn something.

#

Happy to help, hope you have a wonderful mathematical journey from now.

plush zenith Nov 12, 2020, 9:36 PM

#

hahah

#

thanks i learnt a lot

#

i hope so!!!

#

Thanks again and have a good day!!
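
Putting the fix together, a hedged sketch of the SymPy-based Newton iteration discussed above. Assumptions that are not in the original code: `np.linalg.solve` is swapped in for the explicit inverse, `funcion2` is written with `sp.exp(x)` instead of `math.e**x`, the function returns `v` instead of printing every step, and the `n_iter` parameter is added for illustration.

```python
import numpy as np
import sympy as sp


def sistema_newton(funcion1, funcion2, v0, n_iter=20):
    """Newton's method for a 2x2 system given as SymPy expressions in x, y."""
    x, y = sp.symbols("x y")
    funcs = [funcion1, funcion2]
    # Jacobian rows: partial derivatives of each function
    jac = [[sp.diff(fn, x), sp.diff(fn, y)] for fn in funcs]

    f = sp.lambdify([x, y], funcs, "numpy")   # the function vector (the fix)
    J = sp.lambdify([x, y], jac, "numpy")     # the Jacobian matrix

    v = np.asarray(v0, dtype=float)
    for _ in range(n_iter):
        Jr = np.array(J(v[0], v[1]), dtype=float)
        fi = np.array(f(v[0], v[1]), dtype=float)
        # solve Jr * step = fi instead of forming the inverse explicitly
        v = v - np.linalg.solve(Jr, fi)
    return v


x, y = sp.symbols("x y")
funcion1 = 5 * (x + 1) ** 2 + (y + 1) ** 2 - 25
funcion2 = sp.exp(x) - y
root = sistema_newton(funcion1, funcion2, [0, 2])
print(root)
```

The only change that matters for the bug is that `f` is lambdified from the list of functions rather than from the Jacobian rows, so `fi` is a vector and `v` stays a vector on every iteration.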

chrome orbit Nov 12, 2020, 9:38 PM

#

can someone give me a brief explanation on enums?

fallow prism Nov 12, 2020, 9:49 PM

#

key,value = enumerate(iter)

#

sorry, index*

#

index,value

chrome orbit Nov 12, 2020, 10:30 PM

#

i mean more general than that

#

like i have a list of 4 bit enums

#

so i can store a value between 0-15 in each one

#

right?

hushed wasp Nov 12, 2020, 11:57 PM

#

Hello, I would like to apply this line only to the numeric variables of my dataset. Can anyone help please?
(np.abs(stats.zscore(df)) < 3).all(axis=1)

velvet thorn Nov 13, 2020, 12:58 AM

#

@hushed wasp df.select_dtypes(np.number)

#

key,value = enumerate(iter)
@fallow prism that's enumerate, not enum

#

two different things
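
To make the distinction concrete, a small sketch using only the standard library (the `Color` class and the list are made up for illustration): `enumerate` yields (index, value) pairs while looping, whereas `Enum` defines a fixed set of named constants.

```python
from enum import Enum


class Color(Enum):          # enum: a fixed set of named constants
    RED = 1
    GREEN = 2


letters = ["a", "b", "c"]
pairs = list(enumerate(letters))   # enumerate: (index, value) pairs

for index, value in enumerate(letters):
    print(index, value)

print(Color.RED.name, Color.RED.value)
print(Color(2))                    # look a member up by its value
```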

#

can someone give me a brief explanation on enums?
@chrome orbit what do you want to do?

marsh chasm Nov 13, 2020, 1:01 AM

#

@marsh chasm nice. thanks for telling me. gridsearch does the validation curve stuff for the whole set of parameters and plots with correct axis labels for the parameter set? sounds way easier.
@remote valley it doesnt actually do the validation curve stuff it just stores the validation score and training score both (which is what i needed)

oblique vine Nov 13, 2020, 1:02 AM

#

Does amount of data I feed into keras model affect the speed of this model?
I mean, I want the model to predict ASAP, and I don't know if I should decrease the amount of data slightly, to get faster predictions

#

does it affect size of model?

lapis sequoia Nov 13, 2020, 1:09 AM

#

@oblique vine are you trying to develop a neural network?

cerulean spindle Nov 13, 2020, 1:22 AM

#

It does not affect the size of the model. The number of layers in the model is specified by you. If your model already performs well, you should decrease the amount of data slightly. You could also use a regularization parameter like L2 or L1 which will "ignore" some features. Be careful about the model not converging @oblique vine

chrome orbit Nov 13, 2020, 1:26 AM

#

@velvet thorn can i pm u

velvet thorn Nov 13, 2020, 1:28 AM

#

@velvet thorn can i pm u
@chrome orbit nope, sorry

#

It does not affect the size of the model. The number of layers in the model is specified by you. If your model already performs well, you should decrease the amount of data slightly. You could also use a regularization parameter like L2 or L1 which will "ignore" some features. Be careful about the model not converging @oblique vine
@cerulean spindle L2 doesn’t perform feature selection

cerulean spindle Nov 13, 2020, 1:29 AM

#

oh my bad
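
To illustrate the correction: in the special case of an orthonormal design matrix (X^T X = I), both penalties have closed forms. Ridge (L2) rescales every least-squares coefficient, while lasso (L1) soft-thresholds them, which can set small ones exactly to zero. A sketch with made-up coefficients and a made-up penalty `lam`; the exact constants depend on how the loss is normalized.

```python
import numpy as np

w_ols = np.array([3.0, -0.4, 0.05, 1.2])   # illustrative least-squares coefficients
lam = 0.5

# ridge: minimize 0.5*||y - Xw||^2 + 0.5*lam*||w||^2  (with X^T X = I)
w_ridge = w_ols / (1.0 + lam)

# lasso: minimize 0.5*||y - Xw||^2 + lam*||w||_1      (with X^T X = I)
w_lasso = np.sign(w_ols) * np.maximum(np.abs(w_ols) - lam, 0.0)

print(w_ridge)   # all shrunk, none exactly zero
print(w_lasso)   # the two small coefficients become exactly zero
```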

chrome orbit Nov 13, 2020, 1:31 AM

#

@velvet thorn ok i have a 4 bit message, enum, and just want to know how to reference the data?

velvet thorn Nov 13, 2020, 1:32 AM

#

@velvet thorn ok i have a 4 bit message, enum, and just want to know how to reference the data?
@chrome orbit that's still not clear TBH

#

how does the enum relate to the message

#

what do you mean "reference the data"

#

or are you saying the 4 bits are the enum

#

how about you give an example

chrome orbit Nov 13, 2020, 1:37 AM

#

0: System - Power Saving
1: System - ON, no hand detected
2: System - ON, hand detected
3: System - ERROR
4-15: Not used

velvet thorn Nov 13, 2020, 1:37 AM

#

okay, and what do you want to do with this

chrome orbit Nov 13, 2020, 1:38 AM

#

so for example, if i have System - ON, no hand detected, the value would be 0010?

velvet thorn Nov 13, 2020, 1:41 AM

#

so for example, if i have System - ON, no hand detected, the value would be 0010?
@chrome orbit ...in bits?

#

almost: 1 would be 0001 in bits (0010 is 2). but are you trying to convert a string to a number or what

chrome orbit Nov 13, 2020, 1:42 AM

#

no im just trying to read the message on the CAN bus

velvet thorn Nov 13, 2020, 1:43 AM

#

...so is that a microcontroller question or what
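
For the status table above, one possible way to decode such a field in Python is an `IntEnum`. This is only a sketch: the class name, member names, and `decode_status` helper are hypothetical, and how the 4 bits are extracted from the CAN frame is not specified in the thread. With that table, code 1 ("System - ON, no hand detected") is 0001 in four bits, and 0010 is code 2.

```python
from enum import IntEnum


class SystemStatus(IntEnum):      # hypothetical name for the 4-bit field
    POWER_SAVING = 0
    ON_NO_HAND = 1
    ON_HAND_DETECTED = 2
    ERROR = 3


def decode_status(raw: int) -> str:
    """Map a 4-bit value (0-15) to a status name; 4-15 are unused."""
    if not 0 <= raw <= 0b1111:
        raise ValueError("status must fit in 4 bits")
    try:
        return SystemStatus(raw).name
    except ValueError:
        return "NOT_USED"


print(format(SystemStatus.ON_NO_HAND.value, "04b"))   # 0001
print(decode_status(0b0010))                          # ON_HAND_DETECTED
print(decode_status(9))                               # NOT_USED
```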

teal vine Nov 13, 2020, 3:11 AM

#

Hello! How would I go about calling the 'i' in a loop? I am trying to create a series of pandas dataframes by splitting apart a big dataframe

for i in range(13):
    dfi = df[df.index < 60*i]

I want to make it so dfi is actually changing with the i of the loop so I would end up with 13 dataframes

#

It really feels like I should be able to call that i somehow but I don't know how

hasty grail Nov 13, 2020, 3:21 AM

#

dfs = []
for i in range(13):
    dfs.append(df[df.index < 60*i])

#

or, using a list comprehension,

dfs = [df[df.index < 60*i] for i in range(13)]
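
One thing to watch: `df.index < 60*i` produces cumulative slices (the first one is empty), not separate 60-row pieces. If disjoint chunks are what is wanted, a sketch along these lines may help; it assumes a default integer index and uses toy data.

```python
import pandas as pd

df = pd.DataFrame({"value": range(150)})   # toy data with a default 0..149 index

chunk = 60
# cumulative slices, as in the loop above: rows with index < 60*i
cumulative = [df[df.index < chunk * i] for i in range(4)]

# disjoint slices: rows 0-59, 60-119, 120-149
disjoint = [df.iloc[start:start + chunk] for start in range(0, len(df), chunk)]

print([len(part) for part in cumulative])  # [0, 60, 120, 150]
print([len(part) for part in disjoint])    # [60, 60, 30]
```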

teal vine Nov 13, 2020, 3:23 AM

#

Thank you ❤️

fierce shadow Nov 13, 2020, 5:57 AM

#

hey can anyone tell me if a deep q network is able to solve gym's mountain car environment?

#

I created a dqn which beats the cartpole env, but it fails on mountain-car env

#

I am using just a simple dqn without any target networks or anything

#

if you answer ping me

uneven wren Nov 13, 2020, 11:30 AM

#

How can I get specific data from a website and then put it into my program?

mint olive Nov 13, 2020, 12:01 PM

#

web scraping

hushed wasp Nov 13, 2020, 12:04 PM

#

@velvet thorn Thanks gm but I don't know how to apply this line to the numeric variables: (np.abs(stats.zscore(df)) < 3).all(axis=1)

df.select_dtypes(np.number) and (np.abs(stats.zscore(df)) < 3).all(axis=1) don't work together

dawn basin Nov 13, 2020, 12:23 PM

#

Does anybody here know anything about automated stock trading?

velvet thorn Nov 13, 2020, 12:26 PM

#

@velvet thorn Thanks gm but I don't know how to apply this line to the numeric variables: (np.abs(stats.zscore(df)) < 3).all(axis=1)

df.select_dtypes(np.number) and (np.abs(stats.zscore(df)) < 3).all(axis=1) don't work together
@hushed wasp do you understand what that line does...?

hushed wasp Nov 13, 2020, 12:26 PM

#

I want to get rid of the outliers, but I can only apply it to the numeric variables

velvet thorn Nov 13, 2020, 12:27 PM

#

no

#

I mean, from a coding point of view

hushed wasp Nov 13, 2020, 12:27 PM

#

not sure I understand your point

velvet thorn Nov 13, 2020, 12:28 PM

#

okay

#

a different question

#

not sure I understand your point
@hushed wasp why are you using numpy and scipy.stats there?

#

do you understand?

#

in the sense that

#

you can do that

#

entirely within pandas

#

without referencing numpy or scipy directly

#

and I think

#

if you understand that

#

you will see how to solve the problem.

hushed wasp Nov 13, 2020, 12:30 PM

#

it's the code I found to be able to get rid of the outliers on the web using the zscore

velvet thorn Nov 13, 2020, 12:30 PM

#

yeah.

#

that's what I thought

#

I think you need to understand pandas fundamentals first

#

before doing such things

#

your foundation is the most important

hushed wasp Nov 13, 2020, 12:33 PM

#

sure but I don't see why I can't use numpy and scipy here

velvet thorn Nov 13, 2020, 12:34 PM

#

you can

#

but

#

it would actually be really simple to combine what I gave you

#

and what you have.

#

so if you don't know how, that suggests that you are doing something that is a bit too advanced for you

#

which is fine, that's how we learn

#

but again...can you write that piece of code purely using pandas?

#

because if you can, then how to integrate what I gave you will be clear.

#

(np.abs(stats.zscore(df)) < 3).all(axis=1) this.

#

actually, just the condition.

hushed wasp Nov 13, 2020, 12:36 PM

#

it raises an error if I apply it to the whole dataset, that's why I'm trying to apply it only to the numerical data

velvet thorn Nov 13, 2020, 12:36 PM

#

yes.

#

I understand that.
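
For readers following along, one way the two pieces could be combined, written purely in pandas as hinted above. This is a sketch on made-up data: the z-score is computed only on the numeric columns, and the resulting boolean mask filters the full DataFrame. `ddof=0` is used so the result matches the default of `scipy.stats.zscore`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": [f"row{i}" for i in range(20)],     # non-numeric column
    "value": [10.0] * 19 + [1000.0],            # one obvious outlier
})

numeric = df.select_dtypes(np.number)
# z-score per column; ddof=0 matches scipy.stats.zscore's default
z = (numeric - numeric.mean()) / numeric.std(ddof=0)
keep = (z.abs() < 3).all(axis=1)

cleaned = df[keep]
print(len(df), len(cleaned))   # 20 19
```

The same mask could presumably be built by passing `df.select_dtypes(np.number)` to `stats.zscore` in the original line; the key point is that the mask comes from the numeric columns while the filter is applied to the whole frame.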

teal vine Nov 13, 2020, 12:39 PM

#

does np.argmax() have issues parsing nan observations? I have a list of lists and the lists towards the start of my bigger list have a lot of nans

velvet thorn Nov 13, 2020, 12:40 PM

#

does np.argmax() have issues parsing nan observations? I have a list of lists and the lists towards the start of my bigger list have a lot of nans
@teal vine what kind of issues are you thinking of

teal vine Nov 13, 2020, 12:41 PM

#

Well the first say 30 lists end up returning 0 for np.argmax() or nan for np.max() when there are actual floats in there that should be returned. As the lists start having less nans the 2 functions start returning the actual max

velvet thorn Nov 13, 2020, 12:42 PM

#

Well the first say 30 lists end up returning 0 for np.argmax() or nan for np.max() when there are actual floats in there that should be returned. As the lists start having less nans the 2 functions start returning the actual max
@teal vine yup.

#

that's because of how comparisons work with nan

teal vine Nov 13, 2020, 12:42 PM

#

Oof

velvet thorn Nov 13, 2020, 12:42 PM

#

!e

import numpy as np

nan = np.nan
print(np.max([0, np.nan]))
print(np.min([0, np.nan]))

#

see?

arctic wedgeBOT Nov 13, 2020, 12:42 PM

#

@velvet thorn :white_check_mark: Your eval job has completed with return code 0.

001 | nan
002 | nan

teal vine Nov 13, 2020, 12:43 PM

#

Yeah that

#

Any way to work around that?

velvet thorn Nov 13, 2020, 12:43 PM

#

however, there is a body of functions that ignore nans

#

that would be the subject of a Google search 😉

teal vine Nov 13, 2020, 12:43 PM

#

AYAYA

#

Well at least I know its not an error in how my data is set up

#

Thank you

velvet thorn Nov 13, 2020, 12:44 PM

#

yw

teal vine Nov 13, 2020, 12:47 PM

#

Why are argmax and max not deprecated when nanargmax exists? Seems like the only use would be detecting if there's a nan in your data but there are other functions for that

#

Am I missing something you could use argmax for over nanargmax?

velvet thorn Nov 13, 2020, 12:50 PM

#

Why are argmax and max not deprecated when nanargmax exists? Seems like the only use would be detecting if there's a nan in your data but there are other functions for that
@teal vine efficiency.

#

📎 unknown.png

teal vine Nov 13, 2020, 12:52 PM

#

Gotta save those microseconds. But yeah that makes sense thank you again.

#

You're very quick
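
A short sketch of the NaN-ignoring variants mentioned above, on made-up data: `np.max` propagates NaN, while `np.nanmax` and `np.nanargmax` skip it.

```python
import numpy as np

data = np.array([np.nan, 0.5, 3.0, np.nan, 1.0])

print(np.max(data))        # nan: NaN propagates through max
print(np.nanmax(data))     # 3.0
print(np.nanargmax(data))  # 2
```

Note that `np.nanargmax` raises a ValueError when a slice contains only NaNs, so fully empty lists still need separate handling.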

iron shell Nov 13, 2020, 1:14 PM

#

Hello everyone, I'm trying to make a real-time dataset that came from a MySQL database, to be consumed on powerbi, so I'm using apache Kafka, and I'm asking myself if that is the best way to do that?

odd tendon Nov 13, 2020, 2:10 PM

#

Can anybody recommend any paid training on data science/ML?

crude marsh Nov 13, 2020, 3:20 PM

#

Guys, I need help. Anyone up

#

?

#

Anyone?

bronze barn Nov 13, 2020, 3:24 PM

#

Anybody have any suggestions on how to cluster this? Granted there's some noise visible there looks to be at least four clusters. I've tried DBSCAN and Gaussian mixtures which don't seem to work well.

📎 cluster_img.png

#

📎 Gaussian_cluster_img.png

#

📎 dbscan_cluster_img.png

crude marsh Nov 13, 2020, 3:31 PM

#

@bronze barn You look experienced. I have a small problem. Can you please help me?

#

https://paste.pythondiscord.com/otiwuvovat.coffeescript

#

This code fetches year-old data. I need it to take the latest data. How do I do this

#

@bronze barn ?

bronze barn Nov 13, 2020, 3:34 PM

#

I'm not familiar with quandl, is that a web scraping package or do you have the raw data for the latest stock prices?

crude marsh Nov 13, 2020, 3:34 PM

#

Its a web scraping package

#

Okay. Thanks.

bronze barn Nov 13, 2020, 3:40 PM

#

How much data does it give you, a month's worth?

#

Aside from that I can't be much help to you then, but I would guess that the issue has to do with calling quandl so I'd read up on their documentation, maybe there's an argument you can pass for that. Good luck though!

#

@crude marsh

hoary sluice Nov 13, 2020, 5:13 PM

#

@bronze barn take a look at HDBSCAN, it uses hierarchical clustering with dbscan to improve clustering => https://github.com/scikit-learn-contrib/hdbscan

teal vine Nov 13, 2020, 6:03 PM

#

When you finally get your code to run but it runs 3 times as slow as you thought it would

#

Turns out that iterating over lists of DataFrames was not a good idea, who knew

hollow sentinel Nov 13, 2020, 6:13 PM

#

Linear regression models the output, or target variable 𝑦 ∈ R, as a linear combination of the 𝑃-dimensional input x ∈ R^𝑃. Let X be the 𝑁 × 𝑃 matrix with each row an input vector (with a 1 in the first position), and similarly let y be the 𝑁-dimensional vector of outputs in the training set. The linear model will predict y given x using the parameter vector, or weight vector, w ∈ R^𝑃 according to

#

lol what

#

can someone translate into english
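
In plainer terms: each prediction is a weighted sum of the inputs, and the leading 1 in every row lets the first weight act as the intercept. A sketch with toy numbers (not the book's data), using `np.linalg.lstsq` as one way to find the weights:

```python
import numpy as np

# toy training set: N = 4 samples, 2 input features
features = np.array([[1.0, 2.0], [2.0, 0.0], [3.0, 1.0], [4.0, 3.0]])
true_w = np.array([1.0, 2.0, -0.5])            # intercept, weight 1, weight 2

# X has a 1 in the first position of every row, so w[0] acts as the intercept
X = np.column_stack([np.ones(len(features)), features])   # shape (N, P) = (4, 3)
y = X @ true_w                                             # noiseless targets

# least squares recovers the weight vector w
w, *_ = np.linalg.lstsq(X, y, rcond=None)
y_pred = X @ w        # the prediction for each row x is the dot product of x and w

print(w)
```

Here the targets are generated without noise, so the fitted `w` matches `true_w` up to rounding; with real data it would only be the best-fitting compromise.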

#

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn.linear_model as lm
import sklearn.metrics as metrics
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib
%matplotlib inline
# Fit Ordinary Least Squares: OLS
csv = pd.read_csv('https://raw.githubusercontent.com/neurospin/pystatsml/master/datasets/Advertising.csv', index_col=0)
X = csv[['TV', 'Radio']]
y = csv['Sales']
lr = lm.LinearRegression().fit(X, y)
y_pred = lr.predict(X)  # note: predictions are made on the training data itself
print("R-squared =", metrics.r2_score(y, y_pred))
print("Coefficients =", lr.coef_)
# Plot the data points and the fitted plane
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(csv['TV'], csv['Radio'], csv['Sales'], c='r', marker='o')
xx1, xx2 = np.meshgrid(
    np.linspace(csv['TV'].min(), csv['TV'].max(), num=10),
    np.linspace(csv['Radio'].min(), csv['Radio'].max(), num=10))
XX = np.column_stack([xx1.ravel(), xx2.ravel()])
yy = lr.predict(XX)
ax.plot_surface(xx1, xx2, yy.reshape(xx1.shape), color='None')
ax.set_xlabel('TV')
ax.set_ylabel('Radio')
_ = ax.set_zlabel('Sales')

#

these people don't even use train_test_split

teal vine Nov 13, 2020, 6:20 PM

#

Who are 'these people' ?

hollow sentinel Nov 13, 2020, 6:20 PM

#

the author of the book

#

Statistics and Machine Learning in Python by Edouard Duchesnay, Tommy Löfstedt, Feki Younes

teal vine Nov 13, 2020, 6:22 PM

#

seems pretty sketchy to not use train and test splits but I'm not an expert on the topic.

hollow sentinel Nov 13, 2020, 6:22 PM

#

yeah it's sus
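
For comparison, a sketch of what holding out a test set looks like. To stay dependency-free this is a hand-rolled stand-in for scikit-learn's `train_test_split`, on synthetic data (not the Advertising dataset); the 75/25 ratio and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=200)

# shuffle row indices, then hold out the last 25% as a test set
order = rng.permutation(len(X))
n_train = int(0.75 * len(X))
train, test = order[:n_train], order[n_train:]

# fit on the training rows only (column of ones = intercept)
A_train = np.column_stack([np.ones(n_train), X[train]])
w, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)

# score on rows the model has never seen
A_test = np.column_stack([np.ones(len(test)), X[test]])
residual = y[test] - A_test @ w
ss_res = np.sum(residual ** 2)
ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
r2_test = 1.0 - ss_res / ss_tot
print(round(r2_test, 3))
```

Scoring on the held-out rows gives an estimate of how the model does on unseen data, which an R-squared computed on the training set cannot.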

#

I just found a PDF of the book online

teal vine Nov 13, 2020, 6:23 PM

#

I used "An Introduction to Statistical Learning" by Gareth James et al. when I was doing some ML stuff

#

It's focused on R but the concepts and explanations are pretty solid

hollow sentinel Nov 13, 2020, 6:24 PM

#

oh yeah I have that too

teal vine Nov 13, 2020, 6:24 PM

#

I found most python implementations are just a google search away sooo I just focus on concepts

hollow sentinel Nov 13, 2020, 6:24 PM

#

it's just that these books put me to sleep

#

there's also Data Science from Scratch from O'Reilly

#

O'Reilly has another book about ML

#

hands on machine learning

teal vine Nov 13, 2020, 6:52 PM

#

I haven't really looked that deeply into it

#

Just did some exercises that involved classifying defaulters, its interesting enough.

analog geode Nov 13, 2020, 7:15 PM

#

I guess this is a better place to post the question

serene scaffold Nov 13, 2020, 7:33 PM

#

I have to classify some DNA/RNA/etc sequences, and I have to come up with the features myself. I figure if I can find some of the longest substrings that appear frequently among all the sequences, the presence of that substring might be a good feature.

#

Not sure how to do "top k longest most common substrings"

heady hatch Nov 13, 2020, 7:35 PM

#

@serene scaffold As in something like this? https://www.geeksforgeeks.org/find-the-longest-substring-with-k-unique-characters-in-a-given-string/

serene scaffold Nov 13, 2020, 7:36 PM

#

@heady hatch not quite, because there aren't going to be any reasonably long substrings shared between all sequences

#

I'm thinking "what's the most common n-character substring in the set of strings" even if it's not a substring of all the strings.

heady hatch Nov 13, 2020, 8:02 PM

#

@serene scaffold I was thinking about your problem, and would BoW (bag of words) work here?

From my understanding of DNA (and maybe RNA), they’re in sequences of 4, right?
Naive feature would be the count of different sequences of 4.

#

Otherwise, you can look for the longest substring shared, and keep decreasing it to create more features.
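
One possible reading of "most common n-character substrings", as a standard-library sketch. The function name `top_kmers` and the toy sequences are made up; each substring is counted once per sequence, so the count says how many sequences contain it.

```python
from collections import Counter


def top_kmers(sequences, n, k):
    """Return the k most common length-n substrings, counted once per sequence."""
    counts = Counter()
    for seq in sequences:
        # use a set so a substring repeated inside one sequence counts once
        kmers = {seq[i:i + n] for i in range(len(seq) - n + 1)}
        counts.update(kmers)
    return counts.most_common(k)


sequences = ["ACGTACGT", "TTACGTAA", "GGACGTCC", "ACGGGTTT"]   # toy data
print(top_kmers(sequences, n=4, k=2))
```

Running this for several values of `n` and keeping the top few substrings at each length would give a set of candidate presence/absence features.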

median dove Nov 13, 2020, 8:04 PM

#

Hey. I’m new to Deep Learning and I was following Sentdex’s video on the handwritten digit recognition with Keras. Great video. Just a question, how did he know how many layers he should use for the neural network? And how many neurons should he put in each layer?

#

Please ping me if you can help 👍

#

Not only in that example but in any - how do I know how many layers I should use in a neural network and how many neurons I should use in each one?

heady hatch Nov 13, 2020, 8:27 PM

#

@median dove

I hope others will provide input as well.

In terms of how many layers and units per model, that's where research comes in. Often built on other research.

I'm not familiar with shallow NNs, but for deep NNs you usually start from an architecture that was found to work experimentally and scale from there.

molten hamlet Nov 13, 2020, 8:29 PM

#

https://imgur.com/AuJNEgS

Imgur

heady hatch Nov 13, 2020, 8:31 PM

#

The first and last layers are determined by your input and output. The hidden layers are meant to capture the relationship between the input and the output.

In computer vision, depending on the problem, the hidden layers do so by extracting features which can be used to calculate the logits.

Or the extracted features can be decoded into something else.

#

The number of units and layers can also be set as a hyperparameter. To answer your question of how many you should use, experiment and see what works for the problem you're trying to solve.
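To make the "input and output are fixed, hidden layers are a hyperparameter" point concrete, here is a rough NumPy sketch. It is not the Keras model from the video: `build_mlp` and `forward` are hypothetical helpers, biases are left out, and the hidden sizes are arbitrary picks:

```python
import numpy as np


def build_mlp(n_inputs, hidden_sizes, n_outputs, seed=0):
    """Create random weight matrices for a fully connected network."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    # One (fan_in, fan_out) weight matrix per pair of consecutive layers.
    return [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes, sizes[1:])]


def forward(weights, x):
    """Run a forward pass: ReLU on hidden layers, raw logits at the output."""
    for w in weights[:-1]:
        x = np.maximum(x @ w, 0.0)
    return x @ weights[-1]


# 784 inputs (28x28 pixels) and 10 outputs (digits) are fixed by the data;
# the hidden sizes are the hyperparameters to experiment with.
for hidden_sizes in [(128,), (128, 128), (256, 64)]:
    weights = build_mlp(784, hidden_sizes, 10)
    logits = forward(weights, np.zeros((1, 784)))
    print(hidden_sizes, [w.shape for w in weights], logits.shape)
```

Only the middle of the `sizes` list changes between experiments; the 784 and the 10 come from the problem itself.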

median dove Nov 13, 2020, 8:46 PM

#

Awesome, thank you very much @heady hatch

austere swift Nov 13, 2020, 11:52 PM

#

@median dove I'm pretty late to this answer, but hyperparameter tuning is arguably the hardest part about making neural networks, since there is no way to "calculate" exactly how many neurons to put in each layer, what kind of layers to put, how many layers to put, etc. It's all experimentation. There are automated hyperparameter tuning methods that are arising, such as HyperBand, Bayesian Optimization, Grid Search (the most basic of them), which basically test out different parameters and see how they do, but even then it takes a while to find the optimal set of parameters. Different people will tell you different ways of how to go about it, like scaling up from a simple model, scaling down from a large model, using "blocks" of layers (basically the same set of layers that you repeat a few times), but like nine said it's usually best to start out with something that was experimentally proven to work on some problem, and then to adapt it to yours and make changes accordingly.

median dove Nov 14, 2020, 12:30 AM

#

Thank you @austere swift 😊

haughty hinge Nov 14, 2020, 12:46 AM

#

Hey! Anyone free to help?

#

i just wanted to ask how to make a scatter plot that has budget, revenue and profits columns from the dataset and shows how they change over time. i.e. years on the bottom axis

#

assuming a cleaned dataset i have

velvet thorn Nov 14, 2020, 12:56 AM

#

i just wanted to ask how to make a scatter plot that has budget, revenue and profits columns from the dataset and shows how they change over time. i.e. years on the bottom axis
@haughty hinge are you sure you wouldn't like a barplot or a line plot

haughty hinge Nov 14, 2020, 12:57 AM

#

was thinking a scatterplot with each of those mentioned in different colours

#

but a line plot makes sense

hollow gull Nov 14, 2020, 1:28 AM

#

@haughty hinge that is something I could help you with if you are still in need of help.

haughty hinge Nov 14, 2020, 1:29 AM

#

@hollow gull Thanks 🙂 im trying it myself after reading stuff online ill message on the channel if anything

hollow gull Nov 14, 2020, 1:31 AM

#

okay, I will start putting an answer together and you can ignore it if you want 🙂

haughty hinge Nov 14, 2020, 1:33 AM

#

legend!!

hollow gull Nov 14, 2020, 1:33 AM

#

make sure your index is the date column for this to work: df.index = df['date']

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# draw each column as its own line on the shared axes
for column_name in ['budget', 'revenue', 'profits']:
    df.plot(y=column_name, label=column_name, ax=ax)

ax.legend()
ax.grid()
plt.show()

#

I made a lot of assumptions about your dataframe, but maybe that is enough to push you forward a bit.

#

You don't have to make the date column your index, but if you do pandas does a pretty good job of making your life easier from my point of view.

mortal pendant Nov 14, 2020, 2:55 AM

#

Hey! I have a dictionary where the key is a string which I would like an AI to predict based on another string. The value is a list of strings as 'examples' of what should point my AI to the key as a result. I want to be able to provide an AI a string similar to one that would be found as a value in the dictionary and have it return the probabilities of it being a part of each key. I'm very new to machine learning so I'm unsure which AI method (RNN, GAN, Q-learning) to use for this, but do wish to have a good go by myself. Could someone please point me in the right direction for this project? I assume not RNN since it's not a step by step process- it's just a single calculation- and I assume not a GAN since I am not wanting to generate values, so I assume Q-learning (the one that is most confusing to me, sadly) which I feel would make sense since this is something where it is given a question and it needs to determine the correct answer and be rewarded, which is mostly all I know about Q-learning.

haughty hinge Nov 14, 2020, 3:29 AM

#

@hollow gull thanks alot !!

heady hatch Nov 14, 2020, 4:37 AM

#

@mortal pendant Could you give us an example of what the keys and values are like? How long are the strings?

agile wing Nov 14, 2020, 5:11 AM

#

bow chika bow bow

cerulean spindle Nov 14, 2020, 5:13 AM

#

Can anyone tell me the relationship between all ML algorithms (SVM, Neural Net...)? I vaguely remember that all of them sort of acted like a neural net, but I'm not quite sure. Can anyone confirm this?

agile wing Nov 14, 2020, 5:15 AM

#

neuralnet is like... a huge nest of log regs imo

mental timber Nov 14, 2020, 5:15 AM

#

Hi, I got a question. I'm using a dataset in my python code which shows symptoms of diabetes, they're in true and false meaning some people have them and some do not. I am making a yes or no type survey and wanted to know how I can compare the answers to the dataset so it gives me a percentage of similarity. For example there are 14 questions, if more than half is given the answer yes, then I want it to compare the "yes" or "true" to the dataset and give me a percentage of similarity. Sorry if this is a very specific question. I have been researching this and currently been stuck on it for a while now

cerulean spindle Nov 14, 2020, 5:18 AM

#

Usually for those types of datasets, you'd actually want some type of data (ex: measuring blood levels) instead of yes or no questions. If so, I think you'd have to reformat your data so that it can understand these yes or no questions (the datapoints only values can be 0 or 1)

#

I think you should reformat your data though if you want to stick to the survey

mental timber Nov 14, 2020, 5:18 AM

#

Yes I have reformatted them in my python code to identify "true" and "false"

cerulean spindle Nov 14, 2020, 5:23 AM

#

The way you're structuring it, "if more than half is given the answer yes, then I want it to compare the 'yes' or 'true' to the dataset and give me a percentage of similarity", I would approach this by using this survey as a sample and feeding the sample into an ML model. If it's an sklearn model, it should support predict_proba() and/or decision_function(), I don't remember entirely how they differ, I just know that predict_proba() gives you a probability for how likely it is for a sample to belong to each class. For example, an example output is [0.25, 0.50, 0.25] which means that there is a 25% chance it is in class1, 50% in class2...

mental timber Nov 14, 2020, 5:24 AM

#

Ah ok, that's very helpful thanks

cerulean spindle Nov 14, 2020, 5:26 AM

#

yeah sure. One correction on the difference: predict_proba() gives you the class probabilities (values between 0 and 1, so multiply by 100 for a %), while decision_function() gives raw scores that aren't probabilities, so I'd go with predict_proba() here
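A small sketch of how the two methods differ, assuming scikit-learn is installed. The survey answers are made up, and LogisticRegression is used only because it has both methods (RandomForestClassifier has predict_proba() but no decision_function()):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up yes/no survey answers (1 = yes, 0 = no), 4 questions per person.
X = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = has the condition, 0 = does not

model = LogisticRegression().fit(X, y)

new_answers = np.array([[1, 1, 0, 0]])
proba = model.predict_proba(new_answers)        # one row per sample, sums to 1
scores = model.decision_function(new_answers)   # raw score, not a probability

print("class order:", model.classes_)
print("probabilities (%):", np.round(proba * 100, 1))
print("raw decision score:", scores)
```

The columns of `proba` follow `model.classes_`, so the second column here is the estimated probability of class 1.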

mental timber Nov 14, 2020, 5:26 AM

#

Gotcha ty

crude marsh Nov 14, 2020, 6:02 AM

#

Guys, anyone uses ML to train their data-science algorithm?

#

Guys? Is anyone there

fierce shadow Nov 14, 2020, 6:10 AM

#

what kind of algorithm?

#

you can go for neural networks as they are universal function approximators

heady hatch Nov 14, 2020, 6:17 AM

#

@crude marsh

What do you mean by ML to train data science algorithms?

What's your definition of ML and what's your definition of data science along with algorithms?

fierce shadow Nov 14, 2020, 6:19 AM

#

he might mean K-means clustering, binary tree classification stuff..

#

although I am not into those

heady hatch Nov 14, 2020, 6:20 AM

#

Ahh as in those are the algorithms?

mental timber Nov 14, 2020, 6:35 AM

#

My Random Forest Algorithm got a perfect prediction out of 30 tries, is this wrong?

heady hatch Nov 14, 2020, 6:43 AM

#

Depends. What did it get a perfect prediction on?

mental timber Nov 14, 2020, 6:44 AM

#

it perfectly predicted 520 people with diabetes

#

of which it had 16 different symptoms

heady hatch Nov 14, 2020, 6:48 AM

#

Right so did you just train it on the data and predict on what it trained on?

mental timber Nov 14, 2020, 6:49 AM

#

70% training 30% Test data. It also used bootstrapping

heady hatch Nov 14, 2020, 6:50 AM

#

Sounds like you have a great model.

#

When you say 30 tries, do you mean you trained it on the training set and predicted on the test 30 times?

#

Or did you mean something else?

mental timber Nov 14, 2020, 6:52 AM

#

Ye 30 tries

heady hatch Nov 14, 2020, 6:53 AM

#

Try cross validation.

#

But since you're using rf, you can also check oob errors.

#

Could you clarify what you mean by you used bootstrapping?

#

So split your data, train test split.

do cv on your training data. See how it goes, then train it on your training data and test it on your testing data if your cv looks okay.

mental timber Nov 14, 2020, 6:58 AM

#

bootstrap : bool, default=True

Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

heady hatch Nov 14, 2020, 6:59 AM

#

Ahh okay.

#

But yea do your cross validation.

mental timber Nov 14, 2020, 6:59 AM

#

Thats how i learned it atleast xd

heady hatch Nov 14, 2020, 6:59 AM

#

Grab your dataset, split your data into training and testing.

#

Then do your cross validation on your training data.

mental timber Nov 14, 2020, 7:00 AM

#

I'll give it a shot. Thanks man.

heady hatch Nov 14, 2020, 7:00 AM

#

Yup. See if it still gives it the same result as your previous findings.

mental timber Nov 14, 2020, 7:00 AM

#

gotcha

#

hmm, sorry to ask, how would I do that? Only recently began this project and this whole thing is quite new to me. xD researching confuses me a lot since theres like 10 different examples and idk what they all do

heady hatch Nov 14, 2020, 7:04 AM

#

Oh yea no worries.

#

So

#

from sklearn.model_selection import cross_val_score

cross_val_score(model, x, y)

#

and if you want to read more on it.
https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html

#

Because there are more arguments.

mental timber Nov 14, 2020, 7:05 AM

#

Ah ok. Ill read through it ty

heady hatch Nov 14, 2020, 7:05 AM

#

Your x and y should be your training set.

#

So you can leave your testing set out as hold out.
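Putting the steps above together, a rough sketch assuming scikit-learn. The data here is synthetic (520 rows and 16 features just to mirror the sizes mentioned), so the scores say nothing about the real dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the real data: 520 rows, 16 features.
X, y = make_classification(n_samples=520, n_features=16, random_state=0)

# Hold the test set out first; cross-validate only on the training part.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)

cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("CV accuracy per fold:", cv_scores)

# If the CV scores look reasonable, fit on all training data and check the
# out-of-bag estimate and the held-out test set.
model.fit(X_train, y_train)
print("OOB accuracy:", model.oob_score_)
print("Test accuracy:", model.score(X_test, y_test))
```

If the per-fold scores are also all perfect, it is worth checking for duplicated rows or a feature that leaks the label before trusting the result.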

mental timber Nov 14, 2020, 7:06 AM

#

Thanks!

indigo steppe Nov 14, 2020, 7:14 AM

#

Hi guys,i have never tried machine learning and whenever i look into it it seems very very complicated.I guess my foundations of python are not enough yet but the jump towards ml still seems overwhelming and i guess i will need 1-2 years with my current tempo.But today i found out about the libra library that creates neural networks and other stuff in 1-2 lines of code.I am pretty sure that pros are laughing at the abilities of libra but would you consider it to be a good starting point for someone who struggles with python and want to learn a bit about ml,especially for trading (i am aware that for trading it needs more but for experimenting purposes)?

heady hatch Nov 14, 2020, 7:22 AM

#

@indigo steppe

Depending on what your learning style is.

Being honest with you, Libra seems like they only cover the basic cases which can be done via automl as well.

If you want some direction moving forward, learning your Python foundation is useful whether you go into ds/ml or not.

You don't really need to master python to play with data science and machine learning, but you should be comfortable at least.

Libra might be able to get you started and allow you to play with small ml projects, but it doesn't seem to help you understand what the model is doing which I think might be a core part to working with them.

viscid crescent Nov 14, 2020, 7:22 AM

#

Hi! I'm new to discord. How does one use this chat room to start a discussion. How does one track replies, threads...it looks confusing. Thank you for helping me understand!

indigo steppe Nov 14, 2020, 7:25 AM

#

@heady hatch thx for your honest answer

viscid crescent Nov 14, 2020, 7:26 AM

#

I'm attempting to create a notebook for exploratory data analysis. I have an outline for the project that I'd like to bounce off someone experienced. I hope this is the right way to ask for help!

indigo steppe Nov 14, 2020, 7:37 AM

#

@heady hatch sorry for tagging you once again but i just found out that automl isn't a library but a subcategory of ml...you gave your opinion about libra and i respect that.would you still advise a beginner to use some automl libraries or would you advise to jump directly into the non automated stuff once i am "ready"?

heady hatch Nov 14, 2020, 7:38 AM

#

When you say subcategory of ml, are you referring to something like this?

#

https://cloud.google.com/automl

Google Cloud

Cloud AutoML - Custom Machine Learning Models | Google Cloud

Cloud AutoML helps you easily build high quality custom machine learning models with limited machine learning expertise needed.

#

I mean not just the one that google made, but pretty much this whole section where ml is automated.

#

btw please don't worry about offending me in any way, my advice isn't gold or absolute truth. hahaha

indigo steppe Nov 14, 2020, 7:47 AM

#

no need to offend you,i just learned about automl which i consider usefull information.even if you were a "douchebag",i learned something from that "douchebag" which is progress in my book.so thank you for that "douchebag" (pls i am just joking,you are helping me so thx).never heard about this cloudml stuff,i was reading about auto sklearn,TPOT and hyperopt and found out there are a few libraries that do automl so i am not sure if i should go play with them or if it is a waste of time for learning about ml in trading.i guess they won't help me make money but at least i could learn MAYBE something

#

oh,autokeras is one of them too

heady hatch Nov 14, 2020, 7:52 AM

#

Again depending on your learning style. But ml is just as wide as it is deep.

Can you use ml without any understanding of how things work just via calling libraries? Sure. In fact automl was made for things like this. Where you can use ml solutions without knowing how anything works other than the libraries or the api themselves.

Depending on your future job responsibility, it's hard to implement a model if you cannot train, test, debug or monitor it.

And automl will probably help ease with that. I don't know the degree of customization they allow you. And for most problems, it seems like a cookie cutter solution would do.

#

If you want to jump into deep learning as quickly as possible, learn your python first and you can try fastai's deep learning course.

it's very well made and they teach from a top down perspective.

#

Meaning that if you just want to stop at calling libraries, sure.

#

And if you want to go farther, then they offer that too.

#

Nonetheless, Python is still needed.

#

To answer your question, whether these ml solutions will help you learn actual ml, I don't know since I've never used them.

You can gauge it for yourself. Use the library and look up papers, conversation, and notebooks and see if you can understand what others are talking about after using them.

#

I was taught under the discipline of implementing things myself and take things apart.

And this is how I learn. People are different, maybe these solutions will help you like training wheels on a bike.

indigo steppe Nov 14, 2020, 8:05 AM

#

have you gone through the course of fastai?i just read a bit about it and it sounds interesting.the thing that i am interested in is,is it only a course/tutorial or does it have assignments and "homework" too..?i went through many,many tutorials and tbh i forgot 80-90% of it and with time it will be even more i guess.so i just recently started with the foundations again so i can implement the tutorials knowledge into assignments and exercises.so does fastai offer some kind of exercises?

heady hatch Nov 14, 2020, 8:23 AM

#

I didn’t take the in person class, but just watched the videos.

I don’t think there are exercises or homework for the online videos but they did release their notebooks for people to study from.

indigo steppe Nov 14, 2020, 8:33 AM

#

thank you very much,i appreciate your time

mossy dragon Nov 14, 2020, 9:22 AM

#

I'm looking for a dataset for classification with p>=40 and n>max(500,p*5), if anyone happens to know of one please dm me!

lapis sequoia Nov 14, 2020, 10:45 AM

#

yo man

#

wtf

#

ive never seen this error in my life

#

AssertionError: in user code:

    <ipython-input-10-f20008174a6b>:13 train_step  *
        disc_real_output = discriminator(target_images, training=True)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:985 __call__  **
        outputs = call_fn(inputs, *args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py:386 call
        inputs, training=training, mask=mask)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py:517 _run_internal_graph
        assert x_id in tensor_dict, 'Could not compute output ' + str(x)

    AssertionError: Could not compute output Tensor("activation_27/Sigmoid:0", shape=(None, 1), dtype=float32)

#

@cobalt jetty i tried the keras preprocessing thing seems to work fine in terms of using an input so thanks heaps for that suggestion man

#

but i cant seem to figure out this error lol

cobalt jetty Nov 14, 2020, 11:27 AM

#

sounds like you have an issue with your 27th activation layer

#

maybe a shape issue

lapis sequoia Nov 14, 2020, 11:38 AM

#

im not sure

#

maybe its because im trying to downsample from 32 to 28

#

https://pastebin.com/Wtim2kmh

Pastebin

generator - Pastebin.com

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

#

does strides work with non integers

lapis sequoia Nov 14, 2020, 12:40 PM

#

would anyone know how i could downsample the inputs to 28x28 using conv2d? or by another means?

cobalt jetty Nov 14, 2020, 12:58 PM

#

strides=(1.1428,1.1428)

#

what are you trying to achieve here?

lapis sequoia Nov 14, 2020, 1:02 PM

#

downsampling from 32,32,1, to 28,28,1

#

but i realised it doesnt work just now lol
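For reference, Keras strides have to be integers, so a fractional stride like 1.1428 is not an option. A quick way to see which integer settings do work is the usual output-size formula for an unpadded ("valid") convolution. `conv_output_size` below is a hypothetical helper, not a Keras function:

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Output length along one axis of a convolution (no dilation)."""
    return (n + 2 * padding - kernel) // stride + 1


# 32 -> 28 works with a 5x5 kernel, stride 1 and no padding ("valid").
print(conv_output_size(32, kernel=5))            # 28

# Strides are whole numbers, so list which integer (kernel, stride) pairs
# take a 256-pixel axis to exactly 28 without padding.
options = [
    (kernel, stride)
    for stride in range(1, 17)
    for kernel in range(1, 33)
    if conv_output_size(256, kernel, stride) == 28
]
print(options)
```

So a single Conv2D with a 5x5 kernel and stride 1 would take 32x32 to 28x28. For 256x256, resizing the image before it reaches the model is probably simpler than forcing one large-stride convolution.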

#

ill send my discriminator cos thats where the issue is happening

#

one sec

#

https://pastebin.com/HFXwbfNj

Pastebin

discriminator - Pastebin.com

#

x = Dense(1,activation='sigmoid')(x) seems 2 be the issue

mortal pendant Nov 14, 2020, 1:08 PM

#

@mortal pendant Could you give us an example of what the keys and values are like? How long are the strings?
@heady hatch I’m on mobile so can’t get an example but hope this helps: there are 5 6-12 character keys and the values all contain atleast 300 strings, where these strings’ lengths vary a lot- anywhere from 1 character to 2000 characters. Most of the time, they’re around 300 characters though. The dataset is flexible for the vocab, so if the AI would train better with just alphanumeric characters then it would be fine to just filter it down to that, but it can also contain a wide range of symbols including emojis if that would help. I can also filter out strings that are too long or too short if that would help, though this isn’t as flexible and would greatly decrease the amount of data

#

If this helps anyone else, here’s the context #data-science-and-ml message

lapis sequoia Nov 14, 2020, 1:20 PM

#

Would you guys help me running opencv functions in GPU

mortal pendant Nov 14, 2020, 1:37 PM

#

btw for my issue it’s also worth noting I’m using Google Colab if that makes a difference

lapis sequoia Nov 14, 2020, 1:47 PM

#

ok i fixed my activation problem the issue was me concatenating my inputs but i dont think i actually need to do that

#

but im having compatibility issues getting my input and the mnist dataset to work, 256x256 images wont work

#

@cobalt jetty sorry to bother you again, do you think i should change my images to like 224x224 and then progressively downsample to 28x28

#

using keras preprocessing

cobalt jetty Nov 14, 2020, 1:51 PM

#

your issue, I think is that you're using a sigmoid activation function, which is used for binary classification. It won't help to get a matrix output imo.

#

Look into softmax activation

lapis sequoia Nov 14, 2020, 1:51 PM

#

oh nah i figured that out it should be all good

#

im getting this issue because im trying to downsample a 256x256 image and get a 28x28 but i cant do that mathematically

#

a new issue i mean

cobalt jetty Nov 14, 2020, 1:52 PM

#

downsampling is just a tool, you should instead look into how your shapes change across the learning process, maybe add a 784 dense layer at the end where each neuron is a grayscale value output

lapis sequoia Nov 14, 2020, 1:52 PM

#

.WARNING:tensorflow:Model was constructed with shape (None, 28, 28, 1) for input Tensor("input_2:0", shape=(None, 28, 28, 1), dtype=float32), but it was called on an input with incompatible shape (1, 32, 32, 1).

cobalt jetty Nov 14, 2020, 1:52 PM

#

so you can reconstruct your 28x28 picture.

#

yeah, you're feeding a wrongly shaped input into one of your layers

lapis sequoia Nov 14, 2020, 1:53 PM

#

yeah thats what i mean

#

so in my generator model do you think i should add a dense layer that then connects to another layer to output a 28x28 image

cobalt jetty Nov 14, 2020, 1:56 PM

#

You could flatten your output labels (the MNIST image matrices), e.g. with .reshape(-1), so each one is represented as a 1d vector. Each point would represent a target for your neural network, so you don't have to worry about the end shape of the model beyond that it's a 784-element vector.

#

afterward you can just create a function that converts that 784-element vector into a 28x28 matrix.

#

iirc, a grayscale image is a single channel matrix compared to a RGB, so its shape for MNIST is technically 1x28x28, while if the picture was RGB, it would be 3x28x28 (due to the three channels R, G and B).
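A quick NumPy sketch of the flatten-then-restore idea, assuming the channels-last layout (28, 28, 1) that the Keras warning in this thread reports. Note that squeeze() on its own only drops length-1 axes, so reshape is what actually flattens. `to_image` is a hypothetical helper name:

```python
import numpy as np

# Stand-in for a batch of 3 MNIST-style images in channels-last layout.
images = np.arange(3 * 28 * 28).reshape(3, 28, 28, 1)

# squeeze() only drops axes of length 1; it does not flatten.
print(images.squeeze().shape)        # (3, 28, 28)

# reshape flattens each image into a 784-element target vector.
targets = images.reshape(len(images), -1)
print(targets.shape)                 # (3, 784)


def to_image(vector):
    """Turn a 784-element vector back into a 28x28 matrix."""
    return np.asarray(vector).reshape(28, 28)


restored = to_image(targets[0])
print(restored.shape)                # (28, 28)
print(np.array_equal(restored, images[0, :, :, 0]))  # True
```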

lapis sequoia Nov 14, 2020, 1:59 PM

#

ye thats right

cobalt jetty Nov 14, 2020, 2:00 PM

#

so, flatten your output targets to 1x784 and create an end layer with 784 neurons that should each output a value between 0 and 255.

#

The difficulty of your model, imo, is that images have some margin of error when you think about their interpretability. It's not because one image of a "3" is slightly lighter that it will look like a 4. So thinking about how you will measure the quality of your model is something to think about too. Which is an interesting problem. Your model matching your input to the output with the exact hue or saturation isn't necessary for it to be valid.

cerulean spindle Nov 14, 2020, 2:03 PM

#

You should also look into a dimensionality reduction technique like TruncatedSVD since MNIST data is 80% zeros. TruncatedSVD can reduce those mostly-zero features to far fewer dimensions so your model is faster

lapis sequoia Nov 14, 2020, 2:03 PM

#

well yeah thats fine because its all grey values anyway

#

true that makes sense

cerulean spindle Nov 14, 2020, 2:10 PM

#

If you train your model on data that’s (60000, 784) it’ll take three hours to train. I shrank the data to (6000, 149) and got the same results (finished in < 1 min). If time is a factor for you of course
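A rough illustration of what that kind of reduction does, written with plain NumPy rather than sklearn's TruncatedSVD (same idea, not the same implementation). The data is random and the component count of 50 is an arbitrary pick, so the numbers are not comparable to the MNIST figures above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up stand-in for flattened images: 200 samples x 784 features,
# with roughly 80% of the entries set to zero.
X = rng.random((200, 784)) * (rng.random((200, 784)) > 0.8)

n_components = 50  # hypothetical choice; tune it for your data

# Thin SVD: X = U @ diag(S) @ Vt, singular values sorted largest first.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Keep the top components and project the data onto them.
X_reduced = X @ Vt[:n_components].T
print(X.shape, "->", X_reduced.shape)

# Share of the squared singular values retained by the kept components.
kept = (S[:n_components] ** 2).sum() / (S ** 2).sum()
print(f"kept {kept:.1%} of the total energy")
```

On real image data the retained share is usually far higher than on random noise like this, which is why the reduced features can still train a decent model.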

cobalt jetty Nov 14, 2020, 2:16 PM

#

MNIST data is 80% zeros
technically the MNIST data is made of roughly an even number of objects of each label (see picture). It's better to say that each object's features are mostly zeros, imo.

(6000, 149)
How long did it take to run the TruncatedSVD on the data?

📎 unknown.png

vapid sorrel Nov 14, 2020, 2:35 PM

#

Hi, I have some questions about Google Colab. Is it easy to run a Neural Network on Colab with a TPU?

cobalt jetty Nov 14, 2020, 2:37 PM

#

I've not used Colab often, but you have a TPU option for free. It doesn't mean it will be a lot of TPU of course.

#

But perf should be improved somewhat compared to a 'normal' CPU instance.

vapid sorrel Nov 14, 2020, 2:44 PM

#

Do you have a link to understand how to implement the model to train with TPU?

cobalt jetty Nov 14, 2020, 2:49 PM

#

tbh, if you're using Gcolab it should be seamless. It's basically a dropdown select in the options.

#

otherwise, look up the blogs Towards Data Science

#

there must be one looking over the concept.

vapid sorrel Nov 14, 2020, 2:50 PM

#

I'll check thanks

#

I don't know if this is the right channel, but do you know if there is a server about data science for trading in python, algorithmic trading, etc?

cobalt jetty Nov 14, 2020, 3:02 PM

#

I don't, sorry. You might want to hit up reddit and find related servers on specialized subreddit. Otherwise, look into data science applied to time series. That's what you want to look at.

hollow sentinel Nov 14, 2020, 4:07 PM

#

hey guys

#

has anyone done this course on edX: Machine Learning with Python: from Linear Models to Deep Learning

#

it's an MIT course that uses python for machine learning

hollow sentinel Nov 14, 2020, 4:28 PM

#

sike i saw the reviews for it it's pretty bad

mortal pendant Nov 14, 2020, 4:48 PM

#

For my question, I've now finished writing some code that tries to just provide each key a score based on simply how many times each word used in the dataset occurs in each key in the dict, but it's only 30.0% accurate which is just 1.5x better than if it were to just choose completely at random since there are only 5 keys atm. This should also give more clarity about the format of the dataset https://colab.research.google.com/drive/1EuMSz-Dcgulphjs8bRv1XO_lr7M5Doko?usp=sharing
So, any ideas how I could get started improving this accuracy using AI? Thanks so much in advance!

hollow gull Nov 14, 2020, 6:06 PM

#

@mellow saffron I think you are going to have to go into more detail about what the problem is for people to be able to help.

#

Maybe this is similar to a small part of that problem, but I don't think it uses opencv.
https://aws.amazon.com/deepracer/

Amazon Web Services, Inc.

AWS DeepRacer - the fastest way to get rolling with machine learning

AWS DeepRacer is the fastest way to get rolling with machine learning, literally. Get hands-on with a fully autonomous 1/18th scale race car driven by reinforcement learning, 3D racing simulator, and global racing league.

#

@mortal pendant Maybe look into TFIDF
https://en.wikipedia.org/wiki/Tf–idf

Tf–idf

#

@re

any idea which one would be best suitable for this?
@mellow saffron Maybe someone else has better ideas than I do, but I am imagining that as a multi-step problem with a couple of challenging parts. I don't work much with images, so I am a little out of my depth. Building a CNN that is able to classify different plant types based on images might be a useful early step. Image segmentation might make the later tasks easier, but I haven't seen implementations of image segmentation outside of screenshots on aws sagemaker. Then there will also be the problem of identifying where to move based on the image, that seems more in line with the AWS DeepRacer that I posted. I have seen blogs where people showed code to solve toy problems that didn't have problems like tall weeds.

#

@mortal pendant This feels as much like a natural language processing (NLP) problem as a classification problem to me, but maybe I am missing the point. I am imagining using NLP to build features that you could then put into a logistic regression or other classification algorithm to obtain higher accuracy. It sounds sort of like the goal is to use ML in the solution, not necessarily to achieve the best accuracy by any method.
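To give a feel for what TF-IDF features look like before they go into a classifier, here is a bare-bones pure-Python sketch. It uses the plain tf × log(N/df) form with whitespace tokenizing; libraries such as scikit-learn use smoothed variants, so the exact numbers will differ. The example strings are made up:

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {word: weight} dict per document using plain tf-idf."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return weights


# Made-up example strings.
docs = ["the cat sat", "the dog barked", "the cat purred loudly"]
for doc, w in zip(docs, tfidf(docs)):
    print(doc, "->", max(w, key=w.get))
```

Words that show up in every string (like "the" here) get a weight of zero, which is the main improvement over raw word counts for telling the keys apart.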

#

but i take it pulling images from that stream is what i'd do in the end anyways?
@mellow saffron It is the simple hacky solution instead of the more elegant possibly more complicated solution. I am not saying it is the better way, but it feels easier to implement.

hollow gull Nov 14, 2020, 6:38 PM

#

Short answer: No, I have no idea. Longer answer: That seems like it would be really dependent on what your end solution ends up looking like. Image processing is pretty computationally expensive though, which I would imagine might be challenging for a raspberry pi. I would probably search for projects on google and see if anyone has done something similar with some reasonable degree of success.

#

Has anyone had trouble with Virtualenv inside of pycharm? I used to use conda environments, but it felt like Virtualenv was preferred (I don't remember what caused me to think that.) I feel like since I switched I am having many more issues with my virtual environments failing for whatever reason (not finding a package that used to be there). Anyone have any experience similar to this / suggestions on one type of environment over another?

heady hatch Nov 14, 2020, 7:04 PM

#

@hollow gull
I can't say much on the pycharm part, but in terms of virtual environments, I used to use pipenv which managed the virtual environments and packages for me.

Then I went to python's native venv since I wanted to switch package manager.

hollow gull Nov 14, 2020, 7:09 PM

#

@heady hatch can you provide any context on why you wanted to switch or why you switched to python's native venv?

heady hatch Nov 14, 2020, 7:12 PM

#

@hollow gull
I think I just wanted to try different package managers.

It was either poetry or pip-tools.

poetry is like pipenv, comes with its own virtual environment manager. But pip-tools did not, which means I had to manage my own virtual env.

I ended up going with pip-tools because it's more robust plus easier to transition across projects.

#

Though to clarify, I don't use any IDE.

#

So I'm not quite sure how the interaction with IDE is over there.

#

with vscode, it's just setting the venv's python as the interpreter.

#

and all its packages and environment stuff will be there.

#

Being honest, people in my circle don't really use Virtualenv anymore. Just because there's a native solution or they use the package manager for it.

#

Often in the latter.

hollow gull Nov 14, 2020, 7:15 PM

#

Cool, thanks for the info.
Yeah, it should be that easy inside an IDE as well, I assume I just didn't do a good job of cleaning things up before transferring or there is some other historical issue that is confusing some part of the process.

heady hatch Nov 14, 2020, 7:16 PM

#

Ahh fair fair. Would package reinstall be suitable?

hollow gull Nov 14, 2020, 7:18 PM

#

Yeah, I tried that with no luck, then I deleted the virtual environment and started over with no luck, then I downgraded the package that was causing issues to an earlier version which seems to have 'fixed' it. Maybe I am assigning the blame to virtualenv and pycharm when it really was an issue with the library and I didn't remember upgrading to the most recent/broken? version.

#

It isn't like it is some unsupported package, it was numpy.

heady hatch Nov 14, 2020, 7:19 PM

#

Oh man.

hollow gull Nov 14, 2020, 7:19 PM

#

But hey, it is working again, so back to working on things that are interesting.

heady hatch Nov 14, 2020, 7:54 PM

#

Hey guys, side question that's kind of related to data science.

Let's say your team and you share a development server on the cloud. How do you guys manage your ssh keys?

or do you just have a server for each dev?

lapis sequoia Nov 14, 2020, 8:28 PM

#

I would like to get a bit of assistance regarding anaconda.

hollow gull Nov 14, 2020, 8:44 PM

#

@lapis sequoia what assistance do you need?

lapis sequoia Nov 14, 2020, 8:49 PM

#

@hollow gull I've been having trouble with virtual environments lately. This is my 3rd day without overcoming the issues, so I've decided to try anaconda, which I've been told is a lot friendlier with virtual environments. 1. When I create the environment in anaconda navigator and then launch visual studio, I don't get the README.md page. 2. If you look at the vscode terminal, the top right says bash. Shouldn't it say conda or something similar?

hollow gull Nov 14, 2020, 8:53 PM

#

I haven't used visual studio, and it seems like visual studio knowledge is an important part of answering your questions.

I would expect there to be some indicator in your terminal that you are in a virtual environment.

#

I am skimming this. Have you looked through it, and does it look potentially helpful? https://code.visualstudio.com/docs/python/environments

Using Python Environments in Visual Studio Code

Configuring Python Environments in Visual Studio Code

#

https://code.visualstudio.com/docs/python/environments#_manually-specify-an-interpreter

lapis sequoia Nov 14, 2020, 8:58 PM

#

for some reason the visual studio sites are not working.

#

they are loading now.

#

thanks

hollow gull Nov 14, 2020, 9:06 PM

#

"""However, launching VS Code from a shell in which a certain Python environment is activated does not automatically activate that environment in the default Integrated Terminal.""" seems like it might be an issue for you depending on how you are trying to get into the conda environment.

#

Later on they give examples of how to select the interpreter. Maybe you are still in the base environment and that is why it isn't indicating an environment.

stray ivy Nov 14, 2020, 10:24 PM

#

if you're in a bash terminal & activate a venv, you'll see the name of your venv's root folder in the prompt, which signifies that your venv is activated
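Besides the prompt, the interpreter itself can report whether it belongs to a venv. A minimal sketch using only the standard library; the helper name `in_virtualenv` is made up for illustration, and note that this detects `venv`/`virtualenv` environments, not conda ones (for conda, checking `sys.executable` or the `CONDA_DEFAULT_ENV` environment variable is one option):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("inside a venv:", in_virtualenv())
```

Running this from the VS Code integrated terminal shows which interpreter that terminal is really using, regardless of what the prompt says.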

late jackal Nov 14, 2020, 11:12 PM

#

anyone around here who is well versed in the Pandas package

velvet thorn Nov 14, 2020, 11:23 PM

#

anyone around here who is well versed in the Pandas package
@late jackal don’t gatekeep, just ask

late jackal Nov 14, 2020, 11:25 PM

#

not familiar with the term, but thank you. i am trying to convert a column in minutes to datetime, but i can't seem to find how to specify HH:MM:SS without including month or days

velvet thorn Nov 14, 2020, 11:25 PM

#

pd.to_datetime

late jackal Nov 14, 2020, 11:26 PM

#

i have that but it seems like it wants like MM DD HH:MM:SS

velvet thorn Nov 14, 2020, 11:29 PM

#

check the documentation

#

there’s a parameter that lets you change that

hollow gull Nov 14, 2020, 11:52 PM

#

@late jackal Are you sure that makes sense, can you have a datetime = 'date' + 'time' without a date? You could use today's date information to build a datetime with whatever time you want to specify.

#

Even a timestamp will always have a date as part of it, I think. Otherwise all of the benefits that I see in datetimes go away. What does subtracting hours and minutes mean without knowing what days they correspond to?

boreal summit Nov 15, 2020, 12:17 AM

#

Mehn, I just tried to install kite on my PC based on someone's recommendation here. It couldn't install on my PC for some reason.

#

📎 20201115_011732.jpg

#

So sad. ☹️

#

I use a HP EliteBook 8440p

heady hatch Nov 15, 2020, 12:28 AM

#

You'll just have to make do with vscode and such.

ruby wyvern Nov 15, 2020, 4:53 AM

#

It will be great if anyone can help me with this question

#

https://stackoverflow.com/questions/64840285/how-to-generate-a-list-of-tokens-that-are-most-likely-to-occupy-the-place-of-a-m

Stack Overflow

How to generate a list of tokens that are most likely to occupy the...

How to generate a list of tokens that are most likely to occupy the place of a missing token in a given sentence? (Without PyTorch)
For example,

sentence = 'Roger Federer is a Swiss

lapis sequoia Nov 15, 2020, 5:16 AM

#

Need some help with confusion matrices

#

pls lmk if anyone can help and ill send source code and dataset

austere swift Nov 15, 2020, 5:20 AM

#

Whats your question

#

!paste @lapis sequoia

arctic wedgeBOT Nov 15, 2020, 5:21 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

lapis sequoia Nov 15, 2020, 5:25 AM

#

```python
def draw_cm(actual, predicted):
    cm = confusion_matrix(actual, predicted)
    sns.heatmap(cm, annot=True, fmt='.2f', xticklabels=[0, 1], yticklabels=[0, 1])
    plt.ylabel("Predicted_Values")
    plt.xlabel("Actual_Values")
    plt.show()

draw_cm(y_predict, z['Observed Loan Status'])
```

#

Error I'm getting: Classification metrics can't handle a mix of binary and unknown targets

austere swift Nov 15, 2020, 5:26 AM

#

you need to convert your predictions into binary

#

with argmax

lapis sequoia Nov 15, 2020, 5:27 AM

#

hmm

#

@austere swift in my notebook above, i dont think i am supposed to

austere swift Nov 15, 2020, 5:29 AM

#

predicted is usually a float

lapis sequoia Nov 15, 2020, 5:29 AM

#

my predicted is binary

#

i messed up the order

#

the y_predict is in binary

austere swift Nov 15, 2020, 5:30 AM

#

and what about z['Observed Loan Status']

lapis sequoia Nov 15, 2020, 5:30 AM

#

but the z["Observed Loan Status"] is outputting: LogisticRegression(fit_intercept=False, random...

#

its a logistic regression

#

idk why its not also in binary

austere swift Nov 15, 2020, 5:31 AM

#

they both need to be binary for the confusion matrix to work lol

lapis sequoia Nov 15, 2020, 5:31 AM

#

so that means something's up with my logistic regression im assuming

#

@austere swift logreg = LogisticRegression(random_state=42,fit_intercept=False) logreg.fit(X_train, Y_train)

#

this is my logistic regression code but its working fine

#

it returns: LogisticRegression(fit_intercept=False, random_state=42)

#

so i dont know where im going wrong
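One possible reading of the error, assuming the column was filled with the return value of `logreg.fit(...)`: `fit` returns the estimator itself, so the column would hold model objects rather than 0/1 labels. Labels come from `predict`. A minimal sketch on made-up toy data (the arrays and variable names are illustrative, not taken from the notebook):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Toy data: one feature, binary target (made up for illustration).
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y_true = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression(random_state=42).fit(X, y_true)  # fit() returns the estimator
y_pred = model.predict(X)                                   # predict() returns 0/1 labels

# confusion_matrix expects (y_true, y_pred); rows are actual, columns are predicted.
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

Since rows are the actual classes, a heatmap of `cm` would normally label the y axis "Actual" and the x axis "Predicted".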

lapis sequoia Nov 15, 2020, 6:52 AM

#

fi = pd.DataFrame()
a = logreg.__dict__["coef_"]
#fi['Col'] = np.arange(0, 14)
fi['Coeff'] = a

fi.sort_values(by='Coeff',ascending=False)

fi

#

need some help with pandas dataframe functions

#

im getting an error that the data must be 1d

#

but i dont know how to proceed with that

#

any help would be appreciated
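A likely cause, assuming a binary `LogisticRegression`: `coef_` has shape `(1, n_features)`, which is 2-D, while a DataFrame column needs 1-D data. Flattening it with `ravel()` is one way around that. The sketch below uses a stand-in array instead of a fitted model, and the feature names are hypothetical:

```python
import numpy as np
import pandas as pd

# Stand-in for logreg.coef_ (shape (1, n_features) for a binary model).
coef = np.array([[0.8, -1.5, 0.3]])
feature_names = ["income", "debt", "age"]  # hypothetical column names

fi = pd.DataFrame({"Col": feature_names, "Coeff": coef.ravel()})  # ravel() -> 1-D

# sort_values returns a new frame, so the result has to be assigned.
fi = fi.sort_values(by="Coeff", ascending=False)
print(fi)
```

With a real model, `logreg.coef_.ravel()` would replace the stand-in array, and `logreg.coef_` is the usual way to reach the attribute rather than going through `__dict__`.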

lapis sequoia Nov 15, 2020, 9:22 AM

#

https://machinelearningmastery.com/resources-for-linear-algebra-in-machine-learning/?unapproved=574301&moderation-hash=086e01875a99cbd139c5eb74786193b2#comment-574301

Machine Learning Mastery

Jason Brownlee

Top Resources for Learning Linear Algebra for Machine Learning

How to Get Help with Linear Algebra for Machine Learning? Linear algebra is a field of mathematics and an important pillar of the field of machine learning. It can be a challenging topic for beginners, or for practitioners who have not looked at the topic in decades. In this p...

#

☝️What do you think of these books? Should I read them all?

lapis sequoia Nov 15, 2020, 12:26 PM

#

Yes You Should

#

```python
MSE = tf.keras.losses.MeanSquaredError()
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
alpha = tf.random_normal_initializer(mean=0.0, stddev=0.05, seed=None)
def discriminator_loss(disc_real_output, disc_fake_output):
    real_loss = cross_entropy(tf.ones_like(disc_real_output), disc_real_output)
    fake_loss = cross_entropy(tf.zeros_like(disc_fake_output), disc_fake_output)
    total_loss = real_loss + fake_loss
    return total_loss
def generator_loss(disc_fake_output, gen_output, target):
    total_loss = MSE(target, gen_output) + (alpha * cross_entropy(tf.ones_like(disc_fake_output), disc_fake_output))
    return total_loss
```
do yall see anything wrong with my loss functions? i think it's trying to convert datasets to tensors again but i dont see where im making the mistake of putting a dataset as an input?

#

cos im getting the same TypeError: Failed to convert object of type <class 'tensorflow.python.data.ops.dataset_ops.BatchDataset'> to Tensor. Contents: <BatchDataset shapes: (None, 28, 28, 1), types: tf.float32>. Consider casting elements to a supported type.

#

i defined everything there in the train step

```python
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
      gen_output = generator(image_batch, training=True)

      disc_real_output = discriminator(target_images, training=True)
      disc_fake_output = discriminator(gen_output, training=True)

      gen_total_loss, gen_gan_loss, gen_l1_loss = generator_loss(disc_fake_output, gen_output, target_dataset)
      disc_loss = discriminator_loss(disc_real_output, disc_fake_output)

    generator_gradients = gen_tape.gradient(gen_total_loss,
                                          generator.trainable_variables)
    discriminator_gradients = disc_tape.gradient(disc_loss,
                                               discriminator.trainable_variables)
```

#

and the error traceback says the problem is happening in the mean squared error function

#

but i cant figure out what goin on

brave crest Nov 15, 2020, 2:38 PM

#

How do I split my dataset into three parts instead of two? I'm currently using train_test_split.

heady hatch Nov 15, 2020, 2:43 PM

#

@brave crest use train_test_split again on the dataset you want.
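Calling `train_test_split` twice could look like the sketch below: the first call holds out a test set, the second splits the remainder into train and validation. The 60/20/20 proportions and the toy data are assumptions for illustration:

```python
from sklearn.model_selection import train_test_split

# Toy data: 100 samples with alternating labels (made up for illustration).
X = list(range(100))
y = [i % 2 for i in X]

# First split: hold out 20% of everything as the test set.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Second split: 25% of the remaining 80% becomes validation (20% overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```

The second `test_size` is relative to what is left after the first split, which is why 0.25 rather than 0.2 gives an even validation/test size here.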

brave crest Nov 15, 2020, 2:43 PM

#

Oh! Thanks will do!

heady hatch Nov 15, 2020, 2:50 PM

#

@lapis sequoia What's your gen_output and disc_output, both disc output?

lapis sequoia Nov 15, 2020, 2:51 PM

#

no

#

gen output is the output of the generator

#

and outputs an image

#

discriminator is a network that is binary classifier, saying either true or false

heady hatch Nov 15, 2020, 2:52 PM

#

I apologize; to clarify, what are they outputting?

#

Are they outputting tensors, datasets, etc.?

lapis sequoia Nov 15, 2020, 2:52 PM

#

gen should output a tensor that can be read as an image

#

disc is just binary yes or no to see if the image is fake or not

heady hatch Nov 15, 2020, 2:53 PM

#

Could you double check?

#

So we can trace down your problem.

lapis sequoia Nov 15, 2020, 2:53 PM

#

ill post my model

#

https://pastebin.com/PMAgN9RQ

Pastebin

gan full - Pastebin.com

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

heady hatch Nov 15, 2020, 2:54 PM

#

This is tf2, right?

lapis sequoia Nov 15, 2020, 2:54 PM

#

2.3

#

yea

#

generator is an image generator literally and discriminator is a binary classification network

#

its 2am for me so im gonna sleep

heady hatch Nov 15, 2020, 2:58 PM

#

okie dokes.

lapis sequoia Nov 15, 2020, 3:03 PM

#

if you find anything in the model that i havent seen that might be cooking the whole thing and messing it up, lmk. ping me/dm me, im fine with either. ive been looking at this trying to figure out whats wrong for hours on end now and i just cant pick it

#

appreciate any and all help : )

#

Yes You Should
@lapis sequoia

Okay, thank you. It will take a couple of months.

lapis sequoia Nov 15, 2020, 3:40 PM

#

i need help with clustering in python, im not sure which columns are suitable to perform clustering over in this dataset https://www.kaggle.com/aramacus/electricity-demand-in-victoria-australia

Daily Electricity Price and Demand Data

Daily price, demand and weather data in Australia's second largest state

heady hatch Nov 15, 2020, 3:52 PM

#

Why not all?

hollow gull Nov 15, 2020, 3:59 PM

#

@glad mulch you can use pd.merge, df.merge, or df.join. join uses the index by default; with merge you pass the column you want to join on (which would probably be easier in your case). In your case it looks like you want to join on the ticker column.

#

@lapis sequoia what is the type of logreg.__dict__["coef_"]?

lapis sequoia Nov 15, 2020, 4:00 PM

#

doesn't one column need to be the outcome? @heady hatch

heady hatch Nov 15, 2020, 4:00 PM

#

For clustering? What do you mean and what do you have in mind?

lapis sequoia Nov 15, 2020, 4:03 PM

#

i think i will use k-means clustering, doesnt one column have to be the true label

#

so i was thinking maybe sorting the demand into high, medium, low for the true label @heady hatch

heady hatch Nov 15, 2020, 4:04 PM

#

Clustering is usually an unsupervised task.

#

It produces labels for you.
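To illustrate the point that clustering needs no label column, here is a minimal k-means sketch on made-up 2-D points. The data, the choice of two clusters, and the variable names are assumptions for illustration; with a real dataset such as the electricity one, the numeric columns would usually be scaled first:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two visibly separate groups of points (made-up data, no label column).
X = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # the algorithm assigns a cluster id to every row

print(labels)
print(kmeans.cluster_centers_)
```

The cluster ids are arbitrary (which group gets 0 or 1 can vary), so they are groupings to interpret afterwards, not predictions of a known outcome.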

hollow gull Nov 15, 2020, 4:57 PM

#

I don't think I understand exactly what you mean yet, can you elaborate further? To clarify what I mean, it seems like you are telling me how you want to do something but not what you want to do. If you can go into what you are trying to do, it might help me understand what you are asking.

desert oar Nov 15, 2020, 4:59 PM

#

@glad mulch you want to iterate through unique values of the date index?

#

df.index.get_level_values('Date').unique() iirc

#

Or df.index.unique(level='Date') maybe
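Both spellings work on a MultiIndex. A small sketch, assuming a frame indexed by `Date` and `Ticker` (the level names, tickers, and values are made up for illustration):

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("2020-11-13", "AAA"), ("2020-11-13", "BBB"), ("2020-11-16", "AAA")],
    names=["Date", "Ticker"],
)
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=idx)

dates_a = df.index.get_level_values("Date").unique()
dates_b = df.index.unique(level="Date")

# Iterate over each unique date and pull out that day's rows.
for date in dates_b:
    day = df.xs(date, level="Date")
    print(date, len(day))
```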

twin moth Nov 15, 2020, 5:02 PM

#

Hey guys, I am currently trying to scrape a website for the sake of studying data science but I'm currently stuck on the following:

<ul>
<li>
<h3 class="match">
        John
        <span class="number">
            123
        </span>
</h3>
</li>
<li>
<h3 class="match">
        Bob
        <span class="number">
            619
        </span>
</h3>
</li>
<ul>
...

I am using BS4 and I want to run a for loop which will iterate over all those h3s and return the name and number.
I was trying to use the following syntax:

for name, number in current.find_all('h3').text.split():
  list.append(MyObject(name, number))

But unfortunately I cannot do that, since text cannot be called on the return value of find_all. Any ideas on how to solve it in an elegant way?

desert oar Nov 15, 2020, 5:03 PM

#

Loop over the output of find_all

#

Don't overthink it

twin moth Nov 15, 2020, 5:03 PM

#

I did that, used a temp variable and stored the returned array, then I just took indices 0 and 1 and put them in that object

#

But want something more elegant

#

P.S. thanks for the quick response 🙂

desert oar Nov 15, 2020, 5:05 PM

#

items = []
for elem in curr.find_all('h3'):
    name, num = elem.text.split(maxsplit=1)
    items.append(MyObject(name, num))

#

This is perfectly idiomatic python

#

That said there is a lot of inefficiency here

#

Let me get to a real computer and i will show you

twin moth Nov 15, 2020, 5:06 PM

#

TYTY

desert oar Nov 15, 2020, 5:09 PM

#

@twin moth the first problem is that find_all constructs a full list in memory, which is slow and wasteful if you're just iterating over it

#

ah you know what

#

beautifulsoup doesn't support lazy iteration

#

but you can write this as a list comprehension at least, which might or might not feel more elegant to you

#

items = [
    MyObject(*elem.text.split(maxsplit=1))
    for elem in curr.find_all('h3')
]

#

perhaps even more "fancy" is doing it like this:

def process_h3(elem):
    name, num = elem.text.split(maxsplit=1)
    return MyObject(name, num)

items = list(map(process_h3, curr.find_all('h3')))

twin moth Nov 15, 2020, 5:14 PM

#

perhaps even more "fancy" is doing it like this:

def process_h3(elem):
    name, num = elem.text.split(maxsplit=1)
    return MyObject(name, num)

items = list(map(process_h3, curr.find_all('h3')))

@desert oar That's nice!

desert oar Nov 15, 2020, 5:18 PM

#

what would be more efficient is, if you only need h3 tags and don't need other stuff from the page, you can use SoupStrainer: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#parsing-only-part-of-a-document

#

that lets you specify only specific parts of the page to parse, which could be faster

twin moth Nov 15, 2020, 5:18 PM

#

Nah, I'm using a shit-ton of other things

#

@desert oar is the last line in that code supposed to be in the function?

#

Oh, nvm, just noticed that it's called from it

#

Okay, so I've tried it all but seems like it doesn't work

#

I get a myriad of errors when I try it

#

Either TypeError: 'NoneType' object is not callable or TypeError: __init__() takes 2 positional arguments but 3 were given

#

I think that I'd just go back to using the one way that works

        for li in soup.find('ul').find_all('li', class_=True, recursive=False):
            each = li.find('h3', class_='match')
            each = each.text.split()
            curr_poke.lister.append(MyObject(each[0], each[1]))

steep olive Nov 15, 2020, 5:35 PM

#

Hey... Can somebody tell me how to load a file from my PC to Tensorflow?

twin moth Nov 15, 2020, 5:39 PM

#

Oh, sorry, that's an older gen

        for h3 in soup.find('ul', class_='bla').find_all('h3', class_='match'):
            each = h3.text.split()
            obj.lister.append(MyObject(each[0], each[1]))

Thanks for the help @desert oar , if you have any other ideas on how to fix it I'd be delighted to hear
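For reference, a self-contained version of the loop over `find_all`, run against HTML shaped like the snippet above. The `Match` dataclass is a hypothetical stand-in for `MyObject`; the real class is not shown in the chat. The error `__init__() takes 2 positional arguments but 3 were given` means the class's `__init__` accepts `self` plus one parameter, so passing both a name and a number is one argument too many for it as defined.

```python
from dataclasses import dataclass

from bs4 import BeautifulSoup


@dataclass
class Match:  # hypothetical stand-in for MyObject; accepts name and number
    name: str
    number: str


html = """
<ul>
  <li><h3 class="match">John <span class="number">123</span></h3></li>
  <li><h3 class="match">Bob <span class="number">619</span></h3></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

matches = []
for h3 in soup.find_all("h3", class_="match"):
    # Assumes the heading text is exactly two whitespace-separated parts.
    name, number = h3.get_text().split()
    matches.append(Match(name, number))

print(matches)
```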

proud iron Nov 15, 2020, 5:51 PM

#

Guys, would someone please help me understand this page of the documentation of "matplotlib" please?
https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.bar.html

It's about creating a bar chart. In which parameter do we place the height of each column? Where do we place the labels for each column?

twin moth Nov 15, 2020, 6:01 PM

#

Guys, would someone please help me understand this page of the documentation of "matplotlib" please?
https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.bar.html

It's about creating a bar chart. In which parameter do we place the height of each column? Where do we place the labels for each column?
@proud iron Never used it, but seems like there are examples in the bottom of the page, maybe you could use them
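As a short sketch of the answer: in `bar(x, height, ...)` the first argument gives the bar positions (strings are accepted and become the labels) and the second gives the heights. The data here is made up, and the `Agg` backend is only selected so the example runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; drop this line for interactive use
import matplotlib.pyplot as plt

labels = ["A", "B", "C"]  # one label per bar (made-up data)
heights = [3, 7, 5]       # one height per bar

fig, ax = plt.subplots()
bars = ax.bar(labels, heights)  # x positions/labels first, heights second
ax.set_ylabel("Count")
```

With numeric x positions, the labels can instead be passed through the `tick_label` keyword, e.g. `ax.bar([0, 1, 2], heights, tick_label=labels)`. Calling `plt.show()` or `fig.savefig(...)` afterwards displays or saves the figure.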

late jackal Nov 15, 2020, 6:16 PM

#

i posted this yesterday and some very nice people gave suggestions but I had to go before I could try and troubleshoot it with them

        self.pframe1 = sql.ExcelWriter1()
        self.pframe1['Time'] = self.pframe1['Time'].astype(float)
        self.pframe1['Time'] = pd.to_datetime(self.pframe1['Time'], format='%H:%M:%S')
        print(self.pframe1)

I have this code where i am trying to take a column of elapsed time in minutes and convert it to HH:MM:SS

#

yet im given this error

Traceback (most recent call last):
  File "c:\Users\Nicks\Desktop\Deethanizer V4\HYSYSapp\source1\tankui.py", line 580, in onSaveData
    self.darray.writer(file_dir)
  File "c:\Users\Nicks\Desktop\Deethanizer V4\HYSYSapp\source1\tankui.py", line 802, in writer
    self.pframe1['Time'] = pd.to_datetime(self.pframe1['Time'], format='%H:%M:%S')
  File "C:\Users\Nicks\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 803, in to_datetime
    values = convert_listlike(arg._values, format)
  File "C:\Users\Nicks\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 454, in _convert_listlike_datetimes
    raise e
  File "C:\Users\Nicks\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 418, in _convert_listlike_datetimes
    arg, format, exact=exact, errors=errors
  File "pandas\_libs\tslibs\strptime.pyx", line 144, in pandas._libs.tslibs.strptime.array_strptime
ValueError: time data '25' does not match format '%H:%M:%S' (match)

#

it should already be all floats but i added that just in case there was some issue with it not being floats

proud iron Nov 15, 2020, 6:24 PM

#

@twin moth cheers! 🙂

hollow gull Nov 15, 2020, 6:31 PM

#

@twin moth I have frequently gotten errors like: TypeError: __init__() takes 2 positional arguments but 3 were given when I get confused and don't properly instantiate my objects. For example, if you are supposed to do MyObject().run() and you do MyObject.run(). I don't know if that is what is going on in your code, but error messages like that give me painful flashbacks.

#

@late jackal that seems like a really instructive error message. Your value seems to be the string '25' and pd.to_datetime doesn't know how to convert that to a '%H:%M:%S' format. Maybe you can use datetime.timedelta objects instead of datetimes? I would add a print(self.pframe1) before the line that is failing.

late jackal Nov 15, 2020, 6:35 PM

#

i figured i am converting the entire column to floats first though

#

im trying it with the print moved now. unfortunately i have to reload the whole app anytime i make a change to the code

#

C:\Users\Nicks\Desktop\datasheet.xlsx
         Time
0   24.169001
1   25.581001
2   27.978501
3   27.978501
4   29.105001
5   30.398001
6   31.770001
7   33.011001
8   34.184501
9   36.915001
10  38.162002
11  40.650002
12  41.817002
Traceback (most recent call last):
  File "C:\Users\Nicks\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 450, in _convert_listlike_datetimes
    values, tz = conversion.datetime_to_datetime64(arg)
  File "pandas\_libs\tslibs\conversion.pyx", line 350, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'int'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\Nicks\Desktop\Deethanizer V4\HYSYSapp\source1\tankui.py", line 580, in onSaveData
    self.darray.writer(file_dir)
  File "c:\Users\Nicks\Desktop\Deethanizer V4\HYSYSapp\source1\tankui.py", line 803, in writer
    self.pframe1['Time'] = pd.to_datetime(self.pframe1['Time'], format='%H:%M:%S')
  File "C:\Users\Nicks\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 803, in to_datetime
    values = convert_listlike(arg._values, format)
  File "C:\Users\Nicks\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 454, in _convert_listlike_datetimes
    raise e
  File "C:\Users\Nicks\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 418, in _convert_listlike_datetimes
    arg, format, exact=exact, errors=errors
  File "pandas\_libs\tslibs\strptime.pyx", line 144, in pandas._libs.tslibs.strptime.array_strptime
ValueError: time data '24' does not match format '%H:%M:%S' (match)

#

im very confused at this it seems the value is 24.16 not 24

#

@hollow gull

#

is there a way to get the "type" of that column?

hollow gull Nov 15, 2020, 6:39 PM

#

I would make a toy example to test out the logic outside of your entire app to make troubleshooting faster.

#

df.dtypes

late jackal Nov 15, 2020, 6:39 PM

#

yeah ive been thinking i should do that just didnt wanna have to look how to make a dataframe

#

i guess it isnt hard

hollow gull Nov 15, 2020, 6:40 PM

#

df = pd.DataFrame([24.169001, 25.581001], index=[0, 1])

late jackal Nov 15, 2020, 6:45 PM

#

import pandas as pd

d = {'Time': [24.169001, 25.581001,27.565457]}
df = pd.DataFrame(data=d,  index=[0, 1, 2])
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S')
print(df)

this is the test app

#

gives the same error

serene scaffold Nov 15, 2020, 6:46 PM

#

I have a dataframe where each row is a list of arbitrary strings (the columns are not meaningful) and I want to transform that to a table of string counts for each row.

hollow gull Nov 15, 2020, 6:47 PM

#

What is 24.169001 supposed to be?

#

hours?

late jackal Nov 15, 2020, 6:48 PM

#

24.16 minutes

#

converted into hours minutes seconds

hollow gull Nov 15, 2020, 6:48 PM

#

can you use timedeltas instead?

late jackal Nov 15, 2020, 6:48 PM

#

thats the time passed between two points

#

i don't think I can because i have a bunch of data tied to those exact times

hollow gull Nov 15, 2020, 6:49 PM

#

datetime.timedelta(0, 0, 0, 0, 24.16)

#

@glad mulch join with a how='inner' argument

late jackal Nov 15, 2020, 6:51 PM

#

@hollow gull i can try but if its the difference between two time intervals i don't think that will work

sleek yarrow Nov 15, 2020, 6:51 PM

#

e

hollow gull Nov 15, 2020, 6:52 PM

#

@late jackal you can create a timedelta by subtracting two datetimes, but you can also create them directly like the example I sent. I don't know how you plan to use this later, so maybe it doesn't solve your use, but it makes more sense to me to use a timedelta if you are talking about a duration than a datetime, which really isn't meant to handle durations (in my opinion)

late jackal Nov 15, 2020, 6:53 PM

#

ahh ok

#

idk what all the 0's represent

#

though

hollow gull Nov 15, 2020, 6:53 PM

#

@glad mulch you can specify the suffixes so the left version doesn't have an added '_x', then you can drop the '_y' versions. Otherwise, look at the arguments, there might be one to prevent duplicating the columns, I don't recall off the top of my head.

#

@late jackal look up the documentation. google: python datetime.timedelta
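On what the zeros mean: the positional parameters of `datetime.timedelta` are, in order, days, seconds, microseconds, milliseconds, minutes, hours, weeks. So the fifth positional value is minutes. A small sketch showing that keyword arguments say the same thing more readably:

```python
from datetime import timedelta

# Positional order: days, seconds, microseconds, milliseconds, minutes, hours, weeks
positional = timedelta(0, 0, 0, 0, 24.16)
keyword = timedelta(minutes=24.16)  # same duration, clearer to read

print(keyword)
print(positional == keyword)
```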

late jackal Nov 15, 2020, 6:54 PM

#

doing so now lol

hollow gull Nov 15, 2020, 6:55 PM

#

Or if you are in a good IDE it will show you the arguments. In pycharm, if you are inside the parentheses you can hit Ctrl+p and it will show you the argument names or you can hit Ctrl+B to navigate to the definition.

late jackal Nov 15, 2020, 7:04 PM

#

@hollow gull this seems to have worked and ill try implementing it in a sec, however it gives it to me in this format

                    Time
0 0 days 00:24:10.140060
1 0 days 00:25:34.860060
2 0 days 00:27:33.927420

#

i dont see a format argument anywhere in the docs

#

any idea if i can get rid of Days

hollow gull Nov 15, 2020, 7:05 PM

#

What are you going to do with these durations?

late jackal Nov 15, 2020, 7:05 PM

#

feed them into a program that calculates tuning parameters for valve controllers

#

that program asks for hh:MM:SS

#

sure it can be done manually in excel but kind of defeats the purpose of automating it

hollow gull Nov 15, 2020, 7:06 PM

#

If you are going to do any arithmetic on them later, I would leave them as timedelta objects because it will handle those operations for you. If you really just want a number of hours, then why convert it out of a float?

late jackal Nov 15, 2020, 7:07 PM

#

because the program only accepts a .txt file with a column in hh:mm:ss

hollow gull Nov 15, 2020, 7:07 PM

#

Yeah, I don't see an elegant solution, I can think of a couple hacky ones.

late jackal Nov 15, 2020, 7:07 PM

#

ok, thanks though ill keep hitting my head against the wall for a bit xD

#

could i split it into two columns? like one for days, the other for hhmmss

hollow gull Nov 15, 2020, 7:07 PM

#

You can add the timedelta to a midnight datetime and then take the hours, minutes, and seconds from the constructed datetime.

#

timedelta.total_seconds() will return a float that you can then format as HH:MM:SS using a custom function.
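The custom-function route could be sketched like this. The function name `format_hms` is made up for illustration, and the sketch assumes non-negative durations and simply drops fractional seconds:

```python
from datetime import timedelta


def format_hms(delta: timedelta) -> str:
    """Format a non-negative timedelta as HH:MM:SS, dropping fractional seconds."""
    total = int(delta.total_seconds())
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(format_hms(timedelta(minutes=24.169001)))
print(format_hms(timedelta(hours=30, minutes=5)))
```

Unlike the midnight-datetime approach, hours keep counting past 24 here instead of wrapping around to the next day.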

late jackal Nov 15, 2020, 7:09 PM

#

that may work xD

hollow gull Nov 15, 2020, 7:10 PM

#

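import datetime

dt = datetime.datetime(2020, 1, 1)  # any date works; only the time part is kept
dt += datetime.timedelta(0, 0, 0, 0, 24.16)  # fifth positional argument is minutes
dt.time().isoformat()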

#

or
dt = datetime.datetime(2020, 1, 1)
dt += datetime.timedelta(0, 0, 0, 0, 24.16)
dt.time().strftime('%H:%M:%S')

lapis sequoia Nov 15, 2020, 9:10 PM

#

in pandas i have a dataset. one column is the month and the other is the rainfall

#

there are multiple rows with the same month

#

i want to calculate the total rainfall for each month

#

how do i do it

#

because i want to plot a violin plot with that dara

#

data*
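Totals per month are a `groupby` followed by `sum`. A minimal sketch, assuming columns named `month` and `rainfall` (the names and values are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "rainfall": [1.5, 2.0, 0.5, 3.0, 1.0],
})

# sort=False keeps the months in order of first appearance rather than alphabetical.
monthly_total = df.groupby("month", sort=False)["rainfall"].sum()
print(monthly_total)
```

One thing to consider: a violin plot shows the distribution of values within each group, so it is usually drawn from the raw rows per month; the totals give only one number per month.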

late jackal Nov 15, 2020, 10:12 PM

#

@hollow gull thought I'd let you know i did get it working so TYVM here is the code

 self.pframe1 = sql.ExcelWriter1()
 self.pframe1['Time'] = 60*self.pframe1['Time']
 self.pframe1['Time'] = pd.to_datetime(self.pframe1["Time"], unit='s').dt.strftime("%H:%M:%S.%f")
 print(self.pframe1)
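The same idea as a standalone toy example, using the sample values from earlier in the thread. The `format=` argument of `pd.to_datetime` describes how to parse strings, which is why it rejected plain numbers; `unit=` tells pandas how to interpret numbers instead. This sketch drops the fractional seconds:

```python
import pandas as pd

df = pd.DataFrame({"Time": [24.169001, 25.581001, 27.565457]})  # elapsed minutes

# Convert minutes to seconds, interpret them as seconds since the epoch,
# then keep only the time-of-day part as text.
stamps = pd.to_datetime(df["Time"] * 60, unit="s")
df["Time"] = stamps.dt.strftime("%H:%M:%S")
print(df)
```

Because this goes through a datetime, durations of 24 hours or more would wrap around to 00:00:00; for the values here that is not an issue.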

sullen mortar Nov 15, 2020, 11:29 PM

#

I just created a RegEx that works perfectly but I'm wondering if it is disgusting 😆
Anyone care to review it?

#

https://regexr.com/5ga1h

RegExr

RegExr: Learn, Build, & Test RegEx

RegExr is an online tool to learn, build, & test Regular Expressions (RegEx / RegExp).

sullen mortar Nov 15, 2020, 11:47 PM

#

I've opened this up as a question in #help-chili 🧐

lapis sequoia Nov 16, 2020, 12:35 AM

#

can anyone help me in k-means clustering? i am really stuck, dont even know where to begin

velvet thorn Nov 16, 2020, 12:40 AM

#

can anyone help me in k-means clustering? i am really stuck, dont even know where to begin
@lapis sequoia ask your question

lapis sequoia Nov 16, 2020, 1:09 AM

#

@velvet thorn i dont even know where to start. what columns can i do it over https://www.kaggle.com/aramacus/electricity-demand-in-victoria-australia

Daily Electricity Price and Demand Data

Daily price, demand and weather data in Australia's second largest state

somber bane Nov 16, 2020, 1:11 AM

#

I got a question, is there any way I can get the 'food' value for username 'jack'?

#

    username   food
0   jack       none
1   2          none
2   hishi      123123
3   ghfdsa     23
4   12         rfgvfbc
5   hihihiihi  jack
6   g2         none23
7   g2qd       65s4
8   12         44
9   12         44
10  12         44ssss

#

using pandas
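One common way to do this is boolean indexing with `.loc`. A sketch on a few rows copied from the table above:

```python
import pandas as pd

df = pd.DataFrame({
    "username": ["jack", "2", "hishi", "hihihiihi"],
    "food": ["none", "none", "123123", "jack"],
})

# Rows where username equals 'jack', restricted to the food column.
jack_food = df.loc[df["username"] == "jack", "food"]
print(jack_food.tolist())
```

The result is a Series because several rows could match; `jack_food.iloc[0]` would give the first match as a single value.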

lapis sequoia Nov 16, 2020, 1:18 AM

#

@velvet thorn any ideas?

shell berry Nov 16, 2020, 2:47 AM

#

Can someone recommend some good tutorials/resources for intent classification with pytorch?

smoky bobcat Nov 16, 2020, 4:42 AM

#

good morning data hunters

#

I have a question, do we really decide the features to keep with heat map correlation? I mean what if it's like heart failure prediction and the heat map says that only 2-3 features of 10 are needed

#

would that be accurate?

desert oar Nov 16, 2020, 5:12 AM

#

no, do not use pairwise correlation to choose features in your model when you only have 10 features

#

unless you have a good reason to believe that only those 2-3 features are useful

#

e.g. if one feature is "is it raining today?" then obviously you can remove that feature
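A tiny constructed example of why pairwise correlation can mislead: below, the target is fully determined by two features together, yet each feature on its own has zero correlation with it. The data is synthetic and chosen only to make the point:

```python
import numpy as np

x1 = np.array([1, 1, -1, -1])
x2 = np.array([1, -1, 1, -1])
y = x1 * x2  # the target depends on both features jointly

r1 = np.corrcoef(x1, y)[0, 1]  # correlation between x1 and y
r2 = np.corrcoef(x2, y)[0, 1]  # correlation between x2 and y
print(r1, r2)
```

A heat map would suggest dropping both features here, even though a model using the pair could predict `y` perfectly.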

plush zenith Nov 16, 2020, 5:20 AM

#

Hi! Can i ask here something about sympy?

scarlet cloak Nov 16, 2020, 5:27 AM

#

Hey, I'm trying to write a script for looping through a VCF file (638 million lines) and an SNP list (15K lines) to match SNPs with chromosome and position and make an outfile with 3 columns, one for each piece of info. My code makes sense to me and should work, and I understand that it will take a while. I'm just wondering if people have any experience with this, or know about how long files of this size (~114 GB) usually take to loop through on a home PC
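For a job of this shape, one approach that tends to scale is to load the small SNP list into a set and stream the VCF once, line by line, so the cost is a single pass over the big file. The sketch below assumes the SNP list contains IDs that appear in the VCF's ID column; the function name and the sample records are made up for illustration:

```python
import io


def match_snps(vcf_lines, wanted_ids):
    """Yield (snp_id, chrom, pos) for VCF records whose ID is in wanted_ids."""
    wanted = set(wanted_ids)  # set membership checks are O(1) on average
    for line in vcf_lines:
        if line.startswith("#"):  # skip VCF header lines
            continue
        # Only the first three columns (CHROM, POS, ID) are needed,
        # so stop splitting after them.
        chrom, pos, snp_id, _rest = line.split("\t", 3)
        if snp_id in wanted:
            yield snp_id, chrom, pos


# Tiny in-memory stand-in for the real file.
vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t1000\trs111\tA\tG\n"
    "2\t2000\trs222\tC\tT\n"
)
rows = list(match_snps(vcf, ["rs222"]))
print(rows)
```

With a real file, `open(path)` would replace the `StringIO` object and each yielded row would be written to the output file; a nested loop over both files would be far slower than the set lookup.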

ripe field Nov 16, 2020, 6:55 AM

#

hey can someone tell me how to get started with data science? like i know python basic programming and the math but i dont know how to get started

lapis sequoia Nov 16, 2020, 7:02 AM

#

hey can someone tell me how to get started with data science? like i know python basic programming and the math but i dont know how to get started
@ripe field If you want a simple introduction to what ML is, checkout https://www.youtube.com/watch?v=IpGxLWOIZy4

YouTube

Luis Serrano

A Friendly Introduction to Machine Learning

Grokking Machine Learning Book: https://www.manning.com/books/grokking-machine-learning
40% discount promo code: serranoyt

A friendly introduction to the main algorithms of Machine Learning with examples.
No previous knowledge required.

What is Machine Learning: (0:05)
Linea...

▶ Play video

charred helm Nov 16, 2020, 8:13 AM

#

Yo, do you guys learn pandas, matplotlib, numpy together or one after the other? Currently I'm learning pandas, can't do anything with it as I don't know matplotlib and numpy.

#

Do I learn matplotlib and numpy alongside pandas?

lapis sequoia Nov 16, 2020, 8:50 AM

#

You can use plot directly from pandas, which uses matplotlib by default to make these plots

#

but you wont have to work with the matplotlib API directly

#

I dont know how you 'learn' that stuff one by one

#

but why not use one of the hundreds of toy projects which use these libraries

ruby wyvern Nov 16, 2020, 9:26 AM

#

Anyone here know BERT?

#

ping me

lapis sequoia Nov 16, 2020, 9:31 AM

#

OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.

#

u guys know what this is?

tawdry sage Nov 16, 2020, 10:15 AM

#

Anyone here know BERT?
@ruby wyvern
I’ve used it a bit.
I’m trying compressive transformers now. It seems cool, but it’s not pretrained

ruby wyvern Nov 16, 2020, 10:15 AM

#

@tawdry sage any way I can find a missing word in a sentence?

#

Without pytorch

#

https://stackoverflow.com/questions/64851632/how-to-generate-a-list-of-tokens-that-are-most-likely-to-occupy-the-place-of-a-m

Stack Overflow

How to generate a list of tokens that are most likely to occupy the...

How to generate a list of tokens that are most likely to occupy the place of a missing token in a given sentence? (Without PyTorch)
For example,

sentence = 'Roger Federer is a Swiss

#

too lazy to type it all out again

#

lol

#

I searched through GitHub

#

And internet

#

All uses pytorch

tawdry sage Nov 16, 2020, 10:17 AM

#

Sure. I haven’t tried though.
I’m not sure, but I think you have to fine tune it with the data format you want.

#

I found much easier to use it with pytorch

ruby wyvern Nov 16, 2020, 10:18 AM

#

Pytorch is like numpy right?

tawdry sage Nov 16, 2020, 10:18 AM

#

But there is some documentation explaining how to fine tune in tensor flow

ruby wyvern Nov 16, 2020, 10:19 AM

#

I think I can use tensor flow

#

I’m using repl

#

let me try to download

tawdry sage Nov 16, 2020, 10:20 AM

#

Pytorch is like numpy right?
@ruby wyvern
A little different. It has some things that look odd compared to regular Python syntax, but it’s not that hard

ruby wyvern Nov 16, 2020, 10:23 AM

#

@tawdry sage what should I do?

tawdry sage Nov 16, 2020, 10:23 AM

#

username food
0 jack none
1 2 none
2 hishi 123123
3 ghfdsa 23
4 12 rfgvfbc
5 hihihiihi jack
6 g2 none23
7 g2qd 65s4
8 12 44
9 12 44
10 12 44ssss
@somber bane
I hope you’ve found out already.
df.loc[df.username == 'jack'].food

ruby wyvern Nov 16, 2020, 10:24 AM

#

Is there any package that does this?

#

GitHub repo

#

or any other thing

tawdry sage Nov 16, 2020, 10:28 AM

#

https://github.com/Qordobacode/fitbert

GitHub

Qordobacode/fitbert

Use BERT to Fill in the Blanks. Contribute to Qordobacode/fitbert development by creating an account on GitHub.

#

This one does that

#

Actually, apparently you don’t have to re-train the model to fill the blanks.
But anyway, this seems to do the trick

ruby wyvern Nov 16, 2020, 10:30 AM

#

It uses torch TT

#

Unfortunately

tawdry sage Nov 16, 2020, 10:30 AM

#

You want to do it in tensorflow?

ruby wyvern Nov 16, 2020, 10:30 AM

#

📎 image0.png

#

Ye I think that will work

#

Btw thanks for helping me out

#

For real

tawdry sage Nov 16, 2020, 10:31 AM

#

But why don’t you want to install pytorch? I don’t get it

ruby wyvern Nov 16, 2020, 10:31 AM

#

I want to

#

Repl.it doesn’t work

#

It hasn’t been fixed for years

#

really annoying

tawdry sage Nov 16, 2020, 10:32 AM

#

Repl.it doesn’t work
@ruby wyvern
I don’t know that. Maybe you can use colab instead?

#

I’m google drive. Do you know?

ruby wyvern Nov 16, 2020, 10:33 AM

#

it’s for group

#

We started alr on repl

#

TT

tawdry sage Nov 16, 2020, 10:33 AM

#

Ahwm

#

Still you should be able to install using
!pip install torch

ruby wyvern Nov 16, 2020, 10:36 AM

#

nah it doesn’t work

#

I did that

tawdry sage Nov 16, 2020, 10:39 AM

#

Well. If you have Bert installed there, you can try to use it to predict the blanks.
But I really think it’s much easier to use torch or Albert

ruby wyvern Nov 16, 2020, 10:41 AM

#

Albert?

#

I think I can use that

lapis sequoia Nov 16, 2020, 11:02 AM

#

Hi everyone! My team and I are working on a collaborative python notebook called Deepnote. We have just opened the platform for public beta, so I'd love to hear your thoughts and feedback! Here's a sample Deepnote notebook showcasing Python 3.9: https://deepnote.com/project/09e2609b-986b-40fa-9f56-fcbbc60eb61d#%2Fnotebook.ipynb

A bit more context on the product: We've built Deepnote on top of Jupyter so it has all the features you'd expect - it's Jupyter-compatible, supports Python, R and Julia and it runs in the cloud. We improve the notebooks experience with real-time collaborative editing (just like Google Docs), shared datasets and a powerful interface with features like a command palette, variable explorer and autocomplete. We want Deepnote to be an interface that empowers programmers to collaborate and experiment easily. Looking forward to your feedback!

Project - Deepnote

vital marten Nov 16, 2020, 11:15 AM

#

Hi. Anyone know how to drop rows in a dataframe based on the data type?

I have 1000 rows with str and list mixed. I want to drop all the rows that has a list in them.

#

df2 = df2[(type(df2['text']) != 'list')] isn't working.

velvet thorn Nov 16, 2020, 11:24 AM

#

df2 = df2[(type(df2['text']) != 'list')] isn't working.
@vital marten you need to .map(type)

vital marten Nov 16, 2020, 11:25 AM

#

Ah! It's working now. Thanks.

velvet thorn Nov 16, 2020, 11:26 AM

#

I think you want != list

#

(no quotes)
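
A minimal sketch of the fix described above, on a made-up `df2` with a mixed `text` column. It checks each cell with `isinstance`, which does the same job as `.map(type) != list` for plain lists; the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical frame: the 'text' column mixes strings and lists
df2 = pd.DataFrame({"text": ["hello", ["a", "b"], "world", ["c"]]})

# type(df2['text']) is the type of the whole Series, not of each cell,
# so the check has to be applied element by element
is_list = df2["text"].map(lambda value: isinstance(value, list))

# Keep only the rows whose 'text' cell is not a list
df2 = df2[~is_list]
```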

charred helm Nov 16, 2020, 12:46 PM

#

What are some toy projects I could do to learn pandas

lapis sequoia Nov 16, 2020, 12:49 PM

#

Isn't there the pandas cookbook?

#

at least there was something like this some time ago

#

with a bike dataset iirc

brave crest Nov 16, 2020, 12:49 PM

#

I am using the train_test_split function to create training, validation, and test sets. How do I split them 60/20/20? It keeps giving me "Found input variables with inconsistent numbers of samples"

lapis sequoia Nov 16, 2020, 12:49 PM

#

so bike usage in seattle?

#

something like that

#

split 60/40

#

and then 50/50

brave crest Nov 16, 2020, 12:50 PM

#

So test_size 0.4 and then 0.5?

lapis sequoia Nov 16, 2020, 12:52 PM

#

I got that correctly didnt I

#

you want to split the whole data into three groups, right?

#

so first split everything 60/40 and then split the 40% again in half

brave crest Nov 16, 2020, 12:53 PM

#

Yeah but i think im doing something wrong

lapis sequoia Nov 16, 2020, 12:53 PM

#

so you end up with 60/20/20
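
The two-step split can be sketched roughly like this, assuming scikit-learn's `train_test_split` and toy arrays `X` and `y` (both made up here). The "inconsistent numbers of samples" error usually means the `X` and `y` passed together have different lengths, so each call gets a matching pair.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 2 features (hypothetical)
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# Step 1: 60% train, 40% held out
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Step 2: split the held-out 40% in half -> 20% validation, 20% test
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)
```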

brave crest Nov 16, 2020, 12:53 PM

#

Yeah but i get a inconsistent number error

#

I think I might be comparing it to a bigger dataset somewhere

lapis sequoia Nov 16, 2020, 12:59 PM

#

Not sure, what your code looks like and when exactly there is an error

brave crest Nov 16, 2020, 1:10 PM

#

I fixed it. Well i think i did! I trained the model using my validation sets then predicted against the train one instead of the test one

#

I am quite new to this so I definitely sound stupid right now xD

eternal nacelle Nov 16, 2020, 1:17 PM

#

hi, idk if this is the right channel, but how can i save a dynamic file using matplotlib's plt.savefig()? right now im doing this
```py
plt.savefig('path/image_name.png')
```

#

i want something like
```py
plt.savefig('path/image_name_(id).png')
```

quiet breach Nov 16, 2020, 1:30 PM

#

just use string formatting

#

'path/image_name_{0}'.format(id)
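
A small sketch of the same idea with an f-string. The helper name `image_path` and the folder `path` are placeholders; `image_id` is used instead of `id` only to avoid shadowing the built-in.

```python
def image_path(image_id, folder="path"):
    """Build a per-image file name such as 'path/image_name_7.png'."""
    return f"{folder}/image_name_{image_id}.png"

# One file name per id; each could be passed to plt.savefig(...)
paths = [image_path(i) for i in range(3)]
```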

twin moth Nov 16, 2020, 1:41 PM

#

Do you guys know if it's possible to show graphs or pandas dataframes in PyCharm like one would do in Jupyter Notebooks?

true nacelle Nov 16, 2020, 1:46 PM

#

https://www.jetbrains.com/help/pycharm/matplotlib-support.html#sm
This might help @twin moth

PyCharm Help

Scientific mode - Help | PyCharm

twin moth Nov 16, 2020, 1:53 PM

#

Thanks Thanos!

eternal nacelle Nov 16, 2020, 2:32 PM

#

it works! thanks @quiet breach

brave crest Nov 16, 2020, 2:46 PM

#

Now that I have three sets how do I validate the trained data now?

undone crown Nov 16, 2020, 3:41 PM

#

Hi everyone, I don't know if it is the correct channel, but how can I remove vertical grids from a graph with matplotlib?

vapid sorrel Nov 16, 2020, 3:55 PM

#

Hi, does someone know if there is simple code on the web to do a LinearRegression but with a correlation coefficient as the loss function?

molten hamlet Nov 16, 2020, 4:30 PM

#

I got pyplot ax, and want to get handles or save this matplotlib drawing somehow
gradient_ax.savefig("Pendulum path.jpg")
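
One likely cause: an Axes object has no `savefig()`, that method lives on the Figure. A minimal sketch, assuming `gradient_ax` came from something like `plt.subplots()`; the plotted data, the label, and the temp-dir output path are made up for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

fig, gradient_ax = plt.subplots()
gradient_ax.plot([0, 1, 2], [0, 1, 0], label="pendulum path")

# Handles and labels of the labelled artists drawn on this Axes
handles, labels = gradient_ax.get_legend_handles_labels()

# Save through the Figure that owns the Axes
out_path = os.path.join(tempfile.mkdtemp(), "pendulum_path.png")
gradient_ax.figure.savefig(out_path)
```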

light wind Nov 16, 2020, 4:55 PM

#

Create a student table and insert data. Implement the following SQL commands on the student table: ALTER table to add new attributes / modify data type / drop attribute UPDATE table to modify data ORDER By to display data in ascending / descending order DELETE to remove tuple(s) GROUP BY and find the min, max, sum, count and average.

#

Help me!

true nacelle Nov 16, 2020, 5:30 PM

#

@undone crown https://stackoverflow.com/questions/16074392/getting-vertical-gridlines-to-appear-in-line-plot-in-matplotlib
This isn't technically what you asked about, but the solution to this problem also covers your problem. Hopefully it helps.

Stack Overflow

Getting vertical gridlines to appear in line plot in matplotlib

I want to get both horizontal and vertical grid lines on my plot but only the horizontal grid lines are appearing by default. I am using a pandas.DataFrame from an sql query in python to generate a...
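
For the original question (removing the vertical lines), a minimal sketch: vertical grid lines belong to the x-axis, so they can be switched off there while the horizontal ones stay. The plotted values are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [1, 3, 2, 4])

ax.grid(True)         # turn on both sets of grid lines
ax.xaxis.grid(False)  # vertical lines are tied to the x-axis ticks, so hide those
```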

undone crown Nov 16, 2020, 5:36 PM

#

@true nacelle Thanks!!

true nacelle Nov 16, 2020, 5:46 PM

#

Hi. I'm doing some data analysis on a csv, and I need to group together by region, the top 3 most expensive houses. I basically want "Region A: house1, house2, house3" and so on, for each region, as a df.

This is my dataset (note that this is edited/sub-df. The original df has over 20 columns, but they are not relevant in this case.)

So far, I've tried a few things, but I haven't gotten the solution in the way I'd like: grouped by region.

df2 = df.sort_values(by = ['Price'], ascending = False)
df2.groupby(['Regionname']).head(3) #this shows df sorted high-low price, but not categorised by region
This right here produces the original df with the prices sorted, but they are not group together by "Regionname". I'm at a bit of a loss as to how to approach it, even conceptually.

📎 unknown.png

ashen violet Nov 16, 2020, 5:57 PM

#

Has anyone got an example of how to use the "golden features" parameter in Catboost?

novel timber Nov 16, 2020, 6:23 PM

#

Data Science & Machine Learning projects and tutorials in python from beginner to advanced level.

Machine Learning Algorithms included

https://github.com/hhhrrrttt222111/DS_and_ML_projects

GitHub

hhhrrrttt222111/DS_and_ML_projects

Data Science & Machine Learning projects and tutorials in python from beginner to advanced level. - hhhrrrttt222111/DS_and_ML_projects

undone crown Nov 16, 2020, 6:34 PM

#

Hi. I'm doing some data analysis on a csv, and I need to group together by region, the top 3 most expensive houses. I basically want "Region A: house1, house2, house3" and so on, for each region, as a df.

This is my dataset (note that this is edited/sub-df. The original df has over 20 columns, but they are not relevant in this case.)

So far, I've tried a few things, but I haven't gotten the solution in the way I'd like: grouped by region.

df2 = df.sort_values(by = ['Price'], ascending = False)
df2.groupby(['Regionname']).head(3) #this shows df sorted high-low price, but not categorised by region
This right here produces the original df with the prices sorted, but they are not group together by "Regionname". I'm at a bit of a loss as to how to approach it, even conceptually.
@true nacelle Hi Thanos, when you apply .groupby() it needs to be paired with a summary measure (an aggregation) so that it has something to group. Otherwise there is nothing for it to group.

true nacelle Nov 16, 2020, 6:36 PM

#

I see.

green hemlock Nov 16, 2020, 6:39 PM

#

@true nacelle you can use df.sort_values(['Regionname','Price'],ascending=[True,False]).groupby('Regionname',sort=True).head(3) (Price has to be sorted descending, otherwise head(3) returns the 3 cheapest per region)

#

and wherever possible, use nlargest() instead of head(), if you have numeric values
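
A minimal sketch of both variants on a made-up frame with the `Regionname` and `Price` columns from the question; the region names and prices are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "Regionname": ["North", "North", "North", "North", "South", "South"],
    "Price": [100, 400, 300, 200, 50, 70],
})

# Variant 1: sort by region, then price descending, and keep the first 3 rows per region
top3 = (
    df.sort_values(["Regionname", "Price"], ascending=[True, False])
      .groupby("Regionname")
      .head(3)
)

# Variant 2: let pandas pick the 3 largest prices in each region
top3_prices = df.groupby("Regionname")["Price"].nlargest(3)
```

Variant 1 keeps every column of the original rows; variant 2 returns only the prices, indexed by region and original row label.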

boreal summit Nov 16, 2020, 6:40 PM

#

Hello everyone, I'm practicing with the housing dataset and using vs code. I was trying to fit my data into a linear regression model after some tuning but I'm getting an error. It says unable to allocate 464. MiB for an array with shape (16512, 3683) and data type float64. Anyone know what this means?

#

Title of the error is Memory error.

green hemlock Nov 16, 2020, 6:42 PM

#

also quoting from quora:
The order of rows WITHIN A SINGLE GROUP are preserved, however groupby has a sort=True statement by default which means the groups themselves may have been sorted on the key. In other words if my dataframe has keys (on input) 3 2 2 1,.. the group by object will shows the 3 groups in the order 1 2 3 (sorted). Use sort=False to make sure group order and row order are preserved

true nacelle Nov 16, 2020, 6:59 PM

#

Thanks a lot @green hemlock! I'll be sure to look into this! edit: This was really useful. It's so simple when you see it, but I struggle to "find" the solution.

hollow gull Nov 16, 2020, 11:28 PM

#

@boreal summit do those dimensions look correct to you? It seems like your data got very wide, to the extent that it might be an error. I am assuming that is why it is taking up so much memory and you are getting the memory error.

undone crown Nov 16, 2020, 11:36 PM

#

Hi everyone, I am with the following problem. I want to convert a 'Date' column of my dataframe that has the format '09/21/2020' to the 'Sep-20' format. Can anyone help me on how I could do it?

slim fox Nov 16, 2020, 11:39 PM

#

You can parse it to datetime and then do strftime on it

velvet thorn Nov 16, 2020, 11:42 PM

#

Hello everyone, I'm practicing with the housing dataset and using vs code. I was trying to fit my data into a linear regression model after some tuning but I'm getting an error. It says unable to allocate 464. MiB for an array with shape (16512, 3683) and data type float64. Anyone know what this means?
@boreal summit it seems pretty self-explanatory to me. you don’t have enough memory.

#

what feature engineering did you do before that?
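
The number in the error message can be checked by hand: a float64 takes 8 bytes, so the array size follows directly from the shape. A quick sketch of that arithmetic:

```python
rows, cols = 16512, 3683   # shape from the error message
bytes_per_float64 = 8

size_bytes = rows * cols * bytes_per_float64
size_mib = size_bytes / 2**20  # 1 MiB = 2**20 bytes
```

That works out to roughly 464 MiB for a single array, which matches the message.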

hollow gull Nov 16, 2020, 11:49 PM

#

@undone crown oh.... why????

#

google datetime formats python

#

Okay, I am realizing that the way I find this isn't very linear... but then search for strftime or strptime. There will be a link named "strftime() and strptime() Behavior" that goes here: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior

velvet thorn Nov 16, 2020, 11:53 PM

#

www.strftime.org

full glacier Nov 16, 2020, 11:53 PM

#

Although never is often better than right now.

hollow gull Nov 16, 2020, 11:53 PM

#

That has a table with all of the shortcuts when you are doing formatting of datetimes in python. So it looks like your format string will be '%b-%d'

full glacier Nov 16, 2020, 11:53 PM

#

you are crazy

hollow gull Nov 16, 2020, 11:55 PM

#

putting it all together, I think this will work:

df['evenstrangerdateformat'] = df['weirdamericandatetime'].apply(datetime.datetime.strptime('%b-%d'))

boreal summit Nov 16, 2020, 11:57 PM

#

@velvet thorn from what I garnered online, seems it's cause I'm using Python 32 bit, i want to upgrade to 64 bit and see if that will help.

#

I read that 32 bit Python can't run some complex stuffs that need memory.

hollow gull Nov 16, 2020, 11:58 PM

#

@boreal summit I would really recommend looking at the shape of what you are putting into the LR and make sure it is what you are expecting... that seems really wide.

velvet thorn Nov 17, 2020, 12:00 AM

#

@velvet thorn from what I garnered online, seems it's cause I'm using Python 32 bit, i want to upgrade to 64 bit and see if that will help.
@boreal summit this might be true, but besides the point.

#

why do you have so many columns?

#

putting it all together, I think this will work:
df['evenstrangerdateformat'] = df['weirdamericandatetime'].apply(datetime.datetime.strptime('%b-%d'))

@hollow gull nope

#

prefer pd.to_datetime to .apply

#

and then .dt.strftime
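
A minimal sketch of that suggestion, assuming the column is called `Date` and holds strings like '09/21/2020'. Since 'Sep-20' for that date reads as month and two-digit year, the sketch uses '%b-%y'; the output column name `MonthYear` is made up.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["09/21/2020", "11/03/2020"]})

# Parse the US-style strings into real datetimes (vectorised, no .apply needed)
parsed = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Format as abbreviated month name plus two-digit year
df["MonthYear"] = parsed.dt.strftime("%b-%y")
```

Note that `%b` follows the current locale, so the month abbreviations are English only under an English or C locale.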

boreal summit Nov 17, 2020, 12:01 AM

#

Might be my processing and stuff. I used standard scaler to scale the Input data.

velvet thorn Nov 17, 2020, 12:01 AM

#

Might be my processing and stuff. I used standard scaler to scale the Input data.
@boreal summit doesn't add columns

#

what are all the steps you took?

#

did you OHE a continuous column or something
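
To show why that question matters, a small sketch of what one-hot encoding does to a continuous column: every distinct value becomes its own column. The column name `median_income` and the random data are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"median_income": rng.random(1000)})  # continuous values

# One column per distinct value, so a continuous column explodes in width
encoded = pd.get_dummies(df["median_income"])
n_rows, n_cols = encoded.shape
```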

boreal summit Nov 17, 2020, 12:01 AM

#

It's the housing dataset I was learning with.

#

No, I only did label encoder.

#

Hold on. Lemme screenshot it.

#

📎 20201117_010310.jpg

#

Pardon me please, I'm typing with my phone and doing stuff on my PC.

velvet thorn Nov 17, 2020, 12:03 AM

#

you should really copy-paste code

#

that is mega hard to see

boreal summit Nov 17, 2020, 12:04 AM

#

Okay, lemme login on my PC.

velvet thorn Nov 17, 2020, 12:04 AM

#

but yeah

#

I do not see anything particularly strange about that

#

actually

#

can you just tell me what data.shape is

cerulean spindle Nov 17, 2020, 12:05 AM

#

I used to use jupyter with VScode. It was really slow and sometimes made really weird errors. Did anyone have the same problem? Does anyone have alternatives?

boreal summit Nov 17, 2020, 12:06 AM

#

it was showing something like (323020, 1342)

velvet thorn Nov 17, 2020, 12:06 AM

#

...what dataset is this?

#

it was showing something like (323020, 1342)
@boreal summit are you sure?

boreal summit Nov 17, 2020, 12:07 AM

#

I already cleared all the output, and uninstalled python 3.8 but I can remember the R was in 6 digits while the C was in 4 digits

#

the housing dataset

#

Don't know where things went wrong

#

yea I'm sure

velvet thorn Nov 17, 2020, 12:07 AM

#

the housing dataset
@boreal summit ...which housing dataset

#

do you mean the Boston housing dataset?

boreal summit Nov 17, 2020, 12:08 AM

#

yea

hollow gull Nov 17, 2020, 12:08 AM

#

I think that the columns are blowing up because you are doing a LogisticRegression with multi_class='auto'

boreal summit Nov 17, 2020, 12:08 AM

#

Boston housing dataset

#

I was trying to look it up on Kaggle

velvet thorn Nov 17, 2020, 12:09 AM

#

the Boston housing dataset has shape (506, 14).

boreal summit Nov 17, 2020, 12:09 AM

#

this one has 20,641 rows

#

I'm trying to predict the median house income according to the exercise

#

should I leave the Hyper parameters in default?

hollow gull Nov 17, 2020, 12:10 AM

#

If it is a continuous variable you are trying to predict a regression model typically fits the problem better.

#

So LinearRegression instead of LogisticRegression

#

LogisticRegression is designed for a target variable with 2 or a few unique values, not something continuous.
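
A rough sketch of the point, on synthetic data (everything here is made up): a continuous target has about as many distinct values as rows, which a classifier would treat as that many classes, while `LinearRegression` fits it directly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Continuous target: a linear signal plus a little noise
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)

# A classifier would see every distinct target value as a separate class
n_distinct_targets = np.unique(y).size

# A regression model predicts the continuous value itself
model = LinearRegression().fit(X, y)
r2 = model.score(X, y)  # R^2 on the training data
```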

boreal summit Nov 17, 2020, 12:11 AM

#

okay, I will try it out and see what happens. Thanks a lot, I appreciate it.

#

I actually intended to use LinearRegression, didn't know it was Logistic I used.

boreal summit Nov 17, 2020, 12:44 AM

#

Thanks everyone, @hollow gull it ran successfully this time.

#

The dataset successfully fitted in. I guess it's cause I was using logistic regression.

#

The dataset wasn't fit for the model.

small tartan Nov 17, 2020, 12:45 AM

#

Hey guys, i have a question around dataset and what actions to take. I have 3 variables that are not at all on the same scale. I need to normalize or standardize them so they result in something i can weight and then pull into a single score as a result.

boreal summit Nov 17, 2020, 12:46 AM

#

You can either use standard scaler or MinMax scaler. Each has its own disadvantages and advantages.

small tartan Nov 17, 2020, 12:47 AM

#

My question being, can the ranges be set for scaling outside of the max/min of the data set
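
One way to think about it: min-max scaling is just `(value - lower) / (upper - lower)`, so the bounds can be whatever range makes sense for the variable, not only the observed min and max. A minimal sketch with made-up variables, bounds, and weights:

```python
import numpy as np

def scale_to_unit(values, lower, upper):
    """Min-max scale using fixed bounds chosen by hand, not taken from the data."""
    values = np.asarray(values, dtype=float)
    return (values - lower) / (upper - lower)

# Three hypothetical variables on very different scales, each with its own fixed range
a = scale_to_unit([20, 50], lower=0, upper=100)
b = scale_to_unit([3, 4], lower=1, upper=5)
c = scale_to_unit([1000, 3000], lower=0, upper=4000)

# Weighted combination into a single score per row (weights are arbitrary here)
weights = (0.5, 0.3, 0.2)
score = weights[0] * a + weights[1] * b + weights[2] * c
```

Values outside the chosen bounds land below 0 or above 1, so they may need clipping if the score has to stay in a fixed range.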