#data-science-and-ml


plush zenith
#

this one

heady hatch
#

Also is that sympy? I'm not familiar with sympy.

plush zenith
#

it's not supposed to be sympy, i think i passed it as numpy with the lambdify

heady hatch
#

Oh I mean I'm not familiar with Sympy syntax.

plush zenith
#

ahh

heady hatch
#

What's lambdify supposed to do?

plush zenith
#

sorry haha

#

it does something like lambda (it converts it to an anonymous function i think) and passes it in a numpy format

heady hatch
#

I would double check there if lambdify is returning what you're expecting it to.

#

I'm googling the error right now and someone is having a similar problem too, where they mix sympy with numpy.

#

But to be more specific, I think your error might be coming from np.linalg.inv.

plush zenith
#

mmm it could be possible

#

yes the error comes from there

heady hatch
#

You should print Jr and fi's dtypes.

#

And check if they're compatible with np.linalg.inv.

plush zenith
#

f=
[5*(x + 1)**2 + (y + 1)**2 - 25, 2.71828182845905**x - y]

J=
[[10*x + 10 2*y + 2]
[1.0*2.71828182845905**x -1]]

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>

#

they are both arrays

heady hatch
#

What about the dtypes?

#

Like the data types.

plush zenith
#

how can i see that?

heady hatch
#

You can check for them via f.dtypes or j.dtypes.

#

Either dtype or dtypes.

#

I don't remember the exact syntax.

plush zenith
#

let me try!

#

AttributeError: 'function' object has no attribute 'dtype'
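For reference, NumPy arrays expose a singular `.dtype` attribute (`.dtypes` is the pandas DataFrame spelling), and the object returned by `lambdify` is a plain function, so it has neither. A minimal sketch, assuming `Jr` is the already evaluated Jacobian; the `symbolic` array is hypothetical and only stands in for a matrix that has not been evaluated yet:

```python
import numpy as np

# Evaluated Jacobian: plain floats, which np.linalg.inv accepts.
Jr = np.array([[10.0, 6.0], [1.0, -1.0]])
print(Jr.dtype)

# An array that still holds Python objects (for example unevaluated
# symbolic expressions) ends up with dtype=object instead of a float dtype.
symbolic = np.array([["10*x + 10", "2*y + 2"]], dtype=object)
print(symbolic.dtype)

# The inverse is only computed on the numeric array.
J_inv = np.linalg.inv(Jr)
print(J_inv)
```

Checking the dtype of the evaluated arrays (`Jr`, `fi`) rather than of the lambdified functions (`J`, `f`) is what tells you whether `np.linalg.inv` will accept them.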

heady hatch
#

So this is my intuition.

#

np.linalg.inv takes numerical values.

plush zenith
#

yes

heady hatch
#

and you're passing in a matrix that hasn't been evaluated yet.

#

So it's trying to take the inverse of functions instead of a matrix with numerical values.

#

f=
[5*(x + 1)**2 + (y + 1)**2 - 25, 2.71828182845905**x - y]

J=
[[10*x + 10 2*y + 2]
[1.0*2.71828182845905**x -1]]

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
@plush zenith

If this is what f and J are, then I'm assuming the x and y aren't evaluated.

plush zenith
#

yes i understand that

#

but i passed the data

#

and when i write the J outside the loop

#

it shows a numerical result

#

the fi doesnt

#

and dont understand why

heady hatch
#

I think this is where your investigation might have to start. hahaha

plush zenith
#

sorry fi too appears as a numerical

heady hatch
#

Run this in your loop.

plush zenith
#

sure!!

heady hatch
#
for i in range(20):
    Jr=np.array(J(v[0], v[1]))
    fi=np.array(f(v[0],v[1]))
    print(Jr)
    print(fi)
    break
#

Print it on first iteration and see if it's evaluated.

plush zenith
#

yes it does

#

[[10. 6.]
[ 1. -1.]]
[[10. 6.]
[ 1. -1.]]

heady hatch
#

Hmm that's interesting.

#

I wonder if it breaks somewhere.

#

I guess add the inverse operation to that?

#

the J_inv

#

run the first iteration where it calculates Jr, fi, and J_inv.

plush zenith
#

TypeError: 'module' object is not callable

#

this appears

heady hatch
#

Wait what module object is not callable?

plush zenith
#

TypeError Traceback (most recent call last)
<ipython-input-36-3485713534db> in <module>
37
38 #print(v0- np.linalg.inv(np.array(J(v0[0],v0[1])) @ np.array(f(v0[0],v0[1]))))
---> 39 print(sistema_newton(funcion1, funcion2,[0,2] ))

<ipython-input-36-3485713534db> in sistema_newton(funcion1, funcion2, v0)
16 Jr=np.array(J(v[0], v[1]))
17 fi=np.array(f(v[0],v[1]))
---> 18 J_inv= np.linalg(Jr)
19 print(Jr)
20 print(fi)

TypeError: 'module' object is not callable

#

i dont understand

#

it was passed as an array and it was evaluated

heady hatch
#

isn't it supposed to be

#

np.linalg.inv?

#

I think you're missing .inv.

plush zenith
#

yes

#

im a stupid

#

hahah

heady hatch
#

You might be tired.

plush zenith
#

by now it works

heady hatch
#

Okay so it works on first iteration?

plush zenith
#

hahah i've been struggling with this since yesterday hahah

#

yes

heady hatch
#

If that's the case then do this.

    J=sp.lambdify([x, y],[dp1,dp2], "numpy")
    f=sp.lambdify([x, y],[dp1,dp2], "numpy")
    v = v0
    print(v)
    for i in range(20):
        print(f'on {i} it')
        Jr=np.array(J(v[0], v[1]))
        fi=np.array(f(v[0],v[1]))
        J_inv=np.linalg.inv(Jr)
        #print(J_inv)
        print("")
        v = v - J_inv @ fi
        print("v")
        print(v)
        print("")
    return 
#

So you can see on which iteration it breaks.

plush zenith
#

on 1 it

#

the first one

#

TypeError: No loop matching the specified signature and casting was found for ufunc inv

heady hatch
#

Oh, then by process of elimination, if Jr, fi, and J_inv all process.

#

I'm assuming v is your issue.

#

Is v a matrix?

plush zenith
#

yes

#

and i think it should be a vector

#

yes v is a vector

heady hatch
#

Could you print what v is?

plush zenith
#

v
[[-1. 2.]
[ 0. 1.]]

#

this is v

heady hatch
#

Hm interesting.

plush zenith
#

[1.0625 2.0625]

#

but this should be

heady hatch
#

This should be?

plush zenith
#

yes

heady hatch
#

You're giving me two values. hahaha I'm assuming one of them is v.

plush zenith
#

im trying to make things work in an automatic way

#

ohh

#

sorry hahah

#

the matrix is what my programme is returning

#

the vector is what it should be

heady hatch
#

Hm and it's able to print v in the loop?

#

v = v - J_inv @ fi

It's able to evaluate this and print this?

plush zenith
#

just the first one

#

v
[[-1. 2.]
[ 0. 1.]]

heady hatch
#

Ahh okay so
v = v - J_inv @ fi is your issue then.

plush zenith
#

[0, 2] this is the values i passed

#

[[-1. 2.]
[ 0. 1.]]

#

this is the matrix it returns

#

but it should be a vector not a matrix

heady hatch
#

Quick question is
v = v - J_inv @ fi supposed to be v - (J_inv @ fi) or (v - J_inv) @ fi.

#

I'm really lost. hahaha

#

Because you have multiple v's and I'm only focusing on v = v - J_inv @ fi

#

and I'm not sure what you're referring to now.

plush zenith
#

hahah

#

okey

heady hatch
#

Focus on the loop.

plush zenith
#

well

heady hatch
#

Unless you're saying v = v0 has a bug

plush zenith
#

can i show you the original code

heady hatch
#

Sure.

plush zenith
#

and the one im trying to do?

heady hatch
#

Sure?

plush zenith
#
def f1(v):
    return(5*(v[0]+1)**2 + (v[1]+1)**2 - 25)

def f2(v):
    return(np.e**(v[0]) - v[1])


def f(v):
    return(np.array([f1(v), f2(v)]))
def J(v):
    M = np.array([[10*v[0]+10, 2*v[1]+2], [np.e**v[0], -1]])
    print(M)
    return(M)
def newtonSistemas(f, J, v0, n):
    v = v0
    for i in range(n):
        print("V")
        v = v - np.linalg.inv(J(v)) @ f(v)  
        print(v)
    return 
print(newtonSistemas(f, J, np.array([0,2]), 20))
#

this is the one i know it works

#

the thing with this is that you have to replace the x and y for v[0] and v[1], manually

#

and this is my poor son who doesnt work correctly hahah

#

funcion1=5*(x+1)**2 +(y+1)**2-25
funcion2=(math.e**x)-y

#

sorry i paste other

#
def sistema_newton(funcion1, funcion2, v0):
    x, y, z = sp.symbols('x y z')
    f=[funcion1, funcion2]
    print("f=")
    print(f)
    print("")
    dp1=[sp.diff(funcion1,x),sp.diff(funcion1,y)]#sp.diff(funcion1,z)]
    dp2=[sp.diff(funcion2,x),sp.diff(funcion2,y)]#sp.diff(funcion1,z)]
    print("J=")
    print(np.array([dp1,dp2]))
    print("")
    J=sp.lambdify([x, y],[dp1,dp2], "numpy")
    f=sp.lambdify([x, y],[dp1,dp2], "numpy")
    v = v0
    print(v)
    for i in range(20):
        print(f'on {i} it')
        Jr=np.array(J(v[0], v[1]))
        fi=np.array(f(v[0],v[1]))
        J_inv=np.linalg.inv(Jr)
        print(J_inv)
        print("")
        v = v -(J_inv @ fi)
        print("v")
        print(v)
        print("")
    return 
#print(v0- np.linalg.inv(np.array(J(v0[0],v0[1])) @ np.array(f(v0[0],v0[1]))))
print(sistema_newton(funcion1, funcion2,[0,2] ))
#

this is the one you were helping me with

#

i just tried to make the computer do the replacements of x and y

heady hatch
#

Hmm I'm trying to understand what do you mean by you have to replace the x and y for v[0] and v[1] manually.

#

Because it seems like both are inserting [0, 2] from what I'm seeing.

plush zenith
#

ohh

#

yes yes

#

but

#

in the first one

#

funcion1=5*(x+1)**2 +(y+1)**2-25
funcion2=(math.e**x)-y

#

i have to pass

#

the x and y

#

manually

#

i have to copy in the computer

#

typing

#

5*(v[0]+1)**2 +(v[1]+1)**2-25

heady hatch
#

Can't you call f1(v) and f2(v)?

plush zenith
#

in the first code i cant

#

you have to pass the v

heady hatch
#

Oh why not?

plush zenith
#

so it can replace it

#

is like you are passing v as the parameter of a function

#

so unless you call a variable v (in this case a list) it wont work

#

i mean you could pass f1

#

v=[0, 1]

#

and it will replace all v for 0 and 1

heady hatch
#

Okay I'm going to try to focus on what you're trying to do now.

plush zenith
#

sure

heady hatch
#

try to see if J_inv @ fi works.

#

print that

#

see if it evaluates

#

and then add v - to that dot product/matrix multiplication.

#

See where the error might be coming up.

plush zenith
#

m
[[1. 0.]
[0. 1.]]

#

it prints this

#

sorry

#

[[-1. 2.]
[ 0. 1.]]

#

if i rest the v

heady hatch
#

What do you mean by rest the v?

plush zenith
#

sorry

#

i speak spanish haha

#

i refear

heady hatch
#

So J_inv @ fi gives you
[[1. 0.]
[0. 1.]]?

plush zenith
#

yes

heady hatch
#

and v - J_inv @ fi gives you
[[-1. 2.]
[ 0. 1.]]

plush zenith
#

yes

heady hatch
#

Oh wait I'm forgetting it's 0 index.

#

Your loop is freaking out on the second loop.

plush zenith
#

ohh

#

that is good?

heady hatch
#

maybe?

#

So on the second loop

#

v = [[-1. 2.]
[ 0. 1.]]

plush zenith
#

well that is better than anything hahah

heady hatch
#

Where v[0] = [-1, 2] and v[1] = [0, 1]

#

but your x and y are supposed to be single values, right?

plush zenith
#

yes

#

no

#

v[0]= 0

#

and

#

v[1]=2

heady hatch
#

To clarify you want v[0] to be 0 and v[1] to be 2?

plush zenith
#

in the first iteration

heady hatch
#

Right but in the second iteration

#

it's turning into a 2 by 2 matrix.

plush zenith
#

in the second it should be replaced

#

yes and i dont understand that

heady hatch
#

try print(sistema_newton(funcion1, funcion2, np.array([0,2])))

plush zenith
#

is the same

heady hatch
#

still the same error?

plush zenith
#

it must be something about dimensions

heady hatch
#

is v supposed to be a 2 by 1 vector?

#

Yea

plush zenith
#

yes

heady hatch
#

what dimension is Jr and fi supposed to be?

plush zenith
#

f is a 2 by 1 vector

#

and J is a matrix

heady hatch
#

but jr and fi, what about their dimensions?

plush zenith
#

yes

#

sorry

#

fi should be a vector

#

jr the matrix

heady hatch
#

what shape is the matrix?

plush zenith
#

2x2

heady hatch
#

Okay that makes sense.

2x1 - (2x2 @ 2x1) => 2x1
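The shape check above can be reproduced directly. A minimal sketch, assuming the starting point `[0, 2]` and the Jacobian values printed earlier in the conversation; `fi_vector` and `fi_matrix` are illustrative names, not from the original code:

```python
import numpy as np

v = np.array([0.0, 2.0])                     # starting point, shape (2,)
Jr = np.array([[10.0, 6.0], [1.0, -1.0]])    # Jacobian evaluated at v, shape (2, 2)
J_inv = np.linalg.inv(Jr)

# Intended case: fi is a length-2 vector, so the update stays a length-2 vector.
fi_vector = np.array([5 * (v[0] + 1) ** 2 + (v[1] + 1) ** 2 - 25,
                      np.e ** v[0] - v[1]])
good_update = v - J_inv @ fi_vector
print(good_update.shape, good_update)

# If fi is accidentally 2x2 (here: the Jacobian itself), J_inv @ fi is 2x2
# and broadcasting silently turns v into a 2x2 matrix.
fi_matrix = Jr
bad_update = v - J_inv @ fi_matrix
print(bad_update.shape)
print(bad_update)
```

NumPy does not raise an error on the second update because a length-2 vector broadcasts against a 2x2 matrix, which is why the problem only surfaces one iteration later.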

plush zenith
#

i think i know the error

#

give me my prize for stupidity hahah

heady hatch
#

Oh what's the error?

plush zenith
#

man

#

im so sorry

#

and embarrassed

#

sorry for keep you with this

#

for so much time

heady hatch
#

It's okay. We've learned from this. But what was the error?

plush zenith
#

do you remember f?

heady hatch
#

Mhm.

#

The 2x1 vector.

#

Was it a 2x2?

plush zenith
#

yes

heady hatch
#

hahaha

plush zenith
#

but i passed it

#

as

#

this

#

J=sp.lambdify([x, y],[dp1,dp2], "numpy")

heady hatch
#

Ahh.

plush zenith
#

f=sp.lambdify([x, y],[dp1,dp2], "numpy")

#

and it should be as

#

f=sp.lambdify([x, y],f, "numpy")

#

im so sorr

#

i cant believe it

heady hatch
#

Congratulations on solving the issue.
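Putting the fix together, a minimal sketch of the symbolic version might look like the following. It assumes the two equations are SymPy expressions in `x` and `y`, uses `sp.exp(x)` in place of `math.e**x`, and swaps the explicit inverse for `np.linalg.solve`, which is a substitution and not what the original code does:

```python
import numpy as np
import sympy as sp

def sistema_newton(funcion1, funcion2, v0, n=20):
    """Newton's method for a 2x2 system given as SymPy expressions in x and y."""
    x, y = sp.symbols("x y")
    exprs = [funcion1, funcion2]
    jac = [[sp.diff(e, s) for s in (x, y)] for e in exprs]
    # f is built from the equations, J from their partial derivatives.
    f = sp.lambdify([x, y], exprs, "numpy")
    J = sp.lambdify([x, y], jac, "numpy")
    v = np.asarray(v0, dtype=float)
    for _ in range(n):
        fi = np.array(f(v[0], v[1]), dtype=float)   # shape (2,)
        Jr = np.array(J(v[0], v[1]), dtype=float)   # shape (2, 2)
        # Solving Jr * step = fi avoids forming the inverse explicitly.
        v = v - np.linalg.solve(Jr, fi)
    return v

x, y = sp.symbols("x y")
root = sistema_newton(5 * (x + 1) ** 2 + (y + 1) ** 2 - 25, sp.exp(x) - y, [0, 2])
print(root)
```

Returning `v` instead of a bare `return` also means the final `print` shows the solution rather than `None`.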

plush zenith
#

hahahaha

#

of what all the things it could

#

be

#

it was the silliest one

#

hahahaahah

heady hatch
#

Oftentimes it's the tiny details.

plush zenith
#

i think i copy paste

#

the lambdify

#

and forgot about

#

i didnt replace

#

you helped me to see that the vector wasnt correct haha

#

i have no words

#

thanks a lot

#

and sorry for wasting your time with this

#

thank you very much

heady hatch
#

It's not a waste of time if we learn something.

#

Happy to help, hope you have a wonderful mathematical journey from now.

plush zenith
#

hahah

#

thanks i learnt a lot

#

i hope so!!!

#

Thanks again and have a good day!!

chrome orbit
#

can someone give me a brief explanation on enums?

fallow prism
#

key,value = enumerate(iter)

#

sorry, index*

#

index,value

chrome orbit
#

i mean more general then that

#

like i have a list of 4 bit enum's

#

so i can store a values in the index between 0-15

#

right?

hushed wasp
#

Hello, I would like to apply this line only to the numeric variables of my dataset. Can anyone help please?
(np.abs(stats.zscore(df)) < 3).all(axis=1)

velvet thorn
#

@hushed wasp df.select_dtypes(np.number)

#

key,value = enumerate(iter)
@fallow prism that's enumerate, not enum

#

two different things

#

can someone give me a brief explanation on enums?
@chrome orbit what do you want to do?

marsh chasm
#

@marsh chasm nice. thanks for telling me. gridsearch does the validation curve stuff for the whole set of parameters and plots with correct axis labels for the parameter set? sounds way easier.
@remote valley it doesnt actually do the validation curve stuff it just stores the validation score and training score both (which is what i needed)

oblique vine
#

Does amount of data I feed into keras model affect the speed of this model?
I mean, I want the model to predict ASAP, and I don't know if I should decrease the amount of data slightly, to get faster predictions

#
  • does it affect size of model?
lapis sequoia
#

@oblique vine are you trying to develop a neural network?

cerulean spindle
#

It does not affect the size of the model. The number of layers in the model is specified by you. If your model already performs well, you should decrease the amount of data slightly. You could also use a regularization parameter like L2 or L1 which will "ignore" some features. Be careful about the model not converging @oblique vine

chrome orbit
#

@velvet thorn can i pm u

velvet thorn
#

@velvet thorn can i pm u
@chrome orbit nope, sorry

#

It does not affect the size of the model. The number of layers in the model is specified by you. If your model already performs well, you should decrease the amount of data slightly. You could also use a regularization parameter like L2 or L1 which will "ignore" some features. Be careful about the model not converging @oblique vine
@cerulean spindle L2 doesn’t perform feature selection
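The difference is easiest to see on a linear model. A minimal sketch, assuming scikit-learn's `Lasso` (L1 penalty) and `Ridge` (L2 penalty) on synthetic data where only the first of five features matters; the data and `alpha` values are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first feature drives the target; the other four are pure noise.
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=0.1).fit(X, y)   # L2 penalty

# L1 can push coefficients to exactly zero; L2 only shrinks them.
print("L1 zero coefficients:", int(np.sum(lasso.coef_ == 0)))
print("L2 zero coefficients:", int(np.sum(ridge.coef_ == 0)))
```

The same idea carries over to neural network weights: an L1 penalty encourages sparsity, while an L2 penalty keeps all weights small but nonzero.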

cerulean spindle
#

oh my bad

chrome orbit
#

@velvet thorn ok i have a 4 bit message, enum, and just want to know how reference the data?

velvet thorn
#

@velvet thorn ok i have a 4 bit message, enum, and just want to know how reference the data?
@chrome orbit that's still not clear TBH

#

how does the enum relate to the message

#

what do you mean "reference the data"

#

or are you saying the 4 bits are the enum

#

how about you give an example

chrome orbit
#

0: System - Power Saving
1: System - ON, no hand detected
2: System - ON, hand detected
3: System - ERROR
4-15: Not used

velvet thorn
#

okay, and what do you want to do with this

chrome orbit
#

so for example, if i have System - ON, no hand detected, the value would be 0010?

velvet thorn
#

so for example, if i have System - ON, no hand detected, the value would be 0010?
@chrome orbit ...in bits?

#

yes, but are you trying to convert a string to a number or what

chrome orbit
#

no im just trying to read the message on the CAN bus
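As a rough illustration of the table above: a 4-bit field holds values 0 to 15, so state 1 ("ON, no hand detected") is `0001` in bits and `0010` would be state 2. A minimal sketch using Python's `enum.IntEnum`; the class name, member names, and the assumption that the state sits in the low 4 bits of a byte are all hypothetical, since the real position in the CAN frame is not given:

```python
from enum import IntEnum

class SystemState(IntEnum):
    POWER_SAVING = 0
    ON_NO_HAND = 1
    ON_HAND_DETECTED = 2
    ERROR = 3
    # Values 4-15 are unused, so they are deliberately not defined.

def decode_state(raw_byte: int) -> SystemState:
    """Mask out the low 4 bits of a byte and map them to a state."""
    return SystemState(raw_byte & 0x0F)

print(format(SystemState.ON_NO_HAND.value, "04b"))
print(decode_state(0b0010).name)
```

Passing one of the unused values (4 to 15) to `SystemState(...)` raises `ValueError`, which is a convenient way to notice unexpected messages.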

velvet thorn
#

...so is that a microcontroller question or what

teal vine
#

Hello! How would I go about calling the 'i' in a loop? I am trying to create a series of pandas dataframes by splitting apart a big dataframe

for i in range(13):
    dfi = df[df.index < 60*i]

I want to make it so dfi is actually changing with the i of the loop so I would end up with 13 dataframes

#

It really feels like I should be able to call that i somehow but I don't know how

hasty grail
#
dfs = []
for i in range(13):
    dfs.append(df[df.index < 60*i])
#

or, using a list comprehension,

dfs = [df[df.index < 60*i] for i in range(13)]
teal vine
#

Thank you ❤️

fierce shadow
#

hey can anyone tell me is deep q network able to solve gym's mountain car environment?

#

I created a dqn which beats the cartpole env, but it fails on mountain-car env

#

I am using just a simple dqn without any target networks or anything

#

if you answer ping me

uneven wren
#

How can I get specific data from a website and then put it into my program?

mint olive
#

web scraping

hushed wasp
#

@velvet thorn Thanks gm but I don't know how to apply this line on the numeric variables : (np.abs(stats.zscore(df)) < 3).all(axis=1)

df.select_dtypes(np.number) and (np.abs(stats.zscore(df)) < 3).all(axis=1) doesn't work together

dawn basin
#

Does anybody here know anything about automated stock trading?

velvet thorn
#

@velvet thorn Thanks gm but I don't know how to apply this line on the numeric variables : (np.abs(stats.zscore(df)) < 3).all(axis=1)

df.select_dtypes(np.number) and (np.abs(stats.zscore(df)) < 3).all(axis=1) doesn't work together
@hushed wasp do you understand what that line does...?

hushed wasp
#

I want to get rid of the outliers, but I can only apply it to the numeric variables

velvet thorn
#

no

#

I mean, from a coding point of view

hushed wasp
#

not sure I understand your point

velvet thorn
#

okay

#

a different question

#

not sure I understand your point
@hushed wasp why are you using numpy and scipy.stats there?

#

do you understand?

#

in the sense that

#

you can do that

#

entirely within pandas

#

without referencing numpy or scipy directly

#

and I think

#

if you understand that

#

you will see how to solve the problem.

hushed wasp
#

it's the code I found to be able to get rid of the outliers on the web using the zscore

velvet thorn
#

yeah.

#

that's what I thought

#

I think you need to understand pandas fundamentals first

#

before doing such things

#

your foundation is the most important

hushed wasp
#

sure but I don't see why I can't use numpy and scipy here

velvet thorn
#

you can

#

but

#

it would actually be really simple to combine what I gave you

#

and what you have.

#

so if you don't know how, that suggests that you are doing something that is a bit too advanced for you

#

which is fine, that's how we learn

#

but again...can you write that piece of code purely using pandas?

#

because if you can, then how to integrate what I gave you will be clear.

#

(np.abs(stats.zscore(df)) < 3).all(axis=1) this.

#

actually, just the condition.

hushed wasp
#

it raises me an error if I apply it on the whole dataset that's why I try to apply only on numerical data

velvet thorn
#

yes.

#

I understand that.
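One way the two pieces might be combined, written purely in pandas. This is a sketch on a made-up DataFrame (the column names and values are hypothetical); it computes the z-score with `ddof=0`, which matches the default of `scipy.stats.zscore`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": [f"row{i}" for i in range(20)],   # non-numeric column
    "value": [1.0] * 19 + [100.0],            # one obvious outlier
    "other": np.arange(20, dtype=float),
})

# Compute the condition on the numeric columns only...
numeric = df.select_dtypes(include=np.number)
z = (numeric - numeric.mean()) / numeric.std(ddof=0)
mask = (z.abs() < 3).all(axis=1)

# ...then use it to filter the full DataFrame, text columns included.
filtered = df[mask]
print(len(df), len(filtered))
```

The key point is that the boolean mask is built from the numeric subset but indexes the original frame, so no columns are lost.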

teal vine
#

does np.argmax() have issues parsing nan observations? I have a list of lists and the lists towards the start of my bigger list have a lot of nans

velvet thorn
#

does np.argmax() have issues parsing nan observations? I have a list of lists and the lists towards the start of my bigger list have a lot of nans
@teal vine what kind of issues are you thinking of

teal vine
#

Well the first say 30 lists end up returning 0 for np.argmax() or nan for np.max() when there are actual floats in there that should be returned. As the lists start having less nans the 2 functions start returning the actual max

velvet thorn
#

Well the first say 30 lists end up returning 0 for np.argmax() or nan for np.max() when there are actual floats in there that should be returned. As the lists start having less nans the 2 functions start returning the actual max
@teal vine yup.

#

that's because of how comparisons work with nan

teal vine
#

Oof

velvet thorn
#

!e

import numpy as np

nan = np.nan
print(np.max([0, np.nan]))
print(np.min([0, np.nan]))
#

see?

arctic wedgeBOT
#

@velvet thorn :white_check_mark: Your eval job has completed with return code 0.

001 | nan
002 | nan
teal vine
#

Yeah that

#

Any way to work around that?

velvet thorn
#

however, there is a body of functions that ignore nans

#

that would be the subject of a Google search 😉
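For reference, the NaN-ignoring counterparts in NumPy are `np.nanmax` and `np.nanargmax`. A minimal sketch on a made-up array:

```python
import numpy as np

values = np.array([np.nan, 2.0, 7.5, np.nan, 3.0])

# The plain versions propagate nan.
plain_max = np.max(values)
plain_argmax = np.argmax(values)

# The nan-aware versions skip nan entries.
nan_max = np.nanmax(values)
nan_argmax = np.nanargmax(values)

print(plain_max, plain_argmax)
print(nan_max, nan_argmax)
```

Note that `np.nanargmax` raises `ValueError` when a slice contains only NaNs, so rows that are entirely NaN need separate handling.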

teal vine
#

Well at least I know its not an error in how my data is set up

#

Thank you

velvet thorn
#

yw

teal vine
#

Why are argmax and max not deprecated when nanargmax exists? Seems like the only use would be detecting if there's a nan in your data but there's other functions for that

#

Am I missing something you could use argmax for over nanargmax?

velvet thorn
#

Why are argmax and max not deprecated when nanargmax exists? Seems like the only use would be detecting if there's a nan in your data but there's other functions for that
@teal vine efficiency.

teal vine
#

Gotta save those microseconds. But yeah that makes sense thank you again.

#

You're very quick

iron shell
#

Hello everyone, I'm trying to make a real-time dataset that came from a MySQL database, to be consumed on powerbi, so I'm using apache Kafka, and I'm asking myself if that is the best way to do that?

odd tendon
#

Can anybody recommend any paid training on data science/ML?

crude marsh
#

Guys, I need help. Anyone up

#

?

#

Anyone?

bronze barn
#

Anybody have any suggestions on how to cluster this? Granted there's some noise visible there looks to be at least four clusters. I've tried DBSCAN and Gaussian mixtures which don't seem to work well.

crude marsh
#

@bronze barn You look experienced. I have a small problem. Can you please help me?

#

This code fetches year-old data. I need it to take the latest data. How do I do this

#

@bronze barn ?

bronze barn
#

I'm not familiar with quandl, is that a web scraping package or do you have the raw data for the latest stock prices?

crude marsh
#

It's a web scraping package

#

Okay. Thanks.

bronze barn
#

How much data does it give you, a month's worth?

#

Aside from that I can't be much help to you then but I would guess that the issue has to do with calling quandl so I'd read up on their documentation, maybe there's an argument you can pass for that. Good luck though!

#

@crude marsh

hoary sluice
teal vine
#

When you finally get your code to run but it runs 3 times as slow as you thought it would

#

Turns out that iterating over lists of DataFrames was not a good idea, who knew

hollow sentinel
#

Linear regression models the output, or target variable 𝑦 ∈ ℝ, as a linear combination of the 𝑃-dimensional input x ∈ ℝ^𝑃. Let X be the 𝑁 × 𝑃 matrix with each row an input vector (with a 1 in the first position), and similarly let y be the 𝑁-dimensional vector of outputs in the training set. The linear model will predict y given x using the parameter vector, or weight vector, w ∈ ℝ^𝑃 according to

#

lol what

#

can someone translate into english
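In plain terms, the passage says that each prediction is a weighted sum of the inputs plus an intercept. The quote cuts off before its formula, so the sketch below assumes the standard form, prediction = X @ w; the numbers and weights are made up for illustration:

```python
import numpy as np

# N = 3 samples, P = 3 columns: a leading 1 (for the intercept) plus two inputs.
X = np.array([[1.0, 2.0, 0.5],
              [1.0, 0.0, 1.5],
              [1.0, 4.0, 1.0]])

# Hypothetical weight vector: intercept first, then one weight per input.
w = np.array([0.5, 2.0, -1.0])

# One prediction per row: 0.5 * 1 + 2.0 * input1 - 1.0 * input2
y_pred = X @ w
print(y_pred)
```

Fitting a linear regression means choosing `w` so that these predictions land as close as possible to the known outputs in the training set.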

#
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn.linear_model as lm
import sklearn.metrics as metrics
%matplotlib inline
# Fit Ordinary Least Squares: OLS
csv = pd.read_csv('https://raw.githubusercontent.com/neurospin/pystatsml/master/datasets/Advertising.csv', index_col=0)
X = csv[['TV', 'Radio']]
y = csv['Sales']
lr = lm.LinearRegression().fit(X, y)
y_pred = lr.predict(X)
print("R-squared =", metrics.r2_score(y, y_pred))
print("Coefficients =", lr.coef_)
# Plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(csv['TV'], csv['Radio'], csv['Sales'], c='r', marker='o')
xx1, xx2 = np.meshgrid(
    np.linspace(csv['TV'].min(), csv['TV'].max(), num=10),
    np.linspace(csv['Radio'].min(), csv['Radio'].max(), num=10))
XX = np.column_stack([xx1.ravel(), xx2.ravel()])
yy = lr.predict(XX)
ax.plot_surface(xx1, xx2, yy.reshape(xx1.shape), color='None')
ax.set_xlabel('TV')
ax.set_ylabel('Radio')
_ = ax.set_zlabel('Sales')
#

these people don't even use train_test_split

teal vine
#

Who are 'these people' ?

hollow sentinel
#

the author of the book

#

Statistics and Machine Learning in Python by Edouard Duchesnay, Tommy Löfstedt, Feki Younes

teal vine
#

seems pretty sketchy to not use train and test splits but I'm not an expert on the topic.

hollow sentinel
#

yeah it's sus

#

I just found a PDF of the book online

teal vine
#

I used "An Introduction to Statistical Learning" by Gareth James et al. when I was doing some ML stuff

#

It's focused on R but the concepts and explanations are pretty solid

hollow sentinel
#

oh yeah I have that too

teal vine
#

I found most python implementations are just a google search away sooo I just focus on concepts

hollow sentinel
#

it's just that these books put me to sleep

#

there's also data science from scratch by O'Reilly

#

O Reilly has another book about ML

#

hands on machine learning

teal vine
#

I haven't really looked that deeply into it

#

Just did some exercises that involved classifying defaulters, its interesting enough.

analog geode
#

I guess this is a better place to post the question

serene scaffold
#

I have to classify some DNA/RNA/etc sequences, and I have to come up with the features myself. I figure if I can find some of the longest substrings that appear frequently among all the sequences, the presence of that substring might be a good feature.

#

Not sure how to do "top k longest most common substrings"

heady hatch
serene scaffold
#

@heady hatch not quite, because there aren't going to be any reasonably long substrings shared between all sequences

#

I'm thinking "what's the most common n-character substring in the set of strings" even if it's not a substring of all the strings.
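The "most common n-character substring" idea can be sketched with a counter over fixed-length substrings (often called k-mers or n-grams). The function name, the toy sequences, and the choice to count each substring at most once per sequence are all assumptions for illustration:

```python
from collections import Counter

def top_kmers(sequences, n, k):
    """Return the k most common length-n substrings, counted once per sequence."""
    counts = Counter()
    for seq in sequences:
        # A set, so a substring repeated inside one sequence counts only once.
        kmers = {seq[i:i + n] for i in range(len(seq) - n + 1)}
        counts.update(kmers)
    return counts.most_common(k)

seqs = ["ACGTACGT", "TTACGTAA", "GGGACGTC", "CCCCCCCC"]
print(top_kmers(seqs, n=4, k=1))
```

Running this for several values of `n` and keeping the substrings that appear in a large share of the sequences gives candidate presence/absence features.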

heady hatch
#

@serene scaffold I was thinking about your problem, and would bow work here?

From my understanding of DNA (and maybe RNA), they’re in sequences of 4, right?
Naive feature would be the count of different sequences of 4.

#

Otherwise, you can look for the longest substring shared, and keep decreasing it to create more features.

median dove
#

Hey. I’m new to Deep Learning and I was following Sentdex’s video on the handwritten digit recognition with Keras. Great video. Just a question, how did he know how many layers he should use for the neural network? And how many neurons should he put in each layer?

#

Please ping me if you can help 👍

#

Not only in that example but in any - how do I know how many layers should I use on a neural network and how many neurons should I use in each one?

heady hatch
#

@median dove

I hope others will provide input as well.

In terms of how many layers and units per model, that's where research comes in. Often built on other research.

I'm not familiar with shallow NN, but in terms of deep NN it's usually some kind of architecture that was experimented and found to be working then scale from there.

molten hamlet
heady hatch
#

The first and last layer is determined by your input and output. The hidden layers are meant to capture relationship between the input and the output.

In computer vision, depending on the problem, the hidden layers do so by extracting features which can be used to calculate the logits.

Or have its features extracted to have it decoded into something else.

#

The number of units and layers can also be set as a hyperparameter. To answer your question of how many you should use, experiment and see what works for the problem you're trying to solve.

median dove
#

Awesome, thank you very much @heady hatch

austere swift
#

@median dove I'm pretty late to this answer, but hyperparameter tuning is arguably the hardest part about making neural networks, since there is no way to "calculate" exactly how many neurons to put in each layer, what kind of layers to put, how many layers to put, etc. It's all experimentation. There are automated hyperparameter tuning methods that are arising, such as HyperBand, Bayesian Optimization, Grid Search (the most basic of them), which basically test out different parameters and see how they do, but even then it takes a while to find the optimal set of parameters. Different people will tell you different ways of how to go about it, like scaling up from a simple model, scaling down from a large model, using "blocks" of layers (basically the same set of layers that you repeat a few times), but like nine said it's usually best to start out with something that was experimentally proven to work on some problem, and then to adapt it to yours and make changes accordingly.

median dove
#

Thank you @austere swift 😊

haughty hinge
#

Hey! Anyone free to help?

#

i just wanted to ask how to make a scatter plot that has budget, revenue and profits columns from the dataset and shows how they change over time. i.e. years on the bottom axis

#

assuming a cleaned dataset i have

velvet thorn
#

i just wanted to ask how to make a scatter plot that has budget, revenue and profits columns from the dataset and shows how they change over time. i.e. years on the bottom axis
@haughty hinge are you sure you wouldn't like a barplot or a line plot

haughty hinge
#

was thinking a scatterplot with each of those mentioned in different colours

#

but a line plot makes sense

hollow gull
#

@haughty hinge that is something I could help you with if you are still in need of help.

haughty hinge
#

@hollow gull Thanks 🙂 im trying it myself after reading stuff online ill message on the channel if anything

hollow gull
#

okay, I will start putting an answer together and you can ignore it if you want 🙂

haughty hinge
#

legend!!

hollow gull
#

make sure your index is the date column for this to work: df.index = df['date']

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
columnname_y = 'budget'
df.plot(y=columnname_y, label=columnname_y, ax=ax)
columnname_y = 'revenue'
df.plot(y=columnname_y, label=columnname_y, ax=ax)
columnname_y = 'profits'
df.plot(y=columnname_y, label=columnname_y, ax=ax)

fig.legend()
ax.grid()
fig.show()

#

I made a lot of assumptions about your dataframe, but maybe that is enough to push you forward a bit.

#

You don't have to make the date column your index, but if you do pandas does a pretty good job of making your life easier from my point of view.

mortal pendant
#

Hey! I have a dictionary where the key is a string which I would like an AI to predict based on another string. The value is a list of strings as 'examples' of what should point my AI to to the key as a result. I want to be able to provide an AI a string similar to one that would be found as a value in the dictionary and it to return the probabilities of it being a part of each key. I'm very new to machine learning so I'm unsure which AI method (RNN, GAN, Q-learning) to use for this, but do wish to have a good go by myself. Could someone please point me in the right direction for this project? I assume not RNN since it's not a step by step process- it's just a single calculation- and I assume not a GAN since I am not wanting to generate values, so I assume Q-learning (the one that is most confusing to me, sadly) which I feel would make sense since this is something where it is given a question and it needs to determine the correct answer and be rewarded, which is mostly all I know about Q-learning.

haughty hinge
#

@hollow gull thanks alot !!

heady hatch
#

@mortal pendant Could you give us an example of what the keys and values are like? How long are the strings?

agile wing
#

bow chika bow bow

cerulean spindle
#

Can anyone tell me the relationship between all ML algorithms (SVM, Neural Net...)? I vaguely remember that all of them sorta acted like a neural net, but I'm not quite sure. Can anyone confirm this?

agile wing
#

neuralnet is like... a huge nest of log regs imo

mental timber
#

Hi, I got a question. I'm using a dataset in my python code which shows symptoms of diabetes, they're in true and false meaning some people have them and some do not. I am making a yes or no type survey and wanted to know how I can compare the answers to the dataset so it gives me a percentage of similarity. For example there are 14 questions, if more than half is given the answer yes, then I want it to compare the "yes" or "true" to the dataset and give me a percentage of similarity. Sorry if this is a very specific question. I have been researching this and currently been stuck on it for a while now

cerulean spindle
#

Usually for those types of datasets, you'd actually want some type of data (ex: measuring blood levels) instead of yes or no questions. If so, I think you'd have to reformat your data so that it can understand these yes or no questions (the datapoints only values can be 0 or 1)

#

I think you should reformat your data though if you want to stick to the survey

mental timber
#

Yes I have reformatted them in my python code to identify "true" and "false"

cerulean spindle
#

The way you're structuring it, "if more than half is given the answer yes, then I want it to compare the 'yes' or 'true' to the dataset and give me a percentage of similarity", I would approach this by using this survey as a sample and feeding the sample into an ML model. If it's an sklearn classifier, it should support predict_proba() (and some models also have decision_function()). I don't remember entirely how they differ, I just know that predict_proba() gives you a probability for how likely it is for a sample to belong to each class. For example, an output of [0.25, 0.50, 0.25] means there is a 25% chance it is in class1, 50% in class2...

mental timber
#

Ah ok, that's very helpful thanks

cerulean spindle
#

yeah sure. from what I remember, predict_proba() gives you the probability of each class as a number between 0 and 1 (multiply by 100 if you want %), while decision_function() gives raw scores that aren't probabilities, so I'd go with predict_proba() here
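
A minimal sketch of what that could look like, assuming a scikit-learn RandomForestClassifier and made-up yes/no data. The arrays and the `survey_answers` name are placeholders for illustration, not the real diabetes dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up training data: each row is one person, each column a yes/no symptom (1 = yes).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 14))
# Made-up label: "positive" when more than half of the symptoms are present.
y = (X.sum(axis=1) > 7).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# One filled-in survey with 14 yes/no answers (hypothetical).
survey_answers = np.array([[1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1]])

# Shape (1, 2): probability of class 0 and of class 1 for this survey.
proba = model.predict_proba(survey_answers)
percent_positive = 100 * proba[0, 1]
print(model.classes_, proba, f"{percent_positive:.1f}%")
```

The columns of `proba` follow the order of `model.classes_`, so it's worth printing that once to check which column is which.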

mental timber
#

Gotcha ty

crude marsh
#

Guys, anyone uses ML to train their data-science algorithm?

#

Guys? Is anyone there

fierce shadow
#

what kind of algorithm?

#

you can go for neural networks as they are universal function approximators

heady hatch
#

@crude marsh

What do you mean by ML to train data science algorithms?

What's your definition of ML and what's your definition of data science along with algorithms?

fierce shadow
#

he might mean K-means clustering, binary tree classification stuff..

#

although I am not into those

heady hatch
#

Ahh as in those are the algorithms?

mental timber
#

My Random Forest Algorithm got a perfect prediction out of 30 tries, is this wrong?

heady hatch
#

Depends. What did it get a perfect prediction on?

mental timber
#

it perfectly predicted 520 people with diabetes

#

of which it had 16 different symptoms

heady hatch
#

Right so did you just train it on the data and predicted it on what it trained on?

mental timber
#

70% training 30% Test data. It also used bootstrapping

heady hatch
#

Sounds like you have a great model.

#

When you say 30 tries, do you mean you trained it on the training set and predicted on the test 30 times?

#

Or did you mean something else?

mental timber
#

Ye 30 tries

heady hatch
#

Try cross validation.

#

But since you're using rf, you can also check oob errors.

#

Could you clarify what you mean by you used bootstrapping?

#

So split your data, train test split.

do cv on your training data. See how it goes, then train it on your training data and test it on your testing data if your cv looks okay.

mental timber
#

bootstrap : bool, default=True

Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.
heady hatch
#

Ahh okay.

#

But yea do your cross validation.

mental timber
#

Thats how i learned it atleast xd

heady hatch
#

Grab your dataset, split your data into training and testing.

#

Then do your cross validation on your training data.

mental timber
#

I'll give it a shot. Thanks man.

heady hatch
#

Yup. See if it still gives it the same result as your previous findings.

mental timber
#

gotcha

#

hmm, sorry to ask, how would I do that? Only recently began this project and this whole thing is quite new to me. xD researching confuses me a lot since theres like 10 different examples and idk what they all do

heady hatch
#

Oh yea no worries.

#

So

#
```python
from sklearn.model_selection import cross_val_score

cross_val_score(model, x, y)
```
#

Because there are more arguments.

mental timber
#

Ah ok. Ill read through it ty

heady hatch
#

Your x and y should be your training set.

#

So you can leave your testing set out as hold out.
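
Roughly, the whole flow could look like the sketch below. It assumes a RandomForestClassifier and uses synthetic data from make_classification as a stand-in for the real 520-row, 16-symptom dataset, so the numbers it prints mean nothing for your project:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the real data: 520 rows, 16 features.
X, y = make_classification(n_samples=520, n_features=16, random_state=0)

# Hold out 30% as a test set that is not touched during cross validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# oob_score=True also records the out-of-bag accuracy when the forest is fitted.
model = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)

# 5-fold cross validation on the training set only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("CV accuracy per fold:", cv_scores)

# If the CV scores look reasonable, fit on the full training set and
# evaluate once on the held-out test set.
model.fit(X_train, y_train)
print("OOB accuracy:", model.oob_score_)
test_accuracy = model.score(X_test, y_test)
print("Test accuracy:", test_accuracy)
```

If the per-fold scores vary a lot, or sit well below the earlier "perfect" result, that's a hint the earlier number was too optimistic.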

mental timber
#

Thanks!

indigo steppe
#

Hi guys,i have never tried machine learning and whenever i look into it it seems very very complicated.I guess my foundations of python are not enough yet but the jump towards ml still seems overwhelming and i guess i will need 1-2 years with my current tempo.But today i found out about the libra library that creates neural networks and other stuff in 1-2 lines of code.I am pretty sure that pros are laughing at the abilities of libra but would you consider it to be a good starting point for someone who struggles with python and want to learn a bit about ml,especially for trading (i am aware that for trading it needs more but for experimenting purposes)?

heady hatch
#

@indigo steppe

Depending on what your learning style is.

Being honest with you, Libra seems like they only cover the basic cases which can be done via automl as well.

If you want some direction moving forward, learning your Python foundation is useful whether you go into ds/ml or not.

You don't really need to master python to play with data science and machine learning, but you should be comfortable at least.

Libra might be able to get you started and allow you to play with small ml projects, but it doesn't seem to help you understand what the model is doing which I think might be a core part to working with them.

viscid crescent
#

Hi! I'm new to discord. How does one use this chat room to start a discussion. How does one track replies, threads...it looks confusing. Thank you for helping me understand!

indigo steppe
#

@heady hatch thx for your honest answer

viscid crescent
#

I'm attempting to create a notebook for exploratory data analysis. I have an outline for the project that I'd like to bounce off someone experienced. I hope this is the right way to ask for help!

indigo steppe
#

@heady hatch sorry for tagging you once again but i just found out that automl isn't a library but a subcategory of ml... you gave your opinion about libra and i respect that. would you still advise a beginner to use some automl libraries or would you advise jumping directly into the non automated stuff once i am "ready"?

heady hatch
#

When you say subcategory of ml, are you referring to something like this?

#

I mean not just the one that google made, but pretty much this whole section where ml is automated.

#

btw please don't worry about offending me in any way, my advice aren't gold or absolute truth. hahaha

indigo steppe
#

no need to offend you,i just learned about automl which i consider usefull information.even if you were a "douchebag",i learned something from that "douchebag" which is progress in my book.so thank you for that "douchebag" (pls i am just joking,you are helping me so thx).never heard about this cloudml stuff,i was reading about auto sklearn,TPOT and hyperopt and found out there are a few libraries that do automl so i am not sure if i should go play with them or if it is a waste of time for learning about ml in trading.i guess they won't help me make money but at least i could learn MAYBE something

#

oh,autokeras is one of them too

heady hatch
#

Again depending on your learning style. But ml is just as wide as it is deep.

Can you use ml without any understanding of how things work just via calling libraries? Sure. In fact automl was made for things like this. Where you can use ml solutions without knowing how anything works other than the libraries or the api themselves.

Depending on your future job responsibility, it's hard to implement a model if you cannot train, test, debug or monitor it.

And automl will probably help ease with that. I don't know the degree of customization they allow you. And for most problems, it seems like a cookie cutter solution would do.

#

If you want to jump as quickly into deep learning as soon as possible, learn your python first and you can try fastai's deep learning course.

it's very well made and they teach from a top down perspective.

#

Meaning that if you just want to stop at calling libraries, sure.

#

And if you want to go farther, then they offer that too.

#

Nonetheless, Python is still needed.

#

To answer your question, whether these ml solutions will help you learn actual ml, I don't know since I've never used them.

You can gauge it for yourself. Use the library and look up papers, conversation, and notebooks and see if you can understand what others are talking about after using them.

#

I was taught under the discipline of implementing things myself and take things apart.

And this is how I learn. People are different, maybe these solutions will help you like training wheels on a bike.

indigo steppe
#

have you gone through the course of fastai? i just read a bit about it and it sounds interesting. the thing that i am interested in is, is it only a course/tutorial or does it have assignments and "homework" too..? i went through many, many tutorials and tbh i forgot 80-90% of it and with time it will be even more i guess. so i just recently started with the foundations again so i can implement the tutorials knowledge into assignments and exercises. so does fastai offer some kind of exercises?

heady hatch
#

I didn’t take the in person class, but just watched the videos.

I don’t think there are exercises or homework for the online videos but they did release their notebooks for people to study from.

indigo steppe
#

thank you very much,i appreciate your time

mossy dragon
#

I'm looking a dataset for classification with p>=40 and n>max(500,p*5) if anyone happens to know of one please dm me!

lapis sequoia
#

yo man

#

wtf

#

ive never seen this error in my life

#
```
AssertionError: in user code:

    <ipython-input-10-f20008174a6b>:13 train_step  *
        disc_real_output = discriminator(target_images, training=True)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py:985 __call__  **
        outputs = call_fn(inputs, *args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py:386 call
        inputs, training=training, mask=mask)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/functional.py:517 _run_internal_graph
        assert x_id in tensor_dict, 'Could not compute output ' + str(x)

    AssertionError: Could not compute output Tensor("activation_27/Sigmoid:0", shape=(None, 1), dtype=float32)```
#

@cobalt jetty i tried the keras preprocessing thing seems to work fine in terms of using an input so thanks heaps for that suggestion man

#

but i cant seem to figure out this error lol

cobalt jetty
#

sounds like you have an issue with your 27th activation layer

#

maybe a shape issue

lapis sequoia
#

im not sure

#

maybe its because im trying to downsample from 32 to 28

#

does strides work with non integers

lapis sequoia
#

would anyone know how i could downsample the inputs to 28x28 using conv2d? or by another means?

cobalt jetty
#

strides=(1.1428,1.1428)

#

what are you trying to achieve here?

lapis sequoia
#

downsampling from 32,32,1, to 28,28,1

#

but i realised it doesnt work just now lol
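
Side note on the stride question: Conv2D strides have to be integers, so something like 1.1428 can't work. A rough sketch of the size arithmetic instead, assuming "valid" (no) padding. `conv_output_size` is a helper made up for this example, not a Keras function:

```python
def conv_output_size(input_size: int, kernel_size: int, stride: int = 1) -> int:
    """Output length along one axis for a convolution with no padding ("valid")."""
    return (input_size - kernel_size) // stride + 1


# 32 -> 28 with a 5x5 kernel and stride 1.
print(conv_output_size(32, kernel_size=5, stride=1))    # 28
# 256 -> 28 is also reachable, e.g. with a 13x13 kernel and stride 9.
print(conv_output_size(256, kernel_size=13, stride=9))  # 28
```

Under those assumptions, a layer like Conv2D(filters, kernel_size=5, strides=1, padding='valid') would take a 32x32 input down to 28x28.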

#

ill send my discriminator cos thats where the issue is happening

#

one sec

#

x = Dense(1,activation='sigmoid')(x) seems 2 be the issue

mortal pendant
#

@mortal pendant Could you give us an example of what the keys and values are like? How long are the strings?
@heady hatch I’m on mobile so can’t get an example but hope this helps: there are 5 6-12 character keys and the values all contain atleast 300 strings, where these strings’ lengths vary a lot- anywhere from 1 character to 2000 characters. Most of the time, they’re around 300 characters though. The dataset is flexible for the vocab, so if the AI would train better with just alphanumeric characters then it would be fine to just filter it down to that, but it can also contain a wide range of symbols including emojis if that would help. I can also filter out strings that are too long or too short if that would help, though this isn’t as flexible and would greatly decrease the amount of data

lapis sequoia
#

Would you guys help me running opencv functions in GPU

mortal pendant
#

btw for my issue it’s also worth noting I’m using Google Colab if that makes a difference

lapis sequoia
#

ok i fixed my activation problem the issue was me concatenating my inputs but i dont think i actually need to do that

#

but im having compatibility issues getting my input and the mnist dataset to work, 256x256 images wont work

#

@cobalt jetty sorry to bother you again, do you think i should change my images to like 224x224 and then progressively downsample to 28x28

#

using keras preprocessing

cobalt jetty
#

your issue, I think is that you're using a sigmoid activation function, which is used for binary classification. It won't help to get a matrix output imo.

#

Look into softmax activation

lapis sequoia
#

oh nah i figured that out it should be all good

#

im getting this issue because im trying to downsample a 256x256 image and get a 28x28 but i cant do that mathematically

#

a new issue i mean

cobalt jetty
#

downsampling is just a tool, you should instead look into how your shapes change across the learning process, maybe add a 784 dense layer at the end where each neuron is a grayscale value output

lapis sequoia
#

WARNING:tensorflow:Model was constructed with shape (None, 28, 28, 1) for input Tensor("input_2:0", shape=(None, 28, 28, 1), dtype=float32), but it was called on an input with incompatible shape (1, 32, 32, 1).

cobalt jetty
#

so you can reconstruct your 28x28 picture.

#

yeah, you're feeding a wrongly shaped input into one of your layers

lapis sequoia
#

yeah thats what i mean

#

so in my generator model do you think i should add a dense layer that then connects to another layer to output a 28x28 image

cobalt jetty
#

You could flatten your output labels (the MNIST image matrices), e.g. with .reshape(-1, 784), so each one is represented as a 1d vector. (.squeeze() only drops size-1 axes, it won't turn 28x28 into 784.) Each point would represent a target for your neural network, so you don't have to worry about the end shape of the model beyond that it's a 784-element vector.

#

afterward you can just create a function that converts that 784-element vector into a 28x28 matrix.
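
Something like this, as a sketch with NumPy. The `images` array is a made-up stand-in for a batch of MNIST images and `to_image` is just a name picked for the example:

```python
import numpy as np

# Stand-in for a batch of 3 grayscale MNIST-sized images, shape (3, 28, 28).
images = np.arange(3 * 28 * 28, dtype=np.float32).reshape(3, 28, 28)

# Flatten each image to a 784-element vector, shape (3, 784).
flat = images.reshape(len(images), -1)


def to_image(vector):
    """Turn one 784-element vector back into a 28x28 matrix."""
    return np.asarray(vector).reshape(28, 28)


restored = to_image(flat[0])
print(flat.shape, restored.shape)
```

Reshaping back and forth is lossless, so `restored` is identical to the first original image.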

#

iirc, a grayscale image is a single channel matrix compared to a RGB, so its shape for MNIST is technically 1x28x28, while if the picture was RGB, it would be 3x28x28 (due to the three channels R, G and B).

lapis sequoia
#

ye thats right

cobalt jetty
#

so, flatten your output targets to 1x784 and create an end layer with 784 neurons that should output a value between 0 and 255.

#

The difficulty of your model, imo, is that images have some margin of error when you think about their interpretability. It's not because one image of a "3" is slightly lighter that it will look like a 4. So thinking about how you will measure the quality of your model is something to think about too. Which is an interesting problem. Your model matching your input to the output with the exact hue or saturation isn't necessary for it to be valid.

cerulean spindle
#

You should also look into a dimensionality reduction technique like TruncatedSVD since MNIST data is 80% zeros. TruncatedSVD can squash those mostly-zero columns down to far fewer features so your model is faster

lapis sequoia
#

well yeah thats fine because its all grey values anyway

#

true that makes sense

cerulean spindle
#

If you train your model on data that’s (60000, 784) it’ll take three hours to train. I shrank the data to (6000, 149) and got the same results (finished in < 1 min). If time is a factor for you of course
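
For reference, a small sketch of the TruncatedSVD step with scikit-learn. The data here is random and only mimics the "mostly zeros" shape of flattened MNIST, and 50 components is an arbitrary number to tune, not the setting I used:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Random stand-in for flattened images: 500 rows, 784 features, roughly 80% zeros.
rng = np.random.default_rng(0)
X = rng.random((500, 784))
X[rng.random((500, 784)) < 0.8] = 0.0

# Project the 784 features onto 50 components.
svd = TruncatedSVD(n_components=50, random_state=0)
X_reduced = svd.fit_transform(X)

print(X_reduced.shape)
# Fraction of the original variance that the 50 components keep.
explained = svd.explained_variance_ratio_.sum()
print("explained variance kept:", explained)
```

On real data, the explained variance gives a rough idea of how much information is lost for a given number of components.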

cobalt jetty
#

MNIST data is 80% zeros
technically the MNIST data is made of roughly an even number of objects of each label (see picture). It's better to say that each object's features are mostly zeros, imo.

(6000, 149)
How long did it take to run the TruncatedSVD on the data?

vapid sorrel
#

Hi, I have some questions about the google colab, It is easy to run a Neural Network on colab with a TPU?

cobalt jetty
#

I've not used Colab often, but you have a TPU option for free. It doesn't mean it will be a lot of TPU of course.

#

But perf should be improved somewhat compared to a 'normal' CPU instance.

vapid sorrel
#

Do you have a link to understand how to implement the model to train with TPU?

cobalt jetty
#

tbh, if you're using Gcolab it should be seamless. It's basically a dropdown select in the options.

#

otherwise, look up the blogs Towards Data Science

#

there must be one looking over the concept.

vapid sorrel
#

I'll check thanks

#

I don't know if this is the right channel, but do you know if there is a server about data science for trading in python, algorithmic trading, etc?

cobalt jetty
#

I don't, sorry. You might want to hit up reddit and find related servers on specialized subreddit. Otherwise, look into data science applied to time series. That's what you want to look at.

hollow sentinel
#

hey guys

#

has anyone done this course on edX: Machine Learning with Python: from Linear Models to Deep Learning

#

it's an MIT course that uses python for machine learning

hollow sentinel
#

sike i saw the reviews for it it's pretty bad

mortal pendant
#

For my question, I've now finished writing some code that tries to just provide each key a score based on simply how many times each word used in the dataset occurs in each key in the dict, but it's only 30.0% accurate which is just 1.5x better than if it were to just choose completely at random since there are only 5 keys atm. This should also give more clarity about the format of the dataset https://colab.research.google.com/drive/1EuMSz-Dcgulphjs8bRv1XO_lr7M5Doko?usp=sharing
So, any ideas how I could get started improving this accuracy using AI? Thanks so much in advance!

hollow gull
#

@mellow saffron I think you are going to have to go into more detail about what the problem is for people to be able to help.

#

@re

any idea which one would be best suitable for this?
@mellow saffron Maybe someone else has better ideas than I do, but I am imagining that as a multi-step problem with a couple of challenging parts. I don't work much with images, so I am a little out of my depth. Building a CNN that is able to classify different plant types based on images might be a useful early step. Image segmentation might make the later tasks easier, but I haven't seen implementations of image segmentation outside of screenshots on aws sagemaker. Then there will also be the problem of identifying where to move based on the image, that seems more in line with the AWS DeepRacer that I posted. I have seen blogs where people showed code to solve toy problems that didn't have problems like tall weeds.

#

@mortal pendant This feels as much like a natural language processing (NLP) problem as a classification problem to me, but maybe I am missing the point. I am imagining using NLP to build features that you could then put into a logistic regression or other classification algorithm to obtain higher accuracy. It sounds sort of like the goal is to use ML in the solution, not necessarily to achieve the best accuracy by any method.
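
A minimal sketch of that idea, assuming scikit-learn, with TF-IDF features feeding a logistic regression. The `examples` dictionary is invented for illustration, and character n-grams are just one choice that may cope with symbols and emojis:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical dictionary: key = label, value = list of example strings.
examples = {
    "sports": ["the team won the match", "great goal in the final", "the coach praised the players"],
    "cooking": ["add salt to the soup", "bake the bread for an hour", "chop the onions finely"],
    "weather": ["heavy rain expected tomorrow", "sunny with a light breeze", "snow and ice on the roads"],
}

# Flatten the dictionary into parallel lists of texts and labels.
texts = [text for strings in examples.values() for text in strings]
labels = [key for key, strings in examples.items() for _ in strings]

# TF-IDF over character n-grams, then a logistic regression on those features.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

# One probability per key for a new string.
probabilities = model.predict_proba(["will it rain or snow today"])[0]
scores = dict(zip(model.classes_, probabilities))
print(scores)
```

With only a handful of examples the probabilities will be close to each other. With 300+ strings per key it should have more to work with, and holding out some of the data is how to check that.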

#

but i take it pulling images from that stream is what i'd do in the end anyways?
@mellow saffron It is the simple hacky solution instead of the more elegant possibly more complicated solution. I am not saying it is the better way, but it feels easier to implement.

hollow gull
#

Short answer: No, I have no idea. Longer answer: That seems like it would be really dependent on what your end solution ends up looking like. Image processing is pretty computationally expensive though, which I would imagine might be challenging for a raspberry pi. I would probably search for projects on google and see if anyone has done something similar with some reasonable degree of success.

#

Has anyone had trouble with Virtualenv inside of pycharm? I used to use conda environments, but it felt like Virtualenv was preferred (I don't remember what caused me to think that.) I feel like since I switched I am having many more issues with my virtual environments failing for whatever reason (not finding a package that used to be there). Anyone have any experience similar to this / suggestions on one type of environment over another?

heady hatch
#

@hollow gull
I can't say much on the pycharm part, but in terms of virtual environments, I used to use pipenv which managed the virtual environments and packages for me.

Then I went to python's native venv since I wanted to switch package manager.

hollow gull
#

@heady hatch can you provide any context on why you wanted to switch or why you switched to python's native venv?

heady hatch
#

@hollow gull
I think I just wanted to try different package managers.

It was either poetry or pip-tools.

poetry is like pipenv, comes with its own virtual environment manager. But pip-tools did not, which means I had to manage my own virtual env.

I ended up going with pip-tools because it's more robust plus easier to transition across projects.

#

Though to clarify, I don't use any IDE.

#

So I'm not quite sure how the interaction with IDE is over there.

#

with vscode, it's just setting the venv's python as the interpreter.

#

and all its packages and environment stuff will be there.

#

Being honest, people in my circle don't really use Virtualenv anymore. Just because there's a native solution or they use the package manager for it.

#

Often in the latter.

hollow gull
#

Cool, thanks for the info.
Yeah, it should be that easy inside an IDE as well, I assume I just didn't do a good job of cleaning things up before transferring or there is some other historical issue that is confusing some part of the process.

heady hatch
#

Ahh fair fair. Would package reinstall be suitable?

hollow gull
#

Yeah, I tried that with no luck, then I deleted the virtual environment and started over with no luck, then I downgraded the package that was causing issues to an earlier version which seems to have 'fixed' it. Maybe I am assigning the blame to virtualenv and pycharm when it really was an issue with the library and I didn't remember upgrading to the most recent/broken? version.

#

It isn't like it is some unsupported package, it was numpy.

heady hatch
#

Oh man.

hollow gull
#

But hey, it is working again, so back to working on things that are interesting.

heady hatch
#

Hey guys, side question that's kind of related to data science.

Let's say your team and you share a development server on the cloud. How do you guys manage your ssh keys?

or do you just have a server for each dev?

lapis sequoia
#

I would like to get a bit of assistance regarding anaconda.

hollow gull
#

@lapis sequoia what assistance do you need?

lapis sequoia
#

@hollow gull I've been having trouble with virtual environments lately, this is my 3rd day not overcoming the issues so I've decided to try anaconda, which I've been told is a lot friendlier with virtual environments. 1. when I create the environment on anaconda navigator then launch visual studio I don't get the README.md page, 2. if you look on vscode terminal top right says bash, shouldn't it say conda or something similar?

hollow gull
#

I haven't used visual studio, and it seems like visual studio knowledge is an important part of answering your questions.

I would expect there to be some indicator in your terminal that you are in a virtual environment.

lapis sequoia
#

for some reason the visual studio sites are not working.

#

they are loading now.

#

thanks

hollow gull
#

"""However, launching VS Code from a shell in which a certain Python environment is activated does not automatically activate that environment in the default Integrated Terminal.""" seems like it might be an issue for you depending on how you are trying to get into the conda environment.

#

Later on they give examples of how to select the interpreter. Maybe you are still in the base environment and that is why it isn't indicating an environment.

stray ivy
#

if you're in a bash terminal & activate a venv, you'll see the name of the root folder of your venv in the terminal, which signifies that your venv is activated

late jackal
#

anyone around here who is well versed in the Pandas package

velvet thorn
#

anyone around here who is well versed in the Pandas package
@late jackal don’t gatekeep, just ask

late jackal
#

not familiar with the term but thank you. i am trying to convert a column in minutes to datetime but i cant seem to find how to specify HH:MM:SS without including the month or days

velvet thorn
#

pd.to_datetime

late jackal
#

i have that but it seems like it wants like MM DD HH:MM:SS

velvet thorn
#

check the documentation

#

there’s a parameter that lets you change that

hollow gull
#

@late jackal Are you sure that makes sense, can you have a datetime = 'date' + 'time' without a date? You could use today's date information to build a datetime with whatever time you want to specify.

#

Even a timestamp I think will always have a date as part of it, I think. Otherwise all of the benefits that I see of datetimes go away. What is subtraction on time hours and minutes without knowing what days they correspond to?
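
A sketch of both options in pandas, assuming the column is called `minutes` (a made-up name). A bare duration fits a Timedelta, and an arbitrary date can be attached when an HH:MM:SS string is needed:

```python
import pandas as pd

# Hypothetical column of durations in minutes.
df = pd.DataFrame({"minutes": [90, 5, 1439]})

# A duration without a date is a Timedelta, not a datetime.
df["duration"] = pd.to_timedelta(df["minutes"], unit="min")

# For display as HH:MM:SS, attach the durations to an arbitrary date
# (only meaningful for durations under 24 hours).
anchor_date = pd.Timestamp("2000-01-01")
df["hh_mm_ss"] = (anchor_date + df["duration"]).dt.strftime("%H:%M:%S")
print(df)
```

The Timedelta column still supports arithmetic (sums, differences), while the string column is only for display.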

boreal summit
#

Mehn, I just tried to install kite on my PC based on someone's recommendation here. It couldn't install on my PC for some reason.

#

So sad. ☹️

#

I use a HP EliteBook 8440p

heady hatch
#

You'll just have to make do with vscode and such.

ruby wyvern
#

It will be great if anyone can help me with this question

lapis sequoia
#

Need some help with confusion matrices

#

pls lmk if anyone can help and ill send source code and dataset

austere swift
#

Whats your question

#

!paste @lapis sequoia

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

lapis sequoia
#
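```python
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

def draw_cm(actual, predicted):
    cm = confusion_matrix(actual, predicted)
    sns.heatmap(cm, annot=True, fmt='.2f', xticklabels=[0, 1], yticklabels=[0, 1])
    plt.ylabel("Predicted_Values")
    plt.xlabel("Actual_Values")
    plt.show()

draw_cm(y_predict, z['Observed Loan Status'])
```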
#

Error I'm getting: Classification metrics can't handle a mix of binary and unknown targets

austere swift
#

you need to convert your predictions into binary

#

with argmax

lapis sequoia
#

hmm

#

@austere swift in my notebook above, i dont think i am supposed to

austere swift
#

predicted is usually a float

lapis sequoia
#

my predicted is binary

#

i messed up the order

#

the y_predict is in binary

austere swift
#

and what about z['Observed Loan Status']

lapis sequoia
#

but the z["Observed Loan Status"] is outputing: LogisticRegression(fit_intercept=False, random...

#

its a logistic regression

#

idk why its not also in binary

austere swift
#

they both need to be binary for the confusion matrix to work lol
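
As a sketch of what I mean, with a tiny made-up dataset instead of the loan data: the fitted model object is not a set of predictions, so call predict() first and pass two label arrays to confusion_matrix:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Tiny made-up dataset: one feature, binary target.
X = np.array([[0.1], [0.4], [0.5], [0.9], [1.2], [1.5], [1.9], [2.3]])
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])

logreg = LogisticRegression(random_state=42)
logreg.fit(X, y_true)  # returns the fitted model object, not predictions

# Predictions are a separate step and give an array of 0s and 1s.
y_pred = logreg.predict(X)

# Both arguments are label arrays: true labels first, predictions second.
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

If a DataFrame column prints as LogisticRegression(...), the model itself got stored there instead of its predictions.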

lapis sequoia
#

so that means something's up with my logistic regression im assuming

#

@austere swift logreg = LogisticRegression(random_state=42,fit_intercept=False) logreg.fit(X_train, Y_train)

#

this is my logistic regression code but its working fine

#

it returns: LogisticRegression(fit_intercept=False, random_state=42)

#

so i dont know where im going wrong

lapis sequoia
#

fi = pd.DataFrame()
a = logreg.__dict__["coef_"]
#fi['Col'] = np.arange(0, 14)
fi['Coeff'] = a

fi.sort_values(by='Coeff',ascending=False)

fi

#

need some help with pandas dataframe functions

#

im getting an error that the data must be 1d

#

but i dont know how to proceed with that

#

any help would be appreciated

lapis sequoia
#

How to Get Help with Linear Algebra for Machine Learning? Linear algebra is a field of mathematics and an important pillar of the field of machine learning. It can be a challenging topic for beginners, or for practitioners who have not looked at the topic in decades. In this p...

#

☝️What do you think of these books? Should I read them all?

lapis sequoia
#

Yes You Should

#
```python
MSE = tf.keras.losses.MeanSquaredError()
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
alpha = tf.random_normal_initializer(mean=0.0, stddev=0.05, seed=None)
def discriminator_loss(disc_real_output, disc_fake_output):
    real_loss = cross_entropy(tf.ones_like(disc_real_output), disc_real_output)
    fake_loss = cross_entropy(tf.zeros_like(disc_fake_output), disc_fake_output)
    total_loss = real_loss + fake_loss
    return total_loss
def generator_loss(disc_fake_output, gen_output, target):
    total_loss = MSE(target, gen_output) + (alpha * cross_entropy(tf.ones_like(disc_fake_output), disc_fake_output))
    return total_loss``` 
do yall see anything wrong with my loss functions? i think it's trying to convert datasets to tensors again but i dont see where im making the mistake of putting a dataset as an input?
#

cos im getting the same TypeError: Failed to convert object of type <class 'tensorflow.python.data.ops.dataset_ops.BatchDataset'> to Tensor. Contents: <BatchDataset shapes: (None, 28, 28, 1), types: tf.float32>. Consider casting elements to a supported type.

#

i defined everything there in the train step

```python
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
      gen_output = generator(image_batch, training=True)

      disc_real_output = discriminator(target_images, training=True)
      disc_fake_output = discriminator(gen_output, training=True)

      gen_total_loss, gen_gan_loss, gen_l1_loss = generator_loss(disc_fake_output, gen_output, target_dataset)
      disc_loss = discriminator_loss(disc_real_output, disc_fake_output)

    generator_gradients = gen_tape.gradient(gen_total_loss,
                                          generator.trainable_variables)
    discriminator_gradients = disc_tape.gradient(disc_loss,
                                               discriminator.trainable_variables)```
#

and the error traceback saying the problem happening with mean squared error function

#

but i cant figure out what goin on

brave crest
#

How do I split my dataset three times instead of two? currently using the train_test_split

heady hatch
#

@brave crest use train_test_split again on the dataset you want.
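
For example, a sketch of a 60/20/20 train/validation/test split on dummy arrays (the proportions are just an assumption, pick what suits you):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data: 100 samples, 2 features.
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 2

# First split: 80% train+validation, 20% test.
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
# Second split: 25% of the remaining 80% becomes validation (20% overall).
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, random_state=0
)
print(len(X_train), len(X_val), len(X_test))
```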

brave crest
#

Oh! Thanks will do!

heady hatch
#

@lapis sequoia What's your gen_output and disc_output, both disc output?

lapis sequoia
#

no

#

gen output is the output of the generator

#

and outputs an image

#

discriminator is a network that is binary classifier, saying either true or false

heady hatch
#

I apologize, to clarify what are they outputting.

#

Are they output tensors, datasets, etc etc.

lapis sequoia
#

gen should output a tensor that can be read as an image

#

disc is just binary yes or no to see if the image is fake or not

heady hatch
#

Could you double check?

#

So we can trace down your problem.

lapis sequoia
#

ill post my model

heady hatch
#

This is tf2, right?

lapis sequoia
#

2.3

#

yea

#

generator is an image generator literally and discriminator is a binary classification network

#

its 2am for me so im gonna sleep

heady hatch
#

okie dokes.

lapis sequoia
#

if you find anything in the model that i havent seen that might be cooking the whole thing and messing it up, lmk ping me/dm me im fine with either ive been looking at this trying to figure out whats wrong for hours on end now and i just cant pick it

#

appreciate any and all help : )

#

Yes You Should
@lapis sequoia

Okay Thank you, It will take a few 2month

lapis sequoia
heady hatch
#

Why not all?

hollow gull
#

@glad mulch you can use pd.merge, df.merge, or df.join. join uses the index by default; with merge you pass the column you want to join on (which would probably be easier in your case). In your case it looks like you want to join on the ticker column.
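
A small sketch with two invented frames that share a `ticker` column. The frame names and values are made up, only the `ticker` key comes from your data:

```python
import pandas as pd

prices = pd.DataFrame({"ticker": ["AAPL", "MSFT", "GOOG"], "price": [150.0, 300.0, 2800.0]})
sectors = pd.DataFrame({"ticker": ["AAPL", "MSFT", "TSLA"], "sector": ["Tech", "Tech", "Auto"]})

# merge joins on a named column; the default is an inner join.
merged = prices.merge(sectors, on="ticker")

# join works on the index, so move ticker into the index first.
joined = prices.set_index("ticker").join(sectors.set_index("ticker"), how="inner")

print(merged)
print(joined)
```

Passing how="left" or how="outer" instead keeps the tickers that only appear in one of the frames.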

#

@lapis sequoia what is the type of logreg.__dict__["coef_"]?

lapis sequoia
#

doesn't one column need to be the outcome? @heady hatch

heady hatch
#

For clustering? What do you mean and what do you have in mind?

lapis sequoia
#

i think i will use k-means clustering, doesnt one column have to be the true label

#

so i was thinking maybe sorting the demand into high, medium, low for the true label @heady hatch

heady hatch
#

Clustering is usually an unsupervised task.

#

It produces labels for you.
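#

A minimal sketch of that point, assuming scikit-learn is installed and using a made-up one-column `demand` array (not the asker's data). No true label column is passed in; k-means assigns a cluster id to each row, and it is up to you to interpret the clusters as low/medium/high afterwards.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical demand values: three visibly separate groups.
demand = np.array([[5.0], [6.0], [5.5],
                   [50.0], [52.0], [49.0],
                   [100.0], [98.0], [103.0]])

# Only features go in; there is no "outcome" column.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(demand)

# The cluster ids are arbitrary integers, so look at the centers
# to decide which cluster means "low", "medium" or "high".
print(labels)
print(model.cluster_centers_.ravel())
```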

hollow gull
#

I don't think I understand exactly what you mean yet, can you elaborate further? To clarify what I mean, it seems like you are telling me how you want to do something but not what you want to do. If you can go into what you are trying to do, it might help me understand what you are asking.

desert oar
#

@glad mulch you want to iterate through unique values of the date index?

#

df.index.get_level_values('Date').unique() iirc

#

Or df.index.unique(level='Date') maybe
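#

Both spellings exist in pandas. A small sketch, assuming a two-level index named `Date` and `Ticker` (the real frame wasn't posted, so these names and values are made up):

```python
import pandas as pd

# Hypothetical frame with a (Date, Ticker) MultiIndex.
index = pd.MultiIndex.from_tuples(
    [("2020-01-01", "AAA"), ("2020-01-01", "BBB"), ("2020-01-02", "AAA")],
    names=["Date", "Ticker"],
)
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=index)

# Two equivalent ways to get the unique values of one index level.
dates_a = df.index.get_level_values("Date").unique()
dates_b = df.index.unique(level="Date")

# Iterate over the dates and pull out the rows for each one.
for date in dates_b:
    day = df.xs(date, level="Date")
    print(date, len(day))
```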

twin moth
#

Hey guys, I am currently trying to scrape a website for the sake of studying data science but I'm currently stuck on the following:

<ul>
<li>
<h3 class="match">
        John
        <span class="number">
            123
        </span>
</h3>
</li>
<li>
<h3 class="match">
        Bob
        <span class="number">
            619
        </span>
</h3>
</li>
<ul>
...

I am using BS4 and I want to run a for loop which will iterate over all those h3s and return the name and number.
I was trying to use the following syntax:

for name, number in current.find_all('h3').text.split():
  list.append(MyObject(name, number))

But unfortunately I cannot do that, since .text cannot be called on the return value of find_all. Any ideas on how to solve it in an elegant way?

desert oar
#

Loop over the output of find_all

#

Don't overthink it

twin moth
#

I did that, used a temp variable and stored the returned array, then I just took indices 0 and 1 and put them in that object

#

But want something more elegant

#

P.S. thanks for the quick response 🙂

desert oar
#
items = []
for elem in curr.find_all('h3'):
    name, num = elem.text.split(maxsplit=1)
    items.append(MyObject(name, num))
#

This is perfectly idiomatic python

#

That said there is a lot of inefficiency here

#

Let me get to a real computer and i will show you

twin moth
#

TYTY

desert oar
#

@twin moth the first problem is that find_all constructs a full list in memory, which is slow and wasteful if you're just iterating over it

#

ah you know what

#

beautifulsoup doesn't support lazy iteration

#

but you can write this as a list comprehension at least, which might or might not feel more elegant to you

#
items = [
    MyObject(*elem.text.split(maxsplit=1))
    for elem in curr.find_all('h3')
]
#

perhaps even more "fancy" is doing it like this:

def process_h3(elem):
    name, num = elem.text.split(maxsplit=1)
    return MyObject(name, num)

items = list(map(process_h3, curr.find_all('h3')))
twin moth
#

@desert oar That's nice!

desert oar
#

that lets you specify only specific parts of the page to parse, which could be faster

twin moth
#

Nah, I'm using a shit-ton of other things

#

@desert oar is the last line in that code supposed to be in the function?

#

Oh, nvm, just noticed that it's called from it

#

Okay, so I've tried it all but seems like it doesn't work

#

I get a myriad of errors when I try it

#

Either TypeError: 'NoneType' object is not callable or TypeError: __init__() takes 2 positional arguments but 3 were given

#

I think that I'd just go back to using the one way that works

        for li in soup.find('ul').find_all('li', class_=True, recursive=False):
            each = li.find('h3', class_='match')
            each = each.text.split()
            curr_poke.lister.append(MyObject(each[0], each[1]))
steep olive
#

Hey... Can somebody tell me how to load a file from my PC to Tensorflow?

twin moth
#

Oh, sorry, that's an older gen

        for h3 in soup.find('ul', class_='bla').find_all('h3', class_='match'):
            each = h3.text.split()
            obj.lister.append(MyObject(each[0], each[1]))

Thanks for the help @desert oar , if you have any other ideas on how to fix it I'd be delighted to hear
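#

A sketch of one way to skip the positional split, assuming the HTML looks like the snippet posted earlier and using plain tuples in place of MyObject (its constructor was never shown, so that part is left out). It reads the number from the span and the name from the h3's own text. As a general bs4 behaviour, calling a method that doesn't exist on a Tag (e.g. `h3.split()` instead of `h3.text.split()`) is treated as a child-tag lookup that returns None, which is one common source of "'NoneType' object is not callable". The other error suggests MyObject's `__init__` accepts only one argument besides self, but that is a guess without seeing the class.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the markup from the question.
html = """<ul>
<li><h3 class="match">
        John
        <span class="number">
            123
        </span>
</h3></li>
<li><h3 class="match">
        Bob
        <span class="number">
            619
        </span>
</h3></li>
</ul>"""

soup = BeautifulSoup(html, "html.parser")

pairs = []
for h3 in soup.find_all("h3", class_="match"):
    # Number: text of the nested span, whitespace stripped.
    number = h3.find("span", class_="number").get_text(strip=True)
    # Name: the first text node that belongs directly to the h3.
    name = h3.find(string=True, recursive=False).strip()
    pairs.append((name, number))

print(pairs)
```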

proud iron
twin moth
#

Guys, would someone please help me understand this page of the documentation of "matplotlib" please?
https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.bar.html

It regards the creating a bar chart. In which parameter do we place the height of each column? Where do we place the labels for which column?
@proud iron Never used it, but seems like there are examples in the bottom of the page, maybe you could use them

late jackal
#

i posted this yesterday and some very nice people gave suggestions but I had to go before I could try to troubleshoot it with them

        self.pframe1 = sql.ExcelWriter1()
        self.pframe1['Time'] = self.pframe1['Time'].astype(float)
        self.pframe1['Time'] = pd.to_datetime(self.pframe1['Time'], format='%H:%M:%S')
        print(self.pframe1)

I have this code where i am trying to take a column of elapsed time in minutes and convert it to HH:MM:SS

#

yet im given this error

Traceback (most recent call last):
  File "c:\Users\Nicks\Desktop\Deethanizer V4\HYSYSapp\source1\tankui.py", line 580, in onSaveData
    self.darray.writer(file_dir)
  File "c:\Users\Nicks\Desktop\Deethanizer V4\HYSYSapp\source1\tankui.py", line 802, in writer
    self.pframe1['Time'] = pd.to_datetime(self.pframe1['Time'], format='%H:%M:%S')
  File "C:\Users\Nicks\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 803, in to_datetime
    values = convert_listlike(arg._values, format)
  File "C:\Users\Nicks\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 454, in _convert_listlike_datetimes
    raise e
  File "C:\Users\Nicks\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 418, in _convert_listlike_datetimes
    arg, format, exact=exact, errors=errors
  File "pandas\_libs\tslibs\strptime.pyx", line 144, in pandas._libs.tslibs.strptime.array_strptime
ValueError: time data '25' does not match format '%H:%M:%S' (match)
#

it should already be all floats but i added that just in case there was some issue with it not being floats

proud iron
#

@twin moth cheers! 🙂

hollow gull
#

@twin moth I have frequently gotten errors like: TypeError: __init__() takes 2 positional arguments but 3 were given when I get confused and don't properly instantiate my objects. For example, if you are supposed to do MyObject().run() and you do MyObject.run(). I don't know if that is what is going on in your code, but error messages like that give me painful flashbacks.

#

@late jackal that seems like a really instructive error message. Your value seems to be the string '25' and pd.to_datetime doesn't know how to convert that to a '%H:%M:%S' format. Maybe you can use datetime.timedelta objects instead of datetimes? I would add a print(self.pframe1) before the line that is failing.

late jackal
#

i figured i am converting the entire column to floats first though

#

im trying it with the print moved now. unfortunately i have to reload the whole app anytime i make a change to the code

#
C:\Users\Nicks\Desktop\datasheet.xlsx
         Time
0   24.169001
1   25.581001
2   27.978501
3   27.978501
4   29.105001
5   30.398001
6   31.770001
7   33.011001
8   34.184501
9   36.915001
10  38.162002
11  40.650002
12  41.817002
Traceback (most recent call last):
  File "C:\Users\Nicks\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 450, in _convert_listlike_datetimes
    values, tz = conversion.datetime_to_datetime64(arg)
  File "pandas\_libs\tslibs\conversion.pyx", line 350, in pandas._libs.tslibs.conversion.datetime_to_datetime64
TypeError: Unrecognized value type: <class 'int'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\Nicks\Desktop\Deethanizer V4\HYSYSapp\source1\tankui.py", line 580, in onSaveData
    self.darray.writer(file_dir)
  File "c:\Users\Nicks\Desktop\Deethanizer V4\HYSYSapp\source1\tankui.py", line 803, in writer
    self.pframe1['Time'] = pd.to_datetime(self.pframe1['Time'], format='%H:%M:%S')
  File "C:\Users\Nicks\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 803, in to_datetime
    values = convert_listlike(arg._values, format)
  File "C:\Users\Nicks\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 454, in _convert_listlike_datetimes
    raise e
  File "C:\Users\Nicks\anaconda3\lib\site-packages\pandas\core\tools\datetimes.py", line 418, in _convert_listlike_datetimes
    arg, format, exact=exact, errors=errors
  File "pandas\_libs\tslibs\strptime.pyx", line 144, in pandas._libs.tslibs.strptime.array_strptime
ValueError: time data '24' does not match format '%H:%M:%S' (match)
#

im very confused at this it seems the value is 24.16 not 24

#

@hollow gull

#

is there a way to get the "type" of that column?

hollow gull
#

I would make a toy example to test out the logic outside of your entire app to make troubleshooting faster.

#

df.dtypes

late jackal
#

yeah ive been thinking i should do that just didnt wanna have to look how to make a dataframe

#

i guess it isnt hard

hollow gull
#

df = pd.DataFrame([24.169001, 25.581001], index=[0, 1])

late jackal
#
import pandas as pd

d = {'Time': [24.169001, 25.581001,27.565457]}
df = pd.DataFrame(data=d,  index=[0, 1, 2])
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S')
print(df)

this is the test app

#

gives the same error

serene scaffold
#

I have a dataframe where each row is a list of arbitrary strings (the columns are not meaningful) and I want to transform that to a table of string counts for each row.

hollow gull
#

What is 24.169001 supposed to be?

#

hours?

late jackal
#

24.16 minutes

#

converted into hours minutes seconds

hollow gull
#

can you use timedeltas instead?

late jackal
#

thats the time passed between two points

#

i don't think I can because i have a bunch of data tied to those exact times

hollow gull
#

datetime.timedelta(0, 0, 0, 0, 24.16)

#

@glad mulch join with a how='inner' argument

late jackal
#

@hollow gull i can try but if its the difference between two time intervals i don't think that will work

sleek yarrow
#

e

hollow gull
#

@late jackal you can create a timedelta by subtracting two datetimes, but you can also create them directly like the example I sent. I don't know how you plan to use this later, so maybe it doesn't solve your use, but it makes more sense to me to use a timedelta if you are talking about a duration than a datetime, which really isn't meant to handle durations (in my opinion)

late jackal
#

ahh ok

#

idk what all the 0's represent

#

though

hollow gull
#

@glad mulch you can specify the suffixes so the left version doesn't get an added '_x', then you can drop the '_y' versions. Otherwise, look at the arguments, there might be one to prevent duplicating the columns, I don't recall off the top of my head.
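#

Roughly what that looks like, as a sketch with two made-up frames that share a `ticker` column plus one overlapping `name` column (the real frames weren't posted):

```python
import pandas as pd

# Hypothetical inputs; only "ticker" is meant to be the join key.
prices = pd.DataFrame({
    "ticker": ["AAA", "BBB", "CCC"],
    "name": ["A", "B", "C"],
    "price": [10.0, 20.0, 30.0],
})
volumes = pd.DataFrame({
    "ticker": ["AAA", "BBB", "DDD"],
    "name": ["A", "B", "D"],
    "volume": [100, 200, 400],
})

# Inner join keeps only tickers present in both frames.
# An empty left suffix leaves the left column names untouched.
merged = prices.merge(volumes, on="ticker", how="inner", suffixes=("", "_y"))

# Drop the duplicated right-hand copies.
merged = merged.drop(columns=[c for c in merged.columns if c.endswith("_y")])
print(merged)
```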

#

@late jackal look up the documentation. google: python datetime.timedelta

late jackal
#

doing so now lol

hollow gull
#

Or if you are in a good IDE it will show you the arguments. In pycharm, if you are inside the parentheses you can hit Ctrl+p and it will show you the argument names or you can hit Ctrl+B to navigate to the definition.

late jackal
#

@hollow gull this seems to have worked and ill try implementing it in a sec, however it gives it to me in this format

                    Time
0 0 days 00:24:10.140060
1 0 days 00:25:34.860060
2 0 days 00:27:33.927420
#

i dont see a format argument anywhere in the docs

#

any idea if i can get rid of Days

hollow gull
#

What are you going to do with these durations?

late jackal
#

feed them into a program that calculates tuning parameters for valve controllers

#

that program asks for hh:MM:SS

#

sure it can be done manually in excel but kind of defeats the purpose of automating it

hollow gull
#

If you are going to do any arithmetic on them later, I would leave them as timedelta objects because it will handle those operations for you. If you really just want a number of hours, then why convert it out of a float?

late jackal
#

because the program only accepts a .txt file with a column in hh:mm:ss

hollow gull
#

Yeah, I don't see an elegant solution, I can think of a couple hacky ones.

late jackal
#

ok, thanks though ill keep hitting my head against the wall for a bit xD

#

could i split it into two columns? like one for days, the other for hhmmss

hollow gull
#

You can add the timedelta to a midnight datetime and then take the hours, minutes, and seconds from the constructed datetime.

#

timedelta.total_seconds() will return a float that you can then format as HH:MM:SS using a custom function.
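#

A sketch of such a custom function using only the standard library. The name `minutes_to_hms` is made up, the input is assumed to be elapsed minutes as a float, and it rounds to the nearest whole second (truncating would be an equally reasonable choice):

```python
from datetime import timedelta


def minutes_to_hms(minutes):
    """Format a duration given in minutes as HH:MM:SS."""
    total_seconds = int(round(timedelta(minutes=minutes).total_seconds()))
    hours, remainder = divmod(total_seconds, 3600)
    mins, secs = divmod(remainder, 60)
    # Hours are not wrapped at 24, so long durations stay correct.
    return f"{hours:02d}:{mins:02d}:{secs:02d}"


print(minutes_to_hms(24.169001))
print(minutes_to_hms(1500))
```

Because the hours are computed from the total seconds, there is no "days" part to get rid of.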

late jackal
#

that may work xD

hollow gull
#

dt = datetime.datetime(2020, 1, 1)
dt += datetime.timedelta(0, 0, 0, 0, 24.16)
dt.time().isoformat()

#

or
dt = datetime.datetime(2020, 1, 1)
dt += datetime.timedelta(0, 0, 0, 0, 24.16)
dt.time().strftime('%H:%M:%S')

lapis sequoia
#

in pandas i have a dataset. one column is the month and the other is the rainfall

#

there are multiple rows with the same month

#

i want to calculate the total rainfall for each month

#

how do i do it

#

because i want to plot a violin plot with that dara

#

data*

late jackal
#

@hollow gull thought I'd let you know i did get it working so TYVM here is the code

 self.pframe1 = sql.ExcelWriter1()
 self.pframe1['Time'] = 60*self.pframe1['Time']
 self.pframe1['Time'] = pd.to_datetime(self.pframe1["Time"], unit='s').dt.strftime("%H:%M:%S.%f")
 print(self.pframe1)
sullen mortar
#

I just created a RegEx that works perfectly but I'm wondering if it is disgusting 😆
Anyone care to review it?

sullen mortar
lapis sequoia
#

can anyone help me in k-means clustering? i am really stuck, dont even know where to begin

velvet thorn
#

can anyone help me in k-means clustering? i am really stuck, dont even know where to begin
@lapis sequoia ask your question

lapis sequoia
somber bane
#

I got a question, is there any way I can get the 'food' value for username 'jack'?

#

username food
0 jack none
1 2 none
2 hishi 123123
3 ghfdsa 23
4 12 rfgvfbc
5 hihihiihi jack
6 g2 none23
7 g2qd 65s4
8 12 44
9 12 44
10 12 44ssss

#

using pandas

lapis sequoia
#

@velvet thorn any ideas?

shell berry
#

Can someone recommend some good tutorials/resources for intent classification with pytorch?

smoky bobcat
#

good morning data hunters

#

I have a question, do we really decide the features to keep with heat map correlation? I mean what if it's like heart failure prediction and the heat map says that only 2-3 features of 10 are needed

#

would that still predict accurately?

desert oar
#

no, do not use pairwise correlation to choose features in your model when you only have 10 features

#

unless you have a good reason to believe that only those 2-3 features are useful

#

e.g. if one feature is "is it raining today?" then obviously you can remove that feature

plush zenith
#

Hi! Can i ask here something about sympy?

scarlet cloak
#

Hey, I'm trying to write a script for looping through a VCF file (638 million lines) and an SNP list (15K lines) to match SNPs with chromosome and position and make an outfile with 3 columns, one for each piece of info. My code makes sense to me that it should work and I understand that it will take a while, im just wondering if people have any experience with this or about how long files of this size (~114 GB) usually take to loop through on a home pc

ripe field
#

hey can someone tell me how to get started with data science? like i know basic python programming and the math but i dont know how to get started

lapis sequoia
#

hey can someone tell me how to get started with data science? like i know basic python programming and the math but i dont know how to get started
@ripe field If you want a simple introduction to what ML is, checkout https://www.youtube.com/watch?v=IpGxLWOIZy4

charred helm
#

Yo, do you guys learn pandas, matplotlib, numpy together or one after the other? Currently I'm learning pandas, can't do anything with it as I don't know matplotlib and numpy.

#

Do I learn matplotlib and numpy alongside pandas?

lapis sequoia
#

You can use plot directly from pandas, which uses matplotlib by default to make these plots

#

but you wont have to work with the matplotlib API directly

#

I dont know how you 'learn' that stuff one by one

#

but why not use one of the hundreds of toy projects which use these libraries

ruby wyvern
#

Anyone here know BERT?

#

ping me

lapis sequoia
#

OperatorNotAllowedInGraphError: iterating over `tf.Tensor` is not allowed: AutoGraph did convert this function. This might indicate you are trying to use an unsupported feature.

#

u guys know what this is?

tawdry sage
#

Anyone here know BERT?
@ruby wyvern
I’ve used a bit.
I'm trying compressive transformers now. It seems cool, but it's not pretrained

ruby wyvern
#

@tawdry sage any way I can find a missing word in a sentence?

#

Without pytorch

#

too lazy to type it all out again

#

lol

#

I searched through GitHub

#

And internet

#

All uses pytorch

tawdry sage
#

Sure. I haven’t tried though.
I’m not sure, but I think you have to fine tune it with the data format you want.

#

I found much easier to use it with pytorch

ruby wyvern
#

Pytorch is like numpy right?

tawdry sage
#

But there is some documentation explaining how to fine tune in tensor flow

ruby wyvern
#

I think I can use tensor flow

#

I’m using repl

#

let me try to download

tawdry sage
#

Pytorch is like numpy right?
@ruby wyvern
A little different; it has some things that look odd compared to regular Python syntax, but it's not that hard

ruby wyvern
#

@tawdry sage what should I do?

tawdry sage
#

username food
0 jack none
1 2 none
2 hishi 123123
3 ghfdsa 23
4 12 rfgvfbc
5 hihihiihi jack
6 g2 none23
7 g2qd 65s4
8 12 44
9 12 44
10 12 44ssss
@somber bane
I hope you’ve found out already.
df.loc[df.username == 'jack'].food
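#

The same lookup as a runnable sketch, using a few rows copied from the table above. Passing the column name as the second argument to `.loc` is an equivalent spelling, not something from the original answer:

```python
import pandas as pd

# A few rows from the posted table.
df = pd.DataFrame({
    "username": ["jack", "2", "hishi", "hihihiihi"],
    "food": ["none", "none", "123123", "jack"],
})

# Boolean mask on username, then select the food column.
jack_food = df.loc[df["username"] == "jack", "food"]
print(jack_food.tolist())
```

Note that the row where food is 'jack' is not matched, because the mask only looks at the username column.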

ruby wyvern
#

Is there any package that does this?

#

GitHub repo

#

or any other thing

tawdry sage
#

This one does that

#

Actually apparently you don’t have to re-train the model to fill the blanks.
But anyway, this seems to do the trick

ruby wyvern
#

It uses torch TT

#

Unfortunately

tawdry sage
#

You want to do it in tensorflow?

ruby wyvern
#

Ye I think that will work

#

Btw thanks for helping me out

#

For real

tawdry sage
#

But why you don’t want to install pytorch? I don’t get it

ruby wyvern
#

I want to

#

It hasn’t been fixed for years

#

really annoying

tawdry sage
#

Repl.it doesn’t work
@ruby wyvern
I don’t know that. Maybe you can use colab instead?

#

I’m google drive. Do you know?

ruby wyvern
#

it’s for group

#

We started alr on repl

#

TT

tawdry sage
#

Ahwm

#

Still you should be able to install using
!pip install torch

ruby wyvern
#

nah it doesn’t work

#

I did that

tawdry sage
#

Well. If you have Bert installed there, you can try to use it to predict the blanks.
But I really think it’s much easier to use torch or Albert

ruby wyvern
#

Albert?

#

I think I can use that

lapis sequoia
#

Hi everyone! My team and I are working on a collaborative python notebook called Deepnote. We have just opened the platform for public beta, so I'd love to hear your thoughts and feedback! Here's a sample Deepnote notebook showcasing Python 3.9: https://deepnote.com/project/09e2609b-986b-40fa-9f56-fcbbc60eb61d#%2Fnotebook.ipynb

A bit more context on the product: We've built Deepnote on top of Jupyter so it has all the features you'd expect - it's Jupyter-compatible, supports Python, R and Julia and it runs in the cloud. We improve the notebooks experience with real-time collaborative editing (just like Google Docs), shared datasets and a powerful interface with features like a command palette, variable explorer and autocomplete. We want Deepnote to be an interface that empowers programmers to collaborate and experiment easily. Looking forward to your feedback!

vital marten
#

Hi. Anyone know how to drop rows in a dataframe based on the data type?

I have 1000 rows with str and list mixed. I want to drop all the rows that have a list in them.

#

df2 = df2[(type(df2['text']) != 'list')] isn't working.

velvet thorn
#

df2 = df2[(type(df2['text']) != 'list')] isn't working.
@vital marten you need to .map(type)

vital marten
#

Ah! It's working now. Thanks.

velvet thorn
#

I think you want != list

#

(no quotes)
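#

Put together, a sketch with a made-up `text` column. `type(df2['text'])` is the type of the whole Series, which is why the original comparison never matched; `.map(type)` gives the type of each cell instead, and it has to be compared against the class `list`, not the string 'list':

```python
import pandas as pd

# Hypothetical column mixing strings and lists.
df2 = pd.DataFrame({"text": ["hello", ["a", "b"], "world", []]})

# One type object per row, compared against the list class itself.
not_a_list = df2["text"].map(type) != list

cleaned = df2[not_a_list]
print(cleaned["text"].tolist())
```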

charred helm
#

What are some toy projects I could do to learn pandas

lapis sequoia
#

Isn't there the pandas cookbook?

#

at least there was something like this some time ago

#

with a bike dataset iirc

brave crest
#

I am using the train test split function to create: Training, Test validation. How do i split them 60/20/20. It keeps giving me "Found input variables with inconsistent numbers of samples"

lapis sequoia
#

so bike usage in seattle?

#

something like that

#

split 60/40

#

and then 50/50

brave crest
#

So test_size 0.4 and then 0.5?

lapis sequoia
#

I got that correctly didnt I

#

you want to split the whole data into three groups, right?

#

so first split everything 60/40 and then split the 40% again in half

brave crest
#

Yeah but i think im doing something wrong

lapis sequoia
#

so you end up with 60/20/20
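#

As a sketch, assuming scikit-learn and made-up arrays `X` and `y` with 100 rows each. The important detail is that the second call gets the held-out 40% for both X and y; passing arrays of different lengths to the same call is what raises "Found input variables with inconsistent numbers of samples".

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 samples, 2 features.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# First split: 60% train, 40% held out.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0
)

# Second split: cut the held-out 40% in half -> 20% validation, 20% test.
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```

The usual pattern is to fit on the train set, tune against the validation set, and only touch the test set for the final check.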

brave crest
#

Yeah but i get a inconsistent number error

#

I think I might be comparing it to a bigger dataset somewhere

lapis sequoia
#

Not sure, what your code looks like and when exactly there is an error

brave crest
#

I fixed it. Well i think i did! I trained the model using my validation sets then predicted against the train one instead of the test one

#

I am quite new to this so I definitely sound stupid right now xD

eternal nacelle
#

hi, idk if this is the right channel, but how can i save a dynamic file using matplotlib.savefig() ? right now im doing this
```py
plt.savefig('path/image_name.png')
```

#

i want something like
```py
plt.savefig('path/image_name_(id).png')
```

quiet breach
#

just use string formatting

#
'path/image_name_{0}.png'.format(id)
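#

A small standard-library sketch of building those names. The helper `image_path` and the variable `image_id` are made up for illustration (`id` is also a Python builtin, so a different variable name avoids shadowing it); the resulting string is what you would pass to plt.savefig:

```python
def image_path(directory, image_id):
    """Build a file name that includes the given id."""
    return "{0}/image_name_{1}.png".format(directory, image_id)


# One path per id; an f-string such as f"path/image_name_{i}.png"
# would produce the same result.
paths = [image_path("path", i) for i in range(3)]
print(paths)
```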
twin moth
#

Do you guys know if it's possible to show graphs or pandas dataframes in PyCharm like one would do in Jupyter Notebooks?

true nacelle
twin moth
#

Thanks Thanos!

eternal nacelle
#

it works! thanks @quiet breach

brave crest
#

Now that I have three sets how do I validate the trained data now?

undone crown
#

Hi everyone, I don't know if it is the correct channel, but how can I remove vertical grids from a graph with matplotlib?

vapid sorrel
#

Hi, does anyone know if there is simple code on the web to do a LinearRegression but with a correlation coefficient as the loss function?

molten hamlet
#

I got pyplot ax, and want to get handles or save this matplotlib drawing somehow
gradient_ax.savefig("Pendulum path.jpg")

light wind
#

Create a student table and insert data. Implement the following SQL commands on the student table: ALTER table to add new attributes / modify data type / drop attribute UPDATE table to modify data ORDER By to display data in ascending / descending order DELETE to remove tuple(s) GROUP BY and find the min, max, sum, count and average.

#

Help me!

true nacelle
#

@undone crown https://stackoverflow.com/questions/16074392/getting-vertical-gridlines-to-appear-in-line-plot-in-matplotlib
This isn't technically what you asked about, but the solution to this problem also covers your problem. Hopefully it helps.

undone crown
#

@true nacelle Thanks!!

true nacelle
#

Hi. I'm doing some data analysis on a csv, and I need to group together by region, the top 3 most expensive houses. I basically want "Region A: house1, house2, house3" and so on, for each region, as a df.

This is my dataset (note that this is edited/sub-df. The original df has over 20 columns, but they are not relevant in this case.)

So far, I've tried a few things, but I haven't gotten the solution in the way I'd like: grouped by region.

df2 = df.sort_values(by = ['Price'], ascending = False)
df2.groupby(['Regionname']).head(3) #this shows df sorted high-low price, but not categorised by region
This right here produces the original df with the prices sorted, but they are not group together by "Regionname". I'm at a bit of a loss as to how to approach it, even conceptually.

ashen violet
#

Has anyone got an example of how to use the "golden features" parameter in Catboost?

novel timber
undone crown
#

@true nacelle Hi Thanos, .groupby() on its own only defines the groups; it has to be followed by a summary measure (sum, mean, ...) or a per-group selection before the result reflects the grouping.

true nacelle
#

I see.

green hemlock
#

@true nacelle you can use df.sort_values(['Regionname','Price'],ascending=[True, False]).groupby('Regionname',sort=True).head(3)

#

and wherever possible, use nlargest() instead of head(), if you have numeric values
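#

Both variants as a sketch on a made-up frame with only `Regionname` and `Price` (the real dataset has more columns). Price is sorted descending inside each region so that head(3) picks the most expensive rows:

```python
import pandas as pd

# Hypothetical data: four houses in North, two in South.
df = pd.DataFrame({
    "Regionname": ["North", "North", "North", "North", "South", "South"],
    "Price": [300, 900, 500, 700, 400, 800],
})

# Variant 1: sort by region, then price high-to-low, keep 3 rows per region.
top3 = (
    df.sort_values(["Regionname", "Price"], ascending=[True, False])
      .groupby("Regionname")
      .head(3)
)

# Variant 2: let nlargest pick the top 3 prices within each region.
top3_prices = df.groupby("Regionname")["Price"].nlargest(3)

print(top3)
print(top3_prices)
```

Variant 1 keeps whole rows, which is handy when the other columns are needed; variant 2 returns only the prices, indexed by region and original row label.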

boreal summit
#

Hello everyone, I'm practicing with the housing dataset and using vs code. I was trying to fit my data into a linear regression model after some tuning but I'm getting an error. It says unable to allocate 464. MiB for an array with shape (16512, 3683) and data type float64. Anyone know what this means?

#

Title of the error is Memory error.

green hemlock
#

also quoting from quora:
The order of rows WITHIN A SINGLE GROUP is preserved, however groupby has a sort=True statement by default, which means the groups themselves may have been sorted on the key. In other words if my dataframe has keys (on input) 3 2 2 1,.. the groupby object will show the 3 groups in the order 1 2 3 (sorted). Use sort=False to make sure group order and row order are preserved

true nacelle
#

Thanks a lot @green hemlock! I'll be sure to look into this! edit: This was really useful. It's so simple when you see it, but I struggle to "find" the solution.

hollow gull
#

@boreal summit do those dimensions look correct to you? It seems like your data got very wide, to the extent that it might be an error. I am assuming that is why it is taking up so much memory and you are getting the memory error.

undone crown
#

Hi everyone, I am with the following problem. I want to convert a 'Date' column of my dataframe that has the format '09/21/2020' to the 'Sep-20' format. Can anyone help me on how I could do it?

slim fox
#

You can parse it to datetime and then do strftime on it

velvet thorn
#

Hello everyone, I'm practicing with the housing dataset and using vs code. I was trying to fit my data into a linear regression model after some tuning but I'm getting an error. It says unable to allocate 464. MiB for an array with shape (16512, 3683) and data type float64. Anyone know what this means?
@boreal summit it seems pretty self-explanatory to me. you don’t have enough memory.

#

what feature engineering did you do before that?

hollow gull
#

@undone crown oh.... why????

#

google datetime formats python

velvet thorn
full glacier
#

Although never is often better than right now.

hollow gull
#

That has a table with all of the shortcuts when you are doing formatting of datetimes in python. So it looks like your format string will be '%b-%d'

full glacier
#

you are crazy

hollow gull
#

putting it all together, I think this will work:

df['evenstrangerdateformat'] = df['weirdamericandatetime'].apply(datetime.datetime.strptime('%b-%d'))
boreal summit
#

@velvet thorn from what I garnered online, seems it's cause I'm using Python 32 bit, i want to upgrade to 64 bit and see if that will help.

#

I read that 32 bit Python can't run some complex stuffs that need memory.

hollow gull
#

@boreal summit I would really recommend looking at the shape of what you are putting into the LR and make sure it is what you are expecting... that seems really wide.

velvet thorn
#

@velvet thorn from what I garnered online, seems it's cause I'm using Python 32 bit, i want to upgrade to 64 bit and see if that will help.
@boreal summit this might be true, but besides the point.

#

why do you have so many columns?

#

putting it all together, I think this will work:

df['evenstrangerdateformat'] = df['weirdamericandatetime'].apply(datetime.datetime.strptime('%b-%d'))

@hollow gull nope

#

prefer pd.to_datetime to .apply

#

and then .dt.strftime
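#

A sketch of that two-step approach, assuming the column is called `Date` and holds strings like '09/21/2020'. To get 'Sep-20' (month abbreviation plus two-digit year) the output format would be '%b-%y'; '%b-%d' would give the day of the month instead. The month abbreviation depends on the locale, so the output shown assumes an English or C locale.

```python
import pandas as pd

# Hypothetical input column in MM/DD/YYYY form.
df = pd.DataFrame({"Date": ["09/21/2020", "10/05/2020"]})

# Step 1: parse the strings into real datetimes (vectorized, no .apply).
parsed = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Step 2: format them back to strings in the desired layout.
df["MonthYear"] = parsed.dt.strftime("%b-%y")
print(df)
```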

boreal summit
#

Might be my processing and stuff. I used standard scaler to scale the Input data.

velvet thorn
#

Might be my processing and stuff. I used standard scaler to scale the Input data.
@boreal summit doesn't add columns

#

what are all the steps you took?

#

did you OHE a continuous column or something

boreal summit
#

It's the housing dataset I was learning with.

#

No, I only did label encoder.

#

Hold on. Lemme screenshot it.

#

Pardon me please, I'm typing with my phone and doing stuff on my PC.

velvet thorn
#

you should really copy-paste code

#

that is mega hard to see

boreal summit
#

Okay, lemme login on my PC.

velvet thorn
#

but yeah

#

I do not see anything particularly strange about that

#

actually

#

can you just tell me what data.shape is

cerulean spindle
#

I used to use jupyter with VScode. It was really slow and sometimes made really weird errors. Did anyone have the same problem? Does anyone have alternatives?

boreal summit
#

it was showing something like (323020, 1342)

velvet thorn
#

...what dataset is this?

#

it was showing something like (323020, 1342)
@boreal summit are you sure?

boreal summit
#

I already cleared all the output, and uninstalled python 3.8 but I can remember the R was in 6 digits while the C was in 4 digits

#

the housing dataset

#

Don't know where things went wrong

#

yea I'm sure

velvet thorn
#

the housing dataset
@boreal summit ...which housing dataset

#

do you mean the Boston housing dataset?

boreal summit
#

yea

hollow gull
#

I think that the columns are blowing up because you are doing a LogisticRegression with multi_class='auto'

boreal summit
#

Boston housing dataset

#

I was trying to look it up on Kaggle

velvet thorn
#

the Boston housing dataset has shape (506, 14).

boreal summit
#

this one has 20,641 rows

#

I'm trying to predict the median house income according to the excercise

#

should I leave the Hyper parameters in default?

hollow gull
#

If it is a continuous variable you are trying to predict a regression model typically fits the problem better.

#

So LinearRegression instead of LogisticRegression

#

LogisticRegression is designed for a target variable with 2 or a few unique values, not something continuous.
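#

A minimal sketch of the regression case, assuming scikit-learn and synthetic data (not the housing dataset). A classifier treats every distinct target value as its own class, so a continuous target with thousands of unique values can plausibly lead to one output column per value, which may be what the 3683-wide array in the error was. LinearRegression predicts the number directly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic continuous target: y = 2*x0 - 1*x1 + 0.5*x2 + 3
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0

# Nearly every target value is unique, which is typical for regression.
print(len(np.unique(y)))

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)
```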

boreal summit
#

okay, I will try it out and see what happens. Thanks a lot, I appreciate it.

#

I actually intended to use LinearRegression, didn't know it was Logistic I used.

boreal summit
#

Thanks everyone, @hollow gull it ran successfully this time.

#

The dataset successfully fitted in. I guess it's cause I was using logistic regression.

#

The dataset wasn't fit for the model.

small tartan
#

Hey guys, i have a question around dataset and what actions to take. I have 3 variables that are not at all on the same scale. I need to normalize or standardize them so they result in something i can weight and then pull into a single score as a result.

boreal summit
#

You can use either StandardScaler or MinMaxScaler. Each has its own advantages and disadvantages.
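#

To make the difference concrete, here is a plain-Python sketch of the two formulas. The function names and the optional `lower`/`upper` bounds are made up for illustration and are not scikit-learn's API. Standardizing centers on the mean and divides by the population standard deviation; min-max rescaling maps a chosen range onto 0..1, and the bounds can be fixed by hand instead of taken from the data.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score: subtract the mean, divide by the population std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values, lower=None, upper=None):
    """Rescale so that lower -> 0 and upper -> 1.

    By default the bounds come from the data; passing explicit bounds
    keeps several variables on a comparable, pre-agreed scale.
    """
    lower = min(values) if lower is None else lower
    upper = max(values) if upper is None else upper
    return [(v - lower) / (upper - lower) for v in values]


print(standardize([1, 2, 3]))
print(min_max([10, 20, 30]))
print(min_max([10, 20, 30], lower=0, upper=40))
```

Once each variable is on a common scale, a weighted sum of the scaled values gives a single score.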

small tartan
#

My question being, can the ranges be set for scaling outside of the max/min of the data set