chrome rampart Mar 7, 2020, 11:59 PM

#

ik

#

It's mostly theoretical stuff, right?

lapis sequoia Mar 8, 2020, 12:00 AM

#

been studying python and compsci for 6 months and I still dunno jack shit compared to someone who's been doing it for 1 year or less

#

yes

#

they use pseudo languages

#

if this then that

chrome rampart Mar 8, 2020, 12:01 AM

#

like C?

#

Oh, so pseudocode is just code to explain programs simply, not real languages

lapis sequoia Mar 8, 2020, 12:07 AM

#

yep

chrome rampart Mar 8, 2020, 12:08 AM

#

Welp, I think I started too quickly, I'm just gonna take tge CS50 course, maybe gonna take three or four months

#

the*

lapis sequoia Mar 8, 2020, 12:11 AM

#

I thought I'd be done in a month

#

still here 6 months later

chrome rampart Mar 8, 2020, 12:11 AM

#

CS50?

lapis sequoia Mar 8, 2020, 12:11 AM

#

No just python courses and books

velvet thorn Mar 8, 2020, 2:15 AM

#

@merry portal like this?

#

>>> list(zip(*np.nonzero(a < 5)))
[(0, 0), (0, 1), (0, 2), (1, 1)]
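
A self-contained version of the snippet above, as a minimal sketch. The array `a` was never shown in the chat, so the one below is a hypothetical stand-in chosen to give the same coordinates.

```python
import numpy as np

# Hypothetical array; the original `a` was not shown in the chat.
a = np.array([[1, 2, 3],
              [9, 4, 8]])

# np.nonzero on a boolean mask returns one index array per dimension
# (rows, columns); zipping them pairs the indices into (row, col) coordinates.
rows, cols = np.nonzero(a < 5)
coords = list(zip(rows.tolist(), cols.tolist()))
print(coords)  # [(0, 0), (0, 1), (0, 2), (1, 1)]
```

`np.argwhere(a < 5)` returns the same coordinates as a 2-D array, which can be handier if they are going to be used for further indexing.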

slim elm Mar 8, 2020, 2:22 AM

#

No just python courses and books
@lapis sequoia I would play around with a lot of the python packages before you move too far from Python. Maybe start diving into SQL if you are looking to do some data work. You can essentially run SQL syntax in pandas anyways.
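
To illustrate the "SQL in pandas" remark, here is a rough sketch of how a typical SQL query maps onto pandas method calls. The `orders` table and its columns are invented for the example.

```python
import pandas as pd

# Hypothetical table, invented for illustration.
orders = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cy", "bob"],
    "amount": [10, 25, 5, 30, 15],
})

# SQL: SELECT customer, SUM(amount) AS total
#      FROM orders WHERE amount >= 10
#      GROUP BY customer ORDER BY total DESC;
totals = (
    orders[orders["amount"] >= 10]            # WHERE
    .groupby("customer", as_index=False)["amount"]
    .sum()                                     # GROUP BY + SUM
    .rename(columns={"amount": "total"})       # AS total
    .sort_values("total", ascending=False)     # ORDER BY ... DESC
)
print(totals)
```

The mapping is not one-to-one in every case, but filtering, grouping, aggregating, and sorting all have direct pandas counterparts.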

agile monolith Mar 8, 2020, 2:26 AM

#

does anybody here know R?

📎 unknown.png

slim elm Mar 8, 2020, 2:26 AM

#

r u trying to run data viz?

agile monolith Mar 8, 2020, 2:27 AM

#

me?

#

i started r studio a couple of weeks ago

#

i think this is data viz doe yh

#

im pretty sure it is

slim elm Mar 8, 2020, 2:29 AM

#

ya, I mean are you trying to create charts and shit

#

or run stats calcs

agile monolith Mar 8, 2020, 2:30 AM

#

yh

#

i have been given a task that is composed of 5 stages

#

the task is to simulate the double slit experiment using R

#

if i were to do this from scratch i would be clueless

#

but the tasks build up to give the result

#

can you help me with this? @slim elm

#

📎 unknown.png

slim elm Mar 8, 2020, 2:34 AM

#

rippin a game of HoTS quick then i will look at it

agile monolith Mar 8, 2020, 2:39 AM

#

i might take a powernap meanwhile

agile monolith Mar 8, 2020, 10:06 AM

#

how is the size of the screen defined here without defining "-y.max"??

📎 unknown.png

lapis sequoia Mar 8, 2020, 10:08 AM

#

>>> y_max = 200
>>> -y_max
-200

#

Since the screen goes from -y_max to y_max, you only have to define one of them. just negate it

agile monolith Mar 8, 2020, 10:10 AM

#

is -y.max automatically equal to -200 when y.max=200?

lapis sequoia Mar 8, 2020, 10:10 AM

#

If it were from [-300, 200], you'd have to define -y_max

#

I mean yeah, the negative of 200 is -200. Adding a - to a variable negates it

agile monolith Mar 8, 2020, 10:11 AM

#

damn didnt know that worked in R

#

mind if i ping you in the future for help

#

i just started this lang

#

its my first lang

lapis sequoia Mar 8, 2020, 10:11 AM

#

sure, though I'm not very knowledgable in R

agile monolith Mar 8, 2020, 10:11 AM

#

neither am i

hollow quartz Mar 8, 2020, 11:13 AM

#

hi I use pandas
I have two columns i and j, for example {(1,2), (3,4), (2,1)}. I want to count the pairs of i and j regardless of order, so (1,2) = (2,1). Can I do this with a single groupby? Or do I use groupby and then iterate?
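
One way this could be done with a single groupby, sketched under the assumption that the columns are literally named `i` and `j`: normalise each row so the smaller value comes first, then group on the normalised pair. The helper column names `lo` and `hi` are made up for the example.

```python
import numpy as np
import pandas as pd

# Example data from the question: (1, 2) and (2, 1) should count as the same pair.
df = pd.DataFrame({"i": [1, 3, 2], "j": [2, 4, 1]})

# Put the smaller value first in every row, so both orderings of a pair
# collapse onto the same key; then a single groupby counts them.
lo = np.minimum(df["i"], df["j"])
hi = np.maximum(df["i"], df["j"])
pair_counts = (
    pd.DataFrame({"lo": lo, "hi": hi})
    .groupby(["lo", "hi"])
    .size()
)
print(pair_counts)
```

No iteration over rows is needed, since `np.minimum` and `np.maximum` work element-wise on whole columns.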

lapis sequoia Mar 8, 2020, 12:10 PM

#

Hey guys, does anyone have some tips on how to optimize randomforest regressions most effectively? random gridsearch? gridsearchcv? a combination thereof? I know how to do those, but I'm an absolute beginner in terms of randomforests... i only did a parameter grid test for n_estimators and max_features and the results are rather unsatisfactory. Thanks
bump

agile monolith Mar 8, 2020, 3:02 PM

#

`

# distance (from apertures) to screen
L <- 1000

# size of each aperture
b <- 0.5

# wavelength of light
lambda <- 0.05

# separation of the slits
d <- 3

# wavenumber
k <- 2 * pi / lambda

# all of these define the physical parameters

# define distance along the screen, y, from -y.max to y.max

# size of the screen - extends (-y.max, y.max)
y.max <- 200

# sets the number of pixels on the screen
n.screen <- 501

# -y.max negates y.max, so once y.max is defined, -y.max is its negative counterpart
y <- seq(-y.max, y.max, length.out = n.screen)

# the units for these values are mm and the centre of the screen is at y = 0

# defines theta
theta <- atan(y / L)

sinc <- function(x) { # defines sinc
  y <- sin(x) / x
  y[x == 0] <- 1 # "x == 0" checks if x is equal to zero; where it is, the result is replaced by 1
  return(y)
}

phi <- 2 * pi * a * sin(theta) / lambda
`

#

this is my code

#

and i have no clue what a is supposed to be

📎 unknown.png

#

this is the task

📎 unknown.png

#

this is the theory

📎 unknown.png

#

a is slit width

#

but idk how to work it out

#

@lapis sequoia

#

can u help me find out what "a" (the slit width) is

#

everything is above

lapis sequoia Mar 8, 2020, 3:21 PM

#

That image is blurry on my phone so I can't make out the figures. But I would assume that your function to calculate phi is supposed to accept a as an argument

agile monolith Mar 8, 2020, 3:22 PM

#

that is probably correct

#

how do i make it accept a as an argument?
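
A minimal sketch of what "accept a as an argument" means, written in Python with NumPy rather than R. The function name `phase` is made up, and `a = 0.5` is only an assumption (it reuses the aperture size `b` from the script above); the task sheet was not readable in the chat.

```python
import numpy as np

def phase(theta, a, wavelength):
    """Phase term phi = 2*pi*a*sin(theta)/wavelength, with the slit width a passed in."""
    return 2 * np.pi * a * np.sin(theta) / wavelength

# Values mirroring the R script above.
L = 1000
y = np.linspace(-200, 200, 501)
theta = np.arctan(y / L)

# a = 0.5 is an assumption, not something the task confirmed.
phi = phase(theta, a=0.5, wavelength=0.05)
print(phi.shape)
```

The R version would have the same shape: `phi <- function(theta, a, lambda) 2 * pi * a * sin(theta) / lambda`, called as `phi(theta, a = 0.5, lambda = lambda)`.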

lapis sequoia Mar 8, 2020, 4:57 PM

#

anybody here use deeplearning.ai?

#

im trying to grok L2 regularization from a mathematical perspective

#

how come the regularization provides a lesser update, according to Ng, than an un-regularized weight update?

#

it seems like it would almost always increase the update magnitude unless lambda is negative xor the derivative is negative
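
One possible reading of the "weight decay" argument, as a numeric sketch. It assumes the regularised cost adds (lambda / 2m) times the squared weights, so the gradient gains a (lambda / m) * w term; all numbers below are hypothetical. Under that assumption the point is less that the step is smaller and more that the weights get multiplied by a factor slightly below 1 before the usual gradient step.

```python
import numpy as np

# Hypothetical numbers, chosen only to illustrate the algebra.
rng = np.random.default_rng(0)
w = rng.normal(size=5)         # current weights
grad = rng.normal(size=5)      # gradient of the unregularised cost w.r.t. w
alpha, lam, m = 0.1, 0.7, 50   # learning rate, L2 strength, number of examples

# Update with the L2 term added to the gradient ...
w_l2 = w - alpha * (grad + (lam / m) * w)
# ... equals shrinking w by (1 - alpha*lam/m) and then taking the plain step.
w_decay = (1 - alpha * lam / m) * w - alpha * grad

print(np.allclose(w_l2, w_decay))  # True
```

Whether the regularised step is larger or smaller in magnitude for a given weight depends on the signs of `w` and `grad`, which matches the intuition in the question; the shrink factor on `w` is what pulls weights toward zero over many steps.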

deft harbor Mar 8, 2020, 6:10 PM

#

Say I have a 32 x 32 RGB image that I am passing through two convolution kernels. On the other end, I have a tensor with depth 2. Now I need to apply a bias, but does the bias just get added to each specific element in the output layer?

#

[[0, 1, 0],
 [1, 0, 1],
 [1, 3, 4]]

#

So adding a bias of 3 would result in:

#

[[3, 4, 3],
 [4, 3, 4],
 [4, 6, 7]]

#

correct?

#

Then I would add another bias to the output from the other kernel

#

ignore dimensions, I just slapped something together

velvet thorn Mar 9, 2020, 2:21 AM

#

each filter has one bias, yes
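
The "one bias per filter" rule can be sketched with NumPy broadcasting. The first feature map is the matrix from the question; the second map and the bias of -1 are hypothetical.

```python
import numpy as np

# Two 3x3 feature maps, one per kernel (shape: channels, height, width).
# The first is the matrix from the question; the second is made up.
feature_maps = np.array([
    [[0, 1, 0], [1, 0, 1], [1, 3, 4]],
    [[2, 0, 1], [0, 1, 0], [5, 2, 2]],
])

# One scalar bias per kernel; reshaping to (2, 1, 1) lets NumPy broadcast
# each bias over every spatial position of its own feature map.
bias = np.array([3, -1]).reshape(2, 1, 1)
out = feature_maps + bias
print(out[0])
```

The first output channel matches the hand-computed result above, and the second channel gets its own, independent bias.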

valid drum Mar 9, 2020, 1:43 PM

#

Hi, I've recently started to learn neural nets and now I'm reading this book: http://static.latexstudio.net/article/2018/0912/neuralnetworksanddeeplearning.pdf
When I'm using the first neural net there I'm getting over 97% accuracy, but only with pictures from the data set and not with pictures that I create myself (28x28 with MS Paint). What am I missing regarding creating/processing the pictures?
The network is implemented here:
https://github.com/MichalDanielDobrzanski/DeepLearningPython35/blob/master/network.py
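
A common cause of this kind of gap is a mismatch in pixel conventions, so here is a hedged sketch of the preprocessing that may be missing. It assumes the drawing is already loaded as a 28x28 grayscale array with a dark digit on a light background, and that the network expects a (784, 1) column of floats in [0, 1] with a bright digit on a dark background, as MNIST stores them. The function name `to_network_input` and the test image are hypothetical.

```python
import numpy as np

def to_network_input(img):
    """Convert a 28x28 uint8 image (dark digit on a light background,
    as drawn in a paint program) into a (784, 1) float column vector."""
    img = np.asarray(img, dtype=np.float64)
    if img.shape != (28, 28):
        raise ValueError("expected a 28x28 grayscale image")
    inverted = 255.0 - img     # MNIST uses a bright digit on a dark background
    scaled = inverted / 255.0  # scale pixel values into [0, 1]
    return scaled.reshape(784, 1)

# Hypothetical stand-in for a drawing: white canvas with a dark vertical stroke.
canvas = np.full((28, 28), 255, dtype=np.uint8)
canvas[4:24, 13:15] = 0
x = to_network_input(canvas)
print(x.shape, x.min(), x.max())
```

MNIST digits are also size-normalised and centred inside the 28x28 frame, which this sketch does not attempt; a digit drawn small or off-centre can still be misclassified after the conversion above.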

oblique belfry Mar 9, 2020, 2:15 PM

#

I know tf/keras has built-in distributed training strategies (PyTorch also has some, but I do not know enough about them), but I also know about Horovod by Uber, and that many people use this library as well. Are there any good benchmarks showing whether there is a difference between Horovod and these other strategies? The ring-allreduce method used by Horovod is interesting, but I am unsure how it could be better than using a parameter server. I do not know much about MPI.

fringe quiver Mar 9, 2020, 3:09 PM

#

Hi, guys! I have the following problem:
consider a dataset with the following columns:

- day
- month
- year
- hour
- some per-hour variable values (like temperature, etc.)
- XValue (real value in range 0.0 - 30000.0)
- whether XValue is the maximum for this day (0 or 1)

So, XValue changes every hour of each day and always takes different values. It also reaches its peak value at some hour. Given the day, month, year, and the per-hour variable values for each hour, I need to guess when XValue will reach its peak (basically output an int in the range 0 - 23). What approaches should I look into to solve such a problem?

thorny pasture Mar 10, 2020, 1:38 AM

#

Looking at potentially getting a MBP, opinion on how much ram I should get? It's a toss up between 16GB and 32GB likewise with 4GB vs 8GB for graphics. My gaming desktop only has 16GB and I've never had an issue. I plan to get into Data Science more so that's why I'm asking here.

velvet thorn Mar 10, 2020, 1:43 AM

#

@thorny pasture depends, what do you want to do?

#

16 GB is plenty for many data science uses

thorny pasture Mar 10, 2020, 1:43 AM

#

I'm still too new to say for certain, could you maybe touch on some cases when 16 wouldn't be?

#

@velvet thorn

velvet thorn Mar 10, 2020, 2:25 AM

#

so the main use of RAM is storing your dataset to work with interactively

#

in this context.

#

there may be a point where 16 GB is too little and you need 32 GB, but that is rarely the case

#

in my experience, if it's too big for 16 GB, it will likely be too big for 64 GB, and you'll need to use a cluster or something

thorny pasture Mar 10, 2020, 2:56 AM

#

@velvet thorn what kind of specs does your computer have?

velvet thorn Mar 10, 2020, 3:42 AM

#

the laptop that I use for DS work, no DL, is some mid-range Lenovo ThinkPad

#

16 GB RAM, in particular

#

I use my desktop for gaming + DL, and it has a GTX 1080 (planning to upgrade in a few months)

#

also 16 GB of RAM

prisma imp Mar 10, 2020, 6:33 AM

#

any way to write text to a pdf file?

jade cloud Mar 10, 2020, 11:42 AM

#

@prisma imp reportlab

lapis sequoia Mar 10, 2020, 6:11 PM

#

can somebody suggest how to optimize the following parameters in random forest?

```
min_impurity_decrease
min_impurity_split
n_jobs
random_state
```

#

is random_state even necessary? I obviously used it for train_test split, but I'm not sure whether it is necessary for using bootstrapping

#

as for the other 3 commands, i don't really understand their explanations very well in the userguide...

#

@velvet thorn i recently updated my PC from 8 to now 24Gb because of the python project and I still have experienced crashes with randomforest gridsearch on higher n_estimators parameter values ^^ my working dataset is "only" 1-1.5m rows and less than 1gb after cleaning. the process took like 48+ hours and at some point crashed

velvet thorn Mar 10, 2020, 6:22 PM

#

bet it was in IPython

lapis sequoia Mar 10, 2020, 6:25 PM

#

jupyter lab

#

is there nobody who has experience with optimizing randomforests? :[

velvet thorn Mar 10, 2020, 6:27 PM

#

yeah, that's built on IPython

#

@lapis sequoia are you sure you know what you're doing?

#

n_jobs is the number of parallel jobs to run (e.g. fitting multiple trees at once).

#

it is not a hyperparameter.

#

neither is random_state.

#

you might as well optimise the time of day at which you run your code

#

min_impurity_decrease just means how much impurity (a measure of how good a tree is at differentiating between values) must go down by before your tree splits

#

min_impurity_split is deprecated, and serves roughly the same purpose as min_impurity_decrease.

#

don't touch it.

lapis sequoia Mar 10, 2020, 6:31 PM

#

i know random_state is not a hyper parameter, but it is suggested to use a random state when using bootstrapping in some tutorials... I didn't use a specific random_state though. My grid search, however, showed that using bootstrapping is better than not using it.

velvet thorn Mar 10, 2020, 6:31 PM

#

yes

#

you should set random_state

lapis sequoia Mar 10, 2020, 6:31 PM

#

thanks 🙂

velvet thorn Mar 10, 2020, 6:31 PM

#

but you should not be searching over it

#

which was what you asked, I believe

lapis sequoia Mar 10, 2020, 6:32 PM

#

my question was not well specified I guess, sorry for that

velvet thorn Mar 10, 2020, 6:32 PM

#

np

lapis sequoia Mar 10, 2020, 6:33 PM

#

it was aimed to get some explanations for the parameters, and whether random_state is needed when using bootstrapping

velvet thorn Mar 10, 2020, 6:33 PM

#

yes

#

in general, when you do something involving randomness

#

you should ensure reproducibility

#

which is done by, in this case, fixing random_state
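
The reproducibility point can be checked directly. This is a small sketch on synthetic data (the dataset, seeds, and the helper `fit_predict` are all made up): the same `random_state` gives identical predictions, while a different one is expected to give slightly different ones.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, used only for illustration.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

def fit_predict(seed):
    """Fit a small bootstrapped forest with the given seed and predict 5 rows."""
    model = RandomForestRegressor(n_estimators=20, bootstrap=True, random_state=seed)
    return model.fit(X, y).predict(X[:5])

same_seed = np.allclose(fit_predict(42), fit_predict(42))
other_seed = np.allclose(fit_predict(42), fit_predict(7))
print(same_seed, other_seed)
```

With the seed fixed, any change in score during a search can be attributed to the hyperparameters rather than to a different draw of bootstrap samples.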

lapis sequoia Mar 10, 2020, 6:33 PM

#

i tried fine-tuning only max_features and n_estimators in the beginning, but the results are very disappointing compared to rbf-SVM

#

I honestly expected (from the papers i read), that randomforest will have the best results for my study... but seems like it doesn't

velvet thorn Mar 10, 2020, 6:35 PM

#

IME

#

random forest is not very responsive to hyperparameter tuning

lapis sequoia Mar 10, 2020, 6:37 PM

#

you should ensure reproducibility
yup, that's logical, however, while train_test splits should obviously be reproducible, the random state of the bootstrapping-hyperparameter in my view shouldn't make any difference... because as you said, it's not like the result of hyperparameter tuning will change if you change the random_state

#

random forest is not very responsive to hyperparameter tuning
yeah, I agree

velvet thorn Mar 10, 2020, 6:38 PM

#

you'll get different bootstrapped samples

#

which will slightly change the fit of the individual trees

lapis sequoia Mar 10, 2020, 6:38 PM

#

aaah

#

right

velvet thorn Mar 10, 2020, 6:38 PM

#

of course, being a rather low variance method in the aggregate, this shouldn't change much, but it will make a difference, still.

lapis sequoia Mar 10, 2020, 6:38 PM

#

but then again.... bootstrapping is a yes/no option

#

so even if there is slight difference, the outcome should remain consistent right

velvet thorn Mar 10, 2020, 6:39 PM

#

huh?

#

no, each bootstrapped sample will vary slightly

#

because it's drawn randomly from the original data, right

#

the random state influences this

lapis sequoia Mar 10, 2020, 6:40 PM

#

sure sure

#

i mean... when you're using grid-search, then your goal is to check which is best for your specific model as in bootstrapping=[true, false]

velvet thorn Mar 10, 2020, 6:41 PM

#

ah

#

yes

lapis sequoia Mar 10, 2020, 6:41 PM

#

and if bootstrapping=true is the better option, that should not change whatever your random_state is

velvet thorn Mar 10, 2020, 6:41 PM

#

no, not necessarily

#

at least, not theoretically.

#

and not practically as well, if you're using sklearn

lapis sequoia Mar 10, 2020, 6:42 PM

#

ugh... ok 😦 😄

velvet thorn Mar 10, 2020, 6:42 PM

#

because the individual trees also are influenced by randomness, entirely apart from bootstrapping

#

see the documentation.

#

and, this aside

#

if you agree that bootstrapping performance varies with random_state

#

surely you can agree that it is conceivable that when random_state takes certain values, bootstrapping may perform worse than not bootstrapping.

#

on the other hand, when random_state takes other values, bootstrapping may perform better.

#

because, remember, it may influence the performance of bootstrapping.

#

if you accept this, then it does matter.

lapis sequoia Mar 10, 2020, 6:43 PM

#

but would that not make random_state a hyperparameter then? lol

#

if the performance difference was that meaningful, i mean

velvet thorn Mar 10, 2020, 6:44 PM

#

hm

#

no

#

or rather, practically speaking, perhaps

#

but theoretically, I'd say no

lapis sequoia Mar 10, 2020, 6:44 PM

#

haha 😄

velvet thorn Mar 10, 2020, 6:45 PM

#

this is kind of a philosophical question

lapis sequoia Mar 10, 2020, 6:45 PM

#

hahaha

#

ok, so you suggest i don't touch the min_impurity_split and _decrease, but max_depth, min_samples_split, min_samples_leaf does make sense to finetune, right? on top of max_features and n_estimators, obviously

velvet thorn Mar 10, 2020, 6:46 PM

#

no I said

#

don't touch min_impurity_split because it's deprecated

agile wing Mar 10, 2020, 6:47 PM

#

hi @lapis sequoia, where did you learn machine learning with python?

velvet thorn Mar 10, 2020, 6:47 PM

#

in favour of min_impurity_decrease

#

anyway, go ahead

#

no harm trying.
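
A rough sketch of what such a search could look like with scikit-learn's `RandomizedSearchCV`. The data is synthetic and every value range below is an illustrative guess, not a recommendation from the chat; `random_state` and `n_jobs` are set once and deliberately left out of the search space.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data and illustrative value ranges; none of these numbers come from the chat.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

param_distributions = {
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": [0.3, 0.5, 1.0],
    "min_impurity_decrease": [0.0, 0.01, 0.1],
}

search = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=20, random_state=0),  # seed fixed, not searched
    param_distributions=param_distributions,
    n_iter=5,            # sample 5 combinations instead of trying the full grid
    cv=3,
    scoring="neg_mean_absolute_error",
    random_state=0,      # makes the sampled combinations reproducible
    n_jobs=1,            # number of parallel workers; -1 would use all cores
)
search.fit(X, y)
print(search.best_params_)
```

Sampling a handful of combinations keeps the run time bounded, which matters when a full grid over a large dataset takes days.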

lapis sequoia Mar 10, 2020, 6:50 PM

#

@agile wing self-taught learning by doing and using lots of tutorials on the web and youtube, plus 1-2 books. I had to learn it for my thesis-project... but i'm literally a beginner. this chat was a huge help for me, when I was stuck... gm certainly knows what he's talking about, I on the other hand do absolutely not 🙂

agile wing Mar 10, 2020, 6:50 PM

#

oh, I'm trying to find a pretty good MOOC on python and ml

lapis sequoia Mar 10, 2020, 6:52 PM

#

in favour of min_impurity_decrease
@velvet thorn any suggestions what values to use? since it's a threshold i have absolutely no idea... except that default is (default=1e-7)

#

@agile wing sorry what's MOOC?

agile wing Mar 10, 2020, 6:52 PM

#

online class, like coursera, edx...

#

online class platforms

lapis sequoia Mar 10, 2020, 6:54 PM

#

mh scikit has lots of data online, github too.... i never really had the time to play around with other data, because my own project had so much data to work with and my project obviously has a deadline

#

@velvet thorn oh and for n_jobs yeah i read the documentation... i was asking, because i thought it might help accelerating the gridsearch process... but i don't know and i don't really want to risk jupyter to crash my PC again :[

agile wing Mar 10, 2020, 7:00 PM

#

ic thanks

lapis sequoia Mar 10, 2020, 7:07 PM

#

@agile wing what's your current state of knowledge though? if you're starting from scratch, i would recommend watching Daniel Chen's python beginner tutorials on youtube...

agile wing Mar 10, 2020, 7:08 PM

#

i know my python, but Im new in ML

tame sedge Mar 10, 2020, 7:36 PM

#

Hey! I love solving challenges like Project Euler, Codewars, Codingame...
Do you have something similar for Data Science/Machine Learning?
I know about Kaggle of course but it's a bit hard for beginners. So far, I have found https://www.machinelearningplus.com/python/101-numpy-exercises-python/, https://www.machinelearningplus.com/python/101-pandas-exercises-python/ and a sub-subsection about Machine Learning on Hackerrank (about 20 questions). Do you have more cool stuff?

silk frigate Mar 10, 2020, 7:49 PM

#

Can someone help me get started with data science? I'm learning Python and I have to make a bar chart of the first column of a csv file but I'm really struggling with it

#

Pls tag me if you can

lapis sequoia Mar 10, 2020, 7:51 PM

#

@silk frigate what have you done so far?

silk frigate Mar 10, 2020, 7:51 PM

#

Well

#

I have downloaded the csv file (I opened in Excel now)

#

It's about the McDonalds menu

#

so it looks something like this:

#

📎 unknown.png

#

And now I want to make a bar chart of the first column (category)

#

At least that's what the assignment asks me to do haha 😂

#

They gave me the start of the program

#

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# import menu and display the first two rows of the dataframe
menu = pd.read_csv("mcdonalds_menu.csv")
```

#

but I don't really understand how to make bar charts etc.

#

It's all new to me

lapis sequoia Mar 10, 2020, 7:53 PM

#

how new are you to programming?

silk frigate Mar 10, 2020, 7:54 PM

#

I started about 5 weeks ago I think. In my class we started with the basics (data types, etc.)

#

📎 unknown.png

#

I have worked my way through module 1 till 8

lapis sequoia Mar 10, 2020, 7:55 PM

#

Ok. So you're using a library called pandas to read the CSV file. pandas is very useful for anything related to data manipulation and analysis

silk frigate Mar 10, 2020, 7:55 PM

#

Yes

lapis sequoia Mar 10, 2020, 7:55 PM

#

matplotlib is usually used for plotting, but pandas has some built-in plotting methods

#

If you google pandas barchart you'll hopefully find this link: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.plot.bar.html#pandas.Series.plot.bar

silk frigate Mar 10, 2020, 7:56 PM

#

Yes I found that link

lapis sequoia Mar 10, 2020, 7:56 PM

#

(hold on, someone is calling me, brb in 2 min)

silk frigate Mar 10, 2020, 7:56 PM

#

sure thing

lapis sequoia Mar 10, 2020, 8:00 PM

#

ok im back

silk frigate Mar 10, 2020, 8:00 PM

#

hii

lapis sequoia Mar 10, 2020, 8:00 PM

#

anyways as I was saying. As you see pandas has a built-in method for creating a bar-plot

silk frigate Mar 10, 2020, 8:00 PM

#

plot.bar()

#

?

lapis sequoia Mar 10, 2020, 8:00 PM

#

yes

#

However since you want to make a plot from strings (category) you need to find a way to quantify them numerically

#

And in this case I assume it's the number of items in each category

silk frigate Mar 10, 2020, 8:02 PM

#

yes

lapis sequoia Mar 10, 2020, 8:02 PM

#

If you google pandas count of unique values in column you'll hopefully find this link: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.value_counts.html

silk frigate Mar 10, 2020, 8:02 PM

#

it says it should look like this

#

📎 unknown.png

lapis sequoia Mar 10, 2020, 8:02 PM

#

oh

#

lol I kinda misunderstood the task

silk frigate Mar 10, 2020, 8:03 PM

#

what did you have in mind then? haha

lapis sequoia Mar 10, 2020, 8:03 PM

#

no never mind I had it right

#

just got confused by the ordering

silk frigate Mar 10, 2020, 8:03 PM

#

ohh haha

lapis sequoia Mar 10, 2020, 8:03 PM

#

anyways did you see the above link?

silk frigate Mar 10, 2020, 8:03 PM

#

yea it's alphabetically

#

not yet let me see

#

how did you find that link?

#

Did you just google "python pandas counting values in column" or something?

lapis sequoia Mar 10, 2020, 8:04 PM

#

yes

silk frigate Mar 10, 2020, 8:04 PM

#

hmm okay

#

I'm kinda scared with the formula that's on the top of that page haha

#

all the paramters

#

meters*

lapis sequoia Mar 10, 2020, 8:05 PM

#

don't worry about that yet

#

The only thing you should take note of are the parameters that don't have a default value

#

Basically parameters that don't have =something at the end of them

silk frigate Mar 10, 2020, 8:06 PM

#

but all of them have a default value right?

lapis sequoia Mar 10, 2020, 8:07 PM

#

Yes exactly. With the exception of self. But that can be ignored

#

So that means you can just call .value_counts() without providing any parameters.

silk frigate Mar 10, 2020, 8:07 PM

#

hmm okay

lapis sequoia Mar 10, 2020, 8:08 PM

#

Which is great as that means we can just focus on getting some plot working, then finetuning it with parameters as needed

silk frigate Mar 10, 2020, 8:08 PM

#

haha yes

lapis sequoia Mar 10, 2020, 8:09 PM

#

Anyways. in pandas you have something called DataFrame and something called Series. The boiled down version is that a DataFrame is just a collection of Series

#

And the link above shows pandas.Series.value_counts()

silk frigate Mar 10, 2020, 8:09 PM

#

is a series one column?

#

and a dataframe multiple columns?

lapis sequoia Mar 10, 2020, 8:09 PM

#

yes

silk frigate Mar 10, 2020, 8:10 PM

#

ahh okay

#

that kind of makes sense

lapis sequoia Mar 10, 2020, 8:10 PM

#

yeah. So in your example, if you were to do:

df = pd.read_csv(...)

print(type(df['category']))

you should get <class 'pandas.core.series.Series'>

#

You can access any column in a dataframe like you would a dictionary.

silk frigate Mar 10, 2020, 8:11 PM

#

but aren't columns always series?

lapis sequoia Mar 10, 2020, 8:11 PM

#

And if the name of the column doesn't have any spaces you can also do it without the brackets: df.category

#

Yes thats right. But that means that we can do this: df.category.value_counts()

silk frigate Mar 10, 2020, 8:12 PM

#

so `print(type(df.category))` would also work?

lapis sequoia Mar 10, 2020, 8:12 PM

#

yes

silk frigate Mar 10, 2020, 8:12 PM

#

ah okay

#

Yes so then we're counting all values

#

in that series

#

called category
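
The steps so far, put together as a small runnable sketch. The real CSV was not shared, so the mini menu below is hypothetical; the column is spelled `Category` here, matching the code used later in the chat.

```python
import pandas as pd

# Hypothetical mini menu; the real CSV was not shared in the chat.
df = pd.DataFrame({
    "Category": ["Breakfast", "Coffee & Tea", "Breakfast",
                 "Salads", "Coffee & Tea", "Coffee & Tea"],
    "Item": ["Egg McMuffin", "Latte", "Hotcakes",
             "Side Salad", "Mocha", "Iced Tea"],
})

column = df["Category"]         # bracket access, works for any column name
same_column = df.Category       # attribute access, only for names without spaces
counts = column.value_counts()  # one row per category, most frequent first

print(type(column).__name__)
print(counts)
```

Both ways of selecting the column return the same Series, and `value_counts()` again returns a Series, which is why further Series methods can be chained onto it.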

lapis sequoia Mar 10, 2020, 8:15 PM

#

Now here's a task for you. Using the documentation, see if you can find out how to plot from a series (instead from a dataframe)

#

Link me the page about plotting a bar-plot from a pandas series

silk frigate Mar 10, 2020, 8:16 PM

#

It is this one?
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.plot.bar.html

lapis sequoia Mar 10, 2020, 8:18 PM

#

yes

#

So how would you use this?

silk frigate Mar 10, 2020, 8:20 PM

#

well pd.plot.bar() is a function from pandas that you can use

#

although I don't really get why there's pandas**.series**.plot.bar() in the title

#

because in the example they don't do series

lapis sequoia Mar 10, 2020, 8:20 PM

#

That means you can use it on series as well

silk frigate Mar 10, 2020, 8:20 PM

#

but you shouldn't use that in the actual function?

lapis sequoia Mar 10, 2020, 8:21 PM

#

What do you mean?

silk frigate Mar 10, 2020, 8:21 PM

#

well if you are going to use this function

#

should you do: pd.series.plot.bar()

#

or pd.plot.bar()

lapis sequoia Mar 10, 2020, 8:22 PM

#

Actually what pandas.Series.plot.bar means is that within the pandas library, any Series has an attribute called plot which has a bar method

#

And if you run this: print(type(df.category.value_counts())) you'll see that the output is a series as well

#

And as mentioned, since it's a series we can do this: df.category.value_counts().plot.bar()

silk frigate Mar 10, 2020, 8:25 PM

#

yes

lapis sequoia Mar 10, 2020, 8:25 PM

#

Lastly, what the pandas plotting methods do, is just create the figure. To display it we have to use matplotlib after by calling plt.show()
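
The whole chain as a minimal sketch, with hypothetical data in place of the menu. The `matplotlib.use("Agg")` line is an addition so the example also runs on a machine without a display; on a normal desktop it can be dropped so that `plt.show()` opens a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for the menu's category column.
categories = pd.Series(
    ["Breakfast", "Salads", "Breakfast", "Desserts", "Breakfast", "Salads"]
)

# pandas draws the bars onto a matplotlib Axes and returns that Axes.
ax = categories.value_counts().plot.bar()
ax.set_ylabel("Number of items")
plt.tight_layout()
plt.show()  # matplotlib, not pandas, is responsible for displaying the figure
```

Because pandas hands back a regular matplotlib `Axes`, labels, titles, and colours can be adjusted with ordinary matplotlib calls before showing the figure.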

silk frigate Mar 10, 2020, 8:26 PM

#

yeah that's weird haha

#

why doesn't python just do that automatically 😭

#

wow

📎 unknown.png

#

it works haha

#

I'm already proud of this

lapis sequoia Mar 10, 2020, 8:28 PM

#

good job! 🙂

#

I hope everything made sense to you, and hopefully you learned not just how to plot a bar-chart, but also how to navigate the documentation and understand it

silk frigate Mar 10, 2020, 8:30 PM

#

haha yes, I really like that I can use discord to learn things like this. The lecture notes of my teacher are sometimes useful but do not really explain things in-depth

#

I do have 2 more questions then tho, how did they manage to sort the categories alphabetically and give cool colours to the bars?

#

If I do `plt.show(df.Category.value_counts(sort=False).plot.bar())`
It shows this graph:

#

📎 unknown.png

#

so first it sorts it with the value counts descending

#

but how does it come up with the order of categories in this second one?

#

It's not the order in which they appear in the csv file

#

@lapis sequoia could you maybe help with this? It's also completely okay if you don't want to, I get that it takes a long time to help beginners like me started 😂

lapis sequoia Mar 10, 2020, 8:42 PM

#

I tried looking into how the sort argument was used but couldn't

silk frigate Mar 10, 2020, 8:43 PM

#

what do you mean?

#

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.sort_values.html

#

Can I use this function to sort my values from that series alphabetically?

lapis sequoia Mar 10, 2020, 8:44 PM

#

oh I looked at the wrong method

#

lol

#

oh man I read your line wrong and thought it was .bar(sort=False)

silk frigate Mar 10, 2020, 8:44 PM

#

ohh haha

#

that's probably not possible in that function

lapis sequoia Mar 10, 2020, 8:48 PM

#

Ah I found it

#

So value_counts() uses something called a hashtable which means the ordering is essentially random

#

By default sort=True and so after counting the elements it will sort the result

silk frigate Mar 10, 2020, 8:49 PM

#

so when that's False, the order is completely random?

lapis sequoia Mar 10, 2020, 8:50 PM

#

essentially yes

silk frigate Mar 10, 2020, 8:50 PM

#

so is there a way to sort the values alphabetically?

lapis sequoia Mar 10, 2020, 8:51 PM

#

Sure. every series and dataframe has a column named index which as you can imagine is the index of every row

silk frigate Mar 10, 2020, 8:51 PM

#

Yes

#

it starts with 0

lapis sequoia Mar 10, 2020, 8:52 PM

#

So if you want to change the order of the plot, we have to reindex the series

silk frigate Mar 10, 2020, 8:52 PM

#

because the initial index was alphabetically?

lapis sequoia Mar 10, 2020, 8:53 PM

#

not alphabetically, in the documentation it says The resulting object will be in descending order so that the first element is the most frequently-occurring element

silk frigate Mar 10, 2020, 8:53 PM

#

yes

#

and it will probably create a new series with new indexes

#

so 'Coffee & Tea' has index number 1

#

or 0

#

since that's the most frequently-occurring element

lapis sequoia Mar 10, 2020, 8:57 PM

#

Yes so to change the ordering from most-frequent to alphabetical I would do this:

category_value_counts = df.category.value_counts()
alphabetically_sorted_indices = sorted(category_value_counts.index)
category_value_counts.reindex(alphabetically_sorted_indices).plot.bar()
plt.show()
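
The reindexing idea above, as a self-contained sketch with made-up counts, next to `sort_index()`, which is a shorter way to get the same alphabetical order in this case.

```python
import pandas as pd

# Hypothetical counts, already in "most frequent first" order
# like value_counts() returns by default.
category_value_counts = pd.Series(
    [3, 2, 1], index=["Coffee & Tea", "Breakfast", "Salads"]
)

# Approach from the chat: sort the index labels, then reindex by them.
by_reindex = category_value_counts.reindex(sorted(category_value_counts.index))

# Shorter equivalent here: sort_index() orders the rows by their labels.
by_sort_index = category_value_counts.sort_index()

print(by_sort_index)
```

Either result can be followed by `.plot.bar()`; the bars then appear in the order of the index.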

silk frigate Mar 10, 2020, 8:58 PM

#

how did u know to use .index?

#

I can't find a webpage that explains that

lapis sequoia Mar 10, 2020, 8:59 PM

#

Well I knew it already / used the documentation. But if you google pandas change series index it should be first hit

#

oh you meant the .index

silk frigate Mar 10, 2020, 9:00 PM

#

ehh yes well both of them actually

#

with my question I meant the .index indeed but I'm confused by your 3rd line as well haha

#

at least I understand half of it

lapis sequoia Mar 10, 2020, 9:01 PM

#

so the same applies. I knew I could get the index by calling .index, but I also found it by googling pandas get index from series

#

I recommend you play around a bit. Try to print out the output of .index and the other methods

silk frigate Mar 10, 2020, 9:09 PM

#

```python
alphabetically_sorted_indices = sorted((df.category.value_counts().index)

plt.show(df.alphabetically_sorted_indices.plot.bar())
```
@lapis sequoia do you know why my python interpreter gives an `invalid syntax` error for the 2nd line?

lapis sequoia Mar 10, 2020, 9:44 PM

#

alphabetically_sorted_indices is just a normal python list

#

not a pandas.Series object

#

So .plot.bar() will not work

#

@silk frigate
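
A likely explanation for the error landing on the second line, offered as a guess: the first line opens two parentheses after `sorted` but closes only one, and Python tends to notice an unclosed parenthesis only when it reaches the next statement. The sketch below balances the parentheses and keeps the list of labels separate from the Series that gets plotted; the data is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the menu data.
df = pd.DataFrame({"Category": ["Salads", "Breakfast", "Salads", "Desserts"]})

counts = df.Category.value_counts()

# Parentheses are balanced here; the result is a plain Python list of labels.
alphabetically_sorted_indices = sorted(counts.index)

# The list only holds labels, so it is used to reorder the Series,
# and the reordered Series is what has .plot.bar().
alphabetical_counts = counts.reindex(alphabetically_sorted_indices)
print(alphabetical_counts)
```

Note also that `df.alphabetically_sorted_indices` would look for a column with that name on the DataFrame; a local variable is not reachable through `df.`.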

silk frigate Mar 10, 2020, 9:46 PM

#

hmm okay well I'm working on another assignment now

#

I almost finished it

#

but I need to print 3 specific columns and now it prints all of them

#

Do you know how I can manage my code to print only 3 columns @lapis sequoia ?

#

```python
healthy = df.loc[(df["Trans Fat"]==0) & (df["Cholesterol (% Daily Value)"]==0) & (df["Total Fat (% Daily Value)"]<=20) & (df["Sugars"]<=20)].sort_values('Calories')

print(healthy[(healthy['Category']!='Beverages') & (healthy['Category']!='Coffee & Tea')])
```

#

I only want to print the columns 'Category', 'Item' and 'Calories'

#

it prints this now (all columns)

📎 unknown.png

#

but it should be this

📎 unknown.png

lapis sequoia Mar 10, 2020, 9:49 PM

#

df[['Category', 'Item', 'Calories']]

silk frigate Mar 10, 2020, 9:49 PM

#

inside of my first line or second line?

#

and why the double square brackets

lapis sequoia Mar 10, 2020, 9:49 PM

#

That's just an example of the syntax

#

use it on whatever dataframe you want. In your case it's probably on healthy

silk frigate Mar 10, 2020, 9:50 PM

#

I have difficulties with where to put such an argument inside of my function

#

So I have used df.loc to address specific columns

#

and I gave 4 arguments which need to be true

#

```python
healthy = df.loc[(df["Trans Fat"]==0) & (df["Cholesterol (% Daily Value)"]==0) & (df["Total Fat (% Daily Value)"]<=20) & (df["Sugars"]<=20)].sort_values('Calories')
healthy2 = healthy['Category', 'Item', 'Calories']

print(healthy2[(healthy['Category']!='Beverages') & (healthy2['Category']!='Coffee & Tea')])
```

#

I get a KeyError now

lapis sequoia Mar 10, 2020, 9:54 PM

#

double [

#

healthy2 = healthy[['Category', 'Item', 'Calories']]

silk frigate Mar 10, 2020, 9:54 PM

#

ahh yes it works

#

how did you know about the double brackets?

lapis sequoia Mar 10, 2020, 9:56 PM

#

I've worked with pandas before

#

Here is the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
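
To spell out the double brackets: the outer pair is the indexing operator and the inner pair is an ordinary Python list of column names. The sketch below uses made-up rows (only the column names come from the conversation) and also shows why the single-bracket version raises a `KeyError`.

```python
import pandas as pd

# Made-up rows; only the column names come from the conversation above.
healthy = pd.DataFrame({
    "Category": ["Salads", "Snacks"],
    "Item": ["Side Salad", "Apple Slices"],
    "Calories": [20, 15],
    "Sugars": [2, 3],
})

# Outer brackets index the DataFrame; the inner brackets are a list of names.
healthy2 = healthy[["Category", "Item", "Calories"]]
print(healthy2)

# Without the inner list, Python passes the tuple
# ('Category', 'Item', 'Calories') as a single key, and no column has that name.
try:
    healthy["Category", "Item", "Calories"]
except KeyError as err:
    print("KeyError:", err)
    got_key_error = True
else:
    got_key_error = False
```

A single name in single brackets (`healthy['Category']`) returns a Series, while a list of names returns a DataFrame.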

lapis sequoia Mar 10, 2020, 10:21 PM

#

what does verbose mean? verbose: int, optional (default=0) Controls the verbosity when fitting and predicting.

oblique belfry Mar 10, 2020, 11:40 PM

#

It’s similar to logging level. As you run the fit method, it will log to the console the stats on each step of training.

lapis sequoia Mar 10, 2020, 11:43 PM

#

thanks

#

so if you set verbose=2 it would only log every other or every third step?

#

why would anyone want to change that option? it's unnecessary basically, right?

oblique belfry Mar 11, 2020, 12:00 AM

#

For the Keras API, it is quite helpful.

#

You can read the docs for the functionality, but it’s like python logging levels (debug, info, error, warning, critical)
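
For scikit-learn's random forest in particular, a higher `verbose` asks for more progress messages, not fewer logged steps, and it has no effect on the fitted model. A small sketch on synthetic data (the data and the `n_estimators=5` setting are assumptions for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] * 2.0 + rng.normal(scale=0.1, size=200)

# Same model twice; only the amount of console output differs.
quiet = RandomForestRegressor(n_estimators=5, random_state=0, verbose=0).fit(X, y)
chatty = RandomForestRegressor(n_estimators=5, random_state=0, verbose=2).fit(X, y)

same_predictions = np.allclose(quiet.predict(X), chatty.predict(X))
print("identical predictions:", same_predictions)
```

So the option is mainly useful for watching long fits make progress. Leaving it at 0 is fine for short runs.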

lapis sequoia Mar 11, 2020, 1:36 AM

#

thanks, i didn't find the explanation regarding verbose in random forest very helpful in the docs... but i don't have time to get too deep into it anyway. i really need to get this project done a.s.a.p

#

what would be a professional way of visualizing predictions vs actual values in a time series model, when there are like 100s of actual values and predictions to be considered for each day?
is there anything other than just comparing the RMSE or MAE with bar charts for different prediction models?

velvet thorn Mar 11, 2020, 1:41 AM

#

"professional" is a bit of a weird term

#

you could consider an interactive visualisation

#

or some sort of moving average

#

(don't @ me)

#

bar charts are like a one-stop comparison

#

you can use that as an intro

#

and go further into detail with other visualisations

#

there's also a middle ground

#

e.g. group by, say, month, and plot error bar charts

#

maybe plot the residuals

lapis sequoia Mar 11, 2020, 1:42 AM

#

mhh it's supposed to be in a paper, so interactive visualization is cool as i have to turn in the code, too.... but i'd prefer something that visualizes the results on paper

velvet thorn Mar 11, 2020, 1:43 AM

#

you're limited only by your creativity

lapis sequoia Mar 11, 2020, 1:44 AM

#

ok, so i did all the grouping stuff in the description part, to show what the data looks like etc

#

but for the results part, I'd like to have something more than just "here's the RMSEs for the different models, enjoy!" you know?

velvet thorn Mar 11, 2020, 1:47 AM

#

yeah, that's why I said

#

wait

#

what do you mean

#

ok, so i did all the grouping stuff in the description part, to show what the data looks like etc

lapis sequoia Mar 11, 2020, 1:48 AM

#

the problem i'm facing is that the project is about delays that are predicted like 100s of times a day. So i can't just plot a time series on the x-axis and predicted values on y, like when you predict a max temperature for a day or anything like that.

#

I could try to plot averages over all daily errors for each model-type (I'm comparing 3 model types) maybe... and print it against average actual errors (delay - schedule)

#

in a boxplot maybe?

velvet thorn Mar 11, 2020, 1:52 AM

#

yeah, moving average...

lapis sequoia Mar 11, 2020, 1:53 AM

#

what do you mean
in the description part, i showed how the actual delays are distributed for days of the week, hour of day etc.
sure, i could do the same with predictions, but goal is more like making good predictions in general... like for any day over the complete dataset of different trains etc

#

yeah, moving average...
mhh, can you elaborate? I'm considering a moving average a plot that takes previous average errors (let's say, for example, the past 4 days) into account when calculating "todays" error... how would that be an adequate solution?

#

boxplots would probably overlap like 90%. So plotting the average errors for any day for 3 models plus the actual delays from schedule and the baseline prediction (that i'm ultimately trying to beat) I'd probably end up with a very cluttered chart.

#

you're limited only by your creativity
haha, that could be the problem... guess my creativity is just really limited on this 😩 😂 despite having plotted lots of stuff in the description part, i just don't seem to find any good looking way to visualize the results. only bar chart for feature importances and bar chart for RMSE is kinda lame

velvet thorn Mar 11, 2020, 2:36 AM

#

okay I think I don't really understand what you're trying to do

#

which is why maybe my suggestions don't make sense

#

visualisation is quite an intimate art

lapis sequoia Mar 11, 2020, 2:46 AM

#

i don't know how to better explain what i'm trying to do :[ but I'll try

#

i have:

2. original baseline prediction (which is nothing but a linear shift of previous delays into the future)
3. predicted delays using linear regression
4. predicted delays using SVM
5. predicted delays using RF```
and I'd like to have some sort of visualization of the model performance results with high information content, other than just RMSE bar plots and residual plots and maybe feature importances and significance/t-stat in case of RF and lin reg, respectively

#

I'd very much like to incorporate the time-series aspect in one plot

burnt topaz Mar 11, 2020, 7:42 AM

#

hi data scientists of discord 😃, i have a question for you: is pandas and geopandas the same? or they have nothing to do with each other

summer plover Mar 11, 2020, 8:14 AM

#

geopandas builds on pandas but they are different. @burnt topaz

burnt topaz Mar 11, 2020, 8:15 AM

#

@summer plover oh okay

#

i have a problem on #help-coconut maybe you could help me with that

velvet thorn Mar 11, 2020, 8:27 AM

#

why can't you plot actual delays and error against time?

#

am I missing something

lapis sequoia Mar 11, 2020, 11:47 AM

#

@ gm how would you do it? i can't plot a line chart against time to compare the delays with predicted delays, because there is not just one delay per day, but hundreds or thousands actually.
So I could, if at all, do a scatter plot. But then again, the delays are in general all very similar... and plotting thousands of points on top of thousands of points in other colors doesn't help anybody to understand what's going on, right?
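
One way to read the "plot error against time" suggestion, sketched under assumptions: collapse the hundreds of predictions per day into one error value per day and model, so each model becomes a single line over time. The column names and the synthetic numbers below are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic stand-in: 3 days, 500 delay observations per day (in minutes).
n_per_day = 500
days = pd.date_range("2020-03-01", periods=3, freq="D").repeat(n_per_day)
actual = rng.gamma(shape=2.0, scale=3.0, size=len(days))

df = pd.DataFrame({
    "day": days,
    "actual": actual,
    # Hypothetical model outputs: actual delay plus noise of different sizes.
    "baseline": actual + rng.normal(0, 4.0, len(days)),
    "linreg": actual + rng.normal(0, 3.0, len(days)),
    "svm": actual + rng.normal(0, 2.5, len(days)),
    "rf": actual + rng.normal(0, 2.0, len(days)),
})

models = ["baseline", "linreg", "svm", "rf"]

# Absolute error of every single prediction, then one mean per day and model.
abs_err = df[models].sub(df["actual"], axis=0).abs()
daily_mae = abs_err.groupby(df["day"]).mean()

print(daily_mae.round(2))
# daily_mae.plot() would draw one line per model with time on the x-axis.
```

Applying `.rolling(7).mean()` to `daily_mae` would give the moving-average variant mentioned earlier, and the per-day spread could be shown with quantiles instead of the mean.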

tough egret Mar 11, 2020, 1:39 PM

#

Could someone help me? I'm reading a 1.5 GB file and when it reaches 2 million lines it returns "ValueError: Length of values does not match length of index"

#

https://pastebin.com/cLrZhxqy

Pastebin

from datetime import date import numpy as np import pandas as p...

mild topaz Mar 11, 2020, 2:09 PM

#

Traceback (most recent call last):
  File "modeltest.py", line 26, in <module>
    model = load_model("E:/PanModel.model")
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\saving.py", line 492, in load_wrapper
    return load_function(*args, **kwargs)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\engine\saving.py", line 583, in load_model
    with H5Dict(filepath, mode='r') as h5dict:
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\keras\utils\io_utils.py", line 191, in __init__
    self.data = h5py.File(path, mode=mode)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\h5py\_hl\files.py", line 408, in __init__
    swmr=swmr)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python37\lib\site-packages\h5py\_hl\files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'E:/PanModel.model', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

echo monolith Mar 11, 2020, 5:54 PM

#

Please answer this

round crane Mar 11, 2020, 5:55 PM

#

try model = load_model("E:\\PanModel.model")

#

because windows sucks
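
Worth noting: the traceback ends in errno 2, "No such file or directory", and Python on Windows generally accepts forward slashes, so the slash direction may not be the cause. A minimal sketch for checking the path before loading. `describe_model_path` is a hypothetical helper, and the demo runs against a temporary directory rather than the real E: drive.

```python
from pathlib import Path
import tempfile


def describe_model_path(path_str):
    """Return a short diagnosis of why opening `path_str` might fail."""
    path = Path(path_str)
    if path.is_file():
        return "ok"
    if not path.parent.exists():
        return "parent directory (or drive) does not exist"
    return "file not found in an existing directory"


# Demonstration on a temporary directory instead of the real E: drive.
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "PanModel.model"
    existing.write_bytes(b"placeholder")
    result_existing = describe_model_path(str(existing))
    result_missing = describe_model_path(str(Path(tmp) / "Missing.model"))
    result_no_dir = describe_model_path(str(Path(tmp) / "nope" / "PanModel.model"))

print(result_existing, "|", result_missing, "|", result_no_dir)
# If the check returns "ok", model = load_model(path_str) is the next step.
```

If the check fails for `E:/PanModel.model`, the usual suspects are a different drive letter, a different file extension, or a model saved somewhere else.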

echo monolith Mar 11, 2020, 5:55 PM

#

thisssssssssss

📎 Capture.JPG

oblique belfry Mar 11, 2020, 6:08 PM

#

I hate to be a dick, but this is where I just google what the ImageDataGenerator does. There are some things I just don't remember.

jolly briar Mar 11, 2020, 6:09 PM

#

that's not being a dick - it's showing how you'd solve the problem

echo monolith Mar 11, 2020, 6:09 PM

#

oh I got that thing done... it does alll of it.. all options were to be checked

#

Anyways... could you help me with

#

What are the input and output shapes of an embedding layer with vocab_size = 1000 and embedding dimension = 25

oblique belfry Mar 11, 2020, 6:11 PM

#

Well, I try to not answer questions with just "google it", but that is how I got through grad school, so it works. lol

echo monolith Mar 11, 2020, 6:12 PM

#

I've been googling it for the past hour... thanks anyways

oblique belfry Mar 11, 2020, 6:12 PM

#

Sometimes reading the documentation is the best answer I can give.

#

@echo monolith Going off of https://keras.io/layers/embeddings/, the answer is (batch_size, sequence_length, output_dim) where output_dim is the embedding dimension.
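
The input side of that answer is a 2D batch of integer token indices with shape (batch_size, sequence_length). An embedding lookup can be mimicked in NumPy to see both shapes. This is a sketch of the idea and not Keras itself; `batch_size` and `sequence_length` are assumed values.

```python
import numpy as np

vocab_size, embedding_dim = 1000, 25
batch_size, sequence_length = 4, 10   # assumed values, not from the question

rng = np.random.default_rng(0)
# The layer's weights: one row of length embedding_dim per vocabulary index.
embedding_table = rng.normal(size=(vocab_size, embedding_dim))

# Input: integer token indices in [0, vocab_size).
token_ids = rng.integers(0, vocab_size, size=(batch_size, sequence_length))

# Lookup: every index is replaced by its row of the table.
embedded = embedding_table[token_ids]

print(token_ids.shape)   # (4, 10)
print(embedded.shape)    # (4, 10, 25)
```

The weight matrix itself is (vocab_size, embedding_dim), so 1000 x 25 here.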

echo monolith Mar 11, 2020, 6:15 PM

#

thank you very muchhh

lapis sequoia Mar 11, 2020, 7:42 PM

#

Somebody there who is familiar with xarray and netCDF?

silent current Mar 11, 2020, 8:12 PM

#

I'm getting SystemError: <built-in function imread> returned NULL without setting an error

#

I'm passing a jpg file path to cv2.imread()

#

It was working a few minutes ago lol

#

literally didn't touch that portion of the code

eager heath Mar 11, 2020, 8:14 PM

#

Still using the same jpg file?

ripe forge Mar 11, 2020, 8:14 PM

#

Did the file poof? Is the file open? Did your computer turn against you?

silent current Mar 11, 2020, 8:15 PM

#

My computer has turned against me

ripe forge Mar 11, 2020, 8:15 PM

#

Abort abort!

silent current Mar 11, 2020, 8:15 PM

#

printing out the path right before the call to imread():

#

datasets\animals\cat\cats_00001.jpg

ripe forge Mar 11, 2020, 8:16 PM

#

Is the file still actually a proper image?

#

Try opening it outside python

silent current Mar 11, 2020, 8:16 PM

#

👌

📎 cats_00001.jpg

ripe forge Mar 11, 2020, 8:19 PM

#

Pretty. Hmm, I'm not sure. Try restarting python I guess

silent current Mar 11, 2020, 9:39 PM

#

¯_(ツ)_/¯

cosmic crater Mar 12, 2020, 3:21 AM

#

What are good Data Science ideas for beginners looking to build a portfolio? Do you have any suggestions for Environment or Finance related projects?

lapis sequoia Mar 12, 2020, 10:57 AM

#

I am interested in finding out the exactly same thing @cosmic crater

cosmic crater Mar 12, 2020, 10:59 AM

#

@lapis sequoia I would like projects that stick strictly to data analytics and statistics, and not machine learning (for example, anything using linear algebra or differential calculus), as I feel that's too advanced right now.

What about you?

lapis sequoia Mar 12, 2020, 11:01 AM

#

Same, but I am also curious about machine learning @cosmic crater

jolly briar Mar 12, 2020, 11:37 AM

#

@cosmic crater being able to build a clean dataset from open data sources

shell mirage Mar 12, 2020, 11:45 AM

#

Hi,

I am not sure if this is the right channel to ask questions. If it’s not let me know.

So I have a file with X,Y,Z data which I managed to load.

I wanted to visualize it in a 2D plot with z as the color scale. Tried googling but nothing seems to be working from what I tried.

Any suggestions?

Thanks
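
One common approach, sketched with matplotlib and made-up data in place of the loaded file: a scatter plot where `c=z` maps the third column to color, plus a colorbar as the scale.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Made-up X, Y, Z columns standing in for the loaded file.
x = rng.uniform(0, 10, 200)
y = rng.uniform(0, 10, 200)
z = np.sin(x) + np.cos(y)

fig, ax = plt.subplots()
points = ax.scatter(x, y, c=z, cmap="viridis")  # c=z maps z to color
fig.colorbar(points, ax=ax, label="z")
ax.set_xlabel("x")
ax.set_ylabel("y")

# Save to an in-memory buffer; pass a filename instead to write a PNG.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

If the points lie on a regular grid, `ax.pcolormesh` is an alternative, and `ax.tricontourf(x, y, z)` gives filled contours from scattered points.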

cosmic crater Mar 12, 2020, 12:54 PM

#

@jolly briar Can you give example of open data sources. I think for me that's the first step. Finding Open Data Sources, I would like anything Environmental, Any-Science Based Sources, Finance, or Economic.

jolly briar Mar 12, 2020, 1:15 PM

#

@cosmic crater gov sites often have them

#

often buried in excel sheets et

#

*c

cosmic crater Mar 12, 2020, 2:19 PM

#

@jolly briar Thanks

jolly briar Mar 12, 2020, 2:21 PM

#

@cosmic crater search UK gov transport data or something

#

You'll bump into portals and there's loads of stuff

cosmic crater Mar 12, 2020, 2:24 PM

#

@jolly briar I'm going to search for U.S. based data since I'm from the United States. Unless it's climate change related, cuz I know the Trump Administration took down the data on climate change from the EPA and NOAA (I think).

jolly briar Mar 12, 2020, 2:26 PM

#

Whatevers clever 👍

nova nest Mar 12, 2020, 2:42 PM

#

Can anyone explain to me the difference between MarkovChain and Word2Vec? They seem to do the same thing: group up words with the closest context.

thin terrace Mar 12, 2020, 3:43 PM

#

Which models other than neural networks use gradient descent?

desert cradle Mar 12, 2020, 3:46 PM

#

evolutionary programming

#

maybe - i don't know much about this stuff tbh
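
For reference, gradient descent is not tied to neural networks: linear regression, logistic regression, and linear SVMs trained with SGD are commonly fit this way, whereas evolutionary methods are usually described as gradient-free. A minimal sketch for linear regression on synthetic data (the learning rate and iteration count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=200)  # true slope 3, intercept 1

w, b = 0.0, 0.0
learning_rate = 0.1

for _ in range(500):
    error = w * X[:, 0] + b - y
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2.0 * np.mean(error * X[:, 0])
    grad_b = 2.0 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(round(float(w), 2), round(float(b), 2))
```

The loop is the same recipe a neural network uses; only the model and its gradient are simpler.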

deft harbor Mar 12, 2020, 5:51 PM

#

Im working with a keras multi-class classification problem. After fitting the model, how do I get the predictions? When I use model.predict, I'm not ending up with ones and zeros.

#

Is there another tool I should use?

thorny pasture Mar 12, 2020, 7:42 PM

#

How much memory do you think is needed for this field? I know another person was helping me the other day and said it doesn't matter if I do 16, 32, etc cause I will have to spin up a service no matter what when it gets big enough. I just wanted to confirm with some other Data Science people how they feel on that?

I'm looking at getting a 16" Macbook Pro, but I'm not sure what configuration. A lot of friends say 32GB, but they might just be thinking about themselves and what they do.

oblique belfry Mar 12, 2020, 8:03 PM

#

@deft harbor I assume the last layer in your model is a Softmax layer. Run an argmax on the output vector. It will tell you the class with the highest probability. That’s your output.

Note. If there are ten labels you want to predict, the output vector is either (10,) or (1, 10). Make sure argmax is working on the correct dimension.
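
The argmax step could look roughly like this. The `probabilities` array is a made-up stand-in for what `model.predict` returns for three samples and four classes.

```python
import numpy as np

# Stand-in for model.predict(X): 3 samples, 4 classes, each row sums to 1.
probabilities = np.array([
    [0.10, 0.70, 0.15, 0.05],
    [0.60, 0.20, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.80],
])

# axis=1 (or axis=-1) picks the most probable class within each row.
predicted_classes = np.argmax(probabilities, axis=1)
print(predicted_classes)   # [1 0 3]

# If one-hot rows are needed rather than class indices:
one_hot = np.eye(probabilities.shape[1], dtype=int)[predicted_classes]
print(one_hot)
```

Using `axis=0` by mistake would instead return, for each class, the sample with the highest score, which is the dimension mix-up the note above warns about.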

deft harbor Mar 12, 2020, 8:04 PM

#

Thanks, I found that after a lot of digging.

#

Should have thought about it to be honest.

thorny pasture Mar 12, 2020, 10:20 PM

#

Anyone have input on my earlier question?

silent swan Mar 12, 2020, 10:28 PM

#

it depends on your usage

#

I don't do any compute on my laptop, but I'm always watching stuff in the background and have tons of tabs open

#

also pycharm eats memory

#

so I'm beyond stretching my 16gb

thorny pasture Mar 12, 2020, 10:31 PM

#

So you regret only buying 16?

#

@silent swan

silent swan Mar 12, 2020, 10:33 PM

#

yes but also my computer is 5 years old now

#

and I have permanent access to a cluster meaning I don't do any analysis on my laptop at all

thorny pasture Mar 12, 2020, 10:42 PM

#

I'm new, but how did you get permanent access to a cluster @silent swan

silent swan Mar 12, 2020, 10:42 PM

#

university researcher

thorny pasture Mar 12, 2020, 10:42 PM

#

Would you also then recommend I go for the 32GB over 16GB? On my personal PC for gaming I have 16GB and have maybe twice slowed down

#

Oh gotcha

silent swan Mar 12, 2020, 10:43 PM

#

really comes down to your own usage. I guess in my case most of what I'm using the memory for isn't really work related (except pycharm eating memory)

thorny pasture Mar 12, 2020, 10:44 PM

#

Normal usage would be fine, but alas I'm too new to say how much I'd use

#

But worst case I buy aws instance time

silent swan Mar 12, 2020, 10:44 PM

#

well monitor your own ram use now and see how much of a margin you have

thorny pasture Mar 12, 2020, 10:45 PM

#

generally speaking 50-60%

cinder prawn Mar 13, 2020, 3:14 AM

#

16 is most def good enough for data science

oblique belfry Mar 13, 2020, 5:01 AM

#

I have to give my team lead a write up of the ML lifecycle so we can make sure our project managers truly understand machine learning and how it will fit in our companies' workflow.

I'd appreciate it if you can give this a quick overview and let me know if there is anything I have forgotten that I need to add. Tried to really condense this down to easier talking. I could write an entire manifesto on the machine learning lifecycle.

https://docs.google.com/document/d/1slmpunUPAjR_bC8G4LjE8x6bKN9iijrI3SVeE4vMtpU/edit?usp=sharing

Thanks.

Google Docs

Machine Learning Lifecycle

Machine Learning Lifecycle The ML lifecycle is a continuous feedback loop; it is not a sequential operation. You have: Defining Business Objectives Data Acquisition Modeling Auditing Deployment Monitoring The business objectives define the data. The data define the model. The ...

thin terrace Mar 13, 2020, 9:01 AM

#

Which metric of accuracy, precision, recall, f1 and auc should I look at when tuning the hyperparameters of my model? I have an imbalanced dataset 4:1 ratio but I do resample the training data with SMOTEENN to make it balanced, however the test/validation remains imbalanced as it should not be resampled.

royal lodge Mar 13, 2020, 11:28 AM

#

Hi question about gridsearchCV

I thought grid.predict should behave as the predict of the best estimator, but it gives a different output (see first 3 columns)

📎 Screen_Shot_2020-03-13_at_7.25.20_PM.png

#

I'm thinking it's because the estimator stored in grid was only fitted with 90% of the dataset (cv=10). Is this assumption correct?
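
As far as the scikit-learn docs go, with the default `refit=True` the best parameter setting is refit on the whole dataset and `grid.predict` forwards to that refitted `best_estimator_`, so the 90% explanation probably does not apply. A sketch on synthetic data (the estimator and parameter grid are assumptions, since the screenshot's setup is not visible):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=10,
)  # refit=True is the default
grid.fit(X, y)

# grid.predict forwards to the refitted best estimator.
same = np.array_equal(grid.predict(X), grid.best_estimator_.predict(X))
print("same predictions:", same)
```

If the columns still differ, one possible cause is comparing against an estimator that was constructed or fitted separately from `grid.best_estimator_`.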

oblique belfry Mar 13, 2020, 12:58 PM

#

@thin terrace A choice of metric(s) will be influenced by the business objectives.

However, keep an eye on the validation loss as well. You want to minimize the loss as you optimize for hyperparams.

thin terrace Mar 13, 2020, 1:01 PM

#

@oblique belfry business metrics? Why is minimizing loss important and how important is it compared to the mentioned metrics?

oblique belfry Mar 13, 2020, 1:01 PM

#

I meant "business objectives." I corrected that mistake.

#

I have personally chased down metrics to be where I wanted them without realizing the loss was going up. I manipulated the data and the model to get me the metric that I wanted without realizing it was learning less.

thin terrace Mar 13, 2020, 1:10 PM

#

So I'm classifying the default credit card tabular dataset (binary).

oblique belfry Mar 13, 2020, 1:10 PM

#

I have a similar skepticism of p-values. One can "p-hack" all day, but does that mean that their model(s) is/are effective?

Just try to have a holistic view on everything.

#

Is this for practice or for your business?

thin terrace Mar 13, 2020, 1:11 PM

#

just a part of a ML experiment im running

#

Trying to learn some hyperparam optimization

oblique belfry Mar 13, 2020, 1:12 PM

#

Ah....

#

I was probably a bit overkill with my advice.

thin terrace Mar 13, 2020, 1:13 PM

#

I'm basically doing random grid search and I compile the metrics for each param-setting before I go on and actually use the best model

#

Now I just need to know which metric to look for when deciding the best one

oblique belfry Mar 13, 2020, 1:15 PM

#

I'd watch the true positive / false positive rates (which are captured by precision/recall but I can't remember which.) For fraud, classifying something as okay when it is actually fraud is more costly than over-identifying transactions as fraud. I'd go for a large true positive - false positive gap. But, that is just my approach to it.

#

I'd go with this approach because you are looking at how the model does on each class, and not in its entirety.

#

If you have a 4/1 split of non-fraud/fraud data and you get 80% accuracy, this might seem great. But if you look at precision/recall, you might see that you labeled 99% of the non-fraud labels correctly, and got 1% of the fraud labels correct. Obviously, this is not good.
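
The extreme version of that scenario is easy to reproduce: a model that always predicts the majority class on a 4:1 split. The labels below are made up, and the `zero_division=0` argument assumes scikit-learn 0.22 or newer.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# 4:1 split: 80 negatives (0) and 20 positives (1); made-up labels.
y_true = [0] * 80 + [1] * 20
# A model that always predicts the majority class.
y_pred = [0] * 100

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred, zero_division=0)
recall = recall_score(y_true, y_pred, zero_division=0)
f1 = f1_score(y_true, y_pred, zero_division=0)

print(accuracy, precision, recall, f1)
```

Accuracy comes out at 0.8 while recall and F1 for the positive class are 0, which is why accuracy alone is a weak tuning target on imbalanced data.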

thin terrace Mar 13, 2020, 1:20 PM

#

Yes I'm aware of that part

#

However, this dataset does not classify frauds

#

it classifies whether a clients credit card will be defaulted next month or not

#

Maybe I should just go for f1-score?

oblique belfry Mar 13, 2020, 1:27 PM

#

Once again, another assumption. My bad. But the logic is still sound.

#

yeah. F1 seems like a good start. Should get you where you need to go.

This wiki might be overkill, but definitely helped me get that there were more metrics than just the normal accuracy. https://en.wikipedia.org/wiki/Sensitivity_and_specificity

Sensitivity and specificity

Sensitivity and specificity are statistical measures of the performance of a binary classification test, also known in statistics as a classification function, that are widely used in medicine:

Sensitivity (also called the true positive rate, the recall, or probability of det...

#

What library are you using to train?

thin terrace Mar 13, 2020, 1:34 PM

#

keras

oblique belfry Mar 13, 2020, 1:35 PM

#

There should be an F1 metric callback. Probably in the tf.keras.metrics in tensorflow 2.x.

thin terrace Mar 13, 2020, 1:37 PM

#

didnt keras remove their metrics because it's misleading to use during training?

#

https://github.com/keras-team/keras/wiki/Keras-2.0-release-notes

GitHub

keras-team/keras

Deep Learning for humans. Contribute to keras-team/keras development by creating an account on GitHub.

oblique belfry Mar 13, 2020, 1:38 PM

#

So....I know keras does not have metrics. tf.keras does.

#

https://www.tensorflow.org/api_docs/python/tf/keras/metrics

TensorFlow

Module: tf.keras.metrics | TensorFlow Core v2.1.0

thin terrace Mar 13, 2020, 1:39 PM

#

"Basically these are all global metrics that were approximated
batch-wise, which is more misleading than helpful. This was mentioned in
the docs but it's much cleaner to remove them altogether. It was a mistake
to merge them in the first place."

#

As I understand it, they should only be calculated once training is done?
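
A tiny illustration of why the batch-wise approximation can mislead: averaging per-batch F1 is not the same as computing F1 once over all predictions. The two batches are made up, and `f1` is a hand-written helper for the positive class.

```python
def f1(y_true, y_pred):
    """F1 for the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Two made-up batches of (labels, predictions).
batches = [
    ([1, 1, 1, 0], [1, 1, 1, 0]),   # perfect batch, F1 = 1.0
    ([1, 0, 0, 0], [0, 1, 1, 0]),   # no true positives, F1 = 0.0
]

batch_average = sum(f1(t, p) for t, p in batches) / len(batches)

all_true = [label for t, _ in batches for label in t]
all_pred = [label for _, p in batches for label in p]
global_f1 = f1(all_true, all_pred)

print(batch_average, global_f1)
```

The two numbers differ, so computing the metric once on the full validation set after training (or with a stateful metric that accumulates counts) is the safer route.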

oblique belfry Mar 13, 2020, 1:40 PM

#

Lol.

So....you bring up a good point. I had to modify my own keras ones to be stateful for my old work. I think the tf.keras.metrics does this automatically.

#

So to have less work, probably ought to do it at the end. lol Will def be easier for you.

#

Thank goodness scikit-learn has nice helper functions for F1 score and Confusion Matrices. You can def use that.

thin terrace Mar 13, 2020, 1:42 PM

#

yeah, atm I calculate the metrics after the training at each fold of my 10-fold cross-validation, then I take the average of all 10 folds

#

using sklearn yes

#

then I get tables like this one

📎 unknown.png

oblique belfry Mar 13, 2020, 1:43 PM

#

I laugh because I learned that metrics thing the hard way and spent MANY hours reading Keras source code to get what I wanted working correctly.

thin terrace Mar 13, 2020, 1:43 PM

#

where each row is the metrics for a setting of parameters

lapis sequoia Mar 13, 2020, 5:40 PM

#

can anybody recommend some brief but useful NN tutorials? xD

#

i need to implement emoji2vec into my project and realising I have a limited timeframe to complete it

lapis sequoia Mar 13, 2020, 8:00 PM

#

What does this exactly mean?

#

1 threads and 100 connections

ripe forge Mar 14, 2020, 12:39 AM

#

Without context, tough to say

#

If I had to guess, it means a server that can only handle 1 request at a time, since it only runs on 1 thread. And then there's like 100 connection requests.

lapis sequoia Mar 14, 2020, 4:30 AM

#

I need someone to explain this to me so it'll stick

#

I want to understand why I need to include continent in the SELECT here

#

intuitively it makes sense, but I want to remember for the long run

#

SELECT continent, max(women_in_parliament)
FROM countries
GROUP BY continent
ORDER BY continent

velvet thorn Mar 14, 2020, 5:14 AM

#

when you group by a column and aggregate, you get one aggregated value for every unique value in the column.

#

max(women_in_parliament) is the aggregated value...

#

...and continent is the unique value it corresponds to.
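
The explanation above can be tried out with Python's built-in `sqlite3`. The table contents are made up; only the table and column names come from the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries (name TEXT, continent TEXT, women_in_parliament REAL)"
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [
        ("A", "Africa", 61.3),
        ("B", "Africa", 20.0),
        ("C", "Europe", 47.3),
        ("D", "Europe", 30.5),
    ],
)

query = """
    SELECT continent, MAX(women_in_parliament)
    FROM countries
    GROUP BY continent
    ORDER BY continent
"""
with_label = conn.execute(query).fetchall()

# Same aggregation without continent in the SELECT: the numbers lose their labels.
without_label = conn.execute(
    "SELECT MAX(women_in_parliament) FROM countries "
    "GROUP BY continent ORDER BY continent"
).fetchall()

print(with_label)      # [('Africa', 61.3), ('Europe', 47.3)]
print(without_label)   # [(61.3,), (47.3,)]
conn.close()
```

The grouping works either way; selecting `continent` is what tells you which group each maximum belongs to.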

lapis sequoia Mar 14, 2020, 8:01 AM

#

thanks.. now I can remember this

slow yew Mar 14, 2020, 3:17 PM

#

https://youtu.be/t1ViDz0MnJE

YouTube

Matt Jennings

Neural Network Learns To Play Club Penguin with Genetic Evolution!

Here, we teach a neural network to learn to play that classic Ice Fishing game from Club Penguin!

We first try using a fully-connected recurrent network with LSTM nodes, trained on human gameplay. We then use the NEAT genetic algorithm to evolve the neural connections from s...


jaunty basin Mar 14, 2020, 3:31 PM

#

14 year old trying to figure out how i could get a p-value from quantum random numbers with a range of -2, and 2 in python. if someone can help me that would be much appreciated. my goal here is to see if consciousness intent has any sway over quantum random numbers. kinda like what this university is doing. http://noosphere.princeton.edu/
add me on discord: leyland124#3364

The Global Consciousness Project

The Global Consciousness Project, home page, scientific research network studying global consciousness

#

https://www.cia.gov/library/readingroom/docs/CIA-RDP96-00789R002200520001-0.pdf

lapis sequoia Mar 14, 2020, 6:07 PM

#

anyone knows how to print predictions on an SVM in a for loop?

#

import pandas as pd
from sklearn.model_selection import train_test_split as tts
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix
import numpy as np
import matplotlib.pyplot as plt

dataset = pd.read_excel('C:\\Users\\GentB\\OneDrive\\Documents\\Python\\2020\\FootballPredictions.py\\data.xlsx', 
                         sheet_name='Dataset')
dataset = dataset.head(500)

X = dataset.drop('Result', axis=1)
y = dataset['Result']

X_train, X_test, y_train, y_test = tts(X, y, test_size = 0.20)

svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)

y_pred = svclassifier.predict(X_test)

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

#

the output I get is only the accuracy

 [ 1 39]]
              precision    recall  f1-score   support

          -1       0.98      0.98      0.98        60
           1       0.97      0.97      0.97        40

    accuracy                           0.98       100
   macro avg       0.98      0.98      0.98       100
weighted avg       0.98      0.98      0.98       100

hollow shard Mar 14, 2020, 7:41 PM

#

Does anyone know why this function with the guvectorize target set to cpu works, but when I set it to cuda, it gives me this error:

Invalid use of Function(<built-in function sub>) with argument(s) of type(s): (array(float64, 1d, A), array(float64, 1d, A))
Known signatures:
 * (int64, int64) -> int64

[The known signatures list goes on for a bit but you get the point]
Code:

@guvectorize([(float64[:], float64[:], float64, float64, float64[:])], "(n),(n),(),() -> (n)", nopython=True, target="cuda")
def calc1(a, b, g, m, out):
    vec = a-b
    r = ((a[0]-b[0])**2+(a[1]-b[1])**2)**0.5
    out = m*g*vec/(r*r*r)

#

I think I shouldn't be getting this error, as the two numbers I'm subtracting are float64, which is supported like it says in the error

#

Thought you guys might know since this is a numba question

paper niche Mar 15, 2020, 3:54 AM

#

@lapis sequoia what do you mean? just print(y_pred)?
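
If the aim is one line per prediction, a loop over `y_pred` paired with `y_test` is one option. The sketch uses synthetic data in place of the Excel sheet and keeps the variable names from the code above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data in place of the Excel sheet.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

svclassifier = SVC(kernel="linear")
svclassifier.fit(X_train, y_train)
y_pred = svclassifier.predict(X_test)

# One line per test row: predicted label next to the actual one.
for row_number, (predicted, actual) in enumerate(zip(y_pred, y_test)):
    print(f"row {row_number}: predicted={predicted} actual={actual}")
```

With a pandas `y_test`, `zip` still pairs the values in order, so the same loop applies to the original dataset.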

lapis sequoia Mar 15, 2020, 12:08 PM

#

@paper niche yeah figured it out later on

wanton wasp Mar 15, 2020, 7:55 PM

#

I have a question if anybody can help me...
I'm working on a project and I want to gather Twitter data. I want to refrain from using the Twitter API, and I stumbled upon a module on GitHub called twint.
It's more or less perfect for what I need, but I always get an error after about 8000 scraped tweets.
Does anybody know any other way of going about this?

silent current Mar 15, 2020, 8:09 PM

#

So sklearn.KNeighborsClassifier has parameters for both metric and p. I'd like to test my model using the manhattan metric (which I was always under the impression implied p=1), and the euclidean metric (again, I assumed this implied p=2...). Do I need to be changing both parameters?

velvet thorn Mar 15, 2020, 11:41 PM

#

no

#

you don't

#

just change either
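
To make "either" concrete: with scikit-learn's default `metric="minkowski"`, `p=1` is Manhattan and `p=2` is Euclidean, while `metric="manhattan"` or `metric="euclidean"` names the distance directly. A sketch on synthetic data comparing the two spellings of Manhattan:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Manhattan distance, written two ways.
by_metric = KNeighborsClassifier(metric="manhattan").fit(X, y)
by_p = KNeighborsClassifier(metric="minkowski", p=1).fit(X, y)

# Euclidean distance is the default (metric="minkowski", p=2).
euclidean = KNeighborsClassifier().fit(X, y)

same_manhattan = np.array_equal(by_metric.predict(X), by_p.predict(X))
print("manhattan == minkowski with p=1:", same_manhattan)
```

For a grid search, varying only `p` in `[1, 2]` with the default metric covers both cases.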

lapis sequoia Mar 16, 2020, 7:22 AM

#

I need some advice calculating vCPUs

#

can someone help me calculate, I'm not familiar with how to read machine sizing

granite steppe Mar 16, 2020, 10:57 AM

#

so i have found these MIT courses for linear algebra and single variable calculus

#

https://ocw.mit.edu/courses/mathematics/18-06sc-linear-algebra-fall-2011/index.htm

MIT OpenCourseWare

Linear Algebra

This course covers matrix theory and linear algebra, emphasizing topics useful in other disciplines such as physics, economics and social sciences, natural sciences, and engineering. It parallels the combination of theory and applications in Professor Strang’s textbook Introdu...

#

https://ocw.mit.edu/courses/mathematics/18-01sc-single-variable-calculus-fall-2010/index.htm

MIT OpenCourseWare

Single Variable Calculus

This calculus course covers differentiation and integration of functions of one variable, and concludes with a brief discussion of infinite series. Calculus is fundamental to many scientific disciplines including physics, engineering, and economics.

#

was wondering, is this enough for understanding a basic level of linear algebra and calculus, or should i look for more resources regarding these topics?

#

im trying to revise on the maths needed for data science ....

crimson flame Mar 16, 2020, 1:39 PM

#

should also learn some multivariable calculus

#

but it's mostly the easy stuff there that's useful (partial derivatives, directional derivatives, gradient, etc)

jolly briar Mar 16, 2020, 2:31 PM

#

@granite steppe most people who've been through uni won't remember most of their courses

granite steppe Mar 16, 2020, 2:33 PM

#

sounds fair enough..thnx for the info @crimson flame @jolly briar

crimson flame Mar 16, 2020, 2:52 PM

#

and probability/stats

small tartan Mar 17, 2020, 2:06 AM

#

Hey guys! I need a push in the right direction

#

I have 2 tables:

#

📎 unknown.png

#

Table2:

📎 unknown.png

#

This is just proof of concept. will be built in sql and displayed in tableau

#

Table1 is by week. Table2 is by month

#

This is content usage data and quota attainment with dummy data

#

I need to perform some kind of operation to get the desired outcome.

#

Desired outcome: Sort content by 'best' to 'worst' judging by the quota that was attained during its months usage.

#

Any ideas or directions to research? I'm kind of at a standstill and its late in the day so my head is not worth much

real wigeon Mar 17, 2020, 2:51 AM

#

heck I just wanna know how to set a condition for "if it's the first cell in a specific row, then do this"

velvet thorn Mar 17, 2020, 2:52 AM

#

cell, meaning Excel?

real wigeon Mar 17, 2020, 2:52 AM

#

yea

#

iloc?

velvet thorn Mar 17, 2020, 2:52 AM

#

huh?

#

didn't you say Excel?

real wigeon Mar 17, 2020, 2:53 AM

#

pandas df

#

should I use iloc?

velvet thorn Mar 17, 2020, 2:53 AM

#

so you mean reading a spreadsheet into a pandas dataframe

#

and working with it with the pandas API?

real wigeon Mar 17, 2020, 2:53 AM

#

ya

velvet thorn Mar 17, 2020, 2:53 AM

#

and not openpyxl?

real wigeon Mar 17, 2020, 2:53 AM

#

correct

velvet thorn Mar 17, 2020, 2:53 AM

#

okay, then that's not Excel, that's pandas

#

anyway

#

how do you identify the row?

real wigeon Mar 17, 2020, 2:53 AM

#

I can show my code

#

!paste

arctic wedgeBOT Mar 17, 2020, 2:54 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

real wigeon Mar 17, 2020, 2:54 AM

#

https://paste.pythondiscord.com/zedewihilo.py

velvet thorn Mar 17, 2020, 2:55 AM

#

so what do you want to do?

real wigeon Mar 17, 2020, 2:55 AM

#

lines 13 - 20

#

basically my script spits out some strings into a .txt file

#

and i need to do certain things depending on the type of information in certain columns from the df

#

at the start of the df I need to print a certain set of strings

#

and I don't know how to articulate that into python code

#

so if on index 1 I guess?

#

does that make any sense?

velvet thorn Mar 17, 2020, 2:58 AM

#

a little...?

#

so basically

#

that's this line, right

#

if branch == df.iloc[0,0]:

real wigeon Mar 17, 2020, 2:59 AM

#

yeah

velvet thorn Mar 17, 2020, 2:59 AM

#

which is the first cell of the first row

real wigeon Mar 17, 2020, 2:59 AM

#

I believe

#

IDK if that includes the header though?

#

but either way i need to make it so that on the first row/cell adpo_x_syntax saves a set of instructions

#

that script has 3 sets of strings depending on conditions, but I only need help with what I just wrote

velvet thorn Mar 17, 2020, 3:01 AM

#

uh...

#

why don't you just if on i?

real wigeon Mar 17, 2020, 3:02 AM

#

I actually tried that as I was explaining this to you

#

```python
            if i == 1:
                adpo_x_syntax = [
                    'Key tab',
                    'Type ' + str(buyer),
                    'Type ' + str(int(branch)),
                    'Type ' + str(int(vendor)),
                    'Key Enter',
                ]
```

velvet thorn Mar 17, 2020, 3:02 AM

#

also I don't really see why you wouldn't use itertuples, too

real wigeon Mar 17, 2020, 3:02 AM

#

uh because the df reuses branch values

velvet thorn Mar 17, 2020, 3:03 AM

#

also, that should be 0, not 1

#

unless your index specifically starts from 1

#

for the general case you can just use if i == df.index[0]
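
A minimal sketch of why comparing against `df.index[0]` is the safer test, assuming a made-up frame whose index labels start at 5 instead of 0 (as can happen after filtering or sorting). The column name and values are illustrative only.

```python
import pandas as pd

# Hypothetical data: the index labels are 5, 6, 7, not 0, 1, 2.
df = pd.DataFrame({"branch": [10, 10, 20]}, index=[5, 6, 7])

first_row_labels = []
for i, row in df.iterrows():
    # i is the index label of the row, not its position.
    if i == df.index[0]:
        first_row_labels.append(i)

print(first_row_labels)   # the label of the first row only
print(0 in df.index)      # a hard-coded `i == 0` would never match here
```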

real wigeon Mar 17, 2020, 3:03 AM

#

well is index 0 the header

#

?

velvet thorn Mar 17, 2020, 3:06 AM

#

depends on what you mean by "header" and how your file is formatted

real wigeon Mar 17, 2020, 3:06 AM

#

it's an xls

#

and it has a header

#

and it's still not printing the right condition

velvet thorn Mar 17, 2020, 3:07 AM

#

so you mean the actual Excel header

real wigeon Mar 17, 2020, 3:08 AM

#

I'm asking if df.index[0] corresponds to the header row, because my data starts in the 2nd row

velvet thorn Mar 17, 2020, 3:08 AM

#

wait.

#

📎 unknown.png

#

do you mean the actual HEADER, or the first row which happens to contain the column names?

real wigeon Mar 17, 2020, 3:09 AM

#

row which contains the column names

velvet thorn Mar 17, 2020, 3:09 AM

#

okay, then that's not a header

#

it has a specific meaning in Excel

#

anyway, by default, the first row will be parsed as column names.

#

and therefore not part of the data

#

therefore the row you access with .iloc[0] refers to the second row in the Excel spreadsheet
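
This can be checked on a tiny made-up table. The sketch reads CSV text instead of an .xls file so that it runs without a spreadsheet on disk; `pandas.read_excel` also treats the first row as column names by default (`header=0`), so the same reasoning applies. The column names and values are invented for illustration.

```python
import io

import pandas as pd

csv_text = "branch,item\n10,apple\n10,pear\n20,plum\n"

# Default behaviour: the first line becomes the column names.
df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))   # names taken from the first line of the file
print(df.iloc[0])         # first data row, i.e. the second line of the file
print(len(df))            # the header line is not counted as data

# With header=None the first line is kept as an ordinary data row instead.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(no_header.iloc[0].tolist())
```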

real wigeon Mar 17, 2020, 3:10 AM

#

I see

velvet thorn Mar 17, 2020, 3:10 AM

#

try playing with it in an interactive session

#

it'll be easier for you to understand

real wigeon Mar 17, 2020, 3:10 AM

#

I can conceptualize it

#

but regardless running a if i == df.index[0]: still doesn't print the right set

#

it is printing the lines 49 and greater

velvet thorn Mar 17, 2020, 3:12 AM

#

print both values and see what you get

real wigeon Mar 17, 2020, 3:13 AM

#

please be more specific, which values

#

im referring to the variable adpo_x_syntax

velvet thorn Mar 17, 2020, 3:14 AM

#

i

#

actually, just play with the dataframe interactively

#

this should be pretty simple to debug if you actually have the data...?

#

like

#

print(list(df.iterrows())[0])

real wigeon Mar 17, 2020, 3:16 AM

#

?

#

you're asking me to print the i of an else condition

velvet thorn Mar 17, 2020, 3:18 AM

#

no

#

in the for loop

#

like

#

print(list(df.iterrows())[0][0]) this gives you the first value of i

#

compare that with df.index[0]

real wigeon Mar 17, 2020, 3:22 AM

#

```python
with open('excel_po.txt', 'a+') as f:
    for i, row in df.iterrows():
        branch, item, distro_size, delivery_date, buyer, vendor = row
        print(df.index[0])
        print("---------")
        print(list(df.iterrows())[0][0])
        if pd.isnull(branch):
```

#

```
0
---------
0
0
0
---------
0
1
0
---------
0
2
0
---------
0
3
0
---------
```

#

so on and so forth

velvet thorn Mar 17, 2020, 3:25 AM

#

uh

#

your if conditions are wrong

#

I think you want an elif in the middle

#

also I actually meant to have those print statements outside, since the input doesn't change

real wigeon Mar 17, 2020, 3:26 AM

#

outside the?

velvet thorn Mar 17, 2020, 3:26 AM

#

loop

#

doesn't matter

real wigeon Mar 17, 2020, 3:27 AM

#

well df.index[0] corresponds to 0

#

and

#

print(list(df.iterrows())[0][0])

#

the first value is 0

velvet thorn Mar 17, 2020, 3:28 AM

#

uh

#

the point is

#

they don't change

#

df.index[0] and list(df.iterrows())[0][0] are constants.

#

but anyway

#

like I said

#

you have if followed by an if-else

#

and your conditions are such that

#

the else will trigger if the if does, from what I see

#

if i == 0:

and then:

if branch != df.iloc[i - 1, 0] and i != 0:
    ...
else:
    ...

logically, therefore, the else will trigger if at least one of those conditions is not true

#

i.e. it will trigger in all cases where i == 0

#

since you use the same variable in all branches, the value in it will be overwritten
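
The overwrite can be reproduced without pandas. In this sketch the function names, arguments, and return strings are hypothetical stand-ins for the three sets of instructions in the pasted script; only the shape of the conditions follows the discussion.

```python
def buggy(i, branch, previous_branch):
    """An `if` followed by a separate `if`/`else`, as in the first paste."""
    syntax = None
    if i == 0:
        syntax = "first-row block"
    # This is a new, independent statement: it runs even when i == 0.
    if branch != previous_branch and i != 0:
        syntax = "new-branch block"
    else:
        syntax = "same-branch block"   # overwrites the first-row block
    return syntax


def fixed(i, branch, previous_branch):
    """A single if/elif/else chain: exactly one branch runs."""
    if i == 0:
        return "first-row block"
    elif branch != previous_branch:
        return "new-branch block"
    else:
        return "same-branch block"


print(buggy(0, 10, None))   # the first-row value is lost
print(fixed(0, 10, None))   # the first-row value survives
```

Once the first test sits at the top of one chain, the `i != 0` check in the second condition is no longer needed, because the chain only reaches it when `i` is not 0.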

real wigeon Mar 17, 2020, 3:32 AM

#

https://paste.pythondiscord.com/uyusuxuloy.py

#

i mean i had if pd.isnull(branch):

#

if i == df.index[1]:

#

elif branch != df.iloc[i - 1, 0] and i != 0:

#

else:

real wigeon Mar 17, 2020, 4:04 AM

#

i feel like you are mistaken, these aren't nested

serene scaffold Mar 17, 2020, 4:49 AM

#

Are there any O'Reilly books that go into word embeddings? I decided to write a paper on them for my linear algebra class and I want to write some pseudocode (which will probably just be Python) in the paper for how they're created and used mathematically.

velvet thorn Mar 17, 2020, 5:18 AM

#

yes, precisely, they aren't nested

#

in the first one

#

and in the second one you're comparing to df.index[1]

#

which is the second row

oblique belfry Mar 17, 2020, 2:50 PM

#

Has anyone successfully integrated Metaflow and DVC together?

flint hamlet Mar 17, 2020, 4:06 PM

#

Hey anyone here that visualized WhatsApp Chat Data with Python? A few weeks ago I tried it but had some problems. I wanna try it again later and I was curious if anyones willing to help me out with it when the problems occur again.

#

Not sure if I should ask it here or in any of the help channels. Just tell me when I'm doing something wrong

brazen canyon Mar 17, 2020, 4:06 PM

#

I'm interested. I'm not so much of a data science guru tho🧐

flint hamlet Mar 17, 2020, 4:07 PM

#

Cool. Mind if I add or dm you so I remember you later this evening? I still have to finish some stuff so I might do it in a few hours or tomorrow. @brazen canyon

brazen canyon Mar 17, 2020, 4:08 PM

#

I don't mind. Feel free to dm or add me

thorny pasture Mar 17, 2020, 5:30 PM

#

What editors do you guys use? Someone recommended me Anaconda, but I'm pretty sure that's super bloated no?

oblique belfry Mar 17, 2020, 5:37 PM

#

VSCode.

ripe forge Mar 17, 2020, 5:40 PM

#

i like my anaconda. anaconda isnt an editor though

#

anaconda is a "batteries included" approach to python/data science. comes with a lot of goodies. So, it depends on whether you want that or not

#

(the editors anaconda would ship with would be jupyter notebook and spyder)

thorny pasture Mar 17, 2020, 5:51 PM

#

@ripe forge @oblique belfry The guy is saying he personally uses a Text Editor and a separate IPython REPL. Do you need a REPL, is there a reason for it?

ripe forge Mar 17, 2020, 5:52 PM

#

that's where python runs

#

without a place to run python, you're basically writing in the equivalent of a notepad or word document

#

(with some fancy features. ) 😛

#

anyways when i was learning this stuff, i personally liked using things that didn't make me worry about these kinds of details

oblique belfry Mar 17, 2020, 5:53 PM

#

Ah. I use VSCode for most things. If I need to interact with stuff, I'll use IPython.

I like Jupyter/IPython, but I tend to just run everything as scripts.

thorny pasture Mar 17, 2020, 5:53 PM

#

Im so confused

ripe forge Mar 17, 2020, 5:54 PM

#

if you're confused, download ONE thing

#

draw a chit, whatever.

thorny pasture Mar 17, 2020, 5:54 PM

#

Why do you need multiple really though

ripe forge Mar 17, 2020, 5:54 PM

#

they all work.

#

don't ask that question right now 😛

thorny pasture Mar 17, 2020, 5:54 PM

#

no like repl and editor

oblique belfry Mar 17, 2020, 5:54 PM

#

If I need to make plots or visualize images, then I'll use Jupyter.

#

Ah.

thorny pasture Mar 17, 2020, 5:54 PM

#

For instance I run the code on Sublime and get

📎 unknown.png

#

Nothing wrong with viewing it here

ripe forge Mar 17, 2020, 5:55 PM

#

then you're "running python" behind the scenes

thorny pasture Mar 17, 2020, 5:55 PM

#

behind the scenes?

#

from bs4 import BeautifulSoup as Soup
import requests
from pandas import DataFrame

ffc_response = requests.get(
    "https://fantasyfootballcalculator.com/adp/ppr/12-team/all/2017"
)


adp_soup = Soup(ffc_response.text, "html.parser")

# adp_soup is a nested tag, so call find_all on it

tables = adp_soup.find_all("table")

# find_all always returns a list, even if there's only one element, which is
# the case here
len(tables)

# get the adp table out of it
adp_table = tables[0]

# adp_table another nested tag, so call find_all again
rows = adp_table.find_all("tr")

# this is a header row
rows[0]

# data rows
first_data_row = rows[1]
first_data_row

# get columns from first_data_row
first_data_row.find_all("td")

# comprehension to get raw data out -- each x is simple tag
[str(x.string) for x in first_data_row.find_all("td")]

# put it in a function
def parse_row(row):
    """
    Take in a tr tag and get the data out of it in the form of a list of
    strings.
    """
    return [str(x.string) for x in row.find_all("td")]


# call function
list_of_parsed_rows = [parse_row(row) for row in rows[1:]]

# put it in a dataframe
df = DataFrame(list_of_parsed_rows)
df.head()

# clean up formatting
df.columns = [
    "ovr",
    "pick",
    "name",
    "pos",
    "team",
    "adp",
    "std_dev",
    "high",
    "low",
    "drafted",
    "graph",
]

float_cols = ["adp", "std_dev"]
int_cols = ["ovr", "drafted"]

df[float_cols] = df[float_cols].astype(float)
df[int_cols] = df[int_cols].astype(int)

df.drop("graph", axis=1, inplace=True)

# done
print(df.head())

oblique belfry Mar 17, 2020, 5:55 PM

#

I read this question wrong. So sorry to give a confusing answer.

ripe forge Mar 17, 2020, 5:55 PM

#

when you invoke python, it goes and does its running and stuff, and then just throws you the output back

#

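aka, you already have a way to run python (strictly that's the plain interpreter running your script rather than an interactive REPL, but the result is the same)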

#

so yeah...you dont need anything else 😛

#

ipython is nice though

thorny pasture Mar 17, 2020, 5:56 PM

#

what for exactly?

ripe forge Mar 17, 2020, 5:56 PM

#

so, the big power of python comes when you dont just run it as a script

#

but rather run it interactively

#

ipython does wonders when you're trying to run stuff back and forth

thorny pasture Mar 17, 2020, 5:57 PM

#

back and forth

#

?

ripe forge Mar 17, 2020, 5:57 PM

#

(as in, imagine running just first 3 lines of your program, getting the output, then continuing to work, writing couple more lines, but selecting and choosing whatever you really want to run)

#

it's one of the things i loved most about python when paired with a good IDE

thorny pasture Mar 17, 2020, 5:57 PM

#

Sounds very weird!

ripe forge Mar 17, 2020, 5:57 PM

#

it IS

#

and you'll get the outputs wrong so many times initially

thorny pasture Mar 17, 2020, 5:58 PM

#

DS sounds so complex haha

ripe forge Mar 17, 2020, 5:58 PM

#

but there's just a charm of just, you know..instantly selecting a variable, and running it in REPL, and it spits out its value

#

without having to rerun the whole script

#

you can even run code out of order. not recommended initially, at all!

#

it lets you essentially "Experiment" with writing the logic of the code, and quickly running just that line

#

leads to some insane boost in productivity once you get used to it

#

if it all sounds like hand wavey and fancy, don't worry, it's probably meant to be hand wavey and fancy. just use python any way you prefer.

thorny pasture Mar 17, 2020, 6:00 PM

#

📎 unknown.png

#

Thats what im told from someone else

#

lol

ripe forge Mar 17, 2020, 6:00 PM

#

mhm

#

opinions everywhere

thorny pasture Mar 17, 2020, 6:00 PM

#

yeah.

ripe forge Mar 17, 2020, 6:00 PM

#

fwiw, i give vs code full points too. it's not bad at all

#

just, my personal first choice is spyder still. somehow vs code makes me feel "cramped"

thorny pasture Mar 17, 2020, 6:01 PM

#

conflicted as hell

ripe forge Mar 17, 2020, 6:01 PM

#

(and in terms of simply market share on IDE, actually pycharm is on top. but again, pick whatever. they all do the same thing)

#

literally, pick one at random.

#

not like your choice is locked for life 😛

thorny pasture Mar 17, 2020, 6:02 PM

#

but its like totally different anaconda has that spyder thing and code one side

ripe forge Mar 17, 2020, 6:02 PM

#

anaconda is a pretty painless introduction to python on windows imo

thorny pasture Mar 17, 2020, 6:03 PM

#

Well I know how to install pandas and whatnot

#

and I have a Mac as well

ripe forge Mar 17, 2020, 6:03 PM

#

cool. in that case, pick whatever!

thorny pasture Mar 17, 2020, 6:04 PM

#

is the only reason to use anaconda cause it installs pandas and numpy, etc

ripe forge Mar 17, 2020, 6:04 PM

#

hmm

#

well, there's the conda environment/package manager as well

#

also the fact that it gives everything you need out of the box i suppose

#

those really are the big things. you can achieve the same kind of setup if you like without anaconda too.

#

though, the dependency resolution of conda packages is pretty amazing

#

makes installing some stuff a breeze, that would have been a pain to manage manually

#

geopandas and tensorflow come to mind. though i believe tensorflow fixed their issues and now pip install works just fine too

#

(also, not to mention, you can have anaconda, and then use vscode or some other editor too)

oblique belfry Mar 17, 2020, 6:20 PM

#

I personally am not a fan of Conda package management.

#

I start to use repls when I start adding a bunch of print statements everywhere.

thorny pasture Mar 17, 2020, 7:00 PM

#

What do you use as a REPL

#

@oblique belfry

#

I see VS Code has a REPL like Jupyter inside it

oblique belfry Mar 17, 2020, 7:06 PM

#

A mix between IPython and Jupyter.

thorny pasture Mar 17, 2020, 7:06 PM

#

Why a mix?

oblique belfry Mar 17, 2020, 7:08 PM

#

If I need to check the functionality of something quickly, I will use IPython3. But if I am doing some sort of exploratory data analysis, then I will spin up Jupyter.

#

Most of the time, I am ssh-ing into servers, and I don't feel like setting up Jupyter.

thorny pasture Mar 17, 2020, 7:09 PM

#

So you do all the actual coding in vscode or another and then sometimes you open up a IPython Notebook like the web based ones?

oblique belfry Mar 17, 2020, 7:11 PM

#

If I am doing any type of data analysis, I will use Jupyter. Then I will move to vscode as I get more familiar with the data.

If there are some functions or classes I want to quickly test, I will open up ipython.

thorny pasture Mar 17, 2020, 7:11 PM

#

So you dont start in text editor you start in jupyter

#

What makes you not use Anaconda Tony?

#

And instead do it how you mentioned

oblique belfry Mar 17, 2020, 7:14 PM

#

It depends on what I am doing.

I'd say I spend 85% of my time in vscode, 10% in ipython, and 5% in Jupyter.

#

It is probably a lot simpler now, but when I first tried all the anaconda stack 2-3 years ago, it was a pain. I could create a virtualenv and just use pip just as easily.

thorny pasture Mar 17, 2020, 7:16 PM

#

create a virtualenv?

#

I have like no DS experience whatsoever, what is that needed for?

#

I agree with pip though

oblique belfry Mar 17, 2020, 7:19 PM

#

Do you have much experience with Python? Virtual environments are a way to make sure you have the correct dependencies per project. Instead of installing everything in the global python environment, you can install the dependencies you need per project in this "virtual environment".

thorny pasture Mar 17, 2020, 7:21 PM

#

I'm new to Python as a whole really

oblique belfry Mar 17, 2020, 7:22 PM

#

Have you used any programming languages?

thorny pasture Mar 17, 2020, 7:22 PM

#

C#, JS

oblique belfry Mar 17, 2020, 7:23 PM

#

Some people will disagree with this, but virtual environments are akin to local node_modules folders. Instead of installing a node package globally, you can install it per project.

thorny pasture Mar 17, 2020, 7:23 PM

#

Basic JS*

#

lol

#

Why not install globally though?

oblique belfry Mar 17, 2020, 7:27 PM

#

Good question. Some packages require certain versions of a dependency. Package A may require version 1.13 of numpy. Package B may require version 1.14 or greater. Clearly, an issue will arise.

#

Package A's dependencies are incompatible with Package B's.

#

Well, if each package had a local dev environment where they can run any version of the dependency, then you wouldn't have this issue.
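
A minimal sketch of the idea using only the standard library's `venv` module. The directory name `project-env` is arbitrary, and the environment is created in a temporary folder and without pip purely so the example cleans up after itself; in day-to-day use the usual route is running `python -m venv .venv` in the project folder and activating it.

```python
import sys
import tempfile
import venv
from pathlib import Path

# True when this interpreter is itself running inside a virtual environment.
in_virtual_env = sys.prefix != sys.base_prefix
print("inside a virtual environment:", in_virtual_env)

with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "project-env"

    # Create an isolated environment; with_pip=False keeps the demo fast.
    venv.create(env_dir, with_pip=False)

    # Every venv has a pyvenv.cfg that points back at the base interpreter.
    config_file = env_dir / "pyvenv.cfg"
    has_config = config_file.exists()
    print(config_file.read_text())
```

Packages installed into one such environment are invisible to the others, which is what lets two projects pin different versions of the same dependency.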

thorny pasture Mar 17, 2020, 7:29 PM

#

So Anaconda you don't need to do virtual env?

oblique belfry Mar 17, 2020, 7:29 PM

#

If you were using Docker or something to deploy a project, then you might not want to. Since the container is single purpose, then you won't have these issues. But my desktop is multi-purpose, thus I will run into issues with this.

#

Anaconda does something similar to virtual environments. They accomplish the same thing. conda can make sure whatever project you are in has the correct dependencies.

#

Now, conda can do MORE than that. But, that is a quick breakdown of my take on it.

thorny pasture Mar 17, 2020, 7:33 PM

#

Nothing's easy to get into haha

oblique belfry Mar 17, 2020, 7:33 PM

#

It's easier than you think. I promise. lol

thorny pasture Mar 17, 2020, 7:34 PM

#

IPython says it's Jupyter

#

But you said you have both IPython and Jupyter lol

oblique belfry Mar 17, 2020, 7:35 PM

#

Jupyter uses ipython internally.

Ipython is a repl that runs in the shell.
Jupyter uses Ipython, but runs in the browser.

#

*actually, Jupyter is a web UI, and sends the data/commands/whatever to the ipython kernel.

thorny pasture Mar 17, 2020, 7:36 PM

#

so you downloaded a separate application IPython or do you use vscode interactive?

#

when you say Jupyter you mean Jupyter lab as well right?

#

https://jupyter.org/try I see Classic and Lab which says it's newer

Project Jupyter

The Jupyter Notebook is a web-based interactive computing platform. The notebook combines live code, equations, narrative text, visualizations, interactive dashboards and other media.

oblique belfry Mar 17, 2020, 7:38 PM

#

For the purpose of this discussion, yes. There is a difference, but not for this discussion.

thorny pasture Mar 17, 2020, 7:38 PM

#

<ipython-input-1-4cbb279a3e44> in <module>
----> 1 from bs4 import BeautifulSoup as Soup
2 import requests
3 from pandas import DataFrame
4
5 ffc_response = requests.get(

ModuleNotFoundError: No module named 'bs4'

#

When I try to run my code in Jupyter I get this

oblique belfry Mar 17, 2020, 7:41 PM

#

Currently, I have been researching Metaflow to use at work. I am looking at the metaflow repo I downloaded from git. As I am following the online instructions, I have VSCode open with the source code of the tutorials. On the right, I have ipython open. I am exploring previous metaflow runs. (Don't worry about what is there. Just know that I am interactively stepping through the code and executing things one at a time.)

📎 Screen_Shot_2020-03-17_at_14.38.22.png

#

Yeah. You do not have bs4 downloaded.

thorny pasture Mar 17, 2020, 7:42 PM

#

I have bs4 downloaded on my pc

oblique belfry Mar 17, 2020, 7:42 PM

#

Is it globally installed?

thorny pasture Mar 17, 2020, 7:43 PM

#

I did pip install bs4

#

it works in vs code, sublime, etc

#

but that error is jupyter website thing
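
A package installed for one Python interpreter is not automatically visible to another, and a hosted notebook runs on a different machine altogether. A small sketch for checking, from inside whichever environment raises the error, which interpreter is running and whether a module can be imported there. The helper name `is_importable` is made up for this example.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if this interpreter can find the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# The interpreter actually executing this code; compare it across tools.
print(sys.executable)

# Check the module from the traceback, plus a standard-library one for contrast.
print("bs4 importable:", is_importable("bs4"))
print("json importable:", is_importable("json"))
```

If the paths printed in the editor and in the notebook differ, the two are using separate environments, and the package has to be installed into the one the notebook uses.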

oblique belfry Mar 17, 2020, 7:43 PM

#

How did you install jupyter?

thorny pasture Mar 17, 2020, 7:43 PM

#

I'm doing a web test thing it's not installed

#

How did you get your Terminal to look like that in VSCode?

#

📎 unknown.png

oblique belfry Mar 17, 2020, 7:45 PM

#

That is called ipython. It is the REPL Jupyter uses

#

It is just another package.

#

Are you running jupyter in a conda environment?

thorny pasture Mar 17, 2020, 7:45 PM

#

Disregard conda

#

I have VS Code open

#

I'm asking like 5 questions at once, so let's focus it down to one thing at a time cause Im an idiot

#

Do you have the Extension Open in IPython by Ilya Vouk?

oblique belfry Mar 17, 2020, 7:47 PM

#

For my terminal, I ran ipython instead of python. It is a package.

#

Nope.

thorny pasture Mar 17, 2020, 7:48 PM

#

so I need to pip install ipython?

oblique belfry Mar 17, 2020, 7:48 PM

#

No. Unless you want to.

#

What are you trying to do.

thorny pasture Mar 17, 2020, 7:48 PM

#

Currently make my terminal window on vscode look like yours

#

yours looks like spyder [1] etc whereas mine is >>>

oblique belfry Mar 17, 2020, 7:51 PM

#

If you are using pip, pip install ipython.

I think you need to better research how Python works first before you delve into data science. Understanding how package management works is VERY important. I can say from personal experience that certain versions of keras , tensorflow, and numpy do not play well together.

#

It is important to know how to correct that stuff.

I had to pin the keras version and not just download the newest stuff.

thorny pasture Mar 17, 2020, 7:53 PM

#

I've yet to have to, but I'm sure it'll happen

#

I was going to get into stuff trying this out https://fantasycoding.com/

oblique belfry Mar 17, 2020, 7:55 PM

#

Yeah. But, you have some fundamental gaps that are going to only get larger as you keep going. Not all data scientists/data engineers/ml engineers need to be the best software developers, but it is important.

#

Looks fun. But review the basics of Python first.

thorny pasture Mar 17, 2020, 7:56 PM

#

Well I've done some projects and stuff, nothing crazy

#

I've been learning from Corey Schafer

#

Didn't watch Ep22 Pipenv

thorny pasture Mar 17, 2020, 10:26 PM

#

@oblique belfry venv isn't hard at all, you weren't wrong

trail kite Mar 18, 2020, 2:38 PM

#

guys I have question about parsing really extra nested json. am I in the right place?

ripe forge Mar 18, 2020, 3:39 PM

#

you can just use the help channel

trail kite Mar 18, 2020, 4:27 PM

#

didn't get enough help 😦

tacit jewel Mar 18, 2020, 4:31 PM

#

Hey could someone offer some insight on datasets? I've only used free datasets that are available online, but what would the process of gathering your own dataset look like?
I suspect connecting to different sites' API would be the way to go

#

Total beginner question but would love if someone could point me in the right direction.

vital cipher Mar 18, 2020, 5:13 PM

#

@tacit jewel you can use a scraper to scrape whichever information you need from any given website....

tacit jewel Mar 18, 2020, 5:15 PM

#

Thank you @vital cipher . I think I will try saving in just a python dictionary rather than sql or sqlite

#

for now since I'm just starting out and it's not a huge or complex dataset
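
A minimal sketch of that approach, assuming the collected items are small records keyed by some identifier. The records, field names, and file name are invented, and the file is written to a temporary folder only to keep the example self-contained. Saving the dictionary as JSON means the data survives between runs without needing a database yet.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical scraped or API-fetched records.
records = [
    {"id": "a1", "title": "First post", "likes": 12},
    {"id": "b2", "title": "Second post", "likes": 7},
]

# Keep everything in a plain dictionary, keyed by id for easy lookup.
dataset = {record["id"]: record for record in records}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dataset.json"

    # Persist the dictionary so a later run does not have to re-collect it.
    path.write_text(json.dumps(dataset, indent=2), encoding="utf-8")

    # Load it back to confirm the round trip.
    reloaded = json.loads(path.read_text(encoding="utf-8"))

print(reloaded["a1"]["title"])
```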

vital cipher Mar 18, 2020, 5:16 PM

#

cool

undone shard Mar 18, 2020, 6:21 PM

#

going to be working on some AMD Vega optimized witchery with pyopencl, wheeeee

uncut shadow Mar 18, 2020, 6:41 PM

#

witchery? GWcorbinMonkaGIGA

orchid lintel Mar 19, 2020, 12:28 AM

#

So, what's the deal with compress in itertools? Seems to basically do the same thing as filter?

velvet thorn Mar 19, 2020, 12:43 AM

#

not...really?

#

filter processes an iterable, removing elements for which the given function evaluates to False (or the falsy elements themselves, if you pass None as the function)

#

compress processes two iterables in parallel, removing each element of the first whose paired element in the second evaluates to False
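
A short side-by-side sketch of the difference, using made-up values. `filter` decides from each element itself, while `compress` decides from a second, separately supplied sequence of selectors.

```python
from itertools import compress

values = [0, 1, 2, 3, 4]

# filter: the decision is computed from each element by the function.
kept_by_filter = list(filter(lambda v: v % 2 == 0, values))

# compress: the decision comes from a parallel sequence of selectors.
mask = [True, False, False, True, True]
kept_by_compress = list(compress(values, mask))

print(kept_by_filter)    # elements that are even
print(kept_by_compress)  # elements whose selector is truthy
```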

orchid lintel Mar 19, 2020, 12:52 AM

#

@velvet thorn Could you give an example of when you'd use compress?

#

like I'm looking at the example given here: https://florian-dahlitz.de/blog/introduction-to-itertools

Python: Introduction To Itertools

Introducing the itertools functions using real world examples

#



```python
from itertools import compress


def name_selection(names):
    name_selectors = []

    for name in names:
        if name.startswith("A"):
            name_selectors.append(1)
        else:
            name_selectors.append(0)

    return name_selectors


names = ["Albert", "Alexandra", "Miriam", "Sascha"]
filtered_names = list(compress(names, name_selection(names)))
```

#

that just seems like a clunkier way of doing filter(lambda x: x.startswith("A"), names)

#

but maybe that's just a bad example that doesn't really show what compress is good for

velvet thorn Mar 19, 2020, 1:28 AM

#

yes, it actually is a bad example

#

in general, you would use compress only in two cases (where data refers to the first iterable and mask to the second):

the mask is not reproducible from the data alone
the mask has already been calculated

#

so, for example, say you have a list of names and a list of ethnicities

#

you could use compress to get only, say, the Gaelic names (along with a generator expression on the second list)

#

compress is like a slightly neutered form of filtering by something else

#

if you have worked with languages that have something like filterBy...that's basically it.
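
The names-and-ethnicities case might look like the sketch below. The two lists are invented sample data; the point is that the mask is derived from the second list, so it cannot be rebuilt from the names alone.

```python
from itertools import compress

# Hypothetical parallel lists: ethnicities[i] describes names[i].
names = ["Aoife", "Hiroshi", "Niamh", "Carlos"]
ethnicities = ["Gaelic", "Japanese", "Gaelic", "Spanish"]

# The generator expression builds the mask lazily from the second list.
gaelic_names = list(compress(names, (e == "Gaelic" for e in ethnicities)))

# Equivalent comprehension, for comparison.
same_result = [n for n, e in zip(names, ethnicities) if e == "Gaelic"]

print(gaelic_names)
```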

orchid lintel Mar 19, 2020, 2:46 AM

#

@velvet thorn Aha! Awesome, thanks!

lapis sequoia Mar 20, 2020, 3:09 AM

#

I am trying to learn how to make neural networks but don't know where to start

velvet thorn Mar 20, 2020, 3:10 AM

#

@lapis sequoia how much do you know?

#

like what's your level of ability in linear algebra, probability, and programming

lapis sequoia Mar 20, 2020, 3:11 AM

#

high in programming but really low in math

#

where do i learn the math ?

#

@velvet thorn

velvet thorn Mar 20, 2020, 3:14 AM

#

https://www.deeplearningbook.org/

#

I suggest this

lapis sequoia Mar 20, 2020, 3:15 AM

#

do i have to learn more numpy and graphing stuff

#

@velvet thorn

velvet thorn Mar 20, 2020, 3:16 AM

#

no need

#

to keep tagging me

#

knowing numpy concepts is good

#

visualisation is not mandatory, but it is very helpful

lapis sequoia Mar 20, 2020, 3:17 AM

#

ok so is there a course u can show me for the programming

velvet thorn Mar 20, 2020, 3:20 AM

#

huh

#

if you're good at programming it should be really simple

#

just pick up Keras/Tensorflow/PyTorch and start hacking

#

don't think a course is necessary

lapis sequoia Mar 20, 2020, 3:24 AM

#

KK

south dagger Mar 20, 2020, 6:45 AM

#

A bit confused not sure what I did wrong, the Annual Revenue doesnt seem right. For index 522 it seems right but like for index 1476 shouldnt it be 33.333*

*please @ me *

📎 unknown.png

polar acorn Mar 20, 2020, 7:58 AM

#

@south dagger you are dividing years by Total in Millions which is wrong. You should divide Total in Millions by year. Also for calculating years you don't need the apply and lambda you can just do sales['Year published'] - 2019, same for calculating Annual Revenue just divide one column by the other as if you would any other variable.
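
A minimal sketch of the vectorised version on invented numbers. The original frame is only visible in a screenshot, so the values and the `Years` column name are assumptions. The subtraction is written as 2019 minus the publication year on the further assumption that the number of years on sale should be positive.

```python
import pandas as pd

# Hypothetical sales data; column names follow the ones quoted in the chat.
sales = pd.DataFrame({
    "Year published": [2016, 2009],
    "Total in Millions": [100.0, 50.0],
})

# Column arithmetic works element-wise, so no apply/lambda is needed.
sales["Years"] = 2019 - sales["Year published"]

# Divide the total by the number of years, not the other way round.
sales["Annual Revenue"] = sales["Total in Millions"] / sales["Years"]

print(sales)
```

A title published in 2019 itself would have zero years and produce an infinite value, so that case would need separate handling.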

lapis sequoia Mar 20, 2020, 11:59 AM

#

I have this MT Dataset, what can be done with such dataset, as a Data scientist, what questions can be asked regarding this dataset, what can be the yields, what to analyze, etc.

So far I thought to make a classification on age, gender.

To find the reasons why patients undergo, surgery, allergy tests, etc. what else can be done with it, any suggestions, please?

Any suggestion is appreciated.

📎 MT_DATASET.PNG

thorny pasture Mar 20, 2020, 12:33 PM

#

What ecosystem do you guys program on? I'm curious.

obsidian copper Mar 20, 2020, 12:59 PM

#

Can I train YOLO (darkflow) on hand gestures and recognise hand gestures in real time?

#

Ping me in replies

ripe forge Mar 20, 2020, 1:25 PM

#

@obsidian copper as long as you have a dataset for hand gestures, and as long as the actual prediction of a gesture is done as if it was on a "Frame", then absolutely.

obsidian copper Mar 20, 2020, 1:28 PM

#

@ripe forge ok thank you

ripe forge Mar 20, 2020, 1:31 PM

#

@lapis sequoia some kind of nlp around the column that gives description. could be classification to one of the categories under 2nd column, or just a simple patient clustering algo to try to fit new cases in

#

having said that, that pic doesn't give a lot to go on

#

ecosystem..not really sure what that means tbh. just python i guess 😛

sand timber Mar 20, 2020, 1:41 PM

#

having some trouble with an implementation of sparse pegasos svm... it looks like the more i train, the worse it is :(

ripe forge Mar 20, 2020, 1:41 PM

#

are you measuring performance on both train and test? is it improving on train?

sand timber Mar 20, 2020, 1:43 PM

#

i'm just measuring performance on train and it's getting worse the more i train

ripe forge Mar 20, 2020, 1:46 PM

#

interesting

#

sadly, i know nothing specific about how sparse pegasos svm works

sand timber Mar 20, 2020, 1:46 PM

#

it's supposed to be linearly separable

#

but i'm not really sure how to prove it is

#

sigh never mind i think i figured it out

#

silly mistakes

south dagger Mar 20, 2020, 1:50 PM

#

@polar acorn thank you !

ripe forge Mar 20, 2020, 2:08 PM

#

curious, if you don't mind sharing, what was it? @sand timber

sand timber Mar 20, 2020, 2:10 PM

#

regularization lambda was too strong

#

classic self-implementation move

ripe forge Mar 20, 2020, 2:10 PM

#

ah

sand timber Mar 20, 2020, 2:10 PM

#

you never know if you implemented it right or if the hyperparameters are bogus
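
A toy sketch of how an overly strong regularization lambda can show up in a Pegasos-style update. This is not the implementation discussed above: it is dense rather than sparse, has no bias or projection step, cycles through the points in order instead of sampling them at random, and uses two made-up points. With a large lambda the weights stay too small to reach a margin of 1, so the hinge loss stays high however long training runs.

```python
def pegasos(points, labels, lam, steps):
    """Pegasos-style sub-gradient descent on the regularized hinge loss."""
    w = [0.0] * len(points[0])
    for t in range(1, steps + 1):
        # Simplification: cycle through the data instead of random sampling.
        x = points[(t - 1) % len(points)]
        y = labels[(t - 1) % len(points)]
        eta = 1.0 / (lam * t)                      # step size 1 / (lambda * t)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]   # shrink (regularization)
        if margin < 1.0:                           # margin violated: step toward y * x
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def mean_hinge_loss(w, points, labels):
    """Average of max(0, 1 - y * <w, x>) over the data."""
    losses = [
        max(0.0, 1.0 - y * sum(wi * xi for wi, xi in zip(w, x)))
        for x, y in zip(points, labels)
    ]
    return sum(losses) / len(losses)


# Two linearly separable toy points.
points = [(2.0, 1.0), (-2.0, -1.0)]
labels = [1, -1]

loss_strong = mean_hinge_loss(pegasos(points, labels, 10.0, 200), points, labels)
loss_weak = mean_hinge_loss(pegasos(points, labels, 0.1, 200), points, labels)

print("lambda=10  ->", loss_strong)
print("lambda=0.1 ->", loss_weak)
```

Comparing the loss across a few lambda values like this is one way to tell a hyperparameter problem apart from an implementation bug.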

ripe forge Mar 20, 2020, 2:10 PM

#

heh. yep, that'd do it

oblique belfry Mar 20, 2020, 2:50 PM

#

@obsidian copper What type of gestures are you doing?

obsidian copper Mar 20, 2020, 3:20 PM

#

@oblique belfry actually I want to detect hand gestures in real time. Gestures like thumbs up or fist etc. I'll later use these classification to perform various mouse functions.

oblique belfry Mar 20, 2020, 3:23 PM

#

Yolo might work for you.

If you find that training on still images is not good enough, you can use SlowFast or Conv3Ds and train on multiple videos.

obsidian copper Mar 20, 2020, 3:25 PM

#

I thought of Conv3Ds but idk much about it. And I've to develop this project in like 2 weeks max.

#

I'm still learning cnns

oblique belfry Mar 20, 2020, 3:26 PM

#

You might have to leave the "image classification" approach to things and go to "action recognition." I did a similar problem. Spent hours trying to do a Conv2D + LSTM approach and got nowhere.

#

Gotcha. Just was scrolling through and saw your comment.

obsidian copper Mar 20, 2020, 3:26 PM

#

Can u help me with this thing? I may ask for more help

#

😬

oblique belfry Mar 20, 2020, 3:32 PM

#

I'll do my best

south dagger Mar 20, 2020, 4:34 PM

#

how would i center or word wrap column names?

📎 unknown.png

lapis sequoia Mar 20, 2020, 7:47 PM

#

well hello there
i plotted this with matplotlib

#

📎 foo.png

#

i think we all know what this is.
However as you may also have seen, you can't read s**t from it.
i did the dpi at 150 and figsize at (20,10)
but its like unreadable
also lags of course a lot, hence i am converting them in pictures instead of plotting them
so i need it readable on an image
any methods?
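
One possible approach, sketched under assumptions since the plot itself isn't shown: the data below is made up, and the guess is that the figure is a dense line chart with too many small tick labels. The sketch keeps the `figsize=(20, 10)` and `dpi=150` mentioned above, but enlarges the font, caps the number of x ticks, and writes straight to a PNG without opening a window.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to a file only, no interactive window
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
import numpy as np

# Made-up data standing in for the real series (assumption).
x = np.arange(500)
y = np.cumsum(np.random.default_rng(0).normal(size=500))

plt.rcParams.update({"font.size": 16})  # larger text stays legible in the image
fig, ax = plt.subplots(figsize=(20, 10))
ax.plot(x, y, linewidth=1.5)
ax.xaxis.set_major_locator(MaxNLocator(nbins=12))  # limit the number of x ticks
ax.tick_params(axis="x", labelrotation=45)         # rotate labels so they don't overlap
ax.set_xlabel("sample")
ax.set_ylabel("value")
fig.tight_layout()

# Hypothetical output name; saved to the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "foo_readable.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)
```

If the labels are still too small, raising `font.size` tends to help more than raising `dpi`, since `dpi` scales everything together.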

lapis sequoia Mar 20, 2020, 10:02 PM

#

Sad, can nobody give more insights?

velvet thorn Mar 21, 2020, 12:19 AM

#

@south dagger do you want to center or wrap?

south dagger Mar 21, 2020, 12:19 AM

#

would center auto wrap it ?

#

both ?

velvet thorn Mar 21, 2020, 12:20 AM

#

did you Google

south dagger Mar 21, 2020, 12:24 AM

#

yes but it kept giving me col selection stuff which wasnt working

chilly fog Mar 21, 2020, 12:27 AM

#

Do you guys know anything about normalization by chance

#

i need someone rn

velvet thorn Mar 21, 2020, 12:27 AM

#

try df.style.set_table_styles([{'selector': "th", 'props': [('max-width', '50px')]}]) @south dagger
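
A small sketch extending the suggestion above to cover both centering and wrapping. The helper names `header_styles` and `style_headers` are hypothetical, and whether the headers really wrap depends on the CSS of wherever the table is rendered, so treat the property list as a starting point.

```python
import pandas as pd

def header_styles(max_width="50px"):
    """CSS rules for <th> cells: centered text that may wrap inside max_width."""
    return [{
        "selector": "th",
        "props": [
            ("text-align", "center"),
            ("white-space", "normal"),  # allow line breaks in header text
            ("max-width", max_width),
        ],
    }]

def style_headers(df, max_width="50px"):
    """Return a Styler with the header rules applied (Styler needs jinja2)."""
    return df.style.set_table_styles(header_styles(max_width))

# Example frame with long column names (made up for illustration).
df = pd.DataFrame({"a rather long column name": [1, 2],
                   "another long column name": [3, 4]})
# In a notebook: style_headers(df, "80px")
```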

#

@chilly fog

#

!ask

arctic wedgeBOT Mar 21, 2020, 12:28 AM

#

Asking good questions will yield a much higher chance of a quick response:

• Don't ask to ask your question, just go ahead and tell us your problem.
• Don't ask if anyone is knowledgeable in some area, filtering serves no purpose.
• Try to solve the problem on your own first, we're not going to write code for you.
• Show us the code you've tried and any errors or unexpected results it's giving.
• Be patient while we're helping you.

You can find a much more detailed explanation on our website.

lapis sequoia Mar 21, 2020, 5:23 AM

#

hi

undone shard Mar 21, 2020, 7:01 AM

#

Hi bon, having fun with this N-body particle thing I made in PyOpenCL, glad I fixed a bug

#

Vector fields!

#

I'll show an image.

#

📎 Figure_1.png

hexed aurora Mar 21, 2020, 10:18 AM

#

Hi all, I'm very new to coding in Python!! Great to be here!

arctic wedgeBOT Mar 21, 2020, 10:22 AM

#

Hey @burnt wharf!

It looks like you tried to attach file type(s) that we do not allow (.txt). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .m4v, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .md.

Feel free to ask in #community-meta if you think this is a mistake.

hexed aurora Mar 21, 2020, 10:31 AM

#

Can anyone point me to some good reading material? I want to compare snapshots of the same table(s) at different points in time, identify the differences, and draw some meaningful inferences.

wheat frost Mar 21, 2020, 2:30 PM

#

Well numpy sounds good for that sort of thing

#

Depends what sort of meaningful inference you want to make

dusky cairn Mar 21, 2020, 6:13 PM

#

Hey, i am kind of new to Python, and i need some help. I am trying to do a polynomial regression

lapis sequoia Mar 21, 2020, 9:51 PM

#

https://www.geeksforgeeks.org/python-implementation-of-polynomial-regression/

GeeksforGeeks

Python | Implementation of Polynomial Regression - GeeksforGeeks

A Computer Science portal for geeks. It contains well written, well thought and well explained computer science and programming articles, quizzes and practice/competitive programming/company interview Questions.
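
For a quick start, here is a minimal polynomial regression sketch using `numpy.polyfit` rather than the scikit-learn route from the linked article. The data is synthetic and the degree of 2 is an assumption made for illustration.

```python
import numpy as np

# Synthetic data from y = 1 + 2x + 3x^2 plus a little noise (assumption).
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 1 + 2 * x + 3 * x**2 + rng.normal(scale=0.1, size=x.size)

# Least-squares fit of a degree-2 polynomial; coefficients come highest power first.
coeffs = np.polyfit(x, y, deg=2)
model = np.poly1d(coeffs)  # callable polynomial built from the coefficients

print(coeffs)      # close to [3, 2, 1]
print(model(1.5))  # prediction at x = 1.5
```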

hexed aurora Mar 22, 2020, 10:09 AM

#

@wheat frost I look at the snapshot of the table every 10 minutes

#

new rows can be added to a table, or existing rows could have changed

#

but the volume of data is very high, so RDBMS like comparisons take a bunch of time
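
A rough sketch of one way to diff two snapshots in pandas, assuming each row has a unique key column and both snapshots share the same columns. The `diff_snapshots` helper and the `id`/`qty` columns are made up for illustration. It uses an outer merge with `indicator=True` to separate newly added rows from rows whose values changed.

```python
import pandas as pd

def diff_snapshots(old, new, key):
    """Return (added, changed) rows of `new` relative to `old`, matched on `key`."""
    merged = old.merge(new, on=key, how="outer",
                       suffixes=("_old", "_new"), indicator=True)

    # Rows whose key exists only in the newer snapshot.
    added_keys = merged.loc[merged["_merge"] == "right_only", key]
    added = new[new[key].isin(added_keys)]

    # Rows present in both snapshots where at least one value differs.
    # Note: NaN != NaN, so missing values on both sides count as "changed".
    both = merged[merged["_merge"] == "both"]
    changed_mask = pd.Series(False, index=both.index)
    for col in (c for c in old.columns if c != key):
        changed_mask |= both[f"{col}_old"] != both[f"{col}_new"]
    changed = new[new[key].isin(both.loc[changed_mask, key])]
    return added, changed

old = pd.DataFrame({"id": [1, 2, 3], "qty": [10, 20, 30]})
new = pd.DataFrame({"id": [1, 2, 3, 4], "qty": [10, 25, 30, 40]})
added, changed = diff_snapshots(old, new, "id")
print(added)    # the row with id 4
print(changed)  # the row with id 2
```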

fresh cedar Mar 22, 2020, 1:31 PM

#

Hi All, Pandas question: how would I go about adding new rows to a DataFrame obtained using groupby() and count(). I have cumulative sums of items grouped by date. My resulting dataframe looks like in the screenshot. I'd like to add additional rows to it e.g. to predict future growth.

📎 unknown.png

hollow orbit Mar 22, 2020, 4:34 PM

#

wow

jolly briar Mar 22, 2020, 4:41 PM

#

@fresh cedar if you put .reset_index() after it should return a dataframe.
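
A minimal sketch of that idea, with made-up data since the screenshot isn't available: count items per date, take the cumulative sum, call `.reset_index()` to get an ordinary DataFrame, then append placeholder rows for future dates with `pd.concat`. The column names `date`, `item` and `total` are assumptions.

```python
import pandas as pd

# Made-up items, one row per item with the date it was recorded.
items = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-20", "2020-03-20", "2020-03-21",
                            "2020-03-22", "2020-03-22", "2020-03-22"]),
    "item": list("abcdef"),
})

# Cumulative item count per date, turned back into a regular DataFrame.
totals = (items.groupby("date")["item"].count()
               .cumsum()
               .reset_index(name="total"))

# Two future dates with empty totals, to be filled in by a prediction later.
future = pd.DataFrame({
    "date": pd.date_range(totals["date"].max() + pd.Timedelta(days=1),
                          periods=2, freq="D"),
    "total": [float("nan")] * 2,
})

extended = pd.concat([totals, future], ignore_index=True)
print(extended)
```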

fresh cedar Mar 22, 2020, 4:50 PM

#

Thanks! I'll try it out

crimson umbra Mar 23, 2020, 12:33 AM

#

“Top 20 YouTube Channels for Data Science in 2020” by Benedict Neo https://towardsdatascience.com/top-20-youtube-channels-for-data-science-in-2020-2ef4fb0d3d5

Medium

Top 20 YouTube Channels for Data Science in 2020

Here are the best YouTubers you should follow to learn about programming, Machine learning and AI, mathematics and Data Science.

real wigeon Mar 23, 2020, 1:35 AM

#

how can i tell my script to do something, if it is on the first row of a pandas df

next cairn Mar 23, 2020, 7:31 AM

#

How to deal with spatial data ?

jolly briar Mar 23, 2020, 9:51 AM

#

@real wigeon what do you mean, can you give an example?
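
The question is ambiguous, but one reading is "branch on the first row while looping over the frame". A small sketch under that assumption, with a made-up DataFrame; it counts position with `enumerate` instead of checking the index, because the index does not have to start at 0.

```python
import pandas as pd

df = pd.DataFrame({"label": ["a", "b", "c"], "value": [1, 2, 3]},
                  index=[10, 11, 12])  # index deliberately not starting at 0

results = []
for position, row in enumerate(df.itertuples(index=False)):
    if position == 0:
        results.append(f"first:{row.label}")  # special handling for the first row
    else:
        results.append(f"other:{row.label}")
print(results)
```

If only the first row itself is needed, `df.iloc[0]` returns it directly without a loop.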

silk forge Mar 23, 2020, 10:16 AM

#

so i was watching andrew ngs course

#

and i decided to code a linear regression model myself

#

and so i have a doubt

#

import sklearn.linear_model as lin
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib
from math import sqrt

data = pd.read_csv(r"C:\Users\home\Desktop\Artifcial intelligence\ML\data\Regr\FuelConsumptionCo2.csv")

x = data[['ENGINESIZE']]
y = data[['CO2EMISSIONS']]

trainx,testx,trainy,testy = train_test_split(x,y,test_size=0.2,train_size=0.8,random_state=7)

regr = lin.LinearRegression()
regr.fit(trainx,trainy)

# h(x) = O0 + O1 * x

o0 = regr.intercept_
o1 = regr.coef_

print(f"o0 shape = {o0.shape}")
print(f"o1 shape = {o1.shape}")

#

o0 shape = (1,)
o1 shape = (1, 1)

#

why is my theta1 a 2d array (shape (1, 1))

#

while my theta0 is not

#

nvm makes sense now
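
For anyone else who hits this: the shapes follow from the shape of `y`. Passing a 2-D target (such as `data[['CO2EMISSIONS']]`, a one-column DataFrame) makes scikit-learn treat it as a multi-output problem with one target, so `coef_` has shape `(n_targets, n_features)` and `intercept_` has shape `(n_targets,)`. A small sketch on synthetic data (the numbers are made up):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.arange(10, dtype=float).reshape(-1, 1)  # one feature, 10 samples
y_1d = 3.0 * X[:, 0] + 2.0                     # target with shape (10,)
y_2d = y_1d.reshape(-1, 1)                     # same target with shape (10, 1)

fit_1d = LinearRegression().fit(X, y_1d)
fit_2d = LinearRegression().fit(X, y_2d)

print(np.shape(fit_1d.intercept_), fit_1d.coef_.shape)  # 1-D target
print(np.shape(fit_2d.intercept_), fit_2d.coef_.shape)  # 2-D target
```

Selecting the target as a Series (`data['CO2EMISSIONS']`, single brackets) gives the 1-D case.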

#

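import numpy as np

# flatten both so the shapes match: (1,) and (1,) stack into shape (2, 1)
npar = np.array([o0.ravel(),
                 o1.ravel()])
print(npar)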

#

when i put them in an array

#

they become a 2d vector

#data-science-and-ml

distance (from apertures) to screen

size of each aperture

wavelength of light

separation of the slits

wavenumber

define distance along the screen, y, from -y_max to y_max

size of the screen - extends (-y.max, y.max)

sets the number of pixels on the screen