#data-science-and-ml


velvet thorn
#

@velvet thorn I need to compare across groups to count intersections. Any suggestions on how to go about it?
@heady hatch can you elaborate

heady hatch
#

@velvet thorn

Given two columns, a and b.

Grouping by a, we're going to have a list of values from b for each group in a.

then I want to compare across the different groups in a to see how many elements from b intersect for each group in a.

ie.

a b
0 4
0 5
0 6
1 4
1 5
2 5
2 6

Then
0: [4, 5, 6]
1: [4, 5]
2: [5, 6]

0 and 1 share two elements, 4 and 5
0 and 2 share two elements, 5 and 6
1 and 2 share one element, 5

paper niche
#

!e

from io import StringIO
import itertools as it
import pandas as pd

df = pd.read_csv(StringIO("""
a b
0 4
0 5
0 6
1 4
1 5
2 5
2 6
"""), sep=' ')

for a1, a2 in it.combinations(df['a'].unique(), 2):
    intersection = set(df.loc[df['a'] == a1, 'b']) & set(df.loc[df['a'] == a2, 'b'])
    print(f"{a1} and {a2} share {len(intersection)} elements: {intersection}")
arctic wedgeBOT
#

@paper niche :white_check_mark: Your eval job has completed with return code 0.

001 | 0 and 1 share 2 elements: {4, 5}
002 | 0 and 2 share 2 elements: {5, 6}
003 | 1 and 2 share 1 elements: {5}
paper niche
#

@heady hatch do you mean something like this?

heady hatch
#

@paper niche right right. Is this going to run through at O(N^2)?

#

I'm trying to find a solution quicker than it.

velvet thorn
#

@heady hatch hm I think that's probably not the way to do it

#

let me think

#

like grouping by a is not the most efficient solution

#

do you need to count all possible intersections?

slate scroll
#

Seems like a classic map reduce problem

heady hatch
#

I don’t know if I need to count all possible intersections, but just the ones with at least one intersection.

#

Oh? How would you go about it in terms of map reduce?

slate scroll
#

Well you would just map all nodes based on column a, the reduce would be counting the overlaps in b

heady hatch
#

I think I’m not familiar enough with map reduce.

I’m thinking of map function and reduce function from python.

#

On the other hand, isn’t that also O(n^2)?

#

Since when counting all the overlaps with b values, you still need to go through each a value n times.
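A minimal sketch of one way to avoid looping over every pair of groups, using the toy `a`/`b` frame from the example above: self-merge on `b`, so only pairs of groups that actually share a value are ever generated. The `_left`/`_right` suffixes are just names picked here. The worst case is still quadratic when every group overlaps every other group, but pairs with no overlap cost nothing.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 0, 0, 1, 1, 2, 2],
                   "b": [4, 5, 6, 4, 5, 5, 6]}).drop_duplicates()

# Pair up rows that share the same b value.
pairs = df.merge(df, on="b", suffixes=("_left", "_right"))

# Keep each unordered pair of groups once (this also drops self-pairs).
pairs = pairs[pairs["a_left"] < pairs["a_right"]]

# Number of shared b values per pair of groups.
shared = pairs.groupby(["a_left", "a_right"]).size()
print(shared)
```

Pairs of groups with no common `b` value simply do not appear in `shared`, which matches the "only the ones with at least one intersection" requirement.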

velvet thorn
#

I don’t know if I need to count all possible intersections, but just the ones with at least one intersection.
@heady hatch hm.

#

that makes the problem different

#

so basically you want to find the values of b that correspond to more than one unique value of a?

heady hatch
#

Yes(I think.)

To give you guys some context.

Each product has a category. And I have two stores and each store has their own set of categories.

The idea is to find, for each store, the categories that share the most products with a category from the other store.

velvet thorn
#

or do you mean the unique values of a which correspond to at least one value of b that is shared with another unique value of a

heady hatch
#

Ahh I think it sounds like the latter.

velvet thorn
#

okay maybe with an example

#

it would be easier

#

can you provide some sample data and your expected result

heady hatch
#

Yea definitely.
If you guys don’t mind, I have to anonymize a couple of things.

But data is something like this.

Two datasets, each one with a product and the categories they’re in. They can be in more than one category.

So dataset is something like ...

Dataset 1
item -> category
apple -> [a, b, c]
orange -> [a, d, e]

Dataset 2
Item -> category
Apple -> [1, 7, 9]
Watermelon -> [1, 4]
Banana -> [1, 5,6]
Orange -> [1, 2]

#

And the result that we want to get is something like a utility matrix of sort.

Category from dataset 1 vs category from dataset 2

a b c d e
1[apple, orange][apple]...
4[][][]...
5
6
7
9

#

I don’t know if this helps.

#

The way I was thinking of was compute all the items in the categories and go through each category in the other dataset to see how many items they would share.

#

But it would be O(cate1 * cate2).

#

However thinking about it, I can filter down a bit of the categories.

Not sure how I would filter, now thinking about it twice. Hahaha

velvet thorn
#

wait

#

so there are lists in your DF?

winged lark
#

Hello, I'm having some trouble with my dataframes. I have tried playing around with indexes, transposing and the like. For now I just want to plot either of the points in the first row.

burnt prawn
river hazel
#

can anyone take a look at the train function and does anyone know how to fill in self.params['W']? this is for my class on linear regression 😭
```py
        # TODO: Use the gradients in the grads dictionary to update the         #
        # parameters of the model (stored in the dictionary self.params)        #
        # using stochastic gradient descent. You'll need to use the gradients   #
        # stored in the grads dictionary defined above.                         #

        self.params['W'] = ???

        #   END OF YOUR CODE
```
this is my attempt at it, but it doesn't yield the supposed values from the notebook (loss plot is increasing instead of decreasing XD) https://github.com/poisonivysaur/ml-class/blob/main/Linear%20Regression/linear_regression.py
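For reference, a plain SGD step moves the parameter against the gradient: new W = W minus learning rate times gradient. Below is a self-contained sketch on made-up data. The `params` and `grads` dictionaries mirror the names in the assignment, but `learning_rate`, the data, and the mean-squared-error loss are assumptions and not the course's actual code. A loss that climbs often points at a flipped sign in the update or a learning rate that is too large.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))            # made-up features
true_W = np.array([2.0, -1.0, 0.5])
y = X @ true_W                           # made-up noiseless targets

params = {"W": np.zeros(3)}
learning_rate = 0.1                      # assumed hyperparameter
losses = []

for step in range(200):
    idx = rng.choice(len(X), size=32, replace=False)   # random mini-batch
    X_b, y_b = X[idx], y[idx]
    err = X_b @ params["W"] - y_b
    losses.append(np.mean(err ** 2))
    grads = {"W": 2 * X_b.T @ err / len(idx)}          # gradient of the MSE w.r.t. W
    # SGD update: step against the gradient.
    params["W"] = params["W"] - learning_rate * grads["W"]

print(losses[0], losses[-1])
```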
heady hatch
#

@velvet thorn kind of messy but yea. Hahaha

For each item I was going to explode the list of categories and regroup the categories. So it would be grouped by categories instead by items.
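The explode-and-regroup idea can be sketched as below, assuming the two toy datasets from the example above with item names lower-cased so that "Apple" and "apple" match (an assumption about the real data). Joining the exploded frames on `item` only ever produces category pairs that share at least one item, so the full categories-by-categories comparison is never materialised.

```python
import pandas as pd

ds1 = pd.DataFrame({"item": ["apple", "orange"],
                    "category": [["a", "b", "c"], ["a", "d", "e"]]})
ds2 = pd.DataFrame({"item": ["apple", "watermelon", "banana", "orange"],
                    "category": [[1, 7, 9], [1, 4], [1, 5, 6], [1, 2]]})

# One row per (item, category) instead of one list per item.
long1 = ds1.explode("category")
long2 = ds2.explode("category")

# Joining on item yields only category pairs that share that item.
joined = long1.merge(long2, on="item", suffixes=("_1", "_2"))

# Rows: dataset 2 categories, columns: dataset 1 categories, values: shared item counts.
matrix = pd.crosstab(joined["category_2"], joined["category_1"])
print(matrix)
```

Categories that share nothing with the other dataset (4, 5 and 6 here) drop out of the matrix entirely instead of showing up as empty rows.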

wheat seal
#

how do i optimize my yolov3 model

#

its very slow on my raspberry pi

hollow sentinel
#

so I just read that the Andrew Ng class is on Octave

#

what the hell is Octave guys

river hazel
#

matlab but free

hollow sentinel
#

can't I just use Jupyter notebook like any sane human being

#

jupyter notebook is my baby

#

ugh

river hazel
#

do u think u can take a look at my linear regression code? above ^

wheat seal
#

i tried that course

hollow sentinel
#

did you find it good

wheat seal
#

second week exercises are rigged

hollow sentinel
#

oh no

#

i heard answers are on github repositories

wheat seal
#

mostly because octave is soooo hard to use

hollow sentinel
#

dude

#

i just wanna use jupyter notebook

wheat seal
#

i heard answers are on github repositories
@hollow sentinel if they find out u use that they remove u from the course

#

lmao ok sorry

hollow sentinel
#

i know

#

I don't get it

#

what is the point of making you handwrite your own linear regression without sci kit learn

wheat seal
#

exactly

hollow sentinel
#

sci kit learn is there for a reason lol

wheat seal
#

lol

hollow sentinel
#

yeah um I may try the columbia course first

wheat seal
#

anyway i recommend google colab instead of jupyter

#

its so much better

hollow sentinel
#

oh i've heard of that

#

the google crash course uses it

wheat seal
#

its basically jupyter on steroids

#

ye

hollow sentinel
#

i don't want to use octave rn lol i'm still a beginner

#

to machine learning

wheat seal
#

dont use it

#

i had never heard of it before Ng's course

hollow sentinel
#

all this time i thought it was just python

river hazel
#

^

wheat seal
#

and if a programming language is paid dont even bat an eye (matlab)

hollow sentinel
#

yep

#

i will check out google colab

wheat seal
#

ye

#

its python so pog

hollow sentinel
#

thanks

wheat seal
#

wlcm

undone flare
#

hey guys

hollow sentinel
#

hello

wheat seal
#

if you're training an ML model then u can use their public GPUs too

#

in colab

undone flare
#

I am trying to learn data analysis can I ask questions related to it here?

hollow sentinel
#

yes this is a data science chat

undone flare
#

So

#

I downloaded jupyter lab using pip install jupyterlab

hollow sentinel
#

if you're training an ML model then u can use their public GPUs too
@wheat seal that's cool

undone flare
#

how can I do that?

hollow sentinel
#

lmao I've stuck w Jupyter notebook so far idek how to help

#

are you talking about this

undone flare
#

no sign up option

wheat seal
#

hmm

#

so you're looking to run ipynb notebooks in the cloud?

undone flare
#

yes

wheat seal
#

well

#

ig its time to recommend google colab again

hollow sentinel
#

does google colab do that

undone flare
#

wait I can't even sign in xD

hollow sentinel
#

lmaooooooo

wheat seal
#

yes

#

try google colab

hollow sentinel
#

google colab seems like the repl.it of data science

wheat seal
#

ye

undone flare
#

How to do that?

wheat seal
#

go to this url and sign in with ur google account

undone flare
#

do I need to install anything other than jupyterlab?

wheat seal
#

u dont need to install ANYTHING to use google colab

undone flare
#

ok

river hazel
#

just browser

#

google docs for python

wheat seal
#

lol ye

#

couldnt get simpler

undone flare
#

so I choose new notebook to start right

hollow sentinel
#

yes

wheat seal
#

yes

#

u can even upload nbs

hollow sentinel
#

if you're using google colab i would recommend google's machine learning crash course

wheat seal
#

ye

#

mostly all ML tutorials online also use google colab

hollow sentinel
#

oh Portilla uses jupyter notebook lol

wheat seal
#

🤮

hollow sentinel
#

heyyyy

undone flare
#

bruh this was easier than I thought xD idk why someone recommended me notebooks.ai

hollow sentinel
#

he's good

wheat seal
#

lmao

hollow sentinel
#

Portilla is a G

#

we stan for Portilla

wheat seal
#

kekw

#

:kekw:

#

aww man i need nitro

undone flare
#

thx guys

wheat seal
#

np

hollow sentinel
#

no problem

undone flare
#

now I can finally start coding xD

hollow sentinel
#

is this your first time doing machine learning? @undone flare

undone flare
#

yea

hollow sentinel
#

oh that's fun

#

haha I started a couple weeks ago

wheat seal
#

same

hollow sentinel
#

when I first came here I couldn't make a matplotlib pie chart properly

#

so I got an internship interview w CUNA Mutual group and they asked me if i knew any algorithms and i said uhhhhhhhh i make graphs

wheat seal
#

can relate

hollow sentinel
#

yeah safe to say I didn't get the job

undone flare
#

does executing things take time firstly or it's just my laptop

wheat seal
#

not the job part im just a kid

hollow sentinel
#

depends on what you're executing

wheat seal
#

ye

undone flare
#

I executed 2+3 xD

hollow sentinel
#

uhhhhhhhhhhh

wheat seal
#

u need ur lapy checked

undone flare
#

idk it still works after 7 years

wheat seal
#

even more reason to get it checked

hollow sentinel
#

yeah i would be sus if my machine lasts that long

wheat seal
#

only macs last that long

#

what computer u have

undone flare
#

mine is windows

wheat seal
#

omg

hollow sentinel
#

F

wheat seal
#

we have met the messiah

undone flare
#

that also Win7 Ultimate lol

hollow sentinel
#

lol i got bullied in college for having a mac

wheat seal
#

bruh

hollow sentinel
#

everyone was like who uses a mac to code

wheat seal
#

i still get bullied by my friends while playing mc even tho i get higher fps than them

undone flare
#

with 4gb ram oof

hollow sentinel
#

you game on your mac????

#

HOW

wheat seal
#

yes

hollow sentinel
#

mine would melt

wheat seal
#

uhh

#

settings change

#

i lower my settings lol

hollow sentinel
#

mine does not like gaming

#

even flash games

#

i can hear the fans go off

#

anyways back to ML

wheat seal
#

lmao

#

ye

#

well if the fans go off its not really a bad thing

#

back to ml

hollow sentinel
#

me knowing I won't understand TF without calculus and lin alg

undone flare
#

bruh do I have to learn jupyter notebook before other stuff?

hollow sentinel
#

jupyter notebook and google colab is like the same thing

#

except google colab is the cloud

undone flare
#

I mean that only

hollow sentinel
#

well normally you would learn how to visualize data

#

then you would learn how to clean data

#

and finally machine learning

#

and then all those niche topics like NN, NLP

undone flare
#

👍

hollow sentinel
#

yessir

#

are you using a course to learn your stuff?

undone flare
#

yes

hollow sentinel
#

nice what course

undone flare
#

yt freecodecamp

hollow sentinel
#

oh

#

idk never used that before

undone flare
#

Data Analysis with Python - Full Course for Beginners (Numpy, Pandas, Matplotlib, Seaborn)

hollow sentinel
#

nope i just went straight to udemy

undone flare
#

Does udemy have free courses?

hollow sentinel
#

some but i wouldn't call them amazing

wheat seal
#

i learned python from freecodecamp course

hollow sentinel
#

lol i learned python from college big oof

#

biggest mistake

undone flare
#

oof

wheat seal
#

oof

undone flare
#

pls tell some shortcuts for google colab

#

like making new cell shortcut

wheat seal
#

all the keyboard shortcuts are listed in the menu bar

#

right below ur notebook name

hollow sentinel
#

i would recommend downloading a Kaggle dataset and trying to do visualizations off that

#

while you're following the video

undone flare
#

Ctrl+M B what does that mean

#

pressing M B keys together?

#

ok I got it

#

is datacamp good?

hollow sentinel
#

lol idk

undone flare
#

nah it's boring I just checked lol

hollow sentinel
#

datacamp is kind of fill in the blank

#

that's what i just read

hollow sentinel
#
df["term"] = df["term"].apply(lambda term: int(term[:3]))
#

TypeError: 'int' object is not subscriptable

#

anyone see what's wrong here

#

I don't get it this was exactly what Portilla typed

#

visible confusion

#

see that is the same line

#

I need to learn lambda expressions

undone flare
#

@hollow sentinel can you tell me what you using?

#

just jupyter nb?

hollow sentinel
#

yes

undone flare
#

ok

hollow sentinel
#

it might be bc i ran the cell more than once

#

but it still doesn't work lmao

#

idk how to fix it

#

should i go to a help channel
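That `TypeError` suggests the `term` column already holds integers, which is what happens when the cell runs a second time on the converted column. A small sketch of a conversion that is safe to re-run; the sample values such as `" 36 months"` and the helper name `to_months` are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"term": [" 36 months", " 60 months", " 36 months"]})  # assumed format

def to_months(term):
    # Strings still need slicing; values converted on an earlier run are left as numbers.
    if isinstance(term, str):
        return int(term[:3])
    return int(term)

df["term"] = df["term"].apply(to_months)
df["term"] = df["term"].apply(to_months)   # running it again no longer raises
print(df["term"].tolist())
```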

molten hamlet
#

does numpy or scipy have any functions to add values with some masks but masks like 0 to 1 ?

hollow sentinel
#

masks???

molten hamlet
#

mask = array > 10

hollow sentinel
#

oh cool didn't know that term

#

no idk

#

here's the doc for masks

molten hamlet
#

nah, I want to add 2 arrays, center is full new, and outerring is some average ;d

#

it is just.... numpy ;D
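One way to read "masks from 0 to 1" is a weight array used for a per-element blend, which plain NumPy arithmetic handles without any special function. A sketch under that assumption; the array sizes, values, and the 0.5 weight on the outer ring are made up.

```python
import numpy as np

old = np.zeros((5, 5))
new = np.full((5, 5), 10.0)

# Weight mask: 1.0 in the centre (take `new` fully), 0.5 on the outer ring (average).
weights = np.full((5, 5), 0.5)
weights[1:-1, 1:-1] = 1.0

blended = weights * new + (1.0 - weights) * old
print(blended)
```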

hollow sentinel
#

idk lmao maybe someone else knows

molten hamlet
#

i just pasted solution

#

xD

hollow sentinel
#

oh

#

cool

undone flare
#

hey anyone here?

#

😦

hollow sentinel
#

yep

hollow sentinel
#
X = df.drop('loan_repaid',axis=1).values
y = df['loan_repaid'].values 
#

KeyError: "['loan_repaid'] not found in axis"

#

I ran the cell more than once and now I can't get it to work properly

#

train_test_split needs an X and a y

#

haha nvm i fixed it

#

just restarted the cell and ran everything again
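A likely cause, though the thread does not confirm it, is that an earlier cell reassigned `df` without the `loan_repaid` column, so a second run has nothing left to drop. A sketch with made-up rows showing a split that leaves `df` untouched and can be re-run:

```python
import pandas as pd

df = pd.DataFrame({"loan_amnt": [1000, 2000], "loan_repaid": [1, 0]})  # made-up rows

target = "loan_repaid"
# Neither line modifies df, so this cell can be executed repeatedly.
X = df.drop(columns=[target]).values
y = df[target].values

# The KeyError shows up once the frame itself has lost the column.
df_without = df.drop(columns=[target])
try:
    df_without.drop(columns=[target])
except KeyError as exc:
    print("KeyError:", exc)
```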

bitter harbor
heady hatch
#

Learning new things.

#

Interestingly that's consistent with tf too.

junior horizon
#

Anyone know how to remove a column containing a substring in python

#

pandas
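Assuming the question is about columns whose name contains a substring, a minimal pandas sketch looks like this. The column names and the substring are made up.

```python
import pandas as pd

df = pd.DataFrame({"price_usd": [1, 2], "price_eur": [3, 4], "name": ["x", "y"]})

substring = "price"
# Keep only the columns whose name does NOT contain the substring.
cleaned = df.loc[:, ~df.columns.str.contains(substring, regex=False)]
print(cleaned.columns.tolist())
```

Passing `regex=False` treats the substring literally, which avoids surprises if it contains characters like `.` or `(`.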

hollow sentinel
#

@bitter harbor was that directed at me

bitter harbor
#

sort of
i've never read the pandas docs but that's the first thing that shows up

hollow sentinel
#

I hate reading doc it’s so boring

bitter harbor
#

depends on the docs tbh

#

sometimes it's easier to read through the source code imo

hollow sentinel
#

I like to read the doc and then just use methods from it on data from Kaggle

#

I’m getting better at reading doc tho

grave thunder
#

I just use stack overflow

heady hatch
#

Do they still teach rtfm in cs schools?

solar bluff
#

the pandas documentation is often confusing

cerulean spindle
#

Yes, I can definitely agree with that and I think the scikit-learn documentation is rather informative and easy to read.

grave thunder
#

Hello, anyone know in jupyter notebook how can I get the cleaner looking histogram shown? Mine (up) is hard to see

austere swift
#

have you tried the edgecolor parameter?

grave thunder
#

have you tried the edgecolor parameter?
@austere swift That did the trick. Thanks ^^
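For anyone reading later, a sketch of the `edgecolor` fix on made-up data. The `Agg` backend line is only there so the snippet runs without a display; leave it out inside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np

data = np.random.default_rng(0).normal(size=500)  # made-up data

fig, ax = plt.subplots()
# edgecolor outlines each bar so neighbouring bins are easy to tell apart.
counts, bin_edges, patches = ax.hist(data, bins=20, edgecolor="black")
```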

austere swift
#

Np

remote pecan
#

is scraping part of data science?

hollow sentinel
#

yeah

#

what are you using to scrape? bs4/selenium?

remote pecan
#

err no im trying the basics of scraping so im trying html scraping. but i keep getting 0 data no matter what i try

hollow sentinel
#

just send your code

remote pecan
#

but ye i use bs4

#

im in help room neon

hollow sentinel
#

oh then why are you asking for help here lol

#

someone there will help you

remote pecan
#

🙂 just wondered if it was right category

#

read the Beautiful Soup documentation but my experience differs from the docs.

hollow sentinel
#

this might just give you a general guide on how to do bs4 scraping

remote pecan
#

will take a look.

#

thanks

hollow sentinel
#

ofc

agile wing
#

oh man

#

im sleepy from studying and streaming

#

finally learning NN

#

learned that each layer is a logistic regression function.

#

essentially
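A small sketch of that idea, which holds when the layer uses a sigmoid activation (other activations change the picture): each unit computes a sigmoid of a weighted sum, the same form as logistic regression. The shapes and random values are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=4)            # one input example with 4 features
W = rng.normal(size=(3, 4))       # 3 units, each with its own weight vector
b = np.zeros(3)

# A dense sigmoid layer: every unit computes sigmoid(w . x + b).
layer_out = sigmoid(W @ x + b)

# The same computation, one "logistic regression" at a time.
unit_out = np.array([sigmoid(W[i] @ x + b[i]) for i in range(3)])
print(np.allclose(layer_out, unit_out))
```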

hollow sentinel
#

idk man

#

I don't know if i want to spend time learning octave

#

people don't use it

#

it's either jupyter notebook or google co lab

hollow sentinel
#

hey guys if you want to brush up on your python basics automate the boring stuff with Python is free on Udemy with this code: NOV2020FREE

#

be careful the code only works for a limited amount of time

grave thunder
#

it's either jupyter notebook or google co lab
I've been sceptical about notebook but once I tried it I'm not going back

#

Makes data manipulation and presentation waaaaaaaaay better than any IDE out there

hollow sentinel
#

has anyone tried the IBM data science course

#

AAAAAAAAAAAAH THEY'RE ASKING FOR CREDIT CARD INFO ON COURSERA

twilit brook
#

Has anyone come up on this problem?

#

I download a .csv file directly from a local server, but it doesn't import properly into pandas

#

The column headers are shifted two over. Only workaround i've figured out was opening the file in numbers/excel and resaving. Then it imports fine

#

but this will run on a scheduler... anyway to fix the headings?

agile wing
#

wondering if its a delimiter problem in the csv file

twilit brook
#

that could be it

#

I just looked up what a delimiter is

agile wing
#

in other words, there may not be a comma between those columns?
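One common cause of shifted headers, assumed here since the file itself is not shown, is that every data row has more fields than the header (for example a trailing delimiter). pandas then treats the leading column as the index. A sketch with a made-up file and the `index_col=False` option of `read_csv`:

```python
from io import StringIO
import pandas as pd

# Made-up file: each data row ends with an extra delimiter, the header does not.
raw = "a,b,c\n1,2,3,\n4,5,6,\n"

shifted = pd.read_csv(StringIO(raw))                 # first column silently becomes the index
fixed = pd.read_csv(StringIO(raw), index_col=False)  # keep every column as data

print(shifted)
print(fixed)
```

If the delimiter itself is different (say `;` or a tab), passing `sep=` explicitly is the other thing to try.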

twilit brook
#

lemme check

#

thank you

hollow sentinel
#

columbia machine learning course is boring

#

compared to Portilla's

agile wing
#

i have the andrew ng machine learning python homework assignments

lapis sequoia
#

what are you trying to learn

agile wing
#

someone created the python homework set, and basically it's approved when submitting it, that's why I'm using that version anyways

#

columbia ml course?

lapis sequoia
#

ok

hollow sentinel
#

there's a columbia machine learning course on edx

#

this professor puts me to sleep tho

agile wing
#

i like coursera the best

hollow sentinel
#

yeah but i have to pay for that

#

i think i should do the google ml crash course

#

i can't stand these boring machine learning theory lessons

agile wing
#

i've seen that and that ...is too little

#

the google ml crash ones

hollow sentinel
#

that's unfortunate

#

well I'm not gonna do Ng anytime soon

#

and learn Octave

#

lol the columbia course is lame not doing it

#

If I wanted to learn the math I would’ve done 3b1b

lapis sequoia
#

Where would we talk about AI?

agile wing
#

you dont need to learn octave

#

just get the python homework version for andrew ng's class

#

there's actually a github of someone who created all of it in python for homework exercises

jolly folio
#

im kinda new to python, what is the best IDE?

#

...

hollow sentinel
#

if you wanna do data science jupyter notebook and google colab is good @jolly folio

#

but other than that I would recommend VSC

#

@lapis sequoia here lol

#

@agile wing thanks man I found one that does it entirely in Python

#

I just wish there were more courses like Portilla's

hollow sentinel
#

i don't understand everywhere I read it says the Ng course is free

#

but it requires credit card info??

undone flare
#

I have a numpy array a = [0, 0.5, 1, 1.5, 2] and when I print a[0] it gives 0.0 why?

hollow sentinel
#

you're printing an index of the array?

#

sike it works now

undone flare
#

I edited it

hollow sentinel
#

yes because 0 is the zeroth index of the list

undone flare
#

no I mean why did it get converted to float?

#

In array it was 0 but when I print that element it becomes 0.0

hollow sentinel
#

oh idk lmao

undone flare
#

and if I do a[-1] it will give 2.0 and not 2

#

is it because the data type of a is float?

#

cuz arrays can have only same data type values

#

can that be the reason?

heady hatch
#

because NumPy arrays can only hold one dtype, and I think it casts everything to float if there's a float in there.

you can check array's dtype with array.dtype.
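A quick check of that behaviour with the array from the question:

```python
import numpy as np

a = np.array([0, 0.5, 1, 1.5, 2])
print(a.dtype)   # float64: one float in the input upcasts every element
print(a[0])      # 0.0

ints = np.array([0, 1, 2])
print(ints.dtype.kind)  # "i": all-integer input stays integer
```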

undone flare
#

yea thx got it

spare karma
#

Anyone know of any Video upscaling tutorials? I want to take a 1080p movie and upscale it to 4k.

#

Thought I could make a fun project out of it.

real wigeon
#

how do you guys deal with loading xls files into mysql?

#

I'm working on a flask app

#

trying to come up with a way to load new user data in bulk, into my db

vital cipher
#

guys was just wondering, tensorflow2 only supports up to python 3.8 but new distros like Fedora 33 come with py3.9, so is there any news on a tensorflow update, or can you share any updates that exist

lapis sequoia
#

@vital cipher "when all of our dependencies support py3.9" is when it will be available

vital cipher
#

yup i agree with you @lapis sequoia but it was posted like 26 days ago and wanted to know like whats new thats all... 🙂

lapis sequoia
#

You can always check what dependencies support python 3.9 @vital cipher

#

Maybe one is holding them back

undone flare
#

heya anyone there??

remote pecan
#

yes ?

undone flare
#

When might we need to create an array initialised to zeros or ones?

remote pecan
#

im sorry but i did not understand that 🙂

undone flare
#

like

#

you create an numpy array

#

np.zeros((3,4))

#

or

#

np.ones((3,2))

#

Why do we need an array with only zeros and ones

remote pecan
#

im afraid i do not know the answer to that.

undone flare
#

me neither xD

bitter harbor
#

@undone flare if you need to init an array with values, you'd use one of those. Zeros is used pretty commonly in networks and stats related stuff but the usages depend on the range you need. np.full works as well and can fill an array with values such as inf and NaN: values that aren't really values. np.empty can be used as well and is faster because it doesn't initialize the memory, but the contents are arbitrary until you set every value yourself.

undone flare
#

idk any of those xD I am still learning numpy

#

I only know np.full

bitter harbor
#

they're just different ways to init an array

#

np.(zeros/ones/full/empty)_like is similar to all that too, but it's used to copy the shape + data-type + order
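A few of those constructors side by side. The use cases in the comments are typical examples, not the only ones.

```python
import numpy as np

acc = np.zeros((3, 4))          # e.g. an accumulator you add into later
scale = np.ones((3, 2))         # e.g. a neutral starting point for multiplication
best = np.full((2, 2), np.inf)  # e.g. "no value yet" when tracking a minimum
buf = np.empty((2, 3))          # uninitialised: contents are arbitrary until assigned
buf[:] = 7.0

template = np.array([[1, 2], [3, 4]], dtype=np.int32)
like = np.zeros_like(template)  # same shape and dtype as `template`
print(like.shape, like.dtype)
```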

undone flare
#

yea I just learnt about that

heady hatch
#

Hey @velvet thorn I think I finally have a better way of describing what I was looking for.

So I'm grabbing the cartesian product of two lists of categories, and under each category, it's a list of products.

For each product of categories, I wanted to find the length of shared products under the categories.

I was able to do something along the lines of for each product of categories, find the set intersection of the two.

but I'm curious if there's a way to do it more efficiently.

I would love to hear others' advice as well.
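One possible alternative to intersecting every pair from the cartesian product is to invert the mapping: index each product by the categories it belongs to in each store, then count category pairs per shared product. A pure-Python sketch with hypothetical category-to-products dictionaries (`store1`, `store2` and the helper `invert` are names made up here):

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical category -> products mappings for the two stores.
store1 = {"a": {"apple", "orange"}, "b": {"apple"}, "d": {"orange"}}
store2 = {1: {"apple", "orange", "banana"}, 4: {"watermelon"}, 7: {"apple"}}

def invert(categories):
    """Map each product to the set of categories that contain it."""
    by_product = defaultdict(set)
    for category, products in categories.items():
        for p in products:
            by_product[p].add(category)
    return by_product

inv1, inv2 = invert(store1), invert(store2)

# Only products present in both stores can contribute to an overlap.
shared_counts = Counter()
for p in inv1.keys() & inv2.keys():
    shared_counts.update(product(inv1[p], inv2[p]))

print(shared_counts)
```

The work done is proportional to the number of (product, category pair) combinations that actually occur, so category pairs with no shared product are never visited.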

undone flare
#

Can anyone give me some pattern to print using arrays (simple one as I am still learning)

chrome barn
#
import numpy as np
array = np.array([1, 2, 3, 4, 5])
print(array)
for x in array:
    print(x)
#

something like this

bitter harbor
undone flare
#

@chrome barn wdym

bitter harbor
#

like printing out elements of an array

undone flare
#

I want something like this

#

I will do checkerboard
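A checkerboard is a nice one for practising slicing. A sketch using step slices; the function name `checkerboard` is just a label chosen here.

```python
import numpy as np

def checkerboard(n):
    """Return an n x n array of alternating 0s and 1s."""
    board = np.zeros((n, n), dtype=int)
    board[1::2, ::2] = 1   # odd rows, even columns
    board[::2, 1::2] = 1   # even rows, odd columns
    return board

print(checkerboard(4))
```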

chrome barn
#

should help you out

undone flare
#

I am already learning it

wanton bison
#

hey guys if you want to brush up on your python basics automate the boring stuff with Python is free on Udemy with this code: NOV2020FREE
@hollow sentinel Thanks

lapis sequoia
#

!e <head> Hello world </head>

arctic wedgeBOT
#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

remote valley
#

matplotlib: i want to grab the array of pixels for a plot, manipulate that array, and then write it again with ax.imshow(arr) but I don't see any way to get a plot (bar in this case) as an array of pixels

undone flare
#

hi anyone here?

spiral peak
#

@remote valley that seems like an odd way of editing a graph. What's the specific use case of this vs using mpl's normal functions for changing a graph?

undone flare
#

where can I practice Matrix Multiplication?

#

nvm got it

remote valley
#

@spiral peak oh yeah it's odd. I'm working on some procedural art stuff. not an intended use case for sure.

#

made some bar charts with polar coordinates and wanted to use them as patterns to fill voronoi cells 🙂

spiral peak
#

Aaaah, okay. I'm not sure, let me do some research
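One approach that should work with the Agg backend is to render the figure and read the canvas buffer back as an array. A sketch, with a tiny made-up bar chart and colour inversion standing in for the real manipulation:

```python
import matplotlib
matplotlib.use("Agg")  # the Agg canvas exposes the rendered pixels; also runs headless
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(2, 2), dpi=50)
ax.bar([0, 1, 2], [3, 1, 2])

fig.canvas.draw()                                      # render the figure
pixels = np.asarray(fig.canvas.buffer_rgba()).copy()   # (height, width, 4) uint8 array

pixels[..., :3] = 255 - pixels[..., :3]                # example manipulation: invert colours

fig2, ax2 = plt.subplots()
ax2.imshow(pixels)                                     # draw the edited pixels again
```

The `.copy()` matters: without it the array is a view onto the canvas buffer rather than an independent image.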

hollow sentinel
#

lol time to start doing Ng

#

in other words dyiNG

#

hahaha i'm dead inside

undone flare
#

@hollow sentinel do you know if the determinant of the identity matrix is always 1 or not?
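It is: the identity matrix is diagonal, so its determinant is the product of the diagonal entries, which are all 1, for any size. A quick numerical check for a few sizes:

```python
import numpy as np

for n in range(1, 6):
    det = np.linalg.det(np.eye(n))
    print(n, det)
```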

undone flare
#

Matrices is easy

#

do you want a question to solve?

hollow sentinel
#

nah

molten hamlet
#

2d array of 0 to 255

hollow sentinel
#

uhhhhhh

heady hatch
#

What do you mean by pattern? As in you want it to be a 2d array?

#

If so convert img to array.

molten hamlet
#

@heady hatch It cant be done that way, it has to be dynamic... you want size 8, or 10 etc. 😄

lapis sequoia
#

How much better is automating with openpyxl than with Macros/VBA?

blazing sundial
#

Hey fam, anyone here a wiz at matplotlib?

#

or know anything about plotting antenna radiation patterns?

fallow prism
#

what is the criteria for choosing a neural network library like scikit lern or TF for NLP?

#

learn*

rustic obsidian
#

im having an issue during numpy import, referenced here https://github.com/xianyi/OpenBLAS/issues/2709

anyone familiar with that? im not sure where to go from here, running windows 10.0.19041 Build 19041

"RuntimeError: The current Numpy installation ('venv\lib\site-packages\numpy\init.py') fails to
pass a sanity check due to a bug in the windows runtime. See this issue for more information: https://tinyurl.com/y3dm3h86"

heady hatch
#

Never mind, got it.

fallow prism
#

dataset['b'].head()

heady hatch
#

Oh but that doesn't include group by.

#

I had to do df.groupby('a')['c'].nlargest.

#

In regards to your question, @fallow prism . The kind of algorithms you want to use for NLP depends on your problem and constraint and how you want to go about it.

fallow prism
#

can you resend the problem please?

heady hatch
#

Scikit learn isn't a neural network library.

#

So if you need NN, you would look into NN framework libraries such as PyTorch or TensorFlow.

#

But if you need to use classical machine learning, Scikit-Learn is there.

#

But there's also more nlp focused ones like NLTK or SpaCy.

#

Scikit-Learn is a general library for ML; it only ships a basic multi-layer perceptron (sklearn.neural_network), not a full deep learning framework.

fallow prism
#

thank you, I going to study that better

#

my problem in NLP is interpretation and classification of description made by people about car accidents

#

NLP problem*

#

and i need train a NN to do that

#

or i think that

heady hatch
#

So I'm looking to group by a, and sort by c.

#

But retaining the values of b.

#

Regarding your issue of classification of description made by people about car accidents.

NN could work.

#

But there's nothing wrong with trying out classical algorithms as well.

#

Or I guess I'm curious, why do you think you need to jump to NN right away?
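As a rough idea of what a classical baseline could look like before reaching for a neural network: TF-IDF features plus a linear classifier in scikit-learn. The descriptions and labels below are invented purely to show the shape of the approach; real data would need far more examples and a proper train/test split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up descriptions and labels, for illustration only.
texts = [
    "the other car hit my rear bumper at a red light",
    "I was rear-ended while stopped in traffic",
    "a truck scraped the driver side door while changing lanes",
    "someone hit my side door in the parking lot",
]
labels = ["rear", "rear", "side", "side"]

# Bag-of-words features plus a linear model: a common classical baseline.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

prediction = model.predict(["my rear bumper was hit from behind"])
print(prediction)
```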

tall seal
#

noob question here, but this is my data set and I am trying to display movie name and number of genres for the movie in a dataframe, and also print the total number of movies which have more than one genre...any idea where to start here? I looked up the documentation of the .sum() function but can't seem to get it to work...

heady hatch
#

@tall seal to clarify, are those three different requests?

#

Would something like df.sum(axis=1) be what you're looking for?

fallow prism
#

imagine you crash your car and you describe the accident to me, i have to be able to classify the accident and know what part of your car was damaged, know how the accident occurred

#

and who is responsible

#

in a few words

heady hatch
#

So from my understanding you're trying to extract information from text?

fallow prism
#

basically

ripe forge
#

Nine, did you try a sort_values on the group by object yet?

fallow prism
#

and my set of texts doesn't have structure

heady hatch
#

Hey @ripe forge , thanks for responding.

I did and this was the result I got.

#

But the issue I'm encountering now is the b column is in index form instead of its actual value.

#

And so now I'm not too sure how to go about it.

ripe forge
#

Oh just use reset index after that

heady hatch
ripe forge
#

This is after reset index yeah?

heady hatch
#

Mhm.

ripe forge
#

Then I think the only part left is top 5,yeah?

heady hatch
#

I think c is already in top 5.

#

After sorting and doing nlargest.

ripe forge
#

Oh then yep, you're done

heady hatch
#

Is there an elegant solution to do it without remapping?

ripe forge
#

Not sure off the top of my head. I'm a bit surprised why it came as level1

#

Can you change nlargest to head and see if it still comes?

heady hatch
#

Oh good point.

#

Oh wait I remember trying that and needed to sort beforehand.

#

So this was something else I've also tried.

#

But the issue here is a isn't in groups and c is just based off of the absolute sort instead of within groups.

ripe forge
#

This should still logically contain all the rows that you're interested in. But yeah, this one aside, I was thinking group by, sort values, and head. What's the output of the operations in that order?

heady hatch
#

I remember hitting error on using sort_values after groupby.

#

Or did you mean

df.groupby('col')['col2'].sort_values...
#

On the other hand I just tried a new one.

This seems to get me there.
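The screenshot of the final attempt is not in the log, so this is only one way to get the top rows per group while keeping `b` as a normal column: sort once by `c`, then take the head of each group. The data is made up and `top_n` is 2 here where the thread used 5.

```python
import pandas as pd

df = pd.DataFrame({            # made-up data with the column names from the discussion
    "a": ["x", "x", "x", "y", "y", "y"],
    "b": ["p", "q", "r", "s", "t", "u"],
    "c": [3, 9, 5, 2, 8, 7],
})

top_n = 2
# Sort by c, take the first rows of each group; b comes along as a regular column.
top = (df.sort_values("c", ascending=False)
         .groupby("a")
         .head(top_n)
         .sort_values(["a", "c"], ascending=[True, False])
         .reset_index(drop=True))
print(top)
```

Because `head` on a groupby returns rows of the original frame, no remapping from an index level back to `b` is needed.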

hollow sentinel
#

hey guys if I asked questions in octave would you be able to help me

#

I haven’t started the Ng course bc I’ve been busy w school

tall seal
#

Would something like df.sum(axis=1) be what you're looking for?
@heady hatch I tried this and it didn't seem to work

heady hatch
#

What results were you trying to get to? and could you clarify what you were trying to achieve?

tall seal
#

@tall seal to clarify, are those three different requests?
@heady hatch 2 requests, to display movie name and number of genres for the movie and then print total number of movies with more than one genre.

heady hatch
#

Right right, if you don't mind let me try to lead you through what you're seeing here.

#

Oh oops.

#

I realized why it's adding random things, it's adding the id.

#

so you might need to do something like

#

df.iloc[:, 3:].sum(axis=1) or df.loc[:, 'Action':].sum(axis=1)

#

What this does is add up all the values in your genre columns. Since your data is a boolean encoding of the genres, adding up the values gives you the total number of genres per movie.

#

From there, then if you want to filter to movies with more than 1 genre, you would then need to do

col > 1
#

If it's too much, I guess let me know what questions you have.
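Put together, the two steps could look like the sketch below. The frame is made up and its layout (id and title first, then 0/1 genre columns from `Action` to `Drama`) is an assumption about the real dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["Movie A", "Movie B", "Movie C"],
    "Action": [1, 0, 1],
    "Comedy": [0, 1, 1],
    "Drama": [0, 0, 1],
})

# Sum only the genre columns so the id is not added in.
df["n_genres"] = df.loc[:, "Action":"Drama"].sum(axis=1)
print(df[["title", "n_genres"]])

# Boolean mask of movies with more than one genre, then count the True values.
multi_genre = (df["n_genres"] > 1).sum()
print(multi_genre)
```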

lapis sequoia
#

Anyone here use statsmodels?

hollow sentinel
#

is that like scikit learn

lapis sequoia
#

Yeah similar concept.

hollow sentinel
#

yeah I don't use it but I remember you asking a question about it before

lapis sequoia
#

I've only used statsmodels tbh, I should probably try SciKit Learn too.

#

Yeah was just curious.

hollow sentinel
#

have you used Octave lol

lapis sequoia
#

Haven't had time to tweak my linear model I did from last time.

#

Nah what is that?

#

I'm pretty new to coding lol.

hollow sentinel
#

it's like matlab

#

Andrew Ng uses it for his machine learning course

#

on coursera

lapis sequoia
#

Do you have to pay for it?

hollow sentinel
#

no it's free

#

only if you want the certificate

lapis sequoia
#

@hollow sentinel Do you work in ML?

hollow sentinel
#

@lapis sequoia lol no i'm just a college business student who thinks ML is cool

lapis sequoia
#

Ah gotcha lol. Same here, ML seems dope. I work in corporate finance and some of our financial models and tools we use are starting to become ML.

hollow sentinel
#

that's very cool

lapis sequoia
#

We now have ML dashboards to predict and model out future costs.

hollow sentinel
#

that sounds very cool

lapis sequoia
#

I have been trying to create some sort of regression model for our labor hours/direct labor costs to find the drivers and create a thing where we can choose each component of the product and see how many hours get added but been struggling since the initial regression.

#

That was the one I asked about a few weeks ago.

hollow sentinel
#

just be careful about who you give the data to on the internet

#

that kind of stuff in the wrong hands is really bad

lapis sequoia
#

Yeah I always hide company info.

hollow sentinel
#

good

lapis sequoia
#

hi @weak kiln bro can i get access to the voice chat to interact with you guyz

weak kiln
#

weird place to ping me, dude. see #voice-verification for information on why you don't have speaking permissions.

hollow sentinel
#

lol

lapis sequoia
#

lol
@hollow sentinel i'm new at this place

hollow sentinel
#

haha welcome

#

I've been here for a couple weeks. This thread is for DS/ML questions so if you have any just ask

lapis sequoia
#

I've been here for a couple weeks. This thread is for DS/ML questions so if you have any just ask
@hollow sentinel Yes i have a lot of...

hollow sentinel
#

uhhhh then ask them?

charred blaze
#

@fading wigeon hey, remember that presentations about how notebooks are sucky?

#

https://www.youtube.com/watch?v=9Q6sLbz37gk here's one from one of the authors of fast.ai on how notebooks might actually be interesting and attempts to refute some of the arguments of that other presentation

I like using Jupyter Notebooks (https://jupyter.org/). Particularly when combined with nbdev (https://nbdev.fast.ai/). In this video, I explain why, and explain why I have a different opinion to Joel Grus, who discussed in another talk why he doesn't like using Jupyter Noteboo...

▶ Play video
#

TBH, I wasn't that convinced on the refutals but nbdev caught my attention

lapis sequoia
#

uhhhh then ask them?
@hollow sentinel bro can we integrate osint with ai??

hollow sentinel
#

uhhhhhh

fading wigeon
#

@charred blaze Cool, thanks, will check it out

charred blaze
#

also, this was the presentation where Jeremy Howard was somewhat... "canceled"

#

and a schism is starting to brew that involves Jupyter (really)

hollow sentinel
#

jupyter was the first thing i used for DS/ML

chilly pasture
#

hello i already have python installed in my windows system

#

is it good to register anaconda3's python as default?

obsidian yacht
#

Anaconda's Python comes with many libraries preinstalled, so you can make it the default to use them with ease

wheat seal
#

octave is a pain jus saying

obsidian yacht
#

But if you are more familiar with general IDE then ignore

wheat seal
#

ye but it takes up a lot of space

#

anaconda is just venv or virtualenv module but with a fancy unnecessary gui and all libraries install by default

#

even installing the libraries on your own takes up less space

cerulean ingot
#

i have a api question.

#

the API's GET request returns more than 10000 records, with pagination. to use that data, do I use a for loop and request every page, or is there a more efficient way?

#

i googled a lot but i m not getting the answer
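
For reference, looping over the pages is the usual pattern unless the API itself offers a bigger page size or a bulk export. A sketch of that loop as a generator, where the response shape (`"results"` and `"next"` keys) and the `fetch_page` callable are assumptions for illustration, not any real API:

```python
from typing import Callable, Iterator

def iter_items(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Yield items page by page until the API reports no further page."""
    page = 1
    while True:
        payload = fetch_page(page)
        items = payload.get("results", [])   # assumed key holding the records
        if not items:
            break
        yield from items
        if not payload.get("next"):          # assumed key pointing at the next page
            break
        page += 1

# Stand-in for a real HTTP call, so the sketch runs without a network
PAGES = {
    1: {"results": [{"id": 1}, {"id": 2}], "next": 2},
    2: {"results": [{"id": 3}], "next": None},
}

def fake_fetch(page: int) -> dict:
    return PAGES[page]

all_items = list(iter_items(fake_fetch))
print(len(all_items))
```

In a real script `fetch_page` would wrap the HTTP request, and the generator lets you process records as they arrive instead of holding every page in memory.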

undone flare
#

filedata = np.genfromtxt('data.txt', delimiter=",") this gives an error data.txt not found but I have saved it

#

I am using google colab

#

ping me pls if you know what's wrong

#

np.loadtxt() gives the same error too

mild topaz
#

hello, currently my code is giving me output like
albania_passport: 100.00%
confidence_level: 100.00
this way. how can i make changes to get my output like
predictions [[0.03083993 0.9471298 0.02203036]]
albania_driving_licence : 0.03083993
albania_passport : 0.9471298
invalid : 0.02203036
this way
i want to get prediction for all labels
my code here
print("label:", label)
predictions = np.argmax(predictions)
print(predictions)
if (label == prediction):
    print(f"{label}: {(predictions)*100:.2f}%")
    logger.debug("{}: {:.2f}%".format(label, predictions * 100))
    confidence_level = predictions * 100
    confidence_level1 = "{:.2f}".format(confidence_level)
    print("confidence_level: ", confidence_level1)
    logger.debug(f"confidence_level: {confidence_level1}")
full code: https://paste.pythondiscord.com/rogokizezo.py

lapis sequoia
#

predictions are from an sklearn model?

#

predictions [[0.03083993 0.9471298 0.02203036]]
@mild topaz if so you can use model.predict_proba() it gives the confidence probabilities directly

mild topaz
#

@lapis sequoia hello

lapis sequoia
#

yeah

#

?

mild topaz
#

predictions are from an sklearn model?
@lapis sequoia no i am using predictions = model.predict(img) see line 242

lapis sequoia
#

line 245 is what you need

mild topaz
#

prediction_prob = model.predict_proba(img) this ? @lapis sequoia

lapis sequoia
#

predict gives label and predict_proba gives probability. which im assuming this is "predictions [[0.03083993 0.9471298 0.02203036]]"

#

yes

mild topaz
#

prediction_prob: [[0.03083993 0.9471298 0.02203036]] this one u were talking about

#

i want to get my output this way
albania_driving_licence : 0.03083993
albania_passport : 0.9471298
invalid : 0.02203036

lapis sequoia
#

...its the same isnt it

mild topaz
#

ya but not get the required output @lapis sequoia

lapis sequoia
#

prediction_prob = model.predict_proba(img) did you print this?

mild topaz
#

see prediction_prob: [[0.03083993 0.9471298 0.02203036]] this is what i get @lapis sequoia

lapis sequoia
#

yes but i dont understand your problem

#

you have your probabilties you just need to print it in label : confidence format
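
Something along these lines, as a sketch. The `labels` list and its order are an assumption here; they have to match the class order the model was trained with:

```python
import numpy as np

# Assumed label order; it must match the order of the model's output columns
labels = ["albania_driving_licence", "albania_passport", "invalid"]

# Probabilities copied from the output above (one row per input image)
prediction_prob = np.array([[0.03083993, 0.9471298, 0.02203036]])

# Pair each label with its probability for the first (only) image
result = dict(zip(labels, prediction_prob[0].tolist()))
for label, prob in result.items():
    print(f"{label} : {prob}")
```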

mild topaz
#

see first i explain what i want to achieve

#

i want that each label has its corresponding prediction_probability prediction_prob: [[0.03083993 0.9471298 0.02203036]]

#

this way i want my output
albania_driving_licence : 0.03083993
albania_passport : 0.9471298
invalid : 0.02203036

#

i want to show probability for each label using prediction_prob: [[0.03083993 0.9471298 0.02203036]] this

#

@lapis sequoia

#

each label has its own prediction_prob value

lapis sequoia
#

its possible you are treating a multiclass problem as a multilabel classification.

#

Cant help you more than this without seeing model training

mild topaz
#

see bro, currently i dont think so u need model training

#

do u have my code?

mild topaz
#

can u atleast give some suggestion how i can fix this issue ? @lapis sequoia

dawn whale
#

Hello, so I have a question that's rather about statistics, but maybe someone can help me out here:
Ok, so I have a list of normalized values, asking which year was the worst for their allergies

0 means nearer to present
1 means nearer to the start of their allergies

If they answered, that they didn't notice any change, can I just use 0.5?
I calculated the values by: (2020 - worstyear) / (2020 - first year)

undone flare
#

hey anyone here?

molten hamlet
#

👻

#

Lets say I got matrix and some kernel

#print(mat)
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]

# kernel
kernel = np.array([[1,1,1],
                   [1,1,1],
                   [1,1,1]])

out = np.correlate(mat, kernel)

and I want to use correlate to count sum of each square in matrix 3x3

#

ValueError: object too deep for desired array

#

its only 1d 😐
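
For the record, one way to get the 3x3 sums with NumPy alone is a sliding window view. This is a sketch that needs NumPy 1.20 or newer; `scipy.signal.correlate2d(mat, kernel, mode="valid")` should give the same result if SciPy is available:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

mat = np.arange(25).reshape(5, 5)

# Every 3x3 window of mat; shape is (3, 3, 3, 3): window position, then window contents
windows = sliding_window_view(mat, (3, 3))

# Summing over the last two axes is the same as correlating with an all-ones 3x3 kernel
out = windows.sum(axis=(2, 3))
print(out)
```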

undone flare
#

So I have a text file named data_ and when I try to do np.genfromtxt('data_.txt', delimiter = ',') or np.loadtxt('data.txt', delimiter = ',') it gives an OSError : data_.txt not found

molten hamlet
#

OSError : data_.txt not found

#

if you could read to end :}

undone flare
#

ik but I have a file named data_.txt

molten hamlet
#

not in this workpath*

undone flare
#

I am using google colab

molten hamlet
#

!d numpy.genfromtxt

undone flare
#

so do I add that file in google colab?

molten hamlet
#

@undone flare you can try is_file = os.path.isfile(file_path) and then print(is_file); you will get True or False depending on whether it can see the file

undone flare
#

ok

undone flare
#

bruh now I am getting new error
ValueError: Some errors were detected !
Line #2 (got 1 columns instead of 5)
Line #4 (got 1 columns instead of 5)

lapis sequoia
#

If all my independent variables are dummy variables should I use something other than linear regression?

undone flare
#

omg it is finally working

#

thank god

#

this took too long to figure

hollow juniper
#

which cloud should I learn if i want to get into machine learning and ds?

zinc cobalt
#

word-clouds, at most py_guido

tawny cradle
#

Hi everyone

#

I made a project on AI for high school

#

Can you help me by giving suggestions?

arctic wedgeBOT
#

Hey @tawny cradle!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .flac, .afdesign, .m4a, .csv.

Feel free to ask in #community-meta if you think this is a mistake.

tawny cradle
#

How can I send pdf here?

#

@lucid hornet

hollow sentinel
#

you click the + button if you're on a machine

undone flare
#

you can't send pdf's

#

@hollow sentinel did you learn numpy?

hollow sentinel
#

@undone flare yes

undone flare
#

@hollow sentinel can you link me the course?

hollow sentinel
undone flare
#

paid oof

hollow sentinel
#

yeah my prof i was doing research w paid for it

#

but this does help

#

I liked it more than I liked the Ng course

undone flare
#

ok I will see if I can buy that lol

#

or I will stick to yt courses

hollow sentinel
#

the Ng course is free

#

but he teaches ML in octave

#

but there's githubs with everything in python

undone flare
#

ok thx for suggestion

hollow sentinel
#

no prob

undone flare
lapis sequoia
#

@lapis sequoia have you considered ANOVA? I'm assuming your dependent variable is continuous

lapis sequoia
#

@lapis sequoia That's what someone else suggested, but I have no idea how to do/implement that. And yes my dependent variable is continuous.

hollow sentinel
#

might be helpful idk

#

this github does ANOVA it might be helpful to look at

lapis sequoia
#

Let me check those out, thank you.

hollow sentinel
#

no problem

lapis sequoia
#

Is it supposed to be that long?

hollow sentinel
#

not sure I haven't done an ANOVA before

#

did that help @lapis sequoia

barren meadow
#

does anyone have a guide/paper/link to some websites or packages that checks on data anomaly or data fidelity?

lapis sequoia
#

@hollow sentinel Yeah trying to make it work now.

tidal bronze
#

with beautiful soup how could I find all links anchored within h3 headers?
This is what I've tried so far:

self.all_links = soup.find_all("h3",
                                       {"class": "entity-title"},
                                       limit=41).a.get("href")

seems that the .a is not working
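
A likely cause: `find_all` returns a list of tags, and `.a` only exists on a single tag, so each `h3` has to be handled in a loop. A sketch with made-up HTML (the markup and hrefs are placeholders):

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page
html = """
<h3 class="entity-title"><a href="/ad/1">First</a></h3>
<h3 class="entity-title"><a href="/ad/2">Second</a></h3>
<h3 class="entity-title">No link here</h3>
"""
soup = BeautifulSoup(html, "html.parser")

# Take the first <a> inside each matching <h3>, skipping headers without a link
all_links = [
    h3.a["href"]
    for h3 in soup.find_all("h3", {"class": "entity-title"}, limit=41)
    if h3.a is not None and h3.a.has_attr("href")
]
print(all_links)
```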

hollow sentinel
summer holly
#

Hi everyone. I'm trying to deploy a flask app with my custom BERT keras model which takes tweets as input. The model runs perfectly by itself but whenever I try to make the model.predict() function call within the flask app, it always results in the flask app being terminated. Any help/suggestions would be appreciated. Thanks!

#

I specifically need help understanding why pycache is reloading and how can I prevent that

lapis sequoia
#

@summer holly we would need to see more source code, particularly where the model.predict() is called within your views.py (assuming it's called views.py) file

lapis sequoia
#

@summer holly one thing you could try is to execute flask run --no-reload and see if it solves the problem or specify use_reloader=False in the app.run() argument

tawdry sentinel
#

Hi, I'm starting in Data Science and First I'm studying CSV

#

I'm trying to edit a CSV and it always gets messy when I use to_csv

lapis sequoia
#

what gets messy?

tawdry sentinel
lapis sequoia
#

use pathlib to get the path of the object btw

tawdry sentinel
#

use pathlib to get the path of the object btw
@lapis sequoia Ok, I'm using Visual Studio, can this influence?

lapis sequoia
#

the IDE doesn't influence the libraries you import

#

This problem is not related to what you're asking btw.

#

I'm still not sure what the problem is

#

because I don't know how the .csv looks like, how you delimit the rows and columns etc.

#

maybe use the same encoding utf-8

tawdry sentinel
#

I have multiple columns before importing
I import the file, convert it to a dataframe and after saving it it messes everything up in just two columns

lapis sequoia
#

it's likely something to do with the delimiter setting, change it to delimiter=" " or delimiter="\t" and see if that helps

#

and use pandas.read_csv()

tawdry sentinel
#

Like this:
arquivo = open('c:\Users\Pichau\Downloads\caso_full.csv', encoding="utf-8", delimiter="\t")?

lapis sequoia
#

arquivo = pd.read_csv(r'c:\Users\Pichau\Downloads\caso_full.csv')

#

try this first

#

then save it using arquivo.to_csv() and see if the columns are preserved

tawdry sentinel
#

ok

lapis sequoia
#

if not then try pd.read_csv('...', delimiter=' ') and arquivo.to_csv('...', sep=' ')

summer holly
#

@summer holly one thing you could try is to execute flask run --no-reload and see if it solves the problem or specify use_reloader=False in the app.run() argument
@lapis sequoia
--no-reload worked. Thanks alot!

tawdry sentinel
#

worked

#

tks so much @lapis sequoia

#

^^

hollow sentinel
#

hey guys how do you invite people to this server

bitter harbor
#

send them that

hollow sentinel
#

thank you @bitter harbor

steel talon
#

Hey guys I'm using matplotlib for an image processing assignment and ideally we are supposed to make a function called chromeKey() which, as you can guess, is used to remove a green screened background. Here's my code.

#

I have no idea what to do when I'm calling my function inorder to implement the two images together

#

I don't really expect an answer I just need help

merry ridge
#

I’m not sure what your question is

merry ridge
#

The usual way to do this is to take a linear combination of the two images at each pixel as a function of the pixel data at a point and possibly its neighbors, to smooth the edges

#

It looks like right now you are simply choosing one or the other based on the intensity of the color channel
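
A rough sketch of the linear-combination idea, assuming both images are float RGB arrays in [0, 1] with the same shape. The `chroma_key` name, the "greenness" measure and the `softness` parameter are illustrative choices, not the assignment's required approach, and there is no neighbor smoothing here:

```python
import numpy as np

def chroma_key(fg, bg, softness=0.2):
    """Blend fg over bg, fading out pixels of fg the greener they are."""
    fg = fg.astype(float)
    bg = bg.astype(float)
    r, g, b = fg[..., 0], fg[..., 1], fg[..., 2]

    # How much green exceeds the stronger of the other two channels
    greenness = g - np.maximum(r, b)

    # alpha = 1 keeps the foreground, alpha = 0 shows the background
    alpha = 1.0 - np.clip(greenness / softness, 0.0, 1.0)
    alpha = alpha[..., None]  # broadcast over the colour channels

    return alpha * fg + (1.0 - alpha) * bg

# Tiny 1x2 example: a pure green pixel and a pure red pixel over a blue background
fg = np.array([[[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]])
bg = np.array([[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])
out = chroma_key(fg, bg)
print(out)
```

The green pixel is replaced by the background and the red one is kept; pixels that are only partly green get a mix of the two.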

midnight skiff
#

Hey, does anyone know how to load a bunch of random seaborn subplots into one plot?

lapis sequoia
#

I have no idea what to do when I'm calling my function inorder to implement the two images together
@steel talon hey, your chromekey function has arguments that are not optional, so if you call it without them you'll get an error

slender eagle
#

hi

undone flare
#

hello

final trellis
#

What exactly is data science?

undone flare
#

Data science is a field that uses scientific methods, algorithms, systems, etc. to extract knowledge from structured and unstructured data. (Big data)

jade lava
#

Something that's pretty strange to me is that the recommended way to preprocess text in Deep Learning with Python for multi-class classification is to do one-hot on the encoded text. Isn't that what the Embedding layer is supposed to do?

grave path
#

I just loaded a dataset using read_csv in pandas however I realised there is '?' instead of null how do i get rid of all the rows that contain '?'

lapis sequoia
#

df = df[(df.T != '?').all()] should work

grave path
#

what is T?

lapis sequoia
#

transpose

grave path
#

its alright now I tried some from stackoverflow but finally found one that works

#

data = data.replace('?', np.nan)

lapis sequoia
#

actually don't need transpose
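
For anyone reading later, a sketch of both approaches on a toy frame (the column names are made up). Note that `replace` on its own only marks the cells as missing; `dropna` is what removes the rows:

```python
import numpy as np
import pandas as pd

# Toy frame with '?' standing in for missing values
df = pd.DataFrame({"x": ["1", "?", "3"], "y": ["a", "b", "?"]})

# Option 1: keep only rows where no cell equals '?'
clean_mask = df[(df != "?").all(axis=1)]

# Option 2: turn '?' into NaN, then drop rows containing NaN
clean_nan = df.replace("?", np.nan).dropna()

print(clean_mask)
print(clean_nan)
```

Passing `na_values="?"` to `pd.read_csv` does the replacement at load time, which also lets pandas infer numeric dtypes for those columns.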

#

cool

undone flare
lapis sequoia
#

@undone flare what is the question?

molten hamlet
#

probably yes

#

jupyter has some tricky user friendly features

undone flare
#

but this looks cool

molten hamlet
#

if you plot something and do not show() jupyter will display image anyway

undone flare
#

if I print it is messy

molten hamlet
#

print head() maybe

#

😄

undone flare
#

but this table form looks sick

molten hamlet
#

nvm, that is way better anyway :d

#

whats is total?

undone flare
#

total?

#

rows you mean?

molten hamlet
#

total ammount?

#

that column named total

undone flare
#

idk

molten hamlet
#

xD

#

ok 🙂

undone flare
#

I just downloaded this dataset

#

idk anything about pokemon lol

#

just learning pandas

molten hamlet
#

👍

lapis sequoia
#

Anyone here have experience with openpyxl?

gritty wedge
#

hey guys

#

can you tell me what math skills I need for learning machine learning?

undone flare
#

Linear Algebra, Stats, Prob, Calculus

gritty wedge
#

in which grade are these taught?

undone flare
#

11th and 12th

#

Stats and Prob is almost in all higher grades

gritty wedge
#

not in my grade

undone flare
#

which grade

gritty wedge
#

i am in 5th grade

undone flare
#

oh then yea

gritty wedge
#

yea

undone flare
#

It is too soon for you to learn ML

gritty wedge
#

i know

undone flare
#

bcuz you might not understand high level math

gritty wedge
#

o dont worry...i already know most high level math

undone flare
#

huh..

gritty wedge
#

my elder sister tutors me after school

undone flare
#

do you know matrix multiplication

gritty wedge
#

no ;-;

undone flare
#

bcuz that's the first thing in Linear Algebra

gritty wedge
#

i havent learnt that yet

#

i am learning about calcuclus 1 now

undone flare
#

ok

gritty wedge
#

do u have any tips for ML?

jade lava
#

(Keras) Input, TextVectorization, and Embedding layers are a bit difficult to wrap my head around. Every time I feel like I get it, something throws a wrench in my mental model and I have to start over.

hollow sentinel
#

@gritty wedge Python for DS/ML Bootcamp by Jose Portilla on Udemy

gritty wedge
#

ok

#

thnx

bitter harbor
#

Linear Algebra, Stats, Prob, Calculus
This first

#

Plz

gritty wedge
#

okay 🙂

#

is the course good for beginners?

bitter harbor
#

Yea that's what it's meant for
The only reason I'd highly suggest you learn what's going on behind the scenes is, a lot of concepts (universal across ml) can't be fully understood without knowing what's actually going on

gritty wedge
#

o.....

#

thnx for ur suggestions 🙂

hollow sentinel
#

oh yeah what @bitter harbor said too

gritty wedge
bitter harbor
#

If you want to have a look at the 'basics' of ml, id highly recommend watching 3b1b's series' on the topics

gritty wedge
#

is this the course?

#

o thnx man!

bitter harbor
#

Thatd be a good course too

hollow sentinel
#
import pandas as pd

# Path of the file to read
iowa_file_path = '../input/home-data-for-ml-course/train.csv'

# Fill in the line below to read the file into a variable home_data
home_data = pd.read_csv("iowa_file_path")

# Call line below with no argument to check that you've loaded the data correctly
step_1.check()
#

does anyone see what's wrong with that

bitter harbor
#

Not to discourage you but ml's got quite a few layers that you should learn before jumping in, that course will help with python implementations of the basic mechanics

gritty wedge
#

any course recommendations then?

bitter harbor
#

@hollow sentinel don't pass the variable name as a string, pass the variable itself: pd.read_csv(iowa_file_path)

#

University? 🙃

gritty wedge
#

hmmmmm.......thats too far

#

and no...coz i am just 12

bitter harbor
#

Na there're quite a few ways to learn it, I learnt what I know about it through 3b1b

gritty wedge
#

who is 3b1b?

bitter harbor
#

It's pretty heavy tho regardless

hollow sentinel
#

he's a youtuber

gritty wedge
#

oh

bitter harbor
#

3 blue 1 brown

gritty wedge
#

o lol....among us memes pydis_dye

bitter harbor
#

Na lol he's been around for a bit longer than the game :)

gritty wedge
#

lollemon_ping

hollow sentinel
#

i would also recommend the kaggle mini courses

#

they're great for an intro to ML

gritty wedge
#

thnx 🙂 .....i cant take u seriously with that profile pic....no offence

hollow sentinel
#

it's just tony stark lmao

gritty wedge
#

but it looks funny too

hollow sentinel
#

that's the point

gritty wedge
#

lmao

jade lava
#

What I'm trying to figure out these days is how exactly these model constants are determined. I suppose I just need to find more tutorials about multiclassification models for things like the Reuters dataset.

hollow sentinel
#
# What is the average lot size (rounded to nearest integer)?
avg_lot_size = home_data["LotArea"].mean()
#print(avg_lot_size)

# As of today, how old is the newest home (current year - the date in which it was built)
newest_home_age = home_data["YearBuilt"].mean()

# Checks your answers
step_2.check()
#

Incorrect: Incorrect value for avg_lot_size: 10516.828082191782

#

how is lot area not the lot size

#

Kaggle is so stupid

bitter harbor
#

Would you not want the standard deviation of the lot sizes?

#

Also I'm assuming you'd want some form of Sig digs

hollow sentinel
#

AttributeError: 'Series' object has no attribute 'stdev'

bitter harbor
#

Do you have to call it in 1 line?

hollow sentinel
#

what

#

oh do you mean find the mean first

bitter harbor
#

statistics.stdev(dataset)

hollow sentinel
#

i have to import statistics too right

bitter harbor
#

Yea

hollow sentinel
#
TypeError: can't convert type 'str' to numerator/denominator
#

there's strings in the dataset

#

dataset["lot_size"]

bitter harbor
#

How did you call the mean then?

hollow sentinel
#

avg_lot_size = home_data["LotArea"].mean()

bitter harbor
#

I got that part but how did you call it if there's a string

hollow sentinel
#

oh i meant when you do stdev the whole dataset there's strings in the dataset

bitter harbor
#

Oh ya you're only doing it to the lot size

#

statistics.stdev(dataset, xbar)

#

That's with both args

#

Xbar being the mean

hollow sentinel
#

i don't think they're taking a standard dev

bitter harbor
#

:(

hollow sentinel
#

i looked up the answers

#

and they just want me to round up the 10516.828082191782

#

idek how to do that

#

i tried calling .round()

bitter harbor
#

To how many decimal points
Also this is a prime example of where you should use stdev why is that wrong

#

Or is it "too advanced"

hollow sentinel
#

idk all they want is 10517

#

so how do you cast in python again

bitter harbor
#

#bot-commands message

#

casting to an int truncates the decimal part instead of rounding, so be careful with that

hollow sentinel
#

thanks

#
newest_home_age = 2020 -(home_data["YearBuilt"].mean())
#

the correct answer is 8

heady hatch
#

8

bitter harbor
#

what're you getting?

hollow sentinel
#

48.73219178082195

bitter harbor
#

niccce

hollow sentinel
#

lmaoooo

#

built for data science you already know 😆

heady hatch
#

Are you supposed to take the mean of the year built or the subtraction?

hollow sentinel
#

the directions say current year - the date in which it was built

#

ohhhh

#

nope still wrong

heady hatch
#

On the other hand nice, only +/- 41.

bitter harbor
#

what does home_data["YearBuilt"].mean() return?

hollow sentinel
#

1971.267808219178

bitter harbor
#

humor me and try with stdev

hollow sentinel
#

AttributeError: 'Series' object has no attribute 'stdev'

#

you mean home_data["YearBuilt"].stdev() right

bitter harbor
#

it's not an attribute

#

statistics.stdev(home_data["YearBuilt"])

hollow sentinel
#

Incorrect value for newest_home_age: 30.202904042525258

#

it should be 8

bitter harbor
#

I mean we're getting closer ¯_(ツ)_/¯

hollow sentinel
#

lmao i'm sorry

#

I thought this would be easy and I would get it done in like 2-3 days

#

it's a mini course lmao

bitter harbor
#

it's all good lol this takes time, are there any outliers you're expected to clean?

hollow sentinel
#

no they didn't ask me to

#

there are some columns i'd drop

#

but they didn't ask so

bitter harbor
#

weird idk sorry

hollow sentinel
#

it's ok

#

i have the answers so

#

lmao the correct answer was 10

#

how tf they be getting 10

#

i feel like i'm in math class rn

#

kaggle is lame

heady hatch
#

@hollow sentinel What's the min? like home_data['YearBuilt'].min()

hollow sentinel
#

1872 @heady hatch

heady hatch
#

Oh what about the max?

hollow sentinel
#

2010

heady hatch
#

Yea lol

#

I think that's how they got the 10.
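
So both answers would be something like this. It's a sketch on a tiny stand-in frame (the real `home_data` comes from the course CSV), with 2020 taken as the current year like in the attempt above:

```python
import pandas as pd

# Stand-in for the course data; only the two columns used here
home_data = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "YearBuilt": [1872, 1976, 2010],
})

# Average lot size, rounded to the nearest integer (round, not int(), which truncates)
avg_lot_size = round(home_data["LotArea"].mean())

# Age of the newest home: current year minus the most recent build year
newest_home_age = 2020 - home_data["YearBuilt"].max()

print(avg_lot_size, newest_home_age)
```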

bitter harbor
#

that'd do it

hollow sentinel
#

OH HOW OLD IS THE NEWEST HOME

#

man i'm stupid

heady tide
#

More or less, if I would want to compute the Tf-IDF vectorizer for 12 GBs of pdfs, how much time will that take? should I consider cloud computing?

heady hatch
#

If you have the resource, I would go cloud.

But tfidf can also be done on regular machine itself. Probably need to use a generator instead if you don't have the memory.
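
A sketch of the generator idea with scikit-learn, assuming the PDFs have already been converted to text (that extraction step is not shown and `iter_documents` is a hypothetical helper):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def iter_documents(texts):
    """Yield one document at a time instead of holding them all in a list.

    In practice this would read and yield the extracted text of each file.
    """
    for text in texts:
        yield text

docs = ["the cat sat", "the dog barked", "cats and dogs"]

vectorizer = TfidfVectorizer()
# fit_transform accepts any iterable of strings and returns a sparse matrix
tfidf = vectorizer.fit_transform(iter_documents(docs))
print(tfidf.shape)
```

The vocabulary and the sparse matrix still have to fit in memory; if they don't, a `HashingVectorizer` is the usual next thing to look at.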

hollow sentinel
#

I don't think I like the kaggle mini courses

#

find it kind of boring

hollow sierra
heady hatch
#

@hollow sierra what do you need help with exactly?

hollow sierra
#

I have build ML model and exported to pickle file now i want to use it in a web app to make predictions .I want to use Node.js in web app , So is it possible to use this pickle model in javascript enviorment.

#

@heady hatch

heady hatch
#

Right right.

#

What kind of ML model is it? and from what library? Or was it written with native Python?

hollow sierra
#

Yes, sklearn

#

of python

#

It is a regression model.

heady hatch
#

There are also couple pickle converters.

#

But pickle is a finicky object.

#

You can try to convert it first and see how it goes.

#

If not then I would try one of the options above.

hollow sierra
#

Ok thanks @heady hatch , I have seen articles and tutorials on deployment of pickle ML model ,All the time flask was used .Is it easier or recommended to use python based web-library when u have python ml models?

heady hatch
#

It's recommended to use Python because of the consistency, pickling is weird because it takes environment into account.

#

I don't know how Python environment will work transitioning to non-Python environments.

#

Plus you don't have a language switch.

#

You don't necessarily need Python to do the full backend.

you can set ML up as a microservice

#

and have your backend call the ML API.

hollow sierra
#

Have you deployed ml models? If yes, how?

heady hatch
#

Are you asking if I have deployed?

hollow sierra
#

yes

heady hatch
#

Those articles and tutorials should serve as a good entry point.

#

You can create a simple backend with Flask, add gunicorn or uvicorn on top of the framework.

#

load ml model and set prediction as an api endpoint.
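
A bare-bones sketch of that setup. `MeanModel` is a stand-in so the example runs on its own; in a real service you would load the pickled sklearn model once at startup (for example with `pickle.load`) instead. The `/predict` route and the JSON shape are assumptions:

```python
from flask import Flask, jsonify, request

class MeanModel:
    """Stand-in for a fitted regressor; a real app would unpickle the trained model."""
    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]

# Load the model once at import time, not on every request
model = MeanModel()

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Expected body (assumed): {"features": [[...], [...]]}
    features = request.get_json()["features"]
    preds = model.predict(features)
    # Convert to plain floats so the result is JSON serialisable
    return jsonify(predictions=[float(p) for p in preds])
```

A Node.js backend can then POST JSON to this endpoint, and gunicorn can serve `app` in production.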

hollow sierra
#

I got your point and it cleared some doubts of mine. Thanks @heady hatch for your time and help.

jade lava
#

Sanity check: if different approaches to how we structure our layers yield the same accuracy results, then doesn't that imply that our dataset isn't good enough in quality to be able to predict what you're trying to do?

umbral pollen
#

hi

plush zenith
#

Hi sorry

#

this is the place where i can ask more python math related things?

opal ferry
#

looking for some feedback of how dumb it would be to use a function like this to shrink a df's memory footprint

def auto_cats(df):
  for col in df.columns:
    curr_usage = df[col].memory_usage(deep=True)
    if curr_usage > df[col].astype('category').memory_usage(deep=True):
      df[col] = df[col].astype('category')
  return df.info(memory_usage='deep')
whole vortex
#

Hey guys, I was wondering if anyone here has some half decent experience with using seaborn / pandas

#

I'm trying to complete some tasks but I'm unsure if the way I'm representing the data is the best way? Even I'm not really understanding the graphs that this spits out - would've thought they'd have to be somewhat interpretable

#

I'm quite new to both of these libraries so just figuring things out - any advice, ideas, anything really would be greatly appreciated

lapis sequoia
#

@jade lava Hey, just saw your reaction now for my openpyxl question. Does it take long to create a code to automate reports/tasks? I have weekly reports I have to send out and it's a headache to go through the process of having to clean them up, same repetitive task for the most part.

jade lava
#

Not enough info to answer

lapis sequoia
#

Not enough info to answer
@jade lava What info do you need?

jade lava
#

You have to use VBA for macros, but you can read and write Excel documents, sure...

lapis sequoia
lapis sequoia
#

Not asking about VBA

hollow sentinel
#
# from _ import _
from sklearn.tree import DecisionTreeRegressor
#specify the model. 
#For model reproducibility, set a numeric value for random_state when specifying the model
iowa_model = DecisionTreeRegressor()

# Fit the model
iowa_model.fit(X,y)

# Check your answer
step_3.check()
#

so I have this

#

Incorrect: You forgot to set the random_state.

#

sike i figured it out sorry

heady hatch
#

I think you forgot to set the random_state.

#

:^)

humble flame
#

Hello

#

I am new to data science but interested

#

Anyone have any advice on how to start or where to go?

heady hatch
#

Do you like books or courses?

humble flame
#

Well, I would prefer courses but books are fine

#

I have good experience with python itself already, I am just new to the topic of data science

heady hatch
#

hmm in terms of courses, I'm not too sure about MOOC. But I've heard people liking Data Camp.

#

But I'm sure there are tons of MOOC.

#

pinging @hollow sentinel , they've had lots of experience there.

humble flame
#

Hi, thanks a lot, I will try them out. Any good books too?

undone flare
#

I don't like Data Camp, it's like more of fill in the blank type

heady hatch
#

I think this is my go to for intro to data science with Python.

#

You're implementing algorithms from scratch.

humble flame
#

Brilliant, thanks a lot

hollow sentinel
#

I’d recommend python for data science & machine learning bootcamp

#

and Kaggle mini courses

narrow flume
#

hello does anyone know well about Matplotlib in python?

hollow sentinel
#

thanks @heady hatch for the book rec bc this looks really good

#

i might do this over the stanford ML course

somber bane
#

can someone help me out with pandas

#

The following code does save and add a new row into the file

#

but when I rerun the program, it will not create a new row, it will just replace the row of data that was created previously

agile wing
#

someone image recognize ice cream for me, cuz I want ice cream

#

ughh

narrow flume
#

Can anyone help me with Matplotlib

austere swift
#

whats your question

heady hatch
#

Can we search for index by values in pandas DataFrame?

#

ie given a dataframe

a b
1 0
3 2
5 4

get the location of value 0.

I'm also curious how would duplicate values work.

hasty grail
#

From my brief search on Google it seems that you're supposed to convert it into np.ndarray and then use np.ndarray.nonzero to get the indices you're interested in

heady hatch
#

I'm a bit confused. How would I use nonzero to search?

hasty grail
#

in your example you would first get the column 'b' from your DataFrame, then convert it to NumPy

#

afterwards you can create a boolean mask and get nonzero of that mask

heady hatch
#

Oh but how would I know what column it is in?

hasty grail
#

if you look at the docs for nonzero you'd understand
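
To spell it out with a sketch: on a 2D boolean mask, `np.nonzero` returns the row positions and the column positions of every match, so the whole frame can be searched at once and duplicates simply show up as extra pairs:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 5], "b": [0, 2, 4]})

# Boolean mask over the whole frame, then the positions of the True cells
rows, cols = np.nonzero(df.to_numpy() == 0)

# Translate positions back into index and column labels
hits = [(df.index[r], df.columns[c]) for r, c in zip(rows, cols)]
print(hits)
```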

heady hatch
#

Maybe I'm misunderstanding something.

you would use nonzero after getting the b column, right?