velvet thorn Nov 1, 2020, 2:54 AM

#

@velvet thorn I need to compare across groups to count intersections. Any suggestions on how to go about it?
@heady hatch can you elaborate

heady hatch Nov 1, 2020, 3:21 AM

#

@velvet thorn

Given two columns, a and b.

Grouping by a, we're going to have a list of values from b for each group in a.

then I want to compare across the different groups in a to see how many elements from b intersect for each group in a.

ie.

a b
0 4
0 5
0 6
1 4
1 5
2 5
2 6

Then
0: [4, 5, 6]
1: [4, 5]
2: [5, 6]

0 and 1 share two elements, 4 and 5
0 and 2 share two elements, 5 and 6
1 and 2 share one element, 5

paper niche Nov 1, 2020, 4:02 AM

#

!e

from io import StringIO
import itertools as it
import pandas as pd

df = pd.read_csv(StringIO("""
a b
0 4
0 5
0 6
1 4
1 5
2 5
2 6
"""), sep=' ')

for a1, a2 in it.combinations(df['a'].unique(), 2):
    intersection = set(df.loc[df['a'] == a1, 'b']) & set(df.loc[df['a'] == a2, 'b'])
    print(f"{a1} and {a2} share {len(intersection)} elements: {intersection}")

arctic wedgeBOT Nov 1, 2020, 4:02 AM

#

@paper niche :white_check_mark: Your eval job has completed with return code 0.

001 | 0 and 1 share 2 elements: {4, 5}
002 | 0 and 2 share 2 elements: {5, 6}
003 | 1 and 2 share 1 elements: {5}

paper niche Nov 1, 2020, 4:02 AM

#

@heady hatch do you mean something like this?

heady hatch Nov 1, 2020, 5:34 AM

#

@paper niche right right. Is this going to run through at O(N^2)?

#

I'm trying to find a solution quicker than it.

velvet thorn Nov 1, 2020, 6:17 AM

#

@heady hatch hm I think that's probably not the way to do it

#

let me think

#

like grouping by a is not the most efficient solution

#

do you need to count all possible intersections?

slate scroll Nov 1, 2020, 6:18 AM

#

Seems like a classic map reduce problem

heady hatch Nov 1, 2020, 6:35 AM

#

I don’t know if I need to count all possible intersections, but just the ones with at least one intersection.

#

Oh? How would you go about it in terms of map reduce?

slate scroll Nov 1, 2020, 6:47 AM

#

Well you would just map all nodes based on column a, the reduce would be counting the overlaps in b

heady hatch Nov 1, 2020, 7:05 AM

#

I think I’m not familiar enough with map reduce.

I’m thinking of map function and reduce function from python.

#

On the other hand, isn’t that also O(n^2)?

#

Since when counting all the overlaps with b values, you still need to go through each a value n times.
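One possible way around comparing every pair of groups is to join the table to itself on b, so that only pairs of groups sharing at least one b value ever appear. This is a minimal sketch using the sample data from above; it assumes there are no duplicate (a, b) rows, and the worst case is still quadratic when a single b value belongs to many groups.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [0, 0, 0, 1, 1, 2, 2],
    "b": [4, 5, 6, 4, 5, 5, 6],
})

# Join the table to itself on b: each row pairs two groups that share that b value.
pairs = df.merge(df, on="b", suffixes=("_left", "_right"))

# Keep each unordered pair once and drop groups paired with themselves.
pairs = pairs[pairs["a_left"] < pairs["a_right"]]

# Number of shared b values per pair of groups.
shared_counts = pairs.groupby(["a_left", "a_right"]).size()
print(shared_counts)
```

Pairs with no overlap never show up in the result, which matches the "only the ones with at least one intersection" requirement.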

velvet thorn Nov 1, 2020, 7:09 AM

#

I don’t know if I need to count all possible intersections, but just the ones with at least one intersection.
@heady hatch hm.

#

that makes the problem different

#

so basically you want to find the values of b that correspond to more than one unique value of a?

heady hatch Nov 1, 2020, 7:11 AM

#

Yes(I think.)

To give you guys some context.

Each product has a category. And I have two stores and each store has their own set of categories.

The idea is to find the categories from each store that share the most products with a category from the other store.

velvet thorn Nov 1, 2020, 7:11 AM

#

or do you mean the unique values of a which correspond to at least one value of b that is shared with another unique value of a

heady hatch Nov 1, 2020, 7:11 AM

#

Ahh I think it sounds like the latter.

velvet thorn Nov 1, 2020, 7:12 AM

#

okay maybe with an example

#

it would be easier

#

can you provide some sample data and your expected result

heady hatch Nov 1, 2020, 7:16 AM

#

Yea definitely.
If you guys don’t mind, I have to anonymize a couple of things.

But data is something like this.

Two datasets, each one with a product and the categories they’re in. They can be in more than one category.

So dataset is something like ...

Dataset 1
item -> category
apple -> [a, b, c]
orange -> [a, d, e]

Dataset 2
Item -> category
Apple -> [1, 7, 9]
Watermelon -> [1, 4]
Banana -> [1, 5,6]
Orange -> [1, 2]

#

And the result that we want to get is something like a utility matrix of sort.

Category from dataset 1 vs category from dataset 2

a b c d e
1[apple, orange][apple]...
4[][][]...
5
6
7
9

#

I don’t know if this helps.

#

The way I was thinking of was compute all the items in the categories and go through each category in the other dataset to see how many items they would share.

#

But it would be O(cate1 * cate2).

#

~~However thinking about it, I can filter down a bit of the categories.~~

Not sure how I would filter, now thinking about it twice. Hahaha

velvet thorn Nov 1, 2020, 7:53 AM

#

wait

#

so there are lists in your DF?

winged lark Nov 1, 2020, 9:25 AM

#

Hello, I'm having some trouble with my dataframes. I have tried playing around with indexes, transposing and the like. For now I just want to plot either of the points in the first row.

📎 unknown.png

burnt prawn Nov 1, 2020, 9:53 AM

#

Release 0.0.2 of NLP Profiler is now available, see https://pos.li/2h39ue

PyPi: https://pos.li/2h39uf
Github: https://pos.li/2h39ug
Gitter: https://pos.li/2h3b1o

river hazel Nov 1, 2020, 10:51 AM

#

can anyone take a look at the train function and does anyone know how to fill in self.params['W']? this is for my class on linear regression 😭
```py
        # TODO: Use the gradients in the grads dictionary to update the         #
        # parameters of the model (stored in the dictionary self.params)        #
        # using stochastic gradient descent. You'll need to use the gradients   #
        # stored in the grads dictionary defined above.                         #

        self.params['W'] = ???

        #   END OF YOUR CODE
```

this is my attempt at it, but it doesnt yield the supposed values from the notebook  (loss plot is increasing instead of decreasing XD) https://github.com/poisonivysaur/ml-class/blob/main/Linear%20Regression/linear_regression.py

GitHub

poisonivysaur/ml-class

Contribute to poisonivysaur/ml-class development by creating an account on GitHub.
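For reference, a plain stochastic gradient descent step subtracts the gradient scaled by a learning rate. The sketch below is not the course's solution: the toy data, `learning_rate`, `batch_size`, and the standalone `params` and `grads` dictionaries are all assumptions made for illustration. A loss that increases is often a sign that the gradient is being added instead of subtracted, or that the learning rate is too large.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + noise (made up for illustration).
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)

params = {"W": np.zeros(1)}
learning_rate = 0.1  # hypothetical value
batch_size = 32      # hypothetical value

losses = []
for step in range(200):
    # Sample a minibatch without replacement.
    idx = rng.choice(len(X), size=batch_size, replace=False)
    X_batch, y_batch = X[idx], y[idx]

    # Mean squared error and its gradient with respect to W.
    error = X_batch @ params["W"] - y_batch
    losses.append(np.mean(error ** 2))
    grads = {"W": 2.0 * X_batch.T @ error / batch_size}

    # SGD step: move against the gradient.
    params["W"] = params["W"] - learning_rate * grads["W"]

print(params["W"], losses[0], losses[-1])
```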

heady hatch Nov 1, 2020, 2:44 PM

#

@velvet thorn kind of messy but yea. Hahaha

For each item I was going to explode the list of categories and regroup the categories. So it would be grouped by categories instead by items.
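The explode-and-regroup idea can be sketched roughly like this, using the sample data from earlier with item names lowercased so they match. The frame and column names (`store1`, `store2`, `item`, `category`) are assumptions. Joining on item avoids looping over every category pair, because only pairs with a common item produce rows.

```python
import pandas as pd

store1 = pd.DataFrame({
    "item": ["apple", "orange"],
    "category": [["a", "b", "c"], ["a", "d", "e"]],
})
store2 = pd.DataFrame({
    "item": ["apple", "watermelon", "banana", "orange"],
    "category": [[1, 7, 9], [1, 4], [1, 5, 6], [1, 2]],
})

# One row per (item, category) instead of one row per item.
long1 = store1.explode("category")
long2 = store2.explode("category")

# Inner join on item: one row for every (store 1 category, store 2 category)
# pair that has that item in common.
shared = long1.merge(long2, on="item", suffixes=("_1", "_2"))

# Rows: store 2 categories, columns: store 1 categories, cells: shared item count.
counts = pd.crosstab(shared["category_2"], shared["category_1"])
print(counts)
```

Categories 4, 5 and 6 do not appear in the result here because watermelon and banana are not in the first dataset, so those rows would be all zeros anyway.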

wheat seal Nov 1, 2020, 2:51 PM

#

how do i optimize my yolov3 model

#

its very slow on my raspberry pi

hollow sentinel Nov 1, 2020, 3:08 PM

#

so I just read that the Andrew Ng class is on Octave

#

what the hell is Octave guys

river hazel Nov 1, 2020, 3:08 PM

#

matlab but free

hollow sentinel Nov 1, 2020, 3:08 PM

#

can't I just use Jupyter notebook like any sane human being

#

jupyter notebook is my baby

#

ugh

river hazel Nov 1, 2020, 3:09 PM

#

do u think u can take a look at my linear regression code? above ^

wheat seal Nov 1, 2020, 3:09 PM

#

i tried that course

hollow sentinel Nov 1, 2020, 3:09 PM

#

did you find it good

wheat seal Nov 1, 2020, 3:09 PM

#

second week exercises are rigged

hollow sentinel Nov 1, 2020, 3:09 PM

#

oh no

#

i heard answers are on github repositories

wheat seal Nov 1, 2020, 3:09 PM

#

mostly because octave is soooo hard to use

hollow sentinel Nov 1, 2020, 3:09 PM

#

dude

#

i just wanna use jupyter notebook

wheat seal Nov 1, 2020, 3:09 PM

#

i heard answers are on github repositories
@hollow sentinel if they find out u use that they remove u from the course

#

lamo ok sorry

hollow sentinel Nov 1, 2020, 3:09 PM

#

i know

#

I don't get it

#

what is the point of making you handwrite your own linear regression without sci kit learn

wheat seal Nov 1, 2020, 3:10 PM

#

exactly

hollow sentinel Nov 1, 2020, 3:10 PM

#

sci kit learn is there for a reason lol

wheat seal Nov 1, 2020, 3:10 PM

#

lol

hollow sentinel Nov 1, 2020, 3:10 PM

#

yeah um I may try the columbia course first

wheat seal Nov 1, 2020, 3:11 PM

#

anyway i recommend google colab instead of jupyter

#

its so much better

hollow sentinel Nov 1, 2020, 3:11 PM

#

oh i've heard of that

#

the google crash course uses it

wheat seal Nov 1, 2020, 3:11 PM

#

its basically jupyter on steroids

#

ye

hollow sentinel Nov 1, 2020, 3:11 PM

#

i don't want to use octave rn lol i'm still a beginner

#

to machine learning

wheat seal Nov 1, 2020, 3:11 PM

#

dont use it

#

i had never heard of it before Ng's course

hollow sentinel Nov 1, 2020, 3:12 PM

#

all this time i thought it was just python

river hazel Nov 1, 2020, 3:12 PM

#

^

wheat seal Nov 1, 2020, 3:12 PM

#

and if a programming language is paid dont even bat an eye (matlab)

hollow sentinel Nov 1, 2020, 3:12 PM

#

yep

#

i will check out google colab

wheat seal Nov 1, 2020, 3:12 PM

#

ye

#

its python so pog

hollow sentinel Nov 1, 2020, 3:12 PM

#

thanks

wheat seal Nov 1, 2020, 3:12 PM

#

wlcm

undone flare Nov 1, 2020, 3:12 PM

#

hey guys

hollow sentinel Nov 1, 2020, 3:12 PM

#

hello

wheat seal Nov 1, 2020, 3:13 PM

#

if you're training an ML model then u can use their public GPUs too

#

in colab

undone flare Nov 1, 2020, 3:13 PM

#

I am trying to learn data analysis can I ask questions related to it here?

hollow sentinel Nov 1, 2020, 3:13 PM

#

yes this is a data science chat

undone flare Nov 1, 2020, 3:13 PM

#

So

#

I downloaded jupyter lab using pip install jupyterlab

hollow sentinel Nov 1, 2020, 3:14 PM

#

if you're training an ML model then u can use their public GPUs too
@wheat seal that's cool

undone flare Nov 1, 2020, 3:14 PM

#

and now I want to use notebooks.ai

#

how can I do that?

hollow sentinel Nov 1, 2020, 3:14 PM

#

lmao I've stuck w Jupyter notebook so far idek how to help

#

https://notebooks.ai/

Notebooks AI | Jupyter Notebooks as a Service

Free Data Science environment in the cloud.

#

are you talking about this

undone flare Nov 1, 2020, 3:15 PM

#

no sign up option

wheat seal Nov 1, 2020, 3:15 PM

#

hmm

#

so you're looking to run ipynb notebooks in the cloud?

undone flare Nov 1, 2020, 3:15 PM

#

yes

wheat seal Nov 1, 2020, 3:15 PM

#

well

#

ig its time to recommend google colab again

hollow sentinel Nov 1, 2020, 3:16 PM

#

does google colab do that

undone flare Nov 1, 2020, 3:16 PM

#

wait I can't even sign in xD

hollow sentinel Nov 1, 2020, 3:16 PM

#

lmaooooooo

wheat seal Nov 1, 2020, 3:16 PM

#

yes

#

try google colab

hollow sentinel Nov 1, 2020, 3:16 PM

#

google colab seems like the repl.it of data science

wheat seal Nov 1, 2020, 3:16 PM

#

ye

undone flare Nov 1, 2020, 3:16 PM

#

How to do that?

wheat seal Nov 1, 2020, 3:16 PM

#

https://colab.research.google.com

Google Colaboratory

#

go to this url and sign in with ur google account

undone flare Nov 1, 2020, 3:16 PM

#

do I need to install anything other than jupyterlab?

wheat seal Nov 1, 2020, 3:17 PM

#

u dont need to install ANYTHING to use google colab

undone flare Nov 1, 2020, 3:17 PM

#

ok

river hazel Nov 1, 2020, 3:17 PM

#

just browser

#

google docs for python

wheat seal Nov 1, 2020, 3:17 PM

#

lol ye

#

couldnt get simpler

undone flare Nov 1, 2020, 3:18 PM

#

so I choose new notebook to start right

hollow sentinel Nov 1, 2020, 3:18 PM

#

yes

wheat seal Nov 1, 2020, 3:18 PM

#

yes

#

u can even upload nbs

hollow sentinel Nov 1, 2020, 3:18 PM

#

if you're using google colab i would recommend google's machine learning crash course

wheat seal Nov 1, 2020, 3:18 PM

#

ye

#

mostly all ML tutorials online also use google colab

hollow sentinel Nov 1, 2020, 3:18 PM

#

oh Portilla uses jupyter notebook lol

wheat seal Nov 1, 2020, 3:18 PM

#

🤮

hollow sentinel Nov 1, 2020, 3:18 PM

#

heyyyy

undone flare Nov 1, 2020, 3:19 PM

#

bruh this was easier than I thought xD idk why someone recommended me notebooks.ai

hollow sentinel Nov 1, 2020, 3:19 PM

#

he's good

wheat seal Nov 1, 2020, 3:19 PM

#

lmao

hollow sentinel Nov 1, 2020, 3:19 PM

#

Portilla is a G

#

we stan for Portilla

wheat seal Nov 1, 2020, 3:19 PM

#

kekw

#

:kekw:

#

aww man i need nitro

undone flare Nov 1, 2020, 3:19 PM

#

thx guys

wheat seal Nov 1, 2020, 3:19 PM

#

np

hollow sentinel Nov 1, 2020, 3:19 PM

#

no problem

undone flare Nov 1, 2020, 3:19 PM

#

now I can finally start coding xD

hollow sentinel Nov 1, 2020, 3:20 PM

#

is this your first time doing machine learning? @undone flare

undone flare Nov 1, 2020, 3:20 PM

#

yea

hollow sentinel Nov 1, 2020, 3:20 PM

#

oh that's fun

#

haha I started a couple weeks ago

wheat seal Nov 1, 2020, 3:20 PM

#

same

hollow sentinel Nov 1, 2020, 3:20 PM

#

when I first came here I couldn't make a matplotlib pie chart properly

#

so I got an internship interview w CUNA Mutual group and they asked me if i knew any algorithms and i said uhhhhhhhh i make graphs

wheat seal Nov 1, 2020, 3:21 PM

#

can relate

hollow sentinel Nov 1, 2020, 3:21 PM

#

yeah safe to say I didn't get the job

undone flare Nov 1, 2020, 3:21 PM

#

does executing things take time firstly or it's just my laptop

wheat seal Nov 1, 2020, 3:21 PM

#

not the job part im just a kid

hollow sentinel Nov 1, 2020, 3:21 PM

#

depends on what you're executing

wheat seal Nov 1, 2020, 3:22 PM

#

ye

undone flare Nov 1, 2020, 3:22 PM

#

I executed 2+3 xD

hollow sentinel Nov 1, 2020, 3:22 PM

#

uhhhhhhhhhhh

wheat seal Nov 1, 2020, 3:22 PM

#

u need ur lapy checked

undone flare Nov 1, 2020, 3:22 PM

#

idk it still works after 7 years

wheat seal Nov 1, 2020, 3:22 PM

#

even more reason to get it checked

hollow sentinel Nov 1, 2020, 3:23 PM

#

yeah i would be sus if my machine lasts that long

wheat seal Nov 1, 2020, 3:23 PM

#

only macs last that long

#

what computer u have

undone flare Nov 1, 2020, 3:23 PM

#

mine is windows

wheat seal Nov 1, 2020, 3:23 PM

#

omg

hollow sentinel Nov 1, 2020, 3:23 PM

#

F

wheat seal Nov 1, 2020, 3:23 PM

#

we have met the messiah

undone flare Nov 1, 2020, 3:23 PM

#

that also Win7 Ultimate lol

hollow sentinel Nov 1, 2020, 3:23 PM

#

lol i got bullied in college for having a mac

wheat seal Nov 1, 2020, 3:23 PM

#

bruh

hollow sentinel Nov 1, 2020, 3:24 PM

#

everyone was like who uses a mac to code

wheat seal Nov 1, 2020, 3:24 PM

#

i still get bullied by my friends while playing mc even tho i get higher fps than them

undone flare Nov 1, 2020, 3:24 PM

#

with 4gb ram oof

hollow sentinel Nov 1, 2020, 3:24 PM

#

you game on your mac????

#

HOW

wheat seal Nov 1, 2020, 3:24 PM

#

yes

hollow sentinel Nov 1, 2020, 3:24 PM

#

mine would melt

wheat seal Nov 1, 2020, 3:24 PM

#

uhh

#

settings change

#

i lower my settings lol

hollow sentinel Nov 1, 2020, 3:24 PM

#

mine does not like gaming

#

even flash games

#

i can hear the fans go off

#

anyways back to ML

wheat seal Nov 1, 2020, 3:26 PM

#

lmao

#

ye

#

well if the fans go off its not really a bad thing

#

back to ml

hollow sentinel Nov 1, 2020, 3:28 PM

#

me knowing I won't understand TF without calculus and lin alg

undone flare Nov 1, 2020, 3:28 PM

#

bruh do I have to learn jupyter notebook before other stuff?

hollow sentinel Nov 1, 2020, 3:28 PM

#

jupyter notebook and google colab is like the same thing

#

except google colab is the cloud

undone flare Nov 1, 2020, 3:28 PM

#

I mean that only

hollow sentinel Nov 1, 2020, 3:28 PM

#

well normally you would learn how to visualize data

#

then you would learn how to clean data

#

and finally machine learning

#

and then all those niche topics like NN, NLP

undone flare Nov 1, 2020, 3:29 PM

#

👍

hollow sentinel Nov 1, 2020, 3:29 PM

#

yessir

#

are you using a course to learn your stuff?

undone flare Nov 1, 2020, 3:30 PM

#

yes

hollow sentinel Nov 1, 2020, 3:30 PM

#

nice what course

undone flare Nov 1, 2020, 3:30 PM

#

yt freecodecamp

hollow sentinel Nov 1, 2020, 3:30 PM

#

oh

#

idk never used that before

undone flare Nov 1, 2020, 3:31 PM

#

Data Analysis with Python - Full Course for Beginners (Numpy, Pandas, Matplotlib, Seaborn)

hollow sentinel Nov 1, 2020, 3:32 PM

#

nope i just went straight to udemy

undone flare Nov 1, 2020, 3:32 PM

#

Does udemy have free courses?

hollow sentinel Nov 1, 2020, 3:32 PM

#

some but i wouldn't call them amazing

wheat seal Nov 1, 2020, 3:32 PM

#

i learned python from freecodecamp course

hollow sentinel Nov 1, 2020, 3:33 PM

#

lol i learned python from college big oof

#

biggest mistake

undone flare Nov 1, 2020, 3:33 PM

#

oof

wheat seal Nov 1, 2020, 3:34 PM

#

oof

#

#help-cake if anybody can help me out

undone flare Nov 1, 2020, 3:37 PM

#

pls tell some shortcuts for google colab

#

like making new cell shortcut

wheat seal Nov 1, 2020, 3:39 PM

#

all the keyboard shortcuts are listed in the menu bar

#

right below ur notebook name

hollow sentinel Nov 1, 2020, 3:41 PM

#

i would recommend downloading a Kaggle dataset and trying to do visualizations off that

#

while you're following the video

undone flare Nov 1, 2020, 3:42 PM

#

Ctrl+M B what does that mean

#

pressing M B keys together?

#

ok I got it

#

is datacamp good?

hollow sentinel Nov 1, 2020, 3:47 PM

#

lol idk

undone flare Nov 1, 2020, 3:49 PM

#

nah it's boring I just checked lol

hollow sentinel Nov 1, 2020, 3:50 PM

#

datacamp is kind of fill in the blank

#

that's what i just read

hollow sentinel Nov 1, 2020, 4:05 PM

#

df["term"] = df["term"].apply(lambda term: int(term[:3]))

#

TypeError: 'int' object is not subscriptable

#

anyone see what's wrong here

#

I don't get it this was exactly what Portilla typed

#

visible confusion

#

📎 unknown.png

#

see that is the same line

#

I need to learn lambda expressions

undone flare Nov 1, 2020, 4:11 PM

#

@hollow sentinel can you tell me what you using?

#

just jupyter nb?

hollow sentinel Nov 1, 2020, 4:12 PM

#

yes

undone flare Nov 1, 2020, 4:12 PM

#

ok

hollow sentinel Nov 1, 2020, 4:13 PM

#

it might be bc i ran the cell more than once

#

but it still doesn't work lmao

#

idk how to fix it

#

should i go to a help channel
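If the cell was indeed run more than once, the first run would have already turned the column into integers, and the second run then tries to slice an int. A small sketch of a version that is safe to re-run, assuming the column originally holds strings like " 36 months" (the sample values and the `term_to_int` helper are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"term": [" 36 months", " 60 months", " 36 months"]})

def term_to_int(term):
    # Already converted on an earlier run of the cell: leave it alone.
    if not isinstance(term, str):
        return term
    # Take the first three characters, e.g. " 36", and parse them.
    return int(term[:3])

# Running the conversion twice no longer raises a TypeError.
df["term"] = df["term"].apply(term_to_int)
df["term"] = df["term"].apply(term_to_int)
print(df["term"].tolist())
```

Reloading the DataFrame and running the notebook from the top avoids the problem too.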

molten hamlet Nov 1, 2020, 4:28 PM

#

does numpy or scipy have any functions to add values with some masks but masks like 0 to 1 ?

hollow sentinel Nov 1, 2020, 4:29 PM

#

pydis_nope_py

#

masks???

molten hamlet Nov 1, 2020, 4:29 PM

#

mask = array > 10

hollow sentinel Nov 1, 2020, 4:30 PM

#

oh cool didn't know that term

#

no idk

#

https://numpy.org/doc/stable/reference/maskedarray.generic.html

#

here's the doc for masks

molten hamlet Nov 1, 2020, 4:31 PM

#

nah, I want to add 2 arrays, center is full new, and outerring is some average ;d

#

lemon_smug this

📎 Screenshot_from_2020-11-01_17-33-28.png

#

it is just.... numpy ;D
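The screenshot is not available here, but a blend with weights between 0 and 1 can be written in plain NumPy roughly as follows. The array names and the weight layout (full weight in the center, half weight in the surrounding ring) are assumptions for illustration.

```python
import numpy as np

old = np.zeros((5, 5))
new = np.full((5, 5), 10.0)

# Weights in [0, 1]: 1 in the center, 0.5 in the ring around it, 0 at the border.
weights = np.zeros((5, 5))
weights[1:4, 1:4] = 0.5
weights[2, 2] = 1.0

# Weighted blend: weight 1 takes the new value, 0 keeps the old one,
# and anything in between is a weighted average of the two.
blended = weights * new + (1.0 - weights) * old
print(blended)
```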

hollow sentinel Nov 1, 2020, 4:37 PM

#

idk lmao maybe someone else knows

molten hamlet Nov 1, 2020, 4:37 PM

#

i just pasted solution

#

xD

hollow sentinel Nov 1, 2020, 4:48 PM

#

oh

#

cool

undone flare Nov 1, 2020, 5:14 PM

#

hey anyone here?

#

😦

molten hamlet Nov 1, 2020, 5:24 PM

#

https://media.discordapp.net/attachments/751636489208856725/772511128969281556/avatar_fun_debug.png

#

Yes @undone flare

hollow sentinel Nov 1, 2020, 5:33 PM

#

yep

hollow sentinel Nov 1, 2020, 6:02 PM

#

X = df.drop('loan_repaid',axis=1).values
y = df['loan_repaid'].values

#

KeyError: "['loan_repaid'] not found in axis"

#

I ran the cell more than once and now I can't get it to work properly

#

train_test_split needs an X and a y

#

haha nvm i fixed it

#

just restarted the cell and ran everything again

bitter harbor Nov 1, 2020, 6:13 PM

#

fun fact

📎 unknown.png

heady hatch Nov 1, 2020, 6:21 PM

#

Learning new things.

#

Interestingly that's consistent with tf too.

junior horizon Nov 1, 2020, 7:05 PM

#

Anyone know how to remove a column containing a substring in python

#

pandas
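One way this might be done is to build a boolean mask over the column names and keep only the columns that do not match. The column names and the `substring` value below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "price_usd": [1, 2],
    "price_eur": [3, 4],
    "name": ["x", "y"],
})
substring = "price"  # hypothetical substring to match

# True for every column whose name does NOT contain the substring.
keep = ~df.columns.str.contains(substring, regex=False)
trimmed = df.loc[:, keep]
print(trimmed.columns.tolist())
```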

hollow sentinel Nov 1, 2020, 7:10 PM

#

@bitter harbor was that directed at me

bitter harbor Nov 1, 2020, 7:11 PM

#

sort of
i've never read the pandas docs but that's the first thing that shows up

hollow sentinel Nov 1, 2020, 7:11 PM

#

I hate reading doc it’s so boring

bitter harbor Nov 1, 2020, 7:11 PM

#

depends on the docs tbh

#

sometimes it's easier to read through the source code imo

hollow sentinel Nov 1, 2020, 7:12 PM

#

I like to read the doc and then just use methods from it on data from Kaggle

#

I’m getting better at reading doc tho

grave thunder Nov 1, 2020, 7:46 PM

#

I just use stack overflow

heady hatch Nov 1, 2020, 7:49 PM

#

Do they still teach rtfm in cs schools?

solar bluff Nov 1, 2020, 8:45 PM

#

the pandas documentation is often confusing

cerulean spindle Nov 1, 2020, 8:58 PM

#

Yes, I can definitely agree with that and I think the scikit-learn documentation is rather informative and easy to read.

grave thunder Nov 1, 2020, 9:10 PM

#

Hello, anyone know in jupyter notebook how can I get the cleaner looking histogram shown? Mine (up) is hard to see

📎 unknown.png

austere swift Nov 1, 2020, 9:45 PM

#

have you tried the edgecolor parameter?

grave thunder Nov 1, 2020, 10:04 PM

#

have you tried the edgecolor parameter?
@austere swift That did the trick. Thanks ^^

austere swift Nov 1, 2020, 10:04 PM

#

Np
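A minimal sketch of the edgecolor suggestion, with made-up sample data. The Agg backend line is only there so the snippet runs without a display; in a notebook it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=500)  # made-up sample data

fig, ax = plt.subplots()
# edgecolor draws an outline around each bar so neighbouring bins are easy to tell apart.
counts, bin_edges, patches = ax.hist(data, bins=20, edgecolor="black")
```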

remote pecan Nov 1, 2020, 10:19 PM

#

scraping part of data science?

hollow sentinel Nov 1, 2020, 10:21 PM

#

yeah

#

what are you using to scrape? bs4/selenium?

remote pecan Nov 1, 2020, 10:22 PM

#

err no im trying the basics of scraping so im trying html scraping. but i keep getting 0 data no matter what i try

hollow sentinel Nov 1, 2020, 10:22 PM

#

just send your code

remote pecan Nov 1, 2020, 10:23 PM

#

but ye i use bs4

#

im in help room neon

hollow sentinel Nov 1, 2020, 10:23 PM

#

oh then why are you asking for help here lol

#

someone there will help you

remote pecan Nov 1, 2020, 10:23 PM

#

🙂 just wondered if it was right category

#

read the beautiful soup documentation but my experience differs from the docs.

hollow sentinel Nov 1, 2020, 10:40 PM

#

https://www.coursera.org/projects/web-scraping

Coursera

Web Scraping with Python + BeautifulSoup

Offered by Coursera Project Network. By the end of this project, you will have a grasp of the essentials for extracting data from most of the websites on the internet. This includes the usage of BeautifulSoup for getting elements through patterns, Browser DevTools for pattern ...

#

this might just give you a general guide on how to do bs4 scraping

remote pecan Nov 1, 2020, 10:41 PM

#

will take a look.

#

thanks

hollow sentinel Nov 1, 2020, 10:41 PM

#

ofc

agile wing Nov 1, 2020, 10:51 PM

#

oh man

#

im sleepy from studying and streaming

#

finally learning NN

#

learned that each layer is a logistic regression function.

#

essentially

hollow sentinel Nov 1, 2020, 11:20 PM

#

idk man

#

I don't know if i want to spend time learning octave

#

people don't use it

#

it's either jupyter notebook or google co lab

hollow sentinel Nov 1, 2020, 11:37 PM

#

hey guys if you want to brush up on your python basics automate the boring stuff with Python is free on Udemy with this code: NOV2020FREE

#

be careful the code only works for a limited amount of time

grave thunder Nov 2, 2020, 12:48 AM

#

it's either jupyter notebook or google co lab
I've been sceptical about notebook but once I tried it I'm not going back

#

Makes data manipulation and presentation waaaaaaaaay better than any IDE out there

hollow sentinel Nov 2, 2020, 12:51 AM

#

has anyone tried the IBM data science course

#

AAAAAAAAAAAAH THEY'RE ASKING FOR CREDIT CARD INFO ON COURSERA

twilit brook Nov 2, 2020, 1:09 AM

#

Has anyone come up on this problem?

#

I download a .csv file directly from a local server, but it doesn't import properly into pandas

#

📎 Screen_Shot_2020-11-01_at_8.10.05_PM.png

#

The column headers are shifted two over. Only workaround i've figured out was opening the file in numbers/excel and resaving. Then it imports fine

#

but this will run on a scheduler... anyway to fix the headings?

agile wing Nov 2, 2020, 1:11 AM

#

wondering if its a delimiter problem in the csv file

twilit brook Nov 2, 2020, 1:12 AM

#

that could be it

#

I just looked up what a delimiter is

agile wing Nov 2, 2020, 1:13 AM

#

in other words, there may not be a comma between those columns?

twilit brook Nov 2, 2020, 1:13 AM

#

lemme check

#

thank you
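Nobody confirmed the cause here, but one common reason for shifted headers is a trailing delimiter on every data row: the rows then have more fields than the header, and pandas uses the leading column as the index. If that is what is happening, `index_col=False` may fix it without re-saving the file. The CSV text below is made up to reproduce the effect with a single trailing comma.

```python
from io import StringIO

import pandas as pd

# Header has 3 fields, but every data row ends with an extra comma.
raw = "a,b,c\n1,2,3,\n4,5,6,\n"

# Default behaviour: the first column becomes the index and the values shift left.
shifted = pd.read_csv(StringIO(raw))

# index_col=False tells pandas not to use the first column as the index.
fixed = pd.read_csv(StringIO(raw), index_col=False)

print(shifted)
print(fixed)
```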

hollow sentinel Nov 2, 2020, 1:28 AM

#

columbia machine learning course is boring

#

compared to Portilla's

agile wing Nov 2, 2020, 1:36 AM

#

i have the andrew ng machine learning python homework assignments

lapis sequoia Nov 2, 2020, 1:37 AM

#

what are you trying to learn

agile wing Nov 2, 2020, 1:37 AM

#

someone created the python homework set, and basically it's approved when submitting it, that's why I'm using that version anyways

#

columbia ml course?

lapis sequoia Nov 2, 2020, 1:37 AM

#

ok

hollow sentinel Nov 2, 2020, 1:38 AM

#

there's a columbia machine learning course on edx

#

this professor puts me to sleep tho

agile wing Nov 2, 2020, 1:39 AM

#

i like coursera the best

hollow sentinel Nov 2, 2020, 1:40 AM

#

yeah but i have to pay for that

#

i think i should do the google ml crash course

#

i can't stand these boring machine learning theory lessons

agile wing Nov 2, 2020, 1:41 AM

#

i've seen that and that ...is too little

#

the google ml crash ones

hollow sentinel Nov 2, 2020, 1:41 AM

#

that's unfortunate

#

well I'm not gonna do Ng anytime soon

#

and learn Octave

#

lol the columbia course is lame not doing it

#

If I wanted to learn the math I would’ve done 3b1b

lapis sequoia Nov 2, 2020, 1:52 AM

#

Where would we talk about AI?

agile wing Nov 2, 2020, 1:52 AM

#

you dont need to learn octave

#

just get the python homework version for andrew ng's class

#

there's actually a github of someone who created all of it in python for homework exercises

jolly folio Nov 2, 2020, 1:59 AM

#

im kinda new to python, what is the best IDE?

#

...

hollow sentinel Nov 2, 2020, 2:01 AM

#

if you wanna do data science jupyter notebook and google colab is good @jolly folio

#

but other than that I would recommend VSC

#

@lapis sequoia here lol

#

@agile wing thanks man I found one that does it entirely in Python

#

I just wish there were more courses like Portilla's

hollow sentinel Nov 2, 2020, 2:35 AM

#

i don't understand everywhere I read it says the Ng course is free

#

but it requires credit card info??

undone flare Nov 2, 2020, 2:37 AM

#

I have a numpy array a = [0, 0.5, 1, 1.5, 2] and when I print a[0] it gives 0.0 why?

hollow sentinel Nov 2, 2020, 2:38 AM

#

you're printing an index of the array?

#

sike it works now

undone flare Nov 2, 2020, 2:39 AM

#

I edited it

hollow sentinel Nov 2, 2020, 2:40 AM

#

yes because 0 is the zeroth index of the list

undone flare Nov 2, 2020, 2:40 AM

#

no I mean why did it get converted to float?

#

In array it was 0 but when I print that element it becomes 0.0

hollow sentinel Nov 2, 2020, 2:41 AM

#

oh idk lmao

undone flare Nov 2, 2020, 2:41 AM

#

and if I do a[-1] it will give 2.0 and not 2

#

is it because the data type of a is float?

#

cuz arrays can have only same data type values

#

can that be the reason?

heady hatch Nov 2, 2020, 2:48 AM

#

because NumPy arrays can only hold one dtype, and I think it casts everything to float if there's a float in there.

you can check array's dtype with array.dtype.
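A quick way to see this, using the array from the question:

```python
import numpy as np

# Mixing ints and floats: NumPy picks one dtype that can hold every value.
a = np.array([0, 0.5, 1, 1.5, 2])
print(a.dtype)  # float64
print(a[0])     # 0.0
print(a[-1])    # 2.0

# With only ints, the array keeps an integer dtype.
ints = np.array([0, 1, 2])
print(ints.dtype)
```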

undone flare Nov 2, 2020, 2:48 AM

#

yea thx got it

spare karma Nov 2, 2020, 4:36 AM

#

Anyone know of any Video upscaling tutorials? I want to take a 1080p movie and upscale it to 4k.

#

Thought I could make a fun project out of it.

real wigeon Nov 2, 2020, 4:41 AM

#

how do you guys deal with loading xls files into mysql?

#

I'm working on a flask app

#

trying to come up with a way to load new user data in bulk, into my db

vital cipher Nov 2, 2020, 6:04 AM

#

guys was just wondering, tensorflow2 only supports up to python 3.8 but new distros like Fedora 33 come with py3.9, so is there any news on tensorflow support for 3.9, or any updates you can share

lapis sequoia Nov 2, 2020, 6:09 AM

#

https://github.com/tensorflow/tensorflow/issues/40840

GitHub

Python3.9 support · Issue #40840 · tensorflow/tensorflow

Has anyone tried to build tensorflow for Python3.9? For the assembly, everything seems to go fine, except for troubles with installing some pypi packages (Maybe there is a problem in Windows, insta...

#

@vital cipher "when all of our dependencies support py3.9" is when it will be available

vital cipher Nov 2, 2020, 6:17 AM

#

yup i agree with you @lapis sequoia but it was posted like 26 days ago and wanted to know like whats new thats all... 🙂

lapis sequoia Nov 2, 2020, 6:25 AM

#

You can always check what dependencies support python 3.9 @vital cipher

#

Maybe one is holding them back

undone flare Nov 2, 2020, 6:42 AM

#

heya anyone there??

remote pecan Nov 2, 2020, 6:43 AM

#

yes ?

undone flare Nov 2, 2020, 6:44 AM

#

When might we need to create an array initialised to zeros or ones?

remote pecan Nov 2, 2020, 6:44 AM

#

im sorry but i did not understand that 🙂

undone flare Nov 2, 2020, 6:45 AM

#

like

#

you create an numpy array

#

np.zeros((3,4))

#

or

#

np.ones((3,2))

#

Why do we need an array with only zeros and ones

remote pecan Nov 2, 2020, 6:47 AM

#

im afraid i do not know the answer to that.

undone flare Nov 2, 2020, 6:48 AM

#

me neither xD

bitter harbor Nov 2, 2020, 8:36 AM

#

@undone flare if you need to init an array with values, you'd use one of those. Zeros is used pretty commonly in networks and stats related stuff, but which one you use depends on the starting values you need. np.full works as well and can fill an array with any value, including inf and NaN. np.empty can be used too and is faster because it doesn't initialise anything, but the contents are arbitrary until you set every element yourself.

undone flare Nov 2, 2020, 8:37 AM

#

idk any of those xD I am still learning numpy

#

I only know np.full

bitter harbor Nov 2, 2020, 8:38 AM

#

they're just different ways to init an array

#

np.(zeros/ones/full/empty)_like is similar to all that too, but it's used to copy the shape + data-type + order
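A short sketch of the initialisers mentioned above; the shapes and fill values are arbitrary.

```python
import numpy as np

zeros = np.zeros((3, 4))          # all 0.0
ones = np.ones((3, 2))            # all 1.0
filled = np.full((2, 2), np.nan)  # every element set to the given value

# np.empty only allocates memory; the contents are arbitrary until assigned.
scratch = np.empty((2, 3))
scratch[:] = 7.0

# The *_like variants copy shape and dtype from an existing array.
template = np.arange(6, dtype=np.int32).reshape(2, 3)
like = np.zeros_like(template)

print(zeros.shape, ones.shape, like.dtype)
```

A typical use for zeros is an accumulator that gets added to in a loop; ones often shows up as a bias column or a starting weight.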

undone flare Nov 2, 2020, 8:57 AM

#

yea I just learnt about that

heady hatch Nov 2, 2020, 9:02 AM

#

Hey @velvet thorn I think I finally have a better way of describing what I was looking for.

So I'm grabbing the cartesian product of two lists of categories, and under each category, it's a list of products.

For each product of categories, I wanted to find the length of shared products under the categories.

I was able to do something along the lines of for each product of categories, find the set intersection of the two.

but I'm curious if there's a way to do it more efficiently.

I would love to hear what others' advice as well.
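One option that avoids intersecting every pair of categories is to flip the mapping around: for each product, look up which categories it belongs to in each store and count only those pairs. This is a sketch with made-up dictionaries based on the earlier example; `store1`, `store2` and `invert` are hypothetical names. The work is proportional to the number of (product, category, category) combinations that actually exist, so pairs with nothing in common cost nothing.

```python
from collections import Counter, defaultdict
from itertools import product

# category -> set of products, one dict per store (sample data).
store1 = {
    "a": {"apple", "orange"},
    "b": {"apple"}, "c": {"apple"},
    "d": {"orange"}, "e": {"orange"},
}
store2 = {
    1: {"apple", "watermelon", "banana", "orange"},
    2: {"orange"}, 4: {"watermelon"},
    5: {"banana"}, 6: {"banana"},
    7: {"apple"}, 9: {"apple"},
}

def invert(categories):
    """Turn category -> products into product -> categories."""
    by_item = defaultdict(set)
    for category, items in categories.items():
        for item in items:
            by_item[item].add(category)
    return by_item

items1 = invert(store1)
items2 = invert(store2)

# For each product found in both stores, count every category pair it links.
shared = Counter()
for item in items1.keys() & items2.keys():
    for pair in product(items1[item], items2[item]):
        shared[pair] += 1

print(shared.most_common(3))
```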

undone flare Nov 2, 2020, 9:08 AM

#

Can anyone give me some pattern to print using arrays (simple one as I am still learning)

chrome barn Nov 2, 2020, 9:13 AM

#

import numpy as np
array = np.array([1, 2, 3, 4, 5])
print(array)
for x in array:
    print(x)

#

something like this

bitter harbor Nov 2, 2020, 9:13 AM

#

https://www.machinelearningplus.com/python/101-numpy-exercises-python/ that looks half decent

undone flare Nov 2, 2020, 9:15 AM

#

@chrome barn wdym

bitter harbor Nov 2, 2020, 9:16 AM

#

like printing out elements of an array

undone flare Nov 2, 2020, 9:16 AM

#

I did this one

📎 array_q1.PNG

#

I want something like this

#

I will do checkerboard

chrome barn Nov 2, 2020, 9:25 AM

#

https://numpy.org/devdocs/user/absolute_beginners.html

#

should help you out

undone flare Nov 2, 2020, 9:27 AM

#

I am already learning it

wanton bison Nov 2, 2020, 10:30 AM

#

hey guys if you want to brush up on your python basics automate the boring stuff with Python is free on Udemy with this code: NOV2020FREE
@hollow sentinel Thanks

lapis sequoia Nov 2, 2020, 10:43 AM

#

!e <head> Hello world </head>

arctic wedgeBOT Nov 2, 2020, 10:43 AM

#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

remote valley Nov 2, 2020, 12:12 PM

#

matplotlib: i want to grab the array of pixels for a plot, manipulate that array, and then write it again with ax.imshow(arr) but I don't see any way to get a plot (bar in this case) as an array of pixels
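
A sketch of one way to get a rendered plot as a pixel array, assuming the Agg backend is acceptable (it renders off-screen). The bar data and the colour inversion are placeholders for whatever manipulation is wanted.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen raster backend; its canvas exposes an RGBA buffer
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(4, 3), dpi=100)
ax.bar(["a", "b", "c"], [3, 1, 2])  # placeholder bar chart

fig.canvas.draw()  # render the figure into the canvas buffer
pixels = np.asarray(fig.canvas.buffer_rgba()).copy()  # (height, width, 4), uint8

pixels[..., :3] = 255 - pixels[..., :3]  # example manipulation: invert the colours

fig2, ax2 = plt.subplots()
ax2.imshow(pixels)  # draw the manipulated pixels back onto an Axes
ax2.axis("off")
```

The `.copy()` matters because the buffer is a view onto the canvas memory; editing a copy leaves the original figure untouched.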

undone flare Nov 2, 2020, 1:14 PM

#

hi anyone here?

spiral peak Nov 2, 2020, 1:21 PM

#

@remote valley that seems like an odd way of editing a graph. What's the specific use case of this vs using mpl's normal functions for changing a graph?

undone flare Nov 2, 2020, 1:23 PM

#

where can I practice Matrix Multiplication?

#

nvm got it

remote valley Nov 2, 2020, 1:37 PM

#

@spiral peak oh yeah it's odd. I'm working on some procedural art stuff. not an intended use case for sure.

#

made some bar charts with polar coordinates and wanted to use them as patterns to fill voronoi cells 🙂

spiral peak Nov 2, 2020, 1:41 PM

#

Aaaah, okay. I'm not sure, let me do some research

hollow sentinel Nov 2, 2020, 3:37 PM

#

lol time to start doing Ng

#

in other words dyiNG

#

hahaha i'm dead inside

undone flare Nov 2, 2020, 3:39 PM

#

@hollow sentinel do you know if the determinant of the identity matrix is always 1 or not?

hollow sentinel Nov 2, 2020, 3:55 PM

#

@undone flare http://people.math.harvard.edu/~elkies/M21b.08/det.html#:~:text=Special case%3A the determinant of,In always equals 1.&text=[Application%3A the determinant of the,ndet(A).]&text=The determinant of the inverse,det(A) [6.2.
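
A quick numerical check of the claim in that link: the determinant of a diagonal matrix is the product of its diagonal entries, so the identity matrix gives 1 for every size. The sizes below are arbitrary.

```python
import numpy as np

sizes = (1, 2, 3, 10)  # arbitrary sizes to try
dets = [np.linalg.det(np.eye(n)) for n in sizes]

for n, d in zip(sizes, dets):
    print(n, d)
```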

#

i need to learn lin alg too lol

undone flare Nov 2, 2020, 3:57 PM

#

Matrices is easy

#

do you want a question to solve?

hollow sentinel Nov 2, 2020, 4:17 PM

#

nah

molten hamlet Nov 2, 2020, 4:25 PM

#

can someone help me with pattern for that image? 🙂

📎 Screenshot_from_2020-11-02_17-24-42.png

#

2d array of 0 to 255

hollow sentinel Nov 2, 2020, 4:40 PM

#

uhhhhhh

heady hatch Nov 2, 2020, 5:23 PM

#

What do you mean by pattern? As in you want it to be a 2d array?

#

If so convert img to array.

molten hamlet Nov 2, 2020, 5:27 PM

#

I solved it with #algos-and-data-structs but thanks

#

@heady hatch It cant be done that way, it has to be dynamic... you want size 8, or 10 etc. 😄

lapis sequoia Nov 2, 2020, 6:23 PM

#

How much better is automating with openpyxl than with Macros/VBA?

blazing sundial Nov 2, 2020, 7:01 PM

#

Hey fam, anyone here a wiz at matplotlib?

#

or know anything about plotting antenna radiation patterns?

fallow prism Nov 2, 2020, 9:25 PM

#

what is the criteria for choosing a neural network library like scikit lern or TF for NLP?

#

learn*

rustic obsidian Nov 2, 2020, 9:36 PM

#

im having an issue during numpy import, referenced here https://github.com/xianyi/OpenBLAS/issues/2709

anyone familiar with that? im not sure where to go from here, running windows 10.0.19041 Build 19041

"RuntimeError: The current Numpy installation ('venv\lib\site-packages\numpy\__init__.py') fails to
pass a sanity check due to a bug in the windows runtime. See this issue for more information: https://tinyurl.com/y3dm3h86"

GitHub

Errors when using OpenBLAS through NumPy on Windows 10 2004 · Issue...

This issue only appears on Windows 10 2004 (19041). It does not appear on Windows 10 1909 (18363). On a fresh install of NumPy from pip on a 2004 machine, e.g, pip install numpy ipython open ipytho...

heady hatch Nov 2, 2020, 9:38 PM

#

Never mind, got it.

fallow prism Nov 2, 2020, 9:38 PM

#

dataset['b'].head()

heady hatch Nov 2, 2020, 9:38 PM

#

Oh but that doesn't include group by.

#

I had to do df.groupby('a')['c'].nlargest.

#

In regards to your question, @fallow prism . The kind of algorithms you want to use for NLP depends on your problem and constraint and how you want to go about it.

fallow prism Nov 2, 2020, 9:39 PM

#

can you resend the problem please?

heady hatch Nov 2, 2020, 9:40 PM

#

Scikit learn isn't a neural network library.

#

So if you need NN, you would look into NN framework libraries such as PyTorch or TensorFlow.

#

But if you need to use classical machine learning, Scikit-Learn is there.

#

But there's also more nlp focused ones like NLTK or SpaCy.

#

Scikit-Learn is a general library for ML, but they don't include NN.

fallow prism Nov 2, 2020, 9:41 PM

#

thank you, I'm going to study that better

#

my problem in NLP is interpretation and classification of description made by people about car accidents

#

NLP problem*

#

and i need train a NN to do that

#

or i think that

heady hatch Nov 2, 2020, 9:45 PM

#

also whoops, apparently I didn't solve my issue.

The data is

📎 unknown.png

#

So I'm looking to group by a, and sort by c.

#

But retaining the values of b.

#

Regarding your issue of classification of description made by people about car accidents.

NN could work.

#

But there's nothing wrong with trying out classical algorithms as well.

#

Or I guess I'm curious, why do you think you need to jump to NN right away?

tall seal Nov 2, 2020, 9:50 PM

#

noob question here, but this is my data set and I am trying to display the movie name and number of genres for each movie in the dataframe, and also print the total number of movies which have more than one genre...any idea where to start here? I looked up the documentation of the .sum() function but can't seem to get it to work...

📎 Screen_Shot_2020-11-02_at_13.48.38.png

heady hatch Nov 2, 2020, 9:51 PM

#

@tall seal to clarify, are those three different requests?

#

Would something like df.sum(axis=1) be what you're looking for?

fallow prism Nov 2, 2020, 9:53 PM

#

imagine you crash your car and you describe the accident to me, i have to be able to classify the accident and know what part of your car was damaged, know how the accident occurred

#

and who is responsible

#

in a few words

heady hatch Nov 2, 2020, 9:55 PM

#

So from my understanding you're trying to extract information from text?

fallow prism Nov 2, 2020, 9:56 PM

#

basically

ripe forge Nov 2, 2020, 9:57 PM

#

Nine, did you try a sort_values on the group by object yet?

fallow prism Nov 2, 2020, 9:57 PM

#

and my set of texts doesn't have structure

heady hatch Nov 2, 2020, 9:58 PM

#

Hey @ripe forge , thanks for responding.

I did and this was the result I got.

📎 unknown.png

#

But the issue I'm encountering now is the b column is in index form instead of its actual value.

#

And so now I'm not too sure how to go about it.

ripe forge Nov 2, 2020, 9:58 PM

#

Oh just use reset index after that

heady hatch Nov 2, 2020, 9:59 PM

#

Would I just index into column b with level_1?

📎 unknown.png

ripe forge Nov 2, 2020, 10:01 PM

#

This is after reset index yeah?

heady hatch Nov 2, 2020, 10:01 PM

#

Mhm.

ripe forge Nov 2, 2020, 10:01 PM

#

Then I think the only part left is top 5,yeah?

heady hatch Nov 2, 2020, 10:02 PM

#

I think c is already in top 5.

#

After sorting and doing nlargest.

ripe forge Nov 2, 2020, 10:02 PM

#

Oh then yep, you're done

heady hatch Nov 2, 2020, 10:02 PM

#

Is there an elegant solution to do it without remapping?

ripe forge Nov 2, 2020, 10:02 PM

#

Not sure off the top of my head. I'm a bit surprised why it came as level1

#

Can you change nlargest to head and see if it still comes?

heady hatch Nov 2, 2020, 10:03 PM

#

Oh good point.

#

Oh wait I remember trying that and needed to sort beforehand.

#

So this was something else I've also tried.

#

📎 unknown.png

#

But the issue here is a isn't in groups and c is just based off of the absolute sort instead of within groups.

ripe forge Nov 2, 2020, 10:10 PM

#

This should still logically contain all the rows that you're interested in. But yeah, this one aside, I was thinking group by, sort values, and head. What's the output of the operations in that order?
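
A sketch of the "sort, group by, head" order suggested above, on a made-up frame with the same column names (`a`, `b`, `c`). It also shows, as an assumption about what happened earlier, why `nlargest` produced a `level_1` column: the grouped Series result is indexed by (group key, original row label), so `level_1` holds row labels that can be used to look `b` back up.

```python
import pandas as pd

# Hypothetical data with the same column names as in the discussion.
df = pd.DataFrame({
    "a": [0, 0, 0, 1, 1, 1],
    "b": ["x", "y", "z", "x", "y", "z"],
    "c": [3, 9, 5, 2, 8, 7],
})

# Sort by group, then by c descending; head(n) keeps the first n rows per group
# and returns whole rows, so b is retained.
top = (
    df.sort_values(["a", "c"], ascending=[True, False])
      .groupby("a")
      .head(2)
)
print(top)

# nlargest on the grouped Series returns a MultiIndex of (a, original row label).
nl = df.groupby("a")["c"].nlargest(2)
rows = df.loc[nl.index.get_level_values(1)]  # recover the full rows, including b
```

With `head`, no index remapping is needed; the `nlargest` route needs the extra `df.loc[...]` step.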

heady hatch Nov 2, 2020, 10:12 PM

#

I remember hitting error on using sort_values after groupby.

#

Or did you mean

df.groupby('col')['col2'].sort_values...

#

On the other hand I just tried a new one.

This seems to get me there.

📎 unknown.png

hollow sentinel Nov 2, 2020, 10:48 PM

#

hey guys if I asked questions in octave would you be able to help me

#

I haven’t started the Ng course bc I’ve been busy w school

tall seal Nov 2, 2020, 11:04 PM

#

Would something like df.sum(axis=1) be what you're looking for?
@heady hatch I tried this and it didn't seem to work

heady hatch Nov 2, 2020, 11:04 PM

#

What results were you trying to get to? and could you clarify what you were trying to achieve?

tall seal Nov 2, 2020, 11:05 PM

#

@tall seal to clarify, are those three different requests?
@heady hatch 2 requests, to display movie name and number of genres for the movie and then print total number of movies with more than one genre.

#

this is the result I got with item.sum(axis = 1)

📎 Screen_Shot_2020-11-02_at_13.55.46.png

heady hatch Nov 2, 2020, 11:09 PM

#

Right right, if you don't mind let me try to lead you through what you're seeing here.

#

Oh oops.

#

I realized why it's adding random things, it's adding the id.

#

so you might need to do something like

#

df.iloc[:, 3:].sum(axis=1) or df.loc[:, 'Action':].sum(axis=1)

#

What this does is add up all the values in your genre columns. Since your data is a boolean encoding of the genres, adding up the values gives you the total number of genres per movie.

#

From there, then if you want to filter to movies with more than 1 genre, you would then need to do

col > 1
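
Putting those two steps together, as a sketch on a made-up frame. The column names (`movie_id`, `title`, `Action`, ...) are assumptions about the layout in the screenshot: id-like columns first, then one 0/1 column per genre.

```python
import pandas as pd

# Hypothetical layout: id-like columns first, then one 0/1 column per genre.
item = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "title": ["A", "B", "C"],
    "Action": [1, 0, 1],
    "Comedy": [0, 1, 1],
    "Drama": [0, 0, 1],
})

# Sum only the genre columns, so ids are not added in by accident.
genre_counts = item.loc[:, "Action":].sum(axis=1)

# Request 1: movie name next to its number of genres.
result = item[["title"]].assign(n_genres=genre_counts)
print(result)

# Request 2: how many movies have more than one genre.
multi = int((genre_counts > 1).sum())
print("movies with more than one genre:", multi)
```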

#

If it's too much, I guess let me know what questions you have.

lapis sequoia Nov 3, 2020, 12:26 AM

#

Anyone here use statsmodels?

hollow sentinel Nov 3, 2020, 12:27 AM

#

is that like scikit learn

lapis sequoia Nov 3, 2020, 12:27 AM

#

Yeah similar concept.

hollow sentinel Nov 3, 2020, 12:28 AM

#

yeah I don't use it but I remember you asking a question about it before

lapis sequoia Nov 3, 2020, 12:28 AM

#

I've only used statsmodels tbh, I should probably try SciKit Learn too.

#

Yeah was just curious.

hollow sentinel Nov 3, 2020, 12:28 AM

#

have you used Octave lol

lapis sequoia Nov 3, 2020, 12:28 AM

#

Haven't had time to tweak my linear model I did from last time.

#

Nah what is that?

#

I'm pretty new to coding lol.

hollow sentinel Nov 3, 2020, 12:28 AM

#

it's like matlab

#

Andrew Ng uses it for his machine learning course

#

on coursera

lapis sequoia Nov 3, 2020, 12:29 AM

#

Do you have to pay for it?

hollow sentinel Nov 3, 2020, 12:29 AM

#

no it's free

#

only if you want the certificate

lapis sequoia Nov 3, 2020, 12:30 AM

#

@hollow sentinel Do you work in ML?

hollow sentinel Nov 3, 2020, 12:32 AM

#

@lapis sequoia lol no i'm just a college business student who thinks ML is cool

lapis sequoia Nov 3, 2020, 12:33 AM

#

Ah gotcha lol. Same here, ML seems dope. I work in corporate finance and some of our financial models and tools we use are starting to become ML.

hollow sentinel Nov 3, 2020, 12:33 AM

#

that's very cool

lapis sequoia Nov 3, 2020, 12:33 AM

#

We now have ML dashboards to predict and model out future costs.

hollow sentinel Nov 3, 2020, 12:34 AM

#

that sounds very cool

lapis sequoia Nov 3, 2020, 12:35 AM

#

I have been trying to create some sort of regression model for our labor hours/direct labor costs to find the drivers and create a thing where we can choose each component of the product and see how many hours get added but been struggling since the initial regression.

#

That was the one I asked about a few weeks ago.

hollow sentinel Nov 3, 2020, 12:35 AM

#

just be careful about who you give the data to on the internet

#

that kind of stuff in the wrong hands is really bad

lapis sequoia Nov 3, 2020, 12:36 AM

#

Yeah I always hide company info.

hollow sentinel Nov 3, 2020, 12:36 AM

#

good

lapis sequoia Nov 3, 2020, 12:41 AM

#

hi @weak kiln bro can i get access to the voice chat to interact with you guyz

weak kiln Nov 3, 2020, 12:43 AM

#

weird place to ping me, dude. see #voice-verification for information on why you don't have speaking permissions.

hollow sentinel Nov 3, 2020, 12:44 AM

#

lol

lapis sequoia Nov 3, 2020, 12:46 AM

#

lol
@hollow sentinel i'm new at this place

hollow sentinel Nov 3, 2020, 12:46 AM

#

haha welcome

#

I've been here for a couple weeks. This thread is for DS/ML questions so if you have any just ask

lapis sequoia Nov 3, 2020, 12:50 AM

#

I've been here for a couple weeks. This thread is for DS/ML questions so if you have any just ask
@hollow sentinel Yes i have a lot of...

hollow sentinel Nov 3, 2020, 1:01 AM

#

uhhhh then ask them?

charred blaze Nov 3, 2020, 1:05 AM

#

@fading wigeon hey, remember that presentation about how notebooks are sucky?

#

https://www.youtube.com/watch?v=9Q6sLbz37gk here's one from one of the authors of fast.ai on how notebooks might actually be interesting and attempts to refute some of the arguments of that other presentation

YouTube

Jeremy Howard

I Like Notebooks

I like using Jupyter Notebooks (https://jupyter.org/). Particularly when combined with nbdev (https://nbdev.fast.ai/). In this video, I explain why, and explain why I have a different opinion to Joel Grus, who discussed in another talk why he doesn't like using Jupyter Noteboo...

#

TBH, I wasn't that convinced by the rebuttals but nbdev caught my attention

lapis sequoia Nov 3, 2020, 1:12 AM

#

uhhhh then ask them?
@hollow sentinel bro can we integrate osint with ai??

hollow sentinel Nov 3, 2020, 1:14 AM

#

uhhhhhh

fading wigeon Nov 3, 2020, 1:16 AM

#

@charred blaze Cool, thanks, will check it out

charred blaze Nov 3, 2020, 1:16 AM

#

also, this was the presentation where Jeremy Howard was somewhat... "canceled"

#

and a schism is starting to brew that involves Jupyter (really)

hollow sentinel Nov 3, 2020, 1:19 AM

#

jupyter was the first thing i used for DS/ML

chilly pasture Nov 3, 2020, 5:19 AM

#

📎 unknown.png

#

📎 unknown.png

#

hello i already have python installed in my windows system

#

is it good to register anaconda3's python as default?

obsidian yacht Nov 3, 2020, 5:46 AM

#

Anaconda's Python has many built-in libraries, so you can make it the default to use them with ease

wheat seal Nov 3, 2020, 5:46 AM

#

octave is a pain jus saying

obsidian yacht Nov 3, 2020, 5:47 AM

#

But if you are more familiar with general IDE then ignore

wheat seal Nov 3, 2020, 5:48 AM

#

ye but it takes up a lot of space

#

anaconda is just venv or virtualenv module but with a fancy unnecessary gui and all libraries install by default

#

even installing the libraries on your own takes up less space

cerulean ingot Nov 3, 2020, 6:15 AM

#

i have a api question.

#

with a get request the api is giving data, more than 10000 records, with pagination. now to use that data do I use a for loop and request data for every page, or is there any other efficient way?

#

i googled a lot but i m not getting the answer
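
For a page-numbered API, looping over pages is the usual approach; a generator keeps the loop tidy and stops when a page comes back empty. This is a sketch under assumptions: `fetch_page` stands for whatever function performs the real GET request, and `fake_fetch` is an in-memory stand-in so the example runs without a network.

```python
from typing import Callable, Iterator


def iter_items(fetch_page: Callable[[int], list], start: int = 1) -> Iterator:
    """Yield items page by page until an empty page is returned."""
    page = start
    while True:
        items = fetch_page(page)
        if not items:  # an empty page signals the end of the data
            break
        yield from items
        page += 1


# Hypothetical stand-in for a real HTTP call: 25 records, 10 per page.
DATA = list(range(25))


def fake_fetch(page: int, per_page: int = 10) -> list:
    lo = (page - 1) * per_page
    return DATA[lo:lo + per_page]


all_items = list(iter_items(fake_fetch))
print(len(all_items))
```

If the API reports a total page count or accepts a larger page size, using those reduces the number of requests; whether it does depends on the specific API.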

undone flare Nov 3, 2020, 7:55 AM

#

filedata = np.genfromtxt('data.txt', delimiter=",") this gives an error data.txt not found but I have saved it

#

I am using google colab

#

ping me pls if you know what's wrong

#

np.loadtxt() gives the same error too

mild topaz Nov 3, 2020, 9:55 AM

#

hello, currently my code is giving me output like

albania_passport: 100.00%
confidence_level: 100.00

this way. how i can make more changes to get my output like
predictions [[0.03083993 0.9471298 0.02203036]]

albania_driving_licence : 0.03083993
albania_passport : 0.9471298
invalid : 0.02203036

this way
i want to get prediction for all labels
my code here

print("label:", label)
predictions = np.argmax(predictions)
print(predictions)
if (label == prediction):
    print(f"{label}: {(predictions)*100:.2f}%")
    logger.debug ("{}: {:.2f}%".format(label, predictions * 100))
    confidence_level = predictions * 100
    confidence_level1 = "{:.2f}".format(confidence_level)
    print("confidence_level: ", confidence_level1)
    logger.debug(f"confidence_level: {confidence_level1}")

my code here https://paste.pythondiscord.com/rogokizezo.py

lapis sequoia Nov 3, 2020, 10:08 AM

#

predictions are from an sklearn model?

#

predictions [[0.03083993 0.9471298 0.02203036]]
@mild topaz if so you can use model.predict_proba() it gives the confidence probabilities directly

mild topaz Nov 3, 2020, 10:12 AM

#

@lapis sequoia hello

lapis sequoia Nov 3, 2020, 10:12 AM

#

yeah

#

?

mild topaz Nov 3, 2020, 10:13 AM

#

predictions are from an sklearn model?
@lapis sequoia no i am using predictions = model.predict(img) see line 242

lapis sequoia Nov 3, 2020, 10:15 AM

#

line 245 is what you need

mild topaz Nov 3, 2020, 10:16 AM

#

prediction_prob = model.predict_proba(img) this ? @lapis sequoia

lapis sequoia Nov 3, 2020, 10:17 AM

#

predict gives label and predict_proba gives probability. which im assuming this is "predictions [[0.03083993 0.9471298 0.02203036]]"

#

yes

mild topaz Nov 3, 2020, 10:17 AM

#

prediction_prob: [[0.03083993 0.9471298 0.02203036]] this one u were talking about

#

i want to get my output this way

albania_driving_licence : 0.03083993
albania_passport : 0.9471298
invalid : 0.02203036

lapis sequoia Nov 3, 2020, 10:54 AM

#

...its the same isnt it

mild topaz Nov 3, 2020, 10:55 AM

#

ya but not get the required output @lapis sequoia

lapis sequoia Nov 3, 2020, 10:55 AM

#

prediction_prob = model.predict_proba(img) did you print this?

mild topaz Nov 3, 2020, 10:56 AM

#

see prediction_prob: [[0.03083993 0.9471298 0.02203036]] this is what i get @lapis sequoia

lapis sequoia Nov 3, 2020, 10:58 AM

#

yes but i dont understand your problem

#

you have your probabilties you just need to print it in label : confidence format

mild topaz Nov 3, 2020, 11:00 AM

#

see first i explain what i want to achieve

#

i want that each label has its corresponding prediction_probability prediction_prob: [[0.03083993 0.9471298 0.02203036]]

#

this way i want my output

albania_driving_licence : 0.03083993
albania_passport : 0.9471298
invalid : 0.02203036

#

i want to show probability for each label using prediction_prob: [[0.03083993 0.9471298 0.02203036]] this

#

@lapis sequoia

#

each label has its own prediction_prob value
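
A sketch of pairing each label with its probability. It assumes the label list is in the same order as the model's output columns (that order comes from how the model was trained and is not shown here); the numbers are the ones quoted above.

```python
import numpy as np

# Assumed to match the order of the model's output columns.
labels = ["albania_driving_licence", "albania_passport", "invalid"]
prediction_prob = np.array([[0.03083993, 0.9471298, 0.02203036]])

# One "label : probability" line per class, for the first (only) input image.
lines = [f"{label} : {p}" for label, p in zip(labels, prediction_prob[0])]
print("\n".join(lines))

# np.argmax gives the index of the most likely class, not a probability.
best = labels[int(np.argmax(prediction_prob[0]))]
print("predicted:", best)
```

This also points at the likely issue in the pasted code: after `predictions = np.argmax(predictions)`, the variable holds a class index, so multiplying it by 100 no longer yields a confidence.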

lapis sequoia Nov 3, 2020, 11:07 AM

#

its possible you are treating a multiclass problem as a multilabel classification.

#

Cant help you more than this without seeing model training

mild topaz Nov 3, 2020, 11:08 AM

#

see bro, currently i dont think so u need model training

#

do u have my code?

mild topaz Nov 3, 2020, 11:27 AM

#

can u atleast give some suggestion how i can fix this issue ? @lapis sequoia

dawn whale Nov 3, 2020, 11:39 AM

#

Hello, so I have a question that's rather about statistics, but maybe someone can help me out here:
Ok, so I have a list of normalized values, asking which year was the worst for their allergies

0 means nearer to present
1 means nearer to the start of their allergies

If they answered, that they didn't notice any change, can I just use 0.5?
I calculated the values by: (2020 - worstyear) / (2020 - first year)

undone flare Nov 3, 2020, 12:40 PM

#

hey anyone here?

molten hamlet Nov 3, 2020, 12:46 PM

#

👻

#

Lets say I got matrix and some kernel

#print(mat)
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]
 [15 16 17 18 19]
 [20 21 22 23 24]]

# kernel
kernel = np.array([[1,1,1],
                   [1,1,1],
                   [1,1,1]])

out = np.correlate(mat, kernel)

and I want to use correlate to count sum of each square in matrix 3x3

#

ValueError: object too deep for desired array

#

its only 1d 😐
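
`np.correlate` only accepts 1-D inputs, which is where the error comes from. One way to get the 3x3 window sums with NumPy alone is `sliding_window_view` (this assumes NumPy 1.20 or newer); `scipy.signal.correlate2d(mat, kernel, mode="valid")` is an alternative if SciPy is available.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

mat = np.arange(25).reshape(5, 5)
kernel = np.ones((3, 3), dtype=int)

# Every 3x3 window of mat, as a (3, 3, 3, 3) view: (row, col, window_row, window_col).
windows = sliding_window_view(mat, kernel.shape)

# Multiply each window by the kernel and sum: a "valid" 2-D correlation.
out = np.einsum("ijkl,kl->ij", windows, kernel)
print(out)
```

With an all-ones kernel this is just the sum of each 3x3 square; the result is 3x3 because only windows fully inside the matrix are used.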

undone flare Nov 3, 2020, 12:54 PM

#

So I have a text file named data_ and when I try to do np.genfromtxt('data_.txt', delimiter = ',') or np.loadtxt('data.txt', delimiter = ',') it gives an OSError : data_.txt not found

molten hamlet Nov 3, 2020, 12:54 PM

#

OSError : data_.txt not found

#

if you could read to end :}

undone flare Nov 3, 2020, 12:54 PM

#

ik but I have a file named data_.txt

molten hamlet Nov 3, 2020, 12:55 PM

#

not in this workpath*

undone flare Nov 3, 2020, 12:55 PM

#

I am using google colab

molten hamlet Nov 3, 2020, 12:55 PM

#

!d numpy.genfromtxt

undone flare Nov 3, 2020, 12:56 PM

#

so do I add that file in google colab?

molten hamlet Nov 3, 2020, 1:06 PM

#

@undone flare you can try is_file = os.path.isfile(file_path) and then print(is_file) you will get True or False depending on whether it can see the file
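
A sketch of that check, plus printing where Python is actually looking. The file name is the one mentioned above; in Colab the working directory belongs to the remote runtime, so a file saved only on the local machine will not be listed until it is uploaded.

```python
import os

file_path = "data_.txt"  # name taken from the discussion above

print("working directory:", os.getcwd())
print("files here:", os.listdir("."))

exists = os.path.isfile(file_path)  # True only if the file is visible from here
print("exists:", exists)
```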

undone flare Nov 3, 2020, 1:10 PM

#

ok

undone flare Nov 3, 2020, 1:39 PM

#

bruh now I am getting new error
ValueError: Some errors were detected !
Line #2 (got 1 columns instead of 5)
Line #4 (got 1 columns instead of 5)

lapis sequoia Nov 3, 2020, 1:58 PM

#

If all my independent variables are dummy variables should I use something other than linear regression?

undone flare Nov 3, 2020, 2:01 PM

#

omg it is finally working

#

thank god

#

this took too long to figure

hollow juniper Nov 3, 2020, 2:08 PM

#

which cloud should I learn if i want to get into machine learning and ds?

zinc cobalt Nov 3, 2020, 2:19 PM

#

word-clouds, at most py_guido

tawny cradle Nov 3, 2020, 2:21 PM

#

Hi everyone

#

I made a project on AI for high school

#

Can you help me by giving suggestions?

arctic wedgeBOT Nov 3, 2020, 2:28 PM

#

Hey @tawny cradle!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .3gp, .3g2, .avi, .bmp, .gif, .h264, .jpg, .jpeg, .mkv, .mov, .mp4, .mpeg, .mpg, .png, .tiff, .wmv, .svg, .psd, .ai, .aep, .xcf, .mp3, .wav, .ogg, .webm, .webp, .flac, .afdesign, .m4a, .csv.

Feel free to ask in #community-meta if you think this is a mistake.

tawny cradle Nov 3, 2020, 2:30 PM

#

How can I send pdf here?

#

@lucid hornet

hollow sentinel Nov 3, 2020, 2:45 PM

#

you click the + button if you're on a machine

undone flare Nov 3, 2020, 2:58 PM

#

you can't send pdf's

#

@hollow sentinel did you learn numpy?

hollow sentinel Nov 3, 2020, 3:07 PM

#

@undone flare yes

undone flare Nov 3, 2020, 3:07 PM

#

@hollow sentinel can you link me the course?

hollow sentinel Nov 3, 2020, 3:08 PM

#

https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/

Udemy

Learn Python for Data Science, Structures, Algorithms, Interviews

Learn how to use NumPy, Pandas, Seaborn , Matplotlib , Plotly , Scikit-Learn , Machine Learning, Tensorflow , and more!

undone flare Nov 3, 2020, 3:08 PM

#

paid oof

hollow sentinel Nov 3, 2020, 3:08 PM

#

yeah my prof i was doing research w paid for it

#

but this does help

#

I liked it more than I liked the Ng course

undone flare Nov 3, 2020, 3:09 PM

#

ok I will see if I can buy that lol

#

or I will stick to yt courses

hollow sentinel Nov 3, 2020, 3:09 PM

#

the Ng course is free

#

but he teaches ML in octave

#

but there's githubs with everything in python

undone flare Nov 3, 2020, 3:10 PM

#

ok thx for suggestion

hollow sentinel Nov 3, 2020, 3:10 PM

#

no prob

undone flare Nov 3, 2020, 3:11 PM

#

@fading burrow https://numpy.org/doc/stable/reference/generated/numpy.diag.html

lapis sequoia Nov 3, 2020, 3:42 PM

#

@lapis sequoia have you considered ANOVA? I'm assuming your dependent variable is continuous

lapis sequoia Nov 3, 2020, 4:00 PM

#

@lapis sequoia That's what someone else suggested, but I have no idea how to do/implement that. And yes my dependent variable is continuous.

hollow sentinel Nov 3, 2020, 4:05 PM

#

@lapis sequoia https://machinelearningmastery.com/feature-selection-with-numerical-input-data/

Machine Learning Mastery

Jason Brownlee

How to Perform Feature Selection With Numerical Input Data

Feature selection is the process of identifying and selecting a subset of input features that are most relevant to the target variable. Feature selection is often straightforward when working with real-valued input and output data, such as using the Pearson’s correlation coeff...

#

might be helpful idk

#

https://github.com/mohan-mj/ANOVA---Sales-Volume @lapis sequoia

GitHub

mohan-mj/ANOVA---Sales-Volume

Analysis of Variance. Contribute to mohan-mj/ANOVA---Sales-Volume development by creating an account on GitHub.

#

this github does ANOVA it might be helpful to look at

lapis sequoia Nov 3, 2020, 4:09 PM

#

Let me check those out, thank you.

hollow sentinel Nov 3, 2020, 4:16 PM

#

no problem

lapis sequoia Nov 3, 2020, 4:23 PM

#

Is it supposed to be that long?

hollow sentinel Nov 3, 2020, 4:43 PM

#

not sure I haven't done an ANOVA before

#

did that help @lapis sequoia

barren meadow Nov 3, 2020, 4:51 PM

#

does anyone have a guide/paper/link to some websites or packages that checks on data anomaly or data fidelity?

lapis sequoia Nov 3, 2020, 4:57 PM

#

@hollow sentinel Yeah trying to make it work now.

tidal bronze Nov 3, 2020, 5:01 PM

#

with beautiful soup how could I find all links anchored within h3 headers?
This is what I've tried so far:

self.all_links = soup.find_all("h3",
                                       {"class": "entity-title"},
                                       limit=41).a.get("href")

seems that the .a is not working
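
`find_all` returns a list-like result set, so `.a` has to be applied to each `h3` in turn rather than to the whole result. A sketch with made-up HTML (the `entity-title` class is from the snippet above; the links are placeholders):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the real page.
html = """
<h3 class="entity-title"><a href="/one">One</a></h3>
<h3 class="entity-title"><a href="/two">Two</a></h3>
<h3 class="entity-title">No link here</h3>
<h3 class="other"><a href="/skip">Skip</a></h3>
"""
soup = BeautifulSoup(html, "html.parser")

headers = soup.find_all("h3", {"class": "entity-title"}, limit=41)

# h3.a is the first <a> inside that header, or None if there is none.
all_links = [h3.a.get("href") for h3 in headers if h3.a is not None]
print(all_links)
```

A CSS selector such as `soup.select("h3.entity-title a[href]")` is another way to express the same query.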

hollow sentinel Nov 3, 2020, 5:30 PM

#

https://www.w3resource.com/python-exercises/BeautifulSoup/python-beautifulsoup-exercise-11.php @tidal bronze

w3resource

Python BeautifulSoup: List of all the h1, h2, h3 tags from the webp...

Python BeautifulSoup Exercises, Practice and Solution: Write a Python program to a list of all the h1, h2, h3 tags from the webpage python.org.

summer holly Nov 3, 2020, 5:34 PM

#

Hi everyone. I'm trying to deploy a flask app with my custom BERT keras model which takes tweets as input. The model runs perfectly by itself but whenever I try to make the model.predict() function call within the flask app, it always results in the flask app being terminated. Any help/suggestions would be appreciated. Thanks!

#

📎 IMG-20201103-WA0010.jpg

#

I specifically need help understanding why pycache is reloading and how can I prevent that

lapis sequoia Nov 3, 2020, 7:28 PM

#

@summer holly we would need to see more source code, particularly where the model.predict() is called within your views.py (assuming it's called views.py) file

lapis sequoia Nov 3, 2020, 7:51 PM

#

@summer holly one thing you could try is to execute flask run --no-reload and see if it solves the problem or specify use_reloader=False in the app.run() argument

tawdry sentinel Nov 3, 2020, 8:36 PM

#

Hi, I'm starting in Data Science and First I'm studying CSV

#

📎 unknown.png

#

I'm trying to edit a CSV and it always gets messy when I use to_csv

lapis sequoia Nov 3, 2020, 8:37 PM

#

what gets messy?

tawdry sentinel Nov 3, 2020, 8:38 PM

#

Before

📎 unknown.png

lapis sequoia Nov 3, 2020, 8:39 PM

#

use pathlib to get the path of the object btw

tawdry sentinel Nov 3, 2020, 8:39 PM

#

After

📎 unknown.png

#

use pathlib to get the path of the object btw
@lapis sequoia Ok, I'm using Visual Studio, can this influence?

lapis sequoia Nov 3, 2020, 8:41 PM

#

the IDE doesn't influence the libraries you import

#

https://stackoverflow.com/questions/3430372/how-do-i-get-the-full-path-of-the-current-files-directory

Stack Overflow

How do I get the full path of the current file's directory?

I want to get the current file's directory path.
I tried:

os.path.abspath(file)
'C:\python27\test.py'
But how can I retrieve the directory's path?

For example:

'C:\python27\'

#

This problem is not related to what you're asking btw.

#

I'm still not sure what the problem is

#

because I don't know what the .csv looks like, how you delimit the rows and columns etc.

#

maybe use the same encoding utf-8

tawdry sentinel Nov 3, 2020, 8:44 PM

#

I have multiple columns before importing
I import the file, convert it to a dataframe and after saving it it messes everything up in just two columns

lapis sequoia Nov 3, 2020, 8:53 PM

#

it's likely something to do with the delimiter setting, change it to delimiter=" " or delimiter="\t" and see if that helps

#

and use pandas.read_csv()

tawdry sentinel Nov 3, 2020, 8:58 PM

#

Like this:
arquivo = open('c:\Users\Pichau\Downloads\caso_full.csv', encoding="utf-8", delimiter="\t")?

lapis sequoia Nov 3, 2020, 9:00 PM

#

arquivo = pd.read_csv(r'c:\Users\Pichau\Downloads\caso_full.csv')

#

try this first

#

then save it using arquivo.to_csv() and see if the columns are preserved

tawdry sentinel Nov 3, 2020, 9:02 PM

#

ok

lapis sequoia Nov 3, 2020, 9:02 PM

#

if not then try pd.read_csv('...', delimiter=' ') and arquivo.to_csv('...', sep=' ')
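
A sketch of why the delimiter matters, using an in-memory file so it runs anywhere. The semicolon-delimited content is made up; the real file's delimiter is not known from the discussion.

```python
from io import StringIO

import pandas as pd

# Hypothetical semicolon-delimited content.
raw = "city;cases;deaths\nA;10;1\nB;20;2\n"

wrong = pd.read_csv(StringIO(raw))           # default sep=",": everything lands in one column
right = pd.read_csv(StringIO(raw), sep=";")  # matching delimiter: three columns

# to_csv is a DataFrame method; index=False avoids writing an extra index column.
buf = StringIO()
right.to_csv(buf, sep=";", index=False)

# Reading the saved text back with the same delimiter preserves the columns.
round_trip = pd.read_csv(StringIO(buf.getvalue()), sep=";")
print(wrong.shape, right.shape, round_trip.shape)
```

Using the same `sep` for reading and writing, and `index=False` when saving, keeps the file's column layout unchanged.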

summer holly Nov 3, 2020, 9:04 PM

#

@summer holly one thing you could try is to execute flask run --no-reload and see if it solves the problem or specify use_reloader=False in the app.run() argument
@lapis sequoia
--no-reload worked. Thanks alot!

tawdry sentinel Nov 3, 2020, 9:05 PM

#

worked

#

tks so much @lapis sequoia

#

^^

hollow sentinel Nov 3, 2020, 10:20 PM

#

hey guys how do you invite people to this server

bitter harbor Nov 3, 2020, 10:31 PM

#

https://discord.gg/python

#

send them that

hollow sentinel Nov 3, 2020, 11:34 PM

#

thank you @bitter harbor

steel talon Nov 4, 2020, 2:25 AM

#

Hey guys I'm using matplotlib for an image processing assignment and ideally we are supposed to make a function called chromeKey() which, as you can guess, is used to remove a green-screened background. Here's my code.

📎 chromekey.PNG

#

I have no idea what to do when I'm calling my function in order to combine the two images

#

I don't really expect an answer I just need help

merry ridge Nov 4, 2020, 4:50 AM

#

I’m not sure what your question is

merry ridge Nov 4, 2020, 5:13 AM

#

The usual way to do this is to take a linear combination of the two images at each pixel, with the weight a function of the pixel data at that point and possibly its neighbors, to smooth the edges

#

It looks like right now you are simply choosing one or the other based on the intensity of the color channel
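
A sketch of the linear-combination idea described above, not the assignment's required solution. The greenness measure and the `threshold` value are illustrative assumptions; both images are assumed to be float arrays in [0, 1] with the same shape (H, W, 3).

```python
import numpy as np


def chroma_key(fg, bg, threshold=0.3):
    """Blend fg over bg, letting bg show through where fg is strongly green."""
    r, g, b = fg[..., 0], fg[..., 1], fg[..., 2]
    greenness = g - np.maximum(r, b)  # how much green dominates the other channels
    # alpha = 1 keeps the foreground pixel, alpha = 0 shows the background pixel;
    # values in between give a soft edge instead of a hard either/or choice.
    alpha = 1.0 - np.clip(greenness / threshold, 0.0, 1.0)
    alpha = alpha[..., None]  # broadcast over the colour channels
    return alpha * fg + (1.0 - alpha) * bg


# Tiny example: one pure-green pixel and one red pixel over a grey background.
fg = np.zeros((1, 2, 3))
fg[0, 0] = [0.0, 1.0, 0.0]
fg[0, 1] = [1.0, 0.0, 0.0]
bg = np.full((1, 2, 3), 0.5)

out = chroma_key(fg, bg)
print(out)
```

Smoothing `alpha` with neighbouring pixels (for example a small blur) would soften the edges further.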

midnight skiff Nov 4, 2020, 5:15 AM

#

Hey, does anyone know how to load a bunch of random seaborn subplots into one plot?

lapis sequoia Nov 4, 2020, 5:32 AM

#

I have no idea what to do when I'm calling my function in order to combine the two images
@steel talon hey you have arguments in your chromekey function which are not optional so if you just call the function without them you will get an error

slender eagle Nov 4, 2020, 11:00 AM

#

hi

undone flare Nov 4, 2020, 11:36 AM

#

hello

final trellis Nov 4, 2020, 12:07 PM

#

What exactly is data science?

undone flare Nov 4, 2020, 12:16 PM

#

Data science is a field that uses scientific methods, algorithms, systems, etc. to extract knowledge from structured and unstructured data. (Big data)

jade lava Nov 4, 2020, 1:25 PM

#

Something that's pretty strange to me is that the recommended way to preprocess text in Deep Learning with Python for multi-class classification is to do one-hot on the encoded text. Isn't that what the Embedding layer is supposed to do?
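
One way to see how the two relate: an embedding lookup gives the same result as multiplying a one-hot vector by a weight matrix, it just skips building the one-hot vectors. The sketch below uses a random matrix `E` as a stand-in for learned embedding weights; it says nothing about which preprocessing the book intends for a given model.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 5, 3
E = rng.normal(size=(vocab_size, dim))  # hypothetical embedding weights

tokens = np.array([2, 0, 4])            # integer-encoded text

one_hot = np.eye(vocab_size)[tokens]    # shape (3, 5): one row per token
via_matmul = one_hot @ E                # one-hot input times a weight matrix
via_lookup = E[tokens]                  # what an embedding lookup does directly

print(np.allclose(via_matmul, via_lookup))
```

So one-hot inputs feed a dense layer, while integer indices feed an embedding layer; the latter is usually cheaper for large vocabularies.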

grave path Nov 4, 2020, 1:38 PM

#

I just loaded a dataset using read_csv in pandas however I realised there is '?' instead of null how do i get rid of all the rows that contain '?'

lapis sequoia Nov 4, 2020, 1:42 PM

#

df = df[(df.T != '?').all()] should work

grave path Nov 4, 2020, 1:43 PM

#

what is T?

lapis sequoia Nov 4, 2020, 1:43 PM

#

transpose

grave path Nov 4, 2020, 1:43 PM

#

its alright now I tried some from stackoverflow but finally found one that works

#

data = data.replace('?', np.nan)
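
A sketch of two routes to dropping the rows that contain '?', on made-up data. Replacing '?' with NaN only marks the values as missing; `dropna()` is what removes the rows.

```python
from io import StringIO

import numpy as np
import pandas as pd

raw = "x,y\n1,2\n?,3\n4,?\n5,6\n"  # hypothetical file content

# Route 1: treat '?' as missing while loading, then drop incomplete rows.
data = pd.read_csv(StringIO(raw), na_values="?")
clean = data.dropna()

# Route 2: load as-is, replace '?' afterwards, then drop.
# Columns that contained '?' stay as strings here and may need converting.
data2 = pd.read_csv(StringIO(raw)).replace("?", np.nan).dropna()

print(clean)
```

Route 1 has the advantage that the columns come out numeric straight away.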

lapis sequoia Nov 4, 2020, 1:43 PM

#

actually don't need transpose

#

cool

undone flare Nov 4, 2020, 1:44 PM

#

am I getting this because I am not printing it?

📎 unknown.png

lapis sequoia Nov 4, 2020, 1:45 PM

#

@undone flare what is the question?

molten hamlet Nov 4, 2020, 1:45 PM

#

probably yes

#

jupyter has some tricky user friendly features

undone flare Nov 4, 2020, 1:45 PM

#

but this looks cool

molten hamlet Nov 4, 2020, 1:46 PM

#

if you plot something and do not show() jupyter will display image anyway

undone flare Nov 4, 2020, 1:46 PM

#

if I print it is messy

molten hamlet Nov 4, 2020, 1:46 PM

#

print head() maybe

#

😄

undone flare Nov 4, 2020, 1:46 PM

#

but this table form looks sick

molten hamlet Nov 4, 2020, 1:46 PM

#

nvm, that is way better anyway :d

#

what is total?

undone flare Nov 4, 2020, 1:47 PM

#

total?

#

rows you mean?

molten hamlet Nov 4, 2020, 1:47 PM

#

total amount?

#

that column named total

undone flare Nov 4, 2020, 1:48 PM

#

idk

molten hamlet Nov 4, 2020, 1:48 PM

#

xD

#

ok 🙂

undone flare Nov 4, 2020, 1:48 PM

#

I just downloaded this dataset

#

idk anything about pokemon lol

#

just learning pandas

molten hamlet Nov 4, 2020, 1:48 PM

#

👍

lapis sequoia Nov 4, 2020, 2:39 PM

#

Anyone here have experience with openpyxl?

gritty wedge Nov 4, 2020, 3:44 PM

#

hey guys

#

can you tell me what math skills I need for learning machine learning?

undone flare Nov 4, 2020, 3:47 PM

#

Linear Algebra, Stats, Prob, Calculus

gritty wedge Nov 4, 2020, 3:47 PM

#

in which grade are these taught?

undone flare Nov 4, 2020, 3:47 PM

#

11th and 12th

#

Stats and Prob is almost in all higher grades

gritty wedge Nov 4, 2020, 3:48 PM

#

not in my grade

undone flare Nov 4, 2020, 3:48 PM

#

which grade

gritty wedge Nov 4, 2020, 3:48 PM

#

i am in 5th grade

undone flare Nov 4, 2020, 3:48 PM

#

oh then yea

gritty wedge Nov 4, 2020, 3:48 PM

#

yea

undone flare Nov 4, 2020, 3:48 PM

#

It is too soon for you to learn ML

gritty wedge Nov 4, 2020, 3:48 PM

#

i know

undone flare Nov 4, 2020, 3:48 PM

#

bcuz you might not understand high level math

gritty wedge Nov 4, 2020, 3:49 PM

#

o dont worry...i already know most high level math

undone flare Nov 4, 2020, 3:49 PM

#

huh..

gritty wedge Nov 4, 2020, 3:49 PM

#

my elder sister tutors me after school

undone flare Nov 4, 2020, 3:49 PM

#

do you know matrix multiplication

gritty wedge Nov 4, 2020, 3:50 PM

#

no ;-;

undone flare Nov 4, 2020, 3:50 PM

#

bcuz that's the first thing in Linear Algebra

gritty wedge Nov 4, 2020, 3:50 PM

#

i havent learnt that yet

#

i am learning about calculus 1 now

undone flare Nov 4, 2020, 3:50 PM

#

ok

gritty wedge Nov 4, 2020, 3:50 PM

#

do u have any tips for ML?

jade lava Nov 4, 2020, 4:14 PM

#

(Keras) Input, TextVectorization, and Embedding layers are a bit difficult to wrap my head around. Every time I feel like I get it, something throws a wrench in my mental model and I have to start over.

hollow sentinel Nov 4, 2020, 4:16 PM

#

@gritty wedge Python for DS/ML Bootcamp by Jose Portilla on Udemy

gritty wedge Nov 4, 2020, 4:16 PM

#

ok

#

thnx

bitter harbor Nov 4, 2020, 4:16 PM

#

Linear Algebra, Stats, Prob, Calculus
This first

#

Plz

gritty wedge Nov 4, 2020, 4:17 PM

#

okay 🙂

#

is the course good for beginners?

bitter harbor Nov 4, 2020, 4:20 PM

#

Yea that's what it's meant for
The only reason I'd highly suggest you learn what's going on behind the scenes is that a lot of concepts (universal across ML) can't be fully understood without knowing what's actually going on

gritty wedge Nov 4, 2020, 4:20 PM

#

o.....

#

thnx for ur suggestions 🙂

hollow sentinel Nov 4, 2020, 4:22 PM

#

oh yeah what @bitter harbor said too

gritty wedge Nov 4, 2020, 4:22 PM

#

https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/

Udemy

Learn Python for Data Science, Structures, Algorithms, Interviews

Learn how to use NumPy, Pandas, Seaborn , Matplotlib , Plotly , Scikit-Learn , Machine Learning, Tensorflow , and more!

bitter harbor Nov 4, 2020, 4:22 PM

#

If you want to have a look at the 'basics' of ml, id highly recommend watching 3b1b's series' on the topics

gritty wedge Nov 4, 2020, 4:22 PM

#

is this the course?

#

o thnx man!

bitter harbor Nov 4, 2020, 4:23 PM

#

Thatd be a good course too

hollow sentinel Nov 4, 2020, 4:23 PM

#

import pandas as pd

# Path of the file to read
iowa_file_path = '../input/home-data-for-ml-course/train.csv'

# Fill in the line below to read the file into a variable home_data
home_data = pd.read_csv("iowa_file_path")

# Call line below with no argument to check that you've loaded the data correctly
step_1.check()

#

does anyone see what's wrong with that

bitter harbor Nov 4, 2020, 4:24 PM

#

Not to discourage you but ml's got quite a few layers that you should learn before jumping in, that course will help with python implementations of the basic mechanics

gritty wedge Nov 4, 2020, 4:25 PM

#

any course recommendations then?

bitter harbor Nov 4, 2020, 4:25 PM

#

@hollow sentinel don't call the variable as a string
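A small sketch of the difference, assuming a temporary stand-in file instead of the real Kaggle path (the file contents here are invented):

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for '../input/home-data-for-ml-course/train.csv'.
    iowa_file_path = os.path.join(tmp_dir, "train.csv")
    with open(iowa_file_path, "w") as f:
        f.write("LotArea,YearBuilt\n8450,2003\n9600,1976\n")

    # Correct: pass the variable, so pandas receives the path it holds.
    home_data = pd.read_csv(iowa_file_path)

    # Incorrect: the quoted name is a literal string, which pandas
    # treats as a file called "iowa_file_path" in the working directory.
    try:
        pd.read_csv("iowa_file_path")
        quoted_name_worked = True
    except FileNotFoundError:
        quoted_name_worked = False

print(home_data.shape, quoted_name_worked)
```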

#

University? 🙃

gritty wedge Nov 4, 2020, 4:26 PM

#

hmmmmm.......thats too far

#

and no...coz i am just 12

bitter harbor Nov 4, 2020, 4:26 PM

#

Na there're quite a few ways to learn it, I learnt what I know about it through 3b1b

gritty wedge Nov 4, 2020, 4:26 PM

#

who is 3b1b?

bitter harbor Nov 4, 2020, 4:26 PM

#

It's pretty heavy tho regardless

hollow sentinel Nov 4, 2020, 4:26 PM

#

he's a youtuber

gritty wedge Nov 4, 2020, 4:26 PM

#

oh

bitter harbor Nov 4, 2020, 4:26 PM

#

3 blue 1 brown

gritty wedge Nov 4, 2020, 4:27 PM

#

o lol....among us memes pydis_dye

bitter harbor Nov 4, 2020, 4:27 PM

#

Na lol he's been around for a bit longer than the game :)

gritty wedge Nov 4, 2020, 4:28 PM

#

lol lemon_ping

hollow sentinel Nov 4, 2020, 4:29 PM

#

i would also recommend the kaggle mini courses

#

they're great for an intro to ML

gritty wedge Nov 4, 2020, 4:30 PM

#

thnx 🙂 .....i cant take u seriously with that profile pic....no offence

hollow sentinel Nov 4, 2020, 4:31 PM

#

it's just tony stark lmao

gritty wedge Nov 4, 2020, 4:31 PM

#

but it looks funny too

hollow sentinel Nov 4, 2020, 4:31 PM

#

that's the point

gritty wedge Nov 4, 2020, 4:31 PM

#

lmao

jade lava Nov 4, 2020, 4:34 PM

#

What I'm trying to figure out these days is how exactly these model constants are determined. I suppose I just need to find more tutorials about multiclassification models for things like the Reuters dataset.

hollow sentinel Nov 4, 2020, 4:36 PM

#

# What is the average lot size (rounded to nearest integer)?
avg_lot_size = home_data["LotArea"].mean()
#print(avg_lot_size)

# As of today, how old is the newest home (current year - the date in which it was built)
newest_home_age = home_data["YearBuilt"].mean()

# Checks your answers
step_2.check()

#

Incorrect: Incorrect value for avg_lot_size: 10516.828082191782

#

how is lot area not the lot size

#

Kaggle is so stupid

bitter harbor Nov 4, 2020, 4:39 PM

#

Would you not want the standard deviation of the lot sizes?

#

Also I'm assuming you'd want some form of Sig digs

hollow sentinel Nov 4, 2020, 4:41 PM

#

AttributeError: 'Series' object has no attribute 'stdev'

bitter harbor Nov 4, 2020, 4:42 PM

#

Do you have to call it in 1 line?

hollow sentinel Nov 4, 2020, 4:43 PM

#

what

#

oh do you mean find the mean first

bitter harbor Nov 4, 2020, 4:45 PM

#

statistics.stdev(dataset)

hollow sentinel Nov 4, 2020, 4:46 PM

#

i have to import statistics too right

bitter harbor Nov 4, 2020, 4:46 PM

#

Yea

hollow sentinel Nov 4, 2020, 4:46 PM

#

TypeError: can't convert type 'str' to numerator/denominator

#

there's strings in the dataset

#

dataset["lot_size"]

bitter harbor Nov 4, 2020, 4:47 PM

#

How did you call the mean then?

hollow sentinel Nov 4, 2020, 4:47 PM

#

avg_lot_size = home_data["LotArea"].mean()

bitter harbor Nov 4, 2020, 4:48 PM

#

I got that part but how did you call it if there's a string

hollow sentinel Nov 4, 2020, 4:48 PM

#

oh i meant when you do stdev the whole dataset there's strings in the dataset

bitter harbor Nov 4, 2020, 4:48 PM

#

Oh ya you're only doing it to the lot size

#

statistics.stdev(dataset, xbar)

#

That's with both args

#

Xbar being the mean

hollow sentinel Nov 4, 2020, 4:49 PM

#

i don't think they're taking a standard dev

bitter harbor Nov 4, 2020, 4:50 PM

#

:(

hollow sentinel Nov 4, 2020, 4:50 PM

#

i looked up the answers

#

and they just want me to round up the 10516.828082191782

#

idek how to do that

#

i tried calling .round()

bitter harbor Nov 4, 2020, 4:52 PM

#

To how many decimal points
Also this is a prime example of where you should use stdev why is that wrong

#

Or is it "too advanced"

hollow sentinel Nov 4, 2020, 4:53 PM

#

idk all they want is 10517

#

so how do you cast in python again

bitter harbor Nov 4, 2020, 4:55 PM

#

#bot-commands message

#

casting to an int truncates the decimal part (rounds toward zero) so be careful with that
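A quick sketch of how the built-ins behave on the value from the exercise, plus two edge cases that tend to surprise people:

```python
import math

value = 10516.828082191782

truncated = int(value)   # drops the fractional part
rounded = round(value)   # nearest integer

# int() goes toward zero, so it only matches floor for non-negative numbers.
negative_truncated = int(-2.7)
negative_floored = math.floor(-2.7)

# Python 3 rounds exact halves to the nearest even integer.
half_rounded = round(2.5)

print(truncated, rounded, negative_truncated, negative_floored, half_rounded)
```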

hollow sentinel Nov 4, 2020, 4:57 PM

#

thanks

#

newest_home_age = 2020 -(home_data["YearBuilt"].mean())

#

the correct answer is 8

heady hatch Nov 4, 2020, 4:58 PM

#

8

bitter harbor Nov 4, 2020, 4:58 PM

#

what're you getting?

hollow sentinel Nov 4, 2020, 4:58 PM

#

48.73219178082195

bitter harbor Nov 4, 2020, 4:58 PM

#

niccce

hollow sentinel Nov 4, 2020, 4:58 PM

#

lmaoooo

#

built for data science you already know 😆

heady hatch Nov 4, 2020, 4:59 PM

#

Are you supposed to take the mean of the year built or the subtraction?

hollow sentinel Nov 4, 2020, 4:59 PM

#

the directions say current year - the date in which it was built

#

ohhhh

#

nope still wrong

heady hatch Nov 4, 2020, 5:00 PM

#

On the other hand nice, only +/- 41.

bitter harbor Nov 4, 2020, 5:01 PM

#

what does home_data["YearBuilt"].mean() return?

hollow sentinel Nov 4, 2020, 5:02 PM

#

1971.267808219178

bitter harbor Nov 4, 2020, 5:02 PM

#

humor me and try with stdev

hollow sentinel Nov 4, 2020, 5:02 PM

#

AttributeError: 'Series' object has no attribute 'stdev'

#

you mean home_data["YearBuilt"].stdev() right

bitter harbor Nov 4, 2020, 5:03 PM

#

it's not an attribute

#

statistics.stdev(home_data["YearBuilt"])
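Side note on the AttributeError: the pandas method is called `std`, not `stdev`. A minimal sketch with made-up years, showing that `Series.std()` (sample standard deviation by default) agrees with `statistics.stdev`:

```python
import statistics

import pandas as pd

# Made-up values standing in for home_data["YearBuilt"].
year_built = pd.Series([1872, 1971, 2003, 2010], name="YearBuilt")

pandas_std = year_built.std()                       # ddof=1 by default
stdlib_std = statistics.stdev(year_built.tolist())  # also the sample std

print(pandas_std, stdlib_std)
```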

hollow sentinel Nov 4, 2020, 5:05 PM

#

Incorrect value for newest_home_age: 30.202904042525258

#

it should be 8

bitter harbor Nov 4, 2020, 5:05 PM

#

I mean we're getting closer ¯_(ツ)_/¯

hollow sentinel Nov 4, 2020, 5:05 PM

#

lmao i'm sorry

#

I thought this would be easy and I would get it done in like 2-3 days

#

it's a mini course lmao

bitter harbor Nov 4, 2020, 5:06 PM

#

it's all good lol this takes time, are there any outliers you're expected to clean?

hollow sentinel Nov 4, 2020, 5:07 PM

#

no they didn't ask me to

#

there are some columns i'd drop

#

but they didn't ask so

bitter harbor Nov 4, 2020, 5:08 PM

#

weird idk sorry

hollow sentinel Nov 4, 2020, 5:08 PM

#

it's ok

#

i have the answers so

#

lmao the correct answer was 10

#

how tf they be getting 10

#

i feel like i'm in math class rn

#

kaggle is lame

heady hatch Nov 4, 2020, 5:17 PM

#

@hollow sentinel What's the min? like home_data['YearBuilt'].min()

hollow sentinel Nov 4, 2020, 5:44 PM

#

1872 @heady hatch

heady hatch Nov 4, 2020, 5:45 PM

#

Oh what about the max?

hollow sentinel Nov 4, 2020, 5:45 PM

#

2010

heady hatch Nov 4, 2020, 5:45 PM

#

Yea lol

#

I think that's how they got the 10.

bitter harbor Nov 4, 2020, 5:45 PM

#

that'd do it

hollow sentinel Nov 4, 2020, 5:45 PM

#

OH HOW OLD IS THE NEWEST HOME

#

man i'm stupid
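Putting the thread together, a sketch of both answers on a toy frame. The three rows are invented; only the 1872 and 2010 years and the 2020 "current year" come from the conversation.

```python
import pandas as pd

# Toy stand-in for home_data.
home_data = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "YearBuilt": [1872, 1971, 2010],
})

# Average lot size, rounded to the nearest integer.
avg_lot_size = int(round(home_data["LotArea"].mean()))

# The newest home is the one with the largest build year.
current_year = 2020
newest_home_age = current_year - home_data["YearBuilt"].max()

print(avg_lot_size, newest_home_age)
```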

heady tide Nov 4, 2020, 5:46 PM

#

More or less, if I would want to compute the Tf-IDF vectorizer for 12 GBs of pdfs, how much time will that take? should I consider cloud computing?

heady hatch Nov 4, 2020, 5:50 PM

#

If you have the resource, I would go cloud.

But tfidf can also be done on regular machine itself. Probably need to use a generator instead if you don't have the memory.
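A rough sketch of the generator idea with scikit-learn's `TfidfVectorizer`, which accepts any iterable of strings. The `iter_documents` helper and the sample texts are hypothetical; in practice the generator would yield text already extracted from each PDF, one file at a time.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def iter_documents(texts):
    """Yield one document at a time instead of holding them all in a list."""
    for text in texts:
        yield text


# Stand-in for text extracted from the PDFs.
sample_texts = [
    "the cat sat",
    "the dog sat",
    "cloud computing costs money",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(iter_documents(sample_texts))

print(tfidf.shape)
```

The raw documents are streamed, but the vocabulary and the resulting sparse matrix still have to fit in memory. If they do not, `HashingVectorizer` might be worth a look since it keeps no vocabulary.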

hollow sentinel Nov 4, 2020, 6:29 PM

#

I don't think I like the kaggle mini courses

#

find it kind of boring

hollow sierra Nov 4, 2020, 7:13 PM

#

https://stackoverflow.com/questions/64686302/using-pickle-object-of-model-to-predict-output
Can anyone help me with this?

Stack Overflow

Using pickle object of model to predict output

I have a build a ML model and exported it as pickle object file,final goal is to use this object file to make prediction in web app.
What I want to know
How to use this pickle file to predict outp...

heady hatch Nov 4, 2020, 7:19 PM

#

@hollow sierra what do you need help with exactly?

hollow sierra Nov 4, 2020, 7:21 PM

#

I have built an ML model and exported it to a pickle file; now I want to use it in a web app to make predictions. I want to use Node.js in the web app, so is it possible to use this pickle model in a JavaScript environment?

#

@heady hatch

heady hatch Nov 4, 2020, 7:21 PM

#

Right right.

#

What kind of ML model is it? and from what library? Or was it written with native Python?

hollow sierra Nov 4, 2020, 7:22 PM

#

Yes, sklearn

#

of python

#

It is a regression model.

heady hatch Nov 4, 2020, 7:23 PM

#

So you have a couple of options here.

build Python microservice instead of going straight into NodeJS
https://github.com/nok/sklearn-porter
https://www.npmjs.com/package/scikit-learn

GitHub

nok/sklearn-porter

Transpile trained scikit-learn estimators to C, Java, JavaScript and others. - nok/sklearn-porter

npm

scikit-learn

Node.js wrapper of scikit-learn

#

There are also a couple of pickle converters.

#

https://github.com/sciyoshi/pickle-js
https://github.com/jlaine/node-jpickle

GitHub

sciyoshi/pickle-js

Automatically exported from code.google.com/p/pickle-js - sciyoshi/pickle-js

GitHub

jlaine/node-jpickle

Full-javascript parser for Python's pickle format. Contribute to jlaine/node-jpickle development by creating an account on GitHub.

#

But pickle is a finicky object.

#

You can try to convert it first and see how it goes.

#

If not then I would try one of the options above.

hollow sierra Nov 4, 2020, 7:27 PM

#

Ok thanks @heady hatch, I have seen articles and tutorials on deployment of pickled ML models, and Flask was used every time. Is it easier or recommended to use a Python-based web library when you have Python ML models?

heady hatch Nov 4, 2020, 7:28 PM

#

It's recommended to use Python because of the consistency, pickling is weird because it takes environment into account.

#

I don't know how Python environment will work transitioning to non-Python environments.

#

Plus you don't have a language switch.

#

You don't necessarily need Python to do the full backend.

you can set ML up as a microservice

#

and have your backend call the ML API.

hollow sierra Nov 4, 2020, 7:30 PM

#

Have deployed ml models?If yes, how ?

heady hatch Nov 4, 2020, 7:30 PM

#

Are you asking if I have deployed?

hollow sierra Nov 4, 2020, 7:30 PM

#

yes

heady hatch Nov 4, 2020, 7:31 PM

#

Those articles and tutorials should serve as a good entry point.

#

You can create a simple backend with Flask and run it behind a WSGI server such as gunicorn (uvicorn is meant for ASGI frameworks like FastAPI).

#

load ml model and set prediction as an api endpoint.
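A minimal sketch of that setup, under a few assumptions: the toy `LinearRegression` stands in for the real exported model, the pickle is kept in memory instead of a file, and the `/predict` route and `features` JSON key are made-up names.

```python
import pickle

from flask import Flask, jsonify, request
from sklearn.linear_model import LinearRegression

# Stand-in for the exported pickle file: a toy model (y = 2x) round-tripped
# through pickle. With a real file this would be pickle.load(open(path, "rb")).
model_bytes = pickle.dumps(
    LinearRegression().fit([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])
)

app = Flask(__name__)
model = pickle.loads(model_bytes)  # load once at startup, not per request


@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON such as {"features": [[4.0], [5.0]]}.
    features = request.get_json()["features"]
    predictions = model.predict(features).tolist()
    return jsonify({"predictions": predictions})


# Exercise the endpoint without starting a server.
with app.test_client() as client:
    response = client.post("/predict", json={"features": [[4.0]]})
    print(response.get_json())
```

A Node.js backend could then call this endpoint over HTTP. Only unpickle files you trust, and load them with the same scikit-learn version that created them where possible.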

hollow sierra Nov 4, 2020, 7:34 PM

#

I got your point and it cleared some doubts of mine. Thanks @heady hatch for your time and help.

jade lava Nov 4, 2020, 8:22 PM

#

Sanity check: if different approaches to how we structure our layers yield the same accuracy results, then doesn't that imply that our dataset isn't good enough in quality to be able to predict what you're trying to do?

umbral pollen Nov 4, 2020, 8:30 PM

#

hi

plush zenith Nov 4, 2020, 8:43 PM

#

Hi sorry

#

this is the place where i can ask more python math related things?

opal ferry Nov 4, 2020, 10:03 PM

#

looking for some feedback of how dumb it would be to use a function like this to shrink a df's memory footprint

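```python
def auto_cats(df):
    # Convert a column to 'category' only when that actually uses less memory.
    for col in df.columns:
        curr_usage = df[col].memory_usage(deep=True)
        as_category = df[col].astype('category')
        if curr_usage > as_category.memory_usage(deep=True):
            df[col] = as_category
    df.info(memory_usage='deep')  # prints the summary; info() itself returns None
    return df
```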
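To get a feel for when the comparison in `auto_cats` pays off, here is a small sketch on invented data: one string column with few distinct values and one where every value is unique.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Lima"] * 2500,   # few distinct values
    "user_id": [f"user-{i}" for i in range(10000)],    # all distinct
})

before = df.memory_usage(deep=True)
city_as_cat = df["city"].astype("category").memory_usage(deep=True)
id_as_cat = df["user_id"].astype("category").memory_usage(deep=True)

print("city:", before["city"], "->", city_as_cat)
print("user_id:", before["user_id"], "->", id_as_cat)
```

The low-cardinality column should shrink a lot, while the all-distinct column typically gains little or nothing. Note that category columns can behave differently from plain strings in later operations, so blanket conversion is a trade-off.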

whole vortex Nov 4, 2020, 10:04 PM

#

Hey guys, I was wondering if anyone here has some half decent experience with using seaborn / pandas

#

I'm trying to complete some tasks but I'm unsure if the way I'm representing the data is the best way? Even I'm not really understanding the graphs that this spits out - would've thought they'd have to be somewhat interpretable

#

For example

📎 Screenshot_2020-11-04_at_22.05.41.png

#

I'm quite new to both of these libraries so just figuring things out - any advice, ideas, anything really would be greatly appreciated

lapis sequoia Nov 4, 2020, 10:18 PM

#

@jade lava Hey, just saw your reaction now for my openpyxl question. Does it take long to create a code to automate reports/tasks? I have weekly reports I have to send out and it's a headache to go through the process of having to clean them up, same repetitive task for the most part.

jade lava Nov 4, 2020, 10:55 PM

#

Not enough info to answer

lapis sequoia Nov 4, 2020, 11:42 PM

#

Not enough info to answer
@jade lava What info do you need?

jade lava Nov 4, 2020, 11:43 PM

#

You have to use VBA for macros, but you can read and write Excel documents, sure...

lapis sequoia Nov 4, 2020, 11:51 PM

#

#discord-bots

lapis sequoia Nov 5, 2020, 12:07 AM

#

Not asking about VBA

hollow sentinel Nov 5, 2020, 2:58 AM

#

# from _ import _
from sklearn.tree import DecisionTreeRegressor
#specify the model. 
#For model reproducibility, set a numeric value for random_state when specifying the model
iowa_model = DecisionTreeRegressor()

# Fit the model
iowa_model.fit(X,y)

# Check your answer
step_3.check()

#

so I have this

#

Incorrect: You forgot to set the random_state.

#

sike i figured it out sorry
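For anyone hitting the same check, a sketch of the fix. The `X` and `y` values are toy stand-ins for the exercise data, and the seed value 1 is an arbitrary choice; any fixed integer makes the model reproducible.

```python
from sklearn.tree import DecisionTreeRegressor

# Toy stand-ins for the exercise's features and target.
X = [[8450, 2003], [9600, 1976], [11250, 2001], [9550, 1915]]
y = [208500, 181500, 223500, 140000]

# A fixed random_state makes repeated fits give the same tree.
iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(X, y)

print(iowa_model.predict([[9600, 1976]]))
```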

heady hatch Nov 5, 2020, 3:18 AM

#

I think you forgot to set the random_state.

#

:^)

humble flame Nov 5, 2020, 3:27 AM

#

Hello

#

I am new to data science but interested

#

Anyone have any advice on how to start or where to go?

heady hatch Nov 5, 2020, 3:29 AM

#

Do you like books or courses?

humble flame Nov 5, 2020, 3:30 AM

#

Well, I would prefer courses but books are fine

#

I have good experience with python itself already, I am just new to the topic of data science

heady hatch Nov 5, 2020, 3:40 AM

#

hmm in terms of courses, I'm not too sure about MOOC. But I've heard people liking Data Camp.

#

But I'm sure there are tons of MOOC.

#

pinging @hollow sentinel , they've had lots of experience there.

humble flame Nov 5, 2020, 3:43 AM

#

Hi, thanks a lot, I will try them out. Any good books too?

undone flare Nov 5, 2020, 3:44 AM

#

I don't like Data Camp, it's like more of fill in the blank type

heady hatch Nov 5, 2020, 3:44 AM

#

https://www.google.com/books/edition/Data_Science_from_Scratch/JYodCAAAQBAJ?hl=en&gbpv=0

Google Books

Data Science from Scratch

#

I think this is my go to for intro to data science with Python.

#

You're implementing algorithms from scratch.

humble flame Nov 5, 2020, 3:45 AM

#

Brilliant, thanks a lot

hollow sentinel Nov 5, 2020, 4:02 AM

#

I’d recommend python for data science & machine learning bootcamp

#

and Kaggle mini courses

narrow flume Nov 5, 2020, 4:28 AM

#

hello does anyone know well about Matplotlib in python?

hollow sentinel Nov 5, 2020, 4:29 AM

#

thanks @heady hatch for the book rec bc this looks really good

#

i might do this over the stanford ML course

somber bane Nov 5, 2020, 4:44 AM

#

can someone help me out with pandas

#

📎 unknown.png

#

The following code does save and add a new row into the file

#

but when I re-run the program, it will not create a new row, it will just replace the row of data that was created previously
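The screenshot isn't available here, so this is only a guess at the cause: writing with `to_csv` in its default write mode replaces the whole file on every run. A sketch of appending instead, with a hypothetical `append_row` helper and made-up column names:

```python
import os
import tempfile

import pandas as pd


def append_row(path, row):
    """Append one row to a CSV, writing the header only when the file is new."""
    file_exists = os.path.exists(path)
    pd.DataFrame([row]).to_csv(
        path, mode="a", header=not file_exists, index=False
    )


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "records.csv")
    append_row(path, {"name": "first", "score": 1})   # simulates the first run
    append_row(path, {"name": "second", "score": 2})  # simulates a re-run
    saved = pd.read_csv(path)

print(saved)
```

Another option would be to read the existing file, concatenate the new row with `pd.concat`, and write the whole frame back.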

agile wing Nov 5, 2020, 4:48 AM

#

someone image recognize ice cream for me, cuz I want ice cream

#

ughh

narrow flume Nov 5, 2020, 5:05 AM

#

Can anyone help me with Matplotlib

austere swift Nov 5, 2020, 5:27 AM

#

whats your question

heady hatch Nov 5, 2020, 5:41 AM

#

Can we search for index by values in pandas DataFrame?

#

ie given a dataframe

a b
1 0
3 2
5 4

get the location of value 0.

I'm also curious how would duplicate values work.

hasty grail Nov 5, 2020, 5:44 AM

#

From my brief search on Google it seems that you're supposed to convert it into np.ndarray and then use np.ndarray.nonzero to get the indices you're interested in

heady hatch Nov 5, 2020, 5:45 AM

#

I'm a bit confused. How would I use nonzero to search?

hasty grail Nov 5, 2020, 5:46 AM

#

in your example you would first get the column 'b' from your DataFrame, then convert it to NumPy

#

afterwards you can create a boolean mask and get nonzero of that mask

heady hatch Nov 5, 2020, 5:47 AM

#

Oh but how would I know what column it is in?

hasty grail Nov 5, 2020, 5:48 AM

#

if you look at the docs for nonzero you'd understand

heady hatch Nov 5, 2020, 5:49 AM

#

Maybe I'm misunderstanding something.

you would use nonzero after getting the b column, right?
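One way to sidestep the "which column" problem is to apply the mask to the whole frame rather than a single column; `np.nonzero` then returns row and column positions together. A sketch on the example above, plus a made-up frame with duplicates:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 5], "b": [0, 2, 4]})

# Boolean mask over the whole frame, then positions of the True cells.
row_pos, col_pos = np.nonzero(df.to_numpy() == 0)
locations = list(zip(df.index[row_pos], df.columns[col_pos]))
print(locations)

# With duplicates, every matching cell is reported in row-major order.
dup = pd.DataFrame({"a": [0, 3, 5], "b": [0, 2, 0]})
dup_rows, dup_cols = np.nonzero(dup.to_numpy() == 0)
dup_locations = list(zip(dup.index[dup_rows], dup.columns[dup_cols]))
print(dup_locations)
```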

#data-science-and-ml

TODO: Use the gradients in the grads dictionary to update the