#data-science-and-ml | Python | Page 373

lapis sequoia Feb 3, 2022, 1:37 AM

#

Given more time i'd feel so much more confident but i've got 48 hours to do something i dont have a grasp on

#

just to stay in the program

desert oar Feb 3, 2022, 1:37 AM

#

i'm sorry you are under such pressure. the market is hot, but not hot enough to warrant making yourself insane for it. tell your family that a professional data scientist said that

#

the market isn't going anywhere. there might be a slump if companies start folding and/or laying people off after the current hype wave

#

but that will be short-lived

serene scaffold Feb 3, 2022, 1:37 AM

#

desert oar i'm sorry you are under such pressure. the market _is_ hot, but not hot enough t...

Seconded by another professional data scientist.

lapis sequoia Feb 3, 2022, 1:37 AM

#

I just hate that im in this position

#

I just want to turn in something acceptable

desert oar Feb 3, 2022, 1:38 AM

#

data strategy is becoming a necessity for pretty much all medium+-sized companies, and data science is not going to be automated away for at least a decade

lapis sequoia Feb 3, 2022, 1:38 AM

#

im not worried about killing the project

desert oar Feb 3, 2022, 1:38 AM

#

probably more like 5 decades lol

lapis sequoia Feb 3, 2022, 1:38 AM

#

i want to learn

#

i just want to finish this fucking project and have SOMETHING that says hey i tried

desert oar Feb 3, 2022, 1:38 AM

#

great

lapis sequoia Feb 3, 2022, 1:38 AM

#

i can share my last project

desert oar Feb 3, 2022, 1:38 AM

#

so, what do you currently know?

lapis sequoia Feb 3, 2022, 1:38 AM

#

if that could show insights to what im capable of better than me explaining it

desert oar Feb 3, 2022, 1:38 AM

#

sure, that might give us a starting point

lapis sequoia Feb 3, 2022, 1:38 AM

#

which i got an ASS fucking score on

desert oar Feb 3, 2022, 1:38 AM

#

but you should also explain anything you've learned between now and then

lapis sequoia Feb 3, 2022, 1:39 AM

#

I've learned EDA functions with pandas and seaborn for visualizing concepts

#

ill link you my notebook one second...

arctic wedgeBOT Feb 3, 2022, 1:40 AM

#

Hey @lapis sequoia!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

lapis sequoia Feb 3, 2022, 1:41 AM

#

idk how to get this to yall through here

#

its an ipynb

#

Screen_Shot_2022-02-02_at_8.41.51_PM.png

Screen_Shot_2022-02-02_at_8.42.24_PM.png

#

shit like this basically

#

nothing special. just what I have time to put together

#

that was my first project

desert oar Feb 3, 2022, 1:43 AM

#

ok, fair enough

#

did you get any feedback on why the score was bad?

lapis sequoia Feb 3, 2022, 1:43 AM

#

Yes ill link SS of it

#

and it sucks too because

#

ill be in "class"

#

and my professor will be trying to correct his own errors for 30+ minutes

desert oar Feb 3, 2022, 1:45 AM

#

yikes

#

that's really bad

#

i would definitely attempt to ask for your money back somehow

#

maybe go through your credit card company or bank if you have to

#

that's... not worth your time to suffer through, and certainly not your money

#

i'm sorry this was a bad experience

#

so do you know what "linear regression" is?

lapis sequoia Feb 3, 2022, 1:46 AM

#

Screen_Shot_2022-02-02_at_8.42.29_PM.png

Screen_Shot_2022-02-02_at_8.46.27_PM.png

desert oar Feb 3, 2022, 1:47 AM

#

if it makes you feel better, i had no idea what "differential pricing" meant, and i had to look it up (as per the problem statement)

lapis sequoia Feb 3, 2022, 1:47 AM

#

my bad. wrong one on top

desert oar Feb 3, 2022, 1:47 AM

#

ok, and do you understand that feedback?

#

they gave very specific suggestions

lapis sequoia Feb 3, 2022, 1:47 AM

#

Screen_Shot_2022-02-02_at_8.47.30_PM.png

#

Yes i just have trouble executing the approaches they outline

#

I sincerely wish i had more time under my belt

#

its a quantitative representation of two variables relationship

#

Such as brand name and price

#

or mileage and price

#

etc

#

idk how to generate something to predict that though..

desert oar Feb 3, 2022, 1:55 AM

#

well i think you need to start by understanding the feedback

#

unfortunately a lot of topics in data science heavily depend on prior topics

#

i know it's difficult because you are behind in the class

#

the "better conclusions" comment is interesting - this suggests that you might also be struggling to interpret the stuff you do know how to produce

#

so let's start at the very very beginning. you know what a mean (aka average) is, right? median? standard deviation?

lapis sequoia Feb 3, 2022, 1:56 AM

#

of course

#

spread from the average

#

high standard deviation means more outliers if im not mistaken

desert oar Feb 3, 2022, 1:57 AM

#

high standard deviation means higher average spread

lapis sequoia Feb 3, 2022, 1:57 AM

#

low standard is more consolidated around the mean no

desert oar Feb 3, 2022, 1:57 AM

#

maybe that means outliers, or maybe it means that the data is just spread out

#

but ok, you know that much. good

#

and do you know what a "mode" is?

#

do you know what a "frequency table" is?

lapis sequoia Feb 3, 2022, 1:58 AM

#

not entirely sure what a freq table is

#

i know what the mode is

#

most occurring value

#

I see that it can be one or the other

desert oar Feb 3, 2022, 1:59 AM

#

yep

#

a frequency table is just a table of how often each value appears in the data

#

so if you have the data H H H T T H T H H T of coin flips, then the frequencies are H: 6 and T: 4

#

in pandas you would invoke the .value_counts() method to compute this on a column in a dataframe

#

you might have seen that one
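
here's a minimal sketch of the coin-flip example in pandas. the column name `flip` is made up for illustration, it is not from any real dataset:

```python
import pandas as pd

# The ten coin flips from the example above; "flip" is a hypothetical column name.
df = pd.DataFrame({"flip": list("HHHTTHTHHT")})

# Frequency table for a single column (a Series), not the whole DataFrame.
counts = df["flip"].value_counts()
print(counts)
```

note that it is called on one column, `df["flip"]`, rather than on the dataframe itself.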

lapis sequoia Feb 3, 2022, 2:00 AM

#

havent.. im most familiar w

desert oar Feb 3, 2022, 2:00 AM

#

note that this only makes sense for categorical variables

lapis sequoia Feb 3, 2022, 2:00 AM

#

oh i see what you mean there

desert oar Feb 3, 2022, 2:01 AM

#

it makes no sense to compute a frequency table for a continuous variable

#

you'd get all counts of 1, for the most part

#

but you can extend this to two categorical variables, and this is known as a "cross-tabulation", or "crosstab" for short, and it is implemented with pandas.crosstab

lapis sequoia Feb 3, 2022, 2:02 AM

#

okay

#

So

#

i just implemented this in my last project

#

data.value_counts()

#

just taking a look at it right quick

desert oar Feb 3, 2022, 2:03 AM

#

good, so you are familiar with that much

lapis sequoia Feb 3, 2022, 2:03 AM

#

and I see

desert oar Feb 3, 2022, 2:03 AM

#

i believe this is what they were asking for when they asked for the "proportion of customers" across different categories

lapis sequoia Feb 3, 2022, 2:03 AM

#

with my repeated values such as

#

for example i have states that sales are made from

#

and those values have the 1

desert oar Feb 3, 2022, 2:03 AM

#

what do you mean by "those values have the 1"?

lapis sequoia Feb 3, 2022, 2:03 AM

#

Screen_Shot_2022-02-02_at_9.03.53_PM.png

#

Is there a way to display this cleaner

desert oar Feb 3, 2022, 2:04 AM

#

ah, you called it on the entire dataframe

#

that counts every unique combination of values across all the columns at once (i.e. unique rows)

#

as you can see it's quite messy, and the ... means that it's cutting off rows in the middle

#

also, does it make sense to compute .value_counts() on order id?

#

it's not a continuous variable

#

it is clearly categorical

#

but does it make sense to measure the counts of order ids?

#

probably not, unless you are expecting lots of repeated order ids!

#

do you see why that is the case?

#

also in general it's much more useful to look at counts of individual variables: each variable is worth examining on its own before trying to examine their relationships

lapis sequoia Feb 3, 2022, 2:05 AM

#

Im wrapping my head around your comments

#

and let me say; I appreciate immensely your help

desert oar Feb 3, 2022, 2:06 AM

#

this is what actual professional data scientists do btw, i am not dumbing anything down for you. this is literally how i start every project

lapis sequoia Feb 3, 2022, 2:06 AM

#

i see the ...

#

it'll cut my rows if i do some

#

data.head() type shit

desert oar Feb 3, 2022, 2:06 AM

#

it's cutting off rows to avoid dumping a huge amount of data

lapis sequoia Feb 3, 2022, 2:06 AM

#

i see what you mean about order id

desert oar Feb 3, 2022, 2:06 AM

#

good!

#

now there are cases when you are interested in the counts of order ids

#

for example, you are given a dataset by some finance person and they claim that every row has a unique order id

#

the first thing you should do with that dataset, is verify that assertion

#

because people make mistakes

#

in that case, you are checking for the counts of order ids to assert that they all are indeed 1

#

otherwise there is a problem w/ the dataset and you need to either 1) make a decision about how to deduplicate rows, or 2) send the data back and tell them to fix it
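
a rough sketch of that uniqueness check, assuming a made-up table with hypothetical column names `order_id` and `state` (your real columns will be named differently):

```python
import pandas as pd

# Hypothetical orders table; "order_id" and "state" are illustrative column names.
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 103, 104],
    "state": ["PA", "OH", "PA", "PA", "NY"],
})

# True only if no order id appears more than once.
ids_are_unique = orders["order_id"].is_unique

# Equivalent check via counts: any id with a count above 1 is a repeat.
id_counts = orders["order_id"].value_counts()
repeated_ids = id_counts[id_counts > 1].index.tolist()

# Show every row involved in a repeat so you can decide what to do with them.
duplicate_rows = orders[orders["order_id"].duplicated(keep=False)]
print(ids_are_unique, repeated_ids)
print(duplicate_rows)
```

in this toy data the id 103 shows up twice, so the check fails and the two offending rows are printed.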

#

de-duplication is a big serious topic in applied data science and i will assume for the most part that you won't have to do it (and you don't have time to worry about it)

#

but i hope it's clear why order id isn't usually useful for data analysis, in this intro-level scenario

#

so let's pause. given what i said above, how could you improve that one output you generated?

#

give me a couple of ideas

#

and yes i am happy to help, i happen to have a few minutes of free time and i get really frustrated when i see that people got sucked into stuff like this and are struggling

lapis sequoia Feb 3, 2022, 2:09 AM

#

Im following

#

digesting;

desert oar Feb 3, 2022, 2:10 AM

#

sure, take your time

lapis sequoia Feb 3, 2022, 2:10 AM

#

I dont feel incapable of doing or learning these things

#

Its just

#

ends gotta be met and I'm stressed as fuck. but Im following - give me a sec to

#

confused by what you are saying here

desert oar Feb 3, 2022, 2:11 AM

#

sure, you definitely seem capable under less-bad circumstances. but even smart people have limits of what they can do with limited time. so don't feel bad that you're struggling

lapis sequoia Feb 3, 2022, 2:11 AM

#

need to understand why 1 is what it is in this situation

desert oar Feb 3, 2022, 2:12 AM

#

i generally think that most people are capable of learning most things, at least conceptually

lapis sequoia Feb 3, 2022, 2:12 AM

#

its not clicking entirely

#

im guessing that

#

youre saying that every ID is unique

desert oar Feb 3, 2022, 2:12 AM

#

lapis sequoia youre saying that every ID is unique

precisely!

lapis sequoia Feb 3, 2022, 2:12 AM

#

but if its not - im combing it to ensure that they are indeed unique

#

if theyre repeated then we have an error

desert oar Feb 3, 2022, 2:12 AM

#

right

lapis sequoia Feb 3, 2022, 2:12 AM

#

bc each order should be its own

desert oar Feb 3, 2022, 2:12 AM

#

if any order id is repeated, then it will have a count of >1

lapis sequoia Feb 3, 2022, 2:12 AM

#

yes!

#

okay so I see the value of the count

#

Understood

desert oar Feb 3, 2022, 2:13 AM

#

right, but i hope you also see why this is different from "data analysis" as such. this is more like checking that the data is "clean" before trying to work with it

lapis sequoia Feb 3, 2022, 2:13 AM

#

desert oar otherwise there is a problem w/ the dataset and you need to either 1) make a dec...

exactly

desert oar Feb 3, 2022, 2:13 AM

#

and therefore why you should treat it separately from variables that you intend to use in the model

lapis sequoia Feb 3, 2022, 2:13 AM

#

heard and understood

#

its the opposite of scanning missing values almost

#

just looking for repeats where they shouldnt be

desert oar Feb 3, 2022, 2:14 AM

#

yep! and great, i'm glad you know enough to look for missing values

lapis sequoia Feb 3, 2022, 2:14 AM

#

Im not a complete idiot I just feel like one because

#

i paid out my ass for this

#

i feel like im disappointing myself and others when im just trying to make shit work out

desert oar Feb 3, 2022, 2:15 AM

#

i basically spent 2021-2022 so far paying out the ass for things that ended up being kind of a bust, or overpaying for things that i should have paid less for

#

it sucks

lapis sequoia Feb 3, 2022, 2:15 AM

#

if i didnt care i wouldnt be embarrassing myself on discord

desert oar Feb 3, 2022, 2:15 AM

#

you are not embarrassing yourself and you shouldn't be embarrassed

#

taking a course and working a full time job is hard enough

#

it's worse when the course is clearly bad and you are unsupported as a student

lapis sequoia Feb 3, 2022, 2:15 AM

#

I work for a voting machine company and

desert oar Feb 3, 2022, 2:15 AM

#

and the fact that it's disgustingly expensive is adding insult to injury. i feel you, and no you shouldn't feel bad about asking for help

lapis sequoia Feb 3, 2022, 2:15 AM

#

basically its crunch time 100% of the time

desert oar Feb 3, 2022, 2:16 AM

#

that sucks too. nobody should have to work like that

lapis sequoia Feb 3, 2022, 2:16 AM

#

im really hoping tomorrow we have the day off bc

#

im in pittsburgh rn but we have a huge winter storm coming in

#

praying the county closes the warehouse so i can actually work on this!

#

otherwise ima be doing other shit from 6am to 5pm

#

and lemme tell you trying to do this shit when ur dead exhausted is

desert oar Feb 3, 2022, 2:16 AM

#

can you take a sick day? don't even get me started on how awful us labor laws are...

lapis sequoia Feb 3, 2022, 2:17 AM

#

I could but itd compromise my future work

desert oar Feb 3, 2022, 2:17 AM

#

well that's fucked up in and of itself

lapis sequoia Feb 3, 2022, 2:17 AM

#

they flew me out here for two weeks

#

so im staying in a hotel and riding airfares on their tab

desert oar Feb 3, 2022, 2:17 AM

#

ah i see, you're onsite somewhere

#

that's rough for sure

#

and it seems like you work enough hours that even applying for other jobs is a big chore

#

ok, so let me try to help you a bit more and at least you can think about this stuff tomorrow a little, even if you can't get hands on

#

the assignment asked for proportions, i.e. the fraction of data points, which you can always convert to a percentage. you can easily get a % from a count by dividing by the total number of data points. so in the heads/tails example above, you have 10 data points, and therefore you have proportions 60% H and 40% T

#

6/10 and 4/10 are 0.6 and 0.4, i.e. 60% and 40%

#

counts and proportions are basically equivalent, but humans are generally bad at numbers so i like to present both when it isn't cumbersome to present both

#

e.g. "our experiment ran 10 times, and we found 6 heads (60%) and 4 tails (40%)"
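
in pandas, a sketch of getting both numbers for the heads/tails example could look like this (the variable names are just for illustration):

```python
import pandas as pd

# The ten flips from the heads/tails example.
flips = pd.Series(list("HHHTTHTHHT"), name="flip")

counts = flips.value_counts()                     # raw counts per value
proportions = flips.value_counts(normalize=True)  # counts divided by the total

# Put counts and percentages side by side for reporting.
summary = pd.DataFrame({"count": counts, "percent": (proportions * 100).round(1)})
print(summary)
```

`normalize=True` does the "divide by the total number of data points" step for you.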

lapis sequoia Feb 3, 2022, 2:22 AM

#

I see

desert oar Feb 3, 2022, 2:24 AM

#

and this is what i was going to say before, about extending a frequency table to two categorical variables:

let's say you are looking at people's clothing, and you are writing down 2 binary variables for each person. (binary means just "yes" and "no", which are represented in python as True and False or 1 and 0, depending on the situation). the two variables in this case are "is the person wearing boots?" and "is the person carrying an umbrella?" so the data might look like this:

boots?   umbrella?
True     False
True     True
False    False
False    True
True     True
False    False
True     False
True     True
False    False

so the crosstab of boots? and umbrella? would look like this:

         umbrella
boots    True  False
True     3     2
False    1     3

that's 9 data points, so you should confirm that the sum of all the numbers in the crosstab is indeed 9
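
as a sketch, the same table can be built with `pandas.crosstab`. the dataframe name `people` is an assumption for the example, the nine rows are the ones listed above:

```python
import pandas as pd

# The nine observations from the table above.
people = pd.DataFrame({
    "boots":    [True, True, False, False, True, False, True, True, False],
    "umbrella": [False, True, False, True, True, False, False, True, False],
})

table = pd.crosstab(people["boots"], people["umbrella"])
print(table)

# Sanity check: the cells should add up to the number of rows.
assert table.to_numpy().sum() == len(people)
```

pandas sorts the categories, so False comes before True in its output, which is the reverse of the hand-written table. the counts themselves are the same.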

#

of course, if you have more than 2 categories in each variable, the crosstab will have more rows or columns

#

and if you have a lot of categories, cross tabs and frequency tables start to get a bit messy and hard to read, in which case you would fall back to other techniques that you probably don't need to worry about right now

#

and of course you can compute proportions for a crosstab too, e.g.:

         umbrella
boots    True  False
True     33%   22%
False    11%   33%

which should add up (approximately) to 100 (in this case it adds up to 99 because of rounding)

#

the crosstab is a "bivariate" analysis, meaning "two variables". whereas the frequency table for a single variable is called "univariate", meaning "one variable".

#

another name for a crosstab that you might see in statistics is a "contingency table"

#

and what's really interesting is that you can recover the frequency table for each variable individually from the crosstab!

         umbrella
boots    True  False | Total
True     3     2     | 5
False    1     3     | 4
---------------------+------
Total    4     5     | 9
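
a sketch of the proportions and the totals in code, again using a made-up dataframe name `people` with the nine rows from the example. i'm doing the arithmetic by hand here so each step is visible; `pandas.crosstab` also has `normalize` and `margins` parameters that can do this for you:

```python
import pandas as pd

people = pd.DataFrame({
    "boots":    [True, True, False, False, True, False, True, True, False],
    "umbrella": [False, True, False, True, True, False, False, True, False],
})
table = pd.crosstab(people["boots"], people["umbrella"])

# Proportions: divide every cell by the total number of observations.
percent = (table / len(people) * 100).round(0)

# Margins: summing across a row or down a column recovers each
# variable's own frequency table.
boots_totals = table.sum(axis=1)     # one total per boots value
umbrella_totals = table.sum(axis=0)  # one total per umbrella value

print(percent)
print(boots_totals)
print(umbrella_totals)
```

the row totals match what `people["boots"].value_counts()` would give you, and the column totals match `people["umbrella"].value_counts()`.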

#

maybe just ruminate on that for a while

lapis sequoia Feb 3, 2022, 2:33 AM

#

ruminating

desert oar Feb 3, 2022, 2:33 AM

#

feel free to @ me with questions, i need to work on something for a bit

lapis sequoia Feb 3, 2022, 2:34 AM

#

I appreciate everything man

#

Im going to add you

serene scaffold Feb 3, 2022, 3:14 AM

#

salt rock lamp is so fucking good

sleek tapir Feb 3, 2022, 4:48 AM

#

is real analysis important for ml

hollow sentinel Feb 3, 2022, 5:15 AM

#

i googled it and it says it's not that important, but idk

#

also did not know what real analysis was before you asked

#

sounds like proof-based stuff

iron basalt Feb 3, 2022, 5:38 AM

#

sleek tapir is real analysis important for ml

Yes and no. It might come up, but its benefits are more indirect (but not insignificant). It will teach you how to think and make you more comfortable with mathematics in general. An important skill to have when reading / understanding other's work. It also depends on what you consider to be part of "real analysis".

#

(If you want to really understand how probability works (which can come up if you are doing very experimental ML), then you need it)

desert oar Feb 3, 2022, 5:40 AM

#

serene scaffold salt rock lamp is so fucking good

i cant believe you saved this 😆

steel mantle Feb 3, 2022, 5:40 AM

#

Where is best to get started with data science?
Hope someone mentors me from here

iron basalt Feb 3, 2022, 5:41 AM

#

So while it's not directly needed, I would still recommend it, just for getting into the right head space.

#

(Also it's fun, if you like math)

sleek tapir Feb 3, 2022, 5:50 AM

#

wait wat degree did u guys have

#

before going data science

#

or mle

#

im thinking of doing either

#

im from australia

worthy nest Feb 3, 2022, 7:24 AM

#

steel mantle Where is best to get started with data science? Hope someone mentors me from her...

It depends on what your background is. Give people here more information about your training and degree/major, and someone will respond.

flint grotto Feb 3, 2022, 9:12 AM

#

hello.

#

can you recommend data science books?

#

for O'Reilly books.

earnest fog Feb 3, 2022, 9:59 AM

#

flint grotto can you recommend data science books?

z-lib.org

#

search for O’REILLY

#

You should find many books

#

look for the 2019-2021 ones

odd meteor Feb 3, 2022, 10:27 AM

#

sleek tapir wait wat degree did u guys have

Data Science is an interdisciplinary field so alotta people in this field started off from different backgrounds. I have a friend who has a major in Fishery but he's working as a Data Scientist now 😀

In essence, you might as well study Human Kinetics and Sports Education and still end up working as a Data Scientist if you put in the much needed work to learn it.

Notwithstanding, going for a major in Mathematics, Computer Science or Statistics will definitely offer you more options and give you an edge over others.

digital stirrup Feb 3, 2022, 10:29 AM

#

Hey guys,
I'm trying to use the gpt-3 question answering function.
Anyone have clue how to use it?
For example, I want to create a bot that acts like a real human with a consistent personality: if someone asks its age, it should always give the same age, just phrased in a different way

sleek tapir Feb 3, 2022, 10:29 AM

#

odd meteor Data Science is an interdisciplinary field so alotta people in this field starte...

Im stats cs

#

how much do most data scientists make in aud

#

stats is hard

#

im doing thesis

sweet sequoia Feb 3, 2022, 10:59 AM

#

I tried installing the chess module using pip. It said the dependency was satisfied, but when I use any of its functions, it does not work?

#

please help someone

nova smelt Feb 3, 2022, 11:02 AM

#

Hey, i am having a weird problem with training a NN. When it is through about 3/4 of training the loss suddenly gets the value nan

#

at around the point of the red square. before that it has normal numerical value
any idea why this happens?

eager imp Feb 3, 2022, 11:06 AM

#

Has any of you any experience in combining genetic programming with ML?

sweet sequoia Feb 3, 2022, 11:08 AM

#

odd meteor Feb 3, 2022, 11:33 AM

#

sleek tapir Im stats cs

I'm not sure I understand what 'lm' means though. I have a major in Statistics. And I believe it's not really that hard.

I enjoyed Stats more than Math. I even picked more CS electives than Math electives when I was in school.

Goodluck on your thesis ✌️

sleek tapir Feb 3, 2022, 11:35 AM

#

i am stats is hard

sleek tapir Feb 3, 2022, 11:36 AM

#

odd meteor I'm not sure I understand what 'lm' means though. I have a major in Statistics. ...

r u australian

#

how is stats not hard theres so much proof writing

#

i struggle in bayesian the most

#

how bout stochastic calculus the list goes on

odd meteor Feb 3, 2022, 11:39 AM

#

sweet sequoia I have tried importing the chess module using pip. It said dependency satisfied ...

Go to your windows command prompt and ensure the library you installed is pointing to the right PATH.

Also, check if you're working in the same environment where the library was installed

odd meteor Feb 3, 2022, 11:40 AM

#

sleek tapir r u australian

No. I'm Nigerian

sleek tapir Feb 3, 2022, 11:41 AM

#

o lol

#

i'm chinese then

sweet sequoia Feb 3, 2022, 11:42 AM

#

odd meteor Go to your windows command prompt and ensure the library you installed is pointi...

how do I check where the library is installed

sleek tapir Feb 3, 2022, 11:42 AM

#

just celebrated new year

odd meteor Feb 3, 2022, 11:42 AM

#

nova smelt Hey, i am having a weird problem with training a NN. When it its through about 3...

It could be an exploding or vanishing gradient problem; a loss that suddenly turns into nan more often points to exploding gradients. Investigate further to confirm or rule that out

sweet sequoia Feb 3, 2022, 11:43 AM

#

sweet sequoia how do I check where the library is installed

?

nova smelt Feb 3, 2022, 11:43 AM

#

odd meteor It's probably a vanishing gradient problem. Investigate further to ensure it's n...

hmm okay. Do you know any resources that talk about this or do you know how i can find that out?

sleek tapir Feb 3, 2022, 11:43 AM

#

im struggling in titanic

odd meteor Feb 3, 2022, 11:54 AM

#

sleek tapir how is stats not hard theres so much proof writing

To be honest I ain't gon lie, the proving part is what I love most (especially when it's going well) 😀

You could legit use up 3 sheets of paper to prove a stats equation. Of all the proofing I did I enjoyed Experimental Design, Confounding, and Gambler's Ruin class the most.

That's not to say there are no topics in Stats I don't enjoy. I particularly don't enjoy sample survey classes lol.

One of the ways to get past what you struggle with is to:

Make friends with the brilliant guys in your class, ask them to help you understand the concepts you struggle with.
Attend after-lecture tutorials (if such exist in your class)

odd meteor Feb 3, 2022, 12:04 PM

#

sweet sequoia how do I check where the library is installed

In your command prompt, search for that chess library in maybe the scripts folder (could be different on your pc) and ensure it's loaded in the right directory where your python is installed.

There are different way to solve this problem tho. Check stackoverflow.com

Are you working on Jupyter Notebook? When you try importing this chess library in JNB do you get any error message?
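
One way to check this from inside Python itself, as a rough sketch: the helper name `where_is` is made up for this example, and `chess` is just the module name from the question. Run it in the same place where the import fails (for example a notebook cell).

```python
import importlib.util
import sys

# Which interpreter is running this code? pip must install into this same one.
print(sys.executable)

def where_is(module_name):
    """Return the file a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin

print(where_is("chess"))  # None means this interpreter cannot see the package
print(where_is("json"))   # a standard-library module, shown for comparison
```

If the first call prints None, the package was probably installed for a different Python. Running `python -m pip install <package>` with the interpreter printed by `sys.executable` installs into the matching environment.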

odd meteor Feb 3, 2022, 12:08 PM

#

nova smelt hmm okay. Do you know any resources that talk about this or do you know how i ca...

I'm lazy to type now but do check this https://www.analyticsvidhya.com/blog/2021/06/the-challenge-of-vanishing-exploding-gradients-in-deep-neural-networks/

Analytics Vidhya

Yash Bohra

Vanishing and Exploding Gradients in Deep Neural Networks

In this article, Vanishing and exploding gradients in a deep neural network is explained and the techniques to solve it

nova smelt Feb 3, 2022, 12:09 PM

#

Thanks. Tho i was dumb and just realised i had some nan values in my data🤦‍♂️

#

thanks for your help tho
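
for anyone hitting the same thing, a quick check along these lines would have caught it before training. this is only a sketch with a made-up array `X`, not my actual data:

```python
import numpy as np

# Hypothetical feature matrix with one missing value, for illustration only.
X = np.array([[0.5, 1.2], [np.nan, 0.7], [0.1, 0.9]])

# Is there any NaN at all, and in which rows?
has_nan = bool(np.isnan(X).any())
rows_with_nan = np.flatnonzero(np.isnan(X).any(axis=1))
print(has_nan, rows_with_nan)

# One simple option: drop the affected rows before training.
X_clean = X[~np.isnan(X).any(axis=1)]
print(X_clean.shape)
```

dropping rows is just one option, filling in the missing values is another, depending on the data.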

sleek tapir Feb 3, 2022, 12:15 PM

#

odd meteor To be honest I ain't gon lie, the proving part is what I love most (especially w...

tbh i find stats better than a cs degree

#

for ml

#

is ai/ml better for maths degrees

#

than cs degrees

sour spindle Feb 3, 2022, 12:44 PM

#

Yeah i split it into train test and validation. The apple one is more reliable since it has more data and i trained it using the same parameters.

#

It isnt a forecast. Its a generated trading strategy

#

I used around 13 different indicators to get the signals

#

Here is the apple ticker used only on test data which wasnt used in the model

#

The accuracy when matching up with the 1s and 0s of the position signal method i used is around 72% on the unseen test data which is shown above

sour spindle Feb 3, 2022, 1:40 PM

#

Tesla went public around 2010 where the graph started and ended in 2022 and the graph includes the training testing and validation data

mild dirge Feb 3, 2022, 1:42 PM

#

sour spindle Tesla went public around 2010 where the graph started and ended in 2022 and the ...

Yeah but the reason I point that out is that the success of the strategy should not be measured by the return on your investment. as simply buying at the beginning and selling somewhere in the last few years would have a 100x+ return
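
A sketch of what that comparison could look like, with made-up prices and a made-up 1/0 position signal (none of these numbers come from the actual model):

```python
import numpy as np

# Hypothetical daily closing prices and a 1/0 position signal (1 = hold the stock).
prices = np.array([100.0, 110.0, 121.0, 108.9, 119.79])
position = np.array([1, 1, 0, 1])  # position held over each of the four intervals

daily_returns = prices[1:] / prices[:-1] - 1          # +10%, +10%, -10%, +10%
buy_and_hold = np.prod(1 + daily_returns) - 1         # return from simply holding
strategy = np.prod(1 + position * daily_returns) - 1  # return when following the signal

print(f"buy and hold: {buy_and_hold:.2%}, strategy: {strategy:.2%}")
```

The number that matters is the gap between the two, not the strategy's return on its own.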

mild dirge Feb 3, 2022, 1:42 PM

#

sour spindle Here is the apple ticker used only on test data which wasnt used in the model

And how does the image show the training and testing data?

sour spindle Feb 3, 2022, 1:43 PM

#

mild dirge Yeah but the reason I point that out is that the success of the strategy should ...

Did you see the year roi and the strat year roi

sour spindle Feb 3, 2022, 1:44 PM

#

mild dirge And how does the image show the training and testing data?

That image is only on data that was unseen by the model

#

It wasnt used in validation either

mild dirge Feb 3, 2022, 1:44 PM

#

how does the model perform on test data that does not increase heavily over a long time?

sour spindle Feb 3, 2022, 1:45 PM

#

mild dirge how does the model perform on test data that does not increase heavily over a lo...

It outperforms by around 2 to 10%

#

I am trying to update the y data generation to get better results

mild dirge Feb 3, 2022, 1:47 PM

#

It's just pretty hard to get reliable tests on these types of prediction models. It looks like a fun project though, but I wouldn't be so sure about the performance by testing it on a few test sets.

#

It's still impressive btw not trying to belittle you or anything

sour spindle Feb 3, 2022, 1:49 PM

#

mild dirge _It's still impressive btw not trying to belittle you or anything_

Thanks

mild dirge Feb 3, 2022, 2:45 PM

#

I wouldn't know yet, ask the question first 😛

#

Please don't dm me. And think about why one would normalize data and if the amount of figures would be relevant.

warm raven Feb 3, 2022, 4:04 PM

#

Hello I am trying to combine the results of a couple masks to make another column in my dataframes indicating whether revenue is recurring or non recurring

#


```python
        pcn_mask = x['prod_code_name'].isin(gfs['prod_code_name']).any()
        #print(pcn_mask)
        pidmap_mask = x['PRODUCT_ID_MAP'].iloc[i].isin(gfs['PRODUCT_ID_MAP']).any()
        #print(pidmap_mask)
        sector_mask = x['Sector'].isin(gfs['Sector']).any()
        #print(sector_mask)

        relevant_product_codes = ["Usf33", "Usf34", "Us756", "Usf37", "Usf40", "Usf29"]
        product_code_mask = gfs['Product_Code'].isin(relevant_product_codes).any()
        #print(product_code_mask)

        relevant_company_codes = ["Us05", "Us1b", "Usm6"]
        company_code_mask = gfs['Company_Code'].isin(relevant_company_codes).any()
        #print(company_code_mask)

        all_masks = (
            (pcn_mask and pidmap_mask and sector_mask) and (product_code_mask or company_code_mask)
        ).all()

        return all_masks
```

#

I keep getting the following error when calling the function “ ‘str’ object has no attribute ‘isin’ “

#

Could I use just ‘in’ and still get the same comparison?

dusk tide Feb 3, 2022, 4:36 PM

#

Hello guys I have a doubt that
In cost function of linear regression we are multiplying the SSE by 1/(2m).
So what's the use of the 1/m, i.e. taking the average??
If I don't do this then what will happen??

prime hearth Feb 3, 2022, 4:46 PM

#

thats a good question, so actually the 1/m is needed for averaging. without it our cost would be quite big and dependent on our data size; taking 1/m removes that dependency and makes the math easier to work with

#

the 1/2 i believe is there to make the math easier when we take the partial derivative. it's a constant, so it doesn't change which parameters minimize the loss, it just cancels the 2 that comes from differentiating the square.
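
written out, assuming the usual notation (m training examples, a linear hypothesis h_θ(x) = θᵀx, and x_j^(i) meaning feature j of example i; these symbols are my assumption, not from the question):

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

\frac{\partial J}{\partial \theta_j}
  = \frac{1}{2m} \sum_{i=1}^{m} 2 \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
  = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

the 2 from the power rule cancels the 1/2, and the remaining 1/m keeps the gradient on the same scale no matter how many examples you have. scaling J by a positive constant does not move its minimum.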

soft viper Feb 3, 2022, 4:53 PM

#

in apriori, which one do i value more. Confidence or lift?

prime hearth Feb 3, 2022, 4:53 PM

#

also, the full derivation of these formulas shows where that math comes from, namely the gaussian formula. you can check out this link, or google "linear regression MAP derivation", which is an alternative to the traditional linear regression approach:
https://math.stackexchange.com/questions/884887/why-divide-by-2m

warm raven Feb 3, 2022, 5:44 PM

#

warm raven I keep getting the following error when calling the function “ ‘str’ object has...

can I please get some help with this when someone gets a chance

sour spindle Feb 3, 2022, 6:37 PM

#

mild dirge I wouldn't know yet, ask the question first 😛

what?

mild dirge Feb 3, 2022, 6:37 PM

#

sour spindle what?

Someone asked if I could help with a question but didn't ask their question, wasn't in response to you

#

they removed their message

sour spindle Feb 3, 2022, 6:38 PM

#

mild dirge Someone asked if I could help with a question but didn't ask their question, was...

oh sorry for pinging then

mild dirge Feb 3, 2022, 6:38 PM

#

oh nw lol

soft peak Feb 3, 2022, 6:40 PM

#

Hi, im having a problem. My code is supposed to highlight the nuclei of several blood cells of specific animals and give me their diameter and radius, which will be used later. The code works fine until it stops and ends at this error

#

#

#

heres the code itself

#

ive ran my code through cmd on admin but it still doesnt work, and it does run for a couple seconds before the code just stops working

#

any form of help is appreciated

mellow vapor Feb 3, 2022, 7:07 PM

#

So basically I'm trying to work my way through projects on ml and ai
I did a few projects on them but the hands on projects usually don't bother with the mathematics and understanding level
Like for neural networks in tensorflow and keras they build models based on layers like dense and models like sequential
With optimisers like adam
So are there any courses which actually explain these things in some detail so that at least i can judge things by myself
And know when to use what and how to use them actually

mild dirge Feb 3, 2022, 7:13 PM

#

You'd probably want to start with a basic course on linear algebra

#

and statistics

#

and after that try to use this information for understanding a machine learning/ neural networks course

serene scaffold Feb 3, 2022, 7:14 PM

#

@mellow vapor https://www.pythondiscord.com/resources/?topics=data-science&difficulty=beginner

Python Discord | Resources

We're a large, friendly community focused around the Python programming language. Our community is open to those who wish to learn the language, as well as those looking to help others.

#

"Data Science from Scratch" is a book that I recommend in that it goes over some of the fundamentals that PcCamel just mentioned

#

if you are a student, I would see if you can get the ebook through your library.

mellow vapor Feb 3, 2022, 7:15 PM

#

@mild dirge I think I do understand the basics of linear algebra and statistics plus a bit calculus but i need some good machine learning courses i guess

#

I did take andrew ngs coursera course on machine learning and it was quite great

#

He explained the mathematical ideas along with the implementation sections

#

@serene scaffold oh that's great I will try to get that book and go through it. Thanks!

serene scaffold Feb 3, 2022, 7:20 PM

#

it might actually not be advanced enough if you feel comfortable with the material in the andrew ng course

#

I'm not really sure what to suggest tangerine_think

mellow vapor Feb 3, 2022, 7:30 PM

#

@serene scaffold it was difficult for me to implement those things in octave bt the videos were good

#

Like any material which can explain things like neural nets in a similar fashion

warm raven Feb 3, 2022, 7:37 PM

#

warm raven can I please get some help with this when someone gets a chance

please?

serene scaffold Feb 3, 2022, 8:21 PM

#

warm raven I keep getting the following error when calling the function “ ‘str’ object has...

isin is a method of a pandas.Series, but you're using it on a string. Which line, exactly, is the one that causes the error? (Be sure to always share the whole error message, starting from Traceback, as that would answer the question.)

warm raven Feb 3, 2022, 9:47 PM

#

serene scaffold `isin` is a method of a `pandas.Series`, but you're using it on a string. Which ...

AttributeError                            Traceback (most recent call last)
C:\Users\PBWEWU~1\AppData\Local\Temp/ipykernel_9948/2090180030.py in <module>
      1 x = []
----> 2 x = get_rec_value(pipe_short)

C:\Users\PBWEWU~1\AppData\Local\Temp/ipykernel_9948/3247770990.py in get_rec_value(x)
      4 
      5     for i in range(x.shape[0]):
----> 6         pcn_mask = x['prod_code_name'].iloc[i].isin(gfs['prod_code_name']).any()
      7         #gfs['prod_code_name'] == x['prod_code_name']
      8         print(pcn_mask)

AttributeError: 'str' object has no attribute 'isin'

mint palm Feb 3, 2022, 9:54 PM

#

In Resnets, do we really skip calculation of a[l+1] layer??

serene scaffold Feb 3, 2022, 10:02 PM

#

warm raven ```--------------------------------------------------------------------------- A...

x['prod_code_name'].iloc[i] must be an individual string, whereas isin is a Series method, like I said.

You can probably accomplish what you're trying to do without any loops. What are x and gfs?
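
A minimal sketch of the difference, assuming two small made-up dataframes (only the column name `prod_code_name` comes from the traceback; the values are hypothetical). It shows why `.iloc[i]` hands back a plain string, and how the same check can run on the whole column without a loop:

```python
import pandas as pd

# Hypothetical stand-ins for the two dataframes; only the column name
# 'prod_code_name' comes from the traceback, the values are made up.
x = pd.DataFrame({"prod_code_name": ["alpha", "beta", "gamma"]})
gfs = pd.DataFrame({"prod_code_name": ["beta", "delta"]})

# .iloc[i] on a column of strings gives back a plain Python str ...
single_value = x["prod_code_name"].iloc[0]
print(type(single_value).__name__)    # str
print(hasattr(single_value, "isin"))  # False

# ... whereas calling isin on the whole column checks every row at once
# and returns a boolean Series, no loop needed.
pcn_mask = x["prod_code_name"].isin(gfs["prod_code_name"])
print(pcn_mask.tolist())              # [False, True, False]
```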

hollow sentinel Feb 3, 2022, 10:05 PM

#

hey i'm having a very strange problem

#

i cannot graph simple data anymore on my jupyter notebook

#

import pandas as pd
from matplotlib import pyplot as plt

plt.style.use("seaborn")


x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1]




colors = [7, 5, 9, 7, 5, 7, 2, 5, 3, 7, 1, 2, 8, 1, 9, 2, 5, 6, 7, 5]

plt.scatter(x, y, s=100, c="green", edgecolor = "black", linewidth = 1, alpha=0.75)

plt.show()

#

i keep getting a "dead kernel" error

#

what can i do to fix this?

serene scaffold Feb 3, 2022, 10:06 PM

#

hollow sentinel i keep getting a "dead kernel" error

you have to restart the kernel.

#

the problem is unrelated to your code; the jupyter environment has stopped.

hollow sentinel Feb 3, 2022, 10:07 PM

#

can i do that with restart and run all code?

#

bc i tried that

#

and it still wouldn't work

serene scaffold Feb 3, 2022, 10:08 PM

#

did you get a more substantial error message than "dead kernel"? or is that the only text that displayed?

hollow sentinel Feb 3, 2022, 10:08 PM

#

"The kernel appears to have died. It will restart automatically."

#

this error just constantly pops up

#

shit, is it related to the millions of rows of data i was dealing w before for that internship?

#

ugh

#

did i melt my computer?

serene scaffold Feb 3, 2022, 10:09 PM

#

I googled that error message, and it appears that there's a few possible causes. try looking at them to see if any relate to something you're doing.

hollow sentinel Feb 3, 2022, 10:09 PM

#

should i try hitting ctrl + C on the terminal, closing out of the conda environment, and then opening it up again?

serene scaffold Feb 3, 2022, 10:10 PM

#

hollow sentinel shit, is it related to the millions of rows of data i was dealing w before for t...

if it has to do with having too much data, you would have gotten a memory error.

hollow sentinel Feb 3, 2022, 10:10 PM

#

i see

serene scaffold Feb 3, 2022, 10:10 PM

#

hollow sentinel should i try hitting ctrl + C on the terminal, closing out of the conda environe...

that might fix it shrug2 I don't use conda.

hollow sentinel Feb 3, 2022, 10:10 PM

#

i am beginning to not like conda too

#

i don't like how my code is in sep cells and i have to run the entire thing over and over

#

ik there is a restart and run all code thing

serene scaffold Feb 3, 2022, 10:11 PM

#

hollow sentinel i don't like how my code is in sep cells and i have to run the entire thing over...

that's a jupyter thing, not a conda thing

hollow sentinel Feb 3, 2022, 10:11 PM

#

oh

serene scaffold Feb 3, 2022, 10:11 PM

#

but yes, jupyter notebooks are overused among data scientists as well.

hollow sentinel Feb 3, 2022, 10:11 PM

#

yeah i see why you dislike it now

serene scaffold Feb 3, 2022, 10:11 PM

#

I'm so proud lemon_hyperpleased

hollow sentinel Feb 3, 2022, 10:12 PM

#

i'm gonna do some more googling and figure out what's going on

#

i think i wanna switch to sublime text

serene scaffold Feb 3, 2022, 10:14 PM

#

by the way, if you want a nice environment for quickly testing stuff, but don't want the false sense of reproducibility that you get from jupyter notebooks, try python -m IPython

#

it's basically the regular python console, but with lots of quality-of-life features

hollow sentinel Feb 3, 2022, 10:15 PM

#

python -m IPython in the terminal?

serene scaffold Feb 3, 2022, 10:15 PM

#

yes

hollow sentinel Feb 3, 2022, 10:15 PM

#

i will check it out

serene scaffold Feb 3, 2022, 10:15 PM

#

if you have jupyter, you should already have it

#

jupyter is basically IPython but with a gui. and cells. ||and sadness||

hollow sentinel Feb 3, 2022, 10:15 PM

#

also don't get angry but i have been using excel lately

serene scaffold Feb 3, 2022, 10:16 PM

#

use pandas.

hollow sentinel Feb 3, 2022, 10:16 PM

#

ik ik ik

#

but for some reason it's like putting honey out for bees

#

for recruiters

serene scaffold Feb 3, 2022, 10:16 PM

#

yeah, I had excel on my resume

hollow sentinel Feb 3, 2022, 10:16 PM

#

ok, so this is strange printing hello world in a notebook prints hello world

serene scaffold Feb 3, 2022, 10:16 PM

#

but I've never been asked to use it, so I'll probably delete it if I ever job hunt again

hollow sentinel Feb 3, 2022, 10:17 PM

#

don't think you gotta job hunt for a while w that mitre job you got now

#

congrats on that btw

serene scaffold Feb 3, 2022, 10:17 PM

#

thx

hollow sentinel Feb 3, 2022, 10:17 PM

#

ok, so strangely this particular notebook will not produce the intended behavior even tho a separate notebook with me printing hello world will

#

lemme see if i can just copy paste it into another notebook for funsies

serene scaffold Feb 3, 2022, 10:17 PM

#

f u n s i e s

hollow sentinel Feb 3, 2022, 10:18 PM

#

ok, now i'm stumped

#

is there something wrong with the import statements?

#

import pandas as pd
from matplotlib import pyplot as plt

plt.style.use("seaborn")


x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1]




colors = [7, 5, 9, 7, 5, 7, 2, 5, 3, 7, 1, 2, 8, 1, 9, 2, 5, 6, 7, 5]

plt.scatter(x, y, s=100, c="green", edgecolor = "black", linewidth = 1, alpha=0.75)

plt.show()

serene scaffold Feb 3, 2022, 10:18 PM

#

looks fine to me

#

oh, you can also do python -m IPython --matplotlib

#

and it will show figures in a separate window when you .show them

#

try that

hollow sentinel Feb 3, 2022, 10:19 PM

#

interesting, command not found

#

typo?

#

oh yes it is a typo lol

#

"no module named IPython"

serene scaffold Feb 3, 2022, 10:20 PM

#

pip install IPython, I guess

hollow sentinel Feb 3, 2022, 10:20 PM

#

sudo pip install ipython

#

yeah

#

oh quick question

#

what's the diff b/w sudo and sudo pip

#

i never asked before

serene scaffold Feb 3, 2022, 10:21 PM

#

sudo and pip are unrelated. sudo means "super user do"

hollow sentinel Feb 3, 2022, 10:21 PM

#

so does it do anything special?

serene scaffold Feb 3, 2022, 10:22 PM

#

you put it before commands that are restricted to administrators. on your own computer, you presumably have that. whereas on a production system, it's usually limited to only a few people.

hollow sentinel Feb 3, 2022, 10:22 PM

#

oh, so you don't necessarily need it for your own personal computer?

#

!pastebin

arctic wedgeBOT Feb 3, 2022, 10:22 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

serene scaffold Feb 3, 2022, 10:22 PM

#

you probably don't need it for pip.

hollow sentinel Feb 3, 2022, 10:23 PM

#

https://paste.pythondiscord.com/ezefihatah.php

warm raven Feb 3, 2022, 10:23 PM

#

serene scaffold `x['prod_code_name'].iloc[i]` must be an individual string, whereas `isin` is a ...

x is the pseudo name for whatever dataframe i place in the function

hollow sentinel Feb 3, 2022, 10:23 PM

#

yeah, i have no idea what's going on here

warm raven Feb 3, 2022, 10:23 PM

#

gfs is another dataframe that holds those product and company codes

serene scaffold Feb 3, 2022, 10:24 PM

#

warm raven x is the pseudo name for whatever dataframe i place in the function

a "pseudo name for whatever you put in a function" is called a parameter. what dataframe did you pass, in this case?

hollow sentinel Feb 3, 2022, 10:24 PM

#

am i so dead inside that i spelled IPython wrong?

#

no, i didn't

warm raven Feb 3, 2022, 10:24 PM

#

serene scaffold a "pseudo name for whatever you put in a function" is called a parameter. what d...

another dataframe i have called pipeline

serene scaffold Feb 3, 2022, 10:24 PM

#

it's case sensitive

warm raven Feb 3, 2022, 10:24 PM

#

the iloc i is 100% a string

serene scaffold Feb 3, 2022, 10:24 PM

#

you did python -m Ipython --matplotlib

#

has to be IPython

serene scaffold Feb 3, 2022, 10:25 PM

#

warm raven the iloc i is 100% a string

right, so strings don't have an isin method. you have to call it from a whole Series

hollow sentinel Feb 3, 2022, 10:25 PM

#

wait what was the original command? pip install IPython?

warm raven Feb 3, 2022, 10:25 PM

#

I’m looking to rewrite this function in some type of way so that I could use a ‘.apply’ to create a new column that has whether the result of these masks are true or false

hollow sentinel Feb 3, 2022, 10:25 PM

#

python -m IPython --matplotlib

#

hey, it worked

#

i'm not used to seeing code in my own terminal lol

#

this is cool

serene scaffold Feb 3, 2022, 10:26 PM

#

warm raven I’m looking to rewrite this function in some type of way so that I could use a ‘...

apply is bad because it doesn't benefit from any of pandas' optimizations

#

can you show all the dataframes involved here with print(df.head().to_dict('list'))? that way I can copy and paste them directly.
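
For context, a small sketch of why that statement is handy, using a made-up dataframe (the column names and values here are hypothetical). The printed dict can be pasted straight back into the `DataFrame` constructor by whoever is helping:

```python
import pandas as pd

# Hypothetical dataframe; the column names are made up for illustration.
df = pd.DataFrame({"code": ["a", "b", "c"], "count": [1, 2, 3]})

# The requested format: a dict of column name -> list of values.
as_dict = df.head().to_dict("list")
print(as_dict)  # {'code': ['a', 'b', 'c'], 'count': [1, 2, 3]}

# Pasting that dict into the DataFrame constructor rebuilds the same rows.
rebuilt = pd.DataFrame(as_dict)
print(rebuilt.equals(df))
```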

hollow sentinel Feb 3, 2022, 10:28 PM

#

oh yeah do not mess w apply

warm raven Feb 3, 2022, 10:28 PM

#

I’m not following

serene scaffold Feb 3, 2022, 10:28 PM

#

hollow sentinel oh yeah do not mess w apply

it's necessary sometimes.

hollow sentinel Feb 3, 2022, 10:28 PM

#

sometimes

#

yeah

warm raven Feb 3, 2022, 10:28 PM

#

Why do you need to know the dataframes involved?

#

I tried that command and got an error because I don’t have a dataframe named “df”

serene scaffold Feb 3, 2022, 10:29 PM

#

warm raven I’m not following

I can't help you with dataframe operations unless I know what is actually in the dataframes that you're working with, because every dataframe is different. the columns, their names, what types of data they have. if I don't know that, there's nothing I can do.

serene scaffold Feb 3, 2022, 10:29 PM

#

warm raven I tried that command and got an error because I don’t have a dataframe named “df...

you have to replace df with the name of a dataframe.

#

in "normal python", this isn't usually necessary, since "nums is a list of ints" pretty much tells you anything you'd need to know. not as simple with dataframes.

hollow sentinel Feb 3, 2022, 10:31 PM

#

i'm confused on when you have to do reassignments with dataframes

#

and when you don't

serene scaffold Feb 3, 2022, 10:31 PM

#

hollow sentinel i'm confused on when you have to do reassignments with dataframes

you'll just have to refer to the docs to see if the operation returns a new dataframe or modifies it in place. most return a new one.

hollow sentinel Feb 3, 2022, 10:32 PM

#

modifies in place would mean that i would have to reassign it, right?

#

or no

#

i actually don't know the answer to that

#

🥲

serene scaffold Feb 3, 2022, 10:32 PM

#

stuff = [1, 2, 3]
stuff.append(5)

list.append modifies a list in-place and returns None
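
A short sketch of the two behaviours side by side, with a made-up dataframe for the pandas half. An in-place operation needs no reassignment (and reassigning its `None` result would lose the object), while an operation that returns a new object does need one:

```python
import pandas as pd

# list.append changes the existing list and returns None,
# so reassigning its result would throw the list away.
stuff = [1, 2, 3]
result = stuff.append(5)
print(stuff)   # [1, 2, 3, 5]
print(result)  # None

# Most pandas methods go the other way: they leave the original alone and
# return a new object, so the result has to be assigned to a name.
df = pd.DataFrame({"a": [1.0, None, 3.0]})  # hypothetical data
cleaned = df.dropna()
print(len(df), len(cleaned))  # 3 2
```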

hollow sentinel Feb 3, 2022, 10:33 PM

#

i see

#

oh, so i was right

#

i had a feeling i was right i saw a bunch of leetcode problems w solving things in place

warm raven Feb 3, 2022, 10:33 PM

#

output is too long

serene scaffold Feb 3, 2022, 10:33 PM

#

warm raven output is too long

!paste

arctic wedgeBOT Feb 3, 2022, 10:33 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hollow sentinel Feb 3, 2022, 10:33 PM

#

...:  
   ...: plt.style.use('seaborn') 
   ...:  
   ...: x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1] 
   ...: y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1] 
   ...:  
   ...:  
   ...: # colors = [7, 5, 9, 7, 5, 7, 2, 5, 3, 7, 1, 2, 8, 1, 9, 2, 5, 6, 7, 5] 
   ...:  
   ...: # sizes = [209, 486, 381, 255, 191, 315, 185, 228, 174, 
   ...: #          538, 239, 394, 399, 153, 273, 293, 436, 501, 397, 539] 
   ...:  
   ...: # data = pd.read_csv('2019-05-31-data.csv') 
   ...: # view_count = data['view_count'] 
   ...: # likes = data['likes'] 
   ...: # ratio = data['ratio'] 
   ...:  
   ...: # plt.title('Trending YouTube Videos') 
   ...: # plt.xlabel('View Count') 
   ...: # plt.ylabel('Total Likes') 
   ...:  
   ...: plt.tight_layout() 
   ...:  
   ...: plt.show()                                                              

In [2]: Segmentation fault: 11

#

the file isn't there

serene scaffold Feb 3, 2022, 10:34 PM

#

what file

warm raven Feb 3, 2022, 10:34 PM

#

warm raven ``` def get_rec_value(x): pcn_mask = x['prod_code_name'].isin(gfs['prod...

this is my code snippet i’m not sure if you saw from above

hollow sentinel Feb 3, 2022, 10:34 PM

#

i ean

#

mean

#

there is no file

#

it's just plotting from two lists alone

#

one as the x and one as the y

serene scaffold Feb 3, 2022, 10:35 PM

#

warm raven this is my code snippet i’m not sure if you saw from above

yes but I need to see the dataframes themselves. I showed you the statement that prints them in a useable way.

hollow sentinel Feb 3, 2022, 10:35 PM

#

actually i'm dumb

serene scaffold Feb 3, 2022, 10:35 PM

#

just seeing code that has dataframes in them doesn't give me enough information to know what to do.

hollow sentinel Feb 3, 2022, 10:35 PM

#

hang on

#

ugh i just wanna get sublime

#

gonna check out schafer

#

brb

warm raven Feb 3, 2022, 10:38 PM

#

hastebin - oyecufetiq

https://paste.pythondiscord.com/oyecufetiq.lua

serene scaffold Feb 3, 2022, 10:39 PM

#

one moment

#

@warm raven

In [16]: pipeline['PRODUCT_ID_MAP'].isin(gfs['PRODUCT_ID_MAP'])
Out[16]:
0    False
1    False
2    False
3    False
4    False
Name: PRODUCT_ID_MAP, dtype: bool

In [17]: pipeline['PRODUCT_ID_MAP'].isin(gfs['PRODUCT_ID_MAP']).any()
Out[17]: False

#

see how isin returns a boolean Series?

#

you don't want to do it for individual values, as that won't work.

warm raven Feb 3, 2022, 10:43 PM

#

right so should I use IN?

serene scaffold Feb 3, 2022, 10:43 PM

#

no, you should restructure the solution to use isin

warm raven Feb 3, 2022, 10:44 PM

#

how would I do that though to achieve getting the result value for every row, or essentially creating a new column in the dataframe for the result of “all_masks”

serene scaffold Feb 3, 2022, 10:44 PM

#

all_masks = (
            (pcn_mask and pidmap_mask and sector_mask) and (product_code_mask or company_code_mask)
        ).all()

this won't work because you can't use and and or for pandas objects. you have to use the & and | operators.
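
A minimal sketch of that point, using two made-up boolean Series. Note that plain bools, such as the result of calling `.any()` on a mask, do work with `and` and `or`, which may be why the original snippet appeared to run:

```python
import pandas as pd

# Hypothetical boolean masks, one value per row.
a = pd.Series([True, True, False])
b = pd.Series([True, False, False])

# `and` asks Python for a single True/False for the whole Series,
# which pandas refuses to guess at.
try:
    a and b
except ValueError as err:
    print("and failed:", err)

# & and | work element-wise and give back a Series.
print((a & b).tolist())  # [True, False, False]
print((a | b).tolist())  # [True, True, False]
```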

warm raven Feb 3, 2022, 10:44 PM

#

I disagree

#

I was stuck on this about a week or so ago

#

I was using bitwise operators

serene scaffold Feb 3, 2022, 10:45 PM

#

you disagree. I said that you can't use and and or for pandas objects, and that is a fact.

warm raven Feb 3, 2022, 10:45 PM

#

was getting errors until i switched over to and and r

#

or*

#

listen i’m not trying to be rude i’m telling you what I’ve tried

serene scaffold Feb 3, 2022, 10:46 PM

#

well, you can't chain bitwise operators with pandas objects, so that might be why you were having an issue

#

you'd have to concatenate the Series into a DataFrame and use any or all.

warm raven Feb 3, 2022, 10:47 PM

#

serene scaffold <@!661619415577788436> ```py In [16]: pipeline['PRODUCT_ID_MAP'].isin(gfs['PRODU...

did you remove the comment to print the product ID mask every time?

serene scaffold Feb 3, 2022, 10:48 PM

#

no

warm raven Feb 3, 2022, 10:50 PM

#

okay sorry I re-read a bit and I see our discrepancy

#

I accidentally sent an old snippet, although my error is the same

hollow sentinel Feb 3, 2022, 10:50 PM

#

[Finished in 2.2s with exit code -11]
[shell_cmd: python -u "/Users/rahuldas/Desktop/Project Folder Sublime Text/matplotlibstuff.py"]
[dir: /Users/rahuldas/Desktop/Project Folder Sublime Text]
[path: /usr/bin:/bin:/usr/sbin:/sbin]

#

import pandas as pd 
from matplotlib import pyplot as plt 

plt.style.use("seaborn")


x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1]


plt.tight_layout() 

plt.show()

#

um i googled it, idk what exit code -11 is

warm raven Feb 3, 2022, 10:51 PM

#

serene scaffold no

This function works when not using a .apply, although it returns one result.

hollow sentinel Feb 3, 2022, 10:51 PM

#

should i go grab a help channel

#

imma go do that so you can help this guy

warm raven Feb 3, 2022, 10:51 PM

#

my fault bro

warm raven Feb 3, 2022, 10:52 PM

#

warm raven This function works when not using a .apply, although it returns one result.

i’m trying to get it to return a result for every row of the dataframe

serene scaffold Feb 3, 2022, 10:53 PM

#

pcn_mask = x['prod_code_name'].isin(gfs['prod_code_name']).any()
pidmap_mask = x['PRODUCT_ID_MAP'].isin(gfs['PRODUCT_ID_MAP']).any()
sector_mask = x['Sector'].isin(gfs['Sector']).any()

first = pd.concat(
    (pcn_mask, pidmap_mask, sector_mask),
    axis=1
).all(axis=1)

product_code_mask = gfs['Product_Code'].isin(["Usf33", "Usf34", "Us756", "Usf37", "Usf40", "Usf29"]).any()
company_code_mask = gfs['Company_Code'].isin(["Us05", "Us1b", "Usm6"]).any()

second = product_code_mask | company_code_mask
return first & second

#

I think this is the solution but I didn't test it.

lapis sequoia Feb 3, 2022, 10:54 PM

#

Hey again

serene scaffold Feb 3, 2022, 10:55 PM

#

added axis=1 to .all in one of them

lapis sequoia Feb 3, 2022, 10:55 PM

#

if I’m working on this linear regression model. Do I need to remove null values from my set ?

serene scaffold Feb 3, 2022, 10:55 PM

#

lapis sequoia if I’m working on this linear regression model. Do I need to remove null values ...

I think so.

lapis sequoia Feb 3, 2022, 10:55 PM

#

okay

#

I’m going to try and get this shit running here within the next day or so

#

Well I have to essentially finish it today smfh

warm raven Feb 3, 2022, 10:57 PM

#

serene scaffold I think this is the solution but I didn't test it.

Gave an error on the pd.concat call for first

#

“cannot concatenate object of type ‘<class ‘numpy.bool_’>’; only Series and DataFrame objs are valid”

serene scaffold Feb 3, 2022, 11:03 PM

#

warm raven “Cannot concatenation object type ‘<class ‘numpy.bool_’>’; only Series and Dataf...

did you remove all the calls to iloc?

#

oh, I see the problem

warm raven Feb 3, 2022, 11:04 PM

#

I fixed it

#

so your function works

#

but it still does not do exactly what I’ve been asking for

serene scaffold Feb 3, 2022, 11:04 PM

#

tangerine_think

warm raven Feb 3, 2022, 11:04 PM

#

i’ve made a short dataframe to test it with, it’s still returning one result

#

The short dataframe has 5 rows

serene scaffold Feb 3, 2022, 11:05 PM

#

it occurs to me that all the calls to .any() are probably wrong.

#

since if you call any or all on a series, that reduces it to a stand-alone bool
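
A rough sketch of the per-row version, assuming small made-up dataframes (only the column names come from the conversation). Dropping `.any()` keeps one value per row, so the combined mask can be stored as a new column. The product and company code masks are left out here because they were computed on `gfs`, not on the input dataframe:

```python
import pandas as pd

# Hypothetical data; only the column names come from the conversation.
gfs = pd.DataFrame({
    "prod_code_name": ["p1", "p2"],
    "PRODUCT_ID_MAP": [10, 20],
    "Sector": ["energy", "tech"],
})
x = pd.DataFrame({
    "prod_code_name": ["p1", "p9", "p2"],
    "PRODUCT_ID_MAP": [10, 20, 99],
    "Sector": ["energy", "tech", "tech"],
})

# Without .any(), each mask keeps one True/False per row of x.
pcn_mask = x["prod_code_name"].isin(gfs["prod_code_name"])
pidmap_mask = x["PRODUCT_ID_MAP"].isin(gfs["PRODUCT_ID_MAP"])
sector_mask = x["Sector"].isin(gfs["Sector"])

# Combine row by row and store the result as a new column.
x["all_masks"] = pcn_mask & pidmap_mask & sector_mask
print(x["all_masks"].tolist())  # [True, False, False]
```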

warm raven Feb 3, 2022, 11:05 PM

#

yeah makes sense

#

that’s why I had the iloc in there initially thinking I’d compare the row values

#

or rather that one row of the input dataframe to compare against every row of gfs

lapis sequoia Feb 3, 2022, 11:06 PM

#

whats the best way to go about

#

removing null values

#

after running my df.isna().sum()

#

S.No. 0
Name 0
Location 0
Year 0
Kilometers_Driven 0
Fuel_Type 0
Transmission 0
Owner_Type 0
Mileage 2
Engine 46
Power 175
Seats 53
New_Price 0
Price 1234
dtype: int64

hollow sentinel Feb 3, 2022, 11:08 PM

#

"Sublime Text is not a Python Package installer, just a text editor. With it, you can edit a python script. When you are done editing, you just launch your script using python script.py"

#

shit bro

#

i might just use spyder

#

i'll mess around w spyder when i have the time

#

rn i got bigger fish to fry

desert oar Feb 3, 2022, 11:45 PM

#

lapis sequoia removing null values

ask yourself why the values are null in the first place

lapis sequoia Feb 3, 2022, 11:49 PM

#

desert oar ask yourself _why_ the values are null in the first place

lemme think,

desert oar Feb 3, 2022, 11:50 PM

#

lapis sequoia lemme think,

the reason for them being null can significantly change how you handle it

#

sometimes you just want to drop those rows entirely

#

other times it makes sense to "impute" a value - basically replace the null with an educated guess

#

missing data imputation is a huge field too, and something that you don't want to spend a lot of time on right now probably

#

i assume this was at least mentioned in your course?

lapis sequoia Feb 3, 2022, 11:52 PM

#

Imputing was but

#

Right now we've covered preprocessing but I havent had that much time to dive in

#

What im HOPING for is that we get this freeze here in pittsburgh so i can work tomorrow

desert oar Feb 3, 2022, 11:54 PM

#

well what did they talk about with respect to imputing data? the most basic choices include filling the missing values with the mean, median, or mode

#

usually missing data imputation is a matter of understanding what the data means and where it comes from, and letting that guide you to a sensible approach
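
As a sketch of the basic choices mentioned above, assuming a made-up numeric slice of the car table (the column names follow the `isna()` output earlier; the values are hypothetical, and the real columns may need parsing first):

```python
import pandas as pd

# Hypothetical slice of a used-car table; the values are made up.
cars = pd.DataFrame({
    "Seats": [5.0, None, 5.0, 7.0],
    "Power": [74.0, 120.0, None, 88.0],
})

filled = cars.copy()
# Median for a continuous column, so one extreme value does not drag the fill.
filled["Power"] = filled["Power"].fillna(filled["Power"].median())
# Mode (most common value) for a column with a handful of discrete values.
filled["Seats"] = filled["Seats"].fillna(filled["Seats"].mode().iloc[0])

print(filled["Power"].tolist())  # [74.0, 120.0, 88.0, 88.0]
print(filled["Seats"].tolist())  # [5.0, 5.0, 5.0, 7.0]
```

Which statistic makes sense still depends on why the value is missing, as noted above.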

lapis sequoia Feb 3, 2022, 11:55 PM

#

Yeah... theres a few examples ive found where

#

its missing 3 or more values

desert oar Feb 3, 2022, 11:57 PM

#

you might want to think a little about your actual task

thin palm Feb 3, 2022, 11:57 PM

#

What's up Data Science gang, I have a question about concat in Pandas

#

when I concat two data frames why does the final column in the data frame I connect have NaN?

desert oar Feb 3, 2022, 11:57 PM

#

@lapis sequoia ultimately you need to come up with some kind of "willingness to pay" estimate based on different attributes of the car, and use that to segment customers into 2 different price tiers. that's what "differential pricing" is

thin palm Feb 3, 2022, 11:57 PM

#

for example:

#

Screen_Shot_2022-02-03_at_4.57.58_PM.png

Screen_Shot_2022-02-03_at_4.58.02_PM.png

#

trying to one hot encode "City" in our original DataFrame known as "Feature" but when I concat the one hot encoded dataframe it produces the NaN?

#

any thoughts?

lapis sequoia Feb 3, 2022, 11:58 PM

#

predicting the price of used cars rn

desert oar Feb 3, 2022, 11:58 PM

#

thin palm when I concat two data frames why does the final column in the data frame I conn...

this doesn't have to do with the location of the column in the dataframe. if you can post a minimal example that someone can copy and paste and reproduce the problem, then we can figure out what the actual problem is

#

in general, if you get unexpected missing values after a concat operation, it's because either the column names or row index labels don't match up

#

but it's pretty hard to debug someone else's screenshots
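
The index-mismatch case can be reproduced with a sketch like the following. The names "Feature" and "City" come from the question; the values, the extra column, and the index labels are assumptions, and this may or may not be what happened in the screenshots:

```python
import pandas as pd

# Hypothetical frame whose row labels are not 0..n-1 (e.g. after filtering).
feature = pd.DataFrame(
    {"City": ["Rome", "Oslo", "Rome"], "Rooms": [2, 3, 1]},
    index=[10, 11, 12],
)

# A one-hot frame with a fresh 0..n-1 index no longer lines up with it.
dummies = pd.get_dummies(feature["City"]).reset_index(drop=True)
misaligned = pd.concat([feature, dummies], axis=1)
print(len(misaligned), misaligned.isna().any().any())  # 6 True

# Keeping the original index lets concat match the rows up.
aligned = pd.concat([feature, pd.get_dummies(feature["City"])], axis=1)
print(len(aligned), aligned.isna().any().any())        # 3 False
```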

thin palm Feb 3, 2022, 11:59 PM

#

desert oar but it's pretty hard to debug someone else's screenshots

gotcha sorry for the complication

desert oar Feb 4, 2022, 12:00 AM

#

example data + runnable code are ideal

main fox Feb 4, 2022, 12:00 AM

#

@desert oar Could you help me see if my understanding that a decision tree model performed better than a logistic regression model is correct?
I already trained both models and measured for precision, plus made a confusion matrix

desert oar Feb 4, 2022, 12:00 AM

#

thin palm gotcha sorry for the complication

for what it's worth, sometimes the process of constructing such an example is enough to guide you to the right solution without even having to ask

desert oar Feb 4, 2022, 12:01 AM

#

main fox <@389497659087650836> Could you help me see if my understanding that a decision ...

i was about to log off, but you shouldn't "ask to ask". just post the question, etc. you know the deal!

#

i can answer if it's quick

main fox Feb 4, 2022, 12:01 AM

#

https://github.com/jose-cano/Machine-Learning/blob/main/Predict Loan Acceptance/PredictingBankLoan.ipynb

GitHub

Machine-Learning/PredictingBankLoan.ipynb at main · jose-cano/Machi...

Showcase of Machine Learning projects. Contribute to jose-cano/Machine-Learning development by creating an account on GitHub.

#

It's okay if you're logging off though

lapis sequoia Feb 4, 2022, 12:04 AM

#

Fuck man

#

just trying to figure out

#

how to remove any rows

#

with missing values

main fox Feb 4, 2022, 12:06 AM

#

lapis sequoia with missing values

df.dropna()

lapis sequoia Feb 4, 2022, 12:06 AM

#

how can i take that result

#

and call that as my dataframe going forward

#

would It be x = df.dropna()

#

then call that going forward?

desert oar Feb 4, 2022, 12:07 AM

#

yes, or you can do df = df.dropna()

#

in addition to df.dropna(), consider the general pattern for filtering rows:

row_is_ok = df.notna().all(axis=1)  # or any other operation that returns a boolean Series, one value per row
df = df.loc[row_is_ok].copy()

lapis sequoia Feb 4, 2022, 12:07 AM

#

yessss!!!!

desert oar Feb 4, 2022, 12:07 AM

#

worth reading when you have more time https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html

lapis sequoia Feb 4, 2022, 12:07 AM

#

i did data = df.dropna()

#

So i removed all null values and can proceed. BLESS

desert oar Feb 4, 2022, 12:07 AM

#

main fox https://github.com/jose-cano/Machine-Learning/blob/main/Predict%20Loan%20Accepta...

which is the relevant part here?

desert oar Feb 4, 2022, 12:08 AM

#

lapis sequoia So i removed all null values and can proceed. BLESS

they will probably ding you for this, given their feedback on your other project. maybe it's fine for the sake of just getting it done

#

personally i'd rather write explicitly "for the sake of simplicity, i avoided dealing with missing data imputation and i just removed all rows with missing values. i know this isn't really the right thing to do, but i did it in the interest of getting something done."

#

depends on how kind the grader is

main fox Feb 4, 2022, 12:09 AM

#

desert oar which is the relevant part here?

Close to the end
When I trained both models, I tuned them for precision and made a confusion matrix
Since the tree's precision was higher, but the confusion matrix looked worse, I wanted to know if I'm correct thinking it is better suited for the task

desert oar Feb 4, 2022, 12:09 AM

#

main fox Close to the end When I trained both models, I tuned them for precision and made...

what do you mean by "looked worse"? how did you tune them? are you aware of the "bias-variance tradeoff"?

main fox Feb 4, 2022, 12:13 AM

#

The tree was more pessimistic in predicting true positives (which is what I wanted), but the logistic regression model looked like it performed better "overall" (albeit risking false positives).

I tuned them both using GridSearchCV.
I am not aware of bias-variance tradeoff.
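
To make the precision-versus-"overall" tension concrete, here is a small sketch with invented confusion-matrix counts (these are not the numbers from the notebook). It shows how a model can win on precision while finding far fewer of the actual positives:

```python
# Hypothetical confusion-matrix counts for two classifiers on the same
# 100 test rows (20 actual positives); not the notebook's real numbers.
tree = {"tp": 8, "fp": 1, "fn": 12, "tn": 79}
logreg = {"tp": 16, "fp": 4, "fn": 4, "tn": 76}

def precision(cm):
    # Of everything predicted positive, how much really was positive.
    return cm["tp"] / (cm["tp"] + cm["fp"])

def recall(cm):
    # Of everything actually positive, how much the model found.
    return cm["tp"] / (cm["tp"] + cm["fn"])

for name, cm in [("tree", tree), ("logreg", logreg)]:
    print(name, round(precision(cm), 3), round(recall(cm), 3))
# tree 0.889 0.4
# logreg 0.8 0.8
```

Which one is "better" then depends on how costly a false positive is compared with a missed positive.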

desert oar Feb 4, 2022, 12:15 AM

#

main fox The tree was more pessimistic in predicting true positives (which is what I want...

https://mecha-mind.medium.com/explaining-bias-variance-tradeoff-to-a-ml-engineer-d747bdbb1f1d this article has a good explanation, although i find it both funny and depressing that they went all the way up to gradient boosting without so much as giving logistic regression a nod

Medium

Explaining Bias-Variance Tradeoff to a ML Engineer

Generally data scientists and statisticians are well versed with the term “Bias Variance Tradeoff” as they can very well understand them…

lapis sequoia Feb 4, 2022, 12:15 AM

#

So now another problem im having..

desert oar Feb 4, 2022, 12:16 AM

#

this article is probably better @main fox https://medium.com/@venkatavinay222/understanding-bias-variance-trade-off-in-machine-learning-952d2d7a86ba

Medium

Understanding Bias-Variance Trade-off in Machine-Learning.

Bird’s-eye view of the Blog:

lapis sequoia Feb 4, 2022, 12:16 AM

#

mileage is listed as kmpl

desert oar Feb 4, 2022, 12:16 AM

#

lapis sequoia mileage is listed as kmpl

gas mileage? that looks like "kilometers per liter"

#

so if you want "miles per gallon" you need to do a bit of math
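
The "bit of math" could look like this sketch. The function name is made up, and it assumes US gallons and a value that is already numeric:

```python
# Exact definitions: 1 mile = 1.609344 km, 1 US gallon = 3.785411784 litres.
KM_PER_MILE = 1.609344
LITRES_PER_US_GALLON = 3.785411784

def kmpl_to_mpg(kmpl):
    """Convert kilometres per litre to miles per US gallon."""
    return kmpl * LITRES_PER_US_GALLON / KM_PER_MILE

print(round(kmpl_to_mpg(20.0), 2))  # 47.04
```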

lapis sequoia Feb 4, 2022, 12:16 AM

#

I mean

mild dirge Feb 4, 2022, 12:16 AM

#

you need to make sure they're both the same unit

lapis sequoia Feb 4, 2022, 12:17 AM

#

I suppose it doesnt matter bc the problem statement is referring to an indian market so

#

distance per liter is all good yeah

#

Screen_Shot_2022-02-03_at_7.18.14_PM.png

#

so step 1 done. nix all null values - imputing would be nice but in the name of time i think its best to proceed from here

#

Im trying to figure out how to exclude outliers now...

desert oar Feb 4, 2022, 12:29 AM

#

lapis sequoia Im trying to figure out how to exclude outliers now...

quick and dirty approach: remove anything more than 2 standard deviations from the mean. however i very strongly suggest that you plot a univariate distribution (boxplot and kernel density plot) to see the shape of each variable

#

mostly outlier removal isn't needed in real data except for a few really egregious data points

#

just because something is "far from the average" doesn't make it an "outlier" in the sense of "this is a measurement error or something else weird that i need to exclude from my model"

#

extreme values do happen in real life, and you don't want to remove them just because they seem unusual
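
For what it's worth, the quick and dirty version might look like the sketch below. The numbers are invented, and the 2-standard-deviation cutoff is only the rule of thumb mentioned above, not a recommendation. It flags values for inspection rather than dropping them:

```python
import pandas as pd

# Made-up prices; the last one is far from the rest.
prices = pd.Series(
    [4.5, 5.0, 5.2, 4.8, 5.1, 4.9, 5.3, 4.7, 5.0, 60.0], name="Price"
)

# Distance from the mean in units of standard deviation (z-score).
z = (prices - prices.mean()) / prices.std()

# Flag anything more than 2 standard deviations away, then look at it
# by hand before deciding whether it is really an error.
flagged = prices[z.abs() > 2]
```

Plotting first, e.g. `prices.plot.box()` or `prices.plot.kde()`, usually says more than the cutoff does.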

lapis sequoia Feb 4, 2022, 12:30 AM

#

okay heard

#

Its just part of the rubric "Outlier treatment"

desert oar Feb 4, 2022, 12:30 AM

#

in that case then yes, do look for it

#

e.g. if a car price is 5 or 0, that's clearly not right and probably should be re-coded as "null"

lapis sequoia Feb 4, 2022, 12:31 AM

#

yeah...

desert oar Feb 4, 2022, 12:31 AM

#

likewise if a car has 10000 km/L

#

or -99999

#

etc
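
A small sketch of re-coding impossible values as null instead of deleting rows. The column names and the "plausible" bounds here are assumptions for illustration; pick bounds that make sense for the real data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Price": [5.5, 0.0, 12.0],
        "Mileage": [18.2, 10000.0, -99999.0],  # km per liter
    }
)

# Series.where keeps values where the condition is True and writes NaN elsewhere.
df["Price"] = df["Price"].where(df["Price"] > 0)
df["Mileage"] = df["Mileage"].where(df["Mileage"].between(1, 60))
```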

lapis sequoia Feb 4, 2022, 12:31 AM

#

is there a command that will

desert oar Feb 4, 2022, 12:31 AM

#

no

lapis sequoia Feb 4, 2022, 12:31 AM

#

display the range of values?

desert oar Feb 4, 2022, 12:31 AM

#

yes

#

i thought you were going to ask if there was a command to remove outliers 😆

lapis sequoia Feb 4, 2022, 12:31 AM

#

i guess the variance?

desert oar Feb 4, 2022, 12:31 AM

#

guess? more like actually compute

lapis sequoia Feb 4, 2022, 12:31 AM

#

nahhhh I just want to see what the lowest / highest values are

#

and then find the average

desert oar Feb 4, 2022, 12:32 AM

#

well there's .min() and .max()

#

the range is the difference thereof

#

variance is .var() and standard deviation is .std()
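
Put together, on a toy column (the values are made up):

```python
import pandas as pd

df = pd.DataFrame({"Price": [2.0, 4.0, 4.0, 6.0]})
price = df["Price"]  # a single column is a Series

lowest = price.min()
highest = price.max()
value_range = highest - lowest  # the range is max minus min
variance = price.var()          # sample variance (ddof=1 by default)
std_dev = price.std()           # sample standard deviation
```

`price.describe()` prints most of these at once.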

lapis sequoia Feb 4, 2022, 12:33 AM

#

so i tried doing

desert oar Feb 4, 2022, 12:33 AM

#

you might want to skim the list of functions available to use on Series objects: https://pandas.pydata.org/docs/reference/series.html

lapis sequoia Feb 4, 2022, 12:33 AM

#

i get series is not defined

desert oar Feb 4, 2022, 12:34 AM

#

hm? those are methods on the Series class

#

if you get a single column from a dataframe, that object is of type Series

main fox Feb 4, 2022, 12:34 AM

#

desert oar this article is probably better @main fox https://medium.com/@venka...

Thanks, I'll give it a read. Is there any other resource or book you'd recommend?

desert oar Feb 4, 2022, 12:35 AM

#

main fox Thanks, I'll give it a read. Is there any other resource or book you'd recommend...

other than a full machine learning course? not that i can think of offhand

lapis sequoia Feb 4, 2022, 12:35 AM

#

or if i run "Price"

Screen_Shot_2022-02-03_at_7.34.59_PM.png

desert oar Feb 4, 2022, 12:35 AM

#

and if they don't discuss bias-variance tradeoff in your machine learning course, then you were robbed. it's one of the most important concepts to understand

desert oar Feb 4, 2022, 12:35 AM

#

lapis sequoia or if i run "Price"

well yeah, what did you expect? there's no variable Price

#

presumably your data is called df based on your examples

#

so you'd do df['Price'] to get the Price column

#

df['Price'] gives you a Series instance, representing the Price column in df

#

df is an instance of DataFrame

lapis sequoia Feb 4, 2022, 12:36 AM

#

So that last column doesn't act that way

desert oar Feb 4, 2022, 12:36 AM

#

act what way?

lapis sequoia Feb 4, 2022, 12:36 AM

#

Like i cant call it where it's defined?

desert oar Feb 4, 2022, 12:37 AM

#

what do you mean by that?

lapis sequoia Feb 4, 2022, 12:37 AM

#

df['Price'].min()

#

I see how this works

desert oar Feb 4, 2022, 12:37 AM

#

what happens when you try that?

#

is it different from what you expect?

#

Price isn't a stand-alone variable

#

it's a column in the dataframe

lapis sequoia Feb 4, 2022, 12:37 AM

#

I see I see.

desert oar Feb 4, 2022, 12:37 AM

#

you can assign it to a separate variable if you want

lapis sequoia Feb 4, 2022, 12:38 AM

#

how might i do that just out of curiousity

#

x = df['Price']

main fox Feb 4, 2022, 12:38 AM

#

desert oar and if they don't discuss bias-variance tradeoff in your machine learning course...

Thank you for the advice. I didn't formally have a ML course, just a deep interest that I'm trying to pursue. The math and theory can go as deep as one is willing to dive so it can be a timesink to cover topics.
Did you take an ML course you'd recommend?

desert oar Feb 4, 2022, 12:39 AM

#

lapis sequoia x = df['Price']

yes

lapis sequoia Feb 4, 2022, 12:39 AM

#

Word!

desert oar Feb 4, 2022, 12:39 AM

#

main fox Thank you for the advice. I didn't formally have a ML course, just a deep intere...

no, i went to school for quantitative social science and learned the rest along the way

lapis sequoia Feb 4, 2022, 12:39 AM

#

it's small concepts like these that help me clear the much larger picture

desert oar Feb 4, 2022, 12:39 AM

#

lapis sequoia it's small concepts like these that help me clear the much larger picture

good! building a foundation of concepts is very important

#

i do need to run now though

lapis sequoia Feb 4, 2022, 12:40 AM

#

bless man thank you for the help.

#

mannnnn this shit is kicking my ass

iron basalt Feb 4, 2022, 12:57 AM

#

lapis sequoia x = df['Price']

You can think of [] as a function, that takes some arguments and returns something. It's just a special function that uses [] syntax (in Python you can change what operators like [] do on certain objects, which is what Pandas is doing).
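
A toy illustration of that idea in plain Python. The `Table` class is invented for the example and is far simpler than what pandas actually does; the only real mechanism shown is the `__getitem__` special method:

```python
class Table:
    """Tiny stand-in for a DataFrame: maps column names to lists."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __getitem__(self, name):
        # Python calls this method whenever you write table[name].
        return self._columns[name]


table = Table({"Price": [1.75, 12.5, 4.5]})

x = table["Price"]                # square-bracket syntax
y = table.__getitem__("Price")    # the same call, spelled out
```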

nova tapir Feb 4, 2022, 1:10 AM

#

can someone explain why this question's answer is this? and how can i find the x1 and x2 features?

dusk tide Feb 4, 2022, 2:11 AM

#

The order of +ve and -ve examples can be up or down but you will get the features like this from the plot.

dusk tide Feb 4, 2022, 2:13 AM

#

nova tapir can someone explain why this question's answer is this? and how can i find the x...

As we know that the svm tries to find the large margin between the +ve and - ve examples so that it can classify the two.
The decision boundary will go from 3 on X1 axis straight perpendicular and therefore the margin will be max in this case only

dusk tide Feb 4, 2022, 2:24 AM

#

nova tapir can someone explain why this question's answer is this? and how can i find the x...

And as far as I know, the optimization objective tries to make ||theta|| as small as it possibly can, subject to these constraints:
(a) for a -ve example (y = 0), p(i) * ||theta|| <= -1, where p(i) is the projection of example x(i) onto theta;
(b) for a +ve example (y = 1), p(i) * ||theta|| >= 1.

Here I have assumed theta0 = 0, so the decision boundary passes through the origin.

Since the cost function prefers small values of theta, p(i) (the distance of each example from the decision boundary) has to be as large as possible; only then are the constraints in (a) and (b) satisfied. No decision boundary other than this one allows that.

dusk tide Feb 4, 2022, 2:25 AM

#

nova tapir can someone explain why this question's answer is this? and how can i find the x...

Also I am also having the confusion of selecting the theta because the ||theta|| for every value of theta is coming same .

dusk tide Feb 4, 2022, 2:26 AM

#

nova tapir can someone explain why this question's answer is this? and how can i find the x...

If you get the answer on how to select theta, then do tell

visual spear Feb 4, 2022, 5:48 AM

#

Does anyone here know how (and can either tell me how, or show me an example of how) to create a model for an image generation AI?

high grove Feb 4, 2022, 6:26 AM

#

how to increase space between stacked bar in matplotlib

#

mortal adder Feb 4, 2022, 6:46 AM

#

How can i learn gpt 3 as a complete novice?

potent sky Feb 4, 2022, 7:03 AM

#

Learn to make a gpt 3 clone?
Learn to use gpt-3?

flint grotto Feb 4, 2022, 7:28 AM

#

hello.

#

i have a question.

#

cnn in the convolution block, dense block. why make a block?

#

i mean just can do write code, why make a block?

velvet rampart Feb 4, 2022, 7:58 AM

#

Please where can I get data science and machine learning projects and exercises with source code

kind rock Feb 4, 2022, 7:59 AM

#

would y'all explain keras as a better interface to control the tensorflow framework?

long zephyr Feb 4, 2022, 8:04 AM

#

https://link.medium.com/sTjms6039mb

Medium

Introduction to Data Science Workbook

The principal purpose of Data Science is to find patterns within data. It uses various statistical techniques to analyse and draw insights…

spare junco Feb 4, 2022, 8:14 AM

#

velvet rampart Please where can I get data science and machine learning projects and exercises ...

CodeBasics, he has whole playlists on Machine learning and Deep learning with many projects and exercises

eager imp Feb 4, 2022, 8:35 AM

#

Any good material on directed attention?

unreal swan Feb 4, 2022, 8:44 AM

#

Idk

last echo Feb 4, 2022, 9:16 AM

#

is this model over-fitting or under-fitting?

brazen spire Feb 4, 2022, 10:18 AM

#

how to make a function of activation functions?

#

i get an error when trying to do

#

```py
    # return nn.Sin()
    # return nn.Tanh()
    # return nn.Sigmoid()
    # return nn.Tanhshrink()
    return nn.HardTanh(-1,1)
    # return nn.Hardswish()
    # return nn.functionnal.silu()
```

fickle frigate Feb 4, 2022, 10:35 AM

#

brazen spire how to make a function of activation functions?

does the activation function contain learnable parameters?

if yes, you have to inherit from nn.Module; if no, you can apply it directly to the output

stone vector Feb 4, 2022, 10:37 AM

#

#help-carrot message

#

hello, I need help to validate if there are duplicates value in csv column and items which failed the validation should be logged (e.g. stderr) and ignored for the next processes

brazen spire Feb 4, 2022, 10:39 AM

#

fickle frigate does the activation function contains learnable parameters? if yes you have t...

#

fickle frigate Feb 4, 2022, 10:53 AM

#

brazen spire

the function f returns an object and does not take any args. the correct way is to do the following:

import torch.nn as nn 
import torch
def f():
    return nn.Tanh() 

x = torch.randn(5)
result = f()(x)

fickle frigate Feb 4, 2022, 10:55 AM

#

stone vector hello, I need help to validate if there are duplicates value in csv column and i...

if you are using pandas you could groupby that column and all the duplicates will be grouped together
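
One possible shape for this, as a sketch. It uses `Series.duplicated` rather than `groupby`, the column name `id` and the data are made up, and it treats every row whose value appears more than once as failing validation (use `keep="first"` instead if the first occurrence should pass):

```python
import sys

import pandas as pd

# Stand-in for pd.read_csv("your_file.csv"); the data here is invented.
df = pd.DataFrame({"id": ["a", "b", "a", "c", "b"], "qty": [1, 2, 3, 4, 5]})

# True for every row whose "id" value occurs more than once.
dup_mask = df["id"].duplicated(keep=False)

# Log each failing item to stderr.
for row_index, value in df.loc[dup_mask, "id"].items():
    print(f"duplicate id {value!r} at row {row_index}", file=sys.stderr)

# Only the rows that passed validation go on to the next step.
clean = df[~dup_mask]
```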

stone vector Feb 4, 2022, 11:15 AM

#

how can i log to stderr each duplicate item

hollow sentinel Feb 4, 2022, 11:51 AM

#

spyder is fantastic lol

potent sky Feb 4, 2022, 12:22 PM

#

last echo is this model over-fitting or under-fitting?

Depends on application, but i would say neither. The training accuracy is pretty high and doesn't have a huge gap with the val_acc

naive river Feb 4, 2022, 12:25 PM

#

nova tapir can someone explain why this question's answer is this? and how can i find the x...

There should be little to no dependence on x2, since a vertical line would separate things nicely, so θ₂ should be small. Inputting (0, 0) should be a negative case, i.e. θ₀ < 0. This leaves only one option

brazen spire Feb 4, 2022, 1:03 PM

#

I don't understand how we get 12 parameters here

#

#

#

isn't it 9 in the middle?

prime hearth Feb 4, 2022, 1:23 PM

#

12 params is all the nodes

brazen spire Feb 4, 2022, 1:29 PM

#

i don't understand

#

is it 9 (weights) + 3 (biais) = 12?
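
That arithmetic can be checked with a few lines, assuming the picture (not included in this log) shows 3 inputs fully connected to 3 units with one bias each; `dense_param_count` is a made-up helper name:

```python
def dense_param_count(n_inputs: int, n_units: int, use_bias: bool = True) -> int:
    """Learnable parameters of one fully connected layer."""
    weights = n_inputs * n_units          # one weight per input-unit pair
    biases = n_units if use_bias else 0   # one bias per unit
    return weights + biases


total = dense_param_count(3, 3)  # 9 weights + 3 biases
```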

brazen spire Feb 4, 2022, 1:45 PM

#

ah i understand now.

last echo Feb 4, 2022, 2:13 PM

#

how many dense layer and number of neurons? any tips where to learn how to describe this cnn model?

```py
#Augmented Layer
model.add(augmented)

#Input shape Layer
model.add(Input(shape=(WIDTH,HEIGHT,3)))

#Conv2D and MaxPool2D Layers
model.add(Conv2D(16, kernel_size=(3,3), activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))

model.add(Conv2D(32, kernel_size=(3,3), activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))

model.add(Conv2D(64, kernel_size=(3,3), activation='relu'))
model.add(Conv2D(64, kernel_size=(3,3), activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))

model.add(Conv2D(128, kernel_size=(3,3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3,3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3,3), activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))

#Flatten Layer
model.add(Flatten())

#Fully connected layer, OUTPUT Layer
model.add(Dense(units=128, activation='relu'))
model.add(Dense(units=class_size, activation='softmax'))
```

tender hearth Feb 4, 2022, 2:27 PM

#

A conv2d has kernel height * kernel width * input channels * filters weights, plus one bias per filter, as learnable parameters

#

filters is essentially how many kernels you have

#

a dense layer has inputs * units weights plus units biases as parameters

#

depending on what your teacher meant by neuron specifically it could be just the learnable params in the dense layers or including the kernels as well
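
As a sketch, here is the usual counting convention (weights plus one bias per filter) applied to the conv layers in the model above. The helper name is invented. The Dense(128) layer is left out because its input size depends on WIDTH and HEIGHT, which are not given here:

```python
def conv2d_param_count(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Learnable parameters of one 2D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases


# (input channels, filters) for each Conv2D in the model, all with 3x3 kernels.
# MaxPool2D and Flatten have no learnable parameters.
conv_layers = [(3, 16), (16, 32), (32, 64), (64, 64), (64, 128), (128, 128), (128, 128)]

conv_params = [conv2d_param_count(3, 3, c_in, f) for c_in, f in conv_layers]
total_conv_params = sum(conv_params)
```

If the model builds, `model.summary()` in Keras reports the per-layer counts directly, including the dense layers.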

last echo Feb 4, 2022, 2:33 PM

#

tender hearth depending on what your teacher meant by neuron specifically it could be just the...

thank you pontifex, might as well throw question to the instructor on which neurons is asked

tender hearth Feb 4, 2022, 2:34 PM

#

Sounds good

serene scaffold Feb 4, 2022, 3:30 PM

#

if you need help with data science or AI, please ask a question directed to the whole channel.

lapis sequoia Feb 4, 2022, 3:52 PM

#

clueless tbh

exotic thicket Feb 4, 2022, 4:06 PM

#

I'm parikshith. Stream ECE branch section B. I was wondering if you were available to help me with Lec 3: Image formation: Radiometry which is a little bit vast as per my knowledge in math.
And I would like to share the resources I'd gone through to figure out a solution

Below link video of Lec 3: Image formation: Radiometry (NPTEL course video)
https://www.youtube.com/watch?v=ch1xdUFABA8

Another same concept video explanation in YouTube link I'd gone through
https://www.youtube.com/watch?v=kPIqO929pIc

Questions:

The light has a radiant flux of 100 watts, what is the irradiance on an object which is placed at 2 meters from the light (assuming object is perpendicular to the night light)? (W m−2)
2.99
1.25
1.99
0.55

A light source has a radiant flux of 100 watts, what is the flux on a rectangular object of size 20 cm by 30 cm placed 2 meters away (perpendicular to the light)?
0.1194 mW
0.1163 mW
0.1189 mW
0.1123 MW

Given the 10-watt source coming in from 2π/3 solid angle (in sr) of a radius 3 meter, the corresponding source of energy carried by the ray is
5/(2π²)
1/(2π²)
π2
10

Light source has a radiant intensity of 60 W sr−1. Determine the irradiance on a sign board 2 meters away.
10
15
20
30

Suppose a source with an area of 4 m−2 is viewed at an angle of 30 degree and has a radiance of 0.3 W m−2 sr−1. Calculate the radiant intensity of the source?
1.65 W sr−1
1.04 W sr−1
2.78 W sr−1
2.11 W sr−1

Suppose the source in question 9 is viewed from a perfectly reflecting Lambertian surface. Then find the value of radiosity.
0.3145 W m−2
0.1645 W m−2
0.2598 W m−2
0.4768 W m−2

Thank you for your time

serene scaffold Feb 4, 2022, 4:09 PM

#

lapis sequoia clueless tbh

did you put import tensorflow as tf at the top of your file? Also, please copy and paste actual text instead of screenshots as this is a lot more useful for answerers.

serene scaffold Feb 4, 2022, 4:10 PM

#

exotic thicket I'm parikshith. Stream ECE branch section B. I was wondering if you were availab...

sorry, is this a data science question? it sounds like it might be physics.

exotic thicket Feb 4, 2022, 4:11 PM

#

serene scaffold sorry, is this a data science question? it sounds like it might be physics.

Ya it's a radiometry concept and underpinning of math and physics of CV

lapis sequoia Feb 4, 2022, 4:11 PM

#

serene scaffold did you put `import tensorflow as tf` at the top of your file? Also, please copy...

```py
import tensorflow as ts

string = tf.Variable("this is a string", tf.string)
print(string)
```

serene scaffold Feb 4, 2022, 4:11 PM

#

exotic thicket Ya it's a radiometry concept and underpinning of math and physics of CV

I would ask in a different discord server. Sorry

#

@lapis sequoia you did import tensorflow as ts, with a ts instead of tf.

lapis sequoia Feb 4, 2022, 4:15 PM

#

I'm stuck in an issue can i dm if possible?

#

oh fair

#

If anyone can help

serene scaffold Feb 4, 2022, 4:15 PM

#

lapis sequoia I'm stuck in an issue can i dm if possible?

No; please ask your questions in the channel.

lapis sequoia Feb 4, 2022, 4:15 PM

#

serene scaffold No; please ask your questions in the channel.

Oh ok

#

but how can i get rid of this error

#

```
Skipping registering GPU devices...
2022-02-04 19:15:31.196989: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
<tf.Variable 'Variable:0' shape=() dtype=string, numpy=b'this is a string'>
```

serene scaffold Feb 4, 2022, 4:16 PM

#

lapis sequoia ```2022-02-04 19:15:31.126151: W tensorflow/core/common_runtime/gpu/gpu_device.c...

Do you understand what this error message is telling you?

exotic thicket Feb 4, 2022, 4:16 PM

#

serene scaffold I would ask in a different discord server. Sorry

I'm so glad you said that. I'm actually looking for a domain-specific server on CV and IP, because I have taken a course on computer vision and image processing fundamentals and applications. I'm really excited to learn from that course, but my exams are going on, so I need to manage with assignments for a few weeks. If I don't get good marks in the assignments, my score gets low. I have completed two assignments, and for this week I am left with the questions I sent above.

Thank you for your time

#

So plz let me know sir

strange scarab Feb 4, 2022, 4:27 PM

#

hey! i have a pandas dataframe of yearly population projections, and the format is a bit iffy to process. i'd like to merge the rows like this:

#

i have never worked with pandas before so i don't really know where and how to look for guidance lol

serene scaffold Feb 4, 2022, 4:34 PM

#

strange scarab hey! i have a pandas dataframe of yearly population projections, and the format ...

I think you need to pivot it. Can you do print(df.head().to_dict('list')) so that I can copy and paste it and experiment?

#

alternatively, you can look at the docs and try to figure it out.

#

!docs pandas.DataFrame.pivot_table

arctic wedgeBOT Feb 4, 2022, 4:34 PM

#

pandas.DataFrame.pivot_table

```py
DataFrame.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)
```
Create a spreadsheet-style pivot table as a DataFrame.

The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.

strange scarab Feb 4, 2022, 4:35 PM

#

let me take a look, the above was a quick excel mock of the larger messier thing i have so i'd rather spare your time and effort and see if this does it

serene scaffold Feb 4, 2022, 4:36 PM

#

I don't mind as long as you provide the data in a format that I can use immediately, like a CSV, or something.

soft silo Feb 4, 2022, 4:38 PM

#

Hi guys I;m currently facing a task from MLSS2020 regarding RL environment and agents and im kinda stuck on one issue, I have to adjust the environment or the agent so that his actions reflect the given probability like in the AIMA example. I have the environment defined but the actions of the agent are where im stuck. anyone willing to take a look?

strange scarab Feb 4, 2022, 4:46 PM

#

serene scaffold I don't mind as long as you provide the data in a format that I can use immediat...

yeah i'm a bit lost still, here's the first three years or so, {"column": ["row", ...]} ```py
{'City': ['Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total'], 'Year': ['2021', '2021', '2021', '2021', '2021', '2021', '2021', '2022', '2022', '2022', '2022', '2022', '2022', '2022', '2023', '2023', '2023', '2023', '2023', '2023', '2023', '2024', '2024', '2024', '2024', '2024', '2024'], 'Age': ['Total', '0 - 14', '15 - 24', '25 - 44', '45 - 64', '65 - 74', '75 -', 'Total', '0 - 14', '15 - 24', '25 - 44', '45 - 64', '65 - 74', '75 -', 'Total', '0 - 14', '15 - 24', '25 - 44', '45 - 64', '65 - 74', '75 -', 'Total', '0 - 14', '15 - 24', '25 - 44', '45 - 64', '65 - 74'], 'value': [5547045, 852577, 608053, 1423098, 1383040, 703342, 576935, 5555002, 843285, 609690, 1422508, 1377411, 695173, 606935, 5562569, 833001, 614092, 1420517, 1374381, 684221, 636357, 5569645, 821592, 618233, 1419647, 1369556, 677181]}

lapis sequoia Feb 4, 2022, 4:53 PM

#

serene scaffold Do you understand what this error message is telling you?

i dont have a gpu?

serene scaffold Feb 4, 2022, 4:54 PM

#

strange scarab yeah i'm a bit lost still, here's the first three years or so, `{"column": ["row...

you can do this

In [25]: df.pivot_table(index=['City', 'Year'], columns='Age')
Out[25]:
                       value
Age                   0 - 14   15 - 24    25 - 44    45 - 64   65 - 74      75 -      Total
City          Year
Country total 2021  852577.0  608053.0  1423098.0  1383040.0  703342.0  576935.0  5547045.0
              2022  843285.0  609690.0  1422508.0  1377411.0  695173.0  606935.0  5555002.0
              2023  833001.0  614092.0  1420517.0  1374381.0  684221.0  636357.0  5562569.0
              2024  821592.0  618233.0  1419647.0  1369556.0  677181.0       NaN  5569645.0

strange scarab Feb 4, 2022, 4:55 PM

#

jesus

#

alright let me see

#

wait holup! this might be exactly how i imagined it should be in my head, thanks!! now i just have to figure out how to work the MultiIndexes(?)

serene scaffold Feb 4, 2022, 4:59 PM

#

strange scarab wait holup! this might be exactly how i imagined it should be in my head, thanks...

the multiindexes. did you want to "flatten" them?

#

In [27]: df.pivot_table(index=['City', 'Year'], columns='Age', values='value').reset_index()
Out[27]:
Age           City  Year    0 - 14   15 - 24    25 - 44    45 - 64   65 - 74      75 -      Total
0    Country total  2021  852577.0  608053.0  1423098.0  1383040.0  703342.0  576935.0  5547045.0
1    Country total  2022  843285.0  609690.0  1422508.0  1377411.0  695173.0  606935.0  5555002.0
2    Country total  2023  833001.0  614092.0  1420517.0  1374381.0  684221.0  636357.0  5562569.0
3    Country total  2024  821592.0  618233.0  1419647.0  1369556.0  677181.0       NaN  5569645.0
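
For anyone who wants to reproduce this without the full data, here is a cut-down sketch with only a handful of the rows above. Clearing `columns.name` at the end is an optional extra (not something from the session above) that removes the leftover "Age" label from the header:

```python
import pandas as pd

# A few rows in the same long format as the data above.
df = pd.DataFrame(
    {
        "City": ["Country total"] * 5,
        "Year": ["2021", "2021", "2021", "2022", "2022"],
        "Age": ["Total", "0 - 14", "15 - 24", "Total", "0 - 14"],
        "value": [5547045, 852577, 608053, 5555002, 843285],
    }
)

# One row per (City, Year), one column per age band.
wide = df.pivot_table(index=["City", "Year"], columns="Age", values="value")

# Flatten the MultiIndex back into ordinary columns.
flat = wide.reset_index()
flat.columns.name = None  # drop the "Age" label on the column axis
```

Missing combinations (here 2022 / "15 - 24") come out as NaN, which is also why the values turn into floats.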

#

also why is every city named "country total"?

strange scarab Feb 4, 2022, 4:59 PM

#

because the first one is the whole country

#

up until 2040

#

so the first city is like, 200 rows down?

serene scaffold Feb 4, 2022, 5:00 PM

#

ah

strange scarab Feb 4, 2022, 5:02 PM

#

i'll surely work it out from here on

#

thanks a bunch!

serene scaffold Feb 4, 2022, 5:02 PM

#

💚

strange scarab Feb 4, 2022, 5:03 PM

#

got a task at work and the library i use returns organized data as a DataFrame

#

turned out to be quite the can of worms lol

mint quail Feb 4, 2022, 5:09 PM

#

We are designing an underwater robot. While the underwater robot is in autonomous driving, it will search for a certain object. But while searching for the object, it should not hit the walls around it. Can I detect walls with OpenCV? Or how do I make sure it doesn't crash into walls?

molten ridge Feb 4, 2022, 5:23 PM

#

Hi, I am trying to make a database model and i am running into memory error

#

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 868. MiB for an array with shape (10669, 10669) and data type float64

#

this thing works when i try it in IPython (jupyter lab launched from anaconda) without any errors

#

but when I try it in plain python, it gives error

#

data = Product.objects.all()  # gets data
df = pd.DataFrame(data.values())
tfidf = TfidfVectorizer(stop_words='english')
df['product_name'] = df['product_name'].fillna('')
overview_matrix = tfidf.fit_transform(df['product_name'])
similarity_matrix = linear_kernel(overview_matrix, overview_matrix)

#

i get the error on similarity_matrix line

#

i am on a 64bit machine, and it gives error for 868 MB

serene scaffold Feb 4, 2022, 5:28 PM

#

is there a way to make the result of tfidf.fit_transform a sparse array?

molten ridge Feb 4, 2022, 5:29 PM

#

nope

#

i think it is some Kind of numpy problems in which some kind of limit is set for memory allocation

serene scaffold Feb 4, 2022, 5:30 PM

#

it actually does return a sparse array; can you show the whole error message starting from Traceback?

molten ridge Feb 4, 2022, 5:30 PM

#

just a min

#

i am running it in thread btw

#

```
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\LAKSHYA\AppData\Local\Programs\Python\Python37-32\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "C:\Users\LAKSHYA\AppData\Local\Programs\Python\Python37-32\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\LAKSHYA\PycharmProjects\mega-env\PC website\website\backend\main\views.py", line 39, in product_recommendations_variables
    similarity_matrix = linear_kernel(overview_matrix, overview_matrix)
  File "C:\Users\LAKSHYA\PycharmProjects\mega-env\venv\lib\site-packages\sklearn\metrics\pairwise.py", line 1073, in linear_kernel
    return safe_sparse_dot(X, Y.T, dense_output=dense_output)
  File "C:\Users\LAKSHYA\PycharmProjects\mega-env\venv\lib\site-packages\sklearn\utils\extmath.py", line 161, in safe_sparse_dot
    return ret.toarray()
  File "C:\Users\LAKSHYA\PycharmProjects\mega-env\venv\lib\site-packages\scipy\sparse\compressed.py", line 1039, in toarray
    out = self._process_toarray_args(order, out)
  File "C:\Users\LAKSHYA\PycharmProjects\mega-env\venv\lib\site-packages\scipy\sparse\base.py", line 1202, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 868. MiB for an array with shape (10669, 10669) and data type float64
```

serene scaffold Feb 4, 2022, 5:30 PM

#

so this is the part that causes the error: similarity_matrix = linear_kernel(overview_matrix, overview_matrix)

#

(while that may be obvious to you, I had no way of knowing that before you provided the whole error message)

molten ridge Feb 4, 2022, 5:31 PM

#

sorry 😅 , my bad

serene scaffold Feb 4, 2022, 5:31 PM

#

what is linear_kernel?

#

oh, I guess it's this

#

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.linear_kernel.html

scikit-learn

sklearn.metrics.pairwise.linear_kernel

molten ridge Feb 4, 2022, 5:32 PM

#

from sklearn.metrics.pairwise import linear_kernel

#

yeah

serene scaffold Feb 4, 2022, 5:32 PM

#

try linear_kernel(overview_matrix, overview_matrix, dense_output=False)

molten ridge Feb 4, 2022, 5:33 PM

#

just a min

serene scaffold Feb 4, 2022, 5:33 PM

#

I'm in a meeting now, so I may become unresponsive

molten ridge Feb 4, 2022, 5:33 PM

#

yeah thanks alot

#

it didnt give error now

desert oar Feb 4, 2022, 5:33 PM

#

strange scarab i have never worked with pandas before so i don't really know where and how to l...

the pandas documentation and user guides are much better than they used to be. i recommend reading through the user guide material if you are feeling stuck. stackoverflow also has a lot of pandas questions now

desert oar Feb 4, 2022, 5:34 PM

#

molten ridge i am on a 64bit machine, and it gives error for 868 MB

64bit isn't relevant here. if you don't have 868 MB of free RAM, you won't be able to allocate an array of that size
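
A quick sanity check on where the 868 MiB figure comes from, as a sketch. One caveat: the traceback paths in this conversation contain `Python37-32`, which may mean the interpreter itself is a 32-bit build even on a 64-bit machine. That reading of the path is an assumption; the last lines show one way to check it.

```python
import struct

rows = cols = 10669
bytes_per_float64 = 8

# A dense (10669, 10669) float64 array needs rows * cols * 8 bytes.
bytes_needed = rows * cols * bytes_per_float64
mib_needed = bytes_needed / 2**20  # about 868.4 MiB

# Pointer size of the running interpreter: 32 or 64 bits.
# A 32-bit process has a much smaller address space than the machine's RAM.
pointer_bits = struct.calcsize("P") * 8
```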

molten ridge Feb 4, 2022, 5:36 PM

#

many people ask 32bit / 64bit in memory type errors cause of the 4gb memory limit, so i just gave it :)

desert oar Feb 4, 2022, 5:36 PM

#

that might have to do with float32 vs float64, which has less to do with your operating system and more to do with how your data has been stored

#

btw if you are trying to compute cosine distance, you might also be interested in https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html

molten ridge Feb 4, 2022, 5:37 PM

#

thanks alot :)

strange scarab Feb 4, 2022, 5:38 PM

#

desert oar the pandas documentation and user guides are much better than they used to be. i...

yeah i figured this along the way, but i ended up asking here as i had no idea of what to even look for in the docs

molten ridge Feb 4, 2022, 5:45 PM

#

hey @serene scaffold, sorry to disturb you while you are in a meeting,
but after turning dense_output to false, the model stopped working altogether

serene scaffold Feb 4, 2022, 5:56 PM

#

molten ridge hey @serene scaffold, sorry to disturb you while you are in a meeting, but...

I can't help; sorry

#

try restating what information might help someone debug this with you.

#

"the model stopped working altogether" is uninformative; what happened instead? how do you know it isn't working?

molten ridge Feb 4, 2022, 5:57 PM

#

it didn't gave any output

molten ridge Feb 4, 2022, 5:57 PM

#

serene scaffold I can't help; sorry

ok, thanks

serene scaffold Feb 4, 2022, 6:00 PM

#

molten ridge it didn't gave any output

Unless this means that linear_kernel(overview_matrix, overview_matrix, dense_output=False) returned None, you have not yet divulged enough information for anyone to assist.

brave latch Feb 4, 2022, 6:24 PM

#

result_dict = {}
for index, row in df.iterrows():
    if row["Column value"] in result_dict:
        result_dict[row["Column value"]].append(row)
    else:
        result_dict[row["Column value"]] = [row]

anyone happen to know how I could do this properly, i.e declaratively, with pandas/python? this works but you aren't supposed to iterate imperatively like that with pandas.

basically im trying to get a key value dict, where the keys are the unique values of a column in the dataframe (table), and the dict's values are the rows of the data frame with that column value. just trying to figure out the approach I should take to do it declaratively but it's not coming to me for some reason

#

sorry if there is a better channel for this

serene scaffold Feb 4, 2022, 6:25 PM

#

brave latch ```py result_dict = {} for index, row in df.iterrows(): if row["Column value"...

can you show df.head().to_dict('list')? This is the right channel for this.

brave latch Feb 4, 2022, 6:26 PM

#

yeah give me a sec, it takes me a few minutes to run the function. also this is for work so i'm trying to keep the actual data anonymized if that is ok

#

thanks

serene scaffold Feb 4, 2022, 6:27 PM

#

you'd have to make a copy of the dataframe with fake data that captures the schema of the real dataframe.

brave latch Feb 4, 2022, 6:28 PM

#

ok might just share the real output and delete after

serene scaffold Feb 4, 2022, 6:28 PM

#

you can DM it to me, if you must.

velvet abyss Feb 4, 2022, 6:34 PM

#

I'm interested on getting into Data Science, is there anything I should know before start messing with it?

serene scaffold Feb 4, 2022, 6:34 PM

#

velvet abyss I'm interested on getting into Data Science, is there anything I should know bef...

it's mostly math

velvet abyss Feb 4, 2022, 6:35 PM

#

Just that?

serene scaffold Feb 4, 2022, 6:35 PM

#

velvet abyss Just that?

that's the part that a lot of people end up being disappointed about

velvet abyss Feb 4, 2022, 6:35 PM

#

Oh well

#

Math isn't my forte, but this isn't a deal breaker

serene scaffold Feb 4, 2022, 6:36 PM

#

@brave latch what if the same Deal appears more than once? you can't have the same key twice in a dict.

#

you want a nested list?

brave latch Feb 4, 2022, 6:37 PM

#

yes I want unique deals -> rows containing that deal

#

result_dict = {}
for index, row in df.iterrows():
    if row["Deal"] in result_dict:
        result_dict[row["Deal"]].append(row)
    else:
        result_dict[row["Deal"]] = [row]

my imperative code checks for that

#

I want to make this declarative

#

because pandas

tacit basin Feb 4, 2022, 6:39 PM

#

velvet abyss I'm interested on getting into Data Science, is there anything I should know bef...

Not really. If you're into tabular data, then there's a very good course starting soon. It's on machine learning with Python and scikit-learn, taught by scikit-learn core devs. It's free btw. https://www.fun-mooc.fr/en/courses/machine-learning-python-scikit-learn/

FUN MOOC

Machine learning in Python with scikit-learn

Build predictive models with scikit-learn and gain a practical understanding of the strengths and limitations of machine learning!

serene scaffold Feb 4, 2022, 6:39 PM

#

@brave latch

In [40]: df.groupby('Deal').apply(lambda d: [row for _, row in d.iterrows()])

try that.

#

well, I guess that's still a dataframe

brave latch Feb 4, 2022, 6:40 PM

#

thats the exact schema I need though

serene scaffold Feb 4, 2022, 6:40 PM

#

In [42]: df.groupby('Deal').apply(lambda d: [list(row) for _, row in d.iterrows()]).to_dict()

#

there you go.
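
A minimal sketch of the same grouping without `apply`, assuming a toy frame with a `Deal` column (the data and the `Amount` column are made up for illustration). Iterating over `df.groupby(...)` yields `(key, sub-DataFrame)` pairs, so a dict comprehension may be enough:

```python
import pandas as pd

# Hypothetical stand-in for the real (anonymized) data.
df = pd.DataFrame({"Deal": ["A", "B", "A"], "Amount": [10, 20, 30]})

# Option 1: each unique deal maps to the sub-DataFrame of its rows.
frames_by_deal = {deal: group for deal, group in df.groupby("Deal")}

# Option 2: each unique deal maps to a plain list of row dicts.
records_by_deal = {
    deal: group.to_dict("records") for deal, group in df.groupby("Deal")
}

print(records_by_deal["A"])
```

Keeping the values as sub-DataFrames (option 1) is usually the more convenient shape if more pandas work follows; the list-of-dicts form is closer to what the original loop built.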

brave latch Feb 4, 2022, 6:40 PM

#

amazing

#

you are awesome

#

thanks so much

serene scaffold Feb 4, 2022, 6:41 PM

#

the trick is that df.groupby is like a magical amalgamation of individual dataframes

brave latch Feb 4, 2022, 6:41 PM

#

I tried using groupby

#

but today was my first time

serene scaffold Feb 4, 2022, 6:41 PM

#

and then apply does a function to each of those

brave latch Feb 4, 2022, 6:41 PM

#

and i just wasn't grokking the api

tacit basin Feb 4, 2022, 6:41 PM

#

velvet abyss Oh well

I would argue that you don't need lots of math for applied DS/ML. Libraries like scikit-learn abstract a lot of that away, so you can focus on applying the tools.

brave latch Feb 4, 2022, 6:41 PM

#

I knew I needed some function

#

but couldnt figure out what that function should be

brave latch Feb 4, 2022, 6:51 PM

#

serene scaffold ```py In [42]: df.groupby('Deal').apply(lambda d: [list(row) for _, row in d.ite...

this worked perfectly, appreciate it, and understand how it works now. mind if i ask you to delete the data?

lapis sequoia Feb 4, 2022, 7:19 PM

#

Hello! You won't probably remember but in early december I posted a message asking for help in order to decide a machine learning/optimization algorithm that would solve basketball matches referee assignment. You provided pretty solid answers without knowing the actual datasets to work with. Now that we know the datasets, it turns out that there are so many restrictions to implement ML or optimization algorithms. My coworkers decided to use a rules-based AI algorithm. I've been surfing the net trying to figure out some implementations of this approach but I'm constantly reading posts explaining the differences between rules-based and ML algorithms and so on. I wonder if you know an example of a rules-based AI algorithm so that I don't look like an absolute beginner whenever I have to code things that interact with it.

#

Or even coding it myself😅

serene scaffold Feb 4, 2022, 7:55 PM

#

lapis sequoia Hello! You won't probably remember but in early december I posted a message aski...

rule-based AI is the subset of AI that isn't machine learning. instead of having parameters that are learned from data, someone decides how the output should be determined based on the data.

#

If you hear someone say "AI is glorified if statements", that is the subset of AI that they're referring to.

lapis sequoia Feb 4, 2022, 8:04 PM

#

serene scaffold rule-based AI is the subset of AI that isn't machine learning. instead of having...

I see, I've been searching for some implementations of rule-based but I didn't come across with a realistic example

little crown Feb 4, 2022, 8:11 PM

#

Why is matrix multiplication significantly faster with numpy than with tensorflow? Should it not run faster on my gpu with tensorflow?

iron basalt Feb 4, 2022, 8:32 PM

#

little crown Why is matrix multiplication significantly faster with numpy than with tensorfl...

The GPU has data transfer and setup overhead. Try very large matrices.

#

(like 1024x1024)

little crown Feb 4, 2022, 8:33 PM

#

ok i will try it

iron basalt Feb 4, 2022, 8:34 PM

#

Also the times will be different when you actually do something with the results. Since right now you are just calculating it and throwing it away immediately.

#

(Which has different destruction times, and GPUs are faster if you keep the data there and use it for something else as well)

#

(Avoid going back and forth between the CPU and GPU)

#

Your CPU can calculate many small matrix multiplications before a single small matrix reaches the GPU.

#

(If the matrix data is already in the CPU's local memory)

#

However, with enough of them, it becomes worth it again, but you have to send them all in one batch to the GPU.
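
A rough way to see the size dependence is to time the same product at several sizes. This is only a sketch: the helper name `time_matmul` and the chosen sizes are assumptions, and it measures NumPy on the CPU only.

```python
import time

import numpy as np


def time_matmul(matmul, n, repeats=5):
    """Return the best wall-clock time in seconds for one n x n product."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul(a, b)
        best = min(best, time.perf_counter() - start)
    return best


# Sizes are arbitrary; pick ones that bracket your real workload.
timings = {n: time_matmul(np.matmul, n) for n in (8, 64, 512)}
for n, seconds in timings.items():
    print(f"{n:>4} x {n:<4} {seconds * 1e6:10.1f} microseconds")
```

For a fair GPU comparison, the function passed as `matmul` would need to include copying the inputs to the device and bringing the result back to the host, since that transfer is exactly the overhead described above.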

little crown Feb 4, 2022, 8:43 PM

#

Ok, thank you, now I understand why it took so long.

iron basalt Feb 4, 2022, 8:48 PM

#

lapis sequoia I see, I've been searching for some implementations of rule-based but I didn't c...

They probably want hand-crafted fuzzy logic (with hand-crafted fuzzification functions).

#

It's still "glorified if-statements", but the input can be vague and it's its own programming style (you can make a DSL for it, but don't need to).

lapis sequoia Feb 4, 2022, 8:51 PM

#

iron basalt They probably want hand-crafted fuzzy logic (with hand-crafted fuzzification fun...

Yeah I feel like they are pretending to create an ad hoc algorithm for this use case in particular

iron basalt Feb 4, 2022, 8:52 PM

#

Fuzzy logic can make use of ML later, since the fuzzification / input process can be whatever.

#

I have even seen it slapped on top of spiking neural networks.

safe elk Feb 4, 2022, 8:56 PM

#

lapis sequoia I see, I've been searching for some implementations of rule-based but I didn't c...

See https://www.reddit.com/r/artificial/comments/ziw60/what_was_gofai_and_why_did_it_fail/

r/artificial - What was GOFAI, and why did it fail?

25 votes and 11 comments so far on Reddit

#

Many early AI research projects involved constructing a representation of a domain using first-order logic predicates, or something similar. For example you would have a description of a restaurant domain as follows:
at(restaurant,Alice)
at(restaurant,Bob)
at(restaurant,Carol)
works_at(restaurant,Carol)
has_job(restaurant,waitress,Carol)
orders(Bob,pizza)
orders(Alice,sushi)
along with rules for reasoning about the domain, such as:
forall X,Y,Z. orders(X,Y) and has_job(restaurant,waitress,Z) -> serves(Z,X,Y)
which attempts to encode the rule that if person X orders food Y and Z is a waitress at the restaurant then Z will serve food Y to person X.
From the above representation we can deduce:
serves(Carol,Bob,pizza)
serves(Carol,Alice,sushi)
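
The deduction above can be sketched as naive forward chaining in Python. The tuple encoding of facts and the helper names (`serves_rule`, `forward_chain`) are assumptions for illustration, not anything taken from the quoted post:

```python
# Facts from the restaurant example, encoded as plain tuples.
facts = {
    ("at", "restaurant", "Alice"),
    ("at", "restaurant", "Bob"),
    ("at", "restaurant", "Carol"),
    ("works_at", "restaurant", "Carol"),
    ("has_job", "restaurant", "waitress", "Carol"),
    ("orders", "Bob", "pizza"),
    ("orders", "Alice", "sushi"),
}


def serves_rule(known):
    """orders(X,Y) and has_job(restaurant,waitress,Z) -> serves(Z,X,Y)."""
    waitresses = [
        f[3] for f in known if f[:3] == ("has_job", "restaurant", "waitress")
    ]
    orders = [(f[1], f[2]) for f in known if f[0] == "orders"]
    return {("serves", z, x, y) for z in waitresses for x, y in orders}


def forward_chain(initial_facts, rules):
    """Apply every rule repeatedly until no new facts appear."""
    known = set(initial_facts)
    while True:
        derived = set()
        for rule in rules:
            derived |= rule(known)
        new_facts = derived - known
        if not new_facts:
            return known
        known |= new_facts


conclusions = {f for f in forward_chain(facts, [serves_rule]) if f[0] == "serves"}
print(sorted(conclusions))
```

A hand-written scheduler would have the same shape: facts about the domain, hand-written rules, and a loop that keeps applying them.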

little crown Feb 4, 2022, 9:00 PM

#

iron basalt The GPU has data transfer and setup overhead. Try very large matrices.

Now GPU is faster with a 2000 by 2000 matrix

iron basalt Feb 4, 2022, 9:01 PM

#

Symbolic AI has always had a huge flaw which GPT shares, they both don't anchor symbols / words / etc to physical objects. They have no "world model". In this sense they are both naive algorithms. The real hard work is getting that world model, especially since it's a very complex world we live in, way more complex and messy than any simulation.

#

(GPT is way more efficient than the pure symbolic methods though, so it worked out better in being able to go through way more data and use induction)

#

(Training on text alone will never be enough for an AI to understand language, since language's meaning comes from our physical world)

#

(In addition, not only does it need to train on the real world (or at least a simulation of it), it also needs to be human aligned, in the sense that it needs to assign meaning in the same way we do, we care about certain things that matter to us and therefore label them, it might not care about the same things (where does the chair object begin and end? well for us it begins and ends where it's useful for us, but a computer does not need to sit, so where does it begin and end for it?))

lapis sequoia Feb 4, 2022, 9:06 PM

#

safe elk See https://www.reddit.com/r/artificial/comments/ziw60/what_was_gofai_and_why_di...

I read it, pretty interesting tbh

coarse gale Feb 4, 2022, 9:07 PM

#

I'm grateful, learned the term GOFAI as well as the interesting/educational reasoning about what it was and how it was thought about.

#

Also kind of doubtful about 'you can't learn language through just language' but that's just my intuition and I don't want to derail. 😅

safe elk Feb 4, 2022, 9:08 PM

#

lapis sequoia I read it, pretty interesting tbh

I have a book on that approach at home

#

They use LISP and variants

#

One time a Dutch prof lectured at our uni on some of those topics, fuzzy logic, expert systems and neural net...he kept pronouncing Variables Var eyeable

iron basalt Feb 4, 2022, 9:11 PM

#

coarse gale Also kind of doubtful about 'you can't learn language through just language' but...

It's on topic, but think about this, with text alone, how will an AI ever truly know what a chair is? Can I ask it to simulate one falling over?

#

I can simulate one falling over in my head.

coarse gale Feb 4, 2022, 9:12 PM

#

On AI Dungeon (iirc running on OpenAI DaVinci) it literally described the process for building a chair to completion.
So it seemed to be able to verbalize what parts are and a finished project, inferring process from token inference.
To me that's understanding and I'm pretty sure that's both naive and subjective at the same time.

lapis sequoia Feb 4, 2022, 9:13 PM

#

safe elk One time a Dutch prof lectured at our uni on some of those topics, fuzzy logic, ...

That's interesting

iron basalt Feb 4, 2022, 9:14 PM

#

coarse gale On AI Dungeon (iirc running on OpenAI DaVinci) it literally described the proces...

Being able to chain together words does not mean that it understands what a chair is. It never experienced a chair and never will. It has its own understanding you could say, but when we mean that it understands a chair, it means our understanding, not its own made up universe of text only.

coarse gale Feb 4, 2022, 9:15 PM

#

I think a made-up universe of text (the kind of knowledge that the GOFAI critics pointed out, abstract from anything the computer might be able to understand beyond boiling it down to algebra) can still represent knowledge.
And if you're asking me to argue why I think a representation of knowledge is indistinguishable from knowledge I'm probably being the naive one. I don't really know enough to defend that position, just my instinct. 😅

lapis sequoia Feb 4, 2022, 9:15 PM

#

So just to make sure that I got it right, when we talk about rule-based AI it's nothing less than plain if statements aka glorified if statements

coarse gale Feb 4, 2022, 9:17 PM

#

I guess I'm still stuck on primitive thoughts like even though all maps are inaccurate, Google Maps sure seems useful.

iron basalt Feb 4, 2022, 9:18 PM

#

Yes it can represent knowledge, but it's not language like how humans have language, which is the goal. Humans have language linked to objects like being able to build a chair physically, not just give the instructions as words on how to do it. The issue is that it's only basing its knowledge on the language structure / predictability, which can get you decently far, but it's not enough for some cases, plus if I tell it the word "chair", it can't feel the chair with for example touch, there is no shared sensory organ / word relationship.

#

When it gives you the instructions to build a chair, it's not simulating the chair, it's just regurgitating the instructions that someone typed in at some point (with induction / mixed together responses).

coarse gale Feb 4, 2022, 9:20 PM

#

like how humans have language, which is the goal
Honestly I call myself an NLP enthusiast at best but I never thought this was the goal.
If the computer operates with underlying abstraction layers that are totally different than our meat brains but we can still interrelate complicated language processes to each other, doesn't seem far-fetched to say that the computer's completely insular universe of text was still able to provide usefulness to our real world based one.

iron basalt Feb 4, 2022, 9:21 PM

#

Well if you are into AI/AGI it's the goal, but GPT can be your end goal too, whatever floats your boat.

little crown Feb 4, 2022, 9:21 PM

#

lapis sequoia So just to make sure that I got it right, when we talk about rule-based AI it's ...

A decision tree would be an example of a rule-based system https://en.wikipedia.org/wiki/Decision_tree

Decision tree

A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.
Decision trees are commonly used in operations research, specifically in decis...

coarse gale Feb 4, 2022, 9:22 PM

#

You're correct that it's not AGI but it's still AI 🤔
Like pretend I put a baby in front of GPT and ask it to tell the baby how to build a chair, you're saying if the baby learned how to build the chair that's not AI?

iron basalt Feb 4, 2022, 9:22 PM

#

I would certainly count it as AI.

coarse gale Feb 4, 2022, 9:22 PM

#

Seems to me like possibly we don't disagree then, except on what the goal of NLP is.

lapis sequoia Feb 4, 2022, 9:23 PM

#

little crown Decision tree would be an example for an rule based system https://en.wikipedia...

Yeah cause you can represent the logic behind a decision tree with if statements and so on

iron basalt Feb 4, 2022, 9:23 PM

#

NLP in general could be anything involving natural language, including taking natural language as input and outputting random numbers.

little crown Feb 4, 2022, 9:23 PM

#

Yes exactly 🙂

earnest wadi Feb 4, 2022, 9:24 PM

#

can anyone help me get back propagation working in my program, I understand the calculus, but not how to apply it

iron basalt Feb 4, 2022, 9:24 PM

#

(It's about what the input is)

lapis sequoia Feb 4, 2022, 9:27 PM

#

Thank you so much I feel like I'm fully aware of what rules-based AI is and I think that I can tackle the referee assignment problem myself👌

deft harbor Feb 4, 2022, 9:49 PM

#

earnest wadi can anyone help me get back propagation working in my program, I understand the ...

You are coding backprop in python?

earnest wadi Feb 4, 2022, 9:50 PM

#

deft harbor You are coding backprop in python?

trying to :)

lapis sequoia Feb 4, 2022, 9:50 PM

#

Alright

#

The work week is over

deft harbor Feb 4, 2022, 9:50 PM

#

Are you storing all your weights in numpy at least?

earnest wadi Feb 4, 2022, 9:50 PM

#

deft harbor Are you storing all your weights in numpy at least?

ofc

#

im just not using tf or keras

lapis sequoia Feb 4, 2022, 9:50 PM

#

And I’m ready to stay up like a crack head to finish this project

earnest wadi Feb 4, 2022, 9:51 PM

#

deft harbor Are you storing all your weights in numpy at least?

I have the whole thing pretty much done, I can create a model with some Dense layers and it will feed forward fine, just the back prop i cant get my head around

deft harbor Feb 4, 2022, 9:52 PM

#

What do you have so far

earnest wadi Feb 4, 2022, 9:52 PM

#

wdym

#

towards back prop?

deft harbor Feb 4, 2022, 9:52 PM

#

in terms of code

earnest wadi Feb 4, 2022, 10:01 PM

#

deft harbor in terms of code

okay

#

import numpy as np

class Dense():
    def __init__(self, units, activation):
        self.units = units
        self.activation = activation
        self.type = "Dense"

    def initialise(self, num_inputs):
        self.weights = (np.random.rand(num_inputs, self.units) * 2) - 1
        self.bias = np.random.rand()

    def forward_propagate(self, inputs):
        self.z = np.dot(inputs, self.weights) + self.bias
        self.a = self.activation(self.z)
        return self.a

    def back_propagate(self):
        pass

#

This is my dense layer

#

I have written then deleted attempts at back prop many times
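
One possible shape for `back_propagate`, as a sketch rather than the definitive answer. It assumes things the class above does not have: an `activation_prime` argument (the derivative of the activation), the forward inputs cached on the layer, and a plain gradient-descent update with a made-up `learning_rate`. The layer receives the gradient of the loss with respect to its output and returns the gradient with respect to its input, which is what the previous layer needs.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


class Dense:
    def __init__(self, units, activation, activation_prime):
        self.units = units
        self.activation = activation
        self.activation_prime = activation_prime  # assumption: not in the original

    def initialise(self, num_inputs):
        self.weights = (np.random.rand(num_inputs, self.units) * 2) - 1
        self.bias = np.random.rand()

    def forward_propagate(self, inputs):
        self.inputs = inputs  # cached because the backward pass needs it
        self.z = np.dot(inputs, self.weights) + self.bias
        self.a = self.activation(self.z)
        return self.a

    def back_propagate(self, grad_a, learning_rate=0.5):
        # Chain rule: dL/dz = dL/da * da/dz
        grad_z = grad_a * self.activation_prime(self.z)
        grad_weights = self.inputs.T @ grad_z   # dL/dW
        grad_bias = grad_z.sum()                # the bias is a single scalar here
        grad_inputs = grad_z @ self.weights.T   # dL/d(inputs), uses pre-update weights
        self.weights -= learning_rate * grad_weights
        self.bias -= learning_rate * grad_bias
        return grad_inputs


# Tiny check on a logical OR, with mean squared error as the loss.
np.random.seed(0)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [1.0], [1.0], [1.0]])
layer = Dense(1, sigmoid, sigmoid_prime)
layer.initialise(2)


def mse(pred):
    return float(np.mean((pred - y) ** 2))


loss_before = mse(layer.forward_propagate(X))
for _ in range(200):
    pred = layer.forward_propagate(X)
    grad_a = 2.0 * (pred - y) / len(X)  # derivative of the mean squared error
    layer.back_propagate(grad_a)
loss_after = mse(layer.forward_propagate(X))
print(loss_before, loss_after)
```

With several layers, the model would call `back_propagate` on the last layer first and feed each returned `grad_inputs` into the layer before it.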

gleaming remnant Feb 4, 2022, 10:09 PM

#

Heyy. How can I use matplotlib in VS Code on Mac? I need it to do graphs for a physics project

pure blaze Feb 4, 2022, 10:12 PM

#

I'm trying to predict product sales (of different products in different times) in relation to stock.
I know I can use fbprophet - but I'm not sure how I'd set up a relation between the regressor (the stock) and the sales (the predicted timeseries) so that every time a sale is predicted, the stock is reduced and a new prediction is ran with the new input.

Does anyone know easier ways of doing this using other models? Is there an easier way to do it using fbprophet?

iron basalt Feb 4, 2022, 10:12 PM

#

gleaming remnant Heyy. How can I use matplotlib in Vscode on mac ? I need it to do graphs for a p...

https://code.visualstudio.com/docs/datascience/jupyter-notebooks

Working with Jupyter Notebooks in Visual Studio Code

Working with Jupyter Notebooks in Visual Studio Code.

little ginkgo Feb 4, 2022, 10:42 PM

#

Does anyone know a good way of plotting 2 y axis in pandas?

#

Or 2 x axis

earnest wadi Feb 4, 2022, 11:21 PM

#

can anyone help me get back propagation working in my program, I understand the calculus, but not how to apply it

serene scaffold Feb 4, 2022, 11:38 PM

#

little ginkgo Does anyone know a good way of plotting 2 y axis in pandas?

Data frames are always two dimensional. What do you mean?

little ginkgo Feb 4, 2022, 11:59 PM

#

serene scaffold Data frames are always two dimensional. What do you mean?

Like i have one x but two y data overlayed

serene scaffold Feb 5, 2022, 12:13 AM

#

little ginkgo Like i have one x but two y data overlayed

You can have one dataframe for each xy combination
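
If the goal is one shared x with two differently scaled y series, a sketch using matplotlib's `twinx` together with `DataFrame.plot` could look like this. The column names and values are invented for illustration:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one x column and two y columns on different scales.
df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "temperature": [12.0, 15.5, 14.0, 18.2],
        "rainfall": [80, 20, 55, 5],
    }
)

fig, ax_left = plt.subplots()
ax_right = ax_left.twinx()  # second y axis that shares the same x axis

df.plot(x="x", y="temperature", ax=ax_left, color="tab:blue", legend=False)
df.plot(x="x", y="rainfall", ax=ax_right, color="tab:orange", legend=False)

ax_left.set_ylabel("temperature")
ax_right.set_ylabel("rainfall")
```

pandas also accepts a `secondary_y` argument to `plot`, which does much the same thing in one call; `fig.savefig(...)` or `plt.show()` would display the result.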

lapis sequoia Feb 5, 2022, 12:36 AM

#

how do you impute missing values ><

serene scaffold Feb 5, 2022, 12:37 AM

#

@lapis sequoia what kind of imputation

#

Mean imputation? Mode imputation?

lapis sequoia Feb 5, 2022, 12:41 AM

#

serene scaffold @lapis sequoia what kind of imputation

Im looking to impute Mean for a few and Mode for frequency on others

lapis sequoia Feb 5, 2022, 12:42 AM

#

serene scaffold @lapis sequoia what kind of imputation

I already removed all null values but I want to attempt atleast imputing them before i move on with my model

#

i want to do mode for horsepower

thin palm Feb 5, 2022, 12:46 AM

#

What's up Python gang, when I concat two pandas dataframes with the same number of rows, why does it end up giving me NaN in the last column and adding a row?

lapis sequoia Feb 5, 2022, 12:57 AM

#

not sure

#

can you show a SS of your code?

thin palm Feb 5, 2022, 12:57 AM

#

I know why.. it's because 128 rows and 127 rows are not same length

#

but so weird, how is a row missing from the same data?

lapis sequoia Feb 5, 2022, 12:58 AM

#

u can remove all nulls from ur data set

thin palm Feb 5, 2022, 12:58 AM

#

correct but i've cleaned it all up

#

can I show you what I'm talking about?

lapis sequoia Feb 5, 2022, 12:58 AM

#

sure

#

im honestly not too great but yeah Id love to see

thin palm Feb 5, 2022, 1:00 AM

#

I have this dataset, df, which is 127 rows by 13 columns. The first column, Lender, is going to be one hot encoded (OHE) and I'd like to make columns for each of the unique lender names.

#

Screen_Shot_2022-02-04_at_6.00.46_PM.png

#

ohc = OneHotEncoder()
ohe = ohc.fit_transform(df.Lender.values.reshape(-1,1)).toarray()

dfOneHot = pd.DataFrame(ohe, columns=['Lender_' +str(ohc.categories_[0][i]) for i in range(len(ohc.categories_[0]))])
dfh = pd.concat([df, dfOneHot], axis=1)

#

this is the result

Screen_Shot_2022-02-04_at_6.01.29_PM.png

#

I'd like to drop my Lenders column and put these in instead. Simple concat works but when I do concat on the last line where I set the value equal to dfh it adds another row?

#

Screen_Shot_2022-02-04_at_6.02.32_PM.png

serene scaffold Feb 5, 2022, 1:04 AM

#

@lapis sequoia you can use fillna with the mean. It's slightly more complicated for mode. I'm on mobile so I can't show you

serene scaffold Feb 5, 2022, 1:04 AM

#

lapis sequoia can you show a SS of your code?

Don't normalize sharing screenshots of code. Use the !code command.

thin palm Feb 5, 2022, 1:04 AM

#

serene scaffold Don't normalize sharing screenshots of code. Use the `!code` command.

What does that do

serene scaffold Feb 5, 2022, 1:05 AM

#

thin palm What does that do

Tells you how to post code

thin palm Feb 5, 2022, 1:05 AM

#

okay

lapis sequoia Feb 5, 2022, 1:06 AM

#

!code

#

OOP

serene scaffold Feb 5, 2022, 1:06 AM

#

Right

lapis sequoia Feb 5, 2022, 1:08 AM

#

df['Price'].min().fillna.mean()

serene scaffold Feb 5, 2022, 1:08 AM

#

Put it back

#

I was about to copy it

lapis sequoia Feb 5, 2022, 1:09 AM

#

had an issue there

serene scaffold Feb 5, 2022, 1:09 AM

#

df['Price'].fillna(df['Price'].mean())

lapis sequoia Feb 5, 2022, 1:09 AM

#

Yeah that spat out an error

serene scaffold Feb 5, 2022, 1:09 AM

#

I'm on mobile but I typed it anyway bc I appreciate you

#

Show whole error from traceback

#

Saying that something "caused an error" is opaque.

lapis sequoia Feb 5, 2022, 1:10 AM

#

what you wrote worked

serene scaffold Feb 5, 2022, 1:10 AM

#

Yay

lapis sequoia Feb 5, 2022, 1:10 AM

#

so now i have this small dataframe but

#

df = df['Price'].fillna(df['Price'].mean())

#

OH

#

this should now have imputed average into the missing values. time to check

serene scaffold Feb 5, 2022, 1:12 AM

#

Also you're replacing the whole df variable with one column
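
A sketch of assigning the imputed column back instead of overwriting `df`, with mode imputation alongside. The column names follow the chat; the values are invented. `Series.mode()` returns a Series because ties are possible, so this takes its first entry:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame(
    {
        "Price": [10.0, np.nan, 30.0],
        "Horsepower": [150.0, 150.0, np.nan],
    }
)

# Mean imputation: assign to the column, not to df itself.
df["Price"] = df["Price"].fillna(df["Price"].mean())

# Mode imputation: mode() ignores NaN and may return several values.
df["Horsepower"] = df["Horsepower"].fillna(df["Horsepower"].mode().iloc[0])

print(df)
```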

prime hearth Feb 5, 2022, 1:12 AM

#

it would be great to see the distribution of the data, to make sure that filling in the mean for price doesn't bias it, which can lead to an overfitting model

#

but again this depends on how many nan for price column; if just a few then it no biggy

lapis sequoia Feb 5, 2022, 1:13 AM

#

oh I see.

#

hmmm im still digging myself into a larger hole here.

#

It's way easier to just eliminate all null values

thin palm Feb 5, 2022, 1:15 AM

#

Why is it that I'm One Hot Encoding a DataFrame column with 127 rows but spits out a 126 rows dataframe??

prime hearth Feb 5, 2022, 1:15 AM

#

could be nan in one of the row maybe?

#

or is it all valid data

serene scaffold Feb 5, 2022, 1:16 AM

#

thin palm Why is it that I'm One Hot Encoding a DataFrame column with 127 rows but spits o...

Sounds like that's not going to work too well

thin palm Feb 5, 2022, 1:16 AM

#

serene scaffold Sounds like that's not going to work too well

Can you elaborate on that please?

lapis sequoia Feb 5, 2022, 1:16 AM

#

so youre trying to basically

#

take one column and turn that into x amount of columns instead?

thin palm Feb 5, 2022, 1:17 AM

#

Noooo, there's 34 unique values inside this specific column. So we'll make 34 new columns but why is it losing a row?

#

here's my OHE code:

ohc = OneHotEncoder()
ohe = ohc.fit_transform(df.Lender.values.reshape(-1,1)).toarray()

dfOneHot = pd.DataFrame(ohe, columns=['Lender_' +str(ohc.categories_[0][i]) for i in range(len(ohc.categories_[0]))])
dfh = pd.concat([df, dfOneHot], axis=1)
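
One likely cause, offered as an assumption since the real data is not shown: `pd.concat(..., axis=1)` aligns rows by index label, not by position. If cleaning dropped rows from `df`, its index has gaps, while a DataFrame built from a bare array gets a fresh 0..n-1 index, so the labels no longer match. A small reproduction with made-up data:

```python
import pandas as pd

# Hypothetical data; dropping a row leaves the index as 0, 2, 3.
df = pd.DataFrame({"Lender": ["A", "B", "A", "C"], "Amount": [1, 2, 3, 4]})
df = df.drop(index=1)

# get_dummies keeps df's index, so the rows line up label for label.
one_hot = pd.get_dummies(df["Lender"], prefix="Lender")

# A fresh 0..n-1 index (what a DataFrame built from an array gets) misaligns:
misaligned = pd.concat([df, one_hot.reset_index(drop=True)], axis=1)

# Matching indexes concatenate cleanly.
aligned = pd.concat([df.drop(columns="Lender"), one_hot], axis=1)

print(len(df), len(misaligned), len(aligned))
```

With the `OneHotEncoder` code above, passing `index=df.index` when building `dfOneHot`, or calling `df.reset_index(drop=True)` before encoding, should have the same effect.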

serene scaffold Feb 5, 2022, 1:19 AM

#

thin palm Can you elaborate on that please?

You have to one hot encode each feature separately. But it sounds like you have too many features

thin palm Feb 5, 2022, 1:19 AM

#

serene scaffold You have to one hot encode each feature separately. But it sounds like you have ...

No way. As a programmer we're too robust to be OHE each individual unique value?

serene scaffold Feb 5, 2022, 1:20 AM

#

@thin palm well, you only one hot encode nominal features. I don't know what your features are.

lapis sequoia Feb 5, 2022, 1:24 AM

#

can u not manually introduce another row or something

prime hearth Feb 5, 2022, 1:25 AM

#

instead of one hot encode you can apply feature engineering

#

such as pca or creating a new feature column by grouping similarities

serene scaffold Feb 5, 2022, 1:27 AM

#

Do you understand when you would or wouldn't use one hot encoding?

plush jungle Feb 5, 2022, 2:10 AM

#

can someone help me understand this code?

#

import torch
import torch.nn as nn

class MyRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MyRNN, self).__init__()
        self.hidden_size = hidden_size
        self.in2hidden = nn.Linear(input_size + hidden_size, hidden_size)
        self.in2output = nn.Linear(input_size + hidden_size, output_size)

    def forward(self, x, hidden_state):
        combined = torch.cat((x, hidden_state), 1)
        hidden = torch.sigmoid(self.in2hidden(combined))
        output = self.in2output(combined)
        return output, hidden

    def init_hidden(self):
        return nn.init.kaiming_uniform_(torch.empty(1, self.hidden_size))

#

so RNNs take an input and a hidden state

#

and then they give an output and a hidden state

#

the neural net that produces the output is this:

self.in2output = nn.Linear(input_size + hidden_size, output_size)

#

the neural net that produces the new hidden state is this:

#

self.in2hidden = nn.Linear(input_size + hidden_size, hidden_size)

#

but this makes no sense to me

#

#

in this youtube tutorial explaining RNNs, the food represents the input

#

and the weather represents the hidden state

thin palm Feb 5, 2022, 2:13 AM

#

serene scaffold Do you understand when you would or wouldn't use one hot encoding?

Yes I understand the difference between different encoders and in the case of text we need to One Hot encode so our computer can understand text

serene scaffold Feb 5, 2022, 2:13 AM

#

@plush jungle are you sure? I would expect both to be the input

serene scaffold Feb 5, 2022, 2:14 AM

#

thin palm Yes I understand the difference between different encoders and in the case of te...

It's more complicated than just "understand text"

plush jungle Feb 5, 2022, 2:14 AM

#

serene scaffold @plush jungle are you sure? I would expect both to be the input

oh

#

then where is the hidden state in this diagram

serene scaffold Feb 5, 2022, 2:16 AM

#

Let me check so I don't mislead you. But I'm pretty sure all the nodes in the middle are the hidden state and the food and the weather are features

plush jungle Feb 5, 2022, 2:16 AM

#

serene scaffold Let me check so I don't mislead you. But I'm pretty sure all the nodes in the mi...

that would make total sense

serene scaffold Feb 5, 2022, 2:16 AM

#

Also it might be a while before I can look into it.

plush jungle Feb 5, 2022, 2:16 AM

#

except that the code I posted has two neural net layers
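
A NumPy sketch of what the two layers in the posted class do in one time step, written without PyTorch so the shapes are visible. The sizes, the random weights, and the omission of bias terms are assumptions for illustration. Both layers read the same concatenated vector `[x, hidden]`; one produces the next hidden state, the other produces the output.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 3, 4, 2

# Stand-ins for in2hidden and in2output (biases left out for brevity).
W_hidden = rng.normal(size=(input_size + hidden_size, hidden_size))
W_output = rng.normal(size=(input_size + hidden_size, output_size))


def rnn_step(x, hidden):
    """One time step: both layers see the input joined with the old hidden state."""
    combined = np.concatenate([x, hidden], axis=1)
    new_hidden = 1.0 / (1.0 + np.exp(-(combined @ W_hidden)))  # sigmoid
    output = combined @ W_output
    return output, new_hidden


# Feed a sequence of 5 inputs; the hidden state carries over between steps.
sequence = rng.normal(size=(5, 1, input_size))
hidden = np.zeros((1, hidden_size))
for x in sequence:
    output, hidden = rnn_step(x, hidden)

print(output.shape, hidden.shape)
```

The hidden state is not a separate input feature; it is whatever the previous step returned, which is why the loop passes `hidden` back in each time.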