#data-science-and-ml


lapis sequoia
#

Given more time i'd feel so much more confident but i've got 48 hours to do something i dont have under my grasp

#

just to stay in the program

desert oar
#

i'm sorry you are under such pressure. the market is hot, but not hot enough to warrant making yourself insane for it. tell your family that a professional data scientist said that

#

the market isn't going anywhere. there might be a slump if companies start folding and/or laying people off after the current hype wave

#

but that will be short-lived

serene scaffold
lapis sequoia
#

I just hate that im in this position

#

I just want to turn in something acceptable

desert oar
#

data strategy is becoming a necessity for pretty much all medium+-sized companies, and data science is not going to be automated away for at least a decade

lapis sequoia
#

im not worried about killing the project

desert oar
#

probably more like 5 decades lol

lapis sequoia
#

i want to learn

#

i just want to finish this fucking project and have SOMETHING that says hey i tried

desert oar
#

great

lapis sequoia
#

i can share my last project

desert oar
#

so, what do you currently know?

lapis sequoia
#

if that could show insights to what im capable of better than me explaining it

desert oar
#

sure, that might give us a starting point

lapis sequoia
#

which i got an ASS fucking score on

desert oar
#

but you should also explain anything you've learned between now and then

lapis sequoia
#

I've learned EDA functions with pandas and seaborn for visualizing concepts

#

ill link you my notebook one second...

arctic wedgeBOT
#

Hey @lapis sequoia!

It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

lapis sequoia
#

idk how to get this to yall through here

#

its an ipynb

#

shit like this basically

#

nothing special. just what I have time to put together

#

that was my first project

desert oar
#

ok, fair enough

#

did you get any feedback on why the score was bad?

lapis sequoia
#

Yes ill link SS of it

#

and it sucks too because

#

ill be in "class"

#

and my professor will be trying to correct his own errors for 30+ minutes

desert oar
#

yikes

#

that's really bad

#

i would definitely attempt to ask for your money back somehow

#

maybe go through your credit card company or bank if you have to

#

that's... not worth your time to suffer though, and certainly not your money

#

i'm sorry this was a bad experience

#

so do you know what "linear regression" is?

lapis sequoia
desert oar
#

if it makes you feel better, i had no idea what "differential pricing" meant, and i had to look it up (as per the problem statement)

lapis sequoia
#

my bad. wrong one on top

desert oar
#

ok, and do you understand that feedback?

#

they gave very specific suggestions

lapis sequoia
#

Yes i just have trouble executing the approaches they outline

#

I sincerely wish i had more time under my belt

#

its a quantitative representation of the relationship between two variables

#

Such as brand name and price

#

or mileage and price

#

etc

#

idk how to generate something to predict that though..

desert oar
#

well i think you need to start by understanding the feedback

#

unfortunately a lot of topics in data science heavily depend on prior topics

#

i know it's difficult because you are behind in the class

#

the "better conclusions" comment is interesting - this suggests that you might also be struggling to interpret the stuff you do know how to produce

#

so let's start at the very very beginning. you know what a mean (aka average) is, right? median? standard deviation?

lapis sequoia
#

of course

#

spread from the average

#

high Standard of deviation means more outliers if im not mistaken

desert oar
#

high standard deviation means higher average spread

lapis sequoia
#

low standard is more consolidated around the mean no

desert oar
#

maybe that means outliers, or maybe it means that the data is just spread out

#

but ok, you know that much. good
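
To make the point about spread versus outliers concrete, here is a small sketch using only the standard library. The three lists are made-up numbers chosen for illustration: one tightly clustered, one evenly spread out with no outliers, and one clustered with a single outlier.

```python
from statistics import mean, pstdev

# Made-up example data (not from the assignment)
tight = [9, 10, 10, 10, 11]          # clustered around the mean
spread = [0, 5, 10, 15, 20]          # spread out, but no single outlier
one_outlier = [10, 10, 10, 10, 60]   # clustered, plus one extreme value

for name, values in [("tight", tight), ("spread", spread), ("one_outlier", one_outlier)]:
    # pstdev is the population standard deviation
    print(name, "mean:", mean(values), "std:", round(pstdev(values), 2))
```

Both `spread` and `one_outlier` have a much larger standard deviation than `tight`, so a high value on its own does not say which of the two situations you are in.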

#

and do you know what a "mode" is?

#

do you know what a "frequency table" is?

lapis sequoia
#

not entirely sure what a freq table is

#

i know what the mode is

#

most occurring value

#

I see that it can be one or the other

desert oar
#

yep

#

a frequency table is just a table of how often each value appears in the data

#

so if you have the data H H H T T H T H H T of coin flips, then the frequencies are H: 6 and T: 4

#

in pandas you would invoke the .value_counts() method to compute this on a column in a dataframe

#

you might have seen that one
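
A minimal sketch of the coin-flip frequency table in pandas. The variable name `flips` is just an illustration; the data is the H/T sequence from the message above.

```python
import pandas as pd

# The ten coin flips from the example: H H H T T H T H H T
flips = pd.Series(list("HHHTTHTHHT"), name="flip")

# value_counts() returns how often each distinct value appears
counts = flips.value_counts()
print(counts)
```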

lapis sequoia
#

havent.. im most familiar w

desert oar
#

note that this only makes sense for categorical variables

lapis sequoia
#

oh i see what you mean there

desert oar
#

it makes no sense to compute a frequency table for a continuous variable

#

you'd get all counts of 1, for the most part

#

but you can extend this to two categorical variables, and this is known as a "cross-tabulation", or "crosstab" for short, and it is implemented with pandas.crosstab

lapis sequoia
#

okay

#

So

#

i just implemented this in my last project

#

data.value_counts()

#

just taking a look at it right quick

desert oar
#

good, so you are familiar with that much

lapis sequoia
#

and I see

desert oar
#

i believe this is what they were asking for when they asked for the "proportion of customers" across different categories

lapis sequoia
#

with my repeated values such as

#

for example i have states that sales are made from

#

and those values have the 1

desert oar
#

what do you mean by "those values have the 1"?

lapis sequoia
#

Is there a way to display this cleaner

desert oar
#

ah, you called it on the entire dataframe

#

that applies .value_counts() across all the columns at once, i.e. it counts each unique combination of values in a row

#

as you can see it's quite messy, and the ... means that it's cutting off rows in the middle

#

also, does it make sense to compute .value_counts() on order id?

#

it's not a continuous variable

#

it is clearly categorical

#

but does it make sense to measure the counts of order ids?

#

probably not, unless you are expecting lots of repeated order ids!

#

do you see why that is the case?

#

also in general it's much more useful to look at counts of individual variables: each variable is worth examining on its own before trying to examine their relationships
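
A rough sketch of the difference, assuming a toy table with hypothetical column names (`order_id`, `state`, `ship_mode`) that stand in for the real dataset:

```python
import pandas as pd

# Hypothetical stand-in for the sales data; the real columns may differ
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104, 105],
    "state": ["PA", "OH", "PA", "NY", "PA"],
    "ship_mode": ["Standard", "Standard", "Express", "Standard", "Express"],
})

# On the whole DataFrame: counts unique rows. Because order_id is unique,
# every row is unique and every count comes out as 1.
row_counts = orders.value_counts()
print(row_counts)

# On a single column: the frequency table for that one variable.
state_counts = orders["state"].value_counts()
print(state_counts)
```

Looking at one column at a time is usually the more readable starting point.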

lapis sequoia
#

Im wrapping my head around your comments

#

and let me say; I appreciate immensely your help

desert oar
#

this is what actual professional data scientists do btw, i am not dumbing anything down for you. this is literally how i start every project

lapis sequoia
#

i see the ...

#

it'll cut my rows if i do some

#

data.head() type shit

desert oar
#

it's cutting off rows to avoid dumping a huge amount of data

lapis sequoia
#

i see what you mean about order id

desert oar
#

good!

#

now there are cases when you are interested in the counts of order ids

#

for example, you are given a dataset by some finance person and they claim that every row has a unique order id

#

the first thing you should do with that dataset, is verify that assertion

#

because people make mistakes

#

in that case, you are checking for the counts of order ids to assert that they all are indeed 1

#

otherwise there is a problem w/ the dataset and you need to either 1) make a decision about how to deduplicate rows, or 2) send the data back and tell them to fix it
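
One way this check might look in pandas, as a sketch. The frame and the `order_id` / `amount` column names are invented for the example, and one id is repeated on purpose.

```python
import pandas as pd

# Invented data with order 102 appearing twice
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 102, 104],
    "amount": [20.0, 35.5, 12.0, 35.5, 8.25],
})

# Quick yes/no answer: is every order id unique?
print(orders["order_id"].is_unique)

# Which ids are repeated (count > 1)?
id_counts = orders["order_id"].value_counts()
repeated = id_counts[id_counts > 1]
print(repeated)

# All rows that share an id with some other row, for manual inspection
duplicate_rows = orders[orders["order_id"].duplicated(keep=False)]
print(duplicate_rows)
```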

#

de-duplication is a big serious topic in applied data science and i will assume for the most part that you won't have to do it (and you don't have time to worry about it)

#

but i hope it's clear why order id isn't usually useful for data analysis, in this intro-level scenario

#

so let's pause. given what i said above, how could you improve that one output you generated?

#

give me a couple of ideas

#

and yes i am happy to help, i happen to have a few minutes of free time and i get really frustrated when i see that people got sucked into stuff like this and are struggling

lapis sequoia
#

Im following

#

digesting;

desert oar
#

sure, take your time

lapis sequoia
#

I dont feel incapable of doing or learning these things

#

Its just

#

ends gotta be met and I'm stressed as fuck. but Im following - give me a sec to

#

confused by what you are saying here

desert oar
#

sure, you definitely seem capable under less-bad circumstances. but even smart people have limits of what they can do with limited time. so don't feel bad that you're struggling

lapis sequoia
#

need to understand why 1 is what it is in this situation

desert oar
#

i generally think that most people are capable of learning most things, at least conceptually

lapis sequoia
#

its not clicking entirely

#

im guessing that

#

youre saying that every ID is unique

desert oar
lapis sequoia
#

but if its not - im combing it to ensure that they are indeed unique

#

if theyre repeated then we have an error

desert oar
#

right

lapis sequoia
#

bc each order should be its own

desert oar
#

if any order id is repeated, then it will have a count of >1

lapis sequoia
#

yes!

#

okay so I see the value of the count

#

Understood

desert oar
#

right, but i hope you also see why this is different from "data analysis" as such. this is more like checking that the data is "clean" before trying to work with it

desert oar
#

and therefore why you should treat it separately from variables that you intend to use in the model

lapis sequoia
#

heard and understood

#

its the opposite of scanning missing values almost

#

just looking for repeats where they shouldnt be

desert oar
#

yep! and great, i'm glad you know enough to look for missing values

lapis sequoia
#

Im not a complete idiot I just feel like one because

#

i paid out my ass for this

#

i feel like im disappointing myself and others when im just trying to make shit work out

desert oar
#

i basically spent 2021-2022 so far paying out the ass for things that ended up being kind of a bust, or overpaying for things that i should have paid less for

#

it sucks

lapis sequoia
#

if i didnt care i wouldnt be embarrasing myself on discord

desert oar
#

you are not embarrassing yourself and you shouldn't be embarrassed

#

taking a course and working a full time job is hard enough

#

it's worse when the course is clearly bad and you are unsupported as a student

lapis sequoia
#

I work for a voting machine company and

desert oar
#

and the fact that it's disgustingly expensive is adding insult to injury. i feel you, and no you shouldn't feel bad about asking for help

lapis sequoia
#

basically its crunch time 100% of the time

desert oar
#

that sucks too. nobody should have to work like that

lapis sequoia
#

im really hoping tomorrow we have the day off bc

#

im in pittsburgh rn but we have a huge winter storm coming in

#

praying the county closes the warehouse so i can actually work on this!

#

otherwise ima be doing other shit from 6am to 5pm

#

and lemme tell you trying to do this shit when ur dead exhausted is

desert oar
#

can you take a sick day? don't even get me started on how awful us labor laws are...

lapis sequoia
#

I could but itd compromise my future work

desert oar
#

well that's fucked up in and of itself

lapis sequoia
#

they flew me out here for two weeks

#

so im staying in a hotel and riding airfares on their tab

desert oar
#

ah i see, you're onsite somewhere

#

that's rough for sure

#

and it seems like you work enough hours that even applying for other jobs is a big chore

#

ok, so let me try to help you a bit more and at least you can think about this stuff tomorrow a little, even if you can't get hands on

#

the assignment asked for proportions, i.e. the fraction of data points, which you can always convert to a percentage. you can easily get a % from a count by dividing by the total number of data points. so in the heads/tails example above, you have 10 data points, and therefore you have proportions 60% H and 40% T

#

6/10 and 4/10 are 0.6 and 0.4, i.e. 60% and 40%

#

counts and proportions are basically equivalent, but humans are generally bad at numbers so i like to present both when it isn't cumbersome to present both

#

e.g. "our experiment ran 10 times, and we found 6 heads (60%) and 4 tails (40%)"
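
A small sketch of getting counts and proportions side by side for the coin-flip example. `normalize=True` is a real `value_counts` parameter; the `summary` table is just one possible way to present both numbers together.

```python
import pandas as pd

flips = pd.Series(list("HHHTTHTHHT"))

counts = flips.value_counts()                     # raw counts
proportions = flips.value_counts(normalize=True)  # fractions that sum to 1

# Present both, since counts and percentages answer slightly different questions
summary = pd.DataFrame({
    "count": counts,
    "percent": (proportions * 100).round(1),
})
print(summary)
```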

lapis sequoia
#

I see

desert oar
#

and this is what i was going to say before, about extending a frequency table to two categorical variables:

let's say you are looking at people's clothing, and you are writing down 2 binary variables for each person. (binary means just "yes" and "no", which are represented in python as True and False or 1 and 0, depending on the situation). the two variables in this case are "is the person wearing boots?" and "is the person carrying an umbrella?" so the data might look like this:

boots?   umbrella?
True     False
True     True
False    False
False    True
True     True
False    False
True     False
True     True
False    False

so the crosstab of boots? and umbrella? would look like this:

         umbrella
boots    True  False
True     3     2
False    1     3

that's 9 data points, so you should confirm that the sum of all the numbers in the crosstab is indeed 9
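
The same crosstab can be sketched with `pandas.crosstab`. The column names `boots` and `umbrella` are chosen here to match the example; note that pandas lists `False` before `True`, so the rows and columns come out in the opposite order from the hand-drawn table.

```python
import pandas as pd

# The nine observations from the boots/umbrella example
people = pd.DataFrame({
    "boots":    [True, True, False, False, True, False, True, True, False],
    "umbrella": [False, True, False, True, True, False, False, True, False],
})

table = pd.crosstab(people["boots"], people["umbrella"])
print(table)

# Sanity check: the cells should add up to the number of data points
print(table.to_numpy().sum(), len(people))
```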

#

of course, if you have more than 2 categories in each variable, the crosstab will have more rows or columns

#

and if you have a lot of categories, cross tabs and frequency tables start to get a bit messy and hard to read, in which case you would fall back to other techniques that you probably don't need to worry about right now

#

and of course you can compute proportions for a crosstab too, e.g.:

         umbrella
boots    True  False
True     33%   22%
False    11%   33%

which should add up (approximately) to 100 (in this case it adds up to 99 because of rounding)
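
As a sketch, `pandas.crosstab` can produce these proportions directly with `normalize="all"`, which divides every cell by the grand total. The data is the boots/umbrella example again; pandas orders `False` before `True`.

```python
import pandas as pd

people = pd.DataFrame({
    "boots":    [True, True, False, False, True, False, True, True, False],
    "umbrella": [False, True, False, True, True, False, False, True, False],
})

# Each cell is a fraction of all 9 observations
table_frac = pd.crosstab(people["boots"], people["umbrella"], normalize="all")

# Convert to whole-number percentages for display
table_pct = (table_frac * 100).round(0)
print(table_pct)
```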

#

the crosstab is a "bivariate" analysis, meaning "two variables". whereas the frequency table for a single variable is called "univariate", meaning "one variable".

#

another name for a crosstab that you might see in statistics is a "contingency table"

#

and what's really interesting is that you can recover the frequency table for each variable individually from the crosstab!

         umbrella
boots    True  False   Total
True     3     2     | 5
False    1     3     | 4
         ------------+---
Total    4     5     | 9
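
A sketch of recovering the single-variable frequency tables from the crosstab by summing its rows and columns, using the same boots/umbrella data:

```python
import pandas as pd

people = pd.DataFrame({
    "boots":    [True, True, False, False, True, False, True, True, False],
    "umbrella": [False, True, False, True, True, False, False, True, False],
})

table = pd.crosstab(people["boots"], people["umbrella"])

# Summing across columns gives the frequency table for "boots";
# summing down rows gives the one for "umbrella".
boots_totals = table.sum(axis=1)
umbrella_totals = table.sum(axis=0)
print(boots_totals)
print(umbrella_totals)

# They should match value_counts() on each column (sorted the same way)
print(people["boots"].value_counts().sort_index())
print(people["umbrella"].value_counts().sort_index())
```

`pd.crosstab` also accepts `margins=True`, which appends these totals to the table itself.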
#

maybe just ruminate on that for a while

lapis sequoia
#

ruminating

desert oar
#

feel free to @ me with questions, i need to work on something for a bit

lapis sequoia
#

I appreciate everything man

#

Im going to add you

serene scaffold
#

salt rock lamp is so fucking good

sleek tapir
#

is real analysis important for ml

hollow sentinel
#

i googled it and it says it's not that important, but idk

#

also did not know what real analysis was before you asked

#

sounds like proof-based stuff

iron basalt
# sleek tapir is real analysis important for ml

Yes and no. It might come up, but its benefits are more indirect (but not insignificant). It will teach you how to think and make you more comfortable with mathematics in general. An important skill to have when reading / understanding other's work. It also depends on what you consider to be part of "real analysis".

#

(If you want to really understand how probability works (which can come up if you are doing very experimental ML), then you need it)

desert oar
steel mantle
#

Where is best to get started with data science?
Hope someone mentors me from here

iron basalt
#

So while it's not directly needed, I would still recommend it, just for getting into the right head space.

#

(Also it's fun, if you like math)

sleek tapir
#

wait wat degree did u guys have

#

before going data science

#

or mle

#

im thinking of doing either

#

im from australia

worthy nest
flint grotto
#

hello.

#

can you recommend data science books?

#

for O relly books.

earnest fog
#

search for O’REILLY

#

You should find many books

#

look for the 2019-2021 ones

odd meteor
# sleek tapir wait wat degree did u guys have

Data Science is an interdisciplinary field so alotta people in this field started off from different backgrounds. I have a friend who have a major in Fishery but he's working as a Data Scientist now 😀

In essence, you might as well study Human Kinetics and Sports Education and still end up working as a Data Scientist if you put in the much needed work to learn it.

Notwithstanding, going for a major in Mathematics, Computer Science or Statistics will definitely offer you more options and give you an edge over others.

digital stirrup
#

Hey guys,
I'm trying to use the gpt-3 question answering function.
Anyone have clue how to use it?
For example if I want to create a bot that acts like a real human with same personality like if someone ask him what is age he will answer the same age but in other way

sleek tapir
#

how much do most data scientists in aud

#

stats is hard

#

im doing thesis

sweet sequoia
#

I have tried importing the chess module using pip. It said dependency satisfied but when I use any of it's function, it does not work?

#

please help someone

nova smelt
#

Hey, i am having a weird problem with training a NN. When it is through about 3/4 of training the loss suddenly becomes nan

#

at around the point of the red square. before that it has normal numerical value
any idea why this happens?

eager imp
#

Has any of you any experience in combining genetic programming with ML?

sweet sequoia
odd meteor
# sleek tapir Im stats cs

I'm not sure I understand what 'lm' means though. I have a major in Statistics. And I believe it's not really that hard.

I enjoyed Stats more than Math. I even picked more CS electives than Math electives when I was in school.

Goodluck on your thesis ✌️

sleek tapir
#

i am stats is hard

sleek tapir
#

how is stats not hard theres so much proof writing

#

i struggle in bayesian the most

#

how bout stochastic calculus the list goes on

odd meteor
odd meteor
sleek tapir
#

o lol

#

i'm chinese then

sweet sequoia
sleek tapir
#

just celebrated new year

odd meteor
nova smelt
sleek tapir
#

im struggling in titanic

odd meteor
# sleek tapir how is stats not hard theres so much proof writing

To be honest I ain't gon lie, the proving part is what I love most (especially when it's going well) 😀

You could legit use up 3 sheets of paper to prove a stats equation. Of all the proofing I did I enjoyed Experimental Design, Confounding, and Gambler's Ruin class the most.

Not to say, there are no topic in Stats I really don't enjoy. I particularly don't enjoy sample survey classes lol.

One of the ways to get past what you struggle with is to:

  1. Make friends with the brilliant guys in your class, ask them to help you understand the concepts you struggle with.

  2. Attend after-lecture tutorials (if such exist in your class)

odd meteor
# sweet sequoia how do I check where the library is installed

In your command prompt, search for that chess library in maybe the scripts folder (could be different on your pc) and ensure it's loaded in the right directory where your python is installed.

There are different way to solve this problem tho. Check stackoverflow.com

Are you working on Jupyter Notebook? When you try importing this chess library in JNB do you get any error message?
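
One way to check which interpreter is running and where a module would be imported from, as a sketch using only the standard library. The helper name `where_is` is made up for this example; run it in the same environment (notebook or console) where the import fails.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec is not None else None


# The Python executable running this code; pip must install into this one
print(sys.executable)

# A standard-library module, to show what a successful lookup looks like
print(where_is("json"))

# The library in question; None means this interpreter cannot see it
print(where_is("chess"))
```

If the last line prints `None`, the package was probably installed for a different Python than the one running the code.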

nova smelt
#

Thanks. Tho i was dumb and just realised i had some nan values in my data🤦‍♂️

#

thanks for your help tho
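
A quick pre-training check for this kind of problem, sketched with NumPy. The `features` array is invented for illustration; dropping the affected rows is only one possible way to handle them.

```python
import numpy as np

# Invented feature matrix with one missing value
features = np.array([
    [0.5, 1.2],
    [np.nan, 0.7],
    [0.3, 0.9],
])

# Is there any NaN anywhere in the data?
has_nan = np.isnan(features).any()
print(has_nan)

# Which rows contain at least one NaN?
bad_rows = np.isnan(features).any(axis=1)
print(bad_rows)

# Simplest fix: drop those rows (imputing is another option)
clean = features[~bad_rows]
print(clean.shape)
```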

sleek tapir
#

for ml

#

is ai/ml better for maths degrees

#

than cs degrees

sour spindle
#

Yeah i split it into train test and validation. The apple one is more reliable since it has more data and i trained it using the same parameters.

#

It isnt a forecast. Its a generated trading strategy

#

I used around 13 different indicators to get the signals

#

Here is the apple ticker used only on test data which wasnt used in the model

#

The accuracy when matching up with the 1s and 0s of the position signal method i used is around 72% on the unseen test data which is shown above

sour spindle
#

Tesla went public around 2010 where the graph started and ended in 2022 and the graph includes the training testing and validation data

mild dirge
mild dirge
sour spindle
sour spindle
#

It wasnt used in validation either

mild dirge
#

how does the model perform on test data that does not increase heavily over a long time?

sour spindle
#

I am trying to update the y data generation to get better results

mild dirge
#

It's just pretty hard to get reliable tests on these types of prediction models. It looks like a fun project though, but I wouldn't be so sure about the performance by testing it on a few test sets.

#

It's still impressive btw not trying to belittle you or anything

mild dirge
#

I wouldn't know yet, ask the question first 😛

#

Please don't dm me. And think about why one would normalize data and if the amount of figures would be relevant.

warm raven
#

Hello I am trying to combine the results of a couple masks to make another column in my dataframes indicating whether revenue is recurring or non recurring

#

```python
        pcn_mask = x['prod_code_name'].isin(gfs['prod_code_name']).any()
        #print(pcn_mask)
        pidmap_mask = x['PRODUCT_ID_MAP'].iloc[i].isin(gfs['PRODUCT_ID_MAP']).any()
        #print(pidmap_mask)
        sector_mask = x['Sector'].isin(gfs['Sector']).any()
        #print(sector_mask)

        relevant_product_codes = ["Usf33", "Usf34", "Us756", "Usf37", "Usf40", "Usf29"]
        product_code_mask = gfs['Product_Code'].isin(relevant_product_codes).any()
        #print(product_code_mask)

        relevant_company_codes = ["Us05", "Us1b", "Usm6"]
        company_code_mask = gfs['Company_Code'].isin(relevant_company_codes).any()
        #print(company_code_mask)

        all_masks = (
            (pcn_mask and pidmap_mask and sector_mask) and (product_code_mask or company_code_mask)
        ).all()

        return all_masks
```
#

I keep getting the following error when calling the function “ ‘str’ object has no attribute ‘isin’ “

#

Could I use just ‘in’ and still get the same comparison?
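
On the `in` question, a small sketch with made-up data: `isin` is an element-wise method on a `pandas.Series`, while `in` applied to a Series tests the index labels rather than the values, so it is easy to get a silent `False`.

```python
import pandas as pd

names = pd.Series(["alpha", "beta", "gamma"])
allowed = pd.Series(["beta", "delta"])

# Element-wise membership: one True/False per row of `names`
print(names.isin(allowed).tolist())

# .iloc[i] on a column of strings gives a plain str, which has no .isin
one_value = names.iloc[1]
print(type(one_value))

# `in` on a Series checks the index (0, 1 here), not the values
in_series = one_value in allowed
print(in_series)

# To test a single value against the values, go through .values or a set
in_values = one_value in allowed.values
print(in_values)
```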

dusk tide
#

Hello guys I have a doubt that
In the cost function of linear regression we are multiplying the SSE by 1/(2m).
So what's the use of doing 1/m ie. the average??
If I will not do this then what will happen??

prime hearth
#

that's a good question. the 1/m is there for averaging: without it the cost grows with the number of data points, so taking 1/m removes that dependency on dataset size and makes the cost easier to compare and work with

#

the 1/2 is, i believe, there to make the math easier when we take the partial derivative: it cancels the 2 that comes from differentiating the square. it's a constant, so it rescales the loss but doesn't change where the minimum is

soft viper
#

in apriori, which one do i value more. Confidence or lift?

prime hearth
#

Also, the full derivation of these formulas shows where those factors come from, namely the Gaussian likelihood. You can check out the link below, or google the MAP derivation of linear regression, which is an alternative to the traditional linear regression approach:
https://math.stackexchange.com/questions/884887/why-divide-by-2m
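
The scaling argument can be checked numerically. This is a sketch with made-up data and a deliberately simplified model (a single slope, no intercept), searched over a grid rather than by gradient descent.

```python
import numpy as np

# Made-up data, roughly y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
m = len(x)

# Candidate slopes for the model y_hat = w * x
slopes = np.linspace(0.0, 4.0, 401)

sse = np.array([np.sum((w * x - y) ** 2) for w in slopes])  # sum of squared errors
mse = sse / m                                               # averaged over m points
half_mse = sse / (2 * m)                                    # the 1/(2m) cost

# All three costs are minimised by the same slope
print(slopes[np.argmin(sse)], slopes[np.argmin(mse)], slopes[np.argmin(half_mse)])

# Doubling the dataset doubles the SSE but leaves the averaged cost unchanged
x2, y2 = np.tile(x, 2), np.tile(y, 2)
w = 2.0
sse_small, sse_big = np.sum((w * x - y) ** 2), np.sum((w * x2 - y2) ** 2)
print(sse_small, sse_big, sse_small / m, sse_big / (2 * m))
```

Dividing by a positive constant never moves the minimum; it only changes the scale of the cost, which matters when comparing datasets of different sizes or picking a learning rate.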

warm raven
mild dirge
# sour spindle what?

Someone asked if I could help with a question but didn't ask their question, wasn't in response to you

#

they removed their message

mild dirge
#

oh nw lol

soft peak
#

Hi, im having a problem. My code is supposed to highlight the nuclei of several blood cells of specific animals and give me their diameter and radius, which will be used later. The code works fine until it stops and ends at this error

#

heres the code itself

#

ive ran my code through cmd on admin but it still doesnt work, and it does run for a couple seconds before the code just stops working

#

any form of help is appreciated

mellow vapor
#

So basically I m trying to work my way through projects on ml and ai
I did a few projects on them bt the hands on projects usually don't bother with the mathematics and understanding level
Like for neural networks in tensorflow and keras they build models based on layers like dense and models like sequential
With optimisers like adam
So are there any courses which actually explain these things in some detail so that atleast i can judge things by myself
And know when to use what and how to use them actually

mild dirge
#

You'd probably want to start with a basic course on linear algebra

#

and statistics

#

and after that try to use this information for understanding a machine learning/ neural networks course

serene scaffold
#

"Data Science from Scratch" is a book that I recommend in that it goes over some of the fundamentals that PcCamel just mentioned

#

if you are a student, I would see if you can get the ebook through your library.

mellow vapor
#

@mild dirge I think I do understand the basics of linear algebra and statistics plus a bit calculus but i need some good machine learning courses i guess

#

I did take andrew ngs coursera course on machine learning and it was quite great

#

He explained the mathematical ideas along with the implementation sections

#

@serene scaffold oh that's great I will try to get that book and go through it. Thanks!

serene scaffold
#

it might actually not be advanced enough if you feel comfortable with the material in the andrew ng course

#

I'm not really sure what to suggest tangerine_think

mellow vapor
#

@serene scaffold it was difficult for me to implement those things in octave bt the videos were good

#

Like any material which can explain things like neural nets in a similar fashion

serene scaffold
warm raven
# serene scaffold `isin` is a method of a `pandas.Series`, but you're using it on a string. Which ...
```
AttributeError                            Traceback (most recent call last)
C:\Users\PBWEWU~1\AppData\Local\Temp/ipykernel_9948/2090180030.py in <module>
      1 x = []
----> 2 x = get_rec_value(pipe_short)

C:\Users\PBWEWU~1\AppData\Local\Temp/ipykernel_9948/3247770990.py in get_rec_value(x)
      4 
      5     for i in range(x.shape[0]):
----> 6         pcn_mask = x['prod_code_name'].iloc[i].isin(gfs['prod_code_name']).any()
      7         #gfs['prod_code_name'] == x['prod_code_name']
      8         print(pcn_mask)

AttributeError: 'str' object has no attribute 'isin'
```
mint palm
#

In Resnets, do we really skip calculation of a[l+1] layer??

serene scaffold
hollow sentinel
#

hey i'm having a very strange problem

#

i cannot graph simple data anymore on my jupyter notebook

#
import pandas as pd
from matplotlib import pyplot as plt

plt.style.use("seaborn")


x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1]




colors = [7, 5, 9, 7, 5, 7, 2, 5, 3, 7, 1, 2, 8, 1, 9, 2, 5, 6, 7, 5]

plt.scatter(x, y, s=100, c="green", edgecolor = "black", linewidth = 1, alpha=0.75)

plt.show()
#

i keep getting a "dead kernel" error

#

what can i do to fix this?

serene scaffold
#

the problem is unrelated to your code; the jupyter environment has stopped.

hollow sentinel
#

can i do that with restart and run all code?

#

bc i tried that

#

and it still wouldn't work

serene scaffold
#

did you get a more substantial error message than "dead kernel"? or is that the only text that displayed?

hollow sentinel
#

"The kernel appears to have died. It will restart automatically."

#

this error just constantly pops up

#

shit, is it related to the millions of rows of data i was dealing w before for that internship?

#

ugh

#

did i melt my computer?

serene scaffold
#

I googled that error message, and it appears that there's a few possible causes. try looking at them to see if any relate to something you're doing.

hollow sentinel
#

should i try hitting ctrl + C on the terminal, closing out of the conda environment, and then opening it up again?

serene scaffold
hollow sentinel
#

i see

serene scaffold
hollow sentinel
#

i am beginning to not like conda too

#

i don't like how my code is in sep cells and i have to run the entire thing over and over

#

ik there is a restart and run all code thing

serene scaffold
hollow sentinel
#

oh

serene scaffold
#

but yes, jupyter notebooks are overused among data scientists as well.

hollow sentinel
#

yeah i see why you dislike it now

serene scaffold
#

I'm so proud lemon_hyperpleased

hollow sentinel
#

i'm gonna do some more googling and figure out what's going on

#

i think i wanna switch to sublime text

serene scaffold
#

by the way, if you want a nice environment for quickly testing stuff, but don't want the false sense of reproducibility that you get from jupyter notebooks, try python -m IPython

#

it's basically the regular python console, but with lots of quality-of-life features

hollow sentinel
#

python -m IPython in the terminal?

serene scaffold
#

yes

hollow sentinel
#

i will check it out

serene scaffold
#

if you have jupyter, you should already have it

#

jupyter is basically IPython but with a gui. and cells. ||and sadness||

hollow sentinel
#

also don't get angry but i have been using excel lately

serene scaffold
#

use pandas.

hollow sentinel
#

ik ik ik

#

but for some reason it's like putting honey out for bees

#

for recruiters

serene scaffold
#

yeah, I had excel on my resume

hollow sentinel
#

ok, so this is strange printing hello world in a notebook prints hello world

serene scaffold
#

but I've never been asked to use it, so I'll probably delete it if I ever job hunt again

hollow sentinel
#

don't think you gotta job hunt for a while w that mitre job you got now

#

congrats on that btw

serene scaffold
#

thx

hollow sentinel
#

ok, so strangely this particular notebook will not produce the intended behavior even tho a separate notebook with me printing hello world will

#

lemme see if i can just copy paste it into another notebook for funsies

serene scaffold
#

f u n s i e s

hollow sentinel
#

ok, now i'm stumped

#

is there something wrong with the import statements?

#
import pandas as pd
from matplotlib import pyplot as plt

plt.style.use("seaborn")


x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1]




colors = [7, 5, 9, 7, 5, 7, 2, 5, 3, 7, 1, 2, 8, 1, 9, 2, 5, 6, 7, 5]

plt.scatter(x, y, s=100, c="green", edgecolor = "black", linewidth = 1, alpha=0.75)

plt.show()
serene scaffold
#

looks fine to me

#

oh, you can also do python -m IPython --matplotlib

#

and it will show figures in a separate window when you .show them

#

try that

hollow sentinel
#

interesting, command not found

#

typo?

#

oh yes it is a typo lol

#

"no module named IPython"

serene scaffold
#

pip install IPython, I guess

hollow sentinel
#

sudo pip install ipython

#

yeah

#

oh quick question

#

what's the diff b/w sudo and sudo pip

#

i never asked before

serene scaffold
#

sudo and pip are unrelated. sudo means "super user do"

hollow sentinel
#

so does it do anything special?

serene scaffold
#

you put it before commands that are restricted to administrators. on your own computer, you presumably have that. whereas on a production system, it's usually limited to only a few people.

hollow sentinel
#

oh, so you don't necessarily need it for your own personal computer?

#

!pastebin

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

serene scaffold
#

you probably don't need it for pip.

hollow sentinel
warm raven
hollow sentinel
#

yeah, i have no idea what's going on here

warm raven
#

gfs is another dataframe that holds those product and company codes

serene scaffold
hollow sentinel
#

am i so dead inside that i spelled IPython wrong?

#

no, i didn't

warm raven
serene scaffold
#

it's case sensitive

warm raven
#

the iloc i is 100% a string

serene scaffold
#

you did python -m Ipython --matplotlib

#

has to be IPython

serene scaffold
hollow sentinel
#

wait what was the original command? pip install IPython?

warm raven
#

I’m looking to rewrite this function in some type of way so that I could use a ‘.apply’ to create a new column that has whether the result of these masks are true or false

hollow sentinel
#

python -m IPython --matplotlib

#

hey, it worked

#

i'm not used to seeing code in my own terminal lol

#

this is cool

serene scaffold
#

can you show all the dataframes involved here with print(df.head().to_dict('list'))? that way I can copy and paste them directly.
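A minimal sketch of why that print is handy, assuming a small made-up frame (the name `pipeline` is borrowed from later in the chat; the values are hypothetical). Swap `df` for whatever your own variable is called; the printed dict can be pasted straight back into `pd.DataFrame` to rebuild the same rows.

```python
import pandas as pd

# Hypothetical stand-in for one of the real dataframes.
pipeline = pd.DataFrame({
    "PRODUCT_ID_MAP": ["a1", "b2", "c3"],
    "Sector": ["Retail", "Energy", "Retail"],
})

# What the asker pastes into chat.
snippet = pipeline.head().to_dict("list")
print(snippet)

# What the answerer does with it: rebuild the frame from the pasted dict.
rebuilt = pd.DataFrame(snippet)
```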

hollow sentinel
#

oh yeah do not mess w apply

warm raven
#

I’m not following

serene scaffold
hollow sentinel
#

sometimes

#

yeah

warm raven
#

Why do you need to know the dataframes involved?

#

I tried that command and got an error because I don’t have a dataframe named “df”

serene scaffold
# warm raven I’m not following

I can't help you with dataframe operations unless I know what is actually in the dataframes that you're working with, because every dataframe is different. the columns, their names, what types of data they have. if I don't know that, there's nothing I can do.

serene scaffold
#

in "normal python", this isn't usually necessary, since "nums is a list of ints" pretty much tells you anything you'd need to know. not as simple with dataframes.

hollow sentinel
#

i'm confused on when you have to do reassignments with dataframes

#

and when you don't

serene scaffold
hollow sentinel
#

modifies in place would mean that i would have to reassign it, right?

#

or no

#

i actually don't know the answer to that

#

🥲

serene scaffold
#
stuff = [1, 2, 3]
stuff.append(5)

list.append modifies a list in-place and returns None
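To contrast that with dataframes, here is a small sketch using a made-up frame: most pandas methods such as `dropna` return a new object and leave the original alone, so the result has to be assigned if you want to keep it.

```python
import pandas as pd

stuff = [1, 2, 3]
result = stuff.append(5)   # mutates stuff in place and returns None

df = pd.DataFrame({"a": [1.0, None, 3.0]})
df.dropna()                # returns a new frame; df itself is unchanged
unchanged_len = len(df)    # still 3 rows
df = df.dropna()           # reassign to keep the filtered result
```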

hollow sentinel
#

i see

#

oh, so i was right

#

i had a feeling i was right i saw a bunch of leetcode problems w solving things in place

warm raven
#

output is too long

serene scaffold
arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

hollow sentinel
#
...:  
   ...: plt.style.use('seaborn') 
   ...:  
   ...: x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1] 
   ...: y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1] 
   ...:  
   ...:  
   ...: # colors = [7, 5, 9, 7, 5, 7, 2, 5, 3, 7, 1, 2, 8, 1, 9, 2, 5, 6, 7, 5] 
   ...:  
   ...: # sizes = [209, 486, 381, 255, 191, 315, 185, 228, 174, 
   ...: #          538, 239, 394, 399, 153, 273, 293, 436, 501, 397, 539] 
   ...:  
   ...: # data = pd.read_csv('2019-05-31-data.csv') 
   ...: # view_count = data['view_count'] 
   ...: # likes = data['likes'] 
   ...: # ratio = data['ratio'] 
   ...:  
   ...: # plt.title('Trending YouTube Videos') 
   ...: # plt.xlabel('View Count') 
   ...: # plt.ylabel('Total Likes') 
   ...:  
   ...: plt.tight_layout() 
   ...:  
   ...: plt.show()                                                              

In [2]: Segmentation fault: 11 
#

the file isn't there

serene scaffold
#

what file

warm raven
hollow sentinel
#

i ean

#

mean

#

there is no file

#

it's just plotting from two lists alone

#

one as the x and one as the y

serene scaffold
hollow sentinel
#

actually i'm dumb

serene scaffold
#

just seeing code that has dataframes in them doesn't give me enough information to know what to do.

hollow sentinel
#

hang on

#

ugh i just wanna get sublime

#

gonna check out schafer

#

brb

warm raven
serene scaffold
#

one moment

#

@warm raven

In [16]: pipeline['PRODUCT_ID_MAP'].isin(gfs['PRODUCT_ID_MAP'])
Out[16]:
0    False
1    False
2    False
3    False
4    False
Name: PRODUCT_ID_MAP, dtype: bool

In [17]: pipeline['PRODUCT_ID_MAP'].isin(gfs['PRODUCT_ID_MAP']).any()
Out[17]: False
#

see how isin returns a boolean Series?

#

you don't want to do it for individual values, as that won't work.

warm raven
#

right so should I use IN?

serene scaffold
#

no, you should restructure the solution to use isin

warm raven
#

how would I do that though to achieve getting the result value for every row, or essentially creating a new column in the dataframe for the result of “all_masks”

serene scaffold
#
all_masks = (
            (pcn_mask and pidmap_mask and sector_mask) and (product_code_mask or company_code_mask)
        ).all()

this won't work because you can't use and and or for pandas objects. you have to use the & and | operators.
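A short sketch of the difference, using a made-up Series. `and`/`or` ask for the truth value of the whole object, which pandas refuses to guess; they only appear to work once each mask has already been reduced to a single bool (for example with `.any()`).

```python
import pandas as pd

s = pd.Series([1, 5, 10])

try:
    (s > 2) and (s < 8)          # asks for the truth value of a whole Series
except ValueError as exc:
    error_text = str(exc)        # pandas reports the truth value as ambiguous

both = (s > 2) & (s < 8)         # element-wise AND, one bool per row
either = (s < 2) | (s > 8)       # element-wise OR, one bool per row
```

Note the parentheses around each comparison: `&` and `|` bind tighter than `>` and `<`.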

warm raven
#

I disagree

#

I was stuck on this about a week or so ago

#

I was using bitwise operators

serene scaffold
#

you disagree. I said that you can't use and and or for pandas objects, and that is a fact.

warm raven
#

was getting errors until i switched over to and and r

#

or*

#

listen i’m not trying to be rude i’m telling you what I’ve tried

serene scaffold
#

well, chaining bitwise operators with pandas objects works, but each comparison needs its own parentheses because & and | bind tighter than == and <, so that might be why you were having an issue

#

another option is to concatenate the Series into a DataFrame and use any or all.

warm raven
serene scaffold
#

no

warm raven
#

okay sorry I re-read a bit and I see our discrepancy

#

I accidentally sent an old snippet, although my error is the same

hollow sentinel
#
[Finished in 2.2s with exit code -11]
[shell_cmd: python -u "/Users/rahuldas/Desktop/Project Folder Sublime Text/matplotlibstuff.py"]
[dir: /Users/rahuldas/Desktop/Project Folder Sublime Text]
[path: /usr/bin:/bin:/usr/sbin:/sbin]
#
import pandas as pd 
from matplotlib import pyplot as plt 

plt.style.use("seaborn")


x = [5, 7, 8, 5, 6, 7, 9, 2, 3, 4, 4, 4, 2, 6, 3, 6, 8, 6, 4, 1]
y = [7, 4, 3, 9, 1, 3, 2, 5, 2, 4, 8, 7, 1, 6, 4, 9, 7, 7, 5, 1]


plt.tight_layout() 

plt.show()
#

um i googled it, idk what exit code -11 is
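For what it's worth, a negative exit code reported this way usually means the process was killed by the signal with that number, so -11 lines up with the earlier "Segmentation fault: 11" rather than with anything in the script itself. A tiny sketch, assuming a POSIX system (macOS or Linux):

```python
import signal

exit_code = -11                      # value reported in the build output
sig = signal.Signals(-exit_code)     # look up signal number 11
print(sig.name)                      # SIGSEGV on macOS and Linux
```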

warm raven
hollow sentinel
#

should i go grab a help channel

#

imma go do that so you can help this guy

warm raven
#

my fault bro

warm raven
serene scaffold
#
pcn_mask = x['prod_code_name'].isin(gfs['prod_code_name']).any()
pidmap_mask = x['PRODUCT_ID_MAP'].isin(gfs['PRODUCT_ID_MAP']).any()
sector_mask = x['Sector'].isin(gfs['Sector']).any()

first = pd.concat(
    (pcn_mask, pidmap_mask, sector_mask),
    axis=1
).all(axis=1)

product_code_mask = gfs['Product_Code'].isin(["Usf33", "Usf34", "Us756", "Usf37", "Usf40", "Usf29"]).any()
company_code_mask = gfs['Company_Code'].isin(["Us05", "Us1b", "Usm6"]).any()

second = product_code_mask | company_code_mask
return first & second
#

I think this is the solution but I didn't test it.

lapis sequoia
#

Hey again

serene scaffold
#

added axis=1 to .all in one of them

lapis sequoia
#

if I’m working on this linear regression model. Do I need to remove null values from my set ?

lapis sequoia
#

okay

#

I’m going to try and get this shit running here within the next day or so

#

Well I have to essentially finish it today smfh

warm raven
#

“cannot concatenate object of type ‘<class ‘numpy.bool_’>’; only Series and DataFrame objs are valid”

serene scaffold
#

oh, I see the problem

warm raven
#

I fixed it

#

so your function works

#

but it still does not do exactly what I’ve been asking for

serene scaffold
warm raven
#

i’ve made a short dataframe to test it with, it’s still returning one result

#

The short dataframe has 5 rows

serene scaffold
#

it occurs to me that all the calls to .any() are probably wrong.

#

since if you call any or all on a series, that reduces it to a stand-alone bool
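A sketch of what dropping the `.any()` calls might look like. The two frames below are hypothetical miniatures, and two things are assumptions rather than facts from the chat: that `Product_Code` and `Company_Code` live on the input frame `x`, and that each column can be checked against `gfs` independently.

```python
import pandas as pd

# Hypothetical miniature versions of the two frames.
gfs = pd.DataFrame({
    "prod_code_name": ["alpha", "beta"],
    "PRODUCT_ID_MAP": ["p1", "p2"],
    "Sector": ["Retail", "Energy"],
})
x = pd.DataFrame({
    "prod_code_name": ["alpha", "gamma", "beta"],
    "PRODUCT_ID_MAP": ["p1", "p1", "p2"],
    "Sector": ["Retail", "Retail", "Mining"],
    "Product_Code": ["Usf33", "Usf33", "zzz"],
    "Company_Code": ["Us05", "nope", "Us05"],
})

# No .any() here: each mask stays a boolean Series with one value per row of x.
pcn_mask = x["prod_code_name"].isin(gfs["prod_code_name"])
pidmap_mask = x["PRODUCT_ID_MAP"].isin(gfs["PRODUCT_ID_MAP"])
sector_mask = x["Sector"].isin(gfs["Sector"])
product_code_mask = x["Product_Code"].isin(["Usf33", "Usf34"])
company_code_mask = x["Company_Code"].isin(["Us05", "Us1b"])

# & and | combine the masks row by row; the result becomes a new column.
x["all_masks"] = (pcn_mask & pidmap_mask & sector_mask) & (
    product_code_mask | company_code_mask
)
```

Because nothing is reduced to a single bool, there is one result per row and no need for `.apply` or `iloc`.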

warm raven
#

yeah makes sense

#

that’s why I had the iloc in there initially thinking I’d compare the row values

#

or rather that one row of the input dataframe to compare against every row of gfs

lapis sequoia
#

whats the best way to go about

#

removing null values

#

after running my df.isna().sum()

#

S.No. 0
Name 0
Location 0
Year 0
Kilometers_Driven 0
Fuel_Type 0
Transmission 0
Owner_Type 0
Mileage 2
Engine 46
Power 175
Seats 53
New_Price 0
Price 1234
dtype: int64

hollow sentinel
#

"Sublime Text is not a Python Package installer, just a text editor. With it, you can edit a python script. When you are done editing, you just launch your script using python script.py"

#

shit bro

#

i might just use spyder

#

i'll mess around w spyder when i have the time

#

rn i got bigger fish to fry

desert oar
lapis sequoia
desert oar
#

sometimes you just want to drop those rows entirely

#

other times it makes sense to "impute" a value - basically replace the null with an educated guess

#

missing data imputation is a huge field too, and something that you don't want to spend a lot of time on right now probably

#

i assume this was at least mentioned in your course?

lapis sequoia
#

Imputing was but

#

Right now we've covered preprocessing but I havent had that much time to dive in

#

What im HOPING for is that we get this freeze here in pittsburgh so i can work tomorrow

desert oar
#

well what did they talk about with respect to imputing data? the most basic choices include filling the missing values with the mean, median, or mode

#

usually missing data imputation is a matter of understanding what the data means and where it comes from, and letting that guide you to a sensible approach
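A minimal sketch of the basic choices mentioned above, on a made-up slice of used-car data (the values are hypothetical; `Seats` and `Power` are column names from the `isna()` output earlier). Which statistic is sensible depends on what the column means.

```python
import pandas as pd

# Hypothetical slice of the used-car data with gaps.
cars = pd.DataFrame({
    "Seats": [5.0, None, 7.0, 5.0],
    "Power": [74.0, 88.5, None, 120.0],
})

# Seat counts are discrete, so the most common value is a reasonable guess.
cars["Seats"] = cars["Seats"].fillna(cars["Seats"].mode()[0])

# Power is continuous; the median is less sensitive to extreme cars than the mean.
cars["Power"] = cars["Power"].fillna(cars["Power"].median())
```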

lapis sequoia
#

Yeah... theres a few examples ive found where

#

its missing 3 or more values

desert oar
#

you might want to think a little about your actual task

thin palm
#

What's up Data Science gang, I have a question about concat in Pandas

#

when I concat two data frames why does the final column in the data frame I connect have NaN?

desert oar
#

@lapis sequoia ultimately you need to come up with some kind of "willingness to pay" estimate based on different attributes of the car, and use that to segment customers into 2 different price tiers. that's what "differential pricing" is

thin palm
#

for example:

#

trying to one hot encode "City" in our original DataFrame known as "Feature" but when I concat the one hot encoded dataframe it produces the NaN?

#

any thoughts?

lapis sequoia
#

predicting the price of a used cars rn

desert oar
#

in general, if you get unexpected missing values after a concat operation, it's because either the column names or row index labels don't match up
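One way this can happen, sketched with made-up data (the actual frames in the screenshots are unknown, so this is only a guess at the cause): the original frame has index labels 10, 11, 12, for instance after rows were filtered out, while the one-hot frame is numbered 0, 1, 2. `pd.concat(..., axis=1)` aligns on labels, so nothing lines up.

```python
import pandas as pd

feature = pd.DataFrame({"City": ["Pune", "Delhi", "Pune"]}, index=[10, 11, 12])
dummies = pd.get_dummies(feature["City"]).reset_index(drop=True)  # index is now 0, 1, 2

# Labels 10..12 and 0..2 never match, so every row is half NaN.
misaligned = pd.concat([feature, dummies], axis=1)

# Giving both frames the same index fixes the alignment.
aligned = pd.concat([feature.reset_index(drop=True), dummies], axis=1)
```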

#

but it's pretty hard to debug someone else's screenshots

thin palm
desert oar
#

example data + runnable code are ideal

main fox
#

@desert oar Could you help me see if my understanding that a decision tree model performed better than a logistic regression model is correct?
I already trained both models and measured for precision, plus made a confusion matrix

desert oar
desert oar
#

i can answer if it's quick

main fox
#

It's okay if you're logging off though

lapis sequoia
#

Fuck man

#

just trying to figure out

#

how to remove any rows

#

with missing values

main fox
lapis sequoia
#

how can i take that result

#

and call that as my dataframe going forward

#

would It be x = df.dropna()

#

then call that going forward?

desert oar
#

yes, or you can do df = df.dropna()

#

in addition to df.dropna(), consider the general pattern for filtering rows:

row_is_ok = # do some operation that returns a boolean Series, one value per row
df = df.loc[row_is_ok].copy()
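A runnable sketch of both approaches on a made-up frame; the "price must be positive" rule is just an example of a row condition, not something from the assignment.

```python
import pandas as pd

df = pd.DataFrame({
    "Price": [4.5, None, 12.0, 0.0],
    "Mileage": [18.2, 20.1, None, 15.0],
})

# Option 1: drop every row that has any missing value.
dropped = df.dropna()

# Option 2: build a boolean Series (one value per row) and filter with it.
row_is_ok = df["Price"].notna() & (df["Price"] > 0)
filtered = df.loc[row_is_ok].copy()
```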
lapis sequoia
#

yessss!!!!

desert oar
lapis sequoia
#

i did data = df.dropna()

#

So i removed all null values and can proceed. BLESS

desert oar
desert oar
#

personally i'd rather write explicitly "for the sake of simplicity, i avoided dealing with missing data imputation and i just removed all rows with missing values. i know this isn't really the right thing to do, but i did it in the interest of getting something done."

#

depends on how kind the grader is

main fox
# desert oar which is the relevant part here?

Close to the end
When I trained both models, I tuned them for precision and made confusion matrix
Since the tree's precision was higher, but the confusion matrix looked worse, I wanted to know if I'm correct in thinking it is better suited for the task

desert oar
main fox
#

The tree was more pessimistic in predicting true positives (which is what I wanted), but the logistic regression model looked like it performed better "overall" (albeit risking false positives).

I tuned them both using GridSearchCV.
I am not aware of bias-variance tradeoff.

desert oar
# main fox The tree was more pessimistic in predicting true positives (which is what I want...

https://mecha-mind.medium.com/explaining-bias-variance-tradeoff-to-a-ml-engineer-d747bdbb1f1d this article has a good explanation, although i find it both funny and depressing that they went all the way up to gradient boosting without so much as giving logistic regression a nod

Medium

Generally data scientists and statisticians are well versed with the term “Bias Variance Tradeoff” as they can very well understand them…

lapis sequoia
#

So now another problem im having..

desert oar
lapis sequoia
#

mileage is listed as kmpl

desert oar
#

so if you want "miles per gallon" you need to do a bit of math
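The "bit of math" is a pair of unit conversions. A small sketch, assuming US gallons (an imperial gallon would give a different factor); the function name is made up for illustration.

```python
KM_PER_MILE = 1.609344
LITRES_PER_US_GALLON = 3.785411784

def kmpl_to_mpg(kmpl: float) -> float:
    """Convert kilometres per litre to miles per US gallon."""
    return kmpl / KM_PER_MILE * LITRES_PER_US_GALLON

print(round(kmpl_to_mpg(20.0), 1))   # a 20 km/L car in miles per gallon
```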

lapis sequoia
#

I mean

mild dirge
#

you need to make sure they're both the same unit

lapis sequoia
#

I suppose it doesnt matter bc the problem statement is referring to an indian market so

#

distance per liter is all good yeah

#

so step 1 done. nix all null values - imputing would be nice but in the name of time i think its best to proceed from here

#

Im trying to figure out how to exclude outliers now...

desert oar
#

mostly outlier removal isn't needed in real data except for a few really egregious data points

#

just because something is "far from the average" doesn't make it an "outlier" in the sense of "this is a measurement error or something else weird that i need to exclude from my model"

#

extreme values do happen in real life, and you don't want to remove them just because they seem unusual

lapis sequoia
#

okay heard

#

Its just part of the rubric "Outlier treatment"

desert oar
#

in that case then yes, do look for it

#

e.g. if a car price is 5 or 0, that's clearly not right and probably should be re-coded as "null"

lapis sequoia
#

yeah...

desert oar
#

likewise if a car has 10000 km/L

#

or -99999

#

etc
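A sketch of re-coding such values as null, on made-up data. The plausibility limits below are hypothetical placeholders; real ones should come from knowing the market and the units.

```python
import numpy as np
import pandas as pd

cars = pd.DataFrame({
    "Price": [4.5, 0.0, 12.0, 5.9],
    "Mileage": [18.2, 10000.0, -99999.0, 21.4],   # km/L
})

# Hypothetical limits: a price of zero or less, or mileage outside 1..60 km/L,
# is treated as a recording error and replaced with NaN.
cars.loc[cars["Price"] <= 0, "Price"] = np.nan
cars.loc[~cars["Mileage"].between(1, 60), "Mileage"] = np.nan
```

After this step the bad cells are ordinary missing values, so they can be dropped or imputed like any others.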

lapis sequoia
#

is there a command that will

desert oar
#

no

lapis sequoia
#

display the range of values?

desert oar
#

yes

#

i thought you were going to ask if there was a command to remove outliers 😆

lapis sequoia
#

i guess the variance?

desert oar
#

guess? more like actually compute

lapis sequoia
#

nahhhh I just want to see what the lowest / highest values are

#

and then find the average

desert oar
#

well there's .min() and .max()

#

the range is the difference thereof

#

variance is .var() and standard deviation is .std()

lapis sequoia
#

so i tried doing

desert oar
lapis sequoia
#

i get series is not defined

desert oar
#

hm? those are methods on the Series class

#

if you get a single column from a dataframe, that object is of type Series

main fox
desert oar
lapis sequoia
#

or if i run "Price"

desert oar
#

and if they don't discuss bias-variance tradeoff in your machine learning course, then you were robbed. it's one of the most important concepts to understand

desert oar
#

presumably your data is called df based on your examples

#

so you'd do df['Price'] to get the Price column

#

df['Price'] gives you a Series instance, representing the Price column in df

#

df is an instance of DataFrame
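Putting those pieces together in a small sketch (the prices are made up): pull the column out of the frame first, then call the Series methods on it.

```python
import pandas as pd

df = pd.DataFrame({"Price": [1.5, 4.0, 9.5, 3.0]})

price = df["Price"]                  # a single column is a Series
lowest, highest = price.min(), price.max()
value_range = highest - lowest       # the range is max minus min
average = price.mean()
spread = price.std()                 # sample standard deviation
summary = price.describe()           # count, mean, std, min, quartiles, max in one call
```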

lapis sequoia
#

So that last column doesn't act that way

desert oar
#

act what way?

lapis sequoia
#

Like i cant call it where it's defined?

desert oar
#

what do you mean by that?

lapis sequoia
#

df['Price'].min()

#

I see how this works

desert oar
#

what happens when you try that?

#

is it different from what you expect?

#

Price isn't a stand-alone variable

#

it's a column in the dataframe

lapis sequoia
#

I see I see.

desert oar
#

you can assign it to a separate variable if you want

lapis sequoia
#

how might i do that just out of curiousity

#

x = df['Price']

main fox
desert oar
lapis sequoia
#

Word!

desert oar
lapis sequoia
#

it's small concepts like these that help me clear the much larger picture

desert oar
#

i do need to run now though

lapis sequoia
#

bless man thank you for the help.

#

mannnnn this shit is kicking my ass

iron basalt
# lapis sequoia x = df['Price']

You can think of [] as a function, that takes some arguments and returns something. It's just a special function that uses [] syntax (in Python you can change what operators like [] do on certain objects, which is what Pandas is doing).
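A toy illustration of that idea (the `Shelf` class is invented for this example and is not how pandas is implemented internally): Python turns `obj[key]` into a call to `obj.__getitem__(key)`, and any class can define that method.

```python
class Shelf:
    """Toy container showing how [] is routed to a method."""

    def __init__(self, columns):
        self._columns = columns

    def __getitem__(self, name):
        # shelf[name] is translated by Python into shelf.__getitem__(name)
        return self._columns[name]


shelf = Shelf({"Price": [1.5, 4.0, 9.5]})
x = shelf["Price"]                     # the [] syntax
same = shelf.__getitem__("Price")      # the method call it stands for
```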

nova tapir
#

can someone explain why this question's answer is this? and how can i find the x1 and x2 features?

dusk tide
#

The order of +ve and -ve examples can be up or down but you will get the features like this from the plot.

dusk tide
dusk tide
# nova tapir can someone explain why this question's answer is this? and how can i find the x...

And as far as I know, the optimization objective tries to find the smallest values of theta it possibly can, subject to these constraints, where p(i) is the projection of example x(i) onto theta:
(a) for a -ve example (y = 0), the product of norm(theta) and p(i) should be less than or equal to -1, and then it classifies y as 0 (-ve)
(b) for a +ve example (y = 1), the product of norm(theta) and p(i) should be greater than or equal to 1, and then it classifies the plotted example as +ve

Here I have assumed theta0 = 0, so the decision boundary passes through the origin.

Since the cost function prefers small values of theta, p(i) (the distance of each example from the decision boundary) has to be as large as possible; only then are the constraints in (a) and (b) satisfied. That is why no decision boundary other than the large-margin one is chosen.
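Written out, this appears to be the standard large-margin (hard-margin SVM) formulation. The notation is an assumption based on the description above, with $p^{(i)}$ the signed projection of $x^{(i)}$ onto $\theta$ and $\theta_0 = 0$:

```latex
\min_{\theta}\ \frac{1}{2}\sum_{j=1}^{n}\theta_j^{2}
\quad\text{subject to}\quad
\begin{cases}
p^{(i)}\,\lVert\theta\rVert \ge 1 & \text{if } y^{(i)} = 1\\[2pt]
p^{(i)}\,\lVert\theta\rVert \le -1 & \text{if } y^{(i)} = 0
\end{cases}
\qquad\text{where } \theta^{\top}x^{(i)} = p^{(i)}\,\lVert\theta\rVert .
```

With $\lVert\theta\rVert$ pushed down by the objective, the constraints can only hold if every $\lvert p^{(i)}\rvert$ is large, which is what selects the boundary with the widest margin.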

dusk tide
dusk tide
visual spear
#

Does anyone here know how (and can either tell me how, or show me an example of how) to create a model for an image generation AI?

high grove
#

how to increase space between stacked bar in matplotlib

mortal adder
#

How can i learn gpt 3 as a complete novice?

potent sky
#

Learn to make a gpt 3 clone?
Learn to use gpt-3?

flint grotto
#

hello.

#

i have a question.

#

in a cnn, there's the convolution block and the dense block. why make a block?

#

i mean you can just write the code directly, why make a block?

velvet rampart
#

Please where can I get data science and machine learning projects and exercises with source code

kind rock
#

would y'all explain keras as a better interface to control the tensorflow framework?

long zephyr
spare junco
eager imp
#

Any good material on directed attention?

unreal swan
#

Idk

last echo
#

is this model over-fitting or under-fitting?

brazen spire
#

how to make a function of activation functions?

#

i get an error when trying to do

#
    # return nn.Sin()
    # return nn.Tanh()
    # return nn.Sigmoid()
    # return nn.Tanhshrink()
    return nn.HardTanh(-1,1)
    # return nn.Hardswish()
    # return nn.functionnal.silu()
fickle frigate
stone vector
#

hello, I need help validating whether there are duplicate values in a csv column. items which fail the validation should be logged (e.g. to stderr) and ignored by the next processes

fickle frigate
# brazen spire

the function f returns a module object and does not take any args. the correct way is to call the returned object on your tensor, like this

import torch.nn as nn 
import torch
def f():
    return nn.Tanh() 

x = torch.randn(5)
result = f()(x)
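If the goal is to switch between several activations, one possible sketch is a lookup table of constructors. The dictionary keys and the `make_activation` name are made up for illustration. Note the `torch.nn` spellings: `nn.Hardtanh` (not `HardTanh`) and `nn.SiLU` for the module form of silu; as far as I know there is no `nn.Sin` module, only the `torch.sin` function.

```python
import torch
import torch.nn as nn

# Each value is a zero-argument callable that builds a fresh module.
ACTIVATIONS = {
    "tanh": nn.Tanh,
    "sigmoid": nn.Sigmoid,
    "tanhshrink": nn.Tanhshrink,
    "hardtanh": lambda: nn.Hardtanh(-1.0, 1.0),
    "hardswish": nn.Hardswish,
    "silu": nn.SiLU,
}


def make_activation(name: str) -> nn.Module:
    """Return a new activation module for the given (hypothetical) key."""
    return ACTIVATIONS[name]()


x = torch.randn(5)
y = make_activation("hardtanh")(x)   # values are clipped into [-1, 1]
```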
fickle frigate
stone vector
#

how can i log to stderr each duplicate item
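One possible sketch using only the standard library. It assumes that the first occurrence of a value is the valid one and later repeats are the failures; the column name `id` and the sample rows are made up.

```python
import csv
import io
import sys


def unique_rows(lines, column):
    """Yield rows whose `column` value is new; log repeats to stderr and skip them."""
    seen = set()
    for row in csv.DictReader(lines):
        key = row[column]
        if key in seen:
            print(f"duplicate {column}={key!r}, skipping row", file=sys.stderr)
            continue
        seen.add(key)
        yield row


# Hypothetical CSV content; in practice pass an open file object instead.
sample = io.StringIO("id,name\n1,ann\n2,bob\n1,cat\n")
kept = list(unique_rows(sample, "id"))
```

Only the rows in `kept` would be handed to the later processing steps.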

hollow sentinel
#

spyder is fantastic lol

potent sky
naive river
brazen spire
#

I don't understand how we get 12 parameters here

#

isn't it 9 in the middle?

prime hearth
#

12 params is all the nodes

brazen spire
#

i don't understand

#

is it 9 (weights) + 3 (biais) = 12?

brazen spire
#

ah i understand now.

last echo
#

how many dense layers and neurons does this have? any tips on where to learn how to describe this cnn model?

#Augmented Layer
model.add(augmented)

#Input shape Layer
model.add(Input(shape=(WIDTH,HEIGHT,3)))

#Conv2D and MaxPool2D Layers
model.add(Conv2D(16, kernel_size=(3,3), activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))

model.add(Conv2D(32, kernel_size=(3,3), activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))

model.add(Conv2D(64, kernel_size=(3,3), activation='relu'))
model.add(Conv2D(64, kernel_size=(3,3), activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))

model.add(Conv2D(128, kernel_size=(3,3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3,3), activation='relu'))
model.add(Conv2D(128, kernel_size=(3,3), activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))

#Flatten Layer
model.add(Flatten())

#Fully connected layer, OUTPUT Layer
model.add(Dense(units=128, activation='relu'))
model.add(Dense(units=class_size, activation='softmax'))
tender hearth
#

A conv2d has (kernel height * kernel width * input channels + 1) * filters learnable parameters, where the + 1 is one bias per filter

#

filters is essentially how many kernels you have

#

a dense layer has (inputs + 1) * units parameters: one weight per input for every unit, plus one bias per unit

#

depending on what your teacher meant by neuron specifically it could be just the learnable params in the dense layers or including the kernels as well
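As a rough sketch, the usual counting rules for these two layer types can be written as plain arithmetic; the helper names are made up, and no framework is needed. The first two calls correspond to the first two Conv2D layers of the model above with an RGB input, and the last one reproduces the "9 weights + 3 biases = 12" case from earlier in the chat.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights for every kernel position and input channel, plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input-unit pair, plus one bias per unit."""
    return (inputs + 1) * units


first_conv = conv2d_params(3, 3, 3, 16)     # RGB input into Conv2D(16)
second_conv = conv2d_params(3, 3, 16, 32)   # 16 channels into Conv2D(32)
tiny_dense = dense_params(3, 3)             # 9 weights + 3 biases
```

The first Dense layer of the model cannot be counted this way without knowing WIDTH and HEIGHT, because its input size depends on what comes out of Flatten.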

last echo
tender hearth
#

Sounds good

serene scaffold
#

if you need help with data science or AI, please ask a question directed to the whole channel.

lapis sequoia
#

clueless tbh

exotic thicket
#

I'm parikshith. Stream ECE branch section B. I was wondering if you were available to help me with Lec 3: Image formation: Radiometry which is a little bit vast as per my knowledge in math.
And I would like to share the resources I'd gone through to figure out a solution

Below link video of Lec 3: Image formation: Radiometry (NPTEL course video)
https://www.youtube.com/watch?v=ch1xdUFABA8

Another same concept video explanation in YouTube link I'd gone through
https://www.youtube.com/watch?v=kPIqO929pIc

Questions:

  1. The light has a radiant flux of 100 watts, what is the irradiance on an object which is placed at 2 meters from the light (assuming object is perpendicular to the night light)? (Wm−2)
    2.99
    1.25
    1.99
    0.55

  2. A light source has a radiant flux of 100 watts, what is the flux on a rectangular object of size 20 cm by 30 cm placed 2 meters away (perpendicular to the light)?
    0.1194 mW
    0.1163 mW
    0.1189 mW
    0.1123 mW

  3. Given the 10-watt source coming in from 2π/3 solid angle (in sr) of a radius 3 meter, the corresponding source of energy carried by the ray is
    52π2
    12π2
    π2
    10

  1. Light source has a radiant intensity of 60 W sr−1. Determine the irradiance on a sign board 2 meters away.
    10
    15
    20
    30

  2. Suppose a source with an area of 4 m−2 is viewed at an angle of 30 degree and has a radiance of 0.3 Wm−2sr−1. Calculate the radiant intensity of the source?
    1.65 Wsr−1
    1.04 Wsr−1
    2.78 Wsr−1
    2.11 Wsr−1

  1. Suppose the source in question 9 is viewed from a perfectly reflecting Lambertian surface. Then find the value of radiosity.
    0.3145 Wm−2
    0.1645 Wm−2
    0.2598 Wm−2
    0.4768 Wm−2

Thank you for your time

serene scaffold
# lapis sequoia clueless tbh

did you put import tensorflow as tf at the top of your file? Also, please copy and paste actual text instead of screenshots as this is a lot more useful for answerers.

serene scaffold
exotic thicket
lapis sequoia
serene scaffold
#

@lapis sequoia you did import tensorflow as ts, with a ts instead of tf.

lapis sequoia
#

I'm stuck in an issue can i dm if possible?

#

oh fair

#

If anyone can help

serene scaffold
lapis sequoia
#

but how can i get rid of this error

#
Skipping registering GPU devices...
2022-02-04 19:15:31.196989: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
<tf.Variable 'Variable:0' shape=() dtype=string, numpy=b'this is a string'>
serene scaffold
exotic thicket
# serene scaffold I would ask in a different discord server. Sorry

I'm so glad you said that. I'm actually looking for a domain-specific server on CV and IP, because I've taken a course on computer vision and image processing fundamentals and applications. I'm really excited to learn from that course, but my exams are going on, so I need to manage with assignments for a few weeks. If I don't get good marks on the assignments my score goes down. I've completed two assignments, and the questions I sent above are what I have left for this week.

Thank you for your time

#

So plz let me know sir

strange scarab
#

hey! i have a pandas dataframe of yearly population projections, and the format is a bit iffy to process. i'd like to merge the rows like this:

#

i have never worked with pandas before so i don't really know where and how to look for guidance lol

serene scaffold
#

alternatively, you can look at the docs and try to figure it out.

#

!docs pandas.DataFrame.pivot_table

arctic wedgeBOT
#

DataFrame.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)
Create a spreadsheet-style pivot table as a DataFrame.

The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.
strange scarab
#

let me take a look, the above was a quick excel mock of the larger messier thing i have so i'd rather spare your time and effort and see if this does it

serene scaffold
#

I don't mind as long as you provide the data in a format that I can use immediately, like a CSV, or something.

soft silo
#

Hi guys I'm currently facing a task from MLSS2020 regarding RL environment and agents and im kinda stuck on one issue, I have to adjust the environment or the agent so that his actions reflect the given probability like in the AIMA example. I have the environment defined but the actions of the agent are where im stuck. anyone willing to take a look?

strange scarab
# serene scaffold I don't mind as long as you provide the data in a format that I can use immediat...

yeah i'm a bit lost still, here's the first three years or so, {"column": ["row", ...]}
{'City': ['Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total', 'Country total'], 'Year': ['2021', '2021', '2021', '2021', '2021', '2021', '2021', '2022', '2022', '2022', '2022', '2022', '2022', '2022', '2023', '2023', '2023', '2023', '2023', '2023', '2023', '2024', '2024', '2024', '2024', '2024', '2024'], 'Age': ['Total', '0 - 14', '15 - 24', '25 - 44', '45 - 64', '65 - 74', '75 -', 'Total', '0 - 14', '15 - 24', '25 - 44', '45 - 64', '65 - 74', '75 -', 'Total', '0 - 14', '15 - 24', '25 - 44', '45 - 64', '65 - 74', '75 -', 'Total', '0 - 14', '15 - 24', '25 - 44', '45 - 64', '65 - 74'], 'value': [5547045, 852577, 608053, 1423098, 1383040, 703342, 576935, 5555002, 843285, 609690, 1422508, 1377411, 695173, 606935, 5562569, 833001, 614092, 1420517, 1374381, 684221, 636357, 5569645, 821592, 618233, 1419647, 1369556, 677181]}

lapis sequoia
serene scaffold
# strange scarab yeah i'm a bit lost still, here's the first three years or so, `{"column": ["row...

you can do this

In [25]: df.pivot_table(index=['City', 'Year'], columns='Age')
Out[25]:
                       value
Age                   0 - 14   15 - 24    25 - 44    45 - 64   65 - 74      75 -      Total
City          Year
Country total 2021  852577.0  608053.0  1423098.0  1383040.0  703342.0  576935.0  5547045.0
              2022  843285.0  609690.0  1422508.0  1377411.0  695173.0  606935.0  5555002.0
              2023  833001.0  614092.0  1420517.0  1374381.0  684221.0  636357.0  5562569.0
              2024  821592.0  618233.0  1419647.0  1369556.0  677181.0       NaN  5569645.0
strange scarab
#

jesus

#

alright let me see

#

wait holup! this might be exactly how i imagined it should be in my head, thanks!! now i just have to figure out how to work the MultiIndexes(?)

serene scaffold
#
In [27]: df.pivot_table(index=['City', 'Year'], columns='Age', values='value').reset_index()
Out[27]:
Age           City  Year    0 - 14   15 - 24    25 - 44    45 - 64   65 - 74      75 -      Total
0    Country total  2021  852577.0  608053.0  1423098.0  1383040.0  703342.0  576935.0  5547045.0
1    Country total  2022  843285.0  609690.0  1422508.0  1377411.0  695173.0  606935.0  5555002.0
2    Country total  2023  833001.0  614092.0  1420517.0  1374381.0  684221.0  636357.0  5562569.0
3    Country total  2024  821592.0  618233.0  1419647.0  1369556.0  677181.0       NaN  5569645.0
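On the MultiIndex question, a small sketch with a shortened copy of the data posted above: the pivoted frame can either be indexed with a `(City, Year)` tuple, or flattened back into ordinary columns.

```python
import pandas as pd

# Shortened copy of the posted data (two years, two age groups).
df = pd.DataFrame({
    "City": ["Country total"] * 4,
    "Year": ["2021", "2021", "2022", "2022"],
    "Age": ["Total", "0 - 14", "Total", "0 - 14"],
    "value": [5547045, 852577, 5555002, 843285],
})

wide = df.pivot_table(index=["City", "Year"], columns="Age", values="value")

# With the MultiIndex kept, .loc takes a (City, Year) tuple for the row.
total_2022 = wide.loc[("Country total", "2022"), "Total"]

# Or flatten it: reset_index moves City and Year back into columns,
# and rename_axis drops the leftover "Age" label on the column axis.
flat = wide.reset_index().rename_axis(columns=None)
```

Note that `Year` is a string in the posted data, so lookups use `"2022"` rather than `2022`.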
#

also why is every city named "country total"?

strange scarab
#

because the first one is the whole country

#

up until 2040

#

so the first city is like, 200 rows down?

serene scaffold
#

ah

strange scarab
#

i'll surely work it out from here on

#

thanks a bunch!

serene scaffold
#

💚

strange scarab
#

got a task at work and the library i use returns organized data as a DataFrame

#

turned out to be quite the can of worms lol

mint quail
#

We are designing an underwater robot. While the underwater robot is in autonomous driving, it will search for a certain object. But while searching for the object, it should not hit the walls around it. Can I detect walls with OpenCV? Or how do I make sure it doesn't crash into walls?

molten ridge
#

Hi, I am trying to make a database model and i am running into memory error

#

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 868. MiB for an array with shape (10669, 10669) and data type float64

#

this thing works when i try it in IPython (jupyter lab launched from anaconda) without any errors

#

but when I try it in plain python, it gives error

#
data = Product.objects.all()  # gets data
df = pd.DataFrame(data.values())
tfidf = TfidfVectorizer(stop_words='english')
df['product_name'] = df['product_name'].fillna('')
overview_matrix = tfidf.fit_transform(df['product_name'])
similarity_matrix = linear_kernel(overview_matrix, overview_matrix)
#

i get the error on similarity_matrix line

#

i am on a 64bit machine, and it gives error for 868 MB

serene scaffold
#

is there a way to make the result of tfidf.fit_transform a sparse array?

molten ridge
#

nope

#

i think it is some Kind of numpy problems in which some kind of limit is set for memory allocation

serene scaffold
#

it actually does return a sparse array; can you show the whole error message starting from Traceback?

molten ridge
#

just a min

#

i am running it in thread btw

#
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\LAKSHYA\AppData\Local\Programs\Python\Python37-32\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "C:\Users\LAKSHYA\AppData\Local\Programs\Python\Python37-32\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\LAKSHYA\PycharmProjects\mega-env\PC website\website\backend\main\views.py", line 39, in product_recommendations_variables
    similarity_matrix = linear_kernel(overview_matrix, overview_matrix)
  File "C:\Users\LAKSHYA\PycharmProjects\mega-env\venv\lib\site-packages\sklearn\metrics\pairwise.py", line 1073, in linear_kernel
    return safe_sparse_dot(X, Y.T, dense_output=dense_output)
  File "C:\Users\LAKSHYA\PycharmProjects\mega-env\venv\lib\site-packages\sklearn\utils\extmath.py", line 161, in safe_sparse_dot
    return ret.toarray()
  File "C:\Users\LAKSHYA\PycharmProjects\mega-env\venv\lib\site-packages\scipy\sparse\compressed.py", line 1039, in toarray
    out = self._process_toarray_args(order, out)
  File "C:\Users\LAKSHYA\PycharmProjects\mega-env\venv\lib\site-packages\scipy\sparse\base.py", line 1202, in _process_toarray_args
    return np.zeros(self.shape, dtype=self.dtype, order=order)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 868. MiB for an array with shape (10669, 10669) and data type float64
serene scaffold
#

so this is the part that causes the error: similarity_matrix = linear_kernel(overview_matrix, overview_matrix)

#

(while that may be obvious to you, I had no way of knowing that before you provided the whole error message)

molten ridge
#

sorry 😅 , my bad

serene scaffold
#

what is linear_kernel?

#

oh, I guess it's this

molten ridge
#

from sklearn.metrics.pairwise import linear_kernel

#

yeah

serene scaffold
#

try linear_kernel(overview_matrix, overview_matrix, dense_output=False)
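For reference, a minimal sketch of what that change does, assuming a toy `names` list in place of the Django queryset (the product names are made up for illustration). With two sparse inputs and `dense_output=False`, `linear_kernel` returns a SciPy sparse matrix instead of allocating the full dense array. The dense result for 10669 rows needs 10669 × 10669 × 8 bytes, which is the 868 MiB in the error.

```python
from scipy import sparse
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical stand-in for df['product_name'] after fillna('')
names = ["red cotton shirt", "blue cotton shirt", "steel water bottle", ""]

tfidf = TfidfVectorizer(stop_words="english")
overview_matrix = tfidf.fit_transform(names)  # already a sparse matrix

# Keep the pairwise similarities sparse instead of densifying them
similarity_matrix = linear_kernel(
    overview_matrix, overview_matrix, dense_output=False
)
print(sparse.issparse(similarity_matrix), similarity_matrix.shape)

# Size of the dense float64 result for the real data, in MiB
dense_bytes = 10669 * 10669 * 8
print(round(dense_bytes / 2**20, 1))

# A single row can still be densified cheaply when it is needed
first_row = similarity_matrix[0].toarray().ravel()
```

One thing to watch: code written for a dense array (for example `enumerate(similarity_matrix[idx])`) behaves differently on a sparse row, so downstream lookups may need a `.toarray()` on the row they use. Also, the `Python37-32` folder in the traceback is normally where the 32-bit Windows build of Python installs, so the plain-Python run may be a 32-bit process even on a 64-bit machine, which would fit with the Anaconda session working.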

molten ridge
#

just a min

serene scaffold
#

I'm in a meeting now, so I may become unresponsive

molten ridge
#

yeah thanks alot

#

it didnt give error now

desert oar
desert oar
molten ridge
#

many people ask 32bit / 64bit in memory type errors cause of the 4gb memory limit, so i just gave it :)

desert oar
#

that might have to do with float32 vs float64, which has less to do with your operating system and more to do with how your data has been stored

molten ridge
#

thanks alot :)

strange scarab
molten ridge
#

hey @serene scaffold, sorry to disturb you while you are in a meeting,
but after turning dense_output to false, the model stopped working altogether

serene scaffold
#

try restating what information might help someone debug this with you.

#

"the model stopped working altogether" is uninformative; what happened instead? how do you know it isn't working?

molten ridge
#

it didn't give any output

molten ridge
serene scaffold
# molten ridge it didn't give any output

Unless this means that linear_kernel(overview_matrix, overview_matrix, dense_output=False) returned None, you have not yet divulged enough information for anyone to assist.

brave latch
#
result_dict = {}
for index, row in df.iterrows():
    if row["Column value"] in result_dict:
        result_dict[row["Column value"]].append(row)
    else:
        result_dict[row["Column value"]] = [row]

anyone happen to know how I could do this properly, i.e declaratively, with pandas/python? this works but you aren't supposed to iterate imperatively like that with pandas.

basically im trying to get a key value dict, where the keys are the unique values of a column in the dataframe (table), and the dict's values are the rows of the data frame with that column value. just trying to figure out the approach I should take to do it declaratively but it's not coming to me for some reason

#

sorry if there is a better channel for this

serene scaffold
brave latch
#

yeah give me a sec, it takes me a few minutes to run the function. also this is for work so I'm trying to keep the actual data anonymized if that is ok

#

thanks

serene scaffold
#

you'd have to make a copy of the dataframe with fake data that captures the schema of the real dataframe.

brave latch
#

ok might just share the real output and delete after

serene scaffold
#

you can DM it to me, if you must.

velvet abyss
#

I'm interested on getting into Data Science, is there anything I should know before start messing with it?

velvet abyss
#

Just that?

serene scaffold
velvet abyss
#

Oh well

#

Math isn't my forte, but this isn't a deal breaker

serene scaffold
#

@brave latch what if the same Deal appears more than once? you can't have the same key twice in a dict.

#

you want a nested list?

brave latch
#

yes I want unique deals -> rows containing that deal

#
result_dict = {}
for index, row in df.iterrows():
    if row["Deal"] in result_dict:
        result_dict[row["Deal"]].append(row)
    else:
        result_dict[row["Deal"]] = [row]

my imperative code checks for that

#

I want to make this declarative

#

because pandas

tacit basin
# velvet abyss I'm interested on getting into Data Science, is there anything I should know bef...

Not really. If you're into tabular data, then there's a very good course starting soon. It's on machine learning with Python and scikit-learn, taught by scikit-learn core devs. It's free btw. https://www.fun-mooc.fr/en/courses/machine-learning-python-scikit-learn/

serene scaffold
#

@brave latch

In [40]: df.groupby('Deal').apply(lambda d: [row for _, row in d.iterrows()])

try that.

#

well, I guess that's still a dataframe

brave latch
#

thats the exact schema I need though

serene scaffold
#
In [42]: df.groupby('Deal').apply(lambda d: [list(row) for _, row in d.iterrows()]).to_dict()
#

there you go.
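A variation on the same idea, sketched on a made-up frame (the `Amount` and `Owner` columns are assumptions; only `Deal` comes from the discussion). Iterating over `df.groupby(...)` yields `(key, sub-frame)` pairs, so a dict comprehension gets the mapping without `apply` or `iterrows`.

```python
import pandas as pd

# Fake data that only mimics the shape of the real frame
df = pd.DataFrame(
    {
        "Deal": ["A", "B", "A", "C"],
        "Amount": [10, 20, 30, 40],
        "Owner": ["x", "y", "z", "w"],
    }
)

# Each unique Deal -> list of that deal's rows as plain dicts
rows_by_deal = {
    deal: group.to_dict("records") for deal, group in df.groupby("Deal")
}

# Or keep each group as its own DataFrame
frames_by_deal = {deal: group for deal, group in df.groupby("Deal")}

print(rows_by_deal["A"])
```

Keeping the groups as DataFrames is usually handier if more pandas operations follow; the list-of-dicts form is easier to serialize.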

brave latch
#

amazing

#

you are awesome

#

thanks so much

serene scaffold
#

the trick is that df.groupby is like a magical amalgamation of individual dataframes

brave latch
#

I tried using groupby

#

but today was my first time

serene scaffold
#

and then apply does a function to each of those

brave latch
#

and i just wasn't grokking the api

tacit basin
# velvet abyss Oh well

I would argue that you don't need lots of math for applied DS/ML. The scikit-learn library abstracts a lot of that so you can focus on applying the tools.

brave latch
#

I knew I needed some function

#

but couldnt figure out what that function should be

brave latch
lapis sequoia
#

Hello! You won't probably remember but in early december I posted a message asking for help in order to decide a machine learning/optimization algorithm that would solve basketball matches referee assignment. You provided pretty solid answers without knowing the actual datasets to work with. Now that we know the datasets, it turns out that there are so many restrictions to implement ML or optimization algorithms. My coworkers decided to use a rules-based AI algorithm. I've been surfing the net trying to figure out some implementations of this approach but I'm constantly reading posts explaining the differences between rules-based and ML algorithms and so on. I wonder if you know an example of a rules-based AI algorithm so that I don't look like an absolute beginner whenever I have to code things that interact with it.

#

Or even coding it myself😅

serene scaffold
#

If you hear someone say "AI is glorified if statements", that is the subset of AI that they're referring to.

lapis sequoia
little crown
#

Why is matrix multiplication significantly faster with numpy than with tensorflow? Should it not run faster on my gpu with tensorflow?

iron basalt
#

(like 1024x1024)

little crown
#

ok i will try it

iron basalt
#

Also the times will be different when you actually do something with the results. Since right now you are just calculating it and throwing it away immediately.

#

(Which has different destruction times, and GPUs are faster if you keep the data there and use it for something else as well)

#

(Avoid going back and forth between the CPU and GPU)

#

Your CPU can calculate many small matrix multiplications before a single small matrix reaches the GPU.

#

(If the matrix data is already in the CPU's local memory)

#

However, with enough of them, it becomes worth it again, but you have to send them all in one batch to the GPU.

little crown
#

Ok, thank you, now I understand why it took so long.

iron basalt
#

It's still "glorified if-statements", but the input can be vague and it's its own programming style (you can make a DSL for it, but don't need to).

lapis sequoia
iron basalt
#

Fuzzy logic can make use of ML later, since the fuzzification / input process can be whatever.

#

I have even seen it slapped on top of spiking neural networks.

safe elk
#

Many early AI research projects involved constructing a representation of a domain using first-order logic predicates, or something similar. For example you would have a description of a restaurant domain as follows:
at(restaurant,Alice)
at(restaurant,Bob)
at(restaurant,Carol)
works_at(restaurant,Carol)
has_job(restaurant,waitress,Carol)
orders(Bob,pizza)
orders(Alice,sushi)
along with rules for reasoning about the domain, such as:
forall X,Y,Z. orders(X,Y) and has_job(restaurant,waitress,Z) -> serves(Z,X,Y)
which attempts to encode the rule that if person X orders food Y and Z is a waitress at the restaurant then Z will serve food Y to person X.
From the above representation we can deduce:
serves(Carol,Bob,pizza) serves(Carol,Alice,sushi)
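The deduction above can be sketched as a tiny forward-chaining loop. This is only an illustration, assuming facts are stored as tuples and the single rule is hand-written as a Python function (`apply_serving_rule` and `forward_chain` are names invented here, not from any library).

```python
# Facts from the restaurant example, as (predicate, arg1, arg2, ...) tuples
facts = {
    ("at", "restaurant", "Alice"),
    ("at", "restaurant", "Bob"),
    ("at", "restaurant", "Carol"),
    ("works_at", "restaurant", "Carol"),
    ("has_job", "restaurant", "waitress", "Carol"),
    ("orders", "Bob", "pizza"),
    ("orders", "Alice", "sushi"),
}


def apply_serving_rule(known):
    """orders(X, Y) and has_job(restaurant, waitress, Z) -> serves(Z, X, Y)"""
    orders = [(f[1], f[2]) for f in known if f[0] == "orders"]
    waitresses = [
        f[3]
        for f in known
        if f[0] == "has_job" and f[1] == "restaurant" and f[2] == "waitress"
    ]
    return {("serves", z, x, y) for (x, y) in orders for z in waitresses}


def forward_chain(initial_facts, rules):
    """Apply every rule repeatedly until no new facts appear."""
    known = set(initial_facts)
    while True:
        new_facts = set().union(*(rule(known) for rule in rules)) - known
        if not new_facts:
            return known
        known |= new_facts


derived = forward_chain(facts, [apply_serving_rule]) - facts
print(sorted(derived))
```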

little crown
iron basalt
#

Symbolic AI has always had a huge flaw which GPT shares, they both don't anchor symbols / words / etc to physical objects. They have no "world model". In this sense they are both naive algorithms. The real hard work is getting that world model, especially since it's a very complex world we live in, way more complex and messy than any simulation.

#

(GPT is way more efficient than the pure symbolic methods though, so it worked out better in being able to go through way more data and use induction)

#

(Training on text alone will never be enough for an AI to understand language, since language's meaning comes from our physical world)

#

(In addition, not only does it need to train on the real world (or at least a simulation of it), it also needs to be human aligned, in the sense that it needs to assign meaning in the same way we do. We care about certain things that matter to us and therefore label them; it might not care about the same things (where does the chair object begin and end? well for us it begins and ends where it's useful for us, but a computer does not need to sit, so where does it begin and end for it?))

lapis sequoia
coarse gale
#

I'm grateful, learned the term GOFAI as well as the interesting/educational reasoning about what it was and how it was thought about.

#

Also kind of doubtful about 'you can't learn language through just language' but that's just my intuition and I don't want to derail. 😅

safe elk
#

They use LISP and variants

#

One time a Dutch prof lectured at our uni on some of those topics, fuzzy logic, expert systems and neural net...he kept pronouncing Variables Var eyeable

iron basalt
#

I can simulate one falling over in my head.

coarse gale
#

On AI Dungeon (iirc running on OpenAI DaVinci) it literally described the process for building a chair to completion.
So it seemed to be able to verbalize what parts are and a finished project, inferring process from token inference.
To me that's understanding and I'm pretty sure that's both naive and subjective at the same time.

iron basalt
coarse gale
#

I think a made-up universe of text (the kind of knowledge that the GOFAI critics pointed out, abstract from anything the computer might be able to understand beyond boiling it down to algebra) can still represent knowledge.
And if you're asking me to argue why I think a representation of knowledge is indistinguishable from knowledge I'm probably being the naive one. I don't really know enough to defend that position, just my instinct. 😅

lapis sequoia
#

So just to make sure that I got it right, when we talk about rule-based AI it's nothing less than plain if statements aka glorified if statements
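To make "glorified if statements" concrete for the referee case, here is a rough sketch. Every field and rule in it (`level`, `club`, `busy_dates`, `min_level`, the three rules) is hypothetical and only stands in for whatever the real restrictions are. The rules are kept in a list so each one is a named check that can be added, removed, or reported on separately.

```python
from dataclasses import dataclass


@dataclass
class Referee:
    name: str
    level: int             # hypothetical qualification level
    club: str              # hypothetical club affiliation
    busy_dates: frozenset  # dates the referee cannot work


@dataclass
class Match:
    home: str
    away: str
    date: str
    min_level: int         # hypothetical minimum referee level


# Each rule: (reason reported when the check fails, check function)
RULES = [
    ("not available", lambda r, m: m.date not in r.busy_dates),
    ("level too low", lambda r, m: r.level >= m.min_level),
    ("conflict of interest", lambda r, m: r.club not in (m.home, m.away)),
]


def rejection_reasons(referee, match):
    """Return the reasons of all rules this referee fails for this match."""
    return [reason for reason, ok in RULES if not ok(referee, match)]


def eligible(referees, match):
    """Names of referees that pass every rule."""
    return [r.name for r in referees if not rejection_reasons(r, match)]


referees = [
    Referee("Ana", 3, "Tigers", frozenset()),
    Referee("Ben", 1, "Owls", frozenset()),
    Referee("Cy", 3, "Hawks", frozenset({"2024-03-02"})),
    Referee("Dee", 2, "Owls", frozenset()),
]
match = Match("Tigers", "Hawks", "2024-03-02", min_level=2)
print(eligible(referees, match))
print(rejection_reasons(referees[2], match))
```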

coarse gale
#

I guess I'm still stuck on primitive thoughts like even though all maps are inaccurate, Google Maps sure seems useful.

iron basalt
#

Yes it can represent knowledge, but it's not language like how humans have language, which is the goal. Humans have language linked to objects like being able to build a chair physically, not just give the instructions as words on how to do it. The issue is that it's only basing its knowledge on the language structure / predictability, which can get you decently far, but it's not enough for some cases, plus if I tell it the word "chair", it can't feel the chair with for example touch, there is no shared sensory organ / word relationship.

#

When it gives you the instructions to build a chair, it's not simulating the chair, it's just regurgitating the instructions that someone typed in at some point (with induction / mixed together responses).

coarse gale
#

like how humans have language, which is the goal
Honestly I call myself an NLP enthusiast at best but I never thought this was the goal.
If the computer operates with underlying abstraction layers that are totally different than our meat brains but we can still interrelate complicated language processes to each other, doesn't seem far-fetched to say that the computer's completely insular universe of text was still able to provide usefulness to our real world based one.

iron basalt
#

Well if you are into AI/AGI it's the goal, but GPT can be your end goal too, whatever floats your boat.

little crown
# lapis sequoia So just to make sure that I got it right, when we talk about rule-based AI it's ...

A decision tree would be an example of a rule-based system https://en.wikipedia.org/wiki/Decision_tree


coarse gale
#

You're correct that it's not AGI but it's still AI 🤔
Like pretend I put a baby in front of GPT and ask it to tell the baby how to build a chair, you're saying if the baby learned how to build the chair that's not AI?

iron basalt
#

I would certainly count it as AI.

coarse gale
#

Seems to me like possibly we don't disagree then, except on what the goal of NLP is.

lapis sequoia
iron basalt
#

NLP in general could be anything involving natural language, including taking natural language as input and outputing random numbers.

little crown
#

Yes exactly 🙂

earnest wadi
#

can anyone help me get back propagation working in my program, I understand the calculus, but not how to apply it

iron basalt
#

(It's about what the input is)

lapis sequoia
#

Thank you so much I feel like I'm fully aware of what rules-based AI is and I think that I can tackle the referee assignment problem myself👌

deft harbor
earnest wadi
lapis sequoia
#

Alright

#

The work week is over

deft harbor
#

Are you storing all your weights in numpy at least?

earnest wadi
#

im just not using tf or keras

lapis sequoia
#

And I’m ready to stay up like a crack head to finish this project

earnest wadi
deft harbor
#

What do you have so far

earnest wadi
#

wdym

#

towards back prop?

deft harbor
#

in terms of code

earnest wadi
#
import numpy as np

class Dense():
    def __init__(self, units, activation):
        self.units = units
        self.activation = activation
        self.type = "Dense"

    def initialise(self, num_inputs):
        self.weights = (np.random.rand(num_inputs, self.units) * 2) - 1
        self.bias = np.random.rand()

    def forward_propagate(self, inputs):
        self.z = np.dot(inputs, self.weights) + self.bias
        self.a = self.activation(self.z)
        return self.a

    def back_propagate(self):
        pass
#

This is my dense layer

#

I have written then deleted attempts at back prop many times
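One possible shape for `back_propagate`, as a sketch rather than the one right answer. It departs from the class above in a few assumed ways: the layer also takes the activation's derivative (`activation_prime`), it remembers its inputs during the forward pass, and the bias is a vector with one entry per unit instead of a single scalar. The layer receives dL/da from whatever comes after it, applies the chain rule, updates its own parameters, and returns dL/d(inputs) for the previous layer.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


class Dense:
    def __init__(self, units, activation, activation_prime):
        self.units = units
        self.activation = activation
        self.activation_prime = activation_prime  # assumed extra argument
        self.type = "Dense"

    def initialise(self, num_inputs):
        self.weights = (np.random.rand(num_inputs, self.units) * 2) - 1
        self.bias = np.zeros(self.units)  # assumption: one bias per unit

    def forward_propagate(self, inputs):
        self.inputs = inputs  # cached for the backward pass
        self.z = np.dot(inputs, self.weights) + self.bias
        self.a = self.activation(self.z)
        return self.a

    def back_propagate(self, grad_a, learning_rate):
        # dL/dz = dL/da * da/dz, element-wise
        grad_z = grad_a * self.activation_prime(self.z)
        # dL/dW = inputs^T @ dL/dz, summed over the batch
        self.grad_weights = self.inputs.T @ grad_z
        self.grad_bias = grad_z.sum(axis=0)
        # dL/d(inputs), computed with the weights used in the forward pass
        grad_inputs = grad_z @ self.weights.T
        # plain gradient descent step
        self.weights -= learning_rate * self.grad_weights
        self.bias -= learning_rate * self.grad_bias
        return grad_inputs
```

For a stack of layers, the last layer gets the loss gradient (for a squared-error loss 0.5 * sum((a - y)**2) that is just `a - y`), and each layer's return value is fed to the layer before it, in reverse order.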

gleaming remnant
#

Heyy. How can I use matplotlib in Vscode on mac ? I need it to do graphs for a physic project

pure blaze
#

I'm trying to predict product sales (of different products in different times) in relation to stock.
I know I can use fbprophet - but I'm not sure how I'd set up a relation between the regressor (the stock) and the sales (the predicted timeseries) so that every time a sale is predicted, the stock is reduced and a new prediction is ran with the new input.

Does anyone know easier ways of doing this using other models? Is there an easier way to do it using fbprophet?

little ginkgo
#

Does anyone know a good way of plotting 2 y axis in pandas?

#

Or 2 x axis
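For two y axes, pandas has a `secondary_y` option on `DataFrame.plot`. A small sketch with invented columns (`sales` and `temperature` are just placeholders):

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display

import pandas as pd

# Made-up data with two series on very different scales
df = pd.DataFrame(
    {"sales": [120, 135, 150, 110, 160], "temperature": [18.5, 19.0, 21.5, 17.0, 22.0]},
    index=pd.RangeIndex(1, 6, name="day"),
)

# Columns listed in secondary_y are drawn against a second y axis on the right
ax = df.plot(secondary_y=["temperature"])
ax.set_ylabel("sales")
ax.right_ax.set_ylabel("temperature")
```

For a second x axis there is no pandas shortcut that I know of; the usual route is matplotlib's `ax.twiny()` and plotting onto the axes it returns.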

earnest wadi
#

can anyone help me get back propagation working in my program, I understand the calculus, but not how to apply it

serene scaffold
little ginkgo
serene scaffold
lapis sequoia
#

how do you impute missing values ><

serene scaffold
#

@lapis sequoia what kind of imputation

#

Mean imputation? Mode imputation?

lapis sequoia
lapis sequoia
#

i want to do mode for horsepower

thin palm
#

What's up Python gang, when I concat two panda dataframes of same Rows why does it end up giving me NaN on the last column and add a row?

lapis sequoia
#

not sure

#

can you show a SS of your code?

thin palm
#

I know why.. it's because 128 rows and 127 rows are not same length

#

but so weird, how is a row missing from the same data?

lapis sequoia
#

u can remove all nulls from ur data set

thin palm
#

correct but i've cleaned it all up

#

can I show you what I'm talking about?

lapis sequoia
#

sure

#

im honestly not too great but yeah Id love to see

thin palm
#

I have this dataset known as df and it is 127 rows by 13 columns. The first column, known as Lender, is going to be one hot encoded (OHE) and I'd like to make columns for each of the unique lender names.

#
ohc = OneHotEncoder()
ohe = ohc.fit_transform(df.Lender.values.reshape(-1,1)).toarray()

dfOneHot = pd.DataFrame(ohe, columns=['Lender_' +str(ohc.categories_[0][i]) for i in range(len(ohc.categories_[0]))])
dfh = pd.concat([df, dfOneHot], axis = 1)
#

this is the result

#

I'd like to drop my Lenders column and put these in instead. Simple concat works but when I do concat on the last line where I set the value equal to dfh it adds another row?
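One common cause of an extra row full of NaN here, offered as a guess since the data isn't visible: `pd.concat(..., axis=1)` lines rows up by index label, not by position. If `df` lost rows during cleaning, its index has gaps, while a frame built from a bare array gets a fresh 0..n-1 index, so the labels don't match. A small made-up example of the problem and one way around it:

```python
import pandas as pd

# Made-up frame whose index has a gap at 2, as if a row had been dropped earlier
df = pd.DataFrame(
    {"Lender": ["A", "B", "A", "C"], "Rate": [1.0, 2.0, 3.0, 4.0]},
    index=[0, 1, 3, 4],
)

# get_dummies keeps df's index, so the encoded columns stay aligned
dummies = pd.get_dummies(df["Lender"], prefix="Lender")

# Rebuilding from a bare array resets the index to 0..3 and breaks alignment
reindexed = pd.DataFrame(dummies.to_numpy(), columns=dummies.columns)
misaligned = pd.concat([df, reindexed], axis=1)

# Aligned version: drop the original column and attach the encoded ones
aligned = pd.concat([df.drop(columns="Lender"), dummies], axis=1)

print(len(df), len(misaligned), len(aligned))
```

With `OneHotEncoder`, the equivalent fix would be passing `index=df.index` when building `dfOneHot`, or calling `df.reset_index(drop=True)` before encoding.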

serene scaffold
#

@lapis sequoia you can use fillna with the mean. It's slightly more complicated for mode. I'm on mobile so I can't show you

serene scaffold
serene scaffold
thin palm
#

okay

lapis sequoia
#
!code
#

OOP

serene scaffold
#

Right

lapis sequoia
#
df['Price'].min().fillna.mean()
serene scaffold
#

Put it back

#

I was about to copy it

lapis sequoia
#

had an issue there

serene scaffold
#
df['Price'].fillna(df['Price'].mean())
lapis sequoia
#

Yeah that spat out an error

serene scaffold
#

I'm on mobile but I typed it anyway bc I appreciate you

#

Show whole error from traceback

#

Saying that something "caused an error" is opaque.

lapis sequoia
#

what you wrote worked

serene scaffold
#

Yay

lapis sequoia
#

so now i have this small dataframe but

#
df = df['Price'].fillna(df['Price'].mean())
#

OH

#

this should now have imputed average into the missing values. time to check

serene scaffold
#

Also you're replacing the whole df variable with one column
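A short sketch of both kinds of imputation on made-up numbers, assigning back to the column so the rest of the frame survives. `mode()` returns a Series because there can be ties, which is why the mode case takes its first entry.

```python
import numpy as np
import pandas as pd

# Invented values, only to show the pattern
df = pd.DataFrame(
    {
        "Price": [10.0, np.nan, 30.0, 20.0],
        "horsepower": [100.0, np.nan, 100.0, 150.0],
    }
)

# Mean imputation: assign to the column, not to df
df["Price"] = df["Price"].fillna(df["Price"].mean())

# Mode imputation: mode() returns a Series, so take its first value
df["horsepower"] = df["horsepower"].fillna(df["horsepower"].mode()[0])

print(df)
```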

prime hearth
#

would be great if you can see the distribution of the data, to make sure that filling price with the mean doesn't bias the data, which can lead to an overfit model

#

but again this depends on how many nan for price column; if just a few then it no biggy

lapis sequoia
#

oh I see.

#

hmmm im still digging myself into a larger hole here.

#

It's way easier to just eliminate all null values

thin palm
#

Why is it that I'm One Hot Encoding a DataFrame column with 127 rows but spits out a 126 rows dataframe??

prime hearth
#

could be nan in one of the row maybe?

#

or is it all valid data

serene scaffold
thin palm
lapis sequoia
#

so youre trying to basically

#

take one column and turn that into x amount of columns instead?

thin palm
#

Noooo, there's 34 unique values inside this specific column. So we'll make 34 new columns but why is it losing a row?

#

here's my OHE code:

ohc = OneHotEncoder()
ohe = ohc.fit_transform(df.Lender.values.reshape(-1,1)).toarray()

dfOneHot = pd.DataFrame(ohe, columns=['Lender_' +str(ohc.categories_[0][i]) for i in range(len(ohc.categories_[0]))])
dfh = pd.concat([df, dfOneHot], axis = 1)
serene scaffold
thin palm
serene scaffold
#

@thin palm well, you only one hot encode nominal features. I don't know what your features are.

lapis sequoia
#

can u not manually introduce another row or something

prime hearth
#

instead of one hot encode you can apply feature engineering

#

such as PCA, or creating a new feature column by grouping similar values

serene scaffold
#

Do you understand when you would or wouldn't use one hot encoding?

plush jungle
#

can someone help me understand this code?

#
class MyRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MyRNN, self).__init__()
        self.hidden_size = hidden_size
        self.in2hidden = nn.Linear(input_size + hidden_size, hidden_size)
        self.in2output = nn.Linear(input_size + hidden_size, output_size)

    def forward(self, x, hidden_state):
        combined = torch.cat((x, hidden_state), 1)
        hidden = torch.sigmoid(self.in2hidden(combined))
        output = self.in2output(combined)
        return output, hidden
    
    def init_hidden(self):
        return nn.init.kaiming_uniform_(torch.empty(1, self.hidden_size))
#

so RNNs take an input and a hidden state

#

and then they give an output and a hidden state

#

the neural net that produces the output is this:

self.in2output = nn.Linear(input_size + hidden_size, output_size)
#

the neural net that produces the new hidden state is this:

#
self.in2hidden = nn.Linear(input_size + hidden_size, hidden_size)
#

but this makes no sense to me

#

in this youtube tutorial explaining RNNs, the food represents the input

#

and the weather represents the hidden state

thin palm
serene scaffold
#

@plush jungle are you sure? I would expect both to be the input

serene scaffold
plush jungle
#

then where is the hidden state in this diagram

serene scaffold
#

Let me check so I don't mislead you. But I'm pretty sure all the nodes in the middle are the hidden state and the food and the weather are features

serene scaffold
#

Also it might be a while before I can look into it.

plush jungle
#

except that the code I posted has two neural net layers
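One way to read the two layers, sketched in plain NumPy with random weights and made-up sizes (this mirrors the posted `forward`, it is not taken from the tutorial). Both layers read the same thing, the input glued to the previous hidden state. One of them produces the next hidden state, the other produces the output for this step.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, output_size = 3, 4, 2  # arbitrary sizes

# Counterparts of in2hidden and in2output: both take input + hidden as input
W_hidden = rng.normal(size=(input_size + hidden_size, hidden_size))
b_hidden = np.zeros(hidden_size)
W_output = rng.normal(size=(input_size + hidden_size, output_size))
b_output = np.zeros(output_size)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def rnn_step(x, hidden_state):
    # same as torch.cat((x, hidden_state), 1)
    combined = np.concatenate([x, hidden_state], axis=1)
    new_hidden = sigmoid(combined @ W_hidden + b_hidden)  # in2hidden + sigmoid
    output = combined @ W_output + b_output               # in2output
    return output, new_hidden


# Run a made-up sequence of 5 inputs, carrying the hidden state along
hidden = np.zeros((1, hidden_size))
sequence = rng.normal(size=(5, 1, input_size))
for x in sequence:
    output, hidden = rnn_step(x, hidden)

print(output.shape, hidden.shape)
```

The hidden state is the only thing carried from one step to the next, which is how information from earlier inputs reaches later outputs.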