#data-science-and-ml


undone flare
#

smh

rigid zodiac
#

usually when I approach ridge regression, I try to find the optimum value of the tuning parameter. This is what I did before, hope it helps:
```py
# Variable selection and shrinkage methods we have learned in week 2
# (both step-wise selection and penalized likelihood approaches).

# Ridge Regression
import numpy as np
from sklearn import linear_model

# Define a fine grid of tuning parameters, lambdas.
# In Python (scikit-learn), this tuning parameter is referred to as "alpha".
# Set an equally spaced grid in log scale.
n_alphas = 100
alphaR = np.logspace(-1, 4, n_alphas)
# aR = np.arange(0, 1000, 5)
# print(alphaR)

# Ridge Regression (using alphaR): refit for every alpha and keep the coefficients.
coefs_R = []
for a in alphaR:
    ridge = linear_model.Ridge(alpha=a)
    ridge.fit(X_train, y_train)
    coefs_R.append(ridge.coef_)
```

undone flare
#

I will try that out too, thanks

#

How would I go about finding an optimal value of K in a KNN model? Right now I just have a loop and am plotting the MAE

rigid zodiac
# undone flare How would I got about finding an optimal value of K in KNN model. right now I ju...

this is what i did in my last class project
```py
# from sklearn.neighbors import KNeighborsClassifier
# from sklearn.model_selection import cross_val_score
# from sklearn.metrics import confusion_matrix
# import matplotlib.pyplot as plt

# knn = KNeighborsClassifier(n_neighbors=3)  # Create a KNN classifier with k=3, for instance.
# knn.fit(X_train, y_cat_train)  # Fit the classifier to the data.
# y_pred = knn.predict(X_test)  # Predictions used for the test-set confusion matrix.

# k_range = range(1, 51)  # Search over k = 1, ..., 50. Adjust the range as you like.
# cv_scores = []
# for k in k_range:
#     knn_cv = KNeighborsClassifier(n_neighbors=k)
#     scores = cross_val_score(knn_cv, X_train, y_cat_train, cv=5)  # 5-fold CV.
#     cv_scores.append(scores.mean())

# plt.plot(k_range, cv_scores)
# plt.xlabel('K')
# plt.ylabel('CV accuracy score')

# # more flexible compared to all of it

# print(confusion_matrix(y_cat_test, y_pred))
```
#

the reason I commented it out is that somehow it didn't work, or the professor didn't require it
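
Since the question mentions plotting the MAE, here is a minimal sketch of the same cross-validation loop for a regressor rather than a classifier. The synthetic data, the range of k, and the variable names are assumptions for illustration only; swap in your own `X_train` and `y_train`.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (assumption): replace with your own training set.
rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(200, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(scale=0.2, size=200)

k_range = range(1, 31)
cv_mae = []
for k in k_range:
    knn = KNeighborsRegressor(n_neighbors=k)
    # scikit-learn maximises scores, so MAE comes back as a negative number.
    scores = cross_val_score(knn, X_train, y_train, cv=5,
                             scoring="neg_mean_absolute_error")
    cv_mae.append(-scores.mean())

# Pick the k with the lowest cross-validated MAE.
best_k = k_range[int(np.argmin(cv_mae))]
print("best k:", best_k, "CV MAE:", min(cv_mae))
```

Plotting `cv_mae` against `k_range` gives the same kind of curve as the loop described in the question, but each point is averaged over five folds instead of a single split.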

undone flare
acoustic halo
proud pond
#

hello

#

is there a learning algorithm for training a NN model that changes the structure of the NN as well as its parameters (weights)?

acoustic halo
#

There's a python implementation available as well if you don't want to do it from scratch

summer musk
#

salary_map={'<=50K':,'>50K':1 }
X_train['salary_map']=X_train['salary'].map(salary_map)

#

it's not working

#

can anyone help?

serene scaffold
#

what is it supposed to do?

summer musk
#

map the values

#

Try using .loc[row_indexer,col_indexer] = value instead

#

this is the kind of warning I'm getting

serene scaffold
#
salary_map = {'<=50K': , '>50K': 1}
X_train['salary_map'] = X_train['salary'].map(salary_map)
summer musk
#

salary_map={'<=50K':0,'>50K':1 }
X_train['salary_map']=X_train['salary'].map(salary_map)

serene scaffold
#

Try using syntax highlighting and following style conventions.

#

What is the value supposed to be for '<=50K'

summer musk
#

its value present in column

serene scaffold
#

salary_map is a dictionary

summer musk
#

yes

serene scaffold
#

you have '<=50K' as a key. what is the value?

summer musk
#

0

#

i have put it in the dict

serene scaffold
#

so, make sure the value is there in your code.

summer musk
#

yes it is there

serene scaffold
summer musk
serene scaffold
#
X_train['salary_map'] = (X_train['salary'] > 50_000).astype(int)

Try that. Also, when sharing code or error messages, please copy and paste the text instead of showing a screenshot.
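
For reference, the `.loc[row_indexer,col_indexer]` message quoted earlier is the hint that pandas prints with its SettingWithCopyWarning. A minimal sketch of one common way to avoid it, assuming (from the dictionary keys) that the `salary` column holds the strings `'<=50K'` and `'>50K'` rather than numbers; the small DataFrame and the `age` filter are made up for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the real dataset.
df = pd.DataFrame({"age": [25, 40, 31],
                   "salary": ["<=50K", ">50K", "<=50K"]})

# An explicit copy avoids SettingWithCopyWarning when X_train
# was produced by slicing another DataFrame.
X_train = df[df["age"] > 20].copy()

salary_map = {"<=50K": 0, ">50K": 1}
# strip() guards against stray whitespace around the labels.
X_train["salary_map"] = X_train["salary"].str.strip().map(salary_map)
print(X_train)
```

If the column does hold strings like these, a numeric comparison would not apply and the mapping is the route to take; any label missing from the dictionary comes out as NaN, which is a quick way to spot unexpected values.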

summer musk
#

yea sure

agile jolt
#

i have an issue on a graph and the screenshot is much needed, i hope it's not that big deal

serene scaffold
#

It's fine if it's a graphic of some kind and not text.

agile jolt
#

okay, great

#

so..this happened

acoustic halo
#

Yeah because you have a billion categories lol

agile jolt
#
import pandas as pd
from pandas import to_datetime
import plotly
import plotly.express as px
import plotly.io as pio


df = pd.read_csv(r'\Users\almas\Desktop\amazon_jobs.csv')


df.dtypes


df["Posting_date"] = to_datetime(df["Posting_date"])

y = df.loc[(to_datetime(df["Posting_date"]) > to_datetime("January 1,2018")) &
          (df["location"] == "US, WA, Seattle ")]


print(df)

y.groupby("Title").size().plot.pie(y="Title",ylabel="LABEL")
acoustic halo
#

Yeah, you group them by title, but theres loads of titles

agile jolt
acoustic halo
#

Well, technically you were successful

agile jolt
#

Yeah haha

#

But how can i filter it to something more visible

#

And useful

acoustic halo
#

You'd have to figure a way to label them better, eg anything that contains the words "software" and "engineer" or "developer" => "software engineer"

#

You would have to decide on the labels and how best to generate them though

#

Or other key words from the description
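
A rough sketch of the keyword idea above, using only the standard library. The rules, the label names, and the example titles are all hypothetical; the real labels would have to be chosen by looking at the data.

```python
from collections import Counter


def categorize(title: str) -> str:
    """Collapse a raw job title into a coarse label using keyword rules."""
    t = title.lower()
    if "software" in t and ("engineer" in t or "developer" in t):
        return "software engineer"
    if "data" in t and ("scientist" in t or "analyst" in t):
        return "data"
    if "manager" in t:
        return "manager"
    return "other"


# Made-up titles standing in for the "Title" column.
titles = [
    "Software Development Engineer",
    "Senior Software Developer",
    "Data Scientist II",
    "Program Manager",
    "Recruiter",
]

counts = Counter(categorize(t) for t in titles)
print(counts.most_common())
```

With the DataFrame from the snippet in this thread, something like `y["Title"].map(categorize).value_counts().plot.pie()` should then produce a pie chart with a handful of slices instead of one per distinct title.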

agile jolt
#

Okay, I'll try something else maybe

#

Thanks!

#

Oh and yes, while i'm here..any idea or example for ternary plot

acoustic halo
#

If you're just playing around, maybe categorize by what languages they require

agile jolt
#

Seems a bit hard because it's not a specific category, but 'PREFERRED QUALIFICATIONS' where they took data from applications

#

But thanks

agile jolt
undone flare
empty parrot
#

any text-to-speech recognition with ANN in python available somewhere for learning??

grizzled barn
#

Not sure yet. How could I figure that out?

coral kindle
#

Anybody doing the Kaggle 30DayMl challenge?

tame solstice
#

How can i solve the importerror?

#

matplotlib library is added.

#

but i got the error

modern dragon
#

Hi guys, how do you improve the accuracy level of your machine learning model?

serene scaffold
#

@modern dragon it's not possible to answer this question in general, as there's no one-size-fits-all solution.

#

What does your model do? How is it performing currently?

#

@tame solstice make sure pycharm is running your code in the environment where you installed matplotlib. If you don't understand what I mean by this (and it's okay if you don't) then it probably isn't.

modern dragon
serene scaffold
modern dragon
#

Oh uh what are the different types of models(

#

?*

#

This is my first ML project 😅

serene scaffold
#

just show the code, I guess

#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

serene scaffold
#

please don't post screenshots

modern dragon
#

Oh it's really weird, is it ok if I show you the tutorial I learned it from?

#

Python Machine Learning Tutorial - Learn how to predict the kind of music people like.
#

You can skip to 29 mins

serene scaffold
#

When you have the code ready to share, ping me and I'll look next time I'm online.

covert herald
#

does anybody know any good sources to learn machine learning without having to use any of the modules (like sklearn)? ping me if you have an answer, thanks!

serene scaffold
#

@covert herald why don't you want to use modules?

#

They're not an added layer of complexity. They're there to help. You could implement some algorithms "from scratch" for educational purposes, but I would still use numpy at the very least.

#

The reason I insist on numpy: if you write all the math by hand, you're going to waste a lot of time on implementation details that don't deepen your understanding of anything.

undone flare
#

How to know that you have overfitted your model?

tired nymph
#

Hello guys,
I'm using OpenCV and Yolo3 to detect objects in a video file I have in a folder. The problem is that I don't know how to save the output video (the one with the detections). This is my code:

video = 'test.mp4'
vid = detect_video(video, yolo, all_classes)
undone flare
#

this doesn't seem right

#

why is it doing that?

#

I have one hot encoded the data

#

oh I should drop some columns

undone flare
#

fixed it, the problem was there were too many unique values for some column

grand mantle
#

just like matlab has simulation interfaces

lapis sequoia
#

is there any link between machine learning and binary search?

undone flare
#
```
LinAlgWarning: Ill-conditioned matrix (rcond=1.05001e-17): result may not be accurate.
  return linalg.solve(A, Xy, sym_pos=True,
```
what does this mean? code:
```py
reg2 = Ridge(alpha=0.0, normalize=True).fit(X_train, y_train)
y_pred2 = reg2.predict(X_test)
rmse_ohe[1] = rmse(y_test, y_pred2)
rmse_ohe[1]
```
velvet thorn
#

basically, there is substantial uncertainty in the regression coefficients

undone flare
#

oh is there a way to fix this?
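
A likely cause, assuming the usual setup since the data isn't shown: with `alpha=0.0` the ridge penalty vanishes, so the fit reduces to ordinary least squares, and a full set of one-hot columns is linearly dependent (the dummies for one variable always sum to a constant). A rough NumPy sketch with made-up data, showing how that makes the normal-equations matrix singular and how a positive penalty repairs it:

```python
import numpy as np

rng = np.random.default_rng(0)
categories = rng.integers(0, 3, size=50)

one_hot = np.eye(3)[categories]                # full set of dummy columns
X = np.column_stack([np.ones(50), one_hot])    # intercept + all dummies

gram = X.T @ X                                 # matrix the solver has to invert
cond_plain = np.linalg.cond(gram)
print("condition number without penalty:", cond_plain)

alpha = 1.0                                    # any positive ridge penalty
cond_ridge = np.linalg.cond(gram + alpha * np.eye(4))
print("condition number with penalty:   ", cond_ridge)
```

In practice that suggests either using `alpha > 0` or dropping one dummy column per encoded variable (for example `pd.get_dummies(..., drop_first=True)`).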

tame solstice
neat sandal
#

hello @all, I want to build a script that detects the dominant color in an image. I did some searching and found some packages like color-thief, but the results aren't good, so I want to build something myself. I don't know how or where to start. Could anyone help me?

tame grail
#

i got this table from a website with bs4 and pandas

#

its a list of strings with a length of 1 named dfs and that output is dfs[0]

#

how do i get just the common name column?

#

the spacing changes depending on which state's data im looking at

#

printing for s in dfs[0] gets me the column names

unborn glacier
#

dfs["Common Name"] should work

tame grail
unborn glacier
#

Maybe it's not a data frame?

tame grail
#

no it isn't its a list of strings

#

but theres only 1 element

#

which is that big table

unborn glacier
#

Ah okay, you can turn a table into a data frame and then use what I was talking about

#

I think it's just df = pd.DataFrame(dfs[0])

#

Then you can do df["Common Name"]

covert herald
#

@serene scaffold I'm using numpy, but I don't want to use the machine learning modules until I learn how the algorithms work

tame grail
#

ahh i got it thank you! @unborn glacier

#

it was just this

unborn glacier
#

I don't think there's one single resource that has it all

covert herald
#

alright

unborn glacier
lucid kettle
#

I'm taking an Artificial Intelligence course next semester, yay

grand lion
#

Do I need mathematical rigor to start learning TensorFlow?

serene scaffold
#

Also, I would avoid "learning TensorFlow" or any other library, and instead focus on approaches to AI and use whichever library suits what you're doing.

grand lion
#

Right now I am just planning on using a library to create an LSTM, so what mathematical knowledge would that need?

serene scaffold
grand lion
#

The former

#

Predict the next word in a sentence type of thing

serene scaffold
#

You can get away with a certain amount of not knowing the math behind it, yes.

grand lion
#

Tbh I only know up to geometry so I might learn this another time after I start calculus and linear algebra

serene scaffold
agile jolt
#

any idea which dataset would be good for ternary plot?

balmy junco
#

Hey, I'm trying to use Fitter from fitter library on large image data. It always times out. I tried increasing the timeout quite a lot, but it never makes the cut. I converted it to an np array and everything, but no luck....

#

Can't seem to find anybody that knows anything about it

#

Any thoughts?

grand lion
#

@serene scaffold Katie told me that you work with NLP, so I have a question regarding that. Would you need ML to do it, could you just work based off of grammatical structures of sentences (I.e subject, verbs, predicates, etc.) and classify words as a certain description

plucky lichen
#

well, I dont really know how to explain it
I am trying to archive the messages of my friends and me from a discord channel, that works great, but I get this json file from discord:

serene scaffold
#

Can you put that in a paste bin?

plucky lichen
#

yes sorry

serene scaffold
grand lion
#

Hm

plucky lichen
serene scaffold
#

though there are certain approaches where grammatical features are taken heavily into account.

grand lion
#

For my example, I'm more or less trying to generate a sentence rather than predict the next word, using basic sentence structures such as simple sentences and the like. Would that need ML?

serene scaffold
#

This happens to be the second assignment in the NLP class I helped teach.

grand lion
#

I would try to create an LSTM, but I seriously don't understand the videos on them because they all require previous knowledge of ML

serene scaffold
#

ML is a tough area to jump into, yes

grand lion
#

I can get behind how LSTMs and RNNs work but I don't understand the mathematical portions of it

serene scaffold
#

how do you feel about statistics?

grand lion
#

Not that good at it tbh, I'm in 8th grade so my mathematical knowledge is quite bad. I do know concepts like the correlation coefficient and the like, but probably not to the point where I'm competent enough for ML

serene scaffold
#

you can do the ngram/markov chain approach if you simply understand that if something happens 8 times out of 10, it has a .8 chance.

grand lion
#

Hm alright

#

Would there be a certain framework that is strong with those concepts?

#

Or should I work on the concepts first then find a framework that suits my needs

serene scaffold
#

NLTK is a library you can use to get the ngrams.

#

the statistics and stuff, you can just store some numbers in a nested dict data structure of some kind.

grand lion
#

Does it abstract it too much? Cause I do want some abstraction but not too much so I can understand the concepts

grand lion
serene scaffold
#

not really. an ngram is a tuple of n consecutive tokens.
[(not really .), (really . an), (. an ngram), (an ngram is), ... (consecutive tokens .)]

#

these are 3grams or trigrams.
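
A small sketch of the ideas above in plain Python: cutting a token list into trigrams, storing counts in a nested dict, and turning "happened k times out of n" into a probability. The toy sentence and the helper name `ngrams` are made up for illustration.

```python
from collections import Counter, defaultdict

tokens = "the cat sat on the mat . the cat ate the fish .".split()


def ngrams(seq, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


trigrams = ngrams(tokens, 3)

# Nested dict: (first token, second token) -> counts of the token that followed.
following = defaultdict(Counter)
for a, b, c in trigrams:
    following[(a, b)][c] += 1

# Relative frequency of each continuation after "the cat".
counts = following[("the", "cat")]
total = sum(counts.values())
probs = {word: n / total for word, n in counts.items()}
print(probs)
```

Here "the cat" is followed once by "sat" and once by "ate", so each continuation gets half of the probability.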

grand lion
#

Are there any articles or books that are helpful for understanding what ngrams are?

serene scaffold
#

maybe? the course I taught specifically didn't have one to save money for the students.

#

well, helped teach. anyway, a token is just a word or punctuation mark. and it's just n tokens in order

grand lion
#

Ohh alright

#

Ohh wait I understand it

#

I was a bit confused at first but now I get it

#

Anyways, thanks for your help! I appreciate it

serene scaffold
#

@grand lion I'll ask my now-former advisor for her slides. Ask me again on like Tuesday.

grand lion
#

Alright, will do

uncut orbit
#

elt is the same thing as tlc for a dataset

grave frost
#

I will give the idea and you would code all of it - prize would be split 50-50

uncut orbit
grand lion
#

Do I need Anaconda to plot on 2d maps with matplotlib?

#

Also do I need mpl_toolkits.basemap?

quiet maple
#

hey

#

i want to learn ML

#

can you guys help me how can i learn quicker and properly

leaden pebble
#

Hey

Suppose I give you some data (interval, frequency)...

If we apply
np.histogram(data_given, bins=class_intervals, density=False)

we get a tuple of two arrays:
1. frequency counts
2. bin edges

But if we instead set
density=True

what does np.histogram return, statistically?

#

Thats the data

royal crest
#

according to documentation

#

see for yourself too!
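
A quick way to see it for yourself, with made-up data and deliberately unequal class intervals: with `density=True` each bin's value is its count divided by (total count × bin width), so the result is a probability density whose area sums to 1, not a set of probabilities.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 7, 8, 9, 9])
edges = [0, 2, 4, 10]                      # unequal class intervals

counts, _ = np.histogram(data, bins=edges, density=False)
density, _ = np.histogram(data, bins=edges, density=True)
widths = np.diff(edges)

print(counts)                              # raw frequency per bin
print(density)                             # counts / (total count * bin width)
print((density * widths).sum())            # total area under the histogram
```

Multiplying `density` by `widths` recovers the relative frequency of each bin.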

wide raven
#

Do you guys think the only good way to learn neural networks is by learning every aspect of it

#

and understanding all the math and how they are built?

#

or can you get a good understanding and make a lot of cool AI just by learning tensorflow and mastering that

#

I thought learning from scratch would be nice and help me understand but after hours and hours of learning gradients equations types activation functions

#

it just got too much to handle and I would like to make AI with a less info-needed approach which is why i thought tensorflow would be nice

#

but i am scared that would limit me and what i can make

somber prism
#

guys i have one doubt , are the tree based models prone to outliers, skewed features ?

#

if they are not then i dont need to standardize or scale the features right ?

dull turtle
#

hello

#

i am working with pandas dataframe

#

when I run
```python
for date_1 in rem_dup_dt_column[0]:
    print("date_1:", date_1)
    print()

    row_data = main_dataframe.loc[main_dataframe['date'] == date_1]
    print("row_data:")
    print(row_data)
    print()
```
only the first date gets stored.

I want it to run for every entry in the date column

#

ping me when u reply

mystic tinsel
#

hello, I had a question regarding label encoding and one-hot encoding. A few examples I found online that had a Sex column trained the model after label encoding, with no one-hot encoding. Shouldn't one-hot encoding be done in such cases? Thanks

ripe forge
#

in cases where column only has 2 unique values, there's zero downside to label encoding. so if the dataset for Sex had only 2 values, then you essentially bypassed the pitfall of label encoding

ripe forge
still osprey
#

aw

mystic tinsel
#

Sorry, gender *

still osprey
ripe forge
#

in general you're correct, for higher cardinality (ie more unique values) in a categorical column, label encoding isn't appropriate

mystic tinsel
#

So like labelling male as 0 and female as 1 doesnt really have any effect on the model huh

ripe forge
#

yes, because it's just two numeric values, with some distance between them

ripe forge
#

you could have even set it to 0.25 and 0.75, or 0.3 and 0.6 if you wanted (though i dont know why you'd want to do that)

#

the model will never see any value outside those two for this feature, and thus its relations will largely stay completely independent of the actual values

mystic tinsel
#

Sorry if I'm repeating the question, but even the distance or small difference should set those two apart, right? Like, in one-hot encoding it's more like true and false, but label encoding is more like assigning a value to a variable?

#

Shouldn't that affect even models with only two values..

ripe forge
#

ultimately a model doesn't care. all it does is weight * some_feature

#

the weight could be learnt arbitrarily to scale any 2 values into anything
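
A small numerical check of that claim for one specific case, a linear model with an intercept (an assumption; other model families are not covered by this sketch). The data is synthetic: recoding the binary column from 0/1 to 0.25/0.75 changes the learned weights but leaves the predictions identical.

```python
import numpy as np

rng = np.random.default_rng(0)
sex = rng.integers(0, 2, size=100)                      # encoded as 0 / 1
y = 3.0 + 2.0 * sex + rng.normal(scale=0.1, size=100)   # made-up target


def fit_predict(feature):
    """Ordinary least squares with an intercept; returns fitted values and weights."""
    X = np.column_stack([np.ones_like(feature, dtype=float), feature])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef, coef


pred_01, coef_01 = fit_predict(sex.astype(float))
pred_alt, coef_alt = fit_predict(np.where(sex == 1, 0.75, 0.25))

print(coef_01, coef_alt)                 # different weights...
print(np.allclose(pred_01, pred_alt))    # ...same predictions
```

The weight and intercept simply rescale to absorb whatever two numbers were chosen, which is why the specific codes do not matter when there are only two categories.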

mystic tinsel
#

Oh

ripe forge
#

and also, for the record, a model also doesn't even understand true or false. all it understands is math and numbers

mystic tinsel
#

Ig i need to think about it a lil more to completely understand that 😅

mystic tinsel
#

Thank you!

hardy hornet
#

does anyone know how to change the language in Jupyter to English

ripe forge
#
GitHub

Recently all the interface of the Jupyter notebook has been automatically translated into the my own language (French). It is the case on all the web browsers I've tested (Firefox, Chrome, ...

plucky lichen
mystic tinsel
plucky lichen
#

no

#

I will try that

#

merci

mystic tinsel
plucky lichen
#

print( data['messages']) prints everything under it which makes sense,
print( data['messages'][0][0]['embeds'][0]['title'])
should print the value of title but it doesn't

scarlet cypress
#

is 3000 files enough for a CNN

lament topaz
#

hello, I was trying a project on twitter sentiment analysis
so I took dataset from Kaggle - cleaned it (like @, RT, links)
I want to know if is there a way to choose tweets based on a topic??
like I want tweets regarding "Donald Trump", "Artificial Intelligence", etc
can I do that without using the twitter api - just choosing from the dataset!!
pls help!!

somber prism
lament topaz
somber prism
lament topaz
#

yes for any topic, not just the Trump

#

basically I need to create a model for this!!
I only showed wordcloud, bar graphs
need to show accuracy and train the model too
so thats why i thought I should filter the data to a topic only

somber prism
#

try df[df.tweet.str.contains('donald')]
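
A slightly more defensive variant of the same filter, sketched on a made-up DataFrame (the `tweet` column name comes from the message above; the rows are invented). It ignores capitalisation and does not break on missing tweets.

```python
import pandas as pd

df = pd.DataFrame({"tweet": [
    "Donald Trump spoke today",
    "AI is everywhere",
    None,
    "artificial intelligence and donald duck",
]})

topic = "donald"
# case=False ignores capitalisation; na=False treats missing tweets as non-matches;
# regex=False matches the topic as plain text.
mask = df["tweet"].str.contains(topic, case=False, na=False, regex=False)
subset = df[mask]
print(subset)
```

As the last row shows, a plain substring match also picks up unrelated tweets, so a longer phrase such as "donald trump" may be a better filter for a specific topic.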

lament topaz
#

oh yeah that works 🙂

somber prism
lament topaz
#

yeah lol beginner here - always asking for help

grave frost
somber prism
#

can someone tell me what I am doing wrong here, or was the dataset not meant to be classified easily?

#

no one?

somber prism
#

hmm

#

ok wait

lament topaz
#

but Idk data science much,, just started 2 weeks ago!

#

I hope someone else will help!

somber prism
#

hmm ok

somber prism
#

deep neural network ?

grave frost
somber prism
#

idk i only know machine learning algo not deep learning

#

so i have to wait for it ig

mystic tinsel
somber prism
# mystic tinsel could you briefly explain your dataset?

this dataset is about classifying the genre of music; there are about 10 different genres. It has features like Artist Name, Track Name, Popularity, danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo, duration_in min/ms, time_signature

mystic tinsel
#

oh alrighty, I've also just begun ML so lemme see if i can understand this 😅

somber prism
#

you couldve simply clicked that kaggle link to see the dataset

mystic tinsel
#

i did, i was kinda confused with the datalist

somber prism
#

ohh

somber prism
mystic tinsel
#

just a few weeks tbh

#

i havent really used kaggle either

somber prism
#

ok

mystic tinsel
#

im new to this, but if i may, are you using popularity as an input?

#

@somber prism

somber prism
#

all the features except class cuz that one is the output variable ( the one needs to be predicted )

mystic tinsel
somber prism
#

yep

mystic tinsel
#

i think you shouldnt use popularity as an input....

somber prism
#

wym ?

mystic tinsel
#

because it is insignificant right?

#

like what determines a genre is the rest of the features but not the popularity?

somber prism
#

genre is dependent on every of those feature there but wait lemme try dropping the popularity and see if i get the improvement in score

mystic tinsel
#

sure

#

ill do the same

somber prism
#

maybe you are right cuz i do get a significant drop in accu due to popularity

#

someone correct me if i am wrong.

#

ok nvm

#

popularity is a important feature

mystic tinsel
#

huh, well my bad

somber prism
somber prism
mystic tinsel
#

i cant seem to download the code, so i cant try it out myself...

#

processing seems to take a lot of time on kaggle

lapis sequoia
#

Hey everyone, my name is Paras, and I recently started to learn ML from random resources on YouTube and Google. Can you please guide me on how and where to learn ML? Thank you in advance 🙂

somber prism
real dew
#

If arr is a 3D numpy array, does arr[arr<90] return values in row-wise order as in arr?
In other words, is arr[arr<90] roughly equivalent to this:
(Not considering return type)

output = []
for i in arr:
    for j in i:
        for k in j:
            if k<90: output.append(k)
return output
serene scaffold
#

!e

import numpy as np
arr = np.random.random((2, 2, 2))
print(arr)
print(arr[arr > .5])
arctic wedgeBOT
#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

[[[0.74413612 0.56879201]
  [0.02811816 0.03907577]]

 [[0.78955903 0.69845739]
  [0.61117874 0.84977809]]]
[0.74413612 0.56879201 0.78955903 0.69845739 0.61117874 0.84977809]
serene scaffold
#

@real dew you can use that to infer what the logic is.

#

looks to me that it's the same as if you had first reshaped the array into one dimension
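
To check the equivalence directly rather than by eye, a short sketch that compares the boolean mask against the nested loop from the question on a small deterministic array (the array contents are arbitrary):

```python
import numpy as np

# Deterministic 3D array with a couple of values >= 90.
arr = np.arange(24).reshape(2, 3, 4) * 7 % 100

looped = []
for plane in arr:
    for row in plane:
        for value in row:
            if value < 90:
                looped.append(value)

masked = arr[arr < 90]

# Same values in the same (row-major) order as the nested loops...
print(np.array_equal(masked, looped))
# ...which is also what masking the flattened array gives.
flat = arr.ravel()
print(np.array_equal(masked, flat[flat < 90]))
```

This holds for the default C-ordered layout: boolean indexing returns a 1D array of the selected elements in row-major order.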

real dew
#

Oh yeah
Thanks!!!

serene scaffold
#

💚

glad radish
#

Does anyone know a good book or course to learn about machine learning algos like xgboost, random forests, decision trees, etc

serene scaffold
glad radish
serene scaffold
glad radish
serene scaffold
soft viper
#

any monte carlo youtube that i can watch? Trying to get into simulation to fill up my holiday and monte carlo seems to be the buzzword so might start with there

fathom ruin
#

hey guys, I want to take the informational facts from a paragraph and put them as bullet points. What should I start with?

somber prism
#

have any of you tried to predict 2 variables ?

#

or we have to do it separately by predicting first var then the next var by choosing that as a predicting var

#

?

unborn glacier
#

You can predict 2 or more variables using multiple linear regression
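
As a minimal sketch of fitting two targets in a single step, here is plain least squares in NumPy with a two-column target matrix. The data and the "true" weights are synthetic and only there to show the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                       # three input features
true_W = np.array([[1.0, -2.0],
                   [0.5,  0.0],
                   [0.0,  3.0]])
Y = X @ true_W + rng.normal(scale=0.01, size=(100, 2))   # two target columns

X1 = np.column_stack([np.ones(100), X])             # add an intercept column
W, *_ = np.linalg.lstsq(X1, Y, rcond=None)          # one fit, both targets

print(W.shape)               # (4, 2): one column of weights per target
print(np.round(W[1:], 2))    # close to true_W
```

For linear regression this is equivalent to fitting each target separately with the same inputs; scikit-learn's `LinearRegression` also accepts a 2D `y` in the same way.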

uncut orbit
#

values

serene scaffold
fathom ruin
#

yeah sure just a sec

fathom ruin
# serene scaffold Can you give an example paragraph and what you want to extract?

Sample paragraph :

The inflated style itself is a kind of euphemism. A mass of Latin words falls upon the facts like soft snow, blurring the outline and covering up all the details. The great enemy of clear language is insincerity. When there is a gap between one's real and one's declared aims, one turns as it were instinctively to long words and exhausted idioms, like a cuttlefish spurting out ink. In our age there is no such thing as 'keeping out of politics.' All issues are political issues, and politics itself is a mass of lies, evasions, folly, hatred, and schizophrenia. When the general atmosphere is bad, language must suffer. I should expect to find - this is a guess which I have not sufficient knowledge to verify - that the German, Russian and Italian languages have all deteriorated in the last ten or fifteen years, as a result of dictatorship

Like some of the points should be

*The inflated style itself is a kind of euphemism
*The great enemy of clear language is insincerity
*All issues are political issues, and politics itself is a mass of lies, evasions, folly, hatred, and schizophrenia.

serene scaffold
#

@fathom ruin let me get back to you on this. It's an interesting question.

fathom ruin
#

thanks btw 🙂

#

ping me when u find any info on this 😄 i am trying to find things as well

serene scaffold
#

@fathom ruin just so we're clear, you're just trying to classify which sentences do or do not have true/false statements, yes? You're not trying to determine if the statement is actually true?

fathom ruin
serene scaffold
#

Okay great. That greatly simplifies the problem.

fathom ruin
#

Shy i am trying to do this for hours 😂 and here you be like its a piece of cake lol

#

I literally have 0 idea on which way to get the points

serene scaffold
#

I didn't say it's easy, it's just easier than detecting misinformation.

fathom ruin
#

😅 yeah k nice lol

serene scaffold
#

I'm trying to identify what sets your three bullet points apart from the sentences that aren't of interest. The last one contains a lot of opinions.

fathom ruin
#

i mean it was just a example

#

maybe we should just remove the wide option points and do the remaining and figure out later?

serene scaffold
#

One thing they all have in common though: the subjects of each sentence are third person nouns that aren't people, and the verb is a form of "to be". You might actually be able to solve this with rules.

fathom ruin
#

hmm thats something i didnt know

#

i used a package to separate verbs, nouns and other word types in a paragraph

serene scaffold
#

Spacy?

fathom ruin
#

but it gave the verb WORDS, i am not sure how to get the sentence

fathom ruin
serene scaffold
#

Yay!

fathom ruin
#

so like where should i go next 🤔

serene scaffold
#

!paste

fathom ruin
#

how do i get the sentence where the verb is?

serene scaffold
#

Can you show the code?

fathom ruin
#

sure

#

if u want i will just remove those and keep the spacy alone

fathom ruin
#

not as a "sentence"

serene scaffold
fathom ruin
#

how?

serene scaffold
#
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp('pretend this is a long paragraph with multiple sentences.')
for sentence in doc.sents:
    print(sentence.text)  # do stuff with sentence
fathom ruin
#

🤔 let me try that

fathom ruin
serene scaffold
fathom ruin
#

think its working 😄

#

Great!

#

thank you so much 😄

serene scaffold
fathom ruin
#

for point i can do \n before every points

#

but for just normal paragraph

serene scaffold
fathom ruin
#

ohk thank you again 😄

somber prism
glad mulch
#

if i have a df with missing values (Firm Age), how am i able to make the value = previous one + 1

#

i was thinking of ffill

#

but im not sure

serene scaffold
sour spindle
#

hi. i would like to know how to make a tensorflow input layer for a dataset which is like this [[1,2,3,4,5,6,7], [1,2,3,4,5,6,7]].

#

i need to input each element of the subset into a node

lapis sequoia
#

@glad mulch you can use .shift(1)
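
A sketch of how that could look, on a made-up column and assuming the rows are sorted by year for a single firm. Filling with `shift(1) + 1` alone only covers gaps that are one row long, so this version counts how many rows have passed since the last known value and adds that to the forward-filled age.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one firm, one row per year, some ages missing.
df = pd.DataFrame({"Firm Age": [3.0, np.nan, 5.0, 6.0, np.nan, np.nan]})

age = df["Firm Age"]
# Every known value starts a new block; missing rows belong to the block before them.
known_block = age.notna().cumsum()
# Position inside the gap: 0 on known rows, then 1, 2, ... on consecutive missing rows.
steps = age.isna().astype(int).groupby(known_block).cumsum()

df["Firm Age"] = age.ffill() + steps
print(df)
```

With several firms in one DataFrame, the same logic would need to run per firm (for example inside a `groupby` on the firm identifier), which is left out here.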

grand lion
#

How do you form a sentence from ngrams

lapis sequoia
#

What does this mean? train_metrics = pd.DataFrame({'MAE': mae_train, 'MSE': mse_train, 'RMSE': rmse_train}) train_metrics.reset_index(drop=True, inplace=True) train_metrics.head(10)

#

I'm getting an error when I want to pass the results of the SVR model

merry ridge
#

I'm trying to help a friend's son with his machine learning homework, and they are using a technique to estimate a PMF that I don't think I've seen before.

#

They are constructing a probability distribution by taking emails that are categorized as spam, and creating a frequency histogram of every word that appears in each email. Then generating a probability by taking each frequency and dividing by the sum of all the other words appearances

#

But instead of doing just that, they are adding 1 to each frequency and dividing by the sum plus the number of distinct words to account for the addition of those 1s

#

It seems like some sort of finite population correction factor. The resources this student was provided are riddled with typos, and I don't know why an "n+1" correction in this manner makes sense

#

Put in another way, if their probability distribution for their data is p = (x_1/n, x_2/n, ....., x_m/n). They adjust it to p = ((x_1 + 1)/(n+m), ....., (x_m + 1)/(n+m))
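
The adjustment described here matches what is usually called add-one (Laplace) smoothing, which is common in naive Bayes spam filters. Its purpose is to keep a word that never appeared in the spam examples from getting probability zero, since a single zero would wipe out the whole product of word probabilities. A small sketch with an invented vocabulary:

```python
from collections import Counter

spam_words = "win money now win prize now now".split()
# "meeting" is in the vocabulary but never occurs in the spam examples.
vocabulary = ["win", "money", "now", "prize", "meeting"]

counts = Counter(spam_words)
n = sum(counts.values())      # total word occurrences
m = len(vocabulary)           # number of distinct words

unsmoothed = {w: counts[w] / n for w in vocabulary}
smoothed = {w: (counts[w] + 1) / (n + m) for w in vocabulary}

print(unsmoothed["meeting"], smoothed["meeting"])
print(sum(smoothed.values()))
```

Adding 1 to each of the m counts adds m to the total, which is why the denominator becomes n + m: the smoothed values still sum to 1.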

lapis sequoia
glad mulch
thorn bobcat
#

anyone here work with NLP?

#

I'm trying to create a fairly decent chatbot in arabic

acoustic halo
#

@thorn bobcat there's a few, just ask the question

thorn bobcat
#

this is what Arabic looks like
أريد قلما.

#

first of all it'll have to be tokenized in reverse instinctively.

#

but how would a transformer even work with arabic?

#

cause the sentences don't have a clear set structure

serene scaffold
#

@thorn bobcat what do you mean that they don't have structure?

acoustic halo
#

Depending on what you want to do, you could just use a pretrained Arabic model and save yourself some time

thorn bobcat
# serene scaffold <@361955686185304074> what do you mean that they don't have structure?

This is because it is different from most languages in a few ways.

It is written from right to left.
It uses its own set of characters that are unrecognizable to speakers of other languages.
Vowels are omitted when it's written. It has a complex and rich grammatical structure, for example, pronouns are embedded in the words themselves in many cases.
It is much more fluid than most other languages as sentences don't conform to the subject-verb order that is typical of English.
All of this makes it harder to learn and leads to a larger risk of ambiguity than would exist in most other common languages.
serene scaffold
#

@thorn bobcat I don't agree with your assessment that these properties make it harder

thorn bobcat
acoustic halo
#

I can't pretend to know anything about Arabic but I don't see why any of that means it's not modelable

thorn bobcat
acoustic halo
#

Anyway what sort of chat bot do you want to make, something conversational, question answering etc?

thorn bobcat
#

something conversational

#

able to answer philosophical questions

#

and give legal advice

#

ARAGPT2 is a stacked transformer-decoder model trained using the causal language modeling objective. The model is trained on 77GB of Arabic text

#

I really wanna learn about transformers but don't know where to start really..

grand lion
#

Once you have the ngrams and the most common frequency, how do you form a sentence from them?

#

I can't seem to find anything on S.O

#

(Using NLTK btw)
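
One common way to go from n-gram counts to a sentence is to treat the counts as a Markov chain: start from a word, then repeatedly sample the next word in proportion to how often it followed the current one. The sketch below uses plain Python rather than NLTK so it does not depend on any particular NLTK version. The toy corpus and the names `build_bigram_table` and `generate` are made up for illustration.

```python
import random
from collections import Counter, defaultdict


def build_bigram_table(tokens):
    """Map each word to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        table[current][following] += 1
    return table


def generate(table, start, max_words=10, rng=None):
    """Walk the bigram table, sampling each next word by its frequency."""
    rng = rng or random.Random()
    words = [start]
    # Stop at max_words, or when the last word was never seen with a follower.
    while len(words) < max_words and table.get(words[-1]):
        followers = table[words[-1]]
        candidates, weights = zip(*followers.items())
        words.append(rng.choices(candidates, weights=weights)[0])
    return " ".join(words)


tokens = "the cat sat on the mat and the cat ran".split()  # toy corpus
table = build_bigram_table(tokens)
print(generate(table, "the", rng=random.Random(0)))
```

Always taking the single most common follower instead of sampling tends to loop forever on short corpora, which is why this sketch samples by weight.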

acoustic halo
#

@thorn bobcat https://arxiv.org/abs/1706.03762 is the best place to start on learning transformers

#

For specifically making a chat bot, you are best to use a pretrained model as opposed to making your own

acoustic halo
grand lion
#

Create sentences from ngrams
There used to be a method in NLTK called generate but it's deprecated now

#

Stelercus might know since he recommended ngrams to me yesterday, but overall I've been trying to search for the answer and for some reason there are zero answers on it

acoustic halo
#

Where are you getting your generate from, I can see it in the nltk.text module fine, no deprecation warnings

grand lion
acoustic halo
#

I can't find any modules with that name or similar locally or in the docs

#

Nevermind it is in an old version

serene scaffold
#

And there's nothing special about subject-verb-object word order.

#

Anyway, I think transformers should work just as well for Arabic.

sick furnace
thorn bobcat
grave frost
thorn bobcat
#

so I looked it up and someone told me it'll be computationally expensive to train a model from scratch

thorn bobcat
#

wanted to do something like this but I understand now it'll cost a lot to do it from scratch

#

So I'd like to take what they did and improve it.

thorn bobcat
#

make him more inclined to use the new data over the old data although the old data would still exist.

#

I'd like to also give him a face and apply first order motion.

grave frost
#

because there aren't enough ancient scriptures to constitute a sizeable amount for training

thorn bobcat
grave frost
grave frost
thorn bobcat
#

and a face that has lips that move matching the words

grave frost
#

ahh, that's pretty easy

#

but why ancient scriptures?

thorn bobcat
#

Idk would seem fascinating talking with an ancient mid eastern philosopher.

grave frost
#

unless you have a ton of data and compute

thorn bobcat
thorn bobcat
#

assume I got a corpus of about 100 movie subtitles and a 1000 books for starters with an average of 150 pages.

grave frost
grave frost
thorn bobcat
grave frost
#

maybe

#

do you have a CPU with huge RAM?

thorn bobcat
#

it's free Colab? idk random

#

Each user is currently allocated 12 GB of RAM

#

As of October 13, 2018, Google Colab provides a single 12GB NVIDIA Tesla K80 GPU that can be used up to 12 hours continuously.

grave frost
#

no, on your own PC

thorn bobcat
#

4gb ram

grave frost
#

well, leave it and use Colab then

thorn bobcat
#

do I need to understand about transformers to fine tune it?

grave frost
#

yes

#

you need to understand a lot of things to do something

thorn bobcat
#

I meant transformers and multi-headed attention

#

self- attention

#

that kind of stuff

serene scaffold
#

I might be conflating transformers with BERT a bit.

thorn bobcat
#

anyone know the naming convention used?

#

why fc?

serene scaffold
velvet thorn
# thorn bobcat why fc?

fully connected, because every neuron in a layer is connected to every neuron in the preceding and following layers
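
As a rough illustration of what "fully connected" means in practice: each output neuron has one weight per input neuron, so every input feeds into every output. This is a bare sketch in plain Python with made-up numbers, not how a framework implements it (`dense_forward` is a hypothetical name).

```python
def dense_forward(inputs, weights, biases):
    """One fully connected layer: each output sums over every input."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


inputs = [1.0, 2.0, 3.0]        # 3 input neurons
weights = [[0.1, 0.2, 0.3],     # one row of weights per output neuron
           [0.0, -1.0, 1.0]]
biases = [0.5, 0.0]

outputs = dense_forward(inputs, weights, biases)
print(outputs)  # 2 output neurons, each computed from all 3 inputs
```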

serene scaffold
#

gm, you might be interested to know, I recently got a data science-related position with a large US company. I have absolutely no idea how. I must have deceived them.

thorn bobcat
thorn bobcat
#

can someone tell me which would be easier to do?

velvet thorn
#

I'm actually planning on applying to a US university for a master's degree

#

doing research now 😔

serene scaffold
velvet thorn
#

I don't even have a bachelor's in CS

#

or anything related 😔

thorn bobcat
thorn bobcat
velvet thorn
#

well by definition

#

MNIST is done with the MNIST dataset...

#

do you mean like

#

you want to perform a similar task (multiclass image classification) and are asking which dataset might be better to work with?

sour spindle
#

my tensorflow results are all in lists like this [result] how do i make them floats so i can see the accuracy?

velvet thorn
#

like the predictions?

sour spindle
#

yeah

velvet thorn
#

so you have like

sour spindle
#

its just one float in a list

velvet thorn
#

show code

sour spindle
#

all of it or just the result

#

its around 120 lines

velvet thorn
#

just the result

sour spindle
#

predictions:[[47.402496]
[47.278564]
[47.387936]
[47.897003]
[48.52338 ]
[48.993202]
[49.162148]
[49.390816]
[49.802197]
[49.949066]
[50.186504]
[50.12692 ]
[50.034527]
[49.935844]
[49.875698]]

actual:[47.11750031 47.18000031 47.48749924 47.81000137 48.50500107 48.83750153
48.92250061 49.25 50.02500153 49.875 50.15499878 49.73749924
49.71749878 49.80749893 49.8125 ]

#

the predictions is the problem

velvet thorn
#

ah

#

so it's just a question of shape

#

you want .reshape

#

are those numpy arrays?

thorn bobcat
sour spindle
#

yeah

velvet thorn
#

you can also use .ravel() or .flat

#

or maybe it's .flat() it's been a long time

#

since I worked with numpy

#

something like that

sour spindle
#

will that also allow the accuracy number to show?

velvet thorn
#

!e

import numpy as np

a = np.array([[1, 2, 3]])
b = np.array([[4], [5], [6]])

print(a - b.flat)
print(a - b.ravel())
arctic wedgeBOT
#

@velvet thorn :white_check_mark: Your eval job has completed with return code 0.

001 | [[-3 -3 -3]]
002 | [[-3 -3 -3]]
velvet thorn
#

there we go

#

you get the idea?
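
To spell out why the shape matters here: subtracting an (n,) array from an (n, 1) array does not fail, it broadcasts to an (n, n) matrix, which quietly wrecks any metric computed from it. A small sketch with made-up numbers (note that `.flat` is an attribute, while `.ravel()`, `.flatten()` and `.reshape(-1)` are methods):

```python
import numpy as np

preds = np.array([[47.4], [47.3], [47.4]])  # shape (3, 1), like the predictions above
actual = np.array([47.1, 47.2, 47.5])       # shape (3,)

print((preds - actual).shape)              # broadcasts to a 3x3 matrix
print((preds.ravel() - actual).shape)      # element-wise difference, shape (3,)
print((preds.reshape(-1) - actual).shape)  # same result via reshape
```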

sour spindle
#

yeah

#

will that also allow the accuracy number to show?

velvet thorn
#

that looks like regression though

#

so why are you talking about accuracy?

sour spindle
#

i want to see the accuracy but since its in that nested list format the accuracy number is 000000e00

sour spindle
velvet thorn
#

accuracy is a thing of classification

#

you're doing regression, right?

sour spindle
velvet thorn
#

?

sour spindle
velvet thorn
#

so like

#

classification involves discrete outcomes

#

e.g.

#

is this person positive or negative for COVID

sour spindle
#

ok

velvet thorn
#

regression involves continuous outcomes

#

in this case, stock prices

#

because like

#

it's not just 1 or 0, right

#

it can vary continuously from 0 all the way up to infinity (theoretically)

#

so what you're doing is regression

sour spindle
#

oh ok

velvet thorn
#

accuracy is the % of correct predictions

#

but that doesn't make sense for regression, right?

#

say you predict 45

#

and the actual price is 40

sour spindle
#

yeah u are right

velvet thorn
#

you're wrong, but how wrong matters

#

that's a lot better than predicting 500, right?

#

so we don't use accuracy for regression

#

there are other metrics

#

the most common is RMSE

#

root mean square error
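
A minimal sketch of RMSE with NumPy, assuming the predictions come back nested like the output posted above. The sample numbers are the first three values from that output, rounded, and `rmse` is a hypothetical helper name.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error; both inputs are flattened to 1-D first."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


actual = [47.1175, 47.18, 47.4875]
predictions = [[47.402496], [47.278564], [47.387936]]  # nested, shape (3, 1)
print(rmse(actual, predictions))
```

The result is in the same units as the target (here, price), so it reads as a typical size of the error.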

sour spindle
#

like loss

velvet thorn
#

you can Google that

velvet thorn
#

loss is a general term

#

that tells the model "how wrong" its prediction is

#

it can apply to classification too

#

so there are different loss functions

#

depending on your task

sour spindle
#

ok then i will find some in the docs

#

i will use mape
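
For reference, MAPE (mean absolute percentage error) is the average of |actual - predicted| / |actual|, expressed as a percentage. A short sketch, assuming none of the actual values is zero (the metric is undefined there); `mape` is a hypothetical helper name.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)


# Errors of 5 on actual values of 40 and 50 are 12.5% and 10%.
print(mape([40.0, 50.0], [[45.0], [45.0]]))
```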

#

now it seems to work like a charm. thanks

velvet thorn
#

yw! 👋

thorn bobcat
#

anyone want to help me with the task of classifying hieroglyphics?

sick furnace
#

I'm trying to create a function but I'm having trouble making it work correctly

df = df[(df['TVStandWallMount'] == 0) | (df['TVStandWallMount'] == 1)]

def clean_int_col(df, col):
    df = df[(df[col] == 0) | (df[col] == 1)]
    return df
sick furnace
#

I have some integer columns that I am trying to clean with df = df[(df['TVStandWallMount'] == 0) | (df['TVStandWallMount'] == 1)]
to retain the binary values

I have a bunch of them and I want to do

for col in df.columns:
    clean_int_col(df, col)
velvet thorn
#

uh.

#

so you want

#

wait

#

I'm confused

#

so

#

basically

#

you want

#

to take out

#

the non-numeric values

#

yes?

sick furnace
#

I want to take out everything that is not 1 or 0. Other possible values would be 11, 10, or some other numeric

#

take out anything that's not 1 or 0

velvet thorn
#

okay

#

so

#

for any row

#

in which

#

any column

#

has a non 1 or 0 value

#

remove that?

sick furnace
#

yes

#

df = df[(df['TVStandWallMount'] == 0) | (df['TVStandWallMount'] == 1)]

#

this works

#

but I'd like to iterate over all my columns and apply that

velvet thorn
#

uh.

#

pd.to_numeric(df, errors='coerce').dropna()?

sick furnace
#

but they're not na

velvet thorn
#

they will be

sick furnace
#

the 'wrong' values are something like 11 or 10

velvet thorn
#

oh

#

sorry I'm not focusing hard enough

#

wait then

#

why is there to_numeric above

#

so there are also cases

#

where they're not strings?

#

uh let me think about this for a moment

sick furnace
#

I took out the to_numeric part

#

I sent something wrong at first

#

I edited the message

velvet thorn
#

okay got it

#

do this

#

df[df.isin({0, 1}).all(axis=1)]

sick furnace
#

so that checks the values no? doesnt drop the rows right?

thorn bobcat
#

i have a set of images that look like.

#

to train them using an mnist classifier do i need the position of the object in the image or just the label?

velvet thorn
#

is a df without those rows
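
A small sketch of that `isin` filter on made-up data, to show that the expression builds a new DataFrame and has to be assigned to something (`HasRemote` is a hypothetical second column):

```python
import pandas as pd

df = pd.DataFrame({
    "TVStandWallMount": [0, 1, 11, 1],
    "HasRemote": [1, 0, 1, 10],  # hypothetical second binary column
})

mask = df.isin([0, 1]).all(axis=1)  # True where every column is 0 or 1
cleaned = df[mask]                  # new DataFrame; df itself is unchanged
print(cleaned)
```

If only some columns are meant to be binary, the same check can be applied to a subset, e.g. `df[binary_cols].isin([0, 1]).all(axis=1)`, where `binary_cols` is whatever list of column names applies.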

thorn bobcat
#

can someone help me prepare my dataset?

modern vapor
#

Does anyone have anything on handwriting with tf, like generating a page of writing? I saw something like it on reddit but cant find anything short of the digits thing on the tf website.

fierce parrot
#
def calculate_correlation(self,feature_one,feature_two):
        feature_one_data = []
        feature_two_data = []
        for data in self.data_list:
            feature_one_data.append(data[feature_one])
            feature_two_data.append(data[feature_two])

        feature_one_mean = statistics.mean(feature_one_data)

        feature_two_mean = statistics.mean(feature_two_data)

        feature_one_sample_std = statistics.stdev(feature_one_data)

        feature_two_sample_std = statistics.stdev(feature_two_data)
        mean_diff_sum = 0

        for k in range(len(feature_one_data)):
            mean_diff_sum += (feature_one_data[k] - feature_one_mean) * (feature_two_data[k]-feature_two_mean)
        print(mean_diff_sum)
        corrcoef = mean_diff_sum/(feature_one_sample_std * feature_two_sample_std)
        return corrcoef

So, I am trying to calculate the correlation coefficient using this class method. self.data_list is a list of dictionaries and contains data such as age, bmi, insurance charge, smoker (boolean), sex, etc. I want to calculate the correlation coefficient of two features. Normally, I should get a value between -1 and 1. However, when I run this function to test it, I get absurd results like 389,500. There must be something wrong with my calculation but I couldn't figure it out. Any ideas what I'm doing wrong?
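
One likely culprit, judging only from the code as posted: the sum of the products of deviations is never divided by n - 1. Pearson's r is the sample covariance over the product of the standard deviations, and since `statistics.stdev` already uses n - 1, the sum needs the same divisor. A minimal standalone sketch with made-up data (`pearson_r` is a hypothetical name, not the original method):

```python
import statistics


def pearson_r(xs, ys):
    """Pearson correlation using sample (n - 1) statistics throughout."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    std_x, std_y = statistics.stdev(xs), statistics.stdev(ys)  # sample std
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # cross / (n - 1) is the sample covariance
    return cross / ((n - 1) * std_x * std_y)


ages = [19, 25, 33, 41, 52]  # made-up example data
charges = [1600.0, 2700.0, 4100.0, 6200.0, 9800.0]
print(pearson_r(ages, charges))
```

With the missing divisor the result scales with the number of rows, which would explain values far outside [-1, 1].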

royal crest
#

what would be the best way to go about removing rows where the numerical values are all 0?

#

i've tried a for loop to iterate over every row but i don't think that's very wise

#

i think another approach might be to "keep" the rows if there's a 1 present in any of them but i'm not sure if i know of a function that does this

royal crest
#

as in no changes have been made to the dataframe

velvet thorn
#

that creates

#

a new DataFrame

#

you need to assign it to a variable, of course

royal crest
#

I have

#

i've assigned it to df_gm and it's the exact same as the original dataframe it seems

#

same shape

#

one of my columns is text content, would that interfere with your method?

flat hollow
flat hollow
velvet thorn
#

you can do this

#

df[(df.select_dtypes('number') != 0).any(axis=1)]

royal crest
#

brilliant, cheers

#

so it selects any numerical column and matches with != 0
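
A quick sketch of that on made-up data with a text column, to show the text is ignored by the check but kept in the result:

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["a", "b", "c"],  # hypothetical text column
    "x": [0, 1, 0],
    "y": [0.0, 0.0, 2.5],
})

numeric = df.select_dtypes("number")   # only the x and y columns
kept = df[(numeric != 0).any(axis=1)]  # rows with at least one non-zero number
print(kept)
```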

wheat sun
#

How do I make a grid where the x number line and y number line are thicker than the other grid lines? Like this:

earnest herald
#

I'm trying to create a deep q learning environment, similar to snake which tracks the position of certain things. How do I deal with their position (delta y, delta x) being null or undefined?

Should I assign it a value it can never reach e.g. 100,100 or allocated an input which can be only 1 or 0 depending on whether these parameters should be ignored
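
The second option (an extra 0/1 input) could look roughly like the sketch below. This is just one way to encode it under assumed names and an assumed board size, not a tested recommendation for any particular environment. `encode_target` and `max_dist` are hypothetical.

```python
def encode_target(delta, max_dist=20.0):
    """Return [dx, dy, present] with offsets scaled to roughly [-1, 1].

    delta is an (dx, dy) tuple, or None when the object does not exist.
    """
    if delta is None:
        # Zero offsets plus present = 0 tells the network to ignore them.
        return [0.0, 0.0, 0.0]
    dx, dy = delta
    return [dx / max_dist, dy / max_dist, 1.0]


print(encode_target((10, -5)))
print(encode_target(None))
```

A sentinel such as (100, 100) puts a value far outside the normal input range into the network, whereas the flag keeps the inputs in a consistent range and makes "missing" explicit.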

flat hollow
serene scaffold
#

it's easier if you provide code and error messages as text.

delete the "staticmethod" decorator from recommend

acoustic halo
#

mkj is in demographic filter not recommend

near spindle
#

How can I assure that i is always a int, not float?

for i in range(0,2**25):
    step = 0
    print(i)
    while i != 1:
        if i == 0:
            break
        elif i % 2 == 0:
            i /= 2
            step += 1
            print(i, end=" ")
        elif i % 2 == 1:
            i = 3*i + 1
            step += 1
            print(i, end=" ")
    print(f'\nAmount of steps: {step}')
serene scaffold
bold timber
serene scaffold
#

do //= instead of /= so it's floor division

serene scaffold
near spindle
#

Spaces only for cleaner code or do they have a purpose?

acoustic halo
#

yes, easier to read

serene scaffold
# near spindle Spaces only for cleaner code or do they have a purpose?

they don't change how it's executed, but it's best to present others with readable code.

Division returns a float rather than an int, since division is the only one of the basic operations (unlike addition, subtraction, and multiplication) that doesn't always give an integer when applied to integers, mathematically speaking.
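
A short sketch of the difference, with the loop body pulled into a function (`collatz_steps` is a hypothetical name). Staying in integers also matters for correctness: floats stop representing every integer exactly above 2**53.

```python
def collatz_steps(n):
    """Count the steps needed for n to reach 1."""
    steps = 0
    while n > 1:
        if n % 2 == 0:
            n //= 2        # floor division keeps n an int
        else:
            n = 3 * n + 1
        steps += 1
    return steps


print(type(6 / 2), type(6 // 2))  # true division gives float, floor division int
print(collatz_steps(27))
```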

near spindle
#

Makes sense, didn't know that // is such a big change

bold timber
near spindle
#

I'm gonna try fixed code once my pc finishes this code and stops being on fire

#

Also, how can I check the time needed to execute whole code?

#

I'm curious if by increasing the power of two by 1, the time needed for execution increases exponentially

#

Okay, it works fine now, except still being on fire

acoustic halo
bold timber
chilly geyser
chilly geyser
#

You could use timeit things but I don't recommend it

#

I would prefer just perf_counter at specific points in your script (usually start, end), and run the script multiple times to get multiple readings than run timeit

near spindle
#

So if I want to use this function, I need to import time and type that function at start and end of code?
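
Roughly, yes. A minimal sketch of that pattern, where `work` is a stand-in for whatever the real script does:

```python
import time


def work(limit):
    """Stand-in for the code being timed."""
    return sum(i * i for i in range(limit))


start = time.perf_counter()            # reading before the work
result = work(100_000)
elapsed = time.perf_counter() - start  # seconds, as a float

print(f"Time: {elapsed:.6f} s")
```

Only the difference between two `perf_counter()` readings is meaningful; the individual values have no fixed reference point.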

acoustic halo
steel hawk
#

Hi guys, I want to ask you if someone worked on getting high season of each product/item in an e-shop. what is the best approach you use or do you have some articles that might help. In another way, I want to know each product's season by variance of sales when it starts and when it ends and the season length.

bold timber
acoustic halo
#

You are joking if you think im gonna rewrite your code for you

bold timber
acoustic halo
#

Your error says exactly what the issue is

#

you are trying to use between

bold timber
#

just a clue, not to rewrite my code

acoustic halo
#

Your dataframe doesnt know what between means

#

If you look a couple lines below you might see what you are missing

#

data_pns3.between, vs data_pns3.mkj.between

bold timber
#

Thank you!!

chilly geyser
arctic wedgeBOT
#

@chilly geyser :white_check_mark: Your eval job has completed with return code 0.

1.0050674946978688
near spindle
chilly geyser
#

!d time

arctic wedgeBOT
#

This module provides various time-related functions. For related functionality, see also the datetime and calendar modules.

Although this module is always available, not all functions are available on all platforms. Most of the functions defined in this module call platform C library functions with the same name. It may sometimes be helpful to consult the platform documentation, because the semantics of these functions varies among platforms.

An explanation of some terminology and conventions is in order.

• The epoch is the point where the time starts, and is platform dependent. For Unix, the epoch is January 1, 1970, 00:00:00 (UTC). To find out what the epoch is on a given platform, look at time.gmtime(0).

near spindle
#

Thanks, gonna check

#

If I get Time: 2.1679996279999614e-05, it means code was executed within milliseconds?

chilly geyser
#

Yes-kinda?

#

If you're doing benchmark on small things you might want to do timeit.timeit

#

There's also timeit.repeat
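
On the units: the value is in seconds, so 2.1679996279999614e-05 is about 21.7 microseconds (roughly 0.02 ms). For snippets that fast a single reading is noisy, which is where `timeit.repeat` helps. A small sketch with a throwaway statement:

```python
import timeit

# Convert the reading quoted above from seconds to microseconds.
print(2.1679996279999614e-05 * 1e6)

stmt = "sum(range(1000))"  # throwaway statement to benchmark
runs = timeit.repeat(stmt, number=1000, repeat=5)  # 5 totals, each over 1000 calls
best_per_call = min(runs) / 1000

print(f"{best_per_call * 1e6:.2f} microseconds per call")
```

Taking the minimum of the repeats is the usual convention, since the slower runs mostly reflect other activity on the machine.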

hollow falcon
#

plt.show shows no result

#

what did i do wrong

chilly geyser
#

Up to the line plt.subplots(2) there are no plots

hollow falcon
#

oh my god im so dumb

willow spindle
#

Greetings. So I have two SERIES:

  1. tst - has fake data,
  2. usr - has true data.
    I am trying to check .isin() on the third series which I made a list:
    third_s = [i for i in df['Some_Col']]

The thing is that both tst and usr return the same results when checking isin(). I tried:

# all syntax are correct and works on my PC
tst = # ... has fake data series
usr = # ... has data which is also in third_s
third_s = [i for i in df['Some_Col']]

# 1st approach - .empty:
if post.isin(third_s).empty:
  print('Yes it is')
else:
  print('No it is not empty') # why tst returns this if it is not in third_s?

# 2nd approach - .bool:
if post.isin(third_s).bool:
  print('Y') # again, why tst returns that as well TST HAS FAKE DATA
else:
  print('N')

Question: I need to skip in for-loop all tst values that are NOT IN third_s. Any ideas how?

desert oar
#

what are you trying to actually do?

#

[i for i in df['Some_Col']] is the same thing as df['Some_Col']

#

i assume you are looking for .any()

#
if post.isin(df['Some_Col']).any():
    print('There is at least one value in "post" that is also in "Some_Col".')
else:
    print('There is no value from "post" that is also in "Some_Col".')
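
To make the difference concrete: `isin` returns a boolean Series the same length as the input, so `.empty` is False whenever the input has any elements at all, and `.bool` without parentheses is just a method object, which is always truthy. A sketch with made-up series standing in for `tst` and `usr`:

```python
import pandas as pd

df = pd.DataFrame({"Some_Col": ["a", "b", "c"]})
tst = pd.Series(["x", "y"])  # fake data, nothing in Some_Col
usr = pd.Series(["a", "z"])  # one value is in Some_Col

for name, post in [("tst", tst), ("usr", usr)]:
    hits = post.isin(df["Some_Col"])     # boolean Series, same length as post
    print(name, hits.empty, hits.any())  # .empty does not look at True/False
    print(post[hits].tolist())           # only the values found in Some_Col
```

Indexing with the mask (`post[hits]`) is one way to skip the values that are not in the column before looping.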
fiery mortar
#

When I set like this it work well.

#

But I set:

#

It doesn't work.

#

Can I solve a classification problem with tensorflow time series dataset?

lapis sequoia
#

Hi folks. I'm looking at some covid related dataset and trying to filter to a specific row and drop Unnamed columns. Could anyone help fix my code?

#
import pandas as pd

df = pd.read_excel('/tmp/Covid-Publication-06-04-2021.xlsx',
                   engine='openpyxl',
                   skiprows=11,
                   sheet_name=['Total Beds Occupied','Total Beds Occupied Covid'],
                   header=1,
                   usecols="B:NX")

eng_hosps_total_use = \
    df['Total Beds Occupied'].loc[df['Total Beds Occupied']['Name'].str.match("ENGLAND", case=True).fillna(False, axis=0)].fillna("").set_index("Name")

eng_hosps_total_use.drop(columns=eng_hosps_total_use.columns.str.match("Unnamed", na=False))

Source: https://www.england.nhs.uk/statistics/statistical-work-areas/covid-19-hospital-activity/

Gives:

raise KeyError(f"{labels[mask]} not found in axis")
KeyError: '[False ... ] not found in axis'
desert oar
#

@lapis sequoia what's wrong with it?

#

drop doesn't work "in place", you need to use inplace=True or do eng_hosps_total_use = eng_hosps_total_use.drop(...)

lapis sequoia
#

@desert oar Same error with inplace=True

desert oar
#

oh i missed the error

#

oh, hah

#

match returns True and False

#

not the matching values

lapis sequoia
#

oh

#

ha

#

Yes. I wondered about passing a list of column names as list. Is it possible to do something like:

#
eng_hosps_total_use.drop(columns=eng_hosps_total_use.columns.str.match("Unnamed", na=False).values)
desert oar
#
unnamed_columns = eng_hosps_total_use.columns[
    eng_hosps_total_use.columns.str.match("Unnamed", na=False)
]

eng_hosps_total_use.drop(columns=unnamed_columns, inplace=True)
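
The same idea on a tiny made-up frame, to show the difference between the boolean mask and the labels it selects:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=["Name", "Unnamed: 1", "2021-04-06"])

mask = df.columns.str.match("Unnamed", na=False)  # True/False per column
unnamed = df.columns[mask]                        # the matching labels
trimmed = df.drop(columns=unnamed)                # drop() wants labels, not booleans

print(mask)
print(trimmed.columns.tolist())
```

An equivalent without `drop` is to keep the complement directly: `df.loc[:, ~mask]`.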
lapis sequoia
#

got it

#

thank you

desert oar
#
df['Total Beds Occupied']['Name']

what's this? are you getting a row with the label Name?

#

or do you have multi-index columns?

lapis sequoia
#

no idea

ebon walrus
#

can someone explain why this isnt working?

lapis sequoia
#

@desert oar I'm selecting the column "Name" to match for the row of ENGLAND data. Single row.

rich plover
#

Hi all, I'm trying to import the module gensim and its doesnt work

#

This was the solution as suggested by this thread

#

However, for whatever reason it doesnt work for me

#

I've got gensim installed

desert oar
ebon walrus
#

all the values are filled in

desert oar
arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

ebon walrus
#

alright

desert oar
#

@lapis sequoia sorry didn't mean to @ you

ebon walrus
#

lemme format

lapis sequoia
#

@desert oar np. Do you have any idea btw why Pycharm ipython console keeps giving me

console_thrift.UnsupportedArrayTypeException: UnsupportedArrayTypeException(type='ExceptionOnEvaluate')
ebon walrus
#
p = model.predict(new_house)

#add a new column 'price' in new_house file to show the model predicted price
new_house['Price'] = p

#export new house price file to local system
new_house.to_csv("new_house_price.csv")

import sys 
import os

plt.xlabel('area of house (sq.ft)')
plt.ylabel('Price of House(Dollars.)')
plt.title("relationship plot between area and price")
plt.scatter(price.area,price.price, color = 'red', marker = '+')
plt.plot(df.area, model.predict([['area']]), color = 'blue')
plt.show()

AttributeError                            Traceback (most recent call last)
<ipython-input-38-cea7655e712e> in <module>
      5 plt.ylabel('Price of House(Dollars.)')
      6 plt.title("relationship plot between area and price")
----> 7 plt.scatter(price.area,price.price, color = 'red', marker = '+')
      8 plt.plot(df.area, model.predict([['area']]), color = 'blue')
      9 plt.show()

#

@desert oar

silk axle
#

@marsh beacon We don't allow that type of advertisement here.

lapis sequoia
desert oar
#

@lapis sequoia i see, you have a multi-index column due to using 2 sheets

lapis sequoia
#

Yes. The source data excel file has many worksheets inside the workbook. I only need two of them.

desert oar
#

ah sorry it's not a multi-index, it's a dict

lapis sequoia
#

whichever

desert oar
#

they're not the same!

lapis sequoia
#

Am I likely to need to know for this small task?

#

Happy with a dict

desert oar
#

it's important to know that [ on dataframes and series is a complicated operation with a lot of possible behaviors, while on a dict it isn't

#

so yes it's somewhat important to know what data type you are working with

lapis sequoia
#

@desert oar gotchya. How do I modify your code to drop columns with NaN values?

eng_hosps_total_use.drop(columns=eng_hosps_total_use.columns[eng_hosps_total_use.columns.values.isna()], inplace=True)

= AttributeError no isna

desert oar
#

i'd do it like this:

import pandas as pd

data = pd.read_excel(
    'Covid-Publication-06-04-2021.xlsx',
    engine='openpyxl',
    skiprows=11,
    sheet_name=['Total Beds Occupied', 'Total Beds Occupied Covid'],
    header=1,
    usecols="B:NX",
)

eng_hosps_total_use = data['Total Beds Occupied'].set_index("Name")
eng_hosps_total_use = eng_hosps_total_use.loc["ENGLAND"]
eng_hosps_total_use.drop(
    eng_hosps_total_use.index[
        eng_hosps_total_use.index.str.match("Unnamed", na=False)
    ],
    inplace=True,
)
#

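that's where the AttributeError comes from: .values gives you a plain numpy array, which has no isna. call .isna() on the index itself (and prefer .to_numpy() over .values when you really need an array)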

#

note that this ultimately returns a Series, not a DataFrame

#

you only have 1 row of data, no reason to keep it as a dataframe

#

for that matter, why bother "parsing" like this at all? just read the one row you need

desert oar
ebon walrus
#

the error is float area something

desert oar
#

i am saying that you might have accidentally assigned something to price, overwriting the dataframe

lapis sequoia
#

@desert oar Have you run this code? The dates are being picked up as Float64 dtypes and inserting a 24hr time. Do you have any idea how to correct this?

desert oar
#

i ran some code but only for checking the column names

#

there are other options you can use to control how dates and other data types are handled

#

check the docs for pandas read_excel

lapis sequoia
#

Dates along the X axis

#

So column values would be values of X

#

in the plot

#

So I need timeseries data. The Columns are dates, but are Float64s not datetime

#

That's first problem.

desert oar
#

@lapis sequoia this code gives me a eng_hosps_total_use as a Series with a DateTime index and int data:

import pandas as pd

data = pd.read_excel(
    'Covid-Publication-06-04-2021.xlsx',
    engine='openpyxl',
    skiprows=11,
    sheet_name=['Total Beds Occupied', 'Total Beds Occupied Covid'],
    header=1,
    usecols="B:NX",
)

eng_hosps_total_use = data['Total Beds Occupied'].set_index("Name")
eng_hosps_total_use = eng_hosps_total_use.loc["ENGLAND"]
unnamed_cols = eng_hosps_total_use.index[
    eng_hosps_total_use.index.str.match("Unnamed", na=False)
].tolist()
extra_cols = ['NHS England Region', 'Code']
eng_hosps_total_use.drop(unnamed_cols + extra_cols, inplace=True)
eng_hosps_total_use.index = pd.to_datetime(eng_hosps_total_use.index)
eng_hosps_total_use = eng_hosps_total_use.astype(int)
#

i'd encourage you to spend time figuring out how it works

lapis sequoia
#

Amazing. Data frame, series... Ballache. What's the difference

#

@desert oar can you give me the shape after your drop

desert oar
#

(370,)

lapis sequoia
#

So this is 370 rows

desert oar
lapis sequoia
#

So I need two series. One for datetime and one for corresponding integer (beds used). Right?

#

Pass to matplotlib's X and Y each series

#

Seems like a lot of work when the dataframe is already a collection of series

desert oar
#

pandas has some utilities for plotting

#

it's also not that much work

flat flare
#

hey guys, i tried importing numpy and matplotlib in idle but cmd gave me error saying pip is not recognised

lapis sequoia
glad mulch
#

do you guys know how to make my graphs stack vertically and horizontally like 4 rows 3 columns . this is what i keep getting

desert oar
#

it looked right when i ran it

#

but i also posted it more an example of another way to do it, not a definitively correct implementation of whatever you are trying to do

lapis sequoia
#

Np

#

The plot should not be linear is all I was alluding to

#

I don't know if this is the right channel but does anyone know how to set a hard RAM limit in PyTorch?

#

My program keeps using all the RAM and it crashes the computer

desert oar
#

(this is the right channel)

lapis sequoia
#

Well how do I do it? I'm running it on Google Colab and I have 12GB of RAM, and I want to set a limit to that because it keeps using all of it and crashes the runtime

desert oar
#

i don't know, i was just trying to answer the first question 🙂 maybe it's not possible?

#

you might need to change how data is loaded into your model

lapis sequoia
#

@desert oar why does your

eng_hosps_total_use = data['Total Beds Occupied'].set_index("Name")
eng_hosps_total_use = eng_hosps_total_use.loc["ENGLAND"]

return a series, but my

eng_hosps_total_use = \
    df['Total Beds Occupied'].loc[df['Total Beds Occupied']['Name'].str.match("ENGLAND", case=True).fillna(False, axis=0)].fillna(np.NaN).set_index("Name")

returns a dataframe?
I'm doing exactly the same thing with loc. You're just setting the index before looking for the row relating to England, whereas I filter to all of the data for England and finish by setting the index to Name.

Shape difference is (375,) vs (1,375)

desert oar
#

because you're passing something array-like/list-like to .loc

#

a single label is enough here, eng_hosps_total_use.loc["ENGLAND"], because i know i only want 1 row (.at wouldn't work for this, it needs both a row and a column label)
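
A tiny made-up frame shows the two shapes side by side. The column names `d1` and `d2` are placeholders for the date columns in the real sheet.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["ENGLAND", "London"], "d1": [10, 4], "d2": [12, 5]}
)  # hypothetical miniature of the sheet

# A scalar label picks one row and returns it as a Series.
by_label = df.set_index("Name").loc["ENGLAND"]

# A boolean mask (list-like) always returns a DataFrame, even for one match.
by_mask = df.loc[df["Name"].str.match("ENGLAND")].set_index("Name")

squeezed = by_mask.squeeze()  # collapses the single row into a Series

print(type(by_label).__name__, by_label.shape)
print(type(by_mask).__name__, by_mask.shape)
print(type(squeezed).__name__, squeezed.shape)
```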

desert oar
#

@lapis sequoia

import matplotlib.pyplot as plt
import pandas as pd


with pd.ExcelFile('Covid-Publication-06-04-2021.xlsx') as xlsx:
    eng_beds_total = xlsx.parse(
        'Total Beds Occupied',
        skiprows=11,
        nrows=1,
        header=1,
        usecols="E:NJ",
    ).squeeze()
    eng_beds_total.index = pd.to_datetime(eng_beds_total.index)

    eng_beds_covid = xlsx.parse(
        'Total Beds Occupied Covid',
        skiprows=11,
        nrows=1,
        header=1,
        usecols="E:NW",
        squeeze=True,
    ).squeeze()
    eng_beds_covid.index = pd.to_datetime(eng_beds_covid.index)

beds = eng_beds_total.to_frame(name='total').join(
    eng_beds_covid.to_frame(name='covid')
)

beds.plot()
plt.show()
lapis sequoia
#

Fabulous.

lapis sequoia
desert oar
#

what do you mean by "line"?

lapis sequoia
#

I quoted you.

desert oar
#

if you have a dataframe with exactly one row, you have two options to turn that row into a series:

  1. .squeeze as in my code
  2. use .loc[row label] or .iloc[0]
#

ah

#

my answer remains

#

.match returns a boolean series

#

subsetting a Series by a Series returns another Series (or an Index, in this case)

#

my code avoids all that stuff entirely by grabbing data out of the xlsx more selectively

#

no need to "parse" the row labels etc. when you know exactly what row you want

lapis sequoia
#

When you work with Pandas and DataFrames, is the typical workflow to reduce what you want to Series data for operations, e.g. plotting?

thorn bobcat
#

yo yo

desert oar
lapis sequoia
#

This function doesn't allow you to specify an engine though. XLRD not good with new xlsx file formats so no idea why yours didn't error.

#

Don't understand the squeeze.

#

gah

desert oar
#

ExcelFile i think supports engine=

#

it also should just auto-detect

lapis sequoia
desert oar
#

!e ```python
import pandas as pd
df = pd.DataFrame({'a': [1,2,3]})
print(df)
print()
print(df.squeeze())
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |    a
002 | 0  1
003 | 1  2
004 | 2  3
005 | 
006 | 0    1
007 | 1    2
008 | 2    3
009 | Name: a, dtype: int64
desert oar
desert oar
#

i don't know why the constructor isn't documented

arctic wedgeBOT
#

pandas/io/excel/_base.py lines 1166 to 1168

```py
def __init__(
    self, path_or_buffer, engine=None, storage_options: StorageOptions = None
):
```
lapis sequoia
#

Ah. God sakes I'm shocking.

#
with pd.ExcelFile(... ,engine='openpyxl') as xlsx:

works.

#

@desert oar Is the squeeze on eng_beds_covid responsible for cutting off all of March data?

desert oar
#

probably not, more likely i either didn't understand your requirements or made a mistake

#

oh it's a mistake, i probably didn't do the join right

lapis sequoia
#

I think its a union and

desert oar
#
beds = eng_beds_total.to_frame(name='total').join(
    eng_beds_covid.to_frame(name='covid'),
    how='outer',
)
#

definitely not a union

#

better?

lapis sequoia
#

Yes. It includes only the data where both have same timestamps

#

Sounds like a union to me. Outer join includes everything.
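
To pin down the terms with made-up dates: `DataFrame.join` defaults to `how='left'`, which keeps only the left frame's index. `how='inner'` keeps the dates present in both, and `how='outer'` keeps the union of both indexes, filling the gaps with NaN.

```python
import pandas as pd

total = pd.Series(
    [100, 110, 120],
    index=pd.to_datetime(["2021-03-30", "2021-03-31", "2021-04-01"]),
    name="total",
)
covid = pd.Series(
    [30, 35, 40],
    index=pd.to_datetime(["2021-03-31", "2021-04-01", "2021-04-02"]),
    name="covid",
)

left = total.to_frame().join(covid.to_frame())  # default how='left'
inner = total.to_frame().join(covid.to_frame(), how="inner")
outer = total.to_frame().join(covid.to_frame(), how="outer")

print(len(left), len(inner), len(outer))
print(outer)
```

So if one sheet covers more dates than the other, the default join silently drops whatever exists only on the right-hand side.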

silk axle
#

If I've coded a one-player pong game (with Arcade), how would I go about adding AI to it? I basically want to implement a NEAT algorithm into it, with the fitness function just being the number of hits it can get until it dies. Ideally I'd be able to run multiple samples per generation at the same time (rather than doing one-by-one). If it helps, this is my code: https://paste.pythondiscord.com/oqitemoyuk.py

lapis sequoia
#

@desert oar Just speculating here, but do you notice anything unusual about those two plots?

desert oar
desert oar
#

covid beds smooth, non-covid beds not smooth (scheduled c-sections and elective surgeries?)

lapis sequoia
#

Needs further context. You're right.

desert oar
#

really i'd want to see this going back years

lapis sequoia
#

yep.

desert oar
#

might also be interesting to subtract covid beds from total beds

lapis sequoia
#

yep

desert oar
lapis sequoia
#

beat me to it. Don't provide the code. I'm copypasting

desert oar
#

it was a 1 line addition 🙂

lapis sequoia
#

@desert oar why do you have the squeeze twice for eng_beds_covid

modest mulch
#

Hi, how do I know if a .pth file contains the architecture of the model, not only its "weights"?

#

A pytorch model.

lapis sequoia
#

What you need to do is subtract mv_covid from total to get how many of the total beds are non-covid.
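
A rough sketch of that subtraction, assuming a frame shaped like the joined `beds` from earlier with `total` and `covid` columns. The numbers are invented, and treating a missing covid figure as zero is an assumption that may not hold for the real data.

```python
import pandas as pd

# Hypothetical frame shaped like the joined `beds` above.
beds = pd.DataFrame(
    {"total": [100.0, 110.0, 120.0], "covid": [float("nan"), 40.0, 50.0]},
    index=pd.to_datetime(["2020-03-31", "2020-04-01", "2020-04-02"]),
)

# Plain subtraction: NaN propagates, so days without a covid figure stay NaN.
beds["non_covid"] = beds["total"] - beds["covid"]

# Assumption: a missing covid figure means "zero covid beds".
# Only use this if that is really what the gap means in the data.
beds["non_covid_filled"] = beds["total"].sub(beds["covid"], fill_value=0)

print(beds)
```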

thorn bobcat
#

has anyone here written a paper before?

#

the arXiv kind?

velvet thorn
#

duplicates are kept

#

leaky abstraction

desert oar
#

same reason i didn't use parse_dates=True, it doesn't parse dates in column names

lapis sequoia
#

Hey, I'm a beginner in Python and I want to learn about AI. Where should I start? And how much should I know?

serene scaffold
lapis sequoia
serene scaffold
#

it's okay if it's just algebra or something. I'm just asking

lapis sequoia
#

Yes

#

Algebra

serene scaffold
lapis sequoia
#

I thought programming doesnt require math

serene scaffold
#

whether or not you find stats and linalg easy will depend, but what ultimately matters is that you maintain a positive attitude about learning. because the learning never stops.

lapis sequoia
#

That is it

serene scaffold
#

what do you mean, that is it?

lapis sequoia
#

Just statistics? How about frameworks or anything?

serene scaffold
#

there are a lot of libraries. numpy, pandas, matplotlib, sklearn, pytorch, tensorflow. but you learn the parts of them that you need as you go.

#

Like I said, it's important to maintain a positive attitude about learning.

lapis sequoia
#

Alright thanks

lapis sequoia
#

Can you freelance as data scientist?

ashen sable
#

does training a model in colab take a long time?

wide raven
#

Do you guys think the only good way to learn neural networks is by learning every aspect of it
and understanding all the math and how they are built?
or can you get a good understanding and make a lot of cool AI just by learning tensorflow and mastering that
I thought learning from scratch would be nice and help me understand but after hours and hours of learning gradients equations types activation functions
it just got too much to handle and I would like to make AI with a less info-needed approach which is why i thought tensorflow would be nice
but i am scared that would limit me and what i can make

vague ravine
#

what exactly are you scared of?

wide raven
#

that not learning the core of neural networks and just learning tensorflow would limit what i can create

vague ravine
#

how did you come to that conclusion?

wide raven
#

idk AI just seems like one of those things you need to know completely

vague ravine
#

which one

stark zenith
#

you don't need to know everything about how your car works to drive it

unborn glacier
#

I think you need to know the very basics, matrix multiplication, back-propagation, gradient descent, as well as a theoretical understanding of the rest (like the effects of changing hyper parameters or the idea behind transformers for nlp) to get 95% out of machine learning. You can know basically nothing and still get a lot out of it. The biggest thing is going to be experience. Knowing what to use when comes from having done it before, not some complex mathematical understanding.

wide raven
#

but backprop, gradient descent, and more are still a lot to handle

#

like these equations and stuff

#

just seem like a lot to fully understand, and I was just wondering if I would just have to know how back prop works and not get into the math for it

ripe forge
#

You could get an intuitive understanding and then just move on if you wanted, when starting

#

In my opinion you can defer the math when learning ml for later, I think people emphasize it too much

undone flare
#

If my independent variables are highly correlated should I use Ridge Regression?

cedar sky
#

Hey guys any school student interested in AI here?
Anyone interested in collaborating for this competition can DM me.
https://aischoolofindia.com/waicy-competition/

AISI

WHAT IS WAICY INDIA? WAICY India is an online competition for Indian schools & students which engages them to learn and use artificial intelligence (AI) technology to solve real-world problems. AI researchers around the world are harnessing the power of AI for a sustainable future. From solving the toughest environmental challenges to becoming t...

grand mantle
#

Does anybody know which path planning algorithm is used in the Tesla map?

I read online that Dijkstra is used in Google Maps

desert oar
static acorn
#

ML

#

learners

grand mantle
#

only ML

bold timber
#

How do I rename the long column names in a dataset like this? I want to change the column names to only s-1, s-2, s-3, without the 'JENJANGPENDIDIKAN_' prefix.

Can anyone help me? I've been trying to rename the columns for 2 days and still can't get a proper result.

desert oar
arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |    s_1  s_2
002 | 0   11   21
003 | 1   12   22
004 | 2   13   23
desert oar
#

or equivalently

import re
import pandas as pd

data = pd.DataFrame({
    'JENJANGPENDIDIKAN_s_1': [11,12,13],
    'JENJANGPENDIDIKAN_s_2': [21,22,23],
})

data = data.rename(
    columns=lambda colname: re.sub(r'^JENJANGPENDIDIKAN_', '', colname)
)

print(data)
#

maybe even better:

import pandas as pd

data = pd.DataFrame({
    'JENJANGPENDIDIKAN_s_1': [11,12,13],
    'JENJANGPENDIDIKAN_s_2': [21,22,23],
})

data.columns = data.columns.str.replace(r'^JENJANGPENDIDIKAN_', '', regex=True)

print(data)
#

or if you're using python 3.9+

import pandas as pd

data = pd.DataFrame({
    'JENJANGPENDIDIKAN_s_1': [11,12,13],
    'JENJANGPENDIDIKAN_s_2': [21,22,23],
})

data = data.rename(
    columns=lambda colname: colname.removeprefix('JENJANGPENDIDIKAN_')
)

print(data)
desert oar
#

now you have four ways to do it

bold timber
#

Really big thanks to you

undone flare
#

If I have missing values in a numerical column would I be fine with replacing those values with the mean? I have been doing this for like every project I did and I think it can be better

bold timber
desert oar
desert oar
bold timber
desert oar
bold timber
#

I mean if i have another column like that

desert oar
#

i encourage you to read all 4 solutions and spend time understanding what they do and why they work

#

oh, you should probably do it manually for each prefix

#

you could do it with a single regex but i don't see much value in that

#

are you using python 3.9?

bold timber
desert oar
#

what is 'regex'?
the re.sub thing with r'^prefix'

yes because that's an encoding number
what do you mean by that?

desert oar
#

ok, then you are not using 3.9

bold timber
bold timber
desert oar
#
bad_patterns = [
    '^JENJANGPENDIDIKAN_',
    '^JABATANSTRUKTURAL_',
]

for pattern in bad_patterns:
    data.columns = data.columns.str.replace(pattern, '', regex=True)
#

the ^ in the pattern means "only match at the beginning of the text"
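
To see what the anchor changes, here is a small standard-library sketch. The second column name is hypothetical and is included only because the prefix text appears in the middle of it.

```python
import re

columns = [
    "JENJANGPENDIDIKAN_s_1",
    "x_JENJANGPENDIDIKAN_s_2",  # hypothetical: prefix text appears mid-name
]

# With ^ the pattern can only match at the start of each name.
anchored = [re.sub(r"^JENJANGPENDIDIKAN_", "", c) for c in columns]

# Without ^ the pattern matches wherever the text occurs.
unanchored = [re.sub(r"JENJANGPENDIDIKAN_", "", c) for c in columns]

print(anchored)    # ['s_1', 'x_JENJANGPENDIDIKAN_s_2']
print(unanchored)  # ['s_1', 'x_s_2']
```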

somber prism
#

can someone give me some tips on how to handle this imbalance multiclass classification prob like this

bold timber
desert oar
desert oar
# somber prism

generic list of things to try: weighting, oversampling (e.g. with SMOTE), use gradient boosting which can "focus" on misclassified instances
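
As one illustration of the "weighting" option, a minimal sketch that computes inverse-frequency class weights with the standard library. The label counts are invented. The formula `n_samples / (n_classes * count)` is the heuristic scikit-learn documents for `class_weight="balanced"`, and the helper name `balanced_class_weights` is made up for this example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return weight = n_samples / (n_classes * count) for each class."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced labels: 80 / 15 / 5 split across three classes.
labels = ["a"] * 80 + ["b"] * 15 + ["c"] * 5
weights = balanced_class_weights(labels)

# Rare classes receive larger weights than common ones.
print(weights)
```

A dictionary like this can typically be passed to estimators that accept per-class weights, so that mistakes on rare classes cost more during training.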

bold timber
desert oar
#

insert, remove, append, etc

bold timber
desert oar
#

what exactly are you trying to do

bold timber
desert oar
#

which columns do you want? what's the rule for selecting a column?

#

most of the time you can just use a list comprehension

bold timber
#

yes i want to selecting several columns


desert oar
#

i've given you a lot of code already... do you know what a list comprehension is?

#

you can even write a for loop and build a list with append

bold timber
#

list comprehension is just looping in 1 cell, right?

desert oar
#

yes

#

if by "cell" you mean "expression"

#

!e ```python
items1 = [f'number:{i}' for i in range(5)]
print(items1)

items2 = []
for i in range(5):
    items2.append(f'number:{i}')
print(items2)
```

arctic wedgeBOT
#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | ['number:0', 'number:1', 'number:2', 'number:3', 'number:4']
002 | ['number:0', 'number:1', 'number:2', 'number:3', 'number:4']
desert oar
#

same thing, 2 different ways to write it
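
Applied to the column-selection question, the same idea might look like this. The column names are hypothetical, and "starts with a given prefix" is only one possible selection rule.

```python
# Hypothetical column names, as they might come from list(data.columns).
columns = [
    "JENJANGPENDIDIKAN_s_1",
    "JENJANGPENDIDIKAN_s_2",
    "JABATANSTRUKTURAL_a",
    "age",
]

# Keep only the names that satisfy the rule.
wanted = [c for c in columns if c.startswith("JENJANGPENDIDIKAN_")]

print(wanted)  # ['JENJANGPENDIDIKAN_s_1', 'JENJANGPENDIDIKAN_s_2']
```

With a DataFrame, the resulting list can be used to select those columns, as in `data[wanted]`.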

bold timber
silk axle
#

How can I remove the specified columns of my dataset? I've tried things like
to_drop = ['reproduction_rate', 'female_smokers', 'male_smokers', 'tests_per_case', 'tests_units', 'excess_mortality']
for x in to_drop:
    df.drop(x)
but keep getting errors (in this case saying invalid key for 'reproduction_rate' even though that's the column header)

#

This shows the rough structure (first line is the headers, then the rest is the data, which has a lot of missing values)

thorn bobcat
#

yo

undone flare
#

also if you want to drop the whole column do df.drop(x, axis=1)

grave frost
#

sigh any hardware experts in TPUs?

serene scaffold
silk axle
#

reproduction_rate is the 2nd one on 3rd line

undone flare
#

yea so as Stelercus said, you will need to provide an axis

new_df = df.drop(["reproduction_rate", "female_smokers"], axis=1)
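
A small sketch of why the original loop failed and how the fix behaves, using a made-up three-column frame. By default `drop` looks for the label among the row index (`axis=0`), and it returns a new frame rather than modifying `df`.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset.
df = pd.DataFrame(
    {
        "location": ["A", "B"],
        "reproduction_rate": [1.1, 0.9],
        "female_smokers": [10.0, 12.0],
    }
)

# Default axis=0 searches the row labels, so a column name raises KeyError.
try:
    df.drop("reproduction_rate")
except KeyError as exc:
    print("KeyError:", exc)

# "tests_units" is deliberately absent from this toy frame.
to_drop = ["reproduction_rate", "female_smokers", "tests_units"]

# columns=... is equivalent to axis=1; errors="ignore" skips missing labels.
# The result must be assigned, because df itself is left unchanged.
new_df = df.drop(columns=to_drop, errors="ignore")

print(list(new_df.columns))  # ['location']
```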
silk axle
#

!d pandas.DataFrame.drop

arctic wedgeBOT
#

DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. When using a multi-index, labels on different levels can be removed by specifying the level. See the user guide <advanced.shown_levels> for more information about the now unused levels.