undone flare Aug 6, 2021, 2:56 PM

#

smh

rigid zodiac Aug 6, 2021, 3:01 PM

#

usually when I approach ridge regression, I try to find the optimum value. This is what I did before, hope it helps
```py
# Variable selection and shrinkage methods we have learned in week 2
# (both step-wise selection and penalized likelihood approaches).

# Ridge Regression
import numpy as np
from sklearn import linear_model

# Define a fine grid of tuning parameters, lambdas.
# In Python, this tuning parameter is referred to as "alpha".
# Set an equally spaced grid in log scale.
n_alphas = 100
alphaR = np.logspace(-1, 4, n_alphas)
# aR = np.arange(0, 1000, 5)
# print(alphaR)

# Ridge Regression (using alphaR): fit one model per alpha and keep the coefficients
coefs_R = []
for a in alphaR:
    ridge = linear_model.Ridge(alpha=a)
    ridge.fit(X_train, y_train)
    coefs_R.append(ridge.coef_)
```
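
The loop above collects the coefficient path but does not pick an alpha by itself. A minimal sketch of one way to choose it, assuming cross-validation is acceptable for the project: scikit-learn's `RidgeCV` searches the same grid. The synthetic `X_train`/`y_train` below are placeholders, not the data from the chat.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Placeholder data (assumption); swap in the real X_train / y_train.
X_train, y_train = make_regression(n_samples=200, n_features=10, noise=15.0, random_state=0)

# Same log-spaced grid as in the message above.
alphaR = np.logspace(-1, 4, 100)

# 5-fold cross-validation over the grid; the best alpha ends up in .alpha_
ridge_cv = RidgeCV(alphas=alphaR, cv=5).fit(X_train, y_train)
print("selected alpha:", ridge_cv.alpha_)
```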

undone flare Aug 6, 2021, 3:03 PM

#

rigid zodiac usually when I approach with ridge regression, I try to find the optimum value. ...

hmm looks good

#

I will try that out too, thanks

#

How would I go about finding an optimal value of K in a KNN model? Right now I just have a loop and am plotting the MAE

rigid zodiac Aug 6, 2021, 3:47 PM

#

undone flare How would I got about finding an optimal value of K in KNN model. right now I ju...

this is what i did in my last class project ``` # from sklearn.neighbors import KNeighborsClassifier
# from sklearn.model_selection import cross_val_score

    # knn = KNeighborsClassifier(n_neighbors = 3)# Create KNN classifier with k=3, for instance.
    # knn.fit(X_train,y_cat_train)# Fit the classifier to the data
    
    # y_pred = knn.predict(X_test)# Test error in confusion matrix
    # k_range = range(1,51) # This search over k=1,...,50. Adjust the range as you like.
    # cv_scores = []
    # for k in k_range:  
    #   knn_cv = KNeighborsClassifier(n_neighbors=k)
    #   scores = cross_val_score(knn_cv, X_train, y_cat_train, cv=5) # This code uses 5-fold CV.
    #   cv_scores.append(scores.mean())
    
    # plt.plot(k_range, cv_scores)
    # plt.xlabel('K')
    # plt.ylabel('CV accuracy score')
    
    # #more flexible compare to all of it 
    
    # print(confusion_matrix(y_cat_test,y_pred))```

#

the reason I commented it out is that somehow it didn't work, or the professor didn't require it

undone flare Aug 6, 2021, 3:53 PM

#

rigid zodiac this is what i did in my last class project ``` # from sklearn.neighbors i...

yea I am also doing something like this

acoustic halo Aug 6, 2021, 4:05 PM

#

undone flare How would I got about finding an optimal value of K in KNN model. right now I ju...

Yeah thats pretty much it, search and pick the lowest error
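
A rough sketch of that search, assuming a regression task (since MAE is the metric) and using cross-validated MAE instead of a single train/test split. The synthetic data and the range 1..30 are placeholders.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

# Placeholder data (assumption); use your own X, y here.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

k_range = range(1, 31)
cv_mae = []
for k in k_range:
    knn = KNeighborsRegressor(n_neighbors=k)
    # scikit-learn reports negated MAE so that "higher is better"; flip the sign back.
    scores = cross_val_score(knn, X, y, cv=5, scoring="neg_mean_absolute_error")
    cv_mae.append(-scores.mean())

best_k = k_range[int(np.argmin(cv_mae))]
print("best k:", best_k, "cv MAE:", min(cv_mae))
```

Plotting `cv_mae` against `k_range` gives the same picture as the manual loop, just averaged over folds.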

proud pond Aug 6, 2021, 4:42 PM

#

hello

#

is there a learning algorithm for training a NN model that changes the structure of the NN as well as its parameters (weights)?

acoustic halo Aug 6, 2021, 5:11 PM

#

proud pond is their a learning algorithm for training a NN model, that changes the structur...

Yes, NEAT and its variants

#

There's a python implementation available as well if you don't want to do it from scratch

summer musk Aug 6, 2021, 5:19 PM

#

salary_map={'<=50K':,'>50K':1 }
X_train['salary_map']=X_train['salary'].map(salary_map)

#

it's not working

#

can anyone help?

serene scaffold Aug 6, 2021, 5:19 PM

#

what is it supposed to do?

summer musk Aug 6, 2021, 5:19 PM

#

map the values

#

Try using .loc[row_indexer,col_indexer] = value instead

#

this is the kind of warning I'm getting

serene scaffold Aug 6, 2021, 5:20 PM

#

salary_map = {'<=50K': , '>50K': 1}
X_train['salary_map'] = X_train['salary'].map(salary_map)

summer musk Aug 6, 2021, 5:20 PM

#

salary_map={'<=50K':0,'>50K':1 }
X_train['salary_map']=X_train['salary'].map(salary_map)

serene scaffold Aug 6, 2021, 5:20 PM

#

Try using syntax highlighting and following style conventions.

#

What is the value supposed to be for '<=50K'

summer musk Aug 6, 2021, 5:21 PM

#

its value present in column

serene scaffold Aug 6, 2021, 5:21 PM

#

salary_map is a dictionary

summer musk Aug 6, 2021, 5:21 PM

#

yes

serene scaffold Aug 6, 2021, 5:21 PM

#

you have '<=50K' as a key. what is the value?

summer musk Aug 6, 2021, 5:21 PM

#

0

#

I have put it in the dict

serene scaffold Aug 6, 2021, 5:21 PM

#

so, make sure the value is there in your code.

summer musk Aug 6, 2021, 5:22 PM

#

yes it is there

serene scaffold Aug 6, 2021, 5:22 PM

#

summer musk salary_map={'<=50K':,'>50K':1 } X_train['salary_map']=X_train['salary'].map(sala...

it's not there in this example

summer musk Aug 6, 2021, 5:22 PM

#

serene scaffold Aug 6, 2021, 5:24 PM

#

X_train['salary_map'] = (X_train['salary'] > 50_000).astype(int)

Try that. Also, when sharing code or error messages, please copy and paste the text instead of showing a screenshot.
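
A minimal sketch of the dictionary approach, assuming the `salary` column holds the strings `'<=50K'` and `'>50K'` and that the warning quoted earlier is pandas' `SettingWithCopyWarning` (its text matches). The small DataFrame and the `age` filter are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the real data.
df = pd.DataFrame({
    "age": [25, 40, 31, 58],
    "salary": ["<=50K", ">50K", "<=50K", ">50K"],
})

# If X_train is a slice of another DataFrame, taking an explicit copy
# avoids SettingWithCopyWarning when a new column is assigned.
X_train = df[df["age"] > 20].copy()

# Every key needs a value, otherwise the dict literal is a syntax error.
salary_map = {"<=50K": 0, ">50K": 1}
X_train["salary_map"] = X_train["salary"].map(salary_map)
print(X_train)
```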

summer musk Aug 6, 2021, 5:24 PM

#

yea sure

agile jolt Aug 6, 2021, 5:25 PM

#

i have an issue on a graph and the screenshot is much needed, i hope it's not that big deal

serene scaffold Aug 6, 2021, 5:25 PM

#

It's fine if it's a graphic of some kind and not text.

agile jolt Aug 6, 2021, 5:25 PM

#

okay, great

#

so..this happened

acoustic halo Aug 6, 2021, 5:26 PM

#

Yeah because you have a billion categories lol

agile jolt Aug 6, 2021, 5:26 PM

#

import pandas as pd
from pandas import to_datetime
import plotly
import plotly.express as px
import plotly.io as pio


df = pd.read_csv(r'\Users\almas\Desktop\amazon_jobs.csv')


df.dtypes


df["Posting_date"] = to_datetime(df["Posting_date"])

y = df.loc[(to_datetime(df["Posting_date"]) > to_datetime("January 1,2018")) &
          (df["location"] == "US, WA, Seattle ")]


print(df)

y.groupby("Title").size().plot.pie(y="Title",ylabel="LABEL")

acoustic halo Aug 6, 2021, 5:27 PM

#

Yeah, you group them by title, but theres loads of titles

agile jolt Aug 6, 2021, 5:27 PM

#

Well the idea was from csv (https://www.kaggle.com/atahmasb/amazon-job-skills) show a pie chart of Title (job positions) in Seattle, WA dated from January 1st 2018

AMAZON Job Skills

Software Development jobs

acoustic halo Aug 6, 2021, 5:29 PM

#

Well, technically you were successful

agile jolt Aug 6, 2021, 5:29 PM

#

Yeah haha

#

But how can i filter it to something more visible

#

And useful

acoustic halo Aug 6, 2021, 5:31 PM

#

You'd have to figure a way to label them better, eg anything that contains the words "software" and "engineer" or "developer" => "software engineer"

#

You would have to decide on the labels and how best to generate them though

#

Or other key words from the description
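
A small sketch of that relabeling idea. The sample titles and the keyword rules are made up for illustration; the real ones would come from the `Title` column and from whatever labels you decide on.

```python
import pandas as pd

# Hypothetical sample of job titles (assumption).
titles = pd.Series([
    "Software Development Engineer",
    "Senior Software Engineer, Alexa",
    "Software Developer II",
    "Data Scientist",
    "Technical Program Manager",
    "Program Manager, Devices",
    "UX Designer",
])

def label_title(title: str) -> str:
    """Map a raw title to a coarse label using simple keyword rules."""
    t = title.lower()
    if "software" in t and ("engineer" in t or "developer" in t):
        return "software engineer"
    if "program manager" in t:
        return "program manager"
    if "data scientist" in t:
        return "data scientist"
    return "other"

labels = titles.map(label_title)
counts = labels.value_counts()
print(counts)
# counts.plot.pie(ylabel="") would then draw one slice per label instead of one per title.
```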

agile jolt Aug 6, 2021, 5:33 PM

#

Okay, I'll try something else maybe

#

Thanks!

#

Oh and yes, while i'm here..any idea or example for ternary plot

acoustic halo Aug 6, 2021, 5:34 PM

#

If you're just playing around, maybe categorize by which languages they require

agile jolt Aug 6, 2021, 5:35 PM

#

Seems a bit hard because it's not a specific category, but 'PREFERRED QUALIFICATIONS' where they took data from applications

#

But thanks

agile jolt Aug 6, 2021, 5:44 PM

#

agile jolt Thanks!

anyone?

undone flare Aug 6, 2021, 6:01 PM

#

agile jolt so..this happened

lol

empty parrot Aug 6, 2021, 6:56 PM

#

any text-to-speech recognition with ANN in python available somewhere for learning??

grizzled barn Aug 6, 2021, 7:41 PM

#

Not sure yet. How could I figure that out?

coral kindle Aug 6, 2021, 9:53 PM

#

Anybody doing the Kaggle 30DayMl challenge?

tame solstice Aug 7, 2021, 12:14 AM

#

#

How can i solve the importerror?

#

matplotlib library is added.

#

but I got the error

modern dragon Aug 7, 2021, 12:21 AM

#

Hi guys, how do you improve the accuracy level of your machine learning model?

serene scaffold Aug 7, 2021, 1:43 AM

#

@modern dragon it's not possible to answer this question in general, as there's no one-size-fits-all solution.

#

What does your model do? How is it performing currently?

#

@tame solstice make sure pycharm is running your code in the environment where you installed matplotlib. If you don't understand what I mean by this (and it's okay if you don't) then it probably isn't.

modern dragon Aug 7, 2021, 2:09 AM

#

serene scaffold What does your model do? How is it performing currently?

It uses age and gender to predict their top 3 category interests

serene scaffold Aug 7, 2021, 2:09 AM

#

modern dragon It uses age and gender to predict their top 3 category interests

what type of model is it?

modern dragon Aug 7, 2021, 2:10 AM

#

Oh uh what are the different types of models(

#

?*

#

This is my first ML project 😅

serene scaffold Aug 7, 2021, 2:10 AM

#

just show the code, I guess

#

!paste

arctic wedgeBOT Aug 7, 2021, 2:10 AM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

serene scaffold Aug 7, 2021, 2:10 AM

#

please don't post screenshots

modern dragon Aug 7, 2021, 2:11 AM

#

Oh it's really weird, is it ok if I show you the tutorial I learned it from?

#

https://www.youtube.com/watch?v=7eh4d6sabA0

YouTube

Programming with Mosh

Python Machine Learning Tutorial (Data Science)

Python Machine Learning Tutorial - Learn how to predict the kind of music people like.

#

You can skip to 29 mins

serene scaffold Aug 7, 2021, 2:30 AM

#

modern dragon Oh it's really weird, is it ok if I show you the tutorial I learned it from?

no, I would need to see the code

#

When you have the code ready to share, ping me and I'll look next time I'm online.

covert herald Aug 7, 2021, 2:41 AM

#

does anybody know any good sources to learn machine learning without having to use any of the modules (like sklearn)? ping me if you have an answer, thanks!

serene scaffold Aug 7, 2021, 3:04 AM

#

@covert herald why don't you want to use modules?

#

They're not an added layer of complexity. They're there to help. You could implement some algorithms "from scratch" for educational purposes, but I would still use numpy at the very least.

#

The reason I insist on numpy: if you write all the math by hand, you're going to waste a lot of time on implementation details that don't deepen your understanding of anything.

undone flare Aug 7, 2021, 3:55 AM

#

How to know that you have overfitted your model?

tired nymph Aug 7, 2021, 4:31 AM

#

Hello guys,
I'm using OpenCV and Yolo3 to detect objects in a video file I have in a folder. The problem is that I don't know how to save the output video (the one with the detections). This is my code:

video = 'test.mp4'
vid = detect_video(video, yolo, all_classes)

undone flare Aug 7, 2021, 6:03 AM

#

uhh

#

this doesn't seem right

#

why is it doing that?

#

I have one hot encoded the data

#

oh I should drop some columns

undone flare Aug 7, 2021, 6:40 AM

#

fixed it, the problem was there were too many unique values for some column

grand mantle Aug 7, 2021, 6:41 AM

#

just like matlab has simulation interfaces

lapis sequoia Aug 7, 2021, 6:42 AM

#

is there any link between machine learning and binary search?

undone flare Aug 7, 2021, 7:19 AM

#

```
LinAlgWarning: Ill-conditioned matrix (rcond=1.05001e-17): result may not be accurate.
  return linalg.solve(A, Xy, sym_pos=True,
```
what does this mean? code:
```py
reg2 = Ridge(alpha=0.0, normalize=True).fit(X_train, y_train)
y_pred2 = reg2.predict(X_test)
rmse_ohe[1] = rmse(y_test, y_pred2)
rmse_ohe[1]
```

velvet thorn Aug 7, 2021, 7:59 AM

#

undone flare ``` LinAlgWarning: Ill-conditioned matrix (rcond=1.05001e-17): result may not be...

you might have strong multicollinearity

#

basically, there is substantial uncertainty in the regression coefficients
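
One way to see what the warning is about, as a sketch under assumptions: with `alpha=0.0` there is no ridge penalty at all, and if the one-hot encoding kept every dummy column (not confirmed in the chat), those columns are linearly dependent once an intercept is fitted. The toy `color` column below is made up; the condition number of the system that gets solved shows the effect.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical categorical feature with three levels.
colors = pd.Series(rng.choice(["red", "green", "blue"], size=50), name="color")

def gram_condition(X: np.ndarray, alpha: float) -> float:
    """Condition number of the centered ridge system X^T X + alpha * I."""
    Xc = X - X.mean(axis=0)  # centering mimics fitting an intercept
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    return np.linalg.cond(A)

full = pd.get_dummies(colors).to_numpy(dtype=float)                      # all 3 dummy columns
dropped = pd.get_dummies(colors, drop_first=True).to_numpy(dtype=float)  # one level dropped

cond_full = gram_condition(full, alpha=0.0)        # enormous: the matrix is singular
cond_dropped = gram_condition(dropped, alpha=0.0)  # modest
cond_ridge = gram_condition(full, alpha=1.0)       # modest: the penalty stabilizes the solve
print(cond_full, cond_dropped, cond_ridge)
```

So two things that may help are dropping one level per encoded column or using an alpha greater than zero.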

undone flare Aug 7, 2021, 8:04 AM

#

oh is there a way to fix this?

tame solstice Aug 7, 2021, 8:57 AM

#

serene scaffold @tame solstice make sure pycharm is running your code in the environment ...

thank you, I solved it, I added matplotlib in PyCharm

neat sandal Aug 7, 2021, 1:47 PM

#

hello @all, I want to build a script that detects the dominant color in an image. I did some searching and found some packages like color-thief, but the results aren't good, so I want to build something myself, but I don't know how or where to start. Could anyone help me?

tame grail Aug 7, 2021, 3:03 PM

#

i got this table from a website with bs4 and pandas

#

its a list of strings with a length of 1 named dfs and that output is dfs[0]

#

how do i get just the common name column?

#

the spacing changes depending on which state's data im looking at

#

printing for s in dfs[0] gets me the column names

unborn glacier Aug 7, 2021, 3:09 PM

#

dfs["Common Name"] should work

#

Or use iloc:
https://www.geeksforgeeks.org/select-rows-columns-by-name-or-index-in-pandas-dataframe-using-loc-iloc/

GeeksforGeeks

Select Rows & Columns by Name or Index in Pandas DataFrame using [ ...

A Computer Science portal for geeks. It contains well written, well thought and well explained computer science and programming articles, quizzes and practice/competitive programming/company interview Questions.

tame grail Aug 7, 2021, 3:10 PM

#

unborn glacier dfs["Common Name"] should work

it says the indices must be integers or slices, not strings

unborn glacier Aug 7, 2021, 3:11 PM

#

Maybe it's not a data frame?

tame grail Aug 7, 2021, 3:11 PM

#

no it isn't its a list of strings

#

but theres only 1 element

#

which is that big table

unborn glacier Aug 7, 2021, 3:12 PM

#

Ah okay, you can turn a table into a data frame and then use what I was talking about

#

I think it's just df = pd.DataFrame(dfs[0])

#

Then you can do df["Common Name"]
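
A quick sketch of what may be going on, assuming `dfs` came from something like `pandas.read_html`, which returns a list of DataFrames. Looping over a DataFrame yields its column names, which matches what was described above. The table contents here are made up.

```python
import pandas as pd

# Hypothetical stand-in for the scraped result: a list holding one DataFrame.
dfs = [pd.DataFrame({
    "Common Name": ["American Robin", "Blue Jay"],
    "Scientific Name": ["Turdus migratorius", "Cyanocitta cristata"],
})]

table = dfs[0]        # index into the list first ...
print(type(table))
print(list(table))    # iterating a DataFrame gives its column names

common_names = table["Common Name"]  # ... then select the column by name
print(common_names)
```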

covert herald Aug 7, 2021, 3:16 PM

#

@serene scaffold I'm using numpy, but I don't want to use the machine learning modules until I learn how the algorithms work

tame grail Aug 7, 2021, 3:17 PM

#

ahh i got it thank you! @unborn glacier

#

it was just this

unborn glacier Aug 7, 2021, 3:22 PM

#

covert herald @serene scaffold im use numpy but i dont want to use the machine learning m...

For basic ones like linear regression you should just be able to look up an algorithm and implement it yourself. For more complicated ones, you can find tutorials like "random forest from scratch using numpy" or "neural network from scratch using numpy"

#

I don't think there's one single resource that has it all
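
As an illustration of the "from scratch with numpy" idea, here is a minimal sketch of linear regression trained by batch gradient descent on mean squared error. The synthetic data, learning rate and epoch count are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with known weights so the result can be checked.
X = rng.normal(size=(200, 2))
true_w, true_b = np.array([3.0, -2.0]), 0.5
y = X @ true_w + true_b + rng.normal(scale=0.01, size=200)

def fit_linear_regression(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on mean squared error."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        err = X @ w + b - y               # residuals of the current fit
        w -= lr * (2.0 / n) * (X.T @ err)  # gradient of MSE w.r.t. w
        b -= lr * (2.0 / n) * err.sum()    # gradient of MSE w.r.t. b
    return w, b

w, b = fit_linear_regression(X, y)
print(w, b)
```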

covert herald Aug 7, 2021, 3:22 PM

#

alright

unborn glacier Aug 7, 2021, 3:29 PM

#

neat sandal hello @all i want to build a script that detects the dominant color in an image ...

Images are just rgb values, so you can extract them and map them into a 3d space, then run some sort of clustering algorithm to find the dominant colors.
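
A minimal sketch of that clustering idea, using k-means from scikit-learn as one possible choice of algorithm. The synthetic image and the choice of 3 clusters are assumptions; a real image would be loaded into an array of shape (height, width, 3) first.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 40x40 RGB image: mostly red, with a blue band and a green band.
image = np.zeros((40, 40, 3), dtype=np.uint8)
image[:, :] = (200, 30, 30)
image[:8, :] = (20, 20, 220)
image[-4:, :] = (30, 180, 40)

# Flatten to a list of points in RGB space.
pixels = image.reshape(-1, 3).astype(float)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pixels)

# The dominant color is the center of the cluster with the most pixels.
counts = np.bincount(kmeans.labels_, minlength=3)
dominant = kmeans.cluster_centers_[counts.argmax()].round().astype(int)
print("dominant RGB:", dominant)
```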

lucid kettle Aug 7, 2021, 3:59 PM

#

I'm taking an Artificial Intelligence course next semester, yay

neat sandal Aug 7, 2021, 4:08 PM

#

unborn glacier Images are just rgb values, so you can extract them and map them into a 3d space...

yes it is a very nice idea,

grand lion Aug 7, 2021, 4:44 PM

#

Do I need mathematical rigor to start learning TensorFlow?

serene scaffold Aug 7, 2021, 4:59 PM

#

grand lion Do I need mathematical rigor to start learning TensorFlow?

well, TensorFlow is for deep learning in general, so the amount of mathematical knowledge you need to understand what you're doing depends on what you are doing. You will probably need to know linear algebra.

#

Also, I would avoid "learning TensorFlow" or any other library, and instead focus on approaches to AI and use whichever library suits what you're doing.

grand lion Aug 7, 2021, 5:00 PM

#

Right now I am just planning on using a library to create an LSTM, so what mathematical knowledge would that need?

serene scaffold Aug 7, 2021, 5:02 PM

#

grand lion Right now I am just planning on using a library to create an LSTM, so what mathe...

are you trying to use an LSTM to do something, or implement the actual LSTM?

grand lion Aug 7, 2021, 5:02 PM

#

The former

#

Predict the next word in a sentence type of thing

serene scaffold Aug 7, 2021, 5:03 PM

#

You can get away with a certain amount of not knowing the math behind it, yes.

#

https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html

grand lion Aug 7, 2021, 5:04 PM

#

Tbh I only know up to geometry so I might learn this another time after I start calculus and linear algebra

serene scaffold Aug 7, 2021, 5:05 PM

#

grand lion Tbh I only know up to geometry so I might learn this another time after I start ...

in the mean time, developing general programming skills will always serve you well.

agile jolt Aug 7, 2021, 7:07 PM

#

any idea which dataset would be good for ternary plot?

#

i found this one but im not sure if that's ok: https://www.kaggle.com/vinven7/comprehensive-database-of-minerals

Comprehensive database of Minerals

A list of of 3112 minerals, their chemical composition and properties

balmy junco Aug 7, 2021, 7:42 PM

#

Hey, I'm trying to use Fitter from fitter library on large image data. It always times out. I tried increasing the timeout quite a lot, but it never makes the cut. I converted it to an np array and everything, but no luck....

#

Can't seem to find anybody that knows anything about it

#

Any thoughts?

grand lion Aug 7, 2021, 9:56 PM

#

@serene scaffold Katie told me that you work with NLP, so I have a question regarding that. Would you need ML to do it, could you just work based off of grammatical structures of sentences (I.e subject, verbs, predicates, etc.) and classify words as a certain description

plucky lichen Aug 7, 2021, 9:56 PM

#

well, I dont really know how to explain it
I am trying to archive the messages of my friends and me from a discord channel, that works great, but I get this json file from discord:

serene scaffold Aug 7, 2021, 9:56 PM

#

Can you put that in a paste bin?

plucky lichen Aug 7, 2021, 9:56 PM

#

yes sorry

serene scaffold Aug 7, 2021, 9:56 PM

#

grand lion @serene scaffold Katie told me that you work with NLP, so I have a question...

It's more ML/AI than it is linguistics in many ways.

grand lion Aug 7, 2021, 9:57 PM

#

Hm

plucky lichen Aug 7, 2021, 9:57 PM

#

plucky lichen well, I dont really know how to explain it I am trying to archive the messages o...

https://pastebin.com/vrvXVsMe

Pastebin

"total_results": 34, "messages": [ [ { - Paste...

Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.

serene scaffold Aug 7, 2021, 9:57 PM

#

though there are certain approaches where grammatical features are taken heavily into account.

grand lion Aug 7, 2021, 9:57 PM

#

For my example, I’m more or less trying to generate a sentence rather than predict the next word - using basic sentence structures such as simple sentences and the likes, would that need ML?

serene scaffold Aug 7, 2021, 9:58 PM

#

grand lion For my example, I’m more or less trying to generate a sentence rather than predi...

depends on your definition of "ML", but if you're just having fun, you can make a sentence generator using ngrams and markov chains.

#

This happens to be the second assignment in the NLP class I helped teach.

grand lion Aug 7, 2021, 9:59 PM

#

I would try and create an LSTM but I seriously don’t understand videos for them because they all require previous knowledge in ML

serene scaffold Aug 7, 2021, 9:59 PM

#

ML is a tough area to jump into, yes

grand lion Aug 7, 2021, 9:59 PM

#

I can get behind how LSTM’s and RNN’s work but I don’t understand the mathematical portions of it

serene scaffold Aug 7, 2021, 10:00 PM

#

how do you feel about statistics?

grand lion Aug 7, 2021, 10:01 PM

#

Not that good at it tbh, I’m in 8th grade so my mathematical knowledge is quite bad. I do know concepts like correlation coefficient and the likes, but probably not at the point where I’m competent enough for ML

serene scaffold Aug 7, 2021, 10:01 PM

#

you can do the ngram/markov chain approach if you simply understand that if something happens 8 times out of 10, it has a .8 chance.

grand lion Aug 7, 2021, 10:02 PM

#

Hm alright

#

Would there be a certain framework that is strong with those concepts?

#

Or should I work on the concepts first then find a framework that suits my needs

serene scaffold Aug 7, 2021, 10:02 PM

#

NLTK is a library you can use to get the ngrams.

#

the statistics and stuff, you can just store some numbers in a nested dict data structure of some kind.
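
A tiny sketch of the nested-dict idea with bigrams, in plain Python. The toy corpus and the function names are made up; a real corpus would be tokenized text.

```python
import random
from collections import defaultdict

# Toy corpus (assumption), already split into tokens.
corpus = "the cat sat on the mat . the dog sat on the rug . the cat saw the dog ."
tokens = corpus.split()

# counts[prev][nxt] = number of times `nxt` followed `prev`
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(tokens, tokens[1:]):
    counts[prev][nxt] += 1

def next_word_probs(prev):
    """Turn follower counts into probabilities (8 times out of 10 -> 0.8)."""
    total = sum(counts[prev].values())
    return {word: c / total for word, c in counts[prev].items()}

def generate(start, max_len=12, seed=0):
    """Walk the chain from `start` until a period or max_len tokens."""
    rng = random.Random(seed)
    words = [start]
    while len(words) < max_len and words[-1] != ".":
        followers = counts[words[-1]]
        if not followers:
            break
        words.append(rng.choices(list(followers), weights=list(followers.values()))[0])
    return " ".join(words)

print(next_word_probs("the"))
print(generate("the"))
```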

grand lion Aug 7, 2021, 10:03 PM

#

Does it abstract it too much? Cause I do want some abstraction but not too much so I can understand the concepts

grand lion Aug 7, 2021, 10:03 PM

#

serene scaffold the statistics and stuff, you can just store some numbers in a nested dict data ...

Ah, so stuff like the most common word in x position (I.e the position of the first word)

serene scaffold Aug 7, 2021, 10:04 PM

#

not really. an ngram is a tuple of n consecutive tokens.
[(not really .), (really . an), (. an ngram), (an ngram is), ... (consecutive tokens .)]

#

these are 3grams or trigrams.
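
The same example in code, as a small sketch in plain Python. The token list is the sentence above split by hand, and `ngrams` is a helper written for this example (NLTK has its own).

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = ["not", "really", ".", "an", "ngram", "is", "a", "tuple",
          "of", "n", "consecutive", "tokens", "."]

trigrams = ngrams(tokens, 3)
print(trigrams[:4])
print(trigrams[-1])
```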

grand lion Aug 7, 2021, 10:04 PM

#

Are there any articles or books that are helpful for understanding what ngrams are?

serene scaffold Aug 7, 2021, 10:05 PM

#

maybe? the course I taught specifically didn't have one to save money for the students.

#

well, helped teach. anyway, a token is just a word or punctuation mark. and it's just n tokens in order

grand lion Aug 7, 2021, 10:06 PM

#

Ohh alright

#

Ohh wait I understand it

#

I was a bit confused at first but now I get it

#

Anyways, thanks for your help! I appreciate it

serene scaffold Aug 7, 2021, 10:08 PM

#

@grand lion I'll ask my now-former advisor for her slides. Ask me again on like Tuesday.

grand lion Aug 7, 2021, 10:08 PM

#

Alright, will do

uncut orbit Aug 7, 2021, 11:28 PM

#

elt is the same thing as tlc for a dataset

grave frost Aug 7, 2021, 11:52 PM

#

pretty interesting competition
https://www.drivendata.org/competitions/79/competition-image-similarity-1-dev/
is anyone pretty experienced in CV research/kaggle here willing to collab?

DrivenData

Facebook AI Image Similarity Challenge: Matching Track

Advance the science of image similarity detection, with applications in areas including content tracing, copyright infringement and misinformation. In the

#

~~I will give the idea and you would code all of it - prize would be split 50-50~~

uncut orbit Aug 8, 2021, 12:34 AM

#

grave frost pretty interesting competition https://www.drivendata.org/competitions/79/compet...

dont wonder why no one takes the deal

grand lion Aug 8, 2021, 12:42 AM

#

Do I need Anaconda to plot on 2d maps with matplotlib?

#

Also do I need mpl_toolkits.basemap?

quiet maple Aug 8, 2021, 2:58 AM

#

hey

#

i want to learn ML

#

can you guys help me how can i learn quicker and properly

leaden pebble Aug 8, 2021, 3:56 AM

#

Hey

Suppose I give you data (interval, frequency) ... and

here we apply
np.histogram(data_given, bins=class_intervals, density=False)

We will get a tuple of two arrays:
1. frequency counts
2. bin edges

But now if we set
density=True

what will np.histogram give, statistically?

#

#

Thats the data

royal crest Aug 8, 2021, 5:22 AM

#

according to documentation

#

https://numpy.org/doc/stable/reference/generated/numpy.histogram.html

#

see for yourself too!
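
A short sketch of what the docs describe: with `density=True` the first array holds the value of a probability density in each bin, i.e. count / (total count * bin width), so the bar areas sum to 1. The data and the unequal bin edges are made up to make the difference visible.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 7, 8, 9, 9])
bins = [0, 4, 10]  # deliberately unequal bin widths

counts, edges = np.histogram(data, bins=bins)
density, _ = np.histogram(data, bins=bins, density=True)

widths = np.diff(edges)
print("counts: ", counts)
print("density:", density)
print("area:   ", (density * widths).sum())
```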

wide raven Aug 8, 2021, 6:01 AM

#

Do you guys think the only good way to learn neural networks is by learning every aspect of it

#

and understanding all the math and how they are built?

#

or can you get a good understanding and make a lot of cool AI just by learning tensorflow and mastering that

#

I thought learning from scratch would be nice and help me understand but after hours and hours of learning gradients equations types activation functions

#

it just got too much to handle and I would like to make AI with a less info-needed approach which is why i thought tensorflow would be nice

#

but i am scared that would limit me and what i can make

somber prism Aug 8, 2021, 6:03 AM

#

guys i have one doubt , are the tree based models prone to outliers, skewed features ?

#

if they are not then i dont need to standardize or scale the features right ?

dull turtle Aug 8, 2021, 8:01 AM

#

hello

#

i am working with pandas dataframe

#

when I run
```python
for date_1 in rem_dup_dt_column[0]:
    print("date_1:", date_1)
    print()

    row_data = main_dataframe.loc[main_dataframe['date'] == date_1]
    print("row_data:")
    print(row_data)
    print()
```
this command, only the first date gets stored

I want it to run for every entry in the date column
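
A hedged sketch of one way to keep the rows for every date. It assumes the goal is one sub-DataFrame per unique date; the small `main_dataframe` is made up, and the loop iterates over the unique values of the `date` column directly.

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame.
main_dataframe = pd.DataFrame({
    "date": ["2021-08-01", "2021-08-01", "2021-08-02", "2021-08-03"],
    "value": [10, 11, 20, 30],
})

# Store each date's rows under its own key so later iterations don't overwrite earlier ones.
rows_by_date = {}
for date_1 in main_dataframe["date"].unique():
    rows_by_date[date_1] = main_dataframe.loc[main_dataframe["date"] == date_1]

# groupby produces the same split without an explicit filter per date.
rows_by_date_gb = {date: group for date, group in main_dataframe.groupby("date")}

print(rows_by_date["2021-08-01"])
```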

#

ping me when u reply

mystic tinsel Aug 8, 2021, 8:53 AM

#

hello, i had a question regarding label encoding and one hot encoding. A few examples that i found online which had Sex column in it trained the model after label encoding and no one hot encoding, shouldnt one hot encoding be done in such cases? Thanks

ripe forge Aug 8, 2021, 9:16 AM

#

in cases where column only has 2 unique values, there's zero downside to label encoding. so if the dataset for Sex had only 2 values, then you essentially bypassed the pitfall of label encoding

still osprey Aug 8, 2021, 9:16 AM

#

mystic tinsel hello, i had a question regarding label encoding and one hot encoding. A few exa...

SEX???

ripe forge Aug 8, 2021, 9:16 AM

#

still osprey SEX???

sex is a synonym for gender

still osprey Aug 8, 2021, 9:17 AM

#

aw

mystic tinsel Aug 8, 2021, 9:17 AM

#

Sorry, gender *

mystic tinsel Aug 8, 2021, 9:17 AM

#

ripe forge in cases where column only has 2 unique values, there's zero downside to label e...

Ohh

still osprey Aug 8, 2021, 9:17 AM

#

joeangry

ripe forge Aug 8, 2021, 9:18 AM

#

in general you're correct, for higher cardinality (ie more unique values) in a categorical column, label encoding isn't appropriate

mystic tinsel Aug 8, 2021, 9:18 AM

#

So like labelling male as 0 and female as 1 doesnt really have any effect on the model huh

ripe forge Aug 8, 2021, 9:18 AM

#

yes, because it's just two numeric values, with some distance between them

mystic tinsel Aug 8, 2021, 9:18 AM

#

ripe forge in general you're correct, for higher cardinality (ie more unique values) in a c...

Makes sense, thanks!!

ripe forge Aug 8, 2021, 9:19 AM

#

you could have even set it to 0.25 and 0.75, or 0.3 and 0.6 if you wanted (though i dont know why you'd want to do that)

#

the model will never see any value outside those two for this feature, and thus its relations will largely stay completely independent of the actual values

mystic tinsel Aug 8, 2021, 9:20 AM

#

Sorry if im repeating the question but even the distance or small difference should set those two apart right ? Like in hot encoding its more like true and false but label encoding is more like assigning a value to a variable?

#

Shouldn't that affect even models with only two values..

ripe forge Aug 8, 2021, 9:20 AM

#

ultimately a model doesn't care. all it does is weight * some_feature

#

the weight could be learnt arbitrarily to scale any 2 values into anything
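
A small sketch of that point for a plain linear model with an intercept: encoding the same binary feature as 0/1 or as 0.25/0.75 changes the learned weights but not the predictions. The synthetic data is made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
is_female = rng.integers(0, 2, size=100)
y = 5.0 + 2.0 * is_female + rng.normal(scale=0.1, size=100)

def fit_predict(feature):
    """Least-squares fit of y on [1, feature]; returns (coefficients, predictions)."""
    A = np.column_stack([np.ones_like(feature, dtype=float), feature])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef

coef_01, pred_01 = fit_predict(is_female.astype(float))                  # encoded as 0 / 1
coef_alt, pred_alt = fit_predict(np.where(is_female == 1, 0.75, 0.25))   # encoded as 0.25 / 0.75

print("weights 0/1:      ", coef_01)
print("weights 0.25/0.75:", coef_alt)
print("same predictions: ", np.allclose(pred_01, pred_alt))
```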

mystic tinsel Aug 8, 2021, 9:21 AM

#

Oh

ripe forge Aug 8, 2021, 9:22 AM

#

and also, for the record, a model also doesn't even understand true or false. all it understands is math and numbers

mystic tinsel Aug 8, 2021, 9:22 AM

#

Ig i need to think about it a lil more to completely understand that 😅

mystic tinsel Aug 8, 2021, 9:22 AM

#

ripe forge and also, for the record, a model also doesn't even understand true or false. al...

👍 👍

#

Thank you!

hardy hornet Aug 8, 2021, 9:25 AM

#

do anyone know how to change language in Jupyter to English

ripe forge Aug 8, 2021, 9:40 AM

#

google says try https://stackoverflow.com/questions/52667314/jupyter-notebook-is-displayed-partially-in-french or https://github.com/jupyter/notebook/issues/4158

Stack Overflow

Jupyter notebook is displayed partially in French

I'm using Jupyter for Python programming on Windows 10 and some of the text is translated in French but not all of it (which makes it kinda annoying).
Does someone know how to change the display

GitHub

Change the (natural) language of the Notebook interface back to Eng...

Recently all the interface of the Jupyter notebook has been automatically translated into the my own language (French). It is the case on all the web browsers I've tested (Firefox, Chrome, ...

plucky lichen Aug 8, 2021, 9:47 AM

#

can anyone help me to convert a nested json dictionary to a dataframe?
https://pastebin.com/vrvXVsMe
heres the json, I tried everything google has to offer...


mystic tinsel Aug 8, 2021, 10:10 AM

#

plucky lichen can anyone help me to convert a nested json directory to a dataframe? https://p...

ig you could extract keys and values from the json object using a for loop and keep adding new rows in each iteration while setting the key as the column and the value as the column value?
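
A rough sketch of an alternative to the manual loop, assuming the JSON has the shape hinted at by the paste preview: a `"messages"` list of lists of message dicts. The sample `data` and its field names are made up; `pandas.json_normalize` flattens nested dicts into dotted column names.

```python
import pandas as pd

# Hypothetical data in the assumed shape: "messages" is a list of lists of dicts.
data = {
    "total_results": 2,
    "messages": [
        [{"id": "1", "content": "hello", "author": {"username": "alice"}, "embeds": []}],
        [{"id": "2", "content": "look", "author": {"username": "bob"},
          "embeds": [{"title": "A link"}]}],
    ],
}

# Flatten the list of lists into one list of message dicts.
flat = [msg for group in data["messages"] for msg in group]

# Nested dicts become columns such as "author.username"; lists stay as list objects.
df = pd.json_normalize(flat)
print(df[["id", "content", "author.username"]])
```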

#

https://www.kite.com/python/answers/how-to-get-values-from-a-json-string-in-python -check this out maybe?


plucky lichen Aug 8, 2021, 10:11 AM

#

no

#

I will try that

#

merci

mystic tinsel Aug 8, 2021, 10:14 AM

#

plucky lichen I will try that

or you could convert the json to csv first? maybe thats easier

plucky lichen Aug 8, 2021, 10:47 AM

#

ok, I got another question, I want to print a value of a nested dict.
I got the path from here: https://piotros.github.io/json-path-picker/ , but I don't get a result

#

print( data['messages']) prints everything under it which makes sense,
print( data['messages'][0][0]['embeds'][0]['title'])
should print the value of title but it doesn't
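
One possible cause, as a guess: the first message may simply have no embeds, so there is no title at that exact path. A sketch that walks every message and collects whatever titles exist; the sample `data` and its shape are assumptions based on the path above.

```python
# Hypothetical data in the assumed shape: "messages" is a list of lists of dicts.
data = {
    "messages": [
        [{"content": "hello", "embeds": []}],
        [{"content": "look", "embeds": [{"title": "A link"}]}],
    ],
}

def embed_titles(data):
    """Collect the title of every embed that has one, skipping messages without embeds."""
    titles = []
    for group in data.get("messages", []):
        for msg in group:
            for embed in msg.get("embeds", []):
                if "title" in embed:
                    titles.append(embed["title"])
    return titles

print(embed_titles(data))
```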

scarlet cypress Aug 8, 2021, 10:51 AM

#

is 3000 files enough for a CNN

lament topaz Aug 8, 2021, 10:59 AM

#

hello, I was trying a project on twitter sentiment analysis
so I took dataset from Kaggle - cleaned it (like @, RT, links)
I want to know if is there a way to choose tweets based on a topic??
like I want tweets regarding "Donald Trump", "Artificial Intelligence", etc
can I do that without using the twitter api - just choosing from the dataset!!
pls help!!

somber prism Aug 8, 2021, 11:12 AM

#

lament topaz hello, I was trying a project on twitter sentiment analysis so I took dataset fr...

can you send the dataset link

lament topaz Aug 8, 2021, 11:13 AM

#

https://www.kaggle.com/ramyavidiyala/twitter-tweets-data-for-sentiment-analysis

Twitter Tweets Data for Sentiment Analysis

Machine Learning / Understanding Text Processing / Text data Tweets

somber prism Aug 8, 2021, 11:21 AM

#

lament topaz https://www.kaggle.com/ramyavidiyala/twitter-tweets-data-for-sentiment-analysis

so you want get all the tweets regarding donald ?

lament topaz Aug 8, 2021, 11:22 AM

#

yes for any topic, not just the Trump

#

basically I need to create a model for this!!
I only showed wordcloud, bar graphs
need to show accuracy and train the model too
so thats why i thought I should filter the data to a topic only

somber prism Aug 8, 2021, 11:23 AM

#

try df[df.tweet.str.contains('donald')]
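
A slightly more defensive sketch of the same filter, assuming the text column is called `tweet` as above. `case=False` ignores capitalization, `na=False` treats missing tweets as non-matches, and `|` in the pattern matches any of several topics. The sample rows are made up.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"tweet": [
    "Donald Trump spoke today",
    "I love artificial intelligence",
    None,
    "nothing relevant here",
]})

pattern = "donald trump|artificial intelligence"
mask = df["tweet"].str.contains(pattern, case=False, na=False, regex=True)
filtered = df[mask]
print(filtered)
```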

lament topaz Aug 8, 2021, 11:25 AM

#

oh yeah that works 🙂

somber prism Aug 8, 2021, 11:26 AM

#

lament topaz oh yeah that works 🙂

are you a beginner ? cuz i am and i need one help

lament topaz Aug 8, 2021, 11:27 AM

#

yeah lol beginner here - always asking for help

grave frost Aug 8, 2021, 11:28 AM

#

uncut orbit dont wonder why no one takes the deal

its just a joke lol

somber prism Aug 8, 2021, 11:29 AM

#

can someone help me with this. i tried this dataset https://www.kaggle.com/purumalgi/music-genre-classification . i filled the nan values with their mean then i tried to fit it to logistic, svm, random forest clf, naive bayes and xgboost but i am only getting the accuracy of 0.52 in cross validation score

Music Genre Classification

Optimizing multi-class log loss to generalize well on unseen data

#

can someone tell me what i am doing wrong here, or was the dataset not meant to be classified easily ?
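One way to sanity-check a score like 0.52, sketched on synthetic data rather than the actual Kaggle set: put the mean imputation and scaling inside a scikit-learn pipeline so each cross-validation fold is preprocessed from its own training part, and compare against a majority-class baseline. The data shape (10 classes, 14 features, about 5% missing values) is an assumption chosen to loosely resemble the description in the chat.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the music data: 10 classes, some missing values.
X, y = make_classification(n_samples=1000, n_features=14, n_informative=8,
                           n_classes=10, n_clusters_per_class=1, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # knock out roughly 5% of the entries

# Imputation and scaling are refit on the training part of every fold.
model = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(),
                      LogisticRegression(max_iter=1000))
# Baseline that always predicts the most frequent class.
baseline = make_pipeline(SimpleImputer(strategy="mean"),
                         DummyClassifier(strategy="most_frequent"))

model_scores = cross_val_score(model, X, y, cv=5)
baseline_scores = cross_val_score(baseline, X, y, cv=5)
print("model   :", model_scores.mean())
print("baseline:", baseline_scores.mean())
```

With around ten classes, the baseline sits far below 0.5, so an accuracy near 0.52 may already carry real signal; the gap between the two numbers is more informative than the raw score.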

#

https://www.kaggle.com/muhammedjaabir/music-genre-clf , my notebook url

#

no one? 😐

lament topaz Aug 8, 2021, 11:33 AM

#

somber prism https://www.kaggle.com/muhammedjaabir/music-genre-clf , my notebook url

this link is 404

somber prism Aug 8, 2021, 11:33 AM

#

hmm

#

ok wait

lament topaz Aug 8, 2021, 11:34 AM

#

but Idk data science much,, just started 2 weeks ago!

#

I hope someone else will help!

somber prism Aug 8, 2021, 11:35 AM

#

hmm ok

grave frost Aug 8, 2021, 1:06 PM

#

somber prism can someone help me with this. i tried this dataset https://www.kaggle.com/purum...

tried DNN?

somber prism Aug 8, 2021, 1:06 PM

#

grave frost tried DNN?

dnn?

#

deep neural network ?

grave frost Aug 8, 2021, 1:12 PM

#

somber prism deep neural network ?

yes

somber prism Aug 8, 2021, 1:13 PM

#

idk i only know machine learning algo not deep learning

#

so i have to wait for it ig

mystic tinsel Aug 8, 2021, 1:23 PM

#

somber prism can someone help me with this. i tried this dataset https://www.kaggle.com/purum...

could you briefly explain your dataset?

somber prism Aug 8, 2021, 1:27 PM

#

mystic tinsel could you briefly explain your dataset?

this dataset is about classifying genre of the music , there are like 10 diff genre . it has features like Artist Name ,Track Name , Popularity , danceability , energy key, loudness , mode, speechiness , acousticness , instrumentalness , liveness , valence ,tempo , duration_in min/ms , time_signature

mystic tinsel Aug 8, 2021, 1:28 PM

#

oh alrighty, Ive also just begun ML so lemme see if i can understand this😅

somber prism Aug 8, 2021, 1:28 PM

#

you couldve simply clicked that kaggle link to see the dataset

mystic tinsel Aug 8, 2021, 1:29 PM

#

i did, i was kinda confused with the datalist

somber prism Aug 8, 2021, 1:29 PM

#

ohh

somber prism Aug 8, 2021, 1:29 PM

#

somber prism you couldve simply clicked that kaggle link to see the dataset

how long has it been since you started ?

mystic tinsel Aug 8, 2021, 1:30 PM

#

just a few weeks tbh

#

i havent really used kaggle either

somber prism Aug 8, 2021, 1:30 PM

#

ok

mystic tinsel Aug 8, 2021, 1:32 PM

#

im new to this, but if i may, are you using popularity as an input?

#

@somber prism

somber prism Aug 8, 2021, 1:41 PM

#

all the features except class cuz that one is the output variable ( the one needs to be predicted )

mystic tinsel Aug 8, 2021, 1:42 PM

#

somber prism this dataset is about classifying genre of the music , there are like 10 diff ge...

all the features in this except Artist name, Track Name?

somber prism Aug 8, 2021, 1:43 PM

#

yep

mystic tinsel Aug 8, 2021, 1:43 PM

#

i think you shouldnt use popularity as an input....

somber prism Aug 8, 2021, 1:43 PM

#

wym ?

mystic tinsel Aug 8, 2021, 1:43 PM

#

because it is insignificant right?

#

like what determines a genre is the rest of the features but not the popularity?

somber prism Aug 8, 2021, 1:46 PM

#

genre is dependent on every of those feature there but wait lemme try dropping the popularity and see if i get the improvement in score

mystic tinsel Aug 8, 2021, 1:46 PM

#

sure

#

ill do the same

somber prism Aug 8, 2021, 1:47 PM

#

maybe you are right cuz i do get a significant drop in accu due to popularity

#

someone correct me if i am wrong.

#

ok nvm

#

popularity is a important feature

mystic tinsel Aug 8, 2021, 1:49 PM

#

huh, well my bad

somber prism Aug 8, 2021, 1:49 PM

#

mystic tinsel huh, well my bad

its ok

mystic tinsel Aug 8, 2021, 1:49 PM

#

i cant seem to download the code, so i cant try it out myself...

#

processing seems to take a lot of time on kaggle

lapis sequoia Aug 8, 2021, 1:52 PM

#

Hey everyone, my name is Paras, and I recently started to learn ML from random resources on youtube and google. Can you please guide me about how and where to learn ML. Thank you in advance 🙂

somber prism Aug 8, 2021, 2:00 PM

#

lapis sequoia Hey everyone, my name is Paras, and recently started to learn ML from random res...

i started first by learning probability and statistics from khan academy --> ml by andrew ng --> how to create model using python by python engg ( just search some model name with python engineer ) --> started working with diff datasets and still doing

real dew Aug 8, 2021, 2:30 PM

#

If arr is a 3D numpy array, does arr[arr<90] return values in row-wise order as in arr?
In other words, is arr[arr<90] roughly equivalent to this:
(Not considering return type)

def values_below_90(arr):
    output = []
    for i in arr:
        for j in i:
            for k in j:
                if k < 90:
                    output.append(k)
    return output

serene scaffold Aug 8, 2021, 2:59 PM

#

!e

import numpy as np
arr = np.random.random((2, 2, 2))
print(arr)
print(arr[arr > .5])

arctic wedgeBOT Aug 8, 2021, 2:59 PM

#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

[[[0.74413612 0.56879201]
  [0.02811816 0.03907577]]

 [[0.78955903 0.69845739]
  [0.61117874 0.84977809]]]
[0.74413612 0.56879201 0.78955903 0.69845739 0.61117874 0.84977809]

serene scaffold Aug 8, 2021, 2:59 PM

#

@real dew you can use that to infer what the logic is.

#

looks to me that it's the same as if you had first reshaped the array into one dimension
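To check that claim directly rather than by eye, here is a small sketch comparing three routes: boolean indexing on the 3D array, flattening first and then filtering, and the nested loops from the question. NumPy's documentation describes boolean indexing as returning a 1-D array in row-major (C-style) order; the array values below are arbitrary.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4) * 10  # small 3D example, values 0..230
mask = arr < 90

via_mask = arr[mask]                   # boolean indexing on the 3D array
via_flat = arr.ravel()[mask.ravel()]   # flatten first (C order), then filter
via_loops = [int(k) for i in arr for j in i for k in j if k < 90]  # nested loops

print(via_mask)
print(np.array_equal(via_mask, via_flat))
print(via_mask.tolist() == via_loops)
```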

real dew Aug 8, 2021, 3:02 PM

#

Oh yeah
Thanks!!!

serene scaffold Aug 8, 2021, 3:02 PM

#

💚

glad radish Aug 8, 2021, 3:22 PM

#

Does anyone know a good book or course to learn about machine learning algos like xgboost, random forests, decision trees, etc

serene scaffold Aug 8, 2021, 3:24 PM

#

glad radish Does anyone know a good book or course to learn about machine learning algos lik...

try "data science from scratch". read the most recent edition unless you're already advanced with python.

glad radish Aug 8, 2021, 3:31 PM

#

serene scaffold try "data science from scratch". read the most recent edition unless you're alre...

is it the one written by joel grus?

serene scaffold Aug 8, 2021, 3:31 PM

#

glad radish is it the one written by joel grus?

I believe so

glad radish Aug 8, 2021, 3:32 PM

#

serene scaffold I believe so

thanks! I'll check it out

serene scaffold Aug 8, 2021, 3:40 PM

#

glad radish thanks! I'll check it out

If you attend a university, see if it's in their online library

soft viper Aug 8, 2021, 3:42 PM

#

any monte carlo youtube that i can watch? Trying to get into simulation to fill up my holiday and monte carlo seems to be the buzzword so might start with there

fathom ruin Aug 8, 2021, 4:16 PM

#

hay guys i want to take the informational fact from a paragraph and put it as bullet points what should i start with?

somber prism Aug 8, 2021, 4:23 PM

#

have any of you tried to predict 2 variables ?

#

or we have to do it separately by predicting first var then the next var by choosing that as a predicting var

#

?

unborn glacier Aug 8, 2021, 4:33 PM

#

You can predict 2 or more variables using multiple linear regression

uncut orbit Aug 8, 2021, 4:49 PM

#

values

serene scaffold Aug 8, 2021, 4:54 PM

#

fathom ruin hay guys i want to take the **informational fact from a paragraph** and put it a...

Can you give an example paragraph and what you want to extract?

fathom ruin Aug 8, 2021, 4:56 PM

#

yeah sure just a sec

fathom ruin Aug 8, 2021, 4:59 PM

#

serene scaffold Can you give an example paragraph and what you want to extract?

Sample paragraph :

```
The inflated style itself is a kind of euphemism. A mass of Latin words falls upon the facts like soft snow, blurring the outline and covering up all the details. The great enemy of clear language is insincerity. When there is a gap between one’s real and one’s declared aims, one turns as it were instinctively to long words and exhausted idioms, like a cuttlefish spurting out ink. In our age there is no such thing as ‘keeping out of politics.’ All issues are political issues, and politics itself is a mass of lies, evasions, folly, hatred, and schizophrenia. When the general atmosphere is bad, language must suffer. I should expect to find — this is a guess which I have not sufficient knowledge to verify — that the German, Russian and Italian languages have all deteriorated in the last ten or fifteen years, as a result of dictatorship
```

Like some of the points should be

*The inflated style itself is a kind of euphemism
*The great enemy of clear language is insincerity
*All issues are political issues, and politics itself is a mass of lies, evasions, folly, hatred, and schizophrenia.

serene scaffold Aug 8, 2021, 5:00 PM

#

@fathom ruin let me get back to you on this. It's an interesting question.

fathom ruin Aug 8, 2021, 5:01 PM

#

serene scaffold @fathom ruin let me get back to you on this. It's an interesting questi...

sure

#

thanks btw 🙂

#

ping me when u find any info on this 😄 i am trying to find things as well

serene scaffold Aug 8, 2021, 5:04 PM

#

@fathom ruin just so we're clear, you're just trying to classify which sentences do or do not have true/false statements, yes? You're not trying to determine if the statement is actually true?

fathom ruin Aug 8, 2021, 5:04 PM

#

serene scaffold @fathom ruin just so we're clear, you're just trying to classify which ...

yep i am not trying to determine if the statement is actually true

serene scaffold Aug 8, 2021, 5:05 PM

#

Okay great. That greatly simplifies the problem lemon_long

fathom ruin Aug 8, 2021, 5:05 PM

#

Shy i am trying to do this for hours 😂 and here you be like its a piece of cake lol

#

I literally have 0 idea on which way to get the points

serene scaffold Aug 8, 2021, 5:06 PM

#

I didn't say it's easy, it's just easier than detecting misinformation.

fathom ruin Aug 8, 2021, 5:06 PM

#

😅 yeah k nice lol

serene scaffold Aug 8, 2021, 5:08 PM

#

I'm trying to identify what sets your three bullet points apart from the sentences that aren't of interest. The last one contains a lot of opinions.

fathom ruin Aug 8, 2021, 5:09 PM

#

i mean it was just a example

#

maybe we should just remove the opinion-heavy points, do the remaining ones, and figure those out later?

serene scaffold Aug 8, 2021, 5:10 PM

#

One thing they all have in common though: the subjects of each sentence are third person nouns that aren't people, and the verb is a form of "to be". You might actually be able to solve this with rules.

fathom ruin Aug 8, 2021, 5:10 PM

#

hmm thats something i didnt know

#

i used a package to separate the verbs, nouns and other word types in a paragraph

serene scaffold Aug 8, 2021, 5:11 PM

#

Spacy?

fathom ruin Aug 8, 2021, 5:11 PM

#

but it gave the verb WORDS, i am not sure how to get the sentence

fathom ruin Aug 8, 2021, 5:11 PM

#

serene scaffold Spacy?

yep

serene scaffold Aug 8, 2021, 5:11 PM

#

Yay!

fathom ruin Aug 8, 2021, 5:12 PM

#

so like where should i go next 🤔

serene scaffold Aug 8, 2021, 5:12 PM

#

!paste

arctic wedgeBOT Aug 8, 2021, 5:12 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

fathom ruin Aug 8, 2021, 5:12 PM

#

how do i get the sentence where the verb is?

serene scaffold Aug 8, 2021, 5:12 PM

#

Can you show the code?

fathom ruin Aug 8, 2021, 5:12 PM

#

sure

#

https://paste.pythondiscord.com/minewahoqe.properties this is not only spacy, i am also cutting up my own passage with nltk

#

if u want i will just remove those and keep the spacy alone

#

https://paste.pythondiscord.com/amenubilef.lua this is without the previous code and just spacy

fathom ruin Aug 8, 2021, 5:16 PM

#

serene scaffold Can you show the code?

but this just returns all the verbs

#

not as a "sentence"

serene scaffold Aug 8, 2021, 5:23 PM

#

fathom ruin but this just returns all the verbs

spacy can divide it up into sentences for you.

fathom ruin Aug 8, 2021, 5:23 PM

#

how?

serene scaffold Aug 8, 2021, 5:24 PM

#

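import spacy

# requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')
doc = nlp('pretend this is a long paragraph with multiple sentences.')
for sentence in doc.sents:
    # each sentence is a Span; do stuff with it here
    print(sentence.text)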

fathom ruin Aug 8, 2021, 5:25 PM

#

🤔 let me try that

fathom ruin Aug 8, 2021, 5:26 PM

#

serene scaffold ```py import spacy nlp = spacy.load('en_core_web_sm') doc = nlp('pretend this is...

but this just splits the paragraph by . should i check for verb and split?

serene scaffold Aug 8, 2021, 5:28 PM

#

fathom ruin but this just splits the paragraph by . should i check for verb and split?

it doesn't just split it by punctuation, no. it would, for example, know that "I went to Dr. Johnson because I was sick." is one sentence and not two.

fathom ruin Aug 8, 2021, 5:30 PM

#

fathom ruin Sample paragraph : ``` The inflated style itself is a kind of euphemism. A mass ...

oh wait yeah
for that original para, it gave me this

*The inflated style itself is a kind of euphemism.
*A mass of Latin words falls upon the facts like soft snow, blurring the outline and covering up all the details.
*The great enemy of clear language is insincerity.

#

think its working 😄

#

Great!

#

thank you so much 😄

serene scaffold Aug 8, 2021, 5:30 PM

#

lemon_hyperpleased

fathom ruin Aug 8, 2021, 5:31 PM

#

serene scaffold lemon_hyperpleased

totally not related to this but when writing a text file is there a way to format it? now it just writes it all in a single line

#

for point i can do \n before every points

#

but for just normal paragraph

serene scaffold Aug 8, 2021, 5:32 PM

#

fathom ruin but for just normal paragraph

' '.join(list_of_strings)
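A minimal sketch of how that can look when writing a file: join the sentences into one paragraph, then break it into lines of a fixed width before writing. `textwrap.fill` is an addition here (it is in the standard library but was not suggested in the chat), and the file name and width are arbitrary choices.

```python
import tempfile
import textwrap
from pathlib import Path

list_of_strings = [
    "The inflated style itself is a kind of euphemism.",
    "The great enemy of clear language is insincerity.",
    "When the general atmosphere is bad, language must suffer.",
]

paragraph = " ".join(list_of_strings)         # one long line
wrapped = textwrap.fill(paragraph, width=60)  # same words, line breaks every <= 60 chars

out_path = Path(tempfile.gettempdir()) / "summary_example.txt"  # hypothetical file name
out_path.write_text(wrapped + "\n", encoding="utf-8")
print(out_path.read_text(encoding="utf-8"))
```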

fathom ruin Aug 8, 2021, 5:32 PM

#

ohk thank you again 😄

somber prism Aug 8, 2021, 5:49 PM

#

unborn glacier You can predict 2 or more variables using multiple linear regression

i want it for classification
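For the classification case, one option is scikit-learn's `MultiOutputClassifier`, which fits one copy of a base classifier per target column. The sketch below runs on synthetic data; the two label columns are invented purely to have something to predict.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import MultiOutputClassifier

# Synthetic features plus two made-up binary targets.
X, y_first = make_classification(n_samples=200, n_features=6, random_state=0)
y_second = (X[:, 0] > 0).astype(int)
Y = np.column_stack([y_first, y_second])  # shape (200, 2): one column per target

# One LogisticRegression is fitted independently for each column of Y.
clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, Y)

predictions = clf.predict(X[:5])  # shape (5, 2): both targets at once
print(predictions)
```

The other idea raised above, predicting the first variable and then feeding it in as a feature for the second, roughly corresponds to `sklearn.multioutput.ClassifierChain`. Which works better depends on how strongly the targets depend on each other.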

glad mulch Aug 8, 2021, 6:05 PM

#

if i have a df with missing values (Firm Age), how am i able to make the value = previous one + 1

#

i was thinking of ffill

#

but im not sure

serene scaffold Aug 8, 2021, 6:31 PM

#

glad mulch if i have a df with missing values (Firm Age), how am i able to make the value =...

did you look into .interpolate?

sour spindle Aug 8, 2021, 6:36 PM

#

hi. i would like to know how to make a tensorflow input layer for a dataset which is like this [[1,2,3,4,5,6,7], [1,2,3,4,5,6,7]].

#

i need to input each element of the subset into a node

lapis sequoia Aug 8, 2021, 6:44 PM

#

@glad mulch you can use .shift(1)

grand lion Aug 8, 2021, 6:44 PM

#

How do you form a sentence from ngrams

lapis sequoia Aug 8, 2021, 7:57 PM

#

What does this mean?
train_metrics = pd.DataFrame({'MAE': mae_train, 'MSE': mse_train, 'RMSE': rmse_train})
train_metrics.reset_index(drop=True, inplace=True)
train_metrics.head(10)

#

#

I'm getting an error when I want to pass the results of the SVR model

merry ridge Aug 8, 2021, 8:11 PM

#

I'm trying to help a friend's son with their machine learning homework and they are using a technique to estimate a PMF I don't think I've seen before.

#

They are constructing a probability distribution by taking emails that are categorized as spam, and creating a frequency histogram of every word that appears in each email. Then generating a probability by taking each word's frequency and dividing by the total number of word occurrences

#

But instead of doing just that, they are adding 1 to each frequency and dividing by the sum plus the number of distinct words to account for the addition of those 1s

#

It seems like some sort of finite population correction factor. The resources this student was provided are riddled with typos everywhere and I don't know why an "n+1" correction in this manner makes sense

#

Put in another way, if their probability distribution for their data is p = (x_1/n, x_2/n, ....., x_m/n). They adjust it to p = ((x_1 + 1)/(n+m), ....., (x_m + 1)/(n+m))
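The adjustment described here matches what is usually called add-one (Laplace) smoothing. A common motivation, assuming the estimates are later multiplied together as in a naive Bayes spam filter, is that a word never seen in the spam set would otherwise get probability zero and wipe out the whole product. The sketch below implements the formula exactly as written above; the word list and vocabulary are made up.

```python
from collections import Counter


def smoothed_pmf(words, vocabulary):
    """Return p(w) = (x_w + 1) / (n + m) for every word w in the vocabulary."""
    counts = Counter(words)
    n = sum(counts[w] for w in vocabulary)  # total occurrences of vocabulary words
    m = len(vocabulary)                     # number of distinct words
    return {w: (counts[w] + 1) / (n + m) for w in vocabulary}


spam_words = "free money free offer".split()          # toy spam corpus
vocabulary = ["free", "money", "offer", "meeting"]    # "meeting" never appears in spam

pmf = smoothed_pmf(spam_words, vocabulary)
print(pmf)
print(sum(pmf.values()))
```

Adding `m` to the denominator is what keeps the probabilities summing to one after each of the `m` counts has been increased by one.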

lapis sequoia Aug 8, 2021, 8:53 PM

#

lapis sequoia What does this mean? ```train_metrics = pd.DataFrame({'MAE': mae_train, 'MSE': m...

mae_train etc. are scalars, should be lists.
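A short sketch of that point, with made-up metric values: building a DataFrame from a dict whose values are all scalars raises a `ValueError`, while wrapping each value in a list gives a one-row frame.

```python
import pandas as pd

mae_train, mse_train, rmse_train = 0.42, 0.25, 0.5  # hypothetical scalar metrics

# All-scalar dict: pandas cannot infer how many rows to create.
try:
    pd.DataFrame({"MAE": mae_train, "MSE": mse_train, "RMSE": rmse_train})
except ValueError as err:
    print("scalar values fail:", err)

# Wrapping each scalar in a list yields a single-row DataFrame.
train_metrics = pd.DataFrame(
    {"MAE": [mae_train], "MSE": [mse_train], "RMSE": [rmse_train]}
)
print(train_metrics)
```

Passing `index=[0]` together with the scalar dict is an equivalent alternative.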

glad mulch Aug 8, 2021, 9:05 PM

#

serene scaffold did you look into `.interpolate`?

problem with that is that if there is a NaN after with a different security name it would also interpolate to that right
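One way around the cross-security leak, as a sketch: do the fill per security with `groupby(...).transform(...)`, and inside each group add the number of rows since the last known age to the forward-filled value. The column name `Security` and the helper `count_up_from_last_known` are assumptions for illustration; only `Firm Age` comes from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Security": ["A", "A", "A", "B", "B", "B"],   # hypothetical grouping column
    "Firm Age": [1, np.nan, np.nan, np.nan, 5, np.nan],
})


def count_up_from_last_known(ages):
    """Fill gaps with previous known value + number of rows since it."""
    run_id = ages.notna().cumsum()           # new run starts at each known value
    steps = ages.groupby(run_id).cumcount()  # 0 at the known value, then 1, 2, ...
    return ages.ffill() + steps              # leading gaps stay NaN


df["Firm Age"] = df.groupby("Security")["Firm Age"].transform(count_up_from_last_known)
print(df)
```

Because the transform runs separately for each security, a gap at the start of security B is left as NaN instead of inheriting a value from security A.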

thorn bobcat Aug 8, 2021, 9:06 PM

#

anyone here work with NLP?

#

I'm trying to create a fairly decent chatbot in arabic

acoustic halo Aug 8, 2021, 9:07 PM

#

@thorn bobcat there's a few, just ask the question

thorn bobcat Aug 8, 2021, 9:08 PM

#

thorn bobcat I'm trying to create a fairly decent chatbot in arabic

this is the question

#

this is how arabic looks like
أريد قلما.

#

first of all it'll have to be tokenized in reverse instinctively.

#

but how would a transformer even work with arabic?

#

cause the sentences don't have a clear set structure

serene scaffold Aug 8, 2021, 9:10 PM

#

@thorn bobcat what do you mean that they don't have structure?

acoustic halo Aug 8, 2021, 9:10 PM

#

Depending on what you want to do, you could just use a pretrained Arabic model and save yourself some time

thorn bobcat Aug 8, 2021, 9:11 PM

#

serene scaffold @thorn bobcat what do you mean that they don't have structure?

This is because it is different from most languages in a few ways.

It is written from right to left.
It uses its own set of characters that are unrecognizable to speakers of other languages.
Vowels are omitted when it’s written. It has a complex and rich grammatical structure, for example, pronouns are embedded in the words themselves in many cases.
It is much more fluid than most other languages as sentences don’t conform to the subject-verb order that is typical of English.
All of this makes it harder to learn and leads to a larger risk of ambiguity than would exist in most other common languages.

serene scaffold Aug 8, 2021, 9:12 PM

#

@thorn bobcat I don't agree with your assessment that these properties make it harder

thorn bobcat Aug 8, 2021, 9:12 PM

#

Our largest model, ARAGPT2-MEGA, has 1.46 billion parameters, which makes it the largest Arabic language model available.
https://arxiv.org/pdf/2012.15520.pdf

acoustic halo Aug 8, 2021, 9:12 PM

#

I can't pretend to know anything about Arabic but I don't see why any of that means it's not modelable

thorn bobcat Aug 8, 2021, 9:12 PM

#

serene scaffold @thorn bobcat I don't agree with your assessment that these properties m...

it was actually some sites assessment.

acoustic halo Aug 8, 2021, 9:14 PM

#

Anyway what sort of chat bot do you want to make, something conversational, question answering etc?

thorn bobcat Aug 8, 2021, 9:14 PM

#

something conversational

#

able to answer philosophical questions

#

and give legal advice

#

ARAGPT2 is a stacked transformer-decoder model trained using the causal language modeling objective. The model is trained on 77GB of Arabic text

#

I really wanna learn about transformers but don't know where to start really..

grand lion Aug 8, 2021, 9:19 PM

#

Once you have the ngrams and the most common frequency, how do you form a sentence from them?

#

I can't seem to find anything on S.O

#

(Using NLTK btw)

acoustic halo Aug 8, 2021, 9:25 PM

#

@thorn bobcat https://arxiv.org/abs/1706.03762 is the best place to start on learning transformers

arXiv.org

Attention Is All You Need

The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks in an encoder-decoder configuration. The best
performing models also connect the encoder...

#

For specifically making a chat bot, you are best to use a pretrained model as opposed to making your own

acoustic halo Aug 8, 2021, 9:27 PM

#

grand lion Once you have the ngrams and the most common frequency, how do you form a senten...

What do you mean make sentences?

grand lion Aug 8, 2021, 9:27 PM

#

Create sentences from ngrams
There used to be a method in NLTK called generate but it's deprecated now
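The basic idea can be sketched without any NLTK-specific helper: build a table of which word follows which, then repeatedly append the most frequent follower of the last word. This is plain Python, not NLTK's API, and the function names `build_bigram_table` and `generate_sentence` are made up for this example. Tokens from `nltk.word_tokenize` could be fed in the same way.

```python
from collections import Counter, defaultdict


def build_bigram_table(tokens):
    """Map each word to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for first, second in zip(tokens, tokens[1:]):
        table[first][second] += 1
    return table


def generate_sentence(table, start, max_words=10):
    """Greedy generation: always pick the most common next word."""
    words = [start]
    while len(words) < max_words:
        followers = table.get(words[-1])
        if not followers:  # dead end: nothing ever followed this word
            break
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


tokens = "to be or not to be".split()
table = build_bigram_table(tokens)
print(generate_sentence(table, "to", max_words=6))
```

Always taking the top follower tends to loop on real text. Sampling the next word in proportion to its count (for example with `random.choices`) usually gives more varied sentences.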

#

Stelercus might know since he recommended ngrams to me yesterday but overall I've been trying to search for the answer but for some reason there's zero answers on it

acoustic halo Aug 8, 2021, 9:34 PM

#

Where are you getting your generate from, I can see it in the nltk.text module fine, no deprecation warnings

grand lion Aug 8, 2021, 9:35 PM

#

acoustic halo Where are you getting your generate from, I can see it in the nltk.text module f...

nltk.models.ngrams or something along the lines of that

acoustic halo Aug 8, 2021, 9:40 PM

#

I can't find any modules with that name or similar locally or in the docs

#

Nevermind it is in an old version

serene scaffold Aug 8, 2021, 9:43 PM

#

thorn bobcat This is because it is different from most languages in a few ways. It is wr...

People who insist that a certain language is especially "complex and rich" as compared to other languages usually have ulterior motives. Every language is complex and rich.

#

And there's nothing special about subject-verb-object word order.

#

Anyway, I think transformers should work just as well for Arabic.

sick furnace Aug 8, 2021, 10:01 PM

#

https://stackoverflow.com/questions/68705057/cleaning-pandas-column-on-specific-data-type

I was hoping I could get help on this question I posted

Stack Overflow

Cleaning Pandas column on specific data type

I'm trying to clean some columns and there are a few things I've encountered.
There is a bunch of sales data with binary values for each product.
Attached is a sample of the dataset.
The process ha...

thorn bobcat Aug 8, 2021, 10:32 PM

#

serene scaffold People who insist that a certain language is especially "complex and rich" as co...

could be true because i got this from a link to website marketing their nlp toolkit

grave frost Aug 8, 2021, 11:12 PM

#

thorn bobcat ARAGPT2 is a stacked transformer-decoder model trained using the causal language...

prolly just fine-tune it on your dataset

thorn bobcat Aug 8, 2021, 11:12 PM

#

so I looked it up and someone told me it'll be computationally expensive to train a model from scratch

grave frost Aug 8, 2021, 11:13 PM

#

https://tenor.com/view/yes-dog-indeed-nod-gif-10818519

Tenor

thorn bobcat Aug 8, 2021, 11:13 PM

#

https://github.com/aub-mind/arabert I want to work on this

GitHub

GitHub - aub-mind/arabert: Pre-trained Transformers for the Arabic ...

Pre-trained Transformers for the Arabic Language Understanding and Generation (Arabic BERT, Arabic GPT2, Arabic Electra) - GitHub - aub-mind/arabert: Pre-trained Transformers for the Arabic Languag...

#

wanted to do something like this but I understand now it'll cost alot to do it from scratch

#

So I'd like to take what they did and improve it.

grave frost Aug 8, 2021, 11:14 PM

#

thorn bobcat So I'd like to take what they did and improve it.

how?

thorn bobcat Aug 8, 2021, 11:15 PM

#

grave frost how?

I'd like to train it on ancient scriptures

#

make him more inclined to use the new data over the old data although the old data would still exist.

#

I'd like to also give him a face and apply first order motion.

grave frost Aug 8, 2021, 11:15 PM

#

thorn bobcat I'd like to also give him a face and apply first order motion.

what?

grave frost Aug 8, 2021, 11:15 PM

#

thorn bobcat I'd like to train it on ancient scriptures

that's oddly niche - better fine tune it

#

because there aren't enough ancient scriptures to constitute a sizeable amount for training

thorn bobcat Aug 8, 2021, 11:16 PM

#

grave frost that's oddly niche - better fine tune it

what is fine tuning? how do I begin to grasp this. From the most basic level to complex concepts.

grave frost Aug 8, 2021, 11:16 PM

#

thorn bobcat what is fine tuning? how do I begin to grasp this. From the most basic level to ...

check out a few articles, start with the official paper

grave frost Aug 8, 2021, 11:17 PM

#

thorn bobcat what is fine tuning? how do I begin to grasp this. From the most basic level to ...

I'd like to also give him a face and apply first order motion.
though I think you want to do something else? can you elaborate on this?

thorn bobcat Aug 8, 2021, 11:17 PM

#

grave frost > I'd like to also give him a face and apply first order motion. though I think ...

I want to generate a voice for this agent.

#

and a face that has lips that move matching the words

grave frost Aug 8, 2021, 11:18 PM

#

ahh, that's pretty easy

#

but why ancient scriptures?

thorn bobcat Aug 8, 2021, 11:18 PM

#

grave frost but why ancient scriptures?

well ancient scriptures and law.

#

Idk would seem fascinating talking with an ancient mid eastern philosopher.

grave frost Aug 8, 2021, 11:19 PM

#

thorn bobcat well ancient scriptures and law.

assuming you know arabic, fine-tuning is your best bet

#

unless you have a ton of data and compute

thorn bobcat Aug 8, 2021, 11:19 PM

#

grave frost assuming you know arabic, fine-tuning is your best bet

I know arabic

thorn bobcat Aug 8, 2021, 11:19 PM

#

grave frost unless you have a ton of data and compute

what do you mean by that?

#

assume I got a corpus of about 100 movie subtitles and a 1000 books for starters with an average of 150 pages.

grave frost Aug 8, 2021, 11:20 PM

#

thorn bobcat what do you mean by that?

as in a few hundred GBs of data, and few hundred GPUs

grave frost Aug 8, 2021, 11:20 PM

#

thorn bobcat assume I got a corpus of about 100 movie subtitles and a 1000 books for starters...

that's small - you can only fine tune it

thorn bobcat Aug 8, 2021, 11:20 PM

#

grave frost as in a few hundred GBs of data, and few hundred GPUs

I am gonna be using the free version of google collab.

grave frost Aug 8, 2021, 11:20 PM

#

thorn bobcat I am gonna be using the free version of google collab.

should be enough for fine-tuning

#

maybe

#

do you have a CPU with huge RAM?

thorn bobcat Aug 8, 2021, 11:22 PM

#

its free collab? idk random

#

Each user is currently allocated 12 GB of RAM

#

As of October 13, 2018, Google Colab provides a single 12GB NVIDIA Tesla K80 GPU that can be used up to 12 hours continuously.

grave frost Aug 8, 2021, 11:23 PM

#

no, on your own PC

thorn bobcat Aug 8, 2021, 11:23 PM

#

4gb ram

grave frost Aug 8, 2021, 11:23 PM

#

well, leave it and use Colab then

thorn bobcat Aug 8, 2021, 11:25 PM

#

do I need to understand about transformers to fine tune it?

grave frost Aug 8, 2021, 11:25 PM

#

yes

#

you need to understand a lot of things to do something

thorn bobcat Aug 8, 2021, 11:44 PM

#

grave frost you need to understand a lot of things to do something

I want to understand them at the most basic level

#

I meant transformers and multi-headed attention

#

self- attention

#

that kind of stuff

serene scaffold Aug 9, 2021, 12:49 AM

#

thorn bobcat I want to understand them at the most basic level

to make it very basic, they represent words in way that takes context into account.

#

I might be conflating transformers with BERT a bit.

thorn bobcat Aug 9, 2021, 1:23 AM

#

serene scaffold to make it very basic, they represent words in way that takes context into accou...

```
self.fc1 = nn.Linear(input_size, 50)
self.fc2 = nn.Linear(50, input_size)
```

#

anyone know the naming convention? used

#

why fc?

serene scaffold Aug 9, 2021, 1:31 AM

#

thorn bobcat why fc?

it stands for "fully connected"

velvet thorn Aug 9, 2021, 1:33 AM

#

thorn bobcat why fc?

fully connected, because every neuron in a layer is connected to every neuron in the preceding and following layers
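To make "fully connected" concrete, here is a NumPy sketch of what two such layers compute. It stands in for PyTorch's `nn.Linear`, which applies `y = x @ W.T + b`. The hidden size 50 follows the snippet above, while `input_size = 4`, the batch of 3, and the random weights are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 50  # 50 matches the snippet above; input_size is arbitrary

# One weight per (output neuron, input feature) pair: that is the "fully connected" part.
W1 = rng.normal(size=(hidden_size, input_size))
b1 = np.zeros(hidden_size)
W2 = rng.normal(size=(input_size, hidden_size))
b2 = np.zeros(input_size)

x = rng.normal(size=(3, input_size))  # batch of 3 examples
h = x @ W1.T + b1                     # what fc1 computes (before any activation)
out = h @ W2.T + b2                   # what fc2 computes
print(h.shape, out.shape)
```

Every entry of `h` depends on every entry of `x` through its own weight, which is why no input is left unconnected to any hidden neuron.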

serene scaffold Aug 9, 2021, 1:34 AM

#

gm, you might be interested to know, I recently got a data science-related position with a large US company. I have absolutely no idea how. I must have deceived them.

thorn bobcat Aug 9, 2021, 1:34 AM

#

so I'm trying to do the MNEST classification challenge but I want to do it with https://www.kaggle.com/salmaneunus/rock-classification or https://github.com/morrisfranken/glyphreader

velvet thorn Aug 9, 2021, 1:35 AM

#

serene scaffold gm, you might be interested to know, I recently got a data science-related posit...

congratulations! 🔥

thorn bobcat Aug 9, 2021, 1:35 AM

#

can someone tell me which would be easier to do?

velvet thorn Aug 9, 2021, 1:35 AM

#

I'm actually planning on applying to a US university for a master's degree

#

doing research now 😔

thorn bobcat Aug 9, 2021, 1:35 AM

#

serene scaffold gm, you might be interested to know, I recently got a data science-related posit...

fake it till you make it.

serene scaffold Aug 9, 2021, 1:36 AM

#

velvet thorn I'm actually planning on applying to a US university for a master's degree

you: I'm applying to a master's program
everyone: you don't have a masters?

velvet thorn Aug 9, 2021, 1:36 AM

#

I don't even have a bachelor's in CS

#

or anything related 😔

velvet thorn Aug 9, 2021, 1:37 AM

#

thorn bobcat so I'm trying to do the MNEST classification challenge but I want to do it with ...

did you mean MNIST?

thorn bobcat Aug 9, 2021, 1:37 AM

#

velvet thorn did you mean MNIST?

yea

thorn bobcat Aug 9, 2021, 1:38 AM

#

velvet thorn I don't even have a bachelor's in CS

shouldn't stop you.

velvet thorn Aug 9, 2021, 1:38 AM

#

thorn bobcat yea

uh

#

well by definition

#

MNIST is done with the MNIST dataset...

#

do you mean like

#

you want to perform a similar task (multiclass image classification) and are asking which dataset might be better to work with?

sour spindle Aug 9, 2021, 1:39 AM

#

my tensorflow results are all in lists like this [result] how do i make them floats so i can see the accuracy?

velvet thorn Aug 9, 2021, 1:40 AM

#

sour spindle my tensorflow results are all in lists like this [result] how do i make them flo...

what do you mean results

#

like the predictions?

sour spindle Aug 9, 2021, 1:40 AM

#

yeah

velvet thorn Aug 9, 2021, 1:40 AM

#

so you have like

sour spindle Aug 9, 2021, 1:40 AM

#

its just one float in a list

velvet thorn Aug 9, 2021, 1:40 AM

#

show code

sour spindle Aug 9, 2021, 1:41 AM

#

all of it or just the result

#

its around 120 lines

velvet thorn Aug 9, 2021, 1:41 AM

#

just the result

sour spindle Aug 9, 2021, 1:41 AM

#

predictions:[[47.402496]
[47.278564]
[47.387936]
[47.897003]
[48.52338 ]
[48.993202]
[49.162148]
[49.390816]
[49.802197]
[49.949066]
[50.186504]
[50.12692 ]
[50.034527]
[49.935844]
[49.875698]]

actual:[47.11750031 47.18000031 47.48749924 47.81000137 48.50500107 48.83750153
48.92250061 49.25 50.02500153 49.875 50.15499878 49.73749924
49.71749878 49.80749893 49.8125 ]

#

the predictions is the problem

velvet thorn Aug 9, 2021, 1:42 AM

#

ah

#

so it's just a question of shape

#

you want .reshape

#

are those numpy arrays?

thorn bobcat Aug 9, 2021, 1:43 AM

#

velvet thorn you want to perform a similar task (multiclass image classification) and are ask...

yea with the same code used for mnist

sour spindle Aug 9, 2021, 1:43 AM

#

yeah

velvet thorn Aug 9, 2021, 1:43 AM

#

you can also use .ravel() or .flat

#

or maybe it's .flat() it's been a long time

#

since I worked with numpy

#

something like that

sour spindle Aug 9, 2021, 1:43 AM

#

will that also allow the accuracy number to show?

velvet thorn Aug 9, 2021, 1:43 AM

#

!e

import numpy as np

a = np.array([[1, 2, 3]])
b = np.array([[4], [5], [6]])

print(a - b.flat)
print(a - b.ravel())

arctic wedgeBOT Aug 9, 2021, 1:43 AM

#

@velvet thorn :white_check_mark: Your eval job has completed with return code 0.

001 | [[-3 -3 -3]]
002 | [[-3 -3 -3]]

velvet thorn Aug 9, 2021, 1:44 AM

#

there we go

#

you get the idea?
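
For the `(n, 1)` predictions posted above, here is a minimal sketch of the flattening step. The three values are copied from that output and the variable names are made up for illustration. Note that `.flat` is an attribute (an iterator over the array), while `.ravel()` and `.flatten()` are methods.

```python
import numpy as np

# Hypothetical stand-in for the model output: one prediction per row, shape (3, 1)
predictions = np.array([[47.402496], [47.278564], [47.387936]])
actual = np.array([47.11750031, 47.18000031, 47.48749924])

flat_predictions = predictions.ravel()    # shape (3,)
same_thing = predictions.reshape(-1)      # also shape (3,)

# With matching shapes the subtraction is element-wise.
errors = flat_predictions - actual

# Without flattening, (3, 1) - (3,) broadcasts to a (3, 3) grid,
# which is a common source of nonsense metrics.
broadcast = predictions - actual

print(flat_predictions.shape, errors.shape, broadcast.shape)
```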

sour spindle Aug 9, 2021, 1:44 AM

#

yeah

#

will that also allow the accuracy number to show?

velvet thorn Aug 9, 2021, 1:44 AM

#

sour spindle will that also allow the accuracy number to show?

that depends on how you're calculating it

#

that looks like regression though

#

so why are you talking about accuracy?

sour spindle Aug 9, 2021, 1:45 AM

#

i want to see the accuracy but since its in that nested list format the accuracy number is 000000e00

sour spindle Aug 9, 2021, 1:46 AM

#

velvet thorn so why are you talking about accuracy?

i need it since i have to document the prediction accuracy

velvet thorn Aug 9, 2021, 1:46 AM

#

sour spindle i need it since i have to document the prediction accuracy

but the point is

#

accuracy is a thing of classification

#

you're doing regression, right?

sour spindle Aug 9, 2021, 1:49 AM

#

velvet thorn you're doing regression, right?

i dont really know since this is the first network i made from scratch and i am predicting stock prices

velvet thorn Aug 9, 2021, 1:54 AM

#

sour spindle i dont really know since this is the first network i made from scratch and i am ...

do you know the difference between classification and regression

#

?

sour spindle Aug 9, 2021, 1:54 AM

#

velvet thorn do you know the difference between classification and regression

i know what classification is. but not regression

velvet thorn Aug 9, 2021, 1:55 AM

#

sour spindle i know what classification is. but not regression

okay

#

so like

#

classification involves discrete outcomes

#

e.g.

#

is this person positive or negative for COVID

sour spindle Aug 9, 2021, 1:55 AM

#

ok

velvet thorn Aug 9, 2021, 1:56 AM

#

regression involves continuous outcomes

#

in this case, stock prices

#

because like

#

it's not just 1 or 0, right

#

it can vary continuously from 0 all the way up to infinity (theoretically)

#

so what you're doing is regression

sour spindle Aug 9, 2021, 1:56 AM

#

oh ok

velvet thorn Aug 9, 2021, 1:56 AM

#

accuracy is the % of correct predictions

#

but that doesn't make sense for regression, right?

#

say you predict 45

#

and the actual price is 40

sour spindle Aug 9, 2021, 1:56 AM

#

yeah u are right

velvet thorn Aug 9, 2021, 1:56 AM

#

you're wrong, but how wrong you are matters

#

that's a lot better than predicting 500, right?

#

so we don't use accuracy for regression

#

there are other metrics

#

the most common is RMSE

#

root mean square error

sour spindle Aug 9, 2021, 1:57 AM

#

like loss

velvet thorn Aug 9, 2021, 1:57 AM

#

you can Google that

velvet thorn Aug 9, 2021, 1:57 AM

#

sour spindle like loss

yes but no

#

loss is a general term

#

that tells the model "how wrong" its prediction is

#

it can apply to classification too

#

so there are different loss functions

#

depending on your task

sour spindle Aug 9, 2021, 1:58 AM

#

ok then i will find some in the docs

#

i will use mape

#

now it seems to work like a charm. thanks
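
A minimal sketch of the two regression metrics mentioned above (RMSE and MAPE), computed with plain NumPy. The numbers are the first three prediction/actual pairs from the output posted earlier; the variable names are illustrative, not taken from the original script.

```python
import numpy as np

predicted = np.array([47.402496, 47.278564, 47.387936])
actual = np.array([47.11750031, 47.18000031, 47.48749924])

errors = predicted - actual

# Root mean square error: same units as the target (here, price)
rmse = np.sqrt(np.mean(errors ** 2))

# Mean absolute percentage error: relative size of the error, in percent
mape = np.mean(np.abs(errors / actual)) * 100

print(f"RMSE: {rmse:.4f}")
print(f"MAPE: {mape:.3f}%")
```

Both only make sense once the predictions have been flattened to the same shape as the actual values.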

velvet thorn Aug 9, 2021, 2:02 AM

#

yw! 👋

thorn bobcat Aug 9, 2021, 2:30 AM

#

anyone want to help me with the task of classifying hieroglyphics?

sick furnace Aug 9, 2021, 2:40 AM

#

I'm trying to create a function but I'm having trouble making it work correctly

df = df[(df['TVStandWallMount'] == 0) | (df['TVStandWallMount'] == 1)]

def clean_int_col(df, col):
    df = df[(df[col] == 0) | (df[col] == 1)]
    return df

velvet thorn Aug 9, 2021, 2:43 AM

#

sick furnace I'm trying to create a function but I'm having trouble setting making it correct...

what do you want to do?

sick furnace Aug 9, 2021, 2:46 AM

#

I have some integer columns that I am trying to clean with df = df[(df['TVStandWallMount'] == 0) | (df['TVStandWallMount'] == 1)]
to retain the binary values

I have a bunch of them and I want to do

for col in df.columns:
    clean_int_col(df, col)

velvet thorn Aug 9, 2021, 2:47 AM

#

uh.

#

so you want

#

wait

#

I'm confused

#

so

#

basically

#

you want

#

to take out

#

the non-numeric values

#

yes?

sick furnace Aug 9, 2021, 2:48 AM

#

I want to take everything that is not 1 or 0. Other possible values would be 11, 10, or some other numeric

#

take out anything that's not 1 or 0

velvet thorn Aug 9, 2021, 2:49 AM

#

okay

#

so

#

for any row

#

in which

#

any column

#

has a non 1 or 0 value

#

remove that?

sick furnace Aug 9, 2021, 2:49 AM

#

yes

#

df = df[(df['TVStandWallMount'] == 0) | (df['TVStandWallMount'] == 1)]

#

this works

#

but I'd like to iterate over all my columns and apply that

velvet thorn Aug 9, 2021, 2:49 AM

#

uh.

#

pd.to_numeric(df, errors='coerce').dropna()?

sick furnace Aug 9, 2021, 2:50 AM

#

but they're not na

velvet thorn Aug 9, 2021, 2:50 AM

#

they will be

sick furnace Aug 9, 2021, 2:50 AM

#

the 'wrong' values are something like 11 or 10

velvet thorn Aug 9, 2021, 2:50 AM

#

oh

#

sorry I'm not focusing hard enough

#

wait then

#

why is there to_numeric above

#

so there are also cases

#

where they're not strings?

#

uh let me think about this for a moment

sick furnace Aug 9, 2021, 2:51 AM

#

I took out the to_numeric part

#

I sent something wrong at first

#

I edited the message

velvet thorn Aug 9, 2021, 2:51 AM

#

okay got it

#

do this

#

df[df.isin({0, 1}).all(axis=1)]

sick furnace Aug 9, 2021, 2:55 AM

#

so that checks the values no? doesnt drop the rows right?

thorn bobcat Aug 9, 2021, 2:56 AM

#

i have a set of images that look like.

#

to train them using an mnist classifier do i need the position of the object in the image or just the label?

velvet thorn Aug 9, 2021, 2:58 AM

#

sick furnace so that checks the values no? doesnt drop the rows right?

the result of that expression

#

is a df without those rows
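
A small sketch of that expression on toy data, to show that it filters rows rather than just checking values. Only `TVStandWallMount` comes from the conversation; the second column name and all values are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "TVStandWallMount": [0, 1, 11, 1],
    "HasRemote": [1, 0, 1, 10],      # hypothetical second binary column
})

mask = df.isin([0, 1]).all(axis=1)   # True where every column is 0 or 1
cleaned = df[mask]                   # a new DataFrame; df itself is unchanged

print(cleaned)
```

Assigning the result back (`df = df[mask]`) is what actually replaces the original.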

thorn bobcat Aug 9, 2021, 3:19 AM

#

can someone help me prepare my dataset?

modern vapor Aug 9, 2021, 7:01 AM

#

Does anyone have anything on handwriting with tf, like generating a page of writing? I saw something like it on reddit but cant find anything short of the digits thing on the tf website.

fierce parrot Aug 9, 2021, 7:28 AM

#

def calculate_correlation(self,feature_one,feature_two):
        feature_one_data = []
        feature_two_data = []
        for data in self.data_list:
            feature_one_data.append(data[feature_one])
            feature_two_data.append(data[feature_two])

        feature_one_mean = statistics.mean(feature_one_data)

        feature_two_mean = statistics.mean(feature_two_data)

        feature_one_sample_std = statistics.stdev(feature_one_data)

        feature_two_sample_std = statistics.stdev(feature_two_data)
        mean_diff_sum = 0

        for k in range(len(feature_one_data)):
            mean_diff_sum += (feature_one_data[k] - feature_one_mean) * (feature_two_data[k]-feature_two_mean)
        print(mean_diff_sum)
        corrcoef = mean_diff_sum/(feature_one_sample_std * feature_two_sample_std)
        return corrcoef

So, I am trying to calculate correlation coefficient by using this class method. self.data_list is a list of dictionaries and contains data such as age, bmi,insurance charge, smoker(boolean), sex etc. I want to calculate correlation coefficient of two features. Normally, I should get a value between -1 and 1. However, when I run this function to test it, I noticed that I get absurd results like 389,500. There must be something wrong with my calculation but I couldn't figure it out. Any ideas what I do wrong?
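
One likely cause, offered as a guess from the posted code alone: the numerator is the raw sum of products, but it is divided only by the two sample standard deviations. The sample covariance also needs a division by `n - 1`, which would explain results far outside the range -1 to 1. A standalone sketch of the corrected calculation follows; the function name and sample data are hypothetical.

```python
import statistics

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    std_x = statistics.stdev(xs)   # sample standard deviation (divides by n - 1)
    std_y = statistics.stdev(ys)

    products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    covariance = products / (n - 1)   # the step missing from the original
    return covariance / (std_x * std_y)

ages = [23, 35, 47, 59]                        # made-up sample values
charges = [1800.0, 2500.0, 3900.0, 5200.0]
r = pearson_r(ages, charges)
print(r)
```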

royal crest Aug 9, 2021, 7:57 AM

#

what would be the best way to go about removing rows where the numerical values are all 0?

#

i've tried a for loop to iterate over every row but i don't think that's very wise

#

i think another approach might be to "keep" the rows if there's a 1 present in any of them but i'm not sure if i know of a function that does this

velvet thorn Aug 9, 2021, 8:09 AM

#

royal crest what would be the best way to go about removing rows where the numerical values ...

df[(df != 0).any(axis=1)]

royal crest Aug 9, 2021, 8:14 AM

#

velvet thorn `df[(df != 0).any(axis=1)]`

hmm didn't seem to work

#

as in no changes have been made to the dataframe

velvet thorn Aug 9, 2021, 8:15 AM

#

royal crest as in no changes have been made to the dataframe

ye

#

that creates

#

a new DataFrame

#

you need to assign it to a variable, of course

royal crest Aug 9, 2021, 8:16 AM

#

I have

#

i've assigned it to df_gm and it's the exact same as the original dataframe it seems

#

same shape

#

one of my columns is text content, would that interfere with your method?

flat hollow Aug 9, 2021, 8:18 AM

#

velvet thorn `df[(df != 0).any(axis=1)]`

axis = 1 is columns isn't it?

flat hollow Aug 9, 2021, 8:19 AM

#

royal crest one of my columns is text content, would that interfere with your method?

yeah that would interfere since gm is just keeping all the rows that have any non-zero value in them

velvet thorn Aug 9, 2021, 8:19 AM

#

royal crest one of my columns is text content, would that interfere with your method?

yes

#

you can do this

#

df[(df.select_dtypes('number') != 0).any(axis=1)]

royal crest Aug 9, 2021, 8:22 AM

#

brilliant, cheers

#

so it selects any numerical column and matches with != 0
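
A toy illustration of why the text column mattered, with made-up column names and values. A string compared to `0` counts as "not equal", so before restricting to numeric columns every row had at least one non-zero value.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["a", "b", "c"],      # hypothetical text column
    "x": [0, 1, 0],
    "y": [0, 0, 0],
})

numeric = df.select_dtypes("number")   # only x and y
keep = (numeric != 0).any(axis=1)      # at least one non-zero number in the row
df_gm = df[keep]                       # text column is still present in the result

print(df_gm)
```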

wheat sun Aug 9, 2021, 10:11 AM

#

How do I make a grid where the x number line and y number line are thicker than the other grid lines? Like this:

earnest herald Aug 9, 2021, 10:13 AM

#

I'm trying to create a deep q learning environment, similar to snake which tracks the position of certain things. How do I deal with their position (delta y, delta x) being null or undefined?

Should I assign it a value it can never reach e.g. 100,100 or allocated an input which can be only 1 or 0 depending on whether these parameters should be ignored

flat hollow Aug 9, 2021, 10:30 AM

#

wheat sun How do I make a grid where the x number line and y number line are thicker than ...

is this in matplotlib? if so then something like

for axis in ['top', 'bottom', 'left', 'right']:
    ax.spines[axis].set_linewidth(0.5)

would change the width of those lines, where axis is each of the 4 border lines and ax is your subplot axis object
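
If the goal is thicker lines through zero rather than a thicker plot border (an assumption, since the example image is not in the log), one option is to draw them explicitly on top of a thin grid. The limits and line widths below are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim(-5, 5)
ax.set_ylim(-5, 5)

ax.grid(True, linewidth=0.5)                               # thin background grid
x_axis_line = ax.axhline(0, color="black", linewidth=2)    # thick x number line
y_axis_line = ax.axvline(0, color="black", linewidth=2)    # thick y number line

fig.canvas.draw()
```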

serene scaffold Aug 9, 2021, 10:59 AM

#

it's easier if you provide code and error messages as text.

delete the "staticmethod" decorator from recommend

acoustic halo Aug 9, 2021, 11:04 AM

#

mkj is in demographic filter not recommend

near spindle Aug 9, 2021, 11:04 AM

#

How can I ensure that i is always an int, not a float?

for i in range(0,2**25):
    step = 0
    print(i)
    while i != 1:
        if i == 0:
            break
        elif i % 2 == 0:
            i /= 2
            step += 1
            print(i, end=" ")
        elif i % 2 == 1:
            i = 3*i + 1
            step += 1
            print(i, end=" ")
    print(f'\nAmount of steps: {step}')

serene scaffold Aug 9, 2021, 11:05 AM

#

near spindle How can I ensure that i is always an int, not a float? ```py for i in range(0,2**25...

please put spaces on either side of infix operators; i % 2 == 0, not i%2==0

bold timber Aug 9, 2021, 11:05 AM

#

acoustic halo mkj is in demographic filter not recommend

what should i do?

serene scaffold Aug 9, 2021, 11:05 AM

#

do //= instead of /= so it's floor division

bold timber Aug 9, 2021, 11:05 AM

#

serene scaffold it's easier if you provide code and error messages as text. delete the "staticm...

what should i do?

serene scaffold Aug 9, 2021, 11:06 AM

#

bold timber what should i do?

I don't know.

near spindle Aug 9, 2021, 11:06 AM

#

Spaces only for cleaner code or do they have a purpose?

acoustic halo Aug 9, 2021, 11:06 AM

#

yes, easier to read

serene scaffold Aug 9, 2021, 11:07 AM

#

near spindle Spaces only for cleaner code or do they have a purpose?

they don't change how it's executed, but it's best to present others with readable code.

Division returns a float rather than an int, since division between integers is the only one (among addition, subtraction, and multiplication) that doesn't always return an integer, mathematically speaking.
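
A short sketch of the difference, with the step counting pulled into a function. The function name is made up, and it assumes `n` is a positive integer.

```python
def collatz_steps(n):
    """Count steps for n to reach 1 under the Collatz rule (assumes n >= 1)."""
    steps = 0
    while n != 1:
        if n % 2 == 0:
            n //= 2          # floor division keeps n an int
        else:
            n = 3 * n + 1
        steps += 1
    return steps

# True division always gives a float, even when the result is whole.
print(type(8 / 2), type(8 // 2))
print(collatz_steps(6))
```

Returning the count instead of printing every intermediate value also cuts the run time a lot, since printing tends to dominate in a loop this size.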

near spindle Aug 9, 2021, 11:08 AM

#

Makes sense, didn't know that // is such a big change

bold timber Aug 9, 2021, 11:08 AM

#

acoustic halo mkj is in demographic filter not recommend

how to solve that?

near spindle Aug 9, 2021, 11:09 AM

#

I'm gonna try fixed code once my pc finishes this code and stops being on fire

#

Also, how can I check the time needed to execute whole code?

#

I'm curious if by increasing the power of two by 1, the time needed for execution increases exponentially

#

Okay, it works fine now, except still being on fire

acoustic halo Aug 9, 2021, 11:14 AM

#

bold timber what should i do?

dont pass mkj into recommend

bold timber Aug 9, 2021, 11:18 AM

#

acoustic halo dont pass mkj into recommend

another issue, how to solve it?

#

chilly geyser Aug 9, 2021, 11:35 AM

#

near spindle Spaces only for cleaner code or do they have a purpose?

They also delineate things, it's probably easier to programmatically lint/search code with spaces than code without

chilly geyser Aug 9, 2021, 11:36 AM

#

near spindle Also, how can I check the time needed to execute whole code?

For simple measurement you can use time.perf_counter

#

You could use timeit things but I don't recommend it

#

I would prefer just perf_counter at specific points in your script (usually start, end), and run the script multiple times to get multiple readings than run timeit

near spindle Aug 9, 2021, 11:38 AM

#

So if I want to use this function, I need to import time and type that function at start and end of code?

acoustic halo Aug 9, 2021, 11:40 AM

#

bold timber another issue, how to solve it?

Sorry but it tells you exactly what's wrong: there isn't a between attribute on a DataFrame, you will have to sort that out

steel hawk Aug 9, 2021, 11:57 AM

#

Hi guys, I want to ask you if someone worked on getting high season of each product/item in an e-shop. what is the best approach you use or do you have some articles that might help. In another way, I want to know each product's season by variance of sales when it starts and when it ends and the season length.

bold timber Aug 9, 2021, 12:07 PM

#

acoustic halo Sorry but it tells you exactly what's wrong: there isn't a between attribute ...

this is my full code. what should i write in my code?

acoustic halo Aug 9, 2021, 12:07 PM

#

You are joking if you think im gonna rewrite your code for you

bold timber Aug 9, 2021, 12:09 PM

#

acoustic halo You are joking if you think im gonna rewrite your code for you

I mean can you give me a clue?

acoustic halo Aug 9, 2021, 12:09 PM

#

Your error says exactly what the issue is

#

you are trying to use between

bold timber Aug 9, 2021, 12:09 PM

#

just a clue, not to rewrite my code

acoustic halo Aug 9, 2021, 12:10 PM

#

Your dataframe doesnt know what between means

#

If you look a couple lines below you might see what you are missing

#

data_pns3.between, vs data_pns3.mkj.between
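
To make the hint concrete: `between` is a method on a Series (a single column), not on the whole DataFrame. A toy sketch, where only the names `data_pns3` and `mkj` come from the conversation and the values and bounds are invented:

```python
import pandas as pd

data_pns3 = pd.DataFrame({"mkj": [3, 8, 15, 22]})

# Series.between is inclusive on both ends by default
in_range = data_pns3.mkj.between(5, 20)
selected = data_pns3[in_range]

print(selected)
```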

bold timber Aug 9, 2021, 12:14 PM

#

acoustic halo If you look a couple lines below you might see what you are missing

Oh yeah i understand

#

Thank you!!

chilly geyser Aug 9, 2021, 12:27 PM

#

near spindle So if I want to use this function, I need to `import time` and type that functio...

!e

from time import perf_counter, sleep
start = perf_counter()
# your code here
sleep(1)  # to simulate things happening
end = perf_counter()
print(end - start)

arctic wedgeBOT Aug 9, 2021, 12:27 PM

#

@chilly geyser :white_check_mark: Your eval job has completed with return code 0.

1.0050674946978688

near spindle Aug 9, 2021, 12:32 PM

#

chilly geyser !e ```py from time import perf_counter, sleep start = perf_counter() # your code...

Thanks a lot, is there a documentation for this module on official page?

chilly geyser Aug 9, 2021, 12:33 PM

#

!d time

arctic wedgeBOT Aug 9, 2021, 12:33 PM

#

time

This module provides various time-related functions. For related functionality, see also the datetime and calendar modules.

Although this module is always available, not all functions are available on all platforms. Most of the functions defined in this module call platform C library functions with the same name. It may sometimes be helpful to consult the platform documentation, because the semantics of these functions varies among platforms.

An explanation of some terminology and conventions is in order.

• The epoch is the point where the time starts, and is platform dependent. For Unix, the epoch is January 1, 1970, 00:00:00 (UTC). To find out what the epoch is on a given platform, look at time.gmtime(0).

near spindle Aug 9, 2021, 12:35 PM

#

Thanks, gonna check

#

If I get Time: 2.1679996279999614e-05, it means code was executed within milliseconds?

chilly geyser Aug 9, 2021, 12:41 PM

#

Yes-kinda?

#

If you're doing benchmark on small things you might want to do timeit.timeit

#

There's also timeit.repeat
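
For reference, `perf_counter` returns seconds, so `2.1679996279999614e-05` is about 22 microseconds, well under a millisecond. Readings that small are noisy, which is where `timeit` helps. A minimal sketch, with a placeholder statement standing in for the code being measured:

```python
import timeit

elapsed_seconds = 2.1679996279999614e-05
print(f"{elapsed_seconds * 1e6:.2f} microseconds")

# timeit runs the statement `number` times and returns the total in seconds
total = timeit.timeit("sum(range(100))", number=10_000)
per_call = total / 10_000

# repeat does that several times; the minimum is usually the least noisy reading
runs = timeit.repeat("sum(range(100))", number=10_000, repeat=3)

print(per_call, min(runs) / 10_000)
```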

hollow falcon Aug 9, 2021, 1:42 PM

#

plt.show shows no result

#

what did i do wrong

chilly geyser Aug 9, 2021, 2:01 PM

#

Up to the line plt.subplots(2) there are no plots

hollow falcon Aug 9, 2021, 2:02 PM

#

oh my god im so dumb

willow spindle Aug 9, 2021, 3:04 PM

#

Greetings. So I have two SERIES:

tst - has fake data,
usr - has true data.
I am trying to check .isin() on the third series which I made a list:
third_s = [i for i in df['Some_Col']]

The thing is, that both tst and usr returns the same results when checking isin(). I tried:

# all syntax are correct and works on my PC
tst = # ... has fake data series
usr = # ... has data which is also in third_s
third_s = [i for i in df['Some_Col']]

# 1st approach - .empty:
if post.isin(third_s).empty:
  print('Yes it is')
else:
  print('No it is not empty') # why tst returns this if it is not in third_s?

# 2nd approach - .bool:
if post.isin(third_s).bool:
  print('Y') # again, why tst returns that as well TST HAS FAKE DATA
else:
  print('N')

Question: I need to skip in for-loop all tst values that are NOT IN third_s. Any ideas how?

desert oar Aug 9, 2021, 3:43 PM

#

what are you trying to actually do?

#

[i for i in df['Some_Col']] is basically the same thing as df['Some_Col']

#

but to answer your specific question @willow spindle, .empty does not check if "all values are false". it checks if one of its axes is length 0. you should read the docs instead of guessing. https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.empty.html

#

i assume you are looking for .any()

#

if post.isin(df['Some_Col']).any():
    print('There is at least one value in "post" that is also in "Some_Col".')
else:
    print('There is no value from "post" that is also in "Some_Col".')

#

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.any.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.all.html
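
For the original goal of skipping values that are not in `Some_Col`, one option is to use the `isin` result as a mask and loop over what is left. This is a sketch with invented data. As for the original attempts: `.bool` without parentheses is just a method object, which is always truthy, and `.empty` only looks at the length.

```python
import pandas as pd

df = pd.DataFrame({"Some_Col": ["alice", "bob", "carol"]})
tst = pd.Series(["zed", "bob", "quux"])     # mostly fake values
usr = pd.Series(["alice", "carol"])         # real values

mask = tst.isin(df["Some_Col"])             # boolean Series, one entry per value
print(mask.any())                           # is at least one value present?
print(mask.all())                           # are all values present?

# Loop only over the values that are actually in Some_Col
for value in tst[mask]:
    print("processing", value)
```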

fiery mortar Aug 9, 2021, 4:57 PM

#

I have built a model time series forecasting with TensorFlow dataset creator. At this link https://www.tensorflow.org/tutorials/structured_data/time_series?authuser=1#data_windowing.

TensorFlow

Time series forecasting | TensorFlow Core

#

When I set like this it work well.

#

But I set:

#

#

It doesn't work.

#

Can I solve a classification problem with tensorflow time series dataset?

lapis sequoia Aug 9, 2021, 5:15 PM

#

Hi folks. I'm looking at some covid related dataset and trying to filter to a specific row and drop Unnamed columns. Could anyone help fix my code?

#

import pandas as pd

df = pd.read_excel('/tmp/Covid-Publication-06-04-2021.xlsx',
                   engine='openpyxl',
                   skiprows=11,
                   sheet_name=['Total Beds Occupied','Total Beds Occupied Covid'],
                   header=1,
                   usecols="B:NX")

eng_hosps_total_use = \
    df['Total Beds Occupied'].loc[df['Total Beds Occupied']['Name'].str.match("ENGLAND", case=True).fillna(False, axis=0)].fillna("").set_index("Name")

eng_hosps_total_use.drop(columns=eng_hosps_total_use.columns.str.match("Unnamed", na=False))

Source: https://www.england.nhs.uk/statistics/statistical-work-areas/covid-19-hospital-activity/

Gives:

raise KeyError(f"{labels[mask]} not found in axis")
KeyError: '[False ... ] not found in axis'

Statistics » COVID-19 Hospital Activity

Health and high quality care for all, now and for future generations

desert oar Aug 9, 2021, 5:19 PM

#

@lapis sequoia what's wrong with it?

#

drop doesn't work "in place", you need to use inplace=True or do eng_hosps_total_use = eng_hosps_total_use.drop(...)

lapis sequoia Aug 9, 2021, 5:20 PM

#

@desert oar Same error with inplace=True

desert oar Aug 9, 2021, 5:20 PM

#

oh i missed the error

#

oh, hah

#

match returns True and False

#

not the matching values

lapis sequoia Aug 9, 2021, 5:21 PM

#

oh

#

ha

#

Yes. I wondered about passing a list of column names as list. Is it possible to do something like:

#

eng_hosps_total_use.drop(columns=eng_hosps_total_use.columns.str.match("Unnamed", na=False).values)

desert oar Aug 9, 2021, 5:22 PM

#

unnamed_columns = eng_hosps_total_use.columns[
    eng_hosps_total_use.columns.str.match("Unnamed", na=False)
]

eng_hosps_total_use.drop(columns=unnamed_columns, inplace=True)
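
The same idea on a toy frame, to show the difference between the boolean mask that `str.match` returns and the labels that `drop` expects. The column names here are invented.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["Name", "Unnamed: 1", "2021-04-01", "Unnamed: 3"],
)

mask = df.columns.str.match("Unnamed", na=False)   # array of True/False
unnamed_columns = df.columns[mask]                 # the actual column labels
trimmed = df.drop(columns=unnamed_columns)

print(list(trimmed.columns))
```

Passing `mask` itself to `drop` makes pandas look for columns literally named `True` and `False`, hence the `KeyError`.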

lapis sequoia Aug 9, 2021, 5:22 PM

#

got it

#

thank you

desert oar Aug 9, 2021, 5:22 PM

#

df['Total Beds Occupied']['Name']

what's this? are you getting a row with the label Name?

#

or do you have multi-index columns?

lapis sequoia Aug 9, 2021, 5:23 PM

#

no idea

ebon walrus Aug 9, 2021, 5:23 PM

#

can someone explain why this isnt working?

#

lapis sequoia Aug 9, 2021, 5:24 PM

#

@desert oar I'm selecting the column "Name" to match for the row of ENGLAND data. Single row.

rich plover Aug 9, 2021, 5:25 PM

#

Hi all, I'm trying to import the module gensim and its doesnt work

#

#

This was the solution as suggested by this thread

#

https://github.com/RaRe-Technologies/gensim/issues/1886

GitHub

cannot import name 'LabeledSentence' · Issue #1886 · RaRe-Technolog...

Description LabeledSentence is not being imported from gensim.models.doc2vec. from gensim.models.doc2vec import LabeledSentence the error I am getting is cannot import name 'LabeledSentence'

#

However, for whatever reason it doesnt work for me

#

I've got gensim installed

desert oar Aug 9, 2021, 5:26 PM

#

ebon walrus can someone explain why this isnt working?

is price null/nan?

ebon walrus Aug 9, 2021, 5:27 PM

#

desert oar is `price` null/`nan`?

here is the data

#

#

all the values are filled in

desert oar Aug 9, 2021, 5:28 PM

#

lapis sequoia <@!389497659087650836> I'm selecting the column "Name" to match for the row of ...

!code-block @ebon walrus can you share your code, error messages, and data as text, not as a screenshot? see below 👇

arctic wedgeBOT Aug 9, 2021, 5:28 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

ebon walrus Aug 9, 2021, 5:28 PM

#

alright

desert oar Aug 9, 2021, 5:28 PM

#

@lapis sequoia sorry didn't mean to @ you

ebon walrus Aug 9, 2021, 5:28 PM

#

lemme format

lapis sequoia Aug 9, 2021, 5:29 PM

#

@desert oar np. Do you have any idea btw why Pycharm ipython console keeps giving me

console_thrift.UnsupportedArrayTypeException: UnsupportedArrayTypeException(type='ExceptionOnEvaluate')

ebon walrus Aug 9, 2021, 5:29 PM

#

p = model.predict(new_house)

#add a new column 'price' in new_house file to show the model predicted price
new_house['Price'] = p

#export new house price file to local system
new_house.to_csv("new_house_price.csv")

import sys 
import os

plt.xlabel('area of house (sq.ft)')
plt.ylabel('Price of House(Dollars.)')
plt.title("relationship plot between area and price")
plt.scatter(price.area,price.price, color = 'red', marker = '+')
plt.plot(df.area, model.predict([['area']]), color = 'blue')
plt.show()

AttributeError                            Traceback (most recent call last)
<ipython-input-38-cea7655e712e> in <module>
      5 plt.ylabel('Price of House(Dollars.)')
      6 plt.title("relationship plot between area and price")
----> 7 plt.scatter(price.area,price.price, color = 'red', marker = '+')
      8 plt.plot(df.area, model.predict([['area']]), color = 'blue')
      9 plt.show()

#

@desert oar

silk axle Aug 9, 2021, 5:30 PM

#

@marsh beacon We don't allow that type of advertisement here.

desert oar Aug 9, 2021, 5:31 PM

#

lapis sequoia <@!389497659087650836> np. Do you have any idea btw why Pycharm ipython console ...

no idea. restart the console

lapis sequoia Aug 9, 2021, 5:31 PM

#

desert oar no idea. restart the console

no figured it out. Happens when have DataFrame open in SciView inspection. Need to close the dataframe.

desert oar Aug 9, 2021, 5:31 PM

#

@lapis sequoia i see, you have a multi-index column due to using 2 sheets

lapis sequoia Aug 9, 2021, 5:32 PM

#

Yes. The source data excel file has many worksheets inside the workbook. I only need two of them.

desert oar Aug 9, 2021, 5:35 PM

#

ah sorry it's not a multi-index, it's a dict

lapis sequoia Aug 9, 2021, 5:37 PM

#

whichever

desert oar Aug 9, 2021, 5:39 PM

#

they're not the same!

lapis sequoia Aug 9, 2021, 5:39 PM

#

Am I likely to need to know for this small task?

#

Happy with a dict

desert oar Aug 9, 2021, 5:40 PM

#

it's important to know that [ on dataframes and series is a complicated operation with a lot of possible behaviors, while on a dict it isn't

#

so yes it's somewhat important to know what data type you are working with
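
A small sketch of that point. When `sheet_name` is a list, `read_excel` returns a dict of DataFrames, so the first `[` is a plain key lookup and only the later ones are pandas indexing. The dict below is a hand-built stand-in so the example runs without the spreadsheet, and the `beds` column is made up.

```python
import pandas as pd

# Stand-in for what read_excel returns when sheet_name is a list
sheets = {
    "Total Beds Occupied": pd.DataFrame(
        {"Name": ["ENGLAND", "London"], "beds": [100, 20]}
    ),
}

sheet = sheets["Total Beds Occupied"]   # dict: plain key lookup
names = sheet["Name"]                   # DataFrame + string: one column (Series)
subset = sheet[["Name", "beds"]]        # DataFrame + list: several columns
rows = sheet[sheet["beds"] > 50]        # DataFrame + boolean mask: row filter

print(type(sheets).__name__, type(names).__name__, type(rows).__name__)
```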

lapis sequoia Aug 9, 2021, 5:43 PM

#

@desert oar gotchya. How do I modify your code to drop columns with NaN values?

eng_hosps_total_use.drop(columns=eng_hosps_total_use.columns[eng_hosps_total_use.columns.values.isna()], inplace=True)

= AttributeError no isna

desert oar Aug 9, 2021, 5:43 PM

#

i'd do it like this:

import pandas as pd

data = pd.read_excel(
    'Covid-Publication-06-04-2021.xlsx',
    engine='openpyxl',
    skiprows=11,
    sheet_name=['Total Beds Occupied', 'Total Beds Occupied Covid'],
    header=1,
    usecols="B:NX",
)

eng_hosps_total_use = data['Total Beds Occupied'].set_index("Name")
eng_hosps_total_use = eng_hosps_total_use.loc["ENGLAND"]
eng_hosps_total_use.drop(
    eng_hosps_total_use.index[
        eng_hosps_total_use.index.str.match("Unnamed", na=False)
    ],
    inplace=True,
)

#

values is discouraged (the pandas docs recommend .to_numpy() or .array instead), don't use it

#

note that this ultimately returns a Series, not a DataFrame

#

you only have 1 row of data, no reason to keep it as a dataframe

#

for that matter, why bother "parsing" like this at all? just read the one row you need

desert oar Aug 9, 2021, 5:47 PM

#

ebon walrus ``` p = model.predict(new_house) #add a new column 'price' in new_house file to...

price might not be a dataframe if that's the error you get

ebon walrus Aug 9, 2021, 5:47 PM

#

desert oar `price` might not be a dataframe if that's the error you get

whats the dataframe then?

#

the error is float area something

desert oar Aug 9, 2021, 5:48 PM

#

i am saying that you might have accidentally assigned something to price, overwriting the dataframe

lapis sequoia Aug 9, 2021, 5:48 PM

#

desert oar `values` is discouraged, don't use it

Ah, easy

eng_hosps_total_use.dropna(axis=1, how='all')

#

@desert oar Have you run this code? The dates are being picked up as Float64 dtypes and inserting a 24hr time. Do you have any idea of correcting this?

desert oar Aug 9, 2021, 5:59 PM

#

i ran some code but only for checking the column names

#

there are other options you can use to control how dates and other data types are handled

#

check the docs for pandas read_excel

lapis sequoia Aug 9, 2021, 6:01 PM

#

desert oar check the docs for pandas `read_excel`

Did that, but no luck. The idea is to make a simple line-bar matplotlib graph. Try to graph the DataFrame.

import matplotlib.pyplot as plt
plt.plot(eng_hosps_total_use)

#

Dates along the X axis

#

So column values would be values of X

#

in the plot

#

So I need timeseries data. The Columns are dates, but are Float64s not datetime

#

That's first problem.

desert oar Aug 9, 2021, 6:12 PM

#

@lapis sequoia this code gives me a eng_hosps_total_use as a Series with a DateTime index and int data:

import pandas as pd

data = pd.read_excel(
    'Covid-Publication-06-04-2021.xlsx',
    engine='openpyxl',
    skiprows=11,
    sheet_name=['Total Beds Occupied', 'Total Beds Occupied Covid'],
    header=1,
    usecols="B:NX",
)

eng_hosps_total_use = data['Total Beds Occupied'].set_index("Name")
eng_hosps_total_use = eng_hosps_total_use.loc["ENGLAND"]
unnamed_cols = eng_hosps_total_use.index[
    eng_hosps_total_use.index.str.match("Unnamed", na=False)
].tolist()
extra_cols = ['NHS England Region', 'Code']
eng_hosps_total_use.drop(unnamed_cols + extra_cols, inplace=True)
eng_hosps_total_use.index = pd.to_datetime(eng_hosps_total_use)
eng_hosps_total_use = eng_hosps_total_use.astype(int)

#

i'd encourage you to spend time figuring out how it works

lapis sequoia Aug 9, 2021, 6:13 PM

#

Amazing. Data frame, series... Ballache. What's the difference

#

@desert oar can you give me the shape after your drop

desert oar Aug 9, 2021, 6:17 PM

#

(370,)

lapis sequoia Aug 9, 2021, 6:17 PM

#

So this is 370 rows

desert oar Aug 9, 2021, 6:18 PM

#

lapis sequoia Amazing. Data frame, series... Ballache. What's the difference

a series is a single "column", like a 1-d array, and each element has a label (the "index"). a dataframe is a "table", a collection of several series, where each series itself has a label (the "columns") and each row has a label (the "index").

#

https://pandas.pydata.org/docs/user_guide/dsintro.html
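
A tiny example of the distinction, using invented numbers. The shape is the quickest way to tell the two apart: a Series has one dimension, a DataFrame has two.

```python
import pandas as pd

# A Series: one labelled column of values
beds = pd.Series([100, 120, 90], index=["mon", "tue", "wed"], name="beds")

# A DataFrame: several Series sharing the same row labels
table = pd.DataFrame({"beds": beds, "covid": [10, 15, 12]})

print(beds.shape)                      # one dimension
print(table.shape)                     # rows, columns
print(type(table["beds"]).__name__)    # picking one column gives a Series back
```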

lapis sequoia Aug 9, 2021, 6:22 PM

#

So I need two series. One for datetime and one for corresponding integer (beds used). Right?

#

Pass to matplotlib's X and Y each series

#

Seems like a lot of work when the dataframe is already a collection of series

desert oar Aug 9, 2021, 6:27 PM

#

pandas has some utilities for plotting

#

it's also not that much work

flat flare Aug 9, 2021, 6:27 PM

#

hey guys, i tried importing numpy and matplotlib in idle but cmd gave me error saying pip is not recognised

lapis sequoia Aug 9, 2021, 6:30 PM

#

desert oar it's also not _that_ much work

You should plt your code. It isn't correct.

glad mulch Aug 9, 2021, 6:31 PM

#

do you guys know how to make my graphs stack vertically and horizontally like 4 rows 3 columns . this is what i keep getting
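
One common way to get a 4 by 3 grid, assuming this is matplotlib (the screenshot is not in the log), is to create all the axes up front with `plt.subplots` and plot into each one. The data here is a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=4, ncols=3, figsize=(12, 12))

# axes is a 4x3 array; .flat walks it row by row
for i, ax in enumerate(axes.flat):
    ax.plot([0, 1, 2], [0, i, 2 * i])
    ax.set_title(f"plot {i + 1}")

fig.tight_layout()
```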

desert oar Aug 9, 2021, 6:33 PM

#

lapis sequoia You should plt your code. It isn't correct.

i'm also a stranger on the internet, offering free advice and assistance during gaps in my workday

#

it looked right when i ran it

#

but i also posted it more an example of another way to do it, not a definitively correct implementation of whatever you are trying to do

lapis sequoia Aug 9, 2021, 6:47 PM

#

Np

#

The plot should not be linear is all I was alluding to
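
A possible explanation for the straight line, offered as a guess since the plot itself is not in the log: the earlier snippet calls `pd.to_datetime(eng_hosps_total_use)`, which converts the Series values rather than its index labels, so the x positions end up tracking the y values. A sketch of the difference on toy data:

```python
import pandas as pd

s = pd.Series([100, 120, 90], index=["2021-04-01", "2021-04-02", "2021-04-03"])

from_values = pd.to_datetime(s)          # treats 100, 120, 90 as nanoseconds since 1970
from_index = pd.to_datetime(s.index)     # parses the date labels

s.index = from_index                     # what a date axis needs

print(from_values.iloc[0])
print(s.index[0])
```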

#

I don't know if this is the right channel but does anyone know how to set a hard RAM limit in PyTorch?

#

My program keeps using all the RAM and it crashes the computer

desert oar Aug 9, 2021, 7:23 PM

#

(this is the right channel)

lapis sequoia Aug 9, 2021, 7:26 PM

#

Well how do I do it? I'm running it on Google Colab and I have 12GB of RAM, and I want to set a limit to that because it keeps using all of it and crashes the runtime

desert oar Aug 9, 2021, 7:29 PM

#

i don't know, i was just trying to answer the first question 🙂 maybe it's not possible?

#

you might need to change how data is loaded into your model

lapis sequoia Aug 9, 2021, 7:58 PM

#

@desert oar why does your

eng_hosps_total_use = data['Total Beds Occupied'].set_index("Name")
eng_hosps_total_use = eng_hosps_total_use.loc["ENGLAND"]

return a series, but my

eng_hosps_total_use = \
    df['Total Beds Occupied'].loc[df['Total Beds Occupied']['Name'].str.match("ENGLAND", case=True).fillna(False, axis=0)].fillna(np.NaN).set_index("Name")

returns a dataframe?
I'm doing exactly the same thing with loc. You're just setting the index before looking for the row relating to England, whereas I filter to all of the data for England and finish by setting the index to Name.

Shape difference is (375,) vs (1,375)

desert oar Aug 9, 2021, 8:10 PM

#

because you're passing something array-like/list-like to .loc

#

really i should be using eng_hosps_total_use.at["ENGLAND"] because i know i only want 1 row
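
A small sketch of the shape difference, assuming a toy frame in place of the real spreadsheet (the place names and date columns are invented): a scalar label passed to `.loc` gives a Series, while a boolean mask gives a one-row DataFrame.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["ENGLAND", "LONDON"], "2021-03-01": [100, 20], "2021-03-02": [110, 25]}
).set_index("Name")

by_label = df.loc["ENGLAND"]              # scalar label -> Series
by_mask = df.loc[df.index == "ENGLAND"]   # boolean mask (array-like) -> DataFrame

print(by_label.shape)  # (2,)
print(by_mask.shape)   # (1, 2)
```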

desert oar Aug 9, 2021, 8:26 PM

#

@lapis sequoia

import matplotlib.pyplot as plt
import pandas as pd


with pd.ExcelFile('Covid-Publication-06-04-2021.xlsx') as xlsx:
    eng_beds_total = xlsx.parse(
        'Total Beds Occupied',
        skiprows=11,
        nrows=1,
        header=1,
        usecols="E:NJ",
    ).squeeze()
    eng_beds_total.index = pd.to_datetime(eng_beds_total.index)

    eng_beds_covid = xlsx.parse(
        'Total Beds Occupied Covid',
        skiprows=11,
        nrows=1,
        header=1,
        usecols="E:NW",
        squeeze=True,
    ).squeeze()
    eng_beds_covid.index = pd.to_datetime(eng_beds_covid.index)

beds = eng_beds_total.to_frame(name='total').join(
    eng_beds_covid.to_frame(name='covid')
)

beds.plot()
plt.show()

#

lapis sequoia Aug 9, 2021, 8:28 PM

#

Fabulous.

lapis sequoia Aug 9, 2021, 8:35 PM

#

desert oar because you're passing something array-like/list-like to `.loc`

How would I transform my line to make a series? I thought df.transpose() might do this.

desert oar Aug 9, 2021, 8:36 PM

#

what do you mean by "line"?

lapis sequoia Aug 9, 2021, 8:36 PM

#

I quoted you.

desert oar Aug 9, 2021, 8:36 PM

#

if you have a dataframe with exactly one row, you have two options to turn that row into a series:

.squeeze as in my code
use .loc[row label] or .iloc[0]
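
Roughly, on a hypothetical one-row frame (here `.loc` and `.iloc` do the label-based and position-based row selection):

```python
import pandas as pd

one_row = pd.DataFrame({"a": [1], "b": [2]}, index=["ENGLAND"])

s1 = one_row.squeeze()        # drops the length-1 axis
s2 = one_row.loc["ENGLAND"]   # select the row by label
s3 = one_row.iloc[0]          # select the row by position

print(type(s1).__name__, s1.shape)
```

All three give the same Series, indexed by the original column labels.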

#

ah

#

my answer remains

#

.match returns a boolean series

#

subsetting a Series by a Series returns another Series (or an Index, in this case)

#

my code avoids all that stuff entirely by grabbing data out of the xlsx more selectively

#

no need to "parse" the row labels etc. when you know exactly what row you want

lapis sequoia Aug 9, 2021, 8:41 PM

#

When you work with Pandas and DataFrames, is the typical workflow to reduce what you want to series data for operations, e.g. plotting?

thorn bobcat Aug 9, 2021, 8:44 PM

#

yo yo

desert oar Aug 9, 2021, 8:49 PM

#

lapis sequoia When you work with Pandas and DataFrames is the typical workflow to reduce what ...

in this particular case i actually built it back up into a dataframe. but i pulled a series out of each file because that's all we needed from each file

lapis sequoia Aug 9, 2021, 9:01 PM

#

This function doesn't allow you to specify an engine though. XLRD not good with new xlsx file formats so no idea why yours didn't error.

#

Don't understand the squeeze.

#

gah

desert oar Aug 9, 2021, 9:06 PM

#

ExcelFile i think supports engine=

#

it also should just auto-detect

lapis sequoia Aug 9, 2021, 9:07 PM

#

desert oar no need to "parse" the row labels etc. when you know exactly what row you want

What is the shape of beds ?

Also, I don't intend to rely on df.plot() as I'd want to call plt directly, e.g. there is no axhline() as there is for plt.

desert oar Aug 9, 2021, 9:07 PM

#

!e ```python
import pandas as pd
df = pd.DataFrame({'a': [1,2,3]})
print(df)
print()
print(df.squeeze())
```

arctic wedgeBOT Aug 9, 2021, 9:07 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |    a
002 | 0  1
003 | 1  2
004 | 2  3
005 | 
006 | 0    1
007 | 1    2
008 | 2    3
009 | Name: a, dtype: int64

lapis sequoia Aug 9, 2021, 9:07 PM

#

No engine option https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.ExcelFile.parse.html

desert oar Aug 9, 2021, 9:07 PM

#

lapis sequoia What is the shape of `beds` ? Also, I don't intend to rely on `df.plot()` as I...

you can get the Axes object after using df.plot
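
A sketch of that, with invented numbers and a non-interactive backend so it runs without a display: `DataFrame.plot()` returns a matplotlib Axes, so methods like `axhline` are available on the result.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

beds = pd.DataFrame(
    {"total": [100, 110, 105], "covid": [20, 25, 22]},
    index=pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-03"]),
)

ax = beds.plot()                                  # returns a matplotlib Axes
ax.axhline(y=50, color="grey", linestyle="--")    # any Axes method works from here
ax.set_ylabel("beds occupied")
```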

desert oar Aug 9, 2021, 9:07 PM

#

lapis sequoia No `engine` option https://pandas.pydata.org/pandas-docs/stable/reference/api/pa...

that's the parse method of ExcelFile, not the ExcelFile constructor

#

i don't know why the constructor isn't documented

#

https://github.com/pandas-dev/pandas/blob/v1.3.1/pandas/io/excel/_base.py#L1166-L1168

arctic wedgeBOT Aug 9, 2021, 9:08 PM

#

pandas/io/excel/_base.py lines 1166 to 1168

```python
def __init__(
    self, path_or_buffer, engine=None, storage_options: StorageOptions = None
):
```

lapis sequoia Aug 9, 2021, 9:11 PM

#

Ah. God sakes I'm shocking.

#

with pd.ExcelFile(... ,engine='openpyxl') as xlsx:

works.

#

@desert oar Is the squeeze on eng_beds_covid responsible for cutting off all of March data?

desert oar Aug 9, 2021, 9:21 PM

#

probably not, more likely i either didn't understand your requirements or made a mistake

#

oh it's a mistake, i probably didn't do the join right

lapis sequoia Aug 9, 2021, 9:21 PM

#

I think its a union and

desert oar Aug 9, 2021, 9:21 PM

#

beds = eng_beds_total.to_frame(name='total').join(
    eng_beds_covid.to_frame(name='covid'),
    how='outer',
)

#

definitely not a union

#

better?

lapis sequoia Aug 9, 2021, 9:22 PM

#

Yes. It includes only the data where both have same timestamps

#

Sounds like a union to me. Outer join includes everything.

silk axle Aug 9, 2021, 9:24 PM

#

If I've coded a one-player pong game (with Arcade), how would I go about adding AI to it? I basically want to implement a NEAT algorithm into it, with the fitness function just being the number of hits it can get until it dies. Ideally I'd be able to run multiple samples per generation at the same time (rather than doing one-by-one). If it helps, this is my code: https://paste.pythondiscord.com/oqitemoyuk.py

lapis sequoia Aug 9, 2021, 9:25 PM

#

@desert oar Just speculating here, but do you notice anything unusual about those two plots?

desert oar Aug 9, 2021, 9:26 PM

#

lapis sequoia Sounds like a union to me. Outer join includes everything.

it's an outer join on the index values. a union is different, there's no union "on" anything, a union treats the tables as sets of tuples as in relational algebra
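
To make the difference concrete, a sketch with two short made-up series whose date labels only partly overlap. `pd.concat` stands in here for the closest pandas analogue of stacking rows; it keeps duplicates, so it is nearer to SQL's UNION ALL than to a set union.

```python
import pandas as pd

total = pd.Series([100, 110, 105], index=["mar-30", "mar-31", "apr-01"], name="total")
covid = pd.Series([25, 22, 20], index=["mar-31", "apr-01", "apr-02"], name="covid")

left = total.to_frame().join(covid.to_frame())                # default how='left'
outer = total.to_frame().join(covid.to_frame(), how="outer")  # every index label from both sides
stacked = pd.concat([total, covid])                           # rows stacked, no matching on the index

print(left.shape, outer.shape, stacked.shape)
```

The left join keeps only the labels present in `total`, which is how dates can silently go missing.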

desert oar Aug 9, 2021, 9:27 PM

#

lapis sequoia @desert oar Just speculating here, but do you notice anything unusual...

dip around christmas, total beds rising even as covid beds are falling. would want to see pre-2020 bed data for context

#

covid beds smooth, non-covid beds not smooth (scheduled c-sections and elective surgeries?)

lapis sequoia Aug 9, 2021, 9:28 PM

#

Needs further context. You're right.

desert oar Aug 9, 2021, 9:32 PM

#

really i'd want to see this going back years

lapis sequoia Aug 9, 2021, 9:32 PM

#

yep.

desert oar Aug 9, 2021, 9:32 PM

#

might also be interesting to subtract covid beds from total beds

lapis sequoia Aug 9, 2021, 9:32 PM

#

yep

desert oar Aug 9, 2021, 9:33 PM

#

lapis sequoia Aug 9, 2021, 9:33 PM

#

beat me to it. Don't provide the code. I'm copypasting

desert oar Aug 9, 2021, 9:33 PM

#

it was a 1 line addition 🙂

lapis sequoia Aug 9, 2021, 9:37 PM

#

@desert oar why do you have the squeeze twice for eng_beds_covid

modest mulch Aug 9, 2021, 9:38 PM

#

Hi, how do I know if a .pth file contains the architecture of the model, not only its "weights"?

#

A pytorch model.

lapis sequoia Aug 9, 2021, 9:47 PM

#

~~What you need to do is subtract mv_covid from total to give how many beds total beds are non-covid.~~

thorn bobcat Aug 9, 2021, 10:38 PM

#

anyone here wrote a paper before?

#

the kind of arxiv?

velvet thorn Aug 9, 2021, 10:48 PM

#

desert oar it's an outer join on the index values. a union is different, there's no union "...

it’s not exact, right?

#

duplicates are kept

#

leaky abstraction

desert oar Aug 9, 2021, 11:01 PM

#

lapis sequoia @desert oar why do you have the squeeze twice for `eng_beds_covid`

oh that was a mistake, squeeze=True only squeezes columns, but not rows

#

same reason i didn't use parse_dates=True, it doesn't parse dates in column names
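
A sketch of the explicit conversion, assuming a hypothetical one-row sheet whose column headers are date strings:

```python
import pandas as pd

# Hypothetical one-row sheet whose column headers are date strings.
row = pd.DataFrame([[100, 110]], columns=["2021-03-01", "2021-03-02"])

series = row.squeeze()                        # the headers become the Series index
series.index = pd.to_datetime(series.index)  # convert the labels explicitly

print(series.index.dtype)
```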

lapis sequoia Aug 10, 2021, 12:31 AM

#

hey i beginner on python and i wanna know about AI. Where should i start? And how much should i know?

serene scaffold Aug 10, 2021, 1:05 AM

#

lapis sequoia hey i beginner on python and i wanna know about AI. Where should i start? And ho...

what's your math background?

lapis sequoia Aug 10, 2021, 1:07 AM

#

serene scaffold what's your math background?

Nothing

serene scaffold Aug 10, 2021, 1:07 AM

#

lapis sequoia Nothing

what's the highest math class you've taken?

#

it's okay if it's just algebra or something. I'm just asking

lapis sequoia Aug 10, 2021, 1:08 AM

#

Yes

#

Algebra

serene scaffold Aug 10, 2021, 1:40 AM

#

lapis sequoia Algebra

you'll need to also learn statistics and linear algebra

lapis sequoia Aug 10, 2021, 1:45 AM

#

serene scaffold you'll need to also learn statistics and linear algebra

Is that easy?

#

I thought programming doesnt require math

serene scaffold Aug 10, 2021, 1:45 AM

#

lapis sequoia I thought programming doesnt require math

AI most certainly does.

#

whether or not you find stats and linalg easy will depend, but what ultimately matters is that you maintain a positive attitude about learning. because the learning never stops.

lapis sequoia Aug 10, 2021, 1:47 AM

#

That is it

serene scaffold Aug 10, 2021, 1:47 AM

#

what do you mean, that is it?

lapis sequoia Aug 10, 2021, 1:48 AM

#

Just statistic, how about framework or anything

serene scaffold Aug 10, 2021, 1:48 AM

#

there are a lot of libraries. numpy, pandas, matplotlib, sklearn, pytorch, tensorflow. but you learn the parts of them that you need as you go.

#

Like I said, it's important to maintain a positive attitude about learning.

lapis sequoia Aug 10, 2021, 1:56 AM

#

Alright thanks

lapis sequoia Aug 10, 2021, 4:57 AM

#

Can you freelance as data scientist?

ashen sable Aug 10, 2021, 5:27 AM

#

does training a model in colab take a long time?

wide raven Aug 10, 2021, 5:41 AM

#

Do you guys think the only good way to learn neural networks is by learning every aspect of it
and understanding all the math and how they are built?
or can you get a good understanding and make a lot of cool AI just by learning tensorflow and mastering that
I thought learning from scratch would be nice and help me understand but after hours and hours of learning gradients equations types activation functions
it just got too much to handle and I would like to make AI with a less info-needed approach which is why i thought tensorflow would be nice
but i am scared that would limit me and what i can make

vague ravine Aug 10, 2021, 5:49 AM

#

what exactly are you scared of?

wide raven Aug 10, 2021, 5:50 AM

#

that not learning the core of neural networks and just learning tensorflow would limit what i can create

vague ravine Aug 10, 2021, 5:54 AM

#

how did you come to that conclusion

wide raven Aug 10, 2021, 6:00 AM

#

idk AI just seems like one of those things you need to know completely

vague ravine Aug 10, 2021, 6:04 AM

#

which one

stark zenith Aug 10, 2021, 6:17 AM

#

you don't need to know everything about how your car works to drive it

unborn glacier Aug 10, 2021, 6:20 AM

#

I think you need to know the very basics, matrix multiplication, back-propagation, gradient descent, as well as a theoretical understanding of the rest (like the effects of changing hyper parameters or the idea behind transformers for nlp) to get 95% out of machine learning. You can know basically nothing and still get a lot out of it. The biggest thing is going to be experience. Knowing what to use when comes from having done it before, not some complex mathematical understanding.
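
For a sense of scale, the core loop of those basics fits in a few lines. This is only a sketch on synthetic data for a linear model (no hidden layers, so the "back-propagation" is a single gradient formula); the learning rate and step count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w                            # noiseless targets, so the optimum is known

w = np.zeros(2)
lr = 0.1
for _ in range(200):
    residual = X @ w - y                  # forward pass: a matrix multiplication
    grad = 2 * X.T @ residual / len(y)    # gradient of mean squared error w.r.t. w
    w -= lr * grad                        # gradient descent step

print(w)
```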

wide raven Aug 10, 2021, 6:33 AM

#

unborn glacier I think you need to know the very basics, matrix multiplication, back-propagatio...

Yeah I see what you mean

#

but back prop, gradient descent, and more are still a lot to handle

#

#

like these equations and stuff

#

just seem like a lot to fully understand, and I was just wondering if I would just have to know how back prop works and not get into the math for it

ripe forge Aug 10, 2021, 6:39 AM

#

You could get an intuitive understanding and then just move on if you wanted, when starting

#

In my opinion you can defer the math when learning ml for later, I think people emphasize it too much

undone flare Aug 10, 2021, 6:42 AM

#

If my independent variables are highly correlated should I use Ridge Regression?

cedar sky Aug 10, 2021, 6:46 AM

#

Hey guys any school student interested in AI here?
Anyone interested in collaborating for this competition can DM me.
https://aischoolofindia.com/waicy-competition/

AISI

WAICY Competition | AISI

WHAT IS WAICY INDIA? WAICY India is an online competition for Indian schools & students which engages them to learn and use artificial intelligence (AI) technology to solve real-world problems. AI researchers around the world are harnessing the power of AI for a sustainable future. From solving the toughest environmental challenges to becoming t...

grand mantle Aug 10, 2021, 7:05 AM

#

Does anybody know which path planning algorithm is used in Tesla's maps?

I read online that Dijkstra is used in Google Maps

desert oar Aug 10, 2021, 7:07 AM

#

undone flare If my independent variables are highly correlated should I use Ridge Regression?

It could help, yes. It's often a good idea anyway, depending on what you are trying to do. Did you check the VIFs for the fitted linear model?
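
One way to sketch the VIF check with plain NumPy, on synthetic predictors: the VIF of predictor j is 1 / (1 - R_j^2), which equals the j-th diagonal entry of the inverse of the predictors' correlation matrix. The data and the "nearly a copy" construction below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)   # nearly a copy of x1
x3 = rng.normal(size=500)              # unrelated predictor
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)    # correlation matrix of the predictors
vif = np.diag(np.linalg.inv(corr))     # one VIF per column

print(vif.round(1))
```

A common rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity, though that threshold is a convention rather than a hard rule.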

undone flare Aug 10, 2021, 7:18 AM

#

desert oar It could help, yes. It's often a good idea anyway, depending on what you are try...

no

#

I will do that actually

static acorn Aug 10, 2021, 7:21 AM

#

ML

#

learners

grand mantle Aug 10, 2021, 7:46 AM

#

static acorn ML

nothing about path planning ?

#

only ML

bold timber Aug 10, 2021, 8:16 AM

#

how do I rename the long column names in a dataset like this? i want to change the column names to only s-1, s-2, s-3, without 'JENJANGPENDIDIKAN_'

can anyone help me? I've been trying to rename the columns for 2 days and still haven't gotten a proper result

desert oar Aug 10, 2021, 8:21 AM

#

bold timber how to rename the large column in dataset like this? i want to change the name o...

!eval ```python
import re
import pandas as pd

def remove_long_prefix(colname):
    return re.sub(r'^JENJANGPENDIDIKAN_', '', colname)

data = pd.DataFrame({
    'JENJANGPENDIDIKAN_s_1': [11,12,13],
    'JENJANGPENDIDIKAN_s_2': [21,22,23],
})

data = data.rename(columns=remove_long_prefix)

print(data)
```

arctic wedgeBOT Aug 10, 2021, 8:21 AM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |    s_1  s_2
002 | 0   11   21
003 | 1   12   22
004 | 2   13   23

desert oar Aug 10, 2021, 8:22 AM

#

or equivalently

import re
import pandas as pd

data = pd.DataFrame({
    'JENJANGPENDIDIKAN_s_1': [11,12,13],
    'JENJANGPENDIDIKAN_s_2': [21,22,23],
})

data = data.rename(
    columns=lambda colname: re.sub(r'^JENJANGPENDIDIKAN_', '', colname)
)

print(data)

#

maybe even better:

import pandas as pd

data = pd.DataFrame({
    'JENJANGPENDIDIKAN_s_1': [11,12,13],
    'JENJANGPENDIDIKAN_s_2': [21,22,23],
})

data.columns = data.columns.str.replace(r'^JENJANGPENDIDIKAN_', '', regex=True)

print(data)

#

or if you're using python 3.9+

import pandas as pd

data = pd.DataFrame({
    'JENJANGPENDIDIKAN_s_1': [11,12,13],
    'JENJANGPENDIDIKAN_s_2': [21,22,23],
})

data = data.rename(
    columns=lambda colname: colname.removeprefix('JENJANGPENDIDIKAN_')
)

print(data)

bold timber Aug 10, 2021, 8:25 AM

#

desert oar or equivalently ```python import re import pandas as pd data = pd.DataFrame({ ...

THANK YOU SO MUCHHHHHHHHH

desert oar Aug 10, 2021, 8:25 AM

#

now you have four ways to do it

bold timber Aug 10, 2021, 8:25 AM

#

I really big thanks to you

undone flare Aug 10, 2021, 8:30 AM

#

If I have missing values in a numerical column would I be fine with replacing those values with the mean? I have been doing this for like every project I did and I think it can be better

bold timber Aug 10, 2021, 8:42 AM

#

desert oar now you have _four_ ways to do it

but how if I have multiple columns like that?

desert oar Aug 10, 2021, 8:43 AM

#

bold timber but how if I have multiple columns like that?

it shouldn't be any different

desert oar Aug 10, 2021, 8:43 AM

#

undone flare If I have missing values in a numerical column would I be fine with replacing th...

it depends on what's missing, how often it's missing, why it's missing, and what you're doing with it 🙂 there are no strict rules or unambiguous best practices
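
As one illustration of why it depends, a sketch on a made-up column with an outlier: the mean and the median give very different fill values, and a flag column records which rows were imputed.

```python
import numpy as np
import pandas as pd

col = pd.Series([1.0, 2.0, np.nan, 3.0, 100.0], name="income")

was_missing = col.isna()                  # keep a record of what was imputed
mean_filled = col.fillna(col.mean())      # fill value pulled up by the outlier
median_filled = col.fillna(col.median())  # more robust to the outlier

print(mean_filled.iloc[2], median_filled.iloc[2])
```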

bold timber Aug 10, 2021, 8:44 AM

#

desert oar it shouldn't be any different

is that manually one by one for each column?

desert oar Aug 10, 2021, 8:45 AM

#

bold timber is that manually one by one for each column?

no, the first data = pd.DataFrame line is just creating a new dataframe to demonstrate the solution

bold timber Aug 10, 2021, 8:45 AM

#

I mean if i have another column like that

desert oar Aug 10, 2021, 8:45 AM

#

i encourage you to read all 4 solutions and spend time understanding what they do and why they work

#

oh, you should probably do it manually for each prefix

#

you could do it with a single regex but i don't see much value in that
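
For reference, a sketch of the single-regex version, using an alternation anchored at the start of the name. The second prefix and the `other` column are assumptions about what the real dataset contains.

```python
import pandas as pd

data = pd.DataFrame(columns=["JENJANGPENDIDIKAN_s_1", "JABATANSTRUKTURAL_x", "other"])

# One alternation, anchored at the start, covering both prefixes.
data.columns = data.columns.str.replace(
    r"^(?:JENJANGPENDIDIKAN_|JABATANSTRUKTURAL_)", "", regex=True
)

print(list(data.columns))
```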

#

are you using python 3.9?

bold timber Aug 10, 2021, 8:47 AM

#

desert oar you could do it with a single regex but i don't see much value in that

what is 'regex'?

yes because that's an encoding number

desert oar Aug 10, 2021, 8:47 AM

#

what is 'regex'?
the re.sub thing with r'^prefix'

yes because that's an encoding number
what do you mean by that?

bold timber Aug 10, 2021, 8:47 AM

#

desert oar i encourage you to read all 4 solutions and spend time understanding what they d...

i using 3.8.5 version

desert oar Aug 10, 2021, 8:47 AM

#

ok, then you are not using 3.9

bold timber Aug 10, 2021, 8:48 AM

#

desert oar ok, then you are not using 3.9

and then?

bold timber Aug 10, 2021, 8:48 AM

#

desert oar > what is 'regex'? the `re.sub` thing with `r'^prefix'` > yes because that's an...

im sorry i misunderstand before

desert oar Aug 10, 2021, 8:49 AM

#

bad_patterns = [
    '^JENJANGPENDIDIKAN_',
    '^JABATANSTRUKTURAL_',
]

for pattern in bad_patterns:
    data.columns = data.columns.str.replace(pattern, '', regex=True)

#

the ^ in the pattern means "only match at the beginning of the text"

bold timber Aug 10, 2021, 8:55 AM

#

desert oar ```python bad_patterns = [ '^JENJANGPENDIDIKAN_', '^JABATANSTRUKTURAL_',...

Thank you so muchh

somber prism Aug 10, 2021, 9:19 AM

#

can someone give me some tips on how to handle this imbalance multiclass classification prob like this

#

bold timber Aug 10, 2021, 9:24 AM

#

desert oar ```python bad_patterns = [ '^JENJANGPENDIDIKAN_', '^JABATANSTRUKTURAL_',...

do you know how to change column position in a dataset?

desert oar Aug 10, 2021, 9:31 AM

#

bold timber do you know how to change column position in datset?

use df = df[colnames], where colnames is a list of column names in the order you want

desert oar Aug 10, 2021, 9:38 AM

#

somber prism

generic list of things to try: weighting, oversampling (e.g. with SMOTE), use gradient boosting which can "focus" on misclassified instances
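
A sketch of the weighting option, computed by hand on made-up labels. The formula is the "balanced" heuristic, n_samples / (n_classes * class_count), which is what scikit-learn documents for `class_weight='balanced'`; many estimators also accept per-sample weights like these in `fit`.

```python
import numpy as np

y = np.array([0] * 90 + [1] * 8 + [2] * 2)   # made-up, heavily imbalanced labels

classes, counts = np.unique(y, return_counts=True)
# "Balanced" heuristic: n_samples / (n_classes * count_of_class)
class_weight = len(y) / (len(classes) * counts)
# Look up each sample's weight from its class.
sample_weight = class_weight[np.searchsorted(classes, y)]

print(dict(zip(classes.tolist(), class_weight.round(2).tolist())))
```

With these weights every class contributes the same total weight to the loss, regardless of how many samples it has.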

somber prism Aug 10, 2021, 9:40 AM

#

desert oar generic list of things to try: weighting, oversampling (e.g. with SMOTE), use gr...

ok

bold timber Aug 10, 2021, 9:40 AM

#

desert oar use `df = df[colnames]`, where `colnames` is a list of column names in the order...

No, i mean changing column position in larger dataset, not only access several columns

desert oar Aug 10, 2021, 9:41 AM

#

bold timber No, i mean changing column position in larger dataset, not only access several c...

you can do arbitrary manipulation on the list of column names

#

insert, remove, append, etc
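
For example, moving one column to a new position could look like this (toy frame, hypothetical column names):

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})

colnames = list(df.columns)
colnames.remove("d")       # take the column out of its current slot
colnames.insert(1, "d")    # put it back at position 1
df = df[colnames]          # reindex the frame in the new order

print(list(df.columns))
```

The same pattern works with thousands of columns, since only the list of names is edited.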

bold timber Aug 10, 2021, 9:44 AM

#

desert oar you can do arbitrary manipulation on the list of column names

I have 4725 columns, how can to do it simple way?

desert oar Aug 10, 2021, 9:44 AM

#

what exactly are you trying to do

bold timber Aug 10, 2021, 9:52 AM

#

desert oar what exactly are you trying to do

Now I'm making a RecommenderSystem. this picture is result of recomendation with 4725 column. How i can to take only several column in recommendation table?

desert oar Aug 10, 2021, 9:54 AM

#

which columns do you want? what's the rule for selecting a column?

#

most of the time you can just use a list comprehension

bold timber Aug 10, 2021, 9:54 AM

#

desert oar which columns do you want? what's the rule for selecting a column?

can you give me example of code?

#

yes i want to selecting several columns

desert oar Aug 10, 2021, 9:56 AM

#

i've given you a lot of code already... do you know what a list comprehension is?

#

you can even write a for loop and build a list with append

bold timber Aug 10, 2021, 9:56 AM

#

desert oar i've given you a lot of code already... do you know what a list comprehension is...

i know, but i not properly understand

#

list comprehension is just looping in 1 cell, right?

desert oar Aug 10, 2021, 9:58 AM

#

yes

#

if by "cell" you mean "expression"

#

!e ```python
items1 = [f'number:{i}' for i in range(5)]
print(items1)

items2 = []
for i in range(5):
    items2.append(f'number:{i}')
print(items2)
```

arctic wedgeBOT Aug 10, 2021, 9:59 AM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 | ['number:0', 'number:1', 'number:2', 'number:3', 'number:4']
002 | ['number:0', 'number:1', 'number:2', 'number:3', 'number:4']

desert oar Aug 10, 2021, 9:59 AM

#

same thing, 2 different ways to write it
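
Applied to a dataframe, the same idea can be sketched like this. The column names and the `s_` rule are invented; the comprehension's condition is where the real selection rule would go.

```python
import pandas as pd

df = pd.DataFrame({"id": [1], "s_1": [0], "s_2": [1], "score": [0.9]})

# Build the list of wanted column names with a comprehension, then index with it.
wanted = [col for col in df.columns if col.startswith("s_")]
subset = df[wanted]

# Columns in the middle can also be taken by position.
middle = df.iloc[:, 1:3]

print(list(subset.columns), list(middle.columns))
```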

bold timber Aug 10, 2021, 10:16 AM

#

desert oar same thing, 2 different ways to write it

can u give me example in dataframe? i've to try your way, but i confuse when i want to selecting on middle column

silk axle Aug 10, 2021, 11:05 AM

#

How can I remove the specified columns of my dataset? I've tried things like

to_drop = ['reproduction_rate', 'female_smokers', 'male_smokers', 'tests_per_case', 'tests_units', 'excess_mortality']
for x in to_drop:
    df.drop(x)

but keep getting errors (in this case saying invalid key for 'reproduction_rate' even though that's the column header)

#

This shows the rough structure (first line is the headers, then the rest is the data, which has a lot of missing values)

thorn bobcat Aug 10, 2021, 11:32 AM

#

yo

undone flare Aug 10, 2021, 11:34 AM

#

silk axle How can I remove the specified columns of my dataset? I've tried things like```p...

what does df.columns return?

#

also if you want to drop the whole column do df.drop(x, axis=1)

grave frost Aug 10, 2021, 11:56 AM

#

sigh any hardware experts in TPUs?

serene scaffold Aug 10, 2021, 11:56 AM

#

silk axle How can I remove the specified columns of my dataset? I've tried things like```p...

(on mobile) drop works on rows rather than columns by default, so you have to change the axis. Also you should pass the whole list of column labels that you want to drop. Also the drop method, like nearly all pandas methods, returns a new dataframe with the transformation applied and leaves the original untouched. It's not the same as mutator methods elsewhere in python.

silk axle Aug 10, 2021, 12:26 PM

#

undone flare what does `df.columns` return?

#

reproduction_rate is the 2nd one on 3rd line

undone flare Aug 10, 2021, 12:28 PM

#

yea so as Stelercus said, you will need to provide an axis

new_df = df.drop(["reproduction_rate", "female_smokers"], axis=1)
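
A slightly fuller sketch on a toy frame (the `cases` column and the absent `tests_units` label are invented): `columns=` is an alternative to `axis=1`, `errors="ignore"` skips labels that are not present, and the original frame is left unchanged.

```python
import pandas as pd

df = pd.DataFrame({"reproduction_rate": [1.1], "female_smokers": [20.0], "cases": [5]})

to_drop = ["reproduction_rate", "female_smokers", "tests_units"]  # last one is absent

# columns= avoids the axis argument; errors="ignore" skips labels that do not exist.
new_df = df.drop(columns=to_drop, errors="ignore")

print(list(new_df.columns))   # remaining columns
print(list(df.columns))       # original is untouched
```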

silk axle Aug 10, 2021, 12:28 PM

#

serene scaffold (on mobile) drop works on rows rather than columns by default, so you have to ch...

I did originally pass in the full list but it still gave the invalid key error

#

!d pandas.DataFrame.drop

arctic wedgeBOT Aug 10, 2021, 12:28 PM

#

pandas.DataFrame.drop


```python
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
```
Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding axis, or by specifying directly index or column names. When using a multi-index, labels on different levels can be removed by specifying the level. See the user guide (advanced.shown_levels) for more information about the now unused levels.

#data-science-and-ml
