'numpy.int64' object has no attribute 'lower' - Python Dash | Smarter Dev | Page 1

valid glen Jan 13, 2023, 12:10 AM

#

Hi I'm a newbie in Python, I don't know how to proceed I just need to put values inside a column using this code

for i in data.index:
    if x <= len(data):
        data.loc[x,['nb_prediction']] = nb.predict(vectorizer.transform([data.iat[x,1]]))[0]
    x += 1

The error that I'm getting is 'numpy.int64' object has no attribute 'lower' but this code works when I run it on an ipynb file.

Is there some other way instead of doing a loop 😓

red gazelle Jan 13, 2023, 12:12 AM

#

You don't use lower anywhere in that code though?

#

Is that all the code you have?

#

And what's the full error message?

valid glen Jan 13, 2023, 12:14 AM

#

red gazelle And what's the full error message?

hii

#

is it okay if I send a python file?

red gazelle Jan 13, 2023, 12:16 AM

#

If the code is short enough, you can use codeblocks as well

#

!formatting

mighty grottoBOT Jan 13, 2023, 12:16 AM

#

Code Formatting

When sharing code with the community, please use the correct formatting for ease of readability.

Example

```py
YOUR CODE HERE
```

Those are back ticks not single quotes, typically the key above TAB

valid glen Jan 13, 2023, 12:16 AM

#

the error on my VSC is just showing numpy.int64 object has no attribute lower 😔

red gazelle Jan 13, 2023, 12:16 AM

#

No stack trace?

#

o.o

valid glen Jan 13, 2023, 12:16 AM

#

i mean on the terminal sorry I'm a newbie

#

red gazelle Jan 13, 2023, 12:18 AM

#

Do you use .lower() anywhere in that code?

valid glen Jan 13, 2023, 12:18 AM

#

no sir

#

it's working on my ipynb file though but it's not working when i put it on a .py file sadpanda

red gazelle Jan 13, 2023, 12:19 AM

#

What's in the "Problems" tab?

#

OHHH

#

It comes from vectorizer

#

That's expecting a string

valid glen Jan 13, 2023, 12:20 AM

#

this is my ipynb files i practice there cause i run in code chunks it's way easier 😅

red gazelle Jan 13, 2023, 12:20 AM

#

At least that's what stackoverflow is saying about it

#

.lower() is a string method

valid glen Jan 13, 2023, 12:21 AM

#

but my target is already a string sir

#

let me check my csv file

red gazelle Jan 13, 2023, 12:21 AM

#

numpy.int64 implies it's an integer

#

64bit integer, to be exact
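
A minimal sketch of how this error can arise, assuming scikit-learn's `CountVectorizer` with its default settings (which lowercase each document before tokenizing). The toy corpus and the `try_transform` helper are invented for illustration:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus; any fitted vectorizer behaves the same way.
vectorizer = CountVectorizer()
vectorizer.fit(["work great", "do not recommend"])


def try_transform(value):
    """Return the AttributeError message raised by transform, or None on success."""
    try:
        vectorizer.transform([value])
    except AttributeError as exc:
        return str(exc)
    return None


# A string is lowercased and tokenized without complaint.
text_error = try_transform("Work GREAT")

# An integer cell (for example a row id) has no .lower() method.
int_error = try_transform(np.int64(3))
print(int_error)
```

So the `.lower()` call lives inside the vectorizer, not in the script, which is why searching the script for it finds nothing.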

valid glen Jan 13, 2023, 12:22 AM

#

ohhh

#

here's an example of my dataframe

#

is [0,1] correct?

red gazelle Jan 13, 2023, 12:23 AM

#

Haven't used dataframes enough to be able to tell you

valid glen Jan 13, 2023, 12:23 AM

#

ohh I see 🙏

red gazelle Jan 13, 2023, 12:23 AM

#

Gonna ping @night quiver here cause they know more about NumPy than I do

night quiver Jan 13, 2023, 12:23 AM

#

wassup

valid glen Jan 13, 2023, 12:24 AM

#

thank youu very much 🙏

#

hii sir

#

I'm having an numpy.int64 problem but my target is a string

night quiver Jan 13, 2023, 12:24 AM

#

Where are you using .lower()? None of the code you sent has it

valid glen Jan 13, 2023, 12:24 AM

#

night quiver Where are you using `.lower()`? None of the code you sent has it

i don't have any .lower() code sir sadpanda

#

night quiver Jan 13, 2023, 12:25 AM

#

What does data look like?

#

Not the excel spreadsheet but the pandas dataframe

valid glen Jan 13, 2023, 12:26 AM

#

ooohh i haven't checked it out yet, cause it's inside a def i dunno how to break down the code to check the dataframe after uploading it sadpanda

night quiver Jan 13, 2023, 12:26 AM

#

also what does data.dtypes give you
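
A rough sketch of that check, using the header of the CSV attached later in this thread with two shortened rows. Reading from `io.StringIO` here only stands in for reading the real file:

```python
import io

import pandas as pd

# Header copied from the CSV attached later in the thread; the rows are shortened.
csv_text = (
    ",Unnamed: 0,reviews,review_stars,detect,sentiment_score,difference,"
    "spam_classification,product_sentiment,biproduct_sentiment,count\n"
    "0,0,work great put year old break everyth,1.0,en,0.62,0.38,ham,positive,1,1\n"
    "1,1,deliveri time great qualiti outstand,1.0,en,0.84,0.16,ham,positive,1,1\n"
)
data = pd.read_csv(io.StringIO(csv_text))

# One dtype per column: text columns show as object (or string), row ids as int64.
print(data.dtypes)

# Pair each positional index with its column name to see what iat[x, 1] points at.
for position, name in enumerate(data.columns):
    print(position, name, data.dtypes.iloc[position])
```

With this layout the review text sits at position 2, while position 1 is an integer column.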

#

What are you trying to do? Some sort of naive bayes classification?

valid glen Jan 13, 2023, 12:27 AM

#

night quiver What are you trying to do? Some sort of naive bayes classification?

yes sir

#

i just want to label the reviews positive/negative

valid glen Jan 13, 2023, 12:28 AM

#

night quiver also what does `data.dtypes` give you

let me see sir 🙏

#

night quiver Jan 13, 2023, 12:29 AM

#

Okay that all looks fine I think

valid glen Jan 13, 2023, 12:30 AM

#

I think this is the last problem that i will encounter sir 😂

#

after this it's just all graphs 🙏

night quiver Jan 13, 2023, 12:30 AM

#

Truth be told, I'm a bit confused by your code

valid glen Jan 13, 2023, 12:31 AM

#

sooryy 😭

night quiver Jan 13, 2023, 12:31 AM

#

What is all this doing?

#

#

Also what is vectorizer?

#

Wait, what naive bayes package are you using?

#

Why are you predicting it line by line

valid glen Jan 13, 2023, 12:33 AM

#

night quiver Wait, what naive bayes package are you using?

scikit learn sir

valid glen Jan 13, 2023, 12:33 AM

#

night quiver Why are you predicting it line by line

umm I want to put a column that tells if they're positive/negative sir

#

like this

night quiver Jan 13, 2023, 12:33 AM

#

Hmmm this is a very strange way to do it

valid glen Jan 13, 2023, 12:34 AM

#

valid glen Jan 13, 2023, 12:34 AM

#

night quiver Hmmm this is a very strange way to do it

i did it with what i currently know and learned 🙏

#

i cant find a cleaner code in the internet so i had to improvise and it actually worked 😂

night quiver Jan 13, 2023, 12:34 AM

#

How are you learning all this? Is this a school thing or are you self teaching yourself?

valid glen Jan 13, 2023, 12:35 AM

#

im teaching myself sir for our project in school 🙏

night quiver Jan 13, 2023, 12:35 AM

#

I see

#

Ummm

#

I see a few problems with your data

#

Even before the model building

valid glen Jan 13, 2023, 12:36 AM

#

yes sir

#

can I ask what the problems are sir sadpanda

mighty grottoBOT Jan 13, 2023, 12:38 AM

#

@valid glen

Snorbud🐻 Uploaded Some Code

here's the whole code sir if this is okay to send here

Uploaded these files to a Gist

upload.py

View The Gist

night quiver Jan 13, 2023, 12:38 AM

#

Well first of all, the first two columns should be removed

#

They are not useful to the model

red gazelle Jan 13, 2023, 12:39 AM

#

Oh

valid glen Jan 13, 2023, 12:39 AM

#

ooohh

red gazelle Jan 13, 2023, 12:39 AM

#

I can spot the issue to start off with as well

night quiver Jan 13, 2023, 12:39 AM

#

Also you have a tiny sample size?

valid glen Jan 13, 2023, 12:39 AM

#

yes sir

red gazelle Jan 13, 2023, 12:39 AM

#

CountVectorizer is expecting a text document

mighty grottoBOT Jan 13, 2023, 12:39 AM

#

@valid glen

Snorbud🐻 Uploaded Some Code

Attachment: finaluploadtest.csv

,Unnamed: 0,reviews,review_stars,detect,sentiment_score,difference,spam_classification,product_sentiment,biproduct_sentiment,count
0,0,i bought six month later bright white line appear i contact acer warranti i dismay appar known extra includ product i dismay expect pay ship texa repair whi becaus defect ship manufactur year do not recommend i stick appl product i done past year zero issu,-0.5,en,-0.8,0.3,ham,negative,0,1
1,1,work great put year old break everyth,1.0,en,0.62,0.38,ham,positive,1,1
2,2,it first macbook qualiti standard my issu i bought june say produc the titl mislead make think model i hope sit shelf two year look like,0.5,en,0.49,0.01,ham,positive,1,1
3,3,the item ship describ shown they awar howev notifi ship think buy this touch screen fold back shown i would give star possibl,-1.0,en,-0.3,-0.7,ham,negative,0,1
4,4,deliveri time great qualiti outstand,1.0,en,0.84,0.16,ham,positive,1,1

valid glen Jan 13, 2023, 12:40 AM

#

oops

#

um just a sec

night quiver Jan 13, 2023, 12:40 AM

#

red gazelle CountVectorizer is expecting a text document

Yeah so it should only be run on the reviews column

#

But they seem to be indexing the wrong column
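
To illustrate the difference, a small sketch assuming a frame with the same two leading id columns as the shared CSV (the rows are shortened or made up). Selecting by column name avoids depending on where the column happens to sit:

```python
import io

import numpy as np
import pandas as pd

# Same leading columns as the shared CSV; rows are shortened or made up.
csv_text = (
    ",Unnamed: 0,reviews,biproduct_sentiment\n"
    "0,0,work great put year old break everyth,1\n"
    "1,1,do not recommend,0\n"
)
data = pd.read_csv(io.StringIO(csv_text))

by_position = data.iat[0, 1]     # second column: a saved row id
by_name = data.at[0, "reviews"]  # the review text, wherever the column sits

print(type(by_position).__name__, type(by_name).__name__)
```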

valid glen Jan 13, 2023, 12:41 AM

#

here's a csv file sir 🙏

red gazelle Jan 13, 2023, 12:41 AM

#

Yea, and it's being run on a column containing numpy.int64 instead of text

night quiver Jan 13, 2023, 12:41 AM

#

You don't need to share the csv with us it's fine lol

valid glen Jan 13, 2023, 12:41 AM

#

ooooh

red gazelle Jan 13, 2023, 12:41 AM

#

(also, Google Drive tends to leak your private information, such as emails/names)

valid glen Jan 13, 2023, 12:41 AM

#

oops sorry

night quiver Jan 13, 2023, 12:41 AM

#

@valid glen First thing you need to do before any preprocessing for the model is to make sure your data is clean and usable

#

At the moment it's not

valid glen Jan 13, 2023, 12:42 AM

#

I see sir 🙏

night quiver Jan 13, 2023, 12:42 AM

#

Get rid of the first two columns

#

And if you only have 5 rows in your data, your model is not going to be any good

#

Also, you need to look at the other columns as well

#

What is detect? what is count?

#

They only have one unique value which means any data that is not en or 1 will screw the model up

#

Unless you don't use those features
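
One way this cleanup might look in pandas, as a sketch on a cut-down version of the shared CSV (fewer columns, shortened rows):

```python
import io

import pandas as pd

# Cut-down version of the shared CSV: fewer columns, shortened rows.
csv_text = (
    ",Unnamed: 0,reviews,detect,biproduct_sentiment,count\n"
    "0,0,work great put year old break everyth,en,1,1\n"
    "1,1,do not recommend,en,0,1\n"
    "2,2,deliveri time great qualiti outstand,en,1,1\n"
)
data = pd.read_csv(io.StringIO(csv_text))

# Drop the two leftover index columns by position.
data = data.drop(columns=data.columns[:2])

# Columns with a single distinct value carry no signal for a classifier.
unique_counts = data.nunique()
constant_columns = unique_counts[unique_counts == 1].index.tolist()
print(constant_columns)
```

Leftover index columns like these usually come from saving a DataFrame with its index; writing the file with `to_csv(..., index=False)` avoids creating them.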

valid glen Jan 13, 2023, 12:44 AM

#

ooohh I use count for pie graph sir 😂

night quiver Jan 13, 2023, 12:44 AM

#

ew pie graphs

valid glen Jan 13, 2023, 12:44 AM

#

😆

#

okiee sir I'll fix my data first 🙏

#

Thank youu very much

night quiver Jan 13, 2023, 12:45 AM

#

Get more data if you can, fix your data, split it into X and y, do train validation test splits if possible otherwise you may need to use crossvalidation
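
A sketch of both evaluation options with scikit-learn. The twelve reviews and labels are made up for illustration, and wrapping the vectorizer and classifier in a pipeline is one reasonable choice, not something the thread prescribes:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up reviews and labels (1 = positive, 0 = negative), for illustration only.
X = [
    "work great", "great qualiti", "deliveri time great", "love it",
    "outstand product", "fast ship happi",
    "do not recommend", "break everyth", "white line appear", "bad qualiti",
    "defect screen", "wast money",
]
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# With the vectorizer inside the pipeline, it is refit on each training split.
model = make_pipeline(CountVectorizer(), MultinomialNB())

# Option 1: a single held-out test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)
holdout_accuracy = model.score(X_test, y_test)

# Option 2: k-fold cross-validation when data is too scarce for a fixed split.
cv_scores = cross_val_score(model, X, y, cv=3)
print(holdout_accuracy, cv_scores)
```

On a dataset this small the scores swing widely from split to split, so they say little about real performance.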

valid glen Jan 13, 2023, 12:46 AM

#

oohh i do have crossvalidation sir

night quiver Jan 13, 2023, 12:46 AM

#

At this moment, 5 samples is nowhere near enough to do anything meaningful

#

Even with crossvalidation, only having 5 samples means your model will overfit and all folds will have crap and inconsistent performance

valid glen Jan 13, 2023, 12:47 AM

#

oohhh i see i see

#

i'll try to fix my data first sir and check if it might work 🙏 thank youu very much again wolfwave

night quiver Jan 13, 2023, 12:49 AM

#

Good luck

valid glen Jan 13, 2023, 1:12 AM

#

sorry, still not working sir crowcrying

#

i ran another .py file to check the dataframe here's what I got sir

#

oooh maybe because of this one

StringIO saves (unicode) strings in memory and therefore doesn't have an encoding. If you do need a similar object with encoding you might want to have a look at BytesIO.

valid glen Jan 13, 2023, 1:41 AM

#

could there be a way to convert StringIO / utf-8 to a normal dataframe with normal strings?

night quiver Jan 13, 2023, 1:43 AM

#

Why are you even using StringIO

#

I've never had to do that before

valid glen Jan 13, 2023, 1:43 AM

#

it's the default code here for uploading in dash app sir 🙏

#

i just copied it 😅

night quiver Jan 13, 2023, 1:44 AM

#

Huh interesting

#

Have you looked at what data looks like?

valid glen Jan 13, 2023, 1:45 AM

#

valid glen i ran another .py file to check the dataframe here's what I got sir

yes sir this is what it shows

#

night quiver Jan 13, 2023, 1:46 AM

#

Well, do you realise that this line is referring to this column?

#

Which is why I told you to get rid of the first two columns earlier?

valid glen Jan 13, 2023, 1:47 AM

#

night quiver Well, do you realise that this line is referring to this column?

i changed it to 1 2 and 3 sir still showing int64 or float64 inside the terminal

night quiver Jan 13, 2023, 1:48 AM

#

Read the data into pandas

#

and remove all the columns that are irrelevant

#

Your data is still not tidy so don't expect the preprocessing to work

red gazelle Jan 13, 2023, 1:50 AM

#

thonks

#

What's that doing there?

valid glen Jan 13, 2023, 1:50 AM

#

i tried putting it there 😂 maybe it would work

night quiver Jan 13, 2023, 1:51 AM

#

You are trying random stuff out because you aren't trying to understand it conceptually

#

First things first, get your data cleaned properly

valid glen Jan 13, 2023, 1:52 AM

#

sorry 😔 I don't really get some of the explanations

night quiver Jan 13, 2023, 1:52 AM

#

Second, look at the naive bayes documentation

#

#

The predict method takes an array-like input and returns an array

#

So you don't have to do predict on each row one by one

#

You can take the whole series or dataframe and input it at once
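
A sketch of what predicting the whole column at once could look like, assuming a fitted `CountVectorizer` and `MultinomialNB` as in the original script. The training rows and the reviews are made up:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training data, for illustration only (1 = positive, 0 = negative).
train_text = ["work great", "great qualiti", "do not recommend", "break everyth"]
train_label = [1, 1, 0, 0]

vectorizer = CountVectorizer()
nb = MultinomialNB()
nb.fit(vectorizer.fit_transform(train_text), train_label)

data = pd.DataFrame({"reviews": ["qualiti great", "not recommend", "work great"]})

# One call: transform the whole column, predict every row, assign the result.
data["nb_prediction"] = nb.predict(vectorizer.transform(data["reviews"]))
print(data)
```

This replaces the row-by-row loop and the manual counter, and it selects the text by column name rather than by position.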

#

Something else doesn't seem right about your vectorizer

valid glen Jan 13, 2023, 1:56 AM

#

I'll try to understand everything sir thank you 🙏

night quiver Jan 13, 2023, 1:57 AM

#

Oh I see

valid glen Jan 13, 2023, 1:57 AM

#

the code above works in ipynb file sir but only in py file is not working

night quiver Jan 13, 2023, 1:58 AM

#

What's your py file like

valid glen Jan 13, 2023, 1:59 AM

#

i just inserted my codes inside this sir

#

https://dash.plotly.com/dash-core-components/upload


night quiver Jan 13, 2023, 2:00 AM

#

That's not your code?

valid glen Jan 13, 2023, 2:00 AM

#

valid glen https://dash.plotly.com/dash-core-components/upload

The uploading codes are not mine sir it's what I copied here in dash plotly

night quiver Jan 13, 2023, 2:01 AM

#

I want to see your script

valid glen Jan 13, 2023, 2:01 AM

#

it looks like this

#

okie sir

mighty grottoBOT Jan 13, 2023, 2:01 AM

#

@valid glen

Snorbud🐻 Uploaded Some Code

Uploaded these files to a Gist

upload.py

View The Gist

night quiver Jan 13, 2023, 2:03 AM

#

Okay so what's the problem with it

valid glen Jan 13, 2023, 2:03 AM

#

i think the problem is with uploading the csv file

#

def parse_contents(contents, filename, date):
    content_type, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)
    try:
        if 'csv' in filename:
            # Assume that the user uploaded a CSV file
            data = pd.read_csv(
                io.StringIO(decoded.decode('utf-8')))

#

maybe because it's creating it in a different format?
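
A small sketch suggesting that this path produces an ordinary DataFrame. The `contents` string is built by hand here to mimic the "prefix,base64 payload" shape that the snippet above splits; the exact prefix is an assumption:

```python
import base64
import io

import pandas as pd

# Hand-built stand-in for an uploaded file: "<prefix>,<base64 payload>".
raw_csv = "reviews,biproduct_sentiment\nwork great,1\ndo not recommend,0\n"
payload = base64.b64encode(raw_csv.encode("utf-8")).decode("ascii")
contents = "data:text/csv;base64," + payload

content_type, content_string = contents.split(",")
decoded = base64.b64decode(content_string)  # bytes
data = pd.read_csv(io.StringIO(decoded.decode("utf-8")))  # text -> DataFrame

# StringIO is only a file-like wrapper around the text; nothing of it remains in data.
first_review = data.at[0, "reviews"]
print(type(data).__name__, type(first_review).__name__)
```

If the cells come out as plain strings like this, the upload code is probably not the cause, and the column being indexed is the more likely culprit.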

night quiver Jan 13, 2023, 2:04 AM

#

Oh

#

I'm not too experienced with dash actually

valid glen Jan 13, 2023, 2:05 AM

#

It's okie sir 🙏 you helped me on preprocessing my data and I am really thankful

night quiver Jan 13, 2023, 2:06 AM

#

Oh yeah btw

#

You don't have to use sir or maam

valid glen Jan 13, 2023, 2:06 AM

#

but im just a junior 😅

night quiver Jan 13, 2023, 2:06 AM

#

Doesn't really matter

red gazelle Jan 13, 2023, 2:06 AM

#

Everyone's equal here

valid glen Jan 13, 2023, 2:07 AM

#

it feels awkward 😂 but could i call you bro or just wang and hex

red gazelle Jan 13, 2023, 2:07 AM

#

Names are okay, but you don't have to use names in every message

valid glen Jan 13, 2023, 2:08 AM

#

ooohh okie2

#

Thank yooouu agaain loootss wolfwave

#

Hope you guys have an awesomee yeearr

red gazelle Jan 13, 2023, 2:09 AM

#

Wanted to ask, does the code from Dash work without adding new code into it?

#

Does it parse the csv properly?

valid glen Jan 13, 2023, 2:09 AM

#

Yes it shows a table like this

valid glen Jan 13, 2023, 2:09 AM

#

valid glen

this one

#

sorry it's a habit

red gazelle Jan 13, 2023, 2:09 AM

#

Okay, so the Dash code works then, and the CSV parser works

valid glen Jan 13, 2023, 2:10 AM

#

red gazelle Okay, so the Dash code works then, and the CSV parser works

yeess

night quiver Jan 13, 2023, 2:19 AM

#

@valid glen Hex and I have been discussing a little bit more

#

But why are you splitting your original data into training and testing data, when it's already in "production"?

valid glen Jan 13, 2023, 2:22 AM

#

night quiver But why are you splitting your original data into training and testing data, whe...

I had errors when I imported a model using pickle sir, saying that the model was not fit or vectorized, but I clearly followed the instructions

night quiver Jan 13, 2023, 2:23 AM

#

I mean this part
```py
data1 = pd.read_csv("C:/Users/DVF2/Desktop/Check 1-8-23/testdata.csv")

def simple_split(data,y,length,split_mark=.8):
    if split_mark > 0. and split_mark < 1.0:
        n = int(split_mark*length)
    else:
        n = int(split_mark)
    X_train = data[:n].copy()
    X_test = data[n:].copy()
    y_train = y[:n].copy()
    y_test = y[n:].copy()
    return X_train,X_test,y_train,y_test

vectorizer = CountVectorizer()
X_train,X_test,y_train,y_test = simple_split(data1.reviews,data1.biproduct_sentiment,len(data1))

X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)

nb = MultinomialNB()
nb.fit(X_train, y_train)
```

#

You are running this model in a "production" dashboard

#

Why would you need to split it into training and testing

#

Whereas if you know the model is good (from development), you can just train on the whole dataset

valid glen Jan 13, 2023, 2:24 AM

#

i don't know how to get the X_train and y_train only 😅

#

im a total newbie 🙏

#

but i did save it on a model using this sir

import pickle

filename = 'nb_model'
pickle.dump(nb,open(filename,'wb'))

and load it with this one

nb = pickle.load(open(filename,'rb'))

#

worked fine for the .ipynb file and in .py file it was a big problem 😨
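
One plausible cause of a "not fitted" error after loading, which the thread does not confirm, is that only the classifier was pickled while the `CountVectorizer` was recreated unfitted in the new script. A sketch of saving both together as a single pipeline, with made-up training data:

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training data, for illustration only (1 = positive, 0 = negative).
train_text = ["work great", "great qualiti", "do not recommend", "break everyth"]
train_label = [1, 1, 0, 0]

# The pipeline holds the fitted vocabulary and the fitted classifier together.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_text, train_label)

# Serialized to bytes here; pickle.dump to a file works the same way.
payload = pickle.dumps(model)
restored = pickle.loads(payload)

# The restored pipeline accepts raw text directly; no separate vectorizer is needed.
predictions = restored.predict(["great qualiti", "not recommend"])
print(predictions)
```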

night quiver Jan 13, 2023, 2:31 AM

#

night quiver But why are you splitting your original data into training and testing data, whe...

@near hatch Hey, sorry for the ping. If you're available, would you be able to explain what I mean by this

night quiver Jan 13, 2023, 3:00 AM

#

Okay well, it's not that important

#

What tends to happen is you do your model development using holdout sets, whether that's train val test, k fold cv or whatever other methods.

#

After evaluating the out of sample performance, if the model is satisfactory and ready for deployment, it's typical to remove the splits and use the whole dataset for training

#

That allows the deployed model to have as much data to learn from as possible
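
The two phases described above might be sketched like this with scikit-learn. The data is made up, and using `clone` to get a fresh unfitted copy is just one way to do the final refit:

```python
from sklearn.base import clone
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up reviews and labels (1 = positive, 0 = negative), for illustration only.
X = [
    "work great", "great qualiti", "deliveri time great", "outstand qualiti",
    "do not recommend", "break everyth", "white line appear", "defect ship",
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

candidate = make_pipeline(CountVectorizer(), MultinomialNB())

# Development: estimate out-of-sample accuracy on held-out folds.
cv_scores = cross_val_score(candidate, X, y, cv=2)

# Deployment: once the estimate is acceptable, refit a fresh copy on all the data.
final_model = clone(candidate).fit(X, y)
final_prediction = final_model.predict(["great qualiti"])
print(cv_scores.mean(), final_prediction)
```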

#

@valid glen

valid glen Jan 13, 2023, 6:27 AM

#

Sorry I just woke up 🙏 got it sir, I think it is easier to understand after a good sleep kek 🙏 thank yoouu

near hatch Jan 13, 2023, 9:50 AM

#

night quiver @near hatch Hey, sorry for the ping. If you're available, would you be...

It would seem you caught me in my sleep, hah. sleepywolf

I see you already explained model evaluation, but I'd like to add that there's always a trade-off whenever you're working in model training and evaluation. If you use too much of your available data for training the model, your model will usually perform better, but your evaluation won't be reliable at all, since you did not save enough data for it. The opposite holds true as well, if you use too much of your data for model evaluation, it's probably gonna be a reliable evaluation, but the model will likely underperform since it did not train on enough samples.

The moment you're satisfied with your validation results you might want to deploy the model to production. Like wang said, you'll probably want to train using the entirety of the data available for you. This is because by then you'll already have a lower bound estimate on how that model performs. Which means your model trained using all data will generally outperform your train/validation/test model. In some sense, the development using holdout sets is a pessimistic (although necessarily so) estimate on how your final model will likely perform.

night quiver Jan 13, 2023, 9:52 AM

#

@valid glen ^^^

valid glen Jan 13, 2023, 9:57 AM

#

I love you guys 🫶

valid glen Jan 13, 2023, 12:24 PM

#

Haiiiii wolfwave It worked now

#

Thank youuu thank youuu thank youuu 🫶

mighty grottoBOT Jan 13, 2023, 12:27 PM

#

near hatch It would seem you caught me in my sleep, hah. sleepywolf ...

@near hatch has been given 8 kudos from @valid glen and 1 other member.

mighty grottoBOT Jan 13, 2023, 12:28 PM

#

night quiver @valid glen ^^^

@night quiver has been given 4 kudos from @valid glen.
