#'numpy.int64' object has no attribute 'lower' - Python Dash

207 messages · Page 1 of 1 (latest)

valid glen
#

Hi I'm a newbie in Python, I don't know how to proceed I just need to put values inside a column using this code

for i in data.index:
    if x <= len(data):
        data.loc[x,['nb_prediction']] = nb.predict(vectorizer.transform([data.iat[x,1]]))[0]
        x += 1

The error that I'm getting is 'numpy.int64' object has no attribute 'lower' but this code works when I run it on an ipynb file.

Is there some other way instead of doing a loop 😓

red gazelle
#

You don't use lower anywhere in that code though?

#

Is that all the code you have?

#

And what's the full error message?

valid glen
#

is it okay if I send a python file?

red gazelle
#

If the code is short enough, you can use codeblocks as well

#

!formatting

mighty grottoBOT
#
Code Formatting

When sharing code with the community, please use the correct formatting for ease of readability.

Example

```py
YOUR CODE HERE
```

Those are back ticks not single quotes, typically the key above TAB

valid glen
#

the error on my VSC is just showing numpy.int64 object has no attribute lower 😔

red gazelle
#

No stack trace?

#

o.o

valid glen
#

i mean on the terminal sorry I'm a newbie

red gazelle
#

Do you use .lower() anywhere in that code?

valid glen
#

no sir

#

it's working on my ipynb file though but it's not working when i put it on a .py file sadpanda

red gazelle
#

What's in the "Problems" tab?

#

OHHH

#

It comes from vectorizer

#

That's expecting a string

valid glen
#

this is my ipynb files i practice there cause i run in code chunks it's way easier 😅

red gazelle
#

At least that's what stackoverflow is saying about it

#

.lower() is a string method

valid glen
#

but my target is already a string sir

#

let me check my csv file

red gazelle
#

numpy.int64 implies it's an integer

#

64bit integer, to be exact
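
For reference, a minimal sketch of how this error can arise, assuming the vectorizer is scikit-learn's `CountVectorizer` with its default `lowercase=True`. The two toy documents are made up for illustration and are not the poster's data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, invented for illustration only.
vectorizer = CountVectorizer()  # lowercase=True by default
vectorizer.fit(["work great", "do not recommend"])

# Passing text works: each document is lower-cased, tokenized and counted.
ok = vectorizer.transform(["Work great"])

# Passing an integer fails, because preprocessing calls .lower() on each document.
try:
    vectorizer.transform([np.int64(3)])
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```

So the `.lower()` call lives inside the vectorizer, which is why it never appears in the poster's own code.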

valid glen
#

ohhh

#

here's an example of my dataframe

#

is [0,1] correct?

red gazelle
#

Haven't used dataframes enough to be able to tell you

valid glen
#

ohh I see 🙏

red gazelle
#

Gonna ping @night quiver here cause they know more about NumPy than I do

night quiver
#

wassup

valid glen
#

thank youu very much 🙏

#

hii sir

#

I'm having an numpy.int64 problem but my target is a string

night quiver
#

Where are you using .lower()? None of the code you sent has it

valid glen
night quiver
#

What does data look like?

#

Not the excel spreadsheet but the pandas dataframe

valid glen
#

ooohh i haven't checked it out yet, cause it's inside a def i dunno how to break down the code to check the dataframe after uploading it sadpanda

night quiver
#

also what does data.dtypes give you
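
A quick way to answer that, sketched on a made-up frame (the column names and values here are illustrative assumptions, not the poster's real data). The same `print` calls can be dropped inside a function body to inspect the frame right after it is loaded.

```python
import pandas as pd

# Hypothetical stand-in for the uploaded data.
data = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "reviews": ["work great", "deliveri time great"],
    "review_stars": [1.0, 1.0],
})

# One dtype per column; whole-number columns show up as int64.
print(data.dtypes)

# Positions matter when indexing with .iat or .iloc, so list them explicitly.
for position, name in enumerate(data.columns):
    print(position, name, data[name].dtype)
```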

#

What are you trying to do? Some sort of naive bayes classification?

valid glen
#

i just want to label the reviews positive/negative

valid glen
night quiver
#

Okay that all looks fine I think

valid glen
#

I think this is the last problem that i will encounter sir 😂

#

after this it's just all graphs 🙏

night quiver
#

Truth be told, I'm a bit confused by your code

valid glen
#

sooryy 😭

night quiver
#

What is all this doing?

#

Also what is vectorizer?

#

Wait, what naive bayes package are you using?

#

Why are you predicting it line by line

valid glen
#

like this

night quiver
#

Hmmm this is a very strange way to do it

valid glen
#

i cant find a cleaner code in the internet so i had to improvise and it actually worked 😂

night quiver
#

How are you learning all this? Is this a school thing or are you teaching yourself?

valid glen
#

im teaching myself sir for our project in school 🙏

night quiver
#

I see

#

Ummm

#

I see a few problems with your data

#

Even before the model building

valid glen
#

yes sir

#

can I ask what the problems are sir sadpanda

mighty grottoBOT
#

@valid glen

Snorbud Uploaded Some Code

here's the whole code sir if this is okay to send here

Uploaded these files to a Gist
night quiver
#

Well first of all, the first two columns should be removed

#

They are not useful to the model

red gazelle
#

Oh

valid glen
#

ooohh

red gazelle
#

I can spot the issue to start off with as well

night quiver
#

Also you have a tiny sample size?

valid glen
#

yes sir

red gazelle
#

CountVectorizer is expecting a text document

mighty grottoBOT
#

@valid glen

Snorbud Uploaded Some Code
Attachment: finaluploadtest.csv
,Unnamed: 0,reviews,review_stars,detect,sentiment_score,difference,spam_classification,product_sentiment,biproduct_sentiment,count
0,0,i bought six month later bright white line appear i contact acer warranti i dismay appar known extra includ product i dismay expect pay ship texa repair whi becaus defect ship manufactur year do not recommend i stick appl product i done past year zero issu,-0.5,en,-0.8,0.3,ham,negative,0,1
1,1,work great put year old break everyth,1.0,en,0.62,0.38,ham,positive,1,1
2,2,it first macbook qualiti standard my issu i bought june say produc the titl mislead make think model i hope sit shelf two year look like,0.5,en,0.49,0.01,ham,positive,1,1
3,3,the item ship describ shown they awar howev notifi ship think buy this touch screen fold back shown i would give star possibl,-1.0,en,-0.3,-0.7,ham,negative,0,1
4,4,deliveri time great qualiti outstand,1.0,en,0.84,0.16,ham,positive,1,1

valid glen
#

oops

#

um just a sec

night quiver
#

But they seem to be indexing the wrong column

valid glen
#

here's a csv file sir 🙏

red gazelle
#

Yea, and it's being run on a column containing numpy.int64 instead of text
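
A small sketch of why `data.iat[x, 1]` lands on an integer here, using the header layout of the CSV posted above with two shortened rows. Looking the position up by name is one possible fix, offered as a suggestion rather than what the poster ended up doing.

```python
import io

import numpy as np
import pandas as pd

# Header and two shortened rows in the same layout as the CSV posted above.
csv_text = (
    ",Unnamed: 0,reviews,review_stars\n"
    "0,0,work great put year old break everyth,1.0\n"
    "1,1,deliveri time great qualiti outstand,1.0\n"
)
data = pd.read_csv(io.StringIO(csv_text))

# Position 1 is the second leftover index column, not the review text.
wrong_value = data.iat[0, 1]

# Looking the position up by name avoids counting columns by hand.
reviews_position = data.columns.get_loc("reviews")
right_value = data.iat[0, reviews_position]

print(type(wrong_value), type(right_value))
```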

night quiver
#

You don't need to share the csv with us it's fine lol

valid glen
#

ooooh

red gazelle
#

(also, Google Drive tends to leak your private information, such as emails/names)

valid glen
#

oops sorry

night quiver
#

@valid glen First thing you need to do before any preprocessing for the model is to make sure your data is clean and usable

#

At the moment it's not

valid glen
#

I see sir 🙏

night quiver
#

Get rid of the first two columns

#

And if you only have 5 rows in your data, your model is not going to be any good

#

Also, you need to look at the other columns as well

#

What is detect? what is count?

#

They only have one unique value which means any data that is not en or 1 will screw the model up

#

Unless you don't use those features
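
The cleanup described here could look roughly like the sketch below. The rows are shortened, invented examples in the thread's column layout, and dropping constant columns automatically is an assumption about what "fixing the data" might involve.

```python
import io

import pandas as pd

# Shortened, invented rows in the same column layout as the posted CSV.
csv_text = (
    ",Unnamed: 0,reviews,detect,biproduct_sentiment,count\n"
    "0,0,work great put year old break everyth,en,1,1\n"
    "1,1,do not recommend,en,0,1\n"
)
data = pd.read_csv(io.StringIO(csv_text))

# Drop the two leftover index columns by position.
data = data.drop(columns=data.columns[:2])

# Columns with a single unique value carry no information for a model.
constant_columns = [name for name in data.columns if data[name].nunique() == 1]
features = data.drop(columns=constant_columns)

print(constant_columns)
print(list(features.columns))
```

Writing the file with `to_csv(index=False)` avoids creating those index columns in the first place.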

valid glen
#

ooohh I use count for pie graph sir 😂

night quiver
#

ew pie graphs

valid glen
#

😆

#

okiee sir I'll fix my data first 🙏

#

Thank youu very much

night quiver
#

Get more data if you can, fix your data, split it into X and y, and do train/validation/test splits if possible; otherwise you may need to use cross-validation

valid glen
#

oohh i do have crossvalidation sir

night quiver
#

At this moment, 5 samples is nowhere near enough to do anything meaningful

#

Even with crossvalidation, only having 5 samples means your model will overfit and all folds will have crap and inconsistent performance
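
A rough sketch of that workflow with scikit-learn, assuming a `CountVectorizer` plus `MultinomialNB` setup like the one in this thread. The twelve toy reviews are invented so that the code runs; they are still far too few for a meaningful model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy reviews; a real project needs far more than this.
X = [
    "work great", "qualiti outstand", "love it", "great deliveri",
    "fast ship great", "excel product",
    "do not recommend", "screen broke", "bad qualiti", "defect ship",
    "dismay warranti", "poor product",
]
y = [1] * 6 + [0] * 6

# Hold out a test set that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A pipeline keeps vectorizer and classifier together, so the
# vectorizer is refit on the training part of each fold only.
model = make_pipeline(CountVectorizer(), MultinomialNB())

# 3-fold cross-validation on the training portion.
scores = cross_val_score(model, X_train, y_train, cv=3)
print(scores.mean())

# Final check on the held-out test set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(test_accuracy)
```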

valid glen
#

oohhh i see i see

#

i'll try to fix my data first sir and check if it might work 🙏 thank youu very much again wolfwave

night quiver
#

Good luck

valid glen
#

sorry, still not working sir crowcrying

#

i ran another .py file to check the dataframe here's what I got sir

#

oooh maybe because of this one

StringIO saves (unicode) strings in memory and therefore doesn't have an encoding. If you do need a similar object with encoding you might want to have a look at BytesIO.

valid glen
#

could there be a way to convert StringIO / utf-8 to a normal dataframe with normal strings?

night quiver
#

Why are you even using StringIO

#

I've never had to do that before

valid glen
#

it's the default code here for uploading in dash app sir 🙏

#

i just copied it 😅

night quiver
#

Huh interesting

#

Have you looked at what data looks like?

valid glen
night quiver
#

Well, do you realise that this line is referring to this column?

#

Which is why I told you to get rid of the first two columns earlier?

valid glen
night quiver
#

Read the data into pandas

#

and remove all the columns that are irrelevant

#

Your data is still not tidy so don't expect the preprocessing to work

red gazelle
#

What's that doing there?

valid glen
#

i tried putting it there 😂 maybe it would work

night quiver
#

You are trying random stuff out because you aren't trying to understand it conceptually

#

First things first, get your data cleaned properly

valid glen
#

sorry 😔 I don't really get some of the explanations

night quiver
#

Second, look at the naive bayes documentation

#

The predict method takes an array-like input and returns an array

#

So you don't have to do predict on each row one by one

#

You can take the whole series or dataframe and input it at once
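
A sketch of the "all at once" version, assuming `vectorizer` is a fitted `CountVectorizer` and `nb` a fitted `MultinomialNB` as in the thread. The training texts and the `reviews` values below are invented for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented training data standing in for the real reviews.
train_text = ["work great", "qualiti outstand", "do not recommend", "screen broke"]
train_label = [1, 1, 0, 0]

vectorizer = CountVectorizer()
nb = MultinomialNB()
nb.fit(vectorizer.fit_transform(train_text), train_label)

data = pd.DataFrame({"reviews": ["great qualiti", "do not buy", "work great"]})

# One call: transform the whole column, then predict every row at once.
data["nb_prediction"] = nb.predict(vectorizer.transform(data["reviews"]))

# Same result as predicting row by row, without the loop or the counter.
row_by_row = [nb.predict(vectorizer.transform([text]))[0] for text in data["reviews"]]
print(data)
```

Selecting the column by name (`data["reviews"]`) also sidesteps the positional indexing that caused the original error.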

#

Something else doesn't seem right about your vectorizer

valid glen
#

I'll try to understand everything sir thank you 🙏

night quiver
#

Oh I see

valid glen
#

the code above works in ipynb file sir but only in py file is not working

night quiver
#

What's your py file like

valid glen
#

i just inserted my codes inside this sir

night quiver
#

That's not your code?

valid glen
night quiver
#

I want to see your script

valid glen
#

it looks like this

#

okie sir

mighty grottoBOT
#

@valid glen

Snorbud๐Ÿป Uploaded Some Code
Uploaded these files to a Gist
night quiver
#

Okay so what's the problem with it

valid glen
#

i think the problem is with uploading the csv file

#

def parse_contents(contents, filename, date):
    content_type, content_string = contents.split(',')

    decoded = base64.b64decode(content_string)
    try:
        if 'csv' in filename:
            # Assume that the user uploaded a CSV file
            data = pd.read_csv(
                io.StringIO(decoded.decode('utf-8')))
#

maybe because it's creating it in a different format?
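
On the format question, a small sketch suggesting the upload path itself does not change the data. It simulates the upload string as a prefix, a comma, and a base64 payload, which is an assumption based on the `contents.split(',')` call above. The CSV text is invented.

```python
import base64
import io
import os
import tempfile

import pandas as pd

csv_text = "reviews,biproduct_sentiment\nwork great,1\ndo not recommend,0\n"

# Simulated upload string: prefix, comma, base64 payload (the prefix is an assumption).
payload = base64.b64encode(csv_text.encode("utf-8")).decode("ascii")
contents = "data:text/csv;base64," + payload

content_type, content_string = contents.split(",")
decoded = base64.b64decode(content_string)

# StringIO only wraps the decoded text in a file-like object for read_csv.
uploaded = pd.read_csv(io.StringIO(decoded.decode("utf-8")))

# Reading the same text from a file on disk gives an identical DataFrame.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "reviews.csv")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(csv_text)
    from_disk = pd.read_csv(path)

print(uploaded.equals(from_disk))
```

The result is an ordinary DataFrame with ordinary strings, so the column being indexed is the more likely culprit.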

night quiver
#

Oh

#

I'm not too experienced with Dash actually

valid glen
#

It's okie sir 🙏 you helped me on preprocessing my data and I am really thankful

night quiver
#

Oh yeah btw

#

You don't have to use sir or maam

valid glen
#

but im just a junior 😅

night quiver
#

Doesn't really matter

red gazelle
#

Everyone's equal here

valid glen
#

it feels awkward 😂 but could i call you bro or just wang and hex

red gazelle
#

Names are okay, but you don't have to use names in every message

valid glen
#

ooohh okie2

#

Thank yooouu agaain loootss wolfwavewolfwavewolfwavewolfwavewolfwave

#

Hope you guys have an awesomee yeearr

red gazelle
#

Wanted to ask, does the code from Dash work without adding new code into it?

#

Does it parse the csv properly?

valid glen
#

Yes it shows a table like this

valid glen
#

sorry it's a habit

red gazelle
#

Okay, so the Dash code works then, and the CSV parser works

night quiver
#

@valid glen Hex and I have been discussing a little bit more

#

But why are you splitting your original data into training and testing data, when it's already in "production"?

valid glen
night quiver
#

I mean this part
```py
data1 = pd.read_csv("C:/Users/DVF2/Desktop/Check 1-8-23/testdata.csv")

def simple_split(data,y,length,split_mark=.8):
    if split_mark > 0. and split_mark < 1.0:
        n = int(split_mark*length)
    else:
        n = int(split_mark)
    X_train = data[:n].copy()
    X_test = data[n:].copy()
    y_train = y[:n].copy()
    y_test = y[n:].copy()
    return X_train,X_test,y_train,y_test

vectorizer = CountVectorizer()
X_train,X_test,y_train,y_test = simple_split(data1.reviews,data1.biproduct_sentiment,len(data1))

X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)

nb = MultinomialNB()
nb.fit(X_train, y_train)
```

#

You are running this model in a "production" dashboard

#

Why would you need to split it into training and testing

#

Whereas if you know the model is good (from development), you can just train on the whole dataset

valid glen
#

i don't know how to get the X_train and y_train only 😅

#

im a total newbie 🙏

#

but i did save it on a model using this sir

import pickle

filename = 'nb_model'
pickle.dump(nb,open(filename,'wb'))

and load it with this one

nb = pickle.load(open(filename,'rb'))
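
One thing worth checking with this approach: pickling `nb` alone does not save the fitted vectorizer, and the loaded classifier only makes sense with the same vocabulary it was trained on. A possible way to keep the two together is a scikit-learn pipeline, sketched below with invented data. Whether the poster's script already handles this is not visible in the thread.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training data for illustration.
texts = ["work great", "qualiti outstand", "do not recommend", "screen broke"]
labels = [1, 1, 0, 0]

# The pipeline bundles the fitted vectorizer with the classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# Serialize and restore in memory; a file opened with 'wb' / 'rb' works the same way.
restored = pickle.loads(pickle.dumps(model))

# The restored object accepts raw strings directly.
predictions = restored.predict(["work great", "screen broke"])
print(predictions)
```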

#

worked fine for the .ipynb file and in .py file it was a big problem 😨

night quiver
#

Okay well, it's not that important

#

What tends to happen is you do your model development using holdout sets, whether that's train val test, k fold cv or whatever other methods.

#

After evaluating the out of sample performance, if the model is satisfactory and ready for deployment, it's typical to remove the splits and use the whole dataset for training

#

That allows the deployed model to have as much data to learn from as possible

#

@valid glen

valid glen
#

Sorry I just woke up 🙏 got it sir, I think it is easier to understand after a good sleep kek 🙏 thank yoouu

near hatch
# night quiver Hey, sorry for the ping. If you're available, would you be...

It would seem you caught me in my sleep, hah. sleepywolf

I see you already explained model evaluation, but I'd like to add that there's always a trade-off whenever you're working in model training and evaluation. If you use too much of your available data for training the model, your model will usually perform better, but your evaluation won't be reliable at all, since you did not save enough data for it. The opposite holds true as well, if you use too much of your data for model evaluation, it's probably gonna be a reliable evaluation, but the model will likely underperform since it did not train on enough samples.

The moment you're satisfied with your validation results you might want to deploy the model to production. Like wang said, you'll probably want to train using the entirety of the data available for you. This is because by then you'll already have a lower bound estimate on how that model performs. Which means your model trained using all data will generally outperform your train/validation/test model. In some sense, the development using holdout sets is a pessimistic (although necessarily so) estimate on how your final model will likely perform.
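
The two phases described above could be sketched like this, again with invented toy reviews and an assumed `CountVectorizer` plus `MultinomialNB` pipeline: first estimate performance on a holdout set, then refit the same recipe on everything for deployment.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy reviews, only so that the sketch runs.
X = [
    "work great", "qualiti outstand", "love it", "great deliveri",
    "fast ship great", "excel product",
    "do not recommend", "screen broke", "bad qualiti", "defect ship",
    "dismay warranti", "poor product",
]
y = [1] * 6 + [0] * 6

# Development: estimate out-of-sample accuracy on a holdout set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
dev_model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(X_train, y_train)
holdout_accuracy = dev_model.score(X_test, y_test)
print(holdout_accuracy)

# Deployment: same recipe, refit on every labeled example.
final_model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(X, y)

# The final model has seen at least as many words as the development model.
dev_vocab = len(dev_model.named_steps["countvectorizer"].vocabulary_)
final_vocab = len(final_model.named_steps["countvectorizer"].vocabulary_)
print(dev_vocab, final_vocab)
```

The holdout score is kept as the (pessimistic) estimate; the final model itself is no longer scored, since no unseen data is left.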

night quiver
#

@valid glen ^^^

valid glen
#

I love you guys 🫶

valid glen
#

Haiiiii wolfwavewolfwavewolfwavewolfwavewolfwave It worked now

#

Thank youuu thank youuu thank youuu 🫶
