#data-science-and-ml

shut slate
#

'Index' object is not callable

#

😦

lavish swift
#

if I remember right, when I rename columns, I use a dict like

df.rename(columns = {'old_name' : 'new_name'}, inplace = True)

So if you want to use a list perhaps you can create a dict by combining df.columns and newColumns?
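A minimal sketch of that suggestion, assuming a hypothetical list `new_columns` that has one entry per existing column, in the same order. The frame and names below are made up for illustration. (The "'Index' object is not callable" error usually means `df.columns` was called like a function, e.g. `df.columns(...)`.)

```python
import pandas as pd

# Hypothetical frame and replacement names (not from the original question).
df = pd.DataFrame({"old_a": [1, 2], "old_b": [3, 4]})
new_columns = ["a", "b"]

# Pair each existing column name with its replacement, in order.
mapping = dict(zip(df.columns, new_columns))

# rename() returns a new frame; reassigning avoids inplace=True.
df = df.rename(columns=mapping)
print(list(df.columns))
```

If every column is being renamed anyway, assigning `df.columns = new_columns` directly should also work; the dict form is mainly useful when only some columns change.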

shut slate
#

Hi M00se

#

I remember you

#

You helped me b4

lavish swift
#

hey! I thought your name looked familiar 🙂

shut slate
#

what is a dict?

#

lol

lavish swift
#

dictionary

#

key/value pairs

shut slate
#

how do you create a dictionary?

#

lol

lavish swift
#

trying to work on a few things right now, maybe someone can help with that in the meantime, if not see if you can do some searching. If you're still stuck I'll try to check back in

shut slate
#

ye i am. Thanks

hollow sentinel
#
myDict = {"dog":5, "cat":6}
#

that is a dictionary

#

dictionaries have key value pairs

#
myDict["dog"]  # evaluates to 5
#

this is how you access the value of a given key
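A short sketch of the basic dictionary operations mentioned above. The keys and values are only illustrative.

```python
# Create a dictionary with two key/value pairs.
my_dict = {"dog": 5, "cat": 6}

# Look up the value stored under a key.
dog_count = my_dict["dog"]

# Assign to a key to add a new entry (or overwrite an existing one).
my_dict["bird"] = 2

# .get() returns a default instead of raising KeyError for a missing key.
fish_count = my_dict.get("fish", 0)

print(dog_count, fish_count, my_dict)
```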

#

good resource for learning dictionaries

#

take your time and understand these bc they're important

#

understanding the basic data structures is imo more important than Pandas

trim oar
#

These are very commonly used data structures

hollow sentinel
#

maybe learning linked lists/other data structures would be good too

#

before you just randomly throw yourself into Pandas

grave frost
#

he is right, you should build a desktop with your own GPU. why? firstly, the cloud is expensive to run (like colab). You won't believe how easy it is to put your credit card on GCP and forget to terminate the instance (or maybe due to a bad connection it didn't terminate), which ends up costing a lot of money.

Next up is competitions. I had the same naive mindset but soon I realized that hyperparameter tuning takes a looong time (despite using some pretty advanced stuff and Dask). You are so much better off if you build your own desktop and have it run while you sleep/watch YT. So if your end goal is competitions (kaggle, etc.) then building a desktop is probably best.

IMO the only reason you should opt for cloud is when you just can't afford to buy a desktop. AMD Ryzen 7 + RTX Titan seems a pretty solid choice (albeit expensive, you can use it for gaming, or start with a 3080 Ti and move on).

stuff like Colab only offers 16 GB GPUs, which run out pretty quickly. Since VRAM is a very limiting factor when doing DL, going for a multi-PCIe mobo is good because later you can add two 3080s or something cheaper than that

#

usually, my recommendation is cloud to everybody to get started, but since you are already making a PC, you just need an Nvidia GPU and you are good to go for both gaming and DL

shut slate
#

Ok thank you guys

serene scaffold
misty flint
# shut slate Ok thank you guys

i know you got your problem fixed but another way to tell jupyter to find your file is by running this in a separate cell
%cd [copy and paste your filepath here]

#

without the brackets lol

shut slate
#

I see

#

Thanks

iron basalt
misty flint
shut slate
#

I mean I did some basic python programs before, but if you tackle a real problem you learn more. Even if it's trial and error

misty flint
#

ye

serene scaffold
shut slate
#

like if statements, loops, lists etc

iron basalt
#

Yeah I recommend practice problems that are concrete and may force you to learn a new data structure. But going straight to pandas is like the final boss. Pandas itself is implemented with the knowledge of a bunch of data structures and python concepts (dunder methods, boolean masks, etc).

hollow sentinel
#

dunder methods are the underscore methods right?

#

like _iter

#

why is that italicized idk

iron basalt
#

(And even makes use of python's dynamic nature by allowing stuff like both strings and integers for keys).

#

dunder = double under, like __init__ is one of them.

#

They let you do things like implement [] for your custom class.

shut slate
#

Ye idk, I am doing a big data certificate and they dove straight in. Keep in mind that I did some programming b4, even if it was really basic. The people who literally never saw code b4 just copy paste from professors notes lol

#

Gl to them

grave frost
iron basalt
#
>>> class Foo:
...     def __init__(self):
...             self.x = 10
...     def __getitem__(self, key):
...             print(key, self.x)
... 
>>> foo = Foo()
>>> foo["bar"]
bar 10
>>> 
astral path
#

is there a way to run R gstat functions in python?

#

i'm trying to make an H-Scatterplot but it appears to only be implemented in R

exotic maple
iron basalt
#

yes and setitem

#

From that example it should be obvious now why Pandas can do many different things depending on the type of the key.
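A toy sketch of that point, using a hypothetical `Table` class (not pandas' actual implementation): `__getitem__` inspects the type of the key and does something different for strings and integers, and `__setitem__` handles `table[key] = value`.

```python
class Table:
    """Toy container whose [] behaviour depends on the key type."""

    def __init__(self):
        self.columns = {"a": [1, 2, 3], "b": [4, 5, 6]}

    def __getitem__(self, key):
        if isinstance(key, str):
            # String key: return one column.
            return self.columns[key]
        if isinstance(key, int):
            # Integer key: return one row as a dict.
            return {name: values[key] for name, values in self.columns.items()}
        raise TypeError(f"unsupported key type: {type(key).__name__}")

    def __setitem__(self, key, value):
        # table["c"] = [...] adds or replaces a column.
        self.columns[key] = list(value)


table = Table()
table["c"] = [7, 8, 9]
print(table["a"])
print(table[0])
```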

#

(btw numpy also does this)

misty flint
#

but pandas is like the better sibling over numpy

bitter harbor
#

those are fighting words

grave frost
# misty flint but pandas is like the better sibling over numpy

I disagree - both were made in mind with some specific core philosophies that served different needs. Oftentimes you would use both (like pandas for manipulating the DF and extracting some data in form of numpy arrays). Plus a lot of libs have direct support for numpy arrays, so can I say that numpy is superior??

bitter harbor
#

isn't pandas built on numpy?

grave frost
#

pandas is an open-source library built on top of numpy

bitter harbor
#

just making sure im not going insane 😄

#

they both have their uses and what not, if you've ever tried dealing with strings with numpy, you'll quickly understand the benefit of pandas

#

but i'd argue pandas is a lot more specific in its use cases

serene scaffold
#

I'd actually argue that pandas is more general

grave frost
bitter harbor
#

haha i've actually been doing the same for my data course work

grave frost
serene scaffold
# bitter harbor how so

Pandas is for tabular data in general. You could even use it just to read a csv and write it back to file with a different delimiter. And it has math operations, but it has string operations and database operations, too

#

Numpy is for math.
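As a rough illustration of the "read a csv and write it back with a different delimiter" case, here is a sketch that uses in-memory buffers in place of real files; the data is made up.

```python
import io

import pandas as pd

# In-memory stand-in for a comma-separated file (hypothetical data).
csv_text = "name,score\nann,1\nbob,2\n"
df = pd.read_csv(io.StringIO(csv_text))

# Write the same table back out with a semicolon delimiter.
out = io.StringIO()
df.to_csv(out, sep=";", index=False)
print(out.getvalue())
```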

oak elk
#

Got it thank you man

bitter harbor
serene scaffold
grave frost
uncut barn
#

Are there CNNs where it takes a bunch of images and outputs the probability of each being a dog or another animal, and then after that we take all the images that were predicted as a dog and feed them into another CNN to determine the probabilities of the breed of the dog? If so, can you point me to a link where someone has done it?

bitter harbor
grave frost
#

the model outputs the probabilities and we usually choose the class with the maximum score

uncut barn
#

im talking about a CNN to a another CNN

#

so predict the animal, then of those animals that were predicted as a dog feed it into another CNN to predict their breed

bitter harbor
#

you can chain models together where one's output is the input of the next

grave frost
bitter harbor
#

but you've still got 2 separate models

rough shore
#

How is it possible to learn ai?

uncut barn
#

yh I know the intermediate step is confusing me

grave frost
uncut barn
#

do i use the argmax to filter out the original dataset to get those images that were predicted as a dog?

serene scaffold
uncut barn
#

i'm talking hypothetically, i know there is an animals/ dogs and cats data set

grave frost
#

you need another one for breeds

#

I don't see how it is complicated: first model tells you its a cat or dog, second one tells you the breed. (you could compress it to one model too like dog.labrador)

uncut barn
#

do i feed the images that were dogs (which was predicted by the first CNN) by filtering out using the indices?

grave frost
#

indices of what?

uncut barn
#

indices of the images

grave frost
#

model outputs a probability distribution, not a dataset

uncut barn
#

so if the 6 th image was predicted as a dog due to the probability being the greatest, would i need to get the 6th image in the original dataset and feed that into the 2nd CNN?

ripe forge
#

But yes, argmax on that and take a subset of images that belong to the same dog class for step 2. This would be something you do during inference, at training you know which ones are dogs and so on

#

So during training you take the correct subset based on ground truth.
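A sketch of the inference-time flow described above. The two "models" are placeholder functions that return random probabilities, and the class names and image shapes are assumptions; in practice they would be two separately trained CNNs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for real data and labels (all hypothetical).
images = rng.random((6, 8, 8, 3))  # 6 tiny RGB "images"
ANIMALS = ["cat", "dog", "horse"]
BREEDS = ["labrador", "poodle"]


def animal_model(batch):
    """Placeholder for the first CNN: one probability row per image."""
    scores = rng.random((len(batch), len(ANIMALS)))
    return scores / scores.sum(axis=1, keepdims=True)


def breed_model(batch):
    """Placeholder for the second CNN, run only on predicted dogs."""
    scores = rng.random((len(batch), len(BREEDS)))
    return scores / scores.sum(axis=1, keepdims=True)


# Stage 1: predict the animal class for every image.
animal_probs = animal_model(images)
animal_pred = animal_probs.argmax(axis=1)

# Keep the indices (into the original dataset) of images predicted as dogs.
dog_idx = np.flatnonzero(animal_pred == ANIMALS.index("dog"))
dog_images = images[dog_idx]

# Stage 2: predict the breed for that subset only.
breed_probs = breed_model(dog_images)
breed_pred = breed_probs.argmax(axis=1)

for i, b in zip(dog_idx, breed_pred):
    print(f"image {i}: dog, breed {BREEDS[b]}")
```

During training the subset for the second model would come from the ground-truth labels rather than from `animal_pred`.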

uncut barn
misty flint
#

anybody use lexnlp? cant get package to install

#

probs bc of dependencies

astral path
#

How do I take a dataframe like this

#

and add a new column with the mean value of the columns count and avg_plays for each artist?

#

so for the artist Marcioz, it would have a new column with the mean of all count values in a row in which the artist is Marcioz, and a column for the avg_plays equivalent

serene scaffold
exotic maple
#

you can use pivot_table

#

or df.groupby("artist").agg({"count":sum, "avg_plays": np.average})

astral path
#

like aggregating count by artist except it's the mean

#

ok yeah yours makes sense

astral path
misty flint
#

windows

exotic maple
#

your data types are probably wrong then

#

did you check each column dtype?

astral path
#

yea

#

im changing it to df.groupby("artist").agg({"count":np.average}) because that's all i need right now

#

that would work right?

shut slate
#

Hi guys

#

Why does this not rename?

serene scaffold
#

It doesn't change the name of the existing one

shut slate
#

it changes other stuff lol

serene scaffold
#

Oh you did in place. Hmm

shut slate
#

this works

astral path
#

df.groupby("artist")['count'].transform(np.average)
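A sketch of the difference, on a made-up frame with the column names from the question: `agg` returns one row per artist, while `transform` returns a value for every original row, which is what a new per-row column needs.

```python
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({
    "artist": ["Marcioz", "Marcioz", "Other"],
    "count": [10, 20, 5],
    "avg_plays": [1.0, 3.0, 2.0],
})

# transform("mean") broadcasts each artist's mean back onto that artist's rows.
per_artist = df.groupby("artist")[["count", "avg_plays"]].transform("mean")
df["count_mean"] = per_artist["count"]
df["avg_plays_mean"] = per_artist["avg_plays"]
print(df)
```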

shut slate
#

like i want to do df.claim type.mean()

#

So I wanted to put underscore

#

But wtf lol

iron basalt
#

run print(df.columns) and show output

#

(before edit)

shut slate
#

Oh I see

#

There is a space

iron basalt
#

Ah I guessed correctly

#

I have seen this happen also when people try to select rows by name and the name has a space (like space in front of artist name).

#

Some spaces can sneak in from excel and other places.

shut slate
#

Thank you dude 🙂

#

it worked

iron basalt
#

I prefer df.rename(columns={...}), less confusing than axis=1 or whatever.
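A sketch of how a stray space can make a rename silently do nothing, and one way to clean it up. The column names are assumptions modelled on the "claim type" example; where exactly the space sat in the real data is not known.

```python
import pandas as pd

# Hypothetical frame whose headers picked up leading/trailing spaces.
df = pd.DataFrame({" claim type": [1.0, 2.0], "amount ": [3.0, 4.0]})

# Printing the list form makes the extra spaces visible inside the quotes.
print(df.columns.tolist())

# Strip whitespace from every header, then rename as usual.
df.columns = df.columns.str.strip()
df = df.rename(columns={"claim type": "claim_type"})

print(df.claim_type.mean())
```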

shut slate
#

Hi guys

#

one last question

#

Can I get the mean of a float or do I have to change it to a float?

lapis sequoia
#

I want to start using Python for scripting in text-processing

#

Is it possible to change a file contents without opening it with help of python?

exotic maple
#

try using np.mean or np.average

#

also

#

to aggregate

#

use .agg()

distant hedge
#

Hey guys, quick question. How can I store df in x and then continue editing x?

x = x.loc[x['ID' != 0] = 'Hello'
exotic maple
#

you want...a dataframe as a column of another dataframe?

distant hedge
#

I am working with excel that needs a lot of manipulations. For example we have John, Mike, and Angela. I need to edit things in their dataframes before saving for each person.

distant hedge
exotic maple
#

eh

#

just copy it?

#

x = df

#

lol

#

that or i'm not understanding

umbral sierra
#

Someone to help :p

distant hedge
distant hedge
# exotic maple that or i'm not understanding
import pandas as pd
import re

df = pd.read_csv('old.csv')

#Step1 - Finding all value in column Name beginning with Ale
x = df.loc[df['Full Name'].str.contains('^Ale[a-z]*', regex=True)]

#Step 2 - Replacing all the rows in column name by Alex
x = x.loc[x['Full Name'] != 0] = 'Alex'

#Step 3 - Getting back to original dataframe + Step 1
y = df.loc[df['Full Name'].str.contains('^Je[a-z]*', regex=True)]

#Step 4 - Step 2 but replacing by Jessica
y = y.loc[x['Full Name'] != 0] = 'Jessica'
#

So, this doesn't work... and spits out an error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-18-fc63323fe12c> in <module>
      3 
      4 #Step 2 - Replacing all the rows in column name by Alex
----> 5 x = x.loc[x['Full Name'] != 0] = ['Alex']
      6 
      7 # #Step 3 - Getting back to original dataframe + Step 1

AttributeError: 'list' object has no attribute 'loc'
serene scaffold
distant hedge
#

Sure

serene scaffold
#

Something just came up so it may be a bit

lavish swift
hollow sentinel
#

what's import re?

#

oh regex

#

cool

arctic wedgeBOT
#

Hey @distant hedge!

It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

distant hedge
#

CSV not allowed, let me upload it somewhere

distant hedge
lusty coral
distant hedge
lusty coral
#

Apply gets the row of the dataframe as a series. Then you can do whatever you want with it

coral cloak
#

so I'm splitting a string and adding each word to a list like so:

" ".join(x).split()

is there a way to NOT add a word if it doesn't meet some criteria (specifically checking if the word exists in another list)? having trouble with list comprehensions here

lusty coral
#

Via if of course

#

.... in ....

coral cloak
#

y = " ".join(x).split() if x.split() not in LIST

invalid syntax

tidal bough
#

" ".join(word for word in x if ...).split()

#

where ... is your condition.

eager umbra
#

Is there someone in here that I can ask a couple pandas questions to? I am trying figure out how to sum a column based on other column data. Kind of like a Sumifs in excel, but am having a hard time figuring it out.

tidal bough
#

word for word in x if ... here is a generator expression.
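A sketch of the filtering idea, assuming `x` is a list of strings and `LIST` holds the words to exclude (both made up here). With that assumption the condition has to be applied after splitting into words, otherwise it would test whole strings rather than individual words.

```python
from collections import Counter

# Hypothetical inputs: x is a list of strings, LIST the words to leave out.
x = ["the cat sat", "the dog sat down"]
LIST = ["the", "down"]

# A set makes the membership test fast.
excluded = set(LIST)

# Split first, then keep only the words that are not excluded.
words = [word for word in " ".join(x).split() if word not in excluded]

# Count what is left to find the most common remaining words.
most_common = Counter(words).most_common(2)
print(words)
print(most_common)
```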

lavish swift
#

Results

#

maybe verify you're reading in the right csv?

#

do a df.head()?

#

just to verify? Weird otherwise

distant hedge
lavish swift
# distant hedge That's what I have as well, however, if you try to further modify output of data...

You had mentioned when printing x it printed funny for you. I'd try to get that printing right before you move on. It should be a dataframe, and from here it certainly looks like you're doing it right. The only difference I'd mention is the file you uploaded was Old.csv and not old.csv (you have all lowercase in your code). Though that might not be an issue if you simply created a copy to post here?

distant hedge
distant hedge
lavish swift
#

huh...that's different still. The first time it was:
AttributeError: 'list' object has no attribute 'loc'

now it's talking about a 'str' object. Something is funky with x

#

so you're not seeing an error until that line you pointed out, but I think something is happening before that line that is ultimately causing the error

distant hedge
#

Odd, because I have no errors if I comment it out

#

and I only have Step 1 and 2 as lines

coral cloak
#

Thanks for the help, guys. Unfortunately, I don't seem to be getting the output I want here (a list of most common words in the data that is not in LIST)

lavish swift
# distant hedge

Ok, I THINK I figured out what was causing your issue? x = x.loc[x['Full Name'] != 0] = 'Alex' This line didn't tell pandas which column you wanted to set the value for, so x became just a string 'Alex'

#

I think what you want is something like this:

x.loc[x['Full Name'] != '', ['Full Name']] = 'Alex'

Where you're setting the Full Name of each matching row to "Alex"

#

you can probably even simplify it further since your x dataframe will only be Ale* based on your regex,

dull path
distant hedge
lavish swift
#

yeah, I had to read in the CSV again, since it was already "Alex"

distant hedge
#

I am not sure why our 1st line did not work with .loc but the second with .replace did 😆

distant hedge
lavish swift
#

sure thing! Glad ya got it working!

distant hedge
#

@lavish swift Thank you, I am trying to automate a reporting that takes 3-4 hours with pandas. I think this should be my last challenge. This error made me think that we can only use df to call dataframe ha ha.

serene scaffold
#

Ping to reply; I'm going to another channel.

granite wolf
#

Anyone got any ideas why my matplotlib graphs are printing like this?

bronze skiff
#

did you sort by datetime before plotting?

#

looking at it, you didn't

granite wolf
#

I did that previously and thought it did it automatically as when i sort manually i get plots like this lemon_thinking

#

okay so its sorting dates incorrectly

tidal bough
paper lake
# granite wolf

u should sort the dates first with the corresponding values before plotting

granite wolf
paper lake
#

is this covid?

granite wolf
#

I think it's the to_datetime part which is converting them wrong

granite wolf
paper lake
granite wolf
#

thanks for the help

#

it seems a lot like the to_datetime has converted the dates wrong as clearly the maldives hasnt recorded 17507 cases on the 2nd of december, 9 months from now
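One possible cause, sketched under the assumption that the source dates are written day/month/year: without an explicit format, a string like "12/02/2021" can be read month-first as 2 December. Passing `format=` removes the guesswork, and sorting afterwards keeps the line plot in order. The data below is invented.

```python
import pandas as pd

# Hypothetical day/month/year strings, deliberately out of order.
df = pd.DataFrame({
    "date": ["12/02/2021", "01/02/2021", "13/01/2021"],
    "cases": [17507, 16000, 14500],
})

# Parse with an explicit day-first format instead of letting pandas guess.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Sort chronologically before plotting.
df = df.sort_values("date").reset_index(drop=True)
print(df)
```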

paper lake
#

@granite wolf just remembered i have a notebook for covid

#

gotta check my drive

#

opening it on google colab now 😄

#

feel free to copy my notebook

granite wolf
#

thanks a lot 😀

misty flint
#

@paper lake aww so friendly cattohug

misty flint
lament fiber
#

Is this an appropriate channel for asking about pandas, or would that go into a different channel?

misty flint
#

yes you can ask here

lament fiber
#

If this is the correct channel, my question is about the distinction between float64 and Float64. Specifically, looking for any documentation on the latter (and e.g., why .astype("Float64").astype("Int64") works to convert a floating-point value to an integer with no errors or warnings).

#

As you can imagine, it's impossible to Google for Float64, since all the hits are for float64.

distant hedge
#

Hey guys, I am pulling my hair out. Is there a way to replace all values in a row by a string? All values are unique strings.

misty flint
native patrol
velvet thorn
#

nan vs pd.NA

lament fiber
#

So an ExtensionArray is an internal pandas thing that's kind of an abstraction of a 1D array, but just for pandas-internal types?

#

Is there a way to do something like .astype("Int64") on a Float64 ExtensionArray and have it fail, instead of returning all the values truncated to an integer? Or do I need to just use a separate thing to test if the values are actually integers vs. floats, and change behavior accordingly?

ripe forge
#

Separate thing to test

#

Astype* is an explicit instruction to coerce the values. Expecting it to then break like that would be... Inappropriate.

lament fiber
#

From my perspective, the Pythonic idea of duck-typing, "try something and see if it fails rather than testing it" would make it make more sense to fail out if it can't work, rather than changing values.

ripe forge
#

!e print(int(3.14))

arctic wedgeBOT
#

@ripe forge :white_check_mark: Your eval job has completed with return code 0.

3
native patrol
#

^

ripe forge
#

This isn't about something hypothetical or abstract. Its a literal instruction to "give me an int from this".

native patrol
#

if you don't care about floating point precision
you can use series.astype('int').eq(series).all()
or use np.isclose
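A sketch of the "separate thing to test" approach: check that every non-missing value is (close to) a whole number, and only then convert. The helper name `to_int64_strict` is hypothetical, not a pandas function.

```python
import numpy as np
import pandas as pd


def to_int64_strict(series):
    """Convert to nullable Int64, refusing values with a fractional part."""
    values = series.dropna().to_numpy(dtype="float64")
    if not np.isclose(values, np.round(values)).all():
        raise ValueError("series contains non-integer values")
    # Rounding first removes tiny floating-point noise before the cast.
    return series.round().astype("Int64")


ok = to_int64_strict(pd.Series([1.0, 2.0, np.nan]))
print(ok.dtype)

try:
    to_int64_strict(pd.Series([1.1, 2.1, np.nan]))
except ValueError as exc:
    print("refused:", exc)
```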

misty flint
ripe forge
#

So if you use a type coercion, the behaviour of just getting the int portion is more practical for normal use*

lament fiber
#

Notably, you can't do .astype("float64").astype("Int64") and have it work. So in that case, it doesn't just say "you asked for an Int64, we're giving you one regardless."

#

In that case, you get a TypeError about "cannot safely cast non-equivalent float64 to int64."

#

So if the behavior is supposed to be "astype will always return the requested type, even if that requires unsafe downconversions," it isn't consistent.

#

Some examples (using the Python bot correctly, I hope):

#

!e pd.Series([1.1, 2.1, pd.NA]).astype("Int64")

arctic wedgeBOT
#

You are not allowed to use that command here. Please use the #bot-commands channel instead.

lament fiber
#

OK, I don't have bot access, and would need to import pandas anyway. Regardless, my examples were going to be that (which fails), that with .astype("float64") in the middle (which fails), and that with .astype("Float64") in the middle (which succeeds). This is a bigger difference between float64 and Float64 than just "one uses np.nan, the other uses pd.NA."

#

And maybe should be documented somewhere, if this is the intended behavior.

distant hedge
#

Can someone help me to figure out why when I define variable at df, it converts it into a string?

ripe forge
#

It probably is documented somewhere I assume. The only thing I'd add is, this would be pandas specific decisions, and you may find further surprises as you explore the library. Fair heads up.

#

Not all python semantics will be consistently used by pandas. It makes its own set of assumptions

ripe forge
#

At that point the variable x has no relation to the df, you assigned Alex to it*

distant hedge
ripe forge
#

Okay. And did your current code not do it?

#

Just for context, x = blahblah = "Alex" is evaluated as blahblah = "Alex" and then separately x = "Alex" so don't chain assignments like that, that variable x is useless for you.

#

You might as well remove that entire line
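A small pure-Python sketch of that behaviour, using a dict in place of the dataframe: a chained assignment binds the same right-hand value to every target, so `x` ends up as the string, not as the container.

```python
# Stand-in for the dataframe: a plain dict (hypothetical data).
record = {"Full Name": "Aleksey"}

# One value, two targets: both receive the string "Alex".
x = record["Full Name"] = "Alex"

print(type(x).__name__, x)  # x is a str, not the dict
print(record)
```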

distant hedge
#

I am using x because i am working on a large dataframe that has many people and I have to do specific changes to each person's results.

#

In that sense, what I am trying to achieve is to have Alex = filtered data frame by the person that is assigned.
Right now step 1 is to find occurrences when "Ale" is at the beginning of the name and then store it in df.
Second step is to take that df, and replace all column cells by "Alex" instead of "Aleksandu, Michel", "Aleksey, Jess", "Alexxandr", etc.

ripe forge
#

All that doesn't matter to me. I'm telling you that the way you wrote the syntax, x is literally the string alex

#

And this is because you explicitly wrote syntax that works that way. Which means x is clearly not what you "hoped" it would be assigned to. There's a mismatch between what you wanted to do and the syntax you wrote.

distant hedge
#

But it works without x right below that code

ripe forge
#

Yes. It even works with x too. For an explanation of what the syntax did read my earlier message

#

All I'm saying is, you did not put a dataframe in x. I know you wanted to.. But that's not what you wrote for python.

distant hedge
#

ahhh I see, had to give it a second look. In that case, is there a way to fix it?

ripe forge
#

Yes. Just assign to x separately in a new line

#

Don't use chained assignments.

#

Btw these kind of errors are called "logical" errors. Very tough to spot because it's essentially valid syntax that doesn't match what you wanted to write. But once you understand that these kinds of issues can happen, it makes it a lot easier to spot one later.

distant hedge
#

Thank you Darr. That's a lesson learned. I am still struggling to do what I want. :/

#

I am not sure how else I could overwrite all cells in a column

ripe forge
#

Okay. I'll take a guess as to what I think you needed. In cell 3 you're indexing using a boolean array. Save that boolean array separately

#

(ps. I'm on phone so typing code is not really easy. So you'll have to help me out here a bit)

distant hedge
#

No worries haha

ripe forge
#

So see the part inside the square brackets? Just assign that separately to a variable

#

Maybe give it a name, alex_indexer or whatever name makes sense to you

#

Then cell 4 shouldn't exist, and you can directly use cell 5. That updates the df

#

Afterwards, if you only wanted the rows with df with Alex, you just write x = df[alex_indexer]

#

I assume this is what you wanted.
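A sketch of the indexer approach described above, on made-up data. The name `alex_indexer` follows the suggestion in the chat; the frame contents are assumptions.

```python
import pandas as pd

# Hypothetical data shaped like the example in the discussion.
df = pd.DataFrame({
    "Full Name": ["Aleksey, Jess", "Alexxandr", "Jessica M", "Mike"],
    "Score": [1, 2, 3, 4],
})

# Save the boolean mask separately so it can be reused.
alex_indexer = df["Full Name"].str.contains(r"^Ale[a-z]*", regex=True)

# Update the original frame directly: rows from the mask, one named column.
df.loc[alex_indexer, "Full Name"] = "Alex"

# Afterwards, take the subset on its own line (copy to keep it independent).
x = df[alex_indexer].copy()
print(x)
```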

distant hedge
distant hedge
ripe forge
#

Hm. Could you reshare current code and outputs, and then perhaps describe what you needed?

distant hedge
#

Sure, just a sec 🙂 I will need to clean it up or I will confuse you.

ripe forge
#

Hi jae, we don't allow self promotion or request for jobs on our discord.

misty flint
#

!rule6

#

hmm

ripe forge
#

Needs a space.

misty flint
#

!rule 6

arctic wedgeBOT
#

6. No spamming or unapproved advertising, including requests for paid work. Open-source projects can be shared with others in #python-general and code reviews can be asked for in a help channel.

misty flint
#

ah thanks

distant hedge
#

@ripe forge

ripe forge
#

Ah see, okay. So question, did df get updated so far?

#

Also I don't think your x.index.isin part makes any sense whatsoever.

#

You're doing an isin on x

distant hedge
#

Yes, multiple times, since I needed to delete rows and rename some and delete people by age, etc.

ripe forge
#

Er no, let me rephrase

#

Does that cell 7 where you update x modify df in the code shown as is? Explicitly the df variable

#

I also think we might need better names for your variables. That can help while talking about it 😅

distant hedge
#

I've been banging my head against the wall for the past 6hrs with this. My brain is barely alive.

misty flint
#

🕯️

distant hedge
ripe forge
#

Okay. But yeah this isn't how I'd be updating df

#

I mentioned my approach earlier, using an indexer. But I suppose I'll share the core principle or tip

#

If you want to update a df, you should work on that df directly. Easiest way to avoid headaches

distant hedge
ripe forge
#

Ah Cool.

#

So you should have all you need for now then, yeah? Your df is being updated properly so I guess you can keep going

distant hedge
misty flint
ripe forge
#

Np! I hate to break it to you, that line is not ideal at all, I'm like 99% sure x.loc['Full Name'] = "Alex" would have worked just fine

#

But yeah 😅

paper lake
misty flint
#

ah that reminds me

#

i didnt finish working through some pandas exercises from this morning

#

should i do them now or continue to read about stats

#

fun fact: the pandas exercises helped me do well on my dataframes quiz

#

stats it is

distant hedge
ripe forge
#

If you want, give it a try now for good luck

distant hedge
#

When I use x.loc['Full Name'] = "Alex" it renames all the other columns to Alex

ripe forge
#

Because I can't take guarantees for the code you had before.

distant hedge
#

I will show you with clean code.

misty flint
#

good luck

distant hedge
#

clean code

ripe forge
#

Cool, ty

#

Guess it's just going to still be treated as a chain. Good thing the warning shows up

misty flint
#

anybody seen this before? its an open source book

#

might check out chapter two since it dives more into the math

ripe forge
#

Yep it's good

iron basalt
misty flint
iron basalt
#

Starts with curve fitting, probability, decision theory, etc.

#

DL should be pretty straight forward after that book.

#

(And the many non-DL things covered too)

misty flint
#

i like that it starts with that

#

thanks bud, ill add this one to the list. but i might get the physical copy since it looks pretty promising

misty flint
#

oh R has some nice stats functions

misty flint
#

ive found that the most effective way for me to get through a textbook is the pomodoro technique

#

else i just find my brain doesnt want to keep reading lol

hollow sentinel
#

That works very well

misty flint
#

yeah

#

i used it before

misty flint
#

but now im doing it more religiously bc of the learning how to learn course

misty flint
paper lake
misty flint
#

anyway

#

you set a timer to work for a certain amount of time

#

completely, no distractions

#

then afterwards in your break

#

you can check your phone, etc.

hollow sentinel
#

I like to work on things for 2 hours

#

and then take a break for one hour

misty flint
#

i cant focus for that long

misty flint
#

so i do it the traditional way

hollow sentinel
#

I might try it that way too

misty flint
hollow sentinel
#

instead of just endlessly coding

paper lake
#

and now i am procrastinating :((

hollow sentinel
#

Eh I mean I do take breaks

#

I'm on this server a lot

misty flint
#

🕯️

#

im on my break

#

ok im off

paper lake
#

same byeeeee

misty flint
#

check out this course when you have the time. super helpful

paper lake
#

@misty flint thanks!! now i can use my freebie lol

autumn veldt
#
X = data.iloc[:, 0:-1].values
y = data.iloc[:, 8]
.
.
.
#tree viz and tree text
graph= Source(tree.export_graphviz(clf, feature_names=X.columns,   class_names=True,
                                     filled=True))
display(SVG(graph.pipe(format='svg')))
print('\n')
tree_root = export_text(clf)
print(tree_root)

i got an error when i try to visualize my tree. it says 'numpy.ndarray' object has no attribute 'columns'. im using a dataset with 8 feature columns and 1 for the target class.
what should i do for this kind of problem?

lapis sequoia
autumn veldt
#

im removing .values, but i got this instead

#

actually this problem can be solved if i use data = data.apply(le.fit_transform). the reason why i use iloc instead of apply for encoding my labels is because i want to encode only my predictor variables, not my target variable
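One way this could be arranged, sketched on a tiny invented dataset: keep `X` as a DataFrame (no `.values`) so `X.columns` still exists, and encode only the predictor columns. The column names and data are assumptions; `export_text` is used here so the sketch does not depend on graphviz.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: predictors first, target in the last column.
data = pd.DataFrame({
    "colour": ["red", "blue", "red", "blue"],
    "size": ["s", "s", "l", "l"],
    "label": ["no", "no", "yes", "yes"],
})

# Keep X as a DataFrame so the column names survive.
X = data.iloc[:, :-1]
y = data.iloc[:, -1]

# Encode each predictor column separately; the target is left untouched.
X = X.apply(lambda col: LabelEncoder().fit_transform(col))

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# feature_names expects a plain list of strings.
tree_text = export_text(clf, feature_names=list(X.columns))
print(tree_text)
```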

sand sluice
#

is there any way to blur an image using opencv while avoiding specific points?

#

i want to blur this without the black bar

#

with a full blur, this happens

misty flint
#

interesting

mortal pendant
#

Hey! I have a small private Discord bot and I'd like to make an AI (I presume an LSTM RNN based on my research) for generating messages on request based on those from people in my server (to clarify, I'll only be collecting data from people who give consent, likely through a reaction to a message that fully explains it, as the last thing I want to do is run into any legal issues). The problem is, while I've been able to make a simple RNN in the past, the way I've previously done it can't be trained and generate data over time, which I would want for this. I'm struggling to understand LSTM, though, and I find I learn better in practise. So, I'm wondering if any of you's would know any good starting points for this? It's also worth noting I do have a MySQL database, which for live training I assume would be better for saving training data, so if there's anything that would be able to use this, that would be great! So, pretty much, the optimal tutorial or module if one exists that would handle most of this for me would be for training and using a live async-compatible model for text generation that is able to save and use the data to and from a SQL database for efficiency. If anyone knows any good starting points for this, please let me know as anything will help!

lean ledge
mortal pendant
#

I've seen GPT2 before, but didn't think it would work for something like this as I thought that required a starting point, which would be cool as a secondary option but I'm mainly looking for it to be purely based on the data it has

#

Never seen huggingface before so I'll have a look at that later, thanks 👍

lean ledge
mortal pendant
#

I've just had a quick look at it since I've got to go in a few minutes, and it does look really good so tysm 😄

lean ledge
#

Nw

mortal pendant
# lean ledge You just need to condition the starting point to either the probability distribu...

Oh sorry that's not quite what I'm looking for, if I understand you correctly; would I have to use something like GPT2 if I use huggingface or can huggingface generate text purely based on the input data? I don't want any external sources, and iirc GPT compares the input data to data it has found elsewhere on the internet and adds to the input data based on what it has from the public internet

#

I do have to go now though so I'll have a look later. If you have any more information I should know, please let me know and I'll see it later 👍

lean ledge
#

Training it on small datasets like a discord chat would leave it deficient in its ability to generate coherent text that it hasn't already seen before

#

GPT doesn't inherently "compare" anything, it learns relationships between words and the probability distributions of the words that show up

#

To understand language, it needs a lot of data on that language

turbid willow
grave frost
#

agree with raggy, you would have to fine-tune a model @mortal pendant . You could try with a simple Keras model with a transformer block (multi-head attention and a FCN) and judge the output for yourselves. You would immediately notice that the output is not always good (as in the model barely gets even the grammar correct, forget the output).

but all the above points would be invalid if you have a ton of data to train and large GPUs to throw at it

lavish tundra
#

does anyone know if it's possible to use pandas to create a column that holds a list of strings, and then select one string by its position in the list?

golden turtle
#

hi, I'm having a problem with opencv: when I use the webcam and do anything with the frames, the webcam feed is very laggy
is it because of my PC components? it isn't the worst one

velvet thorn
#

but with basic numpy you can mask and apply a blur kernel

arctic wedgeBOT
#

Hey @lapis sequoia!

It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

gray galleon
#

Hey guys, does anyone have a defined dataset/link for Fridge Item Recognition??

iron mango
#

Trained my first CNN, how do I figure out its accuracy value? People say " My model has 80% accuracy" how do I find that?

tidal bough
#

Well, there's val_accuracy (accuracy on the validation dataset) in your screenshot.

iron mango
#

it all has different values for each epoch. How do I find the final accuracy % ?

ripe forge
#

The last one is the final one
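
A minimal sketch of pulling that number out in code, assuming the usual Keras setup where `model.fit` returns a history object whose `.history` attribute is a dict of per-epoch lists. The dict below is a made-up stand-in with illustrative numbers, not the values from the screenshot.

```python
# Stand-in for `history.history` as returned by Keras `model.fit`;
# the numbers are made up for illustration.
history = {
    "accuracy":     [0.52, 0.68, 0.77, 0.84, 0.90],
    "val_accuracy": [0.50, 0.66, 0.74, 0.80, 0.79],
}

val_acc = history["val_accuracy"]

final_val_acc = val_acc[-1]                    # accuracy after the last epoch
best_val_acc = max(val_acc)                    # best epoch seen during training
best_epoch = val_acc.index(best_val_acc) + 1   # epochs are 1-based in the log

print(f"final val_accuracy: {final_val_acc:.0%}")
print(f"best val_accuracy:  {best_val_acc:.0%} (epoch {best_epoch})")
```

The last validation value is the one people usually quote; if the best epoch was earlier, that is a hint the model started overfitting after it.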

iron mango
#

So I should say that my model has 79% accuracy?

grave frost
#

if you train it for more epochs, the training accuracy would eventually get to 100% (the model would just be overfitting)

iron mango
#

oh...

#

how do I fix that?

#

Feature Extraction part

CNNmodel.add(tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(200,200, 3)))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.BatchNormalization())
CNNmodel.add(tf.keras.layers.Conv2D(16, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.Conv2D(32, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.Conv2D(64, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.Conv2D(64, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))

Neural Network - For classification

CNNmodel.add(tf.keras.layers.Flatten())
CNNmodel.add(tf.keras.layers.Dense(512, activation='relu'))
CNNmodel.add(tf.keras.layers.Dropout(0.7))
CNNmodel.add(tf.keras.layers.Dense(128, activation='relu'))
CNNmodel.add(tf.keras.layers.Dropout(0.5))
CNNmodel.add(tf.keras.layers.Dense(64, activation='relu'))
CNNmodel.add(tf.keras.layers.Dropout(0.3))
CNNmodel.add(tf.keras.layers.Dense(4,activation='softmax'))

#

I am trying for at least 85% accuracy

grave frost
#

you have to put dropout on the conv layers too 🙂 and reduce the maximum dropout from 0.7 to something more like 0.4 or 0.3, or you would restrict the network and create a bottleneck

deft dawn
#

How do I build a WebGIS with machine learning? I need source code for extracting unstructured data into a database
#internals-and-peps

grave frost
grave frost
#

Forgive me if I am wrong, but isn't AUC a metric to maximize? if that's the case, then you should let your model run some more

#

TBH, your AUC looks kinda rocky. what set is it evaluated on? and what is your train_acc and val_acc? @pure pond

mystic orchid
#

Hi guys, please help with some ideas for a CV project

grave frost
mystic orchid
grave frost
#

vehicle detection, vehicle number plate detection, counting people etc.

mystic orchid
#

thanks)

serene scaffold
candid sable
#

anyone familiar with image labelling for face detection? I want to do a similar style of labelling for the shape of a bone and I'm having trouble finding resources/software

serene scaffold
#

I don't have personal experience with that, but I can take a look if you'd like

misty flint
#

did someone tag me?

serene scaffold
misty flint
#

think it was a ghost ping

safe tapir
#

maybe meta for this chat:
can anyone talk to how they use Python with Julia, places where the latter might be a good drop-in replacement, etc?

misty flint
#

if foxxy was here, they could probs answer that

#

before sorting

#

after sorting

hollow sentinel
#

interesting

misty flint
#

the lesson here is if your matplotlib plot looks funky

#

it's probably not sorted
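
A small sketch of the fix, assuming the plot is a line plot of `y` against `x`. The data here is hypothetical; the point is just that the pairs get sorted by `x` before they are handed to matplotlib.

```python
# Hypothetical unsorted data: x values arrive out of order.
x = [3, 1, 4, 2]
y = [9, 1, 16, 4]

# Sort the (x, y) pairs by x so a line plot connects points left to right.
pairs = sorted(zip(x, y))
x_sorted = [px for px, _ in pairs]
y_sorted = [py for _, py in pairs]

# x_sorted and y_sorted can now be passed to matplotlib's plt.plot(...)
print(x_sorted, y_sorted)
```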

mortal pendant
# lean ledge Training it on small datasets like a discord chat would leave it deficient in it...

Well, I've done something similar, but it wasn't live: I simply used the bot to put loads of messages into a JSON file and then trained textgenrnn on them, filtering them down a lot too, to avoid really short messages and messages containing links, embeds and whatnot. I still ended up with pretty good output, and this was ages ago when my Discord server had pretty much just been created. So I would imagine it would only be better now

#

Though for that I was processing it on a Google Colab notebook, so it might have scored better due to performance, but I wasn't using the GPUs, and the VPS I'm using for the bot isn't too bad performance-wise, though it doesn't have GPUs

lapis sequoia
#

Hey guys anyone has a good formula for number of bins that I should use in a histogram?
Currently doing this:
int(math.sqrt(len(df)))
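
For comparison, a hedged sketch of a few standard alternatives to the square-root rule. It uses NumPy's `np.histogram_bin_edges`, which accepts named estimators such as `"sturges"`, `"fd"` (Freedman-Diaconis) and `"auto"`; the `values` array is a random stand-in for one numeric column of the dataframe.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=500)   # hypothetical stand-in for one DataFrame column

# The rule from the question: square root of the number of observations.
sqrt_bins = int(math.sqrt(len(values)))

# Named estimators built into NumPy; each call returns the bin edges,
# so the number of bins is one less than the number of edges.
sturges_bins = len(np.histogram_bin_edges(values, bins="sturges")) - 1
fd_bins = len(np.histogram_bin_edges(values, bins="fd")) - 1
auto_bins = len(np.histogram_bin_edges(values, bins="auto")) - 1

print(sqrt_bins, sturges_bins, fd_bins, auto_bins)
```

The same estimator names can be passed as `bins=` to `np.histogram`, so there is no need to compute the count by hand.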

iron mango
#

Feature Extraction part

CNNmodel.add(tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(200,200, 3)))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.BatchNormalization())
CNNmodel.add(tf.keras.layers.Dropout(0.4))
CNNmodel.add(tf.keras.layers.Conv2D(16, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.Dropout(0.4))
CNNmodel.add(tf.keras.layers.Conv2D(32, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.Dropout(0.4))
CNNmodel.add(tf.keras.layers.Conv2D(64, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.Dropout(0.3))
CNNmodel.add(tf.keras.layers.Conv2D(64, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.Dropout(0.3))

Neural Network - For classification

CNNmodel.add(tf.keras.layers.Flatten())
CNNmodel.add(tf.keras.layers.Dense(128, activation='relu'))
CNNmodel.add(tf.keras.layers.Dropout(0.4))
CNNmodel.add(tf.keras.layers.Dense(64, activation='relu'))
CNNmodel.add(tf.keras.layers.Dropout(0.3))
CNNmodel.add(tf.keras.layers.Dense(4,activation='softmax'))

#

I am getting weird spikey loss graphs for this.... any suggestions please??

mortal pendant
#

It's taking a while but it's already at 50000 which sounds like a fairly large dataset to me 😅 To clarify, though, that is before filtering, so I would likely actually use a lot less of this

hoary wigeon
#

hello

#

I fetched tabular data (COVID-19 Worldometer) from an HTML file.
I have created a dataframe from the data.
I want to change the index names. How can I do that?

dark lake
#

Hi. Is there an inference model in e.g. scikit-learn I can use to classify a single variable into groups?

grave frost
exotic maple
#

you mean setting a column as an index?

hoary wigeon
exotic maple
#

you can use the set_index() method

hoary wigeon
#

i meant rename columns

#

before, the names contained commas and slashes

exotic maple
#

if you want to rename ALL columns I suggest you just pass a list of names directly to the attribute

#

df.columns = [COLUMNS here]

hoary wigeon
#

yeah i passed dict

exotic maple
#

if you want to rename a few columns

#

you can do it via the rename method

hoary wigeon
#
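```py
import pandas as pd

d = pd.read_html(html_filename)   # returns a list of DataFrames, one per table
df = pd.DataFrame(d[0])           # take the first table
i = 0
col_r_name = {}
# map every old column name to data_0, data_1, ...
for col_name in df.columns:
    col_r_name[col_name] = 'data_' + str(i)
    i += 1
df_new = df.rename(columns=col_r_name)
print(df_new)
```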
exotic maple
#

df.rename(columns={"column old name":"column new name"})

hoary wigeon
#

yeah

hoary wigeon
exotic maple
#

why a for loop though? remember that pandas is vectorized, you can broadcast all the new names at once

hoary wigeon
#

?

#

i didn't understand what u just said

#

i used the for loop for storing the columns' old and new names

exotic maple
#

for col_name in df.columns:
    col_r_name[col_name] = 'data_'+str(i)
    i+=1
df_new = df.rename(columns=col_r_name)

hoary wigeon
#

oh

#

yeah

exotic maple
#

i think that's ok to generate the names, but the last line is inefficient

#

you can just broadcast it all at the end

hoary wigeon
#

oh lemme try

#

wait

#

i did the same

exotic maple
#

You could do this perhaps (I havent tried dictionary comprehension in ages)

newcols = {column:column+"_" for column in list(df.columns)}

df.rename(columns=newcols)
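
Roughly, the two approaches from this thread side by side, as a sketch. The frame and its column names are hypothetical stand-ins for the scraped table; `data_0`, `data_1`, ... follow the naming used in the loop above.

```python
import pandas as pd

# Hypothetical frame with awkward column names like the scraped table.
df = pd.DataFrame([[1, 2, 3]], columns=["Country,Other", "Total/Cases", "New Cases"])

# Option 1: build the old -> new mapping in one comprehension, then rename.
mapping = {old: f"data_{i}" for i, old in enumerate(df.columns)}
df_renamed = df.rename(columns=mapping)

# Option 2: replace every column name at once by assigning a list.
df_assigned = df.copy()
df_assigned.columns = [f"data_{i}" for i in range(df.shape[1])]

print(list(df_renamed.columns))
```

Note that `rename` returns a new frame unless `inplace=True` is passed, so the result has to be assigned to something.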
#

fk lol

hoary wigeon
#

haha

#

hello @exotic maple

#

can we convert all data to lowercase ?

#

at once

exotic maple
#

just throw .lower()

hoary wigeon
#

what about numeric ?

#

no errors ?

exotic maple
#

...that doesnt make any sense lol

#

ehm

hoary wigeon
#

ohk

exotic maple
#

let me think

#

you can probably do it via apply

#

but you'd need to define a new function to just get all the "object" columns

#

apply .lower()

#

and then update the values

hoary wigeon
#

i will use column name = column.lower()

exotic maple
#

you can also be a lazy fuck and do it like this too, by column lmao

#

df[column] = df[column].apply(lambda x: x.lower())

hoary wigeon
#

yeah

exotic maple
#

ideally you'd define a real function for it though
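
Something like this could be that "real function", as a sketch under a couple of assumptions: the helper name `lowercase_text_columns` is made up, numeric columns are skipped with `pd.api.types.is_numeric_dtype`, and non-string values inside text columns (e.g. NaN) are left alone.

```python
import pandas as pd

def lowercase_text_columns(frame):
    """Return a copy of `frame` with every string value lowercased.

    Numeric columns are skipped; non-string values are left untouched.
    """
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            continue
        out[col] = out[col].map(lambda v: v.lower() if isinstance(v, str) else v)
    return out

# Hypothetical sample data mixing text and numeric columns.
df = pd.DataFrame({
    "name": ["USA", "India"],
    "cases": [10, 20],
    "region": ["North America", "Asia"],
})
lowered = lowercase_text_columns(df)
print(lowered)
```

This answers the "what about numeric?" question: numbers never reach `.lower()`, so there is no error.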

hoary wigeon
sand sluice
untold viper
#

how do I perform Canny edge detection on processed_img?
the commented-out edge detection code gives me this error: pip-req-build-wvn_it83\opencv\modules\imgproc\src\canny.cpp:829: error: (-215:Assertion failed) _src.depth() == CV_8U in function 'cv::Canny'
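
The assertion `_src.depth() == CV_8U` means `cv::Canny` wants an 8-bit unsigned image. A minimal sketch of one way to get there, assuming `processed_img` is a float (or otherwise non-uint8) NumPy array; the helper `to_uint8` and the random test image are made up for illustration.

```python
import numpy as np

def to_uint8(img):
    """Rescale an array of any numeric dtype to 0..255 and cast to uint8."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:                       # flat image: avoid dividing by zero
        return np.zeros(img.shape, dtype=np.uint8)
    scaled = (img - lo) / (hi - lo) * 255.0
    return scaled.round().astype(np.uint8)

# Hypothetical float image standing in for processed_img.
processed_img = np.random.default_rng(0).random((4, 4))
img8 = to_uint8(processed_img)

# img8 can now be handed to cv2.Canny(img8, threshold1, threshold2)
print(img8.dtype, img8.min(), img8.max())
```

If the image is already in the 0..255 range and only the dtype is wrong, a plain `processed_img.astype(np.uint8)` would be enough.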

iron mango
#

InvalidArgumentError: Can not squeeze dim[1], expected a dimension of 1, got 4
[[node Squeeze (defined at <ipython-input-65-bdcef4f4c42e>:1) ]] [Op:__inference_train_function_75365]

Function call stack:
train_function

#
*How do I sort this out??*
serene scaffold
#

!paste

arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

distant hedge
#

Guys, I am geeking out on Regular Expressions (`import re`). If you are new to Python like me, I think you will enjoy it sooo much in any code you write. It will become a default import for me for anything I work on. 😄

serene scaffold
#

what are you working on where you're finding them helpful?

iron mango
#

only this step is causing the error, the rest of the CNN training part went smoothly

#

this was the second last step of data augmentation

serene scaffold
# iron mango

if you can copy and paste the text into the paste bin, I might be able to help.

serene scaffold
#

There's something about six frames though? I don't know what that means.

iron mango
#
history2 = CNNmodel.fit(train_dataset, validation_data=test_dataset, epochs = 15)
Epoch 1/15
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-96-9df2d3960108> in <module>()
----> 1 history2 = CNNmodel.fit(train_dataset, validation_data=test_dataset, epochs = 15)

6 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

InvalidArgumentError:  Can not squeeze dim[1], expected a dimension of 1, got 4
     [[node Squeeze (defined at <ipython-input-65-bdcef4f4c42e>:1) ]] [Op:__inference_train_function_75365]

Function call stack:
train_function
sonic raft
#

Hi guys! Is there any important difference between those two Pytorch tensor mean functions?

tensor.mean()

vs

tensor.mean((-1,-2))

?

tidal bough
#

-1 and -2 means the last 2 dimensions.

grave frost
tidal bough
#

If your tensor has more than 2 dimensions, these are very different - as in, the former will always produce a scalar, whereas the latter will produce a tensor with the same dimensions as the input except the last 2 dimensions(which are averaged over).

sonic raft
tidal bough
#

Like, if your tensor is 5,10,20, the former would average over all 1000 cells and return a scalar, whereas the latter would return a 1d tensor of shape 5, with each cell in it being the mean of 200 cells of the original tensor

#

like, means_tensor[i] would be equal to torch.mean(tensor[i,:,:])

tidal bough
#

WDYM?

#

3d tensor got averaged over 2 last dimensions and became 1d.

sonic raft
tidal bough
#

yeah, precisely. You'd go from, say, 50,1920,1080 (50 one-channel (grayscale, presumably) images of 1920x1080 pixels) to 50, - each element being an average of all pixels in that image
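
A quick sketch of the shapes involved. It uses NumPy as a stand-in so it runs without PyTorch installed; `arr.mean(axis=(-1, -2))` here plays the role of `tensor.mean((-1, -2))` in PyTorch, which reduces over the same two dimensions.

```python
import numpy as np

# 5 "images" of 10x20 cells each, filled with 0..999.
tensor = np.arange(5 * 10 * 20, dtype=np.float64).reshape(5, 10, 20)

overall = tensor.mean()                   # one number over all 1000 cells
per_slice = tensor.mean(axis=(-1, -2))    # shape (5,): one mean per 10x20 slice

print(overall)           # a single scalar
print(per_slice.shape)   # (5,)
```

Each entry `per_slice[i]` equals `tensor[i].mean()`, which is the `means_tensor[i]` relation described above.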

lapis sequoia
#

hey guys, does anyone have a clue how to set up Jupyter notebooks in PyCharm Pro so that it is cell-based? so far there is an editor on the left side and a preview on the right side

tidal bough
#

then you can run stuff

lapis sequoia
#

it isn't about running things

#

it is about the editor's style

#

hate this so much, so confusing

#

@tidal bough

tidal bough
#

you can toggle what's shown, IIRC

#

for example, only show the output in the previews

lapis sequoia
#

the thing is, i want it to look and feel like jupyter; the only reason i don't use jupyter is the awful completion recommendations and the lack of a helper preview

#

whereas vs code still has that, not that great either, so far nothing beats pycharm, but it is so aesthetically confusing

tidal bough
#

jupyter does have some autocompletion and doc viewing, but yeah, not the best

lapis sequoia
#

jedi is trash my man hahah

#

so i'm ready to drop some cash if they actually work

#

I have 0 clue how did he make it like this

tidal bough
#

Like what, specifically?

#

Like, the colors?

lapis sequoia
#

nah cell based editing

#

if you look at mine i have editor on the left side and preview on the right

tidal bough
#

I see

lapis sequoia
#

i just want it all to be the same thing, but no clue

tidal bough
#

lemme open up Pycharm, if I have enough memory for that lol

lapis sequoia
#

you have the same as that one or you just don't use pycharm for that?

tidal bough
#

nah, I'm going to see if I can figure out how to make it this way

#

it's probably quite a good idea - I was using the two-screens way

lapis sequoia
#

like vs code is cool, but the number of bugs and the speed are not on par with pycharm

tidal bough
#

for me VSCode just doesn't have some of PyCharm's features

#

notably: showing the contents of numpy arrays(Scientific view or whatever) and the profiler which shows the results as a graph

lapis sequoia
#

same i really tried to make vsc work, but man it doesn't cut it

tidal bough
#

on the other hand, PyCharm just loaded for me.

#

more than 5 minutes of loading time. This is why I don't use it often 🙂

lapis sequoia
#

i guess it will require another year to index everything? 🤣

#

i just start it in the morning

#

get my coffee

#

go and take a shower

#

then wait for another 10 minutes for it to load

tidal bough
#

maybe it's Jupyter Lab or something, lol

lapis sequoia
#

yeah seems to be no answer

#

such a shame was looking forward to it

distant hedge
#

Do you mean this?

#

That's Jupyter

lapis sequoia
#

sometimes it works

#

sometimes it doesn't

#

i read that jedi and some other completion helpers in jupyter struggle with indexing

#

@distant hedge

distant hedge
lapis sequoia
#

yours works, mine struggles

distant hedge
#

@lapis sequoia did you make sure to import it first?

#

Also, are you pressing tab?

lapis sequoia
#

yeah did all that, it is just inconsistent, which i hate, even vs code sometimes struggles

#

like 0 completion recommendations

distant hedge
#

What are you using?

lapis sequoia
#

vscode

#

are you on a notebook or labs?

distant hedge
#

labs

lapis sequoia
#

what's your completion helper?

#

jedi?

distant hedge
#

None

#

I didn't install any, but I think by default it's jedi

lapis sequoia
#

yeah default is jedi

#

welp, be thankful yours works now hahah

distant hedge
#

Ha ha ha I haven't had any issues so far. 🙂 Are you using ctrl+space on vscode?

#

Or maybe you are using Kite

lapis sequoia
#

yeah it is fine, it works most of the times so i'm happy, usually what i do

#

NO

#

NOW AYYYYYY kite

distant hedge
#

I have decided against Kite, I hate the damn thing. So many popups it's unreasonable.

lapis sequoia
#

that thing is so over priced

#

10 euros a month for a god damn completion assistant? ffffffff that

#

i mean if it was 20 euros a year sure

#

but 120 euros a year for a

#

wait no it is 140 euros

#

ffff that even harder

grave frost
#

But honestly, once you get used to the lack of autocompletion in Jupyter, it isn't as bad as you think. I just use it to complete long variable names. apart from that, not much is required tbh

#

you could use TabNine tho

#

I would say that - I was involved in a part of the early development of kite (just an external person who had a solution to x problem) and they basically ripped off my solution that I provided on the expectation that I would be compensated adequately. All they gave was a fuckin mug and shirt whose customs+shipping I was supposed to pay. Their whole company is built on mistrust and un-sporting practices. not even a certificate to put on my CV

twilit pilot
#

Does anyone have any NLP related project ideas? Idc what level

lavish tundra
serene scaffold
lavish tundra
#

ye, but i want to create a new col

#

like a column of strings, where index 0 has a list, index 1 has another list...

serene scaffold
lavish tundra
#

i was trying something like this

#

but looks like i should sum line by line?

#

the closest i could get to a list was to put a , between the strings

serene scaffold
#

@lavish tundra can you post an example of the desired output as text, and a sample of the csv as text?

candid sable
serene scaffold
candid sable
#

I'm an extreme beginner in this field, only used Keras/TF, so no

serene scaffold
#

It's like Wanda vision, in the sense that it's a thing. They don't actually have anything in common

lapis sequoia
grave frost
serene scaffold
#

Anyway, pytorch is another library for deep learning, and torch vision supports computer vision. I think facial recognition is a common use case for learning how to use it

velvet thorn
#

the point of applying a blurring kernel manually is that you can customise it

serene scaffold
#

For the record, gm knows more than me about anything you might ask in this channel except maybe specific areas of nlp

shy kraken
#

Hi I have a pandas dataframe with 5 or so columns and 20 rows. I want to create a new column where I apply a custom function that takes the most recent 5 rows of data from two columns. It's like a rolling function. I've tried something like this, which doesn't work:
```py
data['Rolling_Beta'] = data.rolling(5).apply(beta(data.B,data.C))
```

How might I achieve what I'm trying to achieve?
lapis sequoia
tidal bough
grave frost
candid sable
lapis sequoia
misty flint
shy kraken
# tidal bough You probably want to apply your `beta`, not the result of applying `beta` to som...

thanks I've tried this:

data['Rolling_Beta'] = data.rolling(5).apply(beta)

and I get this error TypeError: beta() missing 1 required positional argument: 'B'

So I thought of putting the two particular columns I wanted together and apply it on that a la:


data_a = data.A,data.B

data['Rolling_Beta'] = data_a.rolling(5).apply(beta)

This doesn't work either, I get a AttributeError: 'tuple' object has no attribute 'rolling'

tidal bough
#

check what rolling gives exactly

#

if it gives tuples, your beta must be single-argument

#

!docs pandas.DataFrame.rolling

arctic wedgeBOT
#
```py
DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
```
Provide rolling window calculations.

Parameters

**window** : int, offset, or BaseIndexer subclass
Size of the moving window. This is the number of observations used for calculating the statistic. Each window will be a fixed size.

If its an offset then this will be the time period of each window. Each window will be a variable sized based on the observations included in the time-period. This is only valid for datetimelike indexes.

If a BaseIndexer subclass is passed, calculates the window boundaries based on the defined `get_window_bounds` method. Additional rolling keyword arguments, namely min\_periods, center, and closed will be passed to get\_window\_bounds.

**min\_periods** : int, default None
Minimum number of observations in window required to have a value (otherwise result is NA). For a window that is specified by an offset, min\_periods will default to 1. Otherwise, min\_periods will default to the size of the window.... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html#pandas.DataFrame.rolling)
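
The catch with `rolling(...).apply(func)` is that `func` is called once per column with a single 1-D window, so a two-argument `beta(B, C)` never receives both columns. A hedged sketch of two ways around it follows. It ASSUMES `beta` means cov(B, C) / var(C), which is not stated in the question, and the random data is a stand-in for the real frame.

```python
import numpy as np
import pandas as pd

# Hypothetical 20-row frame with the two columns from the question.
rng = np.random.default_rng(0)
data = pd.DataFrame({"B": rng.normal(size=20), "C": rng.normal(size=20)})
window = 5

# Option 1: use the built-in pairwise rolling statistics.
# Assumed definition: beta = cov(B, C) / var(C) over each window.
data["Rolling_Beta"] = (
    data["B"].rolling(window).cov(data["C"]) / data["C"].rolling(window).var()
)

# Option 2: slice both columns by hand, which works for ANY two-column function.
def beta(b, c):
    return np.cov(b, c)[0, 1] / np.var(c, ddof=1)

manual = []
for i in range(len(data)):
    if i < window - 1:
        manual.append(np.nan)          # not enough rows for a full window yet
        continue
    rows = data.iloc[i - window + 1 : i + 1]
    manual.append(beta(rows["B"].to_numpy(), rows["C"].to_numpy()))

print(data["Rolling_Beta"].tail())
```

The first four rows are NaN in both versions because a window of 5 is not full until the fifth row.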
serene scaffold
grave frost
#

NLP = Huggingface! 🤗

serene scaffold
#

The state of the art depends a lot on different approaches for representing words as vectors

lapis sequoia
#

totally off-topic

candid sable
#

What's some manual object labelling software? I seem to only find corporate solutions.. and Lionsbridge lol

serene scaffold
shy kraken
#

hmmm

grave frost
lapis sequoia
candid sable
grave frost
candid sable
#

or csv, or something. something to say that this particular edge is relevant - can't crop the images to only contain that specific edge as there's plenty of surrounding noise

grave frost
#

path_to_image, Female

lavish tundra
serene scaffold
#

Do you need any other types of columns?

candid sable
grave frost
lavish tundra
candid sable
lavish tundra
#

i have all the data and i know how to merge columns, but idk how to add a list in one cell
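
A small sketch of what that could look like, assuming the strings currently live in separate columns. The column names `a`, `b`, `c`, `strings` and `second` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame where each row has three separate string columns.
df = pd.DataFrame({"a": ["x1", "x2"], "b": ["y1", "y2"], "c": ["z1", "z2"]})

# One Python list per row, built from the existing string columns.
df["strings"] = pd.Series(df[["a", "b", "c"]].values.tolist(), index=df.index)

# Pick the element at position 1 from each row's list.
df["second"] = df["strings"].apply(lambda items: items[1])

print(df[["strings", "second"]])
```

Each cell of `strings` holds a real list, so normal indexing works on it, instead of a comma-joined string that would need splitting again.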

grave frost
candid sable
grave frost
#

feature extraction is not something manually done in images.

candid sable
#

just Keras and label folders for now, k-fold validation and retrained resnet. similar results to k-fold with own model

grave frost
#

pre-trained resnet was a bad idea. but I still don't see how a image labelling software can help increase data

candid sable
#

I didn't say it helps increase data lol

candid sable
grave frost
#

cool, but how does the image labelling software fit into all this?

#

you already have the labelled data

candid sable
#

I want to input the points of the shape I want it to look at

grave frost
#

could you elaborate what "points of the shape" are?

iron basalt
#

You want to label pixels of an image and store those pixel coordinates in a separate file?

grave frost
#

IMO it would be some sophisticated image preprocessing

iron basalt
candid sable
#

can I link papers?

grave frost
#

so you want a different label for each box?

candid sable
#

pretty much

grave frost
#

The paper is about predicting the bone age 🤷 that's regression

iron basalt
#

Idk, but if all you want is to take such an image and have a label for each box, you should be able to code that in python in 30-60 minutes.

#

(manually drawn rectangles)

grave frost
#

for each image?

#

"Making music to psychologically manipulate people into doing what the lyrics say"

candid sable
grave frost
#

then squiggle is right - manually drawn rectangles (large enough to cover most images)

candid sable
grave frost
#

it would be pretty boring, but you would have to do that

iron basalt
#

It's just a complete beginner's drawing tool, nothing more is needed.

lapis sequoia
#

Hello! I'm trying to classify a customer complaint database by topic, but I don't have much experience with ML, and since I don't have an already labeled dataset, I'm not sure how to proceed with unsupervised learning methods. I'm considering classifying the complaints by keywords (e.g., counting the frequency of said words and selecting some relevant keywords manually), but I don't really know how exactly to do it - I've managed to get the frequency and I already have an idea on which keywords to select, but I don't know where to proceed from that. Can anyone help?

grave frost
candid sable
#

I don't want the boxes showing up in the image, or drawing them and training with boxes in image. I want those boxes to be the objects it would look at when trying to classify whether it's M or F. Would that be possible?

grave frost
#

just crop it

iron basalt
candid sable
grave frost
#

crop + preprocess

candid sable
#

I already preprocessed enough, I have an edge image

#

but I can't uncurve the irrelevant edges, can I?

candid sable
iron basalt
#

What would you use to draw the rectangles?

candid sable
#

to generate a label file

iron basalt
#

You just do it.

#

With python.

candid sable
#

wiiiiith? any library?

iron basalt
#

You mean libraries?

#

pygame, pyglet, pyqt, pyside, pywxwidgets, or any other UI framework that lets you draw stuff.

candid sable
#

sorry, but as I said previously, I'm very new to this and my only experience re: labels is just using folder names as labels

iron basalt
#

You're new to python?

candid sable
#

new to ML stuff, only used Python for basic data operations

iron basalt
#

This has not really much to do with ML and everything to do with being able to make an app.

lapis sequoia
iron basalt
#

Then you need to learn more python for basic stuff like File IO, making a GUI, etc.

candid sable
#

nope, all I want is a model really

iron basalt
#

But you also want a tool to label the data...

#

To get that model.

candid sable
#

yeah, there's no windows .exe that can do that?

iron basalt
#

According to you, no.

candid sable
#

well I only found corporate things, that's why I asked in the first place

iron basalt
#

Idk if anyone would know of such a specific application.

#

All I know is that it can be made in python in like an hour.

candid sable
iron basalt
#

You don't need XML, over-complicated file format for something so simple.

grave frost
iron basalt
#

XML is soooo far from simple.

grave frost
#

what, it just has custom tags. that's it

iron basalt
#

It's an entire tree structure.

#

Requires a bunch of parsing rules.

#

For storing boxes it's as simple as:

grave frost
#

well, ye gotta put the effort - but Im pretty sure there would be a lib for xml

iron basalt
#
87,124,54,24
55,200,20,20
...
grave frost
#

Image_ID/path?

iron basalt
#

Image path first line
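
A sketch of reading and writing that plain-text layout with only the standard library. The meaning of the four numbers is an ASSUMPTION here (x, y, width, height); the chat does not say which convention is meant, and the file path is a made-up example.

```python
import csv
import io

def write_boxes(fh, image_path, boxes):
    """Write the image path on the first line, then one box per line."""
    fh.write(image_path + "\n")
    writer = csv.writer(fh, lineterminator="\n")
    writer.writerows(boxes)

def read_boxes(fh):
    """Read back (image_path, list of 4-int tuples) from the same layout."""
    image_path = fh.readline().rstrip("\n")
    boxes = [tuple(int(v) for v in row) for row in csv.reader(fh) if row]
    return image_path, boxes

# Round-trip through an in-memory buffer; a real file object works the same way.
buffer = io.StringIO()
write_boxes(buffer, "images/hand_001.png", [(87, 124, 54, 24), (55, 200, 20, 20)])
buffer.seek(0)
path, boxes = read_boxes(buffer)
print(path, boxes)
```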

grave frost
#

well, IMO the OP's approach is too complicated

#

(xml adds to that)

candid sable
#

I wasn't dead set on XML lol

#

I just gave it as an example as it's what Imagenet used and I don't know others

iron basalt
#

xml is for when your thing is very tree-like (and potentially any number of children per node).

grave frost
#

but what is your end-goal (leaving aside the boxes for now)?

grave frost
candid sable
#

for the love of me I can't explain it in english but I have an analogy

iron basalt
#

It's not even just for datasets.

lapis sequoia
iron basalt
#

Think like a robot. One common file format for simulations is URDF.

candid sable
iron basalt
#

It's xml type of file because a robot's parts connect like a tree. Like the main body might be a node and it has 4 children nodes which are wheels.

grave frost
#

ok, thats a good one

#

@candid sable try

#

@lapis sequoia how's GME?

#

265$ ---> OH frick

candid sable
#

you know how men have the Adam's apple - I'd like to detect a similar bump on a bone

lapis sequoia
candid sable
#

but looking back, only regression would be a potential solution so yeah

grave frost
lapis sequoia
#

i deal more with credit stuff

grave frost
grave frost
candid sable
lapis sequoia
grave frost
lapis sequoia
grave frost
candid sable
#

yes the size of it

grave frost
lapis sequoia
#

at least in the country where i live

grave frost
# candid sable yes the size of it

well well well. You can simplify this by splitting it into 2 models: one that returns the bounding box coords, then a program that crops the image to that box and feeds it to the 2nd model, which performs image regression

grave frost
lapis sequoia
#

yes

grave frost
#

well, that's intriguing. how does it work exactly?

candid sable
candid sable
#

I'm a disaster at math

lapis sequoia
#

you can have a max interest rate of approx. double the basic national interest rate

#

having that, you can work in several ways

lapis sequoia
#

you can have an investor fully financing a single loan

#

or

#

you can have several investors financing loans in quotas

candid sable
#

I saw a similar concept on an Ethereum loan platform

lapis sequoia
grave frost
#

what advantage does that give over the traditional banking system?

lapis sequoia
#

also, in the economic setting of my country specifically, it's way more advantageous to invest in private loans instead of federal securities, for instance

#

when the basic national interest rate gets to its lowest, the investment yield can get to negative levels

#

and for credit investors, it can be really helpful to diversify through private loans

#

for a borrower, it can be way easier to actually get credit when you don't have to go through the whoooole banking process

#

credit scoring was the whole reason i got interested in data science - most companies in that segment use econometric modeling to determine credit risk

grave frost
#

whoa, that's gonna go over my head 🤯

lapis sequoia
#

if you live in the US, i'm pretty sure there are companies that work in that business model

lapis sequoia
#

it's a really good alternative if you're a seasoned investor and want to diversify your wallet

grave frost
#

what is credit risk? the risk that the loanee won't pay the interest?

lapis sequoia
#

the risk that the loanee won't pay at all

#

there are several ways to calculate that

#

i'm not sure how because i'm not entirely versed in the econometrics behind it lol

#

but it takes some variables like age, credit bureaus score, income commitment, etc

grave frost
#

that's interesting. any way we can have a look at that data?

lapis sequoia
#

i'm actually using excel files lol

#

do you mind if i just sample one column? i can't really share the whole dataset because it may contain sensitive info

#

the column i'm looking to classify, of course

arctic wedgeBOT
#

Hey @lapis sequoia!

It looks like you tried to attach file type(s) that we do not allow (.xlsx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

lapis sequoia
#

oh well

grave frost
#

you can export it as a csv

arctic wedgeBOT
#

Hey @lapis sequoia!

It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

lapis sequoia
#

jesus

grave frost
#

well, I don't think it's that hard to convert a csv to gif man 🙂

lapis sequoia
#

you can actually do that? lmao

grave frost
#

just jokin. can you paste a coupla rows here?

lapis sequoia
#

ofc

#

well, that should basically do it

grave frost
#

ahh, not english 😦

lapis sequoia
#

well, basically i have some complaint comments

#

i know the keywords i want

#

and i want to classify the text based on that

#

i've read a bit on autokeras but i'm not sure on how to proceed with that

#

i've considered using regex to determine the classes

#

in such a way where if the row contains word x, then class = x, for instance
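The "if the row contains word x, then class = x" idea could look roughly like this. A sketch using only the standard `re` module; the keywords, the sample comments and the `label_comment` helper are all made up for illustration:

```python
import re

# Hypothetical keyword -> class mapping; swap in your own keywords.
KEYWORD_CLASSES = {
    "refund": "refund",
    "delay": "delay",
    "fee": "fee",
}

# One compiled pattern per class; \b stops "fee" from matching "feedback".
PATTERNS = {
    label: re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    for word, label in KEYWORD_CLASSES.items()
}

def label_comment(text, default="other"):
    """Return the first class whose keyword appears in the text."""
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            return label
    return default

comments = [
    "I asked for a refund two weeks ago",
    "The payment delay is unacceptable",
    "Thanks for the feedback form",
]
labels = [label_comment(c) for c in comments]
print(labels)
```

A comment that mentions two keywords gets whichever class comes first in the mapping, so the order of `KEYWORD_CLASSES` matters.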

grave frost
#

ok, so that's the first step ^^ to build a dataset. one column labels, one column text.

#

next, we would take all the data in each cell and make an array out of it. so text_array would contain [text1, text2, text3, ...] and same with the labels.

#

lastly, we would just pass both arrays to autokeras and you are done in about 30 lines of code

#

though it will take time for autokeras to find the best model
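The "two arrays" step can be sketched with the standard library alone. The rows and the column names `label` and `text` are assumptions, standing in for the exported CSV; the AutoKeras call itself is not shown here:

```python
import csv
import io

# Stand-in for an exported CSV with a "label" and a "text" column (made-up rows).
raw = io.StringIO(
    "label,text\n"
    "refund,I asked for a refund two weeks ago\n"
    "delay,The payment delay is unacceptable\n"
)

text_array, label_array = [], []
for row in csv.DictReader(raw):
    text_array.append(row["text"])
    label_array.append(row["label"])

# Both arrays must line up: label_array[i] is the class of text_array[i].
assert len(text_array) == len(label_array)
print(text_array)
print(label_array)
```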

lapis sequoia
#

that can actually work very well with some good regex

grave frost
#

you can use regex, but I advise against it. a lot of things are nuanced

#

either way, it depends on the dataset

lapis sequoia
#

i've used str.contains too

#

it just feels a little imprecise tho

#

after i normalized the text it doesn't feel like such a huge problem

#

what would you recommend?
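For reference, normalizing before a plain substring check (the same idea as `str.contains`) could look like this. A sketch with the standard `unicodedata` module; `normalize` and `contains_keyword` are hypothetical helpers and the sample text is made up:

```python
import unicodedata

def normalize(text):
    """Lowercase, strip accents, and collapse repeated whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())

def contains_keyword(text, keyword):
    """Substring match after normalizing both sides."""
    return normalize(keyword) in normalize(text)

print(normalize("  Cobrança   INDEVIDA no cartão "))
print(contains_keyword("Cobrança indevida", "cobranca"))
```

This is still a substring match, so a short keyword can hit inside a longer word; word-boundary regex is the usual fix if that turns out to matter.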

grave frost
#

I didn't mean the method to identify label, I said that finding labels programmatically is a bad idea. because if you can do it with programming, why are you making a model?

lapis sequoia
#

it's kinda pointless to make a model if i can actually label it myself with programming and keyword selecting

grave frost
#

ye, you got it. your task seems pretty simple so a large enough `if any(w in sentence for w in [word1, word2, word3])` should do that anyway

lapis sequoia
#

and if i want the full context of that specific label, i can just get the bigrams/trigrams for that label
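Counting bigrams for one label can be sketched like this, using only `collections.Counter`. The comments are made up and the `ngrams` helper is hypothetical; pass `n=3` for trigrams:

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-word tuples of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Made-up comments that were all given the same label.
comments = [
    "refund not received yet",
    "refund not processed",
    "still waiting refund not received",
]

bigram_counts = Counter()
for comment in comments:
    bigram_counts.update(ngrams(comment.split(), 2))

# The most frequent pairs give a quick view of the context around the keyword.
print(bigram_counts.most_common(2))
```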

lapis sequoia
#

it does seem like a good method

#

thank you very much

lapis sequoia
#

i'm both customer support and intern data analyst

#

lol

grave frost
#

internships. what can you do

lapis sequoia
#

ikr?

grave frost
#

its basically exploitation

lapis sequoia
#

not that i really mind it tbh

#

i had no idea what to do with my major until my internship

lapis sequoia
#

i'm hoping to get a firmer grasp on statistics and programming so i can fully transition to data

uncut orbit
#

data science is awesome

#

try learning some calc

lapis sequoia
#

it's amazing

uncut orbit
#

ikr

lapis sequoia
uncut orbit
#

ah

lapis sequoia
#

but i intend to move to statistics

uncut orbit
#

lmao

#

are you on kaggle?

#

its a great place to get started

lapis sequoia
#

basically learned python through it

uncut orbit
#

then thats good

lapis sequoia
#

i hope so

uncut orbit
#

i dont work yet so i can only imagine

lapis sequoia
#

it's crazy

#

when you're in a department where you're the only one who can code

#

you're literally god

uncut orbit
#

yea

#

thats what my data science teacher was telling me

lapis sequoia
#

that's not exactly good though lol

uncut orbit
#

the resources right

lapis sequoia
#

it's a lot of pressure, as an intern, to have so much riding on the analysis you do

uncut orbit
#

oh

#

well dont worry if you're the only one

#

there'll be more

lapis sequoia
#

i mean, there's a whole team focused on that

#

but they're busy with other stuff

uncut orbit
#

thats crazy

lapis sequoia
#

it is lmao

#

it does keep you always on edge to learn more and improvise

uncut orbit
#

yea

#

thats what pressure does

#

but i personally cant wait to start professional data science

lapis sequoia
#

it's a really good career path

uncut orbit
#

it is

#

and for me especially its fun and comforting

lapis sequoia
#

that's good

#

having a specific path is even more satisfying

uncut orbit
#

yea

timid depot
#

Is anyone here good at deep learning, neural networks?
Plz mind DMing me
Plz plz plz

uncut orbit
#

i've barely worked with them and i'm confused with weights and biases

timid depot
#

😭

uncut orbit
#

i also wonder how neural nets can work with robots

timid depot
#

they can

uncut orbit
#

but how do you implement them in robots

#

is there like some chip?

shy kraken
#

so I have a dataframe data and I'm getting the standard deviation of a particular column Beta. So I'm using data.Beta.std(). I'm multiplying that by 2 and adding it to the mean and that number is higher than all the numbers in my dataset....is that possible?
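Yes, that can happen: mean + 2 × std is not bounded by the largest value in the data. A quick check with made-up numbers, using the standard `statistics` module (its `stdev` is the sample standard deviation, the same default that pandas' `.std()` uses):

```python
import statistics

beta = [1, 2, 3, 4, 5]  # made-up values standing in for the Beta column

mean = statistics.mean(beta)
std = statistics.stdev(beta)  # sample standard deviation (n - 1 in the denominator)
upper = mean + 2 * std

print(upper, max(beta))
print(upper > max(beta))
```

For evenly spread or small datasets it is common for no point to sit more than two standard deviations above the mean.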

uncut orbit
#

yea i get that

#

but how does the whole thing work

#

how do you train it

timid depot
#

You can't code directly on the robot, it doesn't have keys
You write the code on a computer and then transfer the program to your robot

uncut orbit
#

and is python a language that is used for it?

timid depot
#

Yea it is
Python has libraries like tensorflow, pytorch which can be used for neural nets, deep learning algorithms

uncut orbit
#

i mean like for the robots

#

i've worked a little with tensorflow

uncut orbit
#

ok

#

i think i have some code from before

timid depot
#

Ok

#

Can I dm you

uncut orbit
#

sure

timid depot
#

Thanks

distant needle
#

Is it appropriate to ask plotting questions here?

rugged spire
#

plotting as in matplotlib?

distant needle
#

@rugged spire yes, sorry, my keyboard decided to have a seizure right when you answered

#

I can't for the life of me figure out how to destroy a figure / canvas completely. I am placing the canvas in a tkinter Frame, but I still can't figure out how to delete the canvas object

#

If you think the error is more of a problem with the UI, I can shift this to UI instead.

rugged spire
#

um

misty flint
#

tkinter is yikes

rugged spire
#

sorry but i have never used tkinter

misty flint
#

but yeah UI is for tkinter questions

rugged spire
#

so i am not really sure