#data-science-and-ml
if I remember right, when I rename columns, I use a dict like
df.rename(columns = {'old_name' : 'new_name'}, inplace = True)
So if you want to use a list perhaps you can create a dict by combining df.columns and newColumns?
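A minimal sketch of that idea, assuming newColumns has one entry per existing column and is in the same order (the frame and the names here are made up for illustration):

```python
import pandas as pd

# Hypothetical frame and replacement names, purely for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
new_columns = ["alpha", "beta"]

# Pair each existing column with its new name, then rename via the dict.
mapping = dict(zip(df.columns, new_columns))
df = df.rename(columns=mapping)

print(list(df.columns))
```

Reassigning the result instead of using inplace=True keeps it obvious which frame was changed.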
hey! I thought your name looked familiar
trying to work on a few things right now, maybe someone can help with that in the meantime; if not, see if you can do some searching. If you're still stuck I'll try to check back in
ye i am. Thanks
myDict = {"dog":5, "cat":6}
that is a dictionary
dictionaries have key value pairs
myDict["dog"]  # evaluates to 5
this is how you access the value of a given key
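A short sketch of the basic dictionary operations, using the same keys as above (the extra "bird" and "fish" keys are only for illustration):

```python
my_dict = {"dog": 5, "cat": 6}

# Look up the value stored under a key.
dog_value = my_dict["dog"]

# Add a new key or overwrite an existing one.
my_dict["bird"] = 7

# .get returns a default instead of raising KeyError for a missing key.
missing = my_dict.get("fish", 0)

print(dog_value, my_dict, missing)
```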
good resource for learning dictionaries
take your time and understand these bc they're important
understanding the basic data structures is imo more important than Pandas
These are very commonly used data structures
maybe learning linked lists/other data structures would be good too
before you just randomly throw yourself into Pandas
he is right, you should build a desktop with your own GPU. why? firstly, the cloud is expensive to run (like colab). You won't believe how easy it is to put your credit card on GCP and forget to terminate the instance (or maybe due to a bad connection it didn't terminate), leading to a lot of lost money.
Next up is competitions. I had the same naive mindset but soon I realized that hyperparameter tuning takes a looong time (despite using some pretty advanced stuff and Dask). You are so much better off if you build your own desktop and have it run while you sleep/watch YT. So if your end goal is competitions (Kaggle, etc.) then building a desktop is probably best.
IMO the only reason you should opt for cloud is when you just can't afford to buy a desktop. AMD Ryzen 7 + RTX Titan seems a pretty solid choice (albeit expensive; you can use it for gaming, or start with a 3080 Ti and move on).
stuff like Colab only offers 16 GB GPUs, which run out pretty quick. Since VRAM is a very limiting factor when doing DL, going for a multi-PCIe mobo is good cuz later you can put in 2 3080s or something cheaper than that
usually, my recommendation is cloud to everybody to get started, but since you are already making a PC, you just need an Nvidia GPU and you are good to go for both gaming and DL
Ok thank you guys
Pandas is more about understanding the NumPy-style approach to iterative operations and database-style operations. Knowing linked lists won't help you with pandas per se.
i know you got your problem fixed but another way to tell jupyter to find your file is by running this in a separate cell
%cd [copy and paste your filepath here]
without the brackets lol
I think the value of knowing it would be more indirect. Simply having more experience with python programming in general. I see a lot of people getting in DS and pandas without even knowing basic programming and the tools provided by python.

I mean I did some basic python programs before, but if you tackle a real problem you learn more. Even if it's trial and error
ye
I agree that implementing a linked list (especially if done with dunder methods) can teach one a lot of general python.
like if statements, loops, lists etc
Yeah I recommend practice problems that are concrete and may force you to learn a new data structure. But going straight to pandas is like the final boss. Pandas itself is implemented with the knowledge of a bunch of data structures and python concepts (dunder methods, boolean masks, etc).
dunder methods are the underscore methods right?
like _iter
why is that italicized idk
(And even makes use of python's dynamic nature by allowing stuff like both strings and integers for keys).
dunder = double under, like __init__ is one of them.
They let you do things like implement [] for your custom class.
Ye idk, I am doing a big data certificate and they dove straight in. Keep in mind that I did some programming b4, even if it was really basic. The people who literally never saw code b4 just copy paste from professors notes lol
Gl to them
but __version__ doesnt do that
>>> class Foo:
...     def __init__(self):
...         self.x = 10
...     def __getitem__(self, key):
...         print(key, self.x)
...
>>> foo = Foo()
>>> foo["bar"]
bar 10
>>>
is there a way to run R gstat functions in python?
i'm trying to make an H-Scatterplot but it appears to only be implemented in R
the getitem dunder is the one used to implement brackets on python?
yes and setitem
From that example it should be obvious now why Pandas can do many different things depending on the type of the key.
(btw numpy also does this)
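To make that concrete, here is a toy sketch of a class whose brackets behave differently depending on the key type. The Table class is hypothetical and only imitates the idea; it is not how pandas or numpy are actually implemented.

```python
class Table:
    """Toy container that branches on the type of the key."""

    def __init__(self, rows):
        self.rows = list(rows)

    def __getitem__(self, key):
        # An int selects one row and a slice selects a range.
        if isinstance(key, (int, slice)):
            return self.rows[key]
        # A list of booleans acts as a mask over the rows.
        if isinstance(key, list):
            return [row for row, keep in zip(self.rows, key) if keep]
        raise TypeError(f"unsupported key type: {type(key).__name__}")

    def __setitem__(self, key, value):
        # Called for `table[key] = value`.
        self.rows[key] = value


table = Table(["a", "b", "c"])
first = table[0]
tail = table[1:]
masked = table[[True, False, True]]
table[0] = "z"
print(first, tail, masked, table[0])
```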
those are fighting words
I disagree - both were made with some specific core philosophies in mind that served different needs. Oftentimes you would use both (like pandas for manipulating the DF and extracting some data in the form of numpy arrays). Plus a lot of libs have direct support for numpy arrays, so can I say that numpy is superior??
isn't pandas built on numpy?
pandas is an open-source library built on top of numpy
just making sure im not going insane
they both have their uses and what not, if you've ever tried dealing with strings with numpy, you'll quickly understand the benefit of pandas
but i'd argue pandas is a lot more specific in its use cases
I'd actually argue that pandas is more general
ahh, I don't do much with pandas. usually, I extract it as NumPy arrays and do operations on that. then later, back to pandas, some more, and then export
haha i've actually been doing the same for my data course work
how so
ye, I find arrays simpler and easier
Pandas is for tabular data in general. You could even use it just to read a csv and write it back to file with a different delimiter. And it has math operations, but it has string operations and database operations, too
Numpy is for math.
Got it, thank you man
doesn't most of that utility come from numpy tho?
its just a joke 
Dataframes use arrays for their data model, but that doesn't mean it's just a wrapper around numpy functionality
haha, you could have started WW3
Are there CNNs where it takes a bunch of images and outputs the probability of it being a dog or another animal, and then after that we take all the images that were predicted as a dog and feed them into another CNN to determine the probabilities of the breed of the dog? If so, can you point me to a link where someone has done it?
https://github.com/pandas-dev/pandas/blob/master/pandas/core/computation/ops.py#L619 specifically the self.func = getattr(np, name)
Is that not just mapping their functions to numpy ones?
I'm not saying that it's just a wrapper but from what I can see (which mind you isn't much I don't have any spare brain cells rn), a lot of the functionality comes from numpy
all of them do that, that's how classification works
model outputs the probability and we usually choose the one with the maximum score
im talking about a CNN to a another CNN
so predict the animal, then of those animals that were predicted as a dog feed it into another CNN to predict their breed
you can chain models together where one's output is the input of the next
it would be the same structure as the first one just with different data
but you've still got 2 separate models
How is it possible to learn ai?
yh I know the intermediate step is confusing me
the wonder of evolution and the human mind
do i use the argmax to filter out the original dataset to get those images that were predicted as a dog?
This is too broad of a question. AI refers to a lot of things.
???
what dataset do you have?
i'm talking hypothetically, i know there is an animals/ dogs and cats data set
you need another one for breeds
I don't see how it is complicated: first model tells you its a cat or dog, second one tells you the breed. (you could compress it to one model too like dog.labrador)
do i feed the images that were dogs (which was predicted by the first CNN) by filtering out using the indices?
indices of what?
indices of the images
model outputs a probability distribution, not a dataset
so if the 6 th image was predicted as a dog due to the probability being the greatest, would i need to get the 6th image in the original dataset and feed that into the 2nd CNN?
yeah, ofc
But yes, argmax on that and take a subset of images that belong to the same dog class for step 2. This would be something you do during inference, at training you know which ones are dogs and so on
So during training you take the correct subset based on ground truth.
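A rough sketch of the inference-time filtering step, using NumPy only. The class names, probabilities, and image array are invented stand-ins; in practice stage1_probs would come from the first CNN's predictions and dog_images would be passed to the second one.

```python
import numpy as np

# Hypothetical stage-1 output: one row of class probabilities per image.
class_names = ["cat", "dog", "horse"]
stage1_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.5, 0.3],
    [0.3, 0.3, 0.4],
])

# Stand-in for four tiny images, in the same order as the rows above.
images = np.arange(4 * 2 * 2).reshape(4, 2, 2)

# argmax gives the predicted class index for every image.
predicted = stage1_probs.argmax(axis=1)

# Keep the positions where the prediction is "dog" and index the
# original images with them.
dog_indices = np.flatnonzero(predicted == class_names.index("dog"))
dog_images = images[dog_indices]

print(dog_indices, dog_images.shape)
# dog_images is what would be fed to the (hypothetical) breed model.
```

During training the same indexing would use the ground-truth labels instead of predicted.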
thanks, i needed this confirmation
How do I take a dataframe like this
and add a new column with the mean value of the columns count and avg_plays for each artist?
so for the artist Marcioz, it would have a new column with the mean of all count values in a row in which the artist is Marcioz, and a column for the avg_plays equivalent
Windows or no?
you mean aggregating results by artist?
you can use pivot_table
or df.groupby("artist").agg({"count":sum, "avg_plays": np.average})
it returns all NaN
windows
yea
im changing it to df.groupby("artist").agg({"count":np.average}) because that's all i need right now
that would work right?
That method returns a new dataframe
It doesn't change the name of the existing one
it changes other stuff lol
Oh you did in place. Hmm
i got it to work
df.groupby("artist")['count'].transform(np.average)
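For reference, a small sketch of why transform fits here: it returns one value per original row, so the result can be assigned straight back as new columns. The data below is invented; only the column names come from the question.

```python
import pandas as pd

# Invented sample data using the column names from the question.
df = pd.DataFrame({
    "artist": ["Marcioz", "Marcioz", "Other"],
    "count": [10, 20, 5],
    "avg_plays": [1.0, 3.0, 2.0],
})

# transform keeps the original row count, unlike agg, which returns
# one row per artist.
means = df.groupby("artist")[["count", "avg_plays"]].transform("mean")
df["count_mean"] = means["count"]
df["avg_plays_mean"] = means["avg_plays"]

print(df)
```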
Ah I guessed correctly
I have seen this happen also when people try to select rows by name and the name has a space (like space in front of artist name).
Some spaces can sneak in from excel and other places.
I prefer df.rename(columns={...}), less confusing than axis=1 or whatever.
Hi guys
one last question
Can I get the mean of a float or do I have to change it to a float?
I want to start using Python for scripting in text-processing
Is it possible to change a file contents without opening it with help of python?
Yeah i prefer this as well
of course you can
try using np.mean or np.average
also
to aggregate
use .agg()
Hey guys, quick question. How can I store df in x and then continue editing x?
x = x.loc[x['ID' != 0] = 'Hello'
you want...a dataframe as a column of another dataframe?
I am working with excel that needs a lot of manipulations. For example we have John, Mike, and Angela. I need to edit things in their dataframes before saving for each person.
Not really. I just want to store Dataframe in a variable so that it doesn't affect the df parameter.
Someone to help :p
Just a second
import pandas as pd
import re
df = pd.read_csv('old.csv')
#Step1 - Finding all value in column Name beginning with Ale
x = df.loc[df['Full Name'].str.contains('^Ale[a-z]*', regex=True)]
#Step 2 - Replacing all the rows in column name by Alex
x = x.loc[x['Full Name'] != 0] = 'Alex'
#Step 3 - Getting back to original dataframe + Step 1
y = df.loc[df['Full Name'].str.contains('^Je[a-z]*', regex=True)]
#Step 4 - Step 2 but replacing by Jessica
y = y.loc[x['Full Name'] != 0] = 'Jessica'
So, this doesn't work... and spits out an error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-18-fc63323fe12c> in <module>
3
4 #Step 2 - Replacing all the rows in column name by Alex
----> 5 x = x.loc[x['Full Name'] != 0] = ['Alex']
6
7 # #Step 3 - Getting back to original dataframe + Step 1
AttributeError: 'list' object has no attribute 'loc'
can you provide a sample of the CSV so that I can try to reproduce the error?
Sure
Something just came up so it may be a bit
It's strange. Seems x = df.loc[df['Full Name'].str.contains('^Ale[a-z]*', regex=True)] is causing x to be a list and not a df, but I'm not sure why. Have you tried printing x before moving onto the next step to see what it looks like?
Hey @distant hedge!
It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.
Feel free to ask in #community-meta if you think this is a mistake.
CSV not allowed, let me upload it somewhere
I did try that, it was printing it weird, without column names
Try apply maybe? Slower but it is more flexible
I am new to pd. I will have to read about the apply method to figure out how it works
Apply gets the row of the dataframe as a series. Then you can do whatever you want with it
so I'm splitting a string and adding each word to a list like so:
" ".join(x).split()
is there a way to NOT add a word if it doesn't meet some criteria (specifically checking if the word exists in another list)? having trouble with list comprehensions here
Use "in" and list of words maybe?
Via if of course
.... in ....
y = " ".join(x).split() if x.split() not in LIST
invalid syntax
Is there someone in here that I can ask a couple pandas questions to? I am trying figure out how to sum a column based on other column data. Kind of like a Sumifs in excel, but am having a hard time figuring it out.
word for word in x if ... here is a generator expression.
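Spelled out as a list comprehension, the filter goes at the end and tests one word at a time. A small sketch, assuming x is a list of strings and LIST holds the words to leave out (both invented here):

```python
# Invented inputs: x is a list of strings, LIST holds words to skip.
x = ["the quick brown fox", "the lazy dog"]
LIST = ["the", "lazy"]

# A set makes the membership test fast for long exclusion lists.
excluded = set(LIST)

# Join, split into words, and keep only words that are not excluded.
y = [word for word in " ".join(x).split() if word not in excluded]

print(y)
```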
I just ran this:
x = northman_df.loc[northman_df['Full Name'].str.contains('^Ale[a-z]*', regex=True)]
print(type(x))
print(x)
And got a dataframe and what I would have expected.
Results
maybe verify you're reading in the right csv?
do a df.head()?
just to verify? Weird otherwise
That's what I have as well, however, if you try to further modify output of dataframe of x, it will give you an error.
You had mentioned when printing x it printed funny for you. I'd try to get that printing right before you move on. It should be a dataframe, and from here it certainly looks like you're doing it right. The only difference I'd mention is the file you uploaded was Old.csv and not old.csv (you have all lowercase in your code). Though that might not be an issue if you simply created a copy to post here?
I actually tried it again and now it prints ok... idk what happened, maybe a bug
huh...that's different still. The first time it was:
AttributeError: 'list' object has no attribute 'loc'
now it's talking about a 'str' object. Something is funky with x
so you're not seeing an error until that line you pointed out, but I think something is happening before that line that is ultimately causing the error
Odd, because I have no errors if I comment it out
and I only have Step 1 and 2 as lines
Thanks for the help, guys. Unfortunately, I don't seem to be getting the output I want here (a list of most common words in the data that is not in LIST)
Ok, I THINK I figured out what was causing your issue? x = x.loc[x['Full Name'] != 0] = 'Alex' This line didn't tell pandas which column you wanted to set the value for, so x became just a string 'Alex'
I think what you want is something like this:
x.loc[x['Full Name'] != '', ['Full Name']] = 'Alex'
Where you're setting the full name of each row to "Alex"
you can probably even simplify it further since your x dataframe will only be Ale* based on your regex,
I have tried this one, but no luck. Getting a different error. I am trying to replace all the names starting with Ale with Alex. I have also found a solution
Error yours gave me
Solution
yeah, I had to read in the CSV again, since it was already "Alex"
I am not sure why our 1st line did not work with .loc but the second with .replace did
Thank you so much for your help. It was a challenge for both of us ha ha
sure thing! Glad ya got it working!
@lavish swift Thank you, I am trying to automate a reporting that takes 3-4 hours with pandas. I think this should be my last challenge. This error made me think that we can only use df to call dataframe ha ha.
anyone?
so you want to count word frequencies for words that are not in LIST (which is presumably a list of stop words)?
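If that is the goal, one possible sketch uses collections.Counter from the standard library. The texts and stop words are invented for illustration.

```python
from collections import Counter

# Invented sample texts and stop words.
texts = ["the cat sat", "the cat ran", "a cat ran fast"]
stop_words = {"the", "a"}

# Count every lower-cased word that is not a stop word.
counts = Counter(
    word
    for text in texts
    for word in text.lower().split()
    if word not in stop_words
)

# most_common(n) returns the n most frequent (word, count) pairs.
top_two = counts.most_common(2)
print(top_two)
```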
Ping to reply; I'm going to another channel.
I did that previously and thought it did it automatically as when i sort manually i get plots like this 
okay so its sorting dates incorrectly
plt.plot never sorts the inputs, no.
u should sort the dates first with the corresponding values before plotting
is this covid?
I think it's the to_datetime part which is converting them wrong
yes, trying to learn python and matplotlib
you need to convert the values as Dates i guess hmm i kinda forgot since i use julia now. lemme check for a bit maybe
thanks for the help
it seems a lot like the to_datetime has converted the dates wrong as clearly the maldives hasnt recorded 17507 cases on the 2nd of december, 9 months from now
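A sketch of one way to rule out day/month mix-ups: pass an explicit format to to_datetime and sort by date before plotting. The month/day/year format and the numbers below are assumptions for illustration; the real file's format has to be checked.

```python
import pandas as pd

# Invented rows; the dates are assumed to be month/day/year strings.
raw = pd.DataFrame({
    "date": ["02/12/2021", "01/30/2021", "02/01/2021"],
    "cases": [300, 100, 200],
})

# An explicit format stops "02/12" from being read as 2 December.
raw["date"] = pd.to_datetime(raw["date"], format="%m/%d/%Y")

# plt.plot draws points in the order given, so sort first.
ordered = raw.sort_values("date").reset_index(drop=True)

print(ordered)
```

After this, ordered["date"] and ordered["cases"] can be passed to plt.plot in chronological order.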
@granite wolf just remembered i have a notebook for covid
gotta check my drive
opening it on google colab now
@granite wolf https://colab.research.google.com/drive/1koj6KeoVPBfEUQ_LCXPVsxZZCwcC_Lje?usp=sharing here is mine, i think i referred a tutorial like last time
feel free to copy my notebook
thanks a lot
git rekt rex

Is this an appropriate channel for asking about pandas, or would that go into a different channel?
If this is the correct channel, my question is about the distinction between float64 and Float64. Specifically, looking for any documentation on the latter (and e.g., why .astype("Float64").astype("Int64") works to convert a floating-point value to an integer with no errors or warnings).
As you can imagine, it's impossible to Google for Float64, since all the hits are for float64.
The only reference I can find in the documentation is here, when it's noted in passing that Int64 will coerce to Float64 if necessary. Except, the text doesn't even acknowledge that this is different from float64, let alone explain the differences. https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html
Hey guys, I am pulling my hair out. Is there a way to replace all values in a row by a string? All values are unique strings.
the null value is different

the distinction is that series with dtype Float64 is an ExtensionArray
nan vs pd.NA
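A tiny sketch of that difference in the missing-value marker, with invented values:

```python
import numpy as np
import pandas as pd

# NumPy-backed float64: a missing entry is stored as np.nan.
numpy_backed = pd.Series([1.5, None], dtype="float64")

# Nullable Float64 extension dtype: a missing entry is pd.NA.
nullable = pd.Series([1.5, None], dtype="Float64")

print(numpy_backed.dtype, numpy_backed[1])
print(nullable.dtype, nullable[1])
```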
So an ExtensionArray is an internal pandas thing that's kind of an abstraction of a 1D array, but just for pandas-internal types?
Is there a way to do something like .astype("Int64") on a Float64 ExtensionArray and have it fail, instead of returning all the values truncated to an integer? Or do I need to just use a separate thing to test if the values are actually integers vs. floats, and change behavior accordingly?
Separate thing to test
Astype* is an explicit instruction to coerce the values. Expecting it to then break like that would be... Inappropriate.
From my perspective, the Pythonic idea of duck-typing, "try something and see if it fails rather than testing it" would make it make more sense to fail out if it can't work, rather than changing values.
!e print(int(3.14))
@ripe forge :white_check_mark: Your eval job has completed with return code 0.
3
^
This isn't about something hypothetical or abstract. Its a literal instruction to "give me an int from this".
if you don't care about floating point precision
you can use series.astype('int').eq(series).all()
or use np.isclose
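One way to write that "separate thing to test" is a small helper that checks for fractional parts before casting. This is only a sketch: to_int64_checked is a hypothetical name, and it uses exact equality, so swap in np.isclose if a tolerance is wanted.

```python
import numpy as np
import pandas as pd


def to_int64_checked(series):
    """Cast to nullable Int64, refusing values with a fractional part."""
    # Ignore missing entries, then compare each value to its rounded self.
    present = series.dropna().astype("float64").to_numpy()
    if not (present == np.round(present)).all():
        raise ValueError("series contains non-integer values")
    return series.astype("Int64")


ok = to_int64_checked(pd.Series([1.0, 2.0, None], dtype="Float64"))
print(ok)

try:
    to_int64_checked(pd.Series([1.1, 2.0], dtype="Float64"))
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```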

So if you use a type coercion, the behaviour of just getting the int portion is more practical for normal use*
Notably, you can't do .astype("float64").astype("Int64") and have it work. So in that case, it doesn't just say "you asked for an Int64, we're giving you one regardless."
In that case, you get a TypeError about "cannot safely cast non-equivalent float64 to int64."
So if the behavior is supposed to be "astype will always return the requested type, even if that requires unsafe downconversions," it isn't consistent.
Some examples (using the Python bot correctly, I hope):
!e pd.Series([1.1, 2.1, pd.NA]).astype("Int64")
You are not allowed to use that command here. Please use the #bot-commands channel instead.
OK, I don't have bot access, and would need to import pandas anyway. Regardless, my examples were going to be that (which fails), that with .astype("float64") in the middle (which fails), and that with .astype("Float64") in the middle (which succeeds). This is a bigger difference between float64 and Float64 than just "one uses np.nan, the other uses pd.NA."
And maybe should be documented somewhere, if this is the intended behavior.
Can someone help me to figure out why when I define variable at df, it converts it into a string?
It probably is documented somewhere I assume. The only thing I'd add is, this would be pandas specific decisions, and you may find further surprises as you explore the library. Fair heads up.
Not all python semantics will be consistently used by pandas. It makes its own set of assumptions
Huh? You are assigning Alex the string to both the df and to x
At that point the variable x has no relation to the df, you assigned Alex to it*
I am trying to replace all values in column with 'Alex'
Okay. And did your current code not do it?
Just for context, x = blahblah = "Alex" assigns the string "Alex" to x and, separately, to blahblah, so don't chain assignments like that; that variable x is useless for you.
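A plain-Python sketch of the same effect, with a dict standing in for the dataframe (the names are made up):

```python
# A dict stands in for the dataframe here.
record = {"Full Name": "Aleksey"}

# Chained assignment: the right-hand side is evaluated once and bound
# to every target, so x is the string, not the container.
x = record["Full Name"] = "Alex"
print(type(x).__name__, record)

# Doing it in two steps keeps a handle on the container itself.
record2 = {"Full Name": "Aleksey"}
record2["Full Name"] = "Alex"
x2 = record2
print(type(x2).__name__)
```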
You might as well remove that entire line
I am using x because i am working on a large dataframe that has many people and I have to do specific changes to each person's results.
In that sense, what I am trying to achieve is to have Alex = filtered data frame by the person that is assigned.
Right now step 1 is to find occurrences when "Ale" is at the beginning of the name and then store it in df.
Second step is to take that df, and replace all column cells by "Alex" instead of "Aleksandu, Michel", "Aleksey, Jess", "Alexxandr", etc.
All that doesn't matter to me. I'm telling you that the way you wrote the syntax, x is literally the string alex
And this is because you explicitly wrote syntax that works that way. Which means x is clearly not what you "hoped" it would be assigned to. There's a mismatch between what you wanted to do and the syntax you wrote.
But it works without x right below that code
Yes. It even works with x too. For an explanation of what the syntax did read my earlier message
All I'm saying is, you did not put a dataframe in x. I know you wanted to.. But that's not what you wrote for python.
ahhh I see, had to give it a second look. In that case, is there a way to fix it?
Yes. Just assign to x separately in a new line
Don't use chained assignments.
Btw these kind of errors are called "logical" errors. Very tough to spot because it's essentially valid syntax that doesn't match what you wanted to write. But once you understand that these kinds of issues can happen, it makes it a lot easier to spot one later.
Thank you Darr. That's a lesson learned. I am still struggling to do what I want. :/
I am not sure how else I could overwrite all cells in a column
Okay. I'll take a guess as to what I think you needed. In cell 3 you're indexing using a boolean array. Save that boolean array separately
(ps. I'm on phone so typing code is not really easy. So you'll have to help me out here a bit)
No worries haha
So see the part inside the square brackets? Just assign that separately to a variable
Maybe give it a name, alex_indexer or whatever name makes sense to you
Then cell 4 shouldn't exist, and you can directly use cell 5. That updates the df
Afterwards, if you only wanted the rows with df with Alex, you just write x = df[alex_indexer]
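Roughly what that looks like in code. The rows are invented; the Team column is hypothetical and only there to show that other columns stay untouched.

```python
import pandas as pd

# Invented rows; "Team" is a hypothetical extra column.
df = pd.DataFrame({
    "Full Name": ["Aleksey, Jess", "Alexxandr", "Jessica"],
    "Team": ["A", "B", "C"],
})

# Save the boolean mask separately so it can be reused.
alex_indexer = df["Full Name"].str.contains(r"^Ale[a-z]*", regex=True)

# Update the original frame directly: rows from the mask, one column.
df.loc[alex_indexer, "Full Name"] = "Alex"

# If only the Alex rows are needed afterwards, take a copy of them.
x = df[alex_indexer].copy()

print(df)
print(x)
```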
I assume this is what you wanted.
I think I have found documentation on what's happening https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Not quite ha ha. I still appreciate your help.
Hm. Could you reshare current code and outputs, and then perhaps describe what you needed?
Sure, just a sec. I will need to clean it up or I will confuse you.
Hi jae, we don't allow self promotion or request for jobs on our discord.
Needs a space.
!rule 6
6. No spamming or unapproved advertising, including requests for paid work. Open-source projects can be shared with others in #python-general and code reviews can be asked for in a help channel.
ah thanks
@ripe forge
Ah see, okay. So question, did df get updated so far?
Also I don't think your x.index.isin part makes any sense whatsoever.
You're doing an isin on x
Yes, multiple times, since I needed to delete rows and rename some and delete people by age, etc.
Er no, let me rephrase
Does that cell 7 where you update x modify df in the code shown as is? Explicitly the df variable
I also think we might need better names for your variables. That can help while talking about it ๐
sorry ha ha
I've been banging my head against the wall for the past 6hrs with this. My brain is barely alive.
correct, but it only works in that file. I have failed to make it run in the main file although they are identical.
Okay. But yeah this isn't how I'd be updating df
I mentioned my approach earlier, using an indexer. But I suppose I'll share the core principle or tip
If you want to update a df, you should work on that df directly. Easiest way to avoid headaches
Never mind, I was calling wrong row. It works there too.
Ah Cool.
So you should have all you need for now then, yeah? Your df is being updated properly so I guess you can keep going
Yes, you are absolutely right. Everything works. I feel though it's like a house of cards, I am afraid to change anything because it will collapse. I also don't understand how this line works ha ha x.loc[~x.index.isin(x),'Full Name'] = 'Alex' I am not sure where I dug it up.
THANK YOU FOR YOUR HELP :)))))

Np! I hate to break it to you, that line is not ideal at all, I'm like 99%sure x.loc['Full Name'] = "Alex" would have worked just fine
But yeah
time to ban u for using a sticka!! 

ah that reminds me
i didnt finish working through some pandas exercises from this morning
should i do them now or continue to read about stats

fun fact: the pandas exercises helped me do well on my dataframes quiz
stats it is

x.loc['Full Name'] = "Alex" was the first thing I tried. It did not work
If you want, give it a try now for good luck
When I use x.loc['Full Name'] = "Alex" it renames all the other columns to Alex
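That matches how .loc reads a single label: it is treated as a row label, so the assignment writes a whole row instead of a column. A small sketch with invented data (Team is a hypothetical second column):

```python
import pandas as pd

# Invented frame with two string columns.
x = pd.DataFrame({
    "Full Name": ["Aleksey", "Alexxandr"],
    "Team": ["A", "B"],
})

# A single label in .loc is a ROW label, so this adds a new row
# labelled "Full Name" and fills every column with "Alex".
by_row = x.copy()
by_row.loc["Full Name"] = "Alex"

# Selecting all rows and one column updates just that column.
by_col = x.copy()
by_col.loc[:, "Full Name"] = "Alex"

print(by_row)
print(by_col)
```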
Because I can't take guarantees for the code you had before.
I will show you with clean code.
clean code
Cool, ty
Guess it's just going to still be treated as a chain. Good thing the warning shows up
anybody seen this before? its an open source book
might check out chapter two since it dives more into the math

Yep it's good
Or the standard ML book, still very solid: https://www.amazon.com/Pattern-Recognition-Learning-Information-Statistics/dp/0387310738

Starts with curve fitting, probability, decision theory, etc.
DL should be pretty straight forward after that book.
(And the many non-DL things covered too)
i like that it starts with that
thanks bud, ill add this one to the list. but i might get the physical copy since it looks pretty promising
ive found that the most effective way for me to get through a textbook is the pomodoro technique
else i just find my brain doesnt want to keep reading lol

That works very well
whats pomodoro
is it food?

but now im doing it more religiously bc of the learning how to learn course

anyway
you set a timer to work for a certain amount of time
completely, no distractions
then afterwards in your break
you can check your phone, etc.
so i do it the traditional way
I might try it that way too
basically
instead of just endlessly coding
first time coding, i did it every day 7 hours nonstop. and last november, i was burnt out doing that every day. i wish i have found out the pomodoro technique from earlier :((
and now i am procrastinating :((
same byeeeee
check out this course when you have the time. super helpful
@misty flint thanks!! now i can use my freebie lol
X = data.iloc[:, 0:-1].values
y = data.iloc[:, 8]
.
.
.
#tree viz and tree text
graph= Source(tree.export_graphviz(clf, feature_names=X.columns, class_names=True,
filled=True))
display(SVG(graph.pipe(format='svg')))
print('\n')
tree_root = export_text(clf)
print(tree_root)
i got error when i try to visualize my tree. it says 'numpy.ndarray' object has no attribute 'Columns'. im using dataset with 8 features columns and 1 for target class.
what should i do for this kind of problem?
you are getting values in the first line X = ......(.values)
if you apply values on dataframe it returns numpy nd array.
Numpy arrays don't have columns
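A small sketch of the difference, with an invented three-column frame. Keeping X as a DataFrame (or saving the column names before calling .values) leaves the feature names available for the tree export.

```python
import pandas as pd

# Invented data: two feature columns and a target column.
data = pd.DataFrame({"f1": [1, 2], "f2": [3, 4], "target": [0, 1]})

# Without .values, iloc still returns a DataFrame with column names.
X = data.iloc[:, 0:-1]
y = data.iloc[:, -1]
feature_names = list(X.columns)

# .values drops down to a NumPy array, which has no .columns attribute.
X_values = X.values

print(feature_names, type(X_values).__name__, hasattr(X_values, "columns"))
```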
im removing .values, but i got this instead
actually this problem can be solved if i use `data = data.apply(le.fit_transform)`; the reason why i use iloc instead of apply for encoding my labels is because i want to encode only my predictor variables, not my target variable
Sorry - I deleted.
is there any way to blur an image using opencv while avoiding specific points?
i want to blur this without the black bar
with a full blur, this happens
Hey! I have a small private Discord bot and I'd like to make an AI (I presume an LSTM RNN based on my research) for generating messages on request based on those from people in my server (to clarify, I'll only be collecting data from people who give consent, likely through a reaction to a message that fully explains it, as the last thing I want to do is run into any legal issues). The problem is, while I've been able to make a simple RNN in the past, the way I've previously done it can't be trained and generate data over time, which I would want for this. I'm struggling to understand LSTM, though, and I find I learn better in practise. So, I'm wondering if any of you's would know any good starting points for this? It's also worth noting I do have a MySQL database, which for live training I assume would be better for saving training data, so if there's anything that would be able to use this, that would be great! So, pretty much, the optimal tutorial or module if one exists that would handle most of this for me would be for training and using a live async-compatible model for text generation that is able to save and use the data to and from a SQL database for efficiency. If anyone knows any good starting points for this, please let me know as anything will help!
If you care about getting something working together, you should just import huggingface transformers and fine-tune a GPT2 model
I've seen GPT2 before, but didn't think it would work for something like this as I thought that required a starting point, which would be cool as a secondary option but I'm mainly looking for it to be purely based on the data it has
Never seen huggingface before so I'll have a look at that later, thanks
You just need to condition the starting point to either the probability distribution of the starting word of whole corpus or condition it on something previously said on discord
I've just had a quick look at it since I've got to go in a few minutes, and it does look really good so tysm
Nw
Oh sorry that's not quite what I'm looking for, if I understand you correctly; would I have to use something like GPT2 if I use huggingface or can huggingface generate text purely based on the input data? I don't want any external sources, and iirc GPT compares the input data to data it has found elsewhere on the internet and adds to the input data based on what it has from the public internet
I do have to go now though so I'll have a look later. If you have any more information I should know, please let me know and I'll see it later
It would be pretrained on a lot of text data from the internet and then you would fine-tune it on your own data. It's sort of a necessity to do it this way unless you've got massive amounts of data and money to dump on compute.
Training it on small datasets like a discord chat would leave it deficient in its ability to generate coherent text that it hasn't already seen before
GPT doesn't inherently "compare" anything, it learns relationships between words and the probability distributions of the words that show up
To understand language, it needs a lot of data on that language
#help-peanut pls
agree with raggy, you would have to fine-tune a model @mortal pendant . You could try with a simple Keras model with a transformer block (multi-head attention and a FCN) and judge the output for yourselves. You would immediately notice that the output is not always good (as in the model barely gets even the grammar correct, forget the output).
but all the above points would be invalid if you have a ton of data to train and large GPUs to throw at it
does anyone know if it's possible to use pandas to create a column holding a list of strings and select one string by its position in the list?
hi, I'm having a problem with opencv: when I use the webcam and do anything with the frames, the webcam is very laggy
is it because of a PC component? mine isn't the worst one
not sure about using openCV
but with basic numpy you can mask and apply a blur kernel
Hey @lapis sequoia!
It looks like you tried to attach file type(s) that we do not allow (.pdf). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.
Feel free to ask in #community-meta if you think this is a mistake.
Hey guys, does anyone have a defined dataset/link for Fridge Item Recognition??
Trained my first CNN, how do I figure out its accuracy value? People say " My model has 80% accuracy" how do I find that?
Well, there's val_accuracy (accuracy on the validation dataset) in your screenshot.
it all has different values for each epoch. How do I find the final accuracy % ?
The last one is the final one
So I should say that my model has 79% accuracy?
its overfittin'
if you train it for more epochs, it would get to 100% accuracy
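A small sketch of how the numbers discussed above can be read off after training, assuming the Keras `History` object returned by `model.fit(...)` and a model compiled with `metrics=['accuracy']`. The dictionary below is a made-up stand-in for `history.history`, so the values are illustrative only.

```python
# Stand-in for `history.history` from Keras `model.fit(...)`; values are invented.
history_dict = {
    "accuracy": [0.61, 0.74, 0.83, 0.90],
    "val_accuracy": [0.58, 0.70, 0.78, 0.79],
}

val_acc = history_dict["val_accuracy"]

# The last entry is the validation accuracy after the final epoch.
final_val_acc = val_acc[-1]

# Epoch (1-based) with the best validation accuracy.
best_epoch = val_acc.index(max(val_acc)) + 1

# A large gap between training and validation accuracy hints at overfitting.
gap = history_dict["accuracy"][-1] - final_val_acc
```

The validation figure is the one to quote; the training accuracy alone can keep climbing while the model overfits.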
oh...
how do I fix that?
Feature Extraction part
CNNmodel.add(tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(200,200, 3)))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.BatchNormalization())
CNNmodel.add(tf.keras.layers.Conv2D(16, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.Conv2D(32, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.Conv2D(64, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.Conv2D(64, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
Neural Network - For classification
CNNmodel.add(tf.keras.layers.Flatten())
CNNmodel.add(tf.keras.layers.Dense(512, activation='relu'))
CNNmodel.add(tf.keras.layers.Dropout(0.7))
CNNmodel.add(tf.keras.layers.Dense(128, activation='relu'))
CNNmodel.add(tf.keras.layers.Dropout(0.5))
CNNmodel.add(tf.keras.layers.Dense(64, activation='relu'))
CNNmodel.add(tf.keras.layers.Dropout(0.3))
CNNmodel.add(tf.keras.layers.Dense(4,activation='softmax'))
I am aiming for at least 85% accuracy
you should put dropout on the conv layers too, and reduce the maximum dropout from 0.7 to something more like 0.4 or 0.3, or you would restrict the network and create a bottleneck
How do I build a webgis with machine learning? I need source code for extracting unstructured data into a database
#internals-and-peps
just curious, what is a webgis?
Yes it's possible
Forgive me if I am wrong, but isn't AUC a metric to maximize? If that's the case, then you should let your model run some more
TBH, your AUC looks kinda rocky. what set is it evaluated on? and what is your train_acc and val_acc? @pure pond
Hi guys, please help with some ideas for a cv project
project for cv or using cv2?
for cv(computer vision)
vehicle detection, vehicle number plate detection, counting people etc.
thanks)
>>> import pandas as pd
>>> df = pd.DataFrame()
# Create a column named 'strings' with these values in each row
>>> df['strings'] = ['hello', 'i', 'like', 'python']
>>> df
strings
0 hello
1 i
2 like
3 python
# select a location (loc) by index (i), in this case row 2 column 0
>>> df.iloc[2, 0]
'like'
anyone familiar with image labelling for face detection? I want to do a similar style of labelling for the shape of a bone and I'm having trouble finding resources/software
I don't have personal experience with that, but I can take a look if you'd like
are you aware of the inbox feature in the discord client? otherwise you might have been ghost pinged.
maybe meta for this chat:
can anyone talk to how they use Python with Julia, places where the latter might be a good drop-in replacement, etc?
interesting
Well, I've done something similar, but it wasn't live: I simply used the bot to put loads of messages into a JSON file and then trained textgenrnn on the messages, filtering them down heavily to avoid really short messages and messages containing links or embeds and whatnot, and still ended up with pretty good data. This was ages ago, when my Discord server was pretty much just created, so I would imagine it would be better now
Though for that I was processing it on a Google Colab notebook, so it might have scored better due to performance, but I wasn't using the GPUs, and the VPS I'm using for the bot isn't too bad performance-wise, though it doesn't have GPUs
Hey guys anyone has a good formula for number of bins that I should use in a histogram?
Currently doing this:
int(math.sqrt(len(df)))
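For comparison with the square-root rule above, here is a sketch of two other common rules of thumb, Sturges' rule and the Freedman-Diaconis rule, using only the standard library. The function names are made up for this example, and which rule suits a given dataset is a judgment call.

```python
import math
import statistics


def sqrt_bins(n):
    """Square-root rule, as in the snippet above."""
    return max(1, int(math.sqrt(n)))


def sturges_bins(n):
    """Sturges' rule: ceil(log2(n)) + 1."""
    return math.ceil(math.log2(n)) + 1


def freedman_diaconis_bins(values):
    """Freedman-Diaconis rule: bin width = 2 * IQR / n^(1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    if iqr == 0:
        # Degenerate spread: fall back to the square-root rule.
        return sqrt_bins(len(values))
    width = 2 * iqr / len(values) ** (1 / 3)
    return max(1, math.ceil((max(values) - min(values)) / width))
```

Freedman-Diaconis tends to be less sensitive to outliers because it is based on the interquartile range rather than only on the sample size.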
Feature Extraction part
CNNmodel.add(tf.keras.layers.Conv2D(16, (3,3), activation='relu', input_shape=(200,200, 3)))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.BatchNormalization())
CNNmodel.add(tf.keras.layers.Dropout(0.4))
CNNmodel.add(tf.keras.layers.Conv2D(16, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.Dropout(0.4))
CNNmodel.add(tf.keras.layers.Conv2D(32, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.Dropout(0.4))
CNNmodel.add(tf.keras.layers.Conv2D(64, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.Dropout(0.3))
CNNmodel.add(tf.keras.layers.Conv2D(64, (3, 3), activation= 'relu'))
CNNmodel.add(tf.keras.layers.MaxPooling2D(2,2))
CNNmodel.add(tf.keras.layers.Dropout(0.3))
Neural Network - For classification
CNNmodel.add(tf.keras.layers.Flatten())
CNNmodel.add(tf.keras.layers.Dense(128, activation='relu'))
CNNmodel.add(tf.keras.layers.Dropout(0.4))
CNNmodel.add(tf.keras.layers.Dense(64, activation='relu'))
CNNmodel.add(tf.keras.layers.Dropout(0.3))
CNNmodel.add(tf.keras.layers.Dense(4,activation='softmax'))
I am getting weird spikey loss graphs for this.... any suggestions please??
Actually, to get a good idea of how big my dataset is, I'll quickly make my bot count all the messages just from people who have already given permission from when I was using textgenrnn. There will actually end up being much more than that since there's a lot more server members now who might be interested, but should provide a reasonable metric
It's taking a while but it's already at 50000, which sounds like a fairly large dataset to me. To clarify, though, that is before filtering, so I would likely actually use a lot less of this
hello
I fetched tabular data (COVID-19 WORLDOMETER) from html file.
I have created a dataframe using the data
I want to change the name index. How can I do that?
Hi. Is there an inference model in e.g. scikit-learn I can use to classify a single variable into groups?
too much dropout, the model is losing its ability to map your data sufficiently because your dropout is too aggressive.
what do you mean by changing the name index?
you mean setting a column as an index?
you can use the set_index() method
if you want to rename ALL columns I suggest you just pass a list of names directly to the attribute
df.columns = [COLUMNS here]
yeah i passed dict
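```py
import pandas as pd

d = pd.read_html(html_filename)  # returns a list of DataFrames
df = pd.DataFrame(d[0])
i = 0
col_r_name = {}
# map every old column name to 'data_<position>'
for col_name in df.columns:
    col_r_name[col_name] = 'data_' + str(i)
    i += 1
df_new = df.rename(columns=col_r_name)
print(df_new)
```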
df.rename(columns={"column old name":"column new name"})
yeah
i used this method
why a for loop though? remember that pandas is vectorized, you can broadcast all the new names at once
?
i didn't understand what u just said
i used the for loop for storing the columns' old and new names
for col_name in df.columns:
    col_r_name[col_name] = 'data_'+str(i)
    i+=1
df_new = df.rename(columns=col_r_name)
i think that's ok to generate the names, but the last line is inefficient
you can just broadcast it all at the end
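A sketch of the "all at once" idea, assuming a small made-up DataFrame: build the full list of new names first, then either assign it to `df.columns` or pass a mapping to `rename`. The column names here are invented for illustration.

```python
import pandas as pd

# Hypothetical table standing in for the scraped one.
df = pd.DataFrame({"Country": ["A", "B"], "Cases": [10, 20], "Deaths": [1, 2]})

# Build every new name in one go; the list must match the number of columns.
new_names = [f"data_{i}" for i in range(len(df.columns))]

# Option 1: rename() with an old-to-new mapping returns a renamed copy.
renamed = df.rename(columns=dict(zip(df.columns, new_names)))

# Option 2: assign the list to the attribute, which changes df itself.
df.columns = new_names
```

Both give the same column labels; the loop version works too, it is just more code for the same mapping.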
You could do this perhaps (I haven't tried dictionary comprehension in ages)
newcols = {column:column+"_" for column in list(df.columns)}
df.rename(columns=newcols)
fk lol
just throw .lower()
ohk
let me think
you can probably do it via apply
but you'd need to define a new function to just get all the "object" columns
apply .lower()
and then update the values
i will use column name = column.lower()
you can also be a lazy fuck and do it like this too, by column lmao
df[column] = df[column].apply(lambda x: x.lower())
yeah
ideally you'd define a real function for it though
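One way such a "real function" might look, as a sketch: it takes the column names explicitly and uses the pandas `.str.lower()` accessor, which also leaves missing values alone, unlike calling `x.lower()` in a lambda. The function name and sample data are assumptions for this example.

```python
import pandas as pd


def lowercase_columns(frame, columns):
    """Return a copy of `frame` with the given string columns lower-cased."""
    out = frame.copy()
    for col in columns:
        out[col] = out[col].str.lower()
    return out


# Hypothetical data; "Age" is numeric and is left untouched.
df = pd.DataFrame({"Name": ["ALICE", "Bob"], "City": ["Paris", "ROME"], "Age": [30, 41]})
lowered = lowercase_columns(df, ["Name", "City"])
```

Returning a copy keeps the original DataFrame unchanged, which makes the step easier to test and to undo.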
this will do
Not sure what you mean, if you mask wouldn't the blur still use the black values for nearby values in the mask?
how do i perform canny edge detection on processed_img
the commented code for edge detection giving me error pip-req-build-wvn_it83\opencv\modules\imgproc\src\canny.cpp:829: error: (-215:Assertion failed) _src.depth() == CV_8U in function 'cv::Canny'
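The assertion `_src.depth() == CV_8U` in the error above means `cv2.Canny` was given an image that is not 8-bit unsigned. A hedged sketch of one possible fix, assuming `processed_img` is a float array: rescale it to 0-255 and cast to `uint8` before the call. The helper name and the random test image are made up, and min-max rescaling changes contrast, so a plain `astype(np.uint8)` may be preferable if the values are already in the 0-255 range.

```python
import numpy as np


def to_uint8(img):
    """Rescale an array to the 0-255 range and cast to uint8 (CV_8U)."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # Constant image: nothing to rescale.
        return np.zeros(img.shape, dtype=np.uint8)
    scaled = (img - lo) / (hi - lo) * 255.0
    return scaled.round().astype(np.uint8)


# Hypothetical float image standing in for processed_img.
processed_img = np.random.default_rng(0).random((4, 4))
img_u8 = to_uint8(processed_img)

# With OpenCV installed (thresholds chosen arbitrarily):
# edges = cv2.Canny(img_u8, 100, 200)
```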
InvalidArgumentError: Can not squeeze dim[1], expected a dimension of 1, got 4
[[node Squeeze (defined at <ipython-input-65-bdcef4f4c42e>:1) ]] [Op:__inference_train_function_75365]
Function call stack:
train_function
How do I sort this out??
are you sure there's not more to the error than that? I would ask you to share more context, preferably using our paste bin
!paste
Pasting large amounts of code
If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
Guys, I am geeking out on Regular Expressions (import re). If you are new to Python like me, I think you will enjoy using it in any code you write. It will become a default import for me for anything I work on.
regular expressions are bae af
what are you working on where you're finding them helpful?
only this step is causing the error, the rest of the CNN training part went smoothly
this was the second last step of data augmentation
if you can copy and paste the text into the paste bin, I might be able to help.
There's something about six frames though? I don't know what that means.
history2 = CNNmodel.fit(train_dataset, validation_data=test_dataset, epochs = 15)
Epoch 1/15
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-96-9df2d3960108> in <module>()
----> 1 history2 = CNNmodel.fit(train_dataset, validation_data=test_dataset, epochs = 15)
6 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:
InvalidArgumentError: Can not squeeze dim[1], expected a dimension of 1, got 4
[[node Squeeze (defined at <ipython-input-65-bdcef4f4c42e>:1) ]] [Op:__inference_train_function_75365]
Function call stack:
train_function
i wish i was better at regex

Hi guys! Is there any important difference between those two Pytorch tensor mean functions?
tensor.mean()
vs
tensor.mean((-1,-2))
?
The latter passes a dim tuple - which dimensions to take a mean over (and therefore reduce them away).
-1 and -2 means the last 2 dimensions.
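expand the frames to see the full error. The message says a Squeeze op expected dim[1] to have size 1 but got 4, so some tensor has shape (batch, 4) where (batch, 1) was expected. With a 4-class softmax output that often points at the labels, e.g. one-hot labels fed to a sparse loss or metric. Check your shapes with model.summary() and the shape of your label batches, and reshape your data or change the loss accordingly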
If your tensor has more than 2 dimensions, these are very different - as in, the former will always produce a scalar, whereas the latter will produce a tensor with the same dimensions as the input except the last 2 dimensions(which are averaged over).
What if my tensor has 3 dimensions? Both of them (my code examples) are supposed to produce a scalar.
The latter would produce a 1d tensor, I'm pretty sure.
Like, if your tensor is 5,10,20, the former would average over all 1000 cells and return a scalar, whereas the latter would return a 5, 1d tensor - each cell in it the mean of 200 cells of the original tensor
like, means_tensor[i] would be equal to torch.mean(tensor[i,:,:])
so former 1D and latter 2D?
I think I understand, thank you!
Because in my case: I have a cube tensor, each matrix represents an image, and I need the average of the pixels for each image.
So, if I do it like (-1,-2) it should be fine
yeah, precisely. You'd go from, say, 50,1920,1080 (50 one-channel (grayscale, presumably) images of 1920x1080 pixels) to 50, - each element being an average of all pixels in that image
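The difference between the two calls can be checked on a small array. This sketch uses NumPy instead of PyTorch (a swap made only to keep the example dependency-light); `arr.mean(axis=(-1, -2))` reduces the same axes as `tensor.mean((-1, -2))`. The 5x10x20 shape follows the example given above.

```python
import numpy as np

# 5 "images" of 10x20 pixels, filled with 0..999 for easy checking.
images = np.arange(5 * 10 * 20, dtype=np.float64).reshape(5, 10, 20)

# No axis given: average over all 1000 cells, giving a single scalar.
overall = images.mean()

# Average over the last two axes only: one mean per image, shape (5,).
per_image = images.mean(axis=(-1, -2))
```

Here `per_image[i]` equals `images[i].mean()`, which is the per-image pixel average asked about.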
hey guys, anyone has a clue how to set up pycharm pro, jupyter notebook, in a way that it is cell based run? so far there is an editor on the left side and preview on the right side
The only confusing part I recall is that you need to poke around in the dropdown menu in the bar at the top to actually launch the Jupyter server
then you can run stuff
it isn't about running thing
it is about the editor's style
hate this so much, so confusing
@tidal bough
the thing i want it to look and feel like jupyter, the only reason i don't use jupyter is because of the awful completion recommendation and no helper preview
whereas vs code still has that, not that great either, so far nothing beats pycharm, but it is so aesthetically confusing
jupyter does have some autocompletion and doc viewing, but yeah, not the best
jedi is trash my man hahah
so i'm ready to drop some cash if they actually work
I have 0 clue how did he make it like this
nah cell based editing
if you look at mine i have editor on the left side and preview on the right
I see
i just want it all to be the same thing, but no clue
lemme open up Pycharm, if I have enough memory for that lol
you have the same as that one or you just don't use pycharm for that?
nah, I'm going to see if I can figure out how to make it this way
it's probably quite a good idea - I was using the two-screens way
like vs is cool, but the amount of bugs and speed is not at par to pycharm
for me VSCode just doesn't have some of PyCharm's features
notably: showing the contents of numpy arrays(Scientific view or whatever) and the profiler which shows the results as a graph
same i really tried to make vsc work, but man it doesn't cut it
on the other hand, PyCharm just loaded for me.
more than 5 minutes of loading time. This is why I don't use it often
i guess it will require another year to index everything?
i just start it in the morning
get my coffee
go and take a shower
then wait for another 10 minutes for it to load
I wonder if that's Pycharm
maybe it's Jupyter Lab or something, lol
yeah seems to be no answer
such a shame was looking forward to it
nah, try typing np. and then wait for the fill
sometimes it works
sometimes it doesn't
i read that jedi and some other completion helpers in jupyter struggle with indexing
@distant hedge
your works, mine struggles
yeah did all that, it is just inconsistent, which i hate, even vs code sometimes struggles
like 0 completion recommendations
What are you using?
labs
Ha ha ha I haven't had any issues so far. Are you using ctrl+space on vscode?
Or maybe you are using Kite
yeah it is fine, it works most of the times so i'm happy, usually what i do
NO
NOW AYYYYYY kite
I have decided against Kite, I hate the damn thing. So many popups it's unreasonable.
that thing is so over priced
10 euros a month for a god damn completion assistant? ffffffff that
i mean if it was 20 euros a year sure
but 120 euros a year for a
wait no it is 140 euros
ffff that even harder
kite is so much better IMO: https://www.kite.com/integrations/kite-vs-tabnine/
But honestly, once you get used to the lack of autocompletion in Jupyter, it isn't as bad as you think. I just use it to complete long variable names. Apart from that, not much is required tbh
you could use TabNine tho
I would say that - I was involved in a part of the early development of kite (just an external person who had a solution to x problem) and they basically ripped off my solution that I provided on the expectation that I would be compensated adequately. All they gave was a fuckin mug and shirt whose customs+shipping I was supposed to pay. Their whole company is built on mistrust and un-sporting practices. not even a certificate to put on my CV
Does anyone have any NLP related project ideas? Idc what level
i mean all in one line like the column string on the index 0 had ('hello', 'i', 'like', 'python')
so you want a Python list to be an element that occupies one cell of a dataframe, and you do not want to create a new column?
ye, but i want create a new col
like column strings, on the index 0 had a list, on the index 1 another list...
>>> import pandas as pd
>>> df = pd.DataFrame()
>>> df['string_lists'] = [['hello', 'goodbye'], ['python', 'java']]
>>> df
string_lists
0 [hello, goodbye]
1 [python, java]
i was trying something like this
but looks like i should sum line by line?
the most close of a list i could be was to put , between the strings
@lavish tundra can you post an example of the desired output as text, and a sample of the csv as text?
Sorry for taking so long to reply and thank you for the timely response. That would be highly appreciated!
Are you familiar with torch vision?
I'm an extreme beginner in this field, only used Keras/TF, so no
It's like Wanda vision, in the sense that it's a thing. They don't actually have anything in common
lol so they went thank you and also fuck you
yep. I was too naive, but at least I got some experience dealing with "adults"
not necessarily?
Anyway, pytorch is another library for deep learning, and torch vision supports computer vision. I think facial recognition is a common use case for learning how to use it
the point of applying a blurring kernel manually is that you can customise it
For the record, gm knows more than me about anything you might ask in this channel except maybe specific areas of nlp
Hi, I have a pandas dataframe with 5 or so columns and 20 rows. I want to create a new column where I apply a custom function that takes the most recent 5 rows of data from two columns. It's like a rolling function. I've tried something like this, which doesn't work:
```py
data['Rolling_Beta'] = data.rolling(5).apply(beta(data.B,data.C))
```
How might I achieve what I'm trying to achieve?
well as long as you get a good resume points, i'm sure it came in handy?
You probably want to apply your beta, not the result of applying beta to something.
I don't see how I can put it on my resume without some formal documentation, but I don't know much about CVs anyways
I could use some NLP help as well with a uni assignment but hopefully I'll manage to solve that in the following days, when I actually take the time to look at it
re: Pytorch - did a quick Google and found a bunch of tutorials, although pretty old and lacking instructions on labelling your own. Seems like a good start for now, thanks
yeah, big plus (ffff that guys?)
im planning on doing a minor in computational linguistics/nlp, do you have any favorite resources youd recommend
thanks I've tried this:
data['Rolling_Beta'] = data.rolling(5).apply(beta)
and I get this error TypeError: beta() missing 1 required positional argument: 'B'
So I thought of putting the two particular columns I wanted together and apply it on that a la:
data_a = data.A,data.B
data['Rolling_Beta'] = data_a.rolling(5).apply(beta)
This doesn't work either, I get a AttributeError: 'tuple' object has no attribute 'rolling'
check what rolling gives exactly
if it gives tuples, your beta must be single-argument
!docs pandas.DataFrame.rolling
```py
DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
```
Provide rolling window calculations.
Parameters
**window**: int, offset, or BaseIndexer subclass. Size of the moving window. This is the number of observations used for calculating the statistic. Each window will be a fixed size.
If it's an offset then this will be the time period of each window. Each window will be variable sized based on the observations included in the time period. This is only valid for datetimelike indexes.
If a BaseIndexer subclass is passed, calculates the window boundaries based on the defined `get_window_bounds` method. Additional rolling keyword arguments, namely min_periods, center, and closed will be passed to get_window_bounds.
**min_periods**: int, default None. Minimum number of observations in window required to have a value (otherwise result is NA). For a window that is specified by an offset, min_periods will default to 1. Otherwise, min_periods will default to the size of the window.... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html#pandas.DataFrame.rolling)
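`rolling(...).apply(...)` processes one column at a time, which is why a two-argument `beta` fails in the attempts above. A hedged sketch of one workaround, assuming `beta` means the usual cov(B, C) / var(C) (the original function was never shown): combine the built-in rolling covariance and rolling variance. The data below is made up so that the expected beta is exactly 2.

```python
import pandas as pd

# Hypothetical data where B = 2 * C + 1, so the beta of B on C is 2.
data = pd.DataFrame({"C": [float(x) for x in range(20)]})
data["B"] = 2.0 * data["C"] + 1.0

window = 5

# Rolling covariance of B with C, and rolling variance of C, over 5 rows.
rolling_cov = data["B"].rolling(window).cov(data["C"])
rolling_var = data["C"].rolling(window).var()

# The first window - 1 rows are NaN because the window is not yet full.
data["Rolling_Beta"] = rolling_cov / rolling_var
```

If the custom function cannot be expressed through built-in rolling statistics, looping over row windows and calling it on the two column slices is a slower but more general fallback.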
Not really. You can check my github to see what projects I've worked on and get a sense of what I might know about.
NLP = Huggingface!
The state of the art depends a lot on different approaches for representing words as vectors
haha i used to live in fairfax and dupont circle
totally off-topic
What's some manual object labelling software? I seem to only find corporate solutions.. and Lionsbridge lol
Dupont must have been nice. I'm not so ambitious as to assume I'll ever live there.
hmmm
you can make your own. all you have to do is divert each image to the correct folder?
sounds good. thanks
it is quite nice, but was too much, i was underage so what's the point of living near the night life places when I am unable to use them. Fairfax was much better imo, needed a car, but super chill place and you don't have those drunk/drugged homeless people walking around the streets in the evenings.
I tried that but my dataset is very limited, I'd like to provide separate .xml files for labelling a feature on a bone.
Bigger picture: Binary classifier for male/female determination of an edge of a bone
no idea, I have never worked with .xml
or csv, or something. something to say that this particular edge is relevant - can't crop the images to only contain that specific edge as there's plenty of surrounding noise
so you would construct a csv for all paths_to_image and the label?
path_to_image, Female
the idea is to put a lot of different strings in a list, and access a string by its row position in the column and its position in the list,
like:
Column : String
0 ('banana', 'apple', 'mango'...)
1 ('potato', 'blueberry', 'milk')
...
so i'll need to access apple by the position, for example: db.at[0, 'String'][1]
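A short sketch of that layout, assuming a frame named `db` with an `ID` column and a `String` column whose cells each hold a Python list. The first ID comes from the conversation; the second is made up.

```python
import pandas as pd

db = pd.DataFrame({
    "ID": [6546984654, 6546984655],  # second ID is hypothetical
    "String": [
        ["banana", "apple", "mango"],
        ["potato", "blueberry", "milk"],
    ],
})

# .at picks the cell (row label 0, column 'String'); [1] then indexes the list.
fruit = db.at[0, "String"][1]

# The same position for every row at once.
second_items = db["String"].apply(lambda items: items[1])
```

Lists in cells work, but most vectorized pandas operations do not apply to them, so one row per string (a "long" table) is often easier to query.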
Do you need any other types of columns?
yeah, I get that - but the results are bad if I just specify that image X, Y, Z are female, etc
why would the results be bad? that means your model is not correct
what u mean? i'll have this column and a ID column, like the id for the ('banana'...) will be 6546984654
because I have a very limited dataset
i have all the data i know how to merge columns, but idk how to add a list in one cell
ye, but how would a image labelling software help in that case?
I could input points on which edge to focus in relation to an adjacent edge and not have it take into account all the other irrelevant irregularities.. or at least I'm thinking I can
wut? what technique are you using?
feature extraction is not something manually done in images.
just Keras and label folders for now, k-fold validation and retrained resnet. similar results to k-fold with own model
pre-trained resnet was a bad idea. but I still don't see how a image labelling software can help increase data
I didn't say it helps increase data lol
same goes with pretrained vgg16 or no pretrained at all
cool, but how does the image labelling software fit into all this?
you already have the labelled data
I want to input the points of the shape I want it to look at
could you elaborate what "points of the shape" are?
You want to label pixels of an image and store those pixel coordinates in a separate file?
how does that help the model converge?>
IMO it would be some sophisticated image preprocessing
Idk and I don't care, the question was about manually labeling things.
cool enough
can I link papers?
One of the methods for identifying growth disorder is by assessing the skeletal bone age. A child with a healthy growth rate will have approximately the same chronological and bone ages. It is important to detect any growth disorder as early as possible, so that mitigation treatment can be administered with less negative consequences. Recently, ...
so you want a different label for each box?
pretty much
The paper is about predicting the bone age ๐คท that's regression
Idk, but if all you want is to take such an image and have a label for each box, you should be able to code that in python in 30-60 minutes.
(manually drawn rectangles)
for each image?
Argh, reddit idiots https://www.reddit.com/r/MachineLearning/comments/m2bo1y/d_the_real_problem_program_of_the_next_5_years/
"Making music to psychologically manipulate people into doing what the lyrics say"
yeah but I'm interested in feeding it the keypoints used for that
then squiggle is right - manually drawn rectangles (large enough to cover most images)
it would be pretty boring, but you would have to do that
It's just a complete beginner's drawing tool, nothing more is needed.
Hello! I'm trying to classify a customer complaint database by topic, but I don't have much experience with ML, and since I don't have an already labeled dataset, I'm not sure how to proceed with unsupervised learning methods. I'm considering classifying the complaints by keywords (e.g., counting the frequency of said words and selecting some relevant keywords manually), but I don't really know how exactly to do it - I've managed to get the frequency and I already have an idea on which keywords to select, but I don't know where to proceed from that. Can anyone help?
just use autokeras - one stop solution
I don't want the boxes showing up in the image, or drawing them and training with boxes in image. I want those boxes to be the objects it would look at when trying to classify whether it's M or F. Would that be possible?
just crop it
Yes, you just store the box coordinates and sizes in a separate file which is associated with the image. Or in the same file (custom file format).
cropped still has other features in the background that are irrelevant
crop + preprocess
I already preprocessed enough, I have an edge image
but I can't uncurve the irrelevant edges, can I?
yes, and what would I use for that? conversely, instead of rectangles, can I use points like used in facial landmark detection?
What would you use to draw the rectangles?
to generate a label file
wiiiiith? any library?
You mean libraries?
pygame, pyglet, pyqt, pyside, pywxwidgets, or any other UI framework that lets you draw stuff.
sorry, but as I said previously, I'm very new to this and my only experience re: labels is just using folder names as labels
You're new to python?
new to ML stuff, only used Python for basic data operations
This has not really much to do with ML and everything to do with being able to make an app.
From what I have quickly read right now, I should define the classes, run 20% of the dataset as training, check the model precision, do some cross-validation and apply it to the general dataset?
Then you need to learn more python for basic stuff like File IO, making a GUI, etc.
nope, all I want is a model really
yeah, there's no windows .exe that can do that?
According to you, no.
well I only found corporate things, that's why I asked in the first place
Idk if anyone would know of such a specific application.
All I know is that it can be made in python in like an hour.
literally none to load image, I draw my object box, and it saves the coordinates in an xml or something?
yeah, thats the correct way
You don't need XML, over-complicated file format for something so simple.
XML was made to be simple AFAIK
XML is soooo far from simple.
what, it just has custom tags. that's it
It's an entire tree structure.
Requires a bunch of parsing rules.
For storing boxes it's as simple as:
well, ye gotta put the effort - but Im pretty sure there would be a lib for xml
87,124,54,24
55,200,20,20
...
Image_ID/path?
Image path first line
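The plain-text format described above (image path on the first line, then one box per line) can be written and read back with a few lines of standard-library Python. This is a sketch: the helper names and file names are made up, and reading the four numbers as x, y, width, height is an assumption.

```python
import tempfile
from pathlib import Path


def save_boxes(label_path, image_path, boxes):
    """Write the image path on line 1, then one 'x,y,w,h' line per box."""
    lines = [str(image_path)] + [",".join(str(v) for v in box) for box in boxes]
    Path(label_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def load_boxes(label_path):
    """Read a label file back into (image_path, list of 4-int tuples)."""
    lines = Path(label_path).read_text(encoding="utf-8").splitlines()
    image_path = lines[0]
    boxes = [tuple(int(v) for v in line.split(",")) for line in lines[1:] if line.strip()]
    return image_path, boxes


# Round trip using the two example boxes from the chat.
with tempfile.TemporaryDirectory() as tmp:
    label_file = Path(tmp) / "bone_001.boxes.txt"
    save_boxes(label_file, "images/bone_001.png", [(87, 124, 54, 24), (55, 200, 20, 20)])
    image_path, boxes = load_boxes(label_file)
```

A GUI for drawing the rectangles would call `save_boxes` once per image; the training code only needs `load_boxes`.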
I wasn't dead set on XML lol
I just gave it as an example as it's what Imagenet used and I don't know others
xml is for when your thing is very tree-like (and potentially any number of children per node).
but what is your end-goal (leaving aside the boxes for now)?
when is some dataset tree-like?
for the love of me I can't explain it in english but I have an analogy
It's not even just for datasets.
out of curiosity, what is your native language?
Think like a robot. One common file format for simulations is URDF.
hungarian
It's xml type of file because a robot's parts connect like a tree. Like the main body might be a node and it has 4 children nodes which are wheels.
ok, thats a good one
@candid sable try
@lapis sequoia how's GME?
265$ ---> OH frick
you know how men have the Adam's apple - I'd like to detect a similar bump on a bone
i'm not really checking it, now that you mentioned it lol
but looking back, only regression would be a potential solution so yeah
well, then why do you need specific parts of the image cropped out?
i deal more with credit stuff
ye, them r/wsb are prob rich by now
wdym by credit stuff?
other irregularities where the bump could be - the surface isn't always straight
loans, mostly
nice - smart and stable
i work at a p2p loan company
wait - so all you want to do is to detect a specific feature on an image
yes the size of it
is that even legal? not to be under some financial framework
suprisingly, it is
at least in the country where i live
well well well. You can simplify the model - 2 models, one for returning bounding box coords and another prog that crops it then feeds to the 2nd one which performs image regression
r u sure?
yes
well, that's intriguing. how does it work exactly?
yeah I need to look more into regression. thanks for the patience man
cool, no worries
I'm a disaster at math
you can have a max interest rate of aprox. the double of the basic national interest rate
having that, you can work in several ways
that's....not fair
you can have an investor fully financing a single loan
or
you can have several investors financing loans in quotas
I saw a similar concept on an Ethereum loan platform
it all depends on credit score simulation
what advantage does that give over the traditional banking system?
less bureaucracy for getting a loan
also, in the economic setting of my country specifically, it's way more advantageous to invest in private loans instead of federal securities, for instance
when the basic national interest rate gets to its lowest, the investment yield can get to negative levels
and for credit investors, it can be really helpful to diversify through private loans
for a borrower, it can be way easier to actually get credit when you don't have to go through the whoooole banking process
credit scoring was the whole reason i got interested in data science - most companies in that segment use econometric modeling to determine credit risk
whoa, that's gonna go over my head
if you live in the US, i'm pretty sure there are companies that work in that business model
nah, not in US
it's a really good alternative if you're a seasoned investor and want to diversify your portfolio
what is credit risk? the risk that the loanee won't pay the interest?
the risk that the loanee won't pay at all
there's several ways to calculate that
i'm not sure how because i'm not entirely versed in the econometrics behind it lol
but it takes some variables like age, credit bureaus score, income commitment, etc
that's interesting. any way we can have a look at that data?
i'm actually using excel files lol
do you mind if i just sample one column? i can't really share the whole dataset because it may contain sensitive info
the column i'm looking to classify, of course
sample is good too
Hey @lapis sequoia!
It looks like you tried to attach file type(s) that we do not allow (.xlsx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.
Feel free to ask in #community-meta if you think this is a mistake.
oh well
you can export it as a csv
Hey @lapis sequoia!
It looks like you tried to attach file type(s) that we do not allow (.csv). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.
Feel free to ask in #community-meta if you think this is a mistake.
jesus
well, I don't think it's that hard to convert a csv to gif man
you can actually do that? lmao
just jokin. can you paste a coupla rows here?
ahh, not english
well, basically i have some complaint comments
i know the keywords i want
and i want to classify the text based on that
i've read a bit on autokeras but i'm not sure on how to proceed with that
i've considered using regex to determine the classes
in such a way where if the row contains word x, then class = x, for instance
ok, so that's the first step ^^ to build a dataset. one column labels, one column text.
next, we would take all the data in each cell and make an array out of it. so text_array would contain [text1, text2, text3, ...] and same with the labels.
lastly, we would just pass both arrays to autokeras and you are done in about 30 lines of code
though it will take time for autokeras to find the best model
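something like this for the three steps above, as a hedged sketch. The sample rows and label names are made up for illustration, and the AutoKeras part assumes the `TextClassifier` API from the AutoKeras docs (needs `autokeras` and `numpy` installed), so check it against the version you have:

```python
from typing import List, Sequence, Tuple


def to_arrays(rows: Sequence[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Split (label, text) rows into a text array and a matching label array."""
    texts = [text for _, text in rows]
    labels = [label for label, _ in rows]
    return texts, labels


# Invented sample data: one column of labels, one column of text.
rows = [
    ("billing", "i was charged twice this month"),
    ("delivery", "my package never arrived"),
    ("billing", "the invoice amount is wrong"),
]
texts, labels = to_arrays(rows)


def train_with_autokeras(texts: List[str], labels: List[str], max_trials: int = 3):
    """Hand both arrays to AutoKeras and let it search for a model.

    Imports are local so the array-building part runs without AutoKeras.
    max_trials=3 is an arbitrary small value; more trials means a longer search.
    """
    import numpy as np
    import autokeras as ak

    clf = ak.TextClassifier(max_trials=max_trials)
    clf.fit(np.array(texts), np.array(labels))
    return clf
```

you'd call `train_with_autokeras(texts, labels)` once the real dataset is loaded. three rows is obviously way too few to train anything, it's only there to show the layout.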
so you say, creating a labeled data myself, based on the keywords i select?
yep, has to be human made for best accuracy
that can actually work very well with some good regex
you can use regex, but I advise against it. a lot of things are nuanced
either way, it depends on the dataset
i've used str.contains too
it just feels a little imprecise tho
after i normalized the text it doesn't feel like such a huge problem
what would you recommend?
I didn't mean the method to identify label, I said that finding labels programmatically is a bad idea. because if you can do it with programming, why are you making a model?
that is a good point indeed
it's kinda pointless to make a model if i can actually label it myself with programming and keyword selecting
ye, you got it. your task seems pretty simple so a large enough any(w in sentence for w in [word1, word2, word3]) should do that anyway
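a minimal sketch of that keyword approach with plain `re`. The labels and keywords below are invented placeholders, swap in your own. It matches on word prefixes (so "charge" also catches "charged") and the first label that matches wins, which is an assumption you may not want:

```python
import re
from typing import Dict, List, Optional, Pattern

# Hypothetical keyword lists, one per class.
KEYWORDS: Dict[str, List[str]] = {
    "billing": ["charge", "invoice", "refund"],
    "delivery": ["package", "shipping", "late"],
}


def compile_patterns(keywords: Dict[str, List[str]]) -> Dict[str, Pattern[str]]:
    """Build one case-insensitive pattern per label from its keyword list."""
    return {
        label: re.compile(
            r"\b(?:" + "|".join(re.escape(word) for word in words) + r")\w*",
            re.IGNORECASE,
        )
        for label, words in keywords.items()
    }


PATTERNS = compile_patterns(KEYWORDS)


def label_text(text: str) -> Optional[str]:
    """Return the first label whose keywords appear in the text, else None."""
    for label, pattern in PATTERNS.items():
        if pattern.search(text):
            return label
    return None
```

rows that come back as `None` are the ones worth reading by hand, since that's where the nuance usually hides.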
and if i want the full context of that specific label, i can just get the bigrams/trigrams for that label
ye, that option is always open
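for the bigram/trigram idea, a stdlib-only sketch. It tokenizes by lowercasing and splitting on whitespace, which is a simplifying assumption (real text usually needs proper normalization first), and the function names are just for illustration:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive tokens."""
    return list(zip(*(tokens[i:] for i in range(n))))


def top_ngrams(texts: Iterable[str], n: int, k: int = 3) -> List[Tuple[Tuple[str, ...], int]]:
    """Count n-grams across all texts and return the k most common."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(ngrams(text.lower().split(), n))
    return counts.most_common(k)
```

you'd filter the rows down to one label first, then call `top_ngrams(rows_for_that_label, 2)` for bigrams or `3` for trigrams.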
maybe, but you have to explore your data first
i'm sadly very familiar with it lol
cool, no worries. Good luck!
internships. what can you do
ikr?
it's basically exploitation
not that i really mind it tbh
i had no idea of what to do with my major until my internship
gotta do it some day or the other. plus, companies dig that - so hang on!
i'm hoping to get a firmer grasp on statistics and programming so i can fully transition to data
i was really surprised with what you can do with such skills
it's amazing
ikr
sadly my undergrad focuses much more on economic history than economic theory/econometrics
ah
but i intend to move to statistics
i had lessons with the data science team of my company
basically learned python through it
then thats good
i hope so
i dont work yet so i can only imagine
it's crazy
when you're in a department where you're the only one who can code
you're literally god
that's not exactly good though lol
the resources right
it's really pressuring, as an intern, to have so much expectation in the analysis you execute
thats crazy
yea
thats what pressure does
but i personally cant wait to start professional data science
it's a really good career path
yea
Is anyone here good at deep learning, neural networks?
Plz mind DMing me
Plz plz plz
i've barely worked with them and i'm confused with weights and biases
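on weights and biases, a tiny single-neuron sketch might help. This is only the textbook formula (weighted sum plus bias, then a sigmoid), not any particular library's API, and the numbers are arbitrary:

```python
import math
from typing import Sequence


def neuron(inputs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """One neuron: weights scale each input, the bias shifts the total."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid squashes z into (0, 1)


inputs = [1.0, 2.0]
weights = [0.5, -0.25]  # how much each input matters, and in which direction

neutral = neuron(inputs, weights, bias=0.0)  # weighted sum is 0 here
shifted = neuron(inputs, weights, bias=2.0)  # same inputs, bias pushes it up
```

training is just nudging `weights` and `bias` until the outputs match the data better.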
😭
i also wonder how neural nets can work with robots
they can
Robots must have a chipset
Like Arduino or Raspberry
so I have a dataframe data and I'm getting the standard deviation of a particular column Beta. So I'm using data.Beta.std(). I'm multiplying that by 2 and adding it to the mean and that number is higher than all the numbers in my dataset....is that possible?
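yes, that's possible, especially with a small or skewed column. Quick check with made-up numbers (not your data), using `statistics.stdev`, which is the sample standard deviation and so should match the default of pandas `.std()`:

```python
import statistics

# Invented values: four small numbers and one outlier.
beta = [1, 1, 1, 1, 10]

mean = statistics.mean(beta)
std = statistics.stdev(beta)  # sample std (n - 1 in the denominator)
threshold = mean + 2 * std

print(f"mean={mean:.2f} std={std:.2f} threshold={threshold:.2f} max={max(beta)}")
```

here the threshold lands above the largest value, so nothing in the column is more than 2 standard deviations above the mean. One outlier inflates the std enough to do that.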
You cant expect to add a neural net in just a walking robot who just walks
You cant directly code in robot it doesnt have keys
You code in computer and then transfer the program to your robot
and is python a language that is used for it?
Yea it is
Python has libraries like tensorflow, pytorch which can be used for neural nets, deep learning algorithms
It also has a library for robots
Ive heard of it
Can you plz teach me too
sure
Thanks
reinforcement learning. take a look at some vids on YT. pretty interesting
Is it appropriate to ask plotting questions here?
plotting as in matplotlib?
@rugged spire yes, sorry, my keyboard decided to have a seizure right when you answered
I can't for the life of me figure out how to destroy a figure / canvas completely. I am placing the canvas in a tkinter Frame, but I still can't figure out how to delete the canvas object
If you think the error is more of a problem with the UI, I can shift this to UI instead.
um
tkinter is yikes
sorry but i have never used tkinter
but yeah UI is for tkinter questions
so i am not really sure




