#data-science-and-ml
would it not take longer?
Longer than?
longer than running it once and expecting a reasonable output ?
How are you planning on running it once and getting that "reasonable output" though?
This model won't do that
OCR somehow does it.
Right, but OCR is not this model
Would it be possible to find what an OCR model looks like ?
You can read the paper on google tesseract if you are interested
There are many other solutions too
Some use recurrent neural networks
@shrewd grove brooo you been 1096 days on this server damn
One method is just what I said, which is clustering (separating) the letters
And classifying each one separately
And considering you already are working on a model to classify those letters, that would for sure be the easiest solution, as long as the letters are separated quite well already in the image
how different is classifying two letters to one letter ?
sadly, they are not.
Well the problem is not just getting the letters right
Because your model will already give you a higher value for 'a' and for 'f' if those are in the image
But it is knowing what order they are in
And dealing with for example duplicates
And maybe you would also like context to play a role, so that the process is more likely to classify a letter as an e, and not an f, after it has found 'airplan'
I could obtain a list of "allowed" words.
which would turn it into classification problem - but the list is huge.
Ideally you want to teach your model what letter is likely to follow from some other letters it has found
And not store it in some database
Can I approach these problems one-at-a-time, rather than try solutions clumped into a huge model ?
Sure you could
But that means you would have to extract info yourself, and apply it when creating a model
Instead of making a model that can learn this info itself
It might be good to look at some tutorials on making OCR and seeing what options there are
Or maybe even read some papers if you are into that
I tried googling and most of my findings could be summed up with "train pre-trained model from this library".
In that case look up what the model is
What stuff you don't understand, and look up those parts
Modern models can be quite complex though, so for good results, it might get a bit complicated
You will have to understand stuff like attention, transformers and recurrent neural networks
I wish there was a simple model, that I could learn from.
Well, you can be creative yourself as well
One idea would be to, for example, slide a window over the image
Then classify each window
So for an image like this you would get stuff like 'aaaaaoaaassssssssdddbddddddfffffff'
And then try to make something from that
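A minimal sketch of how one might "make something from that", assuming every window has already been classified into a single letter. The function name `collapse_windows` and the `min_run` threshold are hypothetical choices for illustration. Genuine double letters (as in "add") would still need extra handling, for example a separator class between letters.

```python
from itertools import groupby


def collapse_windows(window_labels: str, min_run: int = 3) -> str:
    """Collapse per-window predictions into one letter per sufficiently long run."""
    letters = []
    for letter, run in groupby(window_labels):
        # Short runs, such as a lone 'o' inside many 'a's, are treated as noise.
        if len(list(run)) < min_run:
            continue
        # Noise can split one letter into two runs, so merge equal neighbours.
        if not letters or letters[-1] != letter:
            letters.append(letter)
    return "".join(letters)


print(collapse_windows("aaaaaoaaassssssssdddbddddddfffffff"))  # asdf
```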
is it not what convolution does already ?
Yeah, a convolutional layer also slides a window across an image
It does something different with each window though
It sums the product of the window with the kernel
Whereas in this case you get a class* (a letter) for each window
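To make the "sums the product of the window with the kernel" step concrete, here is a small 1D sketch in plain Python. The name `conv1d_valid` and the example values are made up for illustration. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal; each window yields sum(window * kernel)."""
    k = len(kernel)
    return [
        sum(s * w for s, w in zip(signal[i:i + k], kernel))
        for i in range(len(signal) - k + 1)
    ]


# A difference kernel responds only where the signal changes.
signal = [0, 0, 1, 1, 1, 0]
kernel = [-1, 1]
print(conv1d_valid(signal, kernel))  # [0, 1, 0, 0, -1]
```

Each window produces a number here, whereas the sliding-window classifier produces a letter per window.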
Im getting fixated on an idea, but not sure how to implement.
assuming i have an image (matrix) of x * y.
If I slid a window of 1*y over it, finding patterns.
and I would get (somehow) a matrix of x*1 filled with patterns.
then it would simply be consuming that... into letters?
That is indeed a good idea, because it already exists
It would probably have to be a recurrent neural network though
Since then you don't just use each slice itself to classify the slice
But also data from previous slices
image -> [a_1, a_2, a_3, s_1, s_2 ... ] -> "asdf" ?
assuming that letters are in constant places and always look the same - no need for recurrence ?
hmm, you mean each slice is classified itself?
hmm
so if you got the beginning of a letter "a", followed by the middle of a letter "a", finished with the end of a letter "a" - it is likely an "a".
Alright
but take this example
What letter are you seeing (letter is black, bg white)
So then it would be kinda hard right
You can't classify a single slice
Not enough information
yeah, but You could classify it into "straight line"
so a line of patterns makes a letter.
could i not teach my model to recognise sequences of patterns ?
Yeah, but sequences are often done with recurrent neural networks
Or transformers with some more modern networks
And what would you use to even teach the model the "patterns"
For an image with a single letter you know the answer
output - letters.
But for a "straight line" you'd have to annotate each slice of the image
Wouldn't it be crammed into one model though ?
so effectively i would not care what it calls each slice ?
Well if you want the model to output stuff like "abc" and it would be 1 convolutional neural network
Then you would probably need 26*26*26 output neurons
And it would be hardcoded on amount of letters
that is assuming I use one-hot encoding at the output layer
Yes
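The 26*26*26 figure comes from giving every possible three-letter string its own one-hot output. A quick sketch of how that count grows with string length, assuming a 26-letter lowercase alphabet (the function name is hypothetical):

```python
ALPHABET_SIZE = 26  # assumption: lowercase a-z only


def one_hot_outputs(string_length: int) -> int:
    """Output neurons needed if each whole string is its own class."""
    return ALPHABET_SIZE ** string_length


for length in range(1, 6):
    print(length, one_hot_outputs(length))
# 1 26
# 2 676
# 3 17576
# 4 456976
# 5 11881376
```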
what if i wanted numbers, that i could then round to ascii codes?
Well that would be way more output nodes then if 1-hot encoded
But I think it might just be good to look into recurrent neural networks before trying to avoid them at all cost
They are probably the more simple way of doing this
the problem I see that most examples I encountered are ... either very basic
or using all-fancy models, that are not described.
Then you would probably need to educate yourself a bit, maybe read a book on neural networks
is it my research skills or google-based tutorials are off ?
The creators of pytorch have a good one on deep neural networks, but they don't go into recurrent I think
A book it is then. Any recommendations ?
No, some tutorials and papers are written in the most confusing way possible, that is just how it is sometimes :/
I wouldn't mind creating a few dozen models just to learn.
The "quick and easy" tutorials can also sometimes/often contain misleading or just wrong information
but most tutorials call for classification which I (apologies) found not-interesting, really.
Indeed. This is why i went bruteforce programming route.
hmm yeah..
But that might also not be the solution
As you would end up with a model that has 480 million parameters without knowing why it has so many, for example
I have some security/programming background, so bruteforce is usually the solution.
Well not in data science and machine learning
Just a bunch of math to begin with
And once you understand the basics you can learn about the perceptron/linear regression and other more simple methods
And build your way to complex models
There is (unfortunately) just a lot of theory to go through before you can really understand what you are doing with the models
which is a problem, as most Machine Learning is to me a black-box.
Right
You put some data in, You might or might not get a nice model.
And that works if someone has already made a model that is easy to use that you can grab
And no way to validate mid-results.
Jup
And you can honestly get a lot done without understanding anything about ML, but once you get stuck, you get stuck
And for this project, this might be a bit of a wall
What would you recommend then? Is there a book that would give me the basics?
Imo the models you are trying to use at this moment are quite complex already for someone who might not have that much experience with ML
Preferably you already have a bit of experience with calculus and linear algebra
Then you also want to get into probability theory/statistics
I do
I can calculate simple probabilities, not really touching on normal distribution, as it requires integrals.
You don't need to calculate it by hand, but just getting the intuition is pretty important
But it depends on how in-depth you want to go, I'm also just a student, I don't have all the answers as I'm still learning too
But I have read some books that I thought were pretty useful on some topics
I think what I'm looking for is "Machine Learning for dummies". As in: "This is a convolution layer. Input is this, weights are constant - what do you expect to get out?"
so then I can build models with an intuition of "oh, I expect this data to be in a certain range"
but perhaps I'm thinking wrong and the black-boxness of ML prevents this approach
Sliding windows to classify sequences is a thing, e.g. to recognize objects in vision. And it's what humans already do via saccading (they also use motion detection (optical flow) from the jumps). But there needs to be some memory used to link the slices together as a sequence, often done via an RNN, but there are other options too (RNNs are good at some things, and not at others), you may also use RNNs in combination with other methods, they can help with short term stuff (e.g. one jump from one slice to another).
Different fields describe it in different ways
it's a little of both. knowing what a convolution layer is and how it works definitely helps you build understanding about model building in general; it can help you answer the question of when to use a CNN or not
It's good to get a bit of a feel from each field to get an idea that is more intuitive for you specifically
I read this book btw, which goes into deep neural networks
https://www.manning.com/books/deep-learning-with-pytorch
It also covers some more basic concepts useful for ML
think about accuracy/precision/recall etc.
And loss, and gradients and stuff
*Humans can't do convolutions, it's not biologically plausible, but they can do some very similar stuff (saccading, tiled receptive fields (but without shared weights)).
However, convolutions work really well on computers.
I think human eye kinda "remembers" past few seconds to detect movement/changes.
Yeah it does at multiple levels.
I see it is part of a bundle - and "Real-World Machine Learning" seems promising.
Haven't read that one
But I really liked this book
I kinda read some of these books in my free-time, some are pretty good reads with illustrations and stuff
But you do have to like this type of stuff if you want to read through these type of books
Maybe there is some "demo ;)" out there where you can just read some of the first pages and see if you like the book
Real-World Machine Learning has a free trial
there's also the fastai course which people seem to enjoy
hello friends, I found something superrrrr interesting when I was experimenting around with neural networks
so I was trying to visualize what a neural network was doing by visualizing each layer as some set of transformations to the input space
affine_transformation + nonlinearity in each layer
it's impossible to visualize this for very high dimensional spaces, so I limited myself to 2 neurons
btw this was inspired from seeing andrej karpathy's convnet.js simulation which does exactly that
Got some cool images then?
so when I tried separating two circles using only two neurons, nothing happened!!!!!!
it didn't budge and it didn't separate no matter how many layers I chained
(the slider shows the order of the transformation)
but by just moving 1 dimension up, the separation almost became trivial, super easy for the network
I don't know i just found this super interesting so I thought I'd share it with y'all
think of it this way: imagine if you had a z axis pointing out of the computer screen, and you could move the blue ring "up" (towards your face) and the red ring "down" (away from your face) -- perfect linear separation!
Are you familiar with support vector machines?
congrats, you've just rediscovered not only the field of "kernel methods" (which were a very big deal at one point) but also fundamentally why constructing higher-order features is useful for learning arbitrarily complicated highly-nonlinear problems. you should feel legitimately proud of figuring this out!
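A small sketch of the "extra dimension" idea, assuming two concentric rings of radius 1 and 3 and a hand-picked third feature z = x^2 + y^2. The feature, the radii and the threshold are illustrative assumptions; a network with three neurons would learn its own lift rather than this exact one.

```python
import math


def ring(radius, n=50):
    """n points evenly spaced on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def lift(point):
    """Map (x, y) to (x, y, z) with z = x^2 + y^2."""
    x, y = point
    return (x, y, x * x + y * y)


inner, outer = ring(1.0), ring(3.0)

# In 3D, the flat plane z = 5 separates the rings (z is about 1 vs about 9).
threshold = 5.0
inner_below = all(lift(p)[2] < threshold for p in inner)
outer_above = all(lift(p)[2] > threshold for p in outer)
print(inner_below, outer_above)  # True True
```

No straight line in the original 2D plane can do the same, since the inner ring sits entirely inside the outer one.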
thank you so much but all I did was reimplement what I saw in convnet.js by Andrej Karpathy and mess around with it
yeah I spent a little bit of time with them but extremely surface level
yeah it makes sense when you look at it after the fact for sure
it's suuuuper interesting though
Yeah cool animation too 
oh yeah the animation shows exactly what i was describing. "pulling" the circles through the extra dimension lets you separate them freely.
Thanks it's for a video I'm working on where I wanted to eventually talk about the manifold hypothesis
anyone know the syntax to make this into a grouped bar chart with 0-9 x axis, values y axis and hue by model?
please do print(death_balanced.head().to_dict('list')) Please remember to always give a copy and pastable example of every dataframe you ever need help with.
{'rf': [0.125, 0.145, 0.137, 0.164, 0.195], 'lr': [0.156, 0.168, 0.11, 0.17, 0.2], 'nb': [0.174, 0.189, 0.152, 0.187, 0.208], 'xgb': [0.059, 0.137, 0.11, 0.139, 0.23]}
what have you tried so far?
how is this different from what you want
its EXACTLY what i want but i want it in seaborn to match my other plots
how did u do it?
never used seaborn.
cripes
my report has to stay consistent so i cant just do that sadly
seaborn requires x,y, and hue
what is hue
for example,
```
g = sns.catplot(
    data=penguins, kind="bar",
    x="species", y="body_mass_g", hue="sex",
    ci="sd", palette="dark", alpha=.6, height=6
)
g.despine(left=True)
g.set_axis_labels("", "Body mass (g)")
g.legend.set_title("")
```
so for me its hue = titles/first column depending on transposed or not
then idk what is y
they have a column for all three
categorical
@steady basalt the docs have this df as an example
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
and then they do this
sns.barplot(x="day", y="total_bill", hue="sex", data=tips)
and they get this
you end up having to do some fucky stuff.
In [38]: df
Out[38]:
rf lr nb xgb
0 0.125 0.156 0.174 0.059
1 0.145 0.168 0.189 0.137
2 0.137 0.110 0.152 0.110
3 0.164 0.170 0.187 0.139
4 0.195 0.200 0.208 0.230
In [39]: df.reset_index().melt(id_vars='index')
Out[39]:
index variable value
0 0 rf 0.125
1 1 rf 0.145
2 2 rf 0.137
3 3 rf 0.164
4 4 rf 0.195
5 0 lr 0.156
6 1 lr 0.168
7 2 lr 0.110
8 3 lr 0.170
9 4 lr 0.200
10 0 nb 0.174
11 1 nb 0.189
12 2 nb 0.152
13 3 nb 0.187
14 4 nb 0.208
15 0 xgb 0.059
16 1 xgb 0.137
17 2 xgb 0.110
18 3 xgb 0.139
19 4 xgb 0.230
sns.barplot(hue='variable', x='index', y='value', data=df.reset_index().melt(id_vars='index'))
```
Epoch 1/100
6/6 [==============================] - 0s 1ms/step - loss: 6.6317 - accuracy: 0.4778
Epoch 2/100
6/6 [==============================] - 0s 1ms/step - loss: 6.3355 - accuracy: 0.4778
Epoch 3/100
6/6 [==============================] - 0s 1ms/step - loss: 6.0349 - accuracy: 0.4778
Epoch 4/100
6/6 [==============================] - 0s 1ms/step - loss: 5.7364 - accuracy: 0.4778
Epoch 5/100
6/6 [==============================] - 0s 2ms/step - loss: 5.4379 - accuracy: 0.4778
Epoch 6/100
6/6 [==============================] - 0s 2ms/step - loss: 5.1394 - accuracy: 0.4778
Epoch 7/100
6/6 [==============================] - 0s 2ms/step - loss: 4.8425 - accuracy: 0.4778
Epoch 8/100
6/6 [==============================] - 0s 2ms/step - loss: 4.5507 - accuracy: 0.4778
Epoch 9/100
6/6 [==============================] - 0s 2ms/step - loss: 4.2534 - accuracy: 0.4778
Epoch 10/100
6/6 [==============================] - 0s 2ms/step - loss: 3.9574 - accuracy: 0.4778
Epoch 11/100
6/6 [==============================] - 0s 2ms/step - loss: 3.6707 - accuracy: 0.4778
Epoch 12/100
6/6 [==============================] - 0s 2ms/step - loss: 3.3756 - accuracy: 0.4778
Epoch 13/100
...
Epoch 99/100
6/6 [==============================] - 0s 2ms/step - loss: 0.1075 - accuracy: 0.4778
Epoch 100/100
6/6 [==============================] - 0s 2ms/step - loss: 0.1068 - accuracy: 0.4778```
my network isnt improving ?
the model is very very simple
```
inputs = keras.Input(1)
outputs = keras.layers.Dense(1, activation='softmax')(inputs)
model = keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```
the data is also simple
```
data_set = pd.DataFrame()
data_set['x'] = [i for i in range(-100, 100)]
data_set['y'] = [1 if random_sigmoid(i) >= 0.5 else 0 for i in range(-100, 100)]
```
am I missing something ?
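One likely culprit, offered as a sketch rather than a diagnosis of that exact run: softmax normalises across the units of a layer, so with a single output unit there is nothing to normalise against and the output is always 1.0, whatever the input. The prediction is then always the same class, which would match an accuracy that never moves. For a single-unit binary output, `activation='sigmoid'` is the usual pairing with `binary_crossentropy`. The helper functions below are plain-Python stand-ins, not Keras code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logit):
    return 1.0 / (1.0 + math.exp(-logit))


for logit in (-5.0, 0.0, 5.0):
    # softmax over one value is always [1.0]; sigmoid actually depends on the logit
    print(logit, softmax([logit]), round(sigmoid(logit), 4))
```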
confused
so u un transposed it
then made a cat for model
hmm i see very nice
@hollow jetty this server isn't a place for recruitment. sorry
No worries just wanted to also showcase some of my work
I have been working a bit on generative art and NeRFs
I have a big list of ints and corresponding a and b int values...how do i train a model so that when i give it that list of ints..it gives me a good estimate for the a and b?
also some potential resources for the same?
Worked thanks man
what is the value of yi in this soft margin classification equation for svm
Look into linear regression
it's the label on the i'th data point, no?
so why isnt that isolated in the equation
why should it be?
shouldnt that be what we are trying to figure out
so if the label is unknown
why is it like that
wouldnt there be a reason the equation was made that way
.latex $$\left\{\begin{array}{lr}\boldsymbol{w}^T\boldsymbol{x}_i-b\geq1&y_i=1\\\boldsymbol{w}^T\boldsymbol{x}_i-b\leq-1&y_i=-1\end{array}\right\}$$
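The two cases above can be folded into one condition by multiplying through by the label, which is why $y_i$ shows up as a known factor instead of being isolated: during training every $y_i$ is given, and the unknowns are $\boldsymbol{w}$ and $b$. A sketch in the usual textbook notation, where the slack variables $\zeta_i$ and the regularisation weight $\lambda$ are symbols that do not appear in the messages above:

$$y_i\left(\boldsymbol{w}^T\boldsymbol{x}_i-b\right)\geq 1-\zeta_i,\qquad \zeta_i\geq 0$$

$$\min_{\boldsymbol{w},\,b}\;\lambda\lVert\boldsymbol{w}\rVert^2+\frac{1}{n}\sum_{i=1}^{n}\max\left(0,\;1-y_i\left(\boldsymbol{w}^T\boldsymbol{x}_i-b\right)\right)$$

Only at prediction time is the label unknown, and then it is read off as $\operatorname{sign}\left(\boldsymbol{w}^T\boldsymbol{x}-b\right)$.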
does anyone here have experience with making twitter scraper bots?
!rule 5
5. Do not provide or request help on projects that may break laws, breach terms of services, or are malicious or inappropriate.
(Unless you are using Twitter's official APIs)
can i dm you i have some questions?
No.
damn okay
son of a.... i spent at least a day trying to figure out why I couldn't connect to S3 in python with hadoop-aws. type of credential provider was the reason ha
Heyyyy, could someone explain this one to me? I am new to signal processing and confused by it. Basically, I want to input my audio and calculate the harmonic distribution. But I don't know what the correlation is between amplitude and harmonic distribution in the following code.
as a sidenote, the bot's tex preamble has been updated. you can now use \bm instead of \boldsymbol
Now is such a graph acceptable? The scatterplot is the column vs target variable plot and the histplot behind it is the distribution for the column
It's pretty chaotic with the histogram behind it
The scatter plot makes it hard to read the histograms, maybe just make it two separate figures
hello, does anyone know why cuda can't see that my gpu is available? i have the newest version of the cuda toolkit and the latest gpu drivers installed, and my gpu is a card that works with cuda
I think it would help if You gave us more details. How do You check it, whats your OS etc.
I'm using windows 10, visual studio code, rtx 3090
Have You got a compiler installed ?
yea
it's just cuda isn't seeing my gpu as available device
import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()
# Additional info when using cuda
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
this is the code that i'm using to recognize if it's finding it
Have You considered running some cuda samples?
i haven't yet, i'm new to using cuda
Try it. You will see if its cuda or pytorch
alright i'll try it soon thx for the help
Assuming this is pandas: https://stackoverflow.com/questions/19124601/pretty-print-an-entire-pandas-series-dataframe
didn't work
specifics? What did not work ?
didn't detect the gpu
like runs with cpu
but not gpu
did You run the cuda sample ? Did it work?
didn't work
Not sure how the market is, cuz my specification is a bit different, but similar.
what do you specialize in?
But generally, if I were to guess: Yes, but... people are getting abused this way.
Im a programmer.
python?
Do anyone know what the issue is here? https://gyazo.com/ec3b700cf924ce03df1826188104940d
cpp.
anyhow - I have seen people programming without a degree.
However, they were usually underpaid and abused.
damn this language is really well paid
it is not - comparable where I live, really.
where do you live?
function does not exist
London
I did use the function symbols earlier without any issues
Did You import it in current session? Where is the function coming from ?
Generally, after a few years of experience people tend to care less for your degree - as long as You have one.
so if you have a CS degree, dont do another one - one is enough. If You dont, Id advise to get one.
Am I wrong to say it is part of sympy?
theta_change is not defined bro
ahhh thank you
it is
okay man
are you importing it ?
btw where is this gonna help you
I have tried different ways. like : import sympy, from sympy import * and now lastly import sympy as sp
Later plotting a graph
damn
from sympy import * should give You your syntax.
My friend used python 3.6 and it worked for him but for me who is using 3.9 seem to have issues running the same code*
im amazed to know that python could plot a graph
import sympy would be used as sympy.symbols()
i thought only r could do it
You'll be surprised how nice python is in jupyter XD
What you mean about syntex?
what language do you prefer for data science, python or r ?
syntax - think "grammar" - how You use stuff.
I am still a noob in python*
run && show me import again, just to be sure
So this should be the correct one?
more or less, but lets not get into semantics here.
Python for sure. I don`t know much about r
i don't really know much about the r tho
Alright. Just type && and run?
i have prior experience in python but my course includes r so
just run the imports "as is" and show it to me.
i was just asking which is reliable the most
I must be really slow but I don`t catch it
I think R is more maths-heavy-applications, but could be wrong here.
Run the imports. Make a screenshot, post it here.
okay
tho*
so you see - "import sympy as sp"
which means all sympy functions need to be prefixed by "sp."
Well, that is the last one I used.
I have replaced it with sp
so your "symbols" becomes "sp.symbols"
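The same rule can be tried without sympy installed. A small sketch using the standard library's `math` module as a stand-in (an assumption made only so the snippet runs anywhere); `import sympy as sp` treats `symbols` the same way with respect to the alias.

```python
import math as m

# With an alias, every name from the module is reached through the alias.
via_alias = m.sqrt(16.0)

# The bare name was never created by `import math as m`, so it raises NameError.
try:
    sqrt(16.0)
    bare_name_works = True
except NameError:
    bare_name_works = False

print(via_alias, bare_name_works)  # 4.0 False
```

By contrast, `from sympy import symbols` (or `from sympy import *`) does create the bare name, which is why the unprefixed call works with that style of import.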
Aaah
I don't know why, but sometimes my code does not run. Or well, it runs but won't give me the solution. Sometimes it works to close jupyter and open it again.
Like now, it kind of works but wont show much
You see that [*] ?
@shrewd grove Do you know why?
that means its yet to be run
So it still kind of running?
look here: the upper field has a [6] - means it finished running.
and its 6th thing ive run in this notebook
the next one has a *, which means it is either running or scheduled to be run.
Alright! Thank you for the explanation. Although I've been waiting for some time, it won't continue. Patience is the key?
can You show me the top code with a * ?
could be it is running for a long time, could be cpu-intensive (hence taking a long time).
Or could be you made a mistake and ended up with an infinite-loop - so it will never finish!
This one?
im not sure what sp.solve does here - but it could take a while.
you see all of them have stars. So the one which is executing is either the top one or one of the sections before Question 3.
The * starts even on the first In [*], where the data import is
Seems like python has a hard time importing the functions
It worked all the way down to row 5 and there it stopped. Could it be that I have imported some pictures that require too much of the notebook?
Always easier if you show the code. But it could be that You are loading too much data. How much are you loading ?
isn't it labeled here with "Frequency"?
oh oops sorry i meant x axis
look at what you do to label the y axis.
assuming that you do. I can't actually see all the code, since you did a screenshot.
Not much at all, I would say
I won't look at another screenshot of text. I will only look at screenshots of the actual plots.
Use paste https://paste.pythondiscord.com/
df.groupby(["Year", "Action.Reason"])["Number.of.transactions"].sum().plot(xlabel = "Annual Number of Transactions", title = "Annual Number of Transactions plotted against the reason for the smartcard replacement (Action Reason)", kind = "hist")
it's only one line of code that I was asking them to share.
I just wanted to give you another option if forty could not see all the codes ๐
yea sorry i wasnt thinking straight lol!
@lapis sequoia read this https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html?highlight=plot
split it into multiple lines, print values after each line.
Worked smoothly, but it is this one, as we saw earlier, that takes a lot of time to solve: https://paste.pythondiscord.com/qudafayevo
Yeah, I would expect this to be so. Make sure that Ic is the right value /format.
Thank you for your time
No worries
How long would it usually take to train a neural network with 40,000 training samples and 10,000 testing samples on a GPU?
there are 2 inputs, each of which can be expressed as a float from 0 to 1, and the model is relatively small, with 1 hidden layer, running on a gtx 1660 ti
Hi all, question about open data sources - I wanted to play around with any nutritional data sources, if there are any open ones that include nutritional information for food, including brands - does such a thing exist? (apologies if this is not the right place to ask)
this is the place to ask. one thing you can do is look for papers that would use the kinds of datasets that you want, and see if their dataset is available, or who you might need to ask to get ahold of it.
Ok thanks for the tip, I will look in to that angle as well.
hello sorry to bother, can anyone recommend a good python machine learning course on youtube?
Don't be afraid to ask for help in the appropriate topical channel. Andrew Ng's videos are highly recommended
thanks
Would never have guessed my code would be so big that my RAM could not handle it. https://gyazo.com/19121ac38e33d3262c0b4f6969e14b74. A small assignment and this happens. I guess I'll have to get a new computer after just 3 years...
I know I'm the one who's supposed to know this, but is there a way to make spaCy use the same tokenizer as any huggingface BERT model? If you don't already know how to do this, do not answer, as I have already crawled Google, and we don't need a duplication of efforts.
@mild dirge I found a great resource yesterday - https://keras.io/examples/vision/captcha_ocr/
Ah cool
That does use a recurrent neural network btw
LSTM (or Long Short-Term Memory) is a recurrent layer
I brute-force-programmed my goal in it
as I was reviewing the data i noticed that the text I am after is in different places
sooo Im assuming this wont trace it.
I thought the bi-directional layers are recurrent.
It uses a bidirectional LSTM layer
oh, true that.
Can someone help me what this thing is
the guide at https://fairlearn.org/v0.7.0/quickstart.html seems pretty straightforward
Is MetricFrame the fairness thing?
What is gm.overall telling us
overall score for a metric, e.g. overall accuracy
Overall fairness score?
Accuracy, in this case.
And then you get the breakdown of the same metric, accuracy, by group
yup, since the metric here is metric=sklearn.metrics.accuracy_score
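A plain-Python sketch of the two numbers being discussed: one metric computed over everything, and the same metric computed per group. This only illustrates the idea and is not fairlearn's implementation; the function names and the toy labels are made up.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_by_group(y_true, y_pred, groups):
    """The same accuracy metric, computed separately for each group value."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: accuracy(ts, ps) for g, (ts, ps) in buckets.items()}


y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 0]
groups = ["a", "a", "a", "b", "b", "b"]

overall = accuracy(y_true, y_pred)                      # 3 of 6 correct
by_group = accuracy_by_group(y_true, y_pred, groups)    # "a": 2 of 3, "b": 1 of 3
print(overall, by_group)
```

A gap between the groups is what hints at a fairness problem; the overall score alone would hide it.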
I thought it has some automatic fairness calculator
10x more cat images, same f1 score, thats a bias problem right?
1000 -> 10000 images
@mild dirge I am trying to understand what is actually happening in my model. Up for a chat sometime ?
I don't use recurrent neural networks that much, so I don't know if I could explain the entire model to you
I just have some theoretical knowledge about them
I wrote up everything until the recurrent part.
And it's pretty late here, so maybe we can talk tomorrow if you still want then
But i guess I am wrong
sooo what I suggest we do, if You want to help me - Im gonna write it up till tomorrow and then send it to You
That would be fine too, but again, not that comfortable with RNNs, so probably just ask here, and ping me
And I'll definitely help if I can 
cheers!
hey guys does one of u want to review 60 lines of sloppy code and maybe give me some tips what to improve and how to do so?
https://paste.pythondiscord.com/raqudogemo
- You're reading the file twice: once with pandas, once in a classic pythonic way.
Also, "file" should be "filename" or "fname", if I were to comment on variable naming.
- Not sure if there is a point in sorting the glob in the top loop.
- You are calling split multiple times, where You can do something like this:
a, b, c = "a,b,c".split(",")
or:
abc = "a,b,c".split(",")
a = abc[0]
b = abc[1]
...
- group_name - don't call str so many times there - "#" is a string anyway. I think it could be an f-string, or use join or something.
If You simplify it and take point 8 into account, You will get something like group_name = f"#{iteration} x {foam} x"
And then the last loop will not have to check the first char, as it is always '#'.
- Line 50 - You are defining a variable just to use it once? Make it inline.
- What is data? A dictionary of pandas dataframes? Same with df_dict. Couldn't You just make a proper pandas setup with one big dataframe?
- Line 36 - if group_name etc. If there is no group_name in df_dict, it will get initialized with df_new, and then it will append df_new, resulting in [df_new, df_new]?
- "x" if "beh" in treatment else "x" - is always "x".
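For the [df_new, df_new] point in the review above, one common way to avoid the "initialise, then append again" duplicate is `dict.setdefault`. A minimal sketch, assuming df_dict maps a group name to a list of frames; the reviewed script is not shown here, so `add_frame` is a hypothetical helper and plain strings stand in for the dataframes.

```python
def add_frame(df_dict, group_name, df_new):
    """Append df_new to the group's list, creating the list on first use."""
    df_dict.setdefault(group_name, []).append(df_new)


df_dict = {}
add_frame(df_dict, "#1 x foam x", "frame_1")  # first frame: list is created, one entry
add_frame(df_dict, "#1 x foam x", "frame_2")  # later frames: appended to the same list
print(df_dict)  # {'#1 x foam x': ['frame_1', 'frame_2']}
```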
Hey, is it possible to train a neural network with multiple attributes, but use only one or a set of them when using? E.g.:
Training data has columns
string1, string2 ... string10, int1, int2 ... int1p, bool1
Input data has columns
string1, string2 ... string10
And returns a float between 0.0 and 1.0
by "attributes", I think you mean "features". and will the subset of features you want to use at prediction time always be the same?
Yes, features, sorry; AI noob here. Yes, the prediction would always be a set of 10 strings
then those are the only features you should use to train it.
is there no type of AI application where you could do that? Or maybe some way to transform such feature set in a way that a neural network could do that?
can't think of one.
why would you want your model to depend on features that you're never going to have?
Hmm that's not how I was thinking about it, more like those other training features would serve to bias similar inputs
Not sure how to get the idea across
I'm trying to write an application that takes a set of, say, 10 types of players divided into two teams of 5, and returns a chance of the first composition beating the second
Can you guys recommend some trainable model that will work best on classifying hand/hand gesture?
Hey, I have a question: can we create AI software which has consciousness? I just want a yes or no
No.
But also, what even is consciousness?
could someone help me with parsing some specific data from a small dataset? i feel like it's simple but im not familiar with data parsing at all
visualizing how hidden layers change during training
Would you guys choose to train NNs in the cloud or on 2080s?
Uhh, depends? If the model fits on my own GPU, and I can still do stuff while it trains, I guess I'd do that. Otherwise I'd see how much cloud compute would cost.
(but I haven't had this dilemma, because I've always had access to an hpc.)
(if you're a student, you might see if your university has one.)
"Provide a detailed analysis of the performance of your model under varying training set sizes. You can present and discuss this via a learning curve."
Is a learning curve a graph between performance and training size?
thanks man
```
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=40,
    shear_range=0.2, zoom_range=0.2,
    fill_mode="nearest",
    horizontal_flip=True, vertical_flip=False,
)
i = 0
for batch in datagen.flow(np.array([img]), batch_size=100, save_to_dir=folder):
    i += 1
    if i > 100:
        break
```
Here I don't understand how the output works. Let's say one image will be augmented and the output will be 100 images.
How do they do it? If I mention rotation=40... will it create 40 copies? for zoom=0.2 will it create let's say another 40 copies? and then take one from each of these until we run out of them?
Pretty sure it just randomly flips, rotates, shears, zooms etc. for every image you load
But the size of the dataset for one epoch is still the same
It is just that all the images are augmented randomly
So that means every epoch uses "different" images
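A rough numpy-only sketch of that idea, not Keras's actual implementation: `random_augment` is a made-up helper that only does the flip and picks an angle. The point is that every draw produces one randomly transformed copy, so `rotation_range=40` means "rotate by a random angle between -40 and 40 degrees", not "make 40 copies".

```python
import numpy as np

def random_augment(img, rng):
    """Return ONE randomly transformed copy of img (hypothetical helper)."""
    out = img
    if rng.random() < 0.5:          # horizontal_flip=True: flip about half of the draws
        out = out[:, ::-1]
    angle = rng.uniform(-40, 40)    # rotation_range=40: one random angle per draw
    return out, angle               # the rotation itself is left out to stay dependency-free

rng = np.random.default_rng(0)
img = np.arange(12).reshape(3, 4)

# 100 draws from a single image give 100 differently augmented copies
draws = [random_augment(img, rng) for _ in range(100)]
print(len(draws))
```

So the number of outputs is decided by how many times you pull from the generator, not by the parameter values.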
@grave token
I am confused about this code. Basically, I am trying to use MIDI-DDSP to synthesize a MIDI file. However, I couldn't find how to convert the output of the following code to audio or MIDI. The link to this model is: Minimal example
Here is the simple code they provide; I didn't add anything, I just followed what they provided.
from midi_ddsp import synthesize_midi, load_pretrained_model
midi_file = 'ode_to_joy.mid'
# Load pre-trained model
synthesis_generator, expression_generator = load_pretrained_model()
# Synthesize MIDI
output = synthesize_midi(synthesis_generator, expression_generator, midi_file)
# The synthesized audio
synthesized_audio = output['mix_audio']
synthesized_audio
synthesized_audio is an array, neither a WAV nor a MIDI file. Hope someone can tell me how to convert it to audio or MIDI. Thanks a lot
Synthesis of MIDI with DDSP (https://midi-ddsp.github.io/) - GitHub - magenta/midi-ddsp: Synthesis of MIDI with DDSP (https://midi-ddsp.github.io/)
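One way to turn such an array into a playable file is to write it out as WAV. This is a minimal sketch with assumptions: `save_wav` is a made-up helper, the samples are assumed to be floats in [-1, 1], and the 16 kHz sample rate is a guess that should be checked against the MIDI-DDSP docs. If `synthesized_audio` is a TensorFlow tensor, `np.asarray` (or `.numpy()`) should convert it first.

```python
import wave
import numpy as np

def save_wav(audio, path, sample_rate=16000):
    """Write a float waveform in [-1, 1] to a 16-bit mono WAV file."""
    samples = np.asarray(audio, dtype=np.float32).reshape(-1)   # flatten away any batch dimension
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")   # little-endian 16-bit integers
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)            # 2 bytes per sample = 16 bit
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())

# Stand-in for output['mix_audio']: one second of a 440 Hz tone with shape (1, 16000)
t = np.arange(16000) / 16000
fake_audio = 0.5 * np.sin(2 * np.pi * 440 * t)[None, :]
save_wav(fake_audio, "synthesized.wav")
```

With the real model you would pass `synthesized_audio` instead of `fake_audio`.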
I have a gan with a dataset that has frames of a show, so there are very many close to duplicate frames
the result at basically every stage of training was severe overfitting and blurriness
is there a way to reduce either of these problems?
can anyone link me to a better image GAN than midjourney?
which is opensource btw
and the GAN style should be artistic just like midjourney
Look up Clever Programming - Build Real Time AI Face Detection with Python
they have a tutorial that discusses something like that
ummm....ok
Hii
hey could someone take a look
at my accuracy function
it's giving me an accuracy of greater than 232%
which is def wrong
probably because you're dividing by len(testset), and I don't even see testset anywhere else
consider doing something like total += len(label), and then your accuracy at each iteration is correct/total (*100 if you want)
ok
ok let me try that
@tidal bough i tried that but im getting a pretty weird accuracy graph
here was my new code
i ran it for ten more epochs
this is what my graphs look like i don't think its right
you're resetting correct every epoch, but total never
so understandably correct/total always goes down
reset total at the same time as correct.
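The original function isn't shown here, so this is only a sketch of the pattern being described, with a made-up `epoch_accuracy` helper and toy data: both counters start at zero for each epoch, and the denominator is the number of labels actually seen.

```python
def epoch_accuracy(batches):
    """batches: iterable of (predictions, labels) pairs for ONE epoch."""
    correct = 0   # reset at the start of every epoch ...
    total = 0     # ... together with total
    for preds, labels in batches:
        correct += sum(p == y for p, y in zip(preds, labels))
        total += len(labels)          # count what was actually evaluated
    return 100 * correct / total

# Toy data: 3 of 5 predictions are right
batches = [([1, 0, 1], [1, 1, 1]), ([0, 0], [0, 1])]
for epoch in range(2):
    print(epoch, epoch_accuracy(batches))   # 60.0 in both epochs
```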
alright thank yu
i just tried that
will update you on the results
thanks so much it worked!
@tidal bough
525/525 [==============================] - 1s 2ms/step - loss: 255.0017 - binary_accuracy: 0.7077 - recall: 0.1676 - precision: 0.2579 - val_loss: 8.5808 - val_binary_accuracy: 0.7693 - val_recall: 0.0252 - val_precision: 0.3750
Epoch 2/10
525/525 [==============================] - 1s 1ms/step - loss: 2.9742 - binary_accuracy: 0.7034 - recall: 0.1598 - precision: 0.2445 - val_loss: 4.5400 - val_binary_accuracy: 0.7710 - val_recall: 0.0031 - val_precision: 0.2000
Epoch 3/10
525/525 [==============================] - 1s 1ms/step - loss: 1.5921 - binary_accuracy: 0.7202 - recall: 0.1266 - precision: 0.2474 - val_loss: 3.2762 - val_binary_accuracy: 0.7712 - val_recall: 0.0031 - val_precision: 0.2143
Epoch 4/10
525/525 [==============================] - 1s 1ms/step - loss: 1.2256 - binary_accuracy: 0.7231 - recall: 0.1167 - precision: 0.2437 - val_loss: 3.4967 - val_binary_accuracy: 0.7717 - val_recall: 0.0021 - val_precision: 0.2000
Epoch 5/10
525/525 [==============================] - 1s 1ms/step - loss: 0.8427 - binary_accuracy: 0.7364 - recall: 0.0787 - precision: 0.2295 - val_loss: 2.7257 - val_binary_accuracy: 0.7726 - val_recall: 0.0010 - val_precision: 0.2500
Epoch 6/10
525/525 [==============================] - 1s 1ms/step - loss: 0.9427 - binary_accuracy: 0.7357 - recall: 0.0618 - precision: 0.1978 - val_loss: 2.3539 - val_binary_accuracy: 0.7726 - val_recall: 0.0010 - val_precision: 0.2500
Epoch 7/10
525/525 [==============================] - 1s 1ms/step - loss: 0.9245 - binary_accuracy: 0.7327 - recall: 0.0546 - precision: 0.1754 - val_loss: 3.5504 - val_binary_accuracy: 0.7724 - val_recall: 0.0010 - val_precision: 0.2000
Epoch 8/10
525/525 [==============================] - 1s 2ms/step - loss: 0.8026 - binary_accuracy: 0.7390 - recall: 0.0434 - precision: 0.1663 - val_loss: 2.5565 - val_binary_accuracy: 0.7724 - val_recall: 0.0010 - val_precision: 0.2000
Epoch 9/10
my model is getting worse with each epoch ?
model structure vvv
```py
layer1 = keras.layers.Dense(15, activation='relu')(inputs)
layer2 = keras.layers.Dense(15, activation='relu')(layer1)
layer3 = keras.layers.Dense(7, activation='relu')(layer2)
outputs = keras.layers.Dense(1, activation='sigmoid')(layer3)
model = keras.Model(inputs, outputs)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss=keras.losses.binary_crossentropy,
    metrics=[keras.metrics.binary_accuracy, keras.metrics.Recall(), keras.metrics.Precision()],
)
```
What makes you think that? Loss is dropping and accuracy is rising.
nvm recall is not accuracy
yeah, I mixed recall and accuracy
but why is it going down so slow ?
oh, hmm, that's some weird recall and precision though, these shouldn't be dropping. Not sure how it's possible for them to be dropping along with accuracy rising in fact.
also the model is performing very very bad on test data
yeah
I just cant understand binary classification at all :(
all my binary classifiers dont work at all
Are you doing some sort of anomaly detection? Seems that your model is increasingly only predicting the negative class, which would explain why accuracy is rising yet both precision and recall are dropping
This is probably a problem of class imbalance
no, should the amount of 1, 0 be equal ?
if the model sees 99% false and 1% true examples, it is reasonable to expect that the model will try to minimize its loss by simply predicting false each time
it depends on what data you give to the model for training
ideally, yes, the classes would be balanced as such
but that is rarely the case in practice
```
1    6636
Name: Default_Payment, dtype: int64
```
I should probably remove some of the 0 cases right ?
that would be one way to mitigate the problem
ummm okok, imma try something
you can also try weighting the samples so that the rarer samples are worth more
how can I do that ?
scikit-learn has a utility function to do that
but tbh you have like a 75%/25% distribution which isn't really all that bad
perhaps there is simply not enough information for the model to distinguish properly, so it falls back to predicting the majority class
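To make the weighting idea concrete, here is a small numpy sketch. `balanced_class_weights` is a made-up helper and the labels are invented to match the 75/25 split mentioned above. It uses the n_samples / (n_classes * count) formula, which is what scikit-learn's `compute_class_weight` does in "balanced" mode.

```python
import numpy as np

def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * count), so rarer classes count more."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

y = np.array([0] * 75 + [1] * 25)      # made-up labels with a 75/25 split
print(balanced_class_weights(y))       # class 1 gets weight 2.0, class 0 about 0.67
print((y == 0).mean())                 # 0.75: the accuracy of always predicting 0
```

A dict like this can be passed to Keras via `model.fit(..., class_weight=...)`. The second print is why accuracy alone looks fine even when the model ignores the positive class.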
yeah, Imma search for others approach for the same dataset
huggingface have many different open source products, they have:
- models, place for models, similar to what github is for code, hf hub is for models,
- datasets, place for datasets
- spaces, place for interactive demos
- many open source libraries, like
transformers
I think it's good to have choice?
hello guys, i have a dataset of energy usage, which has datetime and energy consumption, is there any algorithm i should choose to detect outliers?
u mean something like grubbs?
yes, but my dataset contains a lot of 0s. As the machine was not active 24/7, i think it may skew the result
got it thx
Guys, who is doing or planning a data science project and would like to tackle it in pairs?
I currently search for someone with similar knowledge to do such a project and improve.
the definitions of algebra 2 and calculus 2 depend on your school/uni
a base level of linear algebra and multivar calculus is needed to understand the basics of gradient descent and back propagation
it also depends on what you mean by "be good at it"
you can overcome many problems by simply being familiar with them and knowing a lot about the tools you use, but ofc you have more flexibility the more in depth knowledge you have (i.e. the more you know about the maths)
Pretty sure I've already seen them ask this question. So I would advise taking a break from it if it gets too intense, if that was the hidden intent of that question.
sure, take a break if it's burning you out. maybe look at applications instead. but also, i won't sugarcoat it: AI/ML does not require math, it IS math, and so you can't avoid it. you'll have to learn some of it sooner or later
Why not, but not sure I have similar knowledge
Hey guys
How r u doing?
I was wondering which system is used the most in AI ?
Windows, Mac or Linux?
And which of them is better?
probably linux, but it doesn't matter all that much from the user perspective. this is because compute clusters usually run linux, and users submit tasks to the clusters. in that sense, it doesn't really matter all that much what you use on your own computer
i was watching some videos on youtube and all the companies where using macs
it just made me to think of this question
Iam working on windows
i want to migrate to one of linux or mac!
which one do u prefer?
i most often work with numpy (which runs on anything) and jax (which only runs well on linux), so i dualboot windows and linux. if you're already on windows, WSL is probably the nicest way of getting the best of windows and linux together
the cluster at the uni where i work runs centos 7, so i use linux to do small local tests of my code before submitting them there
Linux is doable in 2022 and onwards but unless you have a habit of reading documentation, mac/win is usual
you need to make sure your use case 'just works' (e.g. Ubuntu-like) if you plan to use linux unless you want to make it work (whatever distro you choose + your own configuration)
aha
that's partly why i recommend using WSL instead of directly using linux
performance wise I think a configured Linux is actually optimal. But I wouldn't assume it's possible to even reach that configuration without serious effort (also choosing your OS because performance is microoptimisation, which is not recommended)
can we have a mac/linux dualboot?
Not really on the newer ones...I think?
mac is already a unix-like, you may as well stick to just mac
and yeah, to my knowledge only asahi runs on m1/m2, and only so-so at that
I would expect compatibility of macs to improve overtime as M1/M2 silicon gets mature and mac grows market share
is linux unix_based too?
so i assume there will be not much difference between mac and linux!?
there will be a huge difference
how?
if you only use the terminal, you'll find them quite similar, though with moderate differences in the file structure
if you use the desktop environment, mac is gonna be much more comfortable
the linux desktop experience is kinda... rough, let's say
im not talking about the desktop usage
i want to know if they differ in programming or not!
yeah i have heard that!
the short answer is it depends.
https://www.anaconda.com/blog/apple-silicon-transition
no, there's no difference in the programming at all if the language is portable, as python is
the more you rely on hardware, the less likely programming will be 'the same'
other than a few libraries whose functionality depends on the platform, coding python is the same anywhere
well the code is the same, but whether your Mac will run it depends on compatibility
but again this compatibility is being improved all the time
stuff like numpy, tensorflow, pytorch and jax will behave differently on win, mac and linux (and also depending on the version of each)
and compiled-for-mac things probably will pop up more and more
ok guys
really appreciate your guides
thanks for all your answers
good luck everybody
thanks for ur answer
so I use MIT courses to learn math before learning AI Ill be ok?
sure. idk which other ones people recommend, i would say gilbert strang's linalg for machine learning is pretty good
Finding a data science job and taking interviews and technical interviews while working full time is proving interesting where there's little cell signal outside the office
Do you guys just take days off every month?
noted
have any of u guys studied at MIT?
Is it possible to use pretrained weights on another custom model? I have trained one with VGG16 and I have another model made from scratch.
you mean like transfer learning?
Yeah transfer learning but not with the pretrained weights which is imagenet in most other models.
I want to use the weights from my vgg16 model after training and use it on the custom model I made. Is that possible?
On Linux distros that come with package managers it can be easier to get access to certain SDKs / libraries. Installing such things on Windows is a pain.
Some only have Linux instructions / support.
If you are not running your models on your own machine(s) then it does not matter anyhow, the servers will be using Linux with or without you knowing.
If you plan on running on (relatively) small devices like a Raspberry PI then you have to use Linux anyhow (or no OS in the case of smaller devices than that).
you can do transfer learning on whatever you want. it just means you take a pre-trained chunk of whatever network, keep that constant, and append a few more trainable layers to it
Oh alright. Thanks.
hi, not sure where to ask this so I ask here. I'm struggling with understanding the difference between apache beam and apache spark (specifically for Google Cloud's Dataflow Vs Dataproc). Could someone help me out here?
Anyone know what could be going wrong here, really struggling with kaleido on jupyter notebooks
does anyone know why a GAN would be producing blurry images?
is it more likely to do with the dataset or the architecture/hyperparameters?
after 270 epochs it still looks like this
can you compare the log likelihood of two different distributions? So between a model using a normal distribution and a model using a t-distribution.
I know this is more of a stats question than python, but I figured it was worth an ask here
you can ask stats questions in this channel.
ok cool
Let's say I want to create an image classifier to label different breeds of dogs. MSCOCO has a dog class so transfer learning could be used right? Is there a reason to train such a classifier from scratch instead of using transfer learning?
is a loss of 2.94 okay? my model started off with a loss of 9.1
how are the actual performance metrics?
accuracy is 30%
in evaluation, it's closer to 35
is this single or multiclass? are you doing something where the true negatives are uncountable?
multiclass. im using sparse categorical entropy
and what does the model do? (by the way, this is all information that you should give in your first message about your question.)
model takes an array of 5, 8 bit integers representing characters (so an encoded string) and returns a label. the string is obfuscated, and it tries to find what the un-obfuscated string is (returns an index that can be looked up in an array)
oh wow i just wrote a much better model
sometimes explaining yourself helps you figure it out
You can just ask here. Please do not offer money again. This is a warning
Copy that,
!rule 9
The first step to asking a pandas question is giving a copy-and-pastable sample of the dataframe with print(df.head().to_dict('list')). if it's not text, it's useless.
@lost nimbus please let me know when you have done that.
Done that
Okay. Can I see it?
Do you want me to post here or pm?
here.
{'id': [1, 2, 3, 4, 5], 'fnvalid': [1, 1, 1, 1, 1], 'first_name': ['Lenard', 'Siusan', 'Felipa', 'Morey', 'Jedd'], 'lnvalid': [1, 0, 0, 0, 0], 'last_name': ['Padgham', 'Barhems', 'Figures', 'Barnwell', 'Longmore'], 'email': ['lpadgham0@cdbaby.com', 'sbarhems1@lycos.com', 'ffigures2@wikimedia.org', 'mbarnwell3@exblog.jp', 'jlongmore4@taobao.com'], 'genderval': [1, 1, 1, 1, 1], 'gender': ['Male', 'Female', 'Female', 'Male', 'Male'], 'ip_address': ['240.189.125.212', '27.218.82.162', '227.219.128.88', '39.204.201.15', '24.124.86.219']}
Great. What do you want to do to it?
And thanks for giving the sample. Next time you have a pandas question, remember to give the sample in your first message about the question, so that no one has to waste time asking.
I want to grab specific columns based off of other columns. For instance, I want to grab the Firstname rows based off of the FNvalid row, and Lastname off of the LNValid row. If it is a 1, grab it, if it is a 0 then grab NaN or N/a
can you give an exact example?
or is it just that you want lnvalid == 1 and genderval == 1?
Consider this:
In [3]: df.loc[df['fnvalid'] == 1, 'first_name']
Out[3]:
0 Lenard
1 Siusan
2 Felipa
3 Morey
4 Jedd
Name: first_name, dtype: object
See if you can figure out how to use df.loc[ ] to do your other queries.
For this example, the code would make a df containing:
| First | Last |
1 | Lenard | Padgham
2 | Siusan | NaN
3 | Felipa | NaN
4 | Morey | NaN
5 | Jedd | NaN
df.loc[ ] can take two arguments. the first is a row indexer, and the second is a column indexer. the column indexer is optional.
in df.loc[df['fnvalid'] == 1, 'first_name'], the row indexer is df['fnvalid'] == 1, which means "select rows where fnvalid == 1"
the column indexer is simply the name of the column to select.
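For the "NaN where the flag is 0" table described above, one option is `Series.where`, which is a different tool from the `df.loc` filter shown in the chat: it keeps the row and blanks the value instead of dropping the row. A sketch using the sample that was posted (only the relevant columns):

```python
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Lenard", "Siusan", "Felipa", "Morey", "Jedd"],
    "fnvalid": [1, 1, 1, 1, 1],
    "last_name": ["Padgham", "Barhems", "Figures", "Barnwell", "Longmore"],
    "lnvalid": [1, 0, 0, 0, 0],
})

result = pd.DataFrame({
    # Series.where keeps values where the condition is True and puts NaN elsewhere
    "First": df["first_name"].where(df["fnvalid"] == 1),
    "Last": df["last_name"].where(df["lnvalid"] == 1),
})
print(result)
```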
@lost nimbus make sense?
Yes it does, I will play with that! Thank you very much
No problem. And that's free
Ive been searching for the solution for a bit and have just been getting frustrated haha
pandas is like its own mini-language embedded within python, in some ways.
It is a really neat utility, I think there is a lot to learn within it
Now is there a way to do multiple columns per 1x df.loc?
Like df.loc[df['fnvalid', 'firstname']['lnvalid', 'lastname'] == 1:
@serene scaffold
that won't work. you'd need to use two conditions and & for and
df.loc[(df['fnvalid'] == 1) & (df['lnvalid'] == 1), ['first_name', 'last_name']]
though you should probably make the valid columns bools
once they are bools, you can do df.loc[df['fnvalid'] & df['lnvalid'], ['first_name', 'last_name']]
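A short sketch of the bool version, on a cut-down copy of the sample. The conversion matters: with the raw 0/1 integers, `&` gives back integers, and `.loc` would treat those as row labels rather than as a mask.

```python
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Lenard", "Siusan", "Felipa"],
    "fnvalid": [1, 1, 1],
    "last_name": ["Padgham", "Barhems", "Figures"],
    "lnvalid": [1, 0, 0],
})

# Turn the 0/1 flags into real booleans so they can be used directly as masks
flags = df[["fnvalid", "lnvalid"]].astype(bool)
both_valid = flags["fnvalid"] & flags["lnvalid"]

# Row indexer = boolean mask, column indexer = list of column names
selected = df.loc[both_valid, ["first_name", "last_name"]]
print(selected)
```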
Perfect, thank you again
by the way, if you ever write something that looks like df[ ][ ], it's wrong.
hello guys, i have a question. We scale the data to the range 0 to 1 before training the network. If i use an sklearn scaler to scale my training data and train the model, how can i use the model to predict on new data?
for example i use 3000 samples to train the network, and i would like to predict on my newest data, but the new data is not scaled.
then it won't really work, the new data needs to look like the training data
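In practice that means keeping the scaler you fitted on the training data and reusing it on new inputs. Below is a sketch with a tiny stand-in class (`MinMax` is made up for illustration); with scikit-learn the same pattern would be `MinMaxScaler().fit(X_train)` followed by `.transform(X_new)`, without fitting again.

```python
import numpy as np

class MinMax:
    """Tiny stand-in for a fitted 0-to-1 scaler (hypothetical, for illustration only)."""
    def fit(self, X):
        self.lo, self.hi = X.min(axis=0), X.max(axis=0)
        return self

    def transform(self, X):
        return (X - self.lo) / (self.hi - self.lo)

train = np.array([[10.0], [20.0], [30.0]])
scaler = MinMax().fit(train)           # learn min/max from the TRAINING data only

new = np.array([[25.0]])
scaled_new = scaler.transform(new)     # reuse the same min/max, do not refit
print(scaled_new)                      # [[0.75]]
```

The scaled array is what goes into `model.predict`; saving the fitted scaler next to the model avoids having to recompute it later.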
Does anyone have resources for GANs, mostly for generating realistic images based on some input.
Stuff has been very popular recently
@dusty valve hm?
the implementation would be described in a whitepaper, rather than docs.
I mean do you know any resources I can start with
print(df.head().to_dict('list'))
this is code that you have to run.
where?
your program. wherever the dataframe (df) is.
bro
the point is that I can't help you until I know what the schema of the dataframe is.
okay...
you have to use a text channel.
ok , so i have a column with various outputs
now there are certain output which begins with exact words
i need to find all of them which begins with same words and replace them all with one output
how do i do it using pandas
I can answer this once you give an example of the column in question as text.
suppose my column is [ ab123,ab343,ab77,ab6768621]
column is [ ab123,ab343,ab77,ab6768621 ,as123,fd678]
now i want to replace the ones which begins with ab with YES
means the output will be [YES , YES , YES , YES ,as123,fd678]
this is jst one column in a dataframe
!docs pandas.Series.str.startswith
```py
Series.str.startswith(pat, na=None)
```
Test if the start of each string element matches a pattern.
Equivalent to [`str.startswith()`](https://docs.python.org/3/library/stdtypes.html#str.startswith "(in Python v3.10)").
you can use this with .loc
do i have to make a function for this
up to you.
ok
so if i write it this way
df.loc [Series.str.startswith(pat, na=None)]
will it work
no
and how will i replace it with yes
!docs pandas.DataFrame.loc
```py
property DataFrame.loc
```
Access a group of rows and columns by label(s) or a boolean array.
`.loc[]` is primarily label based, but may also be used with a boolean array.
Allowed inputs are:
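>>> df
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8
[... intermediate assignments from the docs are omitted here; by this point max_speed is 30 everywhere and shield is 10, 50, 50 ...]
>>> df.loc[df['shield'] > 35] = 0
>>> df
            max_speed  shield
cobra              30      10
viper               0       0
sidewinder          0       0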
what you're doing is similar. but you're using the startswith method instead of a comparison.
i think there is some problem with this data
'Series' object has no attribute 'startswith'
what does this error mean?
df.loc[df['Remarks'].startswith('?)]')] = "xxx"
looks like you're on the right track
in the REMARKS column some sentences starts with "?)]"
i want to replace all such sentnces into "XXX"
df.loc[df['Remarks'].startswith('?)]'), 'Remarks'] = "xxx"
you also need a column indexer. which I didn't tell you about, so I'm giving it to you for free.
bro it is showing the same issue
you're missing the .str.
where to put iy
I would appreciate it if you didn't call me "bro". It has a negative connotation for me.
df['Remarks'].str.startswith('?)]')
then how should i address you?
actually i am unaware of the rules
no address is required. anyway, try the fix I just provided.
you have to overwrite the old one
i ran all the lines from beginning
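Putting the pieces from above together on the example column that was given (prefix `ab`, replacement `YES`). The `na=False` argument is an addition here: it makes missing values count as "no match" so the mask stays purely boolean.

```python
import pandas as pd

df = pd.DataFrame({"Remarks": ["ab123", "ab343", "ab77", "ab6768621", "as123", "fd678"]})

# Boolean mask: True where the string starts with the prefix
mask = df["Remarks"].str.startswith("ab", na=False)

# Row indexer = mask, column indexer = 'Remarks'; only the matching cells are overwritten
df.loc[mask, "Remarks"] = "YES"
print(df["Remarks"].tolist())  # ['YES', 'YES', 'YES', 'YES', 'as123', 'fd678']
```

The same shape works for the `'?)]'` prefix, since `startswith` treats the pattern as a plain string.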
Hey @terse jackal!
It looks like you tried to attach file type(s) that we do not allow (.xlsx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.
Feel free to ask in #community-meta if you think this is a mistake.
Maybe save as pdf?
we don't allow that, so you'll have to send it as a CSV. But I don't think I have time to help with this.
you can wait here and see who else arrives.
the changes have been made in the dataframe for now
you need to save that dataframe to see the changes
i dmd you the file
can you plzz tell me the code
im having a lot of problems while doing data preprocessing can anyone recommend some good reading material or youtube videos for that
As Stelercus guided you, you simply have to save your data frame to see your changes
If you want to save it as a CSV
Otherwise for excel
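The code for those two cases isn't shown above, so here is a guess at what was meant. The file names are placeholders, and the Excel line is commented out because it needs an engine such as openpyxl installed.

```python
import pandas as pd

df = pd.DataFrame({"Remarks": ["xxx", "ok"]})

# Changes made with df.loc only live in memory until the frame is written out
df.to_csv("output.csv", index=False)

# For Excel instead of CSV:
# df.to_excel("output.xlsx", index=False)

print(pd.read_csv("output.csv"))
```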
Is network science used in machine learning/data science?
I don't see a frequently asked questions in the pins :( so I'll just ask what I wanted to know, it's probably very frequently asked so forgive me for the repetitiveness.
[ {
"number": 1, "name": "first name", "location": "first location"
},
{
"number": 2, "name": "second name", "location": "second locatiom"
},
{
"number": 3, "name": "third name", "location": "third location"
}...... so on]
I want to find rows that satisfy a given condition
for example this is what I've been using
testdf.loc[
(testdf['number'] == 115)
|
(testdf['name'] == 'NamE_I_neED-toFiND')
]
But I want the string matching in line (testdf['name'] == 'NamE_I_neED-toFiND') to go through a function that will remove all spaces and underscores. Is that even possible. If not are there any alternatives to this?
and if im asking this question in the wrong channel, what would be a more appropriate channel. Thanks again.
you can either try to piece something together using replace(), or try some regex
I see, lemme try to read up some documentation before asking again. Thanks for your input! I might bother you again though
maybe also maketrans() and translate() could help, i've never used those before though
aight ty ty lemme take a look at them
you can use this: https://pandas.pydata.org/docs/reference/api/pandas.Series.str.replace.html
and replace any unwanted characters with the empty string.
```py
testdf['name'] = testdf['name'].apply(lambda x : normalizeString(x))
```
did this, tysm
interesting, ill look into this too, thanks
lambda x: func(x) is the same as func
o I see
so you could have just done testdf['name'] = testdf['name'].apply(normalizeString)
but usually, you want to use whichever solution doesn't involve .apply
is it slow?
ah I plan on making this dataframe once and using it many times over
that's fine. just for future reference ๐
awesome
I've used c and cpp for most of my life, so this thing is completely new to me
thanks for your help though, appreciate it a lot
hope you enjoy not-pointers
I miss them
why
years of using pointers, I feel empty without them
better than feeling empty all the time

it will be okay.
if you like functional languages, you can write pandas in a functional style.
can you direct me to sources where I can learn more about this?
there's this: https://www.kaggle.com/learn/pandas
but when writing pandas code, it's more efficient to have lots of chained method calls, without having lots of variables for intermediate states.
Solve short hands-on challenges to perfect your data manipulation skills.
why is the reason behind that. does the compiler do some optimizations when you chain calls?
no. it's only more efficient for memory.
i guess i should've specified, since there is str.replace/translate and Series.str.replace/translate. you wanna use the Series one, which is pandas built-in, because it does the iteration for you in C
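A sketch of the `Series.str.replace` route for the original question (dropping spaces and underscores before comparing). The data and the `target` string are invented; the search string has to be normalised the same way as the column.

```python
import pandas as pd

testdf = pd.DataFrame({
    "number": [1, 2, 115],
    "name": ["first name", "Name_I Need_To Find", "third name"],
})

# Remove spaces and underscores from every name without writing a Python loop
normalized = testdf["name"].str.replace(r"[ _]", "", regex=True)

target = "NameINeedToFind"   # the name to look for, already stripped of spaces/underscores
matches = testdf.loc[(testdf["number"] == 115) | (normalized == target)]
print(matches)
```

Keeping `normalized` as a separate Series leaves the original names untouched; assign it back to `testdf['name']` if you want to overwrite them.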
:0
though if you use dask (which is similar to pandas but isn't eagerly executed), that will optimize the computation graph.
I see thats good to know
since Python is interpreted, there's limited room for optimization.
also my first time joining a discord server to ask a programming question. You guys have been very kind to me and I appreciate it a lot ❤️
No problem!
you had a fairly concise and well-written question, which always helps
I'm glad you feel that way, catgirl enjoyer
always tried to do that after that one time I asked a really dumb question on stackoverflow when I was 13 many years ago and getting ripped apart to shreds 🥲
yeah stackoverflow lives up to the memes it spawned. the lesson you learned is valuable, harshness notwithstanding
I think the problem is
- SO is gamified, and questions that make it harder to get points spoil the game for the answerers
- The point of SO is to create a catalogue of questions and answers. Helping the asker is actually of secondary concern.

ea
does anyone know how to use hilbertcurve?
i have an array full of True and False, ik the numbers for it (n=2 p=16) but I don't know how to get the visualization
Does anyone here have experience using Spacy and Jupyter with the Apple M1 chip? I am able to install but every time I try to import the kernel dies.
They have an example for that (https://github.com/galtay/hilbertcurve/blob/main/scripts/make_image_2d.py)
Hello! I can use Python at a fairly good level and have a good command of NumPy, Pandas, matplotlib etc. libraries. But I don't know much about scikit-learn. I am looking for a tutorial that will not tell me everything from the beginning, but will give me the best understanding. I'm open to your suggestions and experiences, thank you!
Hello! Can anyone tell me why is anaconda download not starting on my pc?
dont bother with it sooner or later it will let u down so why not directly go with vscode/pycharm
just work on a random project and read the docs, that's the absolute best way
do u got a good project to start with tensorflow/pytorch tho?
I bought books for it and the examples dont work
hhhmmm, well if this is your very first project, I'd go with classifying flowers by numerical features (Iris dataset) or predicting house prices by position (boston houses prices dataset), just to get used with your chosen stack's notation etc. Then I'll move to do some image classification on CIFAR or ImageNet, after doing that you should pretty much have all the basics to autonomously learn other stuff (without directly trying to implement a diffusion model from scratch or other rather big stuff, obviously)
got any githubs?
What he said but also consider miniforge or miniconda by terminal
This isn't a great way to manage installs lol
i needed to install reinstall it for around 20times (which is a horror) and after all that time id stopped after i created a venv
so i wouldnt recommend it
to install lib and creat env yes
U can create an env with pip?
but im not advanced in using it so until now i didnt step to any major walls
inside vscode terminal as stated above
So u create an env every new project?
yes
after i crashed all my workflow in conda im going for the safe way
I wonder how that can occur
Problem is some projects require like 20 libraries installed
It's a pain to do that multiple times
yeh tensorflow is a big one
Yeah I don't like it at all
im too dumb for it yet
book recommendations
None
Official documentation
And google
U don't need a book for tensorflow
Or PyTorch
Just use the official website
```
---------------------------------------------------------------------------
SystemExit                                Traceback (most recent call last)
Cell In [5], line 22
     18 parser.add_argument('--seed', type=int, default=1, metavar='S',
     19                     help='random seed (default: 1)')
     20 parser.add_argument('--log-interval', type=int, default=10, metavar='N',
     21                     help='how many batches to wait before logging training status')
---> 22 args = parser.parse_args()
```
links for what?
i made a pretty accurate model, but it's too large to upload to github so i need to use a less accurate one :(
sadge moment
still, not bad
Do any of you host your deployed models online as part of a portfolio if so do you manage it for free or how much does it cost to keep up? Thank you for any insights!!
i have this code that attempts to compress all .h5 files
```py
import glob
import gzip

def compress():
    '''Compress data in all .h5 files'''
    for i in glob.glob('./*.h5'):
        l = gzip.compress(open(i, 'rb').read(), compresslevel=9)
        open(i, 'wb').write(l)

def decompress():
    '''Decompress data in all .h5 files'''
    for i in glob.glob('./*.h5'):
        l = gzip.decompress(open(i, 'rb').read())
        open(i, 'wb').write(l)

print('compressing weights files...')
compress()
exit()
```
it doesn't work in google colab and crashes it
it says that the kernel restarted on the logs
not a major problem, i can compress them locally
but just weird
I get that too when trying to go through dataset loaded with keras dataset_from_directory.
I would use the
"for x,y in datset:"
And then Colab would stop after a while, saying I used up all the system memory and should try upgrading to Colab+
?
How are we supposed to decode this
hey should I use pytorch or tensor flow?
@wooden sail can u succinctly explain the difference between just 'PCA' and SVD?
I somehow need to explain this in like 5 lines or less
not good enough at maffs to just not try to write everytrhing ever written about them
people are tending towards pytorch these days.
Hi guys. Now, I'm learning about GAN with PyTorch and I have a problem like this. How to fix this error?
oh ok, actually i don't know python yet, i know node js, and i don't know if i should learn python because I want to do ethical hacking, machine learning, web scraping... should I learn python or just keep using node for all that
Python
ethical hacking is part of cybersecurity, and you kind of have to pick between cybersecurity or AI. They're too broad for one to have an effective career doing both.
I don't want to learn that for a career, it's just for fun
why?
it takes a long time to learn enough to accomplish anything substantial with machine learning. that isn't to suggest that you shouldn't learn it for fun, but it's important to know what you're getting into.
yeah yeah i know, although i might use it in an app to recommend you stuff
so it's not just for fun
so should I use python?
Python is the most popular language for AI. For other areas of programming, Python is one of several options.
ok and for the backend of websites / mobile apps, node js or python
It depends on what you are interested in?
Engineering ==> TensorFlow
Academia / Research ==> JAX or PyTorch
Guys at DeepMind uses JAX now. JAX is quite an interesting framework especially if you already are familiar with PyTorch.
At the end of the day, tools are tools so pick one and don't waste much time contemplating which one to learn or not.
NB: It's good to know at least 2 Deep Learning frameworks so you don't become overly dependent on one. However, you have to start by learning one first, then you can come back later to learn another one (if it interests you)
ok that's really helpful thanks!
I'm not into software development or mobile app but I think this is subjective.
In my country, this combo is more popular
Node.js (JavaScript) or Django (Python) ==> Backend
Flutter ==> Mobile App
oh, well i know react and node js and now I'm learning react native for mobile apps
hi @odd meteor
Hi Pope Stelercus, good morning. I trust you're doing great
I'm just fabulous! Also I'm a wizard now. Or at least whatever Midjourney thinks "green wizard reading a book with a storm in the background" is
Hahaha wizard of Ox or wizard of U.S? Imma have to peep that on stable diffusion and see what it comes up with
Hello! I have another issue I need guidance on finding the answer to.
I have two dataframes containing different information. I need to match them up on their corresponding 'ID' columns, which have different names, and merge one into the other.
As long as those columns are set as the index for the two DataFrames, any operations that align rows will do so on those values.
What do you need to compare, exactly? And are you sure you don't just want to merge the favorite color column into the larger df?
Hello again @serene scaffold !
The specific problem I am having with my bigger project is that the data needs to be compared by specific IDs, I would imagine index would be fine, but the two dataframes I am matching are different lengths so there will be some missing data.
The goal is to get the Color Pref into the first DF and match the values to the correct ID
So, this has nothing to do with "comparison". What you're talking about is called "merging" in pandas. In SQL, it's called joining
if the smaller df is missing an employee id, are you okay with that employee having a NaN value, or do you just not want to include that employee at all?
Still getting an error. I'm fine with it being NaN
Is there a way that I can pull only one column? The other dataframe on my project has thousands of columns I dont want to merge
Or would this solution work: Grab the id and other column I care about, add it to another df, then merge that way
Remember to never say that you "got an error" without giving the error message.
im trying to make a text recognizer. most tutorials use opencv and tesseract, but im having issues with tesseract on mac. is there an alternative that I can use?
Please give all the code in that cell as text (no screenshots) for us to continue
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
import pandas as pd
import numpy as np
df = pd.read_csv("C:/Users/jonah/OneDrive/Documents/test1.csv")
df1 = pd.read_csv("C:/Users/jonah/OneDrive/Documents/test2.csv", usecols={'employeeid','colorpref'})
df.merge(df1, left_on='id', right_on='employeeid')
print(df)
No errors, just didnt merge as expected
import pandas as pd
import numpy as np
df = pd.read_csv("C:/Users/jonah/OneDrive/Documents/test1.csv")
df1 = pd.read_csv("C:/Users/jonah/OneDrive/Documents/test2.csv")
print(df.columns)
print(df1.columns)
Please run this and give the text as text (no screenshots).
Index(['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address',
'Favorite Color'],
dtype='object')
Index(['employeeid', 'colorpref'], dtype='object')
okay, now do pd.merge(df, df1, left_on='id', right_on='employeeid')
!docs pandas.merge
pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
Merge DataFrame or named Series objects with a database-style join.
A named Series object is treated as a DataFrame with a single named column.
The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.
Warning
If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results.
this won't retain rows that don't have a match in both dataframes. if you want that, you need to include how='outer'
"full" is the same as "outer". but for pandas, you have to say "outer".
import pandas as pd
import numpy as np
df = pd.read_csv("C:/Users/jonah/OneDrive/Documents/test1.csv")
df1 = pd.read_csv("C:/Users/jonah/OneDrive/Documents/test2.csv", usecols={'employeeid','colorpref'})
df.merge(df1, how='outer', left_on='id', right_on='employeeid')
print(df)
still not getting exp result, its not merging
Wouldnt let me paste text
!paste
Pasting large amounts of code
If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
!paste
get rid of usecols={'employeeid','colorpref'} for now.
read what the bot said.
@lost nimbus drag both CSV files into this chat.
@lost nimbus it worked when I did it https://paste.pythondiscord.com/goduriripe
you can also do it like this: https://paste.pythondiscord.com/maviyaqeco
so in any statement like "for" or "if" if I try to do something before a "break" statement why does break get executed first instead of my statements?
this is a general python question. see #❓｜how-to-get-help. also, remember to give an exact example.
@serene scaffold I wonder why mine wasnt working
@serene scaffold Got it to work with your fix, I did the first example, you are a legend
Not sure. You might check that numbers weren't being parsed as strings, or something stupid like that.
Which you can check with df.dtypes
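A likely reason the earlier attempts "didn't merge": `DataFrame.merge` returns a new DataFrame and leaves `df` untouched, so `print(df)` shows the original unless the result is assigned. A minimal sketch, assuming small made-up frames in place of the two CSV files (the rows and the `other` column are invented; only the names `id`, `employeeid` and `colorpref` come from the chat):

```python
import pandas as pd

# Made-up stand-ins for test1.csv and test2.csv.
df = pd.DataFrame({"id": [1, 2, 3], "first_name": ["Ann", "Bob", "Cy"]})
df1 = pd.DataFrame({
    "employeeid": [1, 3, 4],
    "colorpref": ["red", "blue", "green"],
    "other": [0, 0, 0],  # hypothetical column we do not want to carry over
})

# Keep only the key and the one column of interest before merging.
small = df1[["employeeid", "colorpref"]]

# merge() returns a new DataFrame; assign it, otherwise the result is lost.
merged = df.merge(small, how="left", left_on="id", right_on="employeeid")

# how="outer" additionally keeps rows that exist only in df1.
merged_outer = df.merge(small, how="outer", left_on="id", right_on="employeeid")

print(merged)
print(df.dtypes)  # confirm the key columns were parsed as numbers, not strings
```

With `how="left"` every row of `df` is kept and employees without a match get NaN in `colorpref`; `how="outer"` also keeps ids that appear only in the smaller frame.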
stel is def a legend. still waiting for his substack newsletter and/or youtube series so i can lowkey subscribe
i will also accept podcast episodes or conference talks

@serene scaffold @misty flint I'll be your first patreon sub lmao
What do people expect out of something like a data science substack?
I guess I already feel like I'm perpetually behind just trying to keep up with lucidrains repos and Yannic Kilcher YouTube
guys help, I created a simple NN to do binary classification and prepared X and y; y is a dataframe containing only 0 and 1. However, the predicted results are all lower than 0.07
'''
import tensorflow as tf
from sklearn.model_selection import train_test_split

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, input_shape=[4], activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=28)

class MyCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Stop training once the training accuracy passes 95%.
        if logs and logs.get('accuracy', 0) > 0.95:
            self.model.stop_training = True

myCallBack = MyCallback()
model.fit(X_train, y_train, epochs=50, verbose=1, validation_data=(X_test, y_test), callbacks=[myCallBack])
My expected result is that about 3% of the predicted results are > 0.5
Here is a histogram of my predicted results
to upload on github, something that you could try, is publishing it as a software release instead of as code, or something that many do is simply to put it on drive or mega and link it in the README.md
it's very likely that your dataset is unbalanced (e.g. 80% zeros and just 20% ones)
oh, yea, it's unbalanced
basically your model is going like this: "hhmmmm, well, since 97% of the data is zero, who is going to notice if I cheat a bit and output zero each time, hehe"
unfortunately the only real thing that you could do is to cut out most of the examples, so you have an equal number of zeros and ones, but I think it's not going to be too much of an issue (based on the plot you sent I can see that your dataset is absolutely massive)
but the 0 and 1 represent two classes
can someone tell me why the pytorch example is resulting in an error?
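The "cut out most of the examples" idea above is usually called random undersampling. A rough sketch of it with NumPy, assuming synthetic data in place of the real `X` and `y` (the sizes and the 3% positive rate are made up to mirror the chat):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: 1000 rows, 4 features, exactly 3% ones.
X = rng.normal(size=(1000, 4))
y = np.zeros(1000, dtype=int)
y[:30] = 1
rng.shuffle(y)

pos_idx = np.flatnonzero(y == 1)
neg_idx = np.flatnonzero(y == 0)

# Randomly keep as many zeros as there are ones.
keep_neg = rng.choice(neg_idx, size=len(pos_idx), replace=False)

# Combine and shuffle so the classes are not grouped together.
keep = rng.permutation(np.concatenate([pos_idx, keep_neg]))
X_bal, y_bal = X[keep], y[keep]

print(X_bal.shape, y_bal.mean())
```

It is probably safest to undersample only the training split and leave the test split at its natural class ratio. Another common option that discards no data (my addition, not something suggested in the chat) is the `class_weight` argument of Keras `model.fit`.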
---------------------------------------------------------------------------
SystemExit Traceback (most recent call last)
Cell In [1], line 24
20 parser.add_argument('--render', action='store_true',
21 help='render the environment')
22 parser.add_argument('--log-interval', type=int, default=10, metavar='N',
23 help='interval between training status logs (default: 10)')
---> 24 args = parser.parse_args()
27 env = gym.make('CartPole-v0')
28 env.seed(args.seed)
File ~\AppData\Local\Programs\Python\Python310\lib\argparse.py:1829, in ArgumentParser.parse_args(self, args, namespace)
1827 if argv:
1828 msg = _('unrecognized arguments: %s')
-> 1829 self.error(msg % ' '.join(argv))
1830 return args
File ~\AppData\Local\Programs\Python\Python310\lib\argparse.py:2583, in ArgumentParser.error(self, message)
2581 self.print_usage(_sys.stderr)
2582 args = {'prog': self.prog, 'message': message}
-> 2583 self.exit(2, _('%(prog)s: error: %(message)s\n') % args)
File ~\AppData\Local\Programs\Python\Python310\lib\argparse.py:2570, in ArgumentParser.exit(self, status, message)
2568 if message:
2569 self._print_message(message, _sys.stderr)
-> 2570 _sys.exit(status)
SystemExit: 2
this is the full traceback i receive
pytorch:examples:ac
Hi I have this dataframe and it has bigrams in a list, corresponding to the "score".
I want to count how many times a particular bigram (one element in the list of bigrams) occurs across all rows with score 5. What's a good way of doing this?
Is there a way to do this without manually keeping track of the count? I have a lot of rows with score 5, so the runtime might take minutes if I manually for-loop over them
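One way to do this without a hand-written counting loop is to filter the rows first and hand the flattened lists to `collections.Counter`. A small sketch, assuming the list column is called `bigrams` (a hypothetical name, only `score` is from the question) and that each bigram is stored as a tuple; the rows are made up:

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Made-up data; "bigrams" is a hypothetical column name.
df = pd.DataFrame({
    "score": [5, 5, 3],
    "bigrams": [
        [("good", "movie"), ("great", "cast")],
        [("good", "movie")],
        [("bad", "plot")],
    ],
})

# Select the score-5 rows, flatten their lists, and count in one pass.
score5 = df.loc[df["score"] == 5, "bigrams"]
counts = Counter(chain.from_iterable(score5))

print(counts.most_common(3))
print(counts[("good", "movie")])
```

`Counter` returns 0 for a bigram it has never seen, so looking up one particular bigram needs no special-casing.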
yea
that's why you should cut some ones
because you passed to it an unknown arg
but why is it unknown i just used the example directly out of pytorch Q_Q
https://paste.pythondiscord.com/avukopapes
this is pytorch:examples:ac
yea but there is no issue with the code
error occurs in line 24
it is working properly, there is no error there
but why do i face an error when i execute?
as I said you passed an unknown arg
what do u mean by that?
its my first time using pytorch
how do i pass args to it
which args does it need
no no, sorry, I explained myself wrong, it has nothing to do with pytorch, it's standard python
i think its not loading example from pytorch cause all defined arguments are empty is that what u meant?
no, they have a default
it works properly, the code is fine, you just gave it something it wasn't expecting
yea but what does this have to do with your error?
the problem is not in the code, but in the args
no it's not
this is making the problems i guess
nope
traceback tells me so
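For what it's worth, the `Cell In [1]` in the traceback suggests the script is being run inside a Jupyter/IPython notebook. `parser.parse_args()` with no arguments reads `sys.argv[1:]`, and in a notebook those are the kernel's own launch arguments rather than anything the example defines, which would explain "unrecognized arguments". A sketch of two possible workarounds on a reduced parser (only `--render` and `--log-interval` are copied from the traceback; the `--seed` default and the fake kernel arguments are placeholders):

```python
import argparse

parser = argparse.ArgumentParser(description="reduced stand-in for the example's parser")
parser.add_argument('--seed', type=int, default=0, metavar='N',
                    help='random seed (placeholder default)')
parser.add_argument('--render', action='store_true',
                    help='render the environment')
parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                    help='interval between training status logs (default: 10)')

# Option 1: pass an explicit empty list so sys.argv is never read
# and every argument falls back to its default.
args = parser.parse_args(args=[])

# Option 2: parse what is known and collect the rest instead of exiting.
# The list below only imitates arguments a notebook kernel might receive.
known, unknown = parser.parse_known_args(['-f', 'kernel.json', '--render'])

print(args.log_interval, known.render, unknown)
```

Running the example from a terminal as a normal script should also avoid the problem, since `sys.argv` then only holds what you typed.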