#data-science-and-ml

mild dirge
#

And then classify each image separately

shrewd grove
#

would it not take longer?

mild dirge
#

Longer than?

shrewd grove
#

longer than running it once and expecting a reasonable output?

mild dirge
#

How are you planning on running it once and getting that "reasonable output" though?

#

This model won't do that

slender pier
#

H

#

553 days on this server damn

shrewd grove
#

OCR somehow does it.

mild dirge
#

Right, but OCR is not this model 😛

shrewd grove
#

Would it be possible to find out what an OCR model looks like?

mild dirge
#

You can read the paper on Google's Tesseract if you are interested

#

There are many other solutions too

#

Some use recurrent neural networks

slender pier
#

@shrewd grove brooo you been 1096 days on this server damn

mild dirge
#

One method is just what I said, which is clustering (separating) the letters

#

And classifying each one separately

#

And considering you already are working on a model to classify those letters, that would for sure be the easiest solution, as long as the letters are separated quite well already in the image

shrewd grove
#

how different is classifying two letters from classifying one letter?

mild dirge
#

Well the problem is not just getting the letters right

#

Because your model will already give you a higher value for 'a' and for 'f' if those are in the image

#

But it is knowing what order they are in

#

And dealing with for example duplicates

#

And maybe you would also like context to play a role, so that the process is more likely to classify a letter as 'e' rather than 'f' after it has found 'airplan'

shrewd grove
#

I could obtain a list of "allowed" words.

#

which would turn it into a classification problem - but the list is huge.

mild dirge
#

Ideally you want to teach your model what letter is likely to follow from some other letters it has found

#

And not store it in some database

shrewd grove
#

Can I approach these problems one at a time, rather than trying solutions clumped into a huge model?

mild dirge
#

Sure you could

#

But that means you would have to extract info yourself, and apply it when creating a model

#

Instead of making a model that can learn this info itself

#

It might be good to look at some tutorials on making OCR and seeing what options there are

#

Or maybe even read some papers if you are into that

shrewd grove
#

I tried googling and most of my findings could be summed up with "train pre-trained model from this library".

mild dirge
#

In that case look up what the model is

#

What stuff you don't understand, and look up those parts

#

Modern models can be quite complex though, so for good results, it might get a bit complicated

#

You will have to understand stuff like attention, transformers and recurrent neural networks

shrewd grove
#

I wish there was a simple model, that I could learn from.

mild dirge
#

Well, you can be creative yourself as well

#

One idea would be to f.e. try and slide a window over the image

#

Then classify each window

#

So for an image like this you would get stuff like 'aaaaaoaaassssssssdddbddddddfffffff'

#

And then try to make something from that
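A minimal sketch of that last "make something from that" step, assuming each window has already been classified as one character (the classifier itself is not shown). `collapse_windows` and `min_run` are hypothetical names: runs of identical labels are merged, and runs shorter than `min_run` windows are treated as noise and dropped. This naive rule is similar in spirit to greedy CTC decoding but has no blank symbol, so it cannot tell a genuine double letter ("ss") from one wide letter.

```python
from itertools import groupby


def collapse_windows(window_labels: str, min_run: int = 2) -> str:
    """Turn per-window predictions like 'aaaaaoaaasss' into 'as'.

    Consecutive identical labels are merged into runs; runs shorter than
    `min_run` windows are treated as misclassifications and dropped, then
    letters that became neighbours are merged once more.
    """
    runs = [(label, len(list(group))) for label, group in groupby(window_labels)]
    kept = [label for label, length in runs if length >= min_run]
    # Merge letters that became adjacent after dropping short noisy runs.
    return "".join(label for label, _ in groupby(kept))


print(collapse_windows("aaaaaoaaassssssssdddbddddddfffffff"))  # asdf
```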

shrewd grove
#

isn't that what convolution does already?

mild dirge
#

Yeah, a convolutional layer also slides a window across an image

#

It does something different with each window though

#

It sums the product of the window with the kernel
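To make "sums the product of the window with the kernel" concrete, a plain NumPy sketch of one single-channel convolution with no padding and stride 1 (`conv2d_valid` is a hypothetical helper, not a library function). A real convolutional layer additionally sums over input channels, adds a bias, and learns the kernel values instead of using a hand-picked one.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` (no padding, stride 1).

    At each position the window is multiplied element-wise with the kernel
    and summed. Like most deep-learning 'convolution' layers, this does not
    flip the kernel (strictly speaking it is a cross-correlation).
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out


# A vertical-edge kernel responds where pixel values change from left to right.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)
print(conv2d_valid(image, kernel))  # strongest response at the 0 -> 1 edge
```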

mild dirge
shrewd grove
#

I'm getting fixated on an idea, but not sure how to implement it.

#

assuming I have an image (matrix) of x * y.

#

If I slid a 1*y window across it, finding patterns,

#

I would (somehow) get an x*1 matrix filled with patterns.

#

then it would simply be a matter of consuming that... into letters?

mild dirge
#

That is indeed a good idea, because it already exists 😛

#

It would probably have to be a recurrent neural network though

#

Since then you don't just use each slice itself to classify the slice

#

But also data from previous slices
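A rough NumPy sketch of that idea, assuming the image is split into single-pixel-wide columns and fed left to right through a plain (Elman-style) recurrent layer. The weights are random, so the scores mean nothing; the point is the data flow: the hidden state `h` carries information from earlier slices into the score for the current one. All sizes and names are illustrative assumptions.

```python
import numpy as np


def rnn_over_columns(image, W_x, W_h, b, W_out):
    """Run a plain (Elman) RNN left to right over the columns of `image`.

    image: array of shape (height, width); each column is one slice.
    Returns one score vector per column, shape (width, n_classes).
    """
    height, width = image.shape
    h = np.zeros(W_h.shape[0])          # memory, starts empty
    outputs = []
    for t in range(width):
        column = image[:, t]            # the current 1-pixel-wide slice
        # New state mixes the current slice with everything seen so far.
        h = np.tanh(W_x @ column + W_h @ h + b)
        outputs.append(W_out @ h)       # scores for this slice
    return np.stack(outputs)


# Illustrative sizes; random (untrained) weights, just to show the data flow.
rng = np.random.default_rng(0)
height, width, hidden, n_classes = 8, 5, 16, 26
W_x = rng.normal(size=(hidden, height))
W_h = rng.normal(size=(hidden, hidden)) * 0.5
b = np.zeros(hidden)
W_out = rng.normal(size=(n_classes, hidden))

image = rng.random((height, width))
scores = rnn_over_columns(image, W_x, W_h, b, W_out)
print(scores.shape)  # (5, 26)
```

Changing an early column changes the scores of later columns, which is exactly the context a single slice lacks.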

shrewd grove
#

image -> [a_1, a_2, a_3, s_1, s_2 ... ] -> "asdf" ?

#

assuming that letters are in constant places and always look the same - no need for recurrence?

mild dirge
#

hmm, you mean each slice is classified itself?

shrewd grove
#

yes.

#

and then "grouped".

mild dirge
#

hmm

shrewd grove
#

so if you got the beginning of a letter "a", followed by the middle of a letter "a", finished with the end of a letter "a" - it is likely an "a".

mild dirge
#

Alright

#

but take this example

#

What letter are you seeing (letter is black, bg white)

shrewd grove
#

could be... D, P, B, K, L, E

#

and possibly a few more.

mild dirge
#

So then it would be kinda hard right

#

You can't classify a single slice

#

Not enough information

shrewd grove
#

yeah, but you could classify it as a "straight line"

#

so a line of patterns makes a letter.

mild dirge
#

Hmm okay, but you are left with a 1d sequence of patterns then

#

Then what

shrewd grove
#

could I not teach my model to recognise sequences of patterns?

mild dirge
#

Yeah, but sequences are often done with recurrent neural networks

#

Or transformers with some more modern networks

#

And what would you use to even teach the model the "patterns"

#

For an image with a single letter you know the answer

shrewd grove
#

output - letters.

mild dirge
#

But for a "straight line" you'd have to annotate each slice of the image

shrewd grove
#

Wouldn't it be crammed into one model though?

#

so effectively I would not care what it calls each slice?

mild dirge
#

Well if you want the model to output stuff like "abc" and it would be 1 convolutional neural network

#

Then you would probably need 26*26*26 output neurons

#

And it would be hardcoded to a fixed number of letters
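The arithmetic behind the 26*26*26 figure, as a small sketch: if every possible three-letter string is its own one-hot class, the output layer needs 26³ = 17,576 neurons and is fixed to length 3. `word_to_class` is a hypothetical helper showing how a string maps to its class index; the count for three separate 26-way outputs is only there for comparison.

```python
import string

LETTERS = string.ascii_lowercase  # 26 possible classes per character


def word_to_class(word: str) -> int:
    """Class index of a word when every possible word is its own one-hot class."""
    index = 0
    for ch in word:
        index = index * len(LETTERS) + LETTERS.index(ch)  # base-26 number
    return index


n_word_classes = len(LETTERS) ** 3   # one output neuron per 3-letter word
n_per_position = 3 * len(LETTERS)    # three separate 26-way outputs, for comparison
print(n_word_classes, n_per_position)   # 17576 78
print(word_to_class("abc"))             # 28
```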

shrewd grove
#

that is assuming I use one-hot encoding at the output layer

mild dirge
#

Yes

shrewd grove
#

what if I wanted numbers that I could then round to ASCII codes?

mild dirge
#

Well that would be way more output nodes than if 1-hot encoded

#

But I think it might just be good to look into recurrent neural networks before trying to avoid them at all cost

#

They are probably the simpler way of doing this

shrewd grove
#

the problem I see is that most examples I encountered are... either very basic

#

or using all-fancy models, that are not described.

mild dirge
#

Then you would probably need to educate yourself a bit, maybe read a book on neural networks

shrewd grove
#

is it my research skills, or are Google-based tutorials off?

mild dirge
#

The creators of PyTorch have a good one on deep neural networks, but they don't go into recurrent ones, I think

shrewd grove
#

A book it is then. Any recommendations?

mild dirge
shrewd grove
#

I wouldn't mind creating a few dozen models just to learn.

mild dirge
#

The "quick and easy" tutorials can also sometimes/often contain misleading or just wrong information

shrewd grove
#

but most tutorials focus on classification, which I (apologies) found not that interesting, really.

shrewd grove
mild dirge
#

hmm yeah..

#

But that might also not be the solution

#

As you would end up with a model that has 480 million parameters without knowing why it has so many f.e. 😛

shrewd grove
#

I have some security/programming background, so bruteforce is usually the solution.

mild dirge
#

Well not in data science and machine learning

#

Just a bunch of math to begin with

#

And once you understand the basics you can learn about the perceptron/linear regression and other more simple methods

#

And build your way to complex models

#

There is (unfortunately) just a lot of theory to go through before you can really understand what you are doing with the models

shrewd grove
#

which is a problem, as most Machine Learning is to me a black-box.

mild dirge
#

Right

shrewd grove
#

You put some data in, You might or might not get a nice model.

mild dirge
#

And that works if someone has already made a model that is easy to use that you can grab

shrewd grove
#

And no way to validate mid-results.

mild dirge
#

Jup

#

And you can honestly get a lot done without understanding anything about ML, but once you get stuck, you get stuck

#

And for this project, this might be a bit of a wall

shrewd grove
#

What would you recommend then? Is there a book that would give me the basics?

mild dirge
#

Imo the models you are trying to use at this moment are quite complex already for someone who might not have that much experience with ML

#

Preferably you already have a bit of experience with calculus and linear algebra

#

Then you also want to get into probability theory/statistics

shrewd grove
#

I can calculate simple probabilities, not really touching on normal distribution, as it requires integrals.

mild dirge
#

You don't need to calculate it by hand, but just getting the intuition is pretty important

#

But it depends on how in-depth you want to go, I'm also just a student, I don't have all the answers as I'm still learning too

#

But I have read some books that I thought were pretty useful on some topics

shrewd grove
#

I think what I'm looking for is "Machine Learning for Dummies". As in: "This is a convolution layer. Input is this, weights are constant - what do you expect to get out?"

#

so then I can build models with an intuition of "oh, I expect this data to be in a certain range"

#

but perhaps I'm thinking wrong and the black-boxness of ML prevents this approach

iron basalt
# shrewd grove Im getting fixated on an idea, but not sure how to implement.

Sliding windows to classify sequences is a thing, e.g. to recognize objects in vision. And it's what humans already do via saccading (they also use motion detection (optical flow) from the jumps). But there needs to be some memory used to link the slices together as a sequence, often done via an RNN, but there are other options too (RNNs are good at some things, and not at others), you may also use RNNs in combination with other methods, they can help with short term stuff (e.g. one jump from one slice to another).

mild dirge
desert oar
mild dirge
#

It's good to get a bit of a feel from each field to get an idea that is more intuitive for you specifically

#

It also covers some more basic concepts useful for ML

#

think about accuracy/precision/recall etc.

#

And loss, and gradients and stuff
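For reference, a small sketch of how accuracy, precision and recall come out of the same predictions (binary case, label 1 treated as positive; returning 0.0 when a denominator is zero is a convention chosen here, and `classification_metrics` is a hypothetical helper).

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # misses
    correct = sum(t == p for t, p in pairs)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are real
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real positives, how many were found
    return accuracy, precision, recall


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(classification_metrics(y_true, y_pred))  # accuracy 0.75, precision 2/3, recall 2/3
```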

iron basalt
#

*Humans can't do convolutions, it's not biologically plausible, but they can do some very similar stuff (saccading, tiled receptive fields (but without shared weights)).

#

However, convolutions work really well on computers.

shrewd grove
iron basalt
shrewd grove
mild dirge
#

Haven't read that one

#

But I really liked this book

#

I kinda read some of these books in my free-time, some are pretty good reads with illustrations and stuff

#

But you do have to like this type of stuff if you want to read through these types of books

#

Maybe there is some "demo ;)" out there where you can just read some of the first pages and see if you like the book

desert oar
#

Real-World Machine Learning has a free trial

#

there's also the fastai course which people seem to enjoy

modest onyx
#

hello friends, I found something superrrrr interesting when I was experimenting around with neural networks

#

so I was trying to visualize what a neural network was doing by visualizing each layer as some set of transformations to the input space

#

affine_transformation + nonlinearity in each layer

#

it's impossible to visualize this for very high-dimensional spaces, so I limited myself to 2 neurons

#

btw this was inspired by Andrej Karpathy's convnet.js simulation, which does exactly that

mild dirge
#

Got some cool images then?

modest onyx
#

so when I tried separating two circles using only two neurons, nothing happened!!!!!!

#

it didn't budge and it didn't separate no matter how many layers I chained

#

(the slider shows the order of the transformation)

#

but by just moving 1 dimension up, the separation almost became trivial, super easy for the network

#

I don't know i just found this super interesting so I thought I'd share it with ya'll

desert oar
iron basalt
desert oar
#

congrats, you've just rediscovered not only the field of "kernel methods" (which were a very big deal at one point) but also fundamentally why constructing higher-order features is useful for learning arbitrarily complicated highly-nonlinear problems. you should feel legitimately proud of figuring this out!
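A tiny sketch of that effect with a fixed, hand-picked feature rather than a learned one: adding z = x² + y² as a third coordinate lifts the outer ring above the inner one, so a flat plane (here z = 4, chosen by hand for radii 1 and 3) separates two classes that no straight line in 2D can. Kernel methods such as an SVM with a polynomial or RBF kernel use this kind of feature map implicitly. `ring`, `lift` and `classify` are hypothetical names.

```python
import math


def ring(radius, n=50):
    """n points evenly spaced on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


inner = ring(1.0)   # class 0
outer = ring(3.0)   # class 1


def lift(point):
    """Map (x, y) to (x, y, x**2 + y**2): squared distance becomes a new axis."""
    x, y = point
    return (x, y, x * x + y * y)


def classify(point, threshold=4.0):
    """Linear rule in the lifted 3D space: which side of the plane z = threshold."""
    return int(lift(point)[2] > threshold)


print(all(classify(p) == 0 for p in inner), all(classify(p) == 1 for p in outer))  # True True
```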

modest onyx
#

thank you so much but all I did was reimplement what I saw in convnet.js by Andrej Karpathy and mess around with it 😂

modest onyx
modest onyx
#

it's suuuuper interesting though

mild dirge
#

Yeah cool animation too ok_handbutflipped

desert oar
#

oh yeah the animation shows exactly what i was describing. "pulling" the circles through the extra dimension lets you separate them freely.

modest onyx
#

Thanks, it's for a video I'm working on where I wanted to eventually talk about the manifold hypothesis 🙏

steady basalt
#

anyone know the syntax to make this into a grouped bar chart with a 0-9 x axis, values on the y axis and hue by model?

serene scaffold
steady basalt
#

{'rf': [0.125, 0.145, 0.137, 0.164, 0.195], 'lr': [0.156, 0.168, 0.11, 0.17, 0.2], 'nb': [0.174, 0.189, 0.152, 0.187, 0.208], 'xgb': [0.059, 0.137, 0.11, 0.139, 0.23]}

steady basalt
#

well I have no y axis/hue separation so I can't

#

that's what seaborn asks for

serene scaffold
#

how is this different from what you want

steady basalt
#

it's EXACTLY what I want but I want it in seaborn to match my other plots

#

how did u do it?

serene scaffold
#

all I did was df.plot.bar(rot=0)...

steady basalt
#

do u know how to make it work in seaborn?

#

i have to keep the ggplot styling

serene scaffold
#

never used seaborn.

steady basalt
#

cripes

#

my report has to stay consistent so I can't just do that sadly

#

seaborn requires x,y, and hue

serene scaffold
#

what is hue

steady basalt
#

for example,

#
```python
    data=penguins, kind="bar",
    x="species", y="body_mass_g", hue="sex",
    ci="sd", palette="dark", alpha=.6, height=6
)
g.despine(left=True)
g.set_axis_labels("", "Body mass (g)")
g.legend.set_title("")
```
#

so for me it's hue = titles/first column, depending on whether it's transposed or not

#

then idk what is y

#

they have a column for all three

#

categorical

serene scaffold
#

@steady basalt the docs have this df as an example

     total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4

and then they do this

sns.barplot(x="day", y="total_bill", hue="sex", data=tips)

and they get this

#

you end up having to do some fucky stuff.

In [38]: df
Out[38]:
      rf     lr     nb    xgb
0  0.125  0.156  0.174  0.059
1  0.145  0.168  0.189  0.137
2  0.137  0.110  0.152  0.110
3  0.164  0.170  0.187  0.139
4  0.195  0.200  0.208  0.230

In [39]: df.reset_index().melt(id_vars='index')
Out[39]:
    index variable  value
0       0       rf  0.125
1       1       rf  0.145
2       2       rf  0.137
3       3       rf  0.164
4       4       rf  0.195
5       0       lr  0.156
6       1       lr  0.168
7       2       lr  0.110
8       3       lr  0.170
9       4       lr  0.200
10      0       nb  0.174
11      1       nb  0.189
12      2       nb  0.152
13      3       nb  0.187
14      4       nb  0.208
15      0      xgb  0.059
16      1      xgb  0.137
17      2      xgb  0.110
18      3      xgb  0.139
19      4      xgb  0.230
#
sns.barplot(hue='variable', x='index', y='value', data=df.reset_index().melt(id_vars='index'))
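The same melt approach as a self-contained sketch, with the melted columns renamed (`var_name='model'`) so the legend title reads "model" instead of "variable". seaborn and matplotlib are imported inside the hypothetical `plot` helper only so the reshaping part runs without them.

```python
import pandas as pd

scores = {'rf': [0.125, 0.145, 0.137, 0.164, 0.195],
          'lr': [0.156, 0.168, 0.11, 0.17, 0.2],
          'nb': [0.174, 0.189, 0.152, 0.187, 0.208],
          'xgb': [0.059, 0.137, 0.11, 0.139, 0.23]}
df = pd.DataFrame(scores)   # wide: one column per model

# Wide -> long: one row per (index, model) pair, the shape seaborn's
# x / y / hue arguments expect.
long_df = df.reset_index().melt(id_vars='index', var_name='model', value_name='value')


def plot(long_df):
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.barplot(data=long_df, x='index', y='value', hue='model')
    plt.show()
    return ax
```

Calling `plot(long_df)` then draws one group of bars per index, with one bar per model.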
strong sedge
#
```
Epoch 1/100
6/6 [==============================] - 0s 1ms/step - loss: 6.6317 - accuracy: 0.4778
Epoch 2/100
6/6 [==============================] - 0s 1ms/step - loss: 6.3355 - accuracy: 0.4778
Epoch 3/100
6/6 [==============================] - 0s 1ms/step - loss: 6.0349 - accuracy: 0.4778
Epoch 4/100
6/6 [==============================] - 0s 1ms/step - loss: 5.7364 - accuracy: 0.4778
Epoch 5/100
6/6 [==============================] - 0s 2ms/step - loss: 5.4379 - accuracy: 0.4778
Epoch 6/100
6/6 [==============================] - 0s 2ms/step - loss: 5.1394 - accuracy: 0.4778
Epoch 7/100
6/6 [==============================] - 0s 2ms/step - loss: 4.8425 - accuracy: 0.4778
Epoch 8/100
6/6 [==============================] - 0s 2ms/step - loss: 4.5507 - accuracy: 0.4778
Epoch 9/100
6/6 [==============================] - 0s 2ms/step - loss: 4.2534 - accuracy: 0.4778
Epoch 10/100
6/6 [==============================] - 0s 2ms/step - loss: 3.9574 - accuracy: 0.4778
Epoch 11/100
6/6 [==============================] - 0s 2ms/step - loss: 3.6707 - accuracy: 0.4778
Epoch 12/100
6/6 [==============================] - 0s 2ms/step - loss: 3.3756 - accuracy: 0.4778
Epoch 13/100
...
Epoch 99/100
6/6 [==============================] - 0s 2ms/step - loss: 0.1075 - accuracy: 0.4778
Epoch 100/100
6/6 [==============================] - 0s 2ms/step - loss: 0.1068 - accuracy: 0.4778
```
my network isn't improving?
the model is very, very simple
```python
from tensorflow import keras

inputs = keras.Input(1)
outputs = keras.layers.Dense(1, activation='softmax')(inputs)
model = keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```
the data is also simple
```python
import pandas as pd

# random_sigmoid is the poster's own helper (not shown in the log)
data_set = pd.DataFrame()
data_set['x'] = [i for i in range(-100, 100)]
data_set['y'] = [1 if random_sigmoid(i) >= 0.5 else 0 for i in range(-100, 100)]
```
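One likely culprit in the model above, sketched with NumPy: a softmax over a single output unit divides e^z by itself, so it returns exactly 1.0 for every input. Every sample is then predicted as class 1 and accuracy stays pinned at the share of 1-labels, whatever the loss does; the frozen 0.4778 in the log is consistent with that. For one output unit, the usual pairing with `binary_crossentropy` is a sigmoid. This check only demonstrates the arithmetic; it does not rerun the Keras model.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Pre-activation outputs of a Dense(1) layer for four samples: shape (4, 1).
logits = np.array([[-3.0], [-0.5], [0.5], [3.0]])

print(softmax(logits).ravel())   # [1. 1. 1. 1.]  -> always "class 1"
print(sigmoid(logits).ravel())   # varies with the input, so it can be thresholded at 0.5
```

In the Keras model that would mean `keras.layers.Dense(1, activation='sigmoid')`.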
steady basalt
#

so u un-transposed it

#

then made a cat for model

#

hmm i see very nice

serene scaffold
#

@hollow jetty this server isn't a place for recruitment. sorry

hollow jetty
#

No worries, just wanted to also showcase some of my work
I have been working a bit on generative art and NeRFs

silent mesa
#

I have a big list of ints and corresponding a and b int values...how do i train a model so that when i give it that list of ints..it gives me a good estimate for the a and b?
also some potential resources for the same?

ancient fog
#

what is the value of y_i in this soft margin classification equation for SVM?

serene scaffold
desert oar
ancient fog
desert oar
#

why should it be?

ancient fog
#

shouldn't that be what we are trying to figure out?

#

so if the label is unknown

#

why is it like that

misty flint
#

why not

#

lel

ancient fog
#

wouldn't there be a reason the equation was made that way?
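For reference, the standard soft-margin SVM training problem (assuming this is the equation from the image, which did not survive in the log). Here y_i is the known label of training example i, encoded as +1 or −1; it is part of the training data, not the unknown. What gets optimized are w, b and the slacks ξ_i. Multiplying by y_i is what lets one inequality cover both classes: for y_i = +1 the score must be at least 1 − ξ_i, and for y_i = −1 it must be at most −(1 − ξ_i). The unknown label only appears at prediction time, as the sign of the score.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i\left(\mathbf{w}^\top \mathbf{x}_i + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad y_i \in \{-1, +1\}

\hat{y} = \operatorname{sign}\left(\mathbf{w}^\top \mathbf{x} + b\right)
```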

iron basalt
strange elbowBOT
rigid talon
#

does anyone here have experience with making twitter scraper bots?

iron basalt
#

!rule 5

arctic wedgeBOT
#

5. Do not provide or request help on projects that may break laws, breach terms of services, or are malicious or inappropriate.

iron basalt
#

(Unless you are using Twitter's official APIs)

rigid talon
iron basalt
rigid talon
#

damn okay

brisk apex
#

son of a.... I spent at least a day trying to figure out why I couldn't connect to S3 in Python with hadoop-aws. The type of credential provider was the reason, ha

barren snow
#

Heyyyy, could someone explain this one to me? I am new to signal processing and confused by it. Basically, I want to input my audio and calculate the harmonic distribution, but I don't understand the relationship between amplitude and harmonic distribution in the following code.
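The code referred to is not in the log, so this is only a sketch under an assumption: that it follows the DDSP-style convention (MIDI-DDSP comes up later in the channel) in which a harmonic synthesizer is driven by an overall amplitude plus a harmonic distribution, i.e. the relative weights of the harmonics normalized to sum to 1, so that per-harmonic amplitude = amplitude × distribution. The NumPy example estimates both from a synthetic tone whose harmonics fall exactly on FFT bins; a real recording would also need windowing and pitch tracking.

```python
import numpy as np

sr = 16000                  # sample rate in Hz (assumed)
f0 = 250.0                  # fundamental frequency in Hz (assumed known)
t = np.arange(sr) / sr      # exactly 1 second, so FFT bins are 1 Hz apart

# Synthetic test tone: harmonic k (k = 1, 2, 3) has amplitude true_amps[k - 1].
true_amps = [1.0, 0.5, 0.25]
audio = sum(a * np.sin(2 * np.pi * f0 * k * t)
            for k, a in enumerate(true_amps, start=1))

# Magnitude spectrum scaled so a sine of amplitude A shows up as A.
spectrum = np.abs(np.fft.rfft(audio)) * 2 / len(audio)
freqs = np.fft.rfftfreq(len(audio), d=1 / sr)

# Read the spectrum at the FFT bin closest to each harmonic k * f0.
bins = [int(np.argmin(np.abs(freqs - k * f0))) for k in range(1, len(true_amps) + 1)]
harmonic_amps = spectrum[bins]

amplitude = harmonic_amps.sum()             # overall loudness
distribution = harmonic_amps / amplitude    # relative shape, sums to 1
print(np.round(harmonic_amps, 3), round(float(amplitude), 3), np.round(distribution, 3))
```

Read this way, amplitude answers "how loud" and the distribution answers "which harmonics carry that loudness", which is why the two are kept separate.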

wooden sail
velvet birch
#

Now is such a graph acceptable? The scatterplot is the column vs target variable plot and the histplot behind it is the distribution for the column

mild dirge
#

It's pretty chaotic with the histogram behind it

#

The scatter plot makes it hard to read the histograms, maybe just make it two separate figures

lyric basin
#

hello, does anyone know why CUDA can't see that my GPU is available? I have the newest version of the CUDA toolkit installed and the latest GPU drivers, and my GPU is a card that works with CUDA

shrewd grove
lyric basin
shrewd grove
#

Have you got a compiler installed?

lyric basin
#

it's just cuda isn't seeing my gpu as available device

#

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()

# Additional info when using CUDA
if device.type == 'cuda':
    print(torch.cuda.get_device_name(0))
```

#

this is the code that i'm using to recognize if it's finding it
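A slightly longer diagnostic sketch, assuming the problem is on the PyTorch side: it reports the PyTorch version, the CUDA version the installed PyTorch build was compiled against, and what it can see. If `built_with_cuda` comes out as `None`, the installed PyTorch is a CPU-only build, and as far as I know installing the CUDA toolkit and drivers alone won't change that; a CUDA-enabled PyTorch build is needed. `cuda_report` is a hypothetical helper name.

```python
def cuda_report() -> dict:
    """Collect the facts that usually explain 'CUDA not available' in PyTorch."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means this PyTorch build was compiled without CUDA (CPU-only).
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if report["cuda_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```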

shrewd grove
#

Have you considered running some of the CUDA samples?

lyric basin
shrewd grove
#

Try it. You will see whether it's CUDA or PyTorch

lyric basin
lapis sequoia
#

do you know how i could print out the whole table
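Assuming "the table" is a pandas DataFrame that gets truncated with `...` when printed, two ways to show every row (the 100-row frame is made-up data).

```python
import pandas as pd

df = pd.DataFrame({"x": range(100)})  # made-up 100-row table

# Option 1: render every row as one string, ignoring the display limits.
print(df.to_string())

# Option 2: lift the row/column limits only inside this block.
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    print(df)
```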

shrewd grove
lyric basin
solar umbra
#

hey

#

could one be a data scientist without a degree

shrewd grove
lyric basin
#

like runs with cpu

#

but not gpu

shrewd grove
lyric basin
solar umbra
#

could one be a data scientist without a degree?

#

i asked

shrewd grove
solar umbra
#

what do you specialize in?

shrewd grove
#

But generally, if I were to guess: Yes, but... people are getting abused this way.

#

I'm a programmer.

solar umbra
#

python?

quaint loom
shrewd grove
#

cpp.

#

anyhow - I have seen people programming without a degree.

#

However, they were usually underpaid and abused.

solar umbra
#

only a Python programmer?

#

like do you specialize in Python only?

shrewd grove
#

C++

#

but i do some python too, if necessary

solar umbra
#

damn this language is really well paid

shrewd grove
#

it is not - comparable where I live, really.

solar umbra
#

where do you live?

shrewd grove
solar umbra
#

wait seriously

#

so should I pursue a bachelor's?

quaint loom
shrewd grove
shrewd grove
#

so if you have a CS degree, don't do another one - one is enough. If you don't, I'd advise getting one.

quaint loom
solar umbra
#

theta_change is not defined bro

quaint loom
shrewd grove
solar umbra
quaint loom
quaint loom
solar umbra
#

damn

shrewd grove
quaint loom
solar umbra
#

I'm amazed to know that Python could plot a graph

shrewd grove
#

with a plain "import sympy", it would be used as sympy.symbols()

solar umbra
#

I thought only R could do it

quaint loom
quaint loom
solar umbra
#

what language do you prefer for data science, Python or R?

shrewd grove
#

syntax - think "grammar" - how You use stuff.

quaint loom
#

I am still a noob in python*

shrewd grove
quaint loom
shrewd grove
quaint loom
solar umbra
#

I don't really know much about R tho

quaint loom
solar umbra
#

I have prior experience in Python but my course includes R so

shrewd grove
solar umbra
#

I was just asking which is the most reliable

quaint loom
shrewd grove
quaint loom
shrewd grove
quaint loom
#

I am sorry XD

#

I have not done the change yet to

shrewd grove
#

okay

quaint loom
#

tho*

shrewd grove
#

so you see - "import sympy as sp"

#

which means all sympy functions need to be prefixed with "sp."

quaint loom
quaint loom
shrewd grove
#

so your "symbols" becomes "sp.symbols"
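A small sketch of the alias in practice, assuming sympy is installed (`x` and `theta` are just example symbols): with `import sympy as sp`, every sympy name, including `symbols`, `solve` and `diff`, has to be written with the `sp.` prefix.

```python
import sympy as sp  # every sympy name is now reached through the "sp." prefix

x = sp.symbols("x")              # a bare symbols("x") would raise a NameError here
theta = sp.symbols("theta")

print(sp.solve(x**2 - 4, x))                  # [-2, 2]
print(sp.diff(sp.sin(theta) ** 2, theta))     # derivative with respect to theta
```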

quaint loom
#

I don't know why, but sometimes my code does not run. Or, well, it runs but won't give me the solution. Sometimes it works to close Jupyter and open it again.

#

Like now, it kind of works but won't show much

shrewd grove
#

You see that [*]?

quaint loom
#

@shrewd grove Do you know why?

shrewd grove
#

that means it's yet to be run

quaint loom
shrewd grove
#

look here: the upper field has a [6] - means it finished running.

#

and it's the 6th thing I've run in this notebook

#

the next one has a *, which means it is either running or scheduled to be run.

quaint loom
shrewd grove
#

can You show me the top code with a * ?

#

could be it is running for a long time, could be cpu-intensive (hence taking a long time).

#

Or could be you made a mistake and ended up with an infinite-loop - so it will never finish!

quaint loom
shrewd grove
#

I'm not sure what sp.solve does here - but it could take a while.

shrewd grove
#

you see all of them have stars. So the one which is executing is either the top one or one of the sections before Question 3.

quaint loom
shrewd grove
#

Right...

quaint loom
#

Seems like Python has a hard time importing the functions

shrewd grove
#

click this one.

#

and start from the beginning, one by one.

quaint loom
lapis sequoia
#

does anyone know how i can label the y axis when i make a histogram using pandas?

shrewd grove
#

Always easier if you show the code 🙂 But it could be that you are loading too much data. How much are you loading?

serene scaffold
lapis sequoia
#

oh oops sorry i meant x axis

serene scaffold
lapis sequoia
#

yea I did that but it doesn't show up

serene scaffold
#

assuming that you do. I can't actually see all the code, since you did a screenshot.

serene scaffold
#

I won't look at another screenshot of text. I will only look at screenshots of the actual plots.

lapis sequoia
serene scaffold
quaint loom
lapis sequoia
#

yea sorry I wasn't thinking straight lol!
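A minimal sketch of labelling a pandas histogram's axes, with made-up data: the pandas plotting call returns a matplotlib Axes, and the labels are set on that object after the plot is drawn. The `Agg` backend line is only there so the sketch runs without a display and can be dropped in Jupyter. If the plot comes from `df.hist()` instead, that returns an array of Axes, so each one needs its own label.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"price": [3, 5, 5, 6, 8, 8, 8, 9, 12]})  # made-up data

ax = df["price"].plot.hist(bins=5)   # pandas returns the matplotlib Axes it drew on
ax.set_xlabel("Price")               # label the x axis on that Axes
ax.set_ylabel("Count")
plt.tight_layout()
```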

serene scaffold
shrewd grove
quaint loom
shrewd grove
quaint loom
shrewd grove
#

No worries 🙂

red bane
#

How long would it usually take to train a neural network with 40000 training data and 10,000 testing data with GPU?

mild dirge
#

It depends on the size of the model and data

#

And GPU 😛

red bane
#

there are 2 inputs, each expressible as a float from 0 to 1, and the model is relatively small, with 1 hidden layer; I'm using a GTX 1660 Ti

upbeat hedge
#

Hi all, question about open data sources - I wanted to play around with any nutritional data sources, if there are any open ones that include nutritional information for food, including brands - does such a thing exist? (apologies if this is not the right place to ask)

serene scaffold
upbeat hedge
#

Ok thanks for the tip, I will look in to that angle as well.

copper dagger
#

hello, sorry to bother, can anyone recommend a good Python machine learning course on YouTube?

serene scaffold
quaint loom
serene scaffold
#

I know I'm the one who's supposed to know this, but is there a way to make spaCy use the same tokenizer as any huggingface BERT model? If you don't already know how to do this, do not answer, as I have already crawled Google, and we don't need a duplication of efforts.

shrewd grove
mild dirge
#

Ah cool

#

That does use a recurrent neural network btw

#

LSTM (or Long Short-Term Memory) is a recurrent layer

shrewd grove
#

I brute-force-programmed my goal in it

#

as I was reviewing the data i noticed that the text I am after is in different places

#

sooo I'm assuming this won't trace it.

#

I thought the bi-directional layers are recurrent.

mild dirge
#

It uses a bidirectional LSTM layer

shrewd grove
#

oh, true that.

lapis sequoia
#

Can someone help me understand what this thing is?

tidal bough
lapis sequoia
#

What is gm.overall telling us?

tidal bough
#

overall score for a metric, e.g. overall accuracy

lapis sequoia
lapis sequoia
tidal bough
#

Accuracy, in this case.

#

And then you get the breakdown of the same metric, accuracy, by group

lapis sequoia
#

Oh just normal accuracy

#

Oh

#

Okay.

#

It's just normal stuff

#

I get it now

#

Cool

tidal bough
lapis sequoia
#

I thought it has some automatic fairness calculator
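Assuming `gm` is a metric object with an overall value plus a per-group breakdown (its code is not shown), the same two kinds of numbers can be computed by hand with pandas on made-up predictions. The fairness-relevant part is not in either number alone but in the gap between groups.

```python
import pandas as pd

# Made-up predictions with a group (sensitive feature) column.
results = pd.DataFrame({
    "group":  ["A", "A", "A", "A", "B", "B", "B", "B"],
    "y_true": [1, 0, 1, 0, 1, 0, 1, 0],
    "y_pred": [1, 0, 1, 0, 0, 0, 1, 1],
})
results["correct"] = results["y_true"] == results["y_pred"]

overall = results["correct"].mean()                      # plain accuracy on everything
by_group = results.groupby("group")["correct"].mean()    # same metric, per group
print(overall)                          # 0.75
print(by_group)                         # group A: 1.0, group B: 0.5
print(by_group.max() - by_group.min())  # 0.5: gap between best and worst group
```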

lapis sequoia
#

10x more cat images, same F1 score, that's a bias problem, right?

#

1000 -> 10000 images

shrewd grove
#

@mild dirge I am trying to understand what is actually happening in my model. Up for a chat sometime ?

mild dirge
#

I don't use recurrent neural networks that much, so I don't know if I could explain the entire model to you

#

I just have some theoretical knowledge about them

shrewd grove
#

I wrote up everything until the recurrent part.

mild dirge
#

And it's pretty late here, so maybe we can talk tomorrow if you still want then

shrewd grove
#

But i guess I am wrong

#

sooo what I suggest we do, if you want to help me: I'm gonna write it up by tomorrow and then send it to you

mild dirge
#

That would be fine too, but again, not that comfortable with RNNs, so probably just ask here, and ping me

#

And I'll definitely help if I can ok_handbutflipped

shrewd grove
#

cheers!

young granite
shrewd grove
#
  1. You're reading the file twice: once with pandas, once in the classic Pythonic way.
    Also "file" should be "filename" or "fname", if I were to comment on variable naming.
  2. Not sure there is a point in sorting the glob in the top loop.
  3. You are calling split multiple times, where you can do something like this:
    a, b, c = "a,b,c".split(",")
    or:
    abc = "a,b,c".split(",")
    a = abc[0]
    b = abc[1]
    ...
  4. group_name - don't call str() so many times there; "#" is a string anyway. I think it could be an f-string or use join or something.
    If you simplify it and take point 8 into account, you will get something like group_name = f"#{iteration} x {foam} x"
    And then the last loop will not have to check the first char, as it is always '#'.
  5. Line 50 - you are defining a variable just to use it once? Make it inline.
  6. What is data? A dictionary of pandas dataframes? Same with df_dict. Couldn't you just make a proper pandas setup with one big dataframe?
  7. Line 36 - if group_name etc. If there is no group name in df_dict, it will get initialized with df_new, and then it will append df_new, resulting in [df_new, df_new]?
  8. "x" if "beh" in treatment else "x" is always "x".
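A sketch of points 3, 4 and 7 together. The line format, field order and `treatment` values are hypothetical, since the reviewed code isn't in the log; `defaultdict(list)` is one way to avoid the initialize-then-append double entry from point 7.

```python
from collections import defaultdict

# Hypothetical "iteration,foam,treatment" lines; the real file format isn't shown.
lines = ["1,A,beh_x", "1,A,beh_y", "2,B,ctrl"]

groups = defaultdict(list)   # point 7: no special case for the first append
for line in lines:
    iteration, foam, treatment = line.split(",")   # point 3: split once, unpack
    group_name = f"#{iteration} x {foam} x"          # point 4: one f-string
    groups[group_name].append(treatment)

print(dict(groups))
```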
mossy badger
#

Hey, is it possible to train a neural network with multiple attributes, but use only one or a subset of them at prediction time? E.g.:
Training data has columns
string1, string2 ... string10, int1, int2 ... int1p, bool1
Input data has columns
string1, string2 ... string10
And returns a float between 0.0 and 1.0

serene scaffold
mossy badger
#

Yes, features, sorry; AI noob here. Yes, the prediction would always be a set of 10 strings

serene scaffold
mossy badger
#

is there no type of AI application where you could do that? Or maybe some way to transform such feature set in a way that a neural network could do that?

serene scaffold
#

why would you want your model to depend on features that you're never going to have?

mossy badger
#

Hmm that's not how I was thinking about it, more like those other training features would serve to bias similar inputs

#

Not sure how to get the idea across 😦

#

I'm trying to write an application that takes a set of, say, 10 types of players divided into two teams of 5, and returns a chance of the first composition beating the second

grave token
#

Can you guys recommend some trainable model that will work best for classifying hands/hand gestures?

sacred lintel
#

Hey, I have a doubt: can we create AI software that has consciousness?? I just want yes or no 🤔🤔

serene scaffold
#

But also, what even is consciousness?

valid kelp
#

could someone help me with parsing some specific data from a small dataset? I feel like it's simple but I'm not familiar with data parsing at all

modest onyx
tacit horizon
#

Would you guys choose to train NNs in the cloud or on 2080s?

serene scaffold
#

(but I haven't had this dilemma, because I've always had access to an hpc.)

#

(if you're a student, you might see if your university has one.)

lapis sequoia
#

"Provide a detailed analysis of the performance of your model under varying training set sizes. You can present and discuss this via a learning curve."
Is a learning curve a graph between performance and training size?
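Yes, in the usual sense: a learning curve plots a performance score (often both the training and the validation score) against the number of training samples. A sketch with scikit-learn's `learning_curve`, using the built-in digits dataset and Gaussian naive Bayes purely as placeholders for your own data and model.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import learning_curve
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)   # placeholder data

train_sizes, train_scores, val_scores = learning_curve(
    GaussianNB(),                          # placeholder model
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # 10% up to 100% of each training split
    cv=5,                                  # 5-fold cross-validation per size
    scoring="accuracy",
)

# One row per training size; average over the 5 folds.
for n, tr, va in zip(train_sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"{n:5d} samples: train acc {tr:.3f}, validation acc {va:.3f}")
```

Plotting the two averaged columns against `train_sizes` gives the curve itself; a validation score that is still rising at the largest size suggests more data would help.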

grave token
#
```python
import numpy as np
import tensorflow as tf

datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=40,
    shear_range=0.2, zoom_range=0.2,
    fill_mode="nearest",
    horizontal_flip=True, vertical_flip=False,
)

# img (a single image array) and folder (output directory) are defined elsewhere
i = 0
for batch in datagen.flow(np.array([img]), batch_size=100, save_to_dir=folder):
    i += 1
    if i > 100:
        break
```
Here I don't understand how the output works. Let's say one image will be augmented and the output will be 100 images.
How do they do it? If I set rotation=40... will it create 40 copies? For zoom=0.2, will it create, let's say, another 40 copies? And then take one from each of these until we run out of them?
mild dirge
#

Pretty sure it just randomly flips, rotates, shears, zooms etc. for every image you load

#

But the size of the dataset for one epoch is still the same

#

It is just that all the images are augmented randomly

#

So that means every epoch uses "different" images

#

@grave token
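To make that concrete: in Keras, `rotation_range=40` means each image gets a random angle drawn from [-40, 40] degrees (not 40 copies), and `zoom_range=0.2` draws zoom factors from [0.8, 1.2]. The sketch below is a simplified, hypothetical re-creation of that per-image sampling (`sample_transform` and its dictionary keys are not Keras names); it only draws the parameters and does not touch pixels. In the loop above, each batch from `datagen.flow` holds the single input image with a fresh random draw like this applied.

```python
import random

def sample_transform(rng, rotation_range=40, shear_range=0.2, zoom_range=0.2, horizontal_flip=True):
    """Draw one random set of augmentation parameters (simplified, hypothetical re-creation)."""
    return {
        "theta": rng.uniform(-rotation_range, rotation_range),  # rotation angle in degrees
        "shear": rng.uniform(-shear_range, shear_range),         # shear angle in degrees
        "zx": rng.uniform(1 - zoom_range, 1 + zoom_range),       # horizontal zoom factor
        "zy": rng.uniform(1 - zoom_range, 1 + zoom_range),       # vertical zoom factor
        "flip_horizontal": horizontal_flip and rng.random() < 0.5,
    }

rng = random.Random(0)
# Asking for 100 augmented versions of ONE image means 100 independent draws,
# not "40 rotated copies + 40 zoomed copies".
draws = [sample_transform(rng) for _ in range(100)]
print(draws[0])
```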

spare goblet
#

ai]

barren snow
#

I'm confused by this code. I'm trying to use MIDI-DDSP to synthesize a MIDI file, but I couldn't find how to turn the output of the following code into audio or MIDI. The code is the model's "Minimal example".

Here is the simple code they provide; I didn't add anything, just followed it as given.

```py
from midi_ddsp import synthesize_midi, load_pretrained_model

midi_file = 'ode_to_joy.mid'
# Load pre-trained model
synthesis_generator, expression_generator = load_pretrained_model()
# Synthesize MIDI
output = synthesize_midi(synthesis_generator, expression_generator, midi_file)
# The synthesized audio
synthesized_audio = output['mix_audio']
synthesized_audio
```

`synthesized_audio` is an array, neither a WAV nor a MIDI file. Can someone tell me how to convert it to audio or MIDI? Thanks a lot

GitHub: magenta/midi-ddsp, Synthesis of MIDI with DDSP (https://midi-ddsp.github.io/)
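Since `synthesized_audio` holds raw audio samples, one way to get a playable file is to write it out as WAV. A minimal sketch using only the standard library, assuming mono float samples in [-1, 1] and a 16 kHz sample rate (DDSP models commonly use 16 kHz, but check the rate MIDI-DDSP actually uses before relying on it); `write_wav` is a hypothetical helper, not part of midi_ddsp:

```python
import sys
import wave
from array import array

def write_wav(path, samples, sample_rate=16000):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file (hypothetical helper)."""
    # Clip each sample to [-1, 1] and scale it to the signed 16-bit integer range
    pcm = array("h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
```

Usage might look like `write_wav("ode_to_joy.wav", np.asarray(synthesized_audio).reshape(-1))`; flattening handles a leading batch dimension if the array has one. In a notebook, `IPython.display.Audio(data, rate=16000)` can also play the array directly.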

plush jungle
#

I have a GAN trained on a dataset of frames from a show, so there are very many near-duplicate frames

#

the result at basically every stage of training was severe overfitting and blurriness

#

is there a way to reduce either of these problems?

worthy phoenix
#

can anyone link me to a better image GAN than midjourney?

#

which is opensource btw

#

and the GAN style should be artistic just like midjourney

night turret
#

they have a tutorial that discusses something like that

worthy phoenix
#

ummm....ok

empty light
#

Hii

grand canyon
#

hey, could someone take a look

#

at my accuracy function

#

it's giving me an accuracy of more than 232%

#

which is def wrong

tidal bough
grand canyon
#

test set exists

#

it's like the test dataset

#

length

tidal bough
#

consider doing something like total += len(label), and then your accuracy at each iteration is correct/total (*100 if you want)

grand canyon
#

ok

#

ok let me try that

#

@tidal bough I tried that, but I'm getting a pretty weird accuracy graph

#

here was my new code

#

i ran it for ten more epochs

#

this is what my graphs look like; I don't think it's right

tidal bough
#

you're resetting correct every epoch, but total never

#

so understandably correct/total always goes down

#

reset total at the same time as correct.
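A framework-agnostic sketch of the fixed bookkeeping: both counters reset at the start of every epoch, and `total` counts samples rather than batches. `run_epochs` and the (predictions, labels) batch format are hypothetical; in a real PyTorch or Keras loop the predictions would come from an argmax or a threshold on the model output.

```python
def run_epochs(epochs):
    """epochs: list of epochs; each epoch is a list of (predictions, labels) batches."""
    history = []
    for batches in epochs:
        correct = 0  # reset BOTH counters at the start of every epoch
        total = 0
        for preds, labels in batches:
            correct += sum(int(p == y) for p, y in zip(preds, labels))
            total += len(labels)  # count samples, not batches
        history.append(100.0 * correct / total)  # percentage accuracy for this epoch
    return history

history = run_epochs([
    [([1, 0, 1], [1, 1, 1]), ([0], [0])],  # epoch 1: 3 of 4 correct
    [([1, 1], [1, 1])],                    # epoch 2: 2 of 2 correct
])
print(history)
```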

grand canyon
#

alright, thank you

#

i just tried that

#

will update you on the results

#

thanks so much it worked!

#

@tidal bough

strong sedge
#
```
525/525 [==============================] - 1s 2ms/step - loss: 255.0017 - binary_accuracy: 0.7077 - recall: 0.1676 - precision: 0.2579 - val_loss: 8.5808 - val_binary_accuracy: 0.7693 - val_recall: 0.0252 - val_precision: 0.3750
Epoch 2/10
525/525 [==============================] - 1s 1ms/step - loss: 2.9742 - binary_accuracy: 0.7034 - recall: 0.1598 - precision: 0.2445 - val_loss: 4.5400 - val_binary_accuracy: 0.7710 - val_recall: 0.0031 - val_precision: 0.2000
Epoch 3/10
525/525 [==============================] - 1s 1ms/step - loss: 1.5921 - binary_accuracy: 0.7202 - recall: 0.1266 - precision: 0.2474 - val_loss: 3.2762 - val_binary_accuracy: 0.7712 - val_recall: 0.0031 - val_precision: 0.2143
Epoch 4/10
525/525 [==============================] - 1s 1ms/step - loss: 1.2256 - binary_accuracy: 0.7231 - recall: 0.1167 - precision: 0.2437 - val_loss: 3.4967 - val_binary_accuracy: 0.7717 - val_recall: 0.0021 - val_precision: 0.2000
Epoch 5/10
525/525 [==============================] - 1s 1ms/step - loss: 0.8427 - binary_accuracy: 0.7364 - recall: 0.0787 - precision: 0.2295 - val_loss: 2.7257 - val_binary_accuracy: 0.7726 - val_recall: 0.0010 - val_precision: 0.2500
Epoch 6/10
525/525 [==============================] - 1s 1ms/step - loss: 0.9427 - binary_accuracy: 0.7357 - recall: 0.0618 - precision: 0.1978 - val_loss: 2.3539 - val_binary_accuracy: 0.7726 - val_recall: 0.0010 - val_precision: 0.2500
Epoch 7/10
525/525 [==============================] - 1s 1ms/step - loss: 0.9245 - binary_accuracy: 0.7327 - recall: 0.0546 - precision: 0.1754 - val_loss: 3.5504 - val_binary_accuracy: 0.7724 - val_recall: 0.0010 - val_precision: 0.2000
Epoch 8/10
525/525 [==============================] - 1s 2ms/step - loss: 0.8026 - binary_accuracy: 0.7390 - recall: 0.0434 - precision: 0.1663 - val_loss: 2.5565 - val_binary_accuracy: 0.7724 - val_recall: 0.0010 - val_precision: 0.2000
Epoch 9/10
```

my model is getting worse with each epoch ?

#

model structure vvv

```py
from tensorflow import keras

# `inputs` is a keras.Input(...) layer defined earlier (not shown)
layer1 = keras.layers.Dense(15, activation='relu')(inputs)
layer2 = keras.layers.Dense(15, activation='relu')(layer1)
layer3 = keras.layers.Dense(7, activation='relu')(layer2)
outputs = keras.layers.Dense(1, activation='sigmoid')(layer3)
model = keras.Model(inputs, outputs)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss=keras.losses.binary_crossentropy,
    metrics=[keras.metrics.binary_accuracy, keras.metrics.Recall(), keras.metrics.Precision()],
)
```
tidal bough
#

my model is getting worse with each epoch ?
What makes you think that? Loss is dropping and accuracy is rising.

strong sedge
#

nvm recall is not accuracy

strong sedge
#

but why is it going down so slow ?

tidal bough
#

oh, hmm, that's some weird recall and precision though; these shouldn't be dropping. In fact, I'm not sure how they can be dropping while accuracy is rising.

strong sedge
#

also the model is performing very very bad on test data

strong sedge
#

I just cant understand binary classification at all :(

#

all my binary classifiers don't work at all

hasty grail
#

This is probably a class imbalance problem

strong sedge
hasty grail
#

if the model sees 99% false and 1% true examples, it is reasonable to expect that the model will try to minimize its loss by simply predicting false each time

hasty grail
#

ideally, yes, the classes would be balanced as such

#

but that is rarely the case in practice

strong sedge
#

I should probably remove some of the 0 cases right ?

hasty grail
#

that would be one way to mitigate the problem

strong sedge
hasty grail
#

you can also try weighting the samples so that the rarer samples are worth more

hasty grail
#

scikit-learn has a utility function to do that

#

but tbh you have like a 75%/25% distribution which isn't really all that bad

#

perhaps there is simply not enough information for the model to distinguish properly, so it falls back to predicting the majority class
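For the sample-weighting route: scikit-learn's `compute_class_weight(class_weight="balanced", ...)` (in `sklearn.utils.class_weight`) uses the heuristic `n_samples / (n_classes * count_per_class)`. A dependency-free sketch of the same formula (`balanced_class_weights` is a hypothetical name), whose result can be passed to Keras via `model.fit(..., class_weight=...)`:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Map each class to n_samples / (n_classes * count), the 'balanced' heuristic."""
    counts = Counter(int(y) for y in labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# A 75% / 25% split like the one discussed above
weights = balanced_class_weights([0] * 75 + [1] * 25)
print(weights)  # the rarer class 1 ends up with 3x the weight of class 0
```

Usage would then be something like `model.fit(X_train, y_train, class_weight=balanced_class_weights(y_train))`.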

strong sedge
#

yeah, I'm going to look for other people's approaches to the same dataset

tacit basin
#

Hugging Face has many different open-source products:

  • models: the HF Hub is for models what GitHub is for code
  • datasets: a place for datasets
  • spaces: a place for interactive demos
  • many open-source libraries, like transformers

I think it's good to have a choice?
tacit horizon
#

hello guys, I have a dataset of energy usage with a datetime and an energy consumption column. Is there an algorithm I should choose to detect outliers?

tacit horizon
#

yes, but my dataset contains a lot of zeros. Since the machine isn't running 24/7, I think they may skew the result

young granite
#

you can drop that data as well

#

for value != 0:

#

got it?

tacit horizon
#

got it thx
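One simple baseline along those lines: drop the zero readings (machine off), then flag values outside the interquartile-range (Tukey) fences. A minimal sketch using only the standard library; `iqr_outliers` and `k=1.5` are illustrative choices, not something prescribed in the thread:

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the non-zero readings outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    active = [v for v in values if v != 0]  # drop idle periods before computing quartiles
    q1, _, q3 = statistics.quantiles(active, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in active if v < low or v > high]

readings = [0, 0, 10, 11, 12, 10, 11, 12, 11, 10, 95, 0]
print(iqr_outliers(readings))
```

With pandas, the same filtering is `df.loc[df["consumption"] != 0, "consumption"]` (column name hypothetical) followed by `.quantile([0.25, 0.75])`; pandas interpolates quartiles slightly differently, so the fences may not match exactly.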

young granite
#

👍

#

Guys, who is doing or planning a data science project and would like to tackle it in pairs?
I'm currently looking for someone with similar knowledge to do such a project with, so we can both improve.

fiery dust
#

is algebra 2 needed to learn AI and be good at it?

#

question goes for calculus 2 also

wooden sail
#

the definitions of algebra 2 and calculus 2 depend on your school/uni

#

a base level of linear algebra and multivar calculus is needed to understand the basics of gradient descent and back propagation

#

it also depends on what you mean by "be good at it"

#

you can overcome many problems by simply being familiar with them and knowing a lot about the tools you use, but ofc you have more flexibility the more in depth knowledge you have (i.e. the more you know about the maths)
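As a small illustration of where that maths shows up: fitting a line by gradient descent needs the partial derivatives of the loss (calculus), and with many features the same sums turn into matrix-vector products (linear algebra). A minimal sketch; `fit_linear` and its learning rate are chosen for this toy data only:

```python
def fit_linear(xs, ys, lr=0.05, steps=2000):
    """Fit y ~ w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of L(w, b) = (1/n) * sum((w*x + b - y)**2)
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2 * sum(residuals) / n
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = fit_linear([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])  # data generated by y = 2x + 1
print(round(w, 3), round(b, 3))
```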

unique flame
#

Pretty sure I've already seen them ask this question. So I would advise taking a break from it if it gets too intense, if that was the hidden intent of the question.

wooden sail
#

sure, take a break if it's burning you out. maybe look at applications instead. but also, i won't sugarcoat it: AI/ML does not require math, it IS math, and so you can't avoid it. you'll have to learn some of it sooner or later

errant lake
full galleon
#

Hey guys

#

How r u doing?

#

I was wondering which system is used the most in AI ?

#

Windows, Mac or Linux?

#

And which of them is better?

wooden sail
#

probably linux, but it doesn't matter all that much from the user perspective. this is because compute clusters usually run linux, and users submit tasks to the clusters. in that sense, it doesn't really matter all that much what you use on your own computer

full galleon
#

I was watching some videos on YouTube, and all the companies were using Macs

#

it just made me think of this question

#

I'm working on Windows

#

i want to migrate to one of linux or mac!

#

which one do u prefer?

wooden sail
#

i most often work with numpy (which runs on anything) and jax (which only runs well on linux), so i dualboot windows and linux. if you're already on windows, WSL is probably the nicest way of getting the best of windows and linux together

#

the cluster at the uni where i work runs centos 7, so i use linux to do small local tests of my code before submitting them there

full galleon
#

nice 👍

#

thanks for your guidance 🙏 🙏

shell crest
#

Linux is doable in 2022 and onwards, but unless you're in the habit of reading documentation, Mac/Windows is the usual choice

#

if you plan to use Linux, make sure your use case 'just works' (e.g. on an Ubuntu-like distro), unless you're willing to make it work yourself (whatever distro you choose + your own configuration)

full galleon
#

aha

wooden sail
#

that's partly why i recommend using WSL instead of directly using linux 😛

shell crest
# full galleon And which of them is better?

Performance-wise, I think a well-configured Linux is actually optimal. But I wouldn't assume it's even possible to reach that configuration without serious effort (also, choosing your OS for performance is micro-optimisation, which is not recommended)

full galleon
#

can we have a mac/linux dualboot?

shell crest
#

Not really on the newer ones...I think?

wooden sail
#

mac is already a unix-like, you may as well stick to just mac

#

and yeah, to my knowledge only asahi runs on m1/m2, and only so-so at that

shell crest
#

I would expect Mac compatibility to improve over time as M1/M2 silicon matures and the Mac grows its market share

full galleon
#

is Linux Unix-based too?

wooden sail
#

certainly

#

most operating systems other than windows are 😛

full galleon
#

so I assume there won't be much difference between Mac and Linux!?

wooden sail
#

there will be a huge difference

full galleon
#

how?

wooden sail
#

if you only use the terminal, you'll find them quite similar, though with moderate differences in the file structure

#

if you use the desktop environment, mac is gonna be much more comfortable

#

the linux desktop experience is kinda... rough, let's say

full galleon
#

im not talking about the desktop usage

#

i want to know if they differ in programming or not!

full galleon
wooden sail
#

no, there's no difference in the programming at all if the language is portable, as python is

shell crest
#

the more you rely on hardware, the less likely programming will be 'the same'

wooden sail
#

other than a few libraries whose functionality depends on the platform, coding python is the same anywhere

shell crest
#

well the code is the same, but whether your Mac will run it depends on compatibility

#

but again this compatibility is being improved all the time

wooden sail
#

stuff like numpy, tensorflow, pytorch and jax will behave differently on win, mac and linux (and also depending on the version of each)

shell crest
#

and compiled-for-mac things probably will pop up more and more

full galleon
#

ok guys

#

really appreciate your guides

#

thanks for all your answers 👍

#

good luck everybody ✌️

fiery dust
#

so if I use MIT courses to learn math before learning AI, I'll be ok?

wooden sail
#

sure. idk which other ones people recommend, but I'd say Gilbert Strang's linear algebra for machine learning is pretty good

steady basalt
#

Finding a data science job and taking interviews and technical interviews while working full time is proving interesting when there's little cell signal outside the office 😂

#

Do you guys just take days off every month?

full galleon
#

have any of u guys studied at MIT?

earnest widget
#

Is it possible to use pretrained weights on another custom model? I have trained one with VGG16 and I have another model made from scratch.

wooden sail
#

you mean like transfer learning?

earnest widget
#

Yeah transfer learning but not with the pretrained weights which is imagenet in most other models.

#

I want to use the weights from my vgg16 model after training and use it on the custom model I made. Is that possible?

iron basalt
#

Some only have Linux instructions / support.

#

If you are not running your models on your own machine(s) then it does not matter anyhow, the servers will be using Linux with or without you knowing.

#

If you plan on running on (relatively) small devices like a Raspberry PI then you have to use Linux anyhow (or no OS in the case of smaller devices than that).

wooden sail
earnest widget
vestal saffron
#

hi, not sure where to ask this, so I'm asking here. I'm struggling to understand the difference between Apache Beam and Apache Spark (specifically Google Cloud's Dataflow vs Dataproc). Could someone help me out here?

haughty topaz
#

Does anyone know what could be going wrong here? I'm really struggling with Kaleido in Jupyter notebooks

plush jungle
#

does anyone know why a GAN would be producing blurry images?

#

is it more likely to do with the dataset or the architecture/hyperparameters?

#

after 270 epochs it still looks like this

sand sandal
#

can you compare the log likelihoods of two different distributions? So between a model using a normal distribution and a model using a t-distribution.

#

I know this is more of a stats question than python, but I figured it was worth an ask here

serene scaffold
sand sandal
unique flame
dusty valve
#

is a loss of 2.94 okay? my model started off with a loss of 9.1

serene scaffold
dusty valve
#

in evaluation, it's closer to 35

serene scaffold
dusty valve
serene scaffold
dusty valve
#

oh wow i just wrote a much better model

serene scaffold
serene scaffold
#

You can just ask here. Please do not offer money again. This is a warning

lost nimbus
#

Copy that,

serene scaffold
#

!rule 9

arctic wedgeBOT
#

9. Do not offer or ask for paid work of any kind.

serene scaffold
#

The first step to asking a pandas question is giving a copy-and-pastable sample of the dataframe with print(df.head().to_dict('list')). If it's not text, it's useless.

#

@lost nimbus please let me know when you have done that.

lost nimbus
#

Done that

serene scaffold
#

Okay. Can I see it?

lost nimbus
#

Do you want me to post here or pm?

serene scaffold
#

here.

lost nimbus
#

```py
{'id': [1, 2, 3, 4, 5], 'fnvalid': [1, 1, 1, 1, 1], 'first_name': ['Lenard', 'Siusan', 'Felipa', 'Morey', 'Jedd'], 'lnvalid': [1, 0, 0, 0, 0], 'last_name': ['Padgham', 'Barhems', 'Figures', 'Barnwell', 'Longmore'], 'email': ['lpadgham0@cdbaby.com', 'sbarhems1@lycos.com', 'ffigures2@wikimedia.org', 'mbarnwell3@exblog.jp', 'jlongmore4@taobao.com'], 'genderval': [1, 1, 1, 1, 1], 'gender': ['Male', 'Female', 'Female', 'Male', 'Male'], 'ip_address': ['240.189.125.212', '27.218.82.162', '227.219.128.88', '39.204.201.15', '24.124.86.219']}
```

serene scaffold
#

Great. What do you want to do to it?

#

And thanks for giving the sample. Next time you have a pandas question, remember to give the sample in your first message about the question, so that no one has to waste time asking.

lost nimbus
#

I want to select values from specific columns based on other columns. For instance, I want to take first_name based on fnvalid, and last_name based on lnvalid: if the flag is 1, keep the value; if it is 0, put NaN (or N/A) instead.

serene scaffold
#

or is it just that you want lnvalid == 1 and genderval == 1?

#

Consider this:

```py
In [3]: df.loc[df['fnvalid'] == 1, 'first_name']
Out[3]:
0    Lenard
1    Siusan
2    Felipa
3     Morey
4      Jedd
Name: first_name, dtype: object
```
#

See if you can figure out how to use df.loc[ ] to do your other queries.

lost nimbus
#

For this example, the code would make a df containing:

| id | First | Last |
|----|-------|------|
| 1 | Lenard | Padgham |
| 2 | Siusan | NaN |
| 3 | Felipa | NaN |
| 4 | Morey | NaN |
| 5 | Jedd | NaN |

serene scaffold
#

df.loc[ ] can take two arguments. the first is a row indexer, and the second is a column indexer. the column indexer is optional.

#

in df.loc[df['fnvalid'] == 1, 'first_name'], the row indexer is df['fnvalid'] == 1, which means "select rows where fnvalid == 1"

#

the column indexer is simply the name of the column to select.
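Putting the row and column indexers together on the sample posted above. Note that `.loc` with a boolean row indexer drops the non-matching rows; to get the NaN-filled table asked for earlier, `Series.where` (which keeps every row and puts NaN where the condition is False) is a closer fit. A sketch using only the relevant sample columns:

```python
import pandas as pd

# Relevant columns from the sample dataframe posted above
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "fnvalid": [1, 1, 1, 1, 1],
    "first_name": ["Lenard", "Siusan", "Felipa", "Morey", "Jedd"],
    "lnvalid": [1, 0, 0, 0, 0],
    "last_name": ["Padgham", "Barhems", "Figures", "Barnwell", "Longmore"],
})

# .loc[row_indexer, column_indexer]: keep only rows where lnvalid == 1, then pick one column
valid_last = df.loc[df["lnvalid"] == 1, "last_name"]

# Series.where keeps all rows and replaces values whose condition is False with NaN
result = (
    df.set_index("id")
    .assign(
        First=lambda d: d["first_name"].where(d["fnvalid"] == 1),
        Last=lambda d: d["last_name"].where(d["lnvalid"] == 1),
    )[["First", "Last"]]
)
print(result)
```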

#

@lost nimbus make sense?

lost nimbus
#

Yes it does, I will play with that! Thank you very much

serene scaffold
lost nimbus
#

I've been searching for the solution for a bit and have just been getting frustrated haha

serene scaffold
lost nimbus
#

Now, is there a way to select multiple columns in a single df.loc call?

#

Like df.loc[df['fnvalid', 'firstname']['lnvalid', 'lastname'] == 1:

#

@serene scaffold

serene scaffold
#
```py
df.loc[(df['fnvalid'] == 1) & (df['lnvalid'] == 1), ['first_name', 'last_name']]
```
#

though you should probably make the valid columns bools

#

once they are, you can just do df.loc[df['fnvalid'] & df['lnvalid'], ['first_name', 'last_name']]
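Why the conversion matters: with 0/1 integer flags, `df['fnvalid'] & df['lnvalid']` is itself an integer Series, which `.loc` would treat as row labels rather than as a mask. A short sketch of converting the flags first, on the same sample columns:

```python
import pandas as pd

df = pd.DataFrame({
    "fnvalid": [1, 1, 1, 1, 1],
    "first_name": ["Lenard", "Siusan", "Felipa", "Morey", "Jedd"],
    "lnvalid": [1, 0, 0, 0, 0],
    "last_name": ["Padgham", "Barhems", "Figures", "Barnwell", "Longmore"],
})

# Turn the 0/1 flags into real booleans so they combine with & into a boolean mask
flags = ["fnvalid", "lnvalid"]
df[flags] = df[flags].astype(bool)

both_valid = df.loc[df["fnvalid"] & df["lnvalid"], ["first_name", "last_name"]]
print(both_valid)
```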

lost nimbus
#

Perfect, thank you again

serene scaffold
tacit horizon
#

hello guys, I have a question. We scale the data to the range 0 to 1 before training the network. If I use an sklearn scaler on my training data and then train the model, how can I use the model to predict new data?

#

for example, I use 3000 samples to train the network and would like to predict my newest data, but the new data is not scaled.

wooden sail
#

then it won't really work 😛 the data needs to look like the training data
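The usual fix is to fit the scaler on the training data only, keep that fitted scaler, and apply the same transformation to any new data before calling the model. In scikit-learn that is `scaler.fit(X_train)` followed by `scaler.transform(X_new)` (save the fitted object, e.g. with `joblib.dump`, if prediction happens in another session). A dependency-free sketch of the same idea; `MinMaxScaler01` is hypothetical, not the sklearn class:

```python
class MinMaxScaler01:
    """Hypothetical minimal min-max scaler illustrating the fit/transform split."""

    def fit(self, rows):
        cols = list(zip(*rows))
        # Remember per-column min and max from the TRAINING data only
        self.mins = [min(c) for c in cols]
        self.maxs = [max(c) for c in cols]
        return self

    def transform(self, rows):
        # Reuse the stored training min/max for any data, old or new
        return [
            [(v - lo) / (hi - lo) if hi != lo else 0.0 for v, lo, hi in zip(row, self.mins, self.maxs)]
            for row in rows
        ]

train = [[0, 10], [5, 20], [10, 30]]
scaler = MinMaxScaler01().fit(train)
print(scaler.transform([[5, 25]]))  # new data scaled with the training statistics
```

Values outside the training range map outside [0, 1]; sklearn's `MinMaxScaler` behaves the same by default (`clip=False`).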

small ferry
#

Does anyone have resources for GANs, mostly for generating realistic images based on some input.

dusty valve
small ferry
#

@dusty valve hm?

dusty valve
#

Check midjourney and dalle docs/code

#

If they exist

small ferry
#

i dont think DALL-E has docs for its implementation

#

neither does Midjourney

serene scaffold
small ferry
#

I mean do you know any resources I can start with

terse jackal
#

print(df.head().to_dict('list'))

serene scaffold
terse jackal
serene scaffold
terse jackal
#

bro

serene scaffold
#

the point is that I can't help you until I know what the schema of the dataframe is.

terse jackal
#

my problem is different

#

i need to explain it

serene scaffold
#

okay...

terse jackal
#

I don't have permission to speak

#

how do i explain it

serene scaffold
#

you have to use a text channel.

terse jackal
#

ok , so i have a column with various outputs

#

now, certain values begin with the exact same words

#

I need to find all the ones that begin with the same words and replace them all with one value

#

how do i do it using pandas

serene scaffold
#

I can answer this once you give an example of the column in question as text.

terse jackal
#

suppose my column is [ ab123,ab343,ab77,ab6768621]

#

column is [ ab123,ab343,ab77,ab6768621 ,as123,fd678]

#

now I want to replace the ones that begin with ab with YES

#

means the output will be [YES , YES , YES , YES ,as123,fd678]

#

this is just one column in a dataframe

serene scaffold
#

!docs pandas.Series.str.startswith

arctic wedgeBOT
#

`Series.str.startswith(pat, na=None)`
Test if the start of each string element matches a pattern.

Equivalent to [`str.startswith()`](https://docs.python.org/3/library/stdtypes.html#str.startswith "(in Python v3.10)").
serene scaffold
#

you can use this with .loc

terse jackal
#

do i have to make a function for this

serene scaffold
#

up to you.

terse jackal
#

ok

#

so if i write it this way

#

df.loc [Series.str.startswith(pat, na=None)]

#

will it work

serene scaffold
#

no

terse jackal
#

and how will i replace it with yes

serene scaffold
#

!docs pandas.DataFrame.loc

arctic wedgeBOT
#

`property DataFrame.loc`
Access a group of rows and columns by label(s) or a boolean array.

`.loc[]` is primarily label based, but may also be used with a boolean array.

Allowed inputs are:
serene scaffold
#
```py
>>> df
            max_speed  shield
cobra              30      10
viper              30      50
sidewinder         30      50

>>> df.loc[df['shield'] > 35] = 0
>>> df
            max_speed  shield
cobra              30      10
viper               0       0
sidewinder          0       0
```
#

what you're doing is similar. but you're using the startswith method instead of a comparison.

terse jackal
#

thank you

terse jackal
#

'Series' object has no attribute 'startswith'

#

what does this error mean?

small ferry
#

how are you implementing it

#

show the code

terse jackal
#

df.loc[df['Remarks'].startswith('?)]')] = "xxx"

serene scaffold
terse jackal
#

in the Remarks column, some sentences start with "?)]"

#

I want to replace all such sentences with "XXX"

serene scaffold
#
```py
df.loc[df['Remarks'].startswith('?)]'), 'Remarks'] = "xxx"
```
#

you also need a column indexer. which I didn't tell you about, so I'm giving it to you for free.

terse jackal
#

bro it is showing the same issue

serene scaffold
#

you're missing the .str.

terse jackal
#

where do I put it?

serene scaffold
#

I would appreciate it if you didn't call me "bro". It has a negative connotation for me.

#

df['Remarks'].str.startswith('?)]')

terse jackal
#

actually i am unaware of the rules

serene scaffold
terse jackal
#

ok

#

the code has run

#

but no change in the csv file

serene scaffold
terse jackal
#

i ran all the lines from beginning

arctic wedgeBOT
#

Hey @terse jackal!

It looks like you tried to attach file type(s) that we do not allow (.xlsx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.

Feel free to ask in #community-meta if you think this is a mistake.

terse jackal
#

i want to send an xlsx file

#

how do i do that

dusty valve
#

Maybe save as pdf?

serene scaffold
#

we don't allow that, so you'll have to send it as a CSV. But I don't think I have time to help with this.

terse jackal
#

ok

#

anybody else?

serene scaffold
#

you can wait here and see who else arrives.

small ferry
#

the changes have been made in the dataframe for now

#

you need to save that dataframe to see the changes

terse jackal
#

can you plzz tell me the code

velvet turtle
#

I'm having a lot of problems with data preprocessing. Can anyone recommend some good reading material or YouTube videos on it?

small ferry
#

As Stelercus told you, you simply have to save your dataframe to see your changes

#

If you want to save it as a CSV

#

Otherwise for excel
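Putting the pieces from this thread together: a boolean mask from `.str.startswith`, a column indexer in `.loc`, and a write back to disk, since the edit otherwise only exists in memory. A sketch; `replace_prefixed`, the sample rows, and the output filename are hypothetical, and `df.to_excel(...)` works the same way for .xlsx output but needs an Excel engine such as openpyxl installed:

```python
import pandas as pd

def replace_prefixed(df, column, prefix, replacement):
    """Replace every value in `column` that starts with `prefix` (a literal string, not a regex)."""
    mask = df[column].str.startswith(prefix, na=False)  # na=False: missing values count as "no match"
    df.loc[mask, column] = replacement                   # row indexer = mask, column indexer = column name
    return df

df = pd.DataFrame({"Remarks": ["?)] garbled note", "fine", "?)] another", None]})
replace_prefixed(df, "Remarks", "?)]", "xxx")

# The change only lives in memory until the dataframe is written back to a file
df.to_csv("remarks_cleaned.csv", index=False)
```

The earlier "ab" example would be `replace_prefixed(df, "col", "ab", "YES")` with the real column name.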

quiet kayak
#

Is network science used in machine learning/data science?

hardy kernel
#

I don't see an FAQ in the pins :( so I'll just ask what I wanted to know. It's probably asked very frequently, so forgive the repetition.

```
[
  {"number": 1, "name": "first name", "location": "first location"},
  {"number": 2, "name": "second name", "location": "second location"},
  {"number": 3, "name": "third name", "location": "third location"},
  ... and so on
]
```

I want to find rows that satisfy a given condition

for example this is what I've been using

```py
testdf.loc[
    (testdf['number'] == 115)
    |
    (testdf['name'] == 'NamE_I_neED-toFiND')
]
```

But I want the string matching in the line (testdf['name'] == 'NamE_I_neED-toFiND') to go through a function that removes all spaces and underscores. Is that even possible? If not, are there any alternatives?

And if I'm asking this question in the wrong channel, what would be a more appropriate one? Thanks again.

wooden sail
hardy kernel
#

I see, lemme try to read up some documentation before asking again. Thanks for your input! I might bother you again though 😄

wooden sail
#

maybe also maketrans() and translate() could help, i've never used those before though

hardy kernel
#

aight ty ty lemme take a look at them

serene scaffold
hardy kernel
hardy kernel
serene scaffold
hardy kernel
#

o I see

serene scaffold
#

so you could have just done testdf['name'] = testdf['name'].apply(normalizeString)

#

but usually, you want to use whichever solution doesn't involve .apply

hardy kernel
#

is it slow?

serene scaffold
#

it is

#

it's the same as writing a for loop that calls the function iteratively

hardy kernel
#

ah I plan on making this dataframe once and using it many times over

serene scaffold
#

that's fine. just for future reference 😄

hardy kernel
#

awesome

#

I've used C and C++ for most of my life, so this is completely new to me

#

thanks for your help though, appreciate it a lot

serene scaffold
#

hope you enjoy not-pointers

hardy kernel
#

I miss them 😔

serene scaffold
#

why

hardy kernel
#

years of using pointers, I feel empty without them

serene scaffold
#

better than feeling empty all the time 😄

hardy kernel
serene scaffold
#

it will be okay.

#

if you like functional languages, you can write pandas in a functional style.

hardy kernel
#

can you direct me to sources where I can learn more about this?

serene scaffold
#

there's this: https://www.kaggle.com/learn/pandas

but when writing pandas code, it's more efficient to have lots of chained method calls, without having lots of variables for intermediate states.

hardy kernel
#

what's the reason behind that? does the compiler do some optimizations when you chain calls?

serene scaffold
#

no. it's only more efficient for memory.
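A small illustration of the chained style, on made-up data shaped like the earlier example (the column names and steps are illustrative only). Each step returns a new object, and because nothing else refers to the intermediate results, they can be garbage-collected once the next step has run:

```python
import pandas as pd

raw = pd.DataFrame({
    "name": ["first_name", "second name", "third_name"],
    "number": [1, 2, 115],
})

# One chained expression instead of several named intermediate dataframes
result = (
    raw
    .assign(name=lambda d: d["name"].str.replace(r"[ _]", "", regex=True))  # strip spaces/underscores
    .query("number > 1")                                                    # filter rows
    .sort_values("number", ascending=False)                                 # largest number first
    .reset_index(drop=True)
)
print(result)
```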

wooden sail
#

i guess i should've specified, since there is str.replace/translate and Series.str.replace/translate. you wanna use the Series one, which is pandas built-in, because it does the iteration for you in C
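Applied to the original question, the vectorized route could look like this: normalize the column once with `Series.str.replace`, normalize the search string with plain `str.replace`, then compare inside the same `.loc` filter. A sketch on made-up rows; `normalize` is a hypothetical helper:

```python
import pandas as pd

testdf = pd.DataFrame({
    "number": [1, 2, 115],
    "name": ["first name", "NamE_I_neED-toFiND", "third name"],
})

def normalize(s):
    """Remove spaces and underscores from a single Python string."""
    return s.replace(" ", "").replace("_", "")

# Series.str.replace applies the pattern to every element, no explicit Python loop or .apply needed
normalized = testdf["name"].str.replace(r"[ _]", "", regex=True)

query = normalize("NamE I_neED-toFiND")
matches = testdf.loc[(testdf["number"] == 115) | (normalized == query)]
print(matches)
```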

hardy kernel
#

:0

serene scaffold
#

though if you use dask (which is similar to pandas but isn't eagerly executed), that will optimize the computation graph.

hardy kernel
#

I see, that's good to know

serene scaffold
#

since Python is interpreted, there's limited room for optimization.

hardy kernel
#

also my first time joining a discord server to ask a programming question. You guys have been very kind to me and I appreciate it a lot ❤️

serene scaffold
#

No problem!

wooden sail
#

you had a fairly concise and well-written question, which always helps

steady basalt
hardy kernel
wooden sail
#

yeah stackoverflow lives up to the memes it spawned. the lesson you learned is valuable, harshness notwithstanding 😛

serene scaffold
#

I think the problem is

  1. SO is gamified, and questions that make it harder to get points spoil the game for the answerers
  2. The point of SO is to create a catalogue of questions and answers. Helping the asker is actually of secondary concern.
hardy kernel
lapis sequoia
#

Sterlclus

#

You removed rainbow pfp 🥲

spare goblet
#

ea

frosty raft
#

does anyone know how to use hilbertcurve?
I have an array full of True and False values, and I know the parameters for it (n=2, p=16), but I don't know how to get the visualization

analog kestrel
#

Does anyone here have experience using spaCy and Jupyter with the Apple M1 chip? I can install it, but every time I try to import it the kernel dies.

cloud sand
lavish crypt
#

Hello! I can use Python at a fairly good level and have a good command of NumPy, Pandas, matplotlib etc. libraries. But I don't know much about scikit-learn. I am looking for a tutorial that will not tell me everything from the beginning, but will give me the best understanding. I'm open to your suggestions and experiences, thank you!

plain copper
#

Hello! Can anyone tell me why the Anaconda download isn't starting on my PC?

young granite
cloud sand
young granite
cloud sand
# young granite do u got a good project to start with tensorflow/pytorch tho? I bought books for...

hhhmmm, well, if this is your very first project, I'd go with classifying flowers by numerical features (Iris dataset) or predicting house prices by location (Boston house prices dataset), just to get used to your chosen stack's notation etc. Then I'd move on to some image classification on CIFAR or ImageNet. After that, you should pretty much have all the basics to learn other stuff on your own (without jumping straight into implementing a diffusion model from scratch or other rather big stuff, obviously)

steady basalt
steady basalt
young granite
#

so I wouldn't recommend it

steady basalt
#

I don't use Anaconda either, but conda's good

#

Do u use pip for everything

young granite
steady basalt
#

U can create an env with pip?

young granite
#

but I'm not advanced in using it, so I haven't hit any major walls so far

young granite
steady basalt
#

So u create an env every new project?

young granite
#

yes

steady basalt
#

Bruh, I couldn't

#

I just have one env with my main installs and run through most times

young granite
#

after I broke my whole workflow in conda, I'm going the safe way

steady basalt
#

I wonder how that can occur

young granite
#

i dunno

#

was real pain

steady basalt
#

Problem is some projects require like 20 libraries installed

#

It's a pain to do that multiple times
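Strictly speaking, pip doesn't create environments; the standard-library `venv` module does (`python -m venv .venv` on the command line), and pip then installs into it. Keeping the ~20 dependencies in a `requirements.txt` (for example produced with `pip freeze > requirements.txt`) turns the per-project setup into one command. A sketch of both steps from Python; `make_project_env` is a hypothetical helper:

```python
import subprocess
import sys
import venv
from pathlib import Path

def make_project_env(project_dir, requirements=None):
    """Create <project_dir>/.venv and optionally pip-install a requirements file into it."""
    env_dir = Path(project_dir) / ".venv"
    # Only bootstrap pip when there is something to install
    venv.create(env_dir, with_pip=requirements is not None)
    if requirements is not None:
        # The env's own interpreter lives in Scripts/ on Windows and bin/ elsewhere
        python = env_dir / ("Scripts/python.exe" if sys.platform == "win32" else "bin/python")
        subprocess.run([str(python), "-m", "pip", "install", "-r", str(requirements)], check=True)
    return env_dir
```

Usage might be `make_project_env("my_project", "my_project/requirements.txt")`, once per project.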

young granite
#

yeh tensorflow is a big one 😄

steady basalt
#

Yeah I don't like it at all

young granite
#

im too dumb for it yet

steady basalt
#

It's unreliable as fk

#

PyTorch better

young granite
#

book recommendations

steady basalt
#

None

#

Official documentation

#

And google

#

U don't need a book for tensorflow

#

Or PyTorch

young granite
#

I need some GitHub repos for my first hands-on attempts

#

to check the logic

steady basalt
#

Just use the official website

young granite
# steady basalt Just use the official website
```
---------------------------------------------------------------------------
SystemExit                                Traceback (most recent call last)
Cell In [5], line 22
     18 parser.add_argument('--seed', type=int, default=1, metavar='S',
     19                     help='random seed (default: 1)')
     20 parser.add_argument('--log-interval', type=int, default=10, metavar='N',
     21                     help='how many batches to wait before logging training status')
---> 22 args = parser.parse_args()
```
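A likely cause, assuming this runs in a Jupyter cell: `parse_args()` reads `sys.argv`, which in a notebook holds the kernel's own launch arguments (such as `-f <connection file>`), so argparse reports an error and exits with `SystemExit`. Passing an explicit list avoids that. A sketch reusing two of the arguments from the traceback:

```python
import argparse

parser = argparse.ArgumentParser(description="training settings")
parser.add_argument('--seed', type=int, default=1, metavar='S',
                    help='random seed (default: 1)')
parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                    help='how many batches to wait before logging training status')

# In a notebook, don't let argparse read the kernel's sys.argv: pass the args explicitly
args = parser.parse_args(args=[])  # all defaults; e.g. ['--seed', '42'] to override

# Alternative: parse what argparse recognises and collect the rest, such as a
# notebook's "-f <connection file>" (simulated here with an explicit list)
args2, unknown = parser.parse_known_args(["-f", "kernel.json", "--seed", "7"])
print(args, args2, unknown)
```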
cloud sand
dusty valve
#

i made a pretty accurate model, but it's too large to upload to github so i need to use a less accurate one :(

#

sadge moment

#

still, not bad

solar yew
#

Do any of you host your deployed models online as part of a portfolio if so do you manage it for free or how much does it cost to keep up? Thank you for any insights!!

dusty valve
#

i have this code that attempts to compress all .h5 files
```py
import glob
import gzip

def compress():
    '''Compress data in all .h5 files (in place)'''
    for path in glob.glob('./*.h5'):
        with open(path, 'rb') as f:
            data = gzip.compress(f.read(), compresslevel=9)
        with open(path, 'wb') as f:
            f.write(data)

def decompress():
    '''Decompress data in all .h5 files (in place)'''
    for path in glob.glob('./*.h5'):
        with open(path, 'rb') as f:
            data = gzip.decompress(f.read())
        with open(path, 'wb') as f:
            f.write(data)

print('compressing weights files...')
compress()
# Note: in a notebook kernel (Colab/Jupyter), exit() shuts the kernel down, which then restarts
exit()
```
it doesn't work in Google Colab and crashes it

#

it says in the logs that the kernel restarted

#

not a major problem, i can compress them locally

#

but just weird
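Two things in the snippet above could plausibly trigger the restart: the final `exit()`, which in a Jupyter/Colab kernel asks the kernel itself to shut down rather than just ending the cell, and reading each whole `.h5` file into memory before compressing it. A sketch that avoids both, assuming it is acceptable to write compressed copies next to the originals (the `.h5.gz` naming and the function names are my own choices, not from the original code):

```python
import glob
import gzip
import shutil

def compress_h5(pattern='./*.h5', level=9):
    """Write a gzip-compressed copy <name>.h5.gz of every matching file.

    shutil.copyfileobj streams the data in chunks, so the whole file is never
    held in memory, and the original .h5 file is left untouched.
    """
    written = []
    for path in glob.glob(pattern):
        out_path = path + '.gz'
        with open(path, 'rb') as src, gzip.open(out_path, 'wb', compresslevel=level) as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written

def decompress_h5(pattern='./*.h5.gz'):
    """Restore each <name>.h5.gz to <name>.h5, again streaming in chunks."""
    restored = []
    for path in glob.glob(pattern):
        out_path = path[:-len('.gz')]
        with gzip.open(path, 'rb') as src, open(out_path, 'wb') as dst:
            shutil.copyfileobj(src, dst)
        restored.append(out_path)
    return restored
```

In Colab you would call `compress_h5()` and simply let the cell finish, with no `exit()` afterwards; `decompress_h5()` restores the files before loading the weights.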

unique flame
#

I get that too when iterating over a dataset loaded with Keras `dataset_from_directory`.
I would use
`for x, y in dataset:`
and then Colab would stop after a while, saying I had used up all the system memory and telling me to try upgrading to Colab+

steady basalt
#

How are we supposed to decode this

junior sequoia
#

hey should I use pytorch or tensorflow?

steady basalt
#

@wooden sail can u succinctly explain the difference between just 'PCA' and SVD?

#

I somehow need to explain this in like 5 lines or less

#

not good enough at maffs to just not try to write everything ever written about them
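In short: SVD is a general factorization of any matrix, X = U Σ Vᵀ, while PCA is a specific analysis: center the data, then find the orthogonal directions of maximum variance. A common way to compute PCA is an SVD of the centered data matrix, where the rows of Vᵀ are the principal axes and σᵢ² / (n - 1) is the variance along axis i. A small NumPy sketch on made-up random data that checks this against the eigenvalues of the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up correlated data: 200 samples, 3 features
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.2]])

# PCA step 1: center each feature (a plain SVD does not do this for you)
Xc = X - X.mean(axis=0)

# PCA step 2: SVD of the centered data, Xc = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                            # principal axes, one per row
explained_variance = S**2 / (len(X) - 1)   # variance along each axis
scores = Xc @ Vt.T                         # data projected onto the axes (equals U * S)

# Cross-check: eigenvalues of the covariance matrix, sorted largest first
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]
print(np.allclose(explained_variance, eigvals))   # True
```

So "PCA vs SVD" is really "analysis vs the linear-algebra tool often used to compute it"; skipping the centering step gives an SVD that is no longer PCA.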

serene scaffold
bold timber
#

Hi guys. I'm learning about GANs with PyTorch and I've run into this problem. How do I fix this error?

junior sequoia
steady basalt
#

Python

serene scaffold
junior sequoia
junior sequoia
serene scaffold
junior sequoia
#

so it's not just for fun

#

so should I use python?

serene scaffold
junior sequoia
odd meteor
# junior sequoia hey should I use pytorch or tensor flow?

It depends on what you are interested in.

Engineering ==> TensorFlow
Academia / Research ==> JAX or PyTorch

The folks at DeepMind use JAX now. JAX is quite an interesting framework, especially if you are already familiar with PyTorch.

At the end of the day, tools are tools, so pick one and don't waste too much time deliberating over which one to learn.

NB: It's good to know at least two deep learning frameworks so you don't become overly dependent on one. However, start by learning one first; you can come back later to learn another (if it interests you).

junior sequoia
odd meteor
junior sequoia
serene scaffold
#

hi @odd meteor 💚

odd meteor
serene scaffold
odd meteor
lost nimbus
#

Hello! I have another issue I need guidance on.
I have two dataframes containing different information. I need to match their rows on their ID columns, which have a different name in each dataframe, and merge one into the other.

serene scaffold
#

What do you need to compare, exactly? And are you sure you don't just want to merge the favorite color column into the larger df?

lost nimbus
#

Hello again @serene scaffold !

#

The specific problem I am having in my bigger project is that the data needs to be matched by specific IDs. I would imagine the index would be fine, but the two dataframes I am matching are different lengths, so there will be some missing data.

#

The goal is to get the Color Pref into the first DF and match the values to the correct ID

serene scaffold
#

if the smaller df is missing an employee id, are you okay with that employee having a NaN value, or do you just not want to include that employee at all?

lost nimbus
#

Still getting an error. I'm fine with it being NaN

#

Is there a way that I can pull only one column? The other dataframe in my project has thousands of columns I don't want to merge

#

Or would this work: grab the ID and the other column I care about, put them in another df, then merge that way?
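Roughly what that looks like, as a sketch on small made-up frames standing in for `test1.csv` and `test2.csv` (column names taken from the thread, data invented): select just the key and the wanted column from the wide frame, then merge with `how='left'` so every row of the first frame is kept and unmatched IDs get NaN.

```python
import pandas as pd

# Hypothetical stand-ins for test1.csv and test2.csv
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'first_name': ['Ann', 'Bo', 'Cy', 'Di']})
df1 = pd.DataFrame({'employeeid': [2, 4, 5],
                    'colorpref': ['red', 'blue', 'green'],
                    'other': ['x', 'y', 'z']})   # stands in for the many unwanted columns

# Keep only the key and the one column you care about before merging
# (with read_csv you could also pass usecols=['employeeid', 'colorpref'])
small = df1[['employeeid', 'colorpref']]

# how='left' keeps every row of df; ids missing from df1 get NaN in colorpref.
# merge returns a new DataFrame, so the result has to be assigned.
merged = df.merge(small, how='left', left_on='id', right_on='employeeid')
merged = merged.drop(columns='employeeid')   # duplicate key column no longer needed
print(merged)
```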

serene scaffold
sacred bear
#

I'm trying to make a text recognizer. Most tutorials use OpenCV and Tesseract, but I'm having issues with Tesseract on Mac. Is there an alternative that I can use?

serene scaffold
#

Please give all the code in that cell as text (no screenshots) for us to continue

#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

lost nimbus
#
```py
import pandas as pd
import numpy as np

df = pd.read_csv("C:/Users/jonah/OneDrive/Documents/test1.csv")
df1 = pd.read_csv("C:/Users/jonah/OneDrive/Documents/test2.csv", usecols=['employeeid', 'colorpref'])

df.merge(df1, left_on='id', right_on='employeeid')

print(df)
```
lost nimbus
serene scaffold
#
```py
import pandas as pd
import numpy as np

df = pd.read_csv("C:/Users/jonah/OneDrive/Documents/test1.csv")
df1 = pd.read_csv("C:/Users/jonah/OneDrive/Documents/test2.csv")

print(df.columns)
print(df1.columns)
```

Please run this and give the output as text (no screenshots).

lost nimbus
#

```
Index(['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address',
       'Favorite Color'],
      dtype='object')
Index(['employeeid', 'colorpref'], dtype='object')
```

serene scaffold
#

okay, now do `pd.merge(df, df1, left_on='id', right_on='employeeid')`

#

!docs pandas.merge

arctic wedgeBOT
#

```py
pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
```
Merge DataFrame or named Series objects with a database-style join.

A named Series object is treated as a DataFrame with a single named column.

The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes *will be ignored*. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on. When performing a cross merge, no column specifications to merge on are allowed.

Warning

If both key columns contain rows where the key is a null value, those rows will be matched against each other. This is different from usual SQL join behaviour and can lead to unexpected results.
serene scaffold
#

This won't keep rows that don't have a match in both dataframes. If you want those, you need to include `how='outer'`.

#

"full" is the same as "outer", but for pandas you have to say "outer".
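To see what `how` changes, a small sketch with made-up frames; `indicator=True` adds a `_merge` column recording whether each row matched (`'both'`) or came from only one side (`'left_only'` / `'right_only'`).

```python
import pandas as pd

# Made-up frames: ids 2 and 3 appear in both, 1 only on the left, 4 only on the right
left = pd.DataFrame({'id': [1, 2, 3]})
right = pd.DataFrame({'employeeid': [2, 3, 4],
                      'colorpref': ['red', 'blue', 'green']})

results = {}
for how in ['inner', 'left', 'outer']:
    out = pd.merge(left, right, how=how, left_on='id', right_on='employeeid',
                   indicator=True)
    results[how] = out
    # one label per row: 'both', 'left_only' or 'right_only'
    print(how, len(out), out['_merge'].tolist())
```

`inner` keeps only ids 2 and 3, `left` adds id 1 with NaN for `colorpref`, and `outer` additionally brings in employee 4 with NaN for `id`.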

lost nimbus
#
```py
import pandas as pd
import numpy as np

df = pd.read_csv("C:/Users/jonah/OneDrive/Documents/test1.csv")
df1 = pd.read_csv("C:/Users/jonah/OneDrive/Documents/test2.csv", usecols=['employeeid', 'colorpref'])

# note: merge returns a new DataFrame and leaves df itself unchanged
df.merge(df1, how='outer', left_on='id', right_on='employeeid')

print(df)
```

#

still not getting the expected result, it's not merging

#

Wouldn't let me paste text

serene scaffold
arctic wedgeBOT
#

Pasting large amounts of code

If your code is too long to fit in a codeblock in Discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

lost nimbus
#

!paste

serene scaffold
#

get rid of `usecols={'employeeid','colorpref'}` for now.

serene scaffold
lost nimbus
serene scaffold
#

@lost nimbus drag both CSV files into this chat.

lost nimbus
serene scaffold
scenic tulip
#

So in a statement like "for" or "if", if I try to do something before a "break" statement, why does the break get executed first instead of my statements?

serene scaffold
scenic tulip
#

oh crap this is the wrong channel for this question sorry lol

#

yeah i gotcha sorry

lost nimbus
#

@serene scaffold I wonder why mine wasn't working

#

@serene scaffold Got it to work with your fix, I did the first example, you are a legend

serene scaffold
#

Which you can check with `df.dtypes`
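When the key columns look identical but nothing lines up, a dtype mismatch is a common culprit, e.g. integer IDs in one file and strings in the other (pandas generally refuses to merge `int64` keys with string/object keys and raises a `ValueError`). A sketch with invented data that checks the key dtypes and aligns them with `pd.to_numeric` before merging:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3]})                        # integer keys
df1 = pd.DataFrame({'employeeid': ['1', '3', '7'],          # same IDs, read as strings
                    'colorpref': ['red', 'blue', 'green']})

# Inspect the key columns first: they need matching dtypes
print(df.dtypes['id'], df1.dtypes['employeeid'])

# Convert the string keys to numbers so both sides use the same dtype
df1['employeeid'] = pd.to_numeric(df1['employeeid'])
merged = df.merge(df1, how='left', left_on='id', right_on='employeeid')
print(merged[['id', 'colorpref']])
```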

misty flint
#

i will also accept podcast episodes or conference talks

lost nimbus
#

@serene scaffold @misty flint I'll be your first Patreon sub lmao

runic raft
#

What do people expect out of something like a data science substack?

#

I guess I already feel like I'm perpetually behind just trying to keep up with lucidrains repos and Yannic Kilcher's YouTube

tacit horizon
#

guys help, I created a simple NN to do binary classification and prepared X and y, where y is a dataframe containing only 0s and 1s. However, the predicted results are all lower than 0.07

#

```py
import tensorflow as tf
from sklearn.model_selection import train_test_split

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, input_shape=[4], activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')  # predicted probability of class 1
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=28)

class MyCallback(tf.keras.callbacks.Callback):
    # stop training early once training accuracy passes 95%
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        if logs.get('accuracy', 0) > 0.95:
            self.model.stop_training = True

myCallBack = MyCallback()
model.fit(X_train, y_train, epochs=50, verbose=1,
          validation_data=(X_test, y_test), callbacks=[myCallBack])
```
#

I expect about 3% of the predicted results to be > 0.5

#

Here is a histogram of my predicted results

cloud sand
cloud sand
cloud sand
#

basically your model is going like this: "hhmmmm, well, since 97% of the data is zero, who is going to notice if I cheat a bit and output zero each time, hehe"

#

unfortunately the most straightforward thing you can do is to cut out most of the zero examples, so you have an equal number of zeros and ones. I don't think throwing that data away is going to be much of an issue, though (based on the plot you sent, your dataset is absolutely massive)
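The undersampling suggested above can be sketched in NumPy as follows; the function names are hypothetical, and the toy data is random with roughly 3% positives to mimic the imbalance described. An alternative that keeps all the data is to weight the classes instead: Keras's `model.fit` accepts a `class_weight` dict, and the "balanced" weights n / (n_classes · n_c) computed here are the same heuristic scikit-learn's `compute_class_weight('balanced', ...)` uses.

```python
import numpy as np

def undersample(X, y, seed=0):
    """Randomly drop rows of the larger class so both classes have equal counts."""
    X = np.asarray(X)
    y = np.asarray(y).ravel()            # also accepts a one-column DataFrame
    rng = np.random.default_rng(seed)
    idx0 = np.flatnonzero(y == 0)
    idx1 = np.flatnonzero(y == 1)
    n = min(len(idx0), len(idx1))
    keep = np.concatenate([rng.choice(idx0, n, replace=False),
                           rng.choice(idx1, n, replace=False)])
    rng.shuffle(keep)                    # avoid all zeros followed by all ones
    return X[keep], y[keep]

def balanced_class_weights(y):
    """Weight for class c: n_samples / (n_classes * count_c)."""
    y = np.asarray(y).ravel()
    classes, counts = np.unique(y, return_counts=True)
    return {int(c): len(y) / (len(classes) * k) for c, k in zip(classes, counts)}

# Toy data with roughly 3% positives
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.03).astype(int)

Xb, yb = undersample(X, y)
print(np.bincount(yb))               # equal counts for 0 and 1
print(balanced_class_weights(y))     # could be passed as model.fit(..., class_weight=...)
```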

tacit horizon
#

but 0 and 1 represent two classes

young granite
#

can anyone tell me why the pytorch example results in an error?

#
```
---------------------------------------------------------------------------
SystemExit                                Traceback (most recent call last)
Cell In [1], line 24
     20 parser.add_argument('--render', action='store_true',
     21                     help='render the environment')
     22 parser.add_argument('--log-interval', type=int, default=10, metavar='N',
     23                     help='interval between training status logs (default: 10)')
---> 24 args = parser.parse_args()
     27 env = gym.make('CartPole-v0')
     28 env.seed(args.seed)

File ~\AppData\Local\Programs\Python\Python310\lib\argparse.py:1829, in ArgumentParser.parse_args(self, args, namespace)
   1827 if argv:
   1828     msg = _('unrecognized arguments: %s')
-> 1829     self.error(msg % ' '.join(argv))
   1830 return args

File ~\AppData\Local\Programs\Python\Python310\lib\argparse.py:2583, in ArgumentParser.error(self, message)
   2581 self.print_usage(_sys.stderr)
   2582 args = {'prog': self.prog, 'message': message}
-> 2583 self.exit(2, _('%(prog)s: error: %(message)s\n') % args)

File ~\AppData\Local\Programs\Python\Python310\lib\argparse.py:2570, in ArgumentParser.exit(self, status, message)
   2568 if message:
   2569     self._print_message(message, _sys.stderr)
-> 2570 _sys.exit(status)

SystemExit: 2
```
this is the full traceback i receive
#

pytorch:examples:ac

cloud jetty
#

Hi, I have a dataframe where each row has a list of bigrams and a corresponding "score".
I want to count the total number of times a particular bigram (one element of a bigram list) occurs across all rows with score 5. What's a good way of doing this?

#

Is there a way to do this without manually keeping count? I have a lot of rows with score 5, so it might take minutes if I manually loop over them
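Two loop-free options, sketched on a made-up frame and assuming each bigram is stored as a string such as `'good food'` (tuples work the same way with `Counter`): filter the score-5 rows, then either `explode` the lists into one row per bigram and call `value_counts`, or feed the flattened lists into a single `collections.Counter`.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Hypothetical data: one list of bigrams per row, plus a score
df = pd.DataFrame({
    'score': [5, 3, 5, 5],
    'bigrams': [['good food', 'great service'],
                ['good food'],
                ['good food', 'nice place'],
                ['great service']],
})

five = df.loc[df['score'] == 5, 'bigrams']

# Option 1: pandas, one row per bigram, then count occurrences
counts = five.explode().value_counts()

# Option 2: a single Counter over all lists chained together
counter = Counter(chain.from_iterable(five))

print(counts['good food'], counter['good food'])   # 2 2
```

Both run in vectorized or C-level loops, so they should scale to many rows far better than incrementing a count in a Python `for` loop per row.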

cloud sand
#

that's why you should cut some ones

cloud sand
young granite
cloud sand
#

when you called the command

#

you either passed no arg at all

#

or a wrong one

young granite
cloud sand
#

yea but there is no issue with the code

young granite
#

error occurs in line 24

cloud sand
#

it is working properly, there is no error there

young granite
#

but why do i face an error when i execute?

cloud sand
#

as I said you passed an unknown arg

young granite
#

what do u mean by that?
its my first time using pytorch

#

how do i pass args to it

#

which args does it need

cloud sand
#

no no, sorry, I explained myself wrong, it has nothing to do with pytorch, it's standard python

young granite
#

i think it's not loading the example from pytorch because all the defined arguments are empty, is that what u meant?

cloud sand
#

no, they have a default

#

it works properly, the code is fine, you just gave it something it wasn't expecting

young granite
#

but i did not change anything

#

just straight run the code

cloud sand
#

yea but what does this have to do with your error?

#

the problem is not in the code, but in the args

young granite
#

yeh its line 24

#
```
args = parser.parse_args()
```
cloud sand
#

no it's not

young granite
#

this is making the problems i guess

cloud sand
#

nope

young granite
#

traceback tells me so
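For anyone hitting the same thing: the traceback points at line 24 because that is where argparse stops, but what it rejects is the content of `sys.argv` (in a notebook, the kernel's own flags), not the script itself. A sketch of two ways around it in a notebook, using a hypothetical parser that mirrors the options shown in the traceback (the `--seed` default is assumed): pass the arguments explicitly as a list, or use `parse_known_args`, which hands back unrecognized arguments instead of exiting.

```python
import argparse

# Hypothetical parser mirroring the traceback's options (the --seed default is assumed)
parser = argparse.ArgumentParser(description='actor-critic example (sketch)')
parser.add_argument('--seed', type=int, default=1, metavar='N')
parser.add_argument('--render', action='store_true', help='render the environment')
parser.add_argument('--log-interval', type=int, default=10, metavar='N',
                    help='interval between training status logs (default: 10)')

# Option 1: give parse_args the list you would otherwise type after the script name
args = parser.parse_args(['--seed', '42', '--render'])
print(args.seed, args.render, args.log_interval)   # 42 True 10

# Option 2: parse_known_args() returns unrecognized arguments instead of exiting;
# here a fake kernel flag stands in for what a notebook puts in sys.argv
known, unknown = parser.parse_known_args(['-f', '/tmp/kernel.json', '--seed', '7'])
print(known.seed, unknown)   # 7 ['-f', '/tmp/kernel.json']
```

In a real notebook, option 2 would be called as `parser.parse_known_args()` with no list, so it reads `sys.argv` but ignores the kernel's extra flags.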