#data-science-and-ml
for generating images I usually dont use GAN based stuff
I kinda do a reverse sparse matrix from the original image and parse it through some autoencoder
like double feature extractor
most of the time its convolution then remap and deconvolution
If somebody could help me with what I'm missing here, that would be very helpful.
I have a JSON file
I want to convert it into a df
and I'm using pandas' json_normalize
but the docs say each dict needs common column names as keys so the values can be read
but in my case that's not applicable
what I'm looking for is how I can transfer this into a df
this is desired output for me
please somebody help what I'm missing here
guys if the testing dataset columns are not in same order as training dataset then will it work ?
No - gotta be aligned
@somber prism there's a method to align them
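A minimal sketch of that alignment, assuming both sets are pandas DataFrames that share the same column names (the frames and column names below are made up):

```python
import pandas as pd

# Toy frames; the column names are hypothetical.
train = pd.DataFrame({"age": [30, 41], "income": [50, 72], "score": [0.2, 0.9]})
test = pd.DataFrame({"score": [0.5], "age": [25], "income": [38]})

# Reorder the test columns to match the training order before predicting.
test_aligned = test[train.columns]
```

Indexing with `train.columns` raises a KeyError if a training column is missing from the test set, which is usually what you want to find out early. `test.reindex(columns=train.columns)` would fill missing columns with NaN instead.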
Ok thanks
is it OK to ask pandas questions here? or is there another channel/discord better suited?
my question is, I have a dataframe with col [a,b,c] and I want to assign df["b_diff"] = df["b"] - df["b"].shift(1) with the condition that c == c, unsure how to accomplish this without a for-loop
what is c?
maybe they want this?
mask = some_condition(df['b'])
df['b_diff'] = None
df.loc[mask, 'b_diff'] = df.loc[mask, 'b'] - df.loc[mask, 'b'].shift(1)
but i think that's the same as
mask = some_condition(df['b'])
df['b_diff'] = None
df.loc[mask, 'b_diff'] = df.loc[mask, 'b'].diff()
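If "the condition that c == c" means the difference should only be taken between consecutive rows that share the same value of c (that is an assumption, the question doesn't say), a grouped diff avoids the for-loop. A small sketch with made-up values:

```python
import pandas as pd

# Toy frame; the column names a, b, c come from the question, the values are made up.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [10, 13, 20, 26, 27],
    "c": ["x", "x", "y", "y", "x"],
})

# Within each group of c, subtract the previous row's b from the current one.
# The first row of every group has no predecessor, so it becomes NaN.
df["b_diff"] = df.groupby("c")["b"].diff()
```

The last row is compared against the previous "x" row (13), not the row directly above it, which is the difference from a plain `df["b"].diff()`.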
@austere swift hi. Remember the time issues while training? well, i think it was the for loops. @desert oar helped me use boolean matrices and now it takes the same time as before, 40-50 mins
How do I change column names of '0' and '1'?
take a look at the rename command for dataframes
Ok I figured that out
Can u help me with another query?
Why am I not getting the correct dictionary values?
Clearly Wednesday10:00 should have 75 and 791.4
Is it taking mean? Cuz there might be more Wednesday10:00s.
no, it's taking the last value. dictionary keys are unique
so if you try to ingest multiple of the same key, only the last value will "stick"
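A quick illustration of the "last value sticks" behaviour, using made-up rows (only the `Wednesday10:00` key and its 75 / 791.4 values come from the question above):

```python
from collections import defaultdict

# Made-up rows: (key, users, value). Two rows share the key "Wednesday10:00".
rows = [
    ("Wednesday10:00", 75, 791.4),
    ("Thursday09:00", 40, 300.0),
    ("Wednesday10:00", 20, 150.0),
]

# Building a dict directly: the later row silently overwrites the earlier one.
last_wins = {key: (users, value) for key, users, value in rows}

# Collecting every row per key first lets you aggregate however you like.
grouped = defaultdict(list)
for key, users, value in rows:
    grouped[key].append((users, value))

# Mean of each column per key, similar in spirit to a pandas groupby().mean().
means = {
    key: tuple(sum(col) / len(col) for col in zip(*vals))
    for key, vals in grouped.items()
}
```

Here `last_wins["Wednesday10:00"]` holds the (20, 150.0) row, not the (75, 791.4) one, because it was inserted later.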
well, if i were to answer the question you just asked, i'd say do a groupby in pandas, and then take mean of the two columns
however, im realising that the question you asked isn't the question you should be asking.
is Users column showing the "number" of users?
Yes
I was studying about ResNets/ Residual Networks
Am I correct in thinking they are just methods that do not hinder performance (when ReLU is used) but "may" turn out to improve accuracy, though improvement isn't necessary... as far as very deep NNs are concerned
?
what
Do you do a lot of NLP work?
yes
Do you have experience using BERT?
yes
May I send you a detailed question tonight regarding BERT?
you can ask it here, but I'm not sure if I'll be able to help.
Yes sir
I guess just ping me whenever you've done that
Not tryna disrespect you
can you give me source code python graph search for first choice hill climbing ? please . Thanks
Pardon sir?
google it?
he wants you to give him source code python graph search for first choice hill climbing
no i try find it but no
thank you for the elucidation kind sir, I hadn't understood his request
we can only help your write it, not supply the whole thing
yes thanks
you're welcome
as was said to you several times before-- just post the question here
don't ask to ask, and don't dm people
how do I decrease the size of the box that recognizes face? I am using face_recognition
Anyone?
Please
is tesla V4 good?
Anyone got some good resources for quickly grasping and being able to deploy GAN's?
yo I can help you alot with this
add me
the boxes are cv2 tho if i recall correctly
I am working with matplotlib. How do I color certain points based on multiple conditions?
for example, if value x<25 I want to be one color, and x>=25 & x<75 and so on...
I think you usually pass an array of colors where the nth element of the color array refers to whatever the nth row of what you're plotting is.
```
markerfacecolor='blue', markersize=12)
```
put this in a for loop with an if condition for each of the conditions.
for x in list_of_points:
    if x < 25:
        color = 'green'
    # etc. for the other conditions
    plot(x, y, color=color, linestyle='dashed', marker='o',
         markerfacecolor='blue', markersize=12)
i tried..
I see. I feel like that is such an inefficient way to do it.
cause you are plotting each thing point by point right...
you could add it to the end
but doesn't matplotlib plot each point?
you can plot each group of points too i guess.
No, I am using plt.scatter()
to avoid plotting each point individually.
plt = matplotlib.pyplot btw
did you check out https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html
has a doc with keywords matching your search
Yeah I did. I have a solution that I think works. But, I feel like there is a more clever way to do it.
I will share mine in a sec
have you heard of stylegan2 by the way?
Nope. What is that:?
its a machine vision project
Here is what I did. But I feel this is an inefficient way to do it, especially because I also need to change the size based on a different metric lol
```
x=['red' for x in episode_number<.25]]['episode_number']
```
i forgot list comprehension tbh
nothing like this exist?
I do not think so lol. Because color dict is basically a dictionary that holds the location to parts of the dataframe based on the conditions I set.
ahh well I'm out of ideas.
Thanks anyways. I will think about this for a little longer and hopefully figure it out.
goodluck
I think I did it.
@thorn bobcat I used 2 extremely long np.where statements to do it. Let me know if you want to see what I did specifically
Have you tried to create a separate column with values that satisfy your condition accordingly? Like:
if x<=25:
    return 1
elif ...:
    return 2
After that you can do plt.scatter(...) and pass that new column in the c parameter.
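One way to build that extra column without a per-point loop or nested np.where calls is `np.select`. This is only a sketch: the data and the colour names are made up, and the 25 / 75 thresholds follow the example in the question.

```python
import numpy as np

# Made-up values; the 25 and 75 thresholds follow the example in the question.
x = np.array([5, 30, 80, 24, 75, 60])

# Each condition is evaluated on the whole array at once, and the first one
# that matches decides the colour for that point.
conditions = [x < 25, (x >= 25) & (x < 75)]
choices = ["green", "orange"]
colors = np.select(conditions, choices, default="red")
```

The resulting array can be passed straight to `plt.scatter(x, y, c=colors)`. A second `np.select` with numeric choices would give an array for the `s=` (size) argument in the same way.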
hey, is there any limit to how much a kdeplot from seaborn to make a heatmap can take in data? im trying to make a heatmap using about 150mb of data from a CSV file
and its been stuck on this for almost an hour
just want to be sure its actually doing something
When we have a dataset, which should come first: scaling the data or splitting the data?
Is it possible to build a neural network without the numpy library?
Yes, though you'd have to write a lot of logic, including backprop which would be painful to write. If you want to do it for self learning go ahead, but for any actual task use a deep learning library.
Can anyone refer me to a link where I can practice ML problems for a written exam for an ML internship?
are there any good tutorials/videos/resources on doing segmentation on medical images as i find some but they're not that useful?
I am trying to cluster with a Variational AutoEncoder.
After using a Variational AutoEncoder on the MNIST dataset, the output keeps similar digits in the same place, so I am trying to cluster those classes. Can anybody help?
Does anyone have good reading material for kerastuner? Or maybe another good hyperparameter tuning library that can use its own validation set (not a percentage split from training)?
Unable to allocate 16.1 GiB for an array with shape (156060, 13835) and data type float64 what does it mean ? can you explain please
code is (data["Phrase"]).apply(lambda x: pd.value_counts(x.split(" "))).sum(axis=0).reset_index()
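The message means the intermediate result does not fit in memory: applying `value_counts` per row builds a dense table with one row per phrase (156060) and one column per distinct word (13835), and 156060 × 13835 × 8 bytes for float64 is about 16.1 GiB. If only the total count per word is needed, a running counter avoids that table. A sketch with stand-in data:

```python
from collections import Counter

# Stand-in for data["Phrase"]; the real column has 156060 rows.
phrases = ["a good movie", "a bad movie", "good good fun"]

# Update one running counter instead of building a row-per-phrase table.
word_counts = Counter()
for phrase in phrases:
    word_counts.update(phrase.split(" "))
```

`word_counts.most_common()` returns (word, count) pairs sorted by count, which can be turned into a DataFrame afterwards if that shape is needed.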
I want to develop my skills to maybe end up with a career in this python-ai industry,
Where should I start?
Any tutorials or paths? I have created a thing using a Haar cascade for a simple face detection app using OpenCV, following a tutorial,
*I'm fluent in normal python btw
i prefer raytune to kerastuner even when working with keras, it's framework agnostic and you set up the training function so you can train/validate however you want
actually, DL == Maths. AI takes even more fields in it
raytune is the worst with TPUs tho
but then everything is, including XLA
only TF works as sweet as butter with TPUs (and Jax ofc)
Hi, what would be an optimal way to restrict assignment of responses to other variables than numresponses?
```
for tg in branchdata["intents"]:
    if tg['tag'] == branchtag:
        branchresponses = tg['responses']
for tg in numdata["intents"]:
    if tg['tag'] == numtag:
        numresponses = tg['responses']
for tg in daydata["intents"]:
    if tg['tag'] == daytag:
        dayresponses = tg['responses']
for tg in monthdata["intents"]:
    if tg['tag'] == monthtag:
        monthresponses = tg['responses']
for tg in perioddata["intents"]:
    if tg['tag'] == periodtag:
        periodresponses = tg['responses']
```
That's what I get when I print branchmodel[0], nummodel[0], and so on
For example, my input is 6 in the image below, I want it to assign tg['responses'] to numresponses and ignore the rest
What would be an optimal way to do that? I thought about using those float values but don't think I can, as branch also gets close values (first ai project), thanks in advance (branches are branches of medicine like dermatology etc, has no context with nums but somehow shows such values)
I'm currently developing a simple resnet 50 for a regression which takes in a input image and predicts the target value
but when I try predicting test dataset i always turn out to get the same value
so im not sure, if anyone can point me to some resource that'd be great
matrix = tf.constant([["ok"],["ok"]],tf.string)
this line created a 2x2 matrix? with 2 dimensions in string form?
will try it
kerastuner can't take your own validation set, it can only be set as a percentage of the training data
it can
tuner.search(x_train, y_train, epochs=5, validation_data=(x_val, y_val)) just set x_val and y_val
thats from docs
I never thought this could work... I will check this later
!rule 6
I try the kerastuner but this is pretty weird. I initialized my model same like the tutorial:
```
def GetModel(hp):
    model = Sequential()
    model.add(layers.Conv1D(filters=hp.Choice('int', [16, 32, 64]),kernel_size=hp.Choice('int', [3, 5, 7]),activation=tf.nn.leaky_relu,input_shape=[15360,1]))
    model.add(layers.Conv1D(filters=hp.Choice('int', [16, 32, 64]),kernel_size=hp.Choice('int', [3, 5, 7]),activation=tf.nn.leaky_relu))
    model.add(layers.Conv1D(filters=hp.Choice('int', [16, 32, 64]),kernel_size=hp.Choice('int', [3, 5, 7]),activation=tf.nn.leaky_relu))
    model.add(layers.Dropout(rate=hp.Choice('float',[0.3,0.5,0.7])))
    model.add(layers.Conv1D(filters=hp.Choice('int', [16, 32, 64]),kernel_size=hp.Choice('int', [3, 5, 7]),activation=tf.nn.leaky_relu))
    model.add(layers.Conv1D(filters=hp.Choice('int', [16, 32, 64]),kernel_size=hp.Choice('int', [3, 5, 7]),activation=tf.nn.leaky_relu))
    model.add(layers.Dropout(rate=hp.Choice('float',[0.3,0.5,0.7])))
    model.add(layers.MaxPool1D(pool_size=hp.Choice('int', [2, 5, 7])))
    model.add(layers.Flatten())
    model.add(layers.Dense(2,activation='softmax'))
    model.compile(loss = 'sparse_categorical_crossentropy',optimizer='Adam', metrics=['accuracy','mse'])
    return model
```
and call it like this:
...
print("Training with data from ",(30-k)," Before SCA")
trainX,trainY,shape = dataset_maker2(SDDB_tra,NSR_tra,(30-k),(k+1))
print(shape)
print("VALIDATION of minutes: ",(30-k),(k+1))
validX,validY= test_maker2(SDDB_val,NSR_val,(30-k),(k+1))
model = GetModel()
tuner = Hyperband(model,max_epochs=100,objective='val_accuracy',seed=seeds,executions_per_trial=2,directory='/content/gdrive/MyDrive/Paper Reproduce/Weights/',project_name='heartbeat')
# history = model.fit(x=trainX,y=trainY,validation_data=(validX,validY),epochs=100)
print(tuner.search_space_summary())
tuner.search(x_train, y_train, epochs=100, validation_data=(validX,validY),callbacks=[stop_early])
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]
...
it always gives an error that hp is not initialized
it happens when u copy paste without even reading the code XD
check ur GetModel function
you should put the GetModel function as the first argument for the HyperBand
and u will see why u get that error
don't call the function
my bad... this is my first time using kerastuner or tuner in general
so like this
print("Training with data from ",(30-k)," Before SCA")
trainX,trainY,shape = dataset_maker2(SDDB_tra,NSR_tra,(30-k),(k+1))
print(shape)
print("VALIDATION of minutes: ",(30-k),(k+1))
validX,validY= test_maker2(SDDB_val,NSR_val,(30-k),(k+1))
tuner = Hyperband(GetModel,max_epochs=100,objective='val_accuracy',seed=seeds,executions_per_trial=2,directory='/content/gdrive/MyDrive/Paper Reproduce/Weights/',project_name='heartbeat')
# history = model.fit(x=trainX,y=trainY,validation_data=(validX,validY),epochs=100)
print(tuner.search_space_summary())
tuner.search(x_train, y_train, epochs=100, validation_data=(validX,validY),callbacks=[stop_early])
best_hps=tuner.get_best_hyperparameters(num_trials=1)[0]
i hope the tuner work fast since the data are very small...
yep works as feather...
Trial 1 Complete [00h 00m 11s]
val_accuracy: 0.6666666567325592
Best val_accuracy So Far: 0.6666666567325592
Total elapsed time: 00h 00m 11s
Search: Running Trial #2
Hyperparameter |Value |Best Value So Far
int |64 |16
float |0.3 |0.7
tuner/epochs |2 |2
tuner/initial_e...|0 |0
tuner/bracket |4 |4
tuner/round |0 |0
Epoch 1/2
1/1 [==============================] - 54s 54s/step - loss: 0.6934 - accuracy: 0.4737 - mse: 0.2500 - val_loss: 0.6498 - val_accuracy: 0.5000 - val_mse: 0.2731
Epoch 2/2
If your data set is small try bootstrapping it to control the entropy/continuity between points
Idk though, I only use it when it looks like it could have a Gaussian probability curve
I am currently doing a project on evaluation metrics of nlg. One person in my project committee asked me "do you have any proof that this function can be considered as a metric? does it satisfy the properties of a metric?" I have been searching on the internet for a long time but could not find anything related to properties of evaluation metrics. Can anybody help?
they are talking about a "distance metric" https://en.wikipedia.org/wiki/Metric_(mathematics)#Definition
requirements for a metric:
- "identity"
  d(x, y) = 0 ⇔ x == y
- "symmetry"
  d(x, y) == d(y, x)
- "triangle inequality"
  d(x, y) ≤ d(x, z) + d(z, y)
what is nlg? and what are you trying to evaluate specifically?
the first two criteria should be intuitive. the third criterion is a generalization of the idea that the shortest distance between any two points should be a straight line.
the third criterion is where some commonly used ad-hoc similarity scores like cosine similarity fail as proper distance metrics
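A small numeric check of that claim, assuming the usual conversion "distance = 1 - cosine similarity" (the three points are chosen by hand for illustration):

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; a common way to turn the similarity into a 'distance'."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

x, z, y = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)

direct = cosine_distance(x, y)                          # x straight to y
detour = cosine_distance(x, z) + cosine_distance(z, y)  # x to y via z

# A proper metric would guarantee direct <= detour. Here the detour is shorter.
triangle_inequality_holds = direct <= detour

# The identity property fails too: two different points at distance 0.
scaled = cosine_distance((1.0, 0.0), (2.0, 0.0))
```

With these points the direct distance is 1.0 while the detour through z is roughly 0.59, so the triangle inequality is violated. The same kind of spot check on sampled inputs can disprove metric properties for any scoring function, though it can never prove them.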
I'm learning 
fortunately, you don't necessarily need a proper distance metric to evaluate a model, but if the triangle inequality doesn't hold you lose a lot of theoretical guarantees
Thanks for the clarification. But I do not understand how to consider texts as x and y. Is there any proof for BLEU or METEOR?
it's been many years since my undergrad topology class so that's about where my knowledge ends...
I see.
I do not understand how to consider texts as x and y
there are many ways to do it. one easy way is the "bag of words" technique. fancy vector embeddings like word2vec are another common solution.
for word2vec (or other word/ngram-level vector embeddings), you would take the average across all the individual word vectors in the text
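A rough sketch of that averaging step. The word vectors here are hypothetical 3-dimensional toy values, not real word2vec output, and skipping unknown words is just one possible choice:

```python
import numpy as np

# Hypothetical 3-dimensional word vectors; real word2vec vectors are learned
# from a corpus and usually have 100-300 dimensions.
word_vectors = {
    "the": np.array([0.1, 0.0, 0.2]),
    "cat": np.array([0.9, 0.3, 0.1]),
    "sat": np.array([0.2, 0.8, 0.3]),
}

def embed_text(text):
    """Average the vectors of the known words in `text` (unknown words are skipped)."""
    vectors = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vectors:
        return np.zeros(3)
    return np.mean(vectors, axis=0)

doc_vector = embed_text("The cat sat")
```

Once every text is a fixed-length vector like `doc_vector`, any vector distance or similarity can play the role of d(x, y) between two texts.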
but that is for words. BLEU gives a value for generated text given a reference text
what is BLEU?
@serene scaffold i'm surprised you didn't learn this stuff about distance metrics in your NLP work!
now you know
google suggests that BLEU is a machine translation performance score?
Oh it is basically a metric that says how close a generated sentence/text is to a reference sentence/text
https://aclanthology.org/P02-1040.pdf i don't see any proof of the distance metric properties in this paper
i'm just skimming it. the expression doesn't look that complicated, maybe it's "obvious" to the real big math brain people
then again, people use non-valid metrics as similarity scores all the time. again, cosine similarity is a good example
That is the problem. I have been reading a lot of metric-related papers and not once do they talk about any kind of properties. Now suddenly this professor wants me to "prove" that the metric I am considering is actually a metric
if you're trying to use something like clustering, you should definitely prove that it's valid
Thanks for the help anyways. I can finally start somewhere
otherwise the algorithm might not be valid
proving those properties is a common exercise in undergrad topology classes
I shall finish my post grad tomorrow. And I never had topology
if you're finishing tomorrow, it sounds like someone else's problem
I have presentation. That person is going to ask the same stupid question again. I'm just preparing.
the answer is "i don't know, but the field seems to be comfortable using it as a general practice"
unless you want to stay up all night trying to make one person happy about one minor point
"So you can not even prove that what you are using is actually a metric? This invalidates the entire project."
does it?
what are you doing with it?
if you're just using it as a goodness score, who cares? accuracy isn't a distance metric either, neither is cosine similarity which people use all the time
if you're using it in a clustering algorithm, then yes you have issues and someone should have thought of this before the last day
That is the point I have to convey somehow politely. Idk best of luck to me.
i think jaccard similarity also isn't a proper distance metric
I am working with data-to-text generation models. Not a single paper on the related field talks about these properties.
i'd say this:
- "goodness of fit" scores don't need to be strictly valid distance metrics, because we usually just care about "bigger is better" and don't need to do operations on them that depend on the theoretical guarantees of distance metrics
- people use "similarity" scores like cosine similarity all the time, which aren't valid distance metrics
- the ML translation field seems very comfortable using BLEU on a regular basis, so i am following standard practice in the field
- the validity of BLEU as a distance metric would be important in clustering or other explicitly distance-based algorithms (e.g. anything that relies on the triangle inequality being true), but we aren't doing that here, so i don't think we should be worried. but it might be worth proving or disproving, to support (or rule out) that use case.
assuming I learned stuff
guys i have one doubt and i want you guys opinion , if i find correlated features, what threshold should i choose to drop it . so for eg if most of the features are in range from 0.1 - 0.4 corr and i find one 0.6 for eg should i drop whichever feature that is above 0.5 ?
@somber prism whether or not collinearity is a problem depends entirely on your model
what is your model?
and what is the task you're trying to do with the model?
can you elaborate little more on that
classification
binary clf
what is the model? logistic regression? guessing the same value every time? rolling dice?
how many features? what are the features? how many data points?
don't force people to interrogate you in order to get enough information to help...
Hey please does anyone know how to code a generate case of this Linear programming using pulp or scipy?
Anyone have experience with color correction?
BLEU is not a distance metric. It is an evaluation metric for extrinsic evaluation commonly used in machine translation. See: https://aclanthology.org/P02-1040/
It is also used in many other NLG tasks like data-to-text as you mentioned, but it is usually considered not a good choice in this context. See: https://aclanthology.org/J18-3002/
my problem was one of professors asking me to prove that PARENT (state of the art metric for data2text) is a metric. And I had no idea what he was talking about. I have read all these papers and they are part of my project.
So a question from your committee on why you are using it is a valid question. You might have to include additional metrics to support your claims.
He didnt ask why? he wanted me to prove
It looks like there might be some confusion surrounding the term "metric".
It could mean a distance metric with the three properties specified in the messages above, but in this context, a metric here is something which can be used to assess how good are your outputs compared to reference texts.
And I need to resolve whatever that is tomorrow
All I wanted to know.. is there any "properties of metrics for evaluation of nlg models". But I guess there is not since I could not find anything. Thanks for the help, I am going to confront him tomorrow and see how it goes
In terms of properties, you could say that BLEU is somewhat of a measure of fluency of the generated text as it measures precision over n-grams. It does not really measure the accuracy of the information in the output, hence something like PARENT might be more useful..
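To make "precision over n-grams" concrete, here is a simplified sketch of the clipped n-gram precision that BLEU is built on. It is not full BLEU (that also combines several n-gram orders and applies a brevity penalty), and the function names and sentences are made up for illustration:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.

    Each n-gram is credited at most as often as it appears in the reference,
    which is the clipping idea from the BLEU paper.
    """
    cand_counts = Counter(ngrams(candidate.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total

reference = "the cat is on the mat"
candidate = "the the the cat"

unigram_precision = clipped_precision(candidate, reference, 1)
bigram_precision = clipped_precision(candidate, reference, 2)
```

The candidate scores 0.75 on unigrams but only 1/3 on bigrams, which is why higher-order n-grams act as a rough fluency signal. Note that swapping candidate and reference changes the score, so the symmetry property of a distance metric does not hold for this quantity.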
man you don't need to put all these links. I have been going through these things for months. I appreciate your effort and you can just coin a term and I would understand it.
"measure of fluency of the generated text as it measures precision over n-grams" I shall try to say this
Anyone here have a 1650 super they use? Does it even have tensor cores for this purpose?
It looks like a no from what I see, just confirming
How to create a elastic search type functionlity using postgres
where a common person will search like diesel prices or something and in return it will fetch data like fuel prices or something from my database
Hi, I received a recruitment task to analyze the data set in csv file from data.world website. The question is: Which brewery produces the strongest beers by abv?. Well this seems like an easy question to answer. I just need to import this data to python and with pandas extract the beer name with the highest abv.
This is more complicated one, that I am not sure how to answer
If you had to pick 3 beers to recommend to someone, how would you approach the problem ?
Can you give me any ideas how to approach this problem? How can I analyze the data to pick 3 best beers?
Here's the data set https://data.world/socialmediadata/beeradvocate
matrix = tf.constant([["ex"],["ok"]],tf.string)
is this a tensor of rank 2?
is this a 2 dimensional tensor?
Use a RNN to get a list of beer reviews embeddings, generate embeddings for the users and use these two for training a model that should output whether a given user might or might not like a beer, then get the probabilities of the user liking each beer and use them for sorting
Did the same thing 2 days ago, but with food
@grave breach is an RNN/LSTM still the "go-to" default, as opposed to transfer learning and/or fine tuning w/ a pre-trained transformer model?
and do you feel like RNN/LSTM does better as a default compared to something "stupider" like bag-of-word-embeddings (like word2vec)?
@desert oar sorry, didn't explain well, I'm not talking about word embeddings, but about sentences embeddings (in this case, the reviews)
oh, sure. but you can make sentence/document embeddings by averaging together word embeddings
I think that the best to go might be combining the two
don't you need to pad sentences and/or break them into chunks for LSTM? do you end up averaging those chunks or something?
you train the lstm with a sequence of embeddings
oh interesting
honestly this is one dark corner of machine learning i have never touched in a serious capacity
that is ringing a bell. plus the "attention" weighting mechanism
Thanks for the answer, do you maybe know any youtube tutorial that reviews the things that you mentioned? I am kinda new in the ML field
No, sorry, I don't
But it isn't too complex
If you decide to use LSTMs they're pretty easy to implement in keras
There's a number of tutorials you can find for like weather prediction that would be quite similar
okay, thanks. Can I maybe dm you if you could check my solution for this problem?
Of course not exactly copy/paste, but close
Sure
Hello everyone, I'm having some doubts with a dataset i'm trying to "clean":
So, I'm trying to visualize some data based on a dataset of wine reviews, and what I want to see is the amount of wines that every taster tasted
So I filled the NA values with "Anonymous", and then tried to plot it using matplotlib, but there's a discrepancy in the plot and the value_counts() that I have, does anyone know what it could be?
This is my code of the fillna, and the plot
This is the plot I got
These are the values that do not match the ones in the plot
And more values have that discrepancy
At first look
x and y might not be paired correctly
Like, does wines.taster_name.unique() returns values with the same order of wines.taster_name.value_counts()?
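A small reproduction of that ordering problem, with made-up taster names standing in for `wines.taster_name`:

```python
import pandas as pd

# Made-up tasters; in the thread the column is wines.taster_name after fillna("Anonymous").
taster_name = pd.Series(["Roger", "Anonymous", "Kerin", "Anonymous", "Kerin", "Anonymous"])

names_by_appearance = list(taster_name.unique())   # order of first appearance
counts = taster_name.value_counts()                # sorted by count, descending

# Pairing unique() with value_counts() values mislabels the bars,
# because the two orders generally differ.
mismatched = dict(zip(names_by_appearance, counts.values))

# Taking labels and heights from the same object keeps them aligned.
labels = list(counts.index)
heights = list(counts.values)
```

In this toy data "Roger" appears once but ends up paired with a count of 3 in `mismatched`. Plotting `labels` against `heights` (for example with `plt.bar(labels, heights)`) avoids the mismatch.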
Ooooh, you got a point there
And I faced the same problem a few days ago lmao
Thank you so much, that's why it doesn't match hahahahaa
No problem mate
And no, it doesn't have the same order hahahaha
Now it matches
Thank you so much!!!
You're welcome
is anyone familiar with Bokeh and/or Surface3d?
You're best off asking the question you would ask if someone said yes.
```
class OurDenseLayer(tf.keras.layers.Layer):
    def __init__(self, n_output_nodes):
        super(OurDenseLayer, self).__init__()
        self.n_output_nodes = n_output_nodes
```
could someone explain this block?
it's making a Layer with n output nodes. Most of that code is just Python object stuff rather than neural network stuff
for example, super(OurDenseLayer, self).__init__() is just initializing everything from the Layer class
i tried writing a function to smooth an image using a gaussian blur in numpy
and it's mind bogglingly slow
def smooth(img, sigma, r):
    s2 = sigma ** 2
    new = np.empty(img.shape)
    h, w = img.shape[:2]
    for y, x in np.ndindex((h, w)):
        sum = 0
        for i in range(max(y - r, 0), min(y + r + 1, h)):
            for j in range(max(x - r, 0), min(x + r + 1, w)):
                dy2 = (i - y) ** 2
                dx2 = (j - x) ** 2
                sum += 0.5 * np.exp(-0.5*(dx2 + dy2)/s2) / (np.pi * s2) * img[i, j][:3].mean()
        new[y, x] = np.array([sum, sum, sum, 1])
        if x == 0:
            print(y, x)
    return new
how can i make it not useless
using a library that already implements gaussian blur
not allowed
then read the code of a library that implements gaussian blur
Use slices.
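One way to read "use slices": instead of looping over every pixel, loop over the (2r+1)² kernel offsets and add a shifted, weighted copy of the whole image each time. The sketch below assumes the luminance has already been computed as a 2-D array and that pixels outside the image count as zero, which mirrors how the loop version skips out-of-range neighbours. The function name is made up.

```python
import numpy as np

def smooth_sliced(lum, sigma, r):
    """Gaussian blur of a 2-D array using one whole-image slice per kernel offset."""
    h, w = lum.shape
    s2 = sigma ** 2
    # Zero padding: neighbours outside the image contribute nothing.
    padded = np.pad(lum, r, mode="constant")
    out = np.zeros((h, w), dtype=float)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            weight = np.exp(-0.5 * (dx * dx + dy * dy) / s2) / (2 * np.pi * s2)
            # Window of the padded image shifted by (dy, dx), same shape as lum.
            out += weight * padded[r + dy : r + dy + h, r + dx : r + dx + w]
    return out

# Made-up RGBA image; average the colour channels once, up front.
rng = np.random.default_rng(0)
rgba = rng.random((5, 6, 4))
luminance = rgba[:, :, :3].mean(axis=2)
blurred = smooth_sliced(luminance, sigma=1.0, r=2)
```

The Python-level loop now runs (2r+1)² times regardless of image size, and the per-pixel work happens inside numpy. Because a Gaussian kernel is separable, blurring rows and then columns with a 1-D kernel would cut the work further.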
It is possible to get stuck in a local maximum in simulated annealing (True/False) . Can anyone help me ? Thanks
Hi
Hi, can anyone tell me what the difference is between statistical models and machine learning models?
Hi, I'm confused about this: must we use PCA before applying KMeans or hierarchical clustering?
does anyone have access to open ai api here ?
I have a problem in my confusion matrix implement by VGG16 model . Here is the link : https://drive.google.com/file/d/1MSw5acwkTz8p23XUPTzU_1mcRojHNUPQ/view?usp=sharing
I hope you can help me.
Me
hi
i need helpppp
i have deep learning model
and i want to deploy it an android app
model of segmentation
@zinc rampart Never done this, but I can think of a bunch of solutions
The 1st would be to expose the model via an API
Since mobile devices have shared GPUs or don't have one at all
If you really want to deploy the model inside the app, there are other solutions
You could use DL4J (Deep Learning For Java)
Or, use PyTorch mobile
Hello there, Does anyone know how to acquire data from 'Earth Data' source with the help of API or any other technique? I want to work on 'Air Quality and Water Quality' of specific region. I know this is Geo-spatial data analysis. But I don't want Images, I want to only sensor's sensing data. Please if anyone knows anything. Let me know.
Yea it's the python stuff I don't understand lol
Read about the method resolution order. It will make more sense once you understand that.
sounds scary but I'll try
```
class OurDenseLayer(tf.keras.layers.Layer):
    def __init__(self, n_output_nodes):
        super(OurDenseLayer, self).__init__()
        self.n_output_nodes = n_output_nodes
```
lemme break this down according to my limited understanding
tf.keras.layers.Layer is a param in OurDenseLayer
__init__ is a function that runs when the object is called
with params of self meaning i can call all objects within the function using the class.object notation
it also takes in n_output_nodes
super(OurDenseLayer, self).__init__()
this part I can't explain
It is not a parameter. It is a parent class.
so if i say class Cool(TheCoolest)
TheCoolest is the parent class?
super(OurDenseLayer, self).__init__() calls the __init__ method of the next parent class in the method resolution order, which in this case is tf.keras.layers.Layer
can there be 2 parent classes?
there can be any number, but it's usually not very many.
super(super(OurDenseLayer, self)).__init__()
If there's more than one parent class, the ones you list first are sooner in the method resolution order.
does this call the parent class of that parent class
No, you only need to call super once, and it goes to the next __init__ method in the order. The next __init__ method will also need to have super().__init__ to continue.
so Method resolution order is parent class >> class >> class this is what you meant?
ahh i seee
makes sense
!e
class A: pass
class B: pass
class C(A, B): pass
print(C.__mro__)
so it's just like inheritance in c# and java
@serene scaffold :white_check_mark: Your eval job has completed with return code 0.
(<class '__main__.C'>, <class '__main__.A'>, <class '__main__.B'>, <class 'object'>)
No, in those two languages, you can only inherit from one class at a time. Python has multiple inheritance, which gives you a lot of flexibility, but must be used with caution.
!e For example:
class A: pass
class B(A): pass
class C(A, B): pass
print(C.__mro__)
@serene scaffold :x: Your eval job has completed with return code 1.
001 | Traceback (most recent call last):
002 | File "<string>", line 5, in <module>
003 | TypeError: Cannot create a consistent method resolution
004 | order (MRO) for bases A, B
B inherits from A, but then C tries to inherit from A before it inherits from B, which makes no sense.
class A:
    def x():
        return 1
class B:
    def x():
        return 2
class C(A, B):
    def x():
        return 3
print(C.x)
what would this return?
It would error out, because classes are not functions.
hm.. true
how about now?
been a while since I've coded from scratch
but hope you get the concept here..
It still doesn't make any sense because C.x isn't calling the method, and you have to have self as a first parameter.
what if i call C.x()? or C().x()
!e
class A:
    def x(self):
        return 1
class B:
    def x(self):
        return 2
class C(A, B):
    def x(self):
        return 3
c = C()
print(c.x())
@serene scaffold :white_check_mark: Your eval job has completed with return code 0.
3
You also have to make an instance of C before you can get anywhere.
!e
class A:
    def x(self):
        return 1
class B:
    def x(self):
        return 2 + super().x()
class C(A, B):
    def x(self):
        return 3 + super().x()
c = C()
print(c.x())
@serene scaffold :white_check_mark: Your eval job has completed with return code 0.
4
It returns 4 because super().x() goes to the next class in the MRO, which is A, but A.x doesn't call super().x(), so it stops.
yep. so what would c.x() return in that case?
if it were B,A it would return 3 + 2
yes. doesn't matter what B inherits from, at that point.
but look at what B.x returns
yes, and super().x() is following C's MRO, not B's.
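For the "what if it were B, A" case, here is the same example with only the base order flipped (a variation on the snippets above, not something that was run in the channel):

```python
class A:
    def x(self):
        return 1

class B:
    def x(self):
        # B has no explicit parent, yet super() still finds A when the
        # instance is a C, because the lookup follows type(self).__mro__.
        return 2 + super().x()

class C(B, A):
    def x(self):
        return 3 + super().x()

mro_names = [cls.__name__ for cls in C.__mro__]
result = C().x()
```

The MRO is C, B, A, object, so the chain is C.x, then B.x, then A.x, and `result` is 3 + 2 + 1 = 6. Calling `B().x()` on its own would raise an AttributeError, since the next class after B in B's own MRO is `object`, which has no `x`.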
i have a numpy array with shape (M, N, 4) and i want to convert it to one with shape (M, N) where [a, b, c, d] in the first one becomes mean(a, b, c) in the new one... how would i do that?
you always follow the MRO of the original object.
I'm pretty sure that's taking the mean along axis 2.
!docs numpy.ndarray.mean
```
ndarray.mean(axis=None, dtype=None, out=None, keepdims=False, *, where=True)
```
Returns the average of the array elements along given axis.
Refer to [`numpy.mean`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html#numpy.mean "numpy.mean") for full documentation.
See also
[`numpy.mean`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.mean.html#numpy.mean "numpy.mean")equivalent function
they have nothing to do with the x in B or A
I want to say one hot vector encoding but that's just something i picked up.
also why don't you want d to be part of it? @ember sapphire
don't even know what that means
if you take out d won't you lose some spatial data?
wrong reply but that was to sunshine
I'm answering sunshine's question.
it's an image with (r, g, b, a) pixels, i just want the luminosities
a is 1 for all of them
I guess you can take a slice of the array that doesn't include those values, but it will still be taking the mean along axis 2.
do you know how to do n-dimensional slices in numpy?
great
image[:, :, :3].mean(axis=2)
looks good! did it work as expected?
yes, thank you
YAY! 
To remember which axis you want to do a calculation over, it's the index of whichever axis you want to disappear. So you wanted the M and N axes (indices 0 and 1) to stay, which means reducing over axis 2.
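A small sketch of that rule, assuming a made-up (M, N, 4) array in place of the real image (the sizes and values here are only for illustration):

```python
import numpy as np

# Hypothetical stand-in for the (M, N, 4) RGBA image from the question.
M, N = 2, 3
image = np.arange(M * N * 4, dtype=float).reshape(M, N, 4)

# Drop the alpha channel, then average over axis 2 so that axis disappears.
luminosity = image[:, :, :3].mean(axis=2)

print(luminosity.shape)  # (2, 3)
print(luminosity[0, 0])  # mean of 0, 1, 2 -> 1.0
```

Axes 0 and 1 survive, so the result has shape (M, N).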
seems intuitive enough in this case
the axis argument in some numpy functions breaks my brain though
you just have to remember that rule and it will start to make sense 
@thorn bobcat are you good btw?
I am good
as long as this is correct
super().x() only refers to the super class of the x in c
what if B has a super class too tho?
Yes
and a function x
Whats a great book about ML, Data Science etc ?
like A was B's super class and B and A were C's super class
the MRO is determined when a class is created. Try experimenting by making arbitrary classes and checking their .__mro__ attribute
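As a quick experiment along those lines, here is a minimal sketch that reuses the A/B/C classes from the eval above and adds a hypothetical C2 with the bases swapped (C2 is not from the conversation, it is only there to compare the two orders):

```python
class A:
    def x(self):
        return 1

class B:
    def x(self):
        # super() follows the MRO of the instance's class, not B's own bases
        return 2 + super().x()

class C(A, B):
    def x(self):
        return 3 + super().x()

class C2(B, A):  # hypothetical variant with the bases swapped
    def x(self):
        return 3 + super().x()

print([k.__name__ for k in C.__mro__])   # ['C', 'A', 'B', 'object']
print(C().x())                           # 4: C.x -> A.x, and A.x ends the chain
print([k.__name__ for k in C2.__mro__])  # ['C2', 'B', 'A', 'object']
print(C2().x())                          # 6: C2.x -> B.x -> A.x
```

With the bases swapped, B.x runs before A.x, so its `2 + super().x()` is included as well.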
how would python deal with that?
alright I will. Thanks a lot! this was an important concept that you explained extremely well
thanks a lot

You could try "Data Science from Scratch"
@serene scaffold What about the theory and the mathematics of it ?
if interested in the math background, I think there was a good course from Stanford University... I myself didn't get everything, but you might get a clue....
I'm looking for the video link... I think I saw it on iTunes U (which doesn't exist anymore - but I might find it)
ah... there we go...
a lot of math... for me it was too much
@lapis sequoia Can you send the link ?
Thank you
very much, yes
thank you
why using the super calls the constructor of A?
just cuz C extends A,B?
if it extends B,A would it have returned 5?
There aren't any constructor (well, init) calls in this example
Hey, does anyone know how can I get embeddings from data that holds user reviews of things (there are things that are reviewed by multiple users)? There are 4 columns of review_type, each holds a value from 1.0 to 5.0. I need to somehow group them to answer my question
5
then what does the super do?
ah okey, super refers to its parent class. so it is A because of the order?
@desert bear I don't understand. Can you show the data?
I need to answer a question: If you had to pick 3 beers to recommend to someone, how would you approach the problem?
I've been suggested to use PCA to get embeddings. But I just cannot find any example online showing how to do it when there is a large amount of data (reviews) referring to the same thing (there are over 10'000 reviews for one beer)
I just don't know how to implement it in python
that sounds like you want to do clustering - separate the data into 3 clusters, returns the centroids of them
Are you doing sentiment analysis on the reviews?
Well, right now I am not doing any analysis
This seems like the standard netflix movie dataset?
The treatment is just about the same
Above are screens of the data.
From what I see, the structure is the same
How can I pick 3 beers if there are almost 1'500'000 rows of data?
I have tried counting which beer name has the most number of 5.0 reviews. But I think that this problem should be approached with ml
But what does that imply? I just cannot find any good example code on the web that could help me. I've literally spent a few hours on this and I'm stuck
I mean, it depends by what metric do you want it to be best, for one
average predicted review score, or something?
and is it supposed to be an element from the data, or a "generated" name?
Well, I don't know that honestly. There are 5 columns for reviews (one of them is an overall score)
I think that question implies to give a name of the specific beer from the data
So I think I should group beer_name (or just the index of the review) with its reviews (taste, palate, appearance, aroma, overall score)
Does anybody have any idea on how can I pick the best beer_name with the given data?
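One simple baseline before reaching for embeddings or clustering, as a rough sketch: group by beer, average the overall score, and ignore beers with too few reviews. The column names, the sample rows and the MIN_REVIEWS cutoff below are all assumptions, not taken from the real dataset.

```python
import pandas as pd

# Tiny made-up sample; column names are assumptions, not the real dataset's.
reviews = pd.DataFrame({
    "beer_name": ["A", "A", "A", "B", "B", "C", "D", "D", "D"],
    "review_overall": [4.5, 4.0, 5.0, 5.0, 5.0, 3.0, 4.0, 4.5, 4.0],
})

MIN_REVIEWS = 3  # hypothetical cutoff so a beer with a couple of 5.0s does not win

# One row per beer: mean overall score and number of reviews.
stats = reviews.groupby("beer_name")["review_overall"].agg(["mean", "count"])

top = (
    stats[stats["count"] >= MIN_REVIEWS]
    .sort_values("mean", ascending=False)
    .head(3)
)
print(top)
```

The same groupby works on 1'500'000 rows; whether "best" should mean the overall score or some mix of taste, palate, appearance and aroma is still a judgment call.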
So i have this apply function in pandas:
def apply_genre_transformer(row):
    row_data = json.loads(row.replace("'", "\""))
    tags = 0
    for tag in row_data:
        tags = tags | genre_df.loc[genre_df["tag"] == tag, ["id"]]["id"]
    return tags
frame["genre"] = frame["genre"].apply(apply_genre_transformer)
but im getting an error:
Traceback (most recent call last):
File "F:/Crunchy-Bot/data-scraper/processor.py", line 38, in <module>
frame["genre"] = frame["genre"].apply(apply_genre_transformer)
File "F:\Crunchy-Bot\data-scraper\venv\lib\site-packages\pandas\core\series.py", line 4143, in apply
return self._constructor_expanddim(pd_array(mapped), index=self.index)
File "F:\Crunchy-Bot\data-scraper\venv\lib\site-packages\pandas\core\frame.py", line 570, in __init__
arrays, columns = to_arrays(data, columns, dtype=dtype)
File "F:\Crunchy-Bot\data-scraper\venv\lib\site-packages\pandas\core\internals\construction.py", line 534, in to_arrays
return _list_of_series_to_arrays(
File "F:\Crunchy-Bot\data-scraper\venv\lib\site-packages\pandas\core\internals\construction.py", line 592, in _list_of_series_to_arrays
index = ibase.default_index(len(s))
TypeError: object of type 'int' has no len()
Process finished with exit code 1
is there an easier way of doing this?
Anyone know of any good models to use when predicting the performance of a student in his exams?
I also want to use indepth knowledge of hidden connections in my system to provide guidance that could help the student maximize his results and minimize failure.
should I use a minmax game?
along with some neural networks?
can you do print(genre_df.head().to_csv()) and print(row_data) and show me? And please ping me when you have done this.
And probably print(frame.head().to_csv()) as well.
I should mention the function is logically correct, but i believe im not doing what i should be doing in pandas
I basically want to turn a list of strings to a set of bitflags which are pre-defined in another df
@serene scaffold
# print(frame.head().to_csv()) -> Index(['title', 'description', 'rating', 'img_url', 'link', '_id'], dtype='object')
how is that possible? what is frame.__class__?
so there's no way that's the output of print(frame.head().to_csv())
it should be a string of comma separated values
No problem! Ping me when you have the print statements I asked for and I'll come back 
","['Action', 'Mystery', 'Supernatural', 'Vampire']",8.83,<redacted>,<redacted>-hen
no problem. be sure to let me know what print statement is which as well.
thats just print(frame.head().to_csv())
I think, or well, it should be
but idek
looks right to me
Do you want to turn the genres into equivalent flags or are these flags a reduction in dimensions (i.e. 'groups' of genres)?
I just want to turn the array of genres into a given bit field
what is a bit field?
just your normal bitflag
e.g.
foo = 1 << 0
bar = 1 << 1
if i had ["foo", "bar"] I would want the result to basically be foo | bar == 3
which i have just realised im calculating wrong
should be a bitwise or
So e.g. if there are 11 genres in the universe you need 11 0s or 1s
That's a little 'optimised' but I'm not sure if python really stores bits properly as bits
no I need the given bit field of them
It's a string 11 length long of either 0s or 1s, right?
id tag
0 1 Drama
1 2 Fantasy
2 4 Adventure
3 8 Harem
4 16 Mecha
5 32 Vampire
6 64 Shounen Ai
7 128 School
8 256 Dementia
9 512 Seinen
10 1024 Cars
11 2048 Comedy
12 4096 Police
13 8192 Military
14 16384 Hentai
15 32768 Martial Arts
16 65536 Shoujo Ai
17 131072 Sports
18 262144 Horror
19 524288 Romance
20 1048576 Sci-Fi
21 2097152 Supernatural
22 4194304 Samurai
23 8388608 Kids
24 16777216 Shounen
25 33554432 Mystery
26 67108864 Super Power
27 134217728 Game
28 268435456 Parody
29 536870912 Space
30 1073741824 Action
31 2147483648 Shoujo
32 4294967296 Yuri
33 8589934592 Josei
34 17179869184 Ecchi
35 34359738368 Historical
36 68719476736 Psychological
37 137438953472 Slice of Life
38 274877906944 Yaoi
39 549755813888 Thriller
40 1099511627776 Magic
41 2199023255552 Music
42 4398046511104 Demons
id is the assigned bit flag
Yes, a single genre bit field is 42 0s or 1s
Im not entirely sure what you mean?
Is there a reason for the bit manipulation? Wouldn't integers or arrays of bools work
they're being stored in postgres
So i dont really want to have a given set of arrays
or have to deal with Joins
bit field far more suited for this stuff
000000000000000000000000000000000000000000
represents an anime contains no genre
000000000000000000000000000000000000000001
represents an anime containing the genre at position 42, etc.
from functools import reduce
from operator import or_
tag_to_id = df.set_index('tag').to_dict()
def encode(genres: list[str]):
    return reduce(or_, (tag_to_id[g] for g in genres))
errr a bit field of 0 would be no genres
if an anime had the genres Music and Demons it would be 6597069766656
!e
print(bin(6597069766656))
@chilly geyser :white_check_mark: Your eval job has completed with return code 0.
0b1100000000000000000000000000000000000000000
^I don't see the difference?
you can make this a massive length of binary digits yes
But Im not literally storing them as 0s or 1s
Im just storing them as an int 
I'm pretty sure you can just .apply my encode function.
Ah ok....
Yeah so it seems like an or/sum function
I'd say the function above would probably work
had to modify it slightly to account for the data
but
def encode(genre: str):
    genres_ = json.loads(genre.replace("'", "\""))
    return reduce(or_, (tag_to_id['id'][g] for g in genres_))
raises
    return reduce(or_, (tag_to_id['id'][g] for g in genres_))
TypeError: reduce() of empty sequence with no initial value
https://arxiv.org/abs/2005.00341 thank me later.
it's intended to be applied to the columns of lists of strings, where each string is found in the tag column of the dataframe you gave a moment ago.
it won't work if you apply it to an empty list.
ye but the data in the df is basically JSON format
but with single quotes instead of double
Idek why it is
but it is 
once it's in a dataframe, it doesn't matter if it had been in a json before.
no, but i mean each column in the row for genre is a string
repr(frame["genre"][0])
'"[\'Comedy\', \'Sports\', \'Drama\', \'School\', \'Shounen\']"'
yes, you can have strings in the dataframe. are the quote characters part of the actual string?
why is it like that?
Idek
All i can say is someone messed up the scraper and I wasnt about to wait another 6 hours to re do it 
Is it possible for you to just convert once to make it nicer
Just a curious question, why is the tag encoded in such a weird way via the id column?
Because that data is weird, and cleaner data is nicer
It's not like the '\ are adding any value to your data right now
wdym
It's powers of 2?
looks like it's each power of 2 so that each possible combination of bitwise OR is unique
bit flags™ they're uber useful
!e
from functools import reduce
from operator import or_
tags = ["Drama", "Fantasy", "Adventure"]
tag_to_id = {t:2**x for x, t in enumerate(tags)}
def encode(genres: list[str]):
    return reduce(or_, (tag_to_id[g] for g in genres))
print(tag_to_id)
print(encode(["Fantasy", "Adventure"]))
@chilly geyser :white_check_mark: Your eval job has completed with return code 0.
001 | {'Drama': 1, 'Fantasy': 2, 'Adventure': 4}
002 | 6
I assume the id is simply a way to represent the tag, so why not simpler means like one hot encoding?
ahh, I don't know much about it. kinda curious why it was so weird
If you mean actually 42 binary variables, well, it's just another data representation
Means you have combinations of genres in a single int
which basically removes the need for any lookup or linking table in sql
!docs functools.reduce
functools.reduce(function, iterable[, initializer])
Apply *function* of two arguments cumulatively to the items of *iterable*, from left to right, so as to reduce the iterable to a single value. For example, `reduce(lambda x, y: x+y, [1, 2, 3, 4, 5])` calculates `((((1+2)+3)+4)+5)`. The left argument, *x*, is the accumulated value and the right argument, *y*, is the update value from the *iterable*. If the optional *initializer* is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty. If *initializer* is not given and *iterable* contains only one item, the first item is returned.
Roughly equivalent to:
I guess you can set 0 as the initial value @buoyant vine
its kinda old tho
just added if genres_ else 0 to the end
just so it defaulted to 0 if the anime hasnt got any genres
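For reference, a minimal sketch of the same idea using reduce's initializer instead of the trailing `if genres_ else 0`. The three-entry tag_to_id is a hypothetical subset of the table above, and ast.literal_eval is swapped in here for the quote-replacing json.loads, assuming each cell looks like "['Fantasy', 'Adventure']":

```python
import ast
from functools import reduce
from operator import or_

# Hypothetical subset of the tag -> flag table shown earlier.
tag_to_id = {"Drama": 1, "Fantasy": 2, "Adventure": 4}

def encode(raw: str) -> int:
    """Turn a stringified list of genres into a single bit-flag integer."""
    genres = ast.literal_eval(raw)  # parses "['a', 'b']" without swapping quotes
    # The trailing 0 is the initializer, so an empty list yields 0, not an error.
    return reduce(or_, (tag_to_id[g] for g in genres), 0)

print(encode("['Fantasy', 'Adventure']"))  # 6
print(encode("[]"))                        # 0
```

This could then be used as frame["genre"].apply(encode), given cells in that format.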
so it's working?
yes! 

with MIR and Audio, even a single year old makes me feel its outdated
what is MIR and audio?
https://arxiv.org/abs/1802.04208 seen this?
Is there a model that replaces the work of sound engineers?
cause that would be really magical.
I'm actually a sound engineer.
it's sort of like sculpting, think an AI can do this better than a person provided guidance.
You'd like https://youtu.be/GiAj9WW1OfQ
Try it out now! https://codeparade.itch.io/fractal-sound-explorer
Making music and sound effects directly from common fractals was an idea I though of one night, so I just had to try it out to see what it would be like. The results were really interesting and actually helped me understand even more about fractals and chaos.
Source Code:
https:...
checking it out right now
https://www.youtube.com/watch?v=O4Cxrk98ZBc interesting acapella. non tech related but this is some good quality data.
wrong place sorry
why am i getting this error?
can someone help explain why I am getting this error when I try to run this code
I get this error
I just tried setting up my gpu for tensorflow today
so Im not sure if thats the reason this issue is occurring
@blazing bridge what's your env config (like what's the versions of all the stuff you have installed)
and cuDNN 8.0.5
mb for the late response, I had to remember what version I installed
yeah lol
oh ok
and you'd have to move the cudnn files into the proper locations again
what was happening was when I was installing the toolkit
for 11.2
it says it was already installed or an error message
do you know how I could start over
for a fresh installation
try uninstalling 11.0 first
sorry, how do I do that
in program files
do I delete this file
or like this
@austere swift
Hi
Hi
Guys I wanted to know about web scraping, maybe you could help me. I want to start a dropshipping store and gathering data is very important for me
So, do you know any other way of gathering data and doing some research on that topic so that I can analyze the data and get some ideas?
can anyone pls tell that pandas datareader for yahoo.finance is working or not?
Traceback (most recent call last):
File "d:/12 - Competetions/DLC AI ML/Stock predictor.py", line 19, in <module>
data = web.DataReader(company,'yahoo',start,end)
File "C:\Users\ritwi\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\util_decorators.py", line 199, in wrapper
return func(*args, **kwargs)
File "C:\Users\ritwi\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas_datareader\data.py", line 376, in DataReader
return YahooDailyReader(
File "C:\Users\ritwi\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas_datareader\base.py", line 253, in read
df = self._read_one_data(self.url, params=self._get_params(self.symbols))
File "C:\Users\ritwi\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas_datareader\yahoo\daily.py", line 153, in _read_one_data
resp = self._get_response(url, params=params)
File "C:\Users\ritwi\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas_datareader\base.py", line 181, in _get_response
raise RemoteDataError(msg)
pandas_datareader._utils.RemoteDataError: Unable to read URL: https://finance.yahoo.com/quote/TSLA/history?period1=1356993000&period2=1577917799&interval=1d&frequency=1d&filter=history
Response Text:
b'<!DOCTYPE html>\n <html lang="en-us"><head>\n <meta http-equiv="content-type" content="text/html; charset=UTF-8">\n <meta charset="utf-8">\n <title>Yahoo</title>\n <meta name="viewport" content="width=device-width,initial-scale=1,minimal-ui">\n <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">\n <style>\n html {\n height: 100%;\n }\n body {\n background: #fafafc url(https://s.yimg.com/nn/img/sad-panda-201402200631.png) 50% 50%;\n background-size: cover;\n height: 100%;\n text-align: center;\n font: 300 18px "helvetica neue", helvetica, verdana, tahoma, arial, sans-serif;\n }\n table {\n height: 100%;\n width: 100%;\n table-layout: fixed;\n border-collapse: collapse;\n border-spacing: 0;\n border: none;\n }\n h1 {\n font-size: 42px;\n font-weight: 400;\n color: #400090;\n }\n p {\n color: #1A1A1A;\n }\n #message-1 {\n font-weight: bold;\n margin: 0;\n }\n #message-2 {\n display: inline-block;\n *display: inline;\n zoom: 1;\n max-width: 17em;\n _width: 17em;\n }\n </style>\n <script>\n document.write('<img src="//geo.yahoo.com/b?s=1197757129&t='+new Date().getTime()+'&src=aws&err_url='+encodeURIComponent(document.URL)+'&err=%<pssc>&test='+encodeURIComponent('%<{Bucket}cqh[:200]>')+'" width="0px" height="0px"/>');var beacon = new Image();beacon.src="//bcn.fp.yahoo.com/p?s=1197757129&t="+new Date().getTime()+"&src=aws&err_url="+encodeURIComponent(document.URL)+"&err=%<pssc>&test="+encodeURIComponent('%<{Bucket}cqh[:200]>');\n
</script>\n </head>\n <body>\n <!-- status code : 404 -->\n <!-- Not Found on Server -->\n <table>\n <tbody><tr>\n <td>\n <img src="https://s.yimg.com/rz/p/yahoo_frontpage_en-US_s_f_p_205x58_frontpage.png" alt="Yahoo
Logo">\n <h1 style="margin-top:20px;">Will be right back...</h1>\n <p id="message-1">Thank you for your patience.</p>\n <p id="message-2">Our engineers are working quickly to resolve the issue.</p>\n </td>\n </tr>\n </tbody></table>\n </body></html>'
Pls help me getting this marvellous error
yeah because reading that code is a pleasure *-*
Lol
Says status code 404 not found, so maybe something wrong on the Server, resource changed locations, etc
Hey does anyone know about pandas
I know a little, however, looking at the timing of your question, It seems as though you were planning on getting someone to help me (im in help-honey) If that is not the case, I wouldn't mind trying to help
Oh hey yeah sorry about that I need some help on comparing dataframes
I have two dataframes this is one:
and this one:
I want to compare these dataframes to see which rows are identical
you want to see if the first one is contained in the second?
yeah basically
you can get a row of a df by doing df1.iloc[0] I think, so one way would be to just iterate through the rows. probably not the best way, but I think it would work
Oh i was mainly looking for comparing two dataframes and seeing if there is any unique row in the above dataframe which is not in the below one
if there is a built in function for it, it would do the same thing I am describing. I think. so you could iterate through the slices of the first df and on each slice of the first one, compare that to each slice of the second.
it would be a simple nested for loop
Oh okay then I'll do that, thanks for your help
no problem
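A vectorised alternative to the nested loop, as a rough sketch: a left merge with indicator=True marks where each row came from, so the rows that exist only in the first frame can be filtered out directly. The two frames and their column names below are made up.

```python
import pandas as pd

# Made-up frames standing in for the two screenshots.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df2 = pd.DataFrame({"a": [1, 3, 4], "b": ["x", "z", "w"]})

# Merge on all shared columns; _merge says whether df2 had a matching row.
merged = df1.merge(df2.drop_duplicates(), how="left", indicator=True)
only_in_df1 = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

print(only_in_df1)
```

An empty only_in_df1 would mean every row of df1 also appears in df2.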
anyone willing to explain to me how to change color of a scatter plot and create a legend
using ony matplotlib
The Yahoo finance API server is deprecated. Meaning your data source no longer works
Can anyone clarify what's happening here with this lambda function being passed to the map() method? This is an example from the Pandas tutorial on Kaggle. Is it iterating through each value in reviews.points and assigning the value to p?
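Roughly, yes: Series.map calls the function once per element, and each element is bound to p in turn. A tiny sketch with a made-up stand-in for reviews (the three point values and the mean-centering lambda are assumptions about what the tutorial does):

```python
import pandas as pd

# Made-up stand-in for the `reviews` DataFrame from the tutorial.
reviews = pd.DataFrame({"points": [80, 90, 100]})
mean_points = reviews.points.mean()

# map() calls the lambda once per value; each value is bound to p in turn.
centered = reviews.points.map(lambda p: p - mean_points)

print(centered.tolist())  # [-10.0, 0.0, 10.0]
```

The result is a new Series of the same length; reviews.points itself is not modified.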
Hello, I am trying to perform an NLP task using BERT
I am working with a few variations of the same dataset and I want to find which combination of dataset variant + hyperparameters yields the best results
Though, how should I go about this???
I am using a decent set of hyperparameters right now, so should I first narrow down which dataset is the best using this working set of hyperparameters
and running the model on all the datasets? (or does this not effectively determine the right dataset variant to use?)
Or...
Should I go the long way and test various combinations of hyperparameters on EVERY dataset?
@serene scaffold

what task are you trying to perform, and what is the topic of the data?
label electronic health records (text documents) -- binary classification -- the important sentiment within these docs (which ive truncated to 512 tokens a piece) is contained within a few sentences in each
the truncation did not affect the sentiments
relatively small dataset of 2000 docs
using batch size 16 + bert base rn... which uses 15.5/16 GB of GPU memory
not yet sure if bert large affects GPU memory
probably does
using 4 epochs
but bert large is apparently much better on small datasets
I'm gonna make a snake AI with genetic algorithm (or something like it), and I'm trying to plan out the nn. Ideally I just have binary values for inputs, but I'm struggling to figure out how I can fit data about the snake's body in a binary value. Any suggestions on how to do this?
Hey @onyx coyote!
It looks like you tried to attach file type(s) that we do not allow (.zip). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.
Feel free to ask in #community-meta if you think this is a mistake.
@serene scaffold any idea sir?
Hey! Has anyone worked with LabelMe for annotating their dataset? So I have already annotated a few images in my training set using the label list now I want it to automatically annotate images in my test set using the label list I created, how should I do that? (Its my first time working with it so I'm kind of lost xD)
Share
Never done this, but you could make a w*h matrix representing 0 for empty cell and 1 for snake body
Like
0 1 0 0
0 1 0 0
0 1 0 0
0 1 1 1
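A minimal sketch of building that grid as a flat binary input vector, assuming the body is stored as a list of (row, col) cells (the body_grid helper and the coordinates are hypothetical):

```python
import numpy as np

def body_grid(body, width, height):
    """Return a flat 0/1 vector with a 1 for every cell the snake occupies."""
    grid = np.zeros((height, width), dtype=np.int8)
    for row, col in body:
        grid[row, col] = 1
    return grid.ravel()

# Hypothetical snake matching the 4x4 picture above, as (row, col) cells.
body = [(0, 1), (1, 1), (2, 1), (3, 1), (3, 2), (3, 3)]
inputs = body_grid(body, width=4, height=4)

print(inputs.reshape(4, 4))
```

That gives w*h binary inputs for the body; head and food positions would need their own inputs.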
by hand
split the dataset into folds, use some lib to parallelize hyp search (I personally used RayTune) and dataset trials, for a simple method just do ordered iteration over each fold with a random selection of hyperparameters over n trials, where n is a reasonable value
it's gonna take a lot of time, but there's no case where it wouldn't
guys my model is suffering from high bias , i want to change the tolerance from 0.5 to something else for logistic regression, anyone know how to do it ?
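If "tolerance" here means the 0.5 decision threshold, one common approach is to threshold predict_proba yourself instead of calling predict. A sketch on synthetic data, with a hypothetical THRESHOLD of 0.3; note this only moves the cut-off and does not by itself fix an underfitting (high bias) model:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data, only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

THRESHOLD = 0.3  # hypothetical value; predict() behaves like a 0.5 cut-off

proba = model.predict_proba(X)[:, 1]          # probability of class 1
custom_pred = (proba >= THRESHOLD).astype(int)

print(custom_pred.sum(), model.predict(X).sum())
```

Lowering the threshold labels more samples as positive, which trades precision for recall.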
Hello guys! hope you are all well.
I am following this layoutlm (document layout model) tutorial:
https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb
In the inference section, the number of words detected with pytesseract (164 in the notebook) is less than the number of word level predictions (229 in the notebook) even after eliminating the special tokens. They are supposed to be equal. Do any of you have an idea about this ?
I need some help on pytorch
I'm trying to train a pretrained resnet in pytorch
I'm trying this code here:
but when I run it I get this error:
This is my model:
I understand its related to my input shape but I dont understand how I can change it to work
so I need some help here
Does anyone know what the cause of a "contour levels must be increasing" error is?
why are you inputting 1, 64, 3, (224, 224)
how would you approach the challenge of generating studio quality audio files through phone recordings? basically de-noising, fixing tune, adding reverb and sound fx as appropriate? The Idea I had in my head was creating a dataset of raw vocals and using a gan trained on the attributes that constitute the average of that dataset's parms, then I'd be creating a model with the goal of outputting a passable audio file from input audio. Would this count as a gan, what's the best approach to this problem?
its obviously not a gan
and check out some of the already available models out there. They work pretty well too
Why is my kernel shutting down when running the third code block? I am only getting the GPU error, ignore if on CPU
because you are using huggingface
I am going to try to do this in a venv and see if that is better
I would use longformer not with hugging face but this seems very nice to use
I am watching my cpu it doesnt seem over clocked unless its happening really quickly
what does overclocking have to do with anything lol
Thanks, I want to use the longformer model, hopefully I can somewhere else
You stated out of vram
Thats what I mean
try reducing model size
ohh, you are inferencing
what's your system and GPU specs?
anyone know a ML model that generates studio quality recordings from poor quality recordings?
r u also executing the 2nd cell? try doing without that. the aim is to conserve your memory
it's too complicated for a simple GAN
you might be able to do it with reasonable quality, but without actual research I heavily doubt it
also I assume it'll try to make my audio input fit into the distribution curve of the dataset.
generate audio samples for training
it would have a discriminator as well where poor audio samples are excluded pushing for better quality generation, alteration and synthesis
I could give ratios to the G and D
where I could control them based on the quality I want against information retain-ability.
very doubtful it would work
you can try ofc, but don't blame me if it doesn't work
because 100% it won't
I actually think it will
Imagine this
I take a studio quality wav file, compress it, apply random noise using an algo and change it to mp3 format.
the AI's job is to reconstruct the studio quality wav given a bad mp3.
someone suggested I take an unsupervised approach where I let the AI define the integral and the values that make a sample studio quality.
Sounds like worth a shot. You may want to consider how exactly you're creating the poor quality mp3 though, it should ideally be representative of the actual poor quality data you get.
Hey can anyone recommend a good book for ensemble learning?
if only it were that simple lol
that's another issue..
I think I need to train an AI to train the AI
currently working on a program that would provide the solution to a rubiks cube from an image, does anybody know of a dataset that contains a bunch of images of rubiks cubes? knowing that i would possibly have to annotate them individually
(or if someone knows a better solution where i could train an AI to recognize all colors on a face of the cube)
at some point the signal to noise ratio will definitely be too low
i could stop adding noise once the signal to noise ratio gets lower than a certain threshold.
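A rough sketch of that degradation step, assuming white Gaussian noise and a sine wave as a placeholder for real studio audio. The add_noise helper and the MIN_SNR_DB floor are hypothetical, and real phone recordings will have other artefacts (compression, clipping, room sound) that this does not model:

```python
import numpy as np

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so the result has roughly the requested SNR."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 16000, endpoint=False)
clean = np.sin(2 * np.pi * 440.0 * t)  # placeholder "studio" signal

MIN_SNR_DB = 10.0  # hypothetical floor below which no more noise is added
noisy = add_noise(clean, snr_db=MIN_SNR_DB, rng=rng)

# Measure the SNR actually achieved, in dB.
measured = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(measured, 1))
```

Pairs of (noisy, clean) generated this way could serve as inputs and targets for a supervised model.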
So I'm a little confused about how many weights there are. In this image, each input of one layer has a connection with a weight to each to the next. So for example, the first layer of weights should be 784 x 16. However, whenever I see it explained, the number of weights just matches the number of inputs. Why is this?
Hello everyone I'm not sure how I can feed in my images of size 224 x 224
I have images with shape 224 * 224 * 3
but im not sure how I can use this image size as the input shape
for this conv2d:
When we have a dataset, what should we do first for analysis: scale the data or split the data?
I would say you scale the data first, try more than one method, for example: try StandardScaler, or try MinMaxScaler, or try PowerTransformer
Keep in mind that not all ML algorithms get affected by the data that are not scaled
Of course, by saying splitting data, you mean to split the data into training and testing sets, not features and label
why do many sources say to split the data first? I'm really confused about that
why would you split the data, and then scale it ???
if you do this, you would write the scaling code twice for no good reason
new_train[x] = scaler.fit_transform(train_data[[x]])
new_test[x] = scaler.fit_transform(test_data[[x]])
or
new_data[x] = scaler.fit_transform(data[[x]])
x is the numeric columns, if you are asking
what do you think about that?
read this
I already said that you don't split the data into training and testing sets, but you must split the data into features and label, because you must not scale the data you want to predict
Does it mean that when I split the dataset I must scale the training set with 'fit_transform()' and then scale the test set with 'transform()' ?
yes
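A short sketch of that fit_transform() / transform() pattern with scikit-learn's StandardScaler, on made-up data. Fitting the scaler on the training rows only means no statistics from the test rows leak into it:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))  # made-up features
y = rng.integers(0, 2, size=100)                     # made-up labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean/std from train only
X_test_scaled = scaler.transform(X_test)        # reuses the train statistics

print(X_train_scaled.mean(axis=0).round(6))
```

The labels y are left unscaled, matching the point above about not scaling what you want to predict.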
but in the pic you sent, the independent variables are the features you have, and the dependent variable is the value you want to predict
is there a different method for different cases?
let me make it easy for you, follow the steps in the picture you sent, then you can split the data into training and testing sets
I already did, and then?
and then you can split the data into training and testing sets and use any ML algorithm to complete your task
Can this method be used for all Machine Learning models?
@narrow dagger
for any tabular data sets such as CSV files
or even Json files, you can convert them into CSV files
for all Machine Learning models, like KNN, Decision Tree, SVM, etc?
yes
Ok thank u very much!!!
Happy to help
What's the method when I scale the data first? @narrow dagger
you mean the code ?
yes
I am following this tutorial
https://www.kaggle.com/somaktukai/credit-card-default-model-comparison/notebook#Classification-of-Taiwan-Credit-Card-Payment-Default-Prediction
Credit Card Default- Model Comparison
Explore and run machine learning code with Kaggle Notebooks | Using data from Default of Credit Card Clients Dataset
And when I run it, I changed what classifiers to use and my code only does knn any tips on fixing this?
The weights of a dense layer are a matrix of shape (number_of_outputs, number_of_inputs), yes.
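To make the count concrete, a small sketch with the 784 -> 16 layer from the question (zeros are used as placeholder weights, and the (outputs, inputs) layout is just one convention):

```python
import numpy as np

n_inputs, n_outputs = 784, 16  # sizes from the question above

# One weight per (output, input) pair, plus one bias per output.
W = np.zeros((n_outputs, n_inputs))
b = np.zeros(n_outputs)

x = np.ones(n_inputs)  # a single flattened input
y = W @ x + b          # shape (16,)

print(W.size, b.size, y.shape)  # 12544 16 (16,)
```

So the first layer has 784 * 16 = 12544 weights, not 784.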
reshape each image to be 1* 224 * 224 *3
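A sketch of that reshape in NumPy, using a zero array as a placeholder image. The second line is an extra assumption for PyTorch, whose Conv2d layers expect channels first, i.e. (N, C, H, W):

```python
import numpy as np

image = np.zeros((224, 224, 3), dtype=np.float32)  # placeholder image

batch = np.expand_dims(image, axis=0)           # (1, 224, 224, 3), channels last
batch_nchw = np.transpose(batch, (0, 3, 1, 2))  # (1, 3, 224, 224), channels first

print(batch.shape, batch_nchw.shape)
```

Which of the two layouts is needed depends on the framework the conv2d comes from.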
Hello All, here is a short introduction to Data Visualization using FastAPI & EasyCharts, let me know what you think. https://joshjamison.medium.com/data-visualization-using-fastapi-and-easycharts-493eda3b1f3d
which platform is best for learning machine learning and ai???
I recommend this book: https://www.goodreads.com/book/show/25407018-data-science-from-scratch
If you attend a university, you might be able to read it online for free
wrong link, try this one: https://www.oreilly.com/library/view/data-science-from/9781492041122/
hi can i ask please
i would like to locate datamatrix 2d code from image
currently i'm eroding the code and finding its contour based on area
i'm wondering if there is something like a feature finding method in cv2
ideally a sample code to accompany the answer
i have two data frames
df1 with two columns Makes and Models, 100 rows
df2 with two columns Models and Body type, 50 rows
i want to merge the two dataframes on the Models column in order to get this result:
makes/models/bodytype with 100 rows (same rows as df1)
it is like df2 being the dictionary for df1: each time we have a match, take the value from the Body type column in df2.
Maybe you can try pandas.merge with how=left
i did that, but i got a lot more rows, it looks like repeated lines
when i deleted the repeated rows from df2, i got a result with 100 rows. but i am looking for a better solution please
the two dataframes don't have the same length or the same row order
Maybe you can post your dataframe
Hi guys, I need some help. I want to run a sentiment analysis on tweets that haven't been classified yet. I'm new to python, and I've been trying to use bert to do an unsupervised sentiment analysis and I want to know if this is a good option? Or are there easier options out there?
i am preparing it, thank you for your time
@raw temple you can use BERT for sentiment analysis, however, you need to fine-tune it for your use case. Also, if you don't have a training/test set, then you won't be able to tell how your model performs
Hi @acoustic forge , so I have some data that I've scraped off twitter. Since they don't have labels to begin with, can I still split it into a test/training set and go from there?
No, if they don't have labels, how would you test them? You don't know whether your model predicted correctly or not
What you can do, is manually label a portion of them OR find some already labelled twitter data
Okay, I got you. I was quite confused for a while since all the tutorials out there had prelabelled data and I didn't know what to measure mine against since they weren't labelled 🤣
You can train your model on already labelled twitter data, there should be plenty of that
Okay, and if it runs well then I can run it on my own data?
Yeah, well, you can check the performance on that data, then you can evaluate whether you are satisfied with that performance
You know the metrics of natural language processing?
You mean like the accuracy?
Precision, Recall, Accuracy, F1 score and the ROC curve / AUC
Don't rely on accuracy alone, if you have skewed data, the accuracy might be 99%, but that's not really representative
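To make the point about skewed data concrete, here's a small sketch with made-up labels: 99 negatives, 1 positive, and a "model" that always predicts the majority class. The metrics are computed by hand so nothing beyond plain Python is assumed; precision and recall are set to 0 when their denominator is 0, which is a convention chosen for this sketch.

```python
# Hypothetical skewed data: 99 negatives and a single positive.
y_true = [0] * 99 + [1]
# A useless model that always predicts the majority class.
y_pred = [0] * 100

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Accuracy looks great here while recall and F1 show the model never finds the positive class.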
Okay, I will take note of that. I have a lot to look into it seems 🤣
First, get comfortable cleaning and splitting data into train/test sets. Then try to fit some basic models (RandomForestClassifier, NaiveBayes etc) and see how they perform.
Okay, I'll do that, thanks for your advice and help. I shall pop back if I have any more questions ☺
Good luck!
Hi guys i have uploaded above my issue with merging two dataframes
the merged table should have the same number of rows as the dfnissandata dataframe (598 rows)
as far as i know, removing duplicates is necessary.
before merging?
yeah
removing duplicates is easy with pandas, it's df[~df.duplicated()]? I am not sure.
or df.drop_duplicates()
I have forgotten SQL, so there might be other solutions without removing duplicates.
thank you both, is it possible to merge on multiple columns ?
i dont have a solid knowledge on that. thats why i cant understand why i got duplicates
Because you had duplicate keys in one of your dataframes beforehand. When it merges, it creates a new row for each matching duplicate
yeah that's why when i manually dropped the duplicated rows i got the results i wanted
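A small sketch of why the row count grows and one way to keep it fixed. The frames below are made-up stand-ins for the ones in the question. The idea is to deduplicate the lookup frame on the key before merging, and to pass `validate="many_to_one"` so pandas raises an error if the right-hand keys are not unique.

```python
import pandas as pd

# Hypothetical stand-ins for the two frames in the question.
df1 = pd.DataFrame({
    "Makes": ["Nissan", "Nissan", "Nissan"],
    "Models": ["Micra", "Juke", "Micra"],
})
df2 = pd.DataFrame({
    "Models": ["Micra", "Micra", "Juke"],  # "Micra" appears twice
    "Body type": ["Hatchback", "Hatchback", "SUV"],
})

# Naive left merge: every Micra row in df1 matches both Micra rows in df2.
naive = df1.merge(df2, on="Models", how="left")

# Deduplicate the lookup on the key, then merge; validate guards the assumption.
lookup = df2.drop_duplicates(subset="Models")
merged = df1.merge(lookup, on="Models", how="left", validate="many_to_one")

print(len(df1), len(naive), len(merged))
```

If the duplicated keys in df2 carry different body types, dropping duplicates silently keeps the first one, so it's worth checking which rows get dropped.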
can i include a conditon with the merge function ?
for example: Merge on Column X if the Column Y has ' value '
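A hedged sketch for both questions, using made-up frames and column names. Merging on several columns works by passing a list to `on`. `merge` itself has no condition argument as far as I know, so one approach is to merge normally and then blank out the merged values on rows that fail the condition.

```python
import pandas as pd

# Merging on multiple columns: pass a list of column names.
cars = pd.DataFrame({"Makes": ["Nissan", "Nissan"], "Models": ["Micra", "Juke"]})
specs = pd.DataFrame({
    "Makes": ["Nissan", "Nissan"],
    "Models": ["Micra", "Juke"],
    "Body type": ["Hatchback", "SUV"],
})
multi = cars.merge(specs, on=["Makes", "Models"], how="left")

# Conditional merge: only keep the merged value where Y equals "value".
left = pd.DataFrame({"X": ["a", "b", "c"], "Y": ["value", "other", "value"]})
right = pd.DataFrame({"X": ["a", "b", "c"], "Z": [1, 2, 3]})

merged = left.merge(right, on="X", how="left")
# Series.where keeps values where the condition holds and puts NaN elsewhere.
merged["Z"] = merged["Z"].where(merged["Y"] == "value")

print(multi)
print(merged)
```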
When I read my Excel file in my Jupyter Notebook some columns are coming back NaN and I want to fix this. Any suggestions?
Hi, has anyone here installed and worked with vadersentimentGER? I am having trouble finding and installing the package locally on my mac term as well as via PyCharm.
Hi all,
I was wondering what everyone's thoughts are on tracking feature and overall model performance throughout development and production. Things can get pretty convoluted over time and I was wondering if anyone has any tips for how to continuously keep track of how well features do, what features are being used, and model performance over time.
Currently I am using a CatBoost model that is in purgatory between full production and development with ongoing feature engineering and I need a better way to keep track of the models metrics.
Anything helps,
Thanks!
Hello guys. When I apply an oversampling strategy to the minority class of a dataset to re-balance it, can I argue that it is founded on the premise of the Bias-Variance Tradeoff? When I train a classifier on an imbalanced dataset it tends to be biased towards the majority class. By applying an oversampling strategy (say, SMOTE for example) and training a model with it, I will obtain a classifier with lower bias, but in exchange it will be relatively higher in variance (compared to when trained with the original dataset), correct? The expectation is that the reduction in bias will be greater than the increase in variance, improving the classifier. Intuitively, I feel that way, but I can't find a specific reference in the literature for it.
no. the "bias variance tradeoff", as stated, applies to the tendencies of the model architectures, not the data.
so, simply forget the phrase, and just talk about oversampling when you oversample, when changing the distribution of your dataset
Is there any way to convert an Excel sheet in which some rows share the same column values into nested json??
So far I have managed to get a plain json object which has multiple keys with the same value; I want it in nested form. Is there any way to do this?
I have been using pandas, openpyxl and excel2json but I haven't got the desired output yet...
take the plain json and then just write logic to create it into your desired format, no?
remember, you can always write code to get the data into whatever shape you want
The iteration will be way too much
Will that be feasible?
There are multiple nesting levels
of course. machines are rather good at this stuff after all. how many rows are we talking about here, and roughly how many levels
Nesting level 5
Rows 8 or 9
Columns 10 or 11,000
Btw, I often get confused in rows and columns
The vertical depth is 10 or 11,000
I tried as you said but I am getting the same flat json structure as before
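One way the "write logic to reshape it" suggestion could look, as a sketch only. The records and key names below are invented; with pandas, flat records like these can be obtained from a sheet via `DataFrame.to_dict("records")`. The hypothetical helper `nest` groups by the first key and recurses on the rest, so five nesting levels just means passing five keys.

```python
import json

# Hypothetical flat records, one dict per spreadsheet row.
rows = [
    {"region": "EU", "country": "DE", "city": "Berlin", "sales": 10},
    {"region": "EU", "country": "DE", "city": "Munich", "sales": 7},
    {"region": "EU", "country": "FR", "city": "Paris", "sales": 12},
    {"region": "US", "country": "US", "city": "Boston", "sales": 5},
]

def nest(records, keys):
    """Group records by keys[0], then recurse on the remaining keys."""
    if not keys:
        return records
    first, rest = keys[0], keys[1:]
    groups = {}
    for rec in records:
        # Drop the grouping key from the record; it becomes the dict key instead.
        remainder = {k: v for k, v in rec.items() if k != first}
        groups.setdefault(rec[first], []).append(remainder)
    return {value: nest(group, rest) for value, group in groups.items()}

nested = nest(rows, ["region", "country"])
print(json.dumps(nested, indent=2))
```

Each row is visited once per nesting level, so around 11,000 rows with 5 levels is a small job.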
I am having a problem setting up the longformer model.
Where does the downloaded, extracted model need to be on my computer? I am getting "no longformer model found", even though the conda environment runs
instructions documentation: https://github.com/allenai/longformer#how-to-use
How to use
- Download pretrained model
longformer-base-4096
- Install environment and code
conda create --name longformer python=3.7
conda activate longformer
conda install cudatoolkit=10.0
pip install git+https://github.com/allenai/longformer.git
- Run the model
import torch
from longformer.longformer import Longformer, LongformerConfig
from longformer.sliding_chunks import pad_to_window_size
from transformers import RobertaTokenizer
code here.....
does anyone have experience with reading data from a PostgreSQL table into a Pandas dataframe?
on an unrelated note: how do you get your dataframe to have a sans serif font like this?
perhaps more appropriate for #editors-ides i know
do u know how to make an image search on google? like, any api or something?
like, u upload the image, and google says what it is
okay i've found pytineye, but is there a free version of it?
I have a dataframe like this and want to make a KDE plot to show the intersection of the known and unknown classes. I tried this but it doesn't show the right graph
result.plot.kde()
!paste please give the data in a format we can use to replicate what you're doing, namely a csv
Pasting large amounts of code
If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pydis.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
does anyone know an alternative to matplotlib that is lightweight i only need to draw a line graph from data stored in an sqlite table. TIA
Dont worry about lightweight. If you're thinking about ease of use try seaborn which is a wrapper around matplotlib
okay
thanks
guyz can anyone get me started in machine learning
do youz know y = mx + c ?
whom are u asking?
You, probably, that is one of the basics of math behind the ML.
its the gradient slope right?
start from simple math... i guess
because every ML theory today comes from the math we learn in high school or college, such as basic algebra, matrix calculations (which are the backbone of CNNs) and calculus
plus business acumen... even though the business side doesn't care about the technical math i mentioned before
also statistics.
Does anyone have a book recommendation on ethical AI?
I am not looking for code.
Something similar to the clean coder / the pragmatic programmer
I have an image but it's not displaying for some reason in python using imshow
thats the image
the image might look black but it has some dots in it
if you zoom in just enough you might be just able to see it
i think when we talk about ethics in AI... it always means how AI relates to human beings... like the ethical aspects of AI implementation, such as bias on a specific gender or age or race, or other negative impacts of AI implementation
clean code or pragmatic programming talks more about the technical side of implementation, not about ethics in AI implementation imho
https://arxiv.org/ftp/arxiv/papers/1903/1903.03425.pdf @olive raven maybe this is a good start to read about Ethics in AI
Yeah I know, I wanted to imply that I want something similar that doesn't show code, but talks about it in general
I will take a look into this, thanks!
you're welcome
anyway, i got my kde plot working, but the problem now is how to pick a threshold based on the plot. I want the minimum value that can separate unknown and known data: if a result is greater than the threshold i will discard it, if it is less i will accept it. The context is the distance from known and unknown data to the dataset.
Let P1 be the cumulative distribution function (CDF) of the known data, P2 that of the unknown data. Let's say you take the cutoff point x (data below that is considered known, above unknown). Then you classify P1(x) of known data as known, 1-P1(x) of known data as unknown, 1-P2(x) of unknown data as unknown, and P2(x) of unknown data as known
the question is what the costs of the misclassifications are
if the cost of misclassifying known data as unknown is a, and the cost of misclassifying unknown as known is b, then the average loss will be (1-P1(x))*a + P2(x)*b.
You take the x that minimizes this function, depending on your a and b.
(this is often called the loss matrix)
if one is much more costly than the other (say, if you're detecting cancer, wrongly suggesting that some patient might have it is significantly less costly than missing a case), you'd want to minimize this kind of errors, even if that means increasing the amount of the other kind, say.
If you don't have any assumptions about the cost, assume a=b, say. Then you need to minimize 1-P1(x) + P2(x), which can be done numerically.
EDIT: well, actually not much numerical stuff needed, even. The minimum of (1-P1(x))*a + P2(x)*b is when p1(x)*a = p2(x)*b, where p1 and p2 are the probability densities. For the case of a=b, that means that the cutoff point should be where the pdfs intersect - at around 1.0 on your plot.
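A rough numerical sketch of the cutoff rule above. Everything about the data is an assumption: the two samples are synthetic Gaussians standing in for the known/unknown distances, P1 and P2 are estimated with empirical CDFs rather than the KDE from the plot, and the minimum of (1-P1(x))*a + P2(x)*b is found by a plain grid search. `ecdf` and `best_cutoff` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical distances: known samples sit closer to the dataset than unknown ones.
known = rng.normal(loc=0.5, scale=0.2, size=2000)
unknown = rng.normal(loc=1.5, scale=0.3, size=2000)

def ecdf(sample, x):
    """Empirical CDF: fraction of the sample that is <= x."""
    return np.searchsorted(np.sort(sample), x, side="right") / len(sample)

def best_cutoff(known, unknown, a=1.0, b=1.0):
    """Return the grid point x that minimizes (1 - P1(x)) * a + P2(x) * b."""
    lo = min(known.min(), unknown.min())
    hi = max(known.max(), unknown.max())
    grid = np.linspace(lo, hi, 1000)
    loss = (1 - ecdf(known, grid)) * a + ecdf(unknown, grid) * b
    return grid[np.argmin(loss)]

x_equal = best_cutoff(known, unknown)                    # a = b
x_cautious = best_cutoff(known, unknown, a=1.0, b=10.0)  # accepting unknowns is costly

print(x_equal, x_cautious)
```

Raising b makes accepting unknown data more expensive, so the cutoff moves towards smaller distances.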
can anyone tell me how add columns of data live from the api in json python
Do you guys have any recommended reading material for RNNs?
Hi, I'm searching for an idea on what to do for my master's thesis. It should have something to do with measurements or electrotechnics in general. Any suggestions appreciated :D
How can I represent the following numpy array as x,y coordinates? https://paste.pythondiscord.com/iqujetociy.yaml similar to (1, 224, 224, 3)
this looks like a 3d array to me, though
ahahaha i am trying now
thanks
deepface/DeepFace.py line 789
, enforce_detection = enforce_detection)[0] #preprocess_face returns (1, 224, 224, 3)
You're right, thanks a lot
the first element usually represents the batch size. Here, it just means there is 1 image. if it was 16,224,224,3, it would mean u have 16 images
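A quick numpy sketch of the batch axis, using a blank made-up image rather than real data. It shows adding the leading axis to get (1, 224, 224, 3), stacking several images into one batch, and indexing `[0]` to get a single image back, like the `[0]` in the line quoted above.

```python
import numpy as np

# Hypothetical single RGB image: height 224, width 224, 3 channels.
image = np.zeros((224, 224, 3), dtype=np.float32)

# Add a leading batch axis: one image in the batch.
batch_of_one = np.expand_dims(image, axis=0)

# Stack 16 images along a new first axis: a batch of 16.
batch = np.stack([image] * 16, axis=0)

# Indexing the batch axis returns a single image again.
first = batch_of_one[0]

print(batch_of_one.shape, batch.shape, first.shape)
```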
Hello, I needed some ideas for a simple ML project, can be based on anything, just to get hang of the field. Please help me out.
Need some help with matplotlib. I am using plt.scatter to draw dots for a matplotlib animation. How do I adjust the size of dots such that I can define the radius of a dot in relation to the x/y scale.
That is, say my units are meters in my calculations and I wish to draw a dot that is 0.2 meters in radius
is there a completely random dataframe method using only pandas? I get that I can do this with both numpy and pandas via df = pd.DataFrame(np.random.randn(5, 5)) but I'd rather not import another library just to do this.
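For what it's worth, pandas itself depends on NumPy, so importing numpy doesn't add a new dependency. If the goal is just to avoid the extra import line, a sketch with the standard library's `random` module works too; the 5x5 size is copied from the question.

```python
import random

import pandas as pd

rows, cols = 5, 5
# random.gauss(0, 1) draws from a standard normal, like np.random.randn.
df = pd.DataFrame(
    [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]
)

print(df)
```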
I would like to examine the correlation between a nerve's diameter and how fast it gets damaged after being subjected to physical strain. The damage is expressed as a change in the nerve's amplitude over 7 measurement points. E.g.:
```
   before strain  after strain  ...  strain 8 min  strain 10 min
0            1.8           2.3  ...           0.0            0.0
1            3.4           3.2  ...           0.2            0.3
2            5.5           4.5  ...           0.1            0.0
3            4.1           4.1  ...           0.4            0.0
4            6.8           4.0  ...           0.0            0.5
```
How would I statistically prove that (non-normal data distribution)? I can't just do
```py
from scipy.stats import spearmanr
spearmanr(diameter, amplitudes)
```
Since the damage isn't expressed in a single column with yes/no but rather in the development of the values of 6 different columns in relation to the first column.
Just as a heads up: My statistical knowledge is basically 0.
should i learn numpy and pandas before i start learning tensorflow?
basics of numpy would be great, but pandas wouldn't be too necessary. it depends on the kind of stuff you wanna do
where would pandas be used
also 1 more question, is there even any programming in ai? because from what ive read most ai are like 100-200 lines
you have read wrong - if you use libraries then sure, your model is done in 100-200 lines. but in reality it's using more like 2000-5000 lines which are hell to debug (unless the lib is written pretty well, which is the case for TF and PT).
you do require heavy programming experience. you can read 100-200 lines but I am sure you won't be able to understand a good chunk unless you have some idea of the underlying theory.
mostly dataframes and tabulated data
Hey you wonderful data science / ai people! I have an issue, and even a solution from Stack Overflow- but I dont know how to implement it for a GAN I'm using.
Everything works until an error at the end:
ValueError: Cannot feed value of shape (50, 128, 128, 4) for Tensor 'inputs_real:0', which has shape '(?, 128, 128, 3)'
The solution seems to be here https://stackoverflow.com/questions/45966301/tensorflow-cannot-feed-value-of-shape-100-784-for-tensor-placeholder0
My google colab notebook: https://colab.research.google.com/drive/1AHWQkqdMBur2l3lMdVuxB5xOyRQmkZ5L?usp=sharing
It's the very last codeblock
Any help would be just wonderful
While current Deep Learning is simply just math and logic, programming is something that you should know at a decent level unless you are in research
in industry, your programming skills have to be absolutely tip-top to create a good product
however, if your aim is research then you wouldn't need as much programming skill - more so mathematical knowledge
I can't open the notebook, but it seems from the error that you might have forgotten specifying the batch size
i know programming im just hesitating on learning ai because writing 100 lines to make an ai is just boring
i thought it would be more challenging
Weird thing is, there is a batch size. I can print it as an int before passing it through.
in what way? it's extremely challenging if you are the person making them, not copying what others have created 🤷
that happens in like....every field really lol
in what sense?
can you post the offending code here as a codeblock?
sure!
Here's the last two code blocks that seem to be the most relevant:
BASE_PATH = "/content/drive/My Drive/projects/facegen"
print(os.listdir(BASE_PATH))
DATASET_LIST_PATH = BASE_PATH + "/10k.txt"
print(DATASET_LIST_PATH)
INPUT_DATA_DIR = "/content/drive/My Drive/projects/facegen/dataset/cartoonset10k/"
print(INPUT_DATA_DIR)
OUTPUT_DIR = '/content/drive/My Drive/projects/facegen/results/'
#MODEL_PATH = BASE_PATH + "models/" + "model_" + str(EPOCH) + ".ckpt"
DATASET = [INPUT_DATA_DIR + str(line).rstrip() for line in open(DATASET_LIST_PATH,"r")]
print(DATASET)
DATASET_SIZE = len(DATASET)
print(DATASET_SIZE)
MINIBATCH_SIZE = DATASET_SIZE // BATCH_SIZE
#DATASET_SIZE = np.reshape(DATASET_SIZE, [0, IMAGE_SIZE, IMAGE_SIZE, 3])
# Training
#data_shape = (DATASET_SIZE, IMAGE_SIZE, IMAGE_SIZE, 3),
#data_shape = tf.placeholder(tf.float32 , [None, IMAGE_SIZE, IMAGE_SIZE, 1]),
#data_shape = tf.reshape(data_shape , [-1, IMAGE_SIZE, IMAGE_SIZE, 3]),
#data_shape = tf.reshape(data_shape , [-1, IMAGE_SIZE, IMAGE_SIZE, 3])
#print(data_shape.shape)
with tf.Graph().as_default():
    train(data_shape=(DATASET_SIZE, IMAGE_SIZE, IMAGE_SIZE, 4),
          epoch=EPOCH,
          checkpoint_path=None)
the comments are me trying to implement the solution
Do you know any valuable alternatives to training word embeddings in NLP? There are Word2Vec, GloVe etc.
But are there any other approaches, encodings and so on, which might give an advantage?
Feel free to @ me. Thanks guys!
model embeddings - like from NLP models like BERT, RoBERTa etc.
notice the 4
(50, 128, 128, 4) .... which has shape (?, 128, 128, 3)
I don't know how you even got 4-channel data, assuming the data is an image
if it's not RGB, then convert it to RGB
you're right I just had that revelation in a help channel I had opened
I think it has to do with the image channels?
The guy who wrote the script actually got back to me with this: the above (error )implies that you have a mismatch in the last dimension responsible for the number of channels in an image.
yeah, the error simply implies dimension mismatch, not in any specific place (like the "last" dimension)
interesting
perhaps you can remove a dim from all the images then
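A sketch of dropping the extra channel, under the assumption that the fourth channel is an alpha channel (typical for RGBA PNGs) and the arrays are channels-last like in the error message. The random batch below is made up to match the (50, 128, 128, 4) shape.

```python
import numpy as np

# Hypothetical batch matching the error message: 50 images, 128x128, 4 channels.
batch_rgba = np.random.rand(50, 128, 128, 4).astype(np.float32)

# Keep the first three channels (R, G, B) and drop the fourth (assumed alpha).
batch_rgb = batch_rgba[..., :3]

print(batch_rgb.shape)

# If the images are loaded with Pillow, converting at load time does the same job:
# Image.open(path).convert("RGB")
```

After this, the last dimension matches the 3 that the `inputs_real` placeholder expects.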
@grave frost https://www.youtube.com/watch?v=qFJeN9V1ZsI&t=1527s
This course will teach you how to use Keras, a neural network API written in Python and integrated with TensorFlow. We will learn how to prepare and process data for artificial neural networks, build and train artificial neural networks from scratch, build and train convolutional neural networks (CNNs), implement fine-tuning and transfer learnin...
should i start with this?
here's a primer for ML beginners on DS libraries: https://colab.research.google.com/github/yandexdataschool/Practical_RL/blob/coursera/week1_intro/primer/recap_ml.ipynb
well, actually it's from an RL course, but nevertheless it's a primer on numpy, pandas, matplotlib, and sklearn
Hey thank you for the resources
You did really well for not seeing most of the code, thanks for your help
so i should complete that, then watch the vid i sent?
I suggest you take things at your own pace rather than doing a 3 hours free bootcamp which won't teach anything
question, when running a one way anova, do you use all observations or take a random sample of observations?
So this might be up this channel's alley. I have a 20mb text file full of speakers. Format is like this:
MAIN SPEAKER: LOREM IPSUM DOLAR
LOREM IPSUM DOLAR AMET
LOREM IPSUM DOLAR AMET
SPEAKER TWO: LORE IPSUM DOLAR AMET
MAIN SPEAKER: LOREM IPSUM DOLAR AMET
SPEAKER THREE: LOREM IPSUM
LOREM IPSUM
LOREM IPSUM
I basically need to get everything the "main speaker" is saying. I'm trying to come up with a regular expression, or python way to capture it. I know I can do ^SPEAKER TWO: and remove a single line pretty easily that way with Notepad++
But I can't find anything that will span multiple lines and remove them UNLESS those multiple lines belong to the MAIN SPEAKER:, up until the next speaker (SPEAKER TWO, etc) starts talking
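A line-by-line sketch in Python instead of one big multi-line regex. It assumes a speaker label is a run of uppercase letters and spaces at the start of a line followed by a colon, as in the sample above; `lines_for` is a hypothetical helper name. It remembers who is currently speaking and keeps a line only while that is the wanted speaker.

```python
import re

text = """MAIN SPEAKER: LOREM IPSUM DOLAR
LOREM IPSUM DOLAR AMET
LOREM IPSUM DOLAR AMET
SPEAKER TWO: LORE IPSUM DOLAR AMET
MAIN SPEAKER: LOREM IPSUM DOLAR AMET
SPEAKER THREE: LOREM IPSUM
LOREM IPSUM
LOREM IPSUM
"""

# Assumption: a label is uppercase letters/spaces at line start, then a colon.
LABEL = re.compile(r"^([A-Z][A-Z ]*):\s*(.*)$")

def lines_for(speaker, text):
    """Collect every line spoken by `speaker`, including continuation lines."""
    kept = []
    current = None
    for line in text.splitlines():
        match = LABEL.match(line)
        if match:
            # A new label switches the current speaker; keep only the spoken part.
            current = match.group(1)
            line = match.group(2)
        if current == speaker:
            kept.append(line)
    return kept

main_lines = lines_for("MAIN SPEAKER", text)
print("\n".join(main_lines))
```

For the 20mb file, the same loop can run over the open file object instead of `text.splitlines()`. One caveat: a continuation line that itself starts with uppercase words and a colon would be mistaken for a new speaker.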
hi! short problem for you folks, what (or rather, why) on earth is this:
print(tf.keras.preprocessing.text.one_hot("a", 27, filters='', lower=True))
print(tf.keras.preprocessing.text.one_hot("a", 27, filters='', lower=True))
print(tf.keras.preprocessing.text.one_hot("n", 27, filters='', lower=True))
>>> [5]
>>> [5]
>>> [5] # ??? the same
i'm quite new to ai so i have no clue as to whether that's normal sadly
i can't imagine it's a bug but i'm definitely confused
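As far as I know, Keras's `text.one_hot` is not a true one-hot encoder: it hashes each word into an integer between 1 and n-1, and its docs note that the word-to-index mapping is not guaranteed to be unique. So two different letters landing on the same index is expected behaviour. The sketch below is not the Keras code, just the same idea rebuilt with `hashlib` (the helper `hashed_index` is hypothetical) to show that collisions are unavoidable once there are more tokens than slots.

```python
import hashlib
import string

def hashed_index(word, n):
    """Map a word to an integer in [1, n - 1] using a stable hash."""
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    return int(digest, 16) % (n - 1) + 1

n = 27
# 36 tokens but only 26 possible indices, so some tokens must share an index.
tokens = list(string.ascii_lowercase + string.digits)
indices = [hashed_index(token, n) for token in tokens]

collisions = len(tokens) - len(set(indices))
print(dict(zip(tokens, indices)))
print("tokens sharing an index with an earlier one:", collisions)
```

If a unique index per character is needed, an explicit vocabulary such as a dict from character to index avoids collisions entirely.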