#data-science-and-ml
it sounds like this is a homework assignment. is it?
At one point it was. I am re-using it for self-learning as I prepare for a new job this summer
Once I am able to figure out the structure of training I think I can take it from there!
Don't have too much experience using python classes
okay. it looks like you have all the pieces there, i recommend following the guide in the docs https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html
you don't need to know that much about python classes to use this, although writing your own classes benefits from some understanding of how they work
I can read the docs, however I am limited to numpy
the devs of pytorch have also written a book on deep learning which quickly goes over the basics too
numpy? this seems like pytorch
what do you mean by "limited to numpy"? you can't use pytorch without pytorch
Anyone here good with confirmatory factor analysis in Python? I am stuck on an issue in #help-falafel would appreciate help
Hey @analog kestrel!
It looks like you tried to attach file type(s) that we do not allow (.ipynb). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.
Feel free to ask in #community-meta if you think this is a mistake.
i was about to say, you won't be able to upload an ipynb file. export it to python with jupyter nbconvert --to script (you need to pip install nbconvert first)
i think jupyter also has an option in the menu to save as a plain python file
Hey @analog kestrel!
It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com
!paste @analog kestrel
Pasting large amounts of code
If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
paste the contents in that site ☝️
interesting, it looks like this is a from-scratch implementation of the torch "Module" interface
who made this?
and why?
it's hard to give any feedback on this because it's someone's entirely custom work
i see
i suppose you're expected to call forward, backward, and gradientStep yourself
yes^
do you know what those 3 things mean?
i hate the forward/backward jargon
"forward" means "generate a prediction at the current weight values"
"backward" means "compute the gradient at the current weight values" (which uses the output from the "forward" part)
yes, i understand conceptually how it works - i am having implementation issues using what was provided in the classes
and then "gradientStep" is the gradient descent weight update
looks like you are just expected to call forward, backward, gradientStep in a loop
how do I update the weights across training epochs?
it looks like the gradientStep method does that for you
look at how it is defined
MLP.gradientStep calls fc1 and fc2 gradientStep methods
so you look at those methods
fc1 and fc2 are instances of Linear, so look at Linear.gradientStep
and you can see that it does exactly what you want: it updates the weights and biases
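For reference, the forward / backward / gradientStep loop described above can be sketched with a tiny NumPy-only layer. This is a minimal sketch and not the assignment's code: the `Linear` class below, its signatures, the mean-squared-error loss, and the toy data are all assumptions made up for illustration. Only the three method names come from the discussion.

```python
import numpy as np


class Linear:
    """Hypothetical fully connected layer computing y = x @ W + b."""

    def __init__(self, in_features, out_features, rng):
        self.W = rng.normal(scale=0.1, size=(in_features, out_features))
        self.b = np.zeros(out_features)

    def forward(self, x):
        # Generate a prediction at the current weight values.
        return x @ self.W + self.b

    def backward(self, x, grad_output):
        # Store the gradients w.r.t. W and b for the next gradientStep call.
        self.grad_W = x.T @ grad_output
        self.grad_b = grad_output.sum(axis=0)
        # Return the gradient w.r.t. the input (what a previous layer would need).
        return grad_output @ self.W.T

    def gradientStep(self, lr):
        # Plain gradient descent update of weights and biases.
        self.W -= lr * self.grad_W
        self.b -= lr * self.grad_b


rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))                 # toy inputs
true_W = np.array([[1.0], [-2.0], [0.5]])
y = x @ true_W + 0.3                         # toy targets from a known linear rule

layer = Linear(3, 1, rng)
losses = []
for epoch in range(200):
    for start in range(0, len(x), 16):       # batches of 16
        x_batch, y_batch = x[start:start + 16], y[start:start + 16]
        y_pred = layer.forward(x_batch)
        losses.append(float(np.mean((y_pred - y_batch) ** 2)))
        grad = 2.0 * (y_pred - y_batch) / y_pred.size   # d(loss)/d(y_pred)
        layer.backward(x_batch, grad)
        layer.gradientStep(0.1)
```

The weights persist across epochs simply because `gradientStep` modifies `W` and `b` in place, so the next `forward` call already sees the updated values.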
Hello there i am wondering if there is a library that i can use to generate a search tree for example see the above picture
as the data is also a 2d array and has a path
thanks
@analog kestrel is this all you have? do you have any notes on backpropagation or usage examples?
@desert oar thank you for the assistance. i appreciate it
and yes - that is all I have
I think I need to play around a bit more
been stuck for a while though
strange behavior with the gradientStep function...
(this is for the 1st training iteration, and the first batch of 16)
looks like you put in the wrong inputs
although i was about to post a code snippet before i had to leave for a few mins
and yeah that does look right
please do post code as text in the future though
its hard to read screenshots
!code see below for instructions on code formatting:
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
model = MLP(n_classes)
criterion = LeastSquareCriterion(n_classes)
for epoch in range(n_epochs):
    for (x_batch, y_batch) in batches:
        y_pred = model.forward(x_batch)
        # assumed argument order (prediction, target), matching backward below
        loss = criterion.forward(y_pred, y_batch)
        grad0 = criterion.backward(y_pred, y_batch)
        grad = model.backward(x_batch, grad0)
        model.gradientStep(learning_rate)
your code is more or less that, right?
loss.backward i think is wrong
i think the "x" in the backward function is supposed to be the y values in your batch
do you mind if I ping you later? hoping to get unstuck and complete this
i am going to be out this afternoon, so you can ping but i might not be available
got it working. thanks again so much! excellent!
great. what was the issue?
overlooked the correct inputs to the loss function, as you pointed out
good
it helps to remember the rules for the size of matrices in matrix multiplication
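A quick way to check that rule is to look at shapes: (n, k) @ (k, m) gives (n, m), and the inner sizes have to match. A small sketch with made-up sizes (16, 784 and 10 are just example numbers, not taken from the assignment):

```python
import numpy as np

x_batch = np.ones((16, 784))   # 16 samples, 784 features (made-up sizes)
weights = np.ones((784, 10))   # 784 inputs -> 10 outputs

out = x_batch @ weights        # (16, 784) @ (784, 10) -> (16, 10)

try:
    weights @ x_batch          # (784, 10) @ (16, 784): inner sizes 10 and 16 differ
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```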
Hello! I am attempting to dive into a project request and while I'm slightly familiar with this, i'm having a hard time getting started in the right direction. I am trying to forecast (predict) when orders will occur in the current month based on historical data. These orders will come from specific customers along with an order type. The idea is to determine if we will be able to complete the order based on our capacity given actual orders for that day and forecasted orders for that day.
I figured some type of time series analysis would fit, but haven't gotten much further as im not entirely sure of the measures a Time analysis uses/needed.
are the orders on a monthly basis? or are they on a fixed-duration basis?
that is, can you expect "one order every month" and you just don't know which day? or is it more like "there is an unknown number of days between orders, but usually it's 1 per month"?
based on the criteria of each order from each customer, it could be one order of type a and 3 for type b spaced x days apart, or for some customers it could be type a is ordered first, then a type b, etc. depending on how the orders fall, you may have a type a order for a customer in one month, then the following month there would be a type b. All of this information is recorded in the historical data.
and are there other features that might affect when the order arrives? are orders more frequent at certain times of year? presumably different customers tend to order different types and quantities of things
yes, or some customers may order from us for one type and someone else for another type.
but of course, we would just try to forecast for what we have fulfilled in the past
ultimately are you just interested in forecasting order quantities across all customers? or do you need customer-level predictions?
this seems like a nontrivial problem btw
Quantities across all customers with one or two levels of criteria
hmm, that makes things a bit easier
you can model order quantity as a non-homogeneous poisson process
or you could apply some kind of time series model like (S)AR(I)MA(X) for monthly or weekly order quantities
Okay, that's the route I started on but then stopped ha. Any resources on this to get me started?
hard to go wrong with FPP https://otexts.com/fpp3/
this paper is also interesting, although the "arrival rates" are probably a lot higher than what you deal with in your business http://www.columbia.edu/~ww2040/4615S15/Forecasting_032818submit.pdf
I'll check it out, thanks for chatting it out with me
Anyone who wants to help me implement Keras Tuner?
this sounds like something that would require a significant donation of someone's time, so you're more likely to get help if you ask a more pointed question.
Is anyone good at coupled systems of ODEs?
Hi, can anyone recommend a resource about machine learning deployment?
@desert oar any sources that use Python? what you referred to uses R 🙄
can someone help me navigate old TF code? https://github.com/ofirpress/UsingTheOutputEmbedding/blob/master/ptb_word_lm/ptb_word_lm.py#L101-L128
i'm not sure I understand at all what stuff like softmax_w = tf.get_variable("softmax_w", [size, vocab_size]) means
nvm, i'm dum, this isn't the file that ties the weights
Anyone here can handle leetcode medium?
where i can find this dataset?
For CNN word embedding layer, how do you decide whether to use GloVe or Word2Vec? Task: multi-label text classification with small data
Intro: This is a part of the series of blog posts related to Artificial Intelligence Implementation. If you are interested in the background of the story or how it goes: This week we'll showcase testing process and the early results of the model. We will be using SerpApi's Google Organic Results
Can anyone suggest a pretrained model for a text classification task?
Architecture : CNN.
Task: Resume parsing
if it's text classification, what are the classes? if it's text parsing, do you mean information extraction, and what are you trying to extract?
i want to extract all details in the resume such as the personal details, academic background etc.
So I need to do block segmentation to achieve that, where I cluster all text lines that labeled as entity of academic background (such as institution name, year of study, major name)
Here I need to classify each line whether it is academic background or personal details or not.
@tranquil sage is a text version of the resume already given?
yup, the resume originally was in PDF format. now i've extracted them to text using Apache Tika. all of it were in one text files now. to be annotated.
I would probably do named entity recognition first on the whole document first. things like address, institution, academic subject. and then figure out where each entity fits in to the bigger picture.
Hello!
I am trying to filter through a dataset(I refer to as "df") to find specific words in a column called df["Question"].
I used the following code
Code:
def word_filter(dataset, words):
    filter = lambda x: all(word.lower() in x.lower() for word in words)
    return dataset.loc[dataset["Question"].apply(filter)]
filtered = word_filter(df, ["king", "England's"])
print(filtered["Question"])
Result:
4953 Both England's King George V & FDR put their stamp of approval on this "King of Hobbies"
Name: Question, dtype: object
I am trying to figure out how I can make the function more uniform and applicable to all columns so I attempted the following but its not giving me the exact same results
Code:
def word_filter(dataset, words, column):
    filter = lambda x: all(word.lower() in x.lower() for word in words)
    return dataset.loc[dataset[column].apply(filter)]
filtered = word_filter(df, ["king", "England's"], "Question")
print(filtered)
Result:
Show_Number Air_Date Round Category Value \
4953 3003 1997-09-24 Double Jeopardy! "PH"UN WORDS 200.0
Question \
4953 Both England's King George V & FDR put their stamp of approval on this "King of Hobbies"
Answer
4953 Philately (stamp collecting)
it seems to be giving me the the rows of all columns associated with the data I am looking for
Any thoughts on how I can improve it?
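The two versions above select the same row. The outputs differ only because the first one prints `filtered["Question"]` (one column) and the second prints `filtered` (every column). A minimal sketch of a column-agnostic filter follows; the two-row DataFrame is a made-up stand-in for the real dataset, and building the mask with `str.contains` is just one possible approach:

```python
import pandas as pd


def word_filter(dataset, words, column):
    """Return the rows whose `column` contains every word, ignoring case."""
    text = dataset[column].astype(str).str.lower()
    mask = pd.Series(True, index=dataset.index)
    for word in words:
        # regex=False treats the word as a plain substring
        mask &= text.str.contains(word.lower(), regex=False)
    return dataset.loc[mask]


# Made-up stand-in for the real dataset
df = pd.DataFrame({
    "Question": [
        "Both England's King George V & FDR put their stamp of approval on this \"King of Hobbies\"",
        "This big cat is known as the king of the jungle",
    ],
    "Answer": ["Philately (stamp collecting)", "The lion"],
})

filtered = word_filter(df, ["king", "England's"], "Question")   # all columns
questions_only = filtered["Question"]                           # just the searched column
```

Selecting the column after filtering, as in the last line, reproduces the output of the first version.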
how to fix this error?
the error message is telling you the vocab property no longer exists. presumably, the link right underneath explains how to use the functions and properties you are being recommended to use instead of vocab. tl;dr: word_vectors.vocab doesn't exist, read what the new equivalent is
ok, thank you
# Mean of word vector
def vectors(document_list):
    document_embedding_list = []
    for line in document_list:
        doc2vec = None
        count = 0
        for word in line.split():
            if word in model.wv.key_to_index:
                count += 1
                if doc2vec is None:
                    doc2vec = model[word]
                else:
                    doc2vec = doc2vec + model[word]
        if doc2vec is not None:
            doc2vec = doc2vec / count
            document_embedding_list.append(doc2vec)
    return document_embedding_list
document_embedding_list = vectors(prac1['vendor_tag1'])
print('Number of document vector:',len(document_embedding_list))
I got an error like this: TypeError: "'Word2Vec' object is not subscriptable
how to fix this error?
on this @rugged ether
can you give me the clue what of the line can be replaced?
I get an error like this
this is what I change
The correct code is to use 'key_to_index', but thank you so much for helping me
Oh yeah, I'm sorry I don't see that Lol🤣
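For anyone hitting the same "'Word2Vec' object is not subscriptable" error: in gensim 4 the vectors are looked up through `model.wv[word]` rather than `model[word]`. Here is a minimal sketch of the averaging function written against any object that supports `word in wv` and `wv[word]`. The `toy_wv` dict is a made-up stand-in so the sketch runs without gensim; with a trained model you would pass `model.wv` instead (an assumption about your setup).

```python
import numpy as np


def mean_vectors(documents, wv):
    """Average the vectors of the known words in each document.

    Documents with no known words are skipped, as in the original function.
    """
    embeddings = []
    for line in documents:
        known = [wv[word] for word in line.split() if word in wv]
        if known:
            embeddings.append(np.mean(known, axis=0))
    return embeddings


# Stand-in for model.wv (hypothetical two-dimensional vectors)
toy_wv = {"red": np.array([1.0, 0.0]), "apple": np.array([0.0, 1.0])}
doc_vectors = mean_vectors(["red apple", "unknown words", "red red"], toy_wv)
```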
hey anyone got advice on how i would start learning how to make a neural network for medical images. so feed in medical images and label them. any resources you can offer?
Don't know if this is the right place to ask but how do I open the data in a .nc file ? I've tried for an hour now and still can't see the entire data in the files
Hey @quasi pier!
It looks like you tried to attach file type(s) that we do not allow (.nc). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a, .csv, .json.
Feel free to ask in #community-meta if you think this is a mistake.
Hello,
I am using this function to predict the output of never seen images
```py
import cv2
import matplotlib.pyplot as plt
import numpy as np

def predictor(img, model):
    image = cv2.imread(img)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (224, 224))
    image = np.array(image, dtype='float32') / 255.0
    plt.imshow(image)
    image = image.reshape(1, 224, 224, 3)
    # train_ds is the training generator defined elsewhere in the notebook
    label_names = train_ds.class_indices
    dict_class = dict(zip(list(range(len(label_names))), label_names))
    clas = model.predict(image).argmax()
    name = dict_class[clas]
    print('The given image is of \nClass: {0} \nSpecies: {1}'.format(clas, name))
```
how do I change it if I want the top 2 predictions with their probabilities?
i.e
70% chance its dog
15% its a bear
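One way to get the top 2 is to sort the predicted probabilities instead of taking a single `argmax`. A minimal sketch, assuming the model outputs one softmax probability per class; the `probs` array and class names below are made up and stand in for `model.predict(image)` and the `dict_class` mapping:

```python
import numpy as np


def top_k(probabilities, class_names, k=2):
    """Return the k most likely (class_name, probability) pairs, best first."""
    probabilities = np.asarray(probabilities).ravel()
    best = np.argsort(probabilities)[::-1][:k]   # indices sorted by descending probability
    return [(class_names[i], float(probabilities[i])) for i in best]


# Made-up softmax output for one image
probs = np.array([[0.10, 0.70, 0.15, 0.05]])
names = ["cat", "dog", "bear", "fox"]
top2 = top_k(probs, names)
```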
Guys, I just did kmeans(3).cluster_centers_
Which centroids does it return?
Because I haven't fit in my data yet.
mig = np.array(1).reshape(1,1)
mig
what is the meaning of reshaping a single element array
what it means mathematically
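Mathematically nothing changes: the value is still 1. The reshape only changes how it is stored, from a zero-dimensional array (a scalar) to a 1x1 matrix, which matters for functions that expect 2-D input. A short sketch:

```python
import numpy as np

scalar = np.array(1)            # zero-dimensional array, shape ()
matrix = scalar.reshape(1, 1)   # same single value, viewed as a 1x1 matrix

shapes = (scalar.shape, matrix.shape)
```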
please don't ping random people to ask for help. instead, put an actual question into the chat that someone can read and start answering.
Hello Folks,
I'm trying to install caffe on MacOS Monterey but couldn't find any relevant article for a python newbie like me. Could anyone point me to the right direction please? Any help would be appreciated.
Thanks in advance
u can get it on python.org
from openpyxl import Workbook, load_workbook
import numpy as np
wb = load_workbook("Heat Exchanger Data.xlsx")
ws=wb.active
#initialising variables
Tcold=[]
Thot=[]
lmtd=[]
#columns im interested in
columns=[57,59]
#calculates and returns log mean temperature difference
def paralmtdcalc(hin,hout,cin,cout):
    t1=(hin-cin)
    t2=(hout-cout)
    lmtdval=(t1-t2)/np.log(t1/t2)
    return lmtdval
#iterates by row of excel sheet and stores them in an array
for row in range(2,5):
    for clm in columns:
        Tcold=np.append(Tcold,ws.cell(row,clm).value)
        Thot=np.append(Thot,ws.cell(row,clm+3).value)
hi guys first time using openpyxl was wondering if the library has an issue reading reference cells (i.e values in this cell were calculated using other cells) as i repeatedly get nonetype errors or is this an issue with my code
there are values in columns 57 and 59
Hi Guys,
I have one question. Can we have the same bias for all neurons?
can someone tell me what's going on in this code
!code
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
Wdym. Is using a picture bad?
yes
Hey yall...I am trying to get st dev using numpy and needed some help. I need to convert the below line into numpy.
The question I have is how do I get st dev of the last n elements instead of the whole array? numpy.std() does not take the length as a parameter. Any thoughts?
def stDevHi = stDev(high, stDevLength);
is the array one-dimensional for sure?
Yes. Its just historical prices
```py
fig, ax = plt.subplots(2, 5, figsize=(8, 4))
centers = kmeans.cluster_centers_.reshape(num_cluster, 8, 8)
for axi, center in zip(ax.flat, centers):
    axi.set(xticks=[], yticks=[])
    axi.imshow(center)
```
also, keep in mind that the style in python is to write st_dev and st_dev_length.
you can use slicing to get the last five elements, so arr[-5:].std()
Help
Ok let me try that
looks like it's supposed to plot the cluster centers discovered by kmeans clustering. what does one of the figures look like? (you can use screenshots to show figures.)
The images look fine. I don't understand how the code implements that. I know what the images represent
But if you wanna see
These are centroids of the digit-clusters.
fig, ax = plt.subplots(2, 5, figsize=(8,4)) -- create a blank figure
kmeans.cluster_centers_.reshape(num_cluster, 8, 8) -- put the ten images into a (10, 8, 8)-sized array (each (8, 8) slice is one of those images)
for axi, center in zip(ax.flat, centers): -- for each of these...
axi.set(xticks=[], yticks=[]) -- make the axis labels blank
axi.imshow(center) -- show the image
And why was ax.flat done
I don't know
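On the `ax.flat` question: `plt.subplots(2, 5)` returns the axes as a 2x5 NumPy array, and `.flat` is NumPy's iterator that walks a multi-dimensional array element by element in row-major order. That lets `zip` pair each of the ten axes with one center. A small sketch using strings as stand-ins for the Axes objects and the images (so it runs without matplotlib):

```python
import numpy as np

# Stand-in for the 2x5 array of Axes objects returned by plt.subplots(2, 5)
ax = np.array([[f"axes[{r},{c}]" for c in range(5)] for r in range(2)], dtype=object)
# Stand-in for the ten 8x8 centroid images
centers = [f"center {i}" for i in range(10)]

pairs = list(zip(ax.flat, centers))   # first row left to right, then second row
```

Without `.flat`, iterating over `ax` would yield two rows of five axes each rather than ten individual axes.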
Hello, i have a doubt
My model gives different accuracies on the same test set each time i run it
How can i make sure this doesn't happen, as it extremely stochastic and idk which result to save
@serene scaffold I just tried that. While it works, it returns a scalar value however I need to return an array. I basically need st_dev at each index. Let me know if that does not make sense.
@serene scaffold I did arr[:length] and I think that worked. I need to validate the numbers Thanks for your help
so you need a sliding window of the standard deviation for five values?
Yes....sorry I should have stated that
@spice mesa would be easier to do with a pandas Series: https://pandas.pydata.org/docs/reference/api/pandas.Series.rolling.html
Hello people! This is Prasanth, and I'm a 3rd-year CS B.Tech student from India. I need a deep learning model that takes only one picture of a person and saves it for future use. Whenever we later give that person's picture to the model, it needs to identify the person accurately (face recognition). The main problem is that I want to host the model online and develop an application with user authentication, so if a user ever logs out or uninstalls the application, then reinstalls it and logs in to their account, the already-trained model needs to identify their face
how can i make this ?
in simple words: as far as I know, after training or deployment we can only use the model on the set of faces it was trained on (if we trained with 5 people's faces, it can only recognize those 5 people). but I need a model that is deployed, can be trained with just one picture of the user, and still identifies them after they log out and log back in to the application (once the training is done). and since the model is hosted online, new users will use the app, and I don't know how an already-trained model can detect new users' faces (because the model was never trained on the new user's face, right?)
how can i achieve the solution for this problem
I'm a beginner, so correct me if my understanding is wrong.
I need a DL model that recognizes the faces of users it was already trained on and can also be trained on new users' faces.
I would love to do Pandas Series however I am limited to numpy. Need to use array's only
who is imposing this limitation?
I am using a tool called deephaven which is developed in Java and works iwth Python. If I use Pandas, it creates a disconnected dataset. I have to use array's if I want real time data updates.
the point of training is for the model to be used with inputs that were not part of training correct?
So @serene scaffold I guess I need a rolling st_dev
@serene scaffold I think I found a solution. Wanted to share with you.
def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return numpy.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
stDevHi = numpy.std(rolling_window(high, stDevLength), 1)
https://stackoverflow.com/questions/6811183/rolling-window-for-1d-arrays-in-numpy
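On NumPy 1.20 or newer, the same rolling window can be built with `sliding_window_view`, which avoids computing strides by hand. A minimal sketch; the `high` values are made up and `rolling_std` is a hypothetical helper name:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_std(values, window):
    """Standard deviation of each length-`window` slice of a 1-D array."""
    windows = sliding_window_view(np.asarray(values, dtype=float), window)
    return windows.std(axis=-1)


high = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])   # made-up prices
st_dev_hi = rolling_std(high, 3)                    # one value per full window
```

The result has `len(high) - window + 1` entries, one per full window. Note that `numpy.std` defaults to `ddof=0`, while the pandas rolling `std` defaults to `ddof=1`, so the two will not match unless `ddof` is set explicitly.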
return tf.nn.softmax_cross_entropy_with_logits(
Node: 'categorical_crossentropy/softmax_cross_entropy_with_logits'
logits and labels must be broadcastable: logits_size=[10,10] labels_size=[10,5]
[[{{node categorical_crossentropy/softmax_cross_entropy_with_logits}}]] [Op:__inference_train_function_925]
2022-04-22 22:26:05.679486: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: FAILED_PRECONDITION: Python interpreter state is not initialized. The process may be terminated.
[[{{node PyFunc}}]]
can anyone help me with this
Hey @steel vector!
You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.
code
I'm uninstalling anaconda from a windows machine and OF COURSE they cannot be bothered to make it straightforward to uninstall like 95% of every other software program
...so referring to this: https://docs.anaconda.com/anaconda/install/uninstall/
How do I know which is the 'root of your install'?
Like here👍 is where navigator is: C:\Users\adankert\.anaconda
Should I delete that file?
or C:\Users\adankert\.ipython
?
This file?: C:\Users\adankert\.spyder-py3
anyone has experience in deep learning // Extreme learning machine in particular ?
S/O to whoever is in charge of regularly designing and changing the profile image of this server ... I think it's kinda refreshing to see 😀
Ask your question, not if there is an expert that can help you
someone can give me resource to learn how to make neural network for medical images
?
hey is there any one can help me? what im trying to do is to get some values from some roi, and train that data to make a prediction
help how to add p values
like compare one group to another in plot
automatically work out and add p values
Am I stupid or is this calculation wrong?
n(truth) is not the same as correctly classified
Not sure what it actually is, but it is bigger than the n(classified) in a certain row
the number of truths (the number of actual cases found in the class) and the number of classified (the number of cases classified as belonging to that class)
oh
so i am stupid
:incoming_envelope: :ok_hand: applied mute to @hollow sun until <t:1650670189:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).
is pandas' .get_dummies() basically one hot encoding?

or am i misunderstanding the documentation
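Yes, `pd.get_dummies` produces a one-hot encoding: one indicator column per category, with exactly one "on" value per row. A small sketch with made-up data:

```python
import pandas as pd

colors = pd.Series(["red", "green", "red", "blue"], name="color")   # made-up categories
one_hot = pd.get_dummies(colors)          # one column per distinct value
as_int = one_hot.astype(int)              # indicator dtype varies by pandas version
```

Passing `drop_first=True` drops one of the columns, which gives dummy coding rather than strict one-hot.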
Hello, i have been training my model using kfold and i have used numpy's random seed to set the seed
Yet every time the model gives a different result on the test set and idk which one to save
It is so stochastic,can someone please tell me how i can solve this
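A likely cause is that seeding NumPy only fixes NumPy's generator. Python's `random` module and the deep learning framework keep their own generators, so those need seeding too (for example `tf.random.set_seed` in TensorFlow or `torch.manual_seed` in PyTorch). A minimal sketch of the idea using only the standard library and NumPy; `fake_training_run` is a made-up stand-in for anything that draws random numbers, such as weight initialisation or shuffling:

```python
import random

import numpy as np


def set_seeds(seed):
    """Seed the Python and NumPy generators (frameworks have their own seeds)."""
    random.seed(seed)
    np.random.seed(seed)


def fake_training_run():
    # Stand-in for code that consumes random numbers from both generators
    return np.random.rand(3).tolist() + [random.random()]


set_seeds(42)
first = fake_training_run()
set_seeds(42)
second = fake_training_run()   # identical to `first` because both generators were re-seeded
```

Even with every seed fixed, some GPU operations can remain non-deterministic, so small run-to-run differences may persist.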
How did u guys begin learning data structures
why the number of rows is different from the amount of index?
it could be because of an index modification of some sort
reset the index by calling the df.reset_index() function and see if that works
I learned some basics with Datacamp&Dataquest, but I feel like they rush through the material sometimes. Now I practice more with my own small projects that I find online (free) & Codecademy
Isn't this Word Embedding? You could leverage already available word vector models instead of building yours from scratch. There's word2vec, GloVE, BloomEmbedding etc
Hi, how do I load a .svm and .jsonl format dataset into jupyter as csv format? Please help
ok thank you
Has anyone used blender to deliver better visualizations to clients? Can you tell me about your experience if so?
how to extract data from multiple pages at a same time?
multiple pages of what?
and by "at the same time" do you mean "in parallel"?
Any data science learner??
are we not all learners?
We all are.
Buddies, I want to learn data science skills but I don't have a good roadmap for what to learn first and what to learn after that. I want a step-by-step roadmap.
Can you help me??
Yes here
Hello
Going to
What are you pursuing now?
Engineering brother!!
B.tech?
Yup
Same here brother!
from the math side, the core competencies for introductory data science is usually linear algebra, multivariable calculus/optimization, and statistics. if you're using python, you have several options on the library side. you can do lower level stuff with numpy and jax, for example. scipy is a bit higher level. then there's stuff like pytorch and tensorflow, which is (usually, but not necessarily) higher level than the previous ones i mentioned (by higher level i mean more abstract and requiring you to deal less with the nitty-gritty)
interestingly, the 3 topics i mentioned CAN interact and overlap quite a bit, but can also be learned largely independently of each other, so you can almost learn them in whatever order you see fit
i want to learn ml which branch should i use
do you know what the branches of ML are?
uh i dont mean that i mean there are like deep learning and all sorry i used wrong word
deep learning requires a lot of advanced knowledge, and you can learn a lot about ML without touching it
ohh so where i should start
what is your goal?
what is a chat detector?
chat detector means that take a look at chat and autodetect bad sentences
bad sentences. what makes a sentence bad?
like rude and all
a profanity filter is relatively simple, but something that measures rudeness would be quite complicated
i want to do camera relative stuff
for now
and
like if i give him a photo of dog so it detect dog and all
this kinds of
for that, i'd recommend precisely what i had described above, if your goal is to understand image classification and segmentation well. if you only wish to use libraries, you don't really need much more than to watch a few videos on youtube or coursera on deep neural networks, classification, and segmentation. if you want to go more in depth and design your own stuff, you'll need to do statistics, linalg, convolutions, etc and be able to choose or create your own cost functions as needed (e.g. using differential equations or statistical targets)
hello i have set up grid search cv in this way, can someone pls tell me if it is correct
```py
model1 = KerasClassifier(build_fn=create_model, epochs=50, batch_size=32, verbose=0)
# define the grid search parameters
params = {'learn_rate': [0.1, 0.01, 0.001], 'dropout': [0.2, 0.3, 0.4, 0.5], 'epochs': [50, 80, 100]}
grid = GridSearchCV(estimator=model1, param_grid=params, n_jobs=-10, cv=5)
grid_result = grid.fit(X, Y, callbacks=[early_stopping])
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    # the loop body was cut off in the original message; printing each result is an assumption
    print("%f (%f) with: %r" % (mean, stdev, param))
```
Are there any good resources on how to interpret accuracy and loss curves?
I'm also not sure why there was an outlier on epoch 9
it looks to me like both your train and val are steadily climbing still, so might be good with some more epochs
I don't know any general guides on how to interpret them, but there's plenty resources on how specific patterns (overfitting, underfitting, etc) look, I think
Yeah it might help to average the curves over multiple runs
It seems that accuracy changes a lot over each epoch, so it would be good to do multiple runs so you can visualize the average and the standard deviations for each epoch
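Averaging over runs is straightforward once each run's curve is stored as one row of an array. A minimal sketch with made-up validation accuracies (three runs, four epochs):

```python
import numpy as np

# Made-up validation accuracies: one row per run, one column per epoch
val_acc = np.array([
    [0.52, 0.61, 0.70, 0.74],
    [0.48, 0.65, 0.66, 0.78],
    [0.50, 0.60, 0.71, 0.73],
])

mean_per_epoch = val_acc.mean(axis=0)   # average over runs for each epoch
std_per_epoch = val_acc.std(axis=0)     # spread across runs for each epoch
```

Plotting `mean_per_epoch` with a band of plus or minus `std_per_epoch` (for example with matplotlib's `fill_between`) makes it easier to tell a genuine trend from a one-off outlier epoch.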
Ahhh let me try that
I had noticed this too, so I wrote "As the validation curve has not started to plateau towards the end of the graph, another assumption can be made that the model has not reached its final potential and should be continued to train until there is a significant plateau."
Need to fix wording but gets that general idea across i hope
maybe dataset isnt consistent
Yeah one of my classes has something like 4 times the images of other classes
Because I needed to make sure that class was especially being correctly identified
if it's 80% of the dataset, then 80% accuracy can correspond to quite a low f-score (because reaching 80% acc on such a dataset is as simple as guessing the most common class no matter what)
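To make the 80% point concrete, here is a sketch with made-up labels where one class is 80% of the data and the "model" always guesses it. Accuracy comes out at 0.8 while the macro-averaged F1 is below 0.3. The labels and the three-class setup are illustrative assumptions, not the actual dataset discussed here.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score of one class, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: class "a" is 80% of the data, "b" and "c" are 10% each
y_true = ["a"] * 80 + ["b"] * 10 + ["c"] * 10
y_pred = ["a"] * 100   # always guess the most common class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro_f1 = sum(f1_for_class(y_true, y_pred, c) for c in "abc") / 3
```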
I have that class with 2000 images, and then 5 other classes each with 500 images
I know it's not an ideal situation :I
are you using shuffle?
Mhmm
I learned a lot from this project tbh. I wish I could go back and re-do it all, but unfortunately the deadline is very soon, so time doesn't allow
ML tasks usually deal with grossly nonconvex target functions and noisy data. using stochastic gradient-like methods, this sort of behavior can occur due to the gradients not being quite right at each iteration (the step size schedule takes care of this through an averaging effect, eventually converging to a true, though possibly local, minimizer). the cost functions are also formulated statistically in many cases, too. this means that you expect the behavior of the learned model to be accurate "on average", not exactly right for every single data set you present the model with
contrast this with model-driven (as opposed to data-driven) techniques where exact knowledge of a couple of orders of differentials lets you more accurately define trust regions so that you can ensure the cost decreases at every single iteration
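The averaging effect of a decaying step size can be seen on a toy problem. Minimising the expected value of 0.5 * (w - x)^2 with one noisy sample per step and step size 1/t makes the iterate equal to the running mean of the samples seen so far. This is a deliberately simple sketch (a 1-D quadratic with Gaussian samples, all made up) and not a claim about how any particular network trains:

```python
import random

random.seed(0)
samples = [random.gauss(3.0, 1.0) for _ in range(1000)]   # noisy observations around 3.0

w = 0.0
for t, x in enumerate(samples, start=1):
    noisy_grad = w - x             # gradient of 0.5 * (w - x)**2 for a single sample
    w -= (1.0 / t) * noisy_grad    # decaying step size schedule 1/t

sample_mean = sum(samples) / len(samples)
```

Each individual gradient points at a single noisy sample, so single steps can move the wrong way, yet the iterate ends up at the sample mean, which is the minimiser of the averaged cost.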
I do not look at screenshots of code. Can you explain what you mean by "real text" or "fake text"?
last time I looked into fake news detection, the technique is to track the flow of information on the internet back to a known source of misinformation.
i want to know whether a news headline or news article is real or not
how is an LSTM supposed to do that?
wait
Sure. but keep in mind that I won't look at screenshots of code/text
I will show you how to do fake news detection in python using LSTM. LSTM is a deep learning method to train ML model. I will be also using here gensim python package to generate word2vec matrix. This method is 99% accurate as shown in this video lecture. Please like and subscribe this channel.
Dataset: https://github.com/laxmimerit/fake-real-ne...
check this video
I don't have time to watch an hour and a half long video.
well, like I said, the technique that I'm familiar with is about identifying instances of the same news story throughout the internet, and tracing the origin of the information. I'm skeptical that a model trained on existing data could potentially predict the truthfulness of future headlines.
but if you need help with a specific issue, you can ask.
wait
No model (unless the model is trained on a giant set of things considered true/real, but that is basically the same as checking some trusted sources (and it won't work for breaking news)) can tell if something is fake news or not based just on title (actually impossible), the best that can be done is what @serene scaffold mentioned, you check with some trusted source, and/or check if it's from an untrusted source. You could maybe do something like a clickbait detection (seems scam-ish / clickbait-like).
you could also see if the headline contains words that are bombastic or absurd, but that hasn't been as effective since 2016.
x=[""]
x=tokenizer.texts_to_sequences(x)
x=pad_sequences(x,maxlen=maxlen)
I've said twice that I won't look at screenshots.
(model.predict(x) >=0.5).astype(int)
x=tokenizer.texts_to_sequences([""]) -- what is this intended to do?
x=["this is a real news"]
x=tokenizer.texts_to_sequences(x)
x=pad_sequences(x,maxlen=maxlen)
hi all im new to machine learning
(model.predict(x) >=0.5).astype(int)
i have a question: what does the training set mean? m = training set....
if x=[""] is real it will give me 1 out put if it is fake it will give 0
generally speaking, machine learning is about identifying patterns from training data, and then seeing if it can apply those patterns to get the right answer on test data.
so would you say the training set is the entire set of values ...all the x's and y's?
x=["“The Trump campaign has confirmed to Hannity.com that Mr. Trump did indeed send his plane to make two trips from North Carolina to Miami, Florida to transport over 200 Gulf War Marines back home.”
— quote in article titled “200 Stranded Marines Needed A Plane Ride Home, Here’s How Donald Trump Responded,” Sean Hannity Show website, May 19, 2016"]
x=tokenizer.texts_to_sequences(x)
x=pad_sequences(x,maxlen=maxlen)
in [" "] it is a fake news
both the training and test sets have x and y values.
test set means the values retrieve from the learning program?
yh
in this model i used lstm,gensim,word2vec
but when i used one-hot encoding instead of word2vec
for the training data, the model is given both the x and y values, so that it can learn how to return the right y values given the x values.
for the test data, the x values go into the model, and you see if the y value it returns is the correct one.
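A minimal sketch of that train/test idea, assuming scikit-learn is available. The data is made up purely for illustration (random sizes and prices), and the 75/25 split is an arbitrary choice:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Made-up data for illustration only: one feature x and a noisy target y.
rng = np.random.default_rng(0)
x = rng.uniform(500, 3000, size=(200, 1))
y = 100 * x[:, 0] + rng.normal(0, 5000, size=200)

# Both the training set and the test set contain x values and y values.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

# Training: the model sees x and y together and learns the relationship.
model = LinearRegression()
model.fit(x_train, y_train)

# Testing: only x goes in; the returned predictions are compared with y_test.
test_r2 = model.score(x_test, y_test)
print(len(x_train), len(x_test), round(test_r2, 3))
```

The score on the test set is the number to trust, since the model never saw those y values while fitting.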
ohhh ok the test is purely practice values. ok thank you that was a great explanation
you are welcome 💚
i don't know how i can check real or fake news with one-hot encoding
neither do I. I don't think that you can.
what should i do
try a different beginner project
but i want to this
AI isn't a crystal ball. it's one thing to build a model that can predict things when the factors that cause that thing are known
you're trying to build an AI that can ascertain whether or not a proposition is true or not, even when that proposition hasn't been conceived yet.

It would be really complicated. Not a beginner project, and IDK if anyone has ever done it.
in fact, if this were possible, someone should build it and try headlines about stocks going up or down
Also way more than just ML involved.
it is deep learning model
The type of thing I would expect maybe some large government to invest into making, given all the data they have access to.
"deep learning" just means "neural network with lots of layers". it does not mean that it can predict the future.
But even then, I doubt anyone has made it (work well).
do you understand how what you are trying to do is tantamount to predicting the future?
one more question if you dont mind....is there a standard formula for the variables....x, y, h, m.....x = input value, y = output value, h = hypothesis, m = training set....
x_test, y_test, x_train, y_train

@woven coral I'm not trying to discourage you. I'm trying to help steer you towards a project that is more obtainable and potentially fulfilling for you
jsut a guy trying to help...or girl
i know
thanks
just rying not to be sexist 😂
I usually try to give short answers with emojis. that one was to indicate that I am boy.
no offense to this teacher but his english is difficult to understand...im assuming he is saying that m = the total number of value sets in this example... set meaning x and y
i understood
we would call those "{training,test} instances". looks like the point here is to discover the relationship between the size of a residence and the price?
I have a binary classifier. It has 98% accuracy. Is there anyway I could push it even further now that it's trained?
Depending on what library you used to create the model, you might be able to continue training it with additional data, if you have data it hasn't already been trained on. But that 2% might represent instances that are genuinely ambiguous
can anyone make out what is this symbol under the sigma symbol
i = 1, with the dot of the i confusingly placed
THANK YOUU
you are welcome 💚 any other questions about the notation used here?
Professors who write stuff by hand...
anyone who has a PhD or MD should not be allowed to write by hand.
can you help me read this formula...
I write on a tablet when I listen to lectures, and I tend to erase and rewrite stuff a lot to make it all look good. but I think students would find that obnoxious if I were to display my tablet on a screen and write on it while lecturing. (I have never been a lecturer)
I would need to see the rest of the context to be sure about h, but the superscript (i)s are "the ith element", presumably of the weights.
i think he is just setting up the formula so we have something to base off of, similar to a quadratic formula ax2 + bx + c
ok i found out the name of the formula, square error function(cost function) which he said is used in most cases of regression
It appears your model could still learn a thing or two (pun intended) 😀 from your data had you allowed it more training time
If we're strictly gonna use this curve to judge, you could see that your validation accuracy started to decline beyond the 13th epoch but there's every possibility it could still peak.
Looking at both curves very well, one could argue that this curve is simply telling you that your model hasn't quite finished learning. It needs more data!
Try to increase the number of epochs and monitor what's gon happen next. I think it'll be fun to find out 😀
geopandas is pretty cool
def recommend if you are working with spatial data
you just need latitude and longitude
Does anyone here understand variational inference? I'm having a hard time getting some things about it.
To anybody using dataclasses, where exactly would you use it?
I'm struggling to find any uses that aren't easier with dictionaries or pandas
dataclasses are intended to reduce the amount of boilerplate when you define a class. but we don't define classes very often on the data science side of things.
makes sense
using a dataclass is preferable to having a bunch of dicts with equivalent sets of keys. but if you're in that situation when you're doing data science, then it probably makes more sense to have all those would-be dicts/dataclass instances as rows of a dataframe
yeah that's what I was thinking, most of my day to day processing just involves piping various dataframe apply functions
if it's not too complex I tend to just default to dictionaries
are you sure you have to be using .apply?
should I not be?
apply doesn't benefit from any of pandas' optimizations
it's just there for convenience if nothing else will do.
I thought it was the faster way to do row by row operations
apply with lambda functions
nope, it's the same as looping over the rows.
you have to use one of the actual pandas methods
I mean I vectorise where possible
when you say that you vectorize, do you mean that you're using apply?
I mean if it's simple enough I try to default to numpy arrays and just use normal quant functions
If it's more complicated, like a series of if statements, I'll define the condition as a function and use .apply with a lambda function
If I'm just slicing a dataframe, like identifying specific date ranges, I'll use the build in functions
well, like I said, that's the same as looping over the series/dataframe in pure Python, and it doesn't benefit from any optimizations. so you might see if there's ways to accomplish what you're doing in terms of the pandas API
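To make the apply-vs-vectorized point concrete, here is a small sketch. The `price` column and the `bucket` function are hypothetical; the idea is that a chain of if statements can often be written with `np.select`, which works on whole columns at once instead of calling a Python function per row:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"price": [5, 20, 55, 80, 120]})  # hypothetical column

# Row-by-row version: a Python function called once per value.
def bucket(p):
    if p < 10:
        return "low"
    elif p < 100:
        return "mid"
    return "high"

via_apply = df["price"].apply(lambda p: bucket(p))

# Vectorized version: conditions are evaluated on the whole column;
# the first matching condition wins, like an if/elif chain.
conditions = [df["price"] < 10, df["price"] < 100]
choices = ["low", "mid"]
via_select = pd.Series(
    np.select(conditions, choices, default="high"), index=df.index
)

print(via_select.tolist())
```

On a five-row frame the difference is invisible, but on millions of rows the vectorized form is usually much faster.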
I mean I try to use built in functions where possible
I'm just wondering where a dataclass would fit into this
can any one help me with sql
What’s your question
there's a #databases channel for that, but please always ask your actual question; you'll irritate people if you post teasers for questions without giving enough information for someone to answer it.
I’ll help you out in the db channel if I know the answer too!
Unrelated, but if I have a question about a git api where would I ask it?
Is anyone here taking the IBM professional certificate from Coursera?
Need help please: I want to find the n most frequent Sequences of Strings in a pandas DF. So the DF has a Column containing Names of Persons and I want to find out which names typically occur in succession. Sequence length is 2 and no gaps.
from nltk import ngrams
from collections import Counter
ngram_counts = Counter(ngrams(text.split(), 2))
ngram_counts.most_common(25)
this is the answer 😄
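The snippet above assumes one big string called `text`. If the names are already in a DataFrame column, a similar count can be done without nltk. This is only a sketch, and the column name `name` and the sample values are made up:

```python
from collections import Counter

import pandas as pd

# Hypothetical column of person names, in the order they occur.
df = pd.DataFrame({"name": ["Ann", "Bob", "Ann", "Bob", "Cid", "Ann", "Bob"]})

names = df["name"].tolist()

# Pair every name with the one right after it (length 2, no gaps).
pair_counts = Counter(zip(names, names[1:]))

top = pair_counts.most_common(2)
print(top)
```

This works on whole cell values, so names containing spaces stay intact, which `text.split()` would break apart.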
Hey guys, quick question: Can somebody recommend a tutorial for a model pipeline? Should contain data cleaning to perhaps improving data, following usage of a ML algorithm?
I'm trying to learn to use that model for a web application.
Hi can someone please advise on the best metric to use to automatically evaluate generated sentences, please @ when replying
what are you trying to evaluate about them?
so it generates sentences, and you want to evaluate if they sound realistic or not?
How to fix missing values with pipeline?
ideally, but i dont think we have the tech for that..closest thing to that is similarity measures with ngram overlap etc
you would have to get a bunch of humans and ask them to rate how realistic they think the sentences sound. or see if they can spot the generated sentences among real ones.
f1 is for classification tasks and this is not that.
so i should go for recall and precision? as far as automatic evaluation is concerned
no. those are also for classification tasks. you can't evaluate how realistic a sentence sounds automatically.
most papers included it though along with human evaluation
am only an undergrad student i wont have time to survey bunch of people
then they're doing some kind of classification
Ikkor said they had some keywords and target sentences
So there already are desired sentences to use to measure how good the generated ones are
so you're trying to measure how similar a generated sentence is to a target sentence?
this one used bleu
i guess i would have to do that
and then conclude that the best metric would be human
for this do you recomend running the metric through each row of my dataset and avaeraging the scores>
so you've decided what metric to use and how you will calculate it in the context of what you are doing?
you can't say that you're going to use "every evaluation metric" if you don't know how you're going to calculate it
bleue and rouge for example is just 1 line code
can you show me with code how you are going to calculate the bleue score?
pseudocode yeah like
calculate_bleu(generated sentence,expected sentence)
perfect bleu is 1 when both are same
nope. has to be working code.
from nltk.translate.bleu_score import sentence_bleu
reference = [['this', 'is', 'a', 'test'], ['this', 'is', 'test']]
candidate = ['this', 'is', 'a', 'test']
score = sentence_bleu(reference, candidate)
print(score)
the closer the candidate is to the reference, the higher the score
alright. so yes, you can take the average of all those scores to report the performance of the whole system.
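A rough sketch of the averaging step, assuming nltk is installed. The sentence pairs are invented, and the smoothing function is my own addition so that short sentences with no 4-gram overlap don't collapse to zero:

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

# Hypothetical (generated, expected) pairs, one per dataset row.
pairs = [
    ("the cat sat on the mat", "the cat sat on the mat"),
    ("a dog runs in the park", "the dog ran in a park"),
]

smooth = SmoothingFunction().method1  # assumption: any smoothing method could be used

# sentence_bleu takes a list of tokenized references, then the tokenized candidate.
scores = [
    sentence_bleu([expected.split()], generated.split(), smoothing_function=smooth)
    for generated, expected in pairs
]

average_bleu = sum(scores) / len(scores)
print(scores, average_bleu)
```

nltk also has `corpus_bleu`, which aggregates n-gram counts over the whole dataset instead of averaging per-sentence scores; the two numbers are generally not the same, so it is worth stating which one is reported.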
do you at least now have an appreciation for what AI actually is, as compared to how it's understood by the public?
XD didn't know it was that hard like, i don't even grasp how people write machine learning models from scratch
layers and stuff, am just finetuning existing one
Guys how can i hot encode an array so i can use as training data
this are my datas
you can one-hot encode each genre
how can i do that
sklearn has it. try looking into it and come back if you can't figure it out
well, try looking into one hot encoding with sklearn
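For reference, a minimal sketch of one-hot encoding a genre column with sklearn. The genre values are made up, and this assumes each row has exactly one genre:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: one genre per row, shaped (n_samples, 1).
genres = np.array([["rock"], ["jazz"], ["rock"], ["pop"]])

encoder = OneHotEncoder(handle_unknown="ignore")

# The result is a sparse matrix by default, which keeps memory use low
# when there are many categories.
encoded = encoder.fit_transform(genres)

categories = encoder.categories_[0].tolist()
dense = encoded.toarray()  # only densify when the data is small

print(categories)
print(dense)
```

If a row can carry several genres at once, `MultiLabelBinarizer` from `sklearn.preprocessing` is probably the better fit.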
I've done and it seems to work
but...
when i run the fit
it never completes
my codelab model crashed because it took all the RAM
push
I am trying to check if there are any duplicate values in my dataframe (115x6) and at the moment I'm just using a for-loop with an if statement. Is there an inbuilt function that does that? After googling for a while I came across Dataframe.duplicated(), but I think that only checks for duplicates in a column or row (https://www.geeksforgeeks.org/python-pandas-dataframe-duplicated/).
Yes there is an inbuilt function for that in Pandas.
df.duplicated().sum()  <---- to check the total number of duplicate rows in your dataset
df.drop_duplicates(keep='first', inplace=True)  <---- drop duplicate rows and keep the first
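A small sketch of the difference between duplicate rows and values repeated anywhere in the frame. The DataFrame is invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1, 3], "b": ["x", "y", "x", "z"]})  # made-up data

# duplicated() compares whole rows by default.
n_duplicate_rows = int(df.duplicated().sum())

# Non-inplace variant: returns a new frame and leaves df untouched.
deduped = df.drop_duplicates(keep="first")

# To look for values repeated anywhere, flatten all cells into one Series first.
flat = pd.Series(df.to_numpy().ravel())
n_repeated_values = int(flat.duplicated().sum())

print(n_duplicate_rows, len(deduped), n_repeated_values)
```

`duplicated(subset=[...])` restricts the comparison to chosen columns if only some of them matter.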
hey guys i'm working on "data mining "project for university and i'm running into an error when trying to set up the model for prediction
ValueError: Found input variables with inconsistent numbers of samples: [5090096, 1272524]
exported an html version( to pdf )of the notebook to give more context but i can't upload it here
when i looked online i saw that the error is caused by the two variables not having the same structure but when i printed them out to see it didn't seem like the case for me.
i tried dropping both rows from the both variables x and y but i still get the same error just on different rows (one of them non existant ~out of bounds)
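Just a guess without seeing the notebook: the two counts in that error (5090096 and 1272524) are in a 4:1 ratio, which is what an 80/20 split would give. That would be consistent with passing `X_train` together with `y_test`, for example after unpacking `train_test_split` in the wrong order. A sketch with toy arrays standing in for the real data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the real features and labels.
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

# The return order is: X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Matching pairs have the same number of samples...
print(X_train.shape[0], y_train.shape[0])
# ...while mixing train with test reproduces the 4:1 mismatch.
print(X_train.shape[0], y_test.shape[0])
```

Printing `X.shape[0]` and `y.shape[0]` right before the failing `fit` call is a quick way to confirm or rule this out.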
You can import an imputer from sklearn and then add the imputer object in the pipeline you created to handle missing values.
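Something along these lines, as a minimal sketch. The data, the median strategy and the choice of `LogisticRegression` are all placeholders:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Made-up features with missing values, and made-up labels.
X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [8.0, 9.0]])
y = np.array([0, 0, 1, 1])

pipe = Pipeline(
    [
        ("impute", SimpleImputer(strategy="median")),  # fills NaN per column
        ("model", LogisticRegression()),
    ]
)

pipe.fit(X, y)
preds = pipe.predict(X)

# What the imputer step produced, for inspection.
filled = pipe.named_steps["impute"].transform(X)
print(filled)
```

Because the imputer lives inside the pipeline, the medians are learned from the training data only and then reused at prediction time.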
Hi guys! Hope you’re fine.
Need some help from an expert for an IA and data analysis Python project
I’m help inge a girl with her homework but i got lost and i cant do it alone 😢
Helping *
Ask the question, let me see if I can get it!
@flat geode ?
Hey I have a question about Data Science Career
They are exercises in classification algorithms, validation techniques and evaluation measures.
Also decision training, implemented in scikit-learn, and calculate the training time.
I think it's hard for me to help 🙄
For example in an exercise talk about matplotlib.errorbar to represent the standard deviation.
from matplotlib import pyplot as plt
from tensorflow.keras.datasets import cifar100, mnist
(x_train, y_train), (x_test, y_test) = cifar100.load_data()
bed_id = (y_train == 5).reshape(x_train.shape[0])
bicycle_id = (y_train == 8).reshape(x_train.shape[0])
girl_id = (y_train == 35).reshape(x_train.shape[0])
keyboard_id = (y_train == 39).reshape(x_train.shape[0])
orchid_id = (y_train == 54).reshape(x_train.shape[0])
rocket_id = (y_train == 69).reshape(x_train.shape[0])
streetcar_id = (y_train == 81).reshape(x_train.shape[0])
bed_images = x_train[bed_id]
bicycle_images = x_train[bicycle_id]
girl_images = x_train[girl_id]
keyboard_images = x_train[keyboard_id]
orchid_images = x_train[orchid_id]
rocket_images = x_train[rocket_id]
streetcar_images = x_train[streetcar_id]
for i in range(70):
    plt.subplot(7, 10, i + 1)
    if i < 10:
        plt.imshow(bed_images[i % 10])
    elif 10 <= i < 20:
        plt.imshow(bicycle_images[i % 10])
    elif 20 <= i < 30:
        plt.imshow(girl_images[i % 10])
    elif 30 <= i < 40:
        plt.imshow(keyboard_images[i % 10])
    elif 40 <= i < 50:
        plt.imshow(orchid_images[i % 10])
    elif 50 <= i < 60:
        plt.imshow(rocket_images[i % 10])
    elif 60 <= i < 70:
        plt.imshow(streetcar_images[i % 10])
plt.show()
I want to add a class label text in front of each line like this documentation example image: https://www.cs.toronto.edu/~kriz/cifar.html
How can I add text in the beginning of each line?
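One possible approach, sketched with random images in place of CIFAR so it runs without TensorFlow: use `plt.subplots` to get a grid of axes and put the class name on the y-label of the first axis in each row. The class list is shortened to three and the `Agg` backend line is only there so the sketch runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np

class_names = ["bed", "bicycle", "girl"]  # shortened, hypothetical selection
rng = np.random.default_rng(0)
# Random 32x32 RGB stand-ins for the per-class image arrays.
images_by_class = [
    rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8) for _ in class_names
]

fig, axes = plt.subplots(len(class_names), 10, figsize=(10, 3))
for row, (name, images) in enumerate(zip(class_names, images_by_class)):
    for col in range(10):
        ax = axes[row, col]
        ax.imshow(images[col])
        # Hide ticks but keep the axis on, otherwise the y-label disappears too.
        ax.set_xticks([])
        ax.set_yticks([])
    # Horizontal label to the left of the first image in the row.
    axes[row, 0].set_ylabel(name, rotation=0, ha="right", va="center")

# plt.show()  # uncomment when running interactively
```

Looping over a list of (name, images) pairs also removes the need for the long if/elif chain.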
Hi guys
I am gonna start learning Python for data analysis. Does anyone have some tips and useful info to share with a noob? 
solved ☑️
If you need help here, please ask a concrete question that can be answered
can someone please help me understand how is the professor simplifying J of theta 1????
It's the vertical difference between the actual y label and the predicted y co-ordinate for that same x. After that they square it and take average of it for all the predicted data points.
yea i was looking on youtube to see.... its basically y (actual value) - (mx + b)
Because. The points lie on the line.
So zero vertical distance
Y actual=MX+b
hypothesis line has zero( y value) distance from the actual line....correct?
I think so
Yup. There's no actual line though. Just actual points
Only hypothesis line. Because it's linear regression
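A tiny numeric sketch of that cost, assuming the common course convention of dividing by 2m (the exact constant is an assumption here; the screenshot isn't visible in this log). When every point lies on the hypothesis line, all vertical distances are zero and so is the cost:

```python
import numpy as np

def squared_error_cost(theta0, theta1, x, y):
    """Mean squared vertical distance between predictions and actual y, halved."""
    m = len(x)
    predictions = theta0 + theta1 * x
    return np.sum((predictions - y) ** 2) / (2 * m)

# Made-up points that lie exactly on the line y = x.
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

cost_on_line = squared_error_cost(0.0, 1.0, x, y)   # hypothesis passes through every point
cost_off_line = squared_error_cost(0.0, 0.5, x, y)  # hypothesis with a shallower slope

print(cost_on_line, cost_off_line)
```

The second call gives a positive cost because the shallower line misses each point by a growing vertical gap.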
yea man, linear regression is what im learning. we hit a subject call cost function. and the teacher isnt explaining it right so i had to go to a 3rd party to better understnd
hey any data analysts here ?
how do you guys feel about these points https://en.m.wikipedia.org/wiki/AI_Superpowers
AI Superpowers: China, Silicon Valley, and the New World Order is a 2018 non-fiction book by Kai-Fu Lee, an Artificial Intelligence (AI) pioneer, China expert and venture capitalist. Lee previously held executive positions at Apple, then SGI, Microsoft, and Google before creating his own company, Sinovation Ventures.

Hello there. I've a question: since almost every Data Science Jobs I've requiere at least +2 years of experience in de field... Is there any way to break in the field? Maybe from web design full stack first?
I had to apply to like 200+ positions before I found one that would hire me fresh out of undergrad, so it's a difficult space to break into for sure. what made the difference is that I published.
web development isn't going to help you break into data science. if nothing is working out currently, you may need to get a masters.
how would i start learning to make ais in python? like where can i start what lib i should use and things i should learn?
!resources data science
The Resources page on our website contains a list of hand-selected learning resources that we regularly recommend to both beginners and experts.
thank you!
Actually great points to discuss over coffee cant argue...unless the paradigm shifts again then their advantages hold..
i think their advantages will only snowball
tiktok was based in Hong Kong (initially) and their Recommendation System was super-coveted
(and probably point #2 was super pertinent in order to build it)

hey anyone want to make a poem writer project
which makes poem of about 100 words with only 5 to 10 words of input
Here is my tutorial
yo if for example i trained a cnn model on keras to classify cat and dog then saved the model and a few weeks i decided that i want to add rabbit to the model to be classified is it possible like i will add a new class to my trained model?
Yes it's possible. But adding a new class 'rabbit' will demand you retrain your model and also adjust your model architecture a bit to accommodate for probability of each 3 classes being predicted.
do i need to retrain it together with the previous training set like the cat and dogs?
Yes you do need to retrain it. Supply both pictures of cats, dogs, and rabbits and work from there.
and on the architecture i add 1 more for the softmax out put?
Yes. you need to add softmax activation function to the output layer to capture the probability of each class being the correct prediction which then makes use of argmax to output the class with the highest probability.
nice nice thank you sir
I'm using plotly to plot ECDF graphs of each of my columns.
but I only want to plot the bottom 10% of values of each column. what would be the best approach?
Anyone know if you’re supposed to encode target class if it’s “yes” or “no”
To 1,0 and 1,0 columns
I'm not sure there's a way to do that inside plotly's ecdf function. However, you should be able to grab the bottom 10% of each column from Pandas and then pass it to ecdf plot instead of passing the entire df. I guess that should do the job.
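A sketch of the pandas side, with random data and made-up column names. Each column gets its own 10% quantile cutoff, so the results are kept per column rather than as one frame:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 3)), columns=["a", "b", "c"])  # made-up data

# For every column, keep only the values at or below that column's 10% quantile.
bottom = {
    col: df.loc[df[col] <= df[col].quantile(0.10), col]
    for col in df.columns
}

for col, values in bottom.items():
    print(col, len(values), round(values.max(), 3))
    # each `values` Series can then be handed to the ECDF plotting call
```

Note that an ECDF drawn from the filtered values runs from 0 to 1 over the bottom 10% only, so the y-axis no longer reflects the full column.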
yea that's what I'm doing now. Although that might give me another problem 🤔 We'll see
This is off point, but which clustering algorithm did you eventually use for that problem you were trying to solve? 😀
Still haven't solved it 😂 I took a break of 4 weeks lol
Lmao 😂😂😂
You said KMeans, Agglomerative hierarchical, DBSCAN didn't work... Perhaps you should try Fuzzy C-Means and MeanShift algorithm if you've not given up 😁
well I'm plotting those ECDF graphs to have an even better idea of what features make sense to use 😅
I'll probably have another crack at it soon
anyone here with a helpful resources collections on how to run a project on image detection
How you encode this target class really doesn't matter to be honest.. So long you can differentiate your positive class and negative class you're good to go.
A "Yes" or "No" class could be encoded as 1 and 0 (the conventional way) or 0 and 1 (if you're interested in "No" as it's your positive class)
For instance a target label with, say "Male" and "Female", can be encoded
A) Male = 1 vs. Female = 0
B) Male = 0 vs. Female = 1
The most important point is knowing your positive class and negative class.
For example:
Supermoon's wife is pregnant, and we wanna build a model that can predict the chances of Supermoon's wife giving birth to a female child.
You could encode your target using either #A or #B but when doing your model.predict_proba() you must call the appropriate class of interest (which is the female class in this case)
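To show what "call the appropriate class" can look like in sklearn: the columns of `predict_proba` follow the order in `classes_`, so the column for the class of interest can be looked up by name. The data and the choice of `LogisticRegression` are made up for this sketch:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up one-feature data with string labels.
X = np.array([[0.1], [0.4], [0.5], [0.9], [1.2], [1.5]])
y = np.array(["male", "male", "male", "female", "female", "female"])

clf = LogisticRegression().fit(X, y)

# classes_ is sorted, and predict_proba columns follow the same order.
classes = clf.classes_.tolist()
female_col = classes.index("female")

p_female = clf.predict_proba(np.array([[1.4]]))[0, female_col]
print(classes, round(p_female, 3))
```

Looking the column up by name avoids relying on whichever 0/1 encoding was chosen.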
Ye I agree
BTW, does sklearn have a binary cross entropy metric
I’m evaluating and want more than just AUC
If you're cool with learning from YouTube, I'm sure you'll find a good video there. https://www.kaggle.com/code/ritvik1909/object-detection-sliding-window
Check Kaggle for more resources as well
cross_entropy loss function is synonymous to Logloss.
If you want more metrics run this code
from sklearn import metrics
metrics.get_scorer_names()  # metrics.SCORERS.keys() on scikit-learn versions before 1.3
If you still want to make your own custom metric called, maybe, sonic_moon then you need to import the make_scorer object from sklearn to create that. 😊
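For the binary cross entropy itself, `sklearn.metrics.log_loss` computes it directly from true labels and predicted probabilities. A minimal sketch with invented numbers:

```python
from sklearn.metrics import log_loss, roc_auc_score

# Made-up true labels and predicted probabilities of the positive class.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

ll = log_loss(y_true, y_prob)        # binary cross entropy, lower is better
auc = roc_auc_score(y_true, y_prob)  # ranking quality, higher is better

print(round(ll, 4), auc)
```

For cross-validation, the matching scorer string is `"neg_log_loss"`; it is negated so that higher still means better.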
Thanks bro, I will need a way to measure and output cross entropy loss for my assignment
To compare models
Guys, how do I get accuracy for my finetuned gpt2 model? Like for example, in this lstm article, at the end of the article he can print accuracy https://data-flair.training/blogs/machine-learning-text-summarization/
please @ when replying
thanks so much
Does someone here understand good the math behind backpropagation? I want to be sure if I understood everything so I tried to do all the derivatives but I am not sure if I was right. If someone can check it, it will be great!
Anyone here done leetcode 35
how useful is leetcode on the scale of 1 to 5 when it comes to SQL in the interviews?
I have always have the feeling that China would lead in the AI raise and this book gives more evidence to that.
USE the key present in dict
Read documentation, I think there is a x_limit and y_limit method in update_traces and layout method.
it makes me wonder what the short-term consequences will be. i dont think we will start having fang company equivalents in china but still
In the short-term, I don't think so too, but never say never. Anything could happen. 🙂
I am skeptical because the competitive advantages listed could apply to other, better established technical industries where China remains far behind
like silicon
the only unique advantage I see is related to government involvement in data collection, but I don't think this is needed in a world where everyone has already willingly given their data to FAANG
besides I think that the data governance problem can be solved by eg federated learning so that people can benefit from algorithms without forfeiting privacy
I have an idea for kind of a strange ML/Art project, but I don't really know where to start with modeling. Basically, I want to train a model on my catalog of photos that I have taken. Ideally I would want some unsupervised model to learn to recognize features/commonalities within the photos. Then, with this model, be able to recognize other photos that share some threshold of feature similarity with the trained model.
In other words, I want to teach a model to recognize commonalities in the photos I take, and be able to identify other images that match stylistically.
I am having a hard time figure out what the right type of model for this is and the right type of search terms to be using. I've done unsupervised PCA and TSNE projects before, but this would be my first time branching into deep learning/image analysis. Any pointers would be appreciated!
Hmm, try K Means maybe
It will try cluster images together with the most similar means
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
Looks like scikit-learn has this, but I'm not sure if it works with images
Hi, I need assistance with a project that requires developing a chatbot in Python that will serve as both student support and a teacher assistant for university students. I have an idea, but I'm not sure whether it's the proper execution; I want to use the Google Search API to allow as many inquiries as possible to return responses. I'd like to know how to go about developing this project; any help would be greatly appreciated. Thank you very much.
Thanks. You are the second person to recommend k means to me. So I suppose I will give it a shot then
why in some cases is it (predicted - actual)^2 and other cases its (actual - predicted)^2....wouldnt that give seperate answers...i know because of the square its always going to be positive
From my experience, when it comes to clustering image data, MeanShift usually outperforms KMeans. So if you're interested in doing image segmentation, you could try MeanShift.
For Image similarity detection, I'm not familiar with it yet, but you can always check online to get a cue.
!e
print((12-3)**2 == (3-12)**2)
@mild dirge :white_check_mark: Your eval job has completed with return code 0.
True
@surreal rock
yes
The difference is either x or flipped: -x, and x^2 == (-x)^2
So it doesn't matter which term is subtracted from which term
ok so basically it doesnt matter
ok
thats good to know cause in algebra what's first mtters
Well it does matter which comes first when subtracting, the point is that squaring makes it so it doesn't matter whether the number is positive or negative
it seems like you guys use the cost function everyday if u remember it so well
You need to explore NLU in NLP. Then learn about intent and entity recognition. You could then decide to leverage Dialogflow or Rasa for the project.
Oh, there's GPT-3 as well. So you decide which one you wanna use.
i guess yea 75 - 100 is the same as 100 - 75 when squared
😊 The presence of square will still make it positive regardless of how you flip it. However, statistically, when OLS is used:
RSS = TSS - ESS
and MSE = 1/N * RSS
So it ought to be Actual - Pred, just the way the MSE formula (the cost function) was defined in the first image.
what is OLS RSS TSS ESS MSE
^^^^
It's just some Stats terminologies that involves minimizing the cost function.
OLS = Ordinary Least Squares
ESS = Explained Sum of Squares
RSS = Residual Sum of Squares
TSS = Total Sum of Squares
So without going deep in Stats, OLS can be likened to what Gradient Descent is in Neural Nets. Although, OLS isn't exactly regarded as an optimizer.
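A quick numeric check of those identities on made-up points, using `np.polyfit` as the OLS fit (a straight line with an intercept, which is what the decomposition assumes):

```python
import numpy as np

# Made-up data, roughly on a line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Ordinary least squares fit of a straight line.
slope, intercept = np.polyfit(x, y, 1)
pred = slope * x + intercept

tss = np.sum((y - y.mean()) ** 2)     # Total Sum of Squares
ess = np.sum((pred - y.mean()) ** 2)  # Explained Sum of Squares
rss = np.sum((y - pred) ** 2)         # Residual Sum of Squares

mse = rss / len(y)

print(tss, ess + rss, mse)
```

Swapping `y - pred` for `pred - y` in the RSS line leaves every number unchanged, which is the earlier point about squaring.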
can anyone help me understand the code written here... explain to me like im an idiot lol
thank you, I'd look them up
please send the code as text to make it easier for people to read
Someone who has some experience in Adversarial Networks, especially in DCGANs, please, tell me... Is it normal to do everything right with your code, but you still can't make it work because you have to blindly keep testing the learning rate for both discriminator and generator until you can find a good number?
Hi, does anyone know how to solve the occlusion problem when tracking players in a basketball game? Deep sort didn't work as it assigned the player a new id after he had been occluded, we thought of using optical flow to solve this problem, but it doesn't work very well as the change of motion between two consecutive frames might be too large, any ideas on how to solve this?
Yes, training a GAN is quite hard in general, and it's nearly impossible to get it right on the first run even if the code is perfectly fine
you usually have to do a couple of tricks
like trying different learning rates, sampling your latent space from a gaussian, not hard coding the labels (as 0 or 1), rather do something like (0-0.3, 0.7-1.3) etc
training on real batch first and then on fake batch (don't shuffle them)
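A framework-free sketch of two of those tricks, written with NumPy only. The batch size and latent dimension are arbitrary, and the label ranges are the ones quoted above:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size = 8    # arbitrary
latent_dim = 100  # arbitrary

# Soft labels instead of hard 0 / 1 targets for the discriminator.
fake_labels = rng.uniform(0.0, 0.3, size=(batch_size, 1))
real_labels = rng.uniform(0.7, 1.3, size=(batch_size, 1))

# Latent vectors sampled from a standard Gaussian rather than a uniform box.
latent = rng.normal(size=(batch_size, latent_dim))

# In the training loop, the discriminator would be stepped on the real batch
# (with real_labels) and then on the generated batch (with fake_labels),
# as two separate updates.
print(real_labels.ravel().round(2), fake_labels.ravel().round(2))
```

These arrays would be converted to whatever tensor type the framework uses before computing the loss.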
Oh, I see... I've seen those labels and batch tricks in an OpenAI paper... They also told about inserting a random noise into the discriminator, even though I don't know if that's been quite effective for me...
Thanks. That's reassuring. I'll try different tricks, then.
You're welcome mate, here are some tricks you could start with
https://github.com/soumith/ganhacks
Oh yeah, I've read that article too. It's awesome
I'm also trying NVidia's Progressive DCGAN, which makes it a bit more complicated...but I'm ambitious

you got this
Hello! Does anyone know if there's a way to achieve np.subtract.outer(a,b) but with cupy?
did you mean subtract instead of substract?
yes, sorry
That apparently doesn't exist, so I'm not sure what you're referring to
CuPy’s ufunc currently does not provide methods such as reduce, accumulate, reduceat, outer, and at.
https://docs.cupy.dev/en/stable/reference/ufunc.html
😦
I don't; I have never actually used cupy
CuPy provides universal functions (a.k.a. ufuncs) to support various elementwise operations. CuPy’s ufunc supports following features of NumPy’s one:
Broadcasting
this means that for any vectorized function, func(a.reshape(-1, 1), b.reshape(1, -1)) will do what outer does.
if a and b are already both vectors, you can also just do func(a[:, None], b[None, :]), I think
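Checked here with NumPy, since that is what I can verify; the assumption is that CuPy arrays accept the same `None` indexing and broadcasting, so the last expression should carry over with `cupy` arrays in place of `numpy` ones:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20])

via_outer = np.subtract.outer(a, b)

# a as a column (3, 1) and b as a row (1, 2) broadcast to shape (3, 2).
via_broadcast = a[:, None] - b[None, :]

print(via_broadcast)
```

Element `[i, j]` of the result is `a[i] - b[j]`, which is exactly what `outer` computes.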
so I have this transformer code that I'm trying to train on a simple dataset to better understand the transformer architecture https://www.toptal.com/developers/hastebin/afarisisiv.py
and my dataset consists of 3 different 6 character sequences of 1's and 0's:
dataset = [[np.array([1,1,1,0,0,0]),
            np.array([0,0,0,1,1,1]),
            np.array([0,1,0,0,1,0])]]
basically the idea is the transformer should eventually be able to predict the second three by being passed the first three, like:
111->000
000->111
010->010
the problem is, I can't get it to train to any degree of accuracy no matter what I do
I've tried high learning rates, low learning rates, and lots of epochs, as well as shuffling the dataset, but nothing makes the loss go down significantly
this game uses word2vec and semantic meanings https://semantle.novalis.org/
highly recommend if you like wordle
ive been playing that for a couple weeks now and its so fun
I played since puzzle 23
What is the minimum number of samples do i need per class for a sign language translator?
@whole shore there's no way to definitively answer that. Also, sign language translators are not classifiers
But the total number of training instances you'd need for one is going to depend on a lot of things
Hi all im in need of assitance in regards to Azure ML for computer vision
i want to implement the following using Python from Azure:
- Face detection System to track face using webcam
- custom vision to detect between forks and spoons
please ping meee
ah, I was being vague. just basic static gestures like fingerspelling
Can anyone help? i know this is basic, but i'm not knowing how to solve this D:
sk is a series, so what do you mean by if sk == 3?
my idea is to get values that are equal to three and print it out saying that this row is normal. imagine that ID is equal to three, then let's print it out.
@toxic palm just ask
This question should go in #algos-and-data-structs
Please copy it to there and remove it from here
removed. please remove your comment also, so that everything will be cleared here..
I'm trying to calculate the tf-idf value of a column in my dataframe but when I run in through the sklearn lib, it returns an array of 0s
from sklearn.feature_extraction.text import TfidfVectorizer
v = TfidfVectorizer()
x = v.fit_transform(df[df.columns[0]])
x.toarray()
try taking the sum of the array, to see if there are 1s in it that aren't displayed
I am working on the housing data set as a test set
And I had some questions about the process I should be doing
When doing the linear regression process (assuming the relationship between two variable is linear)
I need to choose a loss function to calculate loss correct?
So far I’m only familiar with MAE, MSE, and Huber loss
Then I have to check for the minima and maxima to get the gradient
And then when I get the gradient I subtract by each point along the gradient to get the minima
Is this process correct? I can’t share my code right now but I’m having a bit of trouble with implementing the math . I want to be sure I remember the formula.
You need not worry much about gradient descent if you're not using neural network. It's okay to use OLS.
Now when it comes to finding the gradient of a weight (or slope of a weight)... You need to multiply 3 things:
1. The slope of your loss function with respect to the value at the node you feed into
2. The value of the node that feeds into the weight
3. The slope of the activation function with respect to the value you feed into
Then multiply #1 x #2 x #3 = the gradient of that weight.
Now to update the weight, you'll do
W - LR x Gradient = New Weight
Where W = The weight you want to update
LR = Learning Rate (a hyperparameter)
Gradient = The gradient of the weight we got when we multiplied #1, #2, and #3
Hi all - is there a way to write a dataframe into an existing excel sheet, preserving the VBA macros already in the file?
Essentially, this whole process is what backpropagation entails ( of course, without using calculus to explain it)
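The three-factor gradient and the update rule above can be sketched for the simplest possible case: one weight, an identity activation, and a squared-error loss. The function name, data, learning rate and epoch count are all made up for illustration:

```python
def train_single_weight(xs, ys, lr=0.05, epochs=200):
    """Fit y = w * x with per-sample gradient descent on squared error."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = w * x                  # identity activation
            loss_slope = 2 * (pred - y)   # 1: slope of the loss w.r.t. the output node
            input_value = x               # 2: value of the node feeding into the weight
            activation_slope = 1.0        # 3: slope of the identity activation
            gradient = loss_slope * input_value * activation_slope
            w = w - lr * gradient         # W - LR x Gradient = New Weight
    return w


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # generated with a true weight of 2
w = train_single_weight(xs, ys)
print(round(w, 3))
```

With a non-linear activation, only factor 3 changes; with more layers, factor 1 is itself obtained by propagating slopes backwards from the loss.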
Thank you! I think right now I’m just doing model training
Hi there, I have some ugly data that's come out of a tool for getting PDF tables into dataframes - it works. But not perfectly.
So it's clear where the numbers should go - but I'm not entirely sure how to tell pandas to look above and below a non-empty row for columns and to concatenate strings on this condition.
Hey guys, so currently i have data like this:
time[s] P[kPa] V[km/h] SA[deg] CA[deg] SR[%] Fz[N] Fx[N] Fy[N] Mx[Nm] My[Nm] Mz[Nm] Vs[rad/s] RL[m] Ttyr[degC] Tamb[degC] Tbrg[degC] Tw[Nm] Yb[mm] CF[N] FD[N] RD[deg] CmdFz[N] CmdSA[deg] CmdCA[deg] CmdV[km/h] CmdP[kPa] CmdRL[m] CmdSR[%] CmdTw[Nm]
0.000000e+00 2.299780e+02 -6.300000e-02 6.000000e-03 0.000000e+00 0.000000e+00 3.844010e+03 -2.196800e+01 6.289300e+01 -1.281100e+01 -6.672000e+00 -3.076900e+01 -5.235988e-03 3.037220e-01 5.418200e+03 2.129100e+01 1.140000e-01 0.000000e+00 -2.400000e-01 6.289500e+01 2.196200e+01 2.598050e+02 3.841600e+03 0.000000e+00 0.000000e+00 0.000000e+00 2.300000e+02 3.037200e-01 0.000000e+00 0.000000e+00
How do i convert it and put it into an excel file?
I tried to solve this by iterating through rows and checking for Nones in each column, and adding above and below string, seems to work
import pandas as pd
import numpy as np
data = {'a':[None,'M', None], 'b':['2',None, '485']}
df = pd.DataFrame(data=data)
df.replace('None', np.nan, inplace=True)
df2 = df.copy()
for i in range(df.shape[0]):
    for column in df.columns:
        try:
            if (df.iloc[i][column]) is None:
                # .loc writes into df2 directly; chained iloc[i][column] may assign to a copy
                df2.loc[i, column] = df.iloc[i-1][column] + ' ' + df.iloc[i+1][column]
        except Exception as e:
            # first/last rows or missing neighbours: leave the cell as it is
            pass
dang really? whats the hardest word that took you the most guesses?

Hey, this is exactly what is needed - thank you so much!
someone help me pls ;-;
import pandas as pd
file = open("discord_text.txt","r")
rows = []
for line in file:
    rows.append([v for v in line.split(" ") if v != ""])
pd.DataFrame(rows).T.to_excel("output.xlsx")
Think this should do the trick @wild pagoda
It's not working tho, my code:
rows = []
rows.append([v for v in self.data.split(" ") if v!=""])
pd.DataFrame(rows).T.to_csv(filename)
currently, my self.data have format like this:
time[s] P[kPa] V[km/h] SA[deg] CA[deg] SR[%] Fz[N] Fx[N] Fy[N] Mx[Nm] My[Nm] Mz[Nm] Vs[rad/s] RL[m] Ttyr[degC] Tamb[degC] Tbrg[degC] Tw[Nm] Yb[mm] CF[N] FD[N] RD[deg] CmdFz[N] CmdSA[deg] CmdCA[deg] CmdV[km/h] CmdP[kPa] CmdRL[m] CmdSR[%] CmdTw[Nm]
0.000000e+00 2.299780e+02 -6.300000e-02 6.000000e-03 0.000000e+00 0.000000e+00 3.844010e+03 -2.196800e+01 6.289300e+01 -1.281100e+01 -6.672000e+00 -3.076900e+01 -5.235988e-03 3.037220e-01 5.418200e+03 2.129100e+01 1.140000e-01 0.000000e+00 -2.400000e-01 6.289500e+01 2.196200e+01 2.598050e+02 3.841600e+03 0.000000e+00 0.000000e+00 0.000000e+00 2.300000e+02 3.037200e-01 0.000000e+00 0.000000e+00
So what is that? tab-delimited row-wise and new-line-delimited column-wise?
sorry but i don't know what you mean
what i want is this
Try this: print(repr(self.data))
Okay, see between each line there is: \t these are "tabs" acting as separators, so we say that as we go down the row that this is "tab-delimited"
and there'll be a single \n which is the new line
yeah true
And since self.data is one big lump we have to break it down.
So, first step, break it down into an iterable and then second step split on tab.
rows = []
for line in self.data.split("\n"):
    row = [v for v in self.data.split("\t") if v != ""]
pd.DataFrame(rows).T.to_excel("out.xlsx")
the self.data.split do i need to repr(self.data).split?
No, shouldn't be necessary. repr only changes how the string is displayed; the string stored in Python contains the actual tab characters.
and the row = [v...
should it be rows.append?
rows.append(row) sure
oh
it's kinda wrong lol
currentcode:
rows = []
for line in self.data.split("\n"):
    row = [v for v in self.data.split("\t") if v != ""]
    rows.append(row)
pd.DataFrame(rows).T.to_csv(filename)
Remove .T in Dataframe(rows)
not what i expected too, it's print the value 3 times
and it's not split the rows title and the value
Try using \\n+ or [\\r\\n]+ to split
after editing a littlebit, i make it work, now how do i remove the index number of row and col?
Let's solve the first problem first
no the problem is done :v
rows = []
for line in self.data.splitlines():
    row = [v for v in self.data.split("\t") if v != ""]
pd.DataFrame(rows).T.to_excel("out.xlsx")
i need to split it first
self.data = self.data.split(" ")[0]
after that, i do this:
rows = []
for line in self.data.split("\n"):
    row = [v for v in line.split("\t") if v != ""]
    rows.append(row)
pd.DataFrame(rows).to_csv(filename)
now it's work
Okay, that's weird
:v so long as it works
Excess row?
That's because one of the lines is empty
So to get rid of the blank line just include a line that rejects ""
just realize no need to self.data = self.data.split(" ")[0]
no i want to remove the index of rows and col , you can see 0-2 in 1st col and 0-25 in 1st row
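A standard-library sketch of the whole parsing step, assuming the data really is tab-delimited with one record per line. The helper name `parse_tab_delimited` and the shortened sample string are made up. Blank lines are skipped, which avoids the excess row, and `csv.writer` writes only the values, so no index labels appear:

```python
import csv
import io


def parse_tab_delimited(text):
    """Split text into rows on newlines and into cells on tabs."""
    rows = []
    for line in text.splitlines():
        row = [v for v in line.split("\t") if v != ""]
        if row:  # reject blank lines so they do not become empty rows
            rows.append(row)
    return rows


# Shortened, made-up sample in the same layout as the data above.
sample = "time[s]\tP[kPa]\tV[km/h]\n0.000000e+00\t2.299780e+02\t-6.300000e-02\n\n"
rows = parse_tab_delimited(sample)

buffer = io.StringIO()  # stands in for an open file
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

If you stay with pandas, `to_csv(filename, index=False, header=False)` leaves out the row and column labels.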
I'm not seeing what's wrong here tbh
Great!
yo if i used this to load my test images is it in order from the idirectory?
so the 5th image on the first folder of my dataset is the one where the model predicted wrong?
i just want to observe the images where the model predicted wrong
Hi, I have a problem: I have to make a project for school but I can't do it. I've been racking my brain for a week and I don't understand anything. Could someone do it and send it to me, please? It's about using a database to make a bar graph with Python. Please send me a private message if possible
you're more likely to get help if you put your question in the chat. you have to give enough information for people to see what you need and start helping. People aren't likely to DM you to find out what your actual question is, as it would end up being a waste of their time if they don't know or aren't interested.
They're not asking for help, just the answer 😛
We're not going to make your homework for you @echo rover
they'd have to put the actual question in the chat either way 😛
using programs to dig through large amounts of data, and the theory of getting useful information with those programs
stuff is things
I'm not sure if it's the correct channel
I'm developing a simple application which takes an input video source and processes each frame to display in a tkinter gui
In the processing I predict on the data using a custom YOLO model
I have a fairly good CPU and even on one video source I'm almost pinning my 5800X if I dont use cuda, is running the predictions this demanding or is it more likely that I'm spamming some callback too often?
One of the goals was for it to run locally on a raspberry pi, but 1fps to something along 1fpm is likely sufficient
I'm not sure what version you are using, but YOLO v3 has somewhere along 61 million parameters (if what I just googled is correct), so if it is anywhere near that big, a forward pass might take long
We're using YOLO v5
But I generally just wanted to know if it was as taxing or not
Running on CUDA I end up with ~12-16ms frametime on my system vs ~100-140ms on CPU
yeah that kind of speedup is definitely expected for larger models
I have a pyspark dataframe with schema: id (int), date (timestamp), val (float). I want to create another column val2, which contains the val for the same id and the most recent timestamp that is at least 30 days before the timestamp in the date column.
For instance:
id, date, val
1, 10-20-2018, 14
1, 10-31-2018, 9
1, 11-24-2018, 10
1, 12-23-2018, 4
2, 8-21-2020, 7
2, 9-29-2020, 20
2, 10-14-2020, 5
should add the column val2:
id, date, val, val2
1, 10-20-2018, 14, null
1, 10-31-2018, 9, null
1, 11-24-2018, 10, 14
1, 12-23-2018, 4, 9
2, 8-21-2020, 7, null
2, 9-29-2020, 20, 7
2, 10-14-2020, 5, 7
Any idea how this can be solved in pyspark?
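This is not PySpark, only a plain-Python sketch of the lookup logic so the expected output can be checked. The function name `add_lagged_val` is made up. In PySpark the same idea might be expressed as a self-join on id with a date condition, keeping the latest match per row, but that is an assumption and is not shown here:

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime, timedelta

rows = [
    (1, "10-20-2018", 14),
    (1, "10-31-2018", 9),
    (1, "11-24-2018", 10),
    (1, "12-23-2018", 4),
    (2, "8-21-2020", 7),
    (2, "9-29-2020", 20),
    (2, "10-14-2020", 5),
]


def parse(date_str):
    return datetime.strptime(date_str, "%m-%d-%Y")


def add_lagged_val(rows, min_gap_days=30):
    """Append val2: the val of the latest row with the same id dated at least min_gap_days earlier."""
    history = defaultdict(list)
    for id_, date_str, val in rows:
        history[id_].append((parse(date_str), val))
    for entries in history.values():
        entries.sort()

    result = []
    for id_, date_str, val in rows:
        cutoff = parse(date_str) - timedelta(days=min_gap_days)
        entries = history[id_]
        dates = [d for d, _ in entries]
        pos = bisect_right(dates, cutoff)  # number of entries dated on or before the cutoff
        val2 = entries[pos - 1][1] if pos > 0 else None
        result.append((id_, date_str, val, val2))
    return result


for row in add_lagged_val(rows):
    print(row)
```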
Asking again, sorry - is it possible to dump a dataframe into a .xlsm file w/o modifying the existing vba macros?
I don't have CUDA either... Whenever I see cuda anywhere, I'm reminded I have Iris XE GPU 😩😭
You can dump a df in excel tho. Idk about VBA macros... Hopefully, someone with a good knowledge on that can answer your question.
Hey, I know this is not purely a python question but I hope someone would be able to give me a pointer anyway
I have a PostgreSQL db with a many-to-many relationship - author and article
In my CSV file, an article is stored as a row with a column for authors(a list). So should I create a new CSV file where I store the author name and give each author an ID and then match it up as I load it into my joining table in the db?
Or is it possible to do something clever with the serial sequence when i create my tables? I just dont see how I would match up the correct author and article ID's in that way
what are you doing with these csv files? what are you trying to achieve here?
I have a csv file with some scraped articles, which I'm trying to import into a db. So it's something like below. What I'm unsure of is whether to do the id joining for the article_author table in Python, or if there is some better way, maybe in PostgreSQL
csv:
id, content, author
1, abab, john,jane
2, baba, john
table article
id, content
1, abab,
2, baba,
table author:
id, name
1, john
2, jane
table article_author
article_id, author_id
1, 1
1, 2
2, 1
why not rent? < - Azure/AWS/Google
o.o

Coffee time
you can do joins in pandas, but why not load these tables directly into postgresql? you can then do the joins when you need to query for data later.
you could "pre-join" everything in pandas too. or you can make temporary tables and then do the joins directly in postgres, saving the joined results to a final table
or create a view, or a materialized view... lots of options. depends on what you need
drinking some rn. while i try to finish this project 
Thanks a bunch, I feel more comfortable in pandas so I might do a pre-join there then
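A standard-library sketch of building the three tables from the CSV before loading them. It assumes the author list sits in one quoted, comma-separated field, which is a guess about the real file. The `raw` string is made-up sample data in the layout shown above:

```python
import csv
import io

# Assumed layout: the authors of an article share one quoted field.
raw = 'id,content,author\n1,abab,"john,jane"\n2,baba,john\n'

articles = []        # rows for table article: (id, content)
author_ids = {}      # name -> id, i.e. the rows for table author
article_author = []  # rows for the joining table: (article_id, author_id)

for record in csv.DictReader(io.StringIO(raw)):
    article_id = int(record["id"])
    articles.append((article_id, record["content"]))
    for name in record["author"].split(","):
        name = name.strip()
        if name not in author_ids:
            author_ids[name] = len(author_ids) + 1  # assign the next free id
        article_author.append((article_id, author_ids[name]))

print(articles)
print(author_ids)
print(article_author)
```

If you would rather let a serial column generate the author ids, one option is to insert the authors first and read the ids back with `INSERT ... RETURNING id`, then build the joining rows from those.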
Hello! I've recently become interested in machine learning, and I want to try some predictions (after reading for a while, it seems better to use an LSTM(?)). I'm planning to use an array and a value as input and get a value as output (a 3d array and a label (int) as training input, and it'll make a prediction based on another 3d array). Is there any recommendation or example for LSTMs, or maybe some documentation that can point me in the right direction?
you might try the pytorch example code for LSTMs
hmm i still dont get it
what sort of data are you training it on? is it time series data? What do the 3d arrays represent?
the array im working with is an pixel value of an images, it was originally a video that i convert to images, frame by frame
and theres a value for each of its frame as an input/would be prediction
and the output is what? the prediction for the next frame of the video as an image?
the value that was assigned for that frame
i can share the details in #help-honey
ah nvm its dormant
what value?
its an int, it's a bpm from dataset
So im a beginner in machine learning and i want to ask question. How do we choose machine learning model statistically? Or is it just a matter of trial and error?
sorry if this doesn’t count as natural language processing, but I’m in a bit of a pickle. I accepted an internship for CS but the project I was assigned isn’t my forte.
The question is to generate questions from a story or article, and find the best questions out of the generated questions.
I’m open to suggestions and advice!
What have you tried?
Just a bit of asking around but something called GPT 3
But I have to have access to it
dont hate me for asking but what is a neural network
look into how you can detect factual statements in the text, and then convert them to questions. GPT anything is going to be overkill.
You think so? But what questions do I ask based on the factual statements in the text?
"Bob is from New York" -> "Where is Bob from?" or "Who is from New York"
something that takes a bunch of inputs and outputs, and learns what function of the inputs gives you the outputs.
But how do I know or extract factual info? Reason why I wanted to use GPT is because this project is due in 2 1/2 weeks
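The "Bob is from New York" rewrite suggested above can be sketched with one regular expression. The pattern only covers sentences of the form "X is from Y" and is a toy illustration, not a general way to find factual statements. The names `PATTERN` and `questions_from_statement` are made up:

```python
import re

# Matches "<Capitalised subject> is from <place>", with an optional final period.
PATTERN = re.compile(r"^(?P<subject>[A-Z][\w ]*?) is from (?P<place>.+?)\.?$")


def questions_from_statement(sentence):
    """Turn an 'X is from Y' statement into two questions, or return [] if it does not match."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return []
    subject, place = match.group("subject"), match.group("place")
    return [f"Where is {subject} from?", f"Who is from {place}?"]


print(questions_from_statement("Bob is from New York"))
print(questions_from_statement("the sky is blue"))
```

Each new sentence shape would need its own pattern and question templates, which is why this approach needs a lot of example sentences to look at first.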
how is GPT going to help you solve it?
doesn’t have to be perfect but the prof expects it to sort of work?
but wouldn’t it be faster? and easier to work with?
so could you use it for some kind of virtual assistant or would it be more for something that would for example play a game automatically
no? GPT-3 can generate long texts that are grammatically correct and "make sense". that doesn't mean that you can easily adapt it to write questions that are answered in a given document, or that you can adapt it for that task at all.
it would be more like the second one. because a virtual assistant is actually going to be a bunch of different programs--one for each kind of thing that the virtual assistant can do.
so following your idea, do I train a model to recognize key words such as? and how do I make it so it’s a correct and the question sort of makes sense?
ok thanks
what is the context for your need to design this? it seems odd that you've been given an intermediate NLP task with seemingly little direction.
@hollow kindle I apologize that I don't have a more thorough answer. it takes a long time for someone to wrap their mind around what a neural network is. I'm still working on it myself, in many ways.
its fine i have never done any research on it and only have an idea on what it is, but interested in making some kind of virtual assistant that can have responses different than some predetermined ones based on a set input
so I am in a research project/internship, and this tryout project is to create questions from a text. It doesn’t have to be perfect, but it should somewhat work. NLP is not my forte, I know next to nothing on it but this professor needs this done by 5/11. I was informed of this project today. There are a few other projects I can choose from but I don’t think that they are easier.
Finding and Fixing Bad Questions
We have some questions that we’ve identified as being bad. We don’t know why all of them are bad, but we’d like to make them better. Some patterns that we’ve seen are:
Ambiguity
Wrong assumptions
Wrong interpretations
Take a look at the questions. Do you see a pattern? Can you detect this pattern automatically (e.g., with a regular expression)? Can you correct any of the patterns with a simple script that either changes the question or the answers?
What to Submit: Submit your program that detects problems and fixes them. Along with a repository of your code, send a file of the original and fixed questions
it's pretty clear from this that they don't expect (or want) you to use GPT. this one about finding bad questions looks pretty interesting as well.
though it might be because I've become exceptionally opinionated about what constitutes a good or bad question after two years of answering questions here.
LOL. that is true
yeah, I’m undecided on which one to try
one of the reason why I mentioned GPT-3 is because of the guys at the Artificial Intelligence server told me that could be potentially helpful
i suspect you want to start with something simpler
I totally would break this problem down and take my time, but a time limit is really limiting what I can do
I don't even know how you would use GPT-3 to solve it, and you'd have to pay, if I understand correctly
it sounds like whoever suggested GPT-3 just mentioned the first high-profile NLP thing they thought of.
@brave sand they are suggesting you use regular expressions, so they are expecting "stupid" simple solutions
are there specific words that suggest bad questions? punctuation patterns? capitalization usage? length (either very long or very short)?
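A sketch of how surface checks like these could be turned into a script. Every rule and threshold here is a hypothetical placeholder, not a pattern taken from the actual question set, so expect to replace them after reading real examples:

```python
import re

# Hypothetical heuristics: each maps a label to a test on the question text.
CHECKS = {
    "missing question mark": lambda q: not q.rstrip().endswith("?"),
    "very short": lambda q: len(q.split()) < 4,
    "very long": lambda q: len(q.split()) > 40,
    "all caps": lambda q: q.isupper(),
    "vague reference": lambda q: re.search(r"\b(this|that|it|they)\b", q, re.IGNORECASE) is not None,
}


def flag_question(question):
    """Return the labels of all heuristics that the question triggers."""
    return [name for name, check in CHECKS.items() if check(question)]


print(flag_question("What is it"))
print(flag_question("Which river flows through the city of Paris?"))
```

Keeping the rules in a dictionary makes it easy to add or drop one and to report which rule fired for each flagged question.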
hm yeah. I think the bad question one is easier than the generating questions one
you could also try using a pre-trained language model like bert at some point in a project like this, but then you have these opaque word vectors to figure out what to do with; a language model is basically just super-powered dimension reduction in a task like this
yeah hard pass on that.
do you think identifying whether a question is good or not is a more doable problem than generating questions?
if you want to hard pass on that, then you'll want to hard pass on any solution that involves GPT-3 as well 😛
it sounds more doable to me, though I haven't seen the questions they want you to classify. or the document(s) you might be pulling questions out of.
haha I did realize that once I read a little more into GPT-3
that's another thing, there isn't a doc with the questions or documents, I'll have to ask the prof.
there is another project I could do but it's less programming related.
[Non-Programming Option] Crafting Adversarial Questions
We have a couple of adversarial interfaces for writing questions:
http://fm2.qanta.org
https://trick.umiacs.umd.edu/
The code can be found here:
WAITING FOR LINK, WILL UPDATE SOON
This is a little more open-ended than other projects, but the main idea is to use these interfaces to write a bunch of adversarial questions. We’ll be able to use these questions for human–computer face-offs, and through your insights we’ll be able to improve the interfaces.
What to submit: Share the questions / claims that you’re most proud of and why they’re good at specifically targeting weaknesses in the computer’s ability to detect these sorts of problems.
Optionally: Create a pull request that improves either of these interfaces in a way that would help others write more challenging questions for a computer.
How this would translate into a full project: We would write lots of challenging questions, investigate how computers get tripped up, and improve the interfaces to highlight those properties to help others write similar questions.
any idea what he is talking about?
and the prompt for the good bad questions is here:
`Finding and Fixing Bad Questions
We have some questions that we’ve identified as being bad. We don’t know why all of them are bad, but we’d like to make them better. Some patterns that we’ve seen are:
Ambiguity
Wrong assumptions
Wrong interpretations
Take a look at the questions. Do you see a pattern? Can you detect this pattern automatically (e.g., with a regular expression)? Can you correct any of the patterns with a simple script that either changes the question or the answers?
What to Submit: Submit your program that detects problems and fixes them. Along with a repository of your code, send a file of the original and fixed questions.
How this would translate into a full project: We’d try to fix as many questions as possible, retrain a QA system on this better data, document our ability to correct these problems, and hope to see improvement.
`
unfortunately I have other stuff I need to do before going to bed, so I have to drop off.
yeah no worries man, I couldn't thank you enough already. guess I'll be hanging around more often in the channel LOL
I'll probably check again tomorrow while I'm supposed to be working
are you a programmer for your job
yes, I'm a computational linguist.
cool
Hello all, I want to know if it is possible to put a preprocessing function inside a Keras Model. I want to investigate a model from a paper, which uses a wavelet preprocessing step at the front after the raw image input; each coefficient is then processed in parallel in an identical CNN block
i define my wavelet and CNN central block like this:
```
coeff = pywt.dwt2(data,'dmey','periodic')
# dwt2 returns (cA, (cH, cV, cD)), so unpack the nested tuple
ll, (lh, hl, hh) = coeff
return ll,lh,hl,hh
def CNN_Central_Block(input_size):
    feed = Input(shape=(input_size,input_size,3))
    x = Conv2D(kernel_size=(5,5),filters=32)(feed)
    x = BatchNormalization()(x)
    x = ReLU()(x)
    x = MaxPool2D(pool_size=(3,3))(x)
    x = Conv2D(kernel_size=(4,4),filters=64)(x)
    x = ReLU()(x)
    x = Conv2D(kernel_size=(1, 1), filters=10)(x)
    x = Softmax()(x)
    x = Flatten()(x)
    out = Dense(10)(x)
    # the keyword arguments of Model are inputs= and outputs=
    return Model(inputs=feed, outputs=out)
```
One problem: since I'm branching from one image to four different image coefficients, should I put ImageDataGenerator in front of the model or just hardcode a function to cover the parallel CNN model?
for those who want to know the paper: https://ieeexplore.ieee.org/document/7838150
i’m connected to andrew ng on linkedin
and i just fucking found out five minutes ago
💀💀💀
what stelercus said. i made a PoC using gpt-3 at work and even then we ended up deciding to use an open source model instead with little to no difference in model inference

rex i must be hallucinating
what the actual fuck
maybe it’s a good idea to never tell him i never actually finished his course on ML w opera 😭😭😭😭
tbh i dont have anything else to say except im lowkey jealous

oh
i also like his most recent initiative
check out krish naik he’s a homie
about data-centric AI
indian guy gives godly explanations
oh yeah naik doesn’t do research stuff but he talks a ton about a lot of ml related stuff
most recent guest of ken jee's podcast
and builds basics well
come on bro you know i have zero time for ken’s nearest neighbors rn 😭😭
no hate ken’s awesome
ok ill also subscribe to him
wait am i connected w him on linkedin too
💀💀💀
i swear i did not know anything about being connected w these guys beforehand
you know i listen to podcasts on my commutes and workouts, so idk what youre doing

i listen to an ex navy seals

nah jk i should defo listen to the podcast more
its not for everyone
it’s also just hard balancing it w lsat and bar prep too
yes before anyone asks i am preparing for the lsat and bar as an undergrad soph
before i get docked for being off topic
😭
we had the lawyer of the company come speak to us last week, the general counsel guy
my friend wants to do ML + cybersecurity
that sounds like an interesting cheesecake flavor
when did the cheesecake factory introduce that flavor?
😭😭😭
that sounds like a recruiter’s wet dream
i think your college is just too small and there's too many busy bodies
but that's my opinion
💀 💀 💀
nah they’re just 18 year olds that wanna get shitfaced
i don’t blame them there’s a lot of kids who think that and they all fall into the same trap
but the point is they gotta snap out of it at some point before it’s too late
gn man
Hi everyone, I would like to create a time series classifier
When I insert the chart, it returns a string containing the type of time series. Can anyone help me?
I have a df which I have done groupby and added the result in new df . But when I use the same column name used in group by that give me error
is someone doing on some research? i want to join
The bitwise operation frustrated me, so I changed it to Add
i dont know if my GPU can stand for this suffering
does anyone have experience with spacy and lemmatization?
The lemmatization is not fully working, it works fine for some words but skips over others
Is there a way I can add custom lemma's to the function?
hm yeah, I’ll not use GPT-3 then. and I have to pay to use it
one hot encoding and label encoding ... when should you use one over the other?
For unordered categories you use one-hot encoding, for ordinal (ordered) data you may use label encoding
so f.e. if the column is "score" and you have very bad, bad, regular, good, very good
then you could convert those to 0, 1, 2, 3, 4
Because regular is closer to good than very good f.e., there is some relation between these categories
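A plain-Python sketch of the two encodings, using the score example above for the ordered case and a made-up `colors` list for the unordered case. The helper names are hypothetical:

```python
SCORE_ORDER = ["very bad", "bad", "regular", "good", "very good"]


def label_encode(value, order=SCORE_ORDER):
    """Map an ordered category to its rank, so nearby ranks mean similar values."""
    return order.index(value)


def one_hot_encode(value, categories):
    """Map an unordered category to a 0/1 vector with a single 1."""
    return [1 if value == category else 0 for category in categories]


colors = ["red", "green", "blue"]  # no natural order between these

print(label_encode("regular"), label_encode("good"), label_encode("very good"))
print(one_hot_encode("green", colors))
```

Label-encoding the colors instead would tell a model that blue is "further" from red than green is, which is the spurious relation one-hot encoding avoids.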
@lucid nimbus
hi i have a ultimate tic tac toe game and i want to make an ai for the game, any library suggestion or tutorials i can use?
I think you would want to use a mini-max algorithm
thankss @mild dirge !
So no machine learning or anything
hmmm then any suggestions for that?
yeah, minimax
okay
That would be for regular tic-tac-toe, not sure how I would change for ultimate tic tac toe
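A minimal minimax sketch for regular 3x3 tic-tac-toe only. The board representation (a tuple of nine cells) and the function names are choices made for this example. Ultimate tic-tac-toe has far more states, so it would presumably need a depth limit and an evaluation function, which are not shown:

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def minimax(board, player):
    """Return (score, move) with score from X's view: +1 X wins, -1 O wins, 0 draw."""
    won = winner(board)
    if won == "X":
        return 1, None
    if won == "O":
        return -1, None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None

    best_score, best_move = None, None
    for move in moves:
        child = board[:move] + (player,) + board[move + 1:]
        score, _ = minimax(child, "O" if player == "X" else "X")
        better = (best_score is None
                  or (player == "X" and score > best_score)   # X maximises
                  or (player == "O" and score < best_score))  # O minimises
        if better:
            best_score, best_move = score, move
    return best_score, best_move


empty = tuple(" " * 9)
print(minimax(empty, "X"))
```

The `lru_cache` decorator memoises positions, so each board is evaluated once even though many move orders lead to it.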
ill try and figure something out
hi!
I want my code to perform semantic analysis and create a csv table:
from collections import Counter
import pandas as pd
stoplist = ['.', 'and', 'was', 'in', 'a', 'the', ',', '?', ':', 'of']
text1 = str(input("Paste text here: "))
words1 = [s.lower() for s in text1.split() if s.lower() not in stoplist]
data = {'quantity': words1}
df = pd.DataFrame(data)
df = df['quantity'].value_counts()
df.to_csv('seo.csv')
Stoplist works for words, however it does not for punctuation. Stackoverflow suggests using
.str.replace(r'[^\w\s]+', '')
but it doesn't work here: AttributeError: Can only use .str accessor with string values!
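The `.str` accessor belongs to a pandas Series of strings, so the error suggests it was applied to something else, for example the counts. For a plain Python string the same regular expression can be applied with `re.sub` before splitting. A sketch without pandas, where the function name and the sample sentence are made up:

```python
import re
from collections import Counter

stoplist = {"and", "was", "in", "a", "the", "of"}


def word_counts(text):
    """Lowercase the text, drop punctuation, and count the words not in the stoplist."""
    cleaned = re.sub(r"[^\w\s]+", "", text.lower())
    return Counter(word for word in cleaned.split() if word not in stoplist)


counts = word_counts("The cat was in the hat, and the hat was red: a red hat?")
print(counts.most_common())
```

With punctuation removed up front, the stoplist only needs to contain words.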
how do i implement "Keras Tuner" in my code? https://colab.research.google.com/drive/13163dhtSAkL9h53yj2dsRStsNNDTESJB
@serene scaffold sorry for the ping, let me know if I shouldn’t ping you from now on. what determines a question as a bad question? more importantly, does my program look through a sentence and look for keywords and sentence structure?
it sounds like "what determines a question as a bad question" is specifically what you are expected to figure out 🙂
yeah alright that makes sense. but how should my program work?
hey guys does anyone have experience with working with the dn3 library for processing EEG data?
try this:
from collections import Counter
import pandas as pd
import string
stoplist = ['and', 'was', 'in', 'a', 'the','of']
text1 = str(input("Paste text here: "))
# punctuation removal
text1 = text1.translate(str.maketrans('','',string.punctuation))
words1 = [s.lower() for s in text1.split() if s.lower() not in stoplist]
data = {'quantity': words1}
df = pd.DataFrame(data)
df = df['quantity'].value_counts()
df.to_csv('seo.csv')
how? again that seems like it's your task to figure out
i gave some suggestions yesterday, which require you to go through lots of examples and come up with some heuristics ("rules of thumb")
you often need to do "human learning" in order to do "machine learning" effectively
I meant, are there any nlp specific tools or tips that would make this task easier
they suggested regex, and that's definitely a good general-purpose tool
as well as the various string-processing functions in python
you might be able to use some functionality from NLTK or Spacy, but i suggest avoiding complicated "NLP stuff" until you at least have a better idea of what you actually need to do
avoid the temptation to jump for the coolest-sounding tool first
data science is 65% data cleaning, 15% reports/presentations/dataviz, 15% using rules of thumb and basic techniques, and 5% using advanced fancy stuff
anyone?
is it ok to promote my data science related youtube videos in this channel?
No, this falls under advertisement. Sorry.
Thanks @serene scaffold - Is there a channel where that sort of thing is ok?
"what makes a good or bad question" is for you to answer just as a regular person. don't think about it from the perspective of how you can detect the difference programmatically.
There is not, as our rule against advertising applies to the whole server.
@brave sand here's what's interesting about their suggestion that you use regex: regex is just detecting patterns in strings. the strings don't even have to mean anything in any human language for regex to work. so it's a very low-fi approach. more sophisticated approaches (like our beloved GPT-3) do things that try to emulate actually understanding what the words mean.
alright I’ll look into regex. thanks
Is there an easy way to check how a particular numpy method is implemented? I see that numpy is open source but when I go through github I am having trouble locating the relevant source code. In MATLAB at least, I can just open any function right from the terminal
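One general-purpose option is the standard `inspect` module. The sketch below uses `json.dumps` and `len` from the standard library as stand-ins. The same calls may work for NumPy functions that are written in Python, while functions implemented in C have no Python source to show:

```python
import inspect
import json

# For a function written in Python, inspect can return the source text and its file.
source = inspect.getsource(json.dumps)
source_file = inspect.getsourcefile(json.dumps)
print(source_file)
print(source.splitlines()[0])

# For a function implemented in C (here the builtin len), there is nothing to show.
try:
    inspect.getsource(len)
    builtin_has_source = True
except TypeError:
    builtin_has_source = False
print(builtin_has_source)
```

The printed file path also tells you where to look in the project's repository for the surrounding code.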
if anyone can help me with minmax please go to #help-kiwi
Hi guys does anyone know how to resize each pixel in an image i.e. from 1x1x1 to 6.64x6.64x8.8?

