#data-science-and-ml
so you probably did too

but it actually is pretty dope tbh
and they have an option for students

i think we're gonna use this for our data mining project
@hollow sentinel bro its like someone took notion and combined it with a jupyter notebook 
at least thats what it feels like rn
yeah look
wow
once i have to do a group project for my college i'm gonna use this so i can force people to do work
yeah for real dude
and it doesnt have that sync-ing problem colab does
"sorry dude i can't open it"
you know what's funny, someone at work asked me how to do that the other day
which honestly there is a bit of disclaimer here
it was more how do you download a .ipynb file from github
bc when you download it, it becomes a .txt file by default
and even when you change the extension it doesnt work
strange
but some light googling still made it possible to find the answer 
i think i downloaded an .ipynb the other day from github
and changed the extension
and it worked
you have to open it in jupyter and then change the extension name
i think learning how to google is crucial
searching stuff and being able to read
think critically
very important stuff
in any field
solve problems
even when everything is ambiguous
which to be fair can be kinda tough sometimes
that's why i appreciate coding sm
it makes me a better thinker
multi-faceted
whatnot
and this server has helped me become more self-reliant
ping me if you need anything
anyone working with chatterbot ?
Hi I am trying to create a tree in python, using a 10x10 matrix called map...I have reached this far:
```python
#function to generate a tree
def generate_tree(map,start):
    #create root node
    root = node(start)
    #pointer to root node
    temp = root
    #queue up root node
    queue = [root]
    #go through each element in queue
    while queue:
        for item in queue:
            #UP NODE
            if (item.val[0]-1 >= 0) and (map[item.val[0]-1][item.val[1]] != "X"):
                temp.children.append(node(item.val[0]-1,item.val[1]))
                queue.append(node(item.val[0]-1,item.val[1]))
            #DOWN NODE
            if item.val[0]+1 <= len(map) and (map[item.val[0]+1][item.val[1]] != "X"):
                temp.children.append(node(item.val[0]+1,item.val[1]))
                queue.append(node(item.val[0]+1,item.val[1]))
            #LEFT NODE
            if item.val[1]-1 >= 0 and (map[item.val[0]][item.val[1]-1] != "X"):
                temp.children.append(node(item.val[0],item.val[1]-1))
                queue.append(node(item.val[0],item.val[1]-1))
            #RIGHT NODE
            if item.val[1]+1 <= len(map[0]) and (map[item.val[0]][item.val[1]+1] != "X"):
                temp.children.append(node(item.val[0],item.val[1]+1))
                queue.append(node(item.val[0],item.val[1]+1))
            #once node created and added to root, remove node
            queue.remove(item.val)
```
the above code builds the tree up to the second layer but how do I add the following layers, i.e. the children of the children of the root? please ping me if you can help
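One possible way to get past the second layer, offered as a sketch and not as a drop-in fix: attach each new child to the node that was just taken off the queue (not always to the root), and remember which cells are already in the tree so the search does not loop forever. `Node`, `build_tree`, `depth` and the 3x3 grid are hypothetical names and data made up for this example.

```python
from collections import deque


class Node:
    """Tree node holding a (row, col) cell; hypothetical stand-in for `node`."""

    def __init__(self, val):
        self.val = val
        self.children = []


def build_tree(grid, start):
    """Breadth-first expansion from `start`, skipping walls marked "X"."""
    root = Node(start)
    visited = {start}          # cells already in the tree, prevents cycles
    queue = deque([root])
    while queue:
        current = queue.popleft()
        row, col = current.val
        # Offsets for up, down, left, right.
        for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            r, c = row + d_row, col + d_col
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if inside and grid[r][c] != "X" and (r, c) not in visited:
                child = Node((r, c))
                current.children.append(child)   # attach to the dequeued node
                visited.add((r, c))
                queue.append(child)
    return root


def depth(node):
    """Number of layers below and including `node`."""
    return 1 + max((depth(child) for child in node.children), default=0)


grid = [
    [".", ".", "."],
    [".", "X", "."],
    [".", ".", "."],
]
tree = build_tree(grid, (0, 0))
```

Because every child is queued as the same object that was appended to its parent, grandchildren end up under the right node automatically.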
this is an #algos-and-data-structs question
No, sorry. Please copy and paste your question to that channel, and then remove it from here. Thanks!
How to learn machine learning and deep learning
How do I deal with non-stationary data for time series analysis (autocorrelation function (ACF) and partial autocorrelation function (PACF))? #help-falafel
I would recommend to read a book like https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow-dp-1492032646/dp/1492032646/ and to practice
Have you tried differencing it?
I am trying to submit my Titanic Submission to the Competition and it is showing this error. Can someone help me out
This is the model.
```
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(data_prepared, df_label)
y_pred_log_reg = clf.predict(data_prepared)
acc_log_reg = round(clf.score(data_prepared, df_label) * 100, 2)
print(str(acc_log_reg) + ' percent')
```
This is how I am trying to submit
```
submission = pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Survived": y_pred_log_reg
})
```
It is showing me this error, which I am not able to understand
```
ValueError: array length 712 does not match index length 418
```
Hi
@worldly dawn How do I differentiate it
can u show me?
```python
import matplotlib.pyplot as plt
import statsmodels.api as sm  # needed for sm.datasets below
from statsmodels.tsa.stattools import adfuller

nile = sm.datasets.get_rdataset("Nile").data

def check_stationarity(series):
    # Copied from https://machinelearningmastery.com/time-series-data-stationary-python/
    result = adfuller(series.values)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    print('Critical Values:')
    for key, value in result[4].items():
        print('\t%s: %.3f' % (key, value))
    if (result[1] <= 0.05) and (result[4]['5%'] > result[0]):
        print("\u001b[32mStationary\u001b[0m")
    else:
        print("\x1b[31mNon-stationary\x1b[0m")

check_stationarity(nile['time'])
```
here's my code
What type is test?
Also, I would verify that the sequences you're using for passengerid and survived have the same length
https://machinelearningmastery.com/remove-trends-seasonality-difference-transform-python/
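For anyone following along, a minimal sketch of what a first difference does, in plain Python with a made-up series (`difference` and `trend` are hypothetical names, not from the linked article). With pandas, `Series.diff()` computes the same thing, and the differenced series is what you would then pass to the stationarity check again.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Made-up series with a steady upward trend plus a small wiggle.
trend = [10, 13, 15, 18, 20, 23, 25, 28]
diffed = difference(trend)
```

The differenced values hover around a constant level, which is the point: the trend is gone, at the cost of losing the first `lag` observations.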
Google can be a powerful tool with the right keywords
If a kind data-science-soul could help me with my problem in #help-cherries I'd be very grateful
Your test and y_pred have different lengths?
Yes
That's why it complains. They need to be the same length
How Can I achieve that, I have 2 files one Training and other Test. Both are of different lengths
You want to train on train and predict on test
yesss
I am new to Data Science, is this the correct way. Or there lies some other method.
That's how ML and Kaggle work. You get a train set with y labels. You train on it. Then you predict on test, for which you don't have y labels.
In your code you predict on the same data you trained on. Then you create a DF from those predictions and the test IDs. They have different lengths, that's why you get the error.
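A rough sketch of that workflow with made-up numbers. `ThresholdModel` is a toy stand-in for `LogisticRegression` so the example runs without any libraries; in the Titanic code the equivalent step would be calling `clf.predict` on the test features, prepared the same way as the training features.

```python
class ThresholdModel:
    """Toy one-feature classifier, a stand-in for LogisticRegression."""

    def fit(self, xs, ys):
        pos = [x for x, y in zip(xs, ys) if y == 1]
        neg = [x for x, y in zip(xs, ys) if y == 0]
        pos_mean = sum(pos) / len(pos)
        neg_mean = sum(neg) / len(neg)
        # Decision boundary halfway between the two class means.
        self.threshold = (pos_mean + neg_mean) / 2
        self.positive_above = pos_mean > neg_mean
        return self

    def predict(self, xs):
        return [int((x > self.threshold) == self.positive_above) for x in xs]


# Made-up data: the training set has labels, the test set only has ids.
train_x, train_y = [1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1]
test_ids, test_x = [101, 102, 103], [1.5, 7.5, 9.5]

model = ThresholdModel().fit(train_x, train_y)
test_pred = model.predict(test_x)          # predict on test, not on train

# One prediction per test id, so the submission columns line up.
submission = list(zip(test_ids, test_pred))
```

The number of predictions now equals the number of test ids, which is exactly what the `ValueError` was complaining about.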
Okaaay Understood. Thaank you so much
Hey can you tell me some more Discord groups, for Data Science, Where I can take help . I only know this one
There's kaggle one as well
Any more you know ?
can you also share me the link
Scikit learn, fastai are on discord too. There are more I'm sure
Okaay thaank you so much . Miwojc
Thank you
hi any1 know how to change color of matplotlib.pyplot scale
so 419-425 would be white for example
what's the point of pickling
lemme know if that helps?
yeah thanks
whats pickling
it's a way to store and load arbitrary python objects
however it's not recommended security-wise because it's too easy to sneak in malicious code
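A minimal round trip, as a sketch with a made-up `settings` dict: `pickle.dumps` turns an object into bytes and `pickle.loads` rebuilds it. The security concern is on the loading side, since unpickling data from an untrusted source can run arbitrary code.

```python
import pickle

# Made-up object to serialize.
settings = {"model": "logreg", "C": 1.0, "features": ["age", "fare"]}

blob = pickle.dumps(settings)      # serialize the object to bytes
restored = pickle.loads(blob)      # rebuild an equal object from those bytes
```

`restored` is a new object equal to `settings`; only unpickle bytes you created yourself or otherwise trust.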
I want to make self thinking ai help me
Using machine learning and deep learning
you want jarvis
um
do you know what linear regression is?
actually no forget that
what's a training set and a testing set of data?
what's supervised and unsupervised learning?
can you explain to me the bias v. variance tradeoff?
what's a confusion matrix?
what are some metrics i can use for classification?
what are some metrics i can use for regression?
No like humans
I am new to ai
I friend requested you
yeah no sorry i don't accept friend requests from strangers
I'm not a stranger
right
anyways, you should be learning the basics of this stuff before you start going into deep learning
Yes
how are your python fundamentals
intermediate python
Learned in this lockdown
ok
i would recommend the aurelien geron book
hands on ml
you need some stats stuff to comprehend it
not all of stats
but some
is definitely good
statquest is nice if you wanna check it out
Ho thanks
that channel's code is in R, but don't worry about implementing things in code until you understand the basic concept
Are you prof in ml and dl
I'm 8th
Thx
Did you see my profile photo that I made for nft art
Can anyone tell me how can I implement spectral clustering algorithm on iris dataset
Hi
Hello
Hey guys,
I have a sheet with two columns, the model prediction, and validation. How can I evaluate the model? What are the metrics available in sklearn?
There are 3 different APIs for evaluating the quality of a model's predictions: Estimator score method: Estimators have a score method providing a default evaluation criterion for the problem they ...
Following this example will get you most of the way there: https://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html#sphx-glr-auto-examples-cluster-plot-cluster-comparison-py If you don't understand what's going on in there, I'd recommend working through the scikit-learn tutorial from the start
If there is a kind datascience soul here, willing to help, please have a look at my problem in #help-potato
I just cannot keep getting on with my project without solving my train/test splitting problem
what does it mean if my model does have perfect precision?
did i implement it wrong?
The precision is the ratio tp / (tp + fp) where tp is the number of true positives and fp the number of false positives. The precision is intuitively the ability of the classifier not to label as positive a sample that is negative.
def recall(y_true, y_pred):
    y_true = K.ones_like(y_true)
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    all_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (all_positives + K.epsilon())
    return recall

def precision(y_true, y_pred):
    y_true = K.ones_like(y_true)
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def f1score(y_true, y_pred):
    precision_m = precision(y_true, y_pred)
    recall_m = recall(y_true, y_pred)
    return 2*((precision_m*recall_m)/(precision_m+recall_m+K.epsilon()))
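A possible explanation for the perfect precision, offered as a reading of the functions above and not a confirmed diagnosis: the line `y_true = K.ones_like(y_true)` replaces every label with 1, so every predicted positive counts as a true positive. The plain-Python sketch below mimics that with made-up labels; `honest` and `overwritten` are hypothetical names.

```python
def precision(y_true, y_pred):
    """tp / (tp + fp) for 0/1 labels and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_pos = sum(y_pred)
    return tp / predicted_pos if predicted_pos else 0.0


y_true = [1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]      # deliberately poor predictions

honest = precision(y_true, y_pred)
# Mimic `y_true = K.ones_like(y_true)`: every label becomes 1.
overwritten = precision([1] * len(y_true), y_pred)
```

With the real labels the precision is 0.25, but with the labels overwritten it comes out as 1.0 no matter how bad the predictions are. If that matches what you see, dropping the `ones_like` lines would be the first thing to try.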
Hi
How do I run a 5-fold cross-validation and report both average and standard deviations for the AUC. Make a table summarizing your results
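A sketch of the bookkeeping for that exercise, in plain Python so it runs anywhere. `score_fold` is a hypothetical callback that would fit the model on the training indices and return the AUC on the held-out indices; the `demo_scores` are made up. With scikit-learn, `cross_val_score(model, X, y, cv=5, scoring="roc_auc")` is the usual shortcut for the per-fold scores.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle 0..n_samples-1 and deal the indices into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, score_fold, k=5):
    """Call score_fold(train_idx, test_idx) once per fold and collect scores."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(score_fold(train_idx, test_idx))
    return scores


def summarize(scores):
    """Rows for a small results table: one per fold, then mean and std."""
    rows = [(f"fold {i + 1}", s) for i, s in enumerate(scores)]
    rows.append(("mean", statistics.mean(scores)))
    rows.append(("std", statistics.stdev(scores)))   # sample std (n - 1)
    return rows


demo_scores = [0.80, 0.82, 0.78, 0.81, 0.79]   # made-up per-fold AUCs
table = summarize(demo_scores)
```

Printing `table` row by row gives the summary the exercise asks for; whether to report the sample or population standard deviation is a choice worth stating in the write-up.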
simply use https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html instead of implementing it yourself
this is for multiclass right? like if i have 6 classes, do i get 6 precisions?
also can i just put it here ?
```python
metrics=[keras.metrics.categorical_accuracy]
```
yes should be multiclass
Is anyone here familiar with training/test splitting of a dataset and willing to help me with my problem at #help-potato ? I'd be very grateful for any help. I am stuck with this since Friday and cannot continue without solving this
Sure
@tacit basin U are free now?
Yeah
Fit a logistic regression model using 70%-30% of the data for training-testing the model. Report the area under the roc-curve, simply called AUC, for the test sample
```python
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit
import pandas as pd
import statsmodels.api as sm
import sklearn
from sklearn.model_selection import train_test_split

df = pd.read_csv("santander_dataset.csv")
df.info()
y = df['target']
var_cols = [f'var_{i}' for i in range(200)]
x = df[var_cols]
```
What's blocking you?
I'm not sure how to plot au
auc
```python
X = sm.add_constant(x)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
logistic_regression = sm.Logit(y_train, x_train)
fitted_model = logistic_regression.fit()
print(fitted_model.summary())
```
I'm self learning ML
and it is an exercise from the book
It doesn't provide answers and I would have to pay separately for them, which I am not willing to do
Sure
You are hired by Santander Consumer Bank as data scientist and your first task is to identify which customers
will make a specific transaction in the future, irrespective of the amount of money transacted. To that end,
an analyst delivers to you a data set ready for modeling purposes. The file santander_dataset.csv contains 200 numerical features, one binary response variable and one customer identifier for a total of 200 000 customers. Further, the binary variable indicates whether that customer made a purchase in the future. You are eager to deliver some results to your boss and
4.1 Fit a logistic regression model using 70%-30% of the data for training-testing the model. Report the area under the roc-curve, simply called AUC, for the test sample.
Note: You are advised to use sm.Logit from statsmodels, otherwise make sure the library that you choose does not include a regularization term by default. You are also advised to use an intercept in your logistic regression model.
Usually the steps are: prepare data, split data into train test, train model, predict on test, calculate metric, like AUC in your case
What's your pred on valid set?
Predict on X test, y is your answer
ok
```
121589    0.291962
76793     0.517639
39540     0.129044
45611     0.160562
...
115452    0.023491
11195     0.507136
82182     0.230065
188417    0.735611
```
this what I got
What is that?
this but x_test
Is that X test?
Once you have preds on x test and y test you can calculate auc
https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
yes
I think your x test have more than one variable?
Is that the prediction on x test? Is that y pred?
'key of type tuple not found and not a MultiIndex'
```python
from sklearn import metrics
n_classes = 2
# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = metrics.roc_curve(y_test[:, i], y_score[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])
# Compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), y_score.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])
```
hi how can i change the size of the images outputted by matplotlib.pyplot cause they're overlapping each other now
@tacit basin U want me to send u the data set
what u mean
here
I mean is this classification, regression, just one column?
plt.figure(figsize=(3,3))
what do u mean
>>> import matplotlib.pyplot as plt
>>> from sklearn import datasets, metrics, model_selection, svm
>>> X, y = datasets.make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = model_selection.train_test_split(
... X, y, random_state=0)
>>> clf = svm.SVC(random_state=0)
>>> clf.fit(X_train, y_train)
SVC(random_state=0)
>>> metrics.plot_roc_curve(clf, X_test, y_test)
<...>
>>> plt.show()
This is exactly it. Use your data.
from sklearn import metrics
metrics.plot_roc_curve(clf, X_test, y_test)
plt.show()
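A caveat and a sketch for the binary case discussed here. The per-class loop quoted earlier comes from a multiclass example and indexes `y_test[:, i]`, which a 1-D pandas Series does not support; that is a plausible source of the "key of type tuple" error. Also, as far as I know, `plot_roc_curve` expects a scikit-learn estimator (not a statsmodels result) and has been removed from recent scikit-learn releases. For a single binary target the AUC only needs the true labels and the predicted probabilities, which is what `fitted_model.predict(x_test)` appears to return above. The function below is a plain-Python illustration with made-up numbers; `sklearn.metrics.roc_auc_score(y_test, probabilities)` is the library call that computes the same quantity.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities, not the Santander data.
y_test = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
auc_value = roc_auc(y_test, y_prob)
```

The pairwise loop is quadratic, so it is only meant to show what the number means; on a 60 000-row test set the library function is the practical choice.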
How can clustering be used for image classification??
The clustering of MNIST digits images into 10 clusters using K means algorithm by extracting features from the CNN model and achieving an accuracy of 98.5%. And also we will understand different aspects of extracting features from images, and see how we can use them to feed it to the K-Means algorithm.
ok thx
Big dataset?
I made a video about how to speed up pandas code. Something I wish I had known when I first learned pandas/python. Hope it's ok to share here. Welcome any feedback: https://www.youtube.com/watch?v=SAFmrTnEHLg
Face it, your pandas code is slow. Learn how to speed it up! In this video Rob discusses a key trick to making your code faster! Pandas is an essential tool for any python programmer and data scientist
Timeline
00:00 Intro
00:46 Creating our Data
02:39 The Problem
03:48 Coding Up the Problem
04:43 Level 1: Loop
06:29 Level 2: Apply
07:27 Level ...
thanks
very well done video bud. i'll give your channel a follow. you could probs turn this content into some type of article to help gain more followers if you're interested as well.
Thanks for the feedback @misty flint - still learning how to make videos and get the word out in a way that isn't too spammy. Making it into a Medium article is a great idea!
i think its all about how you preface the message. i guess thats kinda sales-related tbh. and yeah its good content so the more you can spread the message, i think the better.

You're right, I kind of need to be sales-ish to get the word out. I've tried posting on the python reddit. One video got a lot of upvotes- the others got a lot of downvotes. I guess I just need to keep with it.
hi anyone help me with saliency plot
it's better to put your question out there, rather than ask people to commit to your question before you ask it
okay sorry, im new to this
Basically I was able to find code that creates a saliency plot from an existing built-in AI model. However I have my own AI model I want to use instead, and I wasn't sure what I need to add to the code to get my AI model read. https://paste.pythondiscord.com/lugugexeni
anyone do any quant here?
@karmic valley are you talking about these?
what images are those? are they from my code?
i can share with you another code file which has my AI model already loaded so you can see what parts of the code i can transfer to my saliency plot code?
they're from Wikipedia; I'm asking you to verify if my understanding of what you want is correct.
if you share code that loads an AI model, that doesn't mean the person reading it has the AI model on their computer.
yes i see. hmm... or if you have time i can share the code that loads the model and you can suggest what might work and i can try and let you know
it doesn't matter if you share the code that loads the model, because the model itself is a file that I do not have.
if the images were meant to represent a saliency map, i can show an example
Can you tell me if the image I showed already is an example of a saliency map?
it could be but doesn't look like it to me. let me find an example
for example this, so the colors represent what the AI was mainly looking at in the image, red being the most looked at
do you have any AI model on your laptop that could try. I just need to know the format of loading an AI model to my saliency map code
loading a model depends on how the file was made. there's no one-size-fits-all answer to that.
what does your model output?
a 2d array of saliency scores per pixel?
okay let me just share the model loaded for a different purpose - this is not code for saliency maps.
models are usually pickled Python objects. you can't upload them here.
Hey @karmic valley!
You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.
how was the file created?
like the model or the python code?
this is the code that has my model loaded for a different purpose
@karmic valley take a look at this: https://pytorch.org/tutorials/beginner/saving_loading_models.html
i did have a look at that but still couldnt get it to work. I tried to copy the code from the successfully loaded model for a different purpose to my code for saliency maps but didnt know how much to copy
https://paste.pythondiscord.com/lugugexeni this is saliency map code i got - i didnt make i just found
and i want to put my model on this instead of vgg19
# update it with the previously saved weights. write path in the green part where you saved it
model.load_state_dict(torch.load('/Users/samay/Documents/Education/University Imperial/Module 3/AI/flow_model.pt', map_location="cpu"))
did you write this part?
yes i wrote that. that is in the code i shared, in which everything works well - that code is to give a numerical score, so something different
!voiceverify
do that in #voice-verification
thanks
shall i just copy this into the other code or is there more code i need to add
if that's the model you want to load in the other program, then yes. but you'd also need to add the requisite import statements.
yes i want to add the same model. what requisite statements do i need to add - do i have them in the code i sent that successfully loads the model? or do i have to write new code
whichever statements import the names you need to run that code.
model is probably defined in an assignment statement. but you definitely need to import torch
okay i will import all the same packages. do you think i have to use any of the other code from here https://paste.pythondiscord.com/ofutumamey
sorry i am not that good with python
that's a pretty big ask. you'll have to be more specific.
try running it and see what happens. Python's error messages are usually pretty informative.
okay i will try now
model = torchvision.models.vgg19(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
this is what my dataset looks like so i have to make a model for classification
how to explore some non conventional ways to choose architecture for some data?
this is in the saliency map code
What do you mean?
Architecture for some data?
this is in saliency map code. it is using a pretrained model vgg19 but what do i change that word to if i am using my own model
or do i remove all that code
and replace with that statment you mentioned earlier
like sometimes some nlp model can work on video data as well......cnn can be applied to non- image based data etc etc.
how do i explore more such things
Look up a ml course or a book about ml methods I guess
or just google "classifier <datatype> data"
@serene scaffold you still here or gone
I think we've exhausted my potential usefulness, unfortunately
ok thanks
What kind of data do you have? @mint palm
or can someone help me solve this
or do you recommend asking anyone specific?
@mild dirge can you help finish last part of my question
my data is basically network slice prediction based on various network parameters
currently CNN + LSTM works on it
This is not clarifying it for me, your data is the results after a grid search on some other classifier?
this is a better representation
Ah so you want to have a classifier that takes as input the results of previous networks
and then predicts "something?"
actually cnn and lstm are directly applied to raw data
i wanted to show you the data but its kind of missing, i am trying to find it,....the kaggle link only shows data summary
I am using sleep edf dataset
and have selected the eeg signals
and applied local binary pattern feature extraction on it
i now need to apply 3-4 deep learning models for classification
please help
Hello guys would someone mind helping abt which course is best to learn linear algebra
Linear algebra resources (like youtube or anything else), because I need to learn it alongside solving computer vision problems on radiometry, photometry, radiance, irradiance, BRDF etc.
@exotic thicket one usually uses Khan Academy to learn how to do the calculations, and 3blue1brown to understand the theory
I was taking linalg around this time two years ago when covid started, and the professor sent us an email that basically said "fuck! I have no idea how to teach you all remotely. watch 3blue1brown until I figure out what to do"
Yeah I have been watching khan academy and 3blue1brown. Is that enough, or are there any materials you would like to recommend? nice story by the way
Hello, I would like to model global warming from temperature records recorded daily from June 1920 to October 2019 in Montélimar on Python. To do this, I would first like to model these seasonal variations by a sinusoidal fit. However, such a model fitted to the whole data set does not give any increase in average temperature. I therefore try to apply a sinusoidal fit for each decade.
I first plotted the data in the data file and then created a time variable so I could do my decadal average.
I applied the sine fit to all the decades in the data file, then plotted the entire graph with the fit. Here is my code. It works flawlessly, but I feel like I'm rewriting the same thing several times, which makes the code particularly long. I feel like I could do this in a much shorter time but every time I try I get errors and my graphs don't plot correctly anymore. So I would like to know if someone could help me to optimize it
Here is my code:
def T_A(t, A, phi, B):
    omega = (2 * np.pi) / 365
    return A * np.sin(omega * t + phi) + B

import datetime
current_decade = np.datetime64(date_new[0], 'Y')
time_for_B = np.linspace(1930, 2020, 10)
#print(time_for_B)
count_time = np.array([])  # Variable to store the temperatures of a decade
count_date = np.array([])
B_list = np.array([])
n = 1
for i in range(0, len(date_new)):
    if np.datetime64(date_new[i], 'Y') >= current_decade + np.timedelta64(10, 'Y'):
        current_decade = current_decade + np.timedelta64(10, 'Y')
        n = n + 1
        plt.figure(n)
        plt.plot(count_date, count_time)
        N = len(count_date)
        time_model = np.linspace(0, N, N)
        # Fit of the sinusoidal model
        solution = curve_fit(T_A, time_model, count_time)
        # Identification of the parameters
        A, phi, B = solution[0]
        # Display the result
        #print('A = {:4.2f} amplitude'.format(A))
        #print('B = {:4.2f} °C'.format(B))
        #print('phi = {:4.2f} radians'.format(phi))
        # Display the sine fit
        y = T_A(time_model, A, phi, B)
        B_list = np.append(B_list, B)
        plt.plot(count_date, y)
        errors = 5. * np.ones(y.shape)
        # Fit of the sinusoidal model
        solution, pcov = curve_fit(T_A, time_model, y, sigma = errors, absolute_sigma = True)
        # Identification of the model parameters
        A, phi, B = solution
        # Calculation of the uncertainty on the fitted parameters
        perr = np.sqrt(np.diag(pcov))
        # Display (B is the third fitted parameter, so its uncertainty is perr[2])
        print('B = {:5.7f} ± {:5.3f} °C'.format(B, perr[2]))
        count_time = np.array([])
        count_date = np.array([])
    count_time = np.append(count_time, Temperature[i])
    count_date = np.append(count_date, date_new[i])

# Last (incomplete) decade, handled after the loop
n = n + 1
plt.figure(n)
plt.plot(count_date, count_time, '.')
N = len(count_date)
time_model = np.linspace(0, N, N)
# Fit of the sinusoidal model
solution = curve_fit(T_A, time_model, count_time)
# Identification of the parameters
A, phi, B = solution[0]
# Display the result
#print('A = {:4.2f} amplitude'.format(A))
#print('B = {:4.2f} °C'.format(B))
#print('phi = {:4.2} radians'.format(phi))
# Display the sine fit
y = T_A(time_model, A, phi, B)
B_list = np.append(B_list, B)
plt.plot(count_date, y)
#print('B =', B_list)
plt.grid()
plt.figure(n + 1)
plt.plot(time_for_B, B_list)
plt.grid()
# Definition of the table of measurement errors
errors = 0.117 * np.ones(B_list.shape)
solution, pcov = curve_fit(T_A, time_model, count_time)
# Identification of the model parameters
A, phi, B = solution
perr = np.sqrt(np.diag(pcov))  # was `covar`, which is never defined here
# Display (B is the third fitted parameter, so its uncertainty is perr[2])
print('B = {:5.7f} ± {:5.3f} °C'.format(B, perr[2]))
# Graphical representation of the data with the error bars
plt.errorbar(time_for_B, B_list, yerr = errors, marker = '+', linestyle = '')
# Graph option
plt.xlabel('Date [year]')
plt.ylabel('B [°C]')
plt.show()
Here is my data file:
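One way to cut the repetition, sketched under some assumptions: group the readings by decade once, then call a single fitting function per group. Here decades are calendar decades (1920, 1930, ...) instead of ten-year windows starting at the first record, and `statistics.mean` stands in for the sinusoidal fit so the sketch runs without SciPy. In the real code `fit` would be a small function that wraps `curve_fit(T_A, ...)` and returns B. All names and data below are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def group_by_decade(dates, values):
    """Map decade start year (e.g. 1920) to the list of values in that decade."""
    groups = defaultdict(list)
    for day, value in zip(dates, values):
        groups[day.year // 10 * 10].append(value)
    return dict(groups)


def fit_each_decade(dates, values, fit):
    """Apply `fit` to every decade's values and return {decade: result}."""
    grouped = group_by_decade(dates, values)
    return {decade: fit(chunk) for decade, chunk in sorted(grouped.items())}


# Made-up readings; `mean` stands in for the sinusoidal fit.
dates = [date(1920, 6, 1), date(1925, 1, 1), date(1931, 7, 1), date(1938, 2, 1)]
temps = [18.0, 4.0, 24.0, 6.0]
baseline = fit_each_decade(dates, temps, mean)
```

With the fitting and the plotting each in one function, the last partial decade no longer needs its own copy of the code after the loop.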
auto encoding require large data.......but how much data can be termed as insufficient?
This is more of a question for #software-architecture
Thank you for your answer, I will ask the question in the dedicated channel
For numerical data we normally have baseline models like K-nearest neighbor, decision tree etc. What kind of baseline model could be used for image data (classification task)?
Was thinking of maybe using SIFT and finding nearest neighbor or something, anything else that someone can think of?
Depends what the data set is like. Well-tuned Random forests are a pretty good baseline - they can do surprising well on a huge variety of data and are fairly quick and easy to setup
Like, if whatever you're trying to do isn't a lot better than XGBoost - just use XGBoost. It takes like two seconds to get it working and a deep learned model can take weeks
Deep learning models are part of automl tools like gluon for example
Hi all, I need some help designing a neural network for my final computer science project at school. I'm doing a codebullet style project where I'm teaching a car how to drive in unity3d using deep Q learning and I need to write my own neural network from scratch. I was wondering if anyone knows how many inputs my neural network needs. The actual inputs of the car would be accelerate, turn left and turn right. I've briefly gone through sentdex's neural network series but there's still a lot that I don't understand. I don't really know what my input layer should consist of nor how many hidden layers I need
I feel like once I have the neural networks done then the Deep Q learning algorithm shouldnt be too difficult
Inputs of your car would be the outputs of your network
So 3 outputs
So your network should tell the car to accelerate, and turn left/right
The input depends on what information you plan on giving your car
Could have distance from car to wall in multiple angles in front of the car f.e.
Im really sorry if im vague with the information i give btw, I have less than a month into a deadline and my mind is a mess rn
thank you
But making a network from scratch when unsure about this kind of stuff does sound a bit daunting tbh
lemme send through a screenshot of the actual scene to give you a better overview of what im working with
oh it is I am a foolish man
but theres no turning back now
When you say "from scratch" do you mean no pytorch/scikit etc?
nope
So you have to write a neural network (like an mlp or something) from scratch using stuff like numpy?
yh, im making this project in unity so im doing it in c#
but i dont really need help with that stuff
Ill be able to convert python knowledge to c# easily
its more of the theory thats buggin me
Check out the 3b1b series on neural networks
There will be walls surrounding everything btw
ayt will do! Thanks
also... Do you reckon that I'll be able to make the neural network from scratch in 2 weeks? Ik this is an annoying question but i really need to know if what im doing is realistic
I do understand the basics of how a neural network works
You'll probably know how long it will take better than me
I think 1 month is really short for this type of project
If you have nothing else to do then it might be doable
ok, thanks for ur help!
So what i think ive understood is that each of these blue lines (distance from wall) would be an input and if i increase the number of blue lines, all im doing is increasing accuracy?
And then Id be like right the there is a long distance from the front of my car to a wall so i need to accelerate or the distance between my car and the right wall is very small so i need to turn left etc
Yeah basically
This only works if theres walls all around the track though
otherwise you need to figure something else out
But where you say 'Id be like" that is the task of the algo to learn
yeah sorry thats what i mean
ok thanks i have a much better understanding of what needs to be done now
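To make the shapes concrete, here is a rough forward pass in plain Python, written in Python only because that is what most of this channel uses; it is a sketch and not the project's design. The number of distance rays (5), the hidden size (8) and the random weights are all assumptions. The only parts taken from the discussion are that the inputs are wall distances and the outputs are the three car controls.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights (n_out rows of n_in values) plus zero biases."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layer, inputs, activation):
    """Weighted sum plus bias for each neuron, passed through `activation`."""
    weights, biases = layer
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


rng = random.Random(0)
N_SENSORS = 5                      # assumed number of distance rays
ACTIONS = ["accelerate", "turn_left", "turn_right"]

hidden = make_layer(N_SENSORS, 8, rng)       # hidden size 8 is arbitrary
output = make_layer(8, len(ACTIONS), rng)

distances = [0.9, 0.7, 1.0, 0.4, 0.2]        # made-up normalized readings
h = forward(hidden, distances, math.tanh)
q_values = forward(output, h, lambda z: z)   # linear output: one Q-value per action
action = ACTIONS[q_values.index(max(q_values))]
```

Adding more rays only changes `N_SENSORS` and the width of the first weight matrix; the three outputs stay the same. Training (the deep Q-learning part) is what turns these random weights into sensible choices.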
I've recently downloaded my Google data with info such as my entire search history and youtube comment history dating back to 2013. I wanted to try and do a little project by organizing all my youtube comments and visualizing it in some way like a word cloud based on word frequency.
The problem is that google gave me the data in an HTML file, and I'm not sure how I can organize that into something like a spreadsheet for easier use.
I posted in #help-lollipop in more detail about this if anyone is willing to help me with this. 
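A sketch of one approach using only the standard library: walk the HTML with `html.parser`, collect the text of each comment element, and write the result as CSV. The markup is an assumption: I do not know how the Google export is actually structured, so `<div class="comment">` and the `sample` string are made up, and the class name would need to be replaced after inspecting the real file.

```python
import csv
import io
from html.parser import HTMLParser


class CommentExtractor(HTMLParser):
    """Collect the text inside every <div class="comment"> (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # > 0 while inside a matching div
        self.comments = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1     # nested div inside a comment
        elif ("class", "comment") in attrs:
            self.depth = 1
            self.comments.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.comments[-1] += data


# Made-up HTML standing in for the exported file.
sample = (
    '<div class="comment">great video</div>'
    '<div class="x">skip</div>'
    '<div class="comment">nice <b>one</b></div>'
)
parser = CommentExtractor()
parser.feed(sample)

# Write one comment per row; swap io.StringIO for open("comments.csv", "w", newline="").
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["comment"])
writer.writerows([c.strip()] for c in parser.comments)
```

The resulting CSV opens in any spreadsheet program, and the list of comments can go straight into a word-frequency count.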
hey I have a serious question that I need help with so I am doing an image detection model using Masks-RCNN and I am trying to return the length of each individual masks from the top left corner of the square to the bottom right of the square here is what i have so far
import PIL.Image as Image
import time
# RUN DETECTION
for image_id in dataset.image_ids:
    image = dataset.load_image(image_id)
    #image_id = .choice(dataset.image_ids)
    print("image id is :", image_id)
    image, image_meta, gt_class_id, gt_bbox, gt_mask = \
        modellib.load_image_gt(dataset, config, image_id, use_mini_mask=False)
    info = dataset.image_info[image_id]
    print("image ID: {}.{} ({}) {}".format(info["source"], info["id"], image_id, dataset.image_reference(image_id)))
    # Run object detection
    results = model.detect([image], verbose=1)
    x = get_ax(1)
    r = results[0]
    ax = plt.gca()
    visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'], dataset.class_names, r['scores'], figsize=(16, 16), ax=ax, title="Predictions")
    log("gt_class_id", gt_class_id)
    log("gt_bbox", gt_bbox)
    log("gt_mask", gt_mask)
    for i in range(r['masks'].shape[-1]):
        mask = r['masks'][:, :, i]
        print("Mask ID", i)
        image[mask] = 100
        image[~mask] = 0
        # counts holds how many pixels have each unique value
        unique, counts = np.unique(image, return_counts=True)
        mask_area = counts[1] / (counts[0] + counts[1])
        print(counts[1])
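One possible way to get that corner-to-corner length, as a rough sketch: take each mask's bounding box from its nonzero pixels and measure the diagonal. This assumes the masks arrive as a boolean array of shape (height, width, n_masks), like `r['masks']` above; the function name `mask_diagonals` is made up for illustration, and the result is in pixels.

```python
import numpy as np

def mask_diagonals(masks):
    """Diagonal length (in pixels) of each mask's bounding box.

    masks: boolean array of shape (height, width, n_masks).
    An empty mask gets a length of 0.0.
    """
    lengths = []
    for i in range(masks.shape[-1]):
        ys, xs = np.nonzero(masks[:, :, i])
        if ys.size == 0:
            lengths.append(0.0)
            continue
        # Distance between the top-left and bottom-right corner pixels.
        height = ys.max() - ys.min()
        width = xs.max() - xs.min()
        lengths.append(float(np.hypot(width, height)))
    return lengths
```

With the detection loop above this would be called once per image, e.g. `mask_diagonals(r['masks'])`.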
How do I start with data science with python?
Awesome question!
There are lots of youtube guides out there that answer this
but data science is very broad. It's easier and more manageable to learn towards a specific job field than to try to learn everything about data science, so narrow down which field of data science interests you the most
If you're interested in AI or deep learning, that's a bit different, and you can find guides on it on youtube too.
but for most data science roles, knowing the basics and OOP in python, numpy, pandas and data visualization libraries is important at the start. This makes it easier to follow guides or articles without getting lost, and to work with data more comfortably.
any recommendations on producing either HTML or PDF reports from matplotlib charts and also dataframe dumps / heatmaps?
ipynb would be nice but it lacks input parameters to make it generic
I think there's four parts to being a data scientist and I want to know all of them
like that
Oh, thats more like i would say machine learning life cycle
Not all data scientist do this
Roles are split and the term data science is used interchangeably with data analyst at some companies
Im not sure of tutorials for this life cycle in one go
But you can find tutorials on each stage of this life cycle on youtube and can learn from there
just create a streamlit app

that should make your stakeholder/boss happy
they just had a major update too https://streamlit.io/
i recommend it for anybody prototyping through analyses or models quickly

looks nice but I really just want pdf or HTML reports for offline use
its possible
theres an extension for jupyter notebooks
it was a pain to find tho so gl bud. sorry.
guys i need suggestion about using CUDA
for the driver, should i choose the Gaming Ready or the Studio Ready one?
or both of them has no effect with the CUDA performance?
Idk if this helps but there's an extension in JNB that allows for converting & downloading your notebook as a pdf.
There's also an option to save your EDA in HTML using Pandas_Profiling library.
I thought you could "print" a notebook as a PDF with just vanilla jupyter notebooks
(even though I think notebooks are basically cocaine)
It's possible to print your JNB as pdf but you'd have to install an extension. I've forgotten the name. I'll check for it
does it support input parameters?
Easiest way to find expected value, stdev, covariance, correlation, variance, etc. of joint random variable using Python?
This actually worked for me. Once you've installed it, you can easily download your JNB as pdf
import sys
!{sys.executable} -m pip install notebook-as-pdf
!pyppeteer-install
Not exactly. If you want to use input parameters I guess you'd have to write a custom function for it. Then print the JNB as a pdf afterwards
in general, flattening data starts by identifying what the "unit" is, and making 1 table to represent that unit
then you can either store the nested stuff in nested data structures (json, arrays, whatever), or normalize as needed/desired
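A small sketch of that idea with pandas, in case it helps: here the "unit" is a user, and the nested orders get normalized into their own table. The records, field names and the `user_id` column are all invented for the example.

```python
import pandas as pd

# Hypothetical nested records: one dict per user, with a nested list of orders.
records = [
    {"id": 1, "name": "ana", "orders": [{"sku": "x", "qty": 2}, {"sku": "y", "qty": 1}]},
    {"id": 2, "name": "bo", "orders": []},
]

# One table for the unit itself (one row per user).
users = pd.DataFrame([{"id": r["id"], "name": r["name"]} for r in records])

# One table for the nested part, carrying the user id along as a key.
orders = pd.json_normalize(records, record_path="orders", meta=["id"])
orders = orders.rename(columns={"id": "user_id"})
```

The alternative mentioned above (keeping the nested list in a single column) would just mean leaving `orders` as a column of lists in `users`.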
Thanks I'll play around with the idea and see what can be done
You can use statsmodels or scipy to get those. Alternatively you can get standard deviation and variance from pandas.
df['indemnity'].std()  # standard deviation
df['indemnity'].var()  # variance
df.cov()  # pairwise covariance
You can check Pandas documentation for more
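For example, something like this should cover most of that list for two variables observed together. The data is made up, and note that pandas uses the sample (ddof=1) versions of std, var and cov by default.

```python
import pandas as pd

# Made-up paired observations of two variables.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 8.5]})

means = df.mean()       # sample mean per column (estimate of the expected value)
stds = df.std()         # sample standard deviation (ddof=1)
variances = df.var()    # sample variance (ddof=1)
cov = df.cov()          # pairwise covariance matrix
corr = df.corr()        # pairwise Pearson correlation matrix
```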
Learning Data Science
1.Learn python basics variables, functions, loops, if statement, oop, etc
2.Learn two python modules pandas and numpy
3.Learn statistics
- mean, median, mode, standard deviation, distributions, central limit theorem, and confidence intervals
4.Visualisations pick one of the following plotline, matplotlib, seaborn
5.Exploratory Data Analysis - Exploring and familiarizing yourself with dataset.
- Looking for trends, patterns or correlations between variables.
- Practice by working through someone else's project e.g. famous titanic dataset on Kaggle
7.Machine Learning
- learn 10 - 15 common ML algorithms.
- supervised learning
- unsupervised learning
- Reinforcement learning
8.Data Scraping / APIs - Allows for automated retrieval of data from websites
- Databases (primarily SQL)
10.Deployment putting the model or application you made into a live environment.
11.Recommended Resources
- freeCodeCamp.org for python
- StatQuest, Dataiku for stats
- Kaggle for projects
- MODE for SQL
What do you guys think about my plan for learning DS?
im not a professional, im self-teaching ML as well and made a personal roadmap since i want to work as a data scientist with machine learning too. However, i would say being specific helps, which is what you did for stats.
there is no one right way, everyone is different. i would say though there is no need to learn 10-15 algos. Knowing different ML algos helps when solving problems, but you can easily forget them and there are so many variations, so you only need to focus on the basic ones
you can learn a few classification, clustering and regression algos; common ones are linear and logistic regression, NN, K-means, SVM, KNN to name a few
this roadmap on its own feels like a 3 month journey if investing full time, and 6 months to maybe a year's journey if in school or working. when learning stats, see how it applies to ML, like mean and standard deviation both being common in feature engineering...
but overall seems solid as getting feel of data science.
just make sure to build a ML project for resume
does anyone know how to tune XGBoost model using learning curves?
most important part imo. you can also take the approach where you do projects and learn the various subjects as you go instead of trying to learn everything all at once
then the next project, you can learn another set of skills, etc.
i think that might make it easier to progress instead of feeling "stuck" at times.
is there some explanation for that high loss? what could be the reason or is it normal?
i havent seen it on the tutorials
what i see is a steady decrease on loss
heya does anyone have a good grip in using javascript
this validation accuracy on last epoch should be the final performance of my model on the test set right?
but why i get different result when i try model.evaluate to the same test set?
I have to copy from csv to excel file on daily basis but all data should be unique. e.g. date 1 cv appended. date 2 appended (it how be unique if csv contain some older date data too). overall out should be unique. Can anybody please help?
so to accurately evaluate my model i should just generate a confusion matrix from model.predict() and calculate metrics from it?
i dont understand why fit() evaluate() giving me different accuracy
Is there a way to calculate the metrics with only the data predictions and validation?!
you mean actual data and predicted data?
some of us do, but this is #data-science-and-ml channel tho. you can try off topic channels
Hello,
I'm a High School Student and I have experience in Unity & C# & Python & OpenCV and make many games and published them.
I need an project idea for a Contest in field of AI & Computer Vision and this idea should be practical and not repetitive.
If you have an idea, please guide me.
No no. I meant I have the data predicted by a model, and I have the validation from experts.
Like true positive, false positives... Etc
so you have confusion matrix.
Yea
so yeah you can calculate stuff with it.
like accuracy = (TP + TN) / TOTAL
Oh, true lovely. Thank you
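A minimal sketch of computing the usual metrics straight from the four confusion-matrix counts. The function name and the example counts are invented; the formulas are the standard definitions.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts (binary case)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up counts: model predictions compared against the expert validation.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
```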
Anyone here decent with pandas?
or anything? Im trying to work out the averages of a set of data at each specific value
e.g. for every differing value of signup, I want to get the average rank
df.groupby('signup').rank.mean()
Traceback (most recent call last):
File "D:\6Mans\Heps6Mans\odldata.py", line 9, in <module>
df.groupby('SignupsID').rank.mean()
AttributeError: 'function' object has no attribute 'mean'
ah, rank is also a method on the groupby object, so .rank gives you that method instead of your column. try df.groupby('signup')['rank'].mean()
perfect thanks, one more question, how would I go about removing outliers, for example if a rank has values 3,5,5,6,6
How would I remove the 3?
depends what u mean by outliers, you can probably make a function that takes a set of ranks and returns whatever summarized value u want with .apply
like df.groupby('x').apply(lambda g: some_fn(g['rank']))
well, for each signup value, if a rank value is greater/less than 2.5 of the mean, then I want to remove it from the data.
thanks
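One way to do the "within N of the group mean" filter without apply, as a sketch: `transform("mean")` gives each row its own group's mean, so the comparison is a plain boolean mask. The helper name and the sample data are made up; the column names follow the traceback above. Worth noting that for 3,5,5,6,6 the mean is 5, so a cutoff of 2.5 would keep the 3; the example uses 1.5 to drop it.

```python
import pandas as pd

def drop_group_outliers(df, group_col, value_col, max_dev):
    """Keep rows whose value is within max_dev of their group's mean."""
    group_mean = df.groupby(group_col)[value_col].transform("mean")
    return df[(df[value_col] - group_mean).abs() <= max_dev]

# Made-up data in the shape described above.
df = pd.DataFrame({
    "SignupsID": [1, 1, 1, 1, 1, 2, 2, 2],
    "Rank": [3, 5, 5, 6, 6, 10, 10, 11],
})

filtered = drop_group_outliers(df, "SignupsID", "Rank", max_dev=1.5)
means = filtered.groupby("SignupsID")["Rank"].mean()
```

`means` is a Series indexed by SignupsID, so `means.plot()` is probably the shortest route to a plot of average rank per signup value.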
sorry ive spent the last 15 mins trying to plot this and dont seem to be able to get it
Im able to plot the individual columns against each other, but not the means with their given value against the other values and means
When I use dataframe.to_string(header=False, index=False)
and try to print it, it prints char by char. How can I make it print line by line?
can you show what the current result is?
!e
import pandas as pd, numpy as np
df = pd.DataFrame(np.random.random((5, 5)))
print(df.to_string(header=False, index=False))
@serene scaffold :white_check_mark: Your eval job has completed with return code 0.
001 | 0.921904 0.165852 0.352897 0.583291 0.479739
002 | 0.312984 0.011975 0.376729 0.167694 0.924011
003 | 0.147286 0.413014 0.257549 0.243737 0.699188
004 | 0.845458 0.935840 0.243368 0.271021 0.681798
005 | 0.329731 0.794231 0.106713 0.480623 0.485640
I converted the data frame into a list, and the problem was solved. Thanks
Here's how to format Python code on Discord:
```py
print('Hello world!')
```
These are backticks, not quotes. Check this out if you can't find the backtick key.
@serene scaffold Well this is what im trying to achieve
can't help rn, sorry 
ok np 
If anyone else is able to help, im just trying to plot my groupby
df.groupby('SignupsID')['Rank'].mean()
Good Morning Everyone
I have a requirement which I dont even know if state of AI/ML is so advanced to do this. So I will try to explain (I am not a beginner :))
My company provides legal advice to customers, we have a large dataset of documents with legal advice. When a new customer arrives with a similar issue, we would like to be able to generate advice for them automatically based on an ML trained model. Of course this advice would be reviewed by a human afterwards,
the way I thought it could be done, is to split the documents into paragraphs, and classify each paragraph with a specific category (Advice for X, Advice for Y, ). But I am not sure if that is sufficient.
Hello everyone, is there anyone who can help me with plotly dash ?
Yes, but please just ask your question next time
@tranquil drift #❓｜how-to-get-help
Hi guys
I have two questions
I'm in my mid 19, is it too late for me to start learning data science?
&
what is the road map? I mean what things should learn? (I am already quite familiar with Python)
lol!!! I am 43 and I started 3 years ago,!
I would suggest a course on coursera on applied machine learning, or on Udemy also Python for Machine Learning.
but the world of DataScience is huge
Hi everyone, I'm 17 and I wanna learn about data science, ML. I'm decent in python and I know about numpy and pandas fairly. Can anyone please help me with my next step. What should I do?!!!
Can I text anyone personally to know about this
I'm using chi2 distance to find the similarity between all rows of my dataset. (for clustering)
I fail to see how feature scaling would make a difference (scaling between 0-1 would only make the chi2 distance smaller but the ratio wouldn't change)
So how could feature scaling help me achieve a better model?
No and what you should learn is 1) maths fundamentals. Basically the same stuff they teach you for any STEM university degree 2) statistics. This wouldn't be required for most STEM degrees but is necessary for data science. 3) software writing teamwork skills. How to use git, best practices to write code that's readable and reusable, etc. 4) you're gonna have to do presentations as a data scientist so get better at that
Most likely path to get into DS - go to college for STEM degree, then masters or PHD with a focus on computational research after
I'm actually a CS Undergrad (fresher)
I just want a roadmap to dive in DS/ML field
Thanks
Another catch: I'm from India, so education is different from what you think.
This has to do with how the loss functions and model learning process respond to the features. If one column of feature data has values ranging from -10000 to 10000, and another column of feature data ranges from 0 to 0.5, then without feature normalization (scaling is one way of doing this) your model is going to have a hard time learning anything except that the huge ranged feature is roughly 40000x more important than the small ranged one (the ratio of the two ranges). Scaling all features to 0-1 prevents this
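As a rough illustration of that 0-1 scaling with plain numpy (the helper name is made up, and the numbers just mirror the two ranges mentioned above; scikit-learn's MinMaxScaler does the same job if you already use it):

```python
import numpy as np

def min_max_scale(features):
    """Scale each column of a 2-D array to the range [0, 1]."""
    features = np.asarray(features, dtype=float)
    col_min = features.min(axis=0)
    col_range = features.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant columns end up as all zeros
    return (features - col_min) / col_range

# Column 0 spans -10000..10000, column 1 spans 0..0.5.
X = np.array([[-10000.0, 0.0], [0.0, 0.25], [10000.0, 0.5]])
X_scaled = min_max_scale(X)
```

After scaling, both columns cover the same 0-1 range, so neither dominates purely because of its units.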
I'm not Indian, but many of my good friends and work colleagues went to IITs, so I'm not entirely unfamiliar. I know they make you take plenty of maths!
of course, thanks for the refresher! this completely slipped through my mind
but I don't think it has any effect on my clustering
Yaa
But yeah I mean if you just want intro to ML stuff I'd recommend Andrew Ng's course on coursera.org. You can watch the course for free if you don't want the certificate. Many Stanford university AI/ML courses are up for free on YouTube, just search for them.
Perhaps not, I'm not familiar with the chi2 clustering algorithm. It may include scaling in it, or be scale invariant in its construction. If so, then feature scaling wouldn't matter.
Thanku
chi2 distance compares one column at a time (thus scaling between columns isn't necessary for as far as I can tell)
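A quick numeric check might help here. This sketch assumes the common definition of chi2 distance, 0.5 * sum((x - y)^2 / (x + y)), which may not be exactly the formula being used in this thread. Under that definition, multiplying every feature by the same constant keeps the ratios between distances, but rescaling columns by different factors (which per-column min-max scaling does, plus a shift) changes them.

```python
import numpy as np

def chi2_distance(x, y, eps=1e-12):
    """Chi-square distance between two non-negative count vectors (assumed definition)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 0.5 * np.sum((x - y) ** 2 / (x + y + eps))

# Made-up action counts for three users.
a, b, c = np.array([1.0, 4.0]), np.array([3.0, 2.0]), np.array([1.0, 10.0])

ratio_raw = chi2_distance(a, b) / chi2_distance(a, c)

# Same factor on every column: all distances scale together, ratio unchanged.
ratio_uniform = chi2_distance(10 * a, 10 * b) / chi2_distance(10 * a, 10 * c)

# Different factor per column: the ratio between distances changes.
w = np.array([1.0, 0.1])
ratio_per_column = chi2_distance(w * a, w * b) / chi2_distance(w * a, w * c)
```

So whether scaling matters for the clustering seems to depend on whether it is one global factor or a separate one per column.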
Is anyone experienced with machine learning for customer feedback
by chi2 distance do you just mean a pearson's chi-squared test to distinguish clusters?
Can anyone who is good with pandas and dataframes pm me
this is the formula
yeah this is a way of expressing the https://en.wikipedia.org/wiki/Chi-squared_test.
why are you using that for clustering, @urban lance . I'm curious
You'll be more likely to get help if you follow the server guidelines: #❓｜how-to-get-help
I'm trying to predict in which stage a user is of his/her customer journey.
I preprocessed the data in such a way that I have counts of each action that a user took within a certain time interval.
On a forum online, I read that chi2 distance was perfect for a dataset like this
and I have gotten promising results so far
anyone can help me?
dontasktoask
okay interesting
How can I prevent this from happening? Table gets decreased because of missing values but I want them replaced with nans
whats the link to paste bin
https://stackoverflow.com/questions/5953373/how-to-split-image-into-multiple-pieces-in-python i found this link which has multiple ways of splitting image. not sure which would work for me - can you suggest one code to try. i want to split image into 100 vertical sections and save each section
theres this aswell will this work
inside merge add, how='left'
thank you, it worked
welcome
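For anyone reading along, a tiny made-up example of the difference: the default merge is an inner join, which drops rows with no match, while how='left' keeps every row of the left table and fills the gaps with NaN.

```python
import pandas as pd

# Invented tables: id 2 has no matching row on the right.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [1, 3], "score": [0.9, 0.4]})

inner = left.merge(right, on="id")              # default how='inner': id 2 disappears
kept = left.merge(right, on="id", how="left")   # id 2 stays, score is NaN
```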
in the first one remove 'y:y+N', and it will split it into parts, but make sure your image is in a numpy array
okay let me try now
sorry im bad at coding
how do i upload my image and make it a numpy array
@radiant trout
from PIL import Image
import numpy as np
img = Image.open('ur image location here !')
im = np.array(img)
Hi, Im a student trying to do research and on this dataset but its 8gb and my laptop cant download. Is there another way I can download it?
thank you! i will try this now
@radiant trout i put your code to convert it into numpy array, but for next part of code which i copied from that link to split image into 100 vertical sections i am getting error
do you have any suggestions. please. been trying to do this for a week aha
you can use colab, so you will have data over there, which will be downloaded over there and you can mess around over there.
advantage: it won't need big data plan of 8 gb.
disadvantage: the data is not on your machine.
if u want to convert an image with 200 pixel into 100 sections , then M=2
how many pixels is your image?
I forgot to mention its a tar file. Is that still possible with colab?
okay let me check 2 seconds
yep, you can use magic commands to basically use notebook as a terminal.
example, run
! pwd
1374 width and 512 height
so do i replace all the 'M' in my code with 2 or keep letter M
numpy.split(ary, indices_or_sections, axis=0)
Split an array into multiple sub-arrays as views into *ary*.
you'll need to check axis. lemme mess a sec.
!e
import numpy as np
a = np.arange(15).reshape(3, 5)
print(np.array_split(a, 3, axis=1))
@lapis sequoia :white_check_mark: Your eval job has completed with return code 0.
001 | [array([[ 0, 1],
002 | [ 5, 6],
003 | [10, 11]]), array([[ 2, 3],
004 | [ 7, 8],
005 | [12, 13]]), array([[ 4],
006 | [ 9],
007 | [14]])]
hm good.
np.split(im, 100,axis=0)--> try this as @lapis sequoia suggested
yeah tho just a sidenote, you can use split if they can be equally splitted, else use array_split like used above.
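To make that sidenote concrete, here is a small sketch using a blank stand-in array with the same shape as the image discussed here (512 high, 1374 wide). The variable names are just for illustration.

```python
import numpy as np

# Stand-in for the image: 512 rows (height) by 1374 columns (width).
image = np.zeros((512, 1374), dtype=np.uint8)

split_error = None
try:
    # np.split insists on equal pieces, and 1374 is not divisible by 100.
    np.split(image, 100, axis=1)
except ValueError as exc:
    split_error = str(exc)

# array_split allows uneven pieces; axis=1 cuts across the width,
# which gives vertical strips.
sections = np.array_split(image, 100, axis=1)
widths = sorted({section.shape[1] for section in sections})
```

With these numbers the strips come out 13 or 14 pixels wide, all 512 pixels tall.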
axis should be 1 i think.
where do i upload my picture in this code
yeah but your image is 1374 wide, which doesn't divide evenly by 100, so you gotta use array_split.
or do i use both baratheon code and your code together
you can assume your picture as numpy array. someone already said that right?
oh i see you were adding onto his code. i will try now
oh yeah brartheon said.
@karmic valley your im is the image in numpy array form
I want the source code of self thinking ai like humans
also it will be appreciated if you put code here.
images are hard to read.
Pasting large amounts of code
If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/
After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.
yeah. in format.
does anyone know how to add missing date values to this df, i.e. use previous value as the missing value
its just like 10 lines so putting here works too.
I want this source in github
pandas has a nice guide on this.
https://pandas.pydata.org/docs/user_guide/missing_data.html
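In case a concrete example is useful, this is one way to do it, assuming a daily series with a `date` column (the column names and values are made up): reindex to a full daily range, then forward-fill.

```python
import pandas as pd

# Made-up frame with 2021-01-03 and 2021-01-04 missing.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-05"]),
    "value": [10, 11, 15],
})

filled = (
    df.set_index("date")
      .asfreq("D")    # insert a row for every missing day (values become NaN)
      .ffill()        # carry the previous value forward
      .reset_index()
)
```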
@lapis sequoia @radiant trout i put it in paste bin the code
not sure if you can edit
are you new in coding?
if so, please do
im a few months in but have aproject due in a couple days
and i was told to split image
but still cant do
i can put in the code for you, but it seems like you kinda have less idea about whats happenning.
I'll just put comments so you understand things.
thank you!
thanks
i really enjoy learning to code and learn by example mostly.
i spent a week trying to split image into 100 vertical sections haha
from PIL import Image
import numpy as np
img = Image.open('C:\Users\samay\Documents\Education\University Imperial\Module 3\AI\Archive\con1_4360000ms_to_4374000ms_contrast_resized.png')
im = np.array(img) # this is numpy array from your image
print(np.split(im, 100, axis=1)) # this should prob give error since the 1374 columns can't be evenly divided by 100.
# so try array_split
thanks let me try
you mentioned that this error will come up, but what do i have to change to stop it coming up
i wrote that in comment too
yes i think so haha. but now got this new error
im = np.array_split(img) # this is numpy array from your image
TypeError: array_split() takes at least 2 arguments (1 given)
good. thats better. now read the doc carefully.
i know this could be easily resolved by me lol, but i want you to do it.
okay i will read now, please stay here 2min whilst i do . youre so helpful
could anyone help me out with this, im a newbie and so im struggling with the coding
@lapis sequoia hi ive tried to read the article you provided and some other website and i see that you need numpy.array_split(ary, indices_or_sections, axis=0). but i can't seem to find good explanation for what ary represents.
also i came across something called vsplit too for vertial splitting - not sure if that will be better for me or doesnt matter?
it says ary is about splitting srray into sub arrays
but not really sure tbh
oh i see thanks
numpy.array_split(ary, indices_or_sections, axis=0)
here what does 2nd argument seem like?
so the second in think is number of split you want
about 100
good. now try to print np.array_split(img, 100)[0].shape
what this will do, i will tell you in a sec
okay let me add that to my code thanks
you can remove all the below code.
shall i remove print(np.split(im, 100, axis=1))
from PIL import Image
import numpy as np
img = Image.open('C:\Users\samay\Documents\Education\University Imperial\Module 3\AI\Archive\con1_4360000ms_to_4374000ms_contrast_resized.png')
im = np.array(img)
print(np.array_split(im, 100)[0].shape)
thanks
Thank you! I used !wget
okay so it seems like it worked. i got code 0. does this code also save the images
what did it print?
also no it does not save image.
it printed 6L, 1374L
you will need to use r'' to handle backslashes in the string
or use forward slashes /
img = Image.open(r'C:\Users\samay\Documents\Education\University Imperial\Module 3\AI\Archive\con1_4360000ms_to_4374000ms_contrast_resized.png')
the r prefix tells python not to try to interpret \ as an escape sequence
oh its their previous code. i did not notice it.
L?
thanks will add this
hm what in the world is L
(long integer)
ohh!!
numpy.array_split(ary, indices_or_sections, axis=0)
Split an array into multiple sub-arrays.
Please refer to the `split` documentation. The only difference between these functions is that `array_split` allows *indices\_or\_sections* to be an integer that does *not* equally divide the axis. For an array of length l that should be split into n sections, it returns l % n sub-arrays of size l//n + 1 and the rest of size l//n.
See also
[`split`](https://numpy.org/devdocs/reference/generated/numpy.split.html#numpy.split "numpy.split"): Split array into multiple sub-arrays of equal size.
Examples...
okay so your code seems good.
from PIL import Image
import numpy as np
img = Image.open('C:\Users\samay\Documents\Education\University Imperial\Module 3\AI\Archive\con1_4360000ms_to_4374000ms_contrast_resized.png')
im = np.array(img)
images = np.array_split(im, 100)
so here each array is of size (6, 1374) or (5, 1374)
now you need to save each.
this gave syntax error
great
oh? what was the error
that's valid syntax, unless i miscounted parentheses
also don't overthink it
i'm just telling you to check the type of a certain thing
it's not a magic spell
great, this works no errors
ah got you
ok now you need to just loop over that list and save each.
thats code for how to save image.
im = Image.fromarray(A)
im.save("your_file.jpeg")
can i save as png because my images are png
you can yeah, i've just shown you the way.
thanks, will add this now!
i gotta run now! later.
you really need to learn basics first.
please last thing
I cannot spoon feed.
last thing ever promise
i will not ask after this
basically i have this due tomorrow thats why asking
i will learn everything properly after that
okay i used im and it gave no error
but it didnt work unfortunately
the image it saved is the same size @lapis sequoia
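A likely reason the saved file is the same size is that the full array (`im`) was saved instead of each piece from the split. Below is a sketch of the whole thing under a couple of assumptions: the function name, output folder and file naming are made up, and it splits with axis=1 so the pieces are vertical strips (the default axis=0 used earlier cuts horizontal bands instead).

```python
from pathlib import Path

import numpy as np
from PIL import Image

def save_vertical_sections(image_path, out_dir, n_sections=100):
    """Split an image into n_sections vertical strips and save each as a PNG."""
    pixels = np.array(Image.open(image_path))
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    # axis=1 splits across the width; array_split tolerates uneven widths.
    for i, section in enumerate(np.array_split(pixels, n_sections, axis=1)):
        target = out_dir / f"section_{i:03d}.png"
        Image.fromarray(section).save(target)  # save the piece, not the full array
        saved.append(target)
    return saved
```

Usage would look like `save_vertical_sections(r'C:\path\to\image.png', 'sections')`, with the raw-string prefix so the backslashes in a Windows path are left alone.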
@desert oar any idea why code didnt work
Hello! I am new in data science and i want to improve myself in this field and I've got some ideas about python and the libraries that used for machine learning, the thing I am looking for is a documentary for Model Accuracy(actually i am not sure what they name it) or what to do at this part, Do you guys have any suggestion to me? Thank you!
like how to improve it>?
yes
Hello
Report the estimates for β1, β2, and β3, together with their 95% confidence intervals. Comment briefly on the results.
can anyone help with this
@lapis sequoia it's called model performance, but this includes hyperparameter tuning and feature engineering, for example
you can just google 'how to improve machine learning model accuracy'
hi
@karmic valley I understand that this is important to you, but everyone here is a volunteer and you're asking for a lot. it was rude for you to continue to pester prashaaaaaaaaaaaant after they disengaged. please be more considerate going forward.
and I've already asked you to stop pinging people to draw attention to your question, but you did that with salt rock lamp. please contact us over @sonic vapor if you have any other questions about what is appropriate.
Hi
Hello, go ahead and introduce what you wanted to talk about
Go on, and once you've asked your question, people can see if they can answer it.
Someone tells you that your model in equation 3 is incorrect and that the correct model is
y = β0 + β1x1 + β2x2 + β3(x1 + x2)/2 + ϵ, (4)
where ϵ ∼ N(µ = 0, σ²) and x1 and x2 are from exercise 1. Then, you are advised to
2.9 Use statsmodels and fit a linear regression model. Comment on the results.
2.10 How can you fix the model in equation 4? Name some alternatives
2.10 Add y-intercept to the equation so that it guarantees that your residuals have a mean of zero and to avoid the regression line to go through the origin.
!rule 8
8. Do not help with ongoing exams. When helping with homework, help people learn how to do the assignment without doing it for them.
just fyi, @regal gale - not that nobody will help, but we're not going to directly answer your question
Yeah, this seems like straight-up homework. Could you tell us what you've tried so far?
I would like to use a bunch of features to correlate their significance in determining an outcome of true/false
what is this called?
not really sure what you mean by that. Are you trying to decide how important each feature is to a model? How much each feature correlates to the target variable?
2.10 is my answer
@neat anvil
I see
so
a y-intercept is the value the function takes when the x variable is set to 0
so, what happens if you take the equation in (4) and set x to zero?
this should probably move to a help channel
open one and then @ me
@neat anvil yes
those are two different things I described
anyone help me with saliency mapping
if so, get in touch please
pastebin!
!pastebin
!pastebin
*error - says model not defined. i am trying to load my own AI model and create saliency map. would appreciate if someone could fix my code https://paste.pythondiscord.com/qodoxiyuyu
Hi. How does one create an application out of a deep learning model? I mean how do you take all the model weights, your backend and frontend and package all that into something like an exe file?
Basic web app with stream lit or gradio is quite simple
Not a web app. I meant a native app kind of thing on a desktop
Sorry
You need to define the model
.
Hi, i can try :)
What's the equation?
y = β0 + β1x1 + β2x2 + β3(x1 + x2)/2 + ϵ, (4)
okay think ive done that now.
now i want to change one part of code. the code is to find the image from the internet but i want to just upload from my laptop how do i do that https://paste.pythondiscord.com/uxiruqojeb
Can anyone tell me how can I visualise cluster made my spectral clustering algorithm
@tacit basin r u there
Is intercept Bo?
oh yeah
ok
my answer is wrong
I'm not sure how I can fix the model in equation 4 thou
Is it smth about collinearity?
. what is equation 4?
dis
Is that a desktop app? GUI? Terminal?
I'm using pycharm on my laptop
What's wrong with this model?
So it's terminal app. Hmm. Probably you could provide path to file.
How do you know its wrong?
Here's the output
what's wrong with it
Remove beta 3 as it p-value of 0.326 which is greater than 0.05
???
I think we can remove beta 3?
Like yolo inference for example
python detect.py --source 0  # webcam
                          img.jpg  # image
                          vid.mp4  # video
                          path/  # directory
                          path/*.jpg  # glob
                          'https://youtu.be/Zgi9g1ksQHc'  # YouTube
                          'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream
https://github.com/ultralytics/yolov5
But I'm pretty sure there are more fancy ways too :)
0.3 is not lower than 0.05
Ok
any other thing u can suggest
yes
can I just send u my ipynb
For next 15 min i should be
Hey guys, do we have a tensorflow enthusiast online?
You shall just share the question. Someone will prob answer if they can.
π€
What is your X_train?
Nope
Can I send you the Colab file?
It's label encoded and one hot encoded with about 30 columns
In total there are 15 classes or so
Okay labels are strings?
Yes
Hm how would your model understand strings? The last layer will give probability distribution of size 15.
got it. thanks. what's the question, where in the notebook should i look?
I'm not quite sure but i think that labels as strings do work
can u just see if I m doing it right
probably will not have time to go through the whole notebook today, sry
Howdy y'all. Small survey question:
- Do you / Have you used autoML?
- Do you find it useful to go through autoML?
- If you use autoML, what's the rest of your usual workflow?
I'm mostly interested in this since a few people on our DS team were very interested in autoML and some were very, very against it. I haven't used it much beyond h2o, but it was "okay" for EDA. I didn't have a strong opinion about it.
You can give a look at this tut. Very similar to yours. Here mark what they pass as labels to the model actually.
trying to 'run all' but getting errors on comments that are entered as a code. it would make sense to either make them markdown or comments
i am trying to create a saliency map. i found some code which used a vgg19 AI model already on python and it had code that downloaded the image from the internet to upload on python. I have now modified the code to add my own AI model but i want to change the code so that i dont need to write URL to upload image and instead just upload image from my laptop. could someone help modify my code so it does this - i couldn't get your prev suggestion to work miwojc. https://paste.pythondiscord.com/uxiruqojeb
can i find this dataset somewhere: santander_dataset.csv?
i used it for some kaggle comps. didn't get any medal lol
- no - we evaluated some automl products in my company's field of interest and they were too expensive and\or limited in application to be worth trying out.
Haha, miwojc, did you think they gave you any insight?
Raymond, I have exactly the same issue, esp with H2o.
Hey @wooden forge!
You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.
could have let me copy paste it... ;(
I've heard that some people use AutoML for some feature-cleaning, or pre-processing, but I've not dipped my toes into that. That seems most promising to me right now, if anything.
Yes, they run through a lot of models and hyperparams rather quickly, giving direction on which model to develop further. There's a kaggler who posts notebooks with various automl frameworks for monthly comps. I'll find it
Hi! Alright so I had an experimental class this afternoon about X-ray diffraction on crystals. We used software I don't know to plot the data, and it used a fitting method: pseudo-Voigt. It also gave us this kind of file (https://paste.pythondiscord.com/ovupudemos) containing the fitting parameters. But I have no idea how to use them to actually plot the curves in Python, because we only have screenshots (which is not how I conceive of data science lol). So would anyone have any idea how to read those parameters and use them to plot my data? Thanks in advance!
you could parse that text file into one or two pandas dataframes quite easily
!d pandas.read_csv
```py
pandas.read_csv(filepath_or_buffer, sep=NoDefault.no_default, delimiter=None, header='infer', names=NoDefault.no_default, index_col=None, usecols=None, squeeze=None, ...)
```
Read a comma-separated values (csv) file into DataFrame.
Also supports optionally iterating or breaking of the file into chunks.
Additional help can be found in the online docs for [IO Tools](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html).
look at all the arguments around skiprows, column names, whatever
then it would not be hard to plot from there. Standard data munging and plotting.
If you haven't already encoded the equations for the Voigt profile (https://en.wikipedia.org/wiki/Voigt_profile), that might be a bit tricky
actually, looks like scipy has you covered
!d scipy.special.voigt_profile
```py
scipy.special.voigt_profile(x, sigma, gamma, out=None) = <ufunc 'voigt_profile'>
```
Voigt profile.
The Voigt profile is a convolution of a 1-D Normal distribution with standard deviation `sigma` and a 1-D Cauchy distribution with half-width at half-maximum `gamma`.
If `sigma = 0`, PDF of Cauchy distribution is returned. Conversely, if `gamma = 0`, PDF of Normal distribution is returned. If `sigma = gamma = 0`, the return value is `Inf` for `x = 0`, and `0` for all other `x`.
That's the one: nice overview of different automl frameworks
https://www.kaggle.com/rohanrao/automl-tutorial-tps-december-2021
Cool, thanks for the link. I'll be sure to read this noise! :']
IDK. You'll have to figure out how to translate the parameters that machine gives you into something that scipy expects. I don't think any of us can help you with that. The diffraction machine likely has documentation of what it means by each parameter it spits out at you, and scipy has documentation of what it expects, so you'll have to translate between the two. Because scientific equipment manufacturers often like to make things difficult, you may have to rearrange the equation and/or do some conversions to get its output to match the scipy standard.
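For reference, a minimal sketch of a pseudo-Voigt curve in NumPy. Note that a pseudo-Voigt is a weighted sum of a Gaussian and a Lorentzian sharing one FWHM, which only approximates the true Voigt convolution that `scipy.special.voigt_profile` computes. The parameter names here (`center`, `fwhm`, `eta`, `amplitude`) and the example values are assumptions, not what the diffraction software writes out; its docs would say how its columns map onto them.

```python
import numpy as np

def pseudo_voigt(x, center, fwhm, eta, amplitude=1.0):
    """Pseudo-Voigt: eta * Lorentzian + (1 - eta) * Gaussian, both area-normalised.

    `amplitude` scales the integrated area; `eta` is the Lorentzian fraction (0..1).
    These names are hypothetical and may not match the fitting software's output.
    """
    x = np.asarray(x, dtype=float)
    d = x - center
    # Gaussian with the given full width at half maximum
    gauss = (2.0 / fwhm) * np.sqrt(np.log(2.0) / np.pi) * np.exp(-4.0 * np.log(2.0) * d**2 / fwhm**2)
    # Lorentzian with the same full width at half maximum
    lorentz = (2.0 / (np.pi * fwhm)) / (1.0 + 4.0 * d**2 / fwhm**2)
    return amplitude * (eta * lorentz + (1.0 - eta) * gauss)

# Made-up peak parameters, purely for illustration
x = np.linspace(20.0, 22.0, 401)
y = pseudo_voigt(x, center=21.0, fwhm=0.2, eta=0.4, amplitude=1500.0)
```

From there `plt.plot(x, y)` with matplotlib would draw the peak; several peaks can be summed on the same `x` grid.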
Anyone know how to calculate the signal-to-noise ratio of an image? @ me if you have any advice
i'm not sure if this is the right section, but any google ortools wizards able to help with some routing questions?
I have a pandas dataframe like this:
Value
0 10
1 2
2 34
3 14
4 52
5 26
I'd like to filter it down such that only every other value is considered, like this:
Value
0 10
1 34
2 52
Anyone know how to do this? is it just a simple for loop where you increase a variable for every n rows you want to skip?
or is there a pandas thing to do this? I've seen the groupby function but it doesn't seem to do what I'm trying to do here
do you know how you'd get every other value in a list? it's quite similar with a dataframe.
I think I've figured it out. really what I'm trying to do is get every 32nd value, but I simplified it for the example. but I'm thinking I can say df.iloc[::32] or something like this
yes, try that
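A quick sketch of that slicing approach on the small example from the question (the `step` variable is just for illustration; set it to 32 for every 32nd row):

```python
import pandas as pd

df = pd.DataFrame({"Value": [10, 2, 34, 14, 52, 26]})

step = 2  # every other row here; use 32 for every 32nd row
# Positional slice with a step, then renumber the index from 0
every_nth = df.iloc[::step].reset_index(drop=True)
print(every_nth)
```

`reset_index(drop=True)` is only needed if the result should be renumbered 0, 1, 2 as in the desired output; without it the original row labels are kept.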
does anyone here do any algo trading?
For computer vision, is it worth studying the traditional contents of digital image processing or is it better to start learning and diving into deep learning?
hi
i have an image and i want python to work out the average pixel intensity below the blue line and the average pixel intensity above the blue line. I know the code to work out the average pixel intensity of the full image, but don't know how to do it separately below and above the blue line. If you have any ideas please help. Another way would be to get python to split the image into 2, one part with everything below the blue line and the other with everything above it, then i can use my code. But i don't know how to do that
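A minimal sketch of the "split into two" idea, assuming the image is already a 2-D grayscale NumPy array and the blue line is horizontal at a known row index. Both are assumptions, since the image isn't shown; `line_row` and the toy array are made up for illustration.

```python
import numpy as np

def mean_above_below(gray, line_row):
    """Mean intensity of the rows above and below a horizontal line.

    Assumes `gray` is a 2-D array and `line_row` is the row index of the line,
    which is itself excluded from both averages.
    """
    gray = np.asarray(gray, dtype=float)
    above = gray[:line_row]       # rows 0 .. line_row - 1
    below = gray[line_row + 1:]   # rows line_row + 1 .. end
    return above.mean(), below.mean()

# Toy image: bright line at row 3, darker region above, lighter region below
img = np.zeros((6, 4))
img[:3] = 10
img[3] = 255
img[4:] = 50
above_mean, below_mean = mean_above_below(img, 3)
```

If the line is slanted or curved, the same idea would need a boolean mask per region instead of a simple row slice.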
```python
results = model.detect([image], verbose=1)
x = get_ax(1)
r = results[0]
ax = plt.gca()
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'], dataset.class_names, r['scores'], figsize=(16, 16), ax=ax, title="Predictions")
log("gt_class_id", gt_class_id)
log("gt_bbox", gt_bbox)
log("gt_mask", gt_mask)
for i in range(r['masks'].shape[-1]):
    mask = r['masks'][:, :, i]
    print("Mask ID", i)
    # print(r['masks'].shape[-2])
    # print(mask.shape)
    # mask = r['masks'][:, :, i]
    image[mask] = 100
    image[~mask] = 0
    # count is the number of value y in x
    unique, counts = np.unique(image, return_counts=True)
    # counts[1] is the masks area
    mask_area = counts[1] / (counts[0] + counts[1])
    print("Distance:", math.sqrt(counts[1]) * math.sqrt(2))
for i in range(r['rois'].shape[-2]):
    boxes = r['rois'][i, :]
    image[boxes] = 100
    image[~boxes] = 0
    unique, counts = np.unique(image, return_counts=True)
    # counts[1] is the masks area
    box_area = counts[1] / (counts[0] + counts[1])
    print("Distance:", math.sqrt(counts[1]) * math.sqrt(2))
```
Hi, so I have a very simple question, or at least simple to some people. I am trying to get the area of the bounding box in my machine learning model. The above is my code; you can see how I implemented it correctly for the masks, but the masks are given as width and height whereas the bounding box is given as ymin, ymax, xmin... etc. Can anyone please demonstrate how I could return the diagonal of the bounding box, i.e. the distance from the bottom-left corner to the top-right?
boxes: [num_instance, (y1, x1, y2, x2, class_id)] in image coordinates.
masks: [height, width, num_instances]
class_ids: [num_instances]
class_names: list of class names of the dataset
scores: (optional) confidence scores for each box
title: (optional) Figure title
show_mask, show_bbox: To show masks and bounding boxes or not
figsize: (optional) the size of the image
colors: (optional) An array or colors to use with each object
captions: (optional) A list of strings to use as captions for each object
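Going by the `(y1, x1, y2, x2)` layout in the docstring above, a hedged sketch of computing box area and diagonal straight from the coordinates, without painting pixels. The function name `box_metrics` and the sample box are made up; `rois` stands in for `r['rois']`.

```python
import numpy as np

def box_metrics(rois):
    """Return (areas, diagonals) in pixels for boxes given as rows of (y1, x1, y2, x2)."""
    rois = np.asarray(rois, dtype=float)
    y1, x1, y2, x2 = rois[:, 0], rois[:, 1], rois[:, 2], rois[:, 3]
    heights = y2 - y1
    widths = x2 - x1
    areas = widths * heights
    # Corner-to-corner distance, sqrt(width^2 + height^2)
    diagonals = np.hypot(widths, heights)
    return areas, diagonals

# One made-up box: 3 px tall, 4 px wide
areas, diagonals = box_metrics([[10, 20, 13, 24]])
```

The diagonal is the same whichever pair of opposite corners is used, so bottom-left to top-right gives the same number as top-left to bottom-right.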
please help
**if anyone is wanting to hop in a quick chat to help I would appreciate it **
@helpers
you have to ask your actual question, with a code example and an explanation of the problem, before anyone will volunteer to help.
Hey @lapis sequoia!
It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com
can any of yall help me with my image duplicate detector program? the gist is that it takes both images, gets the average rgb value of each, then compares them and returns true if the comparison is 100%. in this new update i've tried to split each image into 4 quadrants and compare those quadrants as images, which it did. however it isn't working as well as i envisioned, and it didn't even come up true for some of the images that came up true with the single pixel method.
https://paste.pythondiscord.com/nabasixolu
i am applying auto encoding of this:
https://raw.githubusercontent.com/adtmv7/DeepSlice/master/Source/filtereddata.txt
its enough data, right? its my first time doing it.
you already know about me wanting to try it 
but it didnt come from nowhere; i watched sessions of AutoML from both Databricks and Dataiku, and one of the things that they included that might convince DS is an editable jupyter notebook to modify the AutoML, so it isnt a black box; its transparent. i think dataiku had some interesting metrics too like model drift and some way they measured bias too. been a while tho.
also while i have you. i learned about another nifty toy tool from that DE podcast: https://rockset.com/ 
its a db built for real-time analytics, optimized for it too.
Listen to this episode from Monday Morning Data Chat on Spotify. Shruti Bhat (Chief Product Officer & SVP Marketing @ Rockset) joins us to chat about real-time analytics at cloud scale, modern data apps and dashboards, and more. The world of analytics is moving toward real-time as a first-class citizen, so this is definitely a discussion you sho...

also salt rock lamp seemed to like it
so theres that
ok im done 
How do I open a .gz file from a website in Google Colab?
Hi everybody,
I'm starting to get very interested in data science and AI, but I'm very new to this domain (I come from simple/classic web dev :P)
I'd like to know the best process or documentation for learning how to compare an image from a camera against 20K+ images hosted on AWS as efficiently/quickly as possible (seconds, or a few minutes at most), and find the image with the highest match score.
Should it be done step by step, recognizing some elements of the image first to narrow down the classification of images, or just by looping over and comparing the images?
Is Python a good programming language for this? Are there resources to check and read about this?
I also tried OpenCV for a POC, of course, but I'm pretty sure there are more efficient methods.
Open to talking here or by PM
If this isn't the right channel, just tell me and I'll move my question to the right one, thank you.
@old dawn this is the right channel for your question. What you have described is similar to image classification. As far as I know, this is pretty much always done with neural networks.
@serene scaffold Hey, thank you for the quick answer!
So OpenCV can be a good library to do the job efficiently, and I just need to learn mo(ooooo)re about it?
Or maybe there is a more efficient library in Python?
I think you're overly fixating on efficiency when you first need to understand how these problems get solved.
I think so too, maybe
That said, image processing is not my domain.
No problem, you answered me and pointed me in the right direction
It's appreciated
Of course! Someone with relevant knowledge may come along
im not sure opencv is going to give "match scores" its more for image processing like object tracking, thresholding and things like that
aws has a bunch of apis for image recognition
if u wanted to do something like histogram matching then opencv would work
Thank you for the information about the OpenCV library
I tried AWS Rekognition last week, but I'm having some problems building the dataset with 20K+ images. I'm still checking what I'm doing wrong with this method (I think my classification labels are not good). I'd welcome more info from people.
yeah it really depends, model building can be fairly involved
Maybe drawing my classification tree model on paper would help me find the right approach
you could have a look at the examples here https://cv.gluon.ai/build/examples_classification/index.html
Hello, sorry if this is the wrong place to ask, but is there a faster way to do np.asarray([np.matmul(h, point) for point in xy_points]), where h is a 2x2 matrix and xy_points is a list of 2x1 matrices?
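One likely faster option is to stack the points once and let `matmul` broadcast over the leading axis instead of looping in Python. A minimal sketch with made-up values for `h` and `xy_points`:

```python
import numpy as np

# Made-up inputs: a 2x2 matrix and a list of 2x1 column vectors
h = np.array([[2.0, 0.0],
              [1.0, 3.0]])
xy_points = [np.array([[1.0], [2.0]]), np.array([[3.0], [4.0]])]

pts = np.asarray(xy_points)   # shape (N, 2, 1)
fast = h @ pts                # h is broadcast across the N points -> shape (N, 2, 1)

# Original list-comprehension version, kept for comparison
slow = np.asarray([np.matmul(h, point) for point in xy_points])
```

Both produce an array of shape `(N, 2, 1)`; the broadcast version does a single NumPy call instead of N of them, which usually matters once N is large.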
Thank you for the resource, I'll go read it
Are you trying to store a bunch of images and have a fast image search / match on that?
Yes, but I just need to return the best match, not a list
Do the matches need to be exact or just "similar"?
@iron basalt Exact would be best, but it depends on the source image, which will come from a camera. So an exact match would be insane ahah but very difficult to get, so the most similar.
Here is an example of what I want to do, but with 20K+ images as the dataset
I made a playing card detector program that uses OpenCV-Python to detect and identify playing cards in a video feed. It runs on the Raspberry Pi 3 with an attached PiCamera. This video explains the image processing algorithm I use to detect and identify the cards.
That's why I used OpenCV first, but it takes a very long time to process.
Last week I started to use AWS Rekognition, but I'm still gathering information on the best way to get the problem solved
CV is also not my domain but we used Azure's equivalent for a separate thing and it performed pretty decently

Oh?! I'm curious
yeah i dont remember the service since my friend was in charge of that part but im sure you can find it if you dig around
Thank you a lot, I check right now!
np. hope it can help
So i'm not sure what your performance issues with OpenCV are, but in general it seems like you are trying to solve a NNS (https://en.wikipedia.org/wiki/Nearest_neighbor_search ) problem and to do that you want to convert each image into some kind of lower dimensional point (vector) (or higher dimensions if you are daring enough to get deep into ML) and then find the nearest neighbor of the query point.
Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values.
Formally, the nearest-neighbor (NN...
One method that is popular because it can deal with both high dimensional input and can also act as an approximate nearest neighbor search method is locality sensitive hashing (LSH): https://en.wikipedia.org/wiki/Locality-sensitive_hashing
In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same "buckets" with high probability. (The number of buckets is much smaller than the universe of possible input items.) Since similar items end up in the same buckets, this technique can be used for data clustering and near...
You can do better than LSH with learning (ML) methods, but it works pretty well.
Interesting, I did not know about NNS and LSH
Good approach
On the general idea of converting each image to a vector, you can make the search for the nearest neighbor in that space faster by using a simple k-d tree.
In computer science, a k-d tree (short for k-dimensional tree) is a space-partitioning data structure for organizing points in a k-dimensional space. k-d trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest neighbor searches) and creating point cloud...
(For better big O)
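A rough sketch of the k-d tree lookup, using `scipy.spatial.cKDTree`. The feature vectors here are random placeholders; how each image gets turned into a vector (histogram, embedding, hash, ...) is left open, and the 16 dimensions are an arbitrary assumption. k-d trees tend to lose their advantage as the dimension grows, so this fits best with fairly short vectors.

```python
import numpy as np
from scipy.spatial import cKDTree

# Placeholder data: one 16-dimensional feature vector per stored image
rng = np.random.default_rng(0)
features = rng.random((20_000, 16))

# Build the tree once, then reuse it for every query
tree = cKDTree(features)

# Pretend the camera image maps to a vector very close to stored image 123
query = features[123] + 1e-6
distance, index = tree.query(query, k=1)  # nearest neighbour only
```

`index` is the row of the best match in `features`, and `distance` can serve as a (dis)similarity score for deciding whether the match is good enough.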
A lot of new information and concepts to understand, I'm reading it
Thank you for the explanation, it's a good summary you did
Very good community here by the way
Thanks to everyone trying to help me understand this domain
I think it's Azure Model Builder
https://docs.microsoft.com/en-us/dotnet/machine-learning/tutorials/image-classification-model-builder
give an example please?
Who want aurelien geron ml ed 2 book for free
Use this link :
https://filecr.com/elearning/hands-on-machine-learning-with-scikit/?id=95223165062
csv file date format: DD-MM-YYYY
dataframe date format: DD-MM-YYYY
After exporting to xlsx, the format changes to YYYY-MM-DD.
Kindly help, I've tried everything and it's driving me mad. PLS help
when you say After exporting to xlsx file , are u checking it in some software or reading the file in pandas ?
should i use auto encoder for unsupervised pretraining only in large datasets?
Hi guys, I'm currently trying to generate box plots with plotly. I'm doing this on a "large" dataframe (1.1M rows). However, even though the box plot itself is pretty simple in the end (just a few "hoverable" points), the output file is 20 to 50 MB, which is way too large.
It seems like Plotly keeps all the raw data to generate the plot instead of only using the computed values (median, Q1, Q3, fences).
Is there a way to ask plotly to only output the computed values?
Edit: My current solutions would be:
A: decimate the data to reduce the sampling rate (loss of precision)
B: create a separate dataframe with only the median, Q1, Q3 and fences, and generate the boxplot from there (a bit redundant)
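Option B can be sketched like this: compute the summary per group in pandas and hand only those numbers to `go.Box` through its `q1`, `median`, `q3`, `lowerfence` and `upperfence` arguments. The helper names, the column names, and the 1.5 × IQR fence rule are assumptions for illustration, and pandas' quantile interpolation may differ slightly from what plotly computes from raw data.

```python
import pandas as pd

def box_stats(df, group_col, value_col):
    """One row of box-plot summary statistics per group."""
    rows = []
    for name, values in df.groupby(group_col)[value_col]:
        q1, med, q3 = values.quantile([0.25, 0.5, 0.75])
        iqr = q3 - q1
        # Fences: most extreme data points still within 1.5 * IQR of the box
        inside = values[(values >= q1 - 1.5 * iqr) & (values <= q3 + 1.5 * iqr)]
        rows.append({"group": name, "q1": q1, "median": med, "q3": q3,
                     "lowerfence": inside.min(), "upperfence": inside.max()})
    return pd.DataFrame(rows)

def make_box_figure(stats):
    """Build the figure from summary statistics only (no raw samples embedded)."""
    import plotly.graph_objects as go  # imported here so box_stats works without plotly
    return go.Figure(go.Box(x=stats["group"], q1=stats["q1"], median=stats["median"],
                            q3=stats["q3"], lowerfence=stats["lowerfence"],
                            upperfence=stats["upperfence"]))

# Tiny made-up dataset with one obvious outlier
demo = pd.DataFrame({"sensor": ["a"] * 6, "reading": [1, 2, 3, 4, 5, 100]})
stats = box_stats(demo, "sensor", "reading")
```

`make_box_figure(stats).write_html("box.html", include_plotlyjs=False)` would then write a file whose size depends on the number of groups rather than the number of rows. The trade-off is that individual outlier points are no longer drawn.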
um, i'm not entirely sure but could you try changing include_plotlyjs to "cdn"? include_plotlyjs="cdn"
On that: the output file is intended for an iframe where our backend already has the library installed. Right now the library isn't in the HTML at all. I appreciate the suggestion tho.
Hey @obsidian kelp!
You either uploaded a .txt file or entered a message that was too long. Please use our paste bin instead.
Hello guys!
I need to parse an HTML table from Wikipedia (a page with Cyrillic characters)
I'm using pandas and BeautifulSoup
Here is my code from Jupyter:
https://paste.pythondiscord.com/duhagobuyu
In particular I think that the error lies here:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-15: ordinal not in range(128)
I've tried to encode data in different ways but I'm not getting any luck. Also, keep in mind I'm on a Mac, don't know if that's part of the problem
Thank you so much in advance for the help!
you pay in Mb for books there? seems dodgy
humble bundle is not free, but you get a lot of books for relatively very little money. all legally. i recommend that: https://www.humblebundle.com/books/allinone-python-packt-books?hmb_source=humble_home&hmb_medium=product_tile&hmb_campaign=mosaic_section_3_layout_index_2_layout_type_threes_tile_index_3_c_allinonepythonpackt_bookbundle
I view the xlsx through MS Excel
Hello everyone,
I have a list of products sold in the past year.
I'd like to visualize, per customer, the top X products of the past years that they didn't buy.
What is the best way to do so?
hello, I don't know if this is the right place to ask about image processing, but does anyone know the pythonic way to check whether an image is grayscale or colored? AFAIK, there's a chance that an image stored as color is actually grayscale or lacks color, and I want to eliminate this kind of image since it won't give me significant information.
r == g == b (r == g and g == b in Python)
ah i see that
i thought that even in gray-colored RGB, the channel values were different from each other
When rgb are equal, it has zero saturation.
The value is the max of rgb.
(Which if they are equal is any of them)
got it
So all pixels in the image must have equal rgb (for each pixel, not between pixels) for it to be a grayscale image.
this is so neat i'm sad that i found it just now
>>> x = np.array([[[125, 125, 125], [14, 14, 14], [0, 0, 0]], [[60, 60, 60], [18, 18, 18], [1, 1, 1]]])
>>> x
array([[[125, 125, 125],
[ 14, 14, 14],
[ 0, 0, 0]],
[[ 60, 60, 60],
[ 18, 18, 18],
[ 1, 1, 1]]])
>>> x[:, :, 0] == x[:, :, 1]
array([[ True, True, True],
[ True, True, True]])
>>> (x[:, :, 0] == x[:, :, 1]).all() and (x[:, :, 1] == x[:, :, 2]).all()
True
yeah, so the change is happening due to the software: it parses the date into its own default format. try making the change in MS Excel itself
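If changing it in Excel isn't an option, one workaround on the pandas side is to write the dates as already-formatted text, so the spreadsheet has nothing to reinterpret. A small sketch with made-up dates and column names:

```python
import pandas as pd

df = pd.DataFrame({"date": ["25-12-2021", "01-02-2022"]})

# Parse explicitly as day-month-year so nothing is guessed
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")

# Render back to DD-MM-YYYY text; Excel will then show it exactly as written
df["date_text"] = df["date"].dt.strftime("%d-%m-%Y")

# df[["date_text"]].to_excel("out.xlsx", index=False)  # needs openpyxl or xlsxwriter
```

The downside is that Excel sees text rather than real dates, so date sorting and filtering won't work on that column. Keeping real dates and passing `datetime_format="DD-MM-YYYY"` to `pd.ExcelWriter` may be the alternative to look into.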
greyscale usually has only one channel, while rgb has 3,
so your image shape may look like (1000, 2000, 3) for rgb
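Putting the two observations together (single-channel shape, or three equal channels), a minimal sketch of a check. The name `is_grayscale` is made up, and it assumes the image is already a NumPy array with channels last:

```python
import numpy as np

def is_grayscale(img):
    """True if the image has one channel, or if R == G == B at every pixel."""
    img = np.asarray(img)
    if img.ndim == 2 or img.shape[-1] == 1:
        return True  # stored as a single channel
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return bool((r == g).all() and (g == b).all())

gray_as_rgb = np.array([[[125, 125, 125], [14, 14, 14], [0, 0, 0]],
                        [[60, 60, 60], [18, 18, 18], [1, 1, 1]]])
colored = gray_as_rgb.copy()
colored[0, 0] = [125, 0, 0]  # one pixel with unequal channels
```

This is an exact comparison; images that went through lossy compression may have channels that differ by a level or two, in which case comparing against a small tolerance might be more useful.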
does anyone have any tips on what to do if your clustering doesn't make sense
my feature engineering has gotta be improved but I don't know what else to try
I was just training the model and the loss went NaN, what does that imply?
This is the first time I've faced this.
it was not NaN initially, it was decreasing. is it possible that it got so small that it shows NaN now?
Are there any good tools for visualising neural network architecture? Specifically for CNNs
any kind soul can help to give some feedback to a self-check assignment from a regression textbook? Unfortunately I have to pay for the answer and I am not willing to, hopefully someone can let me know if there's any glaring issue
hi would appreciate if someone could help me here



