#data-science-and-ml | Python | Page 372

flat sable Jan 29, 2022, 2:16 AM

#

when u read a book, do u need to implement the same thing u're learning?

#

like use a code on the book and see what its doing

stone marlin Jan 29, 2022, 2:17 AM

#

Mouadjg, think about what sorts of things you want to do in the field before deciding on a job description. An ML Engineer is a very specific (almost kind of niche) job. :'] You might want to go to like, Indeed, and check out some job descriptions to see which you like.

flat sable Jan 29, 2022, 2:17 AM

#

?

oak olive Jan 29, 2022, 2:17 AM

#

flat sable like use a code on the book and see what its doing

Yep

flat sable Jan 29, 2022, 2:17 AM

#

oak olive Yep

ah i'll try to learn from books

oak olive Jan 29, 2022, 2:17 AM

#

I mean, I code it for myself searching for the stuff I do not know about

flat sable Jan 29, 2022, 2:18 AM

#

stone marlin Mouadjg, think about what sorts of things you want to do in the field before dec...

yeh sure i'll see now

oak olive Jan 29, 2022, 2:19 AM

#

flat sable ah i'll try to learn from books

Please, do not use me as a reference hahaha. I am just a bored student interested in ML, I am sure that @stone marlin can advise you way better than me

stone marlin Jan 29, 2022, 2:19 AM

#

In general, in the ML/DS area, things are different job-to-job. I'd say if Python is the language-of-choice, you will be using, at least:

Pandas (this gives you "dataframes" which are similar to tables in SQL or like in Excel)
Numpy (this gives you some methods to do numerical stuff in python efficiently)
Matplotlib (this package lets you plot things)
Scikit-Learn (this package has a ton of the ds-related things in it: preprocessing, modeling, evaluation, pipelining, etc.)

There are other packages, of course, but these are the big ones.
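To make the list above a bit more concrete, here is a minimal sketch touching three of the four packages (plotting is left out to keep it short). The column names and toy values are made up for illustration and do not come from the conversation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# pandas: a small table of made-up data (hypothetical columns).
df = pd.DataFrame({
    "hours_studied": [1.0, 2.0, 3.0, 4.0],
    "score": [52.0, 54.0, 56.0, 58.0],
})

# NumPy: numerical work on the underlying array.
mean_score = np.mean(df["score"].to_numpy())

# scikit-learn: fit a simple model directly on dataframe columns.
model = LinearRegression().fit(df[["hours_studied"]], df["score"])
predicted = model.predict(pd.DataFrame({"hours_studied": [5.0]}))[0]

print(mean_score, predicted)
```

Matplotlib would typically come in right after this, for example to plot `hours_studied` against `score`.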

flat sable Jan 29, 2022, 2:19 AM

#

do any one of u have any idea about these two courses

#

https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/

Udemy

Learn Python for Data Science, Structures, Algorithms, Interviews

Learn how to use NumPy, Pandas, Seaborn , Matplotlib , Plotly , Scikit-Learn , Machine Learning, Tensorflow , and more!

#

https://www.udemy.com/course/complete-machine-learning-and-data-science-zero-to-mastery/

Udemy

Complete Machine Learning & Data Science Bootcamp 2022

Learn Data Science, Data Analysis, Machine Learning (Artificial Intelligence) and Python with Tensorflow, Pandas & more!

stone marlin Jan 29, 2022, 2:19 AM

#

You asked about this above, no? I don't know, but I'll echo Stel's sentiment.

flat sable Jan 29, 2022, 2:19 AM

#

oak olive Please, do not use me as a reference hahaha. I am just a bored student intereste...

no no it's not like that, it's because a lot of my friends told me to use books rather than videos

oak olive Jan 29, 2022, 2:19 AM

#

stone marlin In general, in the ML/DS area, things are different job-to-job. I'd say if Pyth...

Excellent, so the stuff presented in Hands on Machine Learning is useful

flat sable Jan 29, 2022, 2:20 AM

#

stone marlin You asked about this above, no? I don't know, but I'll echo Stel's sentiment.

yeh

oak olive Jan 29, 2022, 2:20 AM

#

flat sable no no it's not like that because lot of my friend told me to use books rather th...

I do believe that It is a personal preference, I am just accustomed to it

stone marlin Jan 29, 2022, 2:20 AM

#

Yeah, it's legit. I'd say that learning to use all of those things is a great use of time. There are more specific libraries, but those can be learned as you go.

flat sable Jan 29, 2022, 2:20 AM

#

stone marlin In general, in the ML/DS area, things are different job-to-job. I'd say if Pyth...

thaank you so muchhh for your explainationn

flat sable Jan 29, 2022, 2:21 AM

#

oak olive I do believe that It is a personal preference, I am just accustomed to it

yeah it depends on the person but i'll try it, it can be better

stone marlin Jan 29, 2022, 2:21 AM

#

I'm a big fan of books + online docs, but I'm much more used to it. Ultimately, whatever helps you learn the stuff.

oak olive Jan 29, 2022, 2:22 AM

#

Me too

#

I use a drawing tablet for annotations so e-reading is part of my daily basis

#

Thanks a lot @stone marlin for sharing your experience, it has been extremely helpful

flat sable Jan 29, 2022, 2:23 AM

#

oak olive I use a drawing tablet for annotations so e-reading is part of my daily basis

i like this habit of being disciplined to read books

stone marlin Jan 29, 2022, 2:23 AM

#

No problem, just stick to it.

oak olive Jan 29, 2022, 2:24 AM

#

flat sable i like this habit of being disciplined to read books

Hahaha COVID-19 forced me into it

hollow sentinel Jan 29, 2022, 2:53 AM

#

stone marlin No problem, just stick to it.

nooo what happened to your green filter pfp

hollow sentinel Jan 29, 2022, 2:54 AM

#

flat sable https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootca...

you can do this, but i don't think you'd really retain an understanding from it if you don't know the math behind it

#

there are github repos w the course in python i believe

stone marlin Jan 29, 2022, 2:56 AM

#

Oh, that's good --- still would be good to code it themselves, though.

hollow sentinel Jan 29, 2022, 2:57 AM

#

for some strange reason, df.dt.weekofyear works now??

#

yeah i just meant they could use it as a reference

#

i feel the burnout that's for sure

#

made a typo hang on

#

https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.isocalendar.html this is all i see in the docs for isocalendar
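For context: `Series.dt.weekofyear` was deprecated in pandas 1.1 in favour of `Series.dt.isocalendar().week`, which may be why the docs mostly point at `isocalendar`. A small sketch of the replacement, using made-up dates:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2022-01-01", "2022-01-29"]))

# isocalendar() returns a DataFrame with "year", "week" and "day" columns.
iso = dates.dt.isocalendar()
weeks = iso["week"]

print(iso)
```

Note that ISO weeks can belong to the previous year: 1 January 2022 falls in week 52 of ISO year 2021.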

#

oh dear i think stack overflow is down lmao

flat sable Jan 29, 2022, 3:08 AM

#

hollow sentinel you can do this, but i don't think you'd really retain an understanding from it ...

do i need just a high school maths?*

vocal kraken Jan 29, 2022, 3:08 AM

#

im new resources to learn link?

#

or smth

hollow sentinel Jan 29, 2022, 3:08 AM

#

yeah i think high school stats can be a good basis to build on

flat sable Jan 29, 2022, 3:09 AM

#

hollow sentinel yeah i think high school stats can be a good basis to build on

ah thank u i'll start the course that u recommend

hollow sentinel Jan 29, 2022, 3:09 AM

#

https://youtu.be/9FtHB7V14Fo?list=PL5102DFDC6790F3D0

YouTube

Professor Leonard

Statistics Lecture 1.1: The Key Words and Definitions For Elementa...

https://www.patreon.com/ProfessorLeonard

Statistics Lecture 1.1: The Key Words and Definitions For Elementary Statistics

#

i recommend you use this guy's playlist

flat sable Jan 29, 2022, 3:10 AM

#

hollow sentinel https://youtu.be/9FtHB7V14Fo?list=PL5102DFDC6790F3D0

aa thank y smmm

azure orchid Jan 29, 2022, 3:11 AM

#

anyone know AI Voice Assistant

safe elk Jan 29, 2022, 3:42 AM

#

oak olive Hahaha COVID-19 forced me into it

I did read more books due to the pandemic but had the habit before it

wicked grove Jan 29, 2022, 5:39 AM

#

Hello, i wanted to know if k-fold cross validation is a good way to estimate the accuracy of the model

#

Since during the training the model already sees data which is used for validation in the next iteration

serene scaffold Jan 29, 2022, 5:44 AM

#

@wicked grove each iteration of k fold cross validation, the model is reset

#

So it doesn't benefit from the evaluation data having been part of the training data in the past.

#

If it did, you would be right, however.

#

Good question lemon_hyperpleased

wicked grove Jan 29, 2022, 5:59 AM

#

serene scaffold @wicked grove each iteration of k fold cross validation, the model is re...

Ohhh thank you so much
Is that automatically set in sklearn?

#

Or do i have to set that in some way?

vocal kraken Jan 29, 2022, 6:07 AM

#

im completely new to python itself how do i start coursse pls....

wicked grove Jan 29, 2022, 6:07 AM

#

@serene scaffold

```py
import numpy as np

models = []
train_errors = np.empty(shape=n_folds)
test_errors = np.empty(shape=n_folds)
for idx, (train, test) in enumerate(cv_folds):
    model = Model()  # a new model is created for every fold
    model.fit(train)
    models.append(model)
    train_errors[idx] = model._loss(train)
    test_errors[idx] = model._loss(test)
```
like this?

swift oxide Jan 29, 2022, 6:16 AM

#

wicked grove @serene scaffold ```py models = [] train_errors = np.empty(shape=n_folds) ...

there is a function, cross_val_score

#

which automatically does cross validation and returns an array of scores

#

default is I think 5

#

although yes there are methods you can make validation sets

#

KFold, StratifiedKFold, there are more, I just remember these two
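A minimal sketch of what is described above, using the iris dataset and a logistic regression as stand-ins (neither comes from the conversation). `cross_val_score` fits a fresh clone of the estimator on every fold, so nothing carries over between folds and the estimator passed in is left unfitted.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)

# cv defaults to 5 folds; each fold is scored on a fresh clone of clf.
default_scores = cross_val_score(clf, X, y)

# An explicit splitter can be passed through the cv argument instead.
kf = KFold(n_splits=3, shuffle=True, random_state=42)
kfold_scores = cross_val_score(clf, X, y, cv=kf)

# StratifiedKFold keeps the class proportions similar in every fold.
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
stratified_scores = cross_val_score(clf, X, y, cv=skf)

print(default_scores, kfold_scores, stratified_scores)
```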

#

https://www.youtube.com/watch?v=3fzYdnuvEfk

YouTube

Krish Naik

All Type Of Cross Validation With Python All In 1 Video

https://github.com/krishnaik06/Types-Of-Cross-Validation
Building machine learning models is an important element of predictive modeling. However, without proper model validation, the confidence that the trained model will generalize well on the unseen data can never be high. Model validation helps in ensuring that the model performs well on new...

#

here

swift oxide Jan 29, 2022, 6:20 AM

#

vocal kraken im completely new to python itself how do i start coursse pls....

'python for everybody' , its on freecodecamp

vocal kraken Jan 29, 2022, 6:28 AM

#

k but i want to learn data science

untold venture Jan 29, 2022, 6:45 AM

#

Guys, I listen to the Lex Fridman podcast, Eye on AI, etc. I often hear of online forums and groups where exciting things are being worked on and explored, but apart from this Discord community I have been unable to find good forums/groups/channels/communities. I would really appreciate any opinions or suggestions.

wicked grove Jan 29, 2022, 7:35 AM

#

swift oxide which automatically does cross validation and returns an array of scores

Yes yes but my question is should i reinstantiate the model everytime i do k fold

#

Or is that automatically done when i do model.fit in the loop

wicked grove Jan 29, 2022, 7:37 AM

#

serene scaffold @wicked grove each iteration of k fold cross validation, the model is re...

like this is my code

#

```py
from sklearn.model_selection import KFold
kf = KFold(3,shuffle=True,random_state=42)

fold=0
for train,test in kf.split(x_train):
  fold+=1
  print('fold',fold)
  x_train1 = x_train[train]
  x_val = x_train[test]
  y_train1=y_train[train]
  y_val=y_train[test]
  history = model2.fit(x_train1,y_train1,epochs=50,validation_data=(x_val, y_val),callbacks=[callback])
```

#

should i initialize it anywhere in the loop

swift oxide Jan 29, 2022, 8:39 AM

#

wicked grove should i initialize it anywhere in the loop

use cross_val_score and then in the 'cv' argument put kf

gloomy glen Jan 29, 2022, 9:12 AM

#

Hai, can anyone guide me how to get the speech from a video into a text file?

#

I tried but i am getting error

#

import moviepy.editor as mp
import speech_recognition as sr

clip = mp.VideoFileClip(r"sample1.mp4")
clip.audio.write_audiofile(r"Converted_audio.wav")
print("Finished the conversion into audio...")

audio = sr.AudioFile("Converted_audio.wav")
print("Audio file read...")

r = sr.Recognizer()
with audio as source:
    audio_file = r.record(source)

result = r.recognize_google(audio_file)
with open('recognized.txt',mode ='w') as file:
    file.write(result)

print("Wooh.. I did it...")

#

This is the error(last two lines)

#

raise RequestError("recognition request failed: {}".format(e.reason))
speech_recognition.RequestError: recognition request failed: Bad Request

lapis sequoia Jan 29, 2022, 9:37 AM

#

check this out

#

my new ai

#

J.A.R.V.I.S

#

https://gist.github.com/anmol-123456789/c5db168b8847e2ceb46e9cb4aa4cf710

Gist

How to make an ai using python only, with many interesting features...

How to make an ai using python only, with many interesting features . Give credit in the project when using . anmol-123456789 github bit.ly/justacoder - JARVISv-1.0.py

autumn glade Jan 29, 2022, 10:07 AM

#

are there any memory leak problems with librosa.load when its put inside a loop?

when this is run on colab it exceeds available RAM (12 GB)

```py
import os
import nltk
import csv
import librosa
import torch
import fastwer

# tokenizer and model are assumed to be loaded earlier in the notebook
src = '/content/drive/MyDrive/Audio/'
text = '/content/drive/MyDrive/Transcripts/'
for file in os.listdir(src):
  #path = os.path.join(src,file)
  path = f"{src}/{file}"
  #print(path)
  temp=file[:-3] +'txt'
  tpath =os.path.join(text,temp)
  #print(tpath)
  #print(path)
  #print(tpath)
  speech, rate = librosa.load(path,sr=16000)
  input_values = tokenizer(speech, return_tensors = 'pt').input_values
  logits = model(input_values).logits
  predicted_ids = torch.argmax(logits, dim =-1)
  transcriptions = tokenizer.decode(predicted_ids[0])
  print(transcriptions)
  f = open(tpath, "r")
  for x in f:
     print(x)
     wer=fastwer.score_sent(x, transcriptions, char_level=True)
     print(wer)
```

sour shoal Jan 29, 2022, 12:13 PM

#

When normalizing data
do you subtract the mean first then divide by the standard deviation?
so like if i am standardizing lots of matrices in a larger matrix
I will find the mean of each individually first
subtract that from each entry in each matrix individually
then find the sd for each matrix individually and do the same, is this right?

#

but the important thing is i subtract the mean before i find the sd of each matrix right?
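A minimal sketch of the per-matrix standardization described above, assuming the "larger matrix" is a 3-D NumPy array whose first axis indexes the individual matrices (the shapes and values are made up). Subtracting a constant does not change the standard deviation, so the sd can be computed before or after centering and the result is the same.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stack of 4 matrices, each 3x5, inside one larger array.
stack = rng.normal(loc=10.0, scale=3.0, size=(4, 3, 5))

# One mean and one standard deviation per matrix (axes 1 and 2).
means = stack.mean(axis=(1, 2), keepdims=True)
stds = stack.std(axis=(1, 2), keepdims=True)

# Standardize: subtract the mean, then divide by the standard deviation.
standardized = (stack - means) / stds

# The sd of the centered data equals the sd of the original data.
stds_after_centering = (stack - means).std(axis=(1, 2), keepdims=True)
```

After this, every matrix in `standardized` has mean 0 and standard deviation 1.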

worthy phoenix Jan 29, 2022, 12:15 PM

#

anyone got a dataset of different types of common file formats used in our daily life?

somber prism Jan 29, 2022, 12:32 PM

#

hey everyone , can someone help me with this . https://www.kaggle.com/muhammedjaabir/asl-alphabet-classification-irl-testing its an asl alphabet sign classification. I can see that my model is overfitting and so to fix that i collected more image data (289k images ) and trained it for around 5 epochs which gave me
```
training acc (augmented and rescaled) :==> 85%
development acc (rescaled) :==> 95%
testing acc (rescaled) :==> 19%
```

asl alphabet classification - irl testing

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

nova smelt Jan 29, 2022, 12:56 PM

#

when visualizing data why do these weird white lines occur horizontally in the blue graph?

#

i plotted the graph using plt.subplots(). when i just plot the blue data with plt.plot() there arent these weird white lines

modest shuttle Jan 29, 2022, 1:32 PM

#

Hello,
Pycharm or Anaconda for OpenCV?

lapis sequoia Jan 29, 2022, 1:48 PM

#

modest shuttle Hello, Pycharm or Anaconda for OpenCV?

Anaconda is a distribution of the Python and R programming languages for scientific computing, that aims to simplify package management and deployment.

and pycharm is an IDE. I'm really confused.

modest shuttle Jan 29, 2022, 1:53 PM

#

lapis sequoia Anaconda is a distribution of the Python and R programming languages for scienti...

sorry, I'm very beginner.
I want to learn Object Detection and i think opencv is good for object detection, what's best way to use that?

swift oxide Jan 29, 2022, 2:05 PM

#

nova smelt when visualizing data why do these weird white line occur horizontally in the bl...

change the style maybe

lapis sequoia Jan 29, 2022, 3:20 PM

#

datascience is not web scraping. please post in related channel.

late ruin Jan 29, 2022, 3:20 PM

#

lapis sequoia datascience is not web scraping. please post in related channel.

oh im sorry i will delete, could you point me to where it should go?

lapis sequoia Jan 29, 2022, 3:21 PM

#

hm, even I'm not sure where that belongs. I'm not sure if it can be put in #web-development
But webdevs may be able to help you, since we usually scrape based on classes or ids.

#

so #web-development seems atleast closer to it.

late ruin Jan 29, 2022, 3:23 PM

#

well i tried, i hope it will work and i will get help

lapis sequoia Jan 29, 2022, 3:24 PM

#

late ruin well i tried, i hope it will work and i will get help

regardless you can always ask in help channels ofc. and in #python-discussion too.

spring marsh Jan 29, 2022, 3:39 PM

#

How do I make my data frame (on the left) look like the one with lines(on the right)
?

#

https://media.discordapp.net/attachments/696353641476653127/937001171137216572/IMG_20220129_203717.jpg?width=720&height=540

serene scaffold Jan 29, 2022, 4:06 PM

#

@spring marsh probably a jupyter notebook setting. pandas doesn't actually render the tables like that, jupyter does.

spring marsh Jan 29, 2022, 4:08 PM

#

u know how to do that?

#

even pycharm shows tables like this only

#

@serene scaffold

serene scaffold Jan 29, 2022, 4:29 PM

#

spring marsh u know how to do that?

not really. why do you need them to look that way, though? just personal preference?

spring marsh Jan 29, 2022, 4:29 PM

#

personal preference they look better to me that way

#

@serene scaffold

serene scaffold Jan 29, 2022, 4:33 PM

#

@spring marsh looks like you can use this: https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.html#pandas.io.formats.style.Styler

#

so it would be something like df.head().style.set_table_styles(...)

#

but I don't know html, and that's what it's based on.

spring marsh Jan 29, 2022, 4:34 PM

#

oh

#

ohk

#

let me try that

serene scaffold Jan 29, 2022, 4:35 PM

#

sorry I wasn't more helpful.

spring marsh Jan 29, 2022, 4:35 PM

#

aye more than enough helful

#

thanks for your help

#

*helpful

wicked grove Jan 29, 2022, 4:37 PM

#

serene scaffold sorry I wasn't more helpful.

Heyy, I'm so sorry to ping you again
You were telling me about how the model is reset in each iteration of kfold

#

Could you please tell me if my code is correct

#

```py
from sklearn.model_selection import KFold
kf = KFold(3,shuffle=True,random_state=42)
histories=[]
fold=0
for train,test in kf.split(x_train):
  fold+=1
  print('fold',fold)
  x_train1 = x_train[train]
  x_val = x_train[test]
  y_train1=y_train[train]
  y_val=y_train[test]
  history = model2.fit(x_train1,y_train1,epochs=50,validation_data=(x_val, y_val),callbacks=[callback])
  histories.append(history)
  tf.keras.backend.clear_session()
```
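One point worth noting about the loop above: calling `fit` again on the same Keras model object continues training from its current weights, and, as far as I know, `clear_session()` does not reset a model that is still referenced. Later folds would therefore start from a model that has already seen their validation data. A minimal sketch of the usual fix is to build a fresh model inside the loop. `build_model` is a hypothetical factory, and a scikit-learn `SGDClassifier` on synthetic data stands in for the Keras network so the example runs without TensorFlow; with Keras, the factory would construct and compile the network instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import KFold

def build_model():
    # Hypothetical factory: returns a brand-new, untrained model.
    return SGDClassifier(random_state=0)

# Synthetic stand-in for x_train / y_train.
x_train, y_train = make_classification(n_samples=120, n_features=8, random_state=0)
kf = KFold(3, shuffle=True, random_state=42)

val_scores = []
for fold, (train, test) in enumerate(kf.split(x_train), start=1):
    model = build_model()  # fresh weights for every fold
    model.fit(x_train[train], y_train[train])
    val_scores.append(model.score(x_train[test], y_train[test]))
    print("fold", fold, "validation accuracy", val_scores[-1])

print("mean validation accuracy", np.mean(val_scores))
```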

tropic matrix Jan 29, 2022, 4:38 PM

#

Hello, I'm trying to develop a regression model to predict prices for an in-game economy using TF Keras.
I've already managed to properly preprocess the data and such, but my main concern would be the accuracy for a trained model using it (as the game has over 2000 items in the economy, and I'm assuming if i just blindly included item id as one of the parameters to the model it wouldn't perform very accurately).
From doing some research, I heard making a model for each item would be the best choice, and from an experiment with a few items this is likely the case. However, what would be the best way to train and "store" each model for use? Should i train each model back to back, which would likely take a LONG time to complete? Is there a way to speed up said training and maybe use some sort of parallel processing? After training a model, should i just save the model info (with model.save) and write code to use the correct model based on the inputted item id? Is there a way to do this without making so many new files/directories (i.e. is it possible to "merge" all the models into one?)

sick moon Jan 29, 2022, 5:24 PM

#

https://www.linkedin.com/posts/muraatozbek_rl-plane-evade-activity-6893196300792512512-dBKg

M. Murat Özbek on LinkedIn: #RL #plane #evade

This video demonstrates an example of #RL-based Competitive Multi-agent System. In this task, the missiles are trying to crash into the #plane while communicating...

mild dirge Jan 29, 2022, 5:42 PM

#

tropic matrix Hello, I'm trying to develop a regression model to predict prices for an in-game...

You could use one hot encoding, since adding the id might "make the model think there's some sequential pattern in the id" (closer IDs are more similar) but this is likely not the case

#

So if you think the ID is of large importance, add 2000 nodes to the input of the model, and have each item id give a value of 1 for a certain node, and 0 for all other 1999 nodes

#

That way the model won't try to learn a sequential pattern from the ID's

#

@tropic matrix

#

(This is talking about using 1 model, and not separate model for each item ID)
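A small sketch of the one-hot idea with scikit-learn's `OneHotEncoder`. The item ids are made up, and three distinct ids stand in for the roughly 2000 in the real data.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical item ids, one per row.
item_ids = np.array([["sword"], ["shield"], ["potion"], ["sword"]])

encoder = OneHotEncoder(handle_unknown="ignore")
encoded = encoder.fit_transform(item_ids)  # a sparse matrix by default

print(encoder.categories_[0])  # one output column per distinct id
print(encoded.toarray())       # exactly one 1 per row, 0 elsewhere
```

Because the output is sparse by default, only the non-zero entries are stored, which matters once there are thousands of columns.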

pearl heart Jan 29, 2022, 6:19 PM

#

hell

#

hello

mild dirge Jan 29, 2022, 6:20 PM

#

hey, what's up?

pearl heart Jan 29, 2022, 6:21 PM

#

I am just greeting you, I just joined the group

wicked grove Jan 29, 2022, 6:42 PM

#

mild dirge hey, what's up?

Hello

#

Could you please tell me if my code is correct

lapis sequoia Jan 29, 2022, 6:43 PM

#

What does it mean that ML pipelines should be composable? If you can provide an example in your response, that would be nice. Please, if you want to respond to this question, use "reply"

lapis sequoia Jan 29, 2022, 6:59 PM

#

Therefore, while exploratory data analysis code can still live in notebooks, the source code for components must be modularized.
What does it mean that source code for components must be modularized?

somber prism Jan 29, 2022, 7:21 PM

#

hey everyone , can someone help me with this . https://www.kaggle.com/muhammedjaabir/asl-alphabet-classification-irl-testing its an asl alphabet sign classification. I can see that my model is overfitting and so to fix that i collected more image data (289k images ) and trained it for around 5 epochs which gave me
```
training acc (augmented and rescaled) :==> 85%
development acc (rescaled) :==> 95%
testing acc (rescaled) :==> 19%
```

asl alphabet classification - irl testing

Explore and run machine learning code with Kaggle Notebooks | Using data from multiple data sources

mild dirge Jan 29, 2022, 7:34 PM

#

lapis sequoia > Therefore, while exploratory data analysis code can still live in notebooks, t...

Put in different modules, like have a .py file for each separate component or use classes etc.

#

not one giant clump of code, but multiple nicely organised modules

stone marlin Jan 29, 2022, 7:41 PM

#

My workflow is usually do some eda and functionalize anything that kind'a works nicely, then group that into their own modules.

#

I don't like to keep everything in a notebook. Even if I use a notebook after that, I can still call my modules from there.

#

I've also, more recently, been trying to make most of my things pipelines so that I can take advantage of pipeline composition.

lapis sequoia Jan 29, 2022, 8:54 PM

#

So ML pipelines are composable if they are placed in modules and those modules can be combined together to achieve a particular goal?

stone marlin Jan 29, 2022, 8:54 PM

#

Pipelines are composable in general, modules are kind of like files that have all these related functions inside of them. It's a good way to structure a python package.

#

Here's an example pipeline from a blog post (it uses the Penguin dataset). This one is an example, so it's just using a bunch of stuff, but it shows how you can use and compose pipelines. I'm not sure if this is what is meant above, but it's something I do fairly often since, for example, I might want to try a different preprocessing pipeline but keep the same modeling pipeline or something.

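from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Create pipeline for preprocessing numerics.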
pipeline_numeric = Pipeline([
    ("impute_w_mean", SimpleImputer(strategy="mean")),
    ("scale_normal", StandardScaler())
])

# Create pipeline for preprocessing categoricals.
pipeline_categorical = Pipeline([
    ("impute_w_most_frequent", SimpleImputer(strategy="most_frequent")),
    # Note: `sparse` was renamed to `sparse_output` in scikit-learn 1.2.
    ("one_hot_encode", OneHotEncoder(handle_unknown='ignore', sparse=False))
])

# Give columns defining desired numeric/categorical cols.
numeric_cols = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']
categorical_cols = ['species', 'island']

# Column Transformer which "runs" the two preprocessing pipelines
# on the correct columns.
preprocessing_transformer = ColumnTransformer([
    ("numeric", pipeline_numeric, numeric_cols),
    ("categorical", pipeline_categorical, categorical_cols)
])

# Normal RFC definition.
rf_clf = RandomForestClassifier()

# Entire preprocessing-to-modeling pipeline.
preprocess_model_pipeline = Pipeline([
    ("preprocessing", preprocessing_transformer),
    ("random_forest_classifier", rf_clf)
])

# Fitting the pipeline.
pmp = preprocess_model_pipeline.fit(x_train, y_train)

#

Obv you can have a whole lot more in here, I just wanted to give an example of how this kind of thing works with putting everything in a pipeline.

#

The perk is that I could totally just make another pipeline_numeric_2 or something with a different SimpleImputer strategy and then all I'd have to do is plug that into the preprocessing transformer.
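The swap described above can be sketched as follows. This is a simplified, self-contained variant of the example rather than the blog post's code: the helper names `make_numeric_pipeline` and `make_full_pipeline` are hypothetical, and the tiny dataframe is made-up data standing in for the penguin dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def make_numeric_pipeline(strategy):
    # Numeric preprocessing; only the imputation strategy varies.
    return Pipeline([
        ("impute", SimpleImputer(strategy=strategy)),
        ("scale", StandardScaler()),
    ])

def make_full_pipeline(numeric_pipeline):
    # The rest of the pipeline stays the same whichever numeric part is plugged in.
    preprocessing = ColumnTransformer([
        ("numeric", numeric_pipeline, ["bill_length_mm", "body_mass_g"]),
        ("categorical", OneHotEncoder(handle_unknown="ignore"), ["island"]),
    ])
    return Pipeline([
        ("preprocessing", preprocessing),
        ("random_forest_classifier",
         RandomForestClassifier(n_estimators=10, random_state=0)),
    ])

# Tiny made-up stand-in for the penguin data, with missing values.
x_train = pd.DataFrame({
    "bill_length_mm": [39.1, np.nan, 46.5, 50.0, 38.2, 47.3],
    "body_mass_g": [3750.0, 3800.0, np.nan, 5700.0, 3600.0, 5200.0],
    "island": ["Torgersen", "Torgersen", "Biscoe", "Biscoe", "Dream", "Biscoe"],
})
y_train = ["Adelie", "Adelie", "Gentoo", "Gentoo", "Adelie", "Gentoo"]

pipeline_mean = make_full_pipeline(make_numeric_pipeline("mean")).fit(x_train, y_train)
pipeline_median = make_full_pipeline(make_numeric_pipeline("median")).fit(x_train, y_train)
```

Only the argument to `make_numeric_pipeline` changes between the two fitted pipelines; the column transformer and the classifier definition are reused untouched.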

#

Source on the docs: https://scikit-learn.org/stable/modules/compose.html#pipeline

scikit-learn

6.1. Pipelines and composite estimators

Transformers are usually combined with classifiers, regressors or other estimators to build a composite estimator. The most common tool is a Pipeline. Pipeline is often used in combination with Fea...

visual minnow Jan 29, 2022, 9:18 PM

#

Hey does this channel also cover computer vision?

sleek sentinel Jan 29, 2022, 9:47 PM

#

Hi, I have to build a language detector for very short texts that must cover 100 languages. By very short I mean, for example, messages of 5 characters or more that may be misspelled, or something like "Bot... no..." or "I'm tryinnnnnnnnnnnnng", etc. Do you know what technology I should use? I also have my own datasets that I build as I go along, which can have hundreds of thousands of examples.

tropic matrix Jan 29, 2022, 10:30 PM

#

mild dirge You could use one hot encoding, since adding the id might "make the model think ...

the thing about one hot encoding is that including the other features that are used as inputs, i would be getting around 3000-4000 inputs into the model, multiplied by the 23,141,931 rows of data i'm using, makes memory become an issue (also yes ids are actually strings btw but they aren't related)

mild dirge Jan 29, 2022, 10:30 PM

#

Well if you have memory issues you can always train in batches (which you probably already do)

#

and I think training and storing 2000 separate neural networks might also not be such a great idea @tropic matrix
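One way to combine batching with one-hot encoding, sketched under the assumption that item ids have already been mapped to integer codes: keep the compact codes in memory and expand them to one-hot rows only for the batch currently being fed to the model. The function name, shapes, and data below are made up for illustration.

```python
import numpy as np

def one_hot_batches(item_indices, features, n_items, batch_size):
    """Yield input batches with the item id expanded to one-hot on the fly."""
    for start in range(0, len(item_indices), batch_size):
        idx = item_indices[start:start + batch_size]
        one_hot = np.zeros((len(idx), n_items), dtype=np.float32)
        one_hot[np.arange(len(idx)), idx] = 1.0  # a single 1 per row
        yield np.hstack([one_hot, features[start:start + batch_size]])

# Made-up data: 5 rows, 4 possible items, 2 other features per row.
item_indices = np.array([0, 3, 1, 3, 2])
features = np.arange(10, dtype=np.float32).reshape(5, 2)

batches = list(one_hot_batches(item_indices, features, n_items=4, batch_size=2))
```

With this layout the full one-hot matrix for all rows never exists at once; only `batch_size` rows of it do.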

tropic matrix Jan 29, 2022, 10:32 PM

#

mild dirge and I think training and storing 2000 separate neural networks might also not be...

that is also correct, ig the main thing is that when i trained each model individually it performed more accurately overall than when I trained it all at once

mild dirge Jan 29, 2022, 10:32 PM

#

right, but you didn't use 1 hot encoding

#

you probably gave the ID as an integer

#

which makes the model think there could be a sequential pattern

#

but the actual value of the ID isn't important, it's just symbolic

#

that's why one hot encoding should be used

lapis sequoia Jan 29, 2022, 11:39 PM

#

stone marlin Pipelines are composable in general, modules are kind of like files that have al...

"Pipelines are composable in general..." - I am not sure what it means that pipelines are composable. Does it mean that different pipelines are connected together to make one solution?

visual minnow Jan 29, 2022, 11:47 PM

#

visual minnow Hey does this channel also cover computer vision?

Can someone answer?

serene scaffold Jan 29, 2022, 11:50 PM

#

visual minnow Can someone answer?

you can ask about computer vision, yes.

visual minnow Jan 30, 2022, 12:17 AM

#

Aight ty

velvet thorn Jan 30, 2022, 12:23 AM

#

lapis sequoia "Pipelines are composable in general..." - I am not sure what does it mean that ...

it means that you can make a pipeline out of other pipelines

lapis sequoia Jan 30, 2022, 12:28 AM

#

velvet thorn it means that you can make a pipeline out of other pipelines

Yeah, that's what I thought. Thanks for response!

magic ridge Jan 30, 2022, 12:59 AM

#

Who here is familiar with the "face_recognition" lib?

serene scaffold Jan 30, 2022, 1:04 AM

#

magic ridge Who here is familiar with the "face_recognition" lib?

whenever you want help, you should just ask your actual question, not if anyone knows about the topic of a question you haven't actually asked.

#

(context: I told this person to ask their question in this channel.)

magic ridge Jan 30, 2022, 1:05 AM

#

Very well then.

#

The error:

File "/usr/local/lib/python3.10/site-packages/PIL/Image.py", line 2959, in open
    fp = io.BytesIO(fp.read())
AttributeError: 'tuple' object has no attribute 'read'

#

The code:

```py
def LoadImageFile(FilePath:str, Unknown=True, TargetName=None) -> list:
#    set_trace()
    try:
        FFile = face_recognition.load_image_file(FilePath)
        Locations = face_recognition.face_locations(FFile)
        Encodings = face_recognition.face_encodings(FFile, Locations)
    except:
        set_trace()
```

#

I've verified prior to the load_image_call that the image is indeed a string rather than a tuple and it's a real path(which is also previously verified with os.path)

I've also made sure that it wasn't face_recognition doing anything wrong because I've used it in various scenarios with nearly the same context.

#

PDB Output(Before the load_image_call):

LoadImageFile()
-> try:
(Pdb) print(FilePath)
Pic.png
(Pdb)

#

PDB Output(Invoked by exception)

```
LoadImageFile()
-> if Unknown:
(Pdb) print(FilePath)
()
(Pdb)
```

#

Please mention me when one of you is able to help.

visual minnow Jan 30, 2022, 1:27 AM

#

#

This is the result of some txt detection

#

Using ocr with python

#

Anything i could possibly improve?

stone marlin Jan 30, 2022, 2:08 AM

#

I just got done with an "interview problem" that I thought y'all might enjoy, so I wanted to share it with you! This was for a Sr. Data Scientist position at a small-to-mid-sized startup where the focus was mainly marketing-type things. So, how would you go about something like this?

Note: The time I was allotted was 1.5 hours, no restrictions on language, methods, etc.

Note further: I am not asking for solutions for myself, I already did this and submitted it. I think it would be fun to see how y'all would approach something like this!

Problem: Divvy is a bike-share program in Chicago. It collects a bunch of data about riders. Using data from, say, Divvy_Trips_2020_Q1.zip at https://divvy-tripdata.s3.amazonaws.com/index.html, try to figure out something about this stuff:

Suppose we work for Divvy and we want to give discount rates to commuters. At what stations would we want to advertise the most?
Suppose we work for Divvy and we want to give discount rates to "pleasure-cyclists", who mostly just rent a bike to use for a nice trip along the lake a day or two a month. Where would we advertise this?
(Here's one I thought of) If we increased the price of Divvy a bit, what stations do we expect to grow / shrink? If we decreased the price of Divvy for a bit, what stations do we expect to grow / shrink?

Have fun!
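As one possible starting point for the commuter question, a rough sketch: count trips that start on weekdays during rush hours, per start station. The column names `started_at` and `start_station_name` are assumed to match the Divvy export, the rush-hour windows are an assumption, and the rows below are made-up placeholders rather than real trips.

```python
import pandas as pd

# Made-up rows; in practice this would be pd.read_csv on the extracted file.
trips = pd.DataFrame({
    "started_at": pd.to_datetime([
        "2020-01-06 08:05", "2020-01-06 17:30", "2020-01-07 08:15",
        "2020-01-11 13:00", "2020-01-12 14:20", "2020-01-08 07:50",
    ]),
    "start_station_name": [
        "Station A", "Station A", "Station A",
        "Lakefront Station", "Lakefront Station", "Station B",
    ],
})

hour = trips["started_at"].dt.hour
is_weekday = trips["started_at"].dt.dayofweek < 5         # Monday=0 ... Friday=4
is_rush_hour = hour.between(7, 9) | hour.between(16, 18)  # assumed commute windows

# Stations ranked by number of commute-like trips.
commute_like = trips[is_weekday & is_rush_hour]
commuter_stations = commute_like["start_station_name"].value_counts()

print(commuter_stations)
```

The same filter inverted (weekends, midday, longer rides) would be a natural first cut at the "pleasure-cyclist" question.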

prime hearth Jan 30, 2022, 2:19 AM

#

Thanks, I will try this. I'm surprised this is for a senior role though; I thought a senior data scientist would be asked to build an ML model, deploy it on the cloud with Tableau for visual aids, and handle data cleaning and feature engineering

#

This is more like data analyst role (not saying it is but this is my first time seeing this),

dusk tide Jan 30, 2022, 2:20 AM

#

Any deep learning book for beginners which has everything including maths ??

serene scaffold Jan 30, 2022, 2:23 AM

#

dusk tide Any deep learning book for beginners which has everything including maths ??

https://www.deeplearningbook.org/

iron basalt Jan 30, 2022, 3:46 AM

#

stone marlin I just got done with an "interview problem" that I thought y'all might enjoy, **...

Have not looked at the data and idk if I can be bothered to do this problem, but does the data contain information about what was advertised where, how, and how much was spent?

oak olive Jan 30, 2022, 3:47 AM

#

Hi!

#

I am going to ask a pretty strange question

#

Currently I am reading a book called "An Introduction to Statistical Learning" by James, Witten, Hastie, and Tibshirani. To be honest I am completely in love with it and the stuff it offers.

#

However, the moment I enjoy the most is when the "math" stuff comes into play, now that I have studied AI I know for sure that this is what I want to do for a living

#

In the real world. What happens with the "mathematical" profile worker? Is it really taken into account?

#

I do not know if the question is clear, I am not a native speaker but I tried to do my best in order to express what I want to say

#

Please let me know if my explanation is not clear. Thanks in advance

iron basalt Jan 30, 2022, 4:23 AM

#

oak olive In the real world. What happens with the "mathematical" profile worker? Is it re...

In AI? Yes, but that's just part of it, you need to be able to program to demonstrate things (science / engineering), not just prove things (math). And also maybe even more, like robotics (physical / real world demonstration).

#

Could a mathematician get a job in AI? Yeah, probably, but you have to be really good at it then to make it worth hiring you for only that, and not someone else that can do multiple things.

tropic matrix Jan 30, 2022, 4:37 AM

#

mild dirge that's why one hot encoding should be used

hm ok, I'll try one hot encoding the item names again
on another note, is there anything i can do to help speed up the training process? i'm already training in batches to save on ram usage due to the huge amount of data and using a gpu, but is there anything else I can do?

iron basalt Jan 30, 2022, 4:40 AM

#

iron basalt Could a mathematician get a job in AI? Yeah, probably, but you have to be really go...

A lot of people in AI really like math, but the real hard work comes from reality being messy and incredibly complex, plus lots of trial and error (the daily grind). When I get to work on just some plain math I am relieved (unless it's someone else's paper with nonsense notation and hand-waving).

oak olive Jan 30, 2022, 4:44 AM

#

iron basalt A lot of people in AI really like math, but the real hard work comes from realit...

I get the point

#

Would you consider that a good mathematical background is mandatory? How useful is advanced math really when it comes to AI?

#

Sorry for asking so much, new questions related to the AI field come to my mind every day

iron basalt Jan 30, 2022, 4:52 AM

#

oak olive Would you consider that a good mathematical background is mandatory? How useful ...

Yes it's mandatory (the fundamentals such as linear algebra, statistics, calculus). At the very least many of the different fields' intuitions / essence are useful as a guide on what could be done / give ideas of where to go.

oak olive Jan 30, 2022, 4:52 AM

#

iron basalt Yes it's mandatory (the fundamentals such as linear algebra, statistics, calculu...

Are you currently working in the AI field?

iron basalt Jan 30, 2022, 4:52 AM

#

oak olive Are you currently working in the AI field?

Yes.

oak olive Jan 30, 2022, 4:53 AM

#

Would you please, if you don't mind, describe to me what a day in your work is like?

iron basalt Jan 30, 2022, 5:04 AM

#

It consists of working on my current project and spending time seeing what others are working on while also improving my overall knowledge in (not necessarily in this order) mathematics (whatever field I currently feel lacking in or think might be useful), computer science, physics, biology, neuroscience, history (of AI/ML/related), psychology, robotics, and others (it can all apply to AI, so at the very least I want to know the general ideas floating around). The current project might be something more math focused, something more programming focused, or something physical like building a robot / test environment / computing cluster.

oak olive Jan 30, 2022, 5:05 AM

#

It seems extremely interesting

#

You are really fortunate

upper bluff Jan 30, 2022, 7:05 AM

#

Hello mates, i am trying to do ML with a small synthetic dataset but i can't get more than 0.64 accuracy. can someone please guide me? i have tried literally everything: neural networks, various regressions and boosting, but idfk why i can't get more than 0.64

#

i would be very grateful

upper bluff Jan 30, 2022, 7:20 AM

#

this is the data, i have tried plotting it in many different ways too

📎 final_train.csv

#

https://cdn.discordapp.com/attachments/837554241743749151/937005272436727868/unknown.png

#

almost all the features have a similar plot like this, with different scales

nova pollen Jan 30, 2022, 7:21 AM

#

oof harsh, not even column labels

upper bluff Jan 30, 2022, 7:21 AM

#

yeaaaa ;-;, purely synthetic

#

i did a correlation plot: but i kind of find it overwhelming to start with

#

https://cdn.discordapp.com/attachments/828325357102432327/937226744593793044/D8dfEjfhWLihwAAAABJRU5ErkJggg.png

nova pollen Jan 30, 2022, 7:22 AM

#

do you know what accuracy you're expecting? 0.64 itself is quite meaningless unless you can get a gauge of what's possible

upper bluff Jan 30, 2022, 7:23 AM

#

the baseline, which they told me is 0.63 AUC score, many of my classmates have gotten 0.66 or even 0.68

#

many people are suggesting that i select features / do feature engineering, but there are so many different combinations that i don't know where to start

nova pollen Jan 30, 2022, 7:24 AM

#

mm feature engineering is much harder without column labels

#

what AUC is yours at?

upper bluff Jan 30, 2022, 7:25 AM

#

i achieved a maximum of 0.638 on the testing dataset, which was predicted by the model which gave 0.641 AUC when i was doing KFold with my training dataset

nova pollen Jan 30, 2022, 7:28 AM

#

mm ill give it a try

upper bluff Jan 30, 2022, 7:30 AM

#

nova pollen mm ill give it a try

ohhh thank you so much, my deadline is midnight today (12 hours from now). if you can even point me in a direction to move forward, i will be very grateful

nova pollen Jan 30, 2022, 7:31 AM

#

alright

#

if you could give a rundown of what you've tried thatd be great too

upper bluff Jan 30, 2022, 7:33 AM

#

sure:

Neural Networks:

from sklearn.metrics import roc_auc_score
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.metrics import AUC

# 15 input features -> two hidden ReLU layers -> one sigmoid output (binary target)
model = Sequential()
model.add(Input(shape=(15,)))
model.add(Dense(26, activation="relu"))
# model.add(Dense(14, activation="relu"))
model.add(Dense(13, activation="relu"))
# model.add(Dense(8, activation="relu"))
model.add(Dense(1, activation="sigmoid"))
# track AUC next to accuracy, since AUC is the score being compared
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy", AUC(name="auc")])
model.fit(train_X, train_y, epochs=10, batch_size=10, verbose=0)

tried many different combinations of number of nodes and layers, epoch and batch_size values but this was the relatively best situation where i got around .63 auc score

#

tried many types of correlation plots, like this one:

#

and the one i sent earlier:

#

https://cdn.discordapp.com/attachments/828325357102432327/937226744593793044/D8dfEjfhWLihwAAAABJRU5ErkJggg.png

#

then i tried selectKBest features, with 2 different algos

#

i got this plot with mutual_info_regression

#

#

this with chi2:

#

#

and then, finally, i have tried LGBMRegressor, TabNetRegressor, XGBRegressor and LogisticRegression; the accuracy doesn't move, it ranges from 0.58 to 0.62

#

thats all i have till now
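
Since the score being compared is AUC, one thing that may be worth checking (an assumption about the setup, not something confirmed in the chat) is whether AUC is computed from continuous scores or from hard 0/1 predictions. AUC measures ranking, so thresholding first can throw information away. The sketch below computes AUC by pair counting on made-up numbers; `roc_auc_score` from scikit-learn accepts scores in the same way.

```python
def auc_from_scores(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up labels and predicted probabilities, for illustration only
y_true = [0, 0, 1, 1, 0, 1]
proba = [0.10, 0.40, 0.45, 0.80, 0.30, 0.48]
hard = [1 if p >= 0.5 else 0 for p in proba]   # thresholded predictions

auc_proba = auc_from_scores(y_true, proba)     # every positive outranks every negative
auc_hard = auc_from_scores(y_true, hard)       # ranking information lost to ties
print(auc_proba, auc_hard)
```

Here the probabilities rank every positive above every negative, while the thresholded labels tie most pairs and score noticeably lower.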

stone marlin Jan 30, 2022, 8:48 AM

#

Re my problem above: there is no current advertisement AFAIK, it's mostly a pretty open question. It pretty much has trip data: when a person left a station or arrived at a station, that's a big chunk of it. I think they just want to see what you'd do with it, who knows.

It's fine if y'all don't have time or can't be bothered, I wanted to share it in case anyone found it interesting! I can keep them to myself next time, haha, sorry about that!

iron basalt Jan 30, 2022, 9:12 AM

#

stone marlin Re my problem above: there is no current advertisement AFAIK, it's mostly a pret...

No, it's fine, it's an interesting problem. Without advertisement data and budget data it's pretty hard to tell, since it's a resource allocation problem. So really it just comes down to personal priors / marketing knowledge in general (i'm not into marketing so my priors are really lame, like try the most densely populated for the most eyes (maybe that's what they actually want for 1 (and for 2 probably they want to see if you can find the pattern of "pleasure-cyclist" / where they are))).

stone marlin Jan 30, 2022, 9:17 AM

#

Yeah, I think the lack of advertisement data + pretty much any kind of budgeting data is the "hard" part of this problem. If we had that, it kind of becomes an easier optimization problem. I sorta liked it because with a 1.5 hour limit I think it was trying to see how I thought to EDA data and what stuff I could come up with. I dunno if this is a good interview problem, but it was a fun problem to do.

#

Like, idk how this actually reflects the job or how well I can do it, but I thought it was sort'a neat to go through.

iron basalt Jan 30, 2022, 9:17 AM

#

(with mobile apps being required to use say the bikes, advertisement cost goes near zero (which is why every company wants you to download their app), and your location is given by the phone (GPS))

#

(which is also why mobile apps don't even care, they just spam advertisements)

#

(answer is everywhere then)

stone marlin Jan 30, 2022, 9:18 AM

#

Yeah, I don't think Divvy [the actual company the data is from, which is NOT the company I am applying for] does any advertising. Or, I haven't seen it.

iron basalt Jan 30, 2022, 9:20 AM

#

It's not an easy optimization problem even with budget and advertisement result data (because people are dynamic and can feel overly advertised to, budget changes over time, etc.)

stone marlin Jan 30, 2022, 9:20 AM

#

Yeeeep. I feel this is sort of similar to a lot of marketing problems given to DS, where everything is real loosey-goosey.

iron basalt Jan 30, 2022, 9:21 AM

#

With that data I would maybe throw a bandit swarm at it (it's a multi-armed bandit problem).

stone marlin Jan 30, 2022, 9:22 AM

#

Yeah, that's a plan. Like, pick out most probable spots and try stuff out. Otherwise it's a bandit with a ton of arms.

iron basalt Jan 30, 2022, 9:23 AM

#

The bandit swarm will try things and adapt to the result. It smartly picks what to try out and balances risk/reward/exploration/exploitation.

#

The feedback loop gives even more useful data, but it's harder than "normal" ML.

#

Or by hand / manual would be A/B testing.
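
To make the explore/exploit idea concrete, here is a minimal sketch of plain epsilon-greedy, which is a simpler technique than the "bandit swarm" mentioned above. The station names and conversion rates are hypothetical; in practice the true rates are unknown and only the observed outcomes are available.

```python
import random

def epsilon_greedy(true_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy over arms with hidden success rates."""
    rng = random.Random(seed)
    arms = list(true_rates)
    pulls = {arm: 0 for arm in arms}
    wins = {arm: 0 for arm in arms}

    def observed_rate(arm):
        return wins[arm] / pulls[arm] if pulls[arm] else 0.0

    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.choice(arms)              # explore: try a random station
        else:
            arm = max(arms, key=observed_rate)  # exploit: best station seen so far
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:      # simulated response to the ad
            wins[arm] += 1
    return pulls, wins

# Hypothetical stations with made-up response rates
true_rates = {"station_a": 0.05, "station_b": 0.20, "station_c": 0.50}
pulls, wins = epsilon_greedy(true_rates)
best_arm = max(pulls, key=pulls.get)
print(pulls, best_arm)
```

Most of the budget ends up on the best-responding station, while the small exploration rate keeps checking the others. A manual A/B test fixes the split up front instead of adapting it.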

stone marlin Jan 30, 2022, 9:24 AM

#

In this business-case though, they prob wouldn't want MAB, since it'd be sort of giving people discounts in places and then having to take it away. So you're still prob gonna wind up trying to find optimal paths.

wicked grove Jan 30, 2022, 9:25 AM

#

iron basalt The bandit swarm will try things and adapt to the result. It smartly picks what ...

Hello, I'm stuck on a transfer learning problem. I'm classifying images using VGG19.
I have used k-fold to check the accuracy

#

I'm not quite sure if what im doing is correct tho, can you please help me out

iron basalt Jan 30, 2022, 9:26 AM

#

stone marlin In this business-case though, they prob wouldn't want MAB, since it'd be sort of...

It would not surprise me if the bandit swarm picks up on people getting pissed for the discounts getting taken away and such and tries to exploit the psychology somehow.

#

I have seen it do things like that before in simulation.

#

And other stuff ofc, like break the physics engine.

#

(it's pretty useful for debugging stuff)

stone marlin Jan 30, 2022, 9:28 AM

#

Yeah, there's not gonna be a "real solution" to this problem, but in my mind it came down to: finding the most reasonable stations to try this on, and then choosing those to test the discounts or whatever. Which is pretty much the deal here with MAB.

iron basalt Jan 30, 2022, 9:28 AM

#

Yeah, anything that's not like advertise where nobody is at.

#

You could at least know what not to do probably.

stone marlin Jan 30, 2022, 9:30 AM

#

Yep. I'd be interested to see what other candidates' solutions looked like, given that even basic EDA takes longer than one thinks.

iron basalt Jan 30, 2022, 9:30 AM

#

Yeah speed seems key here. Like are you already comfy with your tools.

stone marlin Jan 30, 2022, 9:32 AM

#

Yeah, if I didn't know pandas / sklearn / whatever, it would'a probably taken an hour just to plot all the stuff out, haha.

iron basalt Jan 30, 2022, 9:32 AM

#

It's a solid way to weed people out. Even if they don't get amazing conclusions, if they can't pull up anything in time, they did not practice.

stone marlin Jan 30, 2022, 9:33 AM

#

I think that's the case. The job is for a Senior Data Scientist (whatever that means) but this isn't all that different from the things I had to do when I was an entry-level DS. It'll prob be the case that the tech interview will be more difficult, and they'll use this as a jumping-off-point.

iron basalt Jan 30, 2022, 9:33 AM

#

(and you are not hiring, in this case, for someone to first spend a week getting into it)

stone marlin Jan 30, 2022, 9:35 AM

#

Yep! I'll update a little later on what they ask me about it, or if anything harder comes about, or anything like that. Another company I interviewed with had a problem which was literally "Do you know what ARIMA is?" so I didn't share that one here. :''']

lapis sequoia Jan 30, 2022, 11:34 AM

#

y=mx+c is the formula to find the linear regression line, right?
And here c means the coordinate on the y-axis from where the regression line starts, right?

sour shoal Jan 30, 2022, 11:44 AM

#

Anyone know of any PCA projects I could do? Like I was thinking eigenface type facial recognition but I would rather do something else. Something to do with making useful predictions with data.

lapis sequoia Jan 30, 2022, 11:50 AM

#

lapis sequoia y=mx+c is the formula to find the linear regression line, right ? And here c...

line starts seems like a vague way to put it, it's a line, if your data(x) is negative, y will go below c.

#

but yeah at y axis, value of y will be c since x is 0 there.

spring marsh Jan 30, 2022, 11:53 AM

#

lapis sequoia y=mx+c is the formula to find the linear regression line, right ? And here c...

well basically y=c when x=0

#

c is the intercept of the line at y axis
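
A quick numeric check of the point above, using made-up data that lies exactly on y = 3x + 1: the fitted c is the value of y at x = 0, and for negative x the line goes below c rather than "starting" there.

```python
import numpy as np

# Made-up points on the line y = 3x + 1, including negative x
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 3.0 * x + 1.0

# For degree 1, polyfit returns [slope, intercept]
m, c = np.polyfit(x, y, deg=1)

y_at_zero = m * 0.0 + c        # equals the intercept c
y_at_minus_two = m * -2.0 + c  # below c because the slope is positive and x is negative
print(m, c, y_at_zero, y_at_minus_two)
```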

spring marsh Jan 30, 2022, 11:58 AM

#

sour shoal Anyone know of any PCA projects I could do? Like I was thinking eigenface type f...

How about something estimating the growth of world GDP in 2022 provided there is a new strain of corona, and one case where there is not, as a comparison? You can also use geo plotting to visualize the growth.

sour shoal Jan 30, 2022, 11:59 AM

#

spring marsh How about something estimating the growth of world GDP in 2022 provided there is...

how would that use PCA?

spring marsh Jan 30, 2022, 11:59 AM

#

oh shit sorry I didnt see that pca one line

#

too high my bad

sour shoal Jan 30, 2022, 12:00 PM

#

spring marsh oh shit sorry I didnt see that pca one line

how is that to do with ai at all lol, i am guessing you thought i asked for a DS project?

sour shoal Jan 30, 2022, 12:00 PM

#

spring marsh too high my bad

dont worry about it

spring marsh Jan 30, 2022, 12:00 PM

#

sour shoal how is that to do with ai at all lol, i am guessing you thought i asked for a DS...

yeah

hollow sentinel Jan 30, 2022, 12:43 PM

#

in simple terms can someone explain to me the difference between value_counts and count()?

#

data["created_at"]
a_new_series_of_not_spamcalls = data[["created_at", "is_blocked"]]

a_new_series_of_not_spamcalls["day"] = data["created_at"].dt.day_name()


#a_new_series_of_not_spamcalls

df_not_blocked_by_week = a_new_series[a_new_series["is_blocked"]==0]

df_not_blocked_by_week

# df_not_blocked_by_week.shape

# df_not_blocked_by_week.groupby(by = df_blocked["day"]).count()

#

i know for certain that df_not_blocked_by_week is (40027459, 3)

#

what i do not understand is that when i groupby it, it's an empty dataframe

#

could it potentially have something to do with the SettingWithCopyWarning?

#

it's not an attribute error

#

it's not a type error

#

oh shit

#

maybe i know what the error is

#

i didn't even use the var i created from earlier on lol

#

unfortunately, no that was not the error

#

um i don't have syntax highlighting in jupyter notebook lmao

#

shit

#

i'm a dumbasss

#

no duh it wouldn't work if you used a dataframe that had all 1s as a groupby

#

https://tenor.com/view/horny-bonk-gif-22415732

#

yeah now it works just fine
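
On the earlier `value_counts` vs `count()` question, a small sketch with made-up data (the column names mirror the ones in the chat, the values are invented): `count()` returns how many non-null entries there are, while `value_counts()` returns how often each distinct value occurs. The groupby at the end uses the filtered frame's own column, which avoids relying on index alignment with a Series from another frame.

```python
import pandas as pd

# Made-up call log for illustration
calls = pd.DataFrame({
    "day": ["Monday", "Monday", "Tuesday", None, "Tuesday", "Monday"],
    "is_blocked": [0, 1, 0, 0, 0, 0],
})

n_non_null = calls["day"].count()        # one number: non-null entries in the column
per_value = calls["day"].value_counts()  # one number per distinct value

# Filter first, then group by a column of the same frame
not_blocked = calls[calls["is_blocked"] == 0]
by_day = not_blocked.groupby("day")["is_blocked"].count()

print(n_non_null)
print(per_value)
print(by_day)
```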

lapis sequoia Jan 30, 2022, 2:08 PM

#

How to find the best fit line which has the least error

tidal bough Jan 30, 2022, 2:09 PM

#

depends on what error metric you are using. If it's just mean squared error, then an exact solution is possible*, or if you have too many points for an exact solution, something like gradient descent.

beta = (X^T @ X)^(-1) @ X^T @ Y, derived here, say: https://en.wikipedia.org/wiki/Linear_regression#Least-squares_estimation_and_related_techniques
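
A minimal sketch of both routes on synthetic data (the data and the learning rate are made up): the closed-form normal equation from the message above, NumPy's least-squares solver as a cross-check, and plain gradient descent on the mean squared error.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=200)
y = 2.5 * x - 1.0 + rng.normal(scale=0.5, size=200)   # noisy line, made up

# Design matrix with a column of ones so beta[0] is the intercept
X = np.column_stack([np.ones_like(x), x])

# Closed form: solve (X^T X) beta = X^T y instead of forming the inverse explicitly
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with the library least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient descent on MSE, useful when X is too large for the closed form
beta_gd = np.zeros(2)
learning_rate = 0.01
for _ in range(5000):
    gradient = 2.0 / len(y) * X.T @ (X @ beta_gd - y)
    beta_gd -= learning_rate * gradient

print(beta_normal, beta_lstsq, beta_gd)
```

All three should land on nearly the same intercept and slope. Solving the linear system is generally preferred over computing the inverse directly for numerical stability.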

night gorge Jan 30, 2022, 2:09 PM

#

I was trying to learn K-means clustering for unlabeled data. How can we select the number of clusters from the elbow diagram? I think it will be the point with the minimum angle. Am I right?
For example, is it 2 in the diagram above?

mild dirge Jan 30, 2022, 2:18 PM

#

seems like it yeah

#

as long as the inertia is low enough and the number of clusters is also low

#

It's the point in the graph where the curve bends from high slope to low slope, this is what I find everywhere, and it seems a bit subjective

#

But in your graph it's pretty visibly at 2 clusters

night gorge Jan 30, 2022, 2:20 PM

#

I am confused between 2 and 3

night gorge Jan 30, 2022, 2:20 PM

#

mild dirge But in your graph it's pretty visibly at 2 clusters

👍

mild dirge Jan 30, 2022, 2:21 PM

#

basically just choose the sharpest angle
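
One way to make "the sharpest angle" less subjective, as a rough heuristic only: pick the k where the drop in inertia slows down the most (the largest second difference). The inertia values below are made up; in practice they could come from fitting scikit-learn's `KMeans` for each k and reading its `inertia_` attribute.

```python
def elbow_k(ks, inertias):
    """Return the k with the largest change in slope of the inertia curve."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(ks) - 1):
        drop_before = inertias[i - 1] - inertias[i]   # improvement gained by reaching k
        drop_after = inertias[i] - inertias[i + 1]    # improvement from one more cluster
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k

# Made-up inertia curve with an obvious bend at k = 2
ks = [1, 2, 3, 4, 5, 6]
inertias = [1000.0, 300.0, 250.0, 220.0, 200.0, 185.0]
chosen_k = elbow_k(ks, inertias)
print(chosen_k)
```

When the curve bends gradually (the "2 or 3?" situation), this heuristic can be unstable, so it is a starting point rather than a rule.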

modern cypress Jan 30, 2022, 3:42 PM

#

#

Hey can anyone explain this error?

#

epochs = 5
x_train = np.array(x_train, dtype='object')
y_train = np.array(y_train)

model.compile(optimizer='adam', 
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[keras.metrics.SparseCategoricalAccuracy()])
model.fit(x_train, y_train, epochs=epochs)

#

ValueError: could not broadcast input array from shape (400,400,3) into shape (400,400)

serene scaffold Jan 30, 2022, 3:43 PM

#

@modern cypress what is x_train before?

#

to be clear, it looks like you're trying to write over x_train as an array version of whatever it is. I'm asking about what you're passing to np.array(...), not the result of that.

modern cypress Jan 30, 2022, 3:45 PM

#

x = []
y = []

for features, label in all_data:
    img = tf.convert_to_tensor(features, dtype=tf.float32)
    x.append(img)
    y.append(label)

#

Images here, and then I used train_test_split to split it to x_train, x_test

#

(I'm slightly confused, because this was working earlier)

merry wadi Jan 30, 2022, 3:47 PM

#

Is there a way I can find which column contains the value of another column?

For example I have columns 1,2,3,4,5 and I want to see which of the columns A,B,C,D has the same value as the first columns for every row

modern cypress Jan 30, 2022, 3:50 PM

#

serene scaffold to be clear, it looks like you're trying to write over `x_train` as an array ver...

Could it be due to the .convert_to_tensor ?

serene scaffold Jan 30, 2022, 3:51 PM

#

merry wadi Is there a way I can find which column contains the value of another column? F...

is this a dataframe? please do print(df.head().to_dict('list')) and paste the raw text into this chat.

serene scaffold Jan 30, 2022, 3:51 PM

#

modern cypress Could it be due to the .convert_to_tensor ?

does each image have the same shape?

modern cypress Jan 30, 2022, 3:52 PM

#

Yes sir, I resize all the images before they're put here image.resize((400, 400))

serene scaffold Jan 30, 2022, 3:52 PM

#

modern cypress Yes sir, I resize all the images before they're put here `image.resize((400, 400...

alright. tensors are a lot like numpy arrays anyway--why are you trying to convert it?

modern cypress Jan 30, 2022, 3:55 PM

#

I was having some errors earlier, I did some googling and saw someone said to include this as the solution so I tried it out and it worked

#

However, I don't remember the error, I'll try find that out

#

Oh right: ValueError: Input 0 of layer sequential_1 is incompatible with the layer: : expected min_ndim=4, found ndim=2. Full shape received: (None, 1)

modern cypress Jan 30, 2022, 3:58 PM

#

serene scaffold alright. tensors are a lot like numpy arrays anyway--why are you trying to conve...

I get this error without converting to tensor

serene scaffold Jan 30, 2022, 3:59 PM

#

modern cypress I get this error without converting to tensor

but the error message you posted earlier points to the line where you pass it to np.array

modern cypress Jan 30, 2022, 4:01 PM

#

Yes, sorry if I am confusing you. I was just explaining why I converted it to tensor

#

I'm quite new to this, so there may likely be errors in my thinking

serene scaffold Jan 30, 2022, 4:02 PM

#

@modern cypress that's okay

modern cypress Jan 30, 2022, 4:18 PM

#

If anyone has any pointers, I'd appreciate it

opaque oasis Jan 30, 2022, 4:24 PM

#

Anyone here use python

#

With power bi

serene scaffold Jan 30, 2022, 4:33 PM

#

@opaque oasis you have to ask your actual question about using python with power bi, or no one will volunteer to help.

modern cypress Jan 30, 2022, 4:36 PM

#

modern cypress

If I am reading this error correctly, it's saying that I'm trying to create an ndarray from ragged nested sequences. However, when I go through x_train one by one and print its shape, I can see that they are all the same (400,400,3)

#

So surely this is fixed, and not jagged? Especially if I am resizing them earlier to fixed dimensions image.resize((400, 400))

mild dirge Jan 30, 2022, 4:40 PM

#

Is there a single image that is a different dimension?

#

@modern cypress

modern cypress Jan 30, 2022, 4:42 PM

#

As far as I can tell, no. All the prints have the exact same shape

mild dirge Jan 30, 2022, 4:42 PM

#

how many prints?

#

do:

for x in x_train:
  if x.shape != (400, 400, 3):
    print('wrong dimensions')

modern cypress Jan 30, 2022, 4:43 PM

#

Oh!

#

Interesting

mild dirge Jan 30, 2022, 4:43 PM

#

print the x.shape

#

if it's wrong

modern cypress Jan 30, 2022, 4:43 PM

#

(400, 400)

mild dirge Jan 30, 2022, 4:44 PM

#

so it's greyscale or something
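
A small sketch of one way to handle this, assuming the images are NumPy arrays (the zero-filled arrays below are stand-ins for real images): give 2-D grayscale images three identical channels so everything stacks into one (N, 400, 400, 3) batch. If the images are loaded with Pillow, calling `image.convert("RGB")` before resizing is another option.

```python
import numpy as np

# Stand-in data: two colour images and one grayscale image with no channel axis
images = [
    np.zeros((400, 400, 3), dtype=np.float32),
    np.zeros((400, 400), dtype=np.float32),
    np.zeros((400, 400, 3), dtype=np.float32),
]

def to_three_channels(img):
    """Repeat a 2-D grayscale image across a new last axis; leave colour images alone."""
    if img.ndim == 2:
        return np.repeat(img[..., np.newaxis], 3, axis=-1)
    return img

# np.stack raises if any shape still differs, which makes bad inputs easy to spot
batch = np.stack([to_three_channels(img) for img in images])
print(batch.shape)
```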

lapis sequoia Jan 30, 2022, 4:44 PM

#

The outputs are the model's predictions which you can also monitor and measure. This should include an understanding of the deployment of different model versions to help you understand how different versions perform. You should also consider performing correlation analysis to understand how changes in your inputs affect your model outputs. And again, this should be done on slices of your data, for example, correlation analysis can help you detect how seemingly harmless changes in your inputs cause prediction failures.
Can someone expand on this? For example, by giving an example of why you would do correlation analysis. I am not sure why one would do it

modern cypress Jan 30, 2022, 4:44 PM

#

Let me clear output and try again real quick

civic stone Jan 30, 2022, 4:44 PM

#

Hello everyone,
I need your help and support.
I want to use clustering algorithms on some online labs,
so I need a dataset that contains descriptions of online laboratories.
The dataset should contain more than 100 labs.
Where can I find one? Any suggestions, please?

mild dirge Jan 30, 2022, 4:46 PM

#

lapis sequoia > The outputs are the model's predictions which you can also monitor and measure...

You could add noise to separate dimensions and see how it affects the output, if it classifies it differently (or in case of regression how much the output changes)

merry wadi Jan 30, 2022, 4:46 PM

#

serene scaffold is this a dataframe? please do `print(df.head().to_dict('list'))` and paste the ...

{'1': [1, 0, 0], '2': [0, 1, 0], '3': [0, 0, 0], '4': [0, 0, 1], 'A': [0, 0, 0], 'B': [0, 1, 0], 'C': [0, 0, 0], 'D': [1, 0, 1]}

mild dirge Jan 30, 2022, 4:46 PM

#

It can give you insight into which variables are important for prediction

lapis sequoia Jan 30, 2022, 4:47 PM

#

mild dirge You could add noise to separate dimensions and see how it affects the output, if...

What do you mean by separate dimensions?

mild dirge Jan 30, 2022, 4:47 PM

#

Like if you base your prediction of an animal on its weight, height, average color etc.

#

you try adding a bit of noise to color and see if that changes the prediction

#

If it doesn't change anything, that might mean color by itself does not have a lot of impact on the prediction

#

but that's just a really general example
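
The noise idea above can be sketched as follows. The `predict` function is a hypothetical stand-in for a trained model (it deliberately ignores colour), and the feature names and noise scale are made up.

```python
import numpy as np

def predict(X):
    """Hypothetical stand-in model: strong on weight, weak on height, ignores colour."""
    weights = np.array([2.0, 0.5, 0.0])
    return X @ weights

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))      # columns: weight, height, colour (synthetic)
baseline = predict(X)

def sensitivity(column, scale=0.1):
    """Mean absolute change in the prediction when one feature gets small noise."""
    X_noisy = X.copy()
    X_noisy[:, column] += rng.normal(scale=scale, size=len(X))
    return float(np.mean(np.abs(predict(X_noisy) - baseline)))

feature_names = ["weight", "height", "color"]
sens = {name: sensitivity(i) for i, name in enumerate(feature_names)}
print(sens)
```

A feature whose noise barely moves the output probably matters little to this model on its own; a feature where tiny noise flips predictions is a sign of fragility worth monitoring.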

lapis sequoia Jan 30, 2022, 4:48 PM

#

correlation analysis can help you detect how seemingly harmless changes in your inputs cause prediction failures.

#

This part about harmless changes isn't clear. Does he mean that if an input value is disturbed a little bit, the model, because it overfitted, will fail to give the correct prediction?

#

@mild dirge

mild dirge Jan 30, 2022, 4:49 PM

#

#

This is a pretty common example

#

Only a tiny tiny bit of noise is added, and the prediction changes completely

modern cypress Jan 30, 2022, 4:50 PM

#

mild dirge so it's greyscale or something

Thank you so much 🙏 I used the small loop you sent to find the grayscale images

mild dirge Jan 30, 2022, 4:50 PM

#

mild dirge Only a tiny tiny bit of noise is added, and the prediction changes completely

This is something you want to avoid

mild dirge Jan 30, 2022, 4:50 PM

#

modern cypress Thank you so much 🙏 I used the small loop you sent to find the grayscale images

awesome haha

lapis sequoia Jan 30, 2022, 4:50 PM

#

correlation analysis can help you detect how seemingly harmless changes in your inputs cause prediction failures.
This part about harmless changes isn't clear. Does he mean that if an input value is disturbed a little bit, the model, because it overfitted, will fail to give the correct prediction?
@mild dirge

mild dirge Jan 30, 2022, 4:50 PM

#

yeah something like that

mild dirge Jan 30, 2022, 4:51 PM

#

mild dirge

Look at this

#

the noise is very small, but completely changes the outcome

#

it's called an adversarial attack

#

your network should be robust against this

lapis sequoia Jan 30, 2022, 4:52 PM

#

Yeah I understand that.

#

Thanks for the explanation

#

@mild dirge Can I DM you about something?

mild dirge Jan 30, 2022, 4:53 PM

#

If it's confidential sure, otherwise feel free to ask here 🙂

opaque oasis Jan 30, 2022, 4:59 PM

#

Want to use python as source for hourly or daily refresh to a rest api and then use power query

#

To connect to the data frame and

#

Refresh from the service

#

My report dashboard

#

I think it's doable using a personal data gateway

lapis sequoia Jan 30, 2022, 5:32 PM

#

Is least squared method and total square method the same ?

low spear Jan 30, 2022, 6:04 PM

#

anyone can help?

#

i traced the error and it's due to len(x_roi)==0, that's why i get None. but i know it shouldn't be returned as None, and i suspect it's due to best_iou < 0.1, and i'm stuck there. i still don't understand why my best_iou is smaller than it should be (got 0.001~0.06 instead)

serene scaffold Jan 30, 2022, 6:05 PM

#

@low spear your Y1 is None, for some reason.

#

so you'll have to check why calc_iou returned None for that value.
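
For sanity-checking IoU values by hand, here is a minimal standalone function. It is a generic sketch, not the `calc_iou` from the linked notebook, and it assumes boxes are given as (x1, y1, x2, y2) corners. Very small IoU values for boxes that should overlap can come from the two boxes being in different coordinate scales (for example, original versus resized image), which is one thing worth checking.

```python
def iou(box_a, box_b):
    """Intersection over union for boxes given as (x1, y1, x2, y2) with x1 < x2, y1 < y2."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

same = iou((0, 0, 10, 10), (0, 0, 10, 10))         # identical boxes
partial = iou((0, 0, 10, 10), (5, 5, 15, 15))      # 25 / 175
disjoint = iou((0, 0, 10, 10), (20, 20, 30, 30))   # no overlap
# Same box expressed at half scale: overlap looks much smaller than it really is
scaled = iou((100, 100, 200, 200), (50, 50, 100, 100))
print(same, partial, disjoint, scaled)
```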

low spear Jan 30, 2022, 6:06 PM

#

https://github.com/RockyXu66/Faster_RCNN_for_Open_Images_Dataset_Keras/blob/master/frcnn_train_vgg.ipynb specifically used this code but the only difference is using my own dataset with the size 1137 * 640

merry wadi Jan 30, 2022, 6:49 PM

#

@serene scaffold were you able to see the dictionary I sent in

serene scaffold Jan 30, 2022, 6:55 PM

#

@merry wadi Yeah, sorry. Are you trying to compac column one with column A, 2 with B, etc?

#

*compare

merry wadi Jan 30, 2022, 6:57 PM

#

serene scaffold @merry wadi Yeah, sorry. Are you trying to compac column one with colu...

All good and I’m trying to find if A matches up with column 1,2,3,4. If B matches up with 1,2,3,4 etc.

And be able to update that column with a new value using np.where or apply
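
One possible reading of this, sketched with the dictionary posted above: for each lettered column, mark per row which of the numbered columns hold the same value, then use `np.where` for an update. The new column name `D_vs_1` and the labels are made up for illustration, and the interpretation of "matches up" is an assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'1': [1, 0, 0], '2': [0, 1, 0], '3': [0, 0, 0], '4': [0, 0, 1],
                   'A': [0, 0, 0], 'B': [0, 1, 0], 'C': [0, 0, 0], 'D': [1, 0, 1]})

numbered = ['1', '2', '3', '4']
lettered = ['A', 'B', 'C', 'D']

# For each lettered column: a boolean frame saying, row by row,
# which numbered columns equal that lettered column
matches = {col: df[numbered].eq(df[col], axis=0) for col in lettered}

# Example update: label rows where column D equals column 1
df['D_vs_1'] = np.where(df['D'] == df['1'], 'match', 'no match')

print(matches['D'])
print(df['D_vs_1'])
```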

civic stone Jan 30, 2022, 7:03 PM

#

civic stone Hello Everyone, i need your help and support i want to use clustering algorith...

😟 Nobody has any idea?

tropic matrix Jan 30, 2022, 8:19 PM

#

tropic matrix hm ok, I'll try one hot encoding the item names again on another note, is there ...

@mild dirge (sorry for pinging, I didn't see if you responded to this)

hollow sentinel Jan 30, 2022, 8:21 PM

#

hey um quick question

#

i can't seem to change the size of a matplotlib grouped bar chart

#

labels = ['Monday' , 'Tuesday' , 'Wednesday' , 'Thursday' ,
       'Friday' , 'Saturday' , 'Sunday']


spam_calls_by_day = [1872864 , 2165318 , 2786778 , 2563185 ,
             1930462 , 770719 , 369446]

#Monday - 5843038
#Tuesday - 6128272
#Wednesday - 7769496
#Thursday - 7638595
#Friday - 6662231
#Saturday - 3393077
#Sunday - 2592750

not_spam_calls_by_day = [5843038, 6128272, 7769496, 7638595, 6662231, 3393077, 2592750]


x = np.arange(len(labels))  # the label locations

width = 0.4  # the width of the bars
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, spam_calls_by_day, width, label='Spam Calls By Day')
rects2 = ax.bar(x + width/2, not_spam_calls_by_day, width, label='Ham Calls By Day')



# Add some text for labels,x-axis tick labels

ax.set_ylabel('Amount of Calls')
ax.set_xlabel('Days of the Week')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()

# Display plot


plt.setp(ax.get_xticklabels(), rotation=30, ha='right')

plt.figure(figsize=(20,20))

fig.tight_layout()

plt.show()

#

lol i feel dumb

#

figured it out
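
What was figured out isn't stated. One likely cause visible in the snippet is that `plt.figure(figsize=(20,20))` is called after `plt.subplots()`, which opens a second, empty figure instead of resizing the one holding the bars. A sketch of the same chart with the size set at creation time (the `(12, 6)` size is an arbitrary choice):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook; not needed in Jupyter
import matplotlib.pyplot as plt
import numpy as np

labels = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
spam_calls_by_day = [1872864, 2165318, 2786778, 2563185, 1930462, 770719, 369446]
not_spam_calls_by_day = [5843038, 6128272, 7769496, 7638595, 6662231, 3393077, 2592750]

x = np.arange(len(labels))  # the label locations
width = 0.4                 # the width of the bars

# Set the size when the figure is created, so the bars are drawn on the resized figure
fig, ax = plt.subplots(figsize=(12, 6))
ax.bar(x - width / 2, spam_calls_by_day, width, label="Spam Calls By Day")
ax.bar(x + width / 2, not_spam_calls_by_day, width, label="Ham Calls By Day")

ax.set_ylabel("Amount of Calls")
ax.set_xlabel("Days of the Week")
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
plt.setp(ax.get_xticklabels(), rotation=30, ha="right")
fig.tight_layout()
```

Calling `fig.set_size_inches(12, 6)` on the existing figure would be another way to resize it after the fact.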

nova smelt Jan 30, 2022, 9:50 PM

#

Hey, I am currently working with time series data and I am following the TensorFlow time series forecasting tutorial. In order to use the datetime of the measurements as inputs to the NN, they apply sine and cosine transforms to the dates. It makes sense to me why you would do that and why a periodic function like sine or cosine is needed, but what I don't really understand is why you need both sine and cosine. I would appreciate it if someone could explain the idea behind that, or point me to a resource where I can find more information.
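
A small numeric illustration of why one of the two is not enough, using hour of day as the periodic quantity (the tutorial's exact features may differ): sine alone gives the same value to two different times, and the cosine is what tells them apart. Together they place each time at a unique point on a circle.

```python
import numpy as np

hours = np.arange(24)
angle = 2 * np.pi * hours / 24

hour_sin = np.sin(angle)
hour_cos = np.cos(angle)

# 03:00 and 09:00 share the same sine value...
same_sine = np.isclose(hour_sin[3], hour_sin[9])
# ...but their cosine values differ, so the (sin, cos) pair is unambiguous
different_cosine = not np.isclose(hour_cos[3], hour_cos[9])

# Every hour maps to a distinct (sin, cos) pair
unique_pairs = {(round(float(s), 6), round(float(c), 6)) for s, c in zip(hour_sin, hour_cos)}
print(same_sine, different_cosine, len(unique_pairs))
```

The pair also keeps 23:00 and 00:00 close together, which a raw hour number (23 vs 0) would not.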

green niche Jan 31, 2022, 12:14 AM

#

is there a math prerequisite for reading this book^

#

Is algebra 2 good enough?

dusk tide Jan 31, 2022, 12:41 AM

#

Does this book cover everything that we need to know in ML and DL??

#

Any book for DL which covers each and everything from beginning to end??

iron basalt Jan 31, 2022, 12:52 AM

#

dusk tide Does this book cover everything that we need to know in ML and DL??

No. No.

iron basalt Jan 31, 2022, 12:52 AM

#

green niche is there a math prerequisite for reading this book^

You should at least be familiar with vectors.

green niche Jan 31, 2022, 12:52 AM

#

iron basalt You should at least be familiar with vectors.

vector calculus or just the basic notion of vectors

iron basalt Jan 31, 2022, 12:53 AM

#

green niche vector calculus or just the basic notion of vectors

Just the basic notion.

#

Vector calculus is one of the chapters.

#

Also should know some calculus*

worldly dawn Jan 31, 2022, 1:04 AM

#

dusk tide Any book for DL which covers each and everything from beginning to end??

like https://www.deeplearningbook.org/ ?

prime hearth Jan 31, 2022, 2:42 AM

#

hello, I implemented linear regression from scratch with no libraries except numpy and pandas

#

how could I validate my linear model with my test data?

prime hearth Jan 31, 2022, 3:02 AM

#

i was thinking of passing my test data to my new slope and bias and checking the cost at each iteration, and whichever cost is lowest, i take 1 - cost to get the accuracy for the testing data
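
#

One common alternative, sketched here under assumptions: regression has no "accuracy" as such, and 1 - cost is not bounded between 0 and 1, so the final slope and bias are usually scored once on the test set with the mean squared error and R². The function name `evaluate_linear_model` and the toy data are hypothetical.

```python
import numpy as np

def evaluate_linear_model(x_test, y_test, slope, bias):
    """Return (MSE, R^2) of y = slope * x + bias on held-out data."""
    x_test = np.asarray(x_test, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    predictions = slope * x_test + bias
    residuals = y_test - predictions
    mse = np.mean(residuals ** 2)
    # R^2 compares the model against always predicting the mean of y_test.
    total_variance = np.sum((y_test - y_test.mean()) ** 2)
    r_squared = 1.0 - np.sum(residuals ** 2) / total_variance
    return mse, r_squared

# Hypothetical test data that lies exactly on y = 2x + 1.
x_test = np.array([0.0, 1.0, 2.0, 3.0])
y_test = 2.0 * x_test + 1.0
mse, r2 = evaluate_linear_model(x_test, y_test, slope=2.0, bias=1.0)
print(mse, r2)
```

An R² close to 1 on the test set, and close to the training R², suggests the fitted line generalizes.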

sinful scarab Jan 31, 2022, 3:09 AM

#

if I use

torch.save(model, PATH)

and subsequently

model = torch.load(PATH)

is there a way I can infer the architecture of my model from the model object? If not, or if not advised to do so, what is your recommendation on saving the architecture of my model? I'm considering copying the code block in my colab notebook as a text. If I choose to do so - is there a nice way to do that via python code?

dusk tide Jan 31, 2022, 3:19 AM

#

worldly dawn like <https://www.deeplearningbook.org/> ?

The book seems good but the derivation and maths seems difficult

#

Any way to make it easier to understand??

prime hearth Jan 31, 2022, 3:31 AM

#

youtube

#

also, there may be more than one way of learning, but I would say trying to read an ML book isn't a great way to learn, as it also requires a strong math background. Practical experience is needed to see how the math applies; that's why I mentioned youtube, because there are lots of great tutorials that show applications of what's inside the textbook.

#

And trying to learn vector calculus and other math can quickly become demotivating for some, as it can feel like so much to learn. This is not the best way to learn; instead focus on a particular machine learning or deep learning algorithm, and only then, if you need to know certain math, learn just the basics needed.

worldly dawn Jan 31, 2022, 4:28 AM

#

dusk tide Any way to make it easier to understand??

Short of paying someone to explicitly write a course completely tailored to your situation, you will have a hard time finding something like that.
I would recommend to try reading from these sources and diving deeper as needed. That will require a bit more effort on your end, but it will eventually pay off.
But do keep in mind that while interesting and rewarding, DL is a complex piece of technology

stark zenith Jan 31, 2022, 4:33 AM

#

thing about math is, like everything else, you're gonna have to push through

#

math is a harder push than python, that's for sure

lapis sequoia Jan 31, 2022, 5:57 AM

#

hey guys, ive made an AI Jarvis using some simple modules, and speech recognition. I also designed a GUI for it in Qt Designer but when i convert it from ui file to py file and open in python, all my images are not visible and the window get white... i thought it was hanged but i waited for an hour still nothing happened... pls help me out

#

i also saw some solutions from GitHub, it says give the picture path to the folder where you have saved your pictures... so how do i paste a picture path in a folder??? shall i make a shortcut and put the path in it??? pls helpp

lapis sequoia Jan 31, 2022, 6:38 AM

#

Hey guys got a question if I wanna make an ai that runs 24h/d do I need to keep my pc on 24h/d?

worldly dawn Jan 31, 2022, 6:54 AM

#

lapis sequoia Hey guys got a question if I wanna make an ai that runs 24h/d do I need to keep ...

If some code has to run 24h/d, it needs the capability to do so

#

But that has little to do with AI/ML

lapis sequoia Jan 31, 2022, 7:08 AM

#

helpppp plssss

#

pls

#

plpslpls

worldly dawn Jan 31, 2022, 7:11 AM

#

lapis sequoia helpppp plssss

You may want to read more on https://stackoverflow.com/help/how-to-ask about how to ask great questions. It will help increase your odds of getting a reasonable answer

lapis sequoia Jan 31, 2022, 7:12 AM

#

bruh

#

ok bro fine

#

btw how to make a module?

worldly dawn Jan 31, 2022, 7:13 AM

#

lapis sequoia btw how to make a module?

https://docs.python.org/3/tutorial/modules.html
But that's the wrong channel for it...

lapis sequoia Jan 31, 2022, 7:13 AM

#

oh

#

so where do i get help?

#

in # python-general?

#

or where?

worldly dawn Jan 31, 2022, 7:16 AM

#

yeah either that or #❓｜how-to-get-help

lapis sequoia Jan 31, 2022, 7:16 AM

#

oh

#

ty

latent vector Jan 31, 2022, 9:25 AM

#

Here is a free article that will take you through the basics of data science, by nm.dev

#

https://link.medium.com/TTDxW5xH7mb

Medium

Introduction to Data Science Workbook

The principal purpose of Data Science is to find patterns within data. It uses various statistical techniques to analyse and draw insights…

#

https://link.medium.com/BKgMZ0wH7mb

Medium

Introduction to Data Science

Data science is the domain of study that deals with vast volumes of data using modern tools and techniques to find unseen patterns, derive…

acoustic forge Jan 31, 2022, 10:46 AM

#

Any experts in Transformer models? Specifically Huggingface BERT/BART models

lapis sequoia Jan 31, 2022, 10:46 AM

#

acoustic forge Any experts in Transformer models? Specifically Huggingface BERT/BART models

not sure about experts but some people here do know about transformers.

acoustic forge Jan 31, 2022, 10:49 AM

#

Basically, I am trying to train a transformer model to generate a meta description (meta tag in web development) based on <h1>, <h2> and <p> tags.

So this means that I need the model to take multiple inputs of course. I read up on it, and I would need to concatenate these tags somehow and feed it into the model

#

Just wondering how to do this smartly

worthy phoenix Jan 31, 2022, 11:33 AM

#

suppose i have a list of np arrays appeneded to a list how can i save that list of np arrays as a single image?

grave frost Jan 31, 2022, 11:52 AM

#

acoustic forge Basically, I am trying to train a transformer model to generate a meta descripti...

that's a seq2seq task - do you have any dataset? you can finetune a model from HuggingFace models

acoustic forge Jan 31, 2022, 11:54 AM

#

grave frost that's a seq2seq task - do you have any dataset? you can finetune a model from...

Yes - I do have a dataset containing four columns

1. "Meta" (the meta description that I want to generate)
2. "h1"
3. "h2"
4. "p"

The last three are the ones that I want to somehow combine into an input that a model can then generate a meta description for

acoustic forge Jan 31, 2022, 11:54 AM

#

grave frost that's a seq2seq task - do you have any dataset? you can finetune a model from...

I have read on HuggingFace, but I cannot find anywhere, where I can input multiple features/columns - Do you have any good resources on that

grave frost Jan 31, 2022, 11:55 AM

#

acoustic forge Yes - I do have a dataset containing four columns 1. "Meta" (The meta descripti...

what do columns 2-4 contain?

acoustic forge Jan 31, 2022, 11:56 AM

#

All are strings.

h1 will contain html <h1> strings
h2 will contain html <h2> strings
p will contain html <p> strings.

An example of a p instance:

'Special days or every day – the simple but elegant FÖRNUFT 24-piece cutlery set will add extra flavour to your table settings. The cutlery is made of stainless steel that is durable and easy to clean. Cutlery with a clean design goes well with many different kinds of table settings. Includes: Fork, knife, spoon, teaspoon and dessert/salad fork, 4 of each. The material in this product may be recyclable. Please check the recycling rules in your community and if recycling facilities exist in your area. Maria Vinka Stainless steel Stainless steel, Stainless steel Dishwasher-safe. For the flatware to be easy to clean and to reduce the risk of corrosion, always rinse off the remains of any food immediately. Wash, rinse and dry your flatware before using it for the first time. Width: 6 ¾ " Height: 1 ¼ " Length: 8 ¾ " Weight: 2 lb 1 oz Package(s): 1 Stainless steel is found in everything from building structures and cars to sinks and knives. It’s easy to see why it has so many uses. Stainless steel is hard and durable and has good resistance to corrosion – namely rust. It generally has a low nickel content, and for IKEA products we mainly use stainless steel that’s nickel-free. Like many other metals, it can be recycled again and again to become new, hardwearing items – without losing its valuable properties. All cutlery needs some care to maintain its original finish. Rinse cutlery after use to remove any scraps of food, even if you plan to wash it later. If you use a dishwasher, choose a shorter cycle, and dry the cutlery by hand. That way you avoid rust spots caused by food that sticks to the cutlery, or by leaving cutlery too long in the hot, humid air inside the machine. Good advice doesn\'t have to be expensive.'

#

Example of a h1 instance:
'FÖRNUFT20-piece flatware set, stainless steel'

#

And example of a h2 instance:
'Product details Measurements Reviews Material Quality'

#

So I want to somehow include these three strings in a model to generate a meta description

grave frost Jan 31, 2022, 11:58 AM

#

acoustic forge So I want to somehow include these three strings in a model to generate a meta d...

if you crop the para,

#

then make them into a single sentence - delimited by a special character like <DELIM>

#

pass that service token into the model when building it (there would be a list, including <END>, <START> etc. just add your own to that list)

acoustic forge Jan 31, 2022, 12:00 PM

#

Okay, so it would be like

FÖRNUFT20-piece flatware set, stainless steel<DELIM> Product details Measurements Reviews Material Quality <DELIM> {Long Paragraph}

grave frost Jan 31, 2022, 12:00 PM

#

acoustic forge Okay, so it would be like ``` FÖRNUFT20-piece flatware set, stainless steel<DEL...

yes

#

no

grave frost Jan 31, 2022, 12:01 PM

#

acoustic forge Okay, so it would be like ``` FÖRNUFT20-piece flatware set, stainless steel<DEL...

it would be something like this in the tokenized form

["FÖRNUFT20-piece", "flatware set", "stainless steel", "<DELIM>", "Product details",....]

a list after tokenization

acoustic forge Jan 31, 2022, 12:02 PM

#

Ah, alright. So I can use an autotokenizer for that, somehow?

grave frost Jan 31, 2022, 12:02 PM

#

dataset can use any delimiter

FÖRNUFT20-piece flatware set, stainless steel|#| Product details Measurements Reviews Material Quality |#| {Long Paragraph}

#

ye, you would have to modify it

#

or, if you're too lazy - you could just add a delimiter in the dataset and hope the model learns it itself

acoustic forge Jan 31, 2022, 12:04 PM

#

So I could essentially join the three columns and add |#| between them. And then I can somehow tell the bert model to understand |#| as the delimiter?

grave frost Jan 31, 2022, 12:04 PM

#

it should pick that up on its own
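
#

Joining the three columns with a delimiter, as discussed above, can be sketched in plain Python. This is only an illustration: the helper name `combine_fields`, the `max_paragraph_chars` crop length, and the shortened example paragraph are assumptions, and whether a model really learns to treat `|#|` as a separator has to be checked empirically.

```python
DELIMITER = " |#| "

def combine_fields(row, fields=("h1", "h2", "p"), max_paragraph_chars=500):
    """Join the text columns of one row into a single delimited string."""
    parts = []
    for field in fields:
        text = (row.get(field) or "").strip()  # treat missing/None as empty
        if field == "p":
            text = text[:max_paragraph_chars]  # crop the long paragraph
        parts.append(text)
    return DELIMITER.join(parts)

row = {
    "h1": "FÖRNUFT20-piece flatware set, stainless steel",
    "h2": "Product details Measurements Reviews Material Quality",
    "p": "Special days or every day ...",  # shortened for the example
}
print(combine_fields(row))
```

The combined string can then be stored as a new column and passed to the tokenizer as a single text input.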

acoustic forge Jan 31, 2022, 12:05 PM

#

Alright - I'll give that a go!

acoustic forge Jan 31, 2022, 12:11 PM

#

grave frost it should pick that up on its own

So I combine it and I have something like this:

'FÖRNUFT20-piece flatware set, stainless steel |#| Product details Measurements Reviews Material Quality |#| Special days or every day – the simple but elegant FÖRNUFT 24-piece cutlery set will add extra flavour to your table settings. The cutlery is made of stainless steel that is durable and easy to clean. Cutlery with a clean design goes well with many different kinds of table settings. Includes: Fork, knife, spoon, teaspoon and dessert/salad fork, 4 of each. The material in this product may be recyclable. Please check the recycling rules in your community and if recycling facilities exist in your area. Maria Vinka Stainless steel Stainless steel, Stainless steel Dishwasher-safe. For the flatware to be easy to clean and to reduce the risk of corrosion, always rinse off the remains of any food immediately. Wash, rinse and dry your flatware before using it for the first time. Width: 6 ¾ " Height: 1 ¼ " Length: 8 ¾ " Weight: 2 lb 1 oz Package(s): 1 Stainless steel is found in everything from building structures and cars to sinks and knives. It’s easy to see why it has so many uses. Stainless steel is hard and durable and has good resistance to corrosion – namely rust. It generally has a low nickel content, and for IKEA products we mainly use stainless steel that’s nickel-free. Like many other metals, it can be recycled again and again to become new, hardwearing items – without losing its valuable properties. All cutlery needs some care to maintain its original finish. Rinse cutlery after use to remove any scraps of food, even if you plan to wash it later. If you use a dishwasher, choose a shorter cycle, and dry the cutlery by hand. That way you avoid rust spots caused by food that sticks to the cutlery, or by leaving cutlery too long in the hot, humid air inside the machine. Good advice doesn\'t have to be expensive.'

How would you recommend tokenizing this? I saw some special tokenizer apis in the huggingface

#

ecosystem. Is there a special tokenizer I can tell to treat the |#| characters differently?

#

Actually, I see that different tokenizers have different sequence tokens https://huggingface.co/docs/transformers/v4.16.1/en/model_doc/roberta#transformers.RobertaTokenizer

RoBERTa

weak grove Jan 31, 2022, 12:44 PM

#

why my accuracy is not increasing what might be the issue?

grave frost Jan 31, 2022, 12:49 PM

#

acoustic forge So I combine it and I have something like this: ``` 'FÖRNUFT20-piece flatware se...

SEP tokens already exists https://huggingface.co/docs/transformers/v4.16.1/en/model_doc/roberta#transformers.RobertaTokenizer.sep_token

#

refer to this https://huggingface.co/docs/transformers/v4.16.1/en/model_doc/roberta#transformers.RobertaTokenizer.build_inputs_with_special_tokens

dusk tide Jan 31, 2022, 1:07 PM

#

worldly dawn Short of paying someone to explicitly write a course completely tailored to your...

Ok thanks

arctic wedgeBOT Jan 31, 2022, 1:18 PM

#

:incoming_envelope: :ok_hand: applied mute to @zenith tide until <t:1643635727:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

acoustic forge Jan 31, 2022, 1:45 PM

#

grave frost refer to this https://huggingface.co/docs/transformers/v4.16.1/en/model_doc/robe...

So this is how I should do it instead of the |#|

grave frost Jan 31, 2022, 1:58 PM

#

acoustic forge So this is how I should do it instead of the |#|

yh, you don't need to add another token

#

but you would have to do some modifications to instruct the tokenizer to place <SEP> where you want it to be - between the different columns' sequences

acoustic forge Jan 31, 2022, 2:00 PM

#

Alright - I thought perhaps I could do it with a script

acoustic forge Jan 31, 2022, 2:02 PM

#

grave frost but you would have to do some modifications to instruct the tokenizer to place <...

I'll try to look into it, thanks 🙂

wicked grove Jan 31, 2022, 2:05 PM

#

Hello, i trained my model (vgg19) and evaluated using kfold cross validation

#

I get an average accuracy of 82

#

But when i evaluate on the test set the accuracy is 74

#

Is there anyway to improve the model

acoustic forge Jan 31, 2022, 2:16 PM

#

Hey guys - I have the following after I do datasets.load_dataset

DatasetDict({
    train: Dataset({
        features: ['index', 'url', 'meta', 'title', 'h1', 'h2', 'p'],
        num_rows: 23691
    })
})

I want to tokenize using the following function

tokenizer = BartTokenizer.from_pretrained('facebook/bart-base')
def tokenize_function(example):
    return tokenizer(example["h1"], example["h2"], example["p"], truncation=True)

tokenized_dataset = dataset["train"].map(tokenize_function)

But unfortunately, I get the following error text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).

prime hearth Jan 31, 2022, 3:13 PM

#

@wicked grove if this is for supervised machine learning, then apply regularization, and check which data gave poor accuracy: possible outliers, or data that requires feature engineering (check for correlation and use feature selection techniques)

lapis sequoia Jan 31, 2022, 3:25 PM

#

I can't understand how to find the best fit line with the least squares method, can anybody suggest a good resource for it

mild dirge Jan 31, 2022, 3:27 PM

#

lapis sequoia I can't understand how to find the best fit line with the least squares method, can ...

https://www.youtube.com/watch?v=P8hT5nDai6A

YouTube

The Organic Chemistry Tutor

Linear Regression Using Least Squares Method - Line of Best Fit Equ...

This statistics video tutorial explains how to find the equation of the line that best fits the observed data using the least squares method of linear regression.

My Website: https://www.video-tutor.net
Patreon: https://www.patreon.com/MathScienceTutor
Amazon Store: https://www.amazon.com/shop/theorganicchemistrytutor

Disclaimer: Some of t...

▶ Play video

wicked grove Jan 31, 2022, 3:32 PM

#

prime hearth @wicked grove if this is for supervised machine learning, then apply regu...

Yes it's supervised
I have applied dropout
How do i check which data gave poor accuracy??

#

Cause my test is just numpy array which i get after train_test_split

mild dirge Jan 31, 2022, 3:35 PM

#

You should probably try to visualize the test set

#

Maybe it is more difficult than the data used for cross validation

wicked grove Jan 31, 2022, 3:50 PM

#

mild dirge You should probably try to visualize the test set

Rightt,But how can i visualize it 🙈

#

In the final result can i display val acc and test acc?? And what happens if there is a 10% difference between val and test acc

mild dirge Jan 31, 2022, 3:51 PM

#

then you probably overfitted on val data somehow

#

Or there is not enough data to make an accurate prediction on the accuracy

#

maybe average accuracy over 100 runs

wicked grove Jan 31, 2022, 4:01 PM

#

mild dirge then you probably overfitted on val data somehow

Hmm

#

How can i improve that

#

Also can i use cross_ val_score with a cnn model??

mild dirge Jan 31, 2022, 4:03 PM

#

泣いオオカミー ≛ 🏆 ≛ already gave some good suggestions

#

cross validation is not bound to a model type

#

it works on any prediction task
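
#

A rough sketch of doing k-fold cross validation by hand, which works for a CNN as well because only the index splitting is shared. As far as I know, `cross_val_score` expects a scikit-learn style estimator, so a manual loop is a common workaround. The generator name `k_fold_indices` is hypothetical, and the training step is left as a comment.

```python
import numpy as np

def k_fold_indices(n_samples, n_splits, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, n_splits)
    for i in range(n_splits):
        val_idx = folds[i]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_splits) if j != i]
        )
        yield train_idx, val_idx

for train_idx, val_idx in k_fold_indices(n_samples=10, n_splits=5):
    # Build a fresh model here, fit it on train_idx, score it on val_idx.
    print(len(train_idx), len(val_idx))
```

The model should be re-created for every fold; otherwise weights learned on earlier folds leak into later ones and the validation score looks better than it is.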

flat sable Jan 31, 2022, 5:11 PM

#

hello i wanna ask you guys should i learn machine learning then neuroscience with robotic or Ai

mild dirge Jan 31, 2022, 5:14 PM

#

flat sable hello i wanna ask you guys should i learn machine learning then neuroscience wit...

neuroscience is only loosely related to most machine learning topics and algorithms

#

you want to maybe start with linear algebra and statistics

flat sable Jan 31, 2022, 5:17 PM

#

mild dirge you want to maybe start with linear algebra and statistics

yeh i'll start with it

#

other question do machine learning do only prediction or also do learning it self by retrieve data training

nova smelt Jan 31, 2022, 5:22 PM

#

Hey, i am using a simple LSTM NN with a Dense output layer that has one unit (i only want to predict one numerical value). Model fitting and so on works but i am confused by the output of model.predict() when i let it predict on one batch.
You can see the prediction in the picture for some reason it is 3 numbers and not just one. the upper matrix is the prediction at index 0 and the number at the bottom is the actual number it should predict

#

lapis sequoia Jan 31, 2022, 5:24 PM

#

mild dirge https://www.youtube.com/watch?v=P8hT5nDai6A

Linear regression line formula is y=mx+c
Here we have to find m and c, after finding m and c
,we have to put them in the y=mx+c equation, now as we know the values of x, so now we will put the values of x in the the y=mx+c equation and we will get the y values.

Now after this we will plot the y and x co ordinates on the graph (these are the predicted values) and draw the regression line. After that we will use R² method to check whether our line is best fit or not, if it's not then we will repeat the whole process again, which will give us new values of m and c, so we will put that new m, c and the original x values in the y=mx+c equation and now we got a regression line which fits in a better way than earlier

Am I right ?

mild dirge Jan 31, 2022, 5:30 PM

#

Not really, if you watch the video it will show that you will need both the x and y values of every datapoint

#

with those you can get the least squares line (so calculate m and c)

#

It's not an iterative process

#

After you got the line equation, you can predict the y value of new x values you get
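
#

The non-iterative calculation described above can be sketched with the standard closed-form least squares formulas for a single input variable. The function name `fit_line` and the sample points are made up for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least squares fit of y = m * x + c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x.
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    c = y_mean - m * x_mean
    return m, c

# Hypothetical observations: both x and y are needed for every point.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 4.1, 6.2, 7.9, 10.1])
m, c = fit_line(x, y)
print(m, c)
print(m * 6.0 + c)  # predicted y for a new x value
```

Both m and c come out of one pass over the data, so there is nothing to repeat; R² only reports how well that single line fits.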

flat sable Jan 31, 2022, 5:47 PM

#

mild dirge Not really, if you watch the video it will show that you will need both the x an...

other question do machine learning do only prediction or also do learning it self by retrieve data training

mild dirge Jan 31, 2022, 5:47 PM

#

not sure what you mean

prime hearth Jan 31, 2022, 5:56 PM

#

hello, would like to please ask, I have a dataframe and would like to train test split it but when I use sklearn library it gives me a dataframe - I would like instead it to be 2d array with the numerical values not the whole dataframe, how to do please?

nova smelt Jan 31, 2022, 5:57 PM

#

you can use np.array(dataframe)
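
#

A small sketch of the conversion, with a made-up dataframe. `DataFrame.to_numpy()` gives the same 2D array as `np.array(dataframe)`, and passing arrays into scikit-learn's `train_test_split` gives arrays back.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe with two numeric columns.
df = pd.DataFrame({"x1": [1.0, 2.0, 3.0, 4.0], "x2": [10, 20, 30, 40]})

values = df.to_numpy()  # 2D array of the values, without index or column names
print(type(values), values.shape)
print(np.array_equal(values, np.array(df)))
```

Converting before the split keeps everything downstream in plain NumPy, but the column names are lost, so it helps to note the column order first.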

prime hearth Jan 31, 2022, 5:58 PM

#

oh okay thanks il try this

#

it worked awesome!

nova smelt Jan 31, 2022, 5:59 PM

#

nice👍

wicked grove Jan 31, 2022, 6:07 PM

#

prime hearth oh okay thanks il try this

heyy, i had a doubt...do i have to save my model before i use model.evaluate on the test set

#

or can i train and then evaluate it

mild dirge Jan 31, 2022, 6:17 PM

#

Don't need to save it

#

as long as you have trained the model you can use it to evaluate data

#

saving is just so you can load the model again and use it after the program has terminated

wicked grove Jan 31, 2022, 6:28 PM

#

mild dirge as long as you have trained the model you can use it to evaluate data

Ohhh okayy, thank you so much

nova smelt Jan 31, 2022, 6:37 PM

#

something is going massively wrong with training my LSTM NN. for some reason the model always predicts the exact same value no matter the input.

shrewd saddle Jan 31, 2022, 7:29 PM

#

hey

stone marlin Jan 31, 2022, 7:58 PM

#

I had another interview problem I wanted to share with y'all. I promise there won't be too many of these! This one was totally different from the previous:

Given the dataset Labeled Faces in the Wild [http://vis-www.cs.umass.edu/lfw/], create a system [sic] which will identify the person's nose and draw a green box around it. Time limit was one hour.

I didn't get through this one as well (I don't usually work with CV) but it was fun to try it out, and I submitted what little I had, haha.

[Edit: The job title was "Data Scientist, Agriculture".]

How would y'all do something like this?

#

My solution, which did not ultimately yield any fruit, was really naive. I did a standard NN-type thing, and then tried to do some LIME stuff to see if there was a "nose"-type feature (spoiler: there wasn't) and then I panicked and tried to do it for eyes which worked a bit better, then I made a box from eyes to a set number of pixels down, and then cut that box in half, basically hoping the nose was in that area.

It was a terrible system. I know now that there's the facial_landmark stuff in cv, haha, but it was honestly a fun time. At least y'all can get a little chuckle at my attempt. :']

mild dirge Jan 31, 2022, 8:24 PM

#

Yeah not sure how I would have done that without these really useful libraries tbh

#

You either already need a network that can detect noses or manually annotate them yourself and somehow train a network to extract the location of a nose

stone marlin Jan 31, 2022, 8:44 PM

#

Yep. I think that it may have been the point of the problem to have already known the libraries and resources, haha, which I def did not.

iron basalt Jan 31, 2022, 9:05 PM

#

stone marlin I had another interview problem I wanted to share with y'all. I promise there w...

Image hash (sliding window) first attempt.

stone marlin Jan 31, 2022, 9:12 PM

#

Huh, I never heard of image hashing, that's pretty cool.

iron basalt Jan 31, 2022, 9:19 PM

#

Well, a bunch of sliding window stuff will work. No need to go all crazy with a deep NN.

#

(Since it's to detect a specific thing)

#

(And only 1 hour)

stone marlin Jan 31, 2022, 9:35 PM

#

I've got a very shallow level of understanding for NN stuff --- slowly increasing, but, you know, haha --- so I'll have to check that method out to see how it works and what it does.

iron basalt Jan 31, 2022, 10:26 PM

#

stone marlin I've got a very shallow level of understanding for NN stuff --- slowly increasin...

Any simple method will work that is hand crafted to detect noses, so OpenCV's built in stuff should work. The reason it should work is because the dataset description basically gives away the answer: "The only constraint on these faces is that they were detected by the Viola-Jones face detector."

#

Viola-Jones is old school face detection from before everyone started using DNNs.

stone marlin Jan 31, 2022, 10:28 PM

#

Hm, lotsa stuff I don't know about. I'd prob have given this a glance if I knew it was going to be entirely CV-related, haha.

iron basalt Jan 31, 2022, 10:28 PM

#

Only works on nice clear images.

#

Otherwise it gives way too many false positives.

#

The label being the person's name is ofc not useful, so they really want a specific purpose / hand crafted thing (for noses in this case).

#

(Viola-Jones is specifically hand-crafted filters for faces)

#

Having now read the dataset description, I would just throw OpenCV at it.

#

Although that would not get the best results possible.

sour spindle Feb 1, 2022, 12:07 AM

#

nova smelt something is going massively wrong with training my LSTM NN. for some reason the...

can you send the code for the model?

nova smelt Feb 1, 2022, 12:08 AM

#

i've got help on another server already thanks tho

lyric ermine Feb 1, 2022, 12:26 AM

#

hi everyone.

i am wondering if somebody knows of an easy way to schedule colab notebooks on your google drive. i would like to let some notebooks run automatically on a schedule, once per day.

serene scaffold Feb 1, 2022, 12:31 AM

#

lyric ermine hi everyone. i am wondering if somebody knows of an easy way to schedule colab ...

You would have to see if there's a colab API, but I think it's pretty unlikely that they support that, as colab is intended to be a learning environment, not a general purpose host for Python programs.

lyric ermine Feb 1, 2022, 12:32 AM

#

i see, thanks @serene scaffold

glass minnow Feb 1, 2022, 3:03 AM

#

why column "one" has float value and column "two" has integer value please help!!

calm thicket Feb 1, 2022, 3:06 AM

#

they're not the same length
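
#

A guess at what is going on, since the original screenshot is not available: when two Series of different lengths are combined into one dataframe, the shorter one gets NaN for the missing labels, and NaN forces that column to float. The Series below are hypothetical stand-ins.

```python
import pandas as pd

one = pd.Series([1, 2, 3], index=["a", "b", "c"])
two = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])

# "one" has no value for label "d", so pandas fills in NaN there.
df = pd.DataFrame({"one": one, "two": two})
print(df)
print(df.dtypes)
```

Column "two" has a value for every row, so it can stay an integer column.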

plush jungle Feb 1, 2022, 4:41 AM

#

can someone help me understand the hidden state in RNNs?

#

I just built an RNN and trained it on a dataset with 5 different words, so the inputs are all one hot vectors like

[0,0,1,0,0]

#

but each cell has a hidden state with 256 weights

#

import torch
import torch.nn as nn

hidden_size = 256

class MyRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MyRNN, self).__init__()
        self.hidden_size = hidden_size
        # Both layers take the input concatenated with the previous hidden state
        self.in2hidden = nn.Linear(input_size + hidden_size, hidden_size)
        self.in2output = nn.Linear(input_size + hidden_size, output_size)

    def forward(self, x, hidden_state):
        combined = torch.cat((x, hidden_state), 1)
        hidden = torch.sigmoid(self.in2hidden(combined))
        output = self.in2output(combined)
        return output, hidden

#

but I'm so confused how it works from here

#

the forward function concatenates the 5 vector to the 256 vector, creating a 261 vector

#

then it passes that to nn.Linear with arguments 261, 256

#

and then nn.Linear does a linear transformation to make the 261 vector into a 256 vector?

#

what exactly is happening here
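
#

A NumPy sketch of what that linear layer computes, written without PyTorch so the shapes are easy to see. The weights here are random placeholders (in the RNN they are learned), and the names `W`, `b`, `combined` and `new_h` are only for illustration. Note that the hidden state holds 256 activations, not weights; the weights live in the 256 x 261 matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 5, 256

# Parameters of a linear layer mapping 261 values to 256 values.
W = rng.normal(size=(hidden_size, input_size + hidden_size))
b = np.zeros(hidden_size)

x = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # one-hot vector for one word
h = np.zeros(hidden_size)                 # initial hidden state

combined = np.concatenate([x, h])         # shape (261,)
# Linear transformation W @ combined + b, followed by the sigmoid.
new_h = 1.0 / (1.0 + np.exp(-(W @ combined + b)))
print(combined.shape, new_h.shape)
```

Each of the 256 outputs is a weighted sum of all 261 inputs plus a bias, and `new_h` is what gets concatenated with the next word on the following time step.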

solemn dawn Feb 1, 2022, 7:14 AM

#

how we apply softmax to sklearn neural network ?

dusk tide Feb 1, 2022, 7:19 AM

#

Has anyone tried O'Reilly or Packt Publishing books for data science, ML or DL??

lapis sequoia Feb 1, 2022, 7:46 AM

#

In linear regression the m and c values will keep changing unless we get the best fit regression line, right ?

timber fable Feb 1, 2022, 8:08 AM

#

Ya

#

@solemn dawn use tensorflow for neural networks

clever moon Feb 1, 2022, 8:10 AM

#

Hi @potent jolt, I have got the following data frame and i would like to delete rows where CHILD_DEATHS and WOMEN_DEATHS are both equal to zero, but when i do that, it's deleting all rows with zero in any of the columns

clever moon Feb 1, 2022, 8:11 AM

#

clever moon Hi @potent jolt, I have got the following data frame and i would like ...

#

Sorry if I am at a wrong channel

snow moat Feb 1, 2022, 8:44 AM

#

clever moon Hi @potent jolt, I have got the following data frame and i would like ...

del dataframeName. loc("rowNane")

clever moon Feb 1, 2022, 8:50 AM

#

snow moat del dataframeName. loc("rowNane")

The problem is I don't know the row name. I want to delete all rows where child deaths and women deaths is equal to zero but it's deleting all rows where women deaths or child deaths is equal to zero
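
#

A minimal sketch of dropping only the rows where both columns are zero, using a boolean mask instead of row names. The `DISTRICT` column and the numbers are made up; only `CHILD_DEATHS` and `WOMEN_DEATHS` come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "DISTRICT": ["A", "B", "C", "D"],  # hypothetical extra column
    "CHILD_DEATHS": [0, 0, 3, 2],
    "WOMEN_DEATHS": [0, 1, 0, 5],
})

# True only where BOTH columns are zero; "&" is the element-wise "and".
both_zero = (df["CHILD_DEATHS"] == 0) & (df["WOMEN_DEATHS"] == 0)
filtered = df[~both_zero]  # keep every row that is not both-zero
print(filtered)
```

Using `|` instead of `&` in the mask would remove rows where either column is zero, which matches the behaviour described in the question.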

fading quarry Feb 1, 2022, 9:08 AM

#

any pyspark expert here??

#

I am trying to insert a dataframe into a table. The dataframe I have gives me output like this:

Field	Value
id	95
name	N04
parentId	702
parentExternalId	N7
description	WAT..
metadata	{_replir...
source	https://github.co...
Id	43636
createdTime	2021-10-13 11:58:...
lastUpdatedTime
rootId	6978
agregates	null
dataset	null
labels	null

And I have multiple rows here. Now I am trying to insert this data in one of the table which has columns as (PointID, Name, PointParentID, desc). I would like to insert Id column in PointID, name in Name, parentID in PointParentId and description in desc. How can I do this using pyspark sql. I tried below command and it works in inserting a single data in a table with a single column.

df = spark.sql("Insert into `Utilities_66_Demo`.`UtilityProduct` values('Hi')")
df = spark.sql("Select * from `Utilities_66_Demo`.`UtilityProduct`")
df.show()

Same way I have table with 4 columns defined above and a dataframe where I need to insert data to.
But how can i insert a data completely in single call from the dataframe whose output I have shown above. the dataframe I got is from this code:

dfSourceAssets = spark.read.format("xyz.spark.v1") \
   .option("type", "assets") \
   .option("tokenUri", sourceTokenURI) \
   .option("clientId", sourceClientID) \
   .option("clientSecret", sourceClientSecret) \
   .option("baseUrl", sourceBaseURL) \
   .option("project", sourceProject) \
   .option("scopes", sourceScopes) \
   .load()

acoustic forge Feb 1, 2022, 2:00 PM

#

Whenever I run the huggingface trainer api (trainer.train()) my kernel crashes

grizzled stirrup Feb 1, 2022, 2:41 PM

#

I am new to stats in general. I am building a logistic regression model, and a few variables have a 2.6 or 3.2 correlation coefficient. Is that good since it's a positive relationship?

Also the r-squared score is 26% which is too low, right?

lapis sequoia Feb 1, 2022, 3:00 PM

#

So there are many methods by which we can get the best fit linear regression line, so do we need to learn all the methods or any one ?

grizzled stirrup Feb 1, 2022, 3:09 PM

#

I probably need to just learn all of it. Typing it out has made me realize I really have no idea what I am doing. Any great resources for learning from scratch on these models?

Most google articles seem to think I have a great understanding of models already

mild dirge Feb 1, 2022, 3:53 PM

#

grizzled stirrup I am new to stats in general. I am building a logistic regression model, and a f...

A correlation coefficient should be between -1 and 1, where 1 is a strong positive relation (if one var is big, the other is too) and -1 a strong negative relation (if one var is big, the other is small) and 0 signifies no relation.

#

(probably a bit oversimplified but that's the gist of it)

#

If the correlation coefficient for the correlation between an input variable and output is not 0, the input variable could contain useful information for predicting the outcome variable

#

And if it is 0, sometimes it still might be useful information, but in combination with other variables only (no linear relation with the outcome)

#

@grizzled stirrup

#

Does that explain it a bit?

grizzled stirrup Feb 1, 2022, 3:56 PM

#

This was actually very helpful @mild dirge ! Thank you a lot. I was a bit thrown off on why one could come back as more than 1 (2.6 for example) as I did think it had to fall between the -1 and 1 line. However, you cleared many things up! Thank you 🙂

mild dirge Feb 1, 2022, 3:57 PM

#

Also when two input variables have a very strong correlation, that often means that you can basically remove one of the variables, since they contain the similar/same information @grizzled stirrup

#

Did you perhaps calculate the covariance instead of correlation coefficient? that might give you values above 1 and below -1
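
To illustrate the difference between the two quantities, here is a minimal sketch using NumPy. The data values are made up for illustration: covariance depends on the units of the variables and can be any size, while the correlation coefficient is always within [-1, 1].

```python
import numpy as np

# Hypothetical data: y is roughly 3 * x with a little noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 5.9, 9.2, 11.8, 15.1])

# Off-diagonal entry of the 2x2 covariance matrix: unbounded, unit-dependent.
covariance = np.cov(x, y)[0, 1]

# Off-diagonal entry of the correlation matrix: always between -1 and 1.
correlation = np.corrcoef(x, y)[0, 1]

print(f"covariance:  {covariance:.3f}")
print(f"correlation: {correlation:.3f}")
```

A value above 1 therefore cannot be a correlation coefficient; it may be a covariance or, as in a regression summary, a fitted coefficient.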

grizzled stirrup Feb 1, 2022, 4:06 PM

#

Hmm... the output I am looking at says "coefficient estimate", and that gives the 2.x number. Perhaps I did do that

mild dirge Feb 1, 2022, 4:53 PM

#

There's plenty of detailed explanations of matplotlib, pandas and numpy to find so if you're really interested you should probably look it up on google/youtube. Why do you think they are useless?

#

@lapis sequoia

#

Or maybe at what part specifically do you think they are lacking?

serene scaffold Feb 1, 2022, 4:54 PM

#

I wrote an explanation of what they do in the pins. Numpy does large amounts of math in batches, which is very useful for data science. Pandas does large amounts of tabular data manipulations in batches, which is also very useful. and matplotlib makes data visualizations, which is useful.

#

(even though the matplotlib api makes me sad)

mild dirge Feb 1, 2022, 5:00 PM

#

Corey schafer has some good tutorials on matplotlib and pandas

#

Just processing a lot of numbers at once

#

like calculating the sum of a vector

#

or adding two vectors together

#

that kinda stuff

#

like an array

#

not immutable

#

but fixed size yes

serene scaffold Feb 1, 2022, 5:03 PM

#

fixed size, not immutable

mild dirge Feb 1, 2022, 5:03 PM

#

and normally also only contains 1 data type

#

whereas lists often can have mixed strings, numbers, etc.

serene scaffold Feb 1, 2022, 5:04 PM

#

also, arrays are intended to make it easy to express "larger" math operations. for example

#

!e

import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
result = a + b
print(result)

arctic wedgeBOT Feb 1, 2022, 5:05 PM

#

@serene scaffold :white_check_mark: Your eval job has completed with return code 0.

[ 6  8 10 12]

serene scaffold Feb 1, 2022, 5:05 PM

#

in "regular python", you would have to do something like [ax + bx for ax, bx in zip(a, b)]

#

and then if it were a 2d array, it would have to be like that, but nested.
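
As a small sketch of the 2D case described above (the values are made up for illustration), the plain-Python version needs one comprehension per dimension, while the NumPy expression stays the same:

```python
import numpy as np

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

# Plain Python: a nested comprehension, one level per dimension.
nested = [
    [ax + bx for ax, bx in zip(row_a, row_b)]
    for row_a, row_b in zip(a, b)
]

# NumPy: the same element-wise addition, whatever the number of dimensions.
vectorized = np.array(a) + np.array(b)

print(nested)
print(vectorized.tolist())
```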

dusk tide Feb 1, 2022, 5:06 PM

#

Which book would be better to learn data science from scratch ??

serene scaffold Feb 1, 2022, 5:10 PM

#

np.array returns an array, and the __add__ method of it does element-wise addition, returning a new array

cloud lagoon Feb 1, 2022, 5:15 PM

#

can you use scrapy when you need to interact with the webpage to get the data you need?

I know you can use selenium in that case, but since selenium is way slower I was wondering if there is a workaround using scrapy

#

say for example, you need to input txt into a text field to get the data you need

#

can you use scrapy for that?

serene scaffold Feb 1, 2022, 5:27 PM

#

@lapis sequoia making more sense?

uneven parrot Feb 1, 2022, 5:30 PM

#

hey guys can someone try to help me in help-bread?

#

#help-bread message

eager imp Feb 1, 2022, 5:39 PM

#

I'd like to combine GP with DL to identify regions in the AST that should be mutated more than others, any idea which model could work for that?

nova smelt Feb 1, 2022, 6:34 PM

#

Hey, i have a LSTM NN with 11 input features. Is there an easy way to find out what features contribute the most to the output of the NN?

mild dirge Feb 1, 2022, 8:04 PM

#

https://datascience.stackexchange.com/questions/44644/how-to-determine-feature-importance-in-a-neural-network

Data Science Stack Exchange

How to determine feature importance in a neural network?

I have a neural network to solve a time series forecasting problem. It is a sequence-to-sequence neural network and currently it is trained on samples each with ten features. The performance of the

hazy lotus Feb 1, 2022, 8:14 PM

#

hey is this the right channel to ask a matplotlib specific question or is general better?

serene scaffold Feb 1, 2022, 8:16 PM

#

hazy lotus hey is this the right channel to ask a matplotlib specific question or is genera...

this is the channel for matplotlib questions; questions in #python-discussion are usually not answered, since the channel moves so quickly.

hazy lotus Feb 1, 2022, 8:24 PM

#

great thank you!

#

I'm trying to decorate the Figure class's add_subplot method.

#

this is what I have:

class ChildFigure(Figure):
    def __init__(self):
        self = figure()   
    def add_subplot(self, *args, **kwargs):
        ax = super().add_subplot(*args, **kwargs)
        # do something 
        return ax

When I call the method I get an "'ChildFigure' object has no attribute 'transSubfigure'" exception

#

Where am i going wrong?

serene scaffold Feb 1, 2022, 8:28 PM

#

sounds like the exception happens "further down"; try sharing the whole traceback

#

!traceback

arctic wedgeBOT Feb 1, 2022, 8:29 PM

#

Please provide the full traceback for your exception in order to help us identify your issue.

A full traceback could look like:

Traceback (most recent call last):
    File "tiny", line 3, in
        do_something()
    File "tiny", line 2, in do_something
        a = 6 / b
ZeroDivisionError: division by zero

The best way to read your traceback is bottom to top.

• Identify the exception raised (in this case ZeroDivisionError)
• Make note of the line number (in this case 2), and navigate there in your program.
• Try to understand why the error occurred (in this case because b is 0).

To read more about exceptions and errors, please refer to the PyDis Wiki or the official Python tutorial.

hazy lotus Feb 1, 2022, 8:34 PM

#

hm okay let me figure out how to do that

#

in the meantime this is my full code:

class ChildFigure(Figure):
    def __init__(self):
        self = figure()   
    def add_subplot(self, *args, **kwargs):
        ax = super().add_subplot(*args, **kwargs)
        # do something 
        return ax

# Data for plotting
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)
fig = ChildFigure()
ax = fig.add_subplot(111)
ax.plot(t, s)
show()

#

it works when i substitute ChildFigure() for figure(). trying the traceback now,

#

  File "c:\python\examples\ChildFigure.py", line 22, in <module>
    ax = fig.add_subplot(111)
  File "c:\python\examples\ChildFigure.py", line 13, in add_subplot
    ax = super().add_subplot(*args, **kwargs)
  File "C:\python\anaconda\lib\site-packages\matplotlib\figure.py", line 772, in add_subplot
    ax = subplot_class_factory(projection_class)(self, *args, **pkw)
  File "C:\python\anaconda\lib\site-packages\matplotlib\axes\_subplots.py", line 34, in __init__
    self._axes_class.__init__(self, fig, [0, 0, 1, 1], **kwargs)
  File "C:\python\anaconda\lib\site-packages\matplotlib\_api\deprecation.py", line 456, in wrapper
    return func(*args, **kwargs)
  File "C:\python\anaconda\lib\site-packages\matplotlib\axes\_base.py", line 615, in __init__
    self.set_figure(fig)
  File "C:\python\anaconda\lib\site-packages\matplotlib\axes\_base.py", line 756, in set_figure
    fig.transSubfigure)
AttributeError: 'ChildFigure' object has no attribute 'transSubfigure'

#

is my problem the self = figure() in the constructor?

mild dirge Feb 1, 2022, 8:51 PM

#

You want the child to behave just like a regular matplotlib figure right?

hazy lotus Feb 1, 2022, 8:51 PM

#

yeah

mild dirge Feb 1, 2022, 8:51 PM

#

So why not just do

class ChildFigure(plt.Figure):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

#

Would be my first instinct, not really comfortable with this, but since no-one else is replying

#

This also gives problems further down the line though

hazy lotus Feb 1, 2022, 8:55 PM

#

yeah i tried that at first but the figure doesn't render. i think figure() does some stuff behind the scenes, adding it to a figure manager.

mild dirge Feb 1, 2022, 8:55 PM

#

Yeah, so when setting the current figure to your child class instance you can do plt.figure(fig)

#

but that gives the following error

#

Traceback (most recent call last):
  File "C:\Users\Jory\PycharmProjects\PathOfExile\tool.py", line 19, in <module>
    plt.figure(fig)
  File "C:\Users\Jory\PycharmProjects\PathOfExile\venv\lib\site-packages\matplotlib\pyplot.py", line 753, in figure
    raise ValueError("The passed figure is not managed by pyplot")
ValueError: The passed figure is not managed by pyplot

#

apparently you can do something like:

#

plt.figure(FigureClass=ChildFigure)

#

But this gives me an empty plot

#

import matplotlib.pyplot as plt
import numpy as np

class ChildFigure(plt.Figure):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def add_subplot(self, *args, **kwargs):
        ax = super().add_subplot(*args, **kwargs)
        # do something
        return ax

# Data for plotting
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)
fig = ChildFigure()
ax = fig.add_subplot(111)
ax.plot(t, s)
plt.figure(FigureClass=ChildFigure)
plt.show()

hazy lotus Feb 1, 2022, 8:57 PM

#

ah yeah i was wrestling with that error too.

#

oh you may be on to something with the FigureClass arg.

mild dirge Feb 1, 2022, 8:57 PM

#

mild dirge ```py import matplotlib.pyplot as plt import numpy as np class ChildFigure(plt....

This actually shows something, but an empty plot :/

#

https://matplotlib.org/devdocs/gallery/subplots_axes_and_figures/custom_figure_class.html

#

They actually got documentation on sub-classing figure ^^

#

maybe that link can give ya a push in the right direction, I don't think i'm of much help here

hazy lotus Feb 1, 2022, 9:01 PM

#

awesome thank you very much!

hazy lotus Feb 1, 2022, 9:13 PM

#

mild dirge https://matplotlib.org/devdocs/gallery/subplots_axes_and_figures/custom_figure_c...

this worked!

class ChildFigure(plt.Figure):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    @staticmethod
    def create():
        return figure(FigureClass=ChildFigure)
    def add_subplot(self, *args, **kwargs):
        ax = super().add_subplot(*args, **kwargs)
        print("in subplot\n")
        return ax

now i can create the figure beforehand with fig = ChildFigure.create()

mild dirge Feb 1, 2022, 9:13 PM

#

Awesome!

hazy lotus Feb 1, 2022, 9:14 PM

#

weirdly, no plot shows up when i put a breakpoint in the child's add_subplot method but its getting called correctly. must be some threading thing with the debugger. thank you very much for your help.

nova smelt Feb 1, 2022, 9:20 PM

#

mild dirge https://datascience.stackexchange.com/questions/44644/how-to-determine-feature-i...

uh awesome, these are some interesting ideas to solve this, thanks a lot!

limber galleon Feb 1, 2022, 9:21 PM

#

HI

#

HRU

#

HI

#

HI

#

H

#

I

mild dirge Feb 1, 2022, 9:22 PM

#

limber galleon HI

Don't spam to get verified

limber galleon Feb 1, 2022, 9:23 PM

#

mild dirge Don't spam to get verified

SORRY MY SMALL BROTHER DID THAT

serene scaffold Feb 1, 2022, 9:25 PM

#

limber galleon SORRY MY SMALL BROTHER DID THAT

what are you gonna do about it

limber galleon Feb 1, 2022, 9:26 PM

#

serene scaffold what are you gonna do about it

CAN U STOP AND MIND UR OWN BUISINESS I THINK SO ITS BETTER

modern cypress Feb 1, 2022, 9:37 PM

#

what is the significance of the number of epochs? I am unsure how to select and optimize my model

serene scaffold Feb 1, 2022, 10:07 PM

#

modern cypress what is the significance of the number of epochs? I am unsure how to select and ...

when you ask "what is the significance", are you asking what an epoch is?

modern cypress Feb 1, 2022, 10:20 PM

#

I know it is like a "cycle" almost, but I'm not sure how to determine how many cycles my model needs

serene scaffold Feb 1, 2022, 10:22 PM

#

modern cypress I know it is like a "cycle" almost, but I'm not sure how to determine how many c...

you can't really know in advance

swift basin Feb 1, 2022, 10:42 PM

#

modern cypress I know it is like a "cycle" almost, but I'm not sure how to determine how many c...

Basically you divide the training dataset into "batches" or groups of training examples based on the batch size (if batch size = 1,000 in a 100,000 sample dataset, you get 100,000/1,000 = 100 batches), and if you set the number of epochs to 1, you're saying you want to go through the entire dataset once, which means 100 iterations (one for each batch). For 3 epochs, it'd be 3 times 100 = 300.

How many iterations do you need? You can't really know in advance, but there's a trade-off: the amount of time and computing resources you have vs. the possibility of getting better results if you add more epochs. So you have to decide on a case-by-case basis.
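
The arithmetic above can be written as a small helper. This is only a sketch; the function name `total_iterations` is made up, and it assumes the last partial batch (if any) still counts as one iteration.

```python
import math


def total_iterations(n_samples: int, batch_size: int, epochs: int) -> int:
    """Number of weight-update steps for a given dataset, batch size and epoch count."""
    # A final, smaller batch still counts as one iteration.
    batches_per_epoch = math.ceil(n_samples / batch_size)
    return batches_per_epoch * epochs


print(total_iterations(100_000, 1_000, epochs=1))  # one pass over the data
print(total_iterations(100_000, 1_000, epochs=3))  # three passes
```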

mild dirge Feb 1, 2022, 10:43 PM

#

You can also risk over-fitting if you use too many epochs

swift basin Feb 1, 2022, 10:44 PM

#

mild dirge You can also risk over-fitting if you use too many epochs

true, but there's tools like early stopping to handle that

#

I would say a good strategy is going for 1 epoch, seeing how long it takes to run, and based on that, choosing the number of epochs, early stopping and so on. So if it took 20 min for 1 epoch, maybe I'd use 10 epochs if I'm only willing to wait 20 min * 10 epochs = 200 min = a bit over 3 hours.
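
Early stopping itself can be sketched in a few lines of plain Python. This is a simplified illustration, not any particular library's implementation: the function name, the `patience` parameter and the validation losses are all made up. Training stops once the validation loss has not improved for `patience` consecutive epochs.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Never triggered: training ran for all available epochs.
    return len(val_losses)


# Hypothetical validation losses: they improve for 3 epochs, then get worse.
losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
stop_epoch = early_stopping_epoch(losses, patience=2)
print(stop_epoch)
```

With a setup like this, the chosen number of epochs acts as an upper bound on the time budget rather than a value that has to be guessed exactly.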

lapis sequoia Feb 1, 2022, 11:30 PM

#

Log messages are very easy to generate since each is just a string, a blob of JSON, or typed key-value pairs. Event logs provide valuable insight, along with context providing detail that averages and percentiles don't surface.
I don't understand last part "along with context providing detail that averages and percentiles don't surface" - can someone explain that part?

modern cypress Feb 1, 2022, 11:38 PM

#

swift basin I would say a good strategy is going for 1 epoch, seeing how long it takes to ru...

Ahhhhh okay, thank you all for your response

wispy trout Feb 2, 2022, 12:01 AM

#

http://thispersondoesnotexist.com

This Person Does Not Exist

#

Ai creates fake people

short haven Feb 2, 2022, 12:49 AM

#

How do I get access to tf.compat.v1 in tensorflow? import statements don't allow me to access it

#

Using tensorflow 2.3.0

mild dirge Feb 2, 2022, 12:52 AM

#

wispy trout http://thispersondoesnotexist.com

:/

#

Thats the stuff of nightmares

safe elk Feb 2, 2022, 12:52 AM

#

wispy trout Ai creates fake people

Perfect lol the fake AI GF for valentines

plush jungle Feb 2, 2022, 1:01 AM

#

I have this RNN in pytorch

#

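import torch
import torch.nn as nn

class MyRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MyRNN, self).__init__()
        self.hidden_size = hidden_size
        # Both layers take the input concatenated with the previous hidden state.
        self.in2hidden = nn.Linear(input_size + hidden_size, hidden_size)
        self.in2output = nn.Linear(input_size + hidden_size, output_size)

    def forward(self, x, hidden_state):
        # Join the input and the previous hidden state along the feature dimension.
        combined = torch.cat((x, hidden_state), 1)
        # New hidden state, squashed into (0, 1) by the sigmoid.
        hidden = torch.sigmoid(self.in2hidden(combined))
        output = self.in2output(combined)
        return output, hidden

    def init_hidden(self):
        # Initial hidden state of shape (1, hidden_size).
        return nn.init.kaiming_uniform_(torch.empty(1, self.hidden_size))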

#

and i'm really confused about how the hidden state works

#

the internet says that kaiming_uniform is basically random, except that the random values are chosen to best reduce the vanishing gradient problem

#

but then I'm really confused about this line

#

hidden = torch.sigmoid(self.in2hidden(combined))

#

it passes the hidden state vector to a neural net layer and then puts the output through sigmoid?

arctic wedgeBOT Feb 2, 2022, 1:42 AM

#

Hey @lapis sequoia!

It looks like you tried to attach a Python file - please use a code-pasting service such as https://paste.pythondiscord.com

lapis sequoia Feb 2, 2022, 1:45 AM

#

hist,_,_ = np.histogram2d(x, y, bins=(1920,1080)) 
dpi = 300
fig = plt.figure(figsize=(1920/dpi, 1080/dpi), dpi=dpi)
plt.imshow(hist.T,cmap='hot')
plt.savefig("graph.png")

#

#

How would I need to call the histogram function if I want to start the X,Y coordinates on the top-left? My input coordinates also behave like that

pastel valley Feb 2, 2022, 4:19 AM

#

yo what this means?

#

do i need to set up something first before using tensorflow?

#

i dont have dedicated gpu but that message is ok even i ignore it?

lapis sequoia Feb 2, 2022, 5:13 AM

#

pastel valley yo what this means?

i think you can ignore it

lapis sequoia Feb 2, 2022, 7:39 AM

#

I have a dataframe which consists of numerical and categorical columns along with a date (yyyy-mm-dd)

Date       | col_1 | col_2 | col_3
2021-01-01 | 100   | 0     | 200
2021-02-01 | 120   | 0     | 300
2021-03-01 | 140   | 1     | 300
2021-04-01 | 150   | 0     | 300
2021-05-01 | 160   | 1     | 300
2021-06-01 | 180   | 0     | 300
2021-07-01 | 120   | 1     | 300

Now I need to pass a dictionary which has the percentage change for col_1 and col_3, and for col_2 it passes the categorical responses

dicct = {"col_1" : 10, "col_2": [0,1,0,0,1,1,0], "col_3":-5}

So this is my input now it should change my above df in such a way:

My col_1 increased by 10%
My col_2 is replaced by the list in dictionary
My col_3 is decreased by 5%
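
One possible way to do this with pandas is sketched below. The helper name `apply_changes` is made up, and the sketch assumes that a list in the dictionary means "replace the column" while a number means "change by that percentage".

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01",
        "2021-05-01", "2021-06-01", "2021-07-01",
    ]),
    "col_1": [100, 120, 140, 150, 160, 180, 120],
    "col_2": [0, 0, 1, 0, 1, 0, 1],
    "col_3": [200, 300, 300, 300, 300, 300, 300],
})

dicct = {"col_1": 10, "col_2": [0, 1, 0, 0, 1, 1, 0], "col_3": -5}


def apply_changes(frame, changes):
    """Return a copy of frame with the changes applied column by column."""
    out = frame.copy()
    for column, change in changes.items():
        if isinstance(change, list):
            # Assumption: a list replaces the whole (categorical) column.
            out[column] = change
        else:
            # Assumption: a number is a percentage change, e.g. 10 means +10%.
            out[column] = out[column] * (1 + change / 100)
    return out


result = apply_changes(df, dicct)
print(result)
```

The original `df` is left untouched because the helper works on a copy.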

wicked grove Feb 2, 2022, 8:27 AM

#

Hello, i just wanted to know how many images a test set and validation set should contain

#

If my total dataset had 3k images

spare junco Feb 2, 2022, 8:30 AM

#

70/20/10 should be cool i think @wicked grove

#

70: train
20: val

#

10: test
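
For the 3k-image dataset mentioned above, the split sizes can be worked out with a small helper. This is just a sketch; the function name `split_sizes` is made up, and the test set receives whatever is left after rounding down.

```python
def split_sizes(n_total, percents=(70, 20, 10)):
    """Return (train, val, test) sizes for integer percentages summing to 100."""
    train_pct, val_pct, _ = percents
    n_train = n_total * train_pct // 100
    n_val = n_total * val_pct // 100
    # The test set takes the remainder so the three sizes always add up.
    n_test = n_total - n_train - n_val
    return n_train, n_val, n_test


print(split_sizes(3000))                # 70/20/10
print(split_sizes(3000, (60, 20, 20)))  # 60/20/20
```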

last echo Feb 2, 2022, 9:12 AM

#

does this suffer from overfitting or underfitting?

wicked grove Feb 2, 2022, 9:16 AM

#

spare junco 10: test

If there are 300 images in test set and 100 images for each class is that sufficient??

lapis sequoia Feb 2, 2022, 9:45 AM

#

Hey guys i want to make an AI which can create a sentence for me.
I want to use it for a thing in a game.

So what i want to do is that i want to turn sentences like this:

selling m5 100K
selling house 232 2M

into this:

Im selling a "BMW M5". Price: $100.000
Im selling the House Nr.232. Price: $2.000.000

How can i start the project? Does someone have tips and Ideas for me how i could create something like this?

PS: You can ping me when u answer to this.

spare junco Feb 2, 2022, 10:20 AM

#

wicked grove If there are 300 images in test set and 100 images for each class is that suffic...

Yeah, at least i would do that. The test set is just to test how your model is performing, right?

#

70/20/10 is the most commonly used division proportions

#

you can do 60/20/20 also if you want

#

or 65/20/15

#

thats your choice

spare junco Feb 2, 2022, 10:23 AM

#

lapis sequoia Hey guys i want to make an AI which can create a sentence for me. I want to use ...

I think you should use NLP (natural language processing) for this cuz it includes language processing

lapis sequoia Feb 2, 2022, 10:42 AM

#

spare junco I think you should use NLP (natural language processing) for this cuz it includ...

Okay.
So how would i start there? I never did anything with AI.
Is it possible to do it for german?

spare junco Feb 2, 2022, 11:23 AM

#

lapis sequoia Okay. So how would i start there? I never did anything with AI. Is it possible t...

In order to learn NLP, you should first start with just machine learning; there must be many tutorials on youtube in German. Then you should learn neural networks, and then you can properly understand NLP. i haven't learnt NLP yet so i can't tell you in detail, but this is what i have gathered from youtube videos

lapis sequoia Feb 2, 2022, 11:25 AM

#

spare junco In order to learn NLP first, you should start learning just machine learning, th...

Hm okay.
Actually i don't want to learn all those things (even tho they are interesting) to do this.
Still thanks for your help.

wicked grove Feb 2, 2022, 11:28 AM

#

spare junco Yea, atleast i would do that. test set is just to test right, How your model is ...

Okayy thank youu:)) i thought 85/15 ...and im using kfold for validation

spare junco Feb 2, 2022, 11:36 AM

#

wicked grove Okayy thank youu:)) i thought 85/15 ...and im using kfold for validation

Cool

wicked grove Feb 2, 2022, 12:22 PM

#

spare junco Cool

Does the validation and test have to be of similar size?

mild dirge Feb 2, 2022, 12:35 PM

#

the greater the size, the better you can estimate the performance of your model

#

But there's no real benefit to having the exact same size no

#

There even exists something called leave-one-out cross validation where you train the model on all the data but 1 data point, and then test if the model guesses that data point correctly. And then you average that over n folds, where n is the amount of data points.

#

This is actually one of the more accurate cross validation methods, but takes a long time as you'd have to retrain for every data point
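
Leave-one-out cross validation can be sketched in plain Python. Everything here is made up for illustration: the data, the function names, and the choice of a 1-nearest-neighbour classifier, which was picked only because "training" it is just storing the remaining points.

```python
def nearest_neighbor_predict(train, x):
    """Predict the label of x from the closest training point (1-NN)."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def leave_one_out_accuracy(data):
    """Hold out each point once, fit on the rest, and average the hits."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # every point except point i
        if nearest_neighbor_predict(train, x) == label:
            correct += 1
    return correct / len(data)


# Hypothetical 1-D dataset of (feature, label) pairs.
data = [(1.0, "a"), (1.2, "a"), (0.8, "a"), (5.0, "b"), (5.3, "b"), (2.9, "b")]
accuracy = leave_one_out_accuracy(data)
print(f"{accuracy:.3f}")
```

With n data points the model is fitted n times, which is why this becomes slow for models that are expensive to train.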

#

@wicked grove

sharp reef Feb 2, 2022, 12:54 PM

#

So I found that for my matrix (np.tril(m).sum() + np.triu(m, 1).sum() + np.diag(m).sum()) (sum of upper and lower triangles and the diagonal) is slightly larger than m.sum() - how could that even be possible?

fleet wedge Feb 2, 2022, 12:56 PM

#

Hi. Have you ever used Chatterbot?

#

Is there any way, to change his language, to, for example, polish, korean?

#

Or I should use other libraries and translate text from polish to english and from english to polish?

mild dirge Feb 2, 2022, 1:01 PM

#

sharp reef So I found that for my matrix `(np.tril(m).sum() + np.triu(m, 1).sum() + np.diag...

What is slightly?

sharp reef Feb 2, 2022, 1:02 PM

#

mild dirge What is slightly?

Triangle sum is 22800262 and the full matrix sum is 22800242

mild dirge Feb 2, 2022, 1:02 PM

#

Oh right, np.tril includes the diagonal

#

So it's the sum of diagonal bigger

sharp reef Feb 2, 2022, 1:02 PM

#

the diagonal is 0

mild dirge Feb 2, 2022, 1:03 PM

#

you sure, can you print the sum of diagonal rl quick?

sharp reef Feb 2, 2022, 1:03 PM

#

yeah that's the first thing I checked

#

np.diag(m).sum() == 0

#

also np.tril(m).sum() == np.triu(m, 1).sum() since it's a symmetric matrix

mild dirge Feb 2, 2022, 1:05 PM

#

Did you modify the matrix somewhere in between?

#

Can you maybe just show your code

sharp reef Feb 2, 2022, 1:06 PM

#

No I just print(np.tril(weights).sum() + np.triu(weights, 1).sum(), weights.sum())

#

hmm ok I do one thing at the start which is weights = weights.astype(np.float32)

mild dirge Feb 2, 2022, 1:07 PM

#

not sure how big the matrix is and if it may round in between

#

If there are many many values between 0 and 1 that may be a reason (maybe?)

#

How come it gives an integer result for the sum though?

sharp reef Feb 2, 2022, 1:08 PM

#

I just trimmed the final .0

#

I tried it again without casting and the total sum correctly becomes 22800262, matching the sum of triangles

mild dirge Feb 2, 2022, 1:09 PM

#

ah alright

sharp reef Feb 2, 2022, 1:09 PM

#

So it seems to be related to casting it to float?

#

but only affects the total sum?

#

yeah it seems to be a float32 accuracy issue

#

I haven't seen one like this before that manifested itself in this way

mild dirge Feb 2, 2022, 1:11 PM

#

yeah because the number is big

#

and the float only has 32 bits

#

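floats are less precise further from 0 (float32 has a 24-bit significand, so not every integer above 2^24 = 16,777,216 can be represented exactly), so that can make sense ^^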

#

try 64 bit float maybe?

#

if you need floats that is

sharp reef Feb 2, 2022, 1:12 PM

#

I think I'm most confused about the triangle sums being more accurate

mild dirge Feb 2, 2022, 1:13 PM

#

What are you basing the accuracy on?

sharp reef Feb 2, 2022, 1:13 PM

#

yeah I need to turn this matrix into probabilities

sharp reef Feb 2, 2022, 1:13 PM

#

mild dirge What are you basing the accuracy on?

that it matches the integer sum result

mild dirge Feb 2, 2022, 1:13 PM

#

ah alright

#

yeah if the values actually are integers, might as well not convert to floats 😛

sharp reef Feb 2, 2022, 1:14 PM

#

If I don't numpy will do it as soon as I divide - but yeah I'll make sure to sum first before dividing
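
A small sketch of the float32 limit behind this discussion (the array values are made up for illustration). If the matrix entries are integers, this may also explain why the triangle sums looked more accurate: each triangle sum is about 11.4 million, which is below 2**24 and can be held exactly in float32, while the full sum of about 22.8 million passes through partial sums above 2**24 that have to be rounded.

```python
import numpy as np

limit = 2 ** 24  # 16_777_216: float32 has a 24-bit significand

# Above 2**24, float32 can no longer represent every integer.
print(np.float32(limit + 1) == np.float32(limit))  # the odd value is rounded
print(np.float64(limit + 1) == np.float64(limit))  # float64 still tells them apart

# Keep float32 storage, but accumulate the sum in float64.
weights = np.array([limit, 1, 1], dtype=np.float32)
total = weights.sum(dtype=np.float64)
print(total)

# Normalise to probabilities using the float64 total.
probabilities = weights.astype(np.float64) / total
print(probabilities.sum())
```

The `dtype` argument of `sum` only changes the accumulator, so the array itself can stay in float32.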

spring marsh Feb 2, 2022, 1:26 PM

#

hey can you please help me in #help-cheese

simple ivy Feb 2, 2022, 4:06 PM

#

has anyone ever done object detection with a dataset of videos?

mild dirge Feb 2, 2022, 4:18 PM

#

simple ivy has anyone ever done object detection with a dataset of videos?

What are you having trouble with?

simple ivy Feb 2, 2022, 4:20 PM

#

mild dirge What are you having trouble with?

oh not having trouble yet, just about to start collecting data and not sure where to start

#

i did object detection with images but not sure how it changes for video so wanted to ask here

mild dirge Feb 2, 2022, 4:20 PM

#

Well using videos can have some slight benefits

#

Since you asked that I'm just reading some bits of this https://towardsdatascience.com/ug-vod-the-ultimate-guide-to-video-object-detection-816a76073aef

Medium

The Ultimate Guide to Video Object Detection

Everything up to the year 20/20 (Computer Vision)

#

And you can f.e. use classifications of previous frames to your advantage

#

Obviously you could simply separate the images and have a regular image prediction problem, but having a video could give some extra info

worthy nest Feb 2, 2022, 5:00 PM

#

Hi, nice to meet yall. I want to put a small group together to study machine learning algorithms together. I’m thinking of having an ML journal club and/or discussing various algorithms, from general to deep concepts such as reinforcement learning and evolutionary algorithms. Please dm me if you are interested.

lapis sequoia Feb 2, 2022, 5:18 PM

#

so i am trying to learn machine learning (tensorflow in particular) and the maths involved in it, but i am having some trouble since i am a high schooler. i did consider learning concepts like calculus first, but i don't understand certain calculations in it. what should i do? is there a particular list of topics that i can learn in order to understand all the concepts?

#

Cloud providers, including Google, offer managed services such as Google's Vertex Prediction, that help you perform continuous evaluation of your prediction requests. Continuous evaluation regularly sample's prediction input and output from trained machine learning models that you've deployed to Vertex prediction. Vertex data labeling service then assigns human reviewers to provide ground truth labels to your prediction input, or alternatively, you can provide your own ground truth labels. The data labeling service compares your model's predictions with the ground truth labels to provide continual feedback on how well your model is performing over time.
I am not sure what "Continuous evaluation regularly sample's prediction input and output" means. The word "sample's" confuses me

lapis sequoia Feb 2, 2022, 5:19 PM

#

lapis sequoia so i am trying to learn machine learning (tensorflow in particular) and the math...

(i am aware that i dont really NEED to know the math behind it but kind of want to)

#

i am a beginner at this btw

#

@lapis sequoia You can audit this specialization https://www.coursera.org/specializations/mathematics-machine-learning

Coursera

Mathematics for Machine Learning

Offered by Imperial College London. Mathematics for Machine Learning. Learn about the prerequisite mathematics for applications in data ... Enroll for free.

#

If you want to access the course materials you have to pay for the course or ask for financial aid, which they give in a lot of scenarios

#

The first two courses are very good, but in the third course there are not so many visualisations; it's too theoretical and the professor doesn't explain concepts as clearly as in the first two courses

simple ivy Feb 2, 2022, 5:22 PM

#

mild dirge Since you asked that I'm just reading some bits of this https://towardsdatascien...

this is awesome- thanks so much for sharing!

lapis sequoia Feb 2, 2022, 5:23 PM

#

do i have to sign up for a trial?

prime hearth Feb 2, 2022, 5:23 PM

#

what model are you doing or learning right now

#

like machine learning algorithm

lapis sequoia Feb 2, 2022, 5:24 PM

#

lapis sequoia do i have to sign up for a trial?

@lapis sequoia

lapis sequoia Feb 2, 2022, 5:24 PM

#

prime hearth what model are you doing or learning right now

me?

prime hearth Feb 2, 2022, 5:24 PM

#

yes

lapis sequoia Feb 2, 2022, 5:24 PM

#

neural networks

lapis sequoia Feb 2, 2022, 5:24 PM

#

lapis sequoia do i have to sign up for a trial?

Click "audit", don't sign up for a trial

#

i have learned linear regression and classification

prime hearth Feb 2, 2022, 5:24 PM

#

oh okay good

#

which math in particular

#

that you don't understand for that model

#

is it derivatives?

#

etc

lapis sequoia Feb 2, 2022, 5:25 PM

#

lapis sequoia Click "audit", don't sign up for a trial

i dont see that

worthy nest Feb 2, 2022, 5:25 PM

#

lapis sequoia so i am trying to learn machine learning (tensorflow in particular) and the math...

Hm... I think convolution is the core concept of any ML utilizing neural network...

lapis sequoia Feb 2, 2022, 5:25 PM

#

prime hearth is it derivatives?

integrals in calculus

fleet wedge Feb 2, 2022, 5:26 PM

#

Do you have any tutorials or websites, books for neural network?

#

for beginners

lapis sequoia Feb 2, 2022, 5:27 PM

#

lapis sequoia i dont see that

prime hearth Feb 2, 2022, 5:27 PM

#

oh okay, hmm, integrals are not needed though to implement a NN from scratch

#

or with any libraries

lapis sequoia Feb 2, 2022, 5:27 PM

#

lapis sequoia

prime hearth Feb 2, 2022, 5:27 PM

#

only if you want to go deeper into NN

lapis sequoia Feb 2, 2022, 5:27 PM

#

fleet wedge Do you have any tutorials or websites, books for neural network?

not really i can search for them on the internet though

lapis sequoia Feb 2, 2022, 5:28 PM

#

prime hearth oh okay, hmm, integrals are not needed though to implement a NN from scratch

i am talking about calculus

prime hearth Feb 2, 2022, 5:28 PM

#

the only calculus really needed to understand NN for interviews is what a partial derivative is; there are many youtube videos that show this

fleet wedge Feb 2, 2022, 5:29 PM

#

No need. I was thinking of tried and tested ones

prime hearth Feb 2, 2022, 5:29 PM

#

and little about maximum and minimum

worthy nest Feb 2, 2022, 5:29 PM

#

Convolution...

prime hearth Feb 2, 2022, 5:29 PM

#

other than that, there really is no more calculus you need. it's a huge field, and it will take you a while before you can understand everything about calculus and its connections to ML

#

only need to know partial derivatives for NN

lapis sequoia Feb 2, 2022, 5:30 PM

#

worthy nest Convolution...

i dont think its convolution nn i tried searching for it and i have heard it be mentioned but i dont think thats what i am trying to learn

prime hearth Feb 2, 2022, 5:30 PM

#

convolution is more complex

#

focus on NN for now

worthy nest Feb 2, 2022, 5:30 PM

#

Look up convolution... basics of anything that applies any kind of filter lol

lapis sequoia Feb 2, 2022, 5:30 PM

#

https://colab.research.google.com/drive/1m2cg3D1x3j5vrFc-Cu0gMvc48gWyCOuG#forceEdit=true&sandboxMode=true&scrollTo=ZFQqW9r-ikJb

Google Colaboratory

#

this is the notebook i am following for

lapis sequoia Feb 2, 2022, 5:31 PM

#

lapis sequoia

Can you try going to any other course and see if there is an "Audit" option, so you can tell whether it's just these courses that can't be audited? If you don't want to pay for these courses, you can ask for financial aid. You need to apply for financial aid for each course, and you will wait 15 days for a response on whether you get it.

lapis sequoia Feb 2, 2022, 5:31 PM

#

lapis sequoia Can you try to go to any other course and see if there is "Audit" option so you ...

ooo

lapis sequoia Feb 2, 2022, 5:31 PM

#

prime hearth and little about maximum and minimum

What maximum and minimum, max and min of what?

lapis sequoia Feb 2, 2022, 5:32 PM

#

prime hearth only need to know partial derivatives for NN

so can i just jump to that concept directly? like are there any tutorials?

prime hearth Feb 2, 2022, 5:33 PM

#

@lapis sequoia in gradient descent this is what the derivative does, and why we use - instead of + for updating weights.
w = w - learning_rate * dw; we use - because the derivative points uphill, so subtracting it moves toward a minimum. In addition, for NNs you will see "adam" as the optimizer; in neural networks the loss can have local minimums as well as a global minimum, which makes it harder to find the best values for our model, while in linear regression there is only a global minimum
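
A minimal sketch of that update rule, assuming a one-weight model y = w * x with a mean squared error loss. The toy data, learning rate, and step count are made up for illustration:

```python
import numpy as np

# toy data generated from y = 3 * x, so the best weight is 3
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 * x

w = 0.0               # starting guess for the weight
learning_rate = 0.01  # assumed step size

for _ in range(500):
    pred = w * x
    # dw is the derivative of mean((pred - y) ** 2) with respect to w
    dw = np.mean(2.0 * (pred - y) * x)
    # subtract: the derivative points uphill, so stepping against it lowers the loss
    w = w - learning_rate * dw

print(round(w, 3))
```

Flipping the minus to a plus makes w move away from 3 on every step, which is the reason for the sign.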

lapis sequoia Feb 2, 2022, 5:33 PM

#

lapis sequoia ooo

lol, what you mean by "ooo"? Did you check for other courses whether you can audit them?

prime hearth Feb 2, 2022, 5:33 PM

#

local mins and maxes are just what the first derivative finds; once you understand why we take derivatives, that's really all that's needed for ML neural networks, plus understanding what each hidden layer or activation function does

lapis sequoia Feb 2, 2022, 5:34 PM

#

lapis sequoia lol, what you mean by "ooo"? Did you check for other courses whether you can aud...

i dont ig i have to apply for financial aid

lapis sequoia Feb 2, 2022, 5:34 PM

#

lapis sequoia i dont ig i have to apply for financial aid

ig?

lapis sequoia Feb 2, 2022, 5:34 PM

#

lapis sequoia ig?

i guess

prime hearth Feb 2, 2022, 5:34 PM

#

@lapis sequoia @lapis sequoia if learning NN, i recommend that you also try other sources like youtube video guides as they will show visuals that will help you understand the google notebook you are using to learn NN

#

but there is no need to learn integrals

#

if you do it will be more confusing

lapis sequoia Feb 2, 2022, 5:35 PM

#

I am also interested about learning how partial derivation is done in neural networks. Can you guys provide any tutorial(s)?

prime hearth Feb 2, 2022, 5:35 PM

#

youtube i found is really great

#

theyre very beginner friendly

lapis sequoia Feb 2, 2022, 5:35 PM

#

Do you have any specific videos for this topic?

mild dirge Feb 2, 2022, 5:35 PM

#

3 blue 1 brown is generally great

#

he has a series on NNs

lapis sequoia Feb 2, 2022, 5:36 PM

#

I see

mild dirge Feb 2, 2022, 5:36 PM

#

also on linear algebra in general

lapis sequoia Feb 2, 2022, 5:36 PM

#

prime hearth youtube i found is really great

https://www.youtube.com/watch?v=AXqhWeUEtQU

YouTube

Khan Academy

Partial derivatives, introduction

Partial derivatives tell you how a multivariable function changes as you tweak just one of the variables in its input.

#

@lapis sequoia Can you audit other courses?

lapis sequoia Feb 2, 2022, 5:36 PM

#

lapis sequoia https://www.youtube.com/watch?v=AXqhWeUEtQU

is this a good tutorial to follow along with?

lapis sequoia Feb 2, 2022, 5:36 PM

#

lapis sequoia @lapis sequoia Can you audit other courses?

no i cant

prime hearth Feb 2, 2022, 5:36 PM

#

many found that to be great, i personally dont use khan academy because they dont show practical examples which helps me learn more

mild dirge Feb 2, 2022, 5:36 PM

#

3b1b normally puts a lot of effort into visualizing and explaining intuition, khan academy normally runs through an example problem in their videos

lapis sequoia Feb 2, 2022, 5:36 PM

#

lapis sequoia no i cant

Weird, so if you go to other courses there is no "Audit"?

lapis sequoia Feb 2, 2022, 5:37 PM

#

lapis sequoia Weird, so if you go to other courses there is no "Audit"?

nope

#

@lapis sequoia You could ask Coursera helpdesk about that or google about your problem

#

Wait, are you logged in to Coursera? @lapis sequoia

lapis sequoia Feb 2, 2022, 5:37 PM

#

lapis sequoia Wait, are you logged in to Coursera? @lapis sequoia

yes

lapis sequoia Feb 2, 2022, 5:38 PM

#

lapis sequoia yes

Weird

#

if i could just learn partial derivatives it would probably be easier

lapis sequoia Feb 2, 2022, 5:38 PM

#

prime hearth many found that to be great, i personally dont use khan academy because they don...

are there any video specific recommendations i could watch?

#

something that maybe is beginner friendly

#

@lapis sequoia Or others, if you find any good video about this topic (how partial derivation is done in neural networks) please send me DM or tag me

lapis sequoia Feb 2, 2022, 5:40 PM

#

lapis sequoia @lapis sequoia Or others, if you find any good video about this topic (ho...

alright i will

jaunty cove Feb 2, 2022, 5:40 PM

#

Hello all, I am having an issue with binning in Python using pd.cut()
I am able to apply the pd.cut() bins to my train and test data BUT because it is automatic, when I try to apply it to new data for predictions (after my model is complete) it gives different bins and is not recognized by the model.
Does anyone know of an easy (faster than manually binning) way to apply the same bins to my new data set?
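
One possible approach, sketched under assumptions: `pd.cut` accepts `retbins=True`, which returns the bin edges it computed, and those edges can then be passed back in as `bins` for any later data. The column name `age` and both data frames are made up for illustration:

```python
import pandas as pd

train = pd.DataFrame({"age": [18, 25, 33, 47, 52, 68]})  # made-up training data
new = pd.DataFrame({"age": [21, 40, 66]})                # made-up data to predict on

# retbins=True also returns the bin edges that pd.cut computed automatically
train["age_bin"], edges = pd.cut(train["age"], bins=3, retbins=True)

# passing those edges as `bins` reuses exactly the same intervals
new["age_bin"] = pd.cut(new["age"], bins=edges)

print(edges)
print(new)
```

Values in the new data that fall outside the original edges come back as NaN, so it may be worth saving the edges alongside the model and deciding up front how to handle out-of-range values.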

lapis sequoia Feb 2, 2022, 5:48 PM

#

Now that you've detected drift, what can you do about it? Well, first, try to determine which data in your previous training data set is still valid by using unsupervised methods, such as clustering or statistical methods that look at divergence.
I am interested in how clustering can help here. Can somebody provide an example? Or how can statistical methods that look at divergence help?

#

Also, what does "divergence" mean in this context?

#

If you know about something that I asked about, please use reply so I get notified

lapis sequoia Feb 2, 2022, 6:14 PM

#

The way actual users experience your system is essential to assessing the true impact of its predictions, recommendations and decisions. For example, you should design your features with appropriate disclosures, built in clarity and control is crucial to a good user experience.
"you should design your features with appropriate disclosures," what does that mean?

prime hearth Feb 2, 2022, 6:15 PM

#

I'm not sure, as i need to see what site you are using or what the context is

soft viper Feb 2, 2022, 7:27 PM

#

what do you actually compare between clustering algorithms? Other than run time

prime hearth Feb 2, 2022, 7:49 PM

#

Performance; accuracy of model

dim aspen Feb 2, 2022, 8:16 PM

#

I'm doing a project where I do autonomous navigation using lidar camera fusion. Is there a single algorithm or library or framework I can use to do navigation and sensor fusion? I'm thinking of something like SLAM where it combined lidar and camera data to navigate. I need it to be able to run on a small device, so not too computationally heavy

sour spindle Feb 2, 2022, 8:21 PM

#

i made a model that generated its own trading strategy that beat the tesla return on investment by 74x. I would like to see your thoughts on it since I am going to make a free to use website using these generated strategies.

TSLA_strategy_generator_roi_1.7milv22k.png

agile cobalt Feb 2, 2022, 8:27 PM

#

sour spindle i made a model generated its own trading strategy that beat the tesla return on ...

you might want to add some disclaimers such as "this is not financial advice" or at least "I am not liable" if you share it
and make sure that your testing methods are reliable, not just "train and test on 100% of the data"

desert oar Feb 2, 2022, 8:35 PM

#

individual data frame operations are pretty highly optimized. the problem is that sequential data frame operations are not optimized at all, because there is no "compilation" step. each operation has to be a self-contained operation. frameworks like dask and pyspark sidestep this problem because applying a function to a dask dataframe does not actually immediately apply the function. it just builds up a "graph" of computations yet-to-be-done, and doesn't actually execute them until you request execution. so then the framework can inspect the full graph of operations and apply optimizations thereto.

desert oar Feb 2, 2022, 9:48 PM

#

sour spindle i made a model generated its own trading strategy that beat the tesla return on ...

if something is too good to be true, it probably is. i am not a finance person, but that seems like something you might want to backtest on more than just 1 ticker

#

also you need to be very careful of data leakage when evaluating "sequential" predictive models

#

consider that 10 1-step-ahead forecasts is often very very different from a 10-step-ahead forecast
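
A rough illustration of that difference, assuming a simulated AR(1) series whose coefficient is treated as known (the coefficient, series length, and seed are arbitrary choices, not anything from the model being discussed):

```python
import numpy as np

rng = np.random.default_rng(0)
phi = 0.9  # assumed AR(1) coefficient, treated as known
n = 500

# simulate x[t] = phi * x[t-1] + noise
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()

h = 10

# one-step-ahead: every forecast gets to see the true previous value
one_step = phi * x[:-1]
one_step_rmse = np.sqrt(np.mean((x[1:] - one_step) ** 2))

# h-step-ahead: forecast x[t+h] from x[t] alone, with no observations in between
multi_step = phi ** h * x[:-h]
multi_step_rmse = np.sqrt(np.mean((x[h:] - multi_step) ** 2))

print(one_step_rmse, multi_step_rmse)
```

The one-step error stays near the noise level because each forecast is corrected by fresh data, while the 10-step error accumulates ten steps of noise. A backtest that silently uses the first kind of forecast can look far better than what a strategy would see live.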

mild dirge Feb 2, 2022, 10:37 PM

#

Also the stock went from 5 to 1200 😛

#

But yeah, really agree with the too good to be true argument

uncut orbit Feb 3, 2022, 12:27 AM

#

i dont understand how standard scaler works

prime hearth Feb 3, 2022, 12:42 AM

#

standard scaler tries to give mean of 0 and standard deviation of 1. Overall standard scaler often performs better than normalization, but we use normalization for images (like with CNNs) or anything that needs values between 0 and 1; for regression we use standard scaler
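
A small sketch of the arithmetic behind that, done by hand with NumPy on a made-up feature column. scikit-learn's `StandardScaler` applies the same formula per column, using the population standard deviation:

```python
import numpy as np

# made-up feature column, e.g. kilometers driven
x = np.array([10_000.0, 25_000.0, 40_000.0, 85_000.0])

mean = x.mean()
std = x.std()  # population standard deviation (ddof=0)

# subtract the mean, then divide by the standard deviation
scaled = (x - mean) / std

print(scaled)
print(scaled.mean(), scaled.std())
```

The mean and standard deviation should be computed on the training data only and then reused to scale any test or new data.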

lapis sequoia Feb 3, 2022, 12:45 AM

#

Yo!

#

can anyone help me get through a project im working on... Im pretty fucked

#

Im working 7 days a week 10 hour days and I have a DS project im working on academically and its WRECKING my shit

serene scaffold Feb 3, 2022, 12:47 AM

#

lapis sequoia can anyone help me get through a project im working on... Im pretty fucked

perhaps? you have to ask your actual question to attract potential answerers.

lapis sequoia Feb 3, 2022, 12:47 AM

#

I have a deadline to produce a predictive model

serene scaffold Feb 3, 2022, 12:48 AM

#

@prime hearth for your reference, editing a message to include a ping does not actually ping the person.

lapis sequoia Feb 3, 2022, 12:48 AM

#

And i have no idea how to construct one. Im just working my ass off and really cannot wrap my mind around this shit in the time constraint i have

#

heres the objective text

#

To explore and visualize the dataset, build a linear regression model to predict the prices of used cars, and generate a set of insights and recommendations that will help the business

prime hearth Feb 3, 2022, 12:49 AM

#

oh thanks, i never knew that; was doing it this whole time

uncut orbit Feb 3, 2022, 12:49 AM

#

prime hearth standard scaler tries to give mean of 0 and standard deviation of 1. Overall sta...

thanks

serene scaffold Feb 3, 2022, 12:49 AM

#

lapis sequoia And i have no idea how to construct one. Im just working my ass off and really c...

this server is not a code-writing service, so if you are not in a position to understand what you've been asked to do, it's not likely that anyone will be able to help you complete the assignment.

lapis sequoia Feb 3, 2022, 12:54 AM

#

I didnt ask for anyone to write my code

#

I asked for help - to make myself clear. Im underarmed for what im assigned. I know this entirely

#

im desperate to apply what i DO know to accomplish what im working on. - if no one can or is willing to help i understand entirely

#

im just trying to make financial ends meet and finish this project

serene scaffold Feb 3, 2022, 12:56 AM

#

I understand. It's unfortunate that many countries don't adequately invest in the education of their citizens.

Do you understand what linear regression is, at a high level?

lapis sequoia Feb 3, 2022, 12:56 AM

#

No i do not. at a high level

#

Im familiar with Pandas, Seaborn, not numpy

serene scaffold Feb 3, 2022, 12:57 AM

#

when talking about "levels" of knowledge, "high" is less.

lapis sequoia Feb 3, 2022, 12:57 AM

#

just trying to get up to speed with what im assigned with my back against the wall

#

it just sucks - i got pressured into this course and Im working my ass off and trying to meet the courses deadlines

#

im already paid into it - following along but Its just kicking my ass - i want to learn but i cant access these resources that they claim to provide

#

I feel scammed paying 4k for a 6 month AIML course just to be in the dark while im trying to pay bills on my other hand you know

#

It seems like an easy enough project but i dont know how to build what theyre asking for

#

either way - if anyone is able to lend a hand i'd be incredibly grateful. Im not confident in my ability to perform where its required at this moment and Ive got family basically telling me im a failure for not being adequate.

#

ill fuck off i sorry yall

serene scaffold Feb 3, 2022, 1:00 AM

#

do you know what a "best fit line" is? (in this case, it doesn't need to be a "straight line", even though a line is necessarily straight by some definitions) @lapis sequoia

lapis sequoia Feb 3, 2022, 1:01 AM

#

Im familiar with it but

#

Im not entirely sure how to even apply it

#

I've got a data set on my hands but ive never built a predictive model or had time due to my work schedule to really attempt it in full.

serene scaffold Feb 3, 2022, 1:02 AM

#

So, you're familiar with it. This is a good start. Linear regression is a process for calculating the best fit line/curve given a bunch of points.

lapis sequoia Feb 3, 2022, 1:03 AM

#

"There is a huge demand for used cars in the Indian Market today. As sales of new cars have slowed down in the recent past, the pre-owned car market has continued to grow over the past years and is larger than the new car market now. Cars4U is a budding tech start-up that aims to find footholes in this market.

In 2018-19, while new car sales were recorded at 3.6 million units, around 4 million second-hand cars were bought and sold. There is a slowdown in new car sales and that could mean that the demand is shifting towards the pre-owned market. In fact, some car sellers replace their old cars with pre-owned cars instead of buying new ones. Unlike new cars, where price and supply are fairly deterministic and managed by OEMs (Original Equipment Manufacturer / except for dealership level discounts which come into play only in the last stage of the customer journey), used cars are very different beasts with huge uncertainty in both pricing and supply. Keeping this in mind, the pricing scheme of these used cars becomes important in order to grow in the market.

As a senior data scientist at Cars4U, you have to come up with a pricing model that can effectively predict the price of used cars and can help the business in devising profitable strategies using differential pricing. For example, if the business knows the market price, it will never sell anything below it.

"

#

this is my context statement

#

I have a dataset to accomodate.

serene scaffold Feb 3, 2022, 1:03 AM

#

lapis sequoia I've got a data set on my hands but ive never built a predictive model or had ti...

I understand that you have a lot on your plate right now. let's just focus on understanding some topics, so that you can continue as best you can.

lapis sequoia Feb 3, 2022, 1:03 AM

#

Objective
To explore and visualize the dataset, build a linear regression model to predict the prices of used cars, and generate a set of insights and recommendations that will help the business.

serene scaffold Feb 3, 2022, 1:03 AM

#

do you know about the scikit-learn library?

lapis sequoia Feb 3, 2022, 1:04 AM

#

i do not

#

im familiar with pandas, seaborn, numpy

serene scaffold Feb 3, 2022, 1:04 AM

#

well, maybe we'll get to that in a bit. in the mean time, let's look at the data. Whatever format it's in, please put some examples in this chat as text (no screenshots).

lapis sequoia Feb 3, 2022, 1:05 AM

#

📎 used_cars_data.csv

#

the bare minimum objective is to "build a linear regression model to predict the prices of used cars, and generate a set of insights and recommendations that will help the business."

serene scaffold Feb 3, 2022, 1:06 AM

#

so you have these: `S.No.,Name,Location,Year,Kilometers_Driven,Fuel_Type,Transmission,Owner_Type,Mileage,Engine,Power,Seats,New_Price,Price`

so you want to be able to calculate what the Price value should be, given all the others.

lapis sequoia Feb 3, 2022, 1:06 AM

#

yeah.

#

"should be" is where im stumped. I have no idea where to start when it comes to predictive analysis

serene scaffold Feb 3, 2022, 1:07 AM

#

well, it might be possible to do it just using the New_Price and Kilometers

#

generally speaking, cars lose value the more you drive them

#

Things like the engine, power, and seats would probably be accounted for in the original price of the car

#

make sense?

mild dirge Feb 3, 2022, 1:08 AM

#

I think since it's also for data analysis, they might want you to use those for the linear regression too

#

to see the impact of those on the price

lapis sequoia Feb 3, 2022, 1:09 AM

#

Im also considering

#

brand names

#

like maruti as opposed to luxury brands like audi , mercedes

serene scaffold Feb 3, 2022, 1:09 AM

#

you'd have to encode the brands in some way, since linear regression deals with numbers
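
One common way to do that is one-hot encoding, sketched here with `pandas.get_dummies`. The rows are made up, and taking the first word of `Name` as the brand is an assumption about how the column is laid out:

```python
import pandas as pd

# made-up rows standing in for the Name column of the real CSV
df = pd.DataFrame({
    "Name": ["Maruti Wagon R", "Audi A4", "Maruti Swift", "Mercedes-Benz C-Class"],
})

# assumption: the brand is the first word of Name
df["Brand"] = df["Name"].str.split().str[0]

# one 0/1 column per brand, which a linear model can use as numeric input
brand_dummies = pd.get_dummies(df["Brand"], prefix="Brand", dtype=int)
df = pd.concat([df, brand_dummies], axis=1)

print(df)
```

Each brand column then gets its own coefficient in the regression, which reads as "how much this brand shifts the price, all else equal".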

lapis sequoia Feb 3, 2022, 1:10 AM

#

I understand the idealogical approach - its the execution and model building where im unknowledgeable.

mild dirge Feb 3, 2022, 1:10 AM

#

You could do it yourself with numpy, there's also libraries that are really easy to use

#

like scikit-learn that serene scaffold recommended

lapis sequoia Feb 3, 2022, 1:10 AM

#

Im unfamiliar with Scikit

#

im sorry for being in such a desperate position

serene scaffold Feb 3, 2022, 1:12 AM

#

you can use this: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html

scikit-learn

sklearn.linear_model.LinearRegression

Examples using sklearn.linear_model.LinearRegression: Principal Component Regression vs Partial Least Squares Regression Principal Component Regression vs Partial Least Squares Regression, Plot ind...

#

the usage might look like this

from sklearn.linear_model import LinearRegression

# fit Price as a linear function of the new price and the kilometers driven
model = LinearRegression()
model.fit(df[['New_Price', 'Kilometers_Driven']], df['Price'])

#

and then you can do model.predict with some array of shape (n, 2) where the first column is the new price and the second column is the number of kilometers driven.
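
A self-contained sketch of the fit-then-predict flow, assuming both feature columns are already numeric. The rows below are made up and only stand in for the real CSV:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# made-up rows standing in for used_cars_data.csv; values are illustrative only
df = pd.DataFrame({
    "New_Price": [5.0, 8.0, 12.0, 20.0, 35.0],
    "Kilometers_Driven": [60_000, 40_000, 30_000, 50_000, 20_000],
    "Price": [2.4, 4.8, 7.8, 11.0, 24.5],
})

features = ["New_Price", "Kilometers_Driven"]
model = LinearRegression()
model.fit(df[features], df["Price"])

# new cars to price: shape (n, 2), same column order as used in fit
new_cars = pd.DataFrame({
    "New_Price": [10.0, 25.0],
    "Kilometers_Driven": [45_000, 15_000],
})
predictions = model.predict(new_cars[features])

print(model.coef_, model.intercept_)
print(predictions)
```

`model.coef_` holds one coefficient per feature, which is a reasonable starting point for the "insights" part of the assignment: its sign and size show how each input moves the predicted price.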

lapis sequoia Feb 3, 2022, 1:14 AM

#

is scikit something i can import from my Anaconda toolkit or am i going to have to download and import from elsewhere

serene scaffold Feb 3, 2022, 1:15 AM

#

lapis sequoia is scikit something i can import from my Anaconda toolkit or am i going to have ...

I don't use anaconda, so idk.

lapis sequoia Feb 3, 2022, 1:15 AM

#

the project is out of a Jupyter notebook.

#

I feel like a fucking idiot whos way in over his head

serene scaffold Feb 3, 2022, 1:16 AM

#

I understand that this is a frustrating, and somewhat embarrassing situation for you. I would just focus on doing what you can.

lapis sequoia Feb 3, 2022, 1:17 AM

#

Im just trying to get scikit into my environment right now

serene scaffold Feb 3, 2022, 1:18 AM

#

it would usually be pip install scikit-learn (the package is imported as sklearn). idk how it works for anaconda.

lapis sequoia Feb 3, 2022, 1:21 AM

#

Just trying to figure it out

#

Cant even get a foothold on a starting point

#

im so fucked im sorry yall.

desert oar Feb 3, 2022, 1:30 AM

#

lapis sequoia Cant even get a foothold on a starting point

back up. what do you know?

#

you said that you have worked with pandas and seaborn. that's fine. but what do you know about modeling, stats, machine learning, etc?

#

it's fine if the answer is "not much", but given that you are taking a course and are expected to complete this assignment, i highly doubt that you know nothing

#

if you list the things that you do know how to do, then we can maybe help you work up towards something achievable with your current state of knowledge

desert oar Feb 3, 2022, 1:31 AM

#

lapis sequoia is scikit something i can import from my Anaconda toolkit or am i going to have ...

in general, python packages need to be installed. you should be able to install it in Anaconda

#

i am very curious what course you took that cost 4k USD, that's quite a lot of money

#

imo if you are paying money like that, you deserve some kind of one-on-one assistance if you are struggling

lapis sequoia Feb 3, 2022, 1:32 AM

#

Its a mygreatlearning course partnered with UT texas mccombs school of business

#

AIML

#

and none of the instructors are responsive

#

at all

desert oar Feb 3, 2022, 1:33 AM

#

i see

#

frankly i'd try to get my money back then

lapis sequoia Feb 3, 2022, 1:33 AM

#

I dont even think I can.

desert oar Feb 3, 2022, 1:33 AM

#

$4k is like what you pay at a big name university for credits

lapis sequoia Feb 3, 2022, 1:33 AM

#

They structure payments where its 1k each

#

and they basically rapid fire them at you before youre even halfway done w the course

desert oar Feb 3, 2022, 1:33 AM

#

i'm sorry to hear that. it sounds like a money mill scam

lapis sequoia Feb 3, 2022, 1:33 AM

#

In an attempt to appease external pressures I signed up for the course but

#

due to work im just behind

desert oar Feb 3, 2022, 1:34 AM

#

external pressures from who?

lapis sequoia Feb 3, 2022, 1:34 AM

#

family

desert oar Feb 3, 2022, 1:34 AM

#

i see

#

so how behind are you exactly? have you tried reaching out to an administrator who is not a professor?

lapis sequoia Feb 3, 2022, 1:34 AM

#

I enjoy the learning

#

I enjoy what we work on

serene scaffold Feb 3, 2022, 1:34 AM

#

4k is way more than one should have to pay for a single course in the US.

lapis sequoia Feb 3, 2022, 1:34 AM

#

but Im getting fucking rocked

#

Its a 6 month post graduate program

#

basically my family is disappointed in what i do for a living

#

and constantly harps on me to do this shit

#

because the market is "hot" etc etc etc

#

I know id be capable but. i have bills to pay. i work 7 days a week (voting machine company / way behind bc of covid and supply chain)

#

so I have to travel , work long hours, and when im finally at the hotel im fuckng exhausted

desert oar Feb 3, 2022, 1:36 AM

#

serene scaffold 4k is way more than one should have to pay for a single course in the US.

yeah, even individual courses at ivy league schools don't cost that much, or didn't when i was in school

lapis sequoia Feb 3, 2022, 1:36 AM

#

trying to catch up but im just extremely disoriented