serene scaffold Oct 16, 2021, 1:24 AM

#

In the other three, did you expose your question, or did you ask if anyone has certain knowledge?

oblique ridge Oct 16, 2021, 1:25 AM

#

serene scaffold In the other three, did you expose your question, or did you ask if anyone has c...

Latter one. Point taken, will go directly for the question

serene scaffold Oct 16, 2021, 1:25 AM

#

oblique ridge Latter one. Point taken, will go directly for the question

The reason is, even if people know about the topic of your question, it's unlikely that they'll want to interview you to dig down to the actual question. It's best to save everyone a step and put the actual question out.

#

Good luck!

oblique ridge Oct 16, 2021, 1:26 AM

#

serene scaffold The reason is, even if people know about the *topic* of your question, it's unli...

Got it. Thanks for the feedback!

next lance Oct 16, 2021, 3:20 AM

#

Yes, I know what a dot product is and I can do it on paper.
I know a lot about NumPy, how a neural network works and all. I have created the model, like a bit of its parts.
Yes, I don't think I am an expert in AI and ML, but I think that it's hard.

Also, it's not for my homework, but I am learning Deep Learning 😅

tender hearth Oct 16, 2021, 4:04 AM

#

next lance Yes I know what a dot product is and I can do it on paper I know a lot about Nu...

Can you share the code that you have made already? I'm having trouble understanding what you mean by "Where is the Deep Learning"

next lance Oct 16, 2021, 4:39 AM

#

Ok

#

After 2 hours

peak halo Oct 16, 2021, 6:01 AM

#

@lapis sequoia and @granite vine

#

here

granite vine Oct 16, 2021, 6:10 AM

#

peak halo @lapis sequoia and @granite vine

Ok

peak halo Oct 16, 2021, 6:13 AM

#

yeah, tell

jade acorn Oct 16, 2021, 6:37 AM

#

What's the difference between me using a statistical program to make a linear regression model versus doing it in Python via sklearn? How is the latter machine learning when it's just generating predictions? Can you say that, in this specific case with linear regression models, all "automated" calculators of least squares are considered a rough form of machine learning?

tender hearth Oct 16, 2021, 7:43 AM

#

jade acorn whats the difference between me using a statistical program to make a linear reg...

The former uses an algorithm to directly generate a line. The latter initializes a bunch of random weights and adjusts them over time based on loss. Both (may) lead to the same outcome, but they arrive at that outcome differently. If the data you have is linearly separable such that least squares suffices there is no need for ML which is suited for modeling non-linear relationships

#

(Just realized I did not answer your question. No, least squares is not a form of machine learning because no learning is done)

wide meadow Oct 16, 2021, 8:10 AM

#

Hi, in Machine Learning, do we select the model first and then tune its hyperparameters, or tune the hyperparameters for all the models first and then select the model?

jade acorn Oct 16, 2021, 8:12 AM

#

tender hearth The former uses an algorithm to directly generate a line. The latter initializes...

in ordinary least squares what is there to randomly generate?

jade acorn Oct 16, 2021, 8:13 AM

#

tender hearth The former uses an algorithm to directly generate a line. The latter initializes...

i realize there's multiple ways to calculate the predictions and MSE in linear regression, but i'm not seeing what you mean by "adjusting the random weights over time", the only thing i can think of is gradient descent because that is an algorithm that reduces the MSE iteratively. Do you mean neural networks?

next lance Oct 16, 2021, 8:14 AM

#

tender hearth Can you share the code that you have made already? I'm having trouble understand...

So I have just done the setup of folders, I took some images from Google for training and then I have labelled those images. You can just say that I have only worked with IPython till now. So there is no code, but I know about NumPy (learned from Sentdex) but I don't see any use of it (NumPy) till now

verbal cape Oct 16, 2021, 8:37 AM

#

I was aligning images and I wanted to load images in a folder with a specific name. How can I do that?

woeful falcon Oct 16, 2021, 8:43 AM

#

Anyone into NLP here,
is nltk a good choice ?

lapis sequoia Oct 16, 2021, 9:24 AM

#

hello

uneven thistle Oct 16, 2021, 10:09 AM

#

How can I improve this histogram made with Plotly? As you can see, the markings are not clear: one bar is so high that the other bars are too low to be visible, and the y-axis scale is too large as well. How can I make the scale smaller, with a frequency of 100?

tender hearth Oct 16, 2021, 12:07 PM

#

jade acorn i realize theres multiple ways to calculate the predictions and MSE in linear r...

Right, that is what I meant. Sorry, I should've been specific

desert oar Oct 16, 2021, 12:17 PM

#

tender hearth (Just realized I did not answer your question. No, least squares is not a form o...

I really strongly disagree with this

#

Like fundamentally I think that's wrong

#

I'm on mobile so I can't effectively plead my case, but my basic opinion is that fitting a line is definitely "learning", albeit learning something relatively simple

#

Arguably "least squares" itself is just an algorithm, it's not inherently machine learning. But fitting a line with least squares is just as much machine learning as fitting a deep network with SGD
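
As a rough sketch of that point: a direct least-squares solve and gradient descent on MSE both "learn" the same line from data. The synthetic data, learning rate, and step count below are made up for illustration and are not from the discussion.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=200)  # noisy line (toy data)

# Design matrix: one column for the slope, one column of ones for the intercept.
A = np.column_stack([x, np.ones_like(x)])

# 1) Direct least-squares solution, no iteration and no random start.
w_direct, *_ = np.linalg.lstsq(A, y, rcond=None)

# 2) Gradient descent on mean squared error, starting from random weights.
w_gd = rng.normal(size=2)
learning_rate = 0.1
for _ in range(5000):
    grad = 2.0 / len(y) * A.T @ (A @ w_gd - y)
    w_gd -= learning_rate * grad

print("direct:          ", w_direct)
print("gradient descent:", w_gd)
```

Both routes estimate slope and intercept from the data; they only differ in how the minimum of the squared error is found.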

#

cc @jade acorn

tender hearth Oct 16, 2021, 12:21 PM

#

desert oar Arguably "least squares" itself is just an algorithm, it's not inherently machin...

Hm, upon further reflection I see your point

#

My mind Ctrl+Shift+F's "machine learning" to "deep learning" since that's usually what people are referring to

sage latch Oct 16, 2021, 1:19 PM

#

Hi, random question:
I'm interested in datascience/Ai stuff; can you guys recommend some certifications or so, maybe I can convince my employer to pay me some courses/certificates;
I just don't know where to start

neon imp Oct 16, 2021, 1:28 PM

#

https://www.springer.com/us/book/9781071614174

An Introduction to Statistical Learning - with Applications in R | ...

This book provides an accessible overview of the field of statistical learning, with applications in R programming....

sage latch Oct 16, 2021, 1:32 PM

#

I'm working in some consulting setup, would be nice if there were some certifications with which I can convince the higher-ups that those are marketable and improve my value so they can hire me out for more money, but thx for that link

short chasm Oct 16, 2021, 1:35 PM

#

Hello everyone. I want to create a blank screen with the plt.figure() function, but I can't. Can you help me?

next lance Oct 16, 2021, 2:48 PM

#

I am making an object detection project using TensorFlow, but I am installing a lot of things.
Do professional programmers also download so many files, like labelImg and more?
Can we not just install all the modules and use them as code, like install by doing pip?

serene scaffold Oct 16, 2021, 2:53 PM

#

next lance I am making a object detection using Tenserflow but I am installing a lot of thi...

what is labelimg?

#

> Can we not just install all the modules and use them as code, like install by doing pip
are you talking about downloading software that isn't Python libraries? different operating systems have different package managers. what OS are you on?

azure marsh Oct 16, 2021, 2:55 PM

#

labelimg is image annotation software. Ironically it is in pip, though some level tools aren't. Many packages have many ways of installing

next lance Oct 16, 2021, 3:01 PM

#

serene scaffold > Can we not just install all the modules and use them as code, like install by ...

Yes I am saying that Can we install all modules by doing pip install

#

Do all professionals also download so many repositories and all other things

#

I am following a tutorial in which I have made so many steps, installed so many things and all this

#

https://youtu.be/yqkISICHH-U This is the tutorial ,Do you know any good place where I can learn like a professional

azure marsh Oct 16, 2021, 3:09 PM

#

In reality, even professionals deal with heterogeneous and painful environment setups

#

You can try looking for Docker images with everything set up

next lance Oct 16, 2021, 3:18 PM

#

azure marsh In reality, even professionals deal with heterogeneous and painful environment s...

But I am installing many libraries differently even when I can do pip
Is there any different use of installing libraries differently

#

When will I be doing that big code part

chrome lintel Oct 16, 2021, 3:21 PM

#

if it's a standalone package I don't think there's anything wrong with simply installing with pip. Some more sophisticated packages also come with software that you need to manually download and install - but if pip install works you should use that

next lance Oct 16, 2021, 3:21 PM

#

Yes I am thinking the same

#

So When will I be doing lot of code 🤔

chrome lintel Oct 16, 2021, 3:22 PM

#

Depends on the tutorial you're following. I imagine once you have all the required packages it'll be coding time

azure marsh Oct 16, 2021, 3:23 PM

#

next lance But I am installing many libraries differently even when I can do pip Is there ...

Is there not one requirements file?

next lance Oct 16, 2021, 3:24 PM

#

I am just using ipython till now

next lance Oct 16, 2021, 3:24 PM

#

azure marsh Is there not one requirements file?

Like?

azure marsh Oct 16, 2021, 3:24 PM

#

pip install -r requirements.txt

next lance Oct 16, 2021, 3:24 PM

#

Never did that

azure marsh Oct 16, 2021, 3:24 PM

#

Several pip packages can be installed that way from a list

#

Tutorials differ in the amount of effort they put into the setup

#

If they are having you install a bunch of different pip packages separately, they could've made it one line with a file like that

next lance Oct 16, 2021, 3:26 PM

#

No that guy is making me install all these differently

#

Also Do all programmers do all this creepy things in AI and ML ? 😳

#

I haven't written a single line of code in Pycharm, only using ipython to install and setup

#

And the guy in the video says now time to train the model

#

https://www.tensorflow.org/hub/tutorials/object_detection If you look at this then you can see that we are not installing anything leaving libs

TensorFlow

Object Detection | TensorFlow Hub

jade acorn Oct 16, 2021, 4:52 PM

#

desert oar I really strongly disagree with this

i see

grave frost Oct 16, 2021, 5:13 PM

#

next lance Also Do all programmers do all this creepy things in AI and ML ? 😳

that doesn't sound like a great way to learn ML

#

start from the basics instead of some random guy on youtube

desert bear Oct 16, 2021, 5:59 PM

#

Does anyone know how to find two points that have extending coordinates in certain domain.
In this example, my domain is (-5, 5), and one point extends outside of that.

frigid elk Oct 16, 2021, 6:16 PM

#

I'm having difficulty understanding a concept. .... true or false, ... when using a threshold other than .5 for classification, does that essentially turn it in to a regression problem? are classification and regression the same, just one has a binary output (in the case of 2 classes)?

tidal bough Oct 16, 2021, 6:37 PM

#

desert bear Does anyone know how to find two points that have extending coordinates in certa...

Sounds like you just want to filter the points by a condition like not (-5<=x<=5 and -5<=y<=5)
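
A minimal sketch of that filter with NumPy, assuming the points are stored as rows of an `(n, 2)` array; the sample coordinates are invented for illustration:

```python
import numpy as np

# Hypothetical points, one (x, y) pair per row.
points = np.array([[0.0, 1.0], [4.5, -2.0], [6.2, 3.0], [-1.0, -5.5]])
lo, hi = -5.0, 5.0  # the domain (-5, 5) from the question

# True for rows whose x AND y both lie inside the domain.
inside = np.all((points >= lo) & (points <= hi), axis=1)

# Rows where at least one coordinate leaves the domain.
outside_points = points[~inside]
print(outside_points)
```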

viscid siren Oct 16, 2021, 6:40 PM

#

Hi guys do ya got some good tutorials and explanations for basics in data science with python or c language?

desert bear Oct 16, 2021, 6:59 PM

#

tidal bough Sounds like you just want to filter the points by a condition like `not (-5<=x<=...

I wanted to adjust the plot when the iteration of my algorithm produces points that passes my domain

#

but that would be too much of work honestly

jade acorn Oct 16, 2021, 7:30 PM

#

is there a function in numpy, sklearn or scipy that can detect if a dataset goes in a linear line, polynomial or exponential?

tidal bough Oct 16, 2021, 7:36 PM

#

Well, you can try fitting the dataset to a line, a polynomial of some degree, or an exponential. I don't think there is (or can be, really) an automatic function to determine that
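
One way to try that by hand, sketched with NumPy only: fit each candidate shape and compare the squared error. The data below is synthetic, and the exponential fit assumes every `y` is positive, since it works on `log(y)`.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 5, 50)
# Toy data: an exponential curve with about 1% multiplicative noise.
y = 2.0 * np.exp(0.8 * x) * (1 + rng.normal(scale=0.01, size=x.size))

def sse(pred):
    """Sum of squared errors between the data and a fitted curve."""
    return float(np.sum((y - pred) ** 2))

line = np.polyval(np.polyfit(x, y, 1), x)
quad = np.polyval(np.polyfit(x, y, 2), x)

# Exponential y = a * exp(b * x): fit a straight line to log(y).
b, log_a = np.polyfit(x, np.log(y), 1)
expo = np.exp(log_a) * np.exp(b * x)

errors = {"linear": sse(line), "quadratic": sse(quad), "exponential": sse(expo)}
best = min(errors, key=errors.get)
print(errors)
print("lowest error:", best)
```

A higher-degree polynomial never fits the training data worse than a lower-degree one, so the lowest error alone is not proof of the "true" shape; checking the fits on held-out points is a fairer comparison.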

odd meteor Oct 16, 2021, 7:45 PM

#

frigid elk I'm having difficulty understanding a concept. .... true or false, ... when usin...

Ordinarily, sigmoid function takes care of that in classification problem. This is what works behind the scene to determine the class each predicted value rightly belongs to.

So long as you understand the concept of sigmoid function, then you honestly shouldn't bother about having a threshold other than 0.5 (given you're interested in building an unbiased model)

Sigmoid function is an activation function used in Logistic Regression (classification problem) to make our output be between 0 and 1

So if the predicted value is above 0.5, it'll be assigned to class 1, and if otherwise class 0.

So long as there is an activation function ( in our case sigmoid function) present in your Logistic Regression model, you cannot predict a continuous value.

So in essence, even if the set threshold is 0.7, that doesn't turn your classification problem into a regression problem.

frigid elk Oct 16, 2021, 7:49 PM

#

odd meteor Ordinarily, sigmoid function takes care of that in classification problem. This ...

Thanks for the response. .. my thought on moving the threshold is due to running an imbalanced dataset, .. i've ran standard, undersampling, oversampling and smote against my dataset and in all regards the recall is not satisfactory unless i lower my threshold

#

just want to make sure that i'm considering the right value, .. that a regression prediction is the continuous value for what would be a classification otherwise when the threshold is .5
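
A small sketch of what moving the threshold does, assuming scikit-learn and a synthetic imbalanced dataset (the 0.3 threshold is an arbitrary illustration): the model's probability is a continuous score, and the threshold only decides where that score is cut into class labels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Toy data with roughly 10% positives.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # continuous score in [0, 1]

recalls = {}
for threshold in (0.5, 0.3):
    pred = (proba >= threshold).astype(int)  # still a discrete 0/1 label
    recalls[threshold] = recall_score(y_test, pred)

print(recalls)
```

Lowering the threshold can only keep or increase recall, usually at the cost of precision; the task is still classification either way.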

odd meteor Oct 16, 2021, 7:56 PM

#

frigid elk Thanks for the response. .. my thought on moving the threshold is due to running...

If you have an imbalanced class, try to use one of the available resampling techniques like SMOTE (if you want to oversample the minority class)

You can even add the parameter stratify=y when splitting your data with train_test_split.

Then endeavour to also add the class_weight parameter in your model to handle the imbalanced class.

XGBoost uses scale_pos_weight

Then use StratifiedKFold for your cross-validation.

Try to google more ways to handle imbalance class. Don't touch the default set threshold which is 0.5
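
The scikit-learn pieces mentioned above could be combined roughly like this. It is only a sketch on synthetic data; the choice of `LogisticRegression`, five folds, and recall as the metric are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Toy data with roughly 10% positives.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# stratify=y keeps the class ratio the same in the train and holdout sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# class_weight="balanced" reweights errors inversely to class frequency.
model = LogisticRegression(class_weight="balanced", max_iter=1000)

# Each fold keeps the same class ratio as the training set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="recall")
print(scores.mean())
```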

frigid elk Oct 16, 2021, 8:04 PM

#

odd meteor If you have an imbalanced class, try to use of the available resampling techniqu...

in the case of xgbregressor, how do i know what activation function is being used? is sigmoid default, i hear relu is more preferred

cold cloud Oct 16, 2021, 8:06 PM

#

hey everyone, i seem to be having a problem with the feature importance on my models. i'm using the built-in sklearn feature_importances_ for gradient boosting, random forest, and extreme gradient boosting. but my extreme gradient boosting feature importance seems to be very different from the gradient boosting or random forest feature importances, to the point where a low-correlating, binary feature is my most important variable.

is the built in feature_importance for random forest and gradient boosting different than for extreme gradient boosting? im struggling to find an explanation for this. or is there a better way to find feature importance for my models?

odd meteor Oct 16, 2021, 8:07 PM

#

frigid elk in the case of xgbregressor, how do i know what activation function is being use...

I just recently started learning Deep Learning but to the best of my knowledge, ReLu is an activation function used in Neural Network. It's not used in Regression problem. Activation function isn't used in Regression

lapis sequoia Oct 16, 2021, 8:09 PM

#

so i am trying to learn ai and machine learning and i was watching a video from tech with tim and i did what he did and understood a part of it, i was hoping someone could explain to me this:

#

```
import pandas as pd
import numpy as np
import sklearn
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

# Load the dataset (semicolon-separated).
data = pd.read_csv("student-mat.csv", sep=";")

# Keep only the columns used as features and label.
data = data[["G1", "G2", "G3", "studytime", "failures", "absences"]]

predict = "G3"  # column to predict

# Features: everything except the label. Label: the G3 column.
X = np.array(data.drop(columns=[predict]))
y = np.array(data[predict])

# Hold out 10% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.1)

linear = linear_model.LinearRegression()

linear.fit(x_train, y_train)
acc = linear.score(x_test, y_test)  # R^2 on the test set
print(acc)

print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)

predictions = linear.predict(x_test)

# Predicted value, input features, actual value.
for x in range(len(predictions)):
    print(predictions[x], x_test[x], y_test[x])
```

frigid elk Oct 16, 2021, 8:13 PM

#

odd meteor I just recently started learning Deep Learning but to the best of my knowledge, ...

thanks for the chat, .. i think you're right with relu being a neural network function.

desert oar Oct 16, 2021, 8:30 PM

#

odd meteor I just recently started learning Deep Learning but to the best of my knowledge, ...

Activation functions are used for hidden layers in regression problems

#

Just not for the output layer
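
A tiny NumPy sketch of that layout, with random untrained weights and arbitrary layer sizes chosen only for illustration: ReLU on the hidden layer, no activation on the output.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical sizes: 3 input features, 8 hidden units, 1 regression output.
W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def predict(X):
    hidden = relu(X @ W1 + b1)  # non-linear activation in the hidden layer
    return hidden @ W2 + b2     # linear output: any real value is possible

X = rng.normal(size=(5, 3))
y_hat = predict(X)
print(y_hat.shape)  # (5, 1)
```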

desert oar Oct 16, 2021, 8:31 PM

#

frigid elk thanks for the chat, .. i think you're right with relu being a neural network fu...

There is no activation function in xgboost. It's worth reading about how it works

odd meteor Oct 16, 2021, 8:32 PM

#

cold cloud hey everyone, i seem to be having a problem with the feature importance on my mo...

That shouldn't worry you. Of course, different models built with different ML algorithms won't exactly have the same feature importance. I think you should focus more on the model that best minimizes your loss function.

About other ways to do Feature Selection, you could use RFE or better still, combine feature selectors using Voting Method.

odd meteor Oct 16, 2021, 8:33 PM

#

desert oar Activation functions are used for hidden layers in regression problems

In Deep Learning right?

rain temple Oct 16, 2021, 8:47 PM

#

lapis sequoia so i am trying to learn ai and machine learning and i was watching a video from ...

Basically what this code does is apply a Linear Regression model onto your dataset. You are storing the dataset in the dataframe "data" and you are predicting the value of G3 given the other features in the dataset. Firstly you are dropping the value to be predicted from the dataset and splitting the data into training and testing data. Training data is to train the model and ensure that it can learn the best fit for the linear regression line, and test data is to find the accuracy of the model you have applied --> acc = linear.score(x_test, y_test). In the application of linear regression, you are initializing the model in "linear" and fitting it to the training data using "fit". Linear regression works by estimating parameters, i.e. the gradient and the y intercept, and then minimising loss using the Mean Squared Error. After fitting the linear regression model you are predicting the test values.

#

I'm probably wrong in some places as I'm also new to ML, so you should probably seek confirmation from more experienced people

frigid elk Oct 16, 2021, 8:54 PM

#

odd meteor That shouldn't worry you. Of course, different models built with different ML al...

what about SHAP values? the logic behind them seems sound and easily explainable

odd meteor Oct 16, 2021, 9:07 PM

#

lapis sequoia ``` import pandas as pd import numpy as np import sklearn from sklearn import li...

Importing necessary Libraries

import pandas as pd
import numpy as np
import sklearn
from sklearn import linear_model
from sklearn.utils import shuffle

Read Dataset Into Pandas DataFrame
data = pd.read_csv("student-mat.csv", sep=";")

Selected The Features and Label he's interested in using to build a model

data = data[["G1", "G2", "G3", "studytime", "failures", "absences"]]

Specified the label (Response Variable)

target_col = "G3"

Declared The Input & Target Variable

He dropped the target column from dataframe so it'll contain only the input features (X), then converted both X (DataFrame) and y (Series) into a numpy array.

X = np.array(data.drop(columns=[target_col]))
y = np.array(data[target_col])

Split Dataset into train and holdout set

x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size = 0.1)

Instantiated the Linear Regression object

linear = linear_model.LinearRegression()

Train a Linear Regression Model

linear.fit(x_train, y_train)

Get The Coefficient of Determination (R2)

acc = linear.score(x_test, y_test)
print(acc)

Print Your Model Parameters (The weight and intercept)

print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)

Make Predictions with the holdout set

predictions = linear.predict(x_test)

Print The Predicted_y, X_test, and Actual_y

for x in range(len(predictions)):
    print(predictions[x], x_test[x], y_test[x])

odd meteor Oct 16, 2021, 9:12 PM

#

rain temple im probably wrong in some places as im also new to ML, so you should probs seek ...

We're all learning... 😊

I know sigmoid function is the activation function used in Logistic Regression. I'm just starting to learn DL, however, I'm yet to come across where an activation function is used for a Regression problem.

rain temple Oct 16, 2021, 9:15 PM

#

Arent activation functions primarly used in Neural Nets tho?

odd meteor Oct 16, 2021, 9:17 PM

#

rain temple Arent activation functions primarly used in Neural Nets tho?

Yeah, but activation function was also used in Logistic Regression (sigmoid) that's what works behind the scene to designate values to the class they belong to during prediction

odd meteor Oct 16, 2021, 9:22 PM

#

frigid elk what about SHAP values? the logic behind them seems sound and easily explainable

I don't know enough about SHAP values to comment on it at this time.

Do you mind telling me what it means/where it's applied?

frigid elk Oct 16, 2021, 9:29 PM

#

odd meteor I don't know enough about SHAP values to comment on it at this time. Do you min...

it's a method of evaluating model feature importance, down to the observation if needed, based on game theory. .. it's able to work on multiple classifiers by evaluating the results in a similar manner of leave one out cross validation. ... that's about the closest parallel i can think of off the top of my head. ... look up shapley values on youtube, there are some great resources. ... data robot has one that's around 5 minutes that explains it very well at a high level

odd meteor Oct 16, 2021, 9:34 PM

#

frigid elk it's a method of evaluating model feature importance, down to the observation if...

Okay that's cool, I'll look it up. I mostly use the feature_importances_ attribute, RFE, and sometimes Voting Method for combining feature selectors (when I want to combine multiple models)

desert oar Oct 16, 2021, 10:12 PM

#

odd meteor In Deep Learning right?

Yes

severe hazel Oct 16, 2021, 10:58 PM

#

Hi guys, I think I'm at the right place. I have a question for someone in mapping

#

I want to build interactive maps. I want my python coordinate points to show up on this map and have lines between them. What would you guys recommend?

lapis sequoia Oct 17, 2021, 2:27 AM

#

Is 83% accuracy good, fellas?

next lance Oct 17, 2021, 2:27 AM

#

grave frost that doesn't sound like a great way to learn ML

I have already learned how the neural network works

next lance Oct 17, 2021, 2:31 AM

#

lapis sequoia Is 83% accuracy good, fellas?

I think more than 85 is good

lapis sequoia Oct 17, 2021, 2:32 AM

#

next lance I think more than 85 is good

Oh thanks mate, but my output is absurdly restricted to 2 out of 11 possible outputs

next lance Oct 17, 2021, 2:32 AM

#

Oh bad

#

Are you using OpenCV?

lapis sequoia Oct 17, 2021, 2:33 AM

#

next lance Are you using OpenCV?

No

next lance Oct 17, 2021, 2:35 AM

#

lapis sequoia No

You should use it for better input if you are doing image detection

desert oar Oct 17, 2021, 2:37 AM

#

lapis sequoia Is 83% accuracy good, fellas?

It depends entirely on the prediction task and the data you have available

blazing pawn Oct 17, 2021, 4:28 AM

#

severe hazel I want to build interactive maps. I want my python coordinate points to show up...

I think the numpy, matplotlib and seaborn libraries would be enough for your task

twin valve Oct 17, 2021, 7:23 AM

#

uk what u dont hv enough of

#

water bottles

#

from lttstore.com

royal crest Oct 17, 2021, 7:37 AM

#

twin valve from lttstore.com

!rule 7

arctic wedgeBOT Oct 17, 2021, 7:37 AM

#

Rules

7. Keep discussions relevant to the channel topic. Each channel's description tells you the topic.

next lance Oct 17, 2021, 8:42 AM

#

Which is the best: Keypoints, Boxes or Masks in TensorFlow 2 Zoo?

spice moss Oct 17, 2021, 8:45 AM

#

I just started learning neural Networks but I don't understand the role of Activation and softmax functions and how the neurons work with them

#

Can anyone help

#

Or recommend any books/videos for it

next lance Oct 17, 2021, 8:47 AM

#

spice moss Or recommend any books/videos for it

Yes, there is a very good video by Sentdex on Neural Networks and NumPy

spice moss Oct 17, 2021, 8:47 AM

#

Oh OK

next lance Oct 17, 2021, 8:47 AM

#

It covers mostly everything from scratch

spice moss Oct 17, 2021, 8:48 AM

#

Oh thank you, I will definitely check that out

next lance Oct 17, 2021, 8:48 AM

#

I will give you the link

#

https://www.youtube.com/watch?v=Wo5dMEP_BbI&list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3

YouTube

sentdex

Neural Networks from Scratch - P.1 Intro and Neuron Code

Building neural networks from scratch in Python introduction.

Neural Networks from Scratch book: https://nnfs.io

Playlist for this series: https://www.youtube.com/playlist?list=PLQVvvaa0QuDcjD5BAw2DxE6OF2tius3V3

Python 3 basics: https://pythonprogramming.net/introduction-learn-python-3-tutorials/
Intermediate Python (w/ OOP): https://pythonpr...

spice moss Oct 17, 2021, 8:50 AM

#

Screenshot_20211017_095014_com.google.android.youtube.jpg

next lance Oct 17, 2021, 8:50 AM

#

https://nnfs.io

#

This is the book

spice moss Oct 17, 2021, 8:51 AM

#

Oh ok thank you so much

next lance Oct 17, 2021, 8:51 AM

#

also check out the playlist

next lance Oct 17, 2021, 8:51 AM

#

spice moss Oh ok thank you so much

😄 Happy to help

lyric ermine Oct 17, 2021, 9:42 AM

#

hey guys, i wanna start working on my portfolio.

is dash and plotly recommended to build first basic dashboards?

grave frost Oct 17, 2021, 11:51 AM

#

next lance I have already learned how the neural network works

then why the difficulty?

lapis sequoia Oct 17, 2021, 12:08 PM

#

I want to start with AI
Can anyone tell me what should I do first ?

tender hearth Oct 17, 2021, 12:14 PM

#

I'm reading the Universal Sentence Encoder paper, and I'm having trouble understanding this part of the paper:

The context aware word representations are converted to a fixed length sentence encoding vector by computing the element-wise sum of the representations at each word position.

Here's the rest of the text if you need context:

The transformer based sentence encoding model constructs sentence embeddings using the encoding sub-graph of the transformer architecture (Vaswani et al., 2017). This sub-graph uses attention to compute context aware representations of words in a sentence that take into account both the ordering and identity of all the other words. The context aware word representations are converted to a fixed length sentence encoding vector by computing the element-wise sum of the representations at each word position. The encoder takes as input a lowercased PTB tokenized string and outputs a 512 dimensional vector as the sentence embedding.

#

"Element-wise sum"?

#

Ah. I believe it's taking a sum of axis=0

#

That would make the most sense. Since axis=1 would still produce a variable length tensor
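
A quick NumPy check of that reading, assuming each sentence is a `(num_words, 512)` array of word representations (the random values and sentence lengths are placeholders, not the actual model):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical sentences of different lengths, 512 dimensions per word.
sentence_a = rng.normal(size=(4, 512))  # 4 words
sentence_b = rng.normal(size=(9, 512))  # 9 words

# Summing over axis=0 adds the word vectors element by element,
# so the result has a fixed length regardless of the word count.
emb_a = sentence_a.sum(axis=0)
emb_b = sentence_b.sum(axis=0)
print(emb_a.shape, emb_b.shape)  # (512,) (512,)

# Summing over axis=1 would instead give one number per word,
# so its length would still vary with the sentence.
print(sentence_a.sum(axis=1).shape, sentence_b.sum(axis=1).shape)  # (4,) (9,)
```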

reef dock Oct 17, 2021, 12:32 PM

#

I've been practising some of the concepts I've learnt in this data science course that i'm doing and I've been wondering (and this might come across as a stupid question but) how do you know what to do with the data you have at hand?

serene scaffold Oct 17, 2021, 12:40 PM

#

reef dock I've been practising some of the concepts I've learnt in this data science cours...

There isn't an easy answer for that question, as it's largely case-by-case

#

you might start by asking yourself "what are insights that a human could extract from this data if they had time to go through all of it by hand"

reef dock Oct 17, 2021, 12:43 PM

#

I guess that's sort of a drawback of just practising code? Picking up a dataset from nowhere without having a plan kinda made me blank.

serene scaffold Oct 17, 2021, 12:43 PM

#

reef dock I guess that's sort of a drawback of just practising code? Picking up a dataset ...

what was the most recent dataset you got ahold of? maybe we could spitball some ideas. (once I get back from making coffee.)

reef dock Oct 17, 2021, 12:44 PM

#

Can i post links here?

reef dock Oct 17, 2021, 12:44 PM

#

serene scaffold what was the most recent dataset you got ahold of? maybe we could spitball some ...

https://www.kaggle.com/aparnashastry/building-permit-applications-data <- This dataset

San Francisco Building Permits

5 years and 200k building permits

serene scaffold Oct 17, 2021, 12:45 PM

#

I'll be back in a few

#

@reef dock did you look at the names of the columns?

['Permit Number', 'Permit Type', 'Permit Type Definition',
       'Permit Creation Date', 'Block', 'Lot', 'Street Number',
       'Street Number Suffix', 'Street Name', 'Street Suffix', 'Unit',
       'Unit Suffix', 'Description', 'Current Status', 'Current Status Date',
       'Filed Date', 'Issued Date', 'Completed Date',
       'First Construction Document Date', 'Structural Notification',
       'Number of Existing Stories', 'Number of Proposed Stories',
       'Voluntary Soft-Story Retrofit', 'Fire Only Permit',
       'Permit Expiration Date', 'Estimated Cost', 'Revised Cost',
       'Existing Use', 'Existing Units', 'Proposed Use', 'Proposed Units',
       'Plansets', 'TIDF Compliance', 'Existing Construction Type',
       'Existing Construction Type Description', 'Proposed Construction Type',
       'Proposed Construction Type Description', 'Site Permit',
       'Supervisor District', 'Neighborhoods - Analysis Boundaries', 'Zipcode',
       'Location', 'Record ID']

reef dock Oct 17, 2021, 12:56 PM

#

Yes

#

I ended up removing some of the ones that didn't seem too useful too.

serene scaffold Oct 17, 2021, 12:57 PM

#

which ones did you find not useful?

reef dock Oct 17, 2021, 12:57 PM

#

Street Number Suffix, Proposed Construction Type, Site Permit, TIDF Compliance, Unit

serene scaffold Oct 17, 2021, 1:02 PM

#

@reef dock what do you think would be interesting to learn from this data?

granite vine Oct 17, 2021, 1:03 PM

#

hello

#

lol

reef dock Oct 17, 2021, 1:03 PM

#

I'm not entirely sure since I picked the dataset at random. Though I did try to look at the neighborhoods that have the most permit applications.
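
For reference, that kind of count can be sketched with pandas as below. The column name comes from the list above; the rows are made-up stand-ins for the real CSV, which would normally be loaded with `pd.read_csv`.

```python
import pandas as pd

# Made-up rows standing in for the Kaggle file.
permits = pd.DataFrame({
    "Neighborhoods - Analysis Boundaries": [
        "Mission", "Sunset/Parkside", "Mission", "Marina", "Mission", "Marina",
    ],
})

# Number of permit applications per neighborhood, largest first.
top = permits["Neighborhoods - Analysis Boundaries"].value_counts().head(3)
print(top)
```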

granite vine Oct 17, 2021, 1:03 PM

#

reef dock I'm not entirely sure since I picked the dataset at random. Though I did try to ...

ok

#

help

#

<@&831776746206265384>

#

can you help meeeee

#

plssssssssssss

serene scaffold Oct 17, 2021, 1:04 PM

#

Please don't ping moderators asking for help.

granite vine Oct 17, 2021, 1:04 PM

#

serene scaffold Please don't ping moderators asking for help.

so who can i ping for help

#

@digital shard

serene scaffold Oct 17, 2021, 1:04 PM

#

granite vine so who can i ping for help

No one

granite vine Oct 17, 2021, 1:04 PM

#

serene scaffold No one

why

granite vine Oct 17, 2021, 1:04 PM

#

serene scaffold No one

so can you help meeeee

serene scaffold Oct 17, 2021, 1:04 PM

#

granite vine why

We don't provide on-call help.

granite vine Oct 17, 2021, 1:04 PM

#

serene scaffold We don't provide on-call help.

ok

#

sir

granite vine Oct 17, 2021, 1:05 PM

#

serene scaffold We don't provide on-call help.

pls help me

#

a short help

reef dock Oct 17, 2021, 1:05 PM

#

There's help channels in the server if that's what you're looking for.

serene scaffold Oct 17, 2021, 1:05 PM

#

I don't even know what you want help with, whereas keN has been asking clear questions.

granite vine Oct 17, 2021, 1:05 PM

#

serene scaffold I don't even know what you want help with, whereas keN has been asking clear que...

i want pyaudio in py

#

it not working

reef dock Oct 17, 2021, 1:06 PM

#

You should look at #❓｜how-to-get-help

granite vine Oct 17, 2021, 1:06 PM

#

serene scaffold I don't even know what you want help with, whereas keN has been asking clear que...

so i go on unofficial python but my powershell is coming with wheel not support error

serene scaffold Oct 17, 2021, 1:06 PM

#

granite vine so i go on unofficial python but my powershell is coming with wheel not support ...

Try opening a help channel; see #❓｜how-to-get-help

granite vine Oct 17, 2021, 1:07 PM

#

serene scaffold Try opening a help channel; see <#704250143020417084>

ok

#

done

#

lol

reef dock Oct 17, 2021, 1:15 PM

#

@serene scaffold if you had to work with this data, what would you do?

serene scaffold Oct 17, 2021, 1:16 PM

#

reef dock <@!253696366952316929> if you had to work with this data, what would you do?

I do human language stuff, so I don't really know. It might be interesting to see which of these features predicts the estimated cost

reef dock Oct 17, 2021, 1:16 PM

#

what do you mean human language stuff?

serene scaffold Oct 17, 2021, 1:16 PM

#

or when the estimated cost and revised cost are different
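
A minimal sketch of that idea, assuming the permit data sits in a pandas DataFrame with the `Estimated Cost` and `Revised Cost` columns listed earlier. The rows and the `Cost Change` column name are made up for illustration:

```py
import pandas as pd

# Made-up rows; only the two cost column names come from the dataset above.
permits = pd.DataFrame({
    "Record ID": [1, 2, 3, 4],
    "Estimated Cost": [1000.0, 5000.0, 20000.0, 750.0],
    "Revised Cost": [1000.0, 6500.0, 18000.0, 750.0],
})

# Difference between the revised and the estimated cost for each permit.
permits["Cost Change"] = permits["Revised Cost"] - permits["Estimated Cost"]

# Keep only the permits whose cost was actually revised.
changed = permits[permits["Cost Change"] != 0]
print(changed)
```

From there, the changed rows could be grouped by any of the other columns to see where revisions cluster.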

serene scaffold Oct 17, 2021, 1:17 PM

#

reef dock what do you mean human language stuff?

information extraction, text-to-speech engines, machine translation

reef dock Oct 17, 2021, 1:17 PM

#

serene scaffold or when the estimated cost and revised cost are different

So kinda like the predicted impact of a change in those costs?

reef dock Oct 17, 2021, 1:17 PM

#

serene scaffold information extraction, text-to-speech engines, machine translation

Oh wow, that sounds interesting.

serene scaffold Oct 17, 2021, 1:17 PM

#

reef dock So kinda like the predicted impact of a change in those costs?

maybe? sounds like you know about this kind of thing

reef dock Oct 17, 2021, 1:18 PM

#

serene scaffold maybe? sounds like you know about this kind of thing

I tried to work with predictive models in the past. Though my knowledge in that is very brief and theoretical.

tender hearth Oct 17, 2021, 1:37 PM

#

Anyone have any suggestions for producing a speaker embedding from an audio waveform

#

Reading WaveNet's paper rn to see what they've done

edgy hearth Oct 17, 2021, 1:44 PM

#

can anyone help me out ?

#

here is the code

#


```py
import pandas as pd

data = pd.read_csv('hollowen_costumes.csv')
print(data.describe())
```

#

soo like i can print this

serene scaffold Oct 17, 2021, 1:45 PM

#

tender hearth Anyone have any suggestions for producing a speaker embedding from an audio wave...

so you're trying to go from a wav file to an array?

edgy hearth Oct 17, 2021, 1:46 PM

#

but its not clean

tender hearth Oct 17, 2021, 1:46 PM

#

Depends what you mean

edgy hearth Oct 17, 2021, 1:46 PM

#

is there any func to do that ?

serene scaffold Oct 17, 2021, 1:46 PM

#

edgy hearth but its not clean

"unclean data" varies wildly. you have to know what you have and what you want it to be.

tender hearth Oct 17, 2021, 1:46 PM

#

If by array, you mean the audio samples, I already have that. if by array, you mean fixed-length embedding vector, yes that's what I'm looking for

edgy hearth Oct 17, 2021, 1:47 PM

#

```
unique     238     50
top       Name      1
freq         1     28
```

#

this is the output

edgy hearth Oct 17, 2021, 1:47 PM

#

serene scaffold "unclean data" varies wildly. you have to know what you have and what you want i...

uh huh

#

can i make it like in a striaght line ?

tender hearth Oct 17, 2021, 1:47 PM

#

So I'm looking at Wavenet autoencoders because audio is extremely high resolution and using recurrent methods would be really compute intensive

serene scaffold Oct 17, 2021, 1:48 PM

#

tender hearth So I'm looking at Wavenet autoencoders because audio is extremely high resolutio...

This isn't something I know about, evidently.

#

@edgy hearth can you do print(data.head().to_csv()) and paste the text in this chat?

edgy hearth Oct 17, 2021, 1:49 PM

#

sure

#

```
1,Clown,100
2,Vampire,97
3,Harley Quinn,96
4,Casa De Papel,95
```

#

this is the output

#

yeah ig it worked

#

: )))

#

did .head() do the thing ?

serene scaffold Oct 17, 2021, 1:51 PM

#

edgy hearth did ``.head()`` do the thing ?

head just shows you the first few rows. describe does calculations and shows you the result.
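
A small sketch of the difference, using a made-up frame shaped like the rows pasted above. The column names `Rank`, `Costume` and `Score` are assumptions, since the real header was not shown:

```py
import pandas as pd

# Hypothetical column names; the values mirror the rows pasted in the chat.
data = pd.DataFrame({
    "Rank": [1, 2, 3, 4],
    "Costume": ["Clown", "Vampire", "Harley Quinn", "Casa De Papel"],
    "Score": [100, 97, 96, 95],
})

first_rows = data.head(2)   # the first two rows, unchanged
summary = data.describe()   # count, mean, std, min, quartiles, max of numeric columns

print(first_rows)
print(summary)
```

`head` returns actual rows, while `describe` returns one row per statistic and, by default, skips text columns when numeric ones are present.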

abstract frost Oct 17, 2021, 2:02 PM

#

hello, has anyone used bookstoscrape in the past?

serene scaffold Oct 17, 2021, 2:03 PM

#

abstract frost hello, has anyone used bookstoscrape in the past?

Your best bet is to ask the question you would be asking if someone said yes, as even those who have heard of bookstoscrape need to know what you need help with.

abstract frost Oct 17, 2021, 2:05 PM

#

okay i managed to extract every info on a book page and now im struggling to get the same info for every book in the same genre.

#

well idk if that made sense, but basically i have the script to get all infos of a book and now i just need to repeat it for every book of a genre

lapis sequoia Oct 17, 2021, 2:20 PM

#

Anybody had any experience with donkey cars?

#

well I want to create a fully fledged AI assistant that can talk as well as work as a utility software
As I don't want it to be based on if else statements
which modules should I learn to make such a AI in a easy way
not very complex
P.S. - I am a 9th grade student with good python skills so pls list down the prerequisites(basically math required)

serene scaffold Oct 17, 2021, 3:04 PM

#

lapis sequoia well I want to create a fully fledged AI assistant that can talk as well as work...

I'm concerned that your expectations aren't realistic.

lapis sequoia Oct 17, 2021, 3:04 PM

#

serene scaffold I'm concerned that your expectations aren't realistic.

why tho ?

serene scaffold Oct 17, 2021, 3:05 PM

#

lapis sequoia why tho ?

Making a "fully fledged AI assistant" would be challenging even for people with advanced degrees in math and computer science.

#

though this does not mean that you can't do things with AI that will be interesting and gratifying for you.

lapis sequoia Oct 17, 2021, 3:06 PM

#

serene scaffold Making a "fully fledged AI assistant" would be challenging even for people with a...

I didn't meant that much fully fledged like google assistant and siri
one that can do some basic functionality and basically talk in hindi and english languages if possible

#

by basic functionality I mean
opening a software
playing music
opening different folders
searching stuff on web
that can easily be done with python

lapis sequoia Oct 17, 2021, 3:08 PM

#

serene scaffold though this does not mean that you can't do things with AI that will be interest...

hmm
so can you suggest the module/modules I should learn to make it a little AI

serene scaffold Oct 17, 2021, 3:09 PM

#

lapis sequoia hmm so can you suggest the module/modules I should learn to make it a little AI

You might try reading about intent classification, as your assistant would need that.

#

Also, I would suggest against "learning libraries/modules"

lapis sequoia Oct 17, 2021, 3:10 PM

#

serene scaffold You might try reading about intent classification, as your assistant would need ...

hmm which module should I specifically target
tensorflow or spacy or some frameworks like rasa

lapis sequoia Oct 17, 2021, 3:11 PM

#

serene scaffold Also, I would suggest against "learning libraries/modules"

umm so that I learn to train models from scratch ?

serene scaffold Oct 17, 2021, 3:11 PM

#

Again, you don't want to "learn libraries" as these libraries are intended to be building blocks for solving lots of different problems, and understanding what problems are out there and what the ~~solutions~~ potential approaches are will serve you better.

lapis sequoia Oct 17, 2021, 3:13 PM

#

serene scaffold Again, you don't want to "learn libraries" as these libraries are intended to be...

hmm so from where should I begin

serene scaffold Oct 17, 2021, 3:13 PM

#

lapis sequoia hmm so from where should I begin

Reading about intent classification.

lapis sequoia Oct 17, 2021, 3:13 PM

#

serene scaffold Reading about intent classification.

hmm after that

serene scaffold Oct 17, 2021, 3:13 PM

#

You can't skip this step. Good luck!

lapis sequoia Oct 17, 2021, 3:14 PM

#

serene scaffold You can't skip this step. Good luck!

I won't and my last and most important question
what math I would need to learn for this

serene scaffold Oct 17, 2021, 3:15 PM

#

lapis sequoia I won't and my last and most important question what math I would need to learn ...

AI often involves knowing probability, statistics, combinatorics, and linear algebra.

#

the first three are inter-related though.

lapis sequoia Oct 17, 2021, 3:15 PM

#

serene scaffold AI often involves knowing probability, statistics, combinatorics, and linear alg...

thnx !!

lapis sequoia Oct 17, 2021, 4:22 PM

#

serene scaffold AI often involves knowing probability, statistics, combinatorics, and linear alg...

Does differentiation gets included in LA too? It's not heavily needed but at the same time it's better if we know it.

serene scaffold Oct 17, 2021, 4:35 PM

#

lapis sequoia Does differentiation gets included in LA too? It's not heavily needed but at the...

That falls under calculus. My list was not exhaustive, however

lapis sequoia Oct 17, 2021, 4:37 PM

#

Makes sense.

arctic wedgeBOT Oct 17, 2021, 5:21 PM

#

:incoming_envelope: :ok_hand: applied mute to @lapis sequoia until <t:1634491907:f> (9 minutes and 59 seconds) (reason: duplicates rule: sent 4 duplicated messages in 10s).

edgy hearth Oct 17, 2021, 5:42 PM

#

serene scaffold `head` just shows you the first few rows. `describe` does calculations and shows...

gotcha

coral kindle Oct 17, 2021, 6:24 PM

#

I have a question with huge loads of NLP

#

Can libraries like PySpark process them chunk by chunk? Because I really want to try something with the ArXiV dataset snapshot.

#

And it's unfortunate that I can't fully exploit it because of RAM capacity

#

I've been trying Spark and Spark-NLP but idk if I should bounce back to a simple NN on Torch once the preprocessing is done

#

https://nlp.johnsnowlabs.com/

John Snow Labs - Spark NLP

The first production grade versions of the latest deep learning NLP research

#

Because so far it looks neat

#

Do I really have to set up a machine on the cloud ?
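
One way to stay within RAM without a cluster is to stream the file in fixed-size chunks. A rough sketch, assuming the snapshot is stored as JSON Lines (one JSON record per line), which is an assumption here and not something confirmed in the chat. The helper name `iter_chunks` and the sample file are hypothetical:

```py
import json
import tempfile
from itertools import islice
from pathlib import Path


def iter_chunks(path, chunk_size):
    """Yield lists of at most chunk_size parsed records from a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        while True:
            lines = list(islice(handle, chunk_size))
            if not lines:
                break
            yield [json.loads(line) for line in lines]


# Tiny stand-in file; a real snapshot would be far larger.
sample_path = Path(tempfile.mkdtemp()) / "sample.jsonl"
with open(sample_path, "w", encoding="utf-8") as handle:
    for i in range(5):
        handle.write(json.dumps({"id": i, "title": f"paper {i}"}) + "\n")

# Only one chunk is held in memory at a time.
chunk_sizes = [len(chunk) for chunk in iter_chunks(sample_path, chunk_size=2)]
print(chunk_sizes)
```

Each chunk can be preprocessed and written back out before the next one is read, so memory use depends on the chunk size and not on the file size.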

coral kindle Oct 17, 2021, 7:12 PM

#

Has anybody ever used spark-nlp? Ever?

#

There's some towardsdatascience articles on it

grave frost Oct 17, 2021, 9:02 PM

#

tender hearth Anyone have any suggestions for producing a speaker embedding from an audio wave...

wav2vec2 if you don't wanna mess around with SOTA stuff. otherwise, waveNet is pretty good too

tender hearth Oct 18, 2021, 12:05 AM

#

grave frost wav2vec2 if you don't wanna mess around with SOTA stuff. otherwise, waveNet is p...

Noted, thanks for the suggestion

lapis sequoia Oct 18, 2021, 12:24 AM

#

What steps or learning paths should I take if I want to become an ML engineer?

robust rampart Oct 18, 2021, 12:50 AM

#

Tree Depth: 37
yuck

ashen sable Oct 18, 2021, 1:13 AM

#

hey guys i want to build a speaking assistant which learns day by day..i am thinking of using reinforcment learning..so what do u guys suggest

warm valley Oct 18, 2021, 7:19 AM

#

Hey folks, a small qsn.
Should I use feature selection in test data too?

I.e, I used feature selection to clear the data and model in decision tree but when i used it to predict the test data.

It gives
Feature names unseen at fit time.

quasi parcel Oct 18, 2021, 9:04 AM

#

Hi everyone https://paste.pythondiscord.com/sicunewomu.yaml this is the csv
this is the pivot table code

```py
weighted_matrix = combineframes.explode('product_ids').pivot_table(index='Customer_ID', columns='product_ids', aggfunc='sum', values='res')

weighted_matrix
```

#

the above code is only returning 174 columns but it should be 999 rows

#

in this weighted_matrix

zinc rock Oct 18, 2021, 10:02 AM

#

hi would like help in #help-dumpling for pandas

rose cipher Oct 18, 2021, 10:25 AM

#

Guys, I would like to know if Cloud Platforms (AWS, GCP, AZURE, etc) can have access to my data? If yes, which types of data?

uneven thistle Oct 18, 2021, 10:26 AM

#

i am trying to resample my data to yearly but the problem is that my price column is integer and country code and organization is string so after resampling i used groupby to group according to country and org and then used sum but it is showing the price 0 for every row what should i do?

zinc rock Oct 18, 2021, 10:57 AM

#

anyone know where is this from

#

lol

#

desert oar Oct 18, 2021, 11:39 AM

#

warm valley Hey folks, a small qsn. Should I use feature selection in test data too? I.e, ...

in test data, use the same exact features as in training. remember: test data is meant to simulate new data that your model has not seen yet.
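
A minimal sketch of that advice, assuming the selected feature names are kept in a list after selection on the training data. The frames and the `selected_features` list are made up for illustration:

```py
import pandas as pd

train = pd.DataFrame({
    "age": [25, 40, 31],
    "income": [30, 80, 52],
    "noise": [7, 1, 4],
    "label": [0, 1, 0],
})
test = pd.DataFrame({"age": [36, 22], "income": [61, 28], "noise": [3, 9]})

# Hypothetical outcome of feature selection, decided on the training data only.
selected_features = ["age", "income"]

X_train = train[selected_features]
y_train = train["label"]

# Reuse the same columns, in the same order, for the test data.
X_test = test[selected_features]
print(list(X_test.columns) == list(X_train.columns))
```

Passing the unselected test frame to a model fitted on `X_train` is the kind of mismatch that produces a "feature names unseen at fit time" complaint.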

warm valley Oct 18, 2021, 11:52 AM

#

desert oar in test data, use the same exact features as in training. remember: test data is...

Thanks much.
Also, I got 0.75 AUC, it is considered good exam wise?

desert oar Oct 18, 2021, 11:53 AM

#

warm valley Thanks much. Also, I got 0.75 AUC, it is considered good exam wise?

we can't give exam help here.

#

!rules 8

arctic wedgeBOT Oct 18, 2021, 11:53 AM

#

Rules

8. Do not help with ongoing exams. When helping with homework, help people learn how to do the assignment without doing it for them.

desert oar Oct 18, 2021, 11:54 AM

#

but this is a general enough question that i am comfortable answering: it depends entirely on the data. often something like 0.75 is not good enough for "business" purposes, but it depends on the cost of misclassification in your particular problem

#

recall the interpretation of AUROC on binary classification

#

in some medical studies or other research settings (e.g. social science), 0.75 might be great

#

but in those contexts you are often interested in probability modeling and not just accurate point predictions
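
For reference, the interpretation mentioned above: AUROC is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. A brute-force sketch of that definition, with made-up labels and scores and a hypothetical helper name:

```py
def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.8, 0.3, 0.2]
auc = pairwise_auc(labels, scores)
print(auc)
```

Under this reading, 0.75 means the model ranks a random positive above a random negative three times out of four, and 0.5 is what random scores would give.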

desert oar Oct 18, 2021, 11:59 AM

#

uneven thistle i am trying to resample my data to yearly but the problem is that my price colum...

can you share a sample of data that reproduces the problem, and can you repost your code as text, using a code block?

also, you might want to pay attention to that warning: some of your Date values might be ints.

#

!code

arctic wedgeBOT Oct 18, 2021, 11:59 AM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

uneven thistle Oct 18, 2021, 12:00 PM

#

desert oar can you share a sample of data that reproduces the problem, and can you repost y...

desert oar Oct 18, 2021, 12:01 PM

#

i can't copy and paste data from a screenshot. can you post CSV?

#

!paste

arctic wedgeBOT Oct 18, 2021, 12:01 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

uneven thistle Oct 18, 2021, 12:01 PM

#

this is the data i am gettig after resampling

#

ok

arctic wedgeBOT Oct 18, 2021, 12:02 PM

#

Hey @uneven thistle!

Uh-oh! It looks like your message got zapped by our spam filter. We currently don't allow .csv attachments, so here are some tips to help you travel safely:

• If you attempted to send a message longer than 2000 characters, try shortening your message to fit within the character limit or use a pasting service (see below)

• If you tried to show someone your code, you can use codeblocks
(run !code-blocks in #bot-commands for more information) or use a pasting service like:

https://paste.pythondiscord.com

uneven thistle Oct 18, 2021, 12:03 PM

#

it is showing that csv attachments are currently not allowed

desert oar Oct 18, 2021, 12:04 PM

#

yes, use the paste site i linked

jolly cliff Oct 18, 2021, 12:10 PM

#

How to generate unique random numbers in numpy array?

jolly cliff Oct 18, 2021, 12:10 PM

#

jolly cliff How to generate unique random numbers in numpy array?

Can someone pls help?

uneven thistle Oct 18, 2021, 12:19 PM

#

https://paste.pythondiscord.com/afiyebetuf.coffeescript

#

here it is @desert oar but i didnt find a way to copy csv file

desert oar Oct 18, 2021, 12:20 PM

#

jolly cliff How to generate unique random numbers in numpy array?

https://numpy.org/doc/stable/reference/random/index.html

uneven thistle Oct 18, 2021, 12:24 PM

#

what is this @desert oar

#

i want to resample the data not sample

#

did you understand my question @desert oar

desert oar Oct 18, 2021, 12:30 PM

#

uneven thistle i want to resample the data not sample

that was in response to the other user. i understand the question, but i can't help without sample data. i don't have any way to reproduce your problem

uneven thistle Oct 18, 2021, 12:30 PM

#

how can i upload csv

desert oar Oct 18, 2021, 12:31 PM

#

paste it into the paste site

#

i only need enough rows to see the problem, i don't need the whole file

uneven thistle Oct 18, 2021, 12:32 PM

#

there is no option to past it in the paste site

jolly cliff Oct 18, 2021, 12:32 PM

#

desert oar https://numpy.org/doc/stable/reference/random/index.html

How exactly?
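
A short sketch of two options from that documentation page, assuming "unique" means no value may repeat. The seed and the ranges are arbitrary choices for the example:

```py
import numpy as np

rng = np.random.default_rng(seed=0)

# Ten distinct integers drawn from 0..99 (sampling without replacement).
unique_draws = rng.choice(100, size=10, replace=False)

# Every integer 0..9 exactly once, in random order.
shuffled = rng.permutation(10)

print(unique_draws, shuffled)
```

`replace=False` is what guarantees uniqueness; it requires the requested size to be no larger than the population.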

uneven thistle Oct 18, 2021, 12:34 PM

#

is this enough for you?

#

@desert oar

uneven thistle Oct 18, 2021, 12:57 PM

#

plzz help @desert oar

desert oar Oct 18, 2021, 1:03 PM

#

that isn't enough, sorry. the best i can offer is that a lot of your Price values are nan in this screenshot https://cdn.discordapp.com/attachments/366673247892275221/899628086654545931/Screenshot_141.png, which would result in a lot of 0s upon grouping and summing
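
A small sketch of how NaN prices turn into zeros, using made-up rows. Only the `Price` column name comes from the screenshot discussion, and `Country` stands in for the grouping columns:

```py
import numpy as np
import pandas as pd

sales = pd.DataFrame({
    "Country": ["US", "US", "DE", "DE"],
    "Price": [np.nan, np.nan, 10.0, np.nan],
})

# By default NaN values are skipped, so an all-NaN group sums to 0.
default_sum = sales.groupby("Country")["Price"].sum()

# min_count=1 keeps the result NaN unless the group has at least one real value.
strict_sum = sales.groupby("Country")["Price"].sum(min_count=1)

print(default_sum)
print(strict_sum)
```

If every group comes back as 0, checking `Price` for missing values (for example with `isna().sum()`) before grouping is a reasonable first step.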

silver summit Oct 18, 2021, 1:07 PM

#

anyone good with pyspark?

#

I have a binary file that is very large. Trying to figure out how to use pyspark to work with it.

quasi parcel Oct 18, 2021, 1:37 PM

#

rose cipher Guys, I would like to know if Cloud Platforms (AWS, GCP, AZURE, etc) can have a...

yes you can store json, csv, txt, xls parquet any type of file in AWS s3

#

Google Cloud Storage in gcp

#

in azure Azure Blob

serene scaffold Oct 18, 2021, 1:39 PM

#

@silver summit it's important that you always state your actual question, as even people who know about a given topic won't volunteer themselves until they know what you're really asking. I assume you're asking this question: #python-discussion message

silver summit Oct 18, 2021, 1:44 PM

#

serene scaffold <@!717074747674066975> it's important that you always state your actual question...

I'm not. The other question was regarding using custom udf's. This question is I just have a very large binary file that is slow to work with. I'm interested in finding out if I can use pyspark to work with it in a distributed way. Possibly streaming so I additionally don't need to load the whole thing into memory. I think spark.readStream could be an option but it's not currently working.

desert oar Oct 18, 2021, 2:46 PM

#

"not currently working" is a great place to start by asking a specific question about the unexpected or error output you are getting

rose cipher Oct 18, 2021, 3:03 PM

#

quasi parcel yes you can store json, csv, txt, xls parquet any type of file in AWS s3

No, I mean, is it secure to use these services? Can they access private data?

quasi parcel Oct 18, 2021, 3:03 PM

#

like?

#

credentials

#

@rose cipher

rose cipher Oct 18, 2021, 3:07 PM

#

No. I just want to know if these companies can have access to the data that I use on these platforms.

quasi parcel Oct 18, 2021, 3:08 PM

#

no they wont there is a strict policy that your data wont be shared or used by anyone else apart from you

rose cipher Oct 18, 2021, 3:12 PM

#

quasi parcel no they wont there is a strict policy that your data wont be shared or used by a...

That's what I wanted to know. Thanks!

#

I am thinking about get into the Cloud Industry, but I was afraid about privacy concerns

quasi parcel Oct 18, 2021, 3:12 PM

#

unless you keep open to public

#

so try to keep it private

rose cipher Oct 18, 2021, 3:12 PM

#

quasi parcel unless you keep open to public

Yes

#

There's public, private and hybrid cloud, that's what you mean right?

quasi parcel Oct 18, 2021, 3:13 PM

#

no no

#

so in s3 we have this policy were we can allow the file to be publicly accessible then anyone can download your file

#

i do suggest s3 with private policy its really good

rose cipher Oct 18, 2021, 3:17 PM

#

Sure

#

Good to know that these services are safe

#

When I mean safe I am saying about data protection

quasi parcel Oct 18, 2021, 3:20 PM

#

yes you can use these services what is your exact functionality?

rose cipher Oct 18, 2021, 3:24 PM

#

I don't know, I will start to learn them now. I just wanted to know if they were Safe

silver summit Oct 18, 2021, 4:42 PM

#

desert oar "not currently working" is a great place to start by asking a _specific_ questio...

wasn't around my computer when responded, daughter is asleep now so I can add more context

#

when I use pyspark to read a binary file as spark.readStream.format('binaryfile').load('filename.bin') I get a schema error
```
IllegalArgumentException:
Schema must be specified when creating a streaming source DataFrame. If some
files already exist in the directory, then depending on the file format you
may be able to create a static DataFrame on that directory with
'spark.read.load(directory)' and infer schema from it.'
```

bold timber Oct 18, 2021, 4:47 PM

#

how to handle this error?

median fulcrum Oct 18, 2021, 5:06 PM

#

any ideas how can I use seaborn boxplot to describe this database?

serene scaffold Oct 18, 2021, 5:06 PM

#

bold timber how to handle this error?

can you copy and paste the whole error message as text? Please ping me if you do this.

#

!traceback

arctic wedgeBOT Oct 18, 2021, 5:06 PM

#

Please provide the full traceback for your exception in order to help us identify your issue.

A full traceback could look like:

Traceback (most recent call last):
    File "tiny", line 3, in
        do_something()
    File "tiny", line 2, in do_something
        a = 6 / b
ZeroDivisionError: division by zero

The best way to read your traceback is bottom to top.

• Identify the exception raised (in this case ZeroDivisionError)
• Make note of the line number (in this case 2), and navigate there in your program.
• Try to understand why the error occurred (in this case because b is 0).

To read more about exceptions and errors, please refer to the PyDis Wiki or the official Python tutorial.

serene scaffold Oct 18, 2021, 5:06 PM

#

^ this is what is meant by "whole error message"

bold timber Oct 18, 2021, 5:07 PM

#

serene scaffold can you copy and paste the whole error message as text? Please ping me if you do...

ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes'

serene scaffold Oct 18, 2021, 5:07 PM

#

bold timber ImportError: cannot import name 'MaskedArray' from 'sklearn.utils.fixes'

This is only the very end of the error message. I asked to see the whole thing.

quasi parcel Oct 18, 2021, 5:08 PM

#

what is your sklearn version

#

@bold timber

bold timber Oct 18, 2021, 5:09 PM

#

quasi parcel <@!786960616664727572>

0.23.2

quasi parcel Oct 18, 2021, 5:10 PM

#

try installing 0.18.2

silver summit Oct 18, 2021, 5:19 PM

#

ok, I gave up on readStream... trying to figure out how to use just read with a binary file... so I currently have the data loaded in with spark.read.format('binaryfile').load(fn) and have the following

Screen_Shot_2021-10-18_at_10.19.01_AM.png

desert oar Oct 18, 2021, 5:19 PM

#

silver summit when I use pyspark to read a binary file as `spark.readStream.format('binaryfile...

what is actually in the file? is it supposed to be a dataframe? what is the file format?

silver summit Oct 18, 2021, 5:20 PM

#

the content field is a list of values, the first 4 are just checks, the next 3 are header info then the rest is the data I need, trying to figure out how to strip out the first 4, then the next 3 then the remaining values

#

the header info basically tells me how much data there is and the type etc

#

yeah the data is a list of numbers that will eventually need to be shaped into a dataframe

#

file format is binary
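
A plain-Python sketch of the split described above, assuming the content has already been decoded into a flat list of numbers. The values are made up, and the layout (4 checks, 3 header fields, then data) is taken from the description:

```py
# Hypothetical decoded values: 4 checks, 3 header fields, then the data.
values = [1, 1, 1, 1, 6, 2, 3, 10, 11, 12, 13, 14, 15]

N_CHECKS = 4
N_HEADER = 3

checks = values[:N_CHECKS]
header = values[N_CHECKS:N_CHECKS + N_HEADER]
payload = values[N_CHECKS + N_HEADER:]

print(checks, header, payload)
```

The same three slices would apply to an array column once the raw bytes have been unpacked with the right numeric type, which the chat does not specify.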

formal lava Oct 18, 2021, 5:22 PM

#

What r req for ai and how to start?

desert oar Oct 18, 2021, 5:26 PM

#

@silver summit i think readStream is only for dataframes, i don't know if there's an API to parse an arbitrary huge blob of bytes into a dataframe. you might have to fall back to reading each line into an RDD, and parsing each line into a Row, and then using createDataFrame to collect all those rows into a dataframe

#

it'd still be "streaming" but it's not quite as elegant as declaratively specifying a schema

robust jungle Oct 18, 2021, 5:26 PM

#

How can I find the output node names of an xception model

median fulcrum Oct 18, 2021, 5:26 PM

#

what's the correct way to do that?

silver summit Oct 18, 2021, 5:26 PM

#

@desert oar well I have the data in a dataframe now, I just need to transform it to a series instead of a list of values in a single row of a column

robust jungle Oct 18, 2021, 5:26 PM

#

(Keras)

robust jungle Oct 18, 2021, 5:28 PM

#

formal lava What r req for ai and how to start?

Depends on what you are trying to do

formal lava Oct 18, 2021, 5:28 PM

#

idk machine learning?

silver summit Oct 18, 2021, 5:28 PM

#

lol bro...

robust jungle Oct 18, 2021, 5:28 PM

#

I recommend starting with tensorflow

#

There are good guides online

silver summit Oct 18, 2021, 5:29 PM

#

wtf

#

that's terrible advice

desert oar Oct 18, 2021, 5:29 PM

#

silver summit <@!389497659087650836> well I have the data in a dataframe now, I just need to t...

are they all the same lengths? i think there's a way to explode a single array-valued column into separate columns, but it's ugly. i know that you can "explode" an array-valued column to rows, and then re-collect them into columns with a lot of expensive joins. i'd personally go back to the RDD.map version w/ a function that returns a Row

robust jungle Oct 18, 2021, 5:29 PM

#

Mb

#

I’m kinda new too, so I’m just saying what I did

silver summit Oct 18, 2021, 5:30 PM

#

absolutely don't do that

#

you need to understand fundamentals before you just jump into stuff like that

#

take an online course about machine learning, get a sense for statistics, look at sklearn and kaggle

median fulcrum Oct 18, 2021, 5:31 PM

#

median fulcrum what's the correct way to do that?

something like that? But still the error

silver summit Oct 18, 2021, 5:31 PM

#

just load data into pandas and try to figure out what youre looking at, means, stds, sizes, shapes, plots etc...

#

daughter is awake... be on later

desert oar Oct 18, 2021, 5:32 PM

#

i agree that you should definitely learn some data viz and data manipulation in pandas along w/ tensorflow

#

but i don't think it's that bad to start poking around in TF with image classification or whatever

#

but you will very quickly start wishing you knew more principled approaches to problem solving, and then you should start focusing on stats and more machine learning fundamentals

formal lava Oct 18, 2021, 5:33 PM

#

mhm

desert oar Oct 18, 2021, 5:33 PM

#

it might be more satisfying to do a bit of hands-on playing around with tensorflow, then spend some time on fundamentals, then spend some more time playing around, etc.

formal lava Oct 18, 2021, 5:34 PM

#

so what do u mean by "playing around", im not sure I can build anything

median fulcrum Oct 18, 2021, 5:34 PM

#

AttributeError: module 'matplotlib.pyplot' has no attribute 'set_xlabel'

#

what

#

It has

#

https://matplotlib.org/stable/tutorials/text/text_intro.html

#

what I am missing?

desert oar Oct 18, 2021, 5:39 PM

#

@median fulcrum plt.xlabel, set_xlabel is for the underlying Axes object, e.g. plt.gca().set_xlabel

median fulcrum Oct 18, 2021, 5:40 PM

#

desert oar <@!758034911641862304> `plt.xlabel`, `set_xlabel` is for the underlying Axes obj...

but, for example, in this code I can't put the set_xlabel at the end

#

How I would write in that?

median fulcrum Oct 18, 2021, 5:41 PM

#

desert oar <@!758034911641862304> `plt.xlabel`, `set_xlabel` is for the underlying Axes obj...

btw I tried with ax

desert oar Oct 18, 2021, 5:41 PM

#

plt.hist(x='loan', data=credit_risk)
plt.xlabel('Loan ($)')

maybe like that? i don't use the plt api much

#

did it work with ax? that should work

bold timber Oct 18, 2021, 5:41 PM

#

quasi parcel try installing 0.18.2

Ok thank you

median fulcrum Oct 18, 2021, 5:41 PM

#

desert oar did it work with `ax`? that should work

don't work

median fulcrum Oct 18, 2021, 5:42 PM

#

median fulcrum btw I tried with ax

as you can see

desert oar Oct 18, 2021, 5:42 PM

#

what doesn't work about it?

#

i don't see the error or any output

bold timber Oct 18, 2021, 5:42 PM

#

how to fixed this?

desert oar Oct 18, 2021, 5:42 PM

#

https://matplotlib.org/stable/gallery/pyplots/fig_axes_labels_simple.html

median fulcrum Oct 18, 2021, 5:43 PM

#

desert oar ```python plt.hist(x='loan', data=credit_risk) plt.xlabel('Loan ($)') ``` maybe ...

median fulcrum Oct 18, 2021, 5:45 PM

#

desert oar i don't see the error or any output

#

:/

#

if someone know ping me pls

#

got it!

desert oar Oct 18, 2021, 6:11 PM

#

median fulcrum got it!

yeah this is what i had in mind. i thought plt.xlabel would do it but i guess i was wrong

median fulcrum Oct 18, 2021, 6:12 PM

#

desert oar yeah this is what i had in mind. i thought `plt.xlabel` would do it but i guess ...

I think is outdated that

#

idk

desert oar Oct 18, 2021, 6:12 PM

#

no i think i was just wrong

#

plt.xlabel appears to get the current label

median fulcrum Oct 18, 2021, 6:12 PM

#

but when I tried just got an error

desert oar Oct 18, 2021, 6:12 PM

#

maybe you can do plt.xlabel = ...? like i said, i don't use the pyplot api much anymore

median fulcrum Oct 18, 2021, 6:13 PM

#

It worked so I'm not gonna change that 😂

desert oar Oct 18, 2021, 6:13 PM

#

that's how i'd do it 99% of the time

#

you can also just do ax = plt.gca() to get the current axis object instead of plt.subplots, that's up to you
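
A small sketch comparing the two styles discussed above, with made-up loan values and the non-interactive `Agg` backend so it runs without a display. Note that `plt.xlabel` is a function called with the label text, while `set_xlabel` is a method on the Axes object:

```py
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt

# Made-up values standing in for the credit_risk data from the chat.
credit_risk = {"loan": [1200, 3400, 560, 7800, 2500]}

# Object-oriented style: set_xlabel belongs to the Axes.
fig, ax = plt.subplots()
ax.hist(credit_risk["loan"])
ax.set_xlabel("Loan ($)")

# pyplot style: plt.xlabel(text) labels whichever Axes is current.
plt.figure()
plt.hist(credit_risk["loan"])
plt.xlabel("Loan ($)")
current_label = plt.gca().get_xlabel()

print(ax.get_xlabel(), current_label)
```

Both routes end up setting the same property, so `plt.set_xlabel` failing is expected: the pyplot module has no function by that name.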

median fulcrum Oct 18, 2021, 6:13 PM

#

desert oar you can also just do `ax = plt.gca()` to get the current axis object instead of ...

oh

#

true

plain verge Oct 18, 2021, 7:10 PM

#

hi everyone

#

I was learning pandas

#

and I wonder if there is any way to use DataFrame.loc with both list of columns and slice at the same time

#

like I want to have something like df.loc[:, ["Name", "LastName", "Age":"School", "Interest":]]

#

however this does not work

#

How can I do something like this?

rigid zodiac Oct 18, 2021, 7:12 PM

#

df.iloc[{'a','b'}]

#

that's how you select columns

plain verge Oct 18, 2021, 7:14 PM

#

isn't iloc for int indexes of columns?

rigid zodiac Oct 18, 2021, 7:14 PM

#

not really, like I use it to select a few column only

#

wait my mistake

#

it's just df[{'a','b'}]

plain verge Oct 18, 2021, 7:16 PM

#

oh wait yeah
that works
but like then what's the purpose of loc?

#

also can I do slices with this?

#

like every column from Name to Age?

rigid zodiac Oct 18, 2021, 7:17 PM

#

iloc is when you want to choose a specific row to specific column

plain verge Oct 18, 2021, 7:19 PM

#

look

#

I wanna get these columns in this order
Name, LastName, Age, .... , School, Interest, ....

#

where those ... means anything in between

#

how can I do it in one line?

#

there is this syntax
df.loc[:, ["Name", "LastName", "Age"]]
and also there is this:
df.loc[:, "Age":"School"]
how can I use this both at the same time?

rigid zodiac Oct 18, 2021, 7:23 PM

#

for that it will be better to choose iloc
df.iloc[:, :3]
where the second one is for the columns up to 3 (ie column 0 -> 2)

plain verge Oct 18, 2021, 7:23 PM

#

what if the data is changing in column positions?

#

then I need some way to get column indexes first

rigid zodiac Oct 18, 2021, 7:24 PM

#

you can use { }

#

so like

df.iloc[ {   } , { } ]

#

or just google

desert oar Oct 18, 2021, 7:25 PM

#

rigid zodiac it's just ```df[{'a','b'}]```

you're using a set here, that isn't normally necessary, nor is it different from a list

ripe forge Oct 18, 2021, 7:25 PM

#

desert oar maybe you can do `plt.xlabel = ...`? like i said, i don't use the pyplot api muc...

what do you use instead? curious

desert oar Oct 18, 2021, 7:25 PM

#

plain verge there is this syntax `df.loc[:, ["Name", "LastName", "Age"]]` and also there is ...

i don't think you can combine those, but you can try writing slice("Age", "School") instead of "Age":"School" - the : syntax is special syntax that expands to slice()

plain verge Oct 18, 2021, 7:26 PM

#

desert oar i don't think you can combine those, but you can try writing `slice("Age", "Scho...

I did, didn't work

desert oar Oct 18, 2021, 7:26 PM

#

ripe forge what do you use instead? curious

the object oriented api, so fig, ax = plt.subplots() ; ax.foo() ; ax.bar() ; plt.show()

ripe forge Oct 18, 2021, 7:26 PM

#

oh. gotcha

desert oar Oct 18, 2021, 7:26 PM

#

plain verge I did, didn't work

maybe it's just not supported by pandas. it would be convenient, i agree

#

you are trying to do something like this, right? df.loc[:, ["Name", slice("Age", "School")]]

steady cargo Oct 18, 2021, 7:28 PM

#

Hi, I have a question.
Suppose we want to use face recognition in a mobile application using Python language, what library do we use or how do we link these both together?

plain verge Oct 18, 2021, 7:28 PM

#

desert oar you are trying to do something like this, right? `df.loc[:, ["Name", slice("Age"...

yes exactly

desert oar Oct 18, 2021, 7:28 PM

#

yeah you'd have to write your own function to "expand" that

plain verge Oct 18, 2021, 7:31 PM

#

what if I do something like

df = pd.concat([df.loc[:, ["Name", "Age"]], df.loc[:, "Age":"School"]])

but then it doubles the rows each having some columns

#

oh wait wrong

#

this I mean

desert oar Oct 18, 2021, 7:32 PM

#

that's a good idea, you can generalize it like this:

def slice_columns(df, *column_specs):
    return pd.concat([df.loc[:, spec] for spec in column_specs], axis=1)
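a usage sketch for the helper above, assuming a made-up one-row frame with the column names from the question (the `City` and `Extra` columns are hypothetical fillers). since `:` can't be used outside of `[]`, each range is written as an explicit `slice(...)`.

```python
import pandas as pd

def slice_columns(df, *column_specs):
    # each spec is anything .loc accepts for columns: a list of labels or a slice
    return pd.concat([df.loc[:, spec] for spec in column_specs], axis=1)

# hypothetical frame with the column names from the question
df = pd.DataFrame({
    "Name": ["Someone"], "LastName": ["SomeLast"], "Age": [30],
    "City": ["SomeCity"], "School": ["SomeSchool"], "Interest": ["chess"], "Extra": [1],
})

# "Age":"School" becomes slice("Age", "School"); "Interest": becomes slice("Interest", None)
picked = slice_columns(
    df, ["Name", "LastName"], slice("Age", "School"), slice("Interest", None)
)
cols = list(picked.columns)
```

label slices in `.loc` include both endpoints, so `School` is part of the result. if two specs overlap, the shared columns show up twice; something like `picked.loc[:, ~picked.columns.duplicated()]` should drop the repeats.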

rigid zodiac Oct 18, 2021, 7:32 PM

#

plain verge what if I do something like ```py df = pd.concat([df.loc[:, ["Name", "Age"]], df...

cause you do age : school

plain verge Oct 18, 2021, 7:40 PM

#

desert oar that's a good idea, you can generalize it like this: ```python def slice_columns...

yeah something like this

#

I actually found something cool
I did

pd.merge([df.loc[:, ["Name", "Age"]], df.loc[:, "Age":"School"]])

however, the downside is that you can only do 1 slice or you need to have another merge inside
any better way to merge multiple objects instead of just 2 will fix this problem

desert oar Oct 18, 2021, 7:46 PM

#

yeah you still have to de-duplicate columns after, i think the pd.concat version lets you do that more easily

plain verge Oct 18, 2021, 7:47 PM

#

concat produces wrong set

desert oar Oct 18, 2021, 7:47 PM

#

the bad part about defining a new function is that you can't use : syntax anymore

#

wrong how?

plain verge Oct 18, 2021, 7:48 PM

#

it is like

Name    LastName    Age     School
Someone SomeLast    None    None
None    None        SomeAge SomeSchool

#

where there is actually only 1 entry
Someone SomeLast SomeAge SomeSchool

desert oar Oct 18, 2021, 7:49 PM

#

that seems odd, did you forget axis=1?

plain verge Oct 18, 2021, 7:51 PM

#

maybe

#

lemme try

#

oh wait now it works

#

I am like so confused about those axis things
like I have a software engineer kind of background
and I see all these multi-dimensional arrays as nested arrays
and it makes understanding axes hard for me
I need to research more and get comfortable with it

desert oar Oct 18, 2021, 7:55 PM

#

it's good enough to think of a DataFrame as a collection of Series in a trenchcoat

#

the columns and index are labels for columns (1 column = 1 series) and rows, respectively

#

a Series has an index, those are element/row labels

#

all Series in a DataFrame share an index
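a small sketch of that data model, using a made-up two-column frame with string row labels. it only checks the points made above: one column is one Series, and every column carries the same index as the frame.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

col = df["a"]                      # one column is one Series
is_series = isinstance(col, pd.Series)

# every column Series carries the same row labels as the DataFrame itself
shares_index = col.index.equals(df.index) and df["b"].index.equals(df.index)

column_labels = list(df.columns)   # labels for the columns
row_labels = list(df.index)        # labels for the rows
```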

#

the pandas docs don't have a clear explanation of this data model, so it's understandable if you don't get it right away

#

the index/columns thing itself is an interesting beast.. that's an instance of the Index class, which is array-like, but also acts like keys in a lookup table

#

(i'm not sure if internally it uses a b-tree index or hash index or something else, maybe it varies)

#

you can also have a multiindex, where each element is conceptually a tuple, but it also acts like a collection of individual Index array things

#

i've been wanting to write a guide to this stuff for months, but every time i try i find it very difficult to explain clearly, and very difficult to design a sensible learning path through it

#

i think everyone learns pandas by just stumbling around until things make sense 😆

plain verge Oct 18, 2021, 8:00 PM

#

my confusion kinda comes from numpy
anyway I am feeling pretty sleepy rn so I probably won't understand anything rn XD
so like I guess I should continue tomorrow
I'll also read your explanation and continue this discussion tomorrow
thanks for your help tho

desert oar Oct 18, 2021, 8:00 PM

#

you're welcome

plain verge Oct 18, 2021, 8:00 PM

#

desert oar i think everyone learns pandas by just stumbling around until things make sense ...

everything is basically like that

#

since this chat is not that active I guess I can find your message easily after even a day XD

desert oar Oct 18, 2021, 8:00 PM

#

true, although things with better docs tend to reduce the amount of "stumbling" needed

#

pandas has a lot of it

plain verge Oct 18, 2021, 8:00 PM

#

yeah

desert oar Oct 18, 2021, 8:01 PM

#

it can be very active sometimes. i tend to write things down in a note file and copy the discord message link

plain verge Oct 18, 2021, 8:01 PM

#

by the time I was talkin my chrome didn't load google for some reason so i couldn't google stuff so I could find concat and merge easier lol

plain verge Oct 18, 2021, 8:01 PM

#

desert oar it can be very active sometimes. i tend to write things down in a note file and ...

hmm ok I'll do that

#

alright I'll come back tomorrow bye

silver summit Oct 18, 2021, 8:03 PM

#

@desert oar hey salt, do you know how to unpack a list from a single row in a pyspark dataframe and return a series?

desert oar Oct 18, 2021, 8:11 PM

#

silver summit <@!389497659087650836> hey salt, did you know how to unpack a list from a single...

what do you mean by return a "series"? do you want to turn every array element into a separate column, or a separate row?

#

the latter has a built-in method, explode

#

the former i definitely had to do in the past but i don't remember how and i do remember it was ugly

silver summit Oct 18, 2021, 8:13 PM

#

separate row

#

@desert oar ayyyy, nice!! tyty

Screen_Shot_2021-10-18_at_1.15.59_PM.png

#

I'm not clear on when spark actually runs computation. Transformations are just added to the execution graph but not actually run until results need to be returned to the driver. In this case, ignoring the show, does the explode count as a transformation?

desert oar Oct 18, 2021, 8:22 PM

#

silver summit I'm not clear on when spark actually runs computation. Transformations are just ...

this is the eternally shitty part about spark. you kind of just have to know, and the docs don't really state it clearly. i don't believe explode triggers computation, but you might want to repartition if the results are "unbalanced" in size

silver summit Oct 18, 2021, 8:22 PM

#

I have so many jobs at work that just take days b/c i wrote them shitty... really need to figure this all out.

#

using a lot of pandas udfs b/c I just dunno how to do stuff

desert oar Oct 18, 2021, 8:29 PM

#

if you post some specific examples i might be able to help

silver summit Oct 18, 2021, 8:31 PM

#

cool, ty!

#

do you work as ds or mle?

desert oar Oct 18, 2021, 8:31 PM

#

neither, currently. but ds in the past

silver summit Oct 18, 2021, 8:32 PM

#

oh, swe then?

desert oar Oct 18, 2021, 8:33 PM

#

yep, less work for more money!

silver summit Oct 18, 2021, 8:33 PM

#

haha, yeah I'm thinking that too

#

swapping over to MLE in the next 6mo or so, similar pay range as swe

#

coinbase throwing 400k at senior mle's right now 😮

desert oar Oct 18, 2021, 8:40 PM

#

i didn't know the numbers were that high. i also don't know how much i want to work at coinbase 😆

silver summit Oct 18, 2021, 8:42 PM

#

haha yeah for sure, but any big tech company will pay very well for MLE in general

jade acorn Oct 18, 2021, 8:49 PM

#

anyone knowledgeable on scipy stats cdf and pdf? for calculating p value of F statistic

silver summit Oct 18, 2021, 8:51 PM

#

@jade acorn most of these functions should return a tuple of the statistic and p

#

can you give an example?

jade acorn Oct 18, 2021, 8:53 PM

#

I need to calculate the bottom part manually (Prob > F) and i already have the F(2,247) value which i also calculated manually, The F-value is the Mean Square Model divided by the Mean Square Residual yielding F=3.48, The p-value associated with this F value is 0.0325

silver summit Oct 18, 2021, 8:55 PM

#

how did you define F?

jade acorn Oct 18, 2021, 8:57 PM

#

F = (ssreg/modeldf)/(ssres/resdf) , ssreg being SS model , modeldf being models degrees of freedom,ssres being SS residuals, and resdf is the residuals degrees of freedom

#

and SS is Sum of Squares

silver summit Oct 18, 2021, 8:57 PM

#

sure, this is from scipy?

jade acorn Oct 18, 2021, 8:57 PM

#

the above picture is from the program called Stata

silver summit Oct 18, 2021, 8:58 PM

#

oh

jade acorn Oct 18, 2021, 8:58 PM

#

i just want to know how to do it in python

#

this is from some python code i wrote, as u can see its almost the same but i just cant figure out the p value of the F

silver summit Oct 18, 2021, 8:59 PM

#

check scipy docs and sklearn docs a bit more, I'm certain it's there, my daughter just woke up from her second nap... gotta go get her, will be on much later today if you still need help

desert oar Oct 18, 2021, 9:16 PM

#

@jade acorn it's better to ask your entire question, don't wait for someone to interview you in order to figure out what you want

#

you know this by now, i think

#

and to answer the question, you would do something like this to get an object representing the F(2, 247) distribution:

import scipy.stats as sstats

dist = sstats.f(2, 247)

#

which you can then use any of the methods of rv_continuous as defined in https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.html

#

typically for a hypothesis test you want the upper-tail probability, 1 - cdf

#

scipy calls it the "survival function", sf

#

so you might write sstats.f(2, 247).sf(3.475101) to get the p-value associated with the test statistic 3.475101 (ppf is the inverse cdf, it goes from a probability back to a quantile)
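a minimal sketch of the "Prob > F" calculation with scipy, using the F value and degrees of freedom quoted in the question. the p-value is the upper-tail probability of the F(2, 247) distribution, which `rv_continuous` exposes as `sf`; `ppf` maps the other way, from a probability to a quantile, which is what you'd use for a critical value.

```python
import scipy.stats as sstats

f_stat = 3.475101          # test statistic from the question
dist = sstats.f(2, 247)    # F distribution with (2, 247) degrees of freedom

p_value = dist.sf(f_stat)            # upper-tail probability, P(F > f_stat)
p_value_alt = 1 - dist.cdf(f_stat)   # same quantity via the cdf
critical = dist.ppf(0.95)            # ppf goes the other way: probability -> quantile
```

`p_value` should come out close to the 0.0325 that Stata reports.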

wide meadow Oct 18, 2021, 9:27 PM

#

Before feeding data to an algorithm, is it necessary to transform the features to normal distribution?

royal crest Oct 18, 2021, 9:36 PM

#

There’s also pingouin which is also quite comprehensive

#

!pypi pingouin

arctic wedgeBOT Oct 18, 2021, 9:36 PM

#

pingouin v0.4.0

Pingouin: statistical package for Python

modest timber Oct 18, 2021, 9:54 PM

#

Hi, I have a question about inputs in LSTM. What should the input shape be if I have 20 data points as inputs in model.fit?

#

Could it be input_shape(20), and later model.predict(array of 20)? ( sorry for my english)

desert oar Oct 18, 2021, 9:55 PM

#

wide meadow Before feeding data to an algorithm, is it necessary to transform the features t...

no, it isn't

desert oar Oct 18, 2021, 9:56 PM

#

royal crest There’s also pingouin which is also quite comprehensive

this looks useful, i haven't tried it yet. nice collection of things you'd normally want to use R for

desert bear Oct 18, 2021, 9:57 PM

#

Good evening, I built an algorithm that compares two methods of finding local minimum of given functions.
Gradient Descent and Newton's method.
I run the algorithms with the same parameters and the following results were achieved.
In the same number of iterations Gradient descent completed it faster (~4s) and got closer to the local minimum.
Newton's method computed in ~7.25s and got further from the optimum.

I thought that newton's method would achieve better results in the same number of iterations. I mean, it is more computationally expensive, but still....
Does anyone have any thought on that?

#

I suspect that it depends on the step_size parameter. Both runs used the same value for it, but comparing the two methods with an equal step size does not seem right to me. How can I explain that to my teacher?

desert oar Oct 18, 2021, 10:09 PM

#

@desert bear i think your intuition is correct, but can you show the code? i never thought of the newton-raphson method as having a configurable step size

#

is it not x1 = x0 - f'(x0) / f''(x0)?

desert bear Oct 18, 2021, 10:11 PM

#

desert oar <@!396745638605488150> i think your intuition is correct, but can you show the co...

Basically, I'm using algorithms given by my teacher

#

This Beta parameter is my step_size

desert oar Oct 18, 2021, 10:12 PM

#

normally i would set B_t to 1

#

that's the "theoretical" version

#

i think the point of using the hessian is that you can take fewer and bigger steps

desert bear Oct 18, 2021, 10:13 PM

#

Okay, I set that in one of my tests, and It found local minimum in one iteration 😮

desert oar Oct 18, 2021, 10:13 PM

#

yep! it is actually the "optimal" step size for a quadratic function
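a toy sketch of why step size 1 behaves so differently for the two methods, assuming a made-up 1-D quadratic (the teacher's functions and code aren't shown in the chat). `beta` plays the role of the step_size parameter, and the helper names are hypothetical.

```python
def newton_step(x, grad, hess, beta=1.0):
    # damped Newton update: x - beta * f'(x) / f''(x)
    return x - beta * grad(x) / hess(x)

def gradient_step(x, grad, beta):
    # plain gradient descent update: x - beta * f'(x)
    return x - beta * grad(x)

# hypothetical 1-D quadratic f(x) = 3 * (x - 2)**2 + 1, minimum at x = 2
grad = lambda x: 6.0 * (x - 2.0)
hess = lambda x: 6.0

x0 = 10.0
x_newton = newton_step(x0, grad, hess, beta=1.0)   # lands on the minimum in one step

x_gd = x0
for _ in range(5):
    x_gd = gradient_step(x_gd, grad, beta=1.0)     # same beta overshoots and diverges
```

dividing by the second derivative rescales the step to the curvature, so the "same" beta means something different in each method; that is one way to argue that comparing them at equal step size is not apples to apples.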

#

https://stats.stackexchange.com/a/395670/36229

Cross Validated

Do there exist adaptive step size methods for Newton-Raphson optimi...

Stochastic/Mini-batch gradient descent, caused by interest in deep learning, has made lots of advances in adaptive step sizes. For example, Adam, Nadam, Adamax, ..., are all improvements to the sta...

desert bear Oct 18, 2021, 10:14 PM

#

Okay, let me read about it, thanks

desert oar Oct 18, 2021, 10:14 PM

#

https://sites.stat.washington.edu/adobra/classes/536/Files/week1/newtonfull.pdf see also an example of its use for fitting maximum likelihood

modest timber Oct 18, 2021, 10:20 PM

#

how about my question? 😄

royal crest Oct 18, 2021, 10:36 PM

#

desert oar this looks useful, i haven't tried it yet. nice collection of things you'd norma...

i think it was the developers' core motive! they essentially went "R is great for all these, so why not have something equivalent in Python?"

#

so they built it upon pandas iirc, so it's cut down a lot of time for me doing stats

desert oar Oct 18, 2021, 10:37 PM

#

yep that appears to be an explicit goal, scipy stats + pandas + a lot of r-like convenience functions

royal crest Oct 18, 2021, 10:38 PM

#

validated against R equivalents too so that's one for reliability

robust jungle Oct 18, 2021, 10:38 PM

#

Anyone know how to get the output node names from an xception model?

green phoenix Oct 18, 2021, 10:40 PM

#

im trying to figure out how to make data pipelines but every time I do this I get the error that "income" is not in the dataset, can someone tell me why?

desert oar Oct 18, 2021, 10:41 PM

#

@modest timber i'm not sure if i understand. can you be more specific? input_size is the number of features at each time step, not the length of the input sequence

#

https://stackoverflow.com/a/45023288/2954547
https://pytorch.org/docs/stable/generated/torch.nn.LSTM.html

Stack Overflow

Understanding a simple LSTM pytorch

import torch,ipdb
import torch.autograd as autograd
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable

rnn = nn.LSTM(input_size=10,

desert oar Oct 18, 2021, 10:41 PM

#

green phoenix im trying to figure out how to make datapipe lines but everytime I do this I get...

!paste this is too small to read, can you re-post as code, either using our paste site or a code block?

arctic wedgeBOT Oct 18, 2021, 10:41 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

desert oar Oct 18, 2021, 10:41 PM

#

include the error output as well

green phoenix Oct 18, 2021, 10:42 PM

#

!paste

#

how do i make a code block

royal crest Oct 18, 2021, 10:42 PM

#

!code

arctic wedgeBOT Oct 18, 2021, 10:42 PM

#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

desert oar Oct 18, 2021, 10:42 PM

#

☝️ read the box

#

same with the box under my post, !paste just generates the box with instructions

green phoenix Oct 18, 2021, 10:45 PM

#


url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"

c_names = ["age", "workclass", "fnlwgt", "education", "education-num", "maritalstatus", "occupation", "relationship", "race", 
          "sex", "capital-gain", "capital-loss", "hours-per-week", "nativecountry", "income"]

df = pd.read_csv(url, names=c_names)

column_trans = make_column_transformer((
    OneHotEncoder(sparse=False), ["workclass", "occupation", "nativecountry"]),
    (LabelEncoder(), ["income"]),
    (OrdinalEncoder(categories=[' Preschool',' 1st-4th',' 5th-6th',' 7th-8th',' 9th',' 10th',' 11th',' 12th',' HS-grad', 
     ' Prof-school', ' Assoc-acdm', ' Assoc-voc', ' Some-college',' Bachelors',' Masters', ' Doctorate']),
     ["education"]), remainder="passthrough")

X = df.drop(["maritalstatus", "relationship", "race", "sex", "income"], axis="columns")
y = df.income

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

svmm = svm.SVC()

pipe = make_pipeline(column_trans, svmm)

scores = cross_val_score(pipe, X, y, scoring="accuracy",cv=5)

#

https://paste.pythondiscord.com/raw/xisehogule

#

idk if thats the right way

desert bear Oct 18, 2021, 10:47 PM

#

desert oar yep! it is actually the "optimal" step size for a quadratic function

Thanks a lot, you are extremely knowledgeable. These links are very useful. I decided to test both methods on different function (Rosenbrock function). Firstly I was nailed down, because Newton's method was jumping significantly, but then I found that it is correct behaviour (https://www.numerical-tours.com/matlab/optim_2_newton/).

One thing is not clear for me. How did the teacher want me to compare the results of both algorithms for the same parameters.
Newton's gives best result for step_size=1, but when gradient descent is fed with this parameter it produces points of coordinates' values 1e+20, basically it makes too big steps. It seems incomparable.

desert oar Oct 18, 2021, 10:49 PM

#

Maybe that's part of the exercise?

#

See how instructive it was to try the different sizes?

#

I bet if you tried gradient descent with the same step size, it would go all over the place

desert bear Oct 18, 2021, 10:57 PM

#

desert oar Maybe that's part of the exercise?

Yea, maybe, I will sum up all these observations in my report. The teacher probably won't be happy, since he seems like a "do simple - simple means less reading for me" type. Thanks a lot for making me understand it more

modest timber Oct 18, 2021, 10:59 PM

#

@desert oar i come here because I am kinda confused - I use list with 20 inputs signals - input_shape=(20,1)

#

but i got error

#

ValueError: Input 0 of layer sequential is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 1)

#

predicted_stock_price = model.predict(X_predict[0:20])

desert oar Oct 18, 2021, 11:02 PM

#

can you show the code for the model? what is the shape of X_predict?

modest timber Oct 18, 2021, 11:04 PM

#

i could send it to you in a file, because it's somewhat complex

#

with data i operate on

#

ok?

distant trout Oct 18, 2021, 11:12 PM

#

Hi, could anyone help me with gradient descent for a summation function like in the png? I only see explanations for the x^2 function but I cannot find any information on how to deal with this one. Any ideas?

modest timber Oct 18, 2021, 11:15 PM

#

 for i in range(0,10):
        a= X_predict[i:20+i]
        print(a.shape)
        predicted_stock_price = model.predict(a)

#

shape = 20,1

#

model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1],1)))

#

shape = 20,1

#

 ValueError: Input 0 of layer sequential is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: (None, 1)

tender hearth Oct 19, 2021, 12:02 AM

#

grave frost wav2vec2 if you don't wanna mess around with SOTA stuff. otherwise, waveNet is p...

I'm thinking of splitting the waveform into 800ms windows, running a dilated causal convolution on each independently, and then averaging the outputs

#

What do you think?

desert oar Oct 19, 2021, 12:03 AM

#

modest timber model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1],1)...

it is expecting a 3-dimensional input, but you gave it 2

#

that is, each time step should be [x1, ..., x20]

#

https://stats.stackexchange.com/a/277169/36229 @modest timber

Cross Validated

Understanding input_shape parameter in LSTM with Keras

I'm trying to use the example described in the Keras documentation named "Stacked LSTM for sequence classification" (see code below) and can't figure out the input_shape parameter in the context of...
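a numpy-only sketch of the shape fix, assuming `X_predict` is a 2-D array with one feature per time step (a made-up array stands in for the real data). a Keras LSTM built with `input_shape=(20, 1)` expects batches shaped `(samples, timesteps, features)`, so a single window of shape `(20, 1)` needs a leading samples axis before it goes to `model.predict`.

```python
import numpy as np

# hypothetical stand-in for X_predict: 30 time steps of a single feature, shape (30, 1)
X_predict = np.arange(30, dtype=float).reshape(30, 1)

window = X_predict[0:20]             # shape (20, 1): (timesteps, features), only 2 dimensions
batch = window[np.newaxis, ...]      # shape (1, 20, 1): (samples, timesteps, features)

# several sliding windows stacked into one batch, shape (10, 20, 1)
windows = np.stack([X_predict[i:i + 20] for i in range(10)])
```

with a trained model, `model.predict(batch)` or `model.predict(windows)` would then receive the 3 dimensions the error message asks for.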

modest timber Oct 19, 2021, 12:23 AM

#

desert oar that is, _each time step_ should be `[x1, ..., x20]`

I dont understand, you show me list with 1-dimension, but i need 3,

#

Maybe i need to add simply the 1 value, and should have shape 20,1,1

granite flame Oct 19, 2021, 12:29 AM

#

hi, can sequential API handle inputs of nonlinear relationship in my case i have input variables as flow rate and temperature?

onyx drum Oct 19, 2021, 1:25 AM

#

How do I write a huge data list (say with 10 million data points) into a .txt file? When I do it for a few hundred points, it's fine, but for millions, it stores it as a "1.2 3.4 1.6 ... ... 1.4 1.2] in the shortened form.

I tried writing it element by element, but the for loop makes it slow. Any way to directly write the whole list?

royal crest Oct 19, 2021, 1:27 AM

#

!d numpy.savetxt

arctic wedgeBOT Oct 19, 2021, 1:27 AM

#

numpy.savetxt


numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ', encoding=None)
Save an array to a text file.

royal crest Oct 19, 2021, 1:27 AM

#

this could be useful

#

though i have not benchmarked it myself
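a minimal sketch of the `numpy.savetxt` route, assuming the data fits in memory as a 1-D array (random numbers and a temp file stand in for the real data and path). the `...` in the original output comes from converting a large array to a string, which numpy abbreviates; `savetxt` writes every element.

```python
import os
import tempfile
import numpy as np

# hypothetical data; the real list would have millions of points
data = np.random.default_rng(0).random(100_000)

path = os.path.join(tempfile.mkdtemp(), "data.txt")
np.savetxt(path, data, fmt="%.6f")   # one value per line, no "..." truncation

loaded = np.loadtxt(path)            # read it back to check nothing was dropped
```

the `fmt` argument controls how many digits are written, so it also decides how large the file gets.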

onyx drum Oct 19, 2021, 1:35 AM

#

Aha, thanks!

quasi parcel Oct 19, 2021, 2:03 AM

#

i have created a recommendation engine can anyone go through it please

#

and let me know if its correct or not

wicked grove Oct 19, 2021, 2:18 AM

#

Hello, can i combine 2 image datasets together for multi class classification and train them with a cnn model?

plain verge Oct 19, 2021, 4:27 AM

#

desert oar i've been wanting to write a guide to this stuff for months, but every time i tr...

yeah
the thing is, as much as someone explains, for some stuff you'll understand better when u know how it works
pandas is a complex library, but I guess it's possible to figure out how it works in a high level
that way it helps me understand better
like now I know that those are series, but I don't know where that axis goes and what it does
in the docs it only says the axis of operation and nothing else
I'll figure it out by stumbling around easier than to try to find some form of article describing it

glossy moth Oct 19, 2021, 4:31 AM

#

Hi all- really dumb question:
I am using matplot and I have a 3x14 table. For column 3, my title is significantly longer than columns 1 and 2, and unfortunately if I leave it the text inside the table becomes unreadable. Changing text size simply scales the column titles too so the problem remains. If I shorten the column 3 title, the issue resolves. Is there any way to resolve this without shortening my header?

pearl beacon Oct 19, 2021, 5:52 AM

#

Hi! I'm trying to make a speech recognition script for a personal project, and I've decided on mozilla deepspeech. My problem is, I don't really understand the audio handling, and I want to remove the VAD feature from this script so I can manually control when it records:
https://github.com/mozilla/DeepSpeech-examples/tree/r0.9/mic_vad_streaming

GitHub

DeepSpeech-examples/mic_vad_streaming at r0.9 · mozilla/DeepSpeech-...

Examples of how to use or integrate DeepSpeech. Contribute to mozilla/DeepSpeech-examples development by creating an account on GitHub.

brave sparrow Oct 19, 2021, 6:03 AM

#

hi guys

#

how do i make not 1 output but two for this code

lone drum Oct 19, 2021, 6:24 AM

#

Hello I have a dataframe in which one column has 'CE' and 'PE' values in that column
I have to separate this column based on these values
For eg
'CE' values are saved in different data frame and
'PE' values saved in another data frame
Ping me when replying

#

My code

df_chunks = pd.read_csv(f'{input_path}{input_file}{extension}' , engine='python',  chunksize=500000, names=['Msgtype', 'Activity Type', 'Transaction Time', 'script_name', 'expiry', 'strike_price', 'call/put', 'Exchange', 'Token', 'Buy/Sell', 'Buy Order Number', 'Sell Order Number', 'Price', 'Qty', 'price_in_rupees', 'lot'])
i=0
for chunk in df_chunks['call/put']:
    print('chunk')
    print(chunk)
    # for i in chunk['call/put']:
    #     put_val = chunk.loc[chunk['call/put'] == 'PE']
    #     call_val = chunk.loc[chunk['call/put'] == 'CE']
    #     print(put_val)            
    #     print(call_val)
    #     put_val.to_csv(f'{output_path}{output_file_put}{extension}', index =False, header = None, mode = 'a')
    #     call_val.to_csv(f'{output_path}{output_file_call}{extension}', index =False, header = None, mode = 'a')
    #     break
    # break

exotic coral Oct 19, 2021, 7:45 AM

#

def data_type_format(data, indexes):
    "remove the header row and convert all the columns to type float"
    headless = data[1:, :]
    # sinker = headless[:, indexes]
    # floater = np.delete(headless, indexes, 1)
    # sunk = sinker.astype('<U30')
    # floated = floater.astype(float)
    indices = np.arange(9)
    mask = np.delete(indices, indexes, 0)
    mask_list = list(mask)
    a =  headless[:, indexes].astype('<U30')
    b = headless[:, mask_list].astype(float)
    headless.astype(object)
    # answer = np.concatenate((sunk, floated), axis=1)
    # np.sort(answer)
    # for index in data:
    #     answer.append(tuple(index))
    return headless

#

I'm not sure how to get two different dtypes in the same array

hard pelican Oct 19, 2021, 9:01 AM

#

Hey,
In pandas, I have time, value and mark_upcoming_change columns, I want to calculate the amount of time a column was on a specific value, as seen here

#

right column is the one I want to calculate

obsidian bramble Oct 19, 2021, 9:44 AM

#

heya\

#

i wanted to integrate an alarm clock system in a virtual assistant

#

how do i do

#

?

serene scaffold Oct 19, 2021, 11:47 AM

#

obsidian bramble i wanted to integrate alarm cllock system in an virtual assistant

this isn't really a data science question

obsidian bramble Oct 19, 2021, 11:59 AM

#

serene scaffold this isn't really a data science question

then tell me where i should ask, i'm a beginner

#

idk

#

wht is the difference between a and data sceince

lone drum Oct 19, 2021, 12:05 PM

#

Hey stelercus
When I write in CSV file using pandas some of columns are not completely filled

#

the highlighted part is getting empty

#

Can u please help me to understand this?

#

Why I am getting this way

#

In my original data file i have complete data

serene scaffold Oct 19, 2021, 12:17 PM

#

@lone drum I would need to see the original CSVs (no screenshots) and the code you used to create this table (no screenshots). Please ping me if you provide that.

odd meteor Oct 19, 2021, 12:32 PM

#

lone drum Can u please help me to understand this?

Send a screenshot of how it turned out on Pandas. So people can easily understand the error message or what went wrong

serene scaffold Oct 19, 2021, 12:35 PM

#

odd meteor Send a screenshot of how it turned out on Pandas. So people can easily understan...

Please don't ask for screenshots of pandas stuff as it is prohibitively difficult for people to replicate data in screenshots.

desert oar Oct 19, 2021, 12:36 PM

#

plain verge yeah the thing as much as someone explains, for some stuff you'll understand bet...

the "axis of operation" is a numpy concept. you can think of it as "the axis that is consumed" when performing an operation.

this is easy to see with an "aggregation" operation like DataFrame.sum:

>>> from pandas import DataFrame
>>> df = DataFrame({'a': [1,2,3], 'b': [4,5,6]})
>>> df.sum(axis=0)  # axis='index'
a     6
b    15
dtype: int64

this means "apply the .sum operation by iterating over the 0th axis (the index)".

DataFrame.apply is a bit trickier:

>>> df = DataFrame({'a': [1,2,3], 'b': [4,5,6]})
>>> add_one = lambda y: y + 1
>>> df.apply(add_one, axis=0)  # axis='index'
   a  b
0  2  5
1  3  6
2  4  7

the add_one function is applied to each column, thereby "consuming" the entire index for each column
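a short sketch contrasting the two axes on the same made-up frame, for `sum` and for `pd.concat`. in both cases axis 0 is the index (rows) and axis 1 is the columns.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

col_sums = df.sum(axis=0)   # consumes the index: one result per column
row_sums = df.sum(axis=1)   # consumes the columns: one result per row

stacked = pd.concat([df, df], axis=0)   # more rows, same columns
side = pd.concat([df, df], axis=1)      # same rows, more columns
```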

lone drum Oct 19, 2021, 12:46 PM

#

serene scaffold <@!680099760836968475> I would need to see the original CSVs (no screenshots) an...

How I can show u original CSV?
It is 32 gb in size

royal crest Oct 19, 2021, 12:46 PM

#

just grab the first 10 rows?

plain verge Oct 19, 2021, 12:49 PM

#

desert oar the "axis of operation" is a numpy concept. you can think of it as the "the axis...

yeah actually I found it in the docs, it said concat over axis 0 will concat indexes but axis 1 will concat columns
I guess that also applies to any other function
thanks for the info

serene scaffold Oct 19, 2021, 12:51 PM

#

lone drum How I can show u original CSV? It is 32 gb in size

pandas might be the wrong tool if you're dealing with that much data.
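
if staying with pandas anyway, one possible workaround (a sketch, not a recommendation for this particular dataset) is to read the file in chunks and aggregate as you go. the in-memory big_csv and its columns are invented for the example:

```python
import io
import pandas as pd

# small stand-in for a file that would not fit in memory
big_csv = io.StringIO(
    "id,value\n" + "\n".join(f"{i},{i % 7}" for i in range(10_000))
)

total = 0
row_count = 0
# chunksize makes read_csv return an iterator of DataFrames (1000 rows each here)
for chunk in pd.read_csv(big_csv, chunksize=1000):
    total += int(chunk["value"].sum())
    row_count += len(chunk)

print(row_count, total)
```

only one chunk is held in memory at a time, so this works for running totals and filters but not for operations that need every row at once.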

arctic wedgeBOT Oct 19, 2021, 1:00 PM

#

Hey @lone drum!

It looks like you tried to attach file type(s) that we do not allow (.xlsx). We currently allow the following file types: .gif, .jpg, .jpeg, .mov, .mp4, .mpg, .png, .mp3, .wav, .ogg, .webm, .webp, .flac, .m4a.

Feel free to ask in #community-meta if you think this is a mistake.

lone drum Oct 19, 2021, 1:00 PM

#

serene scaffold pandas might be the wrong tool if you're dealing with that much data.

How I can share you sample data for same

worthy crystal Oct 19, 2021, 1:01 PM

#

hello I have a question if my model is overfitting or not

#

I am doing CAE

#

does this seem like its overfitting? I see gap its "big" in a way but it is only 0.0020 difference between them

#

should I support that is overfitting or not?

#

axis X is epochs

desert oar Oct 19, 2021, 1:03 PM

#

lone drum How I can share you sample data for same

!paste use this site 👇

arctic wedgeBOT Oct 19, 2021, 1:03 PM

#

Pasting large amounts of code

If your code is too long to fit in a codeblock in discord, you can paste your code here:
https://paste.pythondiscord.com/

After pasting your code, save it by clicking the floppy disk icon in the top right, or by typing ctrl + S. After doing that, the URL should change. Copy the URL and post it here so others can see it.

plain verge Oct 19, 2021, 1:07 PM

#

hi everyone

#

I have some weird data that loads very inefficiently when I import it into pandas using json_normalize
simple format of the json is

[
 {
  "id": 12342,
  "type": "node",
  "tags": [ "amenity":"table", "size":2 ]
 }
]

id and type are always there
however tags can be empty, have any number of k/v pairs, and there are a total of 110,000 possible keys (e.g. amenity)

#

How can I properly import this to a pandas DataFrame with efficient access to tags?

desert oar Oct 19, 2021, 1:29 PM

#

plain verge I have a weird data that when I import to pandas using json_normalize loads very...

[ "amenity":"table", "size":2 ] isn't valid syntax in either python or json. did you mean { "amenity":"table", "size":2 }?
i would load it like this to start:

data = [
 {
  "id": 12342,
  "type": "node",
  "tags": { "amenity":"table", "size":2 }
 },
 {
  "id": 93823,
  "type": "node",
  "tags": {}
 }
]

df = pd.DataFrame(data)

      id  type                             tags
0  12342  node  {'amenity': 'table', 'size': 2}
1  93823  node                               {}

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   id      2 non-null      int64
 1   type    2 non-null      object
 2   tags    2 non-null      object
dtypes: int64(1), object(2)
memory usage: 176.0+ bytes

plain verge Oct 19, 2021, 1:30 PM

#

desert oar `[ "amenity":"table", "size":2 ]` isn't valid syntax in either python or json. d...

yeah that was a typo

#

hmm

#

but then tags are dictionary

#

oh wait

desert oar Oct 19, 2021, 1:30 PM

#

right, efficient depends heavily on what you're trying to do

plain verge Oct 19, 2021, 1:30 PM

#

so basically it will be a dataframe inside another dataframe right?

desert oar Oct 19, 2021, 1:31 PM

#

no, it's literally a dict in each element of the tags column

plain verge Oct 19, 2021, 1:31 PM

#

what if I convert that dict to a dataframe first?

desert oar Oct 19, 2021, 1:31 PM

#

you can, but why would you?

#

that'd be way more confusing imo

plain verge Oct 19, 2021, 1:31 PM

#

maybe since dataframe is faster than dict?

desert oar Oct 19, 2021, 1:31 PM

#

faster for what?

plain verge Oct 19, 2021, 1:32 PM

#

I need to run a lot of search stuff on the tags part
so I may as well need to use some dataframe features other than performance

quasi parcel Oct 19, 2021, 1:33 PM

#

correct me if i am wrong, can't we use the flatten_json method here @desert oar

plain verge Oct 19, 2021, 1:33 PM

#

how can I do that fast?
like I don't wanna run a for loop doing df["tags"] = pd.DataFrame(js[i]["tags"])

desert oar Oct 19, 2021, 1:34 PM

#

quasi parcel correct me if i am wrong cant we use flatten_json methon here <@!389497659087650...

there are a lot of ways to manipulate data like this! but without some specific use case, it's all guesswork

desert oar Oct 19, 2021, 1:34 PM

#

plain verge I need to run a lot of search stuff on the tags part so I may as well need to us...

it sounds like you want to "explode" these into key-value pairs or something?

plain verge Oct 19, 2021, 1:35 PM

#

desert oar it sounds like you want to "explode" these into key-value pairs or something?

I basically wanna store them as dataframe instead of dict
how can I make it do it without running a for loop
is there like anything builtin?

desert oar Oct 19, 2021, 1:35 PM

#

and i am asking you how the dataframe needs to look

#

give an example

plain verge Oct 19, 2021, 1:35 PM

#

ok

serene scaffold Oct 19, 2021, 1:36 PM

#

print(df.head().to_csv())

#

@lone drum ^

plain verge Oct 19, 2021, 1:37 PM

#

the dataframe would be just the normal thing
like we have this parent dataframe called pf
and pf["tags"] is another dataframe with simple format like Columns: key and value
for example for my example there will be 2 rows:

key       value
amenity   table
size      2

#

I want something like this

silver summit Oct 19, 2021, 1:38 PM

#

anyone know how to explode a bytearray? I have a bytearray in a single row of a pyspark dataframe and I need to get each value of the bytearray into a new row, the error I'm getting is

AnalysisException: cannot resolve 'explode(content)' due to data type mismatch: input to function explode should be array or map type, not binary;
'Project [explode(content#131) AS List()]
+- Relation [path#128,modificationTime#129,length#130L,content#131] binaryFile

#

command I'm using is just df.select(F.explode('content'))

quasi parcel Oct 19, 2021, 1:40 PM

#

plain verge the dataframe would be just the normal thing like we have this parent dataframe ...

can you give a complete dataframe example please

plain verge Oct 19, 2021, 1:40 PM

#

quasi parcel can you give a complete dataframe example please

ok

desert oar Oct 19, 2021, 1:41 PM

#

id  type  key      value
 1  node  amenity  table
 1  node  size     2
 2  node  size     large

like this?

plain verge Oct 19, 2021, 1:42 PM

#

this will work, but isn't it bad to have a row duplicated several times?

#

I was thinking of this format

#

   id   type         tags
0  1234 node         <pd.DataFrame object>
1  8897 way          <pd.DataFrame object>

where that object has this format

   key     value
0  amenity table
1  size    2

desert oar Oct 19, 2021, 1:43 PM

#

it's not really bad, don't over-optimize. if the id is the index, it won't be duplicated. you can keep the "metadata" separate from the "tags" if you want

id  key      value
 1  amenity  table
 1  size     2
 2  size     large

id  type
 1  node
 2  node

plain verge Oct 19, 2021, 1:44 PM

#

plain verge ``` id type tags 0 1234 node <pd.DataFrame object> 1 8897...

what about this?

desert oar Oct 19, 2021, 1:45 PM

#

you can do that, but i don't recommend it

plain verge Oct 19, 2021, 1:46 PM

#

desert oar it's not really bad, don't over-optimize. if the `id` is the index, it won't be ...

I guess I should try these
the thing is I have over 100 thousand of rows, which I want to run some analysis on, fast, without using that much memory
so I am trying to optimize it as much as possible
also I guess I should finish my pandas tutorial vid before continuing on

plain verge Oct 19, 2021, 1:46 PM

#

desert oar you can do that, but i don't recommend it

what's the reason? just curious

desert oar Oct 19, 2021, 1:46 PM

#

a dataframe inside each element is not really better than a dict in each element, in that pandas has to loop slowly over the series of dataframes

#

don't fall into the trap of "pandas fast, more pandas more fast"

plain verge Oct 19, 2021, 1:47 PM

#

hmmm I see

#

I almost fell for that

desert oar Oct 19, 2021, 1:47 PM

#

dict lookups will be faster than dataframe lookups in most cases anyway

#

> so I am trying to optimize it as much as possible
have you heard the quote, "premature optimization is the root of all evil"?

#

that said, exploding this to a key-value format like i described would ~~probably~~ possibly make it easier to work with

#

it really depends on what kinds of operations you're trying to do

plain verge Oct 19, 2021, 1:48 PM

#

desert oar > so I am trying to optimize it as much as possible have you heard the quote, "p...

yeah, I am done with the code, which has no numpy/pandas in it
decided to learn pandas to optimize it

plain verge Oct 19, 2021, 1:49 PM

#

desert oar that said, exploding this to a key-value format like i described would ~~probabl...

I also saw this a lot on google
I'll try it

desert oar Oct 19, 2021, 1:49 PM

#

plain verge yeah, I am done with the code, which has no numpy/pandas in it decided to learn ...

it might be instructive to post the non-pandas version

#

!paste

silver summit Oct 19, 2021, 1:50 PM

#

I can use conv to convert the bytearray to a value, but need to map the convert to each element then explode somehow

plain verge Oct 19, 2021, 1:50 PM

#

desert oar it might be instructive to post the non-pandas version

I have it on github
but it's not as simple
I'm also trying to implement a data engine, which happened to be super slow by my own
I used that in most of my code
others are just for loops basically

#

so

#

I can tell you more details

#

I wanna be able to:

#

get all rows that have a specific key <and any other>

#

get all rows with a specific k/v pair

#

get all rows that have at least one key

#

search by regular expression on key names and get all that match

#

nothing else

desert oar Oct 19, 2021, 1:51 PM

#

silver summit I can use conv to convert the bytearray to a value, but need to map the convert ...

my strategy for this stuff is: write a naive udf to do the conversion, then figure out a way to use spark-isms later. so df.withColumn('parsed_content', parse(F.col('content'))) where parse is your udf that returns a list/array, which can be exploded

plain verge Oct 19, 2021, 1:52 PM

#

desert oar my strategy for this stuff is: write a naive udf to do the conversion, then figu...

what's spark isms?

desert oar Oct 19, 2021, 1:53 PM

#

plain verge what's spark isms?

that was a response to the other user. they are asking about Apache Spark

silver summit Oct 19, 2021, 1:53 PM

#

@desert oar yeah that sounds reasonable, be back in a couple hours to review this

plain verge Oct 19, 2021, 1:53 PM

#

oh I didn't realize bruh

plain verge Oct 19, 2021, 2:01 PM

#

desert oar ``` id type key value 1 node amenity table 1 node size 2 2 n...

so how can I actually do this?
what's the function/way?

worthy crystal Oct 19, 2021, 2:07 PM

#

is it okay that the val loss is the same as the loss?

#

I know this is not overfitting

#

but it is another problem?

desert oar Oct 19, 2021, 2:10 PM

#

plain verge I wanna be able to:

!eval ```python
import pandas as pd

data = [
    { "id": 12342, "type": "node", "tags": { "amenity":"table", "size":2 } },
    { "id": 93823, "type": "node", "tags": { "color":"blue" } },
]

id_column = 'id'
tag_column = 'tags'
meta_columns = ['type']

df = pd.DataFrame(data).set_index(id_column)

# tag_kv_pairs is a Series of tuples: (tagName, tagValue)
tag_kv_pairs = (
    df[tag_column]
    .map(lambda kv: list(kv.items()))
    .explode()
    .dropna()  # rows with empty tags explode to NaN; drop them
)
df_tags = pd.DataFrame(
    tag_kv_pairs.tolist(),
    index=tag_kv_pairs.index,
    columns=['key', 'value'],
)

df_meta = df[meta_columns].copy()
del df

print(df_meta)
print(df_tags)
```

arctic wedgeBOT Oct 19, 2021, 2:10 PM

#

@desert oar :white_check_mark: Your eval job has completed with return code 0.

001 |        type
002 | id         
003 | 12342  node
004 | 93823  node
005 |            key  value
006 | id                   
007 | 12342  amenity  table
008 | 12342     size      2
009 | 93823    color   blue
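
a rough sketch of the four lookups listed earlier (a specific key, a specific k/v pair, at least one key, regex on key names) against the exploded key/value frame. df_tags is rebuilt by hand here so the snippet runs on its own, and the names has_size, is_table, has_any and matches are made up for illustration:

```python
import pandas as pd

# same shape as the df_tags printed above, rebuilt by hand
df_tags = pd.DataFrame(
    {"key": ["amenity", "size", "color"], "value": ["table", 2, "blue"]},
    index=pd.Index([12342, 12342, 93823], name="id"),
)

# 1. ids that have a specific key (whatever other keys they also have)
has_size = df_tags.index[df_tags["key"] == "size"].unique()

# 2. ids with a specific key/value pair
pair_mask = (df_tags["key"] == "amenity") & (df_tags["value"] == "table")
is_table = df_tags.index[pair_mask].unique()

# 3. ids that have at least one key (ids with no tags never appear in df_tags)
has_any = df_tags.index.unique()

# 4. ids whose key names match a regular expression
regex_mask = df_tags["key"].str.contains(r"^(?:am|col)", regex=True)
matches = df_tags.index[regex_mask].unique()

print(list(has_size), list(is_table), list(has_any), list(matches))
```

each result is an Index of ids, so the matching metadata rows can be pulled from the separate metadata frame with .loc.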

plain verge Oct 19, 2021, 2:13 PM

#

oh so there is explode() function
thanks for your help I really appreciate it

worthy crystal Oct 19, 2021, 2:14 PM

#

worthy crystal is it okay that the val loss and the same with the loss ?

sorry again, I wanted to add: is this a scenario of perfect fitting?

quasi parcel Oct 19, 2021, 2:22 PM

#

i dont know how to handle customer_id with null

#

one moment

#

let me share the data

#

https://paste.pythondiscord.com/omixaxecay.apache this the data

#

i dont know how to handle customer_id with null or 0

#

can anyone help me with that

desert oar Oct 19, 2021, 2:30 PM

#

how do you want to handle it? @quasi parcel

desert oar Oct 19, 2021, 2:31 PM

#

worthy crystal but it is another problem?

this would make me suspicious. how big are the validation and training sets? what kind of ML task is this? what is the data like? is it highly imbalanced?

#

you might want to manually input some made-up data to see what the predictions are, make sure they make sense

worthy crystal Oct 19, 2021, 2:32 PM

#

desert oar this would make me suspicious. how big are the validation and training sets? wha...

EMNIST balanced dataset

quasi parcel Oct 19, 2021, 2:32 PM

#

so i am building a recommendation engine so is it okay to create separate df to which i keep all the customer_ids of null there

#

?

worthy crystal Oct 19, 2021, 2:32 PM

#

I created noise input

desert oar Oct 19, 2021, 2:32 PM

#

worthy crystal EMNIST balanced dataset

oh, i'd be less worried then

worthy crystal Oct 19, 2021, 2:32 PM

#

and it clear it

#

desert oar Oct 19, 2021, 2:33 PM

#

i don't know what SotA numbers are, but you should always be suspicious if your DIY thing is beating or coming close to SotA

#

otherwise it's probably okay? i'm not much of an image classification expert

worthy crystal Oct 19, 2021, 2:33 PM

#

ahh okayy thank you for your help!!!!!

quasi parcel Oct 19, 2021, 2:52 PM

#

can you suggest me a way or its too much to ask @desert oar

desert oar Oct 19, 2021, 2:55 PM

#

quasi parcel so i am building a recommendation engine so is it okay to create separate df to ...

i'm not sure what you mean by that either, sorry

grave frost Oct 19, 2021, 3:03 PM

#

tender hearth I'm thinking of splitting the waveform into 800ms windows, running a dilated cau...

you'd be losing too much contextual information, and overlapped samples won't help. what's your use-case?

desert oar Oct 19, 2021, 3:04 PM

#

are you doing user-user or user-item collaborative filtering? you might need to have 2 separate recommendation models: one that uses customer ids, and one that doesnt. then you ensemble them together when customer id is present, and you use the non-id model otherwise. i've done things like that before (although not specifically in the case of customer ids and recommendations)
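
a toy sketch of that fallback idea, under heavy assumptions: id_model and no_id_model are hypothetical stand-ins for two trained recommenders, the scores are invented, and the 50/50 average is just one possible way to ensemble them.

```python
from typing import Dict, List, Optional

def id_model(customer_id: int) -> Dict[str, float]:
    """Hypothetical personalised model: item -> score."""
    history = {7: {"tea": 0.9, "mug": 0.4}}
    return history.get(customer_id, {})

def no_id_model(basket: List[str]) -> Dict[str, float]:
    """Hypothetical model that never looks at the customer (e.g. popularity)."""
    return {"mug": 0.8, "kettle": 0.6}

def recommend(customer_id: Optional[int], basket: List[str], top_n: int = 2) -> List[str]:
    scores = dict(no_id_model(basket))
    if customer_id:  # None and 0 are both treated as "unknown customer"
        personal = id_model(customer_id)
        for item in set(scores) | set(personal):
            # simple ensemble: average the two models' scores
            scores[item] = 0.5 * scores.get(item, 0.0) + 0.5 * personal.get(item, 0.0)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend(7, []))     # both models contribute
print(recommend(None, []))  # only the non-id model is used
print(recommend(0, []))     # same fallback for customer_id == 0
```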

#

@quasi parcel ☝️

quasi parcel Oct 19, 2021, 3:13 PM

#

ohh okay

#

i think i got some clarity thanks

#

if i have any doubts

#

i will ask again

robust jungle Oct 19, 2021, 3:19 PM

#

does anyone know how I can get output node names from a keras model?

delicate tree Oct 19, 2021, 3:21 PM

#

there is the mysql.connector thing for ur responses in python to mysql is there something like that for csv files

sacred narwhal Oct 19, 2021, 4:38 PM

#

https://pytorch.org/tutorials/beginner/basics/data_tutorial.html

#

is this a good tutorial to learn pytorch

fluid pebble Oct 19, 2021, 4:47 PM

#

hi everyone

#

i am new to python and i have developed a program to spot differences between 2 images, but it reports too many differences even when there is no difference between the images. kindly help me with that

robust jungle Oct 19, 2021, 4:49 PM

#

sure

#

can you send

#

the code

#

the example images

#

and the output?

silver summit Oct 19, 2021, 4:49 PM

#

what's your distance metric?

fluid pebble Oct 19, 2021, 4:49 PM

#

robust jungle sure

okay i am sending give me 2 min

silver summit Oct 19, 2021, 4:50 PM

#

If you're comparing the pixel values in the image you'll have a tough time. You will need some function that says these are "close enough" to be considered the same.

fluid pebble Oct 19, 2021, 4:51 PM

#

well i think the program is comparing pixel wise because the images have to be the same size

#

is it true??

silver summit Oct 19, 2021, 4:52 PM

#

you are talking about 2 different things, size != pixel value

#

sure, you should probably make sure the dimensions line up, but the real challenge is how you compare the images
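
a minimal sketch of a "close enough" comparison, assuming both images are already loaded as same-sized uint8 numpy arrays. fraction_changed and the tolerance of 10 grey levels are illustrative choices, not a standard:

```python
import numpy as np

def fraction_changed(img_a: np.ndarray, img_b: np.ndarray, tol: int = 10) -> float:
    """Fraction of pixels whose value differs by more than tol in any channel."""
    if img_a.shape != img_b.shape:
        raise ValueError("images must have the same dimensions")
    # cast up from uint8 so the subtraction cannot wrap around
    diff = np.abs(img_a.astype(np.int16) - img_b.astype(np.int16))
    changed = (diff > tol).any(axis=-1) if diff.ndim == 3 else diff > tol
    return float(changed.mean())

rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# scanner-style noise: every channel nudged by at most 3 levels
noise = rng.integers(-3, 4, size=original.shape)
noisy = np.clip(original.astype(np.int16) + noise, 0, 255).astype(np.uint8)

# a real defect: an 8x8 white spot
spotted = noisy.copy()
spotted[10:18, 10:18] = 255

print(np.array_equal(original, noisy))      # exact comparison says "different"
print(fraction_changed(original, noisy))    # tolerant comparison says nothing changed
print(fraction_changed(original, spotted))  # only the spot is reported
```

exact pixel equality flags the noisy copy as different, while the tolerant version only reacts to the spot.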

fluid pebble Oct 19, 2021, 4:53 PM

#

should i send you the code?

silver summit Oct 19, 2021, 4:54 PM

#

post it on github

#

link here

fluid pebble Oct 19, 2021, 4:54 PM

#

okay

silver summit Oct 19, 2021, 4:54 PM

#

I can spend like 10min looking at it now or can review it later. Lot of smart ppl here to help however.

fluid pebble Oct 19, 2021, 4:55 PM

#

okay but just one thing

stiff inlet Oct 19, 2021, 4:55 PM

#

hey guys

#

can i use some help?

heavy sail Oct 19, 2021, 4:56 PM

#

Hi, I'm trying to:

silver summit Oct 19, 2021, 4:57 PM

#

@heavy sail reduce the equation, you have x on top and bottom, same with y, what do you have?

fluid pebble Oct 19, 2021, 4:57 PM

#

silver summit I can spend like 10min looking at it now or can review it later. Lot of smart p...

the main task which i have to perform is this: "i have to compare two images and the program should be able to tell the differences between them, e.g. if there is text missing or if there is a spot in either image"

heavy sail Oct 19, 2021, 4:58 PM

#

could you help me on #help-ramen @silver summit ?

robust jungle Oct 19, 2021, 4:59 PM

#

fluid pebble the main task which i have to perform is that " i have to compare two images and...

does it have to be able to tell what about the image is different

#

or that it is different at all

silver summit Oct 19, 2021, 4:59 PM

#

@fluid pebble ok, keep in mind that the wording sounds straight forward but this can be very complicated... if you're saying you have to pick out words embedded in the image... well

#

need like ocr for that, but I'm willing to bet your task is much simpler than that

silver summit Oct 19, 2021, 5:00 PM

#

heavy sail could you help me on #help-ramen @silver summit ?

ok

heavy sail Oct 19, 2021, 5:00 PM

#

thanks

fluid pebble Oct 19, 2021, 5:01 PM

#

robust jungle does it have to be able to tell what about the image is different

the program can tell if there is any difference in one or both of the image

robust jungle Oct 19, 2021, 5:01 PM

#

take this with a grain of salt:
find synonyms for height
search for those
find a number following it
search for what unit it's using

robust jungle Oct 19, 2021, 5:02 PM

#

fluid pebble the program can tell if there is any difference in one or both of the image

do you have to modify something preexisting, or make your own?

#

or are they both options

fluid pebble Oct 19, 2021, 5:02 PM

#

robust jungle do you have to modify something preexisting, or make your own?

no no its not like that

#

just comparison of 2 images

robust jungle Oct 19, 2021, 5:03 PM

#

yes, im talking about the program to do it

fluid pebble Oct 19, 2021, 5:03 PM

#

one is orignal and the other is new one

robust jungle Oct 19, 2021, 5:03 PM

#

are you being given a program and being told to modify it in some way

#

or

#

are you being given a task as above and being told to make something to do it

fluid pebble Oct 19, 2021, 5:04 PM

#

robust jungle are you being given a program and being told to modify it in some way

well i got a program from youtube and made change according to result

robust jungle Oct 19, 2021, 5:04 PM

#

alright

fluid pebble Oct 19, 2021, 5:04 PM

#

robust jungle are you being given a task as above and being told to make something to do it

yes i have been assigned a task by my boss

#

at office

robust jungle Oct 19, 2021, 5:05 PM

#

im a newbie to this so I haven't tried many things, but I know that one way it could work is with keras

#

since I have a program that can do something similar to that in theory

#

my idea:

#

use transfer learning

fluid pebble Oct 19, 2021, 5:06 PM

#

can you share

robust jungle Oct 19, 2021, 5:06 PM

#

data augmentation if you wanted to ignore something

#

basically

#

use that image you have as a dataset to train it off of

#

input the 2nd image

#

get output

#

might work might not

#

ill test it gimme a second

fluid pebble Oct 19, 2021, 5:08 PM

#

i have started working on python only a week ago

robust jungle Oct 19, 2021, 5:08 PM

#

og

#

oh

#

brb

fluid pebble Oct 19, 2021, 5:08 PM

#

okay

silver summit Oct 19, 2021, 5:09 PM

#

this isn't a training and testing problem

#

it's just a math problem

#

if you need to pick out parts of the image, this is called semantic segmentation

#

if this is the case just use some off the shelf image models and maybe ocr to pick out text, you should not have to train anything or build any models

robust jungle Oct 19, 2021, 5:11 PM

#

neat

fluid pebble Oct 19, 2021, 5:12 PM

#

i have used ocr also but the accuracy of ocr was really bad

silver summit Oct 19, 2021, 5:12 PM

#

You need to define the problem in much more detail for us to help

fluid pebble Oct 19, 2021, 5:13 PM

#

okay i will define deeply

lapis sequoia Oct 19, 2021, 5:14 PM

#

how do i learn ai without going to youtube

silver summit Oct 19, 2021, 5:14 PM

#

for example, if you say the images need to be the same do you mean exactly? like pixel for pixel? or can one be stretched a bit, rotated or flipped and still be the same? can it have a bit of noise on it (like some small percentage of the pixels are different) and still be the same? can it have a bit of text on it but the image is still the same etc

#

also what is the context? this is for work but how will it be used? business context, timelines etc etc

desert oar Oct 19, 2021, 5:17 PM

#

"bag of words", not "word of bags" 🙂 the idea is that the order of the words in the document is ignored, so it's like you took all the words, dumped them into a bag, and shook the bag around.

as for your actual question: you might want to read this book chapter https://web.stanford.edu/~jurafsky/slp3/23.pdf from Speech and Language Processing, a currently-in-progress book (homepage is here https://web.stanford.edu/~jurafsky/slp3/)
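
a tiny illustration of the "bag" idea using only the standard library. bag_of_words is a made-up helper and whitespace splitting is a deliberately crude tokenizer:

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # lowercase and split on whitespace; word order is thrown away
    return Counter(text.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

print(a)
print(a == b)  # True: same words, same counts, different order
```

the two sentences mean different things but produce identical bags, which is exactly the information the representation discards.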

fluid pebble Oct 19, 2021, 5:17 PM

#

silver summit for example, if you say the images need to be the same do you mean exactly? like...

i am giving scans of two images: one is an 'ideal image' and the other is a 'difference image'. i need to develop a program to compare the ideal image with the difference image. the result i want is for the program to tell me about differences in text, differences in color, and whether any text is missing

fluid pebble Oct 19, 2021, 5:17 PM

#

silver summit also what is the context? this is for work but how will it be used? business con...

its for business purpose

desert oar Oct 19, 2021, 5:17 PM

#

OCR sounds like a good start for the "text" part

silver summit Oct 19, 2021, 5:18 PM

#

yup

desert oar Oct 19, 2021, 5:18 PM

#

you can probably solve that problem without heavy-duty "machine learning"

silver summit Oct 19, 2021, 5:18 PM

#

color is harder to figure out

#

you may just use a histogram of rgb values

desert oar Oct 19, 2021, 5:18 PM

#

e.g. something like levenshtein distance on the OCR'ed text (maybe something word-based and not character-based),

fluid pebble Oct 19, 2021, 5:18 PM

#

lets omit color

silver summit Oct 19, 2021, 5:19 PM

#

compare the histogram distributions, this should be pretty straight forward
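
a sketch of what comparing rgb histograms could look like, assuming the images are uint8 arrays of shape (height, width, 3). rgb_histogram, histogram_distance, the 16 bins and the L1 distance are all illustrative choices:

```python
import numpy as np

def rgb_histogram(img: np.ndarray, bins: int = 16) -> np.ndarray:
    # one normalised histogram per channel, concatenated into a single vector
    hists = []
    for channel in range(3):
        counts, _ = np.histogram(img[..., channel], bins=bins, range=(0, 256))
        hists.append(counts / counts.sum())
    return np.concatenate(hists)

def histogram_distance(a: np.ndarray, b: np.ndarray, bins: int = 16) -> float:
    # L1 distance: 0.0 for identical colour distributions, at most 6.0 for 3 channels
    return float(np.abs(rgb_histogram(a, bins) - rgb_histogram(b, bins)).sum())

red = np.zeros((32, 32, 3), dtype=np.uint8)
red[..., 0] = 200
blue = np.zeros((32, 32, 3), dtype=np.uint8)
blue[..., 2] = 200

print(histogram_distance(red, red))   # 0.0
print(histogram_distance(red, blue))  # 4.0
```

histograms ignore where the colours are, so this catches a colour shift but not a spot that has moved.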

desert oar Oct 19, 2021, 5:19 PM

#

you might need to put in some kind of adjustments to account for the scanning (?) process

fluid pebble Oct 19, 2021, 5:19 PM

#

just missing of text and if there is a spot in difference image against ideal image

desert oar Oct 19, 2021, 5:20 PM

#

i wonder how this works, it seems kind of like what you're asking for?

#

https://deepai.org/machine-learning-model/image-similarity

DeepAI

Image Similarity

Image Similarity compares two images and returns a value that tells you how visually similar they are. The lower the score, the more contextually similar the two images are with a score of '0' being identical. Sifting through datasets looking for duplicates or finding a visually similar set of images can be painful - so let computer vision d...

#

are you trying to match up receipts or something?

silver summit Oct 19, 2021, 5:20 PM

#

omg receips fucking sucks....

desert oar Oct 19, 2021, 5:20 PM

#

heh, apparently expensify just has people manually enter receipts

fluid pebble Oct 19, 2021, 5:20 PM

#

desert oar are you trying to match up receipts or something?

i am trying to match medicine packaging components

silver summit Oct 19, 2021, 5:20 PM

#

I've tried this before at work... so bad

desert oar Oct 19, 2021, 5:21 PM

#

fluid pebble i am trying to match medicine packaging components

ok, so you're dealing with mostly text on labels on bottles/boxes

fluid pebble Oct 19, 2021, 5:21 PM

#

have you ever used polyfax?

desert oar Oct 19, 2021, 5:21 PM

#

i have not, it looks like some kind of antibiotic

fluid pebble Oct 19, 2021, 5:21 PM

#

the small boxes of medication cream

silver summit Oct 19, 2021, 5:22 PM

#

yeah that sounds reasonable, ocr out the text, compare the word sets with edit distance (mentioned above)
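
a sketch of the word-based edit distance idea, in plain python. word_edit_distance is a made-up helper, and the label strings are invented examples, not real OCR output:

```python
def word_edit_distance(expected: str, actual: str) -> int:
    """Levenshtein distance counted in whole words instead of characters."""
    a, b = expected.split(), actual.split()
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(
                prev[j] + 1,         # word missing from actual
                curr[j - 1] + 1,     # extra word in actual
                prev[j - 1] + cost,  # same word, or one substituted
            ))
        prev = curr
    return prev[-1]

ideal = "apply twice daily to affected area"
print(word_edit_distance(ideal, ideal))                                 # 0
print(word_edit_distance(ideal, "apply daily to affected area"))        # 1 (missing word)
print(word_edit_distance(ideal, "aply twice daily to affected area"))   # 1 (misread word)
```

a small threshold on this number can absorb the odd OCR misread while still flagging text that is actually missing.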

fluid pebble Oct 19, 2021, 5:22 PM

#

silver summit yeah that sounds reasonable, ocr out the text, compare the word sets with edit d...

but ocr accuracy is really bad

silver summit Oct 19, 2021, 5:23 PM

#

are you sure? what have you tried?

desert oar Oct 19, 2021, 5:23 PM

#

OCR is really good nowadays, maybe the scanning process is really noisy?

silver summit Oct 19, 2021, 5:23 PM

#

I've done ocr on products on grocery store shelves... it's pretty good

fluid pebble Oct 19, 2021, 5:23 PM

#

it misses some text and it also misspells some words

#

i have used easyocr

desert oar Oct 19, 2021, 5:23 PM

#

is the text not english? maybe non-english ocr is a lot worse

#

are you doing something like detecting counterfeit products?

fluid pebble Oct 19, 2021, 5:24 PM

#

desert oar are you doing something like detecting counterfeit products?

no no the text is in english

silver summit Oct 19, 2021, 5:24 PM

#

well that's not a correct conclusion, if you tried one ocr you cannot say ocr sucks... there are many ocr models

fluid pebble Oct 19, 2021, 5:24 PM

#

desert oar are you doing something like detecting counterfeit products?

no thats too advance

#

i also tried pytesseract, the same problem exists

silver summit Oct 19, 2021, 5:26 PM

#

I gotta run, I think this problem is very doable.

fluid pebble Oct 19, 2021, 5:26 PM

#

okay

robust jungle Oct 19, 2021, 5:33 PM

#

quick question: how can I get output node names from a keras Xception model?

#

specifically I want to freeze it

#

in order to use it with opencv

lapis sequoia Oct 19, 2021, 5:47 PM

#

so i was watching a tech with tim tutorial about saving models and visualizing data and it was fine, but the code is supposed to train the model several times and use the best one

#

and what it is doing is using the last model trained

#

```python
# Import libraries
import pickle

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import style
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

style.use("ggplot")

data = pd.read_csv("student-mat.csv", sep=";")

predict = "G3"

data = data[["G1", "G2", "absences", "failures", "studytime", "G3"]]
data = shuffle(data)  # Optional - shuffle the data

x = np.array(data.drop(columns=[predict]))
y = np.array(data[predict])
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)


# TRAIN MODEL MULTIPLE TIMES FOR BEST SCORE
best = 0
for _ in range(20):
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)

    linear = linear_model.LinearRegression()

    linear.fit(x_train, y_train)
    acc = linear.score(x_test, y_test)  # R^2 on this split's test set
    print("Accuracy: " + str(acc))

    # only overwrite the pickle when this run beats the best score so far
    if acc > best:
        best = acc
        with open("studentgrades.pickle", "wb") as f:
            pickle.dump(linear, f)

# LOAD MODEL (the best-scoring one saved above)
with open("studentgrades.pickle", "rb") as pickle_in:
    linear = pickle.load(pickle_in)


print("-------------------------")
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
print("-------------------------")

# x_test / y_test are still from the last split of the loop,
# not necessarily the split the saved model was scored on
predicted = linear.predict(x_test)
for i in range(len(predicted)):
    print(predicted[i], x_test[i], y_test[i])

# Drawing and plotting model
plot = "studytime"
plt.scatter(data[plot], data["G3"])
plt.legend(loc=4)
plt.xlabel(plot)
plt.ylabel("Final Grade")
plt.show()
```

iron basalt Oct 19, 2021, 7:00 PM

#

The idea of a generative adversarial network can exist in the human brain, but the specific thing referred to as a GAN in Deep Learning can't, it's not biologically plausible. @surreal elm

surreal elm Oct 19, 2021, 7:01 PM

#

Have you seen the recent suggestion that dendrites are logic gates?

#

XOR / NOR /NAND / etc?

iron basalt Oct 19, 2021, 7:01 PM

#

That's a 1940s thing and was the first thing suggested.

#

They built several logic gates out of real neurons.

surreal elm Oct 19, 2021, 7:02 PM

#

https://www.quantamagazine.org/neural-dendrites-reveal-their-computational-power-20200114/

#

originally they assumed it was all on / off

#

this is much more complex

iron basalt Oct 19, 2021, 7:02 PM

#

Yes neurons are far more complex than anything currently used in code.

surreal elm Oct 19, 2021, 7:02 PM

#

hmm that link is not loading now

#

https://syncedreview.com/2020/01/09/brains-are-amazing-neuroscientists-discover-l2-3-human-neurons-can-compute-the-xor-operation/

Synced

‘Brains Are Amazing’ – Neuroscientists Discover L2/3 Human Neurons ...

When talk of artificial neural networks began some fifty years ago the idea was to mimic the behaviour and function of the neurons in human brains — a premise that has more or less survived to this day. But new research now suggests scientists may have severely underestimated the power and potential of our neurons.“BrainsContinue Reading

#

this apparently adds about 20x power to the estimated potential

#

and there is likely more

iron basalt Oct 19, 2021, 7:05 PM

#

What makes Deep Learning not biologically plausible is things such as backpropagation (through multiple layers) and convolutions (shared weights specifically; there is no sliding window in the human brain for obvious reasons, but a similar thing, multiple receptive fields, can do the trick).

surreal elm Oct 19, 2021, 7:05 PM

#

the neurons can share info too locally

#

the capsid viral shell thing

#

that packages packets of data

#

I have to look up its name again

#

"ARC proteins"

#

so there is a second and even third channel

#

THC is backpropagating

#

(cannabinoids)

#

so it's not easy to even imagine how data is flowing

iron basalt Oct 19, 2021, 7:09 PM

#

Real neural networks do not work well on Von Neumann machines. They require special hardware, specifically https://en.wikipedia.org/wiki/Reservoir_computing. The gains from this are not just 20x, it's much larger, but also can't really be directly compared.

torpid raptor Oct 19, 2021, 7:11 PM

#

can someone help me scrape a website

#

i've been stuck here for about 13 hours

#

please if any expert in web scraping can help

quasi parcel Oct 19, 2021, 7:13 PM

#

YES

#

where are you stuck at

#

@torpid raptor

plain verge Oct 19, 2021, 7:14 PM

#

hi everyone
why is this so hard
I have a geopandas data frame containing some polygons
some polygons are inside each other, for example there may be a big park containing a couple small playgrounds inside it
I wanna get rid of the polygons that are inside another polygon, in my geodataframe
how can I do it?

iron basalt Oct 19, 2021, 7:43 PM

#

surreal elm THC is backpropagating

When referring to backpropagation I mean like in Deep Learning, which is most definitely not biologically plausible. It's why the original learning rules for NNs did not use backpropagation either.

thorn crag Oct 19, 2021, 7:45 PM

#

Hello, I've been learning Python for a bit more than 1 year now and am currently learning JS (following a web development path which will lead to going through Django).
However, I am starting to think that I don't like web development, so I was thinking of maybe going for what Python is most popular for: data science and machine learning.
Could someone give some tips on whether it's a good idea for someone who isn't good at math to start on this long path?
I really like OOP as a concept and I am not sure if I can apply my knowledge in this new field.
Any courses for newbies?

tidal bough Oct 19, 2021, 8:10 PM

#

I don't think it's possible to not even read the other columns (with CSV files - it's possible with some formats like FWF), but it probably just discarded all other columns of each row right after reading that row.

#

oh, I see what you're asking though; that'd require some sort of stream decoding. Is that possible for ZIP?..

desert oar Oct 19, 2021, 8:41 PM

#

yeah it just discards the unused fields and i think doesn't even parse them

#

i think zlib does have some kind of streaming support
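A minimal sketch of the streaming idea, using only the standard library rather than whatever reader was used in the original question (that is an assumption on my part). `zipfile.ZipFile.open` returns a file-like object that decompresses as it is read, so the CSV can be parsed row by row without extracting the whole member first. Note that `csv` still splits every row into fields; only the wanted column is kept. The file name, column names, and helper names are made up for illustration.

```python
import csv
import io
import zipfile


def read_column_from_zipped_csv(zip_bytes, member, column):
    """Stream one column out of a CSV stored inside a ZIP archive."""
    values = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # archive.open() yields a binary stream that is decompressed on the fly.
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                # Each row is parsed in full; the other fields are simply dropped.
                values.append(row[column])
    return values


def make_example_zip():
    """Build a tiny in-memory ZIP so the sketch is self-contained."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("data.csv", "id,name,score\n1,ann,3.5\n2,bob,4.0\n")
    return buffer.getvalue()


example_scores = read_column_from_zipped_csv(make_example_zip(), "data.csv", "score")
```

Memory use here is bounded by the kept column plus one row at a time, which matches the "discard right after reading that row" behaviour described above.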

surreal elm Oct 19, 2021, 10:25 PM

#

99% py

#

the spikes are when I am generating the buildings

#

I ported like 1/2 my game to the new streaming system

#

the old file was uber bloated too,
cut out a bunch of fluff*

desert oar Oct 19, 2021, 11:25 PM

#

don't work like that. parsing csv is line-by-line

#

unless the lines themselves were really really long

nova gate Oct 19, 2021, 11:33 PM

#

Howdy - does anybody know how I can hide this error message in Jupyter Notebook for QQ Plot:

#

stable umbra Oct 20, 2021, 12:04 AM

#

I needed to convert 50000 images into numpy arrays and it took so long I had to do multiprocessing and split it up into individual files, and it still took a crazy long time. I used a csv to organize everything.

#

I'm looking into learning openpyxl and using Excel spreadsheets instead.

stable umbra Oct 20, 2021, 12:49 AM

#

convert
[convert]
VERB
cause to change in form, character, or function.

#

as in, from a .png image to an array of floats representing rgb values.

tender hearth Oct 20, 2021, 1:02 AM

#

grave frost you'd be losing too much contextual information, and overlapped samples won't he...

I'm trying to capture voice characteristics, so I don't think contextual information is all that important

#

Voice cloning is the use case

modest timber Oct 20, 2021, 1:26 AM

#

Hi, what if I want to predict the stock market from the last 20 days' close prices, should I use batch_size = 20 in the LSTM? Could anyone explain the batch size idea to me? I couldn't get it.

tender hearth Oct 20, 2021, 1:31 AM

#

modest timber Hi, what if I want to predict the stock market from the last 20 days' close prices, should ...

You could run the network on the whole dataset, and then compute the loss and adjust the network's weights

#

Or you could compute the loss and adjust the network's weights on each sample in the dataset

#

The first option is generally slow

#

The second option generally leads to unstable learning

#

A 3rd option would be to compute loss and adjust network weights in "batches" of 16, 32, 50, or however big you want your batch to be

#

Batch size is just another hyperparameter
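A rough sketch of the three options above, on a toy problem rather than an LSTM (the model `y = weight * x`, the data, and the function names are all made up for illustration). The only thing that changes between the options is how many samples are seen before each weight update.

```python
def iterate_batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def train_one_epoch(samples, weight, batch_size, learning_rate=0.01):
    """Fit y = weight * x by gradient descent; returns (weight, number_of_updates)."""
    updates = 0
    for batch in iterate_batches(samples, batch_size):
        # Mean gradient of the squared error over the samples in this batch.
        gradient = sum(2 * (weight * x - y) * x for x, y in batch) / len(batch)
        weight -= learning_rate * gradient  # one weight update per batch
        updates += 1
    return weight, updates


samples = [(float(x), 3.0 * x) for x in range(1, 9)]  # toy data, true weight is 3.0

# Option 1: whole dataset, then one update.
_, full_batch_updates = train_one_epoch(samples, 0.0, batch_size=len(samples))
# Option 2: one update per sample.
_, per_sample_updates = train_one_epoch(samples, 0.0, batch_size=1)
# Option 3: one update per mini-batch of 4.
_, mini_batch_updates = train_one_epoch(samples, 0.0, batch_size=4)
```

With 8 samples, one pass over the data gives 1, 8, and 2 weight updates respectively.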

#

If you want to make a prediction based on close prices of the last 20 days you don't need to specify any parameter to the LSTM

#

Since by design it accepts variable-length sequences

modest timber Oct 20, 2021, 1:33 AM

#

Sorry, I'm stuck on getting the proper input shape and x_test shape

#

What should the input shape look like in that case?

desert oar Oct 20, 2021, 1:34 AM

#

it's just 1 price series, like the s&p 500? or it's 20 stocks?

modest timber Oct 20, 2021, 1:34 AM

#

One price series

tender hearth Oct 20, 2021, 1:35 AM

#

If it's one price the input shape is (batch_size, 20, 1)

#

Well actually I believe it's (20, batch_size, 1) but if you set the batch_first=True keyword argument to your LSTM it would be (batch_size, 20, 1)
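To make the two layouts concrete, here is a small sketch with plain nested lists instead of tensors (an assumption made to keep it dependency-free; the `batch_first` keyword itself looks like the one on PyTorch's `nn.LSTM`, whose default layout is time-major). The helper names are made up for illustration.

```python
def to_time_major(batch_first):
    """Convert nested lists shaped (batch, seq_len, features) to (seq_len, batch, features)."""
    return [list(step) for step in zip(*batch_first)]


def shape(nested):
    """Return the dimensions of a regular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# 4 samples, each with 20 time steps of 1 feature (made-up values).
batch_first = [[[float(day + sample)] for day in range(20)] for sample in range(4)]
time_major = to_time_major(batch_first)
```

Both layouts hold the same numbers: `time_major[t][b]` is the same step as `batch_first[b][t]`, only the order of the first two axes differs.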

modest timber Oct 20, 2021, 1:37 AM

#

Ok, thank you. Let me ask you, why do we need this batch size (I think of it like blocks of data) if my network uses only 20 inputs at once?

#

I don't get it

tender hearth Oct 20, 2021, 1:38 AM

#

Well batch size is just the number of samples your network will look at before adjusting its weights

#

If you have batch_size = len(dataset), then your network will look at the entire dataset, and then adjust its weights

#

if you have batch_size = 1, then your network will look at one sample at a time, adjust its weights, and then move on to the next sample

#

Line-Plots-of-Classification-Accuracy-on-Train-and-Test-Datasets-With-Different-Batch-Sizes.png

modest timber Oct 20, 2021, 1:39 AM

#

I got it. :)

tender hearth Oct 20, 2021, 1:39 AM

#

Batch size is just another hyperparameter

#

You can see in batch=4 the loss jumps up and down

#

That's because it's adjusting its weights every 4 samples, which might be too few samples

modest timber Oct 20, 2021, 1:41 AM

#

So it tries 4 weights and chooses one or drops some

#

Or something like that

#

4 different weights

tender hearth Oct 20, 2021, 1:42 AM

#

No, it's just the number of samples your network looks at at a time

#

Let's take the stock market example

#

Say you're trying to predict the price on the 21st day given 20 days of data

#

If you have batch_size = 1, then your network will look at one sample, give a prediction, compute how wrong it was from that prediction, and then adjust its weights

#

If you have batch_size = 4, then your network will look at 4 samples, which in this case would be 4 samples of 20 days of data and 4 predictions

desert oar Oct 20, 2021, 1:44 AM

#

yeah maybe the confusion is what a "sample" is

#

a "sample" is one "window" of 20 days
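A small sketch of what "one window of 20 days" means in practice, using plain lists and made-up prices (the function name and data are illustrative assumptions). Each sample is 20 consecutive close prices, and its target is the price on the following day.

```python
def make_windows(prices, window=20):
    """Return (inputs, targets): each input is `window` consecutive prices, the target is the next one."""
    inputs, targets = [], []
    for start in range(len(prices) - window):
        # One feature per time step, hence the inner single-element lists.
        inputs.append([[price] for price in prices[start:start + window]])
        targets.append(prices[start + window])
    return inputs, targets


prices = [100.0 + day for day in range(30)]  # made-up closing prices for 30 days
inputs, targets = make_windows(prices, window=20)

# batch_size = 4 means 4 such windows are processed before a weight update,
# giving a nested shape of (4, 20, 1).
batch = inputs[:4]
batch_targets = targets[:4]
```

Thirty days of prices yield 10 overlapping samples here, so the batch size counts windows, not days.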

tender hearth Oct 20, 2021, 1:44 AM

#

https://stats.stackexchange.com/questions/153531/what-is-batch-size-in-neural-network

modest timber Oct 20, 2021, 1:44 AM

#

Ok i understand i think :)

#

So why would batch_first = True make the input shape begin with batch_size, when normally it doesn't?

#

But comes second instead

tender hearth Oct 20, 2021, 1:49 AM

#

I don't know, that's an internal design decision

modest timber Oct 20, 2021, 1:50 AM

#

Aha, ok .

shrewd pewter Oct 20, 2021, 1:56 AM

#

Function f({stats}) takes in your 3 inputs and outputs 1 or 0 depending on if you win. Find the stat point allocation that maximizes wins
How would you guys approach this problem?
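One possible approach, sketched under assumptions: with 20 points and 3 stats there are only 231 ways to split the points, so every allocation can simply be tried. The real `f` is not shown in the chat, so `hypothetical_f` below is a made-up deterministic win rule for illustration only, and allowing zero points in a stat is also an assumption.

```python
from itertools import product


def allocations(total_points=20, n_stats=3):
    """Yield every way to split total_points into n_stats non-negative integers."""
    for head in product(range(total_points + 1), repeat=n_stats - 1):
        remainder = total_points - sum(head)
        if remainder >= 0:
            yield head + (remainder,)


def hypothetical_f(strength, wisdom, agility):
    # Made-up win rule; stands in for the unknown f from the question.
    return 1 if strength >= 8 and wisdom >= 5 and agility >= 6 else 0


all_allocations = list(allocations())
winning = [stats for stats in all_allocations if hypothetical_f(*stats) == 1]
```

If the real `f` is random rather than deterministic, each allocation would need to be evaluated many times and compared by its average win rate instead.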

serene scaffold Oct 20, 2021, 2:39 AM

#

shrewd pewter ```You have 20 "stat points" There are 3 inputs, "strength", "wisdom", "agility...

This is not a data science question

wicked grove Oct 20, 2021, 3:19 AM

#

serene scaffold This is not a data science question

Hello, I have a question.
Should I convert this list to a string datatype before applying stemming and a lemmatizer, or leave the data type as is?

serene scaffold Oct 20, 2021, 3:27 AM

#

wicked grove Hello, I have a question. Should I convert this list to a string datatype before a...

it depends on what the stemmer/lemmatizer expects.

#

by creating the list of strings, I assume you're tokenizing

wicked grove Oct 20, 2021, 3:29 AM

#

serene scaffold Oct 20, 2021, 3:30 AM

#

Going forward, please always copy and paste the actual text into the chat.

wicked grove Oct 20, 2021, 3:30 AM

#

serene scaffold by creating the list of strings, I assume you're tokenizing

Yes, I used RegexpTokenizer

wicked grove Oct 20, 2021, 3:30 AM

#

serene scaffold Going forward, please always copy and paste the actual text into the chat.

Okay

#

tokenizer=RegexpTokenizer(r'\w+')
dataset['text']=dataset['text'].apply(tokenizer.tokenize)
dataset['text']=dataset['text'].astype.str()
print(dataset['text'].head())

serene scaffold Oct 20, 2021, 3:31 AM

#

wicked grove ```py tokenizer=RegexpTokenizer(r'\w+') dataset['text']=dataset['text'].apply(to...

remember to put a space on either side of binary operators.

tokenizer = RegexpTokenizer(r'\w+')
dataset['text'] = dataset['text'].apply(tokenizer.tokenize)
dataset['text'] = dataset['text'].astype(str)
print(dataset['text'].head())

However, it is unlikely that this does what you expected.
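A quick sketch of why, assuming the conversion was meant to be `.astype(str)`: turning a list of tokens into a string gives the printed form of the list, brackets and quotes included, not the original text. The example uses plain Python instead of pandas and NLTK to stay self-contained; `re.findall(r"\w+", ...)` plays roughly the role of `RegexpTokenizer(r'\w+')`, and `toy_stem` is a hypothetical stand-in for a real stemmer.

```python
import re


def tokenize(text):
    # Runs of word characters, the same pattern passed to RegexpTokenizer above.
    return re.findall(r"\w+", text)


def toy_stem(token):
    # Hypothetical stand-in for a real stemmer: strips a trailing "s".
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


tokens = tokenize("The cats chase birds")

# What a str conversion of the token list produces: the list's repr.
as_string = str(tokens)

# Stemming token by token, then joining only if a single string is needed.
stemmed = [toy_stem(token) for token in tokens]
rejoined = " ".join(stemmed)
```

So the tokens can stay as a list while stemming or lemmatizing each one, which fits the earlier point that it depends on what the stemmer or lemmatizer expects.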

#data-science-and-ml

tag_kv_pairs is a Series of tuples: (tagName, tagValue)