#data-science-and-ml | Python | Page 261

proper fable Oct 17, 2020, 8:03 AM

#

thanks in advance

grave frost Oct 17, 2020, 8:17 AM

#

@proper fable just explore other EDA notebooks on Kaggle, using the search bar in the Notebooks section

#

@lapis sequoia The math required for ML/AI depends a lot on the task you are doing: simple tasks, simple math; complex tasks, complex math. I think the basics of calculus and algebra should be enough for general machine learning, and knowledge of vectors/matrices (usually taught in CS at school) would be very helpful too.

proper fable Oct 17, 2020, 8:22 AM

#

@proper fable just explore other EDA notebooks on Kaggle, using the search bar in the Notebooks section
@grave frost Thankyouuu, that helps me a lot. I didn't know I could do such a thing before.

grave frost Oct 17, 2020, 8:47 AM

#

np

wild spoke Oct 17, 2020, 9:23 AM

#

`word_vecs = KeyedVectors.load_word2vec_format("./glove.txt")`: how do I get the "glove.txt" file, or how do I generate it?

#

I am using gensim.models
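GloVe vectors aren't generated by gensim; pretrained files (e.g. `glove.6B.zip`) are downloaded from the Stanford NLP GloVe project page. The catch is the format: a word2vec text file starts with a header line `<vocab_size> <dimensions>`, while a GloVe file starts directly with `word v1 v2 ...`. Below is a stdlib-only sketch that prepends that header, which is what gensim's `glove2word2vec` script does; the file names are placeholders, and it reads the whole file into memory, so very large files would need a streaming version.

```python
from pathlib import Path


def glove_to_word2vec(glove_path, out_path):
    """Prepend the '<vocab_size> <dim>' header that word2vec text format expects."""
    lines = Path(glove_path).read_text(encoding="utf-8").splitlines()
    lines = [line for line in lines if line.strip()]  # ignore blank lines
    dim = len(lines[0].split()) - 1  # first token is the word, the rest are the vector
    header = f"{len(lines)} {dim}"
    Path(out_path).write_text("\n".join([header, *lines]) + "\n", encoding="utf-8")
    return len(lines), dim
```

After converting, `KeyedVectors.load_word2vec_format("glove.w2v.txt", binary=False)` should load it. In gensim 4.0 and later you can skip the conversion and pass `no_header=True` to `load_word2vec_format` instead.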

lapis sequoia Oct 17, 2020, 11:47 AM

#

spaCy: Is the vocabulary just the set of words from all the documents I've analyzed, or does it go beyond that?

mild topaz Oct 17, 2020, 1:46 PM

#

```
2020-10-17 18:32:05,249 findDocumentType1 MainThread : test!
2020-10-17 18:32:11,981 findDocumentType1 Thread-19 : Exception on /findDocumentType1 [POST]
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
TypeError: post() takes 0 positional arguments but 1 was given
```
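The resource code isn't shown, but this `TypeError` is the usual symptom of a handler method defined without `self`: flask_restful looks up `post` on the resource instance and calls the bound method, so the instance itself arrives as the one positional argument. A plain-Python reproduction under that assumption (the class names and the `dispatch` helper are hypothetical):

```python
class BrokenResource:
    def post():  # missing self: calling it on an instance passes one extra argument
        return {"status": "ok"}


class FixedResource:
    def post(self):  # bound methods receive the instance as their first argument
        return {"status": "ok"}


def dispatch(resource, method="post"):
    # Roughly what flask_restful's Resource.dispatch_request does:
    # look the handler up on the instance, then call it
    meth = getattr(resource, method)
    return meth()
```

Calling `dispatch(BrokenResource())` raises the same "takes 0 positional arguments but 1 was given" error, while `dispatch(FixedResource())` returns the dict, so the fix would be `def post(self):` in the Resource subclass.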

dense knot Oct 17, 2020, 1:49 PM

#

Guys, I'm a beginner in Python. Do you have reference code for the random forest algorithm without using scikit-learn (sklearn), in a Jupyter notebook? Thank you.
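A minimal NumPy-only sketch of the idea, not an optimized or reference implementation. Assumptions: classification with Gini impurity, `sqrt(n_features)` candidate features per split, bootstrap sampling per tree, and a majority vote across trees; `n_trees` and `max_depth` defaults are arbitrary.

```python
from collections import Counter

import numpy as np


def gini(y):
    # Gini impurity: 1 - sum(p_k^2) over the class proportions p_k
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def majority(y):
    return Counter(y.tolist()).most_common(1)[0][0]


def best_split(X, y, features):
    # Try every threshold of every candidate feature; keep the lowest weighted Gini
    best_f, best_t, best_score = None, None, gini(y)
    for f in features:
        for t in np.unique(X[:, f]):
            left = X[:, f] <= t
            n_left = left.sum()
            if n_left == 0 or n_left == len(y):
                continue  # split puts everything on one side
            score = (n_left * gini(y[left]) + (len(y) - n_left) * gini(y[~left])) / len(y)
            if score < best_score:
                best_f, best_t, best_score = f, t, score
    return best_f, best_t


def build_tree(X, y, depth, max_depth, n_feats, rng):
    if depth >= max_depth or len(np.unique(y)) == 1:
        return majority(y)  # leaf: store a class label
    # The "random" part: each split only considers a random subset of features
    features = rng.choice(X.shape[1], size=n_feats, replace=False)
    f, t = best_split(X, y, features)
    if f is None:
        return majority(y)
    left = X[:, f] <= t
    return {"feature": f, "threshold": t,
            "left": build_tree(X[left], y[left], depth + 1, max_depth, n_feats, rng),
            "right": build_tree(X[~left], y[~left], depth + 1, max_depth, n_feats, rng)}


def predict_tree(node, x):
    # Internal nodes are dicts; anything else is a leaf label
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node


class RandomForest:
    def __init__(self, n_trees=25, max_depth=8, n_feats=None, seed=0):
        self.n_trees, self.max_depth, self.n_feats = n_trees, max_depth, n_feats
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        n_feats = self.n_feats or max(1, int(np.sqrt(X.shape[1])))
        self.trees = []
        for _ in range(self.n_trees):
            idx = self.rng.integers(0, len(X), size=len(X))  # bootstrap sample, with replacement
            self.trees.append(build_tree(X[idx], y[idx], 0, self.max_depth, n_feats, self.rng))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Every tree votes; the most common label wins
        return np.array([Counter(predict_tree(t, x) for t in self.trees).most_common(1)[0][0]
                         for x in X])
```

Usage follows the scikit-learn style: `RandomForest(n_trees=10).fit(X_train, y_train).predict(X_test)`. Trees are nested dicts to keep the structure easy to inspect in a notebook.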

sweet ember Oct 17, 2020, 2:41 PM

#

Hey guys, I landed a freelance data science gig. I have a dataset of user locations, activity locations, and timestamps, from which I have to find how much time each user spends at a specific location.

#

Is there a way to do it?

#

The features are id, time, user_x, user_y, act_x, act_y, and activity.

#

IDs repeat, and activity coordinates sometimes repeat too.
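One possible approach, sketched with pandas under two assumptions the data doesn't settle: a user counts as "at" an activity when within a hypothetical `radius` of its coordinates (planar x/y; latitude/longitude would need a haversine distance instead), and a user stays where they were from one ping until their next ping.

```python
import numpy as np
import pandas as pd


def time_at_activity(df, radius=50.0):
    """Total time each user spends within `radius` of each activity location.

    Assumes `time` is parseable as a timestamp and that a user stays in
    place from one ping until their next ping.
    """
    df = df.copy()
    df["time"] = pd.to_datetime(df["time"])
    df = df.sort_values(["id", "time"])
    # Euclidean distance between the user and the activity location
    dist = np.hypot(df["user_x"] - df["act_x"], df["user_y"] - df["act_y"])
    # Duration of each ping = gap until the same user's next ping (0 for the last one)
    df["dwell"] = (df.groupby("id")["time"].shift(-1) - df["time"]).fillna(pd.Timedelta(0))
    return df[dist <= radius].groupby(["id", "activity"])["dwell"].sum()
```

The result is a Series of `Timedelta`s indexed by `(id, activity)`. Because IDs and activity coordinates repeat, grouping on both keeps separate visits to the same place summed together.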

heady hatch Oct 17, 2020, 4:00 PM

#

What are the features like?

lapis sequoia Oct 17, 2020, 4:19 PM

#

@lapis sequoia The math required for ML/AI depends a lot on the task you are doing: simple tasks, simple math; complex tasks, complex math. I think the basics of calculus and algebra should be enough for general machine learning, and knowledge of vectors/matrices (usually taught in CS at school) would be very helpful too.
@grave frost thank you for your help. Wish you the best things

heady hatch Oct 17, 2020, 4:28 PM

#

Hello wonderful people,

Asking for advice here. I'm doing semantic search using RoBERTa embeddings, but the model was trained with a max length of 512 tokens.

But my texts are about double that length. Should I truncate them?

#

My end goal is to get the embeddings to compare.

#

Or should I skip the fancy approach and go with something simpler like TF-IDF because of the text length?
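A common alternative to plain truncation is to split each document into overlapping windows that fit the model, embed every window, and average the window vectors. A sketch: `encode_chunk` is a hypothetical stand-in for your tokenizer-plus-model call, the 510 assumes RoBERTa's two special tokens take up the rest of the 512, and mean-pooling is only one choice (max-pooling, or keeping only the first window, are others).

```python
import numpy as np


def chunk_tokens(tokens, max_len=510, stride=256):
    """Split a token list into overlapping windows of at most max_len tokens."""
    chunks, start = [], 0
    while True:
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
        start += stride
    return chunks


def embed_long_text(tokens, encode_chunk, max_len=510, stride=256):
    """Mean-pool the per-window embeddings into one vector for the whole text."""
    vectors = np.stack([encode_chunk(c) for c in chunk_tokens(tokens, max_len, stride)])
    return vectors.mean(axis=0)
```

With `stride` smaller than `max_len`, neighbouring windows overlap, so a sentence cut at one window's edge still appears whole in the next one.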

heady hatch Oct 17, 2020, 5:45 PM

#

I think I was able to solve the previous problem by just taking a naive approach.

Now I have a new question. Does the batch size matter when encoding the embeddings?

i.e., getting embeddings with batch size 32 vs 256. I'm using the embeddings only for comparison.

#

I'm aware batch size makes a difference when doing downstream tasks, but what about computing the embeddings themselves?

tidal bronze Oct 17, 2020, 6:54 PM

#

self.df["64gb"] = np.where("64" in self.df["title"], True, False)

returns False for every row, but I expected it to work
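Nobody picked this one up. `"64" in self.df["title"]` is evaluated once for the whole column, and membership on a pandas Series checks the index labels, not the cell values. So it is a single `False`, which `np.where` then broadcasts to every row. An element-wise substring test avoids that; the titles below are made up.

```python
import pandas as pd

df = pd.DataFrame({"title": ["iPhone 11 64GB", "iPhone 11 128GB", "Galaxy S10 64 gb"]})

# `in` looks at the index labels (0, 1, 2), not at the titles
print("64" in df["title"])  # False

# Element-wise substring test: regex=False treats "64" literally,
# na=False marks missing titles as False instead of NaN
df["64gb"] = df["title"].str.contains("64", regex=False, na=False)
print(df["64gb"].tolist())  # [True, False, True]
```

A plain substring also matches "640" or "1648"; a pattern such as `r"\b64\s*gb\b"` with `case=False` (and the default `regex=True`) is stricter.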

limpid raft Oct 17, 2020, 7:33 PM

#

I don't understand why i is a str and not an integer. Also, how could I iterate over this list?:

```
lst = [('someting1'), ('something2')]

for i in lst:
    first_lst = lst[i].split('|')
```

lapis sequoia Oct 17, 2020, 7:40 PM

#

@limpid raft i is not an integer index here; a for loop gives you each element in turn, so in this case i is ('something1') and then ('something2')

#

lst = ['someting1', 'something2']
first_lst = lst[0]

limpid raft Oct 17, 2020, 8:08 PM

#

@lapis sequoia So does it then take the type of lst? And what if lst is a list of integers and strings, what does i become in that case? And is it possible to iterate over this list without doing it manually?

lapis sequoia Oct 17, 2020, 8:09 PM

#

@limpid raft lst[0] will always be the first element; it doesn't matter whether it's an int or a str

#

to get all of them, there are two options: you can use a while loop or a for loop

#

lst = ['someting1', 'something2']
for lsts in lst:
    print(lsts)

#

lst = ['someting1', 'something2']
i = 0
while i < len(lst):
    print(lst[i])
    i += 1

limpid raft Oct 17, 2020, 8:20 PM

#

ahh, so lsts[0] would then be something1. But does the 'in' statement create the variable lsts so that it has the same type as lst?

#

From my understanding, its purpose is to check whether a value is present in a sequence (range, list, etc.). Is the 'for' loop forcing the type of lst onto lsts?
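To those last questions: in a `for` statement, `in` is part of the loop syntax, not the membership test, and it doesn't force the list's type onto the loop variable. The variable is bound to each element in turn and has that element's own type. A small demo (the list contents are made up):

```python
lst = ["something1", "something2", 3]

# `for item in lst` binds item to each element in turn; item gets that
# element's own type, not the type of the list
types = [type(item).__name__ for item in lst]
print(types)  # ['str', 'str', 'int']

# Parentheses alone don't make a tuple; the trailing comma does
print(type(("something1")).__name__, type(("something1",)).__name__)  # str tuple

# enumerate gives (index, element) pairs when the position is needed too
for i, item in enumerate(lst):
    print(i, item)

# Split every string element without indexing by hand
parts = [item.split("|") for item in lst if isinstance(item, str)]
print(parts)  # [['something1'], ['something2']]
```

That is also why `lst[i]` failed in the original loop: `i` was already the string, so using it as an index raises a `TypeError`.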

tender umbra Oct 17, 2020, 9:42 PM

#

Hi, has anyone here worked on graph neural networks?

#

I am looking for efficient implementations of SOTA methods in graph representation learning. I need to deploy a model that works on a huge number of small, relatively sparse graphs (<100k nodes). Wondering which package would be best, etc.

shell berry Oct 17, 2020, 10:18 PM

#

Can someone please help me understand the X and y inputs to scikit-learn's linear regression? I have a list of X points and a list of corresponding Y points.

austere swift Oct 17, 2020, 10:20 PM

#

X is the features, y is the labels

#

that's the most basic way of understanding it

#

or in the case of linear regression you can think of it like regressing on a graph with x and y variables

shell berry Oct 17, 2020, 10:23 PM

#

@austere swift thanks, but when I try it says the sizes of the lists are wrong even though they're both 1x5000

austere swift Oct 17, 2020, 10:23 PM

#

what's the exact error message?

shell berry Oct 17, 2020, 10:24 PM

#

📎 unknown.png

#

📎 unknown.png

#

and I have a linear relationship

austere swift Oct 17, 2020, 10:25 PM

#

so sklearn doesn't like lists that look like [a, b, c, d]; X has to be 2-D, shape (n_samples, n_features), so it wants lists like [[a], [b], [c], [d]]

shell berry Oct 17, 2020, 10:25 PM

#

Ah I see

austere swift Oct 17, 2020, 10:25 PM

#

so that's why it's asking you to do the array.reshape(-1, 1) thing

#

so you can just reshape it like that

shell berry Oct 17, 2020, 10:28 PM

#

```
x = np.reshape(mapping_x, (-1, 1))
y = np.reshape(mapping_y, (-1, 1))

reg = LinearRegression().fit(x, y)
```

#

Same error with this

austere swift Oct 17, 2020, 10:29 PM

#

try only reshaping the x variable, not y

shell berry Oct 17, 2020, 10:31 PM

#

Same thing

#

nvm, it worked. thanks!
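For reference, a sketch of the shapes involved. A "1x5000" array is read by scikit-learn as one sample with 5000 features, so `X` has to become `(5000, 1)`; `y` can stay 1-D. `mapping_x`/`mapping_y` here are synthetic stand-ins (a perfect line) for the real data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data shaped like the "1x5000" lists: 1 row, 5000 columns
mapping_x = np.arange(5000, dtype=float).reshape(1, -1)
mapping_y = 2.0 * mapping_x + 1.0

# X must be (n_samples, n_features) -> (5000, 1); y can be 1-D -> (5000,)
X = mapping_x.reshape(-1, 1)
y = mapping_y.ravel()

reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)  # close to [2.] and 1.0
```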

gray sedge Oct 18, 2020, 3:55 AM

#

is web scraping data science?

#

if web scraping isn't data science, can someone tell me where to ask a BeautifulSoup question?

regal belfry Oct 18, 2020, 5:58 AM

#

What's the best way to do a column-level compare between two dataframes in pandas?

velvet thorn Oct 18, 2020, 6:37 AM

#

What's the best way to do a column-level compare between two dataframes in pandas?
@regal belfry what do you mean by column-level compare?

tidal bronze Oct 18, 2020, 7:04 AM

#

need help over at #help-kiwi

regal belfry Oct 18, 2020, 8:35 AM

#

@regal belfry what do you mean by column-level compare?
@velvet thorn if df1.column == df2.column, then show all matching rows
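"Matching rows" can mean two different things here, so a sketch of both with toy frames (`df1`, `df2` and the column names are made up): matching at the same row position, or matching values anywhere in the other frame.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c"], "value": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["a", "x", "c"], "value": [1, 2, 9]})

# Row by row: compares the same index label in both frames
# (raises ValueError if the two indexes aren't identically labelled)
same_row = df1["key"] == df2["key"]
print(df1[same_row])  # rows 0 and 2

# Value based: rows of df1 whose key appears anywhere in df2's column
in_both = df1[df1["key"].isin(df2["key"])]
print(in_both)

# Join on the column to see the matching rows of both frames side by side
merged = df1.merge(df2, on="key", suffixes=("_df1", "_df2"))
print(merged)
```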

lapis sequoia Oct 18, 2020, 10:21 AM

#

why do I get this error?

#

UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
  warnings.warn(msg, UndefinedMetricWarning)

#

I'm trying to predict data from a CSV file

#

ping

#

r is undefined i think

#

no

#

r is not even there in my code

#

sry im newbie

#

idk
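The warning comes from the definition of R² itself, R² = 1 − SS_res / SS_tot, where SS_tot is the squared deviation of the true values from their mean. With a single sample the value equals its own mean, SS_tot is 0, and R² is undefined, so scikit-learn warns and returns NaN. In practice it usually means the array being scored has only one row, for example predicting a single row or a very small test split. A NumPy sketch of the same definition:

```python
import numpy as np


def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.size < 2:
        # A single value always equals its own mean, so SS_tot would be 0
        return float("nan")
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot
```

Checking `len(y_test)` (or the shape of whatever is passed to `score`/`r2_score`) just before scoring is a quick way to confirm this is the cause.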

sweet ember Oct 18, 2020, 12:40 PM

#

getting bar graph behind catplot

📎 Screenshot_from_2020-10-18_18-09-49.png

#

Hey guys, I'm getting a barplot behind my catplot, but I only want the catplot. How do I remove the bar graphs and the lines from the chart?

lapis sequoia Oct 18, 2020, 5:06 PM

#

if web scraping isn't data science, can someone tell me where to ask a BeautifulSoup question?
@gray sedge You can ask here too, and also try the Web Dev channel.

#

Hey guys, I'm getting a barplot behind my catplot, but I only want the catplot. How do I remove the bar graphs and the lines from the chart?
@sweet ember give the code that you are using to generate the plot.

#

Or read the documentation here:
https://seaborn.pydata.org/generated/seaborn.catplot.html

You can try different values of the `kind` parameter to fix it.

pure pond Oct 18, 2020, 5:27 PM

#

Is ROOT well known/respected/w.e in the data science community? I'm doing a physics masters using it and might be interested in going into data science after

tidal bronze Oct 18, 2020, 6:22 PM

#

hey, I've made a scraper that monitors ads posted to Craigslist in certain categories and compares them against the average price in order to identify bargains

#

are there any other rules you guys would suggest? I was thinking: if an item is 30% cheaper than the average, notify me

#

but maybe the average is not the best metric to use?
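The mean is easily dragged around by a few absurd listings (one typo'd $5000 item in a $100 category). A sketch using the median and the median absolute deviation (MAD) instead, both robust to such outliers. `comparable_prices` is a hypothetical list of recent prices in the same category, and the 30% discount and the −2 robust z-score cutoff are arbitrary starting points to tune.

```python
from statistics import median


def is_bargain(price, comparable_prices, discount=0.30, z_cutoff=-2.0):
    """Flag a listing using the median and MAD instead of the mean."""
    med = median(comparable_prices)
    # Median absolute deviation: a spread estimate a single outlier can't inflate
    mad = median(abs(p - med) for p in comparable_prices)
    cheap_enough = price <= (1 - discount) * med
    if mad == 0:
        return cheap_enough  # all comparables identical: fall back to the discount rule
    # 1.4826 scales MAD to match the standard deviation for normally distributed data
    robust_z = (price - med) / (1.4826 * mad)
    return cheap_enough and robust_z <= z_cutoff
```

With comparables `[100, 110, 95, 105, 5000]`, the mean is about 1082, so "30% below the mean" would flag a $700 item; against the median of 105, only prices around $73 or less count.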

grave frost Oct 18, 2020, 6:49 PM

#

@pure pond BTW What is ROOT?

lapis sequoia Oct 18, 2020, 7:35 PM

#

~~can someone help?~~
ERROR: Could not install packages due to an EnvironmentError: [Errno 2] No such file or directory: 'C:\\Users\\HP\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python37\\site-packages\\sklearn\\datasets\\tests\\data\\openml\\292\\api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz'
~~i get this error when trying to install sklearn~~
i upgraded pip and it got fixed

#

ugh now i get this

ImportError: cannot import name '__check_build' from 'sklearn' (C:\Users\HP\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\sklearn\__init__.py)

this is the code:

```
# make predictions
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
# Load dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = read_csv(url, names=names)
# Split-out validation dataset
array = dataset.values
X = array[:,0:4]
y = array[:,4]
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=1)
# Make predictions on validation dataset
model = SVC(gamma='auto')
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)
# Evaluate predictions
print(accuracy_score(Y_validation, predictions))
print(confusion_matrix(Y_validation, predictions))
print(classification_report(Y_validation, predictions))
```

hollow sentinel Oct 18, 2020, 9:41 PM

#

me doing a udemy course on data science

hasty thorn Oct 18, 2020, 10:10 PM

#

can anyone suggest some resources for learning NLTK sentiment analysis

plucky zephyr Oct 18, 2020, 11:00 PM

#

If I plot the error like this, it's overfitting, right?
So I just need to stop the iterations early to keep it from overfitting?

x-axis = iteration
y-axis = RMSE

📎 unknown.png
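Assuming the plotted RMSE is measured on a validation set (training error alone can't show overfitting), yes: the usual remedy is early stopping, i.e. keep the iteration with the best validation error and stop once it hasn't improved for a while. A sketch of that rule; `patience` and `min_delta` are illustrative knobs. Boosting libraries expose the same idea, e.g. XGBoost's `early_stopping_rounds`.

```python
def early_stopping_index(val_errors, patience=3, min_delta=0.0):
    """Return the iteration to keep: the best one seen before `patience` rounds without improvement."""
    best_i, best = 0, float("inf")
    for i, err in enumerate(val_errors):
        if err < best - min_delta:
            best_i, best = i, err  # new best validation error
        elif i - best_i >= patience:
            break  # no improvement for `patience` iterations: stop training here
    return best_i
```

For validation errors `[5, 3, 2, 2.5, 2.7, 3, 3.5]` this keeps iteration 2 and stops at iteration 5.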

gilded shadow Oct 19, 2020, 2:41 AM

#

wow ya looks like after 2 iterations it's there 🙃

dusky carbon Oct 19, 2020, 2:44 AM

#

hey guys, I'm having some trouble printing zero values from my DataFrame/pandas code

#

📎 Screen_Shot_2020-10-19_at_10.25.40_am.png

#

📎 Screen_Shot_2020-10-19_at_10.26.11_am.png

#

this is my code and output. I just want it to ALSO print the data for the ones that have a zero value. Any ideas?

quiet whale Oct 19, 2020, 3:57 AM

#

I have an imbalanced dataset, and I've done undersampling with a decision tree classifier, which gave me an F1 score of 1. It looked too good to be true, and then I saw the confusion matrix, which shows that FN and FP are both 0...

Is that a good thing? I'm very new at this. I've also tried over- and undersampling with SMOTE combined with an XGBoost classifier, and the best F1 score is 0.46.

lapis sequoia Oct 19, 2020, 4:05 AM

#

so I wanted to ask about the math behind test_size

#

```
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=)
```

#

how can we decide test_size

#

just want to know the math behind it

velvet thorn Oct 19, 2020, 4:06 AM

#

I have an imbalanced dataset, and I've done undersampling with a decision tree classifier, which gave me an F1 score of 1. It looked too good to be true, and then I saw the confusion matrix, which shows that FN and FP are both 0...

Is that a good thing? I'm very new at this. I've also tried over- and undersampling with SMOTE combined with an XGBoost classifier, and the best F1 score is 0.46.
@quiet whale you probably have data leakage
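One common way resampling leaks: undersampling or SMOTE is applied before the train/test split, so the test set no longer reflects the real class balance (and with oversampling, copies or interpolations of the same rows end up on both sides). A NumPy sketch of the leak-free order on synthetic data: split first, resample only the training part, evaluate on the untouched test set. imbalanced-learn's `Pipeline` enforces the same order by applying samplers only during `fit`.

```python
import numpy as np


def undersample(X, y, rng):
    """Randomly drop rows from larger classes until every class has the minority count."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False) for c in classes
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.1).astype(int)  # about 10% positives: a toy imbalanced dataset

# 1) Split FIRST, so the test rows stay unseen and keep the real class balance
idx = rng.permutation(len(y))
test_idx, train_idx = idx[:200], idx[200:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# 2) Resample ONLY the training part; evaluate the model on X_test / y_test as-is
X_train_bal, y_train_bal = undersample(X_train, y_train, rng)
```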

#

just want to know the math behind it
@lapis sequoia there's no hard and fast rule but generally 20% or so
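There is no formula for the "right" value: a larger test set gives a more reliable score estimate but leaves less data for training, which is why 20 to 30% is a common default and cross-validation is often used on small datasets. The only arithmetic is how the fraction becomes row counts. As far as I know, scikit-learn's `train_test_split` rounds the test count up and gives the remainder to training; a sketch mimicking that:

```python
from math import ceil


def split_sizes(n_samples, test_size=0.2):
    """Train/test row counts for a float test_size (test count rounded up)."""
    n_test = ceil(test_size * n_samples)
    return n_samples - n_test, n_test


print(split_sizes(150, 0.25))  # (112, 38)
print(split_sizes(100, 0.2))   # (80, 20)
```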

lapis sequoia Oct 19, 2020, 4:07 AM

#

hm

velvet thorn Oct 19, 2020, 4:07 AM

#

this is my code and output. I just want it to ALSO print the data for the ones that have a zero value. Any ideas?
@dusky carbon what do you mean? what are you trying to do?

tall aurora Oct 19, 2020, 4:21 AM

#

what is data science?

lilac minnow Oct 19, 2020, 4:21 AM

#

Please help, I wanna use plt.imshow in Flask for research data visualisations. Creating/displaying .jpg or .png isn't helpful as they cannot be updated on the go. Please suggest a way.
plt.imshow because I wanna use colormap and clim.

tall aurora Oct 19, 2020, 4:22 AM

#

what is data science?

velvet thorn Oct 19, 2020, 4:22 AM

#

Please help, I wanna use plt.imshow in Flask for research data visualisations. Creating/displaying .jpg or .png isn't helpful as they cannot be updated on the go. Please suggest a way.
plt.imshow because I wanna use colormap and clim.
@lilac minnow what do you mean "use it in flask"?

lilac minnow Oct 19, 2020, 4:25 AM

#

I wanna create a server for visualising outputs from TF Models, as numpy arrays. plt.imshow works well with Jupyter. But I'm not able to get them to work in flask.

velvet thorn Oct 19, 2020, 4:25 AM

#

I wanna create a server for visualising outputs from TF Models, as numpy arrays. plt.imshow works well with Jupyter. But I'm not able to get them to work in flask.
@lilac minnow they're two different things...

#

if you want that kind of behaviour, you need JS

lilac minnow Oct 19, 2020, 4:26 AM

#

@velvet thorn thank you. Can you provide me with any example/template for me to get started?

velvet thorn Oct 19, 2020, 4:26 AM

#

you're basically saying you want an interactive interface

#

alternatively, you can consider Dash

#

nope, I can't

#

Google should help
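If a full JS or Dash front end is more than you need, one middle ground (a sketch, not the only way) is to render the array with `imshow` into an in-memory PNG on every request. The image then always reflects the latest model output, and nothing is written to disk. It uses matplotlib's object-oriented `Figure` rather than `pyplot`, since `pyplot` keeps global state and is not thread-safe inside a web server.

```python
import io

import numpy as np
from matplotlib.figure import Figure


def array_to_png(arr, cmap="viridis", clim=None):
    """Render a 2-D array with imshow (colormap + optional clim) and return PNG bytes."""
    fig = Figure(figsize=(4, 4))
    ax = fig.subplots()
    im = ax.imshow(np.asarray(arr), cmap=cmap)
    if clim is not None:
        im.set_clim(*clim)  # same effect as plt.clim(vmin, vmax)
    fig.colorbar(im, ax=ax)
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    return buf.getvalue()
```

In a Flask view you could return the bytes with `send_file(io.BytesIO(png), mimetype="image/png")` and have the page reload that URL periodically; anything interactive beyond that is where JS or Dash comes in.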

quiet whale Oct 19, 2020, 4:41 AM

#

@quiet whale you probably have data leakage
@velvet thorn ah, I did! Thank you, I need to be more careful next time :/

mossy dragon Oct 19, 2020, 4:48 AM

#

hey guys

#

I want to do a neural network model for sentiment analysis on tweets

#

but I dont have much spare time, so I was considering using mturk or fiver to have people manually label a training set

#

thoughts?

serene scaffold Oct 19, 2020, 4:54 AM

#

@mossy dragon what topic are the tweets for? Also, were you planning to have two people label everything independently and compare the results?

mossy dragon Oct 19, 2020, 4:55 AM

#

i dont have a specific topic yet

#

im willing to be flexible on that tbh

#

I wasn't planning on getting different people but that sounds like a good idea
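On the two-annotator idea: the usual way to "compare the results" is an agreement statistic such as Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each annotator's label frequencies. A stdlib sketch:

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators: (p_o - p_e) / (1 - p_e)."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n  # observed agreement
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if both labelled at random with their own label frequencies
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)
```

For `["pos", "pos", "neg", "neg"]` vs `["pos", "neg", "neg", "neg"]`, raw agreement is 0.75 but kappa is 0.5, because half of that agreement would be expected by chance.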

serene scaffold Oct 19, 2020, 4:56 AM

#

one of my coworkers does sentiment analysis. It tends to be difficult because of sarcasm and especially nuanced texts.

mossy dragon Oct 19, 2020, 4:56 AM

#

yea

#

this is a school project though

serene scaffold Oct 19, 2020, 4:56 AM

#

ah

#

I would look for an existing corpus

mossy dragon Oct 19, 2020, 4:57 AM

#

so not like it needs to be perfect

#

you mean

#

abandon the tweet idea?

serene scaffold Oct 19, 2020, 4:57 AM

#

no

#

I would see if someone has already made a set of tweets and associated sentiment data

mossy dragon Oct 19, 2020, 4:57 AM

#

hmm

#

i actually had a similar idea to that

#

i know there is an IMDb dataset containing movie reviews labeled as positive/negative

serene scaffold Oct 19, 2020, 4:58 AM

#

that sounds pretty good

mossy dragon Oct 19, 2020, 4:58 AM

#

so i was considering maybe using that to train a model and then classifying tweets about a new movie trailer that was released or a movie that recently came out

#

i haven't really done any neural net models though

#

do you know a sample size that i should aim for?

serene scaffold Oct 19, 2020, 4:59 AM

#

unfortunately I don't

mossy dragon Oct 19, 2020, 5:00 AM

#

hmm

serene scaffold Oct 19, 2020, 5:00 AM

#

I work in an NLP lab and I'm the worst one

#

probably because I spend too much time on discord

mossy dragon Oct 19, 2020, 5:00 AM

#

lel

#

its a group project

#

i have ~5 other people in the group

serene scaffold Oct 19, 2020, 5:00 AM

#

what class?

mossy dragon Oct 19, 2020, 5:00 AM

#

NLP

serene scaffold Oct 19, 2020, 5:00 AM

#

nice

mossy dragon Oct 19, 2020, 5:01 AM

#

but still we're all either busy with other classes or working full time, but I'm curious if we could get a decent sized training set if we spent ~1 hour manually labeling data

serene scaffold Oct 19, 2020, 5:02 AM

#

our annotators are always complaining about how long it takes

#

so my guess is no

#

but if you're just assigning labels to entire documents (rather than individual tokens) I guess that's faster

#

I don't know the exact specification of your assignment but I would be very surprised if your professor wanted you to create your own data set.

mossy dragon Oct 19, 2020, 5:06 AM

#

oh lol

#

we're not required to

#

but i personally would like to

#

we dont even have to do sentiment analysis, we could do a different method to analyze the text

velvet thorn Oct 19, 2020, 5:12 AM

#

does it have to involve ML?

#

or can you do some other kind of analysis

#

like maybe something that could be interesting is analysis of document structure?

mossy dragon Oct 19, 2020, 5:12 AM

#

nah

#

these are my exact instructions

#

📎 unknown.png

#

📎 unknown.png

velvet thorn Oct 19, 2020, 5:13 AM

#

hm it says a single data set so I guess my idea is out

#

but yeah it seems like you're intended to find your own dataset

mossy dragon Oct 19, 2020, 5:14 AM

#

yea

velvet thorn Oct 19, 2020, 5:14 AM

#

as opposed to creating one

#

however, I'm like 99% sure there are existing tweet datasets out there

#

for sentiment analysis

#

it's a very common task

#

so you could use that as a baseline and find something interesting to add your own spin on things

#

for example, comparing across geographical regions?

mossy dragon Oct 19, 2020, 5:16 AM

#

I'd like to modify this and put this on my github for future job searches

#

so i figured it would be more impressive to extract that data myself

#

but i guess i dont have to do that now

velvet thorn Oct 19, 2020, 5:17 AM

#

so i figured it would be more impressive to extract that data myself
@mossy dragon it would be!

#

but yeah, if it's a group project

#

probably not.

mossy dragon Oct 19, 2020, 5:22 AM

#

thanks for the help catthumbsup

velvet thorn Oct 19, 2020, 5:22 AM

#

yw 🙂

mild topaz Oct 19, 2020, 6:36 AM

#

hello
```python
print("hello")
try:
    model = load_model(r"E://demo3//albania_100_model.p")
    #model = load_model(r"{path}//{country}_100_model.p")
    print("model loaded...")

except OSError:

    logger.debug({
        "Status": "failed",
        "message": "model not available"})

    return {
        "Status": "failed",
        "message": "model not available"}
```

In the output I am getting `{"Status": "failed", "message": "model not available"}`

#

i am not able to load model

autumn veldt Oct 19, 2020, 7:00 AM

#

Hello everyone,
I am currently looking for a dataset on cholera. Do any of you know where to download one, or where I can find the source dataset for this one? https://github.com/soujanyajoshi/Cholera/blob/master/data.xlsx I ask because I have searched Kaggle, but the features in the datasets there are different.


pure pond Oct 19, 2020, 7:52 AM

#

@pure pond BTW What is ROOT?
@grave frost https://root.cern.ch/

ROOT

ROOT: analyzing petabytes of data, scientifically.

An open-source data analysis framework used by high energy physics and others.

stiff zealot Oct 19, 2020, 8:37 AM

#

is this the correct place to talk about stock market analysis?

lapis sequoia Oct 19, 2020, 8:38 AM

#

What's that

#

Absolutely not

#

I guess

stiff zealot Oct 19, 2020, 8:38 AM

#

lol

#

anyone have experience building trading bots?

pure pond Oct 19, 2020, 9:03 AM

#

Just use an rng stock picker you'll probably outperform other attempts xd

hazy mortar Oct 19, 2020, 9:29 AM

#

a monkey outperformed most

#

😄

#

https://www.ft.com/content/abd15744-9793-11e2-b7ef-00144feabdc0


marsh tartan Oct 19, 2020, 10:11 AM

#

how long does it take to train a single-threaded NLTK classifier model with 8000 training points and 2000 test points?

#

I'm running on an i7-10750H @ 4.5 GHz

#

or is there an easy way to run it with CUDA?

earnest forge Oct 19, 2020, 11:17 AM

#

I need advice. What machine learning course should I take?

unique sandal Oct 19, 2020, 1:04 PM

#

@earnest forge I would highly recommend the Complete Zero to Mastery Machine Learning course by Andrei Neagoie on Udemy. It's very affordable for its quality and content, in my opinion.

real geode Oct 19, 2020, 1:44 PM

#

I have two dictionaries that each contain several pandas DataFrames. The column and row names are all the same, and I would like to iterate through the DataFrames in each dictionary and run df1.compare(df2) one at a time.

#

is there a way to write a function that will make this quicker, instead of writing df1[key1].compare(df2[key1]) for each key in these dictionaries?

foggy tundra Oct 19, 2020, 1:55 PM

#

Hello! How can I use raw SQL queries in flask_sqlalchemy?

keen prism Oct 19, 2020, 5:03 PM

#

Hi there!
I'm really new to Python but I want to invite people to take interest in a ML/NLP project. I want us to figure out how to digitize The Turing Digital Archive (http://www.turingarchive.org/) into easy-to-read text.
I'm not sure what the best tool is for the project, so I'm posting this to make interested friends who want to help.
To begin, I was looking at EasyOCR (https://github.com/JaidedAI/EasyOCR) but I don't know if it's the right tool for the job.
We'll be working in conda with Python for this; I personally will be using Windows 10. Apart from the experience itself, I think creating one document containing all of Alan Turing's writings will be its own reward.

📎 unknown.png

wary kelp Oct 19, 2020, 5:48 PM

#

@foggy tundra One option is to ignore the ORM and interact with the database directly with something like pymysql https://pypi.org/project/PyMySQL/
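Within Flask-SQLAlchemy you can also stay on the same connection: SQLAlchemy's `text()` construct runs raw SQL with bound parameters (which also protects against SQL injection). A sketch using plain SQLAlchemy with an in-memory SQLite database so it runs standalone; the `users` table and names are made up. In a Flask-SQLAlchemy app the equivalent call would be `db.session.execute(text(...), {...})`.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite as a stand-in for the app's real database
engine = create_engine("sqlite://")


def find_user(conn, name):
    # :name is a bound parameter; the value is passed separately, never pasted into the SQL
    result = conn.execute(text("SELECT id, name FROM users WHERE name = :name"), {"name": name})
    return result.fetchall()


with engine.begin() as conn:  # begin() commits on successful exit
    conn.execute(text("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"))
    conn.execute(text("INSERT INTO users (name) VALUES (:name)"),
                 [{"name": "ada"}, {"name": "alan"}])
    rows = find_user(conn, "alan")

print(rows)
```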


lapis sequoia Oct 19, 2020, 6:25 PM

#

how long does it take to train a single-threaded NLTK classifier model with 8000 training points and 2000 test points?
@marsh tartan Well, it will depend on the configuration of the model, not just on the data. A complex model with more parameters will take more time than a simple one.
And to train on a GPU for free, you can try Google Colab, which is free for up to 12 hours in a single session.

#

Anyway, if you are just looking for a simple text classifier, then it should not take more than a few minutes, unless your model architecture is very complex.

#

is there a way to write a function that will make this quicker instead of writing df1[key1].compare(df2[key1]) for each key in these dictionaries
@real geode You can convert each dataframe into a numpy array and compare:
(A==B).all()

#

This tests whether all values of the array (A==B) are True.

Note: you may also want to test the shapes of A and B, e.g. A.shape == B.shape

Special cases and alternatives:

It should be noted that:
this solution can behave strangely in one particular case:
if either A or B is empty and the other contains a single element, it returns True.
This is because broadcasting an empty array against a one-element array makes A==B an empty array, and all() of an empty array returns True.
Another risk is that if A and B don't have the same shape and aren't broadcastable, this approach will raise an error.

Source: https://stackoverflow.com/questions/10580676/comparing-two-numpy-arrays-for-equality-element-wise
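`np.array_equal` sidesteps both caveats above, because it checks the shapes before comparing values (for DataFrames, `df1.equals(df2)` plays a similar role and treats NaNs in the same positions as equal). A short demo with made-up arrays:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[1, 2], [3, 4]])

print(np.array_equal(A, B))                          # True
# Different shapes simply give False instead of an error
print(np.array_equal(A, B[:1]))                      # False
# The empty-vs-single-element trap: False, unlike (A == B).all()
print(np.array_equal(np.array([]), np.array([1])))   # False
```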


real geode Oct 19, 2020, 6:34 PM

#

Thanks for the tip but i managed to find a workaround while still keeping dataframes

#


```python
# Call this function to compare crosstab tables
def crosstab_compare(df1cross, df2cross, df1original):
    """
    df1cross = dictionary of pandas dataframes where crosstabs have been performed, the self.
    df2cross = another dictionary of pandas dataframes where crosstabs have been performed, the other.
    df1original = pandas dataframe, not crosstabulated, used to extract the list of labels.
    The end result is a dictionary.
    The tables shown will appear only if the results are different from each other.
    The function will attempt to compare all dataframes with equal shape. If one dataframe doesn't match the other,
    the function will continue to work but skip the mismatching dataframe.
    """
    question_list = list(df1original.columns)[1:]
    print("Self: Refers to the table that was called first in the arguments")
    comparedf = {}

    for k in question_list:
        try:
            comparedf[str(k)] = df1cross[k].compare(df2cross[k], align_axis='rows')
        except ValueError:
            # DataFrame.compare raises ValueError for frames that aren't identically labelled
            continue
    return comparedf
```

#

i had the problem where some DFs didn't have the same shape which is why i added the try block
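A variant of the same loop, as a sketch: it checks the shape and labels up front instead of relying on `ValueError`, only visits keys present in both dictionaries, and leaves out pairs with no differences. `DataFrame.compare` needs pandas 1.1 or later; the toy frames in the test are made up.

```python
import pandas as pd


def compare_frame_dicts(d1, d2):
    """Run DataFrame.compare on every key shared by two dicts of DataFrames."""
    results = {}
    for key in d1.keys() & d2.keys():  # keys present in both dictionaries
        a, b = d1[key], d2[key]
        # compare() only works on identically labelled frames, so skip the rest explicitly
        if a.shape != b.shape or not a.index.equals(b.index) or not a.columns.equals(b.columns):
            continue
        diff = a.compare(b)
        if not diff.empty:  # keep only pairs that actually differ
            results[key] = diff
    return results
```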

lapis sequoia Oct 19, 2020, 6:53 PM

#

Is there a lighter-weight alternative to jupyter notebooks?

lapis sequoia Oct 19, 2020, 7:13 PM

#

Is there a lighter-weight alternative to jupyter notebooks?
@lapis sequoia lighter in what sense ?

#

You can just use VS Code editor as a notebook instead of installing anaconda and everything for jupyter if you want.

#

Try this: https://code.visualstudio.com/docs/python/jupyter-support-py


#

Also, you can use cloud notebook providers like Google Colab, which are hosted on VMs, so your system won't carry any load and you get decent machines.

real geode Oct 19, 2020, 7:20 PM

#

yea, I use Jupyter notebooks in Visual Studio Code for work, and it's pretty light overall

lapis sequoia Oct 19, 2020, 7:34 PM

#

My notebook has around 10k lines

#

It takes forever for it to load up

final ocean Oct 19, 2020, 7:53 PM

#

oof

civic jackal Oct 19, 2020, 8:05 PM

#

Hi guys, is anyone familiar with a Python script that aligns DNA sequences? I have an assignment and no idea where to start.

real geode Oct 19, 2020, 8:27 PM

#

they want you to code a BLAST from scratch?

foggy solar Oct 19, 2020, 8:29 PM

#

That's a hell of a school project lol

#

Can you use the NCBI API (if it has to use python)?

#

https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=DeveloperInfo


real geode Oct 19, 2020, 8:32 PM

#

just use BLAST directly lol i dont know why they would want you to use python just to get there. No need to reinvent the wheel

foggy solar Oct 19, 2020, 8:33 PM

#

Yeah, I agree. Was just suggesting it in case it was a project that required Python scripts. It depends on whether it's a bio or a computer science class. No way in hell a biologist would write their own BLAST scripts.

#

Here is the alignment tool: https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch&BLAST_SPEC=blast2seq&LINK_LOC=align2seq

civic mountain Oct 19, 2020, 8:47 PM

#

is there a reason my tensor which is [173, 173] is getting resized to [231, 231] when plotted using plt.imshow() ?

civic mountain Oct 19, 2020, 9:50 PM

#

Okay sorry it was a matplotlib issue.

lilac raven Oct 20, 2020, 4:00 AM

#

Hello, I was directed here. I have a live graph being plotted from incoming ECG data. The two line plots (heart rate and moving average, called rolling mean in the code) are updating successfully and moving across the screen, while the scatter plot data is not. The initial set of scatter points gets plotted but remains static, unlike the line plots. I have to set up the line and scatter plots a bit differently, so that is probably where the problem lies.

#

Using FuncAnimation.
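Without the code this is a guess, but a frequent cause: line artists are updated with `set_data(x, y)`, while a scatter plot is a `PathCollection` whose points are moved with `set_offsets`, which takes a single `(N, 2)` array. If the update function only touches the lines, the scatter stays frozen at its first frame. A sketch; `t`, `hr` and `peak_idx` are hypothetical names for the time axis, the heart-rate samples and the indices of detected peaks.

```python
import numpy as np
from matplotlib.figure import Figure

fig = Figure()
ax = fig.subplots()
line, = ax.plot([], [])       # a Line2D: updated via set_data(x, y)
scatter = ax.scatter([], [])  # a PathCollection: updated via set_offsets(xy)


def update(frame, t, hr, peak_idx):
    line.set_data(t, hr)
    # Scatter positions go in as one (N, 2) array of [x, y] pairs
    scatter.set_offsets(np.column_stack([t[peak_idx], hr[peak_idx]]))
    return line, scatter
```

With pyplot, the same function would be passed as `FuncAnimation(fig, update, fargs=(t, hr, peak_idx), interval=50, blit=True)`; with live data, `t`, `hr` and `peak_idx` would be read from whatever buffer the incoming ECG samples are written to.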

civic jackal Oct 20, 2020, 4:40 AM

#

I basically need to do this: Convert align_seqs.py to a Python program that takes the DNA sequences as an input from a single external file and saves the best alignment along with its corresponding score in a single text file (your choice of format and file type) to an appropriate location. No external input should be required; that is, you should still only need to use python align_seq.py to run it. For example, the input file can be a single .csv file with the two example sequences given at the top of the original script.
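The original `align_seqs.py` isn't shown, so this sketch assumes the common version of that script: slide the shorter sequence along the longer one and count matching bases at each offset. Replace `score_at` with the script's own scoring function. Reading from one CSV and writing one text file follows the assignment; the default paths are placeholders.

```python
import csv
from pathlib import Path


def score_at(long_seq, short_seq, offset):
    # Count matching bases when short_seq is shifted right by `offset`
    return sum(
        1
        for i, base in enumerate(short_seq)
        if i + offset < len(long_seq) and long_seq[i + offset] == base
    )


def best_alignment(seq_a, seq_b):
    long_seq, short_seq = (seq_a, seq_b) if len(seq_a) >= len(seq_b) else (seq_b, seq_a)
    best_offset = max(range(len(long_seq)), key=lambda k: score_at(long_seq, short_seq, k))
    # Dots mark the shift so the aligned sequence lines up under the reference
    return "." * best_offset + short_seq, long_seq, score_at(long_seq, short_seq, best_offset)


def read_sequences(csv_path):
    # Assumes the two sequences are the first two non-empty cells of the file
    with open(csv_path, newline="") as fh:
        cells = [c.strip() for row in csv.reader(fh) for c in row if c.strip()]
    return cells[0], cells[1]


def main(in_path="../data/seqs.csv", out_path="../results/best_alignment.txt"):
    seq_a, seq_b = read_sequences(in_path)
    aligned, reference, score = best_alignment(seq_a, seq_b)
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(f"{aligned}\n{reference}\nBest score: {score}\n")
    return score
```

So that `python align_seqs.py` does everything with no extra input, call `main()` under an `if __name__ == "__main__":` guard at the bottom of the script.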

lapis sequoia Oct 20, 2020, 5:07 AM

#

Gn

autumn veldt Oct 20, 2020, 5:32 AM

#

Hello everyone,
I am currently looking for a dataset on cholera. Do any of you know where to download one, or where I can find the source dataset for this one? https://github.com/soujanyajoshi/Cholera/blob/master/data.xlsx I ask because I have searched Kaggle, but the features in the datasets there are different.


tight sparrow Oct 20, 2020, 8:49 AM

#

hello, I'm trying to access Google Maps using an API key

#

but i'm getting this

```
   "error_message" : "You must enable Billing on the Google Cloud Project at https://console.cloud.google.com/project/_/billing/enable Learn more a   "results" : [],
   "status" : "REQUEST_DENIED"
```

#

any help for me?

#

Thank You

uneven wind Oct 20, 2020, 9:21 AM

#

Hello ! I have a problem with pandas and read-Excel feature.
I can't read one of the columns in my excel sheet. The console return this error:

  File "path\to\pandas\core\indexing.py", line 1177, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)
KeyError: "None of [Index(['S2007-02', 'S2007-02', 'S2007-02', 'S2007-02', 'S2007-02', 'S2007-02',\n       'S2007-02', 'S2007-02', 'S2007-02', 'S2007-02',\n       ...\n       '1 - New', '1 - New', '3 - Approved', '3 - Approved', '1 - New',\n       '3 - Approved', '3 - Approved', '3 - Approved', '3 - Approved',\n       '1 - New'],\n     dtype='object', length=1043)] are in the [columns]"

But I don't understand what those "\n" are. Moreover, they aren't in the string values.
I checked the column format but I didn't see any line breaks or spaces in the data. Does anyone have a clue how to fix this?
Thanks!

solid mantle Oct 20, 2020, 10:44 AM

#

Anyone familiar with pymultinest?

#

If you are, kindly dm me

lapis sequoia Oct 20, 2020, 12:15 PM

#

any help for me?
@tight sparrow You need to enable billing. Go into Google Cloud console and inside Billing you should be able to see if there is any active billing account.

#

Also check Account Management and enable billing if you have closed it in the past. You will need a debit/credit card to do that.

#

hi

#

import sklearn
from sklearn import datasets
from sklearn import svm
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

cancer = datasets.load_breast_cancer()

#print(cancer.feature_names)
#print(cancer.target_names)

x = cancer.data
y = cancer.target

x_train,x_test,y_train,y_test = sklearn.model_selection.train_test_split(x,x,test_size=0.2)

print(x_train,y_train)

classes = ['malignant' 'benign']

clf = svm.SVC()
clf.fit(x_train,y_train)


y_pred = clf.predict(x_test)

acc = metrics.accuracy_score(y_test,y_pred)
print(acc)

#

"got an array of shape {} instead.".format(shape))
ValueError: y should be a 1d array, got an array of shape (455, 30) instead. error

#

@uneven wind \n is the newline character, so it is possible that it is causing the problem. Also, are you passing any other parameters while reading the file? First try to read it without specifying any index or columns, then choose the columns and index properly.
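A minimal sketch of one common way to hit this exact KeyError, assuming (not confirmed in the thread) that the code passes a column's *values* where a column *name* is expected, e.g. `df[df["Status"]]`. The DataFrame and the column names `Code`/`Status` are hypothetical stand-ins for the Excel sheet. It also suggests where the `\n` comes from: pandas puts the (wrapped, multi-line) repr of the offending Index into the message, and `KeyError` displays the repr of its message, so real line breaks show up as the two characters `\n`.

```python
import pandas as pd

# Hypothetical stand-in for the sheet loaded with pd.read_excel(...)
df = pd.DataFrame({
    "Code": ["S2007-02", "S2007-02", "S2007-02"],
    "Status": ["1 - New", "3 - Approved", "1 - New"],
})

# Reproduce the error: the column's values are used as column labels
try:
    df[df["Status"]]
    error_text = ""
except KeyError as exc:
    error_text = str(exc)
print(error_text)  # the message ends with "are in the [columns]"

# KeyError shows the repr of its message, so a real line break is printed as \n
print(str(KeyError("line 1\nline 2")))

# Fix: select the column by its name, not by its values
status = df["Status"]
```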

#


@lapis sequoia x_train,x_test,y_train,y_test = sklearn.model_selection.train_test_split(x,y,test_size=0.2)

You made one error here: the inputs to the split should be x and y, but you passed x twice.
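A corrected, runnable version of the script above, as a sketch: it passes `x, y` to `train_test_split`, imports `train_test_split` from `sklearn.model_selection` explicitly, and adds the missing comma in `classes` (without it Python joins the two literals into the single string `'malignantbenign'`). The `random_state=42` is an arbitrary addition so the split is reproducible.

```python
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split

cancer = datasets.load_breast_cancer()
x = cancer.data    # features, shape (569, 30)
y = cancer.target  # labels, shape (569,)

# Features first, labels second; passing x twice made y_train a (455, 30) matrix
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

classes = ["malignant", "benign"]  # comma needed, otherwise this is one string

clf = svm.SVC()
clf.fit(x_train, y_train)

y_pred = clf.predict(x_test)
acc = metrics.accuracy_score(y_test, y_pred)
print(acc)
```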

#

OHHH

#

@lapis sequoia thankssss

tight sparrow Oct 20, 2020, 12:47 PM

#

@lapis sequoia do I need to pay some money to gain access?

lapis sequoia Oct 20, 2020, 12:48 PM

#

@lapis sequoia do I need to pay some money to gain access?
@tight sparrow You get some free quota; after that you need to pay. The free quota should be more than enough for a personal project.

#

Also, you get $300 in free credits when you register, so you can use them to get access.

tight sparrow Oct 20, 2020, 1:00 PM

#

okay cool

#

thanks lemon_fingerguns_shades

weary heart Oct 20, 2020, 1:08 PM

#

Hi, I'm trying to learn machine learning over the next few weeks. Is there any YouTube channel or website that can help with oversampling, logistic regression, etc.? Thanks

rain nimbus Oct 20, 2020, 1:35 PM

#

Hi, I'm trying to learn machine learning over the next few weeks. Is there any YouTube channel or website that can help with oversampling, logistic regression, etc.? Thanks
@weary heart andrew ng?

weary heart Oct 20, 2020, 1:40 PM

#

Thanks i'll look it up 😁

mild topaz Oct 20, 2020, 1:41 PM

#

hello
I have code that creates an image from a base64 string; now I want to resize this image to the desired pixel dimensions. How can I do this? Can anyone help me with this?

earnest forge Oct 20, 2020, 2:29 PM

#

There's a pink line drawn. What does it stand for? What is its meaning?

📎 unknown.png

lapis sequoia Oct 20, 2020, 2:46 PM

#

hello
I have code that creates an image from a base64 string; now I want to resize this image to the desired pixel dimensions. How can I do this? Can anyone help me with this?
@mild topaz I'm not sure what tool you are using to convert the string to an image, but when you save the file you can change its DPI and figure size.

If you are using matplotlib, then you can resize it with matplotlib.pyplot.figure and choose appropriate values for the dpi and figsize parameters.
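If the image is decoded from base64 anyway, resizing the pixels directly may be simpler than going through matplotlib's `dpi`/`figsize`. A minimal sketch using Pillow's `Image.resize`; this is a swapped-in approach, not what the linked paste does, and `resize_base64_image` is a hypothetical helper name.

```python
import base64
import io

from PIL import Image


def resize_base64_image(b64_string, size):
    """Decode a base64-encoded image and return a resized copy.

    size is (width, height) in pixels; the aspect ratio is not preserved.
    """
    raw = base64.b64decode(b64_string)
    with Image.open(io.BytesIO(raw)) as img:
        return img.resize(size)


# Usage: build a small red PNG, encode it as base64, then resize it
buffer = io.BytesIO()
Image.new("RGB", (40, 20), "red").save(buffer, format="PNG")
encoded = base64.b64encode(buffer.getvalue()).decode("ascii")

small = resize_base64_image(encoded, (20, 10))

# Re-encode the result as base64 if the rest of the code expects a string
out = io.BytesIO()
small.save(out, format="PNG")
resized_b64 = base64.b64encode(out.getvalue()).decode("ascii")
```

If the aspect ratio should be kept, Pillow's `Image.thumbnail` shrinks an image in place to fit within a bounding box instead.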

mild topaz Oct 20, 2020, 2:47 PM

#

https://paste.pythondiscord.com/ibudajodix.py my code @lapis sequoia plz check

pure swan Oct 20, 2020, 2:52 PM

#

Am i allowed to ask a question in this channel?

muted patio Oct 20, 2020, 2:53 PM

#

There's a pink line drawn. What does it stand for? What is its meaning?
@earnest forge Correlation?

real geode Oct 20, 2020, 2:59 PM

#

the line between two variables on a scatter plot is supposed to represent the relation between them

#

isn't this some high school level math?

earnest forge Oct 20, 2020, 3:01 PM

#

it is correlation. yes

#

I've just checked it

lapis sequoia Oct 20, 2020, 3:20 PM

#

Can someone help me with pandas regression

earnest forge Oct 20, 2020, 3:25 PM

#

what exactly?

lapis sequoia Oct 20, 2020, 3:31 PM

#

https://paste.pythondiscord.com/ibudajodix.py my code @lapis sequoia plz check
@mild topaz I'm not sure what the problem in the code is.

#

You are resizing the image in the code so it should take care of your needs.

#

There's a pink line drawn. What does it stand for? What is its meaning?
@earnest forge That is the best linear fit for your data. If you approximate your data with a straight line, that line gives the best result (the smallest squared error). It also tells you how x and y are correlated.
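For reference, a sketch of the two quantities discussed here, using NumPy on made-up data: `np.polyfit(..., deg=1)` gives the least-squares line (the kind of line plotting helpers such as seaborn's `regplot` draw), and `np.corrcoef` gives the correlation coefficient. Whether the pink line in the screenshot was produced exactly this way is an assumption.

```python
import numpy as np

# Hypothetical data points
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)

# Pearson correlation coefficient: how strongly x and y move together linearly
r = np.corrcoef(x, y)[0, 1]
print(f"y = {slope:.2f}x + {intercept:.2f}, r = {r:.3f}")
```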

#

@earnest forge So I have a bunch of dummy variables right

#

I grouped by a certain column and summed

#

But the dummy variables got messed up and now show numbers that aren't 0 or 1.

#

How can I either fix that or make it so that all the dummy variable values greater than 1 get turned into 1?

halcyon vale Oct 20, 2020, 3:38 PM

#

https://www.linkedin.com/posts/thinam-tamang-3b12831a2_66daysofdata-deeplearning-nlp-activity-6724338148551213058-faKM


hollow sentinel Oct 20, 2020, 3:39 PM

#

guys what do you like using

#

seaborn

#

or matplotlib

#

for graphs

#

which one is actually worth my time bc i used matplotlib in my last project

#

seaborn has way prettier graphs imo

austere swift Oct 20, 2020, 3:44 PM

#

I like seaborn cus it's a lot prettier

hollow sentinel Oct 20, 2020, 3:47 PM

#

seaborn seems easier to use for me

austere swift Oct 20, 2020, 3:48 PM

#

yeah that too

hollow sentinel Oct 20, 2020, 3:48 PM

#

i've been doing a udemy course on data science & machine learning

#

that's why i've been so quiet

#

Jose Portilla is a beast

lapis sequoia Oct 20, 2020, 3:49 PM

#

Matplotlib is lower-level and allows you to do a lot of custom things. Seaborn is built on top of Matplotlib.

hollow sentinel Oct 20, 2020, 3:49 PM

#

ohh

austere swift Oct 20, 2020, 3:49 PM

#

yeah seaborn is just a wrapper for matplotlib that makes it easier to use and has a lot better looking default themes

hollow sentinel Oct 20, 2020, 3:50 PM

#

yeah i think i'll be using seaborn more often now

lapis sequoia Oct 20, 2020, 3:50 PM

#

Any help for me

hollow sentinel Oct 20, 2020, 3:50 PM

#

what did you ask @lapis sequoia

lapis sequoia Oct 20, 2020, 3:50 PM

#

If the graphs you want are available in seaborn or plotly, then you can just use them. The idea of matplotlib is to let any Python programmer create complex graphs.

hollow sentinel Oct 20, 2020, 3:50 PM

#

i

lapis sequoia Oct 20, 2020, 3:51 PM

#

@hollow sentinel I have a bunch of dummy variables
I grouped by a certain column and summed
But the dummy variables got messed up and now show numbers that aren't 0 or 1.
How can I either fix that or make it so that all the dummy variable values greater than 1 get turned into 1?

hollow sentinel Oct 20, 2020, 3:51 PM

#

i'm traumatized by plotly

#

choropleth 😦

#

idk i remember with pandas you can conditionally select within the dataframe

#

sorry i'm new to this lmao

lapis sequoia Oct 20, 2020, 3:53 PM

#

same lol

#

@lapis sequoia I'm not able to understand your problem. But yeah, if you just want to make a column with a max value of 1, then it is possible. You can apply some map or applymap to fix it

hollow sentinel Oct 20, 2020, 3:55 PM

#

the only thing is that I'm worried I'm not actually learning anything

#

i don't learn from basic udemy videos I learn from projects

#

built different

lapis sequoia Oct 20, 2020, 3:56 PM

#

@lapis sequoia I created dummy variables for 4 columns

#

Then grouped the rows by a certain column

#

Doing so aggregated all the dummy variables as well, instead of the only column I wanted (as far as I know, there is no way around this)

#

But the dummy variables must be either 0 or 1, some of them have numbers such as 200, 300, 450 etc. So I need all the ones with those numbers to be a 1 so I can perform regression correctly

hollow sentinel Oct 20, 2020, 4:00 PM

#

you're doing linear regression?

lapis sequoia Oct 20, 2020, 4:00 PM

#

yeah

hollow sentinel Oct 20, 2020, 4:00 PM

#

cool I'm still doing data visualization haha

#

noob

lapis sequoia Oct 20, 2020, 4:01 PM

#

idek how to do that in python lol

hollow sentinel Oct 20, 2020, 4:01 PM

#

lmao do you want me to email the udemy course notes

lapis sequoia Oct 20, 2020, 4:02 PM

#

is it complicated lol

#

What are you using the data viz for

hollow sentinel Oct 20, 2020, 4:02 PM

#

i wanted to do a linear regression on a dataset

#

and it's good to use seaborn for the graph

lapis sequoia Oct 20, 2020, 4:03 PM

#

Can you not do graphs in statsmodels

#

im using sm

austere swift Oct 20, 2020, 4:04 PM

#

afaik you can't

hollow sentinel Oct 20, 2020, 4:05 PM

#

never heard of statsmodels

lapis sequoia Oct 20, 2020, 4:05 PM

#

didnt know that

hollow sentinel Oct 20, 2020, 4:05 PM

#

is that another module in python?

grave frost Oct 20, 2020, 4:05 PM

#

@tall aurora Why do you want to know?

earnest forge Oct 20, 2020, 4:06 PM

#

I grouped by a certain column and summed
@lapis sequoia could you provide a bit of your code?

#

guys what do you like using
@hollow sentinel I combine both seaborn and matplotlib. seaborn ain't capable of everything matplotlib can provide you
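A sketch of what combining the two typically looks like: matplotlib creates the figure and axes, seaborn draws onto them, and fine-tuning goes back through the matplotlib API. The data is made up; `sns.regplot` is used because it draws both the scatter and a fitted line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs headless; not needed in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data
df = pd.DataFrame({"size": [50, 70, 90, 110, 130],
                   "price": [150, 200, 260, 300, 360]})

fig, ax = plt.subplots(figsize=(6, 4))            # matplotlib: figure and axes
sns.regplot(data=df, x="size", y="price", ax=ax)  # seaborn: scatter + fitted line on those axes
ax.set_title("Price vs. size")                    # matplotlib: fine-grained tweaks
ax.set_xlabel("Size")
ax.set_ylabel("Price")
```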

grave frost Oct 20, 2020, 4:07 PM

#

@mild topaz If you don't mind me asking, how did you get an Image as base64 string?

hollow sentinel Oct 20, 2020, 4:08 PM

#

@earnest forge yeah when I look at Kaggle they use both seaborn and matplotlib

#

Kaggle is really good

earnest forge Oct 20, 2020, 4:08 PM

#

yes

#

if you don't know what to do next - open kaggle 😄

grave frost Oct 20, 2020, 4:09 PM

#

The only thing I like about Kaggle notebooks is that their kernels are reproducible. Apart from that, Kaggle is just a time-waste

hollow sentinel Oct 20, 2020, 4:09 PM

#

i think i understand linear regression w two variables but i don't understand multiple linear regression

lapis sequoia Oct 20, 2020, 4:10 PM

#

@earnest forge Which part of the code do you want?

#

The groupby code?

hollow sentinel Oct 20, 2020, 4:10 PM

#

linear regression is just a relationship between two variables right

earnest forge Oct 20, 2020, 4:10 PM

#

The groupby code?
@lapis sequoia yes

grave frost Oct 20, 2020, 4:10 PM

#

linear regression is just a relationship between two variables right
@hollow sentinel no

hollow sentinel Oct 20, 2020, 4:10 PM

#

F

#

then what is it

#

i've watched youtube videos on it

grave frost Oct 20, 2020, 4:10 PM

#

Why are you doing Linear Regression if YOU don't fully understand it?

hollow sentinel Oct 20, 2020, 4:11 PM

#

i thought i would pick it up as I go

earnest forge Oct 20, 2020, 4:11 PM

#

either you did something wrong when grouping or values initially were 'bad'

lapis sequoia Oct 20, 2020, 4:11 PM

#

df = df.groupby(by='Tool').sum()

grave frost Oct 20, 2020, 4:13 PM

#

@hollow sentinel Linear regression is just a simple method to find the relationship between data points using (as the name implies) a linear function as the basis of the relationship. If the data does not exhibit a linear relationship, then it is not a useful method, and you are better off using other approaches like polynomial regression, etc.

hollow sentinel Oct 20, 2020, 4:14 PM

#

https://online.stat.psu.edu/stat462/node/91/#:~:text=Simple linear regression is a,%2C explanatory%2C or independent variable.

#

"Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables"

#

oh

#

allows you to study the relationship between two variables

grave frost Oct 20, 2020, 4:15 PM

#

Yeah, that def seems a bit off because that implies that the data points are like coordinates (with x and y value) and you find the linear relationship between those 2 variables, but that's not actually the most fundamental one

earnest forge Oct 20, 2020, 4:15 PM

#

df = df.groupby(by='Tool').sum()
@lapis sequoia
1st: you'd better not overwrite the initial dataframe; try saving the result to a variable named something like df_grouped
2nd: make sure the data in Tool has a convenient data type

hollow sentinel Oct 20, 2020, 4:16 PM

#

thank you @grave frost

earnest forge Oct 20, 2020, 4:16 PM

#

is it int or object?

hollow sentinel Oct 20, 2020, 4:16 PM

#

um they were probably trying to simplify it so the layman (me) can understand it

grave frost Oct 20, 2020, 4:16 PM

#

you can find linear relationship in 3D space too

hollow sentinel Oct 20, 2020, 4:16 PM

#

what

lapis sequoia Oct 20, 2020, 4:16 PM

#

@earnest forge I was using new dataframes initially yeah, but then I'd have a bunch of cells, which was making me confused

#

What do you mean convenient data type?

#

No clue what that is tbh.

#

The tool column is a product serial # if that helps

earnest forge Oct 20, 2020, 4:17 PM

#

convenient for the df.groupby method to work with the values

#

What type is the Tool column?

lapis sequoia Oct 20, 2020, 4:17 PM

#

I have no idea

#

How do I check that?

#

I'm new to this lol

grave frost Oct 20, 2020, 4:18 PM

#

@hollow sentinel Imagine the line connecting you to your ceiling fan - that is a line in 3D space. I doubt much data exhibit linear relationship in 3 dimensions, but that doesn't mean it's impossible to do
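To connect this to multiple linear regression: with two predictors the fitted model is a plane in 3D, y = b0 + b1*x1 + b2*x2, rather than a line. A sketch with NumPy least squares on synthetic data; the true coefficients 3, 2 and -1.5 are made up for the demo.

```python
import numpy as np

# Synthetic data: y depends linearly on two features x1 and x2, plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.5, size=100)

# Add a column of ones for the intercept and solve the least-squares problem
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, b1, b2 = coef
print(f"y = {intercept:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")
```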

earnest forge Oct 20, 2020, 4:18 PM

#

you can check it out using df.dtypes

lapis sequoia Oct 20, 2020, 4:18 PM

#

@hollow sentinel Two distinct but related variables is how I look at it

#

@earnest forge Let me try that thanks

main pelican Oct 20, 2020, 4:18 PM

#

Why is my SMTP code not working? (The emails are fake here, but I used real ones for the errors shown below):

  File "scratch.py", line 10, in <module>
    server.login(sender_email, password)
  File "C:\Users\dhruv_\AppData\Local\Programs\Python\Python38-32\lib\smtplib.py", line 734, in login
    raise last_exception
  File "C:\Users\dhruv_\AppData\Local\Programs\Python\Python38-32\lib\smtplib.py", line 723, in login
    (code, resp) = self.auth(
  File "C:\Users\dhruv_\AppData\Local\Programs\Python\Python38-32\lib\smtplib.py", line 646, in auth
    raise SMTPAuthenticationError(code, resp)
smtplib.SMTPAuthenticationError: (534, b'5.7.9 Application-specific password required. Learn more at\n5.7.9 https://support.google.com/mail/?p=InvalidSecondFactor x23sm2799418pfc.47 - gsmtp')

📎 redditerPost.PNG
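The 534 `5.7.9 Application-specific password required` response (note the `InvalidSecondFactor` link) means Gmail expects an App Password for this account instead of the normal account password. A minimal standard-library sketch, assuming Gmail's SSL endpoint `smtp.gmail.com` on port 465; `build_message` and `send_via_gmail` are hypothetical helper names, and actually sending needs real credentials and network access.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email (hypothetical helper)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_gmail(msg, sender, app_password):
    """Send msg through Gmail over SSL, logging in with an App Password
    generated in the Google Account settings (not the normal password)."""
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL("smtp.gmail.com", 465, context=context) as server:
        server.login(sender, app_password)
        server.send_message(msg)


msg = build_message("me@example.com", "you@example.com", "Test", "Hello from Python")
# send_via_gmail(msg, "me@example.com", "your-app-password")  # needs real credentials
```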


lapis sequoia Oct 20, 2020, 4:20 PM

#

@earnest forge Not working

#

Tool column is the only one that isn't showing up

#

Is that because I grouped it already?

#

My dummy variables all say float64

hollow sentinel Oct 20, 2020, 4:20 PM

#

@main pelican i like your profile pic of Sokka

earnest forge Oct 20, 2020, 4:20 PM

#

yes. it may be

main pelican Oct 20, 2020, 4:21 PM

#

@hollow sentinel lol

lapis sequoia Oct 20, 2020, 4:21 PM

#

let me retry it

#

I'll rename the groupby df

earnest forge Oct 20, 2020, 4:21 PM

#

reload the data and group it one more time

#

I'll rename the groupby df
@lapis sequoia good

lapis sequoia Oct 20, 2020, 4:21 PM

#

Says it is an object

#

and my dummy variables are now uint8

#

My dependent variable still says float64

grave frost Oct 20, 2020, 4:22 PM

#

Is it just me or does anybody else have problems ssh'ing into a google VM instance?

earnest forge Oct 20, 2020, 4:23 PM

#

Says it is an object
@lapis sequoia can you show df.head() of the data?

lapis sequoia Oct 20, 2020, 4:25 PM

#

on the same code?

hollow sentinel Oct 20, 2020, 4:27 PM

#

I didn't know pandas had its own data visualization too

#

that's pretty cool

lapis sequoia Oct 20, 2020, 4:28 PM

#

told you bro

earnest forge Oct 20, 2020, 4:28 PM

#

on the same dataframe, yes

hollow sentinel Oct 20, 2020, 4:28 PM

#

yeah but it looks gross

lapis sequoia Oct 20, 2020, 4:28 PM

#

I did it

#

Do you mean you want me to show it here?

earnest forge Oct 20, 2020, 4:29 PM

#

yes

lapis sequoia Oct 20, 2020, 4:29 PM

#

uhh

#

Sure give me a second

#

Need to block some info out

#

📎 image0.jpg

#

Everything beginning with LH is a dummy variable

#

I gave them that prefix cause I was trying to fix the aggregation problem

#

@earnest forge

dusky furnace Oct 20, 2020, 4:44 PM

#

Hey

#

Does anyone know how to plot a pandas window when you run a file.py in a linux terminal?

earnest forge Oct 20, 2020, 4:48 PM

#

Oh

#

I got what's wrong

#

You count all values in tool and it exceeds space in the memory

lapis sequoia Oct 20, 2020, 4:51 PM

#

What do you mean?

#

The group is by the tool column but the sum is for the quantity

#

if that makes sense

earnest forge Oct 20, 2020, 4:54 PM

#

Oh

#

You need to convert the values in the other columns to the int data type. pandas sees them as object type; that's the reason you get these unexpected results

lapis sequoia Oct 20, 2020, 5:05 PM

#

So all the dummy variables?

earnest forge Oct 20, 2020, 5:06 PM

#

yes

lapis sequoia Oct 20, 2020, 5:26 PM

#

@earnest forge How can I change the dtype

#

The dummy variables are showing as float64

earnest forge Oct 20, 2020, 5:30 PM

#

Check df.dtypes one more time and look at the columns that should be int (if the dtype is float, then leave it as is; there's no need to change it).
After you decide which columns' data types to change, use the following:
df = df.astype({'column_name':'int32'})

lapis sequoia Oct 20, 2020, 5:30 PM

#

I have 100+ dummy variable columns

#

is there a way to not set them manually one by one lol

#

Why does the code have to sum the dummy variables?
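To avoid typing 100+ column names for `astype`, the dict can be built from a list of columns selected by their shared prefix. A sketch on a hypothetical table whose dummy columns start with `LH` (as in this thread); the other column names are made up.

```python
import pandas as pd

# Hypothetical data: float dummies prefixed "LH" plus a float target
df = pd.DataFrame({
    "Tool": ["A", "B"],
    "Hours": [5.0, 10.0],
    "LH_CustX": [1.0, 0.0],
    "LH_Rework": [0.0, 1.0],
})

# Pick the dummy columns by prefix instead of listing them by hand
dummy_cols = [col for col in df.columns if col.startswith("LH")]

# One astype call with a dict built from that list; other columns are untouched
df = df.astype({col: "int32" for col in dummy_cols})
print(df.dtypes)
```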

earnest forge Oct 20, 2020, 5:38 PM

#

oh, then make them all numeric, except particular columns:

cols = df.columns
df[cols[your_slice]] = df[cols[your_slice]].apply(pd.to_numeric, errors='coerce')

In df[cols[your_slice]] you must specify all columns except those you do not want to convert to a numeric type.

For instance, if you want to keep the first and fourth columns as they are, you may apply the following slices:
df[cols[1:3]] = df[cols[1:3]].apply(pd.to_numeric, errors='coerce')
df[cols[4:]] = df[cols[4:]].apply(pd.to_numeric, errors='coerce')

#

The sum method can't add up values that aren't stored as numeric types, so it treats them as strings. In the end, it gives you a weirdly computed result

strong oasis Oct 20, 2020, 5:44 PM

#

Do most people going into data science have a masters or can you get in if you have a bachelors (physics)? Been studying machine learning lately so I figured I might apply for some jobs.

lapis sequoia Oct 20, 2020, 5:53 PM

#

@earnest forge Let me try that out thanks

#

@earnest forge That will fix it ?

earnest forge Oct 20, 2020, 5:54 PM

#

it must fix it

lapis sequoia Oct 20, 2020, 6:10 PM

#

Okay let me try it

#

@earnest forge Wait, I'm confused sorry. Should the dummy variables be numeric

#

tool (serial number I want to group by), quantity (dependent variable, what I want to sum), dummy variables (independent variables)

#

are my columns

radiant forge Oct 20, 2020, 7:24 PM

#

heya! I'm trying to extend pyannote to build a fun NLP app for podcasters. anyone familiar with that lib?

#

trying to make sure it can do the thing i think it can do

#

idea being: running the same set of data through a bunch of different ML algos, and having all the results for the same data tagged. once it gets manually okayed by the EU, the data is marked for each ML set to use for more training data.

#

so, a "master" pyannote annotation with: segments to cut up the source audio, speaker, transcription, sentiment, etc. then once they're all corrected, they can then be cut up by the segment defs to feed the various ML algos.

earnest forge Oct 20, 2020, 7:39 PM

#

tool (serial number I want to group by), quantity (dependent variable, what I want to sum), dummy variables (independent variables)
@lapis sequoia when you group by Tool and aggregate with sum, your grouped dataframe holds the sum of the values in the other columns for each Tool value.

shell berry Oct 20, 2020, 8:14 PM

#

Is there a function to turn a list of labels [cat, dog, dog, rat, rat, rat, cat] into a list of class labels, such as [0, 1, 1, 2, 2, 2, 0]?

#

I can't seem to find it on google so apologies if this is trivial

#

in scikit-learn*

tidal bough Oct 20, 2020, 8:14 PM

#

hmm, this can be coded manually, but I think scikit-learn has one

shell berry Oct 20, 2020, 8:14 PM

#

Yeah I can use a dict and do it manually but I want to learn the built ins to scikit learn

tidal bough Oct 20, 2020, 8:15 PM

#

@shell berry
there
https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder
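A short sketch of `LabelEncoder` on the list from the question. Note that it numbers the classes in sorted order (cat, dog, rat), which happens to match the desired output here; `pandas.factorize` numbers them by first appearance instead.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["cat", "dog", "dog", "rat", "rat", "rat", "cat"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)     # classes are sorted, then numbered 0..n-1
print(codes)
print(encoder.classes_)                   # the class name for each code
print(encoder.inverse_transform([2, 0]))  # map codes back to labels
```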

shell berry Oct 20, 2020, 8:15 PM

#

Thanks 🙂

#

Follow up question: Why is my naive bayes model working in scikit learn with just plain text labels for the classes?

#

"dog", "cat", etc. Don't they have to be in an int/vector representation?

tidal bough Oct 20, 2020, 8:17 PM

#

You might be using a high-level enough feature that it handles all the encoding and prediction for you.
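More specifically, scikit-learn classifiers accept string class labels in `y` as well as integers: they store the distinct labels in `classes_`, work with integer indices internally, and map predictions back to the original labels. A small sketch with `MultinomialNB` on made-up count features.

```python
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count features and plain string class labels
X = [[3, 0], [4, 1], [0, 3], [1, 4]]
y = ["cat", "cat", "dog", "dog"]

clf = MultinomialNB()
clf.fit(X, y)                 # y can stay as strings
print(clf.classes_)           # the labels seen during fit
print(clf.predict([[5, 0]]))  # predictions come back as the original strings
```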

shell berry Oct 20, 2020, 8:17 PM

#

Oh weird, thanks

lapis sequoia Oct 20, 2020, 8:24 PM

#

@earnest forge Ya so it would aggregate all of them regardless

#

So which columns am I changing to numeric

#

All of them?

#

@earnest forge Code you gave me isnt working bro

#

syntax error

hollow sentinel Oct 20, 2020, 9:15 PM

#

F

lapis sequoia Oct 20, 2020, 9:42 PM

#

dummy variables are your qualitative variables turned into numbers. If you have 1 qualitative variable with multiple categories (for stock markets, Industry could be a dummy variable), let's say industry can be either financial, tech, or industrials. You will have 3-1 dummy variables.
@glad mulch I know what a dummy variable is lmao I have that all set up, I was just asking about the data type for it inside pandas
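The "3-1 dummy variables" rule from the quoted message corresponds to `drop_first=True` in `pandas.get_dummies`: the dropped category becomes the baseline that the regression intercept absorbs. A sketch with the three industries from that example.

```python
import pandas as pd

df = pd.DataFrame({"Industry": ["financial", "tech", "industrials", "tech"]})

# drop_first=True keeps k - 1 = 2 columns for the 3 categories; the dropped
# category ("financial", first in sorted order) becomes the baseline
dummies = pd.get_dummies(df["Industry"], prefix="Industry", drop_first=True, dtype=int)
print(dummies)
```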

velvet thorn Oct 20, 2020, 11:09 PM

#

@earnest forge Wait, I'm confused sorry. Should the dummy variables be numeric
@lapis sequoia what other data type would you use?

lapis sequoia Oct 20, 2020, 11:10 PM

#

They’re float rn

#

His code didn’t work anyway

velvet thorn Oct 20, 2020, 11:11 PM

#

They’re float rn
@lapis sequoia generally some integer type is appropriate, but honestly it doesn't really matter

lapis sequoia Oct 21, 2020, 12:21 AM

#

Yeah I feel you

#

@velvet thorn Any insight as to why his code didn't work?

velvet thorn Oct 21, 2020, 12:22 AM

#

@velvet thorn Any insight as to why his code didn't work?
@lapis sequoia honestly I only skimmed the discussion

#

but if you still need help maybe you can summarise the problem?

lapis sequoia Oct 21, 2020, 12:27 AM

#

@velvet thorn I have like 90+ dummy variables in my data that I created using pandas. I grouped my data using a product serial # to sum the quantity of hours. Doing this also aggregated the dummy variables, so they show numbers like 500, 294, 348, etc. instead of just the 0 or 1 like they are supposed to

#

So I am trying to find a way to either fix this or to find a way to just make all the ones > 0 turn to 1

velvet thorn Oct 21, 2020, 12:28 AM

#

@velvet thorn I have like 90+ dummy variables in my data that I created using pandas. I grouped my data using a product serial # to sum the quantity of hours. Doing this also aggregated the dummy variables, so they show numbers like 500, 294, 348, etc. instead of just the 0 or 1 like they are supposed to
@lapis sequoia how can you identify the dummy variable columns?

lapis sequoia Oct 21, 2020, 12:28 AM

#

What do you mean? They all have names

#

And I put a prefix to all of them

#

Cause I was trying to see if I could apply the >0 make it a 1 thing but couldn't figure it out

velvet thorn Oct 21, 2020, 12:29 AM

#

What do you mean? They all have names
@lapis sequoia like what's the filter you can apply on them

#

okay, I think you said they all start with LH, right?

lapis sequoia Oct 21, 2020, 12:29 AM

#

yeah that's the prefix I gave them

#

someone said I should give them a common prefix to be able to edit them all at once or something

velvet thorn Oct 21, 2020, 12:30 AM

#

dummy_cols = [col for col in df.columns if col.startswith('LH')]

df[dummy_cols] = df[dummy_cols].clip(0, 1)

#

should work
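A sketch of the fix above on a tiny made-up table (the column names `Tool`, `Hours`, `LH_CustX`, `LH_Rework` are hypothetical stand-ins), plus an alternative that avoids summing the dummies in the first place: aggregate the hours with `sum` and each dummy with `max`. For 0/1 dummies, `max` gives the same "true for at least one source row" meaning as summing and then clipping.

```python
import pandas as pd

# Hypothetical data: one row per labor charge, dummies prefixed with "LH"
df = pd.DataFrame({
    "Tool": ["A", "A", "B", "B", "B"],
    "Hours": [2.0, 3.0, 1.0, 4.0, 5.0],
    "LH_CustX": [1, 1, 0, 0, 0],
    "LH_Rework": [0, 1, 0, 0, 0],
})
dummy_cols = [col for col in df.columns if col.startswith("LH")]

# Approach from the chat: sum everything, then clip the dummies back to 0/1
summed = df.groupby("Tool").sum()
summed[dummy_cols] = summed[dummy_cols].clip(0, 1)

# Alternative: sum the hours, but take the max of each dummy per group
agg_spec = {"Hours": "sum", **{col: "max" for col in dummy_cols}}
grouped = df.groupby("Tool").agg(agg_spec)
print(grouped)
```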

lapis sequoia Oct 21, 2020, 12:31 AM

#

I tried something similar to that and it didn't work, let me try yours I probably had my code fucked up lol

#

@velvet thorn That worked. You're a lifesaver

#

Thank you so much

velvet thorn Oct 21, 2020, 12:32 AM

#

yw!

lapis sequoia Oct 21, 2020, 12:33 AM

#

Doing it that way by the replacing doesn't mess up any regression results right?

#

I'd assume not but just making sure ofc

velvet thorn Oct 21, 2020, 12:33 AM

#

what do you mean?

lapis sequoia Oct 21, 2020, 12:33 AM

#

Like it will still see it as a regular dummy variable

velvet thorn Oct 21, 2020, 12:33 AM

#

like does it affect the validity of a regression fit on this?

#

well

lapis sequoia Oct 21, 2020, 12:33 AM

#

yeah

velvet thorn Oct 21, 2020, 12:33 AM

#

long story short, yes

lapis sequoia Oct 21, 2020, 12:33 AM

#

oof

#

how so?

velvet thorn Oct 21, 2020, 12:33 AM

#

I mean, not in a bad way

#

in the sense that each dummy variable now represents "for this group of results (since you said they're aggregated, right), is <condition> true for at least one of the source rows"

#

when originally it meant "how many source rows was <condition> true for"

#

you get what I mean?

#

that's the effect of the clipping, right

lapis sequoia Oct 21, 2020, 12:35 AM

#

Kinda

#

My adj r-squared got 0.873

velvet thorn Oct 21, 2020, 12:35 AM

#

so if that makes sense for your problem

#

that's fine

lapis sequoia Oct 21, 2020, 12:35 AM

#

Which is good

#

but

velvet thorn Oct 21, 2020, 12:35 AM

#

adj = adjusted?

lapis sequoia Oct 21, 2020, 12:35 AM

#

yeah

#

since there are multiple independent variables gotta use adj.

#

the jarque-bera is 25541 lol

#

hmm

velvet thorn Oct 21, 2020, 12:36 AM

#

the jarque-bera is 25541 lol
@lapis sequoia why does this matter?

lapis sequoia Oct 21, 2020, 12:37 AM

#

its a goodness of fit test to a normal distribution

#

so shouldn't it be close to 0

velvet thorn Oct 21, 2020, 12:37 AM

#

why do you think so?

lapis sequoia Oct 21, 2020, 12:37 AM

#

isnt that how the test works?

velvet thorn Oct 21, 2020, 12:37 AM

#

I mean, yes

#

but what are you running the test on

#

and why do you think the data must be normally distributed?

lapis sequoia Oct 21, 2020, 12:38 AM

#

when I did the regression without fixing the dummy variable 0 or 1s I got an adj r square of 0.996 and jarque bera of like 1350

#

I don't think it must be, just seems high

velvet thorn Oct 21, 2020, 12:38 AM

#

presumably

lapis sequoia Oct 21, 2020, 12:38 AM

#

@velvet thorn my dependent variable is labor hours. independent variables are product, product config, customer, and build type

#

trying to model our labor hours and DL costs

velvet thorn Oct 21, 2020, 12:38 AM

#

nothing wrong with non-normality though
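For context on the size of that number: the Jarque-Bera statistic in a statsmodels regression summary is computed on the residuals as JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is the skewness and K the kurtosis. Because it is multiplied by n, the same amount of non-normality gives a larger JB on a larger dataset. A sketch that computes it by hand and checks it against `scipy.stats.jarque_bera`, using exponential samples as deliberately skewed stand-in residuals.

```python
import numpy as np
from scipy import stats


def jarque_bera(residuals):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4) with biased moment estimators."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    s = stats.skew(r)      # sample skewness S
    k = stats.kurtosis(r)  # excess kurtosis, i.e. K - 3 (Fisher definition)
    return n / 6.0 * (s**2 + k**2 / 4.0)


rng = np.random.default_rng(0)
skewed = rng.exponential(size=100)
small = jarque_bera(skewed)
large = jarque_bera(rng.exponential(size=10_000))  # same shape, 100x more points
print(small, large)
```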

lapis sequoia Oct 21, 2020, 12:38 AM

#

to help the ops guys get a better target

#

oh also

#

going back to your source row thing

#

the reason I grouped them is because the data is set up in the way that each row is labor hours being charged to a certain assembly process

#

but I wanted the total hours for the corresponding product they all went to

#

unless I misunderstood you

velvet thorn Oct 21, 2020, 12:42 AM

#

sure, that makes sense

#

what do the dummy variables represent then?

lapis sequoia Oct 21, 2020, 12:42 AM

#

my independent variables which are all non-numeric values

#

so the product, configuration, customer, and build type

#

certain products and customers for example drive the labor hours more

#

Non-numeric**

#

@velvet thorn Do you know if it's possible to see which column is driving it more than the others

#

Or are you not familiar with statsmodels

glossy osprey Oct 21, 2020, 2:49 AM

#

Hi everyone. Nice to meet you!

#

So, I'm doing a project at my college and I need data about social inequality. Is there any data about it?

weary heart Oct 21, 2020, 2:53 AM

#

hi, I'm new to machine learning and I'm curious: how do you know if a model is overfitting or underfitting? Is it through the test and train results? And if so, how do you get the test and train results? F1 score? Or something else? Thanks

austere swift Oct 21, 2020, 2:56 AM

#

@weary heart yeah, mainly it's through the training and testing accuracy: if the training accuracy is high but the testing accuracy is low, that's overfitting, and if the training accuracy is low and the testing accuracy is low too, it's underfitting
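A sketch of that comparison with scikit-learn: score the same model on the training split and on a held-out test split and look at the gap. The dataset and the unrestricted `DecisionTreeClassifier` are arbitrary choices for illustration; the same comparison works with F1 via `sklearn.metrics.f1_score`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# An unrestricted tree can memorize the training set, a typical overfitting setup
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)  # accuracy on data the model has seen
test_acc = model.score(X_test, y_test)     # accuracy on held-out data
print(f"train={train_acc:.3f} test={test_acc:.3f} gap={train_acc - test_acc:.3f}")
```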

narrow flume Oct 21, 2020, 2:57 AM

#

Hey guys
is there a way to have [(1, 1, 1, 1, 1) (1, 0, 0, 0, 1) (1, 0, 0, 0, 1) (1, 0, 0, 0, 1) (1, 1, 1, 1, 1)] in one line?
here's my code
a = np.ones((5,1), dtype=[('a', 'i4'), ('b', 'i4'),('c', 'i4'),('d', 'i4'),('e', 'i4')])
print(a)

#

it's numpy array

#

in python btw
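Assuming the goal is a plain 5x5 integer array with ones on the border and zeros inside (rather than a structured dtype with fields `a` to `e`), one one-line option is `np.pad`:

```python
import numpy as np

# A 3x3 block of zeros padded with a 1-wide border of ones gives a 5x5 "frame"
a = np.pad(np.zeros((3, 3), dtype=int), pad_width=1, mode="constant", constant_values=1)
print(a)
```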

weary heart Oct 21, 2020, 2:58 AM

#

ah okay, so if I use SMOTE and I get this result, how do you know if it's overfitting, normal, or underfitting?

              precision    recall  f1-score   support

           0       0.97      0.70      0.81     66699
           1       0.28      0.83      0.42      9523

    accuracy                           0.71     76222
   macro avg       0.62      0.76      0.62     76222
weighted avg       0.88      0.71      0.76     76222

hollow sentinel Oct 21, 2020, 3:14 AM

#

https://www.kaggle.com/arslanali4343/real-estate-dataset


#

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
realEstate = pd.read_csv("realEstate.csv")
realEstate.head(5)
sns.pairplot("realEstate")

#

I'm getting an error saying TypeError: 'data' must be pandas DataFrame object, not: <class 'str'>

austere swift Oct 21, 2020, 3:17 AM

#

don't put quotes around it

hollow sentinel Oct 21, 2020, 3:18 AM

#

i am a noob sorry

#

lmao

austere swift Oct 21, 2020, 3:18 AM

#

lol

hollow sentinel Oct 21, 2020, 3:18 AM

#

it still doesn't show anything

#

I took out the quotes

heady hatch Oct 21, 2020, 3:20 AM

#

Hey people, learning how to work with images today.

How do I convert images in numpy to binary string?

from a quick search I was able to get to

array.tobytes() #or array.tostring()

but

np.fromstring(array.tobytes())

doesn't give me the original numpy array back.

Any suggestions? or thoughts on what I'm doing wrong?
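`tobytes()` keeps only the raw buffer and stores neither the dtype nor the shape. `np.fromstring`, which is deprecated for binary input in favour of `np.frombuffer`, defaults to float64 and returns a flat array. You have to supply both when decoding. A minimal sketch with a small synthetic uint8 array:

```python
import numpy as np

img = np.arange(24, dtype=np.uint8).reshape(2, 4, 3)  # tiny stand-in for an image
raw = img.tobytes()  # raw buffer only: dtype and shape are not stored

# Without a dtype the bytes are read as float64 (the default) and the shape
# is lost: 24 bytes become 3 meaningless floats.
wrong = np.frombuffer(raw)
print(wrong.dtype, wrong.shape)  # float64 (3,)

# Pass the original dtype, then reshape to the original shape.
restored = np.frombuffer(raw, dtype=img.dtype).reshape(img.shape)
print(np.array_equal(img, restored))  # True
```

`np.frombuffer` returns a read-only view of the bytes, so call `.copy()` if you need to modify the result.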

hollow sentinel Oct 21, 2020, 3:20 AM

#

idk what's going on

#

📎 unknown.png

#

have you guys ever seen that next to a Jupyter notebook?

#

does that mean it's loading?

heady hatch Oct 21, 2020, 3:22 AM

#

It means it's running.

hollow sentinel Oct 21, 2020, 3:22 AM

#

oh it's probably bc it's a gigantic dataset

#

should i pick a smaller one i don't wanna deal with this

#

I can't find good numbery datasets everything I find on Kaggle is words

heady hatch Oct 21, 2020, 3:25 AM

#

When you say numbery datasets do you mean tabular (in table formats)?

hollow sentinel Oct 21, 2020, 3:26 AM

#

no i mean under each column it's a number not a word

#

like if the columns were price, age, weight, gender, height

heady hatch Oct 21, 2020, 3:26 AM

#

so like in a table format?

hollow sentinel Oct 21, 2020, 3:27 AM

#

if that's what it's called yes

heady hatch Oct 21, 2020, 3:27 AM

#

where each row is a record and the columns are features?

hollow sentinel Oct 21, 2020, 3:27 AM

#

yes

#

also the reason why it was taking so long to load was bc the dataset shape was 511, 14

heady hatch Oct 21, 2020, 3:27 AM

#

I think there are quite a few of those on Kaggle. Here's one famous one.

#

https://www.kaggle.com/c/rossmann-store-sales/data

hollow sentinel Oct 21, 2020, 3:28 AM

#

is there any way to pick a smaller dataset on Kaggle?

heady hatch Oct 21, 2020, 3:28 AM

#

You can set a filter.

#

Or you can just grab a subsample of the dataset.

#

Where you only grab a certain number of rows.

#

Let's say you have 511 rows, you only grab 100 of those.
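Taking a subsample in pandas is one line. In this minimal sketch, a synthetic DataFrame with made-up column names stands in for the real CSV. `head(n)` keeps the first rows, and `sample(n=..., random_state=...)` draws a reproducible random subset. `pd.read_csv(path, nrows=100)` avoids loading the rest of the file at all.

```python
import numpy as np
import pandas as pd

# Stand-in for the 511 x 14 dataset; in practice this would come from pd.read_csv(...).
df = pd.DataFrame(np.random.default_rng(0).normal(size=(511, 14)),
                  columns=[f"col{i}" for i in range(14)])

first_100 = df.head(100)                        # first 100 rows, in file order
random_100 = df.sample(n=100, random_state=42)  # 100 random rows, reproducible

print(first_100.shape, random_100.shape)  # (100, 14) (100, 14)
```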

hollow sentinel Oct 21, 2020, 3:29 AM

#

idk how to do that yet lmao

#

I'll see what I can do

heady hatch Oct 21, 2020, 3:31 AM

#

hahaha

#

Good luck.

hollow sentinel Oct 21, 2020, 3:32 AM

#

oh man I need luck to select a couple rows?

#

F

#

that or I might move to another dataset

#

anyways it's also 11 at night so like bedtime

#

gn guys

sacred timber Oct 21, 2020, 3:50 AM

#

Noob here... for the life of me I cannot figure out how to get the output plot from sklearn's metrics.plot_confusion_matrix into a tkinter GUI. Can anyone point me to a reference?

earnest blade Oct 21, 2020, 4:34 AM

#

has anyone published a research paper in ML here

#

I need some help

#

ny1?

merry ridge Oct 21, 2020, 5:06 AM

#

I work for a discrete mathematics journal, but my area of research is in math finance and it only has a minor intersection with ML

bitter harbor Oct 21, 2020, 5:07 AM

#

got a link for that?

lapis sequoia Oct 21, 2020, 5:11 AM

#

Anyone know any good code/video examples of rainbow deep q learning?

#

There seems to be quite little information on this for whatever reason, it seems like it should be pretty popular

bitter harbor Oct 21, 2020, 5:13 AM

#

is it not rainbow deep reinforcement learning?

lapis sequoia Oct 21, 2020, 5:14 AM

#

Yeah I might've got the name wrong

bitter harbor Oct 21, 2020, 5:14 AM

#

that'd do it lol

lapis sequoia Oct 21, 2020, 5:15 AM

#

It's using deep q networks tho right

#

I did use the correct name when searching for stuff

📎 unknown.png

#

But most of the results I found are just people reading the paper

#

These two are the only examples I've seen

📎 unknown.png

heady hatch Oct 21, 2020, 5:20 AM

#

Hey guys I have a quick question. I'm using TFRecord to store my numpy array in bytes, then reading the tfrecord.

But after I parse the tfrecord and convert it back to numpy array, the values aren't the same.

ie.

It's an image.
-> Image in numpy array
-> Convert numpy to bytes
-> tfrecord features
-> read tfrecord
-> Turn into tf dataset
-> convert bytes back to numpy

When I took a look at the image, the values have negatives in them. Any advice?

I've also made sure to convert it back from bytes using the original dtype.

hasty grail Oct 21, 2020, 5:25 AM

#

Can you provide your code?

bitter harbor Oct 21, 2020, 5:27 AM

#

@lapis sequoia I could be wrong, I'm a bit rusty on this, but the difference between deep Q-learning and deep reinforcement learning is that Q-learning doesn't use the transition probability distribution (or the reward function) associated with the MDP

#

Q-learning is considered a model-free reinforcement learning algorithm
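The model-free point shows up in the tabular Q-learning update: Q(s, a) ← Q(s, a) + α·[r + γ·max_a' Q(s', a') − Q(s, a)]. It only needs a sampled transition (s, a, r, s'), never the MDP's transition probabilities. DQN, and Rainbow, which builds on DQN, replace the table with a neural network but keep this kind of target. A minimal sketch; the function name and the α and γ defaults are illustrative:

```python
from collections import defaultdict


def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99):
    """Apply one tabular Q-learning step and return the new Q(state, action).

    Only the observed transition is used: no transition probabilities and no
    reward model, which is what makes Q-learning model-free.
    """
    best_next = max(Q[(next_state, a)] for a in actions)  # greedy value of next state
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


Q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]
print(q_learning_update(Q, "s0", "right", 1.0, "s1", actions))  # 0.1
```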

heady hatch Oct 21, 2020, 5:28 AM

#

```python
import numpy as np
import tensorflow as tf
from typing import Dict
from tqdm import tqdm

# _int64_feature and _bytes_feature are helper functions defined elsewhere
# (not shown here) that wrap values in tf.train.Feature protos.


def convert_to_example(image: Dict) -> tf.train.Example:
    """Convert Image to TFRecord ready format"""
    feature = {
        'height': _int64_feature(32),
        'width': _int64_feature(32),
        'channels': _int64_feature(3),
        'label': _int64_feature(image['label']),
        'filename': _bytes_feature(image['filename']),
        'image_raw': _bytes_feature(image['data'].tobytes()),
    }

    return tf.train.Example(features=tf.train.Features(feature=feature))


train_record_file = 'train.tfrecords'

with tf.io.TFRecordWriter(train_record_file) as writer:
    for image in tqdm(train_data):
        tf_example = convert_to_example(image)
        writer.write(tf_example.SerializeToString())

raw_train_dataset = tf.data.TFRecordDataset('train.tfrecords')
```

I broke it apart into two parts, one to write into TFRecord, one to read from it.

```python
def parse_image_function(ex_proto):
    # Schema matching the features written above
    image_feature_desc = {
        'height': tf.io.FixedLenFeature([], tf.int64),
        'width': tf.io.FixedLenFeature([], tf.int64),
        'channels': tf.io.FixedLenFeature([], tf.int64),
        'label': tf.io.FixedLenFeature([], tf.int64),
        'filename': tf.io.FixedLenFeature([], tf.string),
        'image_raw': tf.io.FixedLenFeature([], tf.string),
    }
    example = tf.io.parse_single_example(ex_proto, image_feature_desc)

    img_raw = example['image_raw']

    return img_raw


for img in raw_train_dataset.map(parse_image_function).take(1):
    # Bug (found later in the thread): the image was written as uint8,
    # so decoding it as int8 turns values above 127 into negatives.
    print(tf.io.decode_raw(img, np.int8))
```

#

Please let me know if you need more information.

#

I am trolling. @hasty grail

Thank you so much for your help.

I accidentally converted it into np.int8 instead of np.uint8.
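Here is why int8 produced negative values. Decoding must use the exact dtype the bytes were written with. Pixel values from 128 to 255 stored as uint8 become negative (value − 256) when the same bytes are reinterpreted as signed int8. The same holds for `tf.io.decode_raw`, whose `out_type` should be `tf.uint8` here. A NumPy-only sketch:

```python
import numpy as np

pixels = np.array([0, 127, 128, 200, 255], dtype=np.uint8)
raw = pixels.tobytes()

as_int8 = np.frombuffer(raw, dtype=np.int8)    # wrong: bytes >= 128 wrap to negatives
as_uint8 = np.frombuffer(raw, dtype=np.uint8)  # right: same dtype as when written

print(as_int8.tolist())   # [0, 127, -128, -56, -1]
print(as_uint8.tolist())  # [0, 127, 128, 200, 255]
```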

mild topaz Oct 21, 2020, 5:41 AM

#

If you don't mind me asking, how did you get an Image as base64 string?
@grave frost I am getting a base64 string which I have to decode to make an image from it

hasty grail Oct 21, 2020, 5:45 AM

#

I accidentally converted it into np.int8 instead of np.uint8.
Problem solved I guess xD

mild topaz Oct 21, 2020, 5:46 AM

#

i am not able to resize the image to the size i want

#

my code here https://paste.pythondiscord.com/aboyomupij.py

#

@hasty grail sorry to ping u, can u plz look into it?

#

i am saving an image, but not at the size i want

📎 unknown.png

#

@ripe crane hello

hasty grail Oct 21, 2020, 5:52 AM

#

Have you done what I asked yesterday?

mild topaz Oct 21, 2020, 5:53 AM

#

about what bro ?

hasty grail Oct 21, 2020, 5:54 AM

#

I think that you should take some time to brush up on Python basics

mild topaz Oct 21, 2020, 5:55 AM

#

sure bro, but right now i need to finish this bro, i want to submit this project

#

once i resize my image, i know how to deal with the rest of the code

#

i need a small help in resizing an image

#

i am decoding a base64 string and creating an image from it

hasty grail Oct 21, 2020, 5:57 AM

#

Do what I have asked first, it will save you a lot of time with the remaining part

mild topaz Oct 21, 2020, 5:57 AM

#

but not at the desired size

hasty grail Oct 21, 2020, 5:57 AM

#

Especially the part about functions

mild topaz Oct 21, 2020, 5:58 AM

#

i agree with u bro, but plz try to understand, i need to finish this asap

#

at least can u look into why the image is not getting resized?

hasty grail Oct 21, 2020, 6:00 AM

#

which line are you at right now?

mild topaz Oct 21, 2020, 6:01 AM

#

line 174 @hasty grail

#

```
im <PIL.Image.Image image mode=RGB size=200x99 at 0x24D0005C248>
done
wrong here1
```

hasty grail Oct 21, 2020, 6:02 AM

#

from your understanding of Python, what would cause the statement at line 174 to be executed?

mild topaz Oct 21, 2020, 6:03 AM

#

wait, i need to comment out that part of the code from 160 to 174

#

because i am reopening the image file again

#

correct @hasty grail ?

hasty grail Oct 21, 2020, 6:04 AM

#

yeah you don't need that code

mild topaz Oct 21, 2020, 6:06 AM

#

now see, it has created an image but not at the size i want @hasty grail

hasty grail Oct 21, 2020, 6:10 AM

#

can you display the problem?

mild topaz Oct 21, 2020, 6:12 AM

#

see @hasty grail

📎 unknown.png

hasty grail Oct 21, 2020, 6:13 AM

#

where is the code for saving the image?

#

which variable are you passing into the save function?

#

check carefully

mild topaz Oct 21, 2020, 6:20 AM

#

is this
```python
with open("imageToSave.jpg", "wb") as test_img:
    test_img.write(image_data)
try:
    test_img = image.load_img("imageToSave.jpg", target_size=(200, 99))
except OSError:
    logger.debug({"Status": "failed",
                  "message": "provide valid base64 string"})
    return {"Status": "failed",
            "message": "provide valid base64 string"}
```
@hasty grail

hasty grail Oct 21, 2020, 6:21 AM

#

Can you identify what data you are writing to the file?

mild topaz Oct 21, 2020, 6:21 AM

#

image_data i guess ? @hasty grail

hasty grail Oct 21, 2020, 6:23 AM

#

ok so what is image_data?

#

is it the resized image?

mild topaz Oct 21, 2020, 6:25 AM

#

no

is it the resized image?
@hasty grail

hasty grail Oct 21, 2020, 6:25 AM

#

well there's your problem

#

fix it so that you're actually passing in the resized image

mild topaz Oct 21, 2020, 6:26 AM

#

fix it so that you're actually passing in the resized image
@hasty grail what do you mean, bro?

hasty grail Oct 21, 2020, 6:27 AM

#

instead of image_data (the original image) you need to give the function the data that corresponds to the resized image

mild topaz Oct 21, 2020, 6:28 AM

#

u mean (self, image_data ) this

#

def resize_im ? @hasty grail

hasty grail Oct 21, 2020, 6:31 AM

#

you need to write the resized image to the file

#

not the original image

#

you are currently writing image_data (the original image) to the file, of course the image size is unchanged

mild topaz Oct 21, 2020, 6:32 AM

#

you need to write the resized image to the file
@hasty grail what do you mean by that, bro?

hasty grail Oct 21, 2020, 6:33 AM

#

it means what I said, I don't know how to simplify that

mild topaz Oct 21, 2020, 6:34 AM

#

ok, can u show in code what you mean, so i can get a clear idea? @hasty grail

hasty grail Oct 21, 2020, 6:36 AM

#

```python
with open("output.jpg", "wb") as f:
    # Don't do this
    f.write(incorrect_image)

    # Do this
    f.write(correct_image)
```

mild topaz Oct 21, 2020, 6:37 AM

#

so in my case
```python
with open("output.jpg", "wb") as f:
    # Don't do this
    f.write(image_data)

    # Do this
    f.write(im)
```
is this correct? @hasty grail

hasty grail Oct 21, 2020, 6:37 AM

#

yes

mild topaz Oct 21, 2020, 6:40 AM

#

well, it has created image but

📎 unknown.png

#

@hasty grail

hasty grail Oct 21, 2020, 6:41 AM

#

what data type is im?

mild topaz Oct 21, 2020, 6:46 AM

#

<class 'PIL.Image.Image'> @hasty grail

hasty grail Oct 21, 2020, 6:48 AM

#

shouldn't you be using im.save instead of file.write then?

#

that's what I gathered from the documentation of PIL

mild topaz Oct 21, 2020, 6:48 AM

#

on which line bro ?

shouldn't you be using im.save instead of file.write then?
@hasty grail

hasty grail Oct 21, 2020, 6:51 AM

#

on the line where you write to the file

mild topaz Oct 21, 2020, 6:51 AM

#

with open("imageToSave.jpg", "wb") as test_img: test_img.write(im) @hasty grail here u mean ?

hasty grail Oct 21, 2020, 6:52 AM

#

yes

mild topaz Oct 21, 2020, 6:53 AM

#

with open("imageToSave.jpg", "wb") as test_img: im.save(im) @hasty grail this way ?

hasty grail Oct 21, 2020, 6:53 AM

#

read the documentation of PIL to see how to use Image.save

mild topaz Oct 21, 2020, 6:54 AM

#

```python
with open("imageToSave.jpg", "wb") as test_img:
    test_img.write("im.jpg")
```
@hasty grail

hasty grail Oct 21, 2020, 6:55 AM

#

no

#

with open("imageToSave.jpg", "wb") as test_img:

What does this line do?

mild topaz Oct 21, 2020, 6:56 AM

#

```python
with open("imageToSave.jpg", "wb") as test_img:
    im.save("im.jpg")
```
@hasty grail

hasty grail Oct 21, 2020, 6:56 AM

#

you didn't answer my question

mild topaz Oct 21, 2020, 6:56 AM

#

opens an image file

#

@hasty grail

hasty grail Oct 21, 2020, 6:57 AM

#

why do you have to open an image file when you are saving to a different file?

#

look at the example they have given

#

do you need to use open at all?

mild topaz Oct 21, 2020, 6:58 AM

#

no

#

@hasty grail

hasty grail Oct 21, 2020, 6:58 AM

#

then delete it

mild topaz Oct 21, 2020, 6:59 AM

#

open ?

then delete it
@hasty grail

hasty grail Oct 21, 2020, 6:59 AM

#

yes

mild topaz Oct 21, 2020, 7:04 AM

#

see i am using this code
```python
with ("im.jpg", "wb") as test_img:
    im.save("im.jpg")
```
@hasty grail

#

image not created

hasty grail Oct 21, 2020, 7:04 AM

#

do you know what the with statement even does?

#

(if you don't please review your Python basics)

mild topaz Oct 21, 2020, 7:15 AM

#

sure bro , but at this moment i am really messed up with different things also

#

@hasty grail can u plz help in this ?

#

as soon as the resized image is created, i know how to deal with it

hasty grail Oct 21, 2020, 7:17 AM

#

no, you have to understand what it means, it's so basic

mild topaz Oct 21, 2020, 7:18 AM

#

yes i can understand bro

#

but right now i am messed up with different things bro ? plz

#

just help me to solve this issue @hasty grail

#

lets finish this issue now only

#

are u there bro? @hasty grail

hasty grail Oct 21, 2020, 7:25 AM

#

Sorry, I won't finish your code for you, you have to demonstrate your understanding first

mild topaz Oct 21, 2020, 7:26 AM

#

i know bro, can u help in this issue @hasty grail ?

#

so i can go further and try to solve issues by myself @hasty grail

hasty grail Oct 21, 2020, 7:27 AM

#

If you can answer me what the line with ("im.jpg", "wb") as test_img: is supposed to do, then sure

mild topaz Oct 21, 2020, 7:28 AM

#

with makes code compact @hasty grail

hasty grail Oct 21, 2020, 7:30 AM

#

what about the line as a whole though?

mild topaz Oct 21, 2020, 7:31 AM

#

it takes img.jpg and in write mode @hasty grail

hasty grail Oct 21, 2020, 7:32 AM

#

is it needed in this case?

mild topaz Oct 21, 2020, 7:32 AM

#

no, i guess @hasty grail

#

am i correct? @hasty grail

hasty grail Oct 21, 2020, 7:38 AM

#

mhm

mild topaz Oct 21, 2020, 7:39 AM

#

@hasty grail hello

hasty grail Oct 21, 2020, 7:40 AM

#

yes

mild topaz Oct 21, 2020, 7:41 AM

#

can u plz help in this ?

#

@hasty grail lets finish this bro?

hasty grail Oct 21, 2020, 7:43 AM

#

if it's not needed, what do you do with that line?

#

(I mean you can use your own common sense)

mild topaz Oct 21, 2020, 7:44 AM

#

so how i can make changes here then ,? should i remove it? @hasty grail

hasty grail Oct 21, 2020, 7:44 AM

#

(I mean you can use your own common sense)

mild topaz Oct 21, 2020, 7:45 AM

#

can u be more specific here bro plz @hasty grail

hasty grail Oct 21, 2020, 7:46 AM

#

You can come up with the answer by yourself

#

This is such a simple question

mild topaz Oct 21, 2020, 7:46 AM

#

so i need to remove this line of code , correct? @hasty grail

hasty grail Oct 21, 2020, 7:46 AM

#

you can judge that for yourself

#

I don't think I have to answer that question since it's really obvious

slow adder Oct 21, 2020, 7:47 AM

#

when the textbook says 'open terminal', does it mean cmd or python shell?

mild topaz Oct 21, 2020, 7:49 AM

#

@hasty grail 😞 bro plz , i got confused here , lets finish this ?

hasty grail Oct 21, 2020, 7:50 AM

#

when the textbook says 'open terminal', does it mean cmd or python shell?
Usually that can be inferred from the context

#

bro plz , i got confused here , lets finish this ?
Just delete that line

#

You shouldn't have to ask for help for every single thing you do

mild topaz Oct 21, 2020, 7:54 AM

#

see, i have deleted that line but the image is not created here? @hasty grail https://paste.pythondiscord.com/ewoyetojuh.py

hasty grail Oct 21, 2020, 7:56 AM

#

you deleted im.save as well

#

of course it's not saving the file

mild topaz Oct 21, 2020, 7:56 AM

#

ok then how it should be ? @hasty grail

hasty grail Oct 21, 2020, 7:56 AM

#

undelete im.save

mild topaz Oct 21, 2020, 7:57 AM

#

ok then ? @hasty grail

hasty grail Oct 21, 2020, 7:57 AM

#

test the code?

mild topaz Oct 21, 2020, 8:03 AM

#

yes worked

#

it has created image to desired size
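This sketch puts the thread's fix together. Decode the base64 string, resize the image in memory, and write the resized image with `Image.save` instead of writing the original bytes through `open()`. It uses Pillow. The function name, the output path and the (200, 99) size are illustrative assumptions. The original code used Keras' `image.load_img` rather than `Image.open`.

```python
import base64
import io
import os
import tempfile

from PIL import Image


def save_resized_from_base64(b64_string, out_path, size=(200, 99)):
    """Decode a base64 image, resize it in memory and save the resized copy.

    `size` is (width, height), the order PIL's resize() expects; (200, 99) is
    only the example size from this thread.
    """
    image_bytes = base64.b64decode(b64_string)
    im = Image.open(io.BytesIO(image_bytes))  # no temporary file needed
    im = im.convert("RGB")                    # JPEG cannot store an alpha channel
    im = im.resize(size)
    im.save(out_path)  # Image.save opens and writes the file itself, no open()/with
    return im


# Usage: build a base64 string from a 640x480 test image, then round-trip it.
buf = io.BytesIO()
Image.new("RGB", (640, 480), "white").save(buf, format="PNG")
b64 = base64.b64encode(buf.getvalue()).decode("ascii")

out = os.path.join(tempfile.gettempdir(), "imageToSave.jpg")
resized = save_resized_from_base64(b64, out)
print(resized.size)  # (200, 99)
```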

#

@hasty grail

hasty grail Oct 21, 2020, 8:08 AM

#

ok good

#

is that all?

mild topaz Oct 21, 2020, 8:10 AM

#

no wait see this
```python
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\demo3\findDocumentType1.py", line 126, in post
    self.resize_im(image_data)
  File "E:\demo3\findDocumentType1.py", line 202, in resize_im
    predictions = model.predict(samples_to_predict)
NameError: name 'model' is not defined
```
@hasty grail

hasty grail Oct 21, 2020, 8:13 AM

#

The error literally tells you what is wrong, please tell me you can fix this by yourself

clear sail Oct 21, 2020, 8:13 AM

#

Hi

mild topaz Oct 21, 2020, 8:13 AM

#

line 119 i have defined it @hasty grail

hasty grail Oct 21, 2020, 8:14 AM

#

you only defined it in the post function

#

not resize_im
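The NameError is about scope. A name assigned inside `post()` is a local variable of `post()` and does not exist inside `resize_im()`. Loading the model again inside `resize_im` also works, but it reloads the file on every call. Loading it once and passing it in avoids that. A minimal sketch with hypothetical names, where `DummyModel` stands in for the Keras model:

```python
class DummyModel:
    """Hypothetical placeholder for a loaded model with a predict() method."""

    def predict(self, x):
        return sum(x)


def broken_post(data):
    model = DummyModel()          # local to broken_post() only
    return broken_resize(data)


def broken_resize(data):
    return model.predict(data)    # NameError: 'model' is not defined here


# Fix: load once (e.g. at module level) and hand it to the function that needs it.
MODEL = DummyModel()


def post(data):
    return resize_im(data, MODEL)


def resize_im(data, model):
    return model.predict(data)


try:
    broken_post([1, 2, 3])
except NameError as err:
    print("broken:", err)         # broken: name 'model' is not defined
print("fixed:", post([1, 2, 3]))  # fixed: 6
```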

mild topaz Oct 21, 2020, 8:16 AM

#

ok so i have changed it to this
```python
def resize_im(self, image_data):
    print("test_img1")
    model = load_model(pathlib.Path('E:/', 'demo3', 'united_kingdom_50.h5'))
```
@hasty grail

#

now that error is gone

#

```
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\demo3\findDocumentType1.py", line 126, in post
    self.resize_im(image_data)
  File "E:\demo3\findDocumentType1.py", line 202, in resize_im
    predictions = model.predict(samples_to_predict)
  File "C:\Users\Admin\anaconda3\lib\site-packages\keras\engine\training.py", line 1441, in predict
    x, _, _ = self._standardize_user_data(x)
  File "C:\Users\Admin\anaconda3\lib\site-packages\keras\engine\training.py", line 579, in _standardize_user_data
    exception_prefix='input')
  File "C:\Users\Admin\anaconda3\lib\site-packages\keras\engine\training_utils.py", line 145, in standardize_input_data
    str(data_shape))
ValueError: Error when checking input: expected conv2d_1_input to have shape (99, 200, 1) but got array with shape (200, 99, 3)```

#

@hasty grail

twilit wind Oct 21, 2020, 8:20 AM

#

Have you checked the shape before the input?

mild topaz Oct 21, 2020, 8:20 AM

#

which shape @twilit wind

twilit wind Oct 21, 2020, 8:20 AM

#

the shape of your input

#

like before input to the conv layer you need to flatten it or do some resizing
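Judging by the error message, the model expects (height, width, channels) = (99, 200, 1). That is a single-channel (grayscale) image 200 pixels wide and 99 tall, so the input needs a grayscale conversion and a resize rather than flattening. Keras' `load_img` takes `target_size` as (height, width), so `target_size=(200, 99)` yields 200 rows × 99 columns, which matches the (200, 99, 3) in the error. A Pillow/NumPy sketch; the function name and the scaling to [0, 1] are assumptions, so use whatever preprocessing the model was trained with:

```python
import numpy as np
from PIL import Image


def to_model_input(im, height=99, width=200):
    """Turn a PIL image into a batch of one grayscale image, shape (1, H, W, 1).

    height/width default to the (99, 200, 1) input shape from the error message.
    """
    im = im.convert("L")                            # 1 channel (grayscale) instead of 3
    im = im.resize((width, height))                 # PIL's resize takes (width, height)
    arr = np.asarray(im, dtype=np.float32) / 255.0  # NumPy shape is (height, width)
    arr = arr[..., np.newaxis]                      # add channel axis -> (99, 200, 1)
    return arr[np.newaxis, ...]                     # add batch axis   -> (1, 99, 200, 1)


batch = to_model_input(Image.new("RGB", (99, 200)))  # a 99-wide, 200-tall RGB image
print(batch.shape)  # (1, 99, 200, 1)
```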

uneven wind Oct 21, 2020, 8:24 AM

#

Thanks @lapis sequoia , It turns out that I was not accessing the data properly. It works fine now 🙂

mild topaz Oct 21, 2020, 9:02 AM

#

```
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\demo3\findDocumentType1.py", line 126, in post
    self.resize_im(image_data)
  File "E:\demo3\findDocumentType1.py", line 219, in resize_im
    img = preprocessing(img)
  File "E:\demo3\findDocumentType1.py", line 215, in preprocessing
    img = grayscale(img)
  File "E:\demo3\findDocumentType1.py", line 207, in grayscale
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.error: OpenCV(4.2.0) c:\projects\opencv-python\opencv\modules\imgproc\src\color.simd_helpers.hpp:94: error: (-2:Unspecified error) in function '__cdecl cv::impl::`anonymous-namespace'::CvtHelper<struct cv::impl::`anonymous namespace'::Set<3,4,-1>,struct cv::impl::A0xe227985e::Set<1,-1,-1>,struct cv::impl::A0xe227985e::Set<0,2,5>,2>::CvtHelper(const class cv::_InputArray &,const class cv::_OutputArray &,int)'
> Unsupported depth of input image:
>     'VDepth::contains(depth)'
> where
>     'depth' is 6 (CV_64F)
``` @twilit wind @hasty grail

twilit wind Oct 21, 2020, 9:03 AM

#

can you share the code

mild topaz Oct 21, 2020, 9:04 AM

#

my code here https://paste.pythondiscord.com/ohebolimuj.py @twilit wind

verbal sand Oct 21, 2020, 9:08 AM

#

I have some text that contains fractions written as words: "one-third", "one-half", ...
How do I convert these into their numeric fractions: 1/3, 1/2, etc.?

velvet thorn Oct 21, 2020, 9:08 AM

#

I've some text that contain fraction in text - "one-third", "one-half"......
How do I convert these into their relevant fractions? 1/3, 1/2 etc...
@verbal sand how many unique fractions do you have

verbal sand Oct 21, 2020, 9:09 AM

#

It can be any fraction. It's contained in a sentence like "Take one-half of the tablet daily".
Doctor's prescription data.

mild topaz Oct 21, 2020, 9:10 AM

#

@twilit wind do u get my code?

twilit wind Oct 21, 2020, 9:10 AM

#

yes I am having a look

#

by the way what is the code about

mild topaz Oct 21, 2020, 9:11 AM

#

it is for prediction @twilit wind

velvet thorn Oct 21, 2020, 9:13 AM

#

It can be any.... this is contained in a text sentence like - "Take one-half of the tablet daily".
Doctor's prescription data.
@verbal sand create a mapping of fractions to numbers

#

and apply it

mild topaz Oct 21, 2020, 9:18 AM

#

my updated code here https://paste.pythondiscord.com/ficexumiha.py @twilit wind

#

plz check

twilit wind Oct 21, 2020, 9:18 AM

#

do you have any other code on app.py @mild topaz

mild topaz Oct 21, 2020, 9:18 AM

#

```
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\demo3\findDocumentType1.py", line 126, in post
    self.resize_im(image_data)
  File "E:\demo3\findDocumentType1.py", line 219, in resize_im
    im = preprocessing(im)
  File "E:\demo3\findDocumentType1.py", line 215, in preprocessing
    im = grayscale(im)
  File "E:\demo3\findDocumentType1.py", line 207, in grayscale
    im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
TypeError: Expected Ptr<cv::UMat> for argument 'src'
```
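Both OpenCV errors concern what reaches `cv2.cvtColor`. The first says the array has depth CV_64F (float64). The supported set listed in that message, `Set<0,2,5>`, is 8-bit unsigned, 16-bit unsigned and 32-bit float. The second error typically appears when the argument is not a NumPy array at all, for example a PIL Image. Below is a sketch of a converter. The name `to_cv2_array` is hypothetical, and cv2 itself is not called, so the sketch runs without OpenCV installed. If the float values are on a 0 to 255 scale, `img.astype(np.uint8)` is the other option.

```python
import numpy as np
from PIL import Image


def to_cv2_array(img):
    """Make `img` acceptable to cv2.cvtColor(..., cv2.COLOR_BGR2GRAY).

    - PIL Images become NumPy arrays (cv2 only accepts arrays), with the
      channels reordered from PIL's RGB to OpenCV's BGR.
    - float64 arrays become float32, one of the depths cvtColor supports.
    """
    if isinstance(img, Image.Image):
        img = np.asarray(img.convert("RGB"))[..., ::-1]  # RGB -> BGR (a reversed view)
    if img.dtype == np.float64:
        img = img.astype(np.float32)
    return np.ascontiguousarray(img)  # cv2 may reject negatively-strided views


from_pil = to_cv2_array(Image.new("RGB", (200, 99), (255, 0, 0)))  # pure red
from_float = to_cv2_array(np.zeros((99, 200, 3)))                   # float64 zeros
print(from_pil.shape, from_pil.dtype, from_pil[0, 0].tolist())  # (99, 200, 3) uint8 [0, 0, 255]
print(from_float.dtype)  # float32
```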

#

my updated code here https://paste.pythondiscord.com/ficexumiha.py @twilit wind
plz check this is my updated code for testing a model

twilit wind Oct 21, 2020, 9:20 AM

#

It says a type error

mild topaz Oct 21, 2020, 9:22 AM

#

It says a type error
@twilit wind yes

#

how i can fix this? @twilit wind

twilit wind Oct 21, 2020, 9:23 AM

#

The code is huge it will take time

#

some typo there I think

mild topaz Oct 21, 2020, 9:27 AM

#

some typo there I think
@twilit wind what do you mean, bro?

twilit wind Oct 21, 2020, 9:27 AM

#

Bro the code seems ok

#

Do you have any other file where you are running the code @mild topaz

#

any file for Flask?

mild topaz Oct 21, 2020, 9:29 AM

#

i have a model file @twilit wind

twilit wind Oct 21, 2020, 9:29 AM

#

and any app.py file

#

?

#

@mild topaz

mild topaz Oct 21, 2020, 9:30 AM

#

no @twilit wind

twilit wind Oct 21, 2020, 9:31 AM

#

You are predicting the country name by its image I guess @mild topaz

mild topaz Oct 21, 2020, 9:31 AM

#

yes @twilit wind

twilit wind Oct 21, 2020, 9:34 AM

#

I will let you know if I find any, Now I am not able to find any @mild topaz

#

srry

mild topaz Oct 21, 2020, 9:37 AM

#

ok np

verbal sand Oct 21, 2020, 10:00 AM

#

@velvet thorn isn't there any library?

For the string "one-third", I thought of mapping one to 1 and third to 3, so it becomes 1-3. How do I make the hyphen (-) in "1-3" be treated as a division and not as in "one-three days"?

velvet thorn Oct 21, 2020, 10:01 AM

#

@velvet thorn isn't there any library?

For the string "one-third" - I though of mapping one with 1 and third with 3 and it becomes 1-3. How do I give it the meaning that the hyphen (-) in "1-3" should be considered as a division and not like "one-three days"?
@verbal sand beats me

#

what do you mean?

#

like do you want to convert it into a number?

#

I suggest a regex

verbal sand Oct 21, 2020, 10:05 AM

#

I mean that since the text is doctors' prescriptions, there can be texts like "one-third of tablet" and "one-three days".
The first one means 1/3 of the tablet, while the other means 1 to 3 days.
If I map one to 1, third to 3 and three to 3, then after replacing, the texts become "1-3 of tablet" and "1-3 days". Now, how do I tell whether the 3 in each sentence divides the 1 or is just the upper end of a range (1 to 3 days)?

#

@velvet thorn

what do you mean?
@velvet thorn

#

yes, I do want to convert it into a number. Later the amount of medicine can be converted into a fractional value. I want that to know how large a dose a patient takes.
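This follows the mapping-plus-regex suggestion. The two cases can be told apart by the word after the hyphen. Denominators are ordinal words (half, third, quarter), while ranges use cardinal words (three), so mapping them through two different tables removes the ambiguity. A minimal sketch with Python's `re` and `fractions` modules. The word lists are deliberately small and would need extending for real prescription text (e.g. "one and a half", digits).

```python
import re
from fractions import Fraction

CARDINALS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
             "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
# Denominators are written as ordinals ("third"), not cardinals ("three"):
# this is what separates "one-third" (1/3) from "one-three" (1 to 3).
ORDINALS = {"half": 2, "halves": 2, "third": 3, "thirds": 3,
            "quarter": 4, "quarters": 4, "fourth": 4, "fourths": 4,
            "fifth": 5, "fifths": 5, "eighth": 8, "eighths": 8}

num = "|".join(CARDINALS)
den = "|".join(ORDINALS)
FRACTION_RE = re.compile(rf"\b({num})-({den})\b", re.IGNORECASE)
RANGE_RE = re.compile(rf"\b({num})-({num})\b", re.IGNORECASE)


def normalize(text: str) -> str:
    """Rewrite "one-third" as "1/3" and "one-three" as "1 to 3"."""
    text = FRACTION_RE.sub(
        lambda m: str(Fraction(CARDINALS[m[1].lower()], ORDINALS[m[2].lower()])), text)
    return RANGE_RE.sub(
        lambda m: f"{CARDINALS[m[1].lower()]} to {CARDINALS[m[2].lower()]}", text)


def doses(text: str) -> list:
    """Return the fractional amounts in the text as Fraction objects."""
    return [Fraction(CARDINALS[a.lower()], ORDINALS[b.lower()])
            for a, b in FRACTION_RE.findall(text)]


print(normalize("Take one-half of the tablet daily"))  # Take 1/2 of the tablet daily
print(normalize("one-three days"))                     # 1 to 3 days
```

Because `doses` returns `Fraction` objects, amounts can be added exactly, e.g. `sum(doses(text))`.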

mild topaz Oct 21, 2020, 11:20 AM

#

updated code https://paste.pythondiscord.com/olisidijub.py and my error
```python
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\demo3\findDocumentType1.py", line 126, in post
    self.resize_im(image_data)
  File "E:\demo3\findDocumentType1.py", line 231, in resize_im
    self.getclassname(classNo)
NameError: name 'classNo' is not defined
```

raw mortar Oct 21, 2020, 12:05 PM

#

@mild topaz the variable classNo is not defined
the error message is pretty clear

mild topaz Oct 21, 2020, 12:07 PM

#

https://paste.pythondiscord.com/kasadoxiyo.py line 233 is not printing

lapis sequoia Oct 21, 2020, 12:37 PM

#

is an adjusted R² of 0.873 good enough?
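Whether 0.873 is "good enough" depends on the domain and on what the model is for; there is no universal threshold. For reference, adjusted R² penalizes R² for the number of predictors p given n observations. This matters with many dummy variables: adj R² = 1 − (1 − R²)(n − 1)/(n − p − 1). statsmodels reports it as `rsquared_adj` on a fitted OLS result. A small sketch with made-up numbers:

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same R^2, more predictors -> lower adjusted R^2.
print(round(adjusted_r2(0.90, n=100, p=10), 3))  # 0.889
print(round(adjusted_r2(0.90, n=100, p=40), 3))
```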

vague bear Oct 21, 2020, 12:43 PM

#

Hi guys. Can anyone explain what a cost function is for a non-math person like me please? The lesson I'm watching introduces this equation and says "for simplicity, half of this value is considered the cost function through the derivative process"

#

📎 unknown.png

#

I have absolutely no clue what that means

lapis sequoia Oct 21, 2020, 1:04 PM

#

What are the next steps after I finish my regression in statsmodels?

halcyon vale Oct 21, 2020, 1:29 PM

#

If you guys are interested in Natural Language Processing. Here,

#

https://github.com/ThinamXx/66Days__NaturalLanguageProcessing/blob/master/README.md

raw mortar Oct 21, 2020, 2:05 PM

#

@vague bear that's squared error
https://en.m.wikipedia.org/wiki/Mean_squared_error
It's used to measure how good the model is:
the lower the value, the better the model (relatively)

#

@mild topaz are you trying to make a rest API which takes a base64 image as input and do some prediction with it?

mild topaz Oct 21, 2020, 2:07 PM

#

yes

lapis sequoia Oct 21, 2020, 2:08 PM

#

what should I do after I output my OLS model?

raw mortar Oct 21, 2020, 2:09 PM

#

@mild topaz I'm not very familiar with Flask-RESTful, but where are you going wrong?

mild topaz Oct 21, 2020, 2:13 PM

#

@raw mortar give me some time, as soon as i get free i will ping u

vague bear Oct 21, 2020, 2:21 PM

#

@raw mortar thanks, i'll do some readings now. I searched cost function on YT and didn't find anything

#

how can you identify it as mean squared error? The equation looks different in the wiki

#

this one?

📎 unknown.png

raw mortar Oct 21, 2020, 2:33 PM

#

@vague bear it's not MSE, it's squared error; in the wiki look for the loss function part

#

MSE is when you divide by the number of data points; with 1/2 it's just (half) the squared error
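Here is why the 1/2 "simplifies the derivative process". Differentiating a square brings down a factor of 2, and the 1/2 cancels it. The derivation below uses the notation of the linked Stack Exchange question: h_θ(x) is the model's prediction and m is the number of data points. This version also averages over m; the lesson's own formula and symbols may differ. For a parameter θ_j:

```latex
\begin{aligned}
J(\theta) &= \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 \\
\frac{\partial J}{\partial \theta_j}
  &= \frac{1}{2m}\sum_{i=1}^{m} 2\left(h_\theta(x^{(i)}) - y^{(i)}\right)\frac{\partial h_\theta(x^{(i)})}{\partial \theta_j} \\
  &= \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)\frac{\partial h_\theta(x^{(i)})}{\partial \theta_j}
\end{aligned}
```

Multiplying a cost by a positive constant does not move its minimum, so the halved cost is minimized by the same θ.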

lapis sequoia Oct 21, 2020, 2:38 PM

#

How can I get my regression equation in statsmodels

raw mortar Oct 21, 2020, 2:40 PM

#

@vague bear https://datascience.stackexchange.com/questions/10188/why-do-cost-functions-use-the-square-error
here, this has a better explanation

vague bear Oct 21, 2020, 2:41 PM

#

ohh

#

a loss function or cost function is a function that .....

#

I see, thanks

raw mortar Oct 21, 2020, 2:43 PM

#

ya, the terms are used interchangeably, but some prefer to say it's a loss when it's a single data point and a cost when all the points are considered

#

and there is no consistency in the expressions 🤦

vague bear Oct 21, 2020, 2:45 PM

#

I looked up a tutorial in my language and it uses pi to represent probability. Is that normal?

raw mortar Oct 21, 2020, 2:46 PM

#

nope, haven't seen that one

vague bear Oct 21, 2020, 2:47 PM

#

ic

lapis sequoia Oct 21, 2020, 2:50 PM

#

How can I get my regression equation in statsmodels?

raw mortar Oct 21, 2020, 2:52 PM

#

@lapis sequoia I don't quite understand your question; someone else might be able to answer it

lapis sequoia Oct 21, 2020, 2:56 PM

#

@raw mortar I ran OLS regression and got the output table. Is there a way to look at the equation for it?

raw mortar Oct 21, 2020, 2:59 PM

#

@lapis sequoia the equations would remain the same, I think; probably just google the implementation docs to get the exact equation

lapis sequoia Oct 21, 2020, 3:00 PM

#

@raw mortar What do you mean?

#

Like, is there a way to export it to Excel and then plug in a value for each variable to get the output of my dependent variable?

raw mortar Oct 21, 2020, 3:01 PM

#

oh, you want to make predictions from the model?

lapis sequoia Oct 21, 2020, 3:02 PM

#

my dependent var is labor hrs. The independent variables (using dummies) are product, customer, build type, and product config. I want to be able to plug in certain customers and products, for example, to get my labor hours output

#

yes exactly

raw mortar Oct 21, 2020, 3:04 PM

#

let me look it up, I haven't used OLS in statsmodels before

lapis sequoia Oct 21, 2020, 3:08 PM

#

thank you

raw mortar Oct 21, 2020, 3:11 PM

#

@lapis sequoia https://realpython.com/linear-regression-in-python/
initialize, fit and predict

import statsmodels.api as sm
x = sm.add_constant(x)  # sm.OLS does not add an intercept column by default
model = sm.OLS(y, x)
results = model.fit()
results.predict(x)

vague bear Oct 21, 2020, 3:11 PM

#

is the confusion matrix used a lot?

lapis sequoia Oct 21, 2020, 3:11 PM

#

@raw mortar yeah, I have the results. So I just use results.predict(x) to predict the y?

raw mortar Oct 21, 2020, 3:12 PM

#

@vague bear yep, usually in classification problems

hollow sentinel Oct 21, 2020, 3:12 PM

#

📎 unknown.png

#

guys what does it mean when your data set does that

#

it's all faded

vague bear Oct 21, 2020, 3:12 PM

#

I see. The correct ones are churn (1, 1) and churn (0, 0), right?
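Yes, assuming the matrix has actual labels on one axis and predicted labels on the other, the correct predictions sit on the diagonal. A minimal scikit-learn sketch with hypothetical churn labels; in `sklearn.metrics.confusion_matrix`, rows are the actual classes and columns the predicted ones:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical churn labels (1 = churned, 0 = stayed) and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows = actual class, columns = predicted class, in label order [0, 1]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
# cm[0, 0]: actual 0, predicted 0 (true negatives)
# cm[1, 1]: actual 1, predicted 1 (true positives)
# the off-diagonal cells are the mistakes

# Accuracy is the share of all predictions that land on the diagonal
accuracy = cm.trace() / cm.sum()
```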

hollow sentinel Oct 21, 2020, 3:13 PM

#

i'm looking at this https://www.kaggle.com/sudalairajkumar/chennai-water-management

#

is there something wrong with the data?

#

is it the color scheme?

lapis sequoia Oct 21, 2020, 3:15 PM

#

@raw mortar do you know if there is a way to export it into Excel and just make a dropdown to choose the x variables I want to include to predict the y?

raw mortar Oct 21, 2020, 3:18 PM

#

@lapis sequoia not sure about that one, though; it might be possible

hollow sentinel Oct 21, 2020, 3:25 PM

#

also what does a distplot show

#

distribution?

lapis sequoia Oct 21, 2020, 3:39 PM

#

@raw mortar I'll try to figure it out

lapis sequoia Oct 21, 2020, 4:00 PM

#

Anything else I should do in the meantime after my regression?

hollow sentinel Oct 21, 2020, 4:15 PM

#

📎 unknown.png

#

so does this mean it's a good or bad model?

#

i'm gonna go out on a limb and say it's bad

bitter harbor Oct 21, 2020, 4:44 PM

#

I've seen worse acc

lapis sequoia Oct 21, 2020, 4:50 PM

#

how can I check which variables are most important/drive the dependent variable the most?

#

Do I just use the standardized coefficients?

hollow sentinel Oct 21, 2020, 5:06 PM

#

hahah I like your username @bitter harbor

lapis sequoia Oct 21, 2020, 5:24 PM

#

Why are some of my independent variables showing twice in my summary table?

cedar sky Oct 21, 2020, 5:26 PM

#

Can anyone recommend the best way to get started with reinforcement learning? I'm comfortable with most deep learning concepts

heady hatch Oct 21, 2020, 5:43 PM

#

My only knowledge of reinforcement learning is the library Gym.

#

Maybe you can start with their docs.

https://gym.openai.com/docs/

Or maybe someone else has a better source of information.

lapis sequoia Oct 21, 2020, 5:46 PM

#

guys what does it mean when your data set does that
@hollow sentinel the faded vs. not-faded look just shows density: where many points overlap at the same spot, it gets darker. Check the alpha (transparency) value when you plot.
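To put a number on the "darker where points pile up" effect: with standard "over" alpha compositing (what you get from something like `plt.scatter(x, y, alpha=0.2)`), each marker lets `1 - alpha` of whatever is underneath show through, so n overlapping markers of the same color reach an opacity of `1 - (1 - alpha)**n`. A plain-Python sketch under that assumption:

```python
def stacked_opacity(alpha: float, n: int) -> float:
    """Opacity of n identical overlapping markers, each with transparency alpha.

    Assumes standard "over" compositing: every layer lets (1 - alpha)
    of what is underneath show through.
    """
    return 1.0 - (1.0 - alpha) ** n


# With alpha = 0.2, a lone point is faint, but a dense cluster is nearly solid
for n in (1, 2, 5, 20):
    print(n, round(stacked_opacity(0.2, n), 3))
```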

heady hatch Oct 21, 2020, 5:51 PM

#

Hey guys, question on validation data.

Epoch 1/10
1250/1250 [==============================] - 361s 289ms/step - loss: 5.0271 - accuracy: 0.3601 - val_loss: 1.1977 - val_accuracy: 0.5984
Epoch 2/10
1250/1250 [==============================] - 360s 288ms/step - loss: 1.3753 - accuracy: 0.5232 - val_loss: 0.7962 - val_accuracy: 0.7531
Epoch 3/10
1250/1250 [==============================] - 359s 287ms/step - loss: 1.0479 - accuracy: 0.6364 - val_loss: 0.5072 - val_accuracy: 0.8499
Epoch 4/10
1250/1250 [==============================] - 363s 291ms/step - loss: 0.7664 - accuracy: 0.7330 - val_loss: 0.2894 - val_accuracy: 0.9197
Epoch 5/10
1250/1250 [==============================] - 360s 288ms/step - loss: 0.5792 - accuracy: 0.7965 - val_loss: 0.1755 - val_accuracy: 0.9532
Epoch 6/10
1221/1250 [============================>.] - ETA: 7s - loss: 0.4574 - accuracy: 0.8416

The epochs haven't finished yet, but it feels like I'm heavily overfitting on the validation data.

lapis sequoia Oct 21, 2020, 6:03 PM

#

@raw mortar results.predict(x) doesn't work
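One likely cause (a guess, since the error message isn't shown) is that the rows passed to `predict` don't have exactly the same columns as the training design matrix, e.g. a missing constant or missing dummy columns. A sketch using the statsmodels formula API, which builds the dummies and the intercept itself and reapplies the same encoding to new rows. The column names and data are hypothetical and were built to be exactly additive (P2 adds 5 hours, C2 adds 2):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical historical data: labor hours by product and customer
df = pd.DataFrame({
    "product":   ["P1", "P1", "P2", "P2", "P1", "P2", "P1", "P2"],
    "customer":  ["C1", "C2", "C1", "C2", "C1", "C2", "C2", "C1"],
    "labor_hrs": [10.0, 12.0, 15.0, 17.0, 10.0, 17.0, 12.0, 15.0],
})

# C(...) marks a column as categorical; dummy columns and the
# intercept are created automatically from the formula
results = smf.ols("labor_hrs ~ C(product) + C(customer)", data=df).fit()

# New rows only need the raw columns; the same dummy encoding is reapplied
new = pd.DataFrame({"product": ["P2"], "customer": ["C1"]})
pred = results.predict(new)
```

Category levels that never appeared in the training data can't be encoded, so predicting for them raises an error.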

lone osprey Oct 21, 2020, 6:16 PM

#

Guys

#

In tutorials everywhere, I only see the basics of ML

#

I can't find anything that goes deeper

lapis sequoia Oct 21, 2020, 6:17 PM

#

Like what

lone osprey Oct 21, 2020, 6:17 PM

#

They don't go deep into things like pandas iloc functions, etc...

#

I need to learn that

#

Where can I find it?

#

I know pandas basics

#

I know numpy basics

#

I know ml basics

heady hatch Oct 21, 2020, 6:18 PM

#

What's your definition of ml basics?

lone osprey Oct 21, 2020, 6:18 PM

#

But I want to go deeper into numpy and pandas

#

Ml basics mean I know main algorithms like regression, Knn, etc..

#

In scikit learn

#

Any tutorials where I can learn in depth??

lapis sequoia Oct 21, 2020, 6:19 PM

#

scikit learn sucks balls

#

use statsmodels bro

heady hatch Oct 21, 2020, 6:19 PM

#

deep learning?

lone osprey Oct 21, 2020, 6:19 PM

#

First I need to learn numpy and pandas in depth

#

Those tutorials plz???

heady hatch Oct 21, 2020, 6:20 PM

#

To go deeper into NumPy and pandas, here are some exercises.

https://github.com/rougier/numpy-100/blob/master/100_Numpy_exercises.md

https://github.com/ajcr/100-pandas-puzzles

#

Look at the exercises down there.
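As a small taste of the pandas indexing those puzzles drill, here is `iloc` (position-based) next to `loc` (label-based) on a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [10, 20, 30], "b": [1, 2, 3]},
    index=["x", "y", "z"],
)

# .iloc selects by integer position; slices exclude the end, like Python lists
first_two_rows = df.iloc[0:2]        # rows "x" and "y"
cell_by_position = df.iloc[2, 1]     # third row, second column

# .loc selects by label; label slices INCLUDE the end label
a_x_to_y = df.loc["x":"y", "a"]      # column "a" for rows "x" through "y"
cell_by_label = df.loc["z", "b"]     # same cell as cell_by_position
```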

lapis sequoia Oct 21, 2020, 6:20 PM

#

@heady hatch you ever used statsmodels?

lone osprey Oct 21, 2020, 6:20 PM

#

Kk, thanks

heady hatch Oct 21, 2020, 6:20 PM

#

I have.

#

Why do you ask?

lapis sequoia Oct 21, 2020, 6:20 PM

#

Is there a way to export my linear regression model into Excel

#

And use it in Excel to predict my y var?

heady hatch Oct 21, 2020, 6:21 PM

#

I don't really use Excel, so I can't give any advice on that.

lone osprey Oct 21, 2020, 6:21 PM

#

Can I learn by doing exercises?

heady hatch Oct 21, 2020, 6:21 PM

#

but

#

https://stackoverflow.com/questions/16420407/python-statsmodels-ols-how-to-save-learned-model-to-file

lapis sequoia Oct 21, 2020, 6:21 PM

#

ty

heady hatch Oct 21, 2020, 6:21 PM

#

If you can somehow read those files into Excel.

lone osprey Oct 21, 2020, 6:21 PM

#

Can I learn by doing exercises??

heady hatch Oct 21, 2020, 6:22 PM

#

I can't answer that question for you. hahaha

#

You know yourself better.

lone osprey Oct 21, 2020, 6:22 PM

#

Kk

#

I prefer tutorials but

#

So I thought you'd know

heady hatch Oct 21, 2020, 6:23 PM

#

Then I would probably google advanced numpy or pandas tutorials.

lone osprey Oct 21, 2020, 6:23 PM

#

Google may give more resources

#

U r expert ppl

heady hatch Oct 21, 2020, 6:24 PM

#

Hey guys,

question on validation.

📎 unknown.png

#

📎 unknown.png

#

My validation accuracy is significantly higher, I was wondering what I could be doing wrong.

#

It's on the CIFAR-10 dataset.

#

I realized I forgot to check for dataset imbalance.

serene scaffold Oct 21, 2020, 6:26 PM

#

I'm trying to do polynomial regression and I have an array of what the coefficients should be. The number of terms varies. Once I know how far off the prediction was along the y axis, what adjustment am I supposed to make to the coefficients?

heady hatch Oct 21, 2020, 6:28 PM

#

I'm not super familiar with stats.

but how come you're manually adjusting the coefficients?

serene scaffold Oct 21, 2020, 6:29 PM

#

how else would I make sure that the curve is correct?

heady hatch Oct 21, 2020, 6:29 PM

#

Are you not able to base that off of your error?

serene scaffold Oct 21, 2020, 6:30 PM

#

I'm not sure what to do with the error once I have it.

heady hatch Oct 21, 2020, 6:30 PM

#

Oh are you manually calculating the regression?

serene scaffold Oct 21, 2020, 6:30 PM

#

yes, I have to show that I understand how it works.

#

the sample code is in Perl 😦

lapis sequoia Oct 21, 2020, 6:32 PM

#

Is a 0.873 adjusted r-squared good?

heady hatch Oct 21, 2020, 6:32 PM

#

Oh man. I'm currently looking up how to calculate regression to give you further thoughts.

serene scaffold Oct 21, 2020, 6:32 PM

#

I appreciate it

lapis sequoia Oct 21, 2020, 6:32 PM

#

@serene scaffold You using statsmodels?

serene scaffold Oct 21, 2020, 6:33 PM

#

@lapis sequoia no, I'm using numpy. I can't use anything that eliminates the need to show how the math works.

lapis sequoia Oct 21, 2020, 6:33 PM

#

Damn idk bro

serene scaffold Oct 21, 2020, 6:33 PM

#

that's okay. thank you.

lapis sequoia Oct 21, 2020, 6:33 PM

#

I'm new to this

#

@heady hatch Is there a way to have my equation show in statsmodels?

#

for my linear model

heady hatch Oct 21, 2020, 6:36 PM

#

What do you mean by equation?

lapis sequoia Oct 21, 2020, 6:36 PM

#

my linear formula

heady hatch Oct 21, 2020, 6:37 PM

#

@serene scaffold

I don't know if this is relevant.

http://polynomialregression.drque.net/math.html

Judging from how they calculate the coefficients, they're solving a system of equations for them. In your case, do you have the data points?

#

If so, you might be able to do the same.
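A minimal NumPy sketch of the system-of-equations approach, i.e. the least-squares normal equations (X^T X) c = X^T y, where the columns of X are x**0, x**1, ..., x**degree. This is the standard least-squares derivation, not necessarily line-for-line what the linked page or the Perl sample does. Note the x**0 column, which gives the constant term:

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations (X^T X) c = X^T y.

    Returns coefficients [c0, c1, ..., c_degree] for y ~ c0 + c1*x + ... .
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns x**0, x**1, ..., x**degree
    X = np.vander(x, degree + 1, increasing=True)
    # Solve the square linear system rather than inverting X^T X explicitly
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical noise-free points from y = 1 + 2x + 3x^2
xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
ys = 1 + 2 * xs + 3 * xs ** 2
coefs = fit_polynomial(xs, ys, degree=2)
```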

#

@lapis sequoia I'm still unsure of what you mean. Like you want the coefficients?

serene scaffold Oct 21, 2020, 6:38 PM

#

@heady hatch let me look at this. Thanks!

lapis sequoia Oct 21, 2020, 6:40 PM

#

yeah, the coefficients and constant, so I can input the x variables

#

to get the y variable

heady hatch Oct 21, 2020, 6:41 PM

#

I think there's a way to get the coefficients from the fitted model.

#

So after you fit it, you can read the coefficients off the results object.

#

https://stackoverflow.com/questions/47388258/how-to-extract-the-regression-coefficient-from-statsmodels-api

lapis sequoia Oct 21, 2020, 6:43 PM

#

Let me try that thank you

serene scaffold Oct 21, 2020, 6:44 PM

#

let me see if I understand correctly

#

basically given my training data, which is a list of (x, y) points, if I want to find the best-fit curve, I should start with a polynomial function y = a * (x ** 1) + b * (x ** 2) + ...

#

and if I have an array of [a, b, ...] then I'll have the answer

#

so the goal is to solve for [a, b, ...] for each instance of (x, y), multiply that array by the alpha, and add that to the weights?

#

does that sound right @heady hatch?

heady hatch Oct 21, 2020, 6:50 PM

#

That's from my understanding of how regressions work.

#

That's without taking regularization or anything into consideration.

serene scaffold Oct 21, 2020, 6:50 PM

#

I don't think I have to do that
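If the assignment calls for an iterative fit, the usual recipe is batch gradient descent: compute the prediction error for all points, turn it into a gradient for each coefficient, and move the coefficients against the gradient, i.e. subtract alpha times the gradient. A minimal NumPy sketch under those assumptions, with a made-up target curve and an included constant term:

```python
import numpy as np


def polyfit_gradient_descent(x, y, degree, alpha=0.1, epochs=5000):
    """Fit y ~ w0 + w1*x + ... + wd*x**d by batch gradient descent on MSE."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.vander(x, degree + 1, increasing=True)  # columns x**0 .. x**degree
    w = np.zeros(degree + 1)                       # start from all-zero weights
    n = len(y)
    for _ in range(epochs):
        error = X @ w - y                  # prediction minus target, per point
        grad = (2.0 / n) * (X.T @ error)   # gradient of MSE w.r.t. each weight
        w -= alpha * grad                  # step against the gradient
    return w


# Hypothetical noise-free training points from y = 1 + 2x + 3x^2
xs = np.linspace(-1, 1, 21)
ys = 1 + 2 * xs + 3 * xs ** 2
w = polyfit_gradient_descent(xs, ys, degree=2)
```

The learning rate and epoch count here suit x values in [-1, 1]; with larger x the higher powers make the gradient explode, so rescaling x first (or a smaller alpha) is usually needed.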

lapis sequoia Oct 21, 2020, 7:00 PM

#

My validation accuracy is significantly higher, I was wondering what I could be doing wrong.
@heady hatch Check that you're splitting the data properly and that there's no data leakage. It's very rare to see a situation like the one shown above.

heady hatch Oct 21, 2020, 7:00 PM

#

I'm not splitting the data myself. It's presplit.

#

Good point about data leakage.

#

So it's the CIFAR-10 dataset.

They've split the data into train and test already.

40000 training images
10000 test images

I did add a couple of things to the training dataset pipeline that I didn't add to the testing dataset pipeline, such as shuffling and repeating the data.

Though I was under the impression that I'm not supposed to shuffle the test data.

lapis sequoia Oct 21, 2020, 7:06 PM

#

@heady hatch That worked thanks

#

I coulda just used it from the table too

#

Is 0.873 adj r squared good enough?

#

I want to use the model to be able to use the dependent variable (labor hours) as a benchmark based on the independent variables (product, customer, config, build type)

heady hatch Oct 21, 2020, 7:08 PM

#

It depends on your problem. Is adjusted r squared the metric you want to look at?

lapis sequoia Oct 21, 2020, 7:08 PM

#

yeah

#

since I have multiple independent variables

#

@heady hatch
Split the 40K into 35K and 5K, and do a stratified split; shuffling shouldn't make a difference here.
Also check that your plot legends are right. Maybe you're mixing up train and val when plotting.

heady hatch Oct 21, 2020, 7:15 PM

#

@lapis sequoia

So should I leave the test set as a holdout?

There's no simple way to do a stratified split with TensorFlow, is there? I would have to redo the data pipeline and make the test dataset myself.

Thank you for bringing the labelling up, I double checked and they are the correct labels.
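Keeping the given test set as an untouched holdout and carving validation out of the training data is a common setup, and TensorFlow doesn't have to do the stratification itself. One option is to split the indices with scikit-learn and then build the `tf.data` pipelines from the selected rows (e.g. via `tf.data.Dataset.from_tensor_slices`). A sketch of the index split only, with hypothetical balanced labels standing in for the real ones:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels for 40,000 training images across 10 classes
labels = np.repeat(np.arange(10), 4000)
indices = np.arange(len(labels))

# Stratified 35K / 5K split of the indices; the official test set is not touched
train_idx, val_idx = train_test_split(
    indices, test_size=5000, stratify=labels, random_state=42
)

# Each class keeps the same share in both parts
val_counts = np.bincount(labels[val_idx], minlength=10)
```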

lapis sequoia Oct 21, 2020, 7:16 PM

#

What should my p-values be?

heady hatch Oct 21, 2020, 7:17 PM

#

This is how I'm constructing the data pipeline.

train_ds = (raw_train_dataset.map(parse_image_function)
                  .map(process_image)
                  .repeat()
                  .shuffle(buffer_size=20000)
                  .batch(batch_size=32)
                  .prefetch(buffer_size=100)
)


test_ds = (raw_test_dataset.map(parse_image_function)
                  .map(process_image)
                  .shuffle(buffer_size=5000)
                  .batch(batch_size=32)
                  .prefetch(buffer_size=100)
)

#

process_image standardizes and resizes the images.

lapis sequoia Oct 21, 2020, 7:29 PM

#

@heady hatch my model isn't predicting certain mixes well, it seems

heady hatch Oct 21, 2020, 7:29 PM

#

certain mixes?

lapis sequoia Oct 21, 2020, 7:29 PM

#

ya like

#

product 2 to customer 5 with build type A

#

etc

#

looks different than historical avg

heady hatch Oct 21, 2020, 7:30 PM

#

Ahh well now here's something to consider.

#

Is r squared the metric you want to look at?

lapis sequoia Oct 21, 2020, 7:30 PM

#

adjusted r sq

heady hatch Oct 21, 2020, 7:30 PM

#

What I mean is, you don't necessarily need to switch away from r squared if it's one of the few metrics you can get.

#

Because think about the definition of what r squared means.

#

R squared measures the goodness of fit.

lapis sequoia Oct 21, 2020, 7:31 PM

#

yea

heady hatch Oct 21, 2020, 7:31 PM

#

But it doesn't necessarily talk about the actual problem itself.

#

It's just a proxy metric for something else you care about.

#

Because yea .81 r squared could be good.

#

But not if it's constantly making mistakes on a particular group of people or product.

#

I don't know your actual problem so you'd have to determine that yourself.

lapis sequoia Oct 21, 2020, 7:33 PM

#

yeah the errors are high for some

heady hatch Oct 21, 2020, 7:33 PM

#

Maybe it's okay for it to keep making mistakes on certain things.