#data-science-and-ml

proper fable
#

thanks in advance

grave frost
#

@proper fable just explore other EDA notebooks there in Kaggle, by using the search bar in the notebooks section

#

@lapis sequoia The math required for ML/AI depends heavily on the task: simple tasks need simple math, complex tasks need complex math. I think the basics of calculus and algebra are enough for general machine learning, and knowing vectors and matrices (usually taught in school CS classes) helps a lot too.

proper fable
#

@proper fable just explore other EDA notebooks there in Kaggle, by using the search bar in the notebooks section
@grave frost Thank you, that helps me a lot. I didn't know I could do such a thing before.

grave frost
#

np

wild spoke
#

word_vecs = KeyedVectors.load_word2vec_format("./glove.txt") How do I get the "glove.txt" file, or how do I generate it?

#

I am using gensim.models
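
One common route, sketched under assumptions: pre-trained GloVe vectors are distributed as plain-text files by the Stanford NLP group, and they lack the one-line `<vocab_size> <dimensions>` header that the word2vec text format expects. The helper name `add_word2vec_header` and the file names below are hypothetical, and the two-word file is only a stand-in for a real download.

```python
import os
import tempfile


def add_word2vec_header(glove_path, out_path):
    """Copy a GloVe text file, prepending the '<count> <dim>' header line."""
    count, dim = 0, 0
    # First pass: count the rows and read the vector size from the first row.
    with open(glove_path, encoding="utf-8") as src:
        for line in src:
            if count == 0:
                dim = len(line.rstrip().split(" ")) - 1  # first field is the word
            count += 1
    # Second pass: write the header, then stream the rows through unchanged.
    with open(glove_path, encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        dst.write(f"{count} {dim}\n")
        for line in src:
            dst.write(line)
    return count, dim


# Tiny stand-in for a real GloVe file: one word and its vector per line.
workdir = tempfile.mkdtemp()
glove_path = os.path.join(workdir, "glove.txt")
w2v_path = os.path.join(workdir, "glove.w2v.txt")
with open(glove_path, "w", encoding="utf-8") as f:
    f.write("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")

count, dim = add_word2vec_header(glove_path, w2v_path)
print(count, dim)
```

After that, `KeyedVectors.load_word2vec_format` should be able to read the converted file. Recent gensim releases (4.0 and later) can also read headerless GloVe files directly with `no_header=True`, and older ones shipped a `glove2word2vec` script for the same conversion.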

lapis sequoia
#

spaCy: Is a vocabulary just the set of words from all analyzed documents, or does it contain more than that?

mild topaz
#
```
2020-10-17 18:32:05,249 findDocumentType1 MainThread : test!
2020-10-17 18:32:11,981 findDocumentType1 Thread-19 : Exception on /findDocumentType1 [POST]
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
TypeError: post() takes 0 positional arguments but 1 was given
```
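
That `TypeError` usually means the resource method was defined without `self`. Flask-RESTful calls `post` on an instance, so Python passes the instance as the first positional argument. A minimal sketch with plain classes (no Flask needed; `BrokenResource` and `FixedResource` are made-up names, not taken from the traceback):

```python
class BrokenResource:
    def post():  # missing `self`, so the bound call has nowhere to put the instance
        return {"status": "ok"}


class FixedResource:
    def post(self):  # the instance is received as `self`
        return {"status": "ok"}


try:
    BrokenResource().post()
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

result = FixedResource().post()
print(result)
```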
dense knot
#

Guys, I'm a beginner in Python. Do you have reference code for a random forest algorithm written without scikit-learn (sklearn), in a Jupyter notebook? Thank you.

sweet ember
#

Hey guys, I ended up getting a freelance DS gig. I have a dataset with user locations, activity locations, and time data, and I have to find how much time each user spends in a specific location.

#

Is there a way to do it?

#

id, time, user_x, user_y, act_x, act_y, activity are the features

#

ids repeat and activity coordinates repeat sometimes.

heady hatch
#

What are the features like?

lapis sequoia
#

@lapis sequoia The math required for ML/AI depends heavily on the task: simple tasks need simple math, complex tasks need complex math. I think the basics of calculus and algebra are enough for general machine learning, and knowing vectors and matrices (usually taught in school CS classes) helps a lot too.
@grave frost thank you for your help. wish you the best things

heady hatch
#

Hello wonderful people,

asking for advice here. I'm doing a semantic search using roberta embeddings, but it's trained on a max length of 512.

But the text data I'm working with are double that. Should I truncate the text data?

#

My end goal is to get the embeddings to compare.

#

Or should I skip the fancy approach and go with something simpler like TF-IDF because of the text length?
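
If truncating loses too much, one commonly used alternative is to split each document into overlapping windows that fit the model, embed each window, and average the window vectors. A rough sketch of the windowing and pooling steps only: the encoder call itself is left out, and `window_tokens`, `mean_pool`, the window size, and the stride are illustrative choices, not something from this thread. In practice the usable window is often a little under 512 because special tokens count toward the limit.

```python
def window_tokens(tokens, max_len=512, stride=256):
    """Split a token list into overlapping windows of at most max_len tokens."""
    if len(tokens) <= max_len:
        return [tokens]
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
        start += stride
    return windows


def mean_pool(vectors):
    """Average equal-length vectors element-wise."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


tokens = list(range(1024))            # stand-in for 1024 token ids
windows = window_tokens(tokens)
print([len(w) for w in windows])

# Stand-in window embeddings; a real encoder would produce these.
doc_vector = mean_pool([[1.0, 2.0], [3.0, 4.0]])
print(doc_vector)
```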

heady hatch
#

I think I was able to solve the previous problem just taking a naive approach.

Now I have a new question. Does the batch size matter when we're encoding for the embeddings?

i.e. getting embeddings at batch size 32 vs 256. I'm using the embeddings only for comparison.

#

I'm aware batch size makes a difference when doing downstream tasks, but what about encoding the actual embedding?

tidal bronze
#
```python
self.df["64gb"] = np.where("64" in self.df["title"], True, False)
```

returns False but it should work
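
`"64" in self.df["title"]` is a single Python membership test, and on a pandas Series `in` looks at the index labels rather than the values, so it yields one `False` for the whole column. A per-row check is probably what was intended; here is a small sketch with made-up titles:

```python
import pandas as pd

df = pd.DataFrame({"title": ["iPhone 11 64GB", "Galaxy S10 128GB", "Pixel 4 64 gb"]})

# `in` checks the index (0, 1, 2), not the strings in the column.
index_check = "64" in df["title"]

# Element-wise substring test; na=False treats missing titles as "no match".
df["64gb"] = df["title"].str.contains("64", na=False)
print(index_check)
print(df)
```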

limpid raft
#

I don't understand why i is a str and not an integer. Also, how could I iterate over this list?

```python
lst = [('someting1'), ('something2')]

for i in lst:
    first_lst = lst[i].split('|')
```

lapis sequoia
#

@limpid raft i is not always an integer; in a for loop it takes each element of the list, so here i is 'someting1' and then 'something2'

#
```python
lst = ['someting1', 'something2']
first_lst = lst[0]
```
limpid raft
#

@lapis sequoia So does i take the type of the elements of lst? And what if lst is a list of integers and strings: what does i become in that case? And is it possible to avoid iterating over this list manually?

lapis sequoia
#

@limpid raft lst[0] will always be the first element, whether it's an int or a str

#

to get all of them there are two options: you can use a while loop or a for loop

#
```python
lst = ['someting1', 'something2']
for lsts in lst:
    print(lsts)
```
#
```python
lst = ['someting1', 'something2']
i = 0
while i < len(lst):
    print(lst[i])
    i += 1
```
limpid raft
#

ahh, so lsts[0] would then be something1. But, does the 'in' statement create the variable lsts such that it has the same type as lst?

#

From my understanding, its purpose is to check if a value is present in a sequence (range, list, etc.). Is the 'for' loop forcing the type of lst onto lsts?
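
For what it's worth, `for x in lst` is a different construct from the `in` membership test: the loop assigns each element of `lst` to `x` in turn, so `x` has the type of that element, not the type of the list. A quick illustration (the list contents are invented):

```python
lst = ["a|b", 42, "c|d"]

seen_types = []
for item in lst:                      # item is each element, not an index
    seen_types.append(type(item).__name__)

# enumerate gives the integer position alongside the element.
parts = []
for i, item in enumerate(lst):
    if isinstance(item, str):         # only strings have .split
        parts.append((i, item.split("|")))

is_member = "a|b" in lst              # membership test: a separate use of `in`

print(seen_types)
print(parts)
print(is_member)
```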

tender umbra
#

Hi, has anyone here worked on graph neural networks?

#

I am looking for efficient implementations of SOTA methods in graph representation learning. I need to deploy a model that works on a huge number of small, relatively sparse graphs (<100k nodes). Wondering which package would be best.

shell berry
#

Can someone please help me understand the X and y inputs to scikit-learn's linear regression? I have a list of X points and a list of corresponding Y points.

austere swift
#

X is the features, y is the labels

#

thats the most basic way of understanding it

#

or in the case of linear regression you can think of it like regressing on a graph with x and y variables

shell berry
#

@austere swift thanks, but when I try it says the sizes of the lists are wrong even though they're both 1x5000

austere swift
#

whats the exact error message?

shell berry
#

and I have a linear relationship

austere swift
#

so sklearn doesnt like lists that look like [a, b, c, d], it wants lists like [[a], [b], [c], [d]]

shell berry
#

Ah I see

austere swift
#

so thats why its asking you to do the array.reshape(-1, 1) thing

#

so you can just reshape it like that

shell berry
#
```python
x = np.reshape(mapping_x, (-1,1))
y = np.reshape(mapping_y, (-1,1))

reg = LinearRegression().fit(x, y)
```
#

Same error with this

austere swift
#

try only reshaping the x variable, not y

shell berry
#

Same thing

#

nvm, it worked. thanks!
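
For anyone hitting the same shape error, a minimal sketch of the working call, assuming two plain lists of equal length. The data here is synthetic; `mapping_x` and `mapping_y` just reuse the names from the snippet above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

mapping_x = [0.0, 1.0, 2.0, 3.0, 4.0]
mapping_y = [1.0, 3.0, 5.0, 7.0, 9.0]     # y = 2x + 1

x = np.asarray(mapping_x).reshape(-1, 1)   # shape (n_samples, 1): one feature per row
y = np.asarray(mapping_y)                  # shape (n_samples,): targets can stay 1-D

reg = LinearRegression().fit(x, y)
print(reg.coef_, reg.intercept_)
```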

gray sedge
#

is web scraping data science

#

if web scraping isn't data science can someone tell me where to ask a beautifulsoup question

regal belfry
#

Whats the best way to do a column level compare between two dataframes in pandas?

velvet thorn
#

Whats the best way to do a column level compare between two dataframes in pandas?
@regal belfry what do you mean by column-level compare?

tidal bronze
regal belfry
#

@regal belfry what do you mean by column-level compare?
@velvet thorn if df1.column == df2.column then show all matching rows
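
One way to read that request is an inner join on the shared column, which keeps only rows whose value appears in both frames. A sketch with toy data (the frames and the `key` column are placeholders):

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

# Rows whose `key` exists in both frames, with columns from each side.
matches = df1.merge(df2, on="key", how="inner")

# Alternative: keep only df1's columns for rows whose key also occurs in df2.
df1_matches = df1[df1["key"].isin(df2["key"])]

print(matches)
print(df1_matches)
```

A plain `df1["key"] == df2["key"]` compares row by row and needs both Series to carry identical index labels, so it only fits when the two frames are already aligned.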

lapis sequoia
#

why do I get this error

#
UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
  warnings.warn(msg, UndefinedMetricWarning)
#

im trying to predict data from a csv file

#

ping

#

r is undefined i think

#

no

#

r is not even there in my code

#

sry im newbie

#

idk
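
The "R" in that warning is the R² (coefficient of determination) metric that scikit-learn regressors report from `.score()`, not a variable in your code. It is defined as 1 - SS_res / SS_tot, and with a single sample SS_tot is zero, so the value is undefined. That often means the test set ended up with only one row. A hand-rolled illustration (`r_squared` is a throwaway helper, not the library function):

```python
def r_squared(y_true, y_pred):
    """Plain-Python R^2; returns NaN when the total sum of squares is zero."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        return float("nan")  # nothing to compare the residuals against
    return 1 - ss_res / ss_tot


one_sample = r_squared([3.0], [2.5])
several = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(one_sample, several)
```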

sweet ember
#

Hey guys, I'm getting a barplot behind my catplot, but I only want the catplot. How do I remove the bar graphs and the lines from the chart?

lapis sequoia
#

if web scraping isn't data science can someone tell me where to ask a beautifulsoup question
@gray sedge You can ask it here too and try in Web Dev channel.

#

Hey guys, I'm getting a barplot behind my catplot, but I only want the catplot. How do I remove the bar graphs and the lines from the chart?
@sweet ember give the code that you are using to generate the plot.

pure pond
#

Is ROOT well known/respected/w.e in the data science community? I'm doing a physics masters using it and might be interested in going into data science after

tidal bronze
#

hey, I've made a scraper that monitors ads posted to Craigslist in certain categories and compares them against the average price to identify bargains

#

are there any other rules you guys would suggest? I was thinking: if an item is 30% cheaper than the average, notify me

#

but maybe average is not the best metric to use?
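
The mean gets dragged around by a few very high or very low listings, so the median is often a steadier baseline for this kind of rule. A sketch using the 30% threshold mentioned above; the prices and the `is_bargain` helper are invented:

```python
from statistics import mean, median

prices = [100, 110, 95, 105, 120, 2500]    # one outlier listing


def is_bargain(price, reference, discount=0.30):
    """True if price is at least `discount` below the reference price."""
    return price <= reference * (1 - discount)


avg_price = mean(prices)
median_price = median(prices)

# The outlier inflates the mean, so an ordinary 200 listing looks like a bargain.
flag_vs_mean = is_bargain(200, avg_price)
flag_vs_median = is_bargain(200, median_price)
print(avg_price, median_price, flag_vs_mean, flag_vs_median)
```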

grave frost
#

@pure pond BTW What is ROOT?

lapis sequoia
#

can someone help?
ERROR: Could not install packages due to an EnvironmentError: [Errno 2] No such file or directory: 'C:\\Users\\HP\\AppData\\Local\\Packages\\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\\LocalCache\\local-packages\\Python37\\site-packages\\sklearn\\datasets\\tests\\data\\openml\\292\\api-v1-json-data-list-data_name-australian-limit-2-data_version-1-status-deactivated.json.gz'
i get this error when trying to install sklearn
i upgraded pip and it got fixed

#

ugh now i get this

ImportError: cannot import name '__check_build' from 'sklearn' (C:\Users\HP\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\sklearn\__init__.py)

this is the code:

```python
# make predictions
from pandas import read_csv
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
# Load dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv"
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
dataset = read_csv(url, names=names)
# Split-out validation dataset
array = dataset.values
X = array[:,0:4]
y = array[:,4]
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=1)
# Make predictions on validation dataset
model = SVC(gamma='auto')
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)
# Evaluate predictions
print(accuracy_score(Y_validation, predictions))
print(confusion_matrix(Y_validation, predictions))
print(classification_report(Y_validation, predictions))
```
hollow sentinel
#

me doing a udemy course on data science

hasty thorn
#

can anyone suggest some resources for learning NLTK sentiment analysis

plucky zephyr
#

if i plot error like this, it is overfit right?
so i just need to stop iteration early to make it not overfit?

x-axis = iteration
y-axis = rmse
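
Without the plot it's hard to say, but if the validation RMSE bottoms out and then climbs while the training RMSE keeps falling, stopping at the validation minimum is the usual remedy. A sketch of patience-based early stopping over a recorded validation curve (the numbers and the `best_iteration` helper are made up):

```python
def best_iteration(val_rmse, patience=3):
    """Return the index of the lowest validation RMSE, ending the scan once
    `patience` consecutive iterations fail to improve on it."""
    best_idx, best_val, waited = 0, float("inf"), 0
    for i, rmse in enumerate(val_rmse):
        if rmse < best_val:
            best_idx, best_val, waited = i, rmse, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_idx


val_rmse = [0.90, 0.62, 0.55, 0.57, 0.60, 0.66, 0.71]
stop_at = best_iteration(val_rmse)
print(stop_at, val_rmse[stop_at])
```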

gilded shadow
#

wow ya looks like after 2 iterations it's there 🙃

dusky carbon
#

hey guys, I'm having some trouble printing zero values from my pandas DataFrame code

#

this is my code and output, i just want it to ALSO print the data for the ones that have a zero value, any ideas?

quiet whale
#

I have an imbalanced dataset and I've done undersampling with a decision tree classifier, which gives me a score of F1 = 1. That looked too good to be true, so I checked the confusion matrix and it shows that FN and FP are both 0...

Is that a good thing? I'm very new at this. I've also tried over- and undersampling with SMOTE combined with an XGBoost classifier, and the best F1 score is 0.46

lapis sequoia
#

so i wanted to ask the math behind test_size

#
```python
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X,y, test_size=)
```
#

how can we decide test_size

#

just want to know the math behind it

velvet thorn
#

I have an imbalanced dataset and I've done undersampling with a decision tree classifier, which gives me a score of F1 = 1. That looked too good to be true, so I checked the confusion matrix and it shows that FN and FP are both 0...

Is that a good thing? I'm very new at this. I've also tried over- and undersampling with SMOTE combined with an XGBoost classifier, and the best F1 score is 0.46
@quiet whale you probably have data leakage

#

just want to know the math behind it
@lapis sequoia there's no hard and fast rule but generally 20% or so
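
There isn't much math inside `test_size` itself: a float is the fraction of rows held out. As far as I know scikit-learn rounds the test count up and gives the remaining rows to training; the helper below mirrors that arithmetic in plain Python and is only an illustration, not the library code.

```python
import math


def split_sizes(n_samples, test_size=0.2):
    """Return (n_train, n_test) for a fractional test_size."""
    n_test = math.ceil(test_size * n_samples)   # test rows, rounded up
    n_train = n_samples - n_test                # everything else trains
    return n_train, n_test


print(split_sizes(569, 0.2))
print(split_sizes(10, 0.25))
```

Around 20% is a convention rather than a derived number: a larger test set gives a steadier score estimate but leaves less data to learn from, and for small datasets cross-validation is often used instead.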

lapis sequoia
#

hm

velvet thorn
#

this is my code and output, i just want it to ALSO print the data for the ones that have a zero value, any ideas?
@dusky carbon what do you mean? what are you trying to do?

tall aurora
#

what is data science?

lilac minnow
#

Please help, I wanna use plt.imshow in Flask for research data visualisations. Creating/displaying .jpg or .png isn't helpful as they cannot be updated on the go. Please suggest a way.
plt.imshow because I wanna use colormap and clim.

tall aurora
#

what is data science?

velvet thorn
#

Please help, I wanna use plt.imshow in Flask for research data visualisations. Creating/displaying .jpg or .png isn't helpful as they cannot be updated on the go. Please suggest a way.
plt.imshow because I wanna use colormap and clim.
@lilac minnow what do you mean "use it in flask"?

lilac minnow
#

I wanna create a server for visualising outputs from TF Models, as numpy arrays. plt.imshow works well with Jupyter. But I'm not able to get them to work in flask.

velvet thorn
#

I wanna create a server for visualising outputs from TF Models, as numpy arrays. plt.imshow works well with Jupyter. But I'm not able to get them to work in flask.
@lilac minnow they're two different things...

#

if you want that kind of behaviour, you need JS

lilac minnow
#

@velvet thorn thank you. Can you provide me with any example/template for me to get started?

velvet thorn
#

you're basically saying you want an interactive interface

#

alternatively, you can consider Dash

#

nope, I can't

#

Google should help
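
If an image that refreshes on reload is enough, one approach is to render the array to PNG bytes in memory on each request and let the browser re-fetch the URL, so no file on disk is needed. A sketch of the rendering half only; the function name, the defaults, and the idea of returning the bytes from a Flask view with an `image/png` mimetype are my assumptions:

```python
import io

from matplotlib.figure import Figure


def render_png(array, cmap="viridis", clim=(0.0, 1.0)):
    """Render a 2-D array with imshow and return the PNG as bytes."""
    fig = Figure(figsize=(4, 4))        # no pyplot, so no global GUI state
    ax = fig.subplots()
    image = ax.imshow(array, cmap=cmap, vmin=clim[0], vmax=clim[1])
    fig.colorbar(image, ax=ax)
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png")
    return buffer.getvalue()


png_bytes = render_png([[0.0, 0.5], [0.75, 1.0]])
print(len(png_bytes))
```

In a Flask view you would return something like `Response(render_png(arr), mimetype="image/png")`. For real interactivity, Dash or a JS front end, as suggested above, is the better fit.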

quiet whale
#

@quiet whale you probably have data leakage
@velvet thorn ah I did! Thankyou, need to be more careful next time :/

mossy dragon
#

hey guys

#

I want to do a neural network model for sentiment analysis on tweets

#

but I don't have much spare time, so I was considering using MTurk or Fiverr to have people manually label a training set

#

thoughts?

serene scaffold
#

@mossy dragon what topic are the tweets for? Also, were you planning to have two people label everything independently and compare the results?

mossy dragon
#

i dont have a specific topic yet

#

im willing to be flexible on that tbh

#

I wasn't planning on getting different people but that sounds like a good idea

serene scaffold
#

one of my coworkers does sentiment analysis. It tends to be difficult because of sarcasm and especially nuanced texts.

mossy dragon
#

yea

#

this is a school project though

serene scaffold
#

ah

#

I would look for an existing corpus

mossy dragon
#

so not like it needs to be perfect

#

you mean

#

abandon the tweet idea?

serene scaffold
#

no

#

I would see if someone has already made a set of tweets and associated sentiment data

mossy dragon
#

hmm

#

i actually had a similar idea to that

#

I know there is an IMDb dataset containing movie reviews labeled as positive/negative

serene scaffold
#

that sounds pretty good

mossy dragon
#

so i was considering maybe using that to train a model and then classifying tweets about a new movie trailer that was released or a movie that recently came out

#

i haven't really done any neural net models though

#

do you know a sample size that i should aim for?

serene scaffold
#

unfortunately I don't

mossy dragon
#

hmm

serene scaffold
#

I work in an NLP lab and I'm the worst one

#

probably because I spend too much time on discord

mossy dragon
#

lel

#

its a group project

#

i have ~5 other people in the group

serene scaffold
#

what class?

mossy dragon
#

NLP

serene scaffold
#

nice

mossy dragon
#

but still we're all either busy with other classes or working full time, but I'm curious if we could get a decent sized training set if we spent ~1 hour manually labeling data

serene scaffold
#

our annotators are always complaining about how long it takes

#

so my guess is no

#

but if you're just assigning labels to entire documents (rather than individual tokens) I guess that's faster

#

I don't know the exact specification of your assignment but I would be very surprised if your professor wanted you to create your own data set.

mossy dragon
#

oh lol

#

we're not required to

#

but i personally would like to

#

we dont even have to do sentiment analysis, we could do a different method to analyze the text

velvet thorn
#

does it have to involve ML?

#

or can you do some other kind of analysis

#

like maybe something that could be interesting is analysis of document structure?

mossy dragon
#

nah

#

these are my exact instructions

velvet thorn
#

hm it says a single data set so I guess my idea is out

#

but yeah it seems like you're intended to find your own dataset

mossy dragon
#

yea

velvet thorn
#

as opposed to creating one

#

however, I'm like 99% sure there are existing tweet datasets out there

#

for sentiment analysis

#

it's a very common task

#

so you could use that as a baseline and find something interesting to add your own spin on things

#

for example, comparing across geographical regions?

mossy dragon
#

I'd like to modify this and put this on my github for future job searches

#

so i figured it would be more impressive to extract that data myself

#

but i guess i dont have to do that now

velvet thorn
#

so i figured it would be more impressive to extract that data myself
@mossy dragon it would be!

#

but yeah, if it's a group project

#

probably not.

mossy dragon
#

thanks for the help catthumbsup

velvet thorn
#

yw 🙂

mild topaz
#

hello
```python
# (fragment from inside a request handler)
print("hello")
try:
    model = load_model(r"E://demo3//albania_100_model.p")
    # model = load_model(r"{path}//{country}_100_model.p")
    print("model loaded...")
except OSError:
    logger.debug({
        "Status": "failed",
        "message": "model not available"})

    return {
        "Status": "failed",
        "message": "model not available"}
```

In the output I am getting { "Status" : "failed", "message" : "model not available"}

#

i am not able to load model
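
Since the `except` branch hides the real error, it may help to log the exception itself and check that the path exists before loading. A sketch; `load_or_report` is a hypothetical wrapper, and `loader` stands for whatever `load_model` function you are importing (the built-in `open` is used below only so the example runs on its own):

```python
import logging
import os

logger = logging.getLogger(__name__)


def load_or_report(path, loader):
    """Try loader(path); on failure return a status dict that keeps the reason."""
    if not os.path.isfile(path):
        logger.debug("model file not found: %s", path)
        return {"Status": "failed", "message": f"model file not found: {path}"}
    try:
        return {"Status": "ok", "model": loader(path)}
    except OSError as exc:
        # Keep the original error text instead of a generic message.
        logger.debug("could not load %s: %s", path, exc)
        return {"Status": "failed", "message": f"model not available: {exc}"}


missing = load_or_report("no_such_dir/albania_100_model.p", loader=open)
print(missing)
```

On Windows the conventional spellings are `r"E:\demo3\albania_100_model.p"` or `"E:/demo3/albania_100_model.p"`; whether the doubled slashes are the cause here is a guess.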

autumn veldt
#

Hello everyone,
I am currently looking for a dataset on cholera. Do any of you know where to download one, or where I can find the source of a dataset like this one: https://github.com/soujanyajoshi/Cholera/blob/master/data.xlsx ? I have searched Kaggle, but the features in the datasets there are different.

pure pond
stiff zealot
#

is this correct place to talk about stock market analysis?

lapis sequoia
#

What s that

#

Absolutely not

#

I guess

stiff zealot
#

lol

#

anyone have experience building trading bots?

pure pond
#

Just use an rng stock picker you'll probably outperform other attempts xd

hazy mortar
#

a monkey outperformed most

#

😄

marsh tartan
#

how long does it take to train a single-threaded NLTK classifier model with 8000 training points and 2000 test points?

#

I'm running on a i7-10750H @4.5ghz

#

or is there an easy way to run it with CUDA?

earnest forge
#

I need advice. What machine learning course should I take?

unique sandal
#

@earnest forge I would highly recommend the Complete Zero to Mastery machine learning course by Andrei Neagoie on Udemy. It's very affordable for its quality and content, in my opinion

real geode
#

I've got two dictionaries that each contain several pandas DataFrames. The column and row names are all the same, but I would like to iterate through the DataFrames in each dictionary and run df1.compare(df2) one at a time.

#

is there a way to write a function that will make this quicker instead of writing df1[key1].compare(df2[key1]) for each key in these dictionaries

foggy tundra
#

Hello ! How can i use raw sql queries in flask_sqlalchemy ?

keen prism
#

Hi there!
I'm really new to Python but I want to invite people to take interest in a ML/NLP project. I want us to figure out how to digitize The Turing Digital Archive (http://www.turingarchive.org/) into easy-to-read text.
I'm not sure what the best tool is for the project, so I'm posting this to make interested friends who want to help.
To begin, I was looking at EasyOCR (https://github.com/JaidedAI/EasyOCR) but I don't know if it's the right tool for the job.
We'll be working in conda with Python for this; I personally will be using Windows 10. Apart from the experience itself, I think creating one document containing all of Alan Turing's writings will be its own reward.

wary kelp
lapis sequoia
#

how long does it take to train a single-threaded NLTK classifier model with 8000 training points and 2000 test points?
@marsh tartan Well, it depends on the configuration of the model and not just on the data. A complex model with more parameters will take more time than a simple one.
And if you want to train on a GPU for free, you can try Google Colab, which is free for up to 12 hours in a single run.

#

Anyway, if you are just looking for a simple text classifier, it should not take more than a few minutes, unless your model architecture is very complex.

#

is there a way to write a function that will make this quicker instead of writing df1[key1].compare(df2[key1]) for each key in these dictionaries
@real geode You can convert each dataframe into a NumPy array and compare:
(A==B).all()

#

This tests whether all values of the array (A==B) are True.

Note: maybe you also want to test A and B shape, such as A.shape == B.shape

Special cases and alternatives:

It should be noted that this solution can behave strangely in one particular case:
if either A or B is empty and the other one contains a single element, it returns True.
The comparison A==B broadcasts to an empty array, for which the all operator returns True.
Another risk: if A and B don't have the same shape and aren't broadcastable, this approach will raise an error.

Source: https://stackoverflow.com/questions/10580676/comparing-two-numpy-arrays-for-equality-element-wise
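
For the NumPy route, `np.array_equal` sidesteps both caveats listed above because it checks shapes before comparing values. A small demonstration with throwaway arrays:

```python
import numpy as np

a = np.array([])
b = np.array([1])

# Broadcasting makes a == b an empty array, and all() of nothing is True.
broadcast_result = bool((a == b).all())

# array_equal compares shapes first, so the mismatch is reported.
strict_result = bool(np.array_equal(a, b))

same = bool(np.array_equal(np.array([[1, 2], [3, 4]]), np.array([[1, 2], [3, 4]])))
different_shape = bool(np.array_equal(np.zeros((2, 3)), np.zeros((3, 2))))

print(broadcast_result, strict_result, same, different_shape)
```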

real geode
#

Thanks for the tip but i managed to find a workaround while still keeping dataframes

#

```python
# Call this function to create crosstab tables
def crosstab_compare(df1cross, df2cross, df1original):
    """
    df1cross = dictionary of pandas dataframe where crosstabs have been performed, the self.
    df2cross = specifies another dictionary of pandas dataframe where crosstab has been performed, the other
    df1original = pandas dataframe non crosstabulated that will be used to extract the list of labels
    The end result is a dictionary
    The tables shown will appear only if results are different from each other
    The function will attempt to compare all dataframes with equal shape. If one dataframe doesnt match with the other, the function will
    continue to work but skip the mismatching dataframe
    """
    question_list = list(df1original.columns)[1:]
    print("Self: Refers to the table that was called first in the arguments")
    comparedf = {}

    for k in question_list:
        try:
            comparedf['{}'.format(k)] = df1cross[k].compare(df2cross[k], align_axis='rows')
        except ValueError:
            continue
    return comparedf
```
#

i had the problem where some DFs didn't have the same shape which is why i added the try block

lapis sequoia
#

Is there a lighter-weight alternative to jupyter notebooks?

lapis sequoia
#

Is there a lighter-weight alternative to jupyter notebooks?
@lapis sequoia lighter in what sense ?

#

You can just use VS Code editor as a notebook instead of installing anaconda and everything for jupyter if you want.

#

Also you can use cloud notebook providers like Google Colab which are hosted on VMs. So your system will not have any load and you get decent Machines.

real geode
#

yea I use Visual Studio Code jupyter notebooks for work and is pretty light overall

lapis sequoia
#

My notebook has around 10k lines

#

It takes forever for it to load up

final ocean
#

oof

civic jackal
#

Hi guys, is anyone familiar with Python scripts that align DNA sequences? I have an assignment and no idea where to start

real geode
#

they want you to code a BLAST from scratch?

foggy solar
#

That's a hell of a school project lol

#

Can you use the NCBI API (if it has to use python)?

real geode
#

just use BLAST directly lol i dont know why they would want you to use python just to get there. No need to reinvent the wheel

foggy solar
#

Yeah, I agree. Was just suggesting in case it was a project that required Python scripts. Depends on if it is a bio or computer class. No way in hell a biologist would write their own BLAST scripts.

civic mountain
#

is there a reason my tensor which is [173, 173] is getting resized to [231, 231] when plotted using plt.imshow() ?

civic mountain
#

Okay sorry it was a matplotlib issue.

lilac raven
#

Hello, I was directed here. I have a live graph being plotted from incoming ECG data, the two line plots (heart rate and moving average (called rolling mean in the code)) are updating successfully and moving across the screen, while the scatter plot data is not. The initial set of scatter points gets plotted, but remains static, unlike the line plots. I have to set up line. and scatter. plots a bit differently, so that is probably where the problem lies.

#

using funcAnimation
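
A likely cause, though I can't see the code: line artists are updated with `set_data`, but the object returned by `ax.scatter` is a `PathCollection` and needs `set_offsets` with an (N, 2) array instead. A stripped-down sketch of an update function in that style (the data, names, and axis limits are invented, and a bare `Figure` is used so it runs without a display):

```python
import numpy as np
from matplotlib.figure import Figure

fig = Figure()
ax = fig.subplots()
(line,) = ax.plot([], [])
scatter = ax.scatter([], [])


def update(frame):
    """Redraw both artists for the first `frame + 1` samples."""
    x = np.arange(frame + 1)
    y = np.sin(x / 5.0)
    line.set_data(x, y)                            # lines take separate x and y
    scatter.set_offsets(np.column_stack([x, y]))   # scatter takes (N, 2) points
    ax.set_xlim(0, max(frame, 1))
    ax.set_ylim(-1.1, 1.1)
    return line, scatter


update(9)
print(scatter.get_offsets().shape)
```

A function like this is what gets passed to `FuncAnimation(fig, update, ...)`; with `blit=True` it has to return every artist it changed.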

civic jackal
#

I basically need to do this: Convert align_seqs.py to a Python program that takes the DNA sequences as an input from a single external file and saves the best alignment along with its corresponding score in a single text file (your choice of format and file type) to an appropriate location. No external input should be required; that is, you should still only need to use python align_seq.py to run it. For example, the input file can be a single .csv file with the two example sequences given at the top of the original script.
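
A possible starting point, with a big caveat: I don't have the original align_seqs.py, so the scoring here is a guess (slide the shorter sequence along the longer one and count matching bases at each offset). The function names, file names, and output layout are all placeholders, and the temp directory only keeps the demo self-contained.

```python
import csv
import os
import tempfile


def score_at(long_seq, short_seq, offset):
    """Count matching bases when short_seq is placed at `offset` along long_seq."""
    return sum(
        1
        for i, base in enumerate(short_seq)
        if offset + i < len(long_seq) and long_seq[offset + i] == base
    )


def best_alignment(seq1, seq2):
    """Slide the shorter sequence along the longer one; return (alignment, score)."""
    long_seq, short_seq = (seq1, seq2) if len(seq1) >= len(seq2) else (seq2, seq1)
    best_offset = max(
        range(len(long_seq)), key=lambda o: score_at(long_seq, short_seq, o)
    )
    aligned = "." * best_offset + short_seq
    return aligned + "\n" + long_seq, score_at(long_seq, short_seq, best_offset)


def align_from_file(in_path, out_path):
    """Read two sequences from the first CSV row and write the best alignment."""
    with open(in_path, newline="") as handle:
        seq1, seq2 = next(csv.reader(handle))[:2]
    alignment, score = best_alignment(seq1.strip(), seq2.strip())
    with open(out_path, "w") as handle:
        handle.write(f"{alignment}\nScore: {score}\n")
    return score


# Demo: write a one-row CSV, align it, and save the result next to it.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "seqs.csv")
out_path = os.path.join(workdir, "best_alignment.txt")
with open(in_path, "w", newline="") as handle:
    csv.writer(handle).writerow(["ATCGCCGGATTACGGG", "CAATTCGGAT"])

best_score = align_from_file(in_path, out_path)
print(best_score)
```

In the real script the input and output paths would be fixed relative paths inside the project, so that running `python align_seqs.py` needs no arguments.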

lapis sequoia
#

Gn

tight sparrow
#

hello I'm trying to access google maps using API Key

#

but i'm getting this

```
   "error_message" : "You must enable Billing on the Google Cloud Project at https://console.cloud.google.com/project/_/billing/enable Learn more a
   "results" : [],
   "status" : "REQUEST_DENIED"
```
#

any welp for me?

#

Thank You

uneven wind
#

Hello! I have a problem with pandas and its read_excel function.
I can't read one of the columns in my Excel sheet. The console returns this error:

  File "path\to\pandas\core\indexing.py", line 1177, in _validate_read_indexer
    key=key, axis=self.obj._get_axis_name(axis)
KeyError: "None of [Index(['S2007-02', 'S2007-02', 'S2007-02', 'S2007-02', 'S2007-02', 'S2007-02',\n       'S2007-02', 'S2007-02', 'S2007-02', 'S2007-02',\n       ...\n       '1 - New', '1 - New', '3 - Approved', '3 - Approved', '1 - New',\n       '3 - Approved', '3 - Approved', '3 - Approved', '3 - Approved',\n       '1 - New'],\n     dtype='object', length=1043)] are in the [columns]" 

But I don't understand what those "\n" are. They aren't in the string values.
I checked the column format, but I didn't see any line breaks or spaces in the data. Does anyone have a clue how to fix this?
Thanks !

solid mantle
#

Anyone familiar with pymultinest?

#

If you are, kindly dm me

lapis sequoia
#

any welp for me?
@tight sparrow You need to enable billing. Go into Google Cloud console and inside Billing you should be able to see if there is any active billing account.

#

Also check Account Management and enable billing if you have closed it in the past. You will need a debit/credit card to do that.

#

hi

#
```python
import sklearn
from sklearn import datasets
from sklearn import svm
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

cancer = datasets.load_breast_cancer()

#print(cancer.feature_names)
#print(cancer.target_names)

x = cancer.data
y = cancer.target

x_train,x_test,y_train,y_test = sklearn.model_selection.train_test_split(x,x,test_size=0.2)

print(x_train,y_train)

classes = ['malignant' 'benign']

clf = svm.SVC()
clf.fit(x_train,y_train)


y_pred = clf.predict(x_test)

acc = metrics.accuracy_score(y_test,y_pred)
print(acc)
```
#

"got an array of shape {} instead.".format(shape))
ValueError: y should be a 1d array, got an array of shape (455, 30) instead. error

#

@uneven wind \n is the newline character, so it is possible that it is causing the problem. Also, are you passing any other parameters when reading the file? First try reading it without specifying any index or columns, then select the column and index.
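
Another possibility worth checking: the `\n` characters may just be how pandas wraps a long `Index` when it prints the error, not something in the data. The message pattern "None of [Index([...values...])] are in the [columns]" tends to appear when a column's values get passed where column names are expected. A small reproduction with invented data:

```python
import pandas as pd

df = pd.DataFrame({
    "status": ["1 - New", "3 - Approved", "1 - New"],
    "amount": [10, 20, 30],
})

status = df["status"]            # correct: select the column by its name

try:
    df[df["status"]]             # wrong: uses the column's values as column names
except KeyError as exc:
    key_error_text = str(exc)
    print(key_error_text)
```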

#

@lapis sequoia x_train,x_test,y_train,y_test = sklearn.model_selection.train_test_split(x,y,test_size=0.2)

You made one error here. The inputs to the split should be x and y, but you passed x and x.
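
One more thing in that snippet that might bite later: `['malignant' 'benign']` has no comma, and Python joins adjacent string literals, so the list has one element rather than two. A quick check:

```python
classes_missing_comma = ['malignant' 'benign']   # adjacent literals are concatenated
classes = ['malignant', 'benign']

print(classes_missing_comma)
print(len(classes_missing_comma), len(classes))
```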

#

OHHH

#

@lapis sequoia thankssss

tight sparrow
#

@lapis sequoia do I need to pay some money to gain access?

lapis sequoia
#

@lapis sequoia do I need to pay some money to gain access?
@tight sparrow You get some free quota; after that you need to pay. The free quota should be more than enough for a personal project.

#

Also you get free $300 credits when you register. So you have to use them if you want to get access.

tight sparrow
#

okay cool

#

thanks lemon_fingerguns_shades

weary heart
#

Hi, i'm trying to learn about machine learning in these few weeks, is there any youtube or website that can help with oversample,logistic regression,etc? thanks

rain nimbus
#

Hi, i'm trying to learn about machine learning in these few weeks, is there any youtube or website that can help with oversample,logistic regression,etc? thanks
@weary heart andrew ng?

weary heart
#

Thanks i'll look it up

mild topaz
#

hello
I have code that creates an image from a base64 string. Now I want to resize this image to a desired size in pixels. How can I do this? Can anyone help me with this?

earnest forge
lapis sequoia
#

hello
I have code that creates an image from a base64 string. Now I want to resize this image to a desired size in pixels. How can I do this? Can anyone help me with this?
@mild topaz I'm not sure what tool you are using to turn the string into an image, but when you save the file you can change its DPI and figure size.

If you are using matplotlib then you can resize with the help of matplotlib.pyplot.figure and choose the appropriate parameter values for dpi and figsize.

mild topaz
pure swan
#

Am i allowed to ask a question in this channel?

muted patio
#

it is pink line drawn. what does it stand for? what is its meaning?
@earnest forge Correlation?

real geode
#

the line between two variables on a scatter plot is supposed to represent the relation between them

#

isn't this some high school level math?

earnest forge
#

it is correlation. yes

#

I've just checked it

lapis sequoia
#

Can someone help me with pandas regression

earnest forge
#

what exactly?

lapis sequoia
#

You are resizing the image in the code so it should take care of your needs.

#

it is pink line drawn. what does it stand for? what is its meaning?
@earnest forge that is the best linear fit for your data. If you have to approximate your data with some function then that line gives the best result. And it also tells about how x and y are correlated.
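
A minimal sketch of what that line is, on made-up data (the numbers and the noise level are assumptions, not the data from the plot): a degree-1 least-squares fit gives the line, and the Pearson coefficient says how strongly x and y are correlated.

```python
import numpy as np

# Made-up data: y is roughly 2*x + 1 plus some noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Degree-1 least-squares fit: the best straight line through the points.
slope, intercept = np.polyfit(x, y, 1)

# Pearson correlation coefficient between x and y (always between -1 and 1).
r = np.corrcoef(x, y)[0, 1]

print(f"y = {slope:.2f} * x + {intercept:.2f}, r = {r:.2f}")
```

The slope describes the trend and `r` describes how tightly the points hug the line, so the two are related but not the same number.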

#

@earnest forge So I have a bunch of dummy variables right

#

I groupedby/summed by a certain column

#

But the dummy variables got messed up and now show numbers that aren't either 0 or 1.

#

How can I either fix that or make it where all the dummy variable columns greater than 1 get turned into a 1

halcyon vale
hollow sentinel
#

guys what do you like using

#

seaborn

#

or matplotlib

#

for graphs

#

which one is actually worth my time bc i used matplotlib in my last project

#

seaborn has way prettier graphs imo

austere swift
#

I like seaborn cus its a lot prettier

hollow sentinel
#

seaborn seems easier to use for me

austere swift
#

yeah that too

hollow sentinel
#

i've been doing a udemy course on data science & machine learning

#

that's why i've been so quiet

#

Jose Portilla is a beast

lapis sequoia
#

matplotlib is more basic and allows you to do a lot of custom things. Seaborn is built on top of Matplotlib.

hollow sentinel
#

ohh

austere swift
#

yeah seaborn is just a wrapper for matplotlib that makes it easier to use and has a lot better looking default themes

hollow sentinel
#

yeah i think i'll be using seaborn more often now

lapis sequoia
#

Any help for me

hollow sentinel
#

what did you ask @lapis sequoia

lapis sequoia
#

If the graphs you want are available in seaborn or plotly then you can just use them. The idea of matplotlib is to let any python programmer build complex graphs.

hollow sentinel
#

i

lapis sequoia
#

@hollow sentinel I have a bunch of dummy variables
I groupedby/summed by a certain column
But the dummy variables got messed up and now show numbers that aren't either 0 or 1.
How can I either fix that or make it where all the dummy variable columns greater than 1 get turned into a 1

hollow sentinel
#

i'm traumatized by plotly

#

choropleth 😦

#

idk i remember with pandas you can conditionally select within the dataframe

#

sorry i'm new to this lmao

lapis sequoia
#

same lol

#

@lapis sequoia I'm not able to understand your problem. But yeah if you just want to make a column with max value 1 then it is possible. You can apply some map or applymap to fix it

hollow sentinel
#

the only thing is that I'm worried I'm not actually learning anything

#

i don't learn from basic udemy videos I learn from projects

#

built different

lapis sequoia
#

@lapis sequoia I created dummy variables for 4 columns

#

Then grouped the rows by a certain column

#

Doing so aggregated all the dummy variables as well, instead of the only column I wanted (as far as I know, there is no way around this)

#

But the dummy variables must be either 0 or 1, some of them have numbers such as 200, 300, 450 etc. So I need all the ones with those numbers to be a 1 so I can perform regression correctly

hollow sentinel
#

you're doing linear regression?

lapis sequoia
#

yeah

hollow sentinel
#

cool I'm still doing data visualization haha

#

noob

lapis sequoia
#

idek how to do that in python lol

hollow sentinel
#

lmao do you want me to email the udemy course notes

lapis sequoia
#

is it complicated lol

#

What are you using the data viz for

hollow sentinel
#

i wanted to do a linear regression on a dataset

#

and it's good to use seaborn for the graph

lapis sequoia
#

Can you not do graphs in statsmodels

#

im using sm

austere swift
#

afaik you can't

hollow sentinel
#

never heard of statsmodels

lapis sequoia
#

didnt know that

hollow sentinel
#

is that another module in python?

grave frost
#

@tall aurora Why do you want to know?

earnest forge
#

I groupedby/summed by a certain column
@lapis sequoia could you provide a bit of your code?

#

guys what do you like using
@hollow sentinel I combine both seaborn and matplotlib. seaborn ain't capable of everything matplotlib can provide you

grave frost
#

@mild topaz If you don't mind me asking, how did you get an Image as base64 string?

hollow sentinel
#

@earnest forge yeah when I look at Kaggle they use both seaborn and matplotlib

#

Kaggle is really good

earnest forge
#

yes

#

if you don't know what to do next - open kaggle 😄

grave frost
#

The only thing I like about Kaggle notebooks is that their kernels are reproducible. Apart from that, Kaggle is just a time-waste

hollow sentinel
#

i think i understand linear regression w two variables but i don't understand multiple linear regression

lapis sequoia
#

@earnest forge Which part of the code do you wat

#

want

#

The groupby code?

hollow sentinel
#

linear regression is just a relationship between two variables right

earnest forge
#

The groupby code?
@lapis sequoia yes

grave frost
#

linear regression is just a relationship between two variables right
@hollow sentinel no

hollow sentinel
#

F

#

then what is it

#

i've watched youtube videos on it

grave frost
#

Why are you doing Linear Regression if YOU don't fully understand it?

hollow sentinel
#

i thought i would pick it up as I go

earnest forge
#

either you did something wrong when grouping or values initially were 'bad'

lapis sequoia
#

df = df.groupby(by='Tool').sum()

grave frost
#

@hollow sentinel Linear regression is just a simple method to find the relationship between data points using (as the name implies) a linear function as the basis of the relationship. If the data does not exhibit a linear relation, then it is a useless method and you are better off using other approaches like polynomial regression, etc.

hollow sentinel
#

"Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables"

#

oh

#

allows you to study the relationship between two variables

grave frost
#

Yeah, that def seems a bit off because that implies that the data points are like coordinates (with x and y value) and you find the linear relationship between those 2 variables, but that's not actually the most fundamental one

earnest forge
#

df = df.groupby(by='Tool').sum()
@lapis sequoia
1st: you better not replace initial dataframe, try to save the result to variable named like df_grouped
2nd: make sure data in Tool is in convenient data type

hollow sentinel
#

thank you @grave frost

earnest forge
#

is it int or object?

hollow sentinel
#

um they were probably trying to simplify it so the layman (me) can understand it

grave frost
#

you can find linear relationship in 3D space too

hollow sentinel
#

what

lapis sequoia
#

@earnest forge I was using new dataframes initially yeah but then I'd have a bunch of cells which was making me confused

#

What do you mean convenient data type?

#

No clue what that is tbh.

#

The tool column is a product serial # if that helps

earnest forge
#

convenient for df.groupby method to work with values

#

in Tool column what type is it?

lapis sequoia
#

I have no idea

#

How do I check that?

#

I'm new to this lol

grave frost
#

@hollow sentinel Imagine the line connecting you to your ceiling fan - that is a line in 3D space. I doubt much data exhibit linear relationship in 3 dimensions, but that doesn't mean it's impossible to do
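
A rough sketch of a linear relationship with more than two variables, which is what multiple linear regression fits. Everything here is made up for illustration: two inputs `x1` and `x2`, a known plane, and some noise; `np.linalg.lstsq` then recovers the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, size=200)
x2 = rng.uniform(0, 10, size=200)

# Made-up ground truth: y = 3*x1 - 2*x2 + 5, plus a little noise.
y = 3.0 * x1 - 2.0 * x2 + 5.0 + rng.normal(scale=0.5, size=200)

# Design matrix: one column per input, plus a column of ones for the intercept.
X = np.column_stack([x1, x2, np.ones_like(x1)])

# Ordinary least squares: find the coefficients minimising the squared error.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b, c = coef

print(f"y = {a:.2f} * x1 + {b:.2f} * x2 + {c:.2f}")
```

With one input this collapses to the usual straight line; every extra input just adds a column to `X`.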

earnest forge
#

you can check it out using df.dtypes

lapis sequoia
#

@hollow sentinel Two distinct but related variables is how I look at it

#

@earnest forge Let me try that thanks

main pelican
#

Why is my SMTP code not working ( the emails are fake but I use real ones for the errors shown below):

File "scratch.py", line 10, in <module>
server.login(sender_email, password)
File "C:\Users\dhruv_\AppData\Local\Programs\Python\Python38-32\lib\smtplib.py", line 734, in login
raise last_exception
File "C:\Users\dhruv_\AppData\Local\Programs\Python\Python38-32\lib\smtplib.py", line 723, in login
(code, resp) = self.auth(
File "C:\Users\dhruv_\AppData\Local\Programs\Python\Python38-32\lib\smtplib.py", line 646, in auth
raise SMTPAuthenticationError(code, resp)
smtplib.SMTPAuthenticationError: (534, b'5.7.9 Application-specific password required. Learn more at\n5.7.9 https://support.google.com/mail/?p=InvalidSecondFactor x23sm2799418pfc.47 - gsmtp')

lapis sequoia
#

@earnest forge Not working

#

Tool column is the only one that isn't showing up

#

Is that because I grouped it already?

#

My dummy variables all say float64

hollow sentinel
#

@main pelican i like your profile pic of Sokka

earnest forge
#

yes. it may be

main pelican
#

@hollow sentinel lol

lapis sequoia
#

let me retry it

#

I'll rename the groupby df

earnest forge
#

reload the data and group it one more time

#

I'll rename the groupby df
@lapis sequoia good

lapis sequoia
#

Says it is an object

#

and my dummy variables are now uint8

#

My dependent variable still says float64

grave frost
#

Is it just me or does anybody else have problems ssh'ing into a google VM instance?

earnest forge
#

Says it is an object
@lapis sequoia can you show df.head() of the data?

lapis sequoia
#

on the same code?

hollow sentinel
#

I didn't know pandas had its own data visualization too

#

that's pretty cool

lapis sequoia
#

told you bro

earnest forge
#

on the same dataframe, yes

hollow sentinel
#

yeah but it looks gross

lapis sequoia
#

I did it

#

Did you want me to show it here you mean

earnest forge
#

yes

lapis sequoia
#

uhh

#

Sure give me a second

#

Need to block some info out

#

Everything beginning with LH is a dummy variable

#

I gave them that prefix cause I was trying to fix the aggregation problem

#

@earnest forge

dusky furnace
#

Hey

#

Does anyone know how to plot a pandas window when you run a file.py in a linux terminal?

earnest forge
#

Oh

#

I got what's wrong

#

You count all values in tool and it exceeds space in the memory

lapis sequoia
#

What do you mean?

#

The group is by the tool column but the sum is for the quantity

#

if that makes sense

earnest forge
#

Oh

#

You need to convert the values in the other columns to int data type. pandas perceives them as object type, that's the reason you get these unexpected results

lapis sequoia
#

So all the dummy variables?

earnest forge
#

yes

lapis sequoia
#

@earnest forge How can I change the dtype

#

The dummy variables are showing as float64

earnest forge
#

check df.dtypes one more time. look at the columns which should be int (if the dtype is float, then leave it as is, no need to change)
after you decide which columns' data type values to change use the following:
df = df.astype({'column_name':'int32'})

lapis sequoia
#

I have 100+ dummy variable columns

#

is there a way to not set them manually one by one lol

#

Why does the code have to sum the dummy variables i

earnest forge
#

oh, then make it all int, except particular columns:

cols = df.columns
df[cols[your_slice]] = df[cols[your_slice]].apply(pd.to_numeric, errors='coerce')

in df[cols[your_slice]] you must specify all columns except those you do not want to convert to numeric type.

For instance, if you want to keep first and fourth columns as they are, you may apply the following slices: df[cols[1:3]] = that code above
df[cols[4:]] = that code above

#

the sum method can't add up values that are not represented as numeric types, so it treats them as strings. in the end, it gives you a weirdly computed result
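
A small sketch of checking and converting dtypes for many columns at once, so nothing has to be set one by one. The frame, the column names and the `LH_` prefix are stand-ins invented for the example.

```python
import pandas as pd

# Made-up frame: the "LH_" columns stand in for the dummy variables.
df = pd.DataFrame({
    "Tool": ["A1", "A1", "B2"],
    "Quantity": [1.5, 2.0, 4.0],
    "LH_red": [1.0, 0.0, 1.0],
    "LH_blue": [0.0, 1.0, 0.0],
})

print(df.dtypes)  # shows what pandas thinks each column is

# Pick the columns by prefix instead of listing them by hand.
dummy_cols = [col for col in df.columns if col.startswith("LH_")]

# astype accepts a {column: dtype} mapping, so build it in one go.
df = df.astype({col: "int32" for col in dummy_cols})

print(df.dtypes)
```

Columns that are not in the mapping (here `Tool` and `Quantity`) keep whatever dtype they already had.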

strong oasis
#

Do most people going into data science have a masters or can you get in if you have a bachelors (physics)? Been studying machine learning lately so I figured I might apply for some jobs.

lapis sequoia
#

@earnest forge Let me try that out thanks

#

@earnest forge That will fix it ?

earnest forge
#

it must fix it

lapis sequoia
#

Okay let me try it

#

@earnest forge Wait, I'm confused sorry. Should the dummy variables be numeric

#

tool (serial number I want to group by), quantity (dependent variable, what I want to sum), dummy variables (independent variables)

#

are my columns

radiant forge
#

heya! I'm trying to extend pyannote to build a fun NLP app for podcasters. anyone familiar with that lib?

#

trying to make sure it can do the thing i think it can do

#

idea being: running the same set of data through a bunch of different ML algos, and having all the results for the same data tagged. once it gets manually okayed by the EU, the data is marked for each ML set to use for more training data.

#

so, a "master" pyannote annotation with: segments to cut up the source audio, speaker, transcription, sentiment, etc. then once they're all corrected, they can then be cut up by the segment defs to feed the various ML algos.

earnest forge
#

tool (serial number I want to group by), quantity (dependent variable, what I want to sum), dummy variables (independent variables)
@lapis sequoia when you group by tool and aggregate with a sum, your grouped dataframe represents the sum of values in the other columns for each Tool value.

shell berry
#

Is there a function to turn a list of labels [cat, dog, dog, rat, rat, rat, cat] into a list of class labels, such as [0, 1, 1, 2, 2, 2, 0]?

#

I can't seem to find it on google so apologies if this is trivial

#

in scikit-learn*

tidal bough
#

hmm, this can be coded manually, but I think scikit-learn has one
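
A minimal sketch, assuming the built-in in question is `sklearn.preprocessing.LabelEncoder` (the chat does not name it). It maps each distinct label to an integer id and can map the ids back again.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["cat", "dog", "dog", "rat", "rat", "rat", "cat"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)  # one integer id per label

print(encoded)
print(encoder.classes_)  # distinct labels in sorted order; position == id

# Going back from ids to the original text labels.
decoded = encoder.inverse_transform(encoded)
```

The ids follow the sorted order of the labels, not the order in which they first appear.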

shell berry
#

Yeah I can use a dict and do it manually but I want to learn the built ins to scikit learn

shell berry
#

Thanks 🙂

#

Follow up question: Why is my naive bayes model working in scikit learn with just plain text labels for the classes?

#

"dog", "cat", etc. Don't they have to be in an int/vector representation?

tidal bough
#

You might be using a high-level enough feature that it handles all the encoding and prediction for you.

shell berry
#

Oh weird, thanks

lapis sequoia
#

@earnest forge Ya so it would aggregate all of them regardless

#

So which columns am I changing to numeric

#

All of them?

#

@earnest forge Code you gave me isnt working bro

#

syntax error

hollow sentinel
#

F

lapis sequoia
#

dummy variables are your qualitative variables turned to numbers. if you have 1 qualitative variables with multiple categories (for stock markets, Industry could be a dummy variable). lets say industry can be either financial, tech, industrials. You will have 3-1 dummy variables.
@glad mulch I know what a dummy variable is lmao I have that all set up, I was just asking about the data type for it inside pandas

velvet thorn
#

@earnest forge Wait, I'm confused sorry. Should the dummy variables be numeric
@lapis sequoia what other data type would you use?

lapis sequoia
#

They’re float run

#

Rn

#

His Code didn’t work anyways

velvet thorn
#

They’re float run
@lapis sequoia generally some integer type is appropriate, but honestly it doesn't really matter

lapis sequoia
#

Yeah I feel you

#

@velvet thorn Any insight as to why his code didn't work?

velvet thorn
#

@velvet thorn Any insight as to why his code didn't work?
@lapis sequoia honestly I only skimmed the discussion

#

but if you still need help maybe you can summarise the problem?

lapis sequoia
#

@velvet thorn I have like 90+ dummy variables in my data that I created using pandas. I grouped my data using a product serial # to sum the quantity of hours. Doing this also aggregated the dummy variables , so they show numbers like 500, 294, 348, etc etc instead of just the 0 or 1 like they are supposed to

#

So I am trying to find a way to either fix this or to find a way to just make all the ones > 0 turn to 1

velvet thorn
#

@velvet thorn I have like 90+ dummy variables in my data that I created using pandas. I grouped my data using a product serial # to sum the quantity of hours. Doing this also aggregated the dummy variables , so they show numbers like 500, 294, 348, etc etc instead of just the 0 or 1 like they are supposed to
@lapis sequoia how can you identify the dummy variable columns?

lapis sequoia
#

What do you mean? They all have names

#

And I put a prefix to all of them

#

Cause I was trying to see if I can apply the >0 make it a 1 thing but couldnt figure it out

velvet thorn
#

What do you mean? They all have names
@lapis sequoia like what's the filter you can apply on them

#

okay, I think you said they all start with LH, right?

lapis sequoia
#

yeah that's the prefix I gave them

#

someone said I should give them a common prefix to be able to edit them all at once or something

velvet thorn
#
dummy_cols = [col for col in df.columns if col.startswith('LH')]

df[dummy_cols] = df[dummy_cols].clip(0, 1)
#

should work
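
For anyone following along, a small end-to-end sketch of that fix on made-up data. The `Tool`, `Quantity` and `LH_` names mirror the discussion, but the values are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "Tool": ["A1", "A1", "A1", "B2"],
    "Quantity": [1.5, 2.0, 0.5, 4.0],
    "LH_red": [1, 1, 0, 0],
    "LH_blue": [0, 0, 0, 1],
})

# Summing per tool also sums the dummies, so LH_red becomes 2 for A1.
grouped = df.groupby(by="Tool").sum()

# Clip only the dummy columns back into the 0..1 range.
dummy_cols = [col for col in grouped.columns if col.startswith("LH")]
grouped[dummy_cols] = grouped[dummy_cols].clip(0, 1)

print(grouped)
```

`Quantity` is left alone because it is not in `dummy_cols`, so it still holds the per-tool total.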

lapis sequoia
#

I tried something similar to that and it didn't work, let me try yours I probably had my code fucked up lol

#

@velvet thorn That worked. You're a lifesaver

#

Thank you so much

velvet thorn
#

yw!

lapis sequoia
#

Doing it that way by the replacing doesn't mess up any regression results right?

#

I'd assume not but just making sure ofc

velvet thorn
#

what do you mean?

lapis sequoia
#

Like it will still see it as a regular dummy variable

velvet thorn
#

like does it affect the validity of a regression fit on this?

#

well

lapis sequoia
#

yeah

velvet thorn
#

long story short, yes

lapis sequoia
#

oof

#

how so?

velvet thorn
#

I mean, not in a bad way

#

in the sense that each dummy variable now represents "for this group of results (since you said they're aggregated, right), is <condition> true for at least one of the source rows"

#

when originally it meant "how many source rows was <condition> true for"

#

you get what I mean?

#

that's the effect of the clipping, right
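
A sketch of an alternative that was not used in the chat: give each column its own aggregation, so the hours are summed while each dummy takes the max over its source rows. On 0/1 data the max is 1 exactly when the condition held for at least one row, which is the same meaning the clipped sum has. Data and names are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Tool": ["A1", "A1", "A1", "B2"],
    "Quantity": [1.5, 2.0, 0.5, 4.0],
    "LH_red": [1, 1, 0, 0],
    "LH_blue": [0, 0, 0, 1],
})

dummy_cols = [col for col in df.columns if col.startswith("LH")]

# "sum" for the dependent variable, "max" for every dummy column.
agg_spec = {"Quantity": "sum", **{col: "max" for col in dummy_cols}}
grouped = df.groupby("Tool").agg(agg_spec)

print(grouped)
```

If the count of matching rows matters for the model, keeping the plain sum for the dummies is the other option; which one is right depends on the question being asked.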

lapis sequoia
#

Kinda

#

My adj r-squared got 0.873

velvet thorn
#

so if that makes sense for your problem

#

that's fine

lapis sequoia
#

Which is good

#

but

velvet thorn
#

adj = adjusted?

lapis sequoia
#

yeah

#

since there are multiple independent variables gotta use adj.

#

the jarque-bera is 25541 lol

#

hmm

velvet thorn
#

the jarque-bera is 25541 lol
@lapis sequoia why does this matter?

lapis sequoia
#

its a goodness of fit test to a normal distribution

#

so shouldn't it be close to 0

velvet thorn
#

why do you think so?

lapis sequoia
#

isnt that how the test works?

velvet thorn
#

I mean, yes

#

but what are you running the test on

#

and why do you think the data must be normally distributed?

lapis sequoia
#

when I did the regression without fixing the dummy variable 0 or 1s I got an adj r square of 0.996 and jarque bera of like 1350

#

I don't think it must be, just seems high

velvet thorn
#

presumably

lapis sequoia
#

@velvet thorn my dependent variable is labor hours. independent variables are product, product config, customer, and build type

#

trying to model our labor hours and DL costs

velvet thorn
#

nothing wrong with non-normality though

lapis sequoia
#

to help the ops guys get a better target

#

oh also

#

going back to your source row thing

#

the reason I grouped them is because the data is set up in the way that each row is labor hours being charged to a certain assembly process

#

but I wanted the total hours for the corresponding product they all went to

#

unless I misunderstood you

velvet thorn
#

sure, that makes sense

#

what do the dummy variables represent then?

lapis sequoia
#

my independent variables which are all non-numeric values

#

so the product, configuration, customer, and build type

#

certain products and customers for example drive the labor hours more

#

Non-numeric**

#

@velvet thorn Do you know if its possible to see which column is driving it more than others

#

Or are you not familiar with statsmodels

glossy osprey
#

Hi everyone. Nice to meet you!

#

So, i'm doing a project at my college and i need data about social inequality. Is there any data about it?

weary heart
#

hi, i'm new to machine learning, i'm curious .. how do you know if the model is overfitting or underfitting? is it through the test and train results? and if so how do you find the test and train results? f1 score ? or something else? thanks

austere swift
#

@weary heart yeah mainly its through the training and testing accuracy, if the training accuracy is high but the testing accuracy is low that's overfitting and if the training accuracy is low and the testing accuracy is low too its underfitting
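
A minimal sketch of that comparison. The dataset (scikit-learn's built-in breast cancer data) and the model (an unconstrained decision tree) are assumptions picked only because a deep tree tends to memorise its training set.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

x, y = load_breast_cancer(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# No depth limit, so the tree is free to fit the training data very closely.
clf = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)

# score() returns accuracy for classifiers.
train_acc = clf.score(x_train, y_train)
test_acc = clf.score(x_test, y_test)

print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

A large gap between the two numbers points towards overfitting, and two low numbers point towards underfitting. On imbalanced data the same comparison can be made with F1 from `sklearn.metrics.f1_score` instead of accuracy.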

narrow flume
#

Hey guys
is there a way to have [(1, 1, 1, 1, 1) (1, 0, 0, 0, 1) (1, 0, 0, 0, 1) (1, 0, 0, 0, 1) (1, 1, 1, 1, 1)] in one line?
here's my code
a = np.ones((5,1), dtype=[('a', 'i4'), ('b', 'i4'),('c', 'i4'),('d', 'i4'),('e', 'i4')])
print(a)

#

it's numpy array

#

in python btw

weary heart
#

ah okay, so if i use SMOTE and i got this result
how do you know if it's overfitting , normal, or underfitting?


```
              precision    recall  f1-score   support

           0       0.97      0.70      0.81     66699
           1       0.28      0.83      0.42      9523

    accuracy                           0.71     76222
   macro avg       0.62      0.76      0.62     76222
weighted avg       0.88      0.71      0.76     76222
```
hollow sentinel
#
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
realEstate = pd.read_csv("realEstate.csv")
realEstate.head(5)
sns.pairplot("realEstate")
#

I'm getting an error saying TypeError: 'data' must be pandas DataFrame object, not: <class 'str'>

austere swift
#

dont put quotes around it

hollow sentinel
#

i am a noob sorry

#

lmao

austere swift
#

lol

hollow sentinel
#

it still doesn't show anything

#

I took out the quotes

heady hatch
#

Hey people, learning how to work with images today.

How do I convert images in numpy to binary string?

from a quick search I was able to get to

array.tobytes() #or array.tostring()

but

np.fromstring(array.tobytes())

doesn't give me the original numpy array back.

Any suggestions? or thoughts on what I'm doing wrong?

hollow sentinel
#

idk what's going on

#

have you guys ever seen that next to a jupyter notebook

#

does that mean it's loading?

heady hatch
#

It means it's running.

hollow sentinel
#

oh it's probably bc it's a gigantic dataset

#

should i pick a smaller one i don't wanna deal with this

#

I can't find good numbery datasets everything I find on Kaggle is words

heady hatch
#

When you say numbery datasets do you mean tabular (in table formats)?

hollow sentinel
#

no i mean under each column it's a number not a word

#

like if the columns were price, age, weight, gender, height

heady hatch
#

so like in a table format?

hollow sentinel
#

if that's what it's called yes

heady hatch
#

where each row is a record and column are features?

hollow sentinel
#

yes

#

also the reason why it was taking so long to load was bc the dataset shape was 511, 14

heady hatch
#

I think there are quite a few of those on Kaggle. Here's one famous one.

hollow sentinel
#

is there any way you can pick a smaller sized data set on kaggle

heady hatch
#

You can set a filter.

#

Or you can just grab a subsample of the dataset.

#

Where you only grab a certain number of rows.

#

Let's say you have 511 rows, you only grab 100 of those.
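
A quick sketch of grabbing a subsample with pandas. The frame below is a made-up stand-in with the same 511 x 14 shape mentioned above.

```python
import numpy as np
import pandas as pd

# Stand-in for a dataset with 511 rows and 14 columns.
df = pd.DataFrame(np.arange(511 * 14).reshape(511, 14))

first_100 = df.head(100)                       # the first 100 rows
random_100 = df.sample(n=100, random_state=0)  # 100 rows picked at random

print(first_100.shape, random_100.shape)
```

When reading from a file, `pd.read_csv(path, nrows=100)` only loads the first 100 rows in the first place.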

hollow sentinel
#

idk how to do that yet lmao

#

I'll see what I can do

heady hatch
#

hahaha

#

Good luck.

hollow sentinel
#

oh man I need luck to select a couple rows?

#

F

#

that or I might move to another dataset

#

anyways it's also 11 at night so like bedtime

#

gn guys

sacred timber
#

Noob here... for the life of me I can not find out how to get the output plot from sklearn's metrics.plot_confusion_matrix into a tkinter gui - can anyone point me to a reference?

earnest blade
#

has anyone published a research paper in ML here

#

I need some help

#

ny1?

merry ridge
#

I work for a discrete mathematics journal, but my area of research is in math finance and it only has a minor intersection with ML

bitter harbor
#

got a link for that?

lapis sequoia
#

Anyone know any good code/video examples of rainbow deep q learning?

#

There seems to be quite little information on this for whatever reason, it seems like it should be pretty popular

bitter harbor
#

is it not rainbow deep reinforcement learning?

lapis sequoia
#

Yeah I might've got the name wrong

bitter harbor
#

that'd do it lol

lapis sequoia
#

It's using deep q networks tho right

#

But most of the result I found are just people reading the paper

heady hatch
#

Hey guys I have a quick question. I'm using TFRecord to store my numpy array in bytes, then reading the tfrecord.

But after I parse the tfrecord and convert it back to numpy array, the values aren't the same.

ie.

It's an image.
-> Image in numpy array
-> Convert numpy to bytes
-> tfrecord features
-> read tfrecord
-> Turn into tf dataset
-> convert bytes back to numpy

When I took a look at the image, the values has negatives in them. Any advice?

I've also made sure to convert it back from bytes using the original dtype.

hasty grail
#

Can you provide your code?

bitter harbor
#

@lapis sequoia I could be wrong I'm a bit rusty on this but the difference between deep q/deep reinforcement learning is that q learning doesn't use transition probability distribution (or the reward function) associated with the MDP

#

q learning is considered a model-free reinforcement learning algorithm
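
To make "model-free" concrete, here is a sketch of a single tabular Q-learning update. This is the textbook rule, not Rainbow itself (which builds on deep Q-networks); the table size, `alpha` and `gamma` values are arbitrary picks for the example.

```python
import numpy as np


def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.99):
    """Apply one Q-learning step using only a sampled transition."""
    # Best value reachable from the next state, according to the current table.
    td_target = reward + gamma * np.max(q_table[next_state])
    # Move the current estimate a fraction alpha towards that target.
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table


q = np.zeros((3, 2))  # 3 states x 2 actions, purely hypothetical
q = q_learning_update(q, state=0, action=1, reward=1.0, next_state=2)
print(q)
```

The update only needs the observed `(state, action, reward, next_state)` tuple, never the transition probabilities of the MDP.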

heady hatch
#
def convert_to_example(image: Dict) -> tf.train.Example:
    """Convert Image to TFRecord ready format"""
    feature = {
        'height': _int64_feature(32),
        'width': _int64_feature(32),
        'channels': _int64_feature(3),
        'label': _int64_feature(image['label']),
        'filename': _bytes_feature(image['filename']),
        'image_raw': _bytes_feature(image['data'].tobytes()),
    }

    return tf.train.Example(features=tf.train.Features(feature=feature))

train_record_file = 'train.tfrecords'

with tf.io.TFRecordWriter(train_record_file) as writer:
    for image in tqdm(train_data):
        tf_example = convert_to_example(image)
        writer.write(tf_example.SerializeToString())

raw_train_dataset = tf.data.TFRecordDataset('train.tfrecords')

I broke it apart into two parts, one to write into TFRecord, one to read from it.

def parse_image_function(ex_proto):
    
    image_feature_desc = {
        'height': tf.io.FixedLenFeature([], tf.int64),
        'width': tf.io.FixedLenFeature([], tf.int64),
        'channels': tf.io.FixedLenFeature([], tf.int64),
        'label': tf.io.FixedLenFeature([], tf.int64),
        'filename': tf.io.FixedLenFeature([], tf.string),
        'image_raw': tf.io.FixedLenFeature([], tf.string),
    }
    example = tf.io.parse_single_example(ex_proto, image_feature_desc)
    
    img_raw = example['image_raw']
    
    return img_raw

for img in raw_train_dataset.map(parse_image_function).take(1):
    print(tf.io.decode_raw(img, np.int8))
#

Please let me know if you need more information.

#

I am trolling. @hasty grail

Thank you so much for your help.

I accidentally converted it into np.int8 instead of np.uint8.
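
A tiny NumPy-only sketch of the same mistake, for anyone who hits it later. The pixel values are made up; the point is that the bytes carry no dtype, so the reader has to supply the one that was used when writing.

```python
import numpy as np

image = np.array([[0, 127, 128, 255]], dtype=np.uint8)
raw = image.tobytes()  # plain bytes, the dtype and shape are not stored

# Wrong dtype: anything above 127 wraps around to a negative number.
wrong = np.frombuffer(raw, dtype=np.int8)

# Right dtype, then restore the shape that was lost in tobytes().
right = np.frombuffer(raw, dtype=np.uint8).reshape(image.shape)

print(wrong)
print(right)
```

`np.frombuffer` defaults to a float dtype when none is given, so leaving the dtype out is another way to fail to get the original array back.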

mild topaz
#

If you don't mind me asking, how did you get an Image as base64 string?
@grave frost i am getting a base64 string which i have to decode to make an image from it

hasty grail
#

I accidentally converted it into np.int8 instead of np.uint8.
Problem solved I guess xD

mild topaz
#

i am not able to resize image to desired pixels i want

#

@hasty grail sorry to ping u , can u plz look into it ?

#

@ripe crane hello

hasty grail
#

Have you done what I asked yesterday?

mild topaz
#

about what bro ?

hasty grail
#

I think that you should take some time to brush up on Python basics

mild topaz
#

sure bro, but right now i need to finish this bro , i want to submit this project

#

as soon as i resize my image then further code i know how to deal with it

#

i need a small help in resizing an image

#

i am decoding an base64 string which creates image from it

hasty grail
#

Do what I have asked first, it will save you a lot of time with the remaining part

mild topaz
#

but not in desired pixels

hasty grail
#

Especially the part about functions

mild topaz
#

i agree with u bro , but plz try to understand i need to finish this asap

#

at least can u look in this why image is not getting resized

hasty grail
#

which line are you at right now?

mild topaz
#

line 174 @hasty grail

#
```
im <PIL.Image.Image image mode=RGB size=200x99 at 0x24D0005C248>
done
wrong here1
```
hasty grail
#

from your understanding of Python, what would cause the statement at line 174 to be executed?

mild topaz
#

wait , i need to comment that part of code from 160to 174

#

bcoz i am again reopening file image file

#

correct @hasty grail ?

hasty grail
#

yeah you don't need that code

mild topaz
#

now see it has created an image but not in correct pixels i want @hasty grail

hasty grail
#

can you display the problem?

mild topaz
hasty grail
#

where is the code for saving the image?

#

which variable are you passing into the save function?

#

check carefully

mild topaz
#

is this ```python
with open("imageToSave.jpg", "wb") as test_img:
    test_img.write(image_data)
try:
    test_img = image.load_img("imageToSave.jpg", target_size=(200, 99))
except OSError:
    logger.debug({"Status": "failed",
                  "message": "provide valid base64 string"})

    return ({"Status": "failed",
             "message": "provide valid base64 string"})```  @hasty grail
hasty grail
#

Can you identify what data are you writing to the file?

mild topaz
#

image_data i guess ? @hasty grail

hasty grail
#

ok so what is image_data?

#

is it the resized image?

mild topaz
#

no

is it the resized image?
@hasty grail

hasty grail
#

well there's your problem

#

fix it so that you're actually passing in the resized image

mild topaz
#

fix it so that you're actually passing in the resized image
@hasty grail means bro ?

hasty grail
#

instead of image_data (the original image) you need to give the function the data that corresponds to the resized image

mild topaz
#

u mean (self, image_data ) this

#

def resize_im ? @hasty grail

hasty grail
#

you need to write the resized image to the file

#

not the original image

#

you are currently writing image_data (the original image) to the file, of course the image size is unchanged

mild topaz
#

you need to write the resized image to the file
@hasty grail means how way u are saying here bro ?

hasty grail
#

it means what I said, I don't know how to simplify that

mild topaz
#

ok , can u show in code how way u are saying . so i can get clear idea what u are saying ? @hasty grail

hasty grail
#
with open("output.jpg", "wb") as f:
    # Don't do this
    f.write(incorrect_image)

    # Do this
    f.write(correct_image)
mild topaz
#

so in my case ```python
with open("output.jpg", "wb") as f:
    # Don't do this
    f.write(image_data)

    # Do this
    f.write(im)``` is this correct ? @hasty grail
hasty grail
#

yes

mild topaz
#

@hasty grail

hasty grail
#

what data type is im?

mild topaz
#

<class 'PIL.Image.Image'> @hasty grail

hasty grail
#

shouldn't you be using im.save instead of file.write then?

#

that's what I gathered from the documentation of PIL
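
A rough sketch of the whole flow with Pillow: decode the base64 string, resize, and save the resized image object. The payload is generated in the script only so the example runs on its own, and the output path in the temp directory is an assumption.

```python
import base64
import io
import os
import tempfile

from PIL import Image

# Stand-in for the incoming payload: a small JPEG encoded as base64.
buffer = io.BytesIO()
Image.new("RGB", (400, 300), color=(200, 30, 30)).save(buffer, format="JPEG")
b64_string = base64.b64encode(buffer.getvalue()).decode("ascii")

# base64 text -> raw bytes -> PIL image.
image_data = base64.b64decode(b64_string)
im = Image.open(io.BytesIO(image_data))

# resize() returns a NEW image; the size is given as (width, height).
resized = im.resize((200, 99))

# Save the resized object itself; Image.save opens the file on its own.
out_path = os.path.join(tempfile.gettempdir(), "imageToSave.jpg")
resized.save(out_path)

print(resized.size)
```

Writing `image_data` to the file would store the original bytes again, which is why the size never changed.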

mild topaz
#

on which line bro ?

shouldn't you be using im.save instead of file.write then?
@hasty grail

hasty grail
#

on the line where you write to the file

mild topaz
#

with open("imageToSave.jpg", "wb") as test_img: test_img.write(im) @hasty grail here u mean ?

hasty grail
#

yes

mild topaz
#

with open("imageToSave.jpg", "wb") as test_img: im.save(im) @hasty grail this way ?

hasty grail
#

read the documentation of PIL to see how to use Image.save

mild topaz
#
        with open("imageToSave.jpg", "wb") as test_img:
            test_img.write("im.jpg")``` @hasty grail
hasty grail
#

no

#
with open("imageToSave.jpg", "wb") as test_img:

What does this line do?

mild topaz
#
        with open("imageToSave.jpg", "wb") as test_img:
            im.save("im.jpg")``` @hasty grail
hasty grail
#

you didn't answer my question

mild topaz
#

opens an image file

#

@hasty grail

hasty grail
#

why do you have to open an image file when you are saving to a different file?

#

look at the example they have given

#

do you need to use open at all?

mild topaz
#

no

#

@hasty grail

hasty grail
#

then delete it

mild topaz
#

open ?

then delete it
@hasty grail

hasty grail
#

yes

mild topaz
#

see i am using this code
with ("im.jpg", "wb") as test_img:
    im.save("im.jpg")
@hasty grail

#

image not created

hasty grail
#

do you know what the with statement even does?

#

(if you don't please review your Python basics)

mild topaz
#

sure bro , but at this moment i am really messed up with different things also

#

@hasty grail can u plz help in this ?

#

as soon the resized image creates i know how to deal with it

hasty grail
#

no, you have to understand what it means, it's so basic

mild topaz
#

yes i can understand bro

#

but right now i am messed up with different things bro ? plz

#

just help me to solve this issue @hasty grail

#

lets finish this issue now only

#

are u there bro ? @hasty grail

hasty grail
#

Sorry, I won't finish your code for you, you have to demonstrate your understanding first

mild topaz
#

i know bro, can u help in this issue @hasty grail ?

#

so i can go further and try to solve issues by myself @hasty grail

hasty grail
#

If you can answer me what the line with ("im.jpg", "wb") as test_img: is supposed to do, then sure

mild topaz
#

with makes code compact @hasty grail

hasty grail
#

what about the line as a whole though?

mild topaz
#

it takes img.jpg and in write mode @hasty grail

hasty grail
#

is it needed in this case?

mild topaz
#

no, i guess @hasty grail

#

i am correct ? @hasty grail

hasty grail
#

mhm

mild topaz
#

@hasty grail hello

hasty grail
#

yes

mild topaz
#

can u plz help in this ?

#

@hasty grail lets finish this bro?

hasty grail
#

if it's not needed, what do you do with that line?

#

(I mean you can use your own common sense)

mild topaz
#

so how i can make changes here then ,? should i remove it? @hasty grail

hasty grail
#

(I mean you can use your own common sense)

mild topaz
#

can u be more specific here bro plz @hasty grail

hasty grail
#

You can come up with the answer by yourself

#

This is such a simple question

mild topaz
#

so i need to remove this line of code , correct? @hasty grail

hasty grail
#

you can judge that for yourself

#

I don't think I have to answer that question since it's really obvious

slow adder
#

when the textbook says 'open terminal', does it mean cmd or python shell?

mild topaz
#

@hasty grail 😞 bro plz , i got confused here , lets finish this ?

hasty grail
#

when the textbook says 'open terminal', does it mean cmd or python shell?
Usually that can be inferred from the context

#

bro plz , i got confused here , lets finish this ?
Just delete that line

#

You shouldn't have to ask for help for every single thing you do

mild topaz
hasty grail
#

you deleted im.save as well

#

of course it's not saving the file

mild topaz
#

ok then how it should be ? @hasty grail

hasty grail
#

undelete im.save

mild topaz
#

ok then ? @hasty grail

hasty grail
#

test the code?

mild topaz
#

yes worked

#

it has created image to desired size

#

@hasty grail
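
For reference, a minimal sketch of the fix this exchange arrived at, assuming Pillow is installed. A `PIL.Image.Image` object is not raw bytes, so `file.write(im)` cannot work; `Image.save` opens, encodes, and closes the output file on its own, which is why the `with open(...)` line was unnecessary. The blank stand-in image, the target size `(200, 99)`, and the temporary output path are assumptions for illustration, not values taken from the original code.

```python
import os
import tempfile

from PIL import Image

# Stand-in for the decoded upload (hypothetical; the real code decodes base64).
original = Image.new("RGB", (640, 480), "white")

# resize() returns a NEW image; the original object is left unchanged.
resized = original.resize((200, 99))  # size is (width, height)

# Save the resized image directly. No open()/with block is needed.
out_path = os.path.join(tempfile.mkdtemp(), "resized.jpg")
resized.save(out_path)

# Re-open the file to confirm what was actually written to disk.
with Image.open(out_path) as check:
    saved_size = check.size
```

Writing `original` instead of `resized` here would reproduce the earlier symptom: a file is created, but at the old dimensions.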

hasty grail
#

ok good

#

is that all?

mild topaz
#

no wait see this
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\demo3\findDocumentType1.py", line 126, in post
    self.resize_im(image_data)
  File "E:\demo3\findDocumentType1.py", line 202, in resize_im
    predictions = model.predict(samples_to_predict)
NameError: name 'model' is not defined
@hasty grail

hasty grail
#

The error literally tells you what is wrong, please tell me you can fix this by yourself

clear sail
#

Hi

mild topaz
#

line 119 i have defined it @hasty grail

hasty grail
#

you only defined it in the post function

#

not resize_im
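
A small sketch of the scoping issue described here, using only the standard library. A name assigned inside `post` is local to `post`, so `resize_im` cannot see it. One possible arrangement is a cached loader that every method can call, so the expensive load runs once. `load_model_stub`, `get_model`, and the class name are hypothetical stand-ins, not the real Keras or Flask-RESTful objects.

```python
from functools import lru_cache


def load_model_stub(path):
    """Hypothetical stand-in for an expensive model load."""
    load_model_stub.calls += 1
    # The fake "model" just returns the length of each sample.
    return lambda samples: [len(s) for s in samples]


load_model_stub.calls = 0


@lru_cache(maxsize=1)
def get_model():
    # Runs the load on the first call only, then reuses the cached result.
    return load_model_stub("united_kingdom_50.h5")


class FindDocumentType:
    def post(self, image_data):
        return self.resize_im(image_data)

    def resize_im(self, image_data):
        # A variable assigned inside post() would not be visible here,
        # so the model is fetched through the shared loader instead.
        model = get_model()
        return model([image_data])


first = FindDocumentType().post("abcd")
second = FindDocumentType().post("xy")
```

Loading the model inside `resize_im` also removes the `NameError`, but it repeats the load on every request, which is usually slow.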

mild topaz
#

ok so i have changed to this
def resize_im(self, image_data):
    print("test_img1")
    model = load_model(pathlib.Path('E:/', 'demo3', 'united_kingdom_50.h5'))
@hasty grail

#

now that error is no more

#
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\demo3\findDocumentType1.py", line 126, in post
    self.resize_im(image_data)
  File "E:\demo3\findDocumentType1.py", line 202, in resize_im
    predictions = model.predict(samples_to_predict)
  File "C:\Users\Admin\anaconda3\lib\site-packages\keras\engine\training.py", line 1441, in predict
    x, _, _ = self._standardize_user_data(x)
  File "C:\Users\Admin\anaconda3\lib\site-packages\keras\engine\training.py", line 579, in _standardize_user_data
    exception_prefix='input')
  File "C:\Users\Admin\anaconda3\lib\site-packages\keras\engine\training_utils.py", line 145, in standardize_input_data
    str(data_shape))
ValueError: Error when checking input: expected conv2d_1_input to have shape (99, 200, 1) but got array with shape (200, 99, 3)```
#

@hasty grail

twilit wind
#

Have you checked the shape before input

mild topaz
#

which shape @twilit wind

twilit wind
#

the shape of your input

#

like before input to the conv layer you need to flatten it or do some resizing
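
A hedged sketch of one way to get from an RGB image to the shape in the error message, assuming Pillow and NumPy are available. The model expects `(99, 200, 1)`: height 99, width 200, one channel. Pillow's `resize` takes `(width, height)` while the resulting array is `(height, width)`, which is a common source of swapped dimensions like `(200, 99, 3)`. The function name and the blank test image are illustrative only; any scaling or normalization must match whatever the model was trained with, which the chat does not show.

```python
import numpy as np
from PIL import Image


def to_model_input(im, width=200, height=99):
    """Convert a PIL image to a batch of one grayscale array."""
    gray = im.convert("L")                    # 3 channels -> 1 channel
    gray = gray.resize((width, height))       # Pillow order: (width, height)
    arr = np.asarray(gray, dtype="float32")   # array shape: (height, width)
    arr = arr[..., np.newaxis]                # -> (height, width, 1)
    return arr[np.newaxis, ...]               # -> (1, height, width, 1)


sample = Image.new("RGB", (640, 480), "white")
batch = to_model_input(sample)
```

Printing `batch.shape` just before the `predict` call is a quick way to confirm the input matches what the first layer expects.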

uneven wind
#

Thanks @lapis sequoia , It turns out that I was not accessing the data properly. It works fine now 🙂

mild topaz
#
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\demo3\findDocumentType1.py", line 126, in post
    self.resize_im(image_data)
  File "E:\demo3\findDocumentType1.py", line 219, in resize_im
    img = preprocessing(img)
  File "E:\demo3\findDocumentType1.py", line 215, in preprocessing
    img = grayscale(img)
  File "E:\demo3\findDocumentType1.py", line 207, in grayscale
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cv2.error: OpenCV(4.2.0) c:\projects\opencv-python\opencv\modules\imgproc\src\color.simd_helpers.hpp:94: error: (-2:Unspecified error) in function '__cdecl cv::impl::`anonymous-namespace'::CvtHelper<struct cv::impl::`anonymous namespace'::Set<3,4,-1>,struct cv::impl::A0xe227985e::Set<1,-1,-1>,struct cv::impl::A0xe227985e::Set<0,2,5>,2>::CvtHelper(const class cv::_InputArray &,const class cv::_OutputArray &,int)'
> Unsupported depth of input image:
>     'VDepth::contains(depth)'
> where
>     'depth' is 6 (CV_64F)
``` @twilit wind @hasty grail
twilit wind
#

can you share the code

mild topaz
verbal sand
#

I've some text that contain fraction in text - "one-third", "one-half"......
How do I convert these into their relevant fractions? 1/3, 1/2 etc...

velvet thorn
#

I've some text that contain fraction in text - "one-third", "one-half"......
How do I convert these into their relevant fractions? 1/3, 1/2 etc...
@verbal sand how many unique fractions do you have

verbal sand
#

It can be any.... this is contained in a text sentence like - "Take one-half of the tablet daily".
Doctor's prescription data.

mild topaz
#

@twilit wind do u get my code?

twilit wind
#

yes I am having a look

#

by the way what is the code about

mild topaz
#

it is for prediction @twilit wind

velvet thorn
#

It can be any.... this is contained in a text sentence like - "Take one-half of the tablet daily".
Doctor's prescription data.
@verbal sand create a mapping of fractions to numbers

#

and apply it

mild topaz
#

plz check

twilit wind
#

do you have any other code on app.py @mild topaz

mild topaz
#
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\demo3\findDocumentType1.py", line 126, in post
    self.resize_im(image_data)
  File "E:\demo3\findDocumentType1.py", line 219, in resize_im
    im = preprocessing(im)
  File "E:\demo3\findDocumentType1.py", line 215, in preprocessing
    im = grayscale(im)
  File "E:\demo3\findDocumentType1.py", line 207, in grayscale
    im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
TypeError: Expected Ptr<cv::UMat> for argument 'src'
twilit wind
#

It says a type error

mild topaz
#

It says a type error
@twilit wind yes

#

how i can fix this? @twilit wind

twilit wind
#

The code is huge it will take time

#

some typo there I think

mild topaz
#

some typo there I think
@twilit wind means bro?

twilit wind
#

Bro the code seems ok

#

Do you have any other file where you are running the code @mild topaz

#

any file for flask

mild topaz
#

i have a model file @twilit wind

twilit wind
#

?

#

@mild topaz

mild topaz
#

no @twilit wind

twilit wind
#

You are predicting the country name by its image I guess @mild topaz

mild topaz
#

yes @twilit wind

twilit wind
#

I will let you know if I find any, Now I am not able to find any @mild topaz

#

srry

mild topaz
#

ok np

verbal sand
#

@velvet thorn isn't there any library?

For the string "one-third" - I thought of mapping one with 1 and third with 3 and it becomes 1-3. How do I give it the meaning that the hyphen (-) in "1-3" should be considered as a division and not like "one-three days"?

velvet thorn
#

@velvet thorn isn't there any library?

For the string "one-third" - I thought of mapping one with 1 and third with 3 and it becomes 1-3. How do I give it the meaning that the hyphen (-) in "1-3" should be considered as a division and not like "one-three days"?
@verbal sand beats me

#

what do you mean?

#

like do you want to convert it into a number?

#

I suggest a regex

verbal sand
#

I mean that since the text is a doctor's prescription, there can be texts like "one-third of tablet", "one-three days".
The first one means 1/3 of the tablet while the other means 1 to 3 days.
If I map one with 1 and third with 3 and three with 3 then after replacing with their corresponding texts, it becomes "1-3 of tablet" and "1-3 days". Now, how do I distinguish whether the 3 in both sentences is to be understood as dividing the 1 or just the upper range(1 to 3 days of range).

#

@velvet thorn

what do you mean?
@velvet thorn

#

yes I do want to convert it into number. Later the amount of medicine can be converted into some fractional value. I wanted that to know how much dose a patient takes.
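
One way to sketch the mapping-plus-regex idea suggested above, using only the standard library. The trick is to decide before replacing: an ordinal-style second word ("third", "half") marks a denominator, while a plain cardinal ("three") marks the end of a range. The word lists are deliberately tiny and hypothetical; real prescription text would need a fuller vocabulary and more patterns.

```python
import re
from fractions import Fraction

# Cardinal words: valid as a numerator or as either end of a range.
CARDINALS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}

# Ordinal-style words: only valid as the denominator of a fraction.
DENOMINATORS = {
    "half": 2, "halves": 2,
    "third": 3, "thirds": 3,
    "quarter": 4, "quarters": 4,
    "fourth": 4, "fourths": 4,
    "fifth": 5, "fifths": 5,
}

HYPHENATED = re.compile(r"\b([a-z]+)-([a-z]+)\b", re.IGNORECASE)


def replace_number_words(text):
    """Rewrite 'one-third' as '1/3' and 'one-three' as '1 to 3'."""
    def convert(match):
        first = match.group(1).lower()
        second = match.group(2).lower()
        if first in CARDINALS and second in DENOMINATORS:
            return str(Fraction(CARDINALS[first], DENOMINATORS[second]))
        if first in CARDINALS and second in CARDINALS:
            return f"{CARDINALS[first]} to {CARDINALS[second]}"
        return match.group(0)  # leave other hyphenated words untouched

    return HYPHENATED.sub(convert, text)


dose = replace_number_words("Take one-half of the tablet daily")
span = replace_number_words("one-three days")
```

Because the decision uses the original words rather than the digits, the ambiguity of "1-3" never arises. `Fraction` also keeps the value exact if the dose needs arithmetic later.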

mild topaz
#

updated code https://paste.pythondiscord.com/olisidijub.py and my error
Traceback (most recent call last):
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 468, in wrapper
    resp = resource(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask\views.py", line 89, in view
    return self.dispatch_request(*args, **kwargs)
  File "C:\Users\Admin\anaconda3\lib\site-packages\flask_restful\__init__.py", line 583, in dispatch_request
    resp = meth(*args, **kwargs)
  File "E:\demo3\findDocumentType1.py", line 126, in post
    self.resize_im(image_data)
  File "E:\demo3\findDocumentType1.py", line 231, in resize_im
    self.getclassname(classNo)
NameError: name 'classNo' is not defined

raw mortar
#

@mild topaz the variable classNo is not defined
the error message is pretty clear

mild topaz
lapis sequoia
#

is 0.873 adj R square good enough

vague bear
#

Hi guys. Can anyone explain what a cost function is for a non-math person like me please? The lesson I'm watching introduced us to this equation and said "for simplicity, half of this value is considered the cost function through the derivative process"

#

I have absolutely no clue what that means

lapis sequoia
#

What are the next steps after I finish my regression in statsmodels

halcyon vale
#

If you guys are interested in Natural Language Processing. Here,

raw mortar
#

@vague bear that's squared error
https://en.m.wikipedia.org/wiki/Mean_squared_error
It's used to find how good the model is
The lower the value, the better the model, relatively speaking

In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the act...

#

@mild topaz are you trying to make a rest API which takes a base64 image as input and do some prediction with it?

mild topaz
#

yes

lapis sequoia
#

what should I do are I output my OLS model

#

after

raw mortar
#

@mild topaz I'm not very familiar with Flask-RESTful, but where are you going wrong?

mild topaz
#

@raw mortar give me some time , as soon as get free i will ping u

vague bear
#

@raw mortar thanks, i'll do some readings now. I searched cost function on YT and didn't find anything

#

how can you identify it as mean square error? The equation looks different in the wiki

raw mortar
#

@vague bear its not mse, its squared error, in the wiki look for the loss function part

#

mse is when you divide it by num of data points, 1/2 is just squared error
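
To make the distinction concrete, here is a small sketch in plain Python with made-up numbers. It computes the half squared error from the lesson, the mean squared error from the wiki, and the derivative that explains the "for simplicity" remark: differentiating `0.5 * (p - t) ** 2` with respect to `p` gives `(p - t)`, so the 1/2 cancels the 2 that the power rule produces. The function names are illustrative only.

```python
def half_squared_error(y_true, y_pred):
    """Sum of squared errors, scaled by 1/2."""
    return 0.5 * sum((p - t) ** 2 for t, p in zip(y_true, y_pred))


def mean_squared_error(y_true, y_pred):
    """Sum of squared errors, divided by the number of points."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def half_squared_error_gradient(y_true, y_pred):
    """Derivative of the half squared error w.r.t. each prediction."""
    # d/dp [0.5 * (p - t)^2] = (p - t); no stray factor of 2 remains.
    return [p - t for t, p in zip(y_true, y_pred)]


y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

half_se = half_squared_error(y_true, y_pred)        # 0.5 * (1 + 0 + 4)
mse = mean_squared_error(y_true, y_pred)            # (1 + 0 + 4) / 3
grad = half_squared_error_gradient(y_true, y_pred)
```

Both quantities are smallest for the same predictions, so the constant in front changes the scale of the number but not which model is best.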

lapis sequoia
#

How can I get my regression equation in statsmodels

raw mortar
vague bear
#

ohh

#

a loss function or cost function is a function that .....

#

I see, thanks

raw mortar
#

ya, it is used interchangeably, but some prefer to say its a loss when its a single data point and cost when all the points are considered

#

and there is no consistency in the expressions 🤦

vague bear
#

I looked up some tutorial in my language and it uses pi to represent probability. is that normal

raw mortar
#

nope have not seen that one

vague bear
#

ic

lapis sequoia
#

How can I get my regression equation in statsmodels

raw mortar
#

@lapis sequoia i don't quite understand your question, someone else might answer it

lapis sequoia
#

@raw mortar I ran OLS regression and got the output table. Is there a way to look at the equation for it

raw mortar
#

@lapis sequoia the equations would remain the same i think, probably just google the implementation docs to get the exact equation

lapis sequoia
#

@raw mortar What do you mean?

#

Like is there a way to export it to excel and then plug in the item in each variable to get the output of my dependent

raw mortar
#

oh you want to make predictions from the model ?

lapis sequoia
#

my dependent var is labor hrs. independent variables (using dummies) are product, customer, build type,product config. Want to be able to plug in certain customers and products for example to get my labor hours output

#

yes exactly

raw mortar
#

let me look it up, have not used ols in statsmodels before

lapis sequoia
#

thank you

raw mortar
vague bear
#

is confusion matrix used a lot?

lapis sequoia
#

@raw mortar yeah I have the results. so i just use results.predict(x) to predict the y?

raw mortar
#

@vague bear yep, usually in classification problems

hollow sentinel
#

guys what does it mean when your data set does that

#

it's all faded

vague bear
#

I see. The correct ones are churn 1,1 and churn 0,0 right
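
For a binary churn label that reading is right: the correct predictions are the pairs where actual and predicted agree, (1, 1) and (0, 0), which sit on the diagonal of the matrix. A minimal sketch in plain Python, with invented labels, counts each (actual, predicted) pair and derives accuracy from the diagonal. The function name and the data are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count each (actual, predicted) pair."""
    return Counter(zip(y_true, y_pred))


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)

# Diagonal entries: the model agreed with the actual label.
correct = counts[(1, 1)] + counts[(0, 0)]

# Off-diagonal entries: missed churners and false alarms.
missed_churn = counts[(1, 0)]
false_alarm = counts[(0, 1)]

accuracy = correct / len(y_true)
```

The two off-diagonal cells often matter more than accuracy alone, since missing a churner and raising a false alarm usually have different costs.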

hollow sentinel
#

is there something wrong with the data

#

is it the color scheme?

lapis sequoia
#

@raw mortar do you know if there is a way to export it into excel and just make a dropdown to choose the x variables I want to include to predict the y

raw mortar
#

@lapis sequoia not sure about that one though, might be possible

hollow sentinel
#

also what does a distplot show

#

distribution?

lapis sequoia
#

@raw mortar I'll try to figure it out

lapis sequoia
#

Anything else I should do in the meantime after my regression

hollow sentinel
#

so does this mean it's a good or bad model

#

i'm gonna go out on a limb and say it's bad

bitter harbor
#

I've seen worse acc

lapis sequoia
#

how can I check which variables are most important/drive the dependent variable the most

#

Do I just use the std coeff

hollow sentinel
#

hahah I like your username @bitter harbor

lapis sequoia
#

Why are some of my independent variables showing twice in my summary table

cedar sky
#

Can anyone recommend the best way to start with reinforcement learning and I am good with most of the deep learning concepts

heady hatch
#

My only knowledge of reinforcement learning is the library Gym.

lapis sequoia
#

guys what does it mean when your data set does that
@hollow sentinel fade and not fade just shows the density. If there are many points at single point it will become darker. Check the alpha or transparency value when you plot.

heady hatch
#

Hey guys question on validation data.

Epoch 1/10
1250/1250 [==============================] - 361s 289ms/step - loss: 5.0271 - accuracy: 0.3601 - val_loss: 1.1977 - val_accuracy: 0.5984
Epoch 2/10
1250/1250 [==============================] - 360s 288ms/step - loss: 1.3753 - accuracy: 0.5232 - val_loss: 0.7962 - val_accuracy: 0.7531
Epoch 3/10
1250/1250 [==============================] - 359s 287ms/step - loss: 1.0479 - accuracy: 0.6364 - val_loss: 0.5072 - val_accuracy: 0.8499
Epoch 4/10
1250/1250 [==============================] - 363s 291ms/step - loss: 0.7664 - accuracy: 0.7330 - val_loss: 0.2894 - val_accuracy: 0.9197
Epoch 5/10
1250/1250 [==============================] - 360s 288ms/step - loss: 0.5792 - accuracy: 0.7965 - val_loss: 0.1755 - val_accuracy: 0.9532
Epoch 6/10
1221/1250 [============================>.] - ETA: 7s - loss: 0.4574 - accuracy: 0.8416

The epochs haven't finished yet, but it feels like I'm heavily overfitting on the validation data.

lapis sequoia
#

@raw mortar results.predict(x) doesn't work

lone osprey
#

Guys

#

In tutorials anywhere, I can see only basics of ml

#

I can't find like that goes deeper

lapis sequoia
#

Like what

lone osprey
#

They don't give some deep like pd iloc functions etc...

#

I need to learn that

#

Where can I find it?

#

I know pandas basics

#

I know numpy basics

#

I know ml basics

heady hatch
#

What's your definition of ml basics?

lone osprey
#

But I want to learn deeper in numpy, pandas

#

Ml basics mean I know main algorithms like regression, Knn, etc..

#

In scikit learn

#

Any tutorial I can learn deep??

lapis sequoia
#

scikit learn sucks balls

#

use statsmodels bro

heady hatch
#

deep learning?

lone osprey
#

First I need to learn numpy and pandas deep

#

Those tutorials plz???

heady hatch
#

Look at the exercises down there.

lapis sequoia
#

@heady hatch you ever used statsmodels

lone osprey
#

Kk, thanks

heady hatch
#

I have.

#

Why do you ask?

lapis sequoia
#

Is there a way to export my linear regression model into Excel

#

And use in Excel to predict my y var

heady hatch
#

I don't really use excel so I can't give any advice on that.

lone osprey
#

I can learn by exercises?

heady hatch
#

but

lapis sequoia
#

ty

heady hatch
#

If you can somehow read those files into excel.

lone osprey
#

I can learn by exercises??

heady hatch
#

I can't answer that question for you. hahaha

#

You know yourself better.

lone osprey
#

Kk

#

I prefer tutorials but

#

So, I thought u know

heady hatch
#

Then I would probably google advanced numpy or pandas tutorial.

lone osprey
#

Google may give more resources

#

U r expert ppl

heady hatch
#

My validation accuracy is significantly higher, I was wondering what I could be doing wrong.

#

It's on Cifar 10 dataset.

#

I realized I forgot to check for dataset imbalance.

serene scaffold
#

I'm trying to do polynomial regression and I have an array of what the coefficients should be. The number of terms varies. Once I know how far off the prediction was along the y axis, what adjustment am I supposed to make to the coefficients?

heady hatch
#

I'm not super familiar with stats.

but how come you're manually adjusting the coefficients?

serene scaffold
#

how else would I make sure that the curve is correct?

heady hatch
#

Are you not able to base that off of your error?

serene scaffold
#

I'm not sure what to do with the error once I have it.

heady hatch
#

Oh are you manually calculating the regression?

serene scaffold
#

yes, I have to show that I understand how it works.

#

the sample code is in perl 😦

lapis sequoia
#

Is a 0.873 adjusted r-squared good?

heady hatch
#

Oh man. I'm currently looking up how to calculate regression to give you further thoughts.

serene scaffold
#

I appreciate it

lapis sequoia
#

@serene scaffold You using statsmodels?

serene scaffold
#

@lapis sequoia no, I'm using numpy. I can't use anything that eliminates the need to show how the math works.

lapis sequoia
#

Damn idk bro

serene scaffold
#

that's okay. thank you.

lapis sequoia
#

I'm new to this

#

@heady hatch Is there a way to have my equation show in statsmodels

#

for my linear model

heady hatch
#

What do you mean by equation?

lapis sequoia
#

my linear formula

heady hatch
#

@serene scaffold

I don't know if this is relevant.

http://polynomialregression.drque.net/math.html

From how they're calculating the coefficients, they're using a system of equations to solve for it. And I guess in your case, do you have the data points?

#

If so, you might be able to do the same.

#

@lapis sequoia I'm still unsure of what you mean. Like you want the coefficients?

serene scaffold
#

@heady hatch let me look at this. Thanks!

lapis sequoia
#

yeah the coefficient and constant inputting the x variables

#

to get the y variable

heady hatch
#

I think there's a coefficient method to get it from the models.

#

So after you fit it, you can get the coefficients via the methods.
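
To show what "the equation" looks like once the coefficients are known, here is a library-agnostic sketch with NumPy instead of statsmodels. With dummy variables, the fitted line is an intercept plus one coefficient per dummy that is switched on, which is also simple enough to retype as a spreadsheet formula. In statsmodels the same numbers are normally read from the fitted results' `params` attribute. The toy data, category names, and `predict_hours` helper are invented for illustration.

```python
import numpy as np

# Hypothetical data: labor hours by product (A/B) and customer (X/Y).
products = ["A", "A", "B", "B", "A", "B"]
customers = ["X", "Y", "X", "Y", "X", "Y"]
hours = np.array([10.0, 12.0, 15.0, 17.0, 10.0, 17.0])

# Design matrix: intercept, dummy for product B, dummy for customer Y.
# Product A and customer X are the baseline and get no column.
X = np.column_stack([
    np.ones(len(hours)),
    [p == "B" for p in products],
    [c == "Y" for c in customers],
]).astype(float)

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(X, hours, rcond=None)
intercept, b_product_b, b_customer_y = coef


def predict_hours(product, customer):
    """Evaluate: hours = intercept + b_B * [product is B] + b_Y * [customer is Y]."""
    return (intercept
            + b_product_b * (product == "B")
            + b_customer_y * (customer == "Y"))


estimate = predict_hours("B", "Y")
```

When predicting with a fitted model object, the new input generally needs the same columns in the same order as the training data, including the constant. A mismatch there is a frequent reason a `predict` call fails.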

lapis sequoia
#

Let me try that thank you

serene scaffold
#

let me see if I understand correctly

#

basically given my training data, which is a list of (x, y) points, if I want to find the best-fit curve, I should start with a polynomial function y = a * (x ** 1) + b * (x ** 2) + ...

#

and if I have an array of [a, b, ...] then I'll have the answer

#

so the goal is to solve for [a, b, ...] for each instance of (x, y), multiply that array by the alpha, and add that to the weights?

#

does that sound right @heady hatch?
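
A hedged sketch of the update being described, assuming the exercise is plain gradient descent on mean squared error. Two details differ from the wording above: the polynomial here includes a constant term, and the per-point pieces are averaged into one gradient before each step. Subtracting `alpha * gradient` with error defined as prediction minus target is the same as adding `alpha * (target - prediction) * x**k` to each weight. The function name, learning rate, and step count are illustrative choices.

```python
import numpy as np


def fit_polynomial_gd(x, y, degree, alpha=0.1, steps=5000):
    """Fit y ~ w0 + w1*x + ... + wd*x^d by gradient descent on MSE."""
    # Columns of X are x^0, x^1, ..., x^degree.
    X = np.vander(x, degree + 1, increasing=True)
    w = np.zeros(degree + 1)
    for _ in range(steps):
        error = X @ w - y                        # prediction minus target
        gradient = 2.0 * (X.T @ error) / len(x)  # d(MSE)/dw
        w -= alpha * gradient                    # step against the gradient
    return w


# Noise-free data from a known curve, so the answer can be checked.
x = np.linspace(-1.0, 1.0, 21)
y = 1.0 + 2.0 * x + 3.0 * x ** 2

weights = fit_polynomial_gd(x, y, degree=2)

# Closed-form least squares on the same design matrix, for comparison.
exact, *_ = np.linalg.lstsq(np.vander(x, 3, increasing=True), y, rcond=None)
```

If `x` spans a much wider range than [-1, 1], the higher powers grow quickly and a smaller `alpha` or rescaled inputs may be needed for the loop to converge.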

heady hatch
#

That's from my understanding of how regressions work.

#

That's without taking regularization or anything into consideration.

serene scaffold
#

I don't think I have to do that

lapis sequoia
#

My validation accuracy is significantly higher, I was wondering what I could be doing wrong.
@heady hatch Check if you are splitting the data properly. And that there is no data leakage. It is very rare to have a situation like the one shown above.

heady hatch
#

I'm not splitting the data myself. It's presplit.

#

Good point about data leakage.

#

So it's the cifar 10 dataset.

They've split the data into train and test already.

40000 training images
10000 test images

I did add a couple of things to the training dataset pipeline that I didn't add to the testing dataset pipeline, such as shuffling the data and repeating it.

Though I was under the impression that I'm not supposed to shuffle the test data.

lapis sequoia
#

@heady hatch That worked thanks

#

I coulda just used it from the table too

#

Is 0.873 adj r squared good enough

#

I want to use the model to be able to use the dependent variable (labor hours) as a benchmark based on the independent variables (product, customer, config, build type)

heady hatch
#

Depending on your problem. Is adjusted r squared the metric you want to look at?

lapis sequoia
#

yeah

#

since I have multiple independent variables

#

@heady hatch
Split the 40K into 35K and 5K.
Shuffling should not change anything here; do a stratified split.
Also check that you are plotting the right legends. Maybe you are mixing up train and val in the plot.

heady hatch
#

@lapis sequoia

So should I leave the test set as a holdout?

There's no simple way to do a stratified split with Tensorflow, is there? I would have to redo the data pipeline and make the test dataset myself.

Thank you for bringing the labelling up, I double checked and they are the correct labels.
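
One framework-independent way to sketch a stratified split is to compute the indices with NumPy first and build the datasets from those indices afterwards. The idea is to shuffle within each class and take the same fraction from every class, so the validation set keeps the class proportions of the full set. The function name, the 20% fraction, and the synthetic labels are assumptions for illustration; this is not a TensorFlow API.

```python
import numpy as np


def stratified_split(labels, val_fraction, seed=0):
    """Return (train_idx, val_idx) with per-class proportions preserved."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, val_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)  # positions of this class
        rng.shuffle(idx)                     # shuffle within the class
        n_val = int(len(idx) * val_fraction)
        val_idx.extend(idx[:n_val])
        train_idx.extend(idx[n_val:])
    return np.sort(train_idx), np.sort(val_idx)


# Synthetic labels: 10 classes with 100 samples each.
labels = np.repeat(np.arange(10), 100)
train_idx, val_idx = stratified_split(labels, val_fraction=0.2)

val_class_counts = np.bincount(labels[val_idx])
```

The held-out test set can then stay untouched until the end, with the validation slice carved out of the training data only.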

lapis sequoia
#

What should my p values be

heady hatch
#

This is how I'm constructing the data pipeline.

train_ds = (raw_train_dataset.map(parse_image_function)
                  .map(process_image)
                  .repeat()
                  .shuffle(buffer_size=20000)
                  .batch(batch_size=32)
                  .prefetch(buffer_size=100)
)


test_ds = (raw_test_dataset.map(parse_image_function)
                  .map(process_image)
                  .shuffle(buffer_size=5000)
                  .batch(batch_size=32)
                  .prefetch(buffer_size=100)
)
                
#

process_image standardizes the images and resizes them.

lapis sequoia
#

@heady hatch my model isnt predicting certain mixes well it seems

heady hatch
#

certain mixes?

lapis sequoia
#

ya like

#

product 2 to customer 5 with build type A

#

etc

#

looks different than historical avg

heady hatch
#

Ahh well now here's something to consider.

#

Is r squared the metric you want to look at?

lapis sequoia
#

adjusted r sq

heady hatch
#

What I mean by this is you don't necessarily need to change r squared if that's one of the few metrics you can get.

#

Because think about the definition of what r squared means.

#

R squared means the goodness of fit.

lapis sequoia
#

yea

heady hatch
#

But it doesn't necessarily talk about the actual problem itself.

#

It's just a proxy metric for something else you care about.

#

Because yea .81 r squared could be good.

#

But not if it's constantly making mistakes on a particular group of people or product.

#

I don't know your actual problem so you'd have to determine that yourself.
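
A short sketch of how to check the point being made here: break the error down by group instead of relying on one overall score. This uses plain Python and invented numbers; the group labels and the `mean_abs_error_by_group` helper are hypothetical.

```python
from collections import defaultdict


def mean_abs_error_by_group(groups, y_true, y_pred):
    """Average absolute error for each group label."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [error sum, count]
    for group, actual, predicted in zip(groups, y_true, y_pred):
        totals[group][0] += abs(predicted - actual)
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


groups = ["product 1", "product 1", "product 2", "product 2"]
actual = [10.0, 12.0, 30.0, 34.0]
predicted = [10.5, 11.5, 22.0, 40.0]

errors = mean_abs_error_by_group(groups, actual, predicted)
worst_group = max(errors, key=errors.get)
```

A model can score well overall while one product or customer carries most of the error, and a breakdown like this makes that visible.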

lapis sequoia
#

yeah the errors are high for some

heady hatch
#

Maybe it's okay for it to keep making mistakes on certain things.