#data-science-and-ml

small tartan
#

I ask this because if either the value of the max or min changes, it will influence the score for the rest of the records although they did not 'change' anything

boreal summit
#

If the min and max of that column are far apart, with lots of values in between, better to use StandardScaler.

small tartan
#

for context, var a is a %, var b, is a number between 0-5000, var c is a value between 0-10000

#

ultimately joining the 3 to create a 'score' from 0-100

boreal summit
#

I haven't used them much outside basic knowledge, I'm also still learning. But I know that MinMax scaler clusters digits together when the difference is very large. For instance, 0.15 and 2.4 could be lumped into a single variable which might affect results.

#

Try both and see which gives better results.

small tartan
#

Fair enough

#

Thanks

boreal summit
#

Yea, you can wait for the other experienced guys to come give you more insights or Google stuff on your own. Happy coding!

#

*More.

small tartan
#

TY! Google has led to a few, but it's always nice to ask you guys and get a bit of the expert opinion

hollow gull
#

The scaling shouldn't change in deployment with MinMaxScaler because you should have fitted it on the training data set and you are applying it to any unseen data. I assume it will give values outside of 0 and 1 if your data goes outside of the range of the training data, but I am not sure. Maybe it just caps it.

somber torrent
#

Do you guys know any websites that give away free data? I want to practice my data science skills

sweet plaza
#

Can anyone guide me on how to set up the IPython kernel in VS Code? I'm having a freaking hard time doing it, even with the main website and after downloading conda

hollow gull
somber torrent
#

thanks bro

heady hatch
#

Hey all what's a way to convert a series of lists in a column of a groupby object to one giant list for each group?

ie.

a [1, 2]
a [2, 3]
b [1, 2]
b [2, 3]

into

a [1, 2, 2, 3]
b [1, 2, 2, 3]
green hemlock
#

Do you guys know any websites that give away free data? I want to practice my data science skills
@somber torrent try kaggle or data.world

velvet thorn
#

The scaling shouldn't change in deployment with MinMaxScaler because you should have fitted it on the training data set and you are applying it to any unseen data. I assume it will give values outside of 0 and 1 if your data goes outside of the range of the training data, but I am not sure. Maybe it just caps it.
@hollow gull by default, it can give values outside the range of [0, 1]
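
A minimal sketch of the point above, assuming scikit-learn's `MinMaxScaler` and made-up numbers. The scaler learns its min and max from the training data only, so unseen values beyond that range land outside [0, 1] unless `clip=True` is passed (that parameter exists in scikit-learn 0.24 and later).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up training data: a single feature ranging from 0 to 10.
train = np.array([[0.0], [5.0], [10.0]])
# Unseen data that falls outside the training range.
unseen = np.array([[-5.0], [20.0]])

# Default behaviour: the fitted min/max are reused, nothing is capped.
scaler = MinMaxScaler().fit(train)
scaled = scaler.transform(unseen)

# Opt-in capping: results are clipped to the feature range [0, 1].
clipping_scaler = MinMaxScaler(clip=True).fit(train)
clipped = clipping_scaler.transform(unseen)

print(scaled.ravel())
print(clipped.ravel())
```

Because the fitted min and max are frozen, adding new records later does not change the scaled values of the old ones, as long as the scaler is not refitted.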

#

Hey all what's a way to convert a series of lists in a column of a groupby object to one giant list for each group?

ie.

a [1, 2]
a [2, 3]
b [1, 2]
b [2, 3]

into

a [1, 2, 2, 3]
b [1, 2, 2, 3]

@heady hatch write a custom aggregation function

#

and pass that into .agg
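
One way the custom aggregation could look, as a rough sketch. The column names `key` and `vals` and the helper `flatten` are made up for illustration; they are not from the original question.

```python
import pandas as pd

# Toy frame matching the example: each row holds a small list.
df = pd.DataFrame({
    "key": ["a", "a", "b", "b"],
    "vals": [[1, 2], [2, 3], [1, 2], [2, 3]],
})

def flatten(lists):
    """Concatenate every list in the group into one flat list."""
    return [item for sub in lists for item in sub]

# One flat list per group, indexed by the group key.
merged = df.groupby("key")["vals"].agg(flatten)
print(merged)
```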

#

Hey guys, i have a question around dataset and what actions to take. I have 3 variables that are not at all on the same scale. I need to normalize or standardize them so they result in something i can weight and then pull into a single score as a result.
@small tartan where do the weights come from

small tartan
#

To clarify, i am not deploying this in a ML model. But a dashboard

#

I will manually adjust the weights to achieve my desired rack and stack

#

i have 50 records (with about 5 being added quarterly)

velvet thorn
#

hm

#

well

#

you could also scale the weights to the data

small tartan
#

I've picked the top 10 and bottom 10 based on understanding the data and what it represents. I need to apply the weights to basically get those top and bottom 10 in the correct area and let the rest work within the scale

#

I'm just getting really caught up in the standardizing of the data so its on an equal playing field

velvet thorn
#

seems like

#

scaling them to 0-100 would make sense
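
A sketch of what scaling to 0-100 and weighting could look like, assuming the value ranges mentioned earlier (a percentage, 0-5000, 0-10000). The column names, sample rows, and weights are invented. Using fixed, known bounds rather than the observed min and max means a new record cannot shift the scores of existing records.

```python
import pandas as pd

# Made-up records; the columns stand in for var a, var b and var c.
df = pd.DataFrame({
    "a_pct": [10.0, 50.0, 90.0],
    "b": [0.0, 2500.0, 5000.0],
    "c": [10000.0, 5000.0, 0.0],
})

# Fixed theoretical bounds for each variable (an assumption, not observed data).
bounds = {"a_pct": (0, 100), "b": (0, 5000), "c": (0, 10000)}
# Hand-tuned weights that sum to 1, so the score stays within 0-100.
weights = {"a_pct": 0.5, "b": 0.3, "c": 0.2}

# Rescale every column to 0-100 using its fixed bounds.
scaled = pd.DataFrame({
    col: 100 * (df[col] - lo) / (hi - lo) for col, (lo, hi) in bounds.items()
})

# Weighted sum of the rescaled columns.
df["score"] = sum(weights[col] * scaled[col] for col in weights)
print(df)
```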

small tartan
#

Since its not exactly dealing with metrics that are easy to just add together, hence standardizing. I did standardize so 95% is within 1 Standard deviation. and the outputs are mostly between -1 and 1

velvet thorn
#

since the first value is a percentage 🤷‍♂️

small tartan
#

Well i can build that metric to not be a percentage. It just comes that way raw

#

but yeah the first value being a percent is nice since thats inherently a 0-100 already ha

#

I'm building a score for spaces where content is held. The content age, usage, and some data about it is what is driving the variables. I'll have this score updated monthly with the backwards rolling 180 days worth of info

deep spire
#

anyone have any luck using to_sql in pandas to load data into Snowflake when the dataframe has a datetime field? It keeps giving me this error: Failed processing pyformat-parameters; 255001: Binding data in type (timestamp) is not supported. and I haven't been able to find a solution online that doesn't involve manually converting each column that is a date, which isn't feasible for my use case

deep spire
#

or is there any way to dynamically convert all datetime columns to strings in pandas without knowing every single column name that is a datetime?

green hemlock
#

you can always loop, after getting pd.dtypes

#

and convert the ones whose dtype is datetime to strings

deep spire
#

i cant even get the string conversion to work using astype(str) or unicode due to ascii-unicode errors

#

really wonder why pandas apparently had this working in 0.15 but then broke it in 0.24 🤷‍♂️

velvet thorn
#

or is there any way to dynamically convert all datetime columns to strings in pandas without knowing every single column name that is a datetime?
@deep spire .select_dtypes

#

into .apply

#

into .dt.strftime

earnest forge
#

I'm at beginning of my ML learning path, so it'd be nice of you if you help me clarify one question: is PCA the same as estimator?

deep spire
#

@velvet thorn whats the proper way to call this and set those columns in the df? I'm trying df.select_dtypes(include='datetime64') = df.select_dtypes(include='datetime64').apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S')) which fails (as does apply(dt.strftime('%Y-%m-%d %H:%M:%S'))

But I think the error is with setting the columns to be the new column with converted type. getting SyntaxError: can't assign to function call

flat turtle
#

hi :)
how do I correctly install the vaex lib?
I mean, I installed vaex with
pip install vaex, but when I try to use it in the shell,
the shell throws ModuleNotFoundError: No module named 'vaex.remote'.
Then I tried to install that one module,....
ERROR: Could not find a version that satisfies the requirement vaex.remote (from versions: none) ERROR: No matching distribution found for vaex.remote

lavish zinc
#

or how to set x-axis each element text on multiple line?

earnest forge
#

@lavish zinc you better rotate them 90°
plt.xticks(rotation=90)

#

Conversely, you can set huge width in plt.figure(figsize=(30, 5))

#

But second option is not recommended, of course

brave crest
#

My validation accuracy and training accuracy each print their own values. Am I meant to make them print together as an average of both?

velvet thorn
#

@deep spire lambda s: s.dt.strftime

#

df.select_dtypes(include='datetime64')[:]
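
Putting the three pieces together could look roughly like this. The sample frame and column names are invented. The `SyntaxError` above comes from assigning to the result of a function call; selecting the column labels first and assigning through `df[...]` avoids that.

```python
import pandas as pd

# Invented frame with two datetime columns and one non-datetime column.
df = pd.DataFrame({
    "id": [1, 2],
    "created": pd.to_datetime(["2020-01-02 03:04:05", "2021-06-07 08:09:10"]),
    "updated": pd.to_datetime(["2020-02-03", "2021-07-08"]),
})

# Find the datetime columns without naming them by hand.
dt_cols = df.select_dtypes(include="datetime64").columns

# Convert each of those columns to formatted strings and write them back.
df[dt_cols] = df[dt_cols].apply(lambda s: s.dt.strftime("%Y-%m-%d %H:%M:%S"))
print(df)
```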

late torrent
#

hi 🙂

#

for any JupyterLab users who like a good dark theme, and maybe want something that looks a bit more modern than what JupyterLab offers, I just published a build of One Dark Pro

#

which can be installed in the Extension manager 😊

boreal summit
#

@somber torrent try out kaggle.com for loads of datasets. You'll also find ML and data analysis examples of those datasets.

vapid sorrel
#

Hi, someone knows how can I use TPU on google colab to compute a ANN please?

tawny cradle
#

Hi people I made a paper for my school project on AI and ML

#

Can you please check it out and tell me if it’s good or not

delicate night
#

I have a very good math background, and a lot of experience in swe, but not data sci. Are there any intermediate projects you guys could recommend? I want something that can allow me to get a feel for this field

bronze barn
#

@hoary sluice thank you kind stranger for your suggestion to use HDBSCAN it worked nicely!

azure locust
#

Hi, does anyone have an idea about calculating the reading time of an article by considering the syllables of the words as well, apart from considering the number of words and words per minute? or does anyone know how the reading time and speaking time is calculated in Grammarly?

vapid sorrel
#

Hi guys, I have a question about NNs. I created a custom loss function which works in an ANN but not when I put it in an RNN. Why?
def correlation(y_true, y_pred):
    corr = tfp.stats.correlation(y_true, y_pred, sample_axis=0, event_axis=None)
    return corr

torpid cave
#

Hi guys, any scrapers here? Just looking for opinions on Scrapy vs BS

sharp herald
#

@torpid cave what you mean by BS?

heady hatch
#

Hmm I've never used BeautifulSoup as a scraper but more as a parser and I've never used Scrapy as a parser but as a scraper.

Often I just use requests + bs instead of Scrapy unless I need something heavy duty.

What are you scraping?

sharp herald
#

Ah thats BeautifulSoup

#

they are not comparable

#

BS is a HTML parser, Scrapy is a web crawler framework which includes a HTML parser too

#

using Scrapy just to parse HTML does not make sense

#

you can use BS to parse the HTML pages crawled by Scrapy

slender nymph
#

hi how can i convert this in python

#
```r
set.seed(1)
x <- w <- rnorm(100)
for (t in 3:100) x[t] <- 0.666*x[t-1] - 0.333*x[t-2] + w[t]
layout(1:2)
plot(x, type="l")
acf(x)
```
#

it is R

heady hatch
#

What's x <- w <- rnorm(100)?

slender nymph
#

x=w=np.random.normal(100 values)

heady hatch
#

And what's x[t]? is that accessing x at index t?

#

if so

import numpy as np
np.random.seed(1)

w = np.random.normal(size=100)
x = w.copy()  # copy, so that writing to x does not also change w
for t in range(2, 100):  # R's 3:100 is 1-based; the 0-based equivalent starts at 2
  x[t] = 0.666 * x[t-1] - 0.333 * x[t-2] + w[t]

... plotting
slender nymph
#

thank you master @heady hatch

waxen birch
#

Hello guys, I'm new to data science and python, however i have some experience with languages like C#, java or js. I have to do some tasks, is it a good place to ask some questions?

#

Right now i have to complete some programming tasks using pandas module

heady hatch
#

Sounds relevant to data science, shoot your questions.

waxen birch
#

okay so, i have a dataframe with columns (ID;Country;owns_car;gender;Age) and i have to create a new df that has columns Country, average goods, minimalAge and %ofWomen

#

so i don't know how to create a new df with the given columns and then populate the columns

#

i am a total pandas noob so maybe it is a simple task but i don't know the tool to achieve my goal 😄
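
A possible starting point, sketched with invented rows. "Average goods" is not defined by the listed columns, so the mean of `owns_car` is used here purely as a stand-in; the output names `avg_owns_car` and `pct_women` and the helper column `is_woman` are also made up.

```python
import pandas as pd

# Invented sample rows with the columns from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Country": ["PL", "PL", "DE", "DE"],
    "owns_car": [1, 0, 1, 1],
    "gender": ["F", "M", "F", "F"],
    "Age": [30, 40, 25, 35],
})

summary = (
    df.assign(is_woman=df["gender"].eq("F"))  # boolean helper column
      .groupby("Country")
      .agg(
          avg_owns_car=("owns_car", "mean"),   # stand-in for "average goods"
          minimalAge=("Age", "min"),
          pct_women=("is_woman", lambda s: 100 * s.mean()),
      )
      .reset_index()
)
print(summary)
```

Named aggregation (`new_name=(column, function)`) creates and populates the new columns in one step, so there is no need to build an empty frame first.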

cerulean spindle
#

I used to use jupyter notebook with VScode. It was really slow and sometimes made really weird errors (not on me). Did anyone have the same problem? Does anyone recommend alternatives?

heady hatch
#

@waxen birch

We'll work on it one at a time, but I do recommend reading up on basic Pandas first then we'll break down the problem at hand.

#

@cerulean spindle

Hmm what do you mean by really slow? Comparing it to regular Jupyter instances?

lapis sequoia
#

@cerulean spindle i usually use a docker image to run jupyter notebooks and just pass the url to localhost

waxen birch
#

@heady hatch okay, do you know maybe some good source of basic pandas? Maybe some tutorial which is valuable? ;)

heady hatch
#

I recommend getting the basics of pandas down first because otherwise you have to think about data transformation and pandas syntax at the same time.

#

Unless you feel comfortable enough to dive right in, then show us your data and we can go straight in.

waxen birch
#

Ooo! This looks great! Thank you so much. I've read in one of the O'Reilly book that python community is really nice. I guess they were right :)

heady hatch
#

@glad mulch I don't know if this will work, but you can try df.T

#

hahah

Another way would be

df.index = df.columns

#

Though I'm unsure how that will go.

#

I guess you can do a temp or xor exchange.

#
df.columns, df.index = df.index, df.columns
#

I think you can just

#

pd.DataFrame or

#

pd.concat(list)

#

I might need more information.

#

What do you mean by same indexes and how do you want the dataframe to look?

median dove
#

Hey, how could I update a complete Pandas column following a condition? For example: update Sex: male, female, male, male, female... to Sex: 0, 1, 0, 0, 1...?

#

Yeah but df[“sex”].map(...)?

#

It did, thanks
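
For reference, the `.map` approach with a dictionary could look like this small sketch (the sample values follow the question):

```python
import pandas as pd

df = pd.DataFrame({"Sex": ["male", "female", "male", "male", "female"]})

# Replace each label with its numeric code; unmapped labels would become NaN.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
print(df["Sex"].tolist())
```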

heady hatch
#

@glad mulch oh 6 of those have the same indexes?

strong oasis
#

Anybody else staring at some code not knowing where to start or is that just me?? (I'm still fairly new to python, but I spent some time away from it so I'm picking it back up and trying to learn ML 😓 )

hoary sluice
#

@hoary sluice thank you kind stranger for your suggestion to use HDBSCAN it worked nicely!
@bronze barn no problem, I suggested HDBSCAN because I found myself with a similar problem... HDBSCAN improves the way density clustering works by building a hierarchical structure. It's also very good at finding outliers... And together with UMAP it's great

cyan sun
#

started by trying to create a droplist but i'm running into problems

#

droplist = [col for col in df.columns if ((df.loc[df['date'] == today][col]).isna()) == True]

heady hatch
#

@cyan sun
You can try
df.iloc[-1].isna()

#

That should give you all the columns that have nan in the last row.

cosmic prairie
graceful glacier
#

can anyone who knows SQL tell me why this works

#

but this doesnt

cyan sun
#

@heady hatch thanks for the help but the drop method doesn't allow for boolean arrays - any suggestion on how to handle that?

heady hatch
#

@cyan sun You don't need to use drop. You can just filter it out using boolean indexing.
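
A sketch of the boolean-indexing idea, assuming the goal is to drop every column whose last row is NaN. The frame and column names are invented.

```python
import numpy as np
import pandas as pd

# Invented frame: column "x" is missing its most recent value.
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02"],
    "x": [1.0, np.nan],
    "y": [2.0, 3.0],
})

# True for each column whose last row is NaN.
last_row_missing = df.iloc[-1].isna()

# Keep only the columns where the mask is False.
kept = df.loc[:, ~last_row_missing]
print(kept.columns.tolist())
```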

minor star
#

Yo so i have a csv file with a list of multipolygons that are 'community areas' of chicago. I am trying to find the center coordinate of each polygon, how should i go about doing this?

bleak spindle
#

could you guys take a look at help-oxygen?

#

please

slate wagon
#

Make a 10X4 dataframe with random numbers, you can use any names for columns names.

Use one easy built in function to show the basic statistics of all the columns such as count, mean, std and percentiles.

Transpose your dataframe.

Print the 3rd row and 5th and 6th columns from the transposed dataframe.

#

can someone help with this?
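
One way to approach the four steps, as a hedged sketch. The column names and the seed are arbitrary, and "3rd row and 5th and 6th columns" is read as positions 2, 4 and 5 in 0-based indexing.

```python
import numpy as np
import pandas as pd

# Step 1: a 10x4 frame of random numbers with arbitrary column names.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 4)), columns=["a", "b", "c", "d"])

# Step 2: count, mean, std, min, quartiles and max for every column.
stats = df.describe()

# Step 3: transpose, giving 4 rows and 10 columns.
transposed = df.T

# Step 4: 3rd row, 5th and 6th columns (0-based positions 2, 4 and 5).
selection = transposed.iloc[2, [4, 5]]

print(stats)
print(selection)
```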

cosmic prairie
#

if anyone could take a look at help-copper aswell that would be great

cyan sun
#

@heady hatch got it working. Thanks again for your help 👍

heady hatch
#

Nice nice.

shell berry
#

Anyone here good at pytorch?

#

Willing to pay for some help

heady hatch
#

@shell berry What do you need help with in pt?

stray spade
#

Hye everyone

weary heart
#

hi, does anyone know how to plot eeg using csv files?

#

and willing to help me with a projects?

pliant kestrel
#

hi all, i have posted my issue in help-nickle but no one replied. I will summarize the essay i have written there in one sentence. any one expert in machine learning in python can give me a private tutoring to guide me in my project. I don't think i can learn everything in 10 days and submit my project.

velvet thorn
#

hi all, i have posted my issue in help-nickle but no one replied. I will summarize the essay i have written there in one sentence. any one expert in machine learning in python can give me a private tutoring to guide me in my project. I don't think i can learn everything in 10 days and submit my project.
@pliant kestrel honestly

#

your problem

#

isn't really that advanced

#

but you're asking for quite a big commitment.

pliant kestrel
#

i know man, i know, but i just feel totally alone in this shit that is driving me into despair

shell berry
#

@heady hatch Are you familiar with pytorch lightning

heady hatch
#

Nope.

shell berry
#

oh rip

#

I guess my question can be generalized to basic pytorch too

heady hatch
#

It looks super clean though.

velvet thorn
#

i know man, i know, but i just feel totally alone in this shit that is driving me into despair
@pliant kestrel you can ask specific questions here

#

and have a reasonably high chance of an answer.

shell berry
#

If I have a tensor of input tensors and a tensor of outputs tensors, how exactly should I feed it into the model

velvet thorn
#

but I think your problems run deeper than that

#

and well

#

maybe this isn't exactly the place

shell berry
#

[inputs] + [labels]

#

or

#

[(input, label), (input, label)]

#

I know I have to feed it into a DataLoader

pliant kestrel
#

@velvet thorn if this is not the place, then where is the proper place? i tried asking in reddit, but no buddy responded

#

i will try to see the codes that have been used in this course, and try to figure out as much as i can

velvet thorn
#

like I said

#

this isn't really the place to find someone who is willing to commit to that long term

#

you might, but it's really unlikely.

#

especially for free

pliant kestrel
#

what if not for free

heady hatch
#

Interestingly @shell berry I might actually look into pytorch lightning. hahaha Thank you for this.

In terms of your question. Depends on your model.

velvet thorn
#

what if not for free
@pliant kestrel then you need to take into account that you get what you pay for

shell berry
#

Haha np, it is pretty clean

velvet thorn
#

and if you want quality it's not going to be cheap

#

yeah.

shell berry
#

I just want to use a straightforward MLP for now

#

Super basic

pliant kestrel
#

what do u think the prices might be

shell berry
#
```py
    def __init__(self, input_size, hidden_size, output_size, dropout=False, dropout_p=0.1):
        super(MultiLayerPerceptron, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size, bias=True)
        self.fc2 = nn.Linear(hidden_size, output_size, bias=True)

        self.add_dropout = dropout
        self.dropout = nn.Dropout(dropout_p)
```
#

Super basic

pliant kestrel
#

i have been laid off since july, so I can barely offer much

shell berry
#

@pliant kestrel If you're using sklearn I can help you for free tomorrow.

heady hatch
#

@shell berry So I see you have your layers set up.

For PT, you can either define a full on loop or a

def forward(self, x):
  x = self.layer1(x)
  x = self.layer2(x)
  x = self.layer3(x)
  return x
shell berry
#

Yes, thank you

heady hatch
#

And then when you're calling the model in your loop

shell berry
#

I have the model done and working with basic pytorch

#

Im just using lightning now and trying to wrap my head around datamodule and dataloader haha

#

DataLoader takes in one input corresponding to data, so I assume it's a list of tuples of inputs and labels?

heady hatch
#

So I think you use a dataloader with a dataset.

#

ie make your dataset subclass, then create a dataloader based on that dataset.

shell berry
#

Yup, so I created my dataset with another class

#

Now I just want to put it into a dataloader

#

But my subclass just returns a list of inputs and outputs

#

should I zip them into tuples and then into the dataloader?

heady hatch
#

I want to say yes, but if you don't mind let me read it up real quick.

#

Been working with tf largely recently.

shell berry
#

Sure, thank you very much

heady hatch
#

Ahh okay. Yea you bundle the two up together.

#

And when you're loading it, you would unpack it.

pliant kestrel
#

@shell berry kind sir, what is in sklearn that you are willing to explain

shell berry
#

@heady hatch Thank you

#

If you're working with text data @pliant kestrel

heady hatch
#

So let's say you have your dataset.

iterating through your dataset object from the dataloader

for idx, data in enumerate(dataloader):
  image, label = data[0], data[1]
  ...
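
For the case discussed above (one tensor of inputs, one tensor of labels), a minimal sketch could use PyTorch's built-in `TensorDataset`, which pairs the two tensors row by row so no manual zipping is needed. The shapes and random data are made up.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Made-up data: 8 samples with 4 features each, and one integer label per sample.
inputs = torch.randn(8, 4)
labels = torch.randint(0, 2, (8,))

# TensorDataset yields (input, label) tuples, one per row.
dataset = TensorDataset(inputs, labels)
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# Each batch unpacks into a batch of inputs and a batch of labels.
batches = [(tuple(x.shape), tuple(y.shape)) for x, y in loader]
print(batches)
```

A custom `Dataset` subclass works the same way as long as `__getitem__` returns an `(input, label)` pair and `__len__` returns the number of samples.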
pliant kestrel
#

the file is in csv file

shell berry
#

I can help reading it in, cleaning it, augmenting/tokenizing/etc., then putting it into a format sklearn can read, then using a SVC or decision tree or whatever you need on it

heady hatch
#

Yea no problem, happy to help.

#

@glad mulch 👋 hello.

shell berry
#

Also by the way @heady hatch

#

For my labels

#

Does it matter if I'm using a labexindexer or a multilabelbinarizer

#

Like let's say I have 5 labels

#

I can represent them as [0], [1], etc

#

or [1 0 0 0 0], [0 1 0 0 0]

#

I'm doing multilabel classification so I'm using the latter, but I see some people use the former

pliant kestrel
#

@shell berry, sounds good

#

when are u free tomorrow

shell berry
#

Just message me whenever tomorrow and Ill let you know, but please don't hedge your entire project on me helping, I don't want you to get screwed if I'm busy or something

heady hatch
#

I have no idea what either of those are. hahaha

In terms of deep learning, you would end up using different losses.

If you have 5 labels, and depending on what the output is like you would either use sparse categorical crossentropy or categorical crossentropy.

shell berry
#

oh lol my bad

#

Ive been doing alot of classical machine learning with sklearn before this so Im thinking of everything like that lol

heady hatch
#

No worries, I had a similar transition too.

pliant kestrel
#

no man, i just want the push to start, since i am lost and don't know where to go

shell berry
#

What does your data look like @pliant kestrel

pliant kestrel
#

the professor is supposed to post the project today, but he did not yet for some reason

shell berry
#

Wait you don't have it?

#

Then why are you worrying

pliant kestrel
#

it is due in two weeks

shell berry
#

Yeah u havent even looked at it man

pliant kestrel
#

it is supposed to be posted already

shell berry
#

atleast look at it before you give up 😛

pliant kestrel
#

since the begining, we had miniprojects, a guy in our group was doing the coding while the rest did the exercises in excel for better understanding of the concepts

#

the final project is individual work. I haven't looked at any ML code, since the code our professor gave us was so old that the confusion matrix wasn't working, due to an update in panda_ml that conflicts with something else. Our beloved instructor did not update it, and being a total noob in Python, I gave up on learning

velvet thorn
#

Does it matter if I'm using a labexindexer or a multilabelbinarizer
@shell berry in short, not really.

#

but

#

okay wait do you understand the difference between multi-class and multi-label?

shell berry
#

Yup

#

@pliant kestrel So you're a complete python noob?

#

I recommend learning python before you start doing ML then

#

Also don't let one guy in the group do all the coding

pliant kestrel
#

but now this individual project came like a 12 inch stick in the .. and now i need to do the following, 1-import the data ( i believe it is cleaned, since we are given training and test data) 2- do classification ( various classifiers) 3- using the confusion matrix, 4- write a report about each observation

#

well, i don't know any more man

shell berry
#

How much python do you know?

pliant kestrel
#

@glad mulch i think so

#

basics

shell berry
#

Because all the stuff you told me can be done in like 50 lines lol

pliant kestrel
#

i know, that is why i am a bit confused in this discussion 😄

velvet thorn
#

Yup
@shell berry yeah so the latter representation can handle multi-label classification

#

since you can have multiple 1s

shell berry
#

@velvet thorn Got it, thanks

#

Why can't the former? What if I have like

pliant kestrel
#

@glad mulch how much of a pain was it

shell berry
#

[5, 16, 20] for each label

velvet thorn
#

[5, 16, 20] for each label
@shell berry then you'd have a variable-length target
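
To make the difference concrete, here is a small sketch using scikit-learn's `MultiLabelBinarizer`, assuming 5 possible labels numbered 0-4 and made-up samples. Variable-length label lists become fixed-width rows that may contain several 1s.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up multi-label targets: each sample has a different number of labels.
samples = [[0, 2], [1], [2, 3, 4]]

# Fix the label set up front so every row has the same width.
mlb = MultiLabelBinarizer(classes=[0, 1, 2, 3, 4])
multi_hot = mlb.fit_transform(samples)

print(multi_hot)
```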

shell berry
#

Oh yes

#

Silly me 🙂

#

Thanks

velvet thorn
#

yw

shell berry
#

@pliant kestrel you're getting ahead of yourself

pliant kestrel
#

no wait man, i am not, am i?

shell berry
#

if you don't know programming then don't worry about the data yet

pliant kestrel
#

but i have a project to deliver

#

and i have to do it by myself

shell berry
#

I don't know what to tell you man

#

If you don't know programming

pliant kestrel
#

tell me whatever u want

shell berry
#

When did you start learning python?

pliant kestrel
#

yesterday

shell berry
#

yikes

pliant kestrel
#

i had some circumstances this semester

#

being laid off and such

#

it was not funny

#

you get destroyed when u are alone

shell berry
#

Yeah man Im not judging you or anything, I said that with kind intentions

#

I just dont know how you can do ML in python if you dont even know python

#

But your project seems pretty basic, you could look up tutorials and put together the pieces

pliant kestrel
#

that is what i am trying to do at the moment.

shell berry
#

You have two weeks?

pliant kestrel
#

i will try to ask more clearer questions in the future

#

yea

shell berry
#

Your project can be done in two hours

#

I'd spend a week just learning python itself

pliant kestrel
#

that is a ray of hope

shell berry
#

What is your masters degree in?

pliant kestrel
#

well, i have another project in regression analysis that i am trying to solve also using JMP

#

engineering mangement

#

the course i am taking is called : data mining

shell berry
#

They didn't have prereqs for that course?

pliant kestrel
#

statistical learning in machine learning or close enough

shell berry
#

I have never seen a course like this without programming courses as a prerequisite

pliant kestrel
#

the prerequisite was that u should have taken a programming course in ur undergrad

#

i did, that was 9 years ago in Java

#

so i am a bit flexed on some concepts, but that is that, i have never dealt with python past this point

shell berry
#

study that for 2-4 days and you'll be fine

pliant kestrel
#

should i neglect the course i am taking in datacamp?

lapis sequoia
#

So for handwritten dataset. Is it true that sprite sheet is more common than csv?

minor star
#

Practice cant hurt

shell berry
#

That is up to you

pliant kestrel
#

ok i see, but can i continue to ask more questions on what to do in the future?

shell berry
#

Sure

pliant kestrel
#

sorry man i laughed a bit

shell berry
#

are you sure you didnt open ur data incorrectly cause wtf

pliant kestrel
#

ok guys, see you, going to watch some lectures. Thanks kali, gm, light

shell berry
#

good luck, sorry to hear about ur lay off

pliant kestrel
#

thanks, it is ok, hopefully things will be resolved soon

shell berry
#

I'm in grad school for NLP 😛

#

how about you

#

oh nice

#

you look like a finance student with your suit 😛

#

pct?

#

Can't you just parse through and check for what you need at each row

#

Sorry, I know literally nothing about sql 😛

#

Parse through the dataframe and read each row

#

each row will have the values of each column

#

so itll be like

#

[date, ticker, price], [date, ticker, price]

minor star
#

How can I find the geolocation and geocoding limits for API usage? I have a dataset of approximately 300,000 entries it needs to be run on. Will I be able to run all of them?

#

Google APIs

#

How much is 100QPS?

velvet thorn
#

if i wanted to do pct changes in price for each ticker based on the date, how would i do that
@glad mulch groupby

#

then diff

#

oh, no, not diff

#

pct_change

#

yeah

#

so what's wrong with groupby
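
A sketch of the groupby plus `pct_change` suggestion, with invented tickers and prices. Sorting first matters because `pct_change` compares each row with the previous row inside its group.

```python
import pandas as pd

# Invented long-format price data: one row per (date, ticker).
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"]),
    "ticker": ["AAA", "BBB", "AAA", "BBB"],
    "price": [100.0, 50.0, 110.0, 45.0],
})

# Order rows by date within each ticker, then compute the change per ticker.
df = df.sort_values(["ticker", "date"])
df["pct"] = df.groupby("ticker")["price"].pct_change()
print(df)
```

The first row of each ticker is NaN because it has no earlier price to compare against.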

shadow quiver
#

In a saved keras model (via model.save()), there are two keys in the .h5 file: model_weights and optimizer_weights. Am I gonna ever have to use optimizer_weights If I'm never gonna continue the training on the model? I'm willing to use it for prediction only

shell berry
#

Getting a loss of almost 0 after only 200 epochs.. Is this fishy? Something is wrong, right?

#

For a multilabel dataset with 3k examples and ~40 labels

winged lark
#

hello

#

Can anyone help me on how to load multiple models from checkpoints in TensorFlow 2.1?

#

Have two checkpoint directories, and I need to load in a model for each.

#

🙂

lapis sequoia
#

Getting a loss of almost 0 after only 200 epochs.. Is this fishy? Something is wrong, right?
@shell berry do you have a test dataset or validation dataset? it's likely that your model has overfitted/overfit (unsure of correct grammar here)

#

if you can run a test with your model using the test or validation dataset and see what the loss there is then you could potentially get an answer

#

if your testing loss is super high and ur test accuracy is low then your model has overfit

#

if your model has overfit, well it depends on the model and what you want to actually do because there's various ways to counter overfitting but if it's a CNN for example you can add dropout layers

gaunt venture
#

Hello, so i simply want to calculate the percentage of something, yet i keep getting a divide by zero error, when i'm not dividing by 0

bitter harbor
#

you wouldn't get the error otherwise

#

oh no actually a = 0, you don't have any parentheses so I'm assuming it's doing ((number/a) + sa + nad + d+ sd)

gaunt venture
#

oh no...i removed the brackets when i moved it to a function, thank you
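
A tiny illustration of the precedence issue, reusing the variable names from the message above with made-up values. Division binds tighter than addition, so without parentheses only `a` ends up in the denominator.

```python
# Made-up values; a happens to be 0, as in the discussion.
number = 5
a, sa, nad, d, sd = 0, 3, 2, 4, 1

# Without parentheses this is (number / a) + sa + nad + d + sd.
try:
    wrong = number / a + sa + nad + d + sd
except ZeroDivisionError:
    wrong = None  # dividing by a == 0 fails before anything is added

# With parentheses the whole sum is the denominator.
percentage = number / (a + sa + nad + d + sd) * 100
print(wrong, percentage)
```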

mild topaz
#

i am saving an image at path_resources folder and then i am deleting it os.remove(path_resources/"im.png") this way
i am getting error at test_img = cv2.imread(path_resources/"im.png") this line

#
```
Traceback (most recent call last):
  File "E:\demo3\modules\recDoc1.py", line 211, in post
    test_img = cv2.imread(path_resources/"im.png")
SystemError: <built-in function imread> returned NULL without setting an error
```
surreal willow
#

CAN I use Conda with the PyCharm Community Edition?

agile pollen
#

what is the module for calculus?

#

anyone?

light warren
bitter harbor
#

what is the module for calculus?
@agile pollen scipy's got quite a bit for it as well as sympy

agile pollen
#
ModuleNotFoundError: No module named 'Tkinter'
#

please help

bitter harbor
#

lowercase t

#

this is the wrong channel for that tho

split eagle
#

I am trying to apply a function to all columns in a df without typing out the column names. Is there a faster way to do this? I've looked to see if I could use the column numbers, but this hasn't worked.

torpid cave
#

What function?

split eagle
#

.astype(int)

#

I've been calling .astype(int) on individual columns, but I need to do it to all 16 in my df. The column titles are long, and I want to save time.

torpid cave
#
df = df.astype(int)
#

Assuming all your columns are numbers

split eagle
#

The columns are objects.

torpid cave
#
import pandas as pd
d = {'A':['1','2','3','4','5'], "B":['2','3','4','5','6'], 'C':['3','4','5','6','7']}
df = pd.DataFrame(d)

df = df.astype(int)
type(df.loc[1,'A'])
#

Just tried that and it worked

#

by numbers I meant, whatever dtype they are, at the bottom of their heart they are numbers

#

Other way is creating a column index and then changing the type to that column index

#
col_index = df.columns[0:2]
df[col_index] = df[col_index].astype(int)
#

Then you can use slicers

#

To select your columns

#

Or use a loop

#
for col in df.columns:
    df[col] = df[col].astype(int)
#

Last one I don't approve as it is not vectorized

#

There are at least 3 other ways I can think of doing this, let me know if what I did earlier works or if I misunderstood your problem
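
One further option, sketched with invented data: `astype(int)` raises if any value cannot be parsed, whereas `pd.to_numeric` with `errors="coerce"` turns such values into NaN so they can be inspected afterwards.

```python
import pandas as pd

# Invented object columns; "x" is not a valid number.
df = pd.DataFrame({"A": ["1", "2", "x"], "B": ["4", "5", "6"]})

# Apply to_numeric column by column; unparseable entries become NaN.
converted = df.apply(pd.to_numeric, errors="coerce")
print(converted)
```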

split eagle
#

I'll give these a shot and let you know.

#

Thanks.

torpid cave
#

Nww

split eagle
#

col_index = df.columns[0:2] df[col_index] = df[col_index].astype(int) Worked like a charm. Thanks again.

slender nymph
#

hi, how can i convert this code in python. it's is matlab ```py
%Simulate AR(3)
T = 1000; %Set how many observations you need
y = ones(T,1); %Create a vector of dim Tx1 to store the simulations in
y(1) = 1; %Set the first obs. to 1
y(2) = 0.5; %Set the second obs. to 0.5
y(3) = 1.5; %Set the third obs. to 1.5
rho1 = 0.2; %Set the value of rho1 (coefficient on y(t-1))
rho2 = 0.2; %Set the value of rho2 (coefficient on y(t-2))
rho3 = 0.1; %Set the value of rho3 (coefficient on y(t-3))
sigma = 1; %Set the value of the s.d. of the error term
mu_e = 0; %Set the value of the mean of the error term
eps = normrnd(mu_e, sigma, T, 1); %Create a vector of normal random numbers with mean mu_e and s.d. sigma. Dimension is Tx1

for t=4:1000; %Start the loop running from obs. 4 to 1000
y(t) = rho1*y(t-1) + rho2*y(t-2) + rho3*y(t-3) + eps(t); %The AR(3) model
end```

torpid cave
#

Well up to mu_e is exactly the same

#

After that I think it is better if you explain what you want to do

#

Nevermind, I see what you are trying to do

#

It sucks that I can do it in R but not Python

#

I think you could either work out the equation to get Y

#

And loop y

#

it would be easier to ignore y1, y2, y3.. and just do an AR(3) simulation

#
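import statsmodels.api as sm
import numpy as np

arparams = np.array([0.2, 0.2, 0.1])
# ArmaProcess expects the lag polynomial: a leading 1, then the negated AR coefficients
ar = np.r_[1, -arparams]
arma_process = sm.tsa.ArmaProcess(ar=ar, nobs=1000)
# Draw 1000 simulated observations from the AR(3) process
y = arma_process.generate_sample(nsample=1000)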

#

Something along those lines

slender nymph
#

arparams is nothing like the example I posted. You cannot put the 3 starting values of y in the same array

#

i already tried it. i did like you. i took it from forums

burnt wharf
#

cross_val_score of sklearn returning list of nan values...any help guys?

torpid cave
#

Hmm

#

A loop should work with some calculus

halcyon vale
torpid cave
#

I mean you are simulating an AR process

burnt wharf
#

@torpid cave any help?

torpid cave
#

I am thinking but it is 230 am here

#

I havent done much calculus in python tbh

#

I guess I would try to get Y on one side and roll the equations

#

Ok got it I think

#

I am using my tablet so I cant test this but I think the idea is quite clear

#
import numpy as np

y_1 = 1.5  # most recent value, y(t-1)
y_2 = 0.5  # y(t-2)
y_3 = 1    # y(t-3)
rho_1 = 0.2
rho_2 = 0.2
rho_3 = 0.1
mu, sigma = 0, 1
error = np.random.normal(mu, sigma, 1000)
y_list = [y_3, y_2, y_1]  # first three observations, oldest first
for i in range(3, 1000):
    y = rho_1 * y_1 + rho_2 * y_2 + rho_3 * y_3 + error[i]
    y_list.append(y)
    # Update lag variables: shift each lag back by one step
    y_3 = y_2
    y_2 = y_1
    y_1 = y
#

Damn

#

I forgot index starts at 0

#

Just fix that and you should be good I guess

#

*fixed

light warren
#

can someone help me with this error

sharp herald
#

In poisson regression what means "deviance"?

heady hatch
#

Check your dataframe.

df.info()
light warren
#

yeah it does, do you know how I can make it skip those rows?

heady hatch
#

You can drop the na via df.dropna().

light warren
#

would i just add that to the above code?

heady hatch
#

You're going to need to reassign it.

df = df.dropna()

fast plover
#

Ok so I have a pandas dataframe i need to split into quintiles as I need to get the average of the top/bottom 20% of the rows in it by a given key (INDEX, an integer that's a calculated score)

#

having difficulty finding the function i need in the docs
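
One function that may fit here is `pd.qcut`, which bins rows by quantile. The sketch below assumes a frame with the `INDEX` score column from the question plus a made-up `value` column to average; both the data and the `value` name are placeholders.

```python
import pandas as pd

# Hypothetical scores; 'INDEX' is the key named in the question, 'value' is invented.
df = pd.DataFrame({'INDEX': range(1, 11),
                   'value': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]})

# Label each row with its quintile (0 = bottom 20%, 4 = top 20%).
df['quintile'] = pd.qcut(df['INDEX'], q=5, labels=False)

bottom_mean = df.loc[df['quintile'] == 0, 'value'].mean()
top_mean = df.loc[df['quintile'] == 4, 'value'].mean()
print(bottom_mean, top_mean)
```

With heavily tied scores `qcut` can fail on duplicate bin edges, so real data may need extra handling.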

sharp herald
#

Is there a way to do the following transformation without that for loop? I am splitting rows and changing column names

def df_split_rows(df: pd.DataFrame):
    raw_df = {'Attacker': [], 'Defender': [], 'AttackerAdvantage': [], 'Damage': []}
    for _, row in df.iterrows():
        raw_df['Attacker'].append(row['Player1'])
        raw_df['Defender'].append(row['Player2'])
        raw_df['AttackerAdvantage'].append(1)
        raw_df['Damage'].append(row['Player1_score'])
        raw_df['Attacker'].append(row['Player2'])
        raw_df['Defender'].append(row['Player1'])
        raw_df['AttackerAdvantage'].append(0)
        raw_df['Damage'].append(row['Player2_score'])
    return pd.DataFrame(raw_df)
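
A possible loop-free version, offered as a sketch rather than a definitive answer: build one frame per perspective with `rename`, then stack them with `pd.concat`. It assumes the input has a unique index (such as the default one); the function name and the sample matches are invented for illustration.

```python
import pandas as pd

def df_split_rows_vectorised(df: pd.DataFrame) -> pd.DataFrame:
    # One frame per perspective, built with column renames instead of a row loop.
    first = df.rename(columns={'Player1': 'Attacker', 'Player2': 'Defender',
                               'Player1_score': 'Damage'}).assign(AttackerAdvantage=1)
    second = df.rename(columns={'Player2': 'Attacker', 'Player1': 'Defender',
                                'Player2_score': 'Damage'}).assign(AttackerAdvantage=0)
    cols = ['Attacker', 'Defender', 'AttackerAdvantage', 'Damage']
    # Stack both halves; a stable sort on the original index interleaves them
    # so the row order matches the loop version.
    out = pd.concat([first[cols], second[cols]])
    return out.sort_index(kind='mergesort').reset_index(drop=True)

# Hypothetical two-match example.
matches = pd.DataFrame({'Player1': ['a', 'c'], 'Player2': ['b', 'd'],
                        'Player1_score': [5, 7], 'Player2_score': [3, 9]})
result = df_split_rows_vectorised(matches)
print(result)
```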
rich silo
#

Hey guys, does anyone know of a way to dynamically create, from a list of numerical data, a dataframe with 2 columns (Bins, Frequency), AKA a frequency table, with the only arguments being the list with the data and the number of desired bins?
The bins should be of equal size. For ease here is a random list:

lst = [111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139]
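
One way this could be done is with `pd.cut`, which splits the data range into equal-width bins. The helper name `frequency_table` is made up for this sketch, and the bin labels are simply the interval strings pandas generates.

```python
import pandas as pd

def frequency_table(data, n_bins):
    # pd.cut splits the range of the data into n_bins equal-width intervals.
    binned = pd.cut(pd.Series(data), bins=n_bins)
    # Count rows per interval and keep the intervals in ascending order.
    counts = binned.value_counts().sort_index()
    return pd.DataFrame({'Bins': counts.index.astype(str),
                         'Frequency': counts.to_numpy()})

lst = list(range(111, 140))  # same values as the list in the question
table = frequency_table(lst, 4)
print(table)
```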
#

!code

arctic wedgeBOT
#

Here's how to format Python code on Discord:

```py
print('Hello world!')
```

These are backticks, not quotes. Check this out if you can't find the backtick key.

light warren
#

@heady hatch thanks sooo much

bitter harbor
#

@rich silo range(111, 140)?

cosmic prairie
#

can anyone give me some help with running this code im on help-aluminium?

light warren
earnest herald
#

Can someone point me out if I am wrong here. I am making an image recognition application (for verifying signatures).

I am currently looking at Tensorflow to get the work done but there are just so many libraries such as OpenCV and LSH.

I was hoping someone can point out what I should implement in my code. PS- I have to make a check for mirrored images as well

I just joined the industry so go easy on me. Cheers!

sage rock
#

getting an error while importing matplotlib

#

How do i resolve this?

earnest forge
#

I have faced this problem before, solution is simple: update matplotlib

deep ingot
#

hi, I'm struggling to start this data science question I have been given. I tried doing it on my own, though I'm getting more confused

this is my guide line

neon shard
#

@deep ingot What part are you struggling with?

deep ingot
#

@neon shard I am new to data science. Basically I started reading my notes and now I have to apply them, and I don't know where to start. I have knowledge of the topic but don't know how to apply it

neon shard
#

Specifically, what in the above document are you stuck on?

deep ingot
#

what i have done

#

i just like to get a whole example of how i can do this question so i can do it by myself again if that makes sense

neon shard
#

I don't think anyone is just going to do your whole homework assignment for you. You need to break it down into chunks, try it, and then ask for help on parts that you're stuck on

#

It looks like you need to generate the student number with range() or a loop. Do you know how to do that?

deep ingot
#

uhhm is it like so

#

for x in range(150):

#

studentNumber = input()

neon shard
#

That will require the user to input 150 numbers. I think it's asking you to just generate the IDs

#

So, for each loop you can just automatically generate an ID

#

You could do this if the IDs can be 1-150

ids = []
for x in range(1, 151):
    ids.append(x)
#

Or this which does the same thing in a cleaner way. It's a list comprehension

ids = [x for x in range(1, 151)]
deep ingot
#

i perfer the 1st way u did it cause i have done that prev

lapis sequoia
#

Anyone know if you can use NumPy to solve algebra problems?

cosmic prairie
#

Hello does anyone know how I can get this code to run struggling at the minute?

blazing bridge
#

Don’t quote me on this but I don’t think you can run matplotlib on repl.it

cosmic prairie
#

is there anyway around it

#

just remove it

#

can you run it on Jupyter notebook

worn hinge
#

You can run it on any real IDE

cosmic prairie
#

sorry my coding language isn the best fairly new to the game lol what do you mean

#

do u mean a debugger

blazing bridge
#

Integrated Development Environment. A place to write and run code

#

Like pycharm

#

If you really wanna do it in the web you can use google colab or kaggle notebooks

cosmic prairie
#

i dont know why it isnt running cause he ran it on repl.it

blazing bridge
#

Or the best thing to use when using matplotlib and numpy is Jupyter notebook

cosmic prairie
#

yeah unless I use that

#

nearly sure I was told you can do it but

blazing bridge
#

Are you using a video

#

Like who ran it

cosmic prairie
#

ahhh he sent me the code just on an email and I copied it

blazing bridge
#

Ok maybe in the terminal you can try doing pip install matplotlib

#

Or !pip install matplotlib

#

Not really sure about this one

#

Just some advice:

#

Do yourself a favour and start running code on an IDE rather than using these web based editors

cosmic prairie
#

could it be I am using the wrong code for matplotlib?

blazing bridge
#

It doesn’t look like it is

cosmic prairie
#

def plot_it(x,y,p): #(uncomment to start working on this function - optional)

plot_it(x,y,p)

#

are those plot commands needed

blazing bridge
#

No these are just functions and the error is with matplotlib before anything else

#

Python goes from top to bottom, and when it encounters an error it displays that one and doesn’t show the other errors

cosmic prairie
#

yeah I noticed that first error it sees it just tells you that one when you could have 5 more

#

What do you think is the best way round this problem?

blazing bridge
#

Honestly, just use google colab instead

#

Much better

#

Before you copy and paste the code

#

Write !pip install matplotlib in the first code cell

cosmic prairie
#

and then just copy the rest

blazing bridge
#

Yeah try that

cosmic prairie
#

right I will try that and get back to you,thanks @blazing bridge ,

blazing bridge
#

Ok gl

cosmic prairie
#

seem to be getting no output tho

blazing bridge
#

That’s because the code has to be separate from the installation

#

With cells you only get one output

#

Break the code up to different outputs

cosmic prairie
#

How do I do that lol ?

heady falcon
#

How is the cost function minimized in a neural network?

cosmic prairie
#

hahha my code skills aint the best, got this off a mate, I just need to know which variables to change so that I get an output

serene scaffold
#

I have a dataframe like this:

2         DNA  False
3         DNA  False
4         DNA  False
...       ...    ...
8790  nonDRNA  False

I need to get the percentage of rows where the boolean value is True, grouped by the first column.

#

df.groupby('class').count() is a great start but that counts every row; I could divide another dataframe by this

#

I want to say it's x.groupby('class').sum() / x.groupby('class').count() but idk what is being summed
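
A short sketch of what the sum/count ratio works out to: because `True` counts as 1 and `False` as 0, the per-group mean of a boolean column is already the fraction of `True` rows. The column name `flag` and the sample rows are assumptions; the real frame's boolean column name is not shown in the chat.

```python
import pandas as pd

# Hypothetical frame shaped like the one in the question.
df = pd.DataFrame({'class': ['DNA', 'DNA', 'DNA', 'nonDRNA', 'nonDRNA'],
                   'flag': [True, False, False, True, True]})

# Mean of a boolean column per group = share of True values in that group.
pct_true = df.groupby('class')['flag'].mean() * 100
print(pct_true)
```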

cyan matrix
#

anyone utilize pdftotext often? trying to pull out data from a huge PDF but having trouble with it

lapis sequoia
#

I successfully made a script to load npz files. I need some advice on how to extract the datetime corresponding to some of the vars in the npz. Thanks for the help

velvet thorn
#

i have a dataframe where i want to calculate the returns for each ticker
@glad mulch what do you mean not working

#

I want to say it's x.groupby('class').sum() / x.groupby('class').count() but idk what is being summed
@serene scaffold the boolean value

serene scaffold
#

@velvet thorn so it's just summing all numeric-like values in the dataframe?

low oracle
#

Hey, I need a little help... Anyone here use jupyter on AWS?

serene scaffold
#

@low oracle go ahead and ask what you would ask if someone said yes

low oracle
#

@serene scaffold lol alrighty then. So I'm trying to utilize a tsv file into jupyter, I have looked around (YouTube, sof, etc...) and cant figure out how to properly work with my tsv data file

serene scaffold
#

are you using pandas?

low oracle
#

Attempting to yes

serene scaffold
#

@low oracle if you're using pandas, it's pd.read_csv but you have to specify that tabs are the delimiter
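
As a small illustration of that suggestion, the sketch below reads tab-separated text with `sep='\t'`. It uses an in-memory string as a stand-in for a real file so it runs anywhere; the column names and rows are invented.

```python
import io
import pandas as pd

# Stand-in for a real file; with a file on disk, pass its path instead of the buffer.
tsv_text = "name\tscore\nalice\t3\nbob\t5\n"

# sep='\t' tells read_csv that columns are separated by tabs rather than commas.
df = pd.read_csv(io.StringIO(tsv_text), sep='\t')
print(df)
```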

#

!docs pandas.read_csv

arctic wedgeBOT
#
```py
pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, skipfooter=0, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, [...]
```
Read a comma-separated values (csv) file into DataFrame.

Also supports optionally iterating or breaking of the file into chunks.

Additional help can be found in the online docs for [IO Tools](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html).

Parameters

**filepath\_or\_buffer**: str, path object or file-like object. Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, gs, and file. For file URLs, a host is expected. A local file could be: <file://localhost/path/to/table.csv>.

If you want to pass in a path object, pandas accepts any `os.PathLike`.

By file-like object, we refer to objects with a `read()` method, such as a file handler (e.g. via builtin `open` function) or `StringIO`.... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv)
serene scaffold
#

and then you get to experience the joy of learning how to use pandas, which as you can see from my earlier question is not something I've fully accomplished myself.

#

but there are people who hang out in this channel who have

low oracle
hollow gull
#

@low oracle those seem like really strange errors. Are you creating a spark session somewhere? Are you trying to use pyspark?

low oracle
#

Yeah I made that mistake and changed to conda-python3

#

but I'm still having issues

#
<ipython-input-12-530695f4cce5> in <module>
      2 import matplotlib.pyplot as plt
      3 tsv_file = open('data.tsv')
----> 4 read_tsv = csv.reader(tsv_file, delimiter='\t')

NameError: name 'csv' is not defined
hollow gull
#

you just haven't imported csv it looks like

low oracle
#

ok that changed things

#

Now it's saying tsv_file does not exist

hollow gull
#

okay, maybe your path is wrong.

slender nymph
#

guys i need some help : well i need to do a list like y = 5 ; y = 0.98 *y; y = 0.90 *y

#
```py
for i in range(1, 20):
    np.random.seed(1000)
    y[i] = y[i-1]+ np.random.normal(0,1,size=200)
```
after i need to use them here
#

someone can help me?

hollow gull
#

@glad mulch there is an argument in df.dropna that lets you specify the axis. I think you want axis=1 for columns or axis=0 for rows, so axis=1 for you.

#

@slender nymph I don't really understand your question based on what you have said.

hollow gull
#

!docs pandas.DataFrame.dropna

arctic wedgeBOT
#
```py
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
```
Remove missing values.

See the [User Guide](../../user_guide/missing_data.html#missing-data) for more on which values are considered missing, and how to work with missing data.

Parameters

**axis**: {0 or ‘index’, 1 or ‘columns’}, default 0. Determine if rows or columns which contain missing values are removed.

• 0, or ‘index’ : Drop rows which contain missing values.

• 1, or ‘columns’ : Drop columns which contain missing value.

Changed in version 1.0.0: Pass tuple or list to drop on multiple axes. Only a single axis is allowed.

**how**: {‘any’, ‘all’}, default ‘any’. Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.

• ‘any’ : If any NA values are present, drop that row or column.

• ‘all’ : If all values are NA, drop that row or column.

**thresh**: int, optional. Require that many non-NA values... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html#pandas.DataFrame.dropna)
ruby wyvern
#

Basically, I've found a StackOverflow answer, but this did not answer the question.

boreal summit
#

Mehn, I was practicing and stuff, training a heavy dataset on my PC, now my PC is not responding. This is the first time. 😂

#

The activity light is on ATM. Normally, it blinks once every minute but now it's just on. Don't know if I should shut down and restart or leave it.

#

I first did some processing like oneHotEncoder and stuff, then I scaled it using standard scaler, used isomap to reduce it to 3 components, fit it using linear regression, make predictions and check the mean accuracy score. That's basically what I was doing.

#

Even the clock on my PC is not working anymore. I'll leave it for 10 more minutes.

#

@glad mulch not sure if the first guy answered your questions well enough. If you use axis=1, that means it should drop columns with null values, there's other parameters which you can use to fine-tune this also like threshold, how etc. If you use axis=0, that means it would drop any row that contains NaN, you can also set threshold and stuff for this argument also. You can read the documentation to get insights about the other parameters.

velvet thorn
#

@velvet thorn so it's just summing all numeric-like values in the dataframe?
@serene scaffold yup

#

count, on the other hand, counts non-null values

#

@velvet thorn ok, lemme do a different question. if i wanted to skip the first date in my data frame how would i do that in multiindex
@glad mulch are you thinking of .iloc

serene scaffold
#

how is iloc different from loc?

velvet thorn
#

how is iloc different from loc?
@serene scaffold .loc takes boolean series or string indexers (labels, strictly speaking)

#

.iloc takes boolean arrays or positional (integer) indexers

#

so one common pattern is

#

selecting a subset of a DataFrame by applying a condition to the rows and taking only some columns

#

e.g. df.loc[df['value'] > 3000, ['colour', 'model']]

serene scaffold
#

huh, I didn't think that worked

velvet thorn
#

whoops I forgot the .loc

#

LOL

serene scaffold
#

I didn't think that worked, either

#

but the first one doesn't? (without the .loc)

velvet thorn
#

but the first one doesn't? (without the .loc)
@serene scaffold nope

#

but with .loc it does

#

!e

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
print(df.loc[df['a'] > 2, ['b']], end='\n\n')
print(df[df['a'] > 2, ['b']])
arctic wedgeBOT
#

@velvet thorn :x: Your eval job has completed with return code 1.

001 |    b
002 | 1  4
003 | 
004 | Traceback (most recent call last):
005 |   File "<string>", line 5, in <module>
006 |   File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 2906, in __getitem__
007 |     indexer = self.columns.get_loc(key)
008 |   File "/usr/local/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 2895, in get_loc
009 |     return self._engine.get_loc(casted_key)
010 |   File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
011 |   File "pandas/_libs/index.pyx", line 75, in pandas._libs.index.IndexEngine.get_loc
... (truncated - too many lines)

Full output: https://paste.pythondiscord.com/ipoquzinuz.txt

serene scaffold
#

@velvet thorn I'm learning lemon_hyperpleased

lapis sequoia
#

h

undone flare
#

OwO snekbox back

lapis sequoia
#

(Noob) I successfully made a script to load npz files. I need some advice on how to extract the datetime corresponding to some of the vars in the npz. Thanks for the help

cobalt jetty
#

if the position of the dates are the same in the files, can't you loop through it and record them in a list?

#

the datetime module would be helpful if you need to reformat those dates later on.

lapis sequoia
#

What do you mean by loop through it and record them in a list? How can I capture those data

cobalt jetty
#

it seems that you have an array.
So your array must have items like rows? Using a for loop wouldn't you be able to go through that array?

lapis sequoia
#

Yes, I thought the same, the thing is that I have no clue about how to extract some of the columns of the npz

#

just regular slicing isn't it?

dim moss
#

where can I learn data science

lapis sequoia
#

where can I learn data science
@dim moss datacamp

dim moss
#

is it free

cobalt jetty
#

If you're using python, Nass, and you assigned that array to a variable name, type

type(var_name)
#

to see what you got.

lapis sequoia
#

these are the vars included in the data files

cobalt jetty
#

I'm not asking about what is in the file. I'm asking about the type of your data structure in your python shell.

lapis sequoia
#

2 sec

#

I'll check it

cobalt jetty
#

cuz the type/structure will impact what you can do with it.

dim moss
#

is data camp free

cobalt jetty
#

The best way to pick up data science is you build yourself your own project, cloneb.

#

Practice is important

dim moss
#

I have not learnt datascience

#

I am looking for a free source

#

to learn data science

molten hamlet
#

@dim moss check maybe some pinned messages

dim moss
#

@dim moss check maybe some pinned messages
@molten hamlet nah

molten hamlet
#

try kaggle

#

pros are there

#

i think

dim moss
#

what is kaggle

#

oh it is google some community

#

I need a free data science learning source

lapis sequoia
#

I need help with an array, is someone available ?

molten hamlet
#

@dim moss go to kaggle

#

now

#

grab any set

#

and start some tutorial 😄

dim moss
#

what?

lapis sequoia
#

yo can i DM anyone with my colab link my shit is crashing

#

like im tryna run this GAN but its crashing

lapis sequoia
#

@cobalt jetty sorry to bother you would you be able to help me

cobalt jetty
#

Hey, I'm still in class for the next 5 hours. Maybe then. But understand I've never implemented a GAN.

lapis sequoia
#

no worries, is it cool if i DM you and you can look at it when you're available?

cobalt jetty
#

I'll ping you here when I'm available.

deep ingot
#

Hi I have to calculate the percentage for maths score but with a maximum of 130 can someone assist me

sage rock
#

I have faced this problem before, solution is simple: update matplotlib
@earnest forge This didnt work

#

ImportError: DLL load failed while importing ft2font: The specified procedure could not be found.

#

Still getting this

#

if anyone could help me out

boreal summit
#

@dim moss first, if you know Python basics and stuff, you should learn data analysis before moving to data science.

#

I could help you with resources to learn data analysis and data science, PDF files.

prisma isle
#

Is there a faster way to find a linear combination of several large numpy memmaps?

#

I can't load them all into memory

earnest forge
#

@sage rock try to import matplotlib without %inline

lapis sequoia
#

can anyone help me figure this error out

#

ValueError: Dimensions must be equal, but are 16 and 60000 for '{{node mean_squared_error/SquaredDifference}} = SquaredDifference[T=DT_FLOAT](generator/activation_17/Relu, mean_squared_error/Cast/x)' with input shapes: [16,28,28,1], [60000,28,28,1].

#

im inputting a dataset with batch size of 16

#

but i cant seem to batch the mnist dataset to the same size

earnest forge
#

@lapis sequoia it'd be better if you provided an excerpt from your code

lapis sequoia
#

i can paste it one sec

#

@earnest forge

#

i just changed the batch size to 16

earnest forge
#

._.

#

i meant particular part of code

#

not the whole

lapis sequoia
#

oh

earnest forge
#

ValueError: Dimensions must be equal, but are 16 and 60000 for '{{node mean_squared_error/SquaredDifference}} = SquaredDifference[T=DT_FLOAT](generator/activation_17/Relu, mean_squared_error/Cast/x)' with input shapes: [16,28,28,1], [60000,28,28,1].
@lapis sequoia anyway, it says your second input is the size of 60000, not 16

lapis sequoia
#

lol

earnest forge
#

make sure you pass it the right data

halcyon vale
lapis sequoia
#

just wanted to say that i think ive gotten my model working and i want to thank everyone in here for all their help because i think i was on the verge of a mental breakdown i have a very good feeling that this shit will look like hot ass but we made it

tribal wind
#

is anyone good with nltk that I could ask a beginner question?

hasty orchid
#

When was savefig() introduced in pyplot? Is it recent?

molten hamlet
#

no

#

i think it was always in it

lapis sequoia
#

Anyone do Monte Carlo sin?

#

Sim

glad kestrel
#

Does anybody here know good dimensionality reduction techniques for binary data? I'm working on a data science project, with a dataset of 130 binary attributes. I'm looking for something that could be easily implemented in Python using sklearn or similar

#

@tribal wind I have **some experience with nltk, so shoot your shot

lapis sequoia
#

Anyone do Monte Carlo sim?

hasty orchid
#

They are very easy to find through Google Scholar

cobalt jetty
#

You should increase your batch size if your hardware allows it, @lapis sequoia

#

You're also missing

```py
validation_split=0.2,
subset="validation"
```
in your preprocessing function to create test_ds
ancient venture
#

Hi, I am very new to Python. By that I mean, I only started learning it about 1 month ago as part of my university physics course.

#

I have been given a csv file with wavelength (x) and 31 sets of observations (y)

#

I need to fit a linear + gaussian model for each observation (though if I do it for one I can probably repeat a similar thing for the remaining observations)

#

I have fit a linear model to the first observation data but I am struggling with fitting a gaussian fit

#

We are using the numpy and matplotlib packages

#

As you can see the linear fit is there

#

I am supposed to fit a gaussian to it, similar to how it is done in this example

#

If someone could help me with using the scipy.optimize.curve_fit() function, it would be appreciated

#

I need to make initial guesses for the peak and width of the gaussian, and that should hopefully be automated so that I can repeat it for the other 30 observations
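
A hedged sketch of how a combined linear + Gaussian fit with automated starting values might look. Everything here is an assumption for illustration: the synthetic data stands in for one observation column of the CSV, and `linear_plus_gaussian` and `initial_guess` are invented helper names. The idea is to fit the straight line first, then take the largest residual as the starting peak position and height.

```python
import numpy as np
from scipy.optimize import curve_fit

def linear_plus_gaussian(x, m, c, amp, mu, sigma):
    # Straight-line background plus a Gaussian feature.
    return m * x + c + amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def initial_guess(x, y):
    # Rough automatic starting values: fit the line first, then use the
    # largest residual as the Gaussian centre and height.
    m, c = np.polyfit(x, y, 1)
    resid = y - (m * x + c)
    peak = np.argmax(np.abs(resid))
    width = (x.max() - x.min()) / 20  # arbitrary starting width
    return [m, c, resid[peak], x[peak], width]

# Synthetic stand-in for one observation (wavelength x, measured y).
rng = np.random.default_rng(0)
x = np.linspace(6500, 6600, 200)
y = linear_plus_gaussian(x, 0.01, 5.0, 3.0, 6563.0, 2.0) + rng.normal(0, 0.05, x.size)

# Without p0, curve_fit starts every parameter at 1, which often leaves
# the Gaussian part unfitted and returns what looks like a plain line.
popt, pcov = curve_fit(linear_plus_gaussian, x, y, p0=initial_guess(x, y))
print(popt)
```

Looping the last two lines over each observation column would repeat the fit for the other data sets, assuming each one has a single dominant peak.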

ancient venture
#

but I still only get a linear fit

lapis sequoia
#

hi, i need help

#

ModuleNotFoundError: No module named 'scitools.std'

ivory panther
#

Hello everybody, is there a Discord channel or a forum focused on Python pandas?

grave path
#

How can I make them go next to each other

velvet thorn
hasty orchid
#

Was sorta testing the new reply feature

frozen gazelle
#

Hey. I need some help with data science within 2 hours. Can I pay somebody here to help me 1:1?

#

Sorry if the wrong place. Have yet to find anyone who wants to help me 😦

#

Paying $60 for some quick q&a

north plinth
#

Can anybody tell me how feed forward works in a conv model

velvet thorn
#

a convolutional layer?

north plinth
#

I mean like we have 32 filter in the first conv layer

#

Yes

velvet thorn
#

what dimension?

north plinth
#

Conv2d

velvet thorn
#

okay

#

so what specifically are you confused about

north plinth
#

So we have 32 filters

#

That mean if i enter 1 img that img will be 32 imgs after passing the first conv layer

#

Isnt it?

velvet thorn
#

uh

#

no?

#

okay, purely in the abstract sense

#

and assuming you're using same padding

north plinth
#

Yaa

#

Then the maxpooling will shrink the imgs

velvet thorn
#

say each image is of shape (w, h, c) (width, height, channels) and you pass it into a layer with k filters, the output will be of shape (w, h, k).

north plinth
#

Okay that makes some sense

#

I thought each filter will be applied on the img and get a new img

velvet thorn
#

no

north plinth
#

But it is not that simple

velvet thorn
#

each filter spans all c input channels and produces one output channel.

north plinth
#

Thanks dude

velvet thorn
#

therefore, the number of parameters for a layer with k filters of size (fw, fh) is (fw * fh * c) * k + k, where c is the number of input channels and the final + k is one bias per filter
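
The shape and parameter arithmetic above can be checked with a few lines of plain Python. This is only a sketch: the function names are made up, `c_in` is the channel count c from the (w, h, c) input shape, and the output shape assumes same padding with stride 1 as stated earlier.

```python
def conv2d_param_count(fw, fh, c_in, k, bias=True):
    # Each of the k filters has fw * fh weights for every input channel,
    # plus one bias term per filter if biases are used.
    return fw * fh * c_in * k + (k if bias else 0)

def conv2d_output_shape(w, h, k):
    # With 'same' padding and stride 1 the spatial size is unchanged and the
    # channel dimension becomes the number of filters.
    return (w, h, k)

# 32 filters of size 3x3 on a single-channel 28x28 image.
print(conv2d_param_count(3, 3, 1, 32))  # 3*3*1*32 + 32 = 320
print(conv2d_output_shape(28, 28, 32))  # (28, 28, 32)
```

So one input image comes out as a single tensor with 32 channels, not as 32 separate images.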

#

yw

whole mica
#

Hey guys!

#

How are yall?

#

is there anyone here who is really good with neural networking?

velvet thorn
whole mica
#

Ok ok. So i am wanting to build a game bot that plays through Pokemon Emerald all on its own. But i am having trouble getting it set up

velvet thorn
#

maybe you could elaborate on what trouble you're having

#

in general it would be better to ask a question

#

which people can answer

#

without having to ask for more info.

whole mica
#

just about everything. I cannot find any resources on how to create a game bot.

velvet thorn
#

well

#

then this probably isn't a good place to start

#

it's not as difficult or specialised as it was

#

but it's still a fair bit of work.

whole mica
#

I'm wanting it to play on an emulator and i want the bot to recognize the emulator and play through it

velvet thorn
#

how much Python and DL experience do you have

north plinth
#

Specific game bot should be easy

#

Start with pyautogui

whole mica
#

uhhhhhh

#

ive made a classifier

north plinth
#

So what is ur problem

whole mica
#

how do i get it to recognize the emulator as a whole?

north plinth
#

What u r talking bro

#

Take a screen of ur game

#

Feed it into ur classifier

#

Predict an action

whole mica
#

uh

#

gonna be honest

#

do not really know how the classifier works

#

just kinda built it

north plinth
#

It doesnt predict the right?

velvet thorn
#

at all

north plinth
#

Or got error during training

whole mica
#

no it does

#

it works perfect

north plinth
whole mica
#

i think itll work

velvet thorn
#

many reasons, but the simplest one is

#

not much of the gamestate is exposed by the screen

velvet thorn
#

good luck!

north plinth
#

Yaaa

whole mica
#

i just need a lot of help

#

what if i can get the games code?

north plinth
#

Is it a strategy game?

whole mica
#

its pokemon

north plinth
#

Never tried it out.. I've made a bot but it was for Dino Run

velvet thorn
north plinth
#

Which worked pretty nicely as it can be easily predicted through screen data

velvet thorn
#

yes

#

which is my point

velvet thorn
whole mica
#

well if you have extensive knowledge of the game it should not be hard to know the game state, just coding it is the hard part for me

velvet thorn
whole mica
#

i think i do, but i am not sure

velvet thorn
#

okay.

#

so

#

you said you think

#

it's enough to predict the next action

#

given a screencap.

#

that doesn't make sense for multiple reasons.

#

the first is that a screencap won't, for example, take into account what Pokemon you have

#

or where in the story you are (because you might need to backtrack, for example)

#

the second is that you're going to need a ton of training data that more or less requires you to play the game yourself

#

where will you get that?

whole mica
#

well, if i play the game will that work?

velvet thorn
#

probably not

#

not enough data

whole mica
#

what if i get a few people to play it?

velvet thorn
#

you could try it

#

but my guess is not enough data

#

if you said

#

write AI to handle a subset of the game

#

like battling

#

that would be much simpler

#

to play the whole game?

#

I think you underestimate the scope of that project.

#

by a lot.

#

unless you hardcode a ton of stuff, but even then

#

Pokemon vs Dino Run is like chess vs tic-tac-toe.

whole mica
#

even if i know the game in and out?

#

that would not help ?

velvet thorn
#

it's not that it wouldn't help

#

that is the very least you need

#

and it's nowhere near enough

whole mica
#

i do not know. That is why i need help.

north plinth
#

Translate the image data into feature set as u know bout the game

velvet thorn
whole mica
#

The reason i want to do this big of a project is because i want to learn as i go

velvet thorn
#

then you can properly appreciate how difficult this is

velvet thorn
rotund sail
#

I have a decently simple data science question that I posted in #help-croissant , if anyone would be able to help me out it would be much appreciated 🙂

velvet thorn
#

would probably make more sense to just pull data from the game's memory

velvet thorn
#

but you can just post here

#

the question

#

@rotund sail pandas methods in general make copies; they do not modify inplace.

#

anyway, that's a bad way to do things

#

you should use vectorised filtering

#

penguin_data = penguins.loc[penguins['species'] != 'Chinstrap', ['species', 'flipper_length_mm']]

#

look up the .loc indexer.

#

in general, if you have a for loop in pandas code, you're doing things wrong.
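
A runnable sketch of the `.loc` filtering pattern described above, using a tiny made-up stand-in for the penguins dataset (the rows and the `island` column are placeholders). It also shows that the original frame is left unchanged, in line with the point about pandas returning copies.

```python
import pandas as pd

# Hypothetical stand-in for the penguins dataset.
penguins = pd.DataFrame({
    'species': ['Adelie', 'Chinstrap', 'Gentoo', 'Chinstrap'],
    'flipper_length_mm': [181, 195, 217, 201],
    'island': ['Torgersen', 'Dream', 'Biscoe', 'Dream'],
})

# Boolean mask selects the rows to keep, the list selects the columns to keep.
keep = penguins['species'] != 'Chinstrap'
penguin_data = penguins.loc[keep, ['species', 'flipper_length_mm']]

# The original frame still has all of its rows; .loc returned a new object.
print(penguin_data)
print(len(penguins))
```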

whole mica
#

well gm, what do you think would be easier?

velvet thorn
#

also look up the inplace parameter.

rotund sail
#

Oh really? I took a data science course at my university last year and they wanted it

velvet thorn
rotund sail
#

Essentially every project we did utilized a for loop lol

velvet thorn
#

200%, actually

velvet thorn
#

and then

#

writing an AI for a simpler game.

#

well I mean

#

you don't have to take what I say as the truth

whole mica
#

no! i mean an easier game

velvet thorn
#

like...tic-tac-toe or something

#

that's a good start

north plinth
#

Tic tac toe should be used to learn RL

whole mica
#

ok cool! and yes im gonna take advice. I do not know what im doing haha !

north plinth
#

Cos that would be too easy for DL

north plinth
whole mica
#

im trying, i dont have the money to go to school so im trying to learn online

velvet thorn
#

you can try this

#

it's free

rotund sail
#

@velvet thorn Is data science your profession?

#

if you don't mind me asking that is

velvet thorn
#

just for fun

#

or rather

#

not right now

rotund sail
#

Nice, was just curious since you seemed rather knowledgeable about it

velvet thorn
#

but I used to be a DS/teach DS

#

thank you

#

I try 👋

rotund sail
#

So why are for loops bad usage in DS?

velvet thorn
#

just in pandas (and not all the time, but in general)

#

okay, pandas DataFrames use numpy arrays for storage

#

these arrays have fixed sizes.

whole mica
#

what do you do for work?

velvet thorn
#

so every time you remove or add a column/row, you're actually creating a whole new array (and DataFrame wrapping it).

#

in that for loop, therefore

#

for every row that doesn't satisfy your condition, you create a new DataFrame

rotund sail
#

oh, so it's just very inefficient

velvet thorn
#

so say there are N such rows; you end up creating N - 1 throwaway DataFrames

#

yup

#

that's the first thing

#

secondly

#

modern processors have something called SIMD instructions

#

which basically let them perform arithmetic on more than one memory address at a time

#

if you use a for loop, this optimisation isn't triggered

rotund sail
#

So, for efficiency purposes, I should read up on vectorization

velvet thorn
#

as an illustration

velvet thorn
#

run this

#

and watch the prints
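
The snippet that was posted here did not survive in this log. A rough sketch of the same kind of illustration, assuming it compared a Python-level loop with a vectorised operation; the array size and the arithmetic are made up for the example.

```python
import time
import numpy as np

n = 200_000
values = np.arange(n, dtype=np.float64)

# Plain Python loop: one element at a time, through the interpreter.
start = time.perf_counter()
loop_total = 0.0
for v in values:
    loop_total += v * 2.0
loop_seconds = time.perf_counter() - start

# Vectorised: NumPy does the multiply and the sum in compiled code.
start = time.perf_counter()
vector_total = float((values * 2.0).sum())
vector_seconds = time.perf_counter() - start

print(f"loop:       {loop_seconds:.4f} s")
print(f"vectorised: {vector_seconds:.4f} s")
```

Exact timings depend on the machine, but the vectorised version is typically much faster for the same result.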

velvet thorn
#

some frontend

#

I'm looking at going back into DS/ML though

rotund sail
#

that was actually so fast

#

jesus

velvet thorn
#

so there are two problems with the for loop.

#

one, it's not vectorised (slow)

#

two, it creates throwaway objects (slow AND wastes memory)
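
A small sketch of the "throwaway objects" point. The original loop is not shown in this log, so the loop below is an assumed reconstruction (drop one row at a time) on invented data.

```python
import pandas as pd

df = pd.DataFrame({
    "species": ["Adelie", "Chinstrap", "Gentoo", "Chinstrap"],
    "flipper_length_mm": [181.0, 192.0, 217.0, 196.0],
})

# Loop version: every drop() call returns a brand-new DataFrame.
looped = df
frames_created = 0
for idx, row in df.iterrows():
    if row["species"] == "Chinstrap":
        looped = looped.drop(idx)  # new object each time; the previous one is discarded
        frames_created += 1

# Vectorised version: one boolean mask, one new DataFrame.
masked = df[df["species"] != "Chinstrap"]

print(frames_created)
print(looped.equals(masked))
```

Both versions end up with the same rows; the loop just builds an intermediate frame for every row it removes.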

whole mica
#

well is it ok if i keep asking ya questions?

velvet thorn
whole mica
#

well its about jobs

velvet thorn
whole mica
#

okie!

#

so, using that book you gave me should help a lot right?

#

I really wanna get to make an A.I. to play pokemon

velvet thorn
whole mica
#

and tic-tac-toe is a good game to start with?

round tulip
#

for the 12 people who haven't seen this yet https://www.youtube.com/watch?v=aircAruvnKk&ab_channel=3Blue1Brown

peak schooner
#

are there any resources for converting excel spreadsheets to python

sand escarp
#

Do you mean read excel files? Or is it something else I'm too inexperienced to understand? If it is the former, there is a read_excel() function in pandas.

#

Hello guys, do you happen to know any free resources for learning statistics and probability? What I want to do is supplement a course I'm learning on statistics to have a better intuitive understanding of the things, with some interactivity and stuff. I have in mind jupyter notebooks or webpages or books or anything at all. One I tried is Think Stats by Allen Downey but he doesn't delve too much into the mathematics so it isn't that helpful to me.

neon cave
#

does anyone here know R?

#

I need help with R

earnest herald
#

Hello (:
I am looking for a decent image comparison algorithm. I have looked into MSE and SSIM so far. Can someone recommend another algorithm except LSH or using OpenCV?

My end goal is to create a large scale image comparison (hand written signatures) algorithm most likely using Tensorflow.

Thanks

#

Ping me (:

rough cedar
earnest herald
#

Thanks a lot for VGG heads up by the way (:

rough cedar
#

you would encounter it anyway since you are going to use VGG

earnest herald
#

Awesome! This makes things a lot easier for me. Thanks bud

rough cedar
#

you are welcome

signal delta
#

Hello,
Can someone help me in the help-nitrogen channel? I've described the problem I am facing in that channel

#

Thank you!

safe tapir
#

Is there a way to make a nested dataframe accessor?

e.g.
@pd.api.extensions.register_dataframe_accessor("base.second.third")

charred pagoda
#

Can someone help me? I have a list of the times my server takes to process requests and I want to find anomalies, but for some reason it's not working.
I'm using the following if statement:

stdev = statistics.stdev(times)
if min(currentTime) + stdev < statistics.mean(times):
#

But it's not working: it either misses the anomalies or marks normal values as anomalies
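
One common way to phrase this kind of check is to flag values that sit more than `k` standard deviations away from the mean, in either direction. This is only a sketch under that assumption; `find_anomalies`, `k` and the sample timings are hypothetical and not taken from the code above.

```python
import statistics

def find_anomalies(times, k=2.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    mean = statistics.mean(times)
    stdev = statistics.stdev(times)
    return [t for t in times if abs(t - mean) > k * stdev]

# Invented response times in milliseconds; 950 is the obvious outlier.
times = [102, 98, 105, 99, 101, 97, 103, 950, 100, 104]
print(find_anomalies(times))
```

Note that a large outlier inflates both the mean and the standard deviation, so on small samples a threshold much above 2 may never trigger.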

vital drift
#

Hi, can someone help me in the -krypton channel? I'm new to python and it's just some simple excel manipulation, but I don't start python in my grad program until next year.

gentle kindle
#

Hello, I am looking for a way to automatically classify JSON data that may or may not have headers, using an external site like Wikipedia to determine context and collect tags. Is there a script I can look at?

#

Please ping me if anyone knows. Thanks.

heady hatch
#

Heya anyone here familiar with protobuf and parsing tfrecords?

#

I'm having issue parsing tfrecords and not too sure how to work with it since I'm not familiar with protobufs at all.

snow maple
#

so what is up

rich silo
#

Hello guys, I am looking to create a function that takes continuous data and a number of bins as arguments and creates a frequency table (pandas) with 2 columns: the bin ranges and the frequency count.
The data input should be a list like range(0,1500).
Does anyone have any ideas about this?

green hemlock
#

do you need to create a function @rich silo , or can you use a function like pd.qcut()? (assuming that's what you meant by binning continuous values)

rich silo
#

That's what i tried at first, but i couldn't get it into the format that i wanted, meaning the 2 column table

#

Maybe i am missing something obvious......

green hemlock
#

what would the output look like? can you give an example? it will make it easier to understand the exact objective

rich silo
#

Sure give me a sec

#

something along those lines

green hemlock
#

@rich silo ^

rich silo
#

Yeah kinda. can this be converted into a pandas table?

green hemlock
#

definitely

#

you can also perform some cleanup to format the data, in case you need it
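
A minimal sketch of the two-column table being discussed, assuming equal-width bins from `pd.cut`. The function name `frequency_table` and the column names are made up for the example.

```python
import pandas as pd

def frequency_table(data, bins):
    """Bin `data` into `bins` equal-width intervals and count the values per bin."""
    binned = pd.cut(pd.Series(list(data)), bins=bins)
    counts = binned.value_counts(sort=False)  # keep the bins in ascending order
    return pd.DataFrame({
        "bin_range": counts.index.astype(str),
        "frequency": counts.to_numpy(),
    })

table = frequency_table(range(0, 1500), bins=5)
print(table)
```

Swapping `pd.cut` for `pd.qcut` would give equal-frequency bins instead of equal-width ones.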

rich silo
#

Something else: can the bounds of the bins be open?
For example, to also have less than -0.05 and more than 50

green hemlock
#

your min/max values become the bounds when you use cut
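
If open-ended bins are wanted anyway, one option (a sketch, not the only way) is to pass explicit edges to `pd.cut` and use infinities for the outer ones. The edges and sample values below are invented around the -0.05 and 50 mentioned above.

```python
import numpy as np
import pandas as pd

values = pd.Series([-3.0, -0.05, 0.0, 12.5, 49.9, 50.0, 75.0])

# Explicit edges; -inf and +inf catch everything below -0.05 and above 50.
edges = [-np.inf, -0.05, 10, 25, 50, np.inf]
binned = pd.cut(values, bins=edges)

# sort=False keeps the bins in edge order instead of ordering by count.
counts = binned.value_counts(sort=False)
print(counts)
```

By default each bin is closed on the right, so -0.05 itself lands in the first bin; the `right` parameter of `pd.cut` flips that.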

rich silo
#

Ah, I see. I think I've got it from here

#

Thanks a lot for your helps

#

help

#

I am quite new to this

green hemlock
#

sure, no problem

rich silo
#

@green hemlock it seems that now it creates the bins but all of them have the same number of observations.

#

Could i perhaps use numpy linspace to try and sort this?

green hemlock
#

they are not same

#

Last 2 values are 386, rest are 387

#

And yeah, you can use linspace, but your problems should be possible to be solved by cut/qcut.
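
A short sketch of why the counts can look nearly identical: `pd.qcut` splits at quantiles, so every bin gets about the same number of observations, while `pd.cut` uses equal-width bins. The skewed random data is invented for the comparison.

```python
import numpy as np
import pandas as pd

# Skewed toy data: most values are small, a few are large.
rng = np.random.default_rng(0)
data = pd.Series(rng.exponential(scale=10.0, size=1000))

equal_width = pd.cut(data, bins=4).value_counts(sort=False)
equal_count = pd.qcut(data, q=4).value_counts(sort=False)

print(equal_width)  # counts differ a lot from bin to bin
print(equal_count)  # roughly the same count in every bin
```

On evenly spread data such as range(0, 1500) the two give very similar counts, which can make them hard to tell apart.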

#

@rich silo

rich silo
#

Is there any way to sort the bins from lowest to greatest? Also, can they be formatted as percentages, for example?

green hemlock
#

Can you try qcut and see the results

#

And by sorting lowest to greatest, do you mean frequency or 1st value of tuple?

rich silo
#

the first value of the tuple

#

because i will be using this in a plotly graph

green hemlock
#

Your best bet would be to extract the numbers by treating it as a string, or to replace the last ] with ) and then use ast.literal_eval to convert it into a python tuple. That will make it easier to sort

rich silo
#

Hm that's what i was thinking as well although string manipulation is always painful to me

green hemlock
#

I am not sure, if there is any other easier way, but if there is, let me know too

boreal summit
#

On classification reports for sklearn, I'm finding it hard to wrap my head around what recall actually means. I know support is the total number of that variable present in that dataset, f1 is the harmonic mean between precision and recall, precision is the percentage of right predictions over the total of the dataset being predicted, but can't seem to understand recall.

#

I'd appreciate if someone explains it to me like a 10 years old. Tha is.

#

*thanks.

blissful pendant
#

im trying to find a way to convert latex expressions into images, however i cant find a way to do this. any help would be appreciated

velvet thorn
#

say there's a room with 10 dogs and 10 cats in it, and I tell you "go into the room and get me all the dogs".

boreal summit
#

Listening...

velvet thorn
#

but you're not very good at differentiating dogs and cats

#

so you can make the very safe play

#

and bring out every single animal

#

or

#

you could choose only those you are very sure are dogs.

#

in the first case, you get 10 dogs and 10 cats.

#

which is good in one sense, because you got every dog that was there to get

#

but you also have lots of stuff that I didn't ask for (cats)

#

in the second case, maybe you only got 2 dogs?

#

but you didn't get any cats

#

which is good in a different sense, because you didn't get any extraneous rubbish.

#

precision measures the second sense of goodness: of the stuff you got, how much was actually relevant?

#

recall measures the first sense of goodness: of the relevant stuff that was available, how much did you get?

#

and that's generally why you want measures that combine both, like f1 score: because you usually want results that are largely complete (get most of what you're looking for) and relevant (don't contain much of what you're not looking for)

#

make sense?

velvet thorn
#

like you pass a LaTeX expression to code and it generates a .png or .jpg or something like that?

boreal summit
#

So recall is like the percentage of right predictions you got (dogs) over all the stuff I brought out (dogs and cats). So recall is 0.5 in the first instance?

velvet thorn
#

for the percentage thing, pass normalize=True to value_counts

#

to sort, call .sort_index().
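
A sketch of those two suggestions together, on the range(0, 1500) input mentioned earlier. The four bins and the column names `bin_range` and `share` are assumptions for the example.

```python
import pandas as pd

data = pd.Series(range(0, 1500))
binned = pd.cut(data, bins=4)

# normalize=True turns counts into fractions of the total;
# sort_index() orders the rows by bin rather than by frequency.
shares = binned.value_counts(normalize=True).sort_index()

# Optional: format the fractions as percentage strings for display.
table = pd.DataFrame({
    "bin_range": shares.index.astype(str),
    "share": shares.map("{:.1%}".format).to_numpy(),
})
print(table)
```

Because the index holds interval objects rather than strings, `sort_index()` orders the bins numerically and no string parsing is needed.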

velvet thorn
#

recall is the percentage of correct predictions over all the correct predictions there are to make

#

so there were 10 dogs and you got all 10

#

recall is 1

#

in the second case, there were 10 dogs and you got 2

boreal summit
#

So recall in the second case is 0.2?
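
The dogs-and-cats numbers can be checked by hand. A minimal sketch in plain Python, with a hypothetical helper `precision_recall_f1` that takes raw counts (dogs fetched, cats fetched, dogs left behind):

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall and F1 from raw counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Case 1: bring out every animal -> 10 dogs fetched, 10 cats fetched, 0 dogs missed.
case_1 = precision_recall_f1(true_positives=10, false_positives=10, false_negatives=0)

# Case 2: bring out only the 2 sure dogs -> 2 dogs fetched, 0 cats, 8 dogs left behind.
case_2 = precision_recall_f1(true_positives=2, false_positives=0, false_negatives=8)

print(case_1)
print(case_2)
```

So the first case has precision 0.5 and recall 1.0, and the second has precision 1.0 and recall 0.2, which matches the 0.2 asked about above.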

blissful pendant
#

sin(sqrt(x**2 + 20)) + 1

velvet thorn
velvet thorn
#

p sure MPL can do that @blissful pendant
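
Assuming "MPL" here means matplotlib: its built-in mathtext renderer handles a subset of LaTeX without a LaTeX installation. A rough sketch; `latex_to_png_bytes` is a made-up helper name, and the expression above has been rewritten by hand from Python syntax into LaTeX.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend, no display needed
import matplotlib.pyplot as plt

def latex_to_png_bytes(expr, dpi=200, fontsize=16):
    """Render a mathtext expression (without the $ delimiters) to PNG bytes."""
    fig = plt.figure(figsize=(0.01, 0.01))
    fig.text(0, 0, f"${expr}$", fontsize=fontsize)
    buf = io.BytesIO()
    # bbox_inches="tight" grows the saved area to fit the rendered text.
    fig.savefig(buf, format="png", dpi=dpi, bbox_inches="tight", pad_inches=0.1)
    plt.close(fig)
    return buf.getvalue()

png = latex_to_png_bytes(r"\sin(\sqrt{x^2 + 20}) + 1")
print(len(png), "bytes")
```

The bytes can be written to disk with `open("expr.png", "wb")` or passed on to whatever needs the image.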

boreal summit
#

Ooh, okay. Thanks. I really appreciate @velvet thorn 🙏🏿

blissful pendant
#

yes but sympy wasnt working correctly, prob should have specified that

blissful pendant
velvet thorn
#

but yeah just go try it out

blissful pendant
#

ok thx

molten hamlet
#

has anybody heard of geopandas?

austere swift
#

!d pandas.DataFrame.dropna

arctic wedgeBOT
#
```py
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
```
Remove missing values.

See the [User Guide](../../user_guide/missing_data.html#missing-data) for more on which values are considered missing, and how to work with missing data.

Parameters

**axis** {0 or ‘index’, 1 or ‘columns’}, default 0
Determine if rows or columns which contain missing values are removed.

• 0, or ‘index’ : Drop rows which contain missing values.

• 1, or ‘columns’ : Drop columns which contain missing value.

Changed in version 1.0.0: Pass tuple or list to drop on multiple axes. Only a single axis is allowed.

**how** {‘any’, ‘all’}, default ‘any’
Determine if row or column is removed from DataFrame, when we have at least one NA or all NA.

• ‘any’ : If any NA values are present, drop that row or column.

• ‘all’ : If all values are NA, drop that row or column.

**thresh** int, optional
Require that many non-NA values.... [read more](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html#pandas.DataFrame.dropna)
velvet thorn
#

@glad mulch show code

austere swift
#

you still have a threshold on it, so those are probably just nans that werent dropped since it was below the threshold

velvet thorn
#

the code you're using to drop nulls

#

and how specifically it's not doing what you want it to do

#

how big is the original

austere swift
#

if you wanna check how many nans there are left do df[column].isna().sum()

#

and see if thats below the threshold
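
A small sketch of how `thresh` can leave NaNs behind. The frame and the threshold of 2 are invented, since the code being discussed is not shown in this log.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, np.nan, 6.0, 8.0],
    "c": [9.0, np.nan, 11.0, 12.0],
})

# thresh=2 keeps every row that has at least 2 non-NaN values,
# so rows with a single NaN survive and still contain that NaN.
kept = df.dropna(thresh=2)

# Count the NaNs left in each column after the drop.
remaining = kept.isna().sum()
print(kept)
print(remaining)
```

Calling `df.dropna()` with no `thresh` would instead drop every row that has any NaN at all.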